My server had constant incoming traffic of 1.7mb/s from a service that downloads RSS feeds from the internet and processes them. Basically, it needs to return the latest updates of multiple RSS feeds. It’s a very basic polling system, but it was downloading too much data for just 15,000 active users, and growth at that rate didn’t look feasible.
I was using the ROME Java library to parse the XML. So far so good; the problem was that it downloads the whole feed and processes all of it. For my application I don’t need the whole RSS feed, just the new entries that I haven’t downloaded yet.
The solution was a custom SAX RSS parser that loops through the “<item>” tags and reads each item’s publication date. This way I can parse the feed item by item, and as soon as I reach an item that isn’t new, I abort the HTTP connection and stop downloading the feed. I wish ROME had an option for this, something like “stop processing when ‘publishedDate’ is earlier than..”.
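For anyone curious what the early-abort trick looks like, here is a minimal sketch using only the JDK’s built-in SAX parser. It is not the original class: the names `IncrementalRssParser`, `Item` and `parseNewItems` are hypothetical, it returns a tiny `Item` type instead of ROME’s `SyndEntry`, and it assumes the feed lists items newest-first with RSS 2.0 `<pubDate>` values that `DateTimeFormatter.RFC_1123_DATE_TIME` can parse. The handler throws a `SAXException` subclass at the first already-seen item, and closing the stream stops reading the rest of the feed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class IncrementalRssParser {

    /** Hypothetical minimal entry type; the original class used ROME's SyndEntry instead. */
    public static final class Item {
        public final String title;
        public final ZonedDateTime published;
        Item(String title, ZonedDateTime published) { this.title = title; this.published = published; }
    }

    /** Thrown from the handler to abort parsing at the first already-seen item. */
    static final class StopParsingException extends SAXException {
        StopParsingException() { super("reached an item that was already downloaded"); }
    }

    static final class Handler extends DefaultHandler {
        private final ZonedDateTime lastSeen;
        private final StringBuilder text = new StringBuilder();
        final List<Item> newItems = new ArrayList<>();
        boolean stopped;
        private boolean inItem;
        private String title;
        private ZonedDateTime published;

        Handler(ZonedDateTime lastSeen) { this.lastSeen = lastSeen; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("item")) {
                inItem = true;
                title = null;
                published = null;
            }
            text.setLength(0); // collect character data for this element only
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (!inItem) {
                return; // ignore channel-level elements such as the feed's own <title>
            }
            if (qName.equals("title")) {
                title = text.toString().trim();
            } else if (qName.equals("pubDate")) {
                // Assumption: dates in a form RFC_1123_DATE_TIME accepts (e.g. "GMT" or "+0000").
                published = ZonedDateTime.parse(text.toString().trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
                // Assumption: items are newest-first, so the first old item means we are done.
                if (!published.isAfter(lastSeen)) {
                    stopped = true;
                    throw new StopParsingException();
                }
            } else if (qName.equals("item")) {
                inItem = false;
                newItems.add(new Item(title, published));
            }
        }
    }

    /** Returns the items newer than lastSeen, reading the stream only as far as needed. */
    public static List<Item> parseNewItems(InputStream in, ZonedDateTime lastSeen) throws IOException {
        Handler handler = new Handler(lastSeen);
        try (InputStream stream = in) { // closing the stream stops reading the rest of the feed
            SAXParserFactory.newInstance().newSAXParser().parse(stream, handler);
        } catch (SAXException e) {
            if (!handler.stopped) {
                throw new IOException("malformed feed", e); // a real parse error, not our early stop
            }
        } catch (ParserConfigurationException e) {
            throw new IOException("no SAX parser available", e);
        }
        return handler.newItems;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<rss version=\"2.0\"><channel><title>Demo</title>"
            + "<item><title>Newest</title><pubDate>Sun, 18 Oct 2009 10:00:00 GMT</pubDate></item>"
            + "<item><title>Older</title><pubDate>Sat, 17 Oct 2009 10:00:00 GMT</pubDate></item>"
            + "<item><title>Oldest</title><pubDate>Fri, 16 Oct 2009 10:00:00 GMT</pubDate></item>"
            + "</channel></rss>";
        ZonedDateTime lastSeen = ZonedDateTime.parse("Sat, 17 Oct 2009 10:00:00 GMT",
            DateTimeFormatter.RFC_1123_DATE_TIME);
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Item item : parseNewItems(in, lastSeen)) {
            System.out.println(item.title); // prints "Newest"
        }
    }
}
```

In a real poller the `InputStream` would come from an `HttpURLConnection`; calling `disconnect()` on it after an early stop is one way to make sure the socket is closed rather than drained. Feeds that are not sorted newest-first, or that use Atom or `dc:date` instead of `<pubDate>`, would need a different stop condition.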
The impact on bandwidth usage and processing time was impressive:


If anyone is interested, I can post the Java class and explain it. It’s compatible with the com.sun.syndication.feed.synd package and uses the SyndEntry and SyndFeed interfaces.

Cool. It’s lovely to see the change in the performance!
It would be great to have a proxy service for that, so I could ask the proxy for an RSS feed with a date and it would give me back just the latest posts. Can you build it?!
Comment by Roberto — October 18, 2009 @ 11:57 pm
Hello, Rafael
I liked your blog post.
I’ve recently been trying to parse RSS feeds to get only the new items, and I’m seeing excessive bandwidth usage while doing it. If you could post the sample code or a link to the project, that would be great.
Thanks.
Comment by Anil Madamala — January 8, 2012 @ 3:45 am
Hi Anil,
What kind of project are you working on? Explain it in a bit more detail so I can tell you more.
thanks
rafa
Comment by mufumbo — January 8, 2012 @ 5:28 pm