Feed43 - generate an RSS feed from an HTML page
Saturday May 06th 2006, 2:12 pm
Filed under: Internets

Over the years I’ve written dozens of Perl scripts to scrape HTML pages and generate RSS feeds. Then I tried RSSxl, which was pretty kludgy and lacked features. Recently, a few others have come out, including FeedYes and PonyFish.
But none worked very well. They weren’t flexible enough, or the interface was poor.

Feed43 is the first to impress me. It is fairly powerful, allowing you to match any number of different elements on a page and then create a custom template for the output. The interface is also excellent. You do need to read the documentation, but after about 10 minutes I was creating RSS feeds that fit my exact needs.

The Feed43 developer is committed to keeping the service free and making money from a premium service with extra features. The only downsides I’ve run into so far: the feed only gets updated every 6 hours, which is fine for pages that rarely change but much too infrequent for pages that get updated constantly throughout the day. The regular expression matching is also a bit limiting. Instead of full regex support, it has reduced matching to a handful of tags. So far I’ve been able to make it work, but on nastier pages that need complex matching, I suspect you’d run into a dead end. For instance, I don’t know how it deals with greedy matching.
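To show what I mean by greedy matching, here is a small sketch in plain Python regex. This is not Feed43's pattern syntax, and the HTML fragment is made up for illustration; it only shows the kind of behavior you would want a scraper to handle.

```python
import re

# Hypothetical HTML fragment with two headline links on a single line.
html = '<a href="/one">First story</a> <a href="/two">Second story</a>'

# Greedy: .* grabs as much as it can, so the first capture swallows
# everything up to the last '">' and both links collapse into one match.
greedy = re.findall(r'<a href="(.*)">(.*)</a>', html)

# Lazy: .*? stops at the earliest point where the rest of the pattern
# can still match, so each link is captured separately.
lazy = re.findall(r'<a href="(.*?)">(.*?)</a>', html)

print(len(greedy), "greedy match(es)")
print(len(lazy), "lazy match(es)")
```

With a full regex engine you pick the behavior yourself by adding or dropping the `?`. With a reduced pattern language you are stuck with whatever the tool decided for you.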

Here are a few screenshots of the process to make a feed of the Brattleboro Reformer front page news headlines:

Step One: Enter a URL

Step Two: Write the (ir)regular expression to match your elements (title, link, description)

Step Three: Extract your elements and verify you’re getting what you expected.

Step Four: Make an output template. This one is quite simple, but you can get as fancy as you wish… putting multiple matched elements in the title, link or description.

Step Five: Preview your handiwork. Optionally give the feed a pretty name, password protect it, etc.

Step Six: Subscribe in your feed reader of choice.
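
For comparison, the same steps can be roughed out by hand in a few lines of Python. This is only a sketch of the general scrape-and-template idea, not how Feed43 works internally. The page fragment, the pattern, the template placeholders, `build_feed` and the example.com URL are all made up for illustration; a real script would fetch the page over HTTP first.

```python
import re
from xml.sax.saxutils import escape

# Step one stand-in: a hypothetical page fragment instead of a fetched URL.
PAGE = """
<div class="story"><a href="/news/1">Town meeting tonight</a><p>Voters gather at 7 pm.</p></div>
<div class="story"><a href="/news/2">Bridge reopens</a><p>Repairs finished early.</p></div>
"""

# Step two: one pattern with three lazy captures (link, title, description).
ITEM_PATTERN = re.compile(
    r'<div class="story"><a href="(.*?)">(.*?)</a><p>(.*?)</p></div>', re.S
)

# Step four: output template; the placeholder names are illustrative.
ITEM_TEMPLATE = (
    "<item><title>{title}</title><link>{link}</link>"
    "<description>{description}</description></item>"
)


def build_feed(page, base_url, feed_title):
    """Extract every story from the page and wrap the results in RSS 2.0."""
    items = []
    # Step three: pull out the matched elements, escaping them for XML.
    for link, title, description in ITEM_PATTERN.findall(page):
        items.append(ITEM_TEMPLATE.format(
            title=escape(title),
            link=escape(base_url + link),
            description=escape(description),
        ))
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0"><channel>'
        f"<title>{escape(feed_title)}</title>"
        f"<link>{escape(base_url)}</link>"
        f"<description>{escape(feed_title)}</description>"
        + "".join(items)
        + "</channel></rss>"
    )


# Step five: preview the result.
feed = build_feed(PAGE, "http://example.com", "Example headlines")
print(feed)
```

Even this toy version shows why a hosted service is attractive: the pattern breaks as soon as the page markup shifts, and you still need somewhere to run the script on a schedule.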

