Another huge commit.

The code now uses OOP where it fits. Atom feeds are supported, but have not been properly tested. Unix-style globbing can now be used for URLs. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned in a more efficient way. The code is now much cleaner: it uses lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and is much faster than feedparser). The README has been updated.
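
As a rough illustration of the Unix globbing mentioned above, here is a minimal sketch of how a feed URL could be matched against glob patterns. It uses Python's standard `fnmatch` module, which may or may not be what this commit uses. The `RULES` layout and the `find_rule` helper are hypothetical names introduced for the example; the patterns and XPath values are copied from two entries of the rules file in this commit.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of two rules-file entries:
# a name, one or more URL globs, and the XPath of the article body.
RULES = [
    {
        "name": "Spiegel",
        "urls": ["http://www.spiegel.de/schlagzeilen/*"],
        "xpath": "//div[@id='spArticleSection']",
    },
    {
        "name": "TheOnion",
        "urls": ["http://feeds.theonion.com/*"],
        "xpath": "//article",
    },
]


def find_rule(feed_url, rules=RULES):
    """Return the first rule with a glob matching feed_url, or None."""
    for rule in rules:
        # fnmatchcase is case-sensitive on every platform; '*' also matches '/'.
        if any(fnmatchcase(feed_url, pattern) for pattern in rule["urls"]):
            return rule
    return None


if __name__ == "__main__":
    rule = find_rule("http://www.spiegel.de/schlagzeilen/tops/index.rss")
    print(rule["name"] if rule else "no rule")
```

Note that in `fnmatch` patterns `?` and `[...]` are wildcards too, so a URL containing a literal `?` only works as a pattern because `?` matches any single character.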
2013-04-15 18:51:55 +02:00
parent a098b7e104
commit af8879049f
4 changed files with 339 additions and 184 deletions

30
rules

@@ -1,15 +1,37 @@
TehranTimes
http://www.tehrantimes.com/component/ninjarsssyndicator/?feed_id=1&format=raw
http://www.tehrantimes.com/*
http://tehrantimes.com/*
//div[@class='article-indent']
FranceInfo
http://www.franceinfo.fr/rss.xml
http://www.franceinfo.fr/rss*
//h2[@class='chapo']/..
Les Echos
http://rss.feedsportal.com/c/499/f/413829/index.rss
http://syndication.lesechos.fr/rss/*
//h1/../..
Spiegel
http://www.spiegel.de/schlagzeilen/tops/index.rss
http://www.spiegel.de/schlagzeilen/*
//div[@id='spArticleSection']
Le Soir
http://www.lesoir.be/feed/La%20Une/destination_une_block/
http://www.lesoir.be/feed/*
//div[@class='article-content']
Stack Overflow
http://stackoverflow.com/feeds/*
//*[@id='question']
Daily Telegraph
http://www.telegraph.co.uk/*
//*[@id='mainBodyArea']
Cracked.com
http://feeds.feedburner.com/CrackedRSS
//div[@class='content']|//section[@class='body']
TheOnion
http://feeds.theonion.com/*
//article