morss

Author	SHA1	Message	Date
pictuga	3176c2a8e8	Fix bad characters detection Now works with any encoding, no longer restricted to utf-8. Uses regex to find encoding (not perfect, but rather fast, since it's used on a substring)	2013-09-15 14:57:37 +02:00
pictuga	3ba74649f6	Test if linked pages are text documents Useful for feeds such as HackerNews	2013-09-10 15:25:55 +02:00
pictuga	1b7fdad6a8	Improve broken XML support TPB feed is a good example <http://rss.thepiratebay.sx/blog>. Now supports ampersand in feed, using the "recover" mode in etree.parse. Broken utf-8 strings in feed are now also supported.	2013-09-08 15:48:34 +02:00
pictuga	5ebd84ee55	Fix broken feeds.py calls for items count	2013-09-08 15:47:15 +02:00
pictuga	fe89a70f24	Add help for new classes	2013-09-01 19:00:22 +02:00
pictuga	50f3c5a552	Use descriptors for lists and to replace property Much nicer. Less duplicate code. More transparent. Big commit.	2013-09-01 18:52:07 +02:00
pictuga	a94d659bc8	Make negation in README more obvious	2013-08-25 00:01:00 +02:00
pictuga	d3c163fb74	Use ETag for user-side caching Pretty hard-coded ETag use. ETag is just a timestamp, and the server checks whether it's recent enough.	2013-08-24 23:43:32 +02:00
pictuga	e2c3375eb6	Log url earlier Now logging it in both use cases	2013-08-24 23:41:40 +02:00
pictuga	0c6e28205a	Use seconds for every parameter	2013-08-24 23:40:37 +02:00
pictuga	b350602232	Remove legacy "xml map" declaration	2013-08-24 23:16:23 +02:00
pictuga	1ba22516fe	Small help for etag handler	2013-07-19 00:02:52 +02:00
pictuga	90efb84c57	Don't log word counts Nobody cares	2013-07-18 23:55:58 +02:00
pictuga	9e324465e4	Use etag/last-modified to fetch xml feeds	2013-07-18 23:54:13 +02:00
pictuga	70df746416	Accept None as value to cache	2013-07-18 23:51:11 +02:00
pictuga	71129b5898	Fix headers definition Based on what's done inside urllib2.py.	2013-07-17 14:41:29 +02:00
pictuga	d3213ea1e7	Implement user-agent in HTMLDownloader It was forgotten in the previous commit	2013-07-17 14:40:29 +02:00
pictuga	918dede4be	Extend urllib2 to download pages, use gzip Cleaner than dirty function. Handles decoding, gzip decompression, meta redirects (eg. Washington Post). Might need extra testing.	2013-07-16 23:33:45 +02:00
pictuga	1fa8c4c535	Remove cleanXML() This function is way too strong, and no longer needed (even for the targeted feed). It led to other bugs with other feeds, where needed spaces were stripped.	2013-07-15 11:10:19 +02:00
pictuga	0718303eb7	Use ' instead of " when possible	2013-07-14 19:00:16 +02:00
pictuga	7275bb1a59	Better content insertion Also takes care of description, by creating one, when missing.	2013-07-14 18:58:48 +02:00
pictuga	054f5c0846	Detect provided content with word count This is instead of character count.	2013-07-14 18:57:12 +02:00
pictuga	7fa183d713	Change morss.py to use feeds.py No other changes should appear in this commit	2013-07-14 18:44:11 +02:00
pictuga	8ac7d8b282	Add feeds.py This is a huge change. Feed parsing is now done in a separate file, much cleaner. The code of the lib tends to repeat itself a lot though. It should be possible to improve it. Code should be more stable.	2013-07-14 18:25:49 +02:00
pictuga	6e891ef6ff	Nicer link display in readme	2013-07-11 14:17:04 +02:00
pictuga	981e83fd1e	Add link to online test version	2013-07-11 14:11:23 +02:00
pictuga	cf3934a513	Change http output mimetype to xml	2013-06-28 13:34:12 +02:00
pictuga	1f4c219880	Common code for url/options handling	2013-06-25 13:13:23 +02:00
pictuga	89662ccbae	typo in readme	2013-06-19 22:16:46 +03:00
pictuga	16f2e3b4c3	todo and newsreader hook update in readme Updated liferea use to reflect code changes. Link to morss.it as live "preview". Added a todo. Added dependencies list.	2013-06-19 21:12:03 +02:00
pictuga	9ad9ffaf91	Use proper markdown for links in readme	2013-06-11 13:10:40 +02:00
pictuga	d2418a47c2	Add support for reddit.com feeds The content of the linked article is used for the content. The original content (with a link to comments) is still available in the "description" of the feed item.	2013-06-11 13:02:47 +02:00
pictuga	f0b237364f	Better annotation of feedburner/feedsportal code	2013-06-11 13:02:16 +02:00
pictuga	0978e76356	str.decode() within EncDownload()	2013-06-08 17:32:55 +02:00
pictuga	89354e1528	Use file's built-in readlines() to split file	2013-06-08 17:30:53 +02:00
pictuga	bbf5c92ba2	Fix lenHTML() with empty string	2013-06-08 17:30:11 +02:00
pictuga	e05d1c9deb	Replace uppercase title with "title-case"	2013-06-02 23:45:41 +02:00
pictuga	f09dfbacf5	Warning in README: no http server provided	2013-05-23 21:54:11 +02:00
pictuga	a8feac9811	Detail MAX settings in README	2013-05-23 21:48:45 +02:00
pictuga	b78f0bfba5	Improve options and limits New limits are possible: time limit, max number of items fetched, and max number of items taken from cache. Fill third argument is now Fast=True, which is self-explanatory. (Complexity of the changes made separate commits impossible).	2013-05-15 17:56:58 +02:00
pictuga	2a71fe07f2	Improve Cache code Removed _new flag. Slightly more stable and cleaner.	2013-05-15 17:48:39 +02:00
pictuga	bf647ba5f8	Make Fill return True when it had done something useful	2013-05-15 17:38:52 +02:00
pictuga	9694a31052	Add 'feedurl' argument to Fill Was needed for commit f3c2c34	2013-05-15 17:36:00 +02:00
pictuga	8e2aab55e7	Check url before looking for provided content Also use lenHTML() function defined lately	2013-05-15 17:32:42 +02:00
pictuga	85e40cde4e	Check article length is big enough Avoids replacing rather useful descriptions with empty string	2013-05-15 17:24:27 +02:00
pictuga	222b1369e5	Support for relative urls in feed	2013-05-15 17:13:57 +02:00
pictuga	d88719c87f	Use urlparse library to check feed urls	2013-05-15 17:12:59 +02:00
pictuga	1506a5c0cd	Fix string output in XMLMap	2013-05-05 16:04:42 +02:00
pictuga	adebe23232	Better logging when running as Liferea hook	2013-05-05 15:33:46 +02:00
pictuga	32514941b4	Try to improve support for bogus xml feed	2013-05-05 15:32:57 +02:00
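
Commit d3c163fb74 above describes the ETag as a plain timestamp that the server checks for freshness. A minimal sketch of that idea follows, assuming the ETag is a quoted Unix timestamp; the function name `check_etag` and the parameters `max_age` and `now` are hypothetical and not taken from the morss code.

```python
import time


def check_etag(if_none_match, max_age, now=None):
    """Decide between 304 and 200 from a timestamp-based ETag.

    Hypothetical helper, not the morss implementation. The ETag is
    assumed to be a quoted Unix timestamp such as '"1377380612"'.
    Returns a (status, etag) tuple.
    """
    now = time.time() if now is None else now
    try:
        # Strip the surrounding quotes and read the timestamp.
        issued = float(if_none_match.strip('"'))
    except (AttributeError, ValueError):
        # Missing header (None) or a value that is not a number.
        issued = None

    if issued is not None and 0 <= now - issued < max_age:
        # The client's copy is recent enough: nothing to resend.
        return 304, if_none_match

    # Otherwise serve fresh content with a new timestamp as ETag.
    return 200, '"%d"' % now
```

For example, `check_etag('"1000"', 60, now=1030)` keeps the client's copy, while the same ETag at `now=1100` is treated as stale and replaced.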

556 Commits