Commit Graph

664 Commits (e6811138fda1712aae3297a7cfcd7d1944e0abec)
 

Author SHA1 Message Date
pictuga 9694a31052 Add 'feedurl' argument to Fill
Was needed for commit f3c2c34
2013-05-15 17:36:00 +02:00
pictuga 8e2aab55e7 Check url before looking for provided content
Also use the recently defined lenHTML() function
2013-05-15 17:32:42 +02:00
pictuga 85e40cde4e Check that article length is big enough
Avoids replacing rather useful descriptions with an empty string
2013-05-15 17:24:27 +02:00
pictuga 222b1369e5 Support for relative urls in feed 2013-05-15 17:13:57 +02:00
pictuga d88719c87f Use urlparse library to check feed urls 2013-05-15 17:12:59 +02:00
pictuga 1506a5c0cd Fix string output in XMLMap 2013-05-05 16:04:42 +02:00
pictuga adebe23232 Better logging when running as Liferea hook 2013-05-05 15:33:46 +02:00
pictuga 32514941b4 Try to improve support for bogus XML feeds 2013-05-05 15:32:57 +02:00
pictuga b34ecb8ad3 Fix cache crash when an entry has an empty value 2013-05-05 15:32:05 +02:00
pictuga e518f2cced Better timeout error handling
For older versions of Python
2013-05-05 15:31:11 +02:00
pictuga 03501edccd Add/fix extra modes
'progress' mode now works on Chrome. 'cache' mode relies only on the cache, to load faster.
2013-05-05 15:30:06 +02:00
pictuga 65090870ac Remove temp debug print statement 2013-05-05 15:28:32 +02:00
pictuga e77278dda9 Remove leftover SERVER var from source code 2013-05-01 19:31:24 +02:00
pictuga 949582ba19 Add progress view. 2013-05-01 17:57:09 +02:00
pictuga 5ee5dbf359 Cache http errors to save time. 2013-05-01 17:56:03 +02:00
pictuga 2f1ae1ce91 Use less suspicious user-agents. 2013-05-01 17:54:17 +02:00
pictuga 0a97a2a2b5 Support for combined feedsportal and feedburner. 2013-05-01 17:43:43 +02:00
pictuga 93b098ab11 Added http timeout. 2013-04-30 19:54:32 +02:00
pictuga 9f175994c6 Fix regex implementation. 2013-04-30 19:51:29 +02:00
pictuga ee08cccf9c Updated README since SERVER var drop. 2013-04-28 11:37:11 +02:00
pictuga 93f971896b Improved feedsportal url recognition. 2013-04-28 10:10:58 +02:00
pictuga fa7cd957df Save Cache when it's new.
So as to avoid crashes on first fetch.
2013-04-23 00:24:41 +02:00
pictuga ca90d082c3 Library import list made cleaner. 2013-04-23 00:04:44 +02:00
pictuga 1480bd7af4 Auto-detection of server mode, better caching.
The SERVER variable is no longer needed. The RSS .xml file is now cached for a very short time, to make loading faster and, hopefully, reduce bans a little. A more common User-Agent is used to try to cut down on bans. Added the ability to test whether a key is in the Cache.
2013-04-23 00:00:07 +02:00
pictuga a616c96e32 Removed another unused var. 2013-04-22 23:58:20 +02:00
pictuga f95c5dcf0d Fixed caching. 2013-04-22 22:56:38 +02:00
pictuga 83d0dcce4d Delete unused var declaration. 2013-04-22 22:56:21 +02:00
pictuga 2d05653190 Better detection of feedsportal, extra url logging. 2013-04-19 11:44:25 +02:00
pictuga 8ce9812dfd Meta redirects are now supported. 2013-04-19 11:43:47 +02:00
pictuga 80ba60d295 Better detection of feeds with content provided. 2013-04-19 11:42:54 +02:00
pictuga d2b74819b4 Improved caching.
No longer writes every time a value is added, since this could cause issues if two instances of the script ran at the same time. Now it only writes when the Cache object is no longer in use (i.e. when it is garbage collected).
2013-04-19 11:40:35 +02:00
pictuga 4abf7b699c Use readability to fetch article content.
Makes the whole "xpath rules" system useless. Almost any feed is now supported. The Liferea CSS stylesheets are also unneeded now, since readability cleans up HTML code more efficiently. README was updated.
2013-04-19 11:37:43 +02:00
pictuga 437b0da8a9 Updated README to reflect 404 redirection support. 2013-04-19 11:30:34 +02:00
pictuga 17db2584da Fixed caching.
For scary reasons, the re-used cache was deleted every time. This is now fixed. Loading is now *really* fast.
2013-04-16 16:13:42 +02:00
pictuga 5a74babf24 Improved logging on server. 2013-04-16 16:13:14 +02:00
pictuga 7b1c32eac2 Added support for 404 redirect.
i.e. http://domain.com/bbc.co.uk/feed.xml will redirect to http://domain.com/morss.py/bbc.co.uk/feed.xml and work.
2013-04-16 16:11:34 +02:00
pictuga af8879049f Another huge commit.
Now uses OOP where it fits. Atom feeds are supported, but no real tests were made. Unix globbing is now possible for urls. Caching is done in a cleaner way. Feedburner links are also replaced. HTML is cleaned more efficiently. Code is now much cleaner, using lxml.objectify and a small wrapper to access Atom feeds as if they were RSS feeds (and much faster than feedparser). README has been updated.
2013-04-15 18:51:55 +02:00
pictuga a098b7e104 Add .htaccess to display cache files as RSS feeds. 2013-04-05 17:05:26 +02:00
pictuga 5898879c8e Added .htaccess to enable script execution. 2013-04-05 17:02:42 +02:00
pictuga d6e6d61199 Bypass feedsportal. 2013-04-04 19:29:22 +02:00
pictuga ad25516e34 Speak about deleteTags in README. 2013-04-04 18:31:26 +02:00
pictuga 851dacdfbc Renamed to .py. 2013-04-04 18:17:12 +02:00
pictuga 6783bbf992 Improved shebang. 2013-04-04 17:56:37 +02:00
pictuga 82084c2c75 Move to OOP.
This is a huge commit. The whole code is ported to Object-Oriented Programming. This makes the code cleaner, which became necessary to deal with all the different cases, for example encoding detection. Encoding detection now works better and uses 3 different methods. HTML pages with an XML declaration are now supported. Feed urls with parameters (e.g. "index.php?option=par") are also supported. The cache is now smarter: it no longer grows indefinitely, since only in-use pages are kept in it. Caching is now mandatory. urllib (not urllib2) is no longer needed. Solved a possible crash in the log function (when passing a list of str with non-unicode encoding).
README is also updated.
2013-04-04 17:43:30 +02:00
pictuga c21af6d9a8 Added lesoir.be rule 2013-04-03 10:22:07 +02:00
pictuga 05b5bc7783 Catch extra errors (timeout). 2013-03-29 20:06:31 +01:00
pictuga f734fb2623 Added quick licence information. 2013-03-29 20:05:53 +01:00
pictuga 6f6c5fbaad Faster xml cleaning 2013-03-01 14:26:51 +01:00
pictuga e305f387ab Hopefully fixed encoding issues
with the dirtiest trick out there...
2013-02-27 15:12:32 +01:00
pictuga 0eaa1b3ab9 Added lesoir.be css rules 2013-02-27 15:12:17 +01:00