Commit Graph

723 Commits (4aa56067523d4526c67a040fd9c44556a3173b03)
 

Author SHA1 Message Date
pictuga 851dacdfbc Renamed to .py. 2013-04-04 18:17:12 +02:00
pictuga 6783bbf992 Improved shebang. 2013-04-04 17:56:37 +02:00
pictuga 82084c2c75 Move to OOP.
This is a huge commit. The whole code is ported to Object-Oriented Programming. This makes the code cleaner, which had become necessary to handle all the different cases, for example in encoding detection. Encoding detection now works better, and uses 3 different methods. HTML pages with an xml declaration are now supported. Feed urls with parameters (e.g. "index.php?option=par") are also supported. The cache is now smarter: it no longer grows indefinitely, because only in-use pages are kept in it. Caching is now mandatory. urllib (not urllib2) is no longer needed. Fixed a possible crash in the log function (when passing a list of str with a non-unicode encoding).
README is also updated.
2013-04-04 17:43:30 +02:00
pictuga c21af6d9a8 Added lesoir.be rule 2013-04-03 10:22:07 +02:00
pictuga 05b5bc7783 Catch extra errors (timeout). 2013-03-29 20:06:31 +01:00
pictuga f734fb2623 Added quick licence information. 2013-03-29 20:05:53 +01:00
pictuga 6f6c5fbaad Faster xml cleaning 2013-03-01 14:26:51 +01:00
pictuga e305f387ab Hopefully fixed encoding issues
with the dirtiest trick out there...
2013-02-27 15:12:32 +01:00
pictuga 0eaa1b3ab9 Added lesoir.be css rules 2013-02-27 15:12:17 +01:00
pictuga 682ab253b0 Typo in README 2013-02-25 21:56:16 +01:00
pictuga 217ff0fd8f Use better markdown syntax for default xpath rule 2013-02-25 21:55:17 +01:00
pictuga 27b0fbaf01 Mention default xpath in README 2013-02-25 21:54:04 +01:00
pictuga be17f0c78f Updated README to markdown 2013-02-25 21:49:38 +01:00
pictuga 9a1b2a8490 Updated README since caching is now implemented 2013-02-25 21:38:56 +01:00
pictuga ed8a45875c Default to "//h1/.." since most websites use it
because it is said to be good for SEO. Debug now requires the env variable "DEBUG" to be set to something other than "".
2013-02-25 21:36:02 +01:00
pictuga 253bc27f17 Hide <noscript> warnings 2013-02-25 21:35:46 +01:00
pictuga d39604c453 Support for cookies added
NYT needs them
2013-02-25 20:53:59 +01:00
pictuga d6179a734f Clearer debug info 2013-02-25 20:53:22 +01:00
pictuga eb63ce3f4f Handle more errors 2013-02-25 18:32:23 +01:00
pictuga b63f91a151 Added cache, easier debug 2013-02-25 18:01:59 +01:00
pictuga 7dfe92de63 Added README 2013-02-25 16:40:51 +01:00
pictuga 2a146f1a36 Links to rss feeds in rules list 2013-02-25 16:18:52 +01:00
pictuga 51fe6ce81b First commit 2013-02-25 15:50:32 +01:00