index.Html: fix meta tag

sheet.xsl: mobile-friendly view & url fix
sheet.xsl: show feed url (js-based)
2020-03-20 20:21:08 +01:00 · 2020-03-20 19:47:43 +01:00 · 2020-03-20 19:46:20 +01:00 · 2020-03-20 16:44:49 +01:00 · 2020-03-20 15:44:25 +01:00 · 2020-03-20 15:32:44 +01:00
5 changed files with 213 additions and 55 deletions
--- a/README.md
+++ b/README.md
@@ -55,7 +55,9 @@ You do need:
 Simplest way to get these:
 ```shell
 pip install -r requirements.txt
 ```
 You may also need:
@@ -140,7 +142,9 @@ ensure that the provided `/www/.htaccess` works well with your server.
 Running this command should do:
 ```shell
 uwsgi --http :9090 --plugin python --wsgi-file main.py
 ```
 However, one problem might be how to serve the provided `index.html` file if it
 isn't in the same directory. Therefore you can add this at the end of the
@@ -156,8 +160,12 @@ You can change the port and the location of the `www/` folder like this `python
 #### Passing arguments
-Then visit: **`http://PATH/TO/MORSS/[main.py/][:argwithoutvalue[:argwithvalue=value[...]]]/FEEDURL`**  
+Then visit:
 ```
 http://PATH/TO/MORSS/[main.py/][:argwithoutvalue[:argwithvalue=value[...]]]/FEEDURL
 ```
 For example: `http://morss.example/:clip/https://twitter.com/pictuga`
 *(Brackets indicate optional text)*
 The `main.py` part is only needed if your server doesn't support the Apache redirect rule set in the provided `.htaccess`.
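The URL form above puts the options in the path, ahead of the feed URL. Below is a minimal sketch of how such a path could be split. It assumes three things: option segments start with `:`, the option block ends at the first `/`, and everything after that is the feed URL. `split_morss_path` is a hypothetical helper for illustration, not morss's own parser.

```python
def split_morss_path(path):
    """Split '/[main.py/][:opt[:key=value...]]/FEEDURL' into (options, feed_url).

    Hypothetical helper for illustration only; not morss's actual parser.
    """
    path = path.lstrip('/')
    # 'main.py/' is optional (only needed without the .htaccess rewrite rule)
    if path.startswith('main.py/'):
        path = path[len('main.py/'):]
    options = {}
    if path.startswith(':'):
        # the option block runs up to the first '/', the rest is the feed URL
        block, _, path = path.partition('/')
        for arg in block.split(':')[1:]:
            if not arg:
                continue
            key, sep, value = arg.partition('=')
            # an argument without '=value' is treated as a boolean flag
            options[key] = value if sep else True
    return options, path


print(split_morss_path('/:clip/https://twitter.com/pictuga'))
```

Because the option block is cut off at the first `/`, the `:` and `/` characters inside the feed URL are never mistaken for options.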
@@ -166,8 +174,12 @@ Works like a charm with [Tiny Tiny RSS](http://tt-rss.org/redmine/projects/tt-rs
 ### As a CLI application
-Run: **`python[2.7] -m morss [argwithoutvalue] [argwithvalue=value] [...] FEEDURL`**  
+Run:
 ```
 python[2.7] -m morss [argwithoutvalue] [argwithvalue=value] [...] FEEDURL
 ```
 For example: `python -m morss debug http://feeds.bbci.co.uk/news/rss.xml`
 *(Brackets indicate optional text)*
 ### As a newsreader hook
@@ -177,8 +189,12 @@ To use it, the newsreader [Liferea](http://lzone.de/liferea/) is required
 scripts can be run on top of the RSS feed, using its
 [output](http://lzone.de/liferea/scraping.htm) as an RSS feed.
-To use this script, you have to enable "(Unix) command" in liferea feed settings, and use the command: **`[python2.7] PATH/TO/MORSS/main.py [argwithoutvalue] [argwithvalue=value] [...] FEEDURL`**  
+To use this script, you have to enable "(Unix) command" in Liferea feed settings, and use the command:
 ```
 [python[2.7]] PATH/TO/MORSS/main.py [argwithoutvalue] [argwithvalue=value] [...] FEEDURL
 ```
 For example: `python2.7 PATH/TO/MORSS/main.py http://feeds.bbci.co.uk/news/rss.xml`
 *(Brackets indicate optional text)*
 ### As a python library
--- a/morss/feeds.py
+++ b/morss/feeds.py
@@ -21,12 +21,10 @@ json.encoder.c_make_encoder = None
 try:
    # python 2
    from StringIO import StringIO
    from urllib2 import urlopen
    from ConfigParser import RawConfigParser
 except ImportError:
    # python 3
    from io import StringIO
    from urllib.request import urlopen
    from configparser import RawConfigParser
 try:
@@ -164,7 +162,7 @@ class ParserBase(object):
        return self.convert(FeedHTML).tostring(**k)
    def convert(self, TargetParser):
-        if isinstance(self, TargetParser):
+        if type(self) == TargetParser:
            return self
        target = TargetParser()
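The switch from `isinstance` to an exact type check matters when one parser class derives from another. In this file, `ParserHTML` subclasses `ParserXML`. `isinstance` also accepts subclass instances, so `convert` would hand back an object of the wrong concrete class instead of converting it. The toy sketch below uses the hypothetical classes `FakeParser`, `FakeXML` and `FakeHTML`. Field copying is reduced to keeping a reference to the source object, and `is` is used for the class comparison, which behaves like `==` for ordinary classes.

```python
class FakeParser(object):
    def convert(self, TargetParser):
        # exact-type check: a subclass instance is NOT considered "already converted"
        if type(self) is TargetParser:
            return self
        target = TargetParser()
        target.source = self  # stand-in for copying every field into the new object
        return target


class FakeXML(FakeParser):
    pass


class FakeHTML(FakeXML):  # mirrors ParserHTML deriving from ParserXML
    pass


h = FakeHTML()
print(isinstance(h, FakeXML))        # True: an isinstance check would skip conversion
print(type(h.convert(FakeXML)).__name__)
```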
@@ -208,11 +206,11 @@ class ParserBase(object):
        pass
    def rule_remove(self, rule):
-        # remove node from its parent
+        # remove node from its parent. Returns nothing
        pass
    def rule_set(self, rule, value):
-        # value is always a str?
+        # set the value. Returns nothing
        pass
    def rule_str(self, rule):
@@ -247,25 +245,30 @@ class ParserBase(object):
        return self.rule_search_all(self.rules[rule_name])
-    def get_str(self, rule_name):
+    def get(self, rule_name):
        # simple function to get nice text from the rule name
-        # for use in @property, ie. self.get_str('title')
+        # for use in @property, ie. self.get('title')
        if rule_name not in self.rules:
            return None
-        return self.rule_str(self.rules[rule_name])
+        return self.rule_str(self.rules[rule_name]) or None
-    def set_str(self, rule_name, value):
+    def set(self, rule_name, value):
        # simple function to set nice text from the rule name. Returns nothing
        if rule_name not in self.rules:
-            return None
+            return
        if value is None:
            self.rmv(rule_name)
            return
        try:
-            return self.rule_set(self.rules[rule_name], value)
+            self.rule_set(self.rules[rule_name], value)
        except AttributeError:
            # does not exist, have to create it
            self.rule_create(self.rules[rule_name])
-            return self.rule_set(self.rules[rule_name], value)
+            self.rule_set(self.rules[rule_name], value)
    def rmv(self, rule_name):
        # easy deleter
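The renamed helpers now share one convention:

- Getters return a meaningful value or `None`, never an empty string.
- Setters return nothing.
- Setting `None` removes the field.

Below is a minimal dict-backed sketch of that convention. `DictRecord`, its `rules` mapping and the `data` dict are hypothetical stand-ins for the XML/JSON parser machinery.

```python
class DictRecord(object):
    """Toy stand-in for ParserBase: field values live in a plain dict (hypothetical)."""
    rules = {'title': 'title', 'desc': 'description'}

    def __init__(self):
        self.data = {}

    def get(self, rule_name):
        # unknown rule or empty value -> None, never ''
        if rule_name not in self.rules:
            return None
        return self.data.get(self.rules[rule_name]) or None

    def set(self, rule_name, value):
        # setters return nothing; setting None deletes the field
        if rule_name not in self.rules:
            return
        if value is None:
            self.rmv(rule_name)
            return
        self.data[self.rules[rule_name]] = value

    def rmv(self, rule_name):
        # easy deleter: silently ignore missing fields
        if rule_name in self.rules:
            self.data.pop(self.rules[rule_name], None)

    title = property(
        lambda f:   f.get('title'),
        lambda f,x: f.set('title', x),
        lambda f:   f.rmv('title') )
```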
@@ -291,7 +294,7 @@ class ParserXML(ParserBase):
        'rssfake': 'http://purl.org/rss/1.0/'}
    def parse(self, raw):
-        parser = etree.XMLParser(recover=True)
+        parser = etree.XMLParser(recover=True, remove_blank_text=True, remove_pis=True) # remove_blank_text needed for pretty_print
        return etree.fromstring(raw, parser)
    def remove(self):
@@ -369,10 +372,6 @@ class ParserXML(ParserBase):
            match.getparent().append(element)
            return element
        # try duplicating from template
        # FIXME
        # >>> self.xml.getroottree().getpath(ff.find('a'))
        return None
    def rule_remove(self, rule):
@@ -432,7 +431,7 @@ class ParserXML(ParserBase):
                return etree.tostring(match, method='text', encoding='unicode').strip()
        else:
-            return match or ""
+            return match # might be None if no match
 class ParserHTML(ParserXML):
@@ -441,7 +440,8 @@ class ParserHTML(ParserXML):
    mimetype = ['text/html', 'application/xhtml+xml']
    def parse(self, raw):
-        return lxml.html.fromstring(raw)
+        parser = etree.HTMLParser(remove_blank_text=True) # remove_blank_text needed for pretty_print
        return etree.fromstring(raw, parser)
    def tostring(self, encoding='unicode', **k):
        return lxml.html.tostring(self.root, encoding=encoding, **k)
@@ -467,11 +467,12 @@ class ParserHTML(ParserXML):
            element = deepcopy(match)
            match.getparent().append(element)
    # TODO def rule_set for the html part
 def parse_time(value):
-    if isinstance(value, basestring):
+    if value is None or value == 0:
        return None
    elif isinstance(value, basestring):
        if re.match(r'^[0-9]+$', value):
            return datetime.fromtimestamp(int(value), tz.UTC)
@@ -483,8 +484,9 @@ def parse_time(value):
    elif isinstance(value, datetime):
        return value
    else:
-        return False
+        return None
 class ParserJSON(ParserBase):
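`parse_time` now treats `None` and `0` as "no date". It also returns `None` instead of `False` for unsupported input. Below is a standard-library sketch of the same dispatch, assuming Python 3. It uses `datetime.timezone.utc` in place of dateutil's `tz.UTC`. The string branch uses `email.utils.parsedate_to_datetime` (RFC 822 dates, as used by RSS 2.0) as a swapped-in parser, because this hunk does not show which parser the original uses there.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_time(value):
    # "empty" values mean "no date"
    if value is None or value == 0:
        return None
    elif isinstance(value, str):
        if re.match(r'^[0-9]+$', value):
            # digits only: treat as a Unix timestamp
            return datetime.fromtimestamp(int(value), timezone.utc)
        try:
            # assumption: RFC 822 date strings (swapped-in parser, see above)
            return parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # unparsable string (older Pythons raise TypeError, 3.10+ ValueError)
            return None
    elif isinstance(value, int):
        return datetime.fromtimestamp(value, timezone.utc)
    elif isinstance(value, datetime):
        return value
    else:
        # unsupported type: None rather than False, so callers can test "is None"
        return None
```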
@@ -496,8 +498,9 @@ class ParserJSON(ParserBase):
        return json.loads(raw)
    def remove(self):
-        # delete oneself FIXME
+        # impossible to "delete" oneself per se but can clear all its items
-        pass
+        for attr in self.root:
            del self.root[attr]
    def tostring(self, encoding='unicode', **k):
        dump = json.dumps(self.root, ensure_ascii=False, **k) # ensure_ascii = False to have proper (unicode) string and not \u00
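As shown in this hunk, `remove` deletes keys from `self.root` while iterating over it. For a non-empty `dict`, Python raises `RuntimeError: dictionary changed size during iteration`. Below is a minimal sketch of an in-place clear that avoids this by iterating over a snapshot of the keys. It assumes `self.root` is a plain `dict`, and `clear_in_place` is a hypothetical helper name.

```python
def clear_in_place(root):
    """Empty a dict without replacing the object, so other references see it emptied."""
    # list(root) snapshots the keys first; deleting while iterating over root
    # itself would raise RuntimeError
    for attr in list(root):
        del root[attr]
```

`root.clear()` does the same in a single call. Both approaches keep the object identity, which is what "clearing oneself" needs when other code still holds a reference to the same dict.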
@@ -557,11 +560,16 @@ class ParserJSON(ParserBase):
        rrule = self._rule_parse(rule)
        cur = self.root
        try:
            for node in rrule[:-1]:
                cur = cur[node]
            del cur[rrule[-1]]
        except KeyError:
            # nothing to delete
            pass
    def rule_set(self, rule, value):
        if '[]' in rule:
            raise ValueError('not supported') # FIXME
@@ -608,12 +616,12 @@ class Feed(object):
        return [itemsClass(x, self.rules, self) for x in items]
    title = property(
-        lambda f:   f.get_str('title'),
+        lambda f:   f.get('title'),
-        lambda f,x: f.set_str('title', x),
+        lambda f,x: f.set('title', x),
        lambda f:   f.rmv('title') )
    description = desc = property(
-        lambda f:   f.get_str('desc'),
+        lambda f:   f.get('desc'),
-        lambda f,x: f.set_str('desc', x),
+        lambda f,x: f.set('desc', x),
        lambda f:   f.rmv('desc') )
    items = property(
        lambda f:   f )
@@ -660,28 +668,28 @@ class Item(Uniq):
        return id(xml)
    title = property(
-        lambda f:   f.get_str('item_title'),
+        lambda f:   f.get('item_title'),
-        lambda f,x: f.set_str('item_title', x),
+        lambda f,x: f.set('item_title', x),
        lambda f:   f.rmv('item_title') )
    link = property(
-        lambda f:   f.get_str('item_link'),
+        lambda f:   f.get('item_link'),
-        lambda f,x: f.set_str('item_link', x),
+        lambda f,x: f.set('item_link', x),
        lambda f:   f.rmv('item_link') )
    description = desc = property(
-        lambda f:   f.get_str('item_desc'),
+        lambda f:   f.get('item_desc'),
-        lambda f,x: f.set_str('item_desc', x),
+        lambda f,x: f.set('item_desc', x),
        lambda f:   f.rmv('item_desc') )
    content = property(
-        lambda f:   f.get_str('item_content'),
+        lambda f:   f.get('item_content'),
-        lambda f,x: f.set_str('item_content', x),
+        lambda f,x: f.set('item_content', x),
        lambda f:   f.rmv('item_content') )
    time = property(
-        lambda f:   f.time_prs(f.get_str('item_time')),
+        lambda f:   f.time_prs(f.get('item_time')),
-        lambda f,x: f.set_str('item_time', f.time_fmt(x)),
+        lambda f,x: f.set('item_time', f.time_fmt(x)),
        lambda f:   f.rmv('item_time') )
    updated = property(
-        lambda f:   f.time_prs(f.get_str('item_updated')),
+        lambda f:   f.time_prs(f.get('item_updated')),
-        lambda f,x: f.set_str('item_updated', f.time_fmt(x)),
+        lambda f,x: f.set('item_updated', f.time_fmt(x)),
        lambda f:   f.rmv('item_updated') )
@@ -690,6 +698,10 @@ class FeedXML(Feed, ParserXML):
    def tostring(self, encoding='unicode', **k):
        # override needed due to "getroottree" inclusion
        if self.root.getprevious() is None:
            self.root.addprevious(etree.PI('xml-stylesheet', 'type="text/xsl" href="/sheet.xsl"'))
        return etree.tostring(self.root.getroottree(), encoding=encoding, **k)
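`FeedXML.tostring` prepends an `xml-stylesheet` processing instruction, but only when nothing precedes the root element. `ParserXML.parse` now strips any PIs the source feed shipped (`remove_pis=True`). Together, these leave the feed with a single stylesheet reference pointing at `/sheet.xsl`. Below is a standard-library sketch of the insertion step that uses `xml.dom.minidom` instead of lxml. `add_stylesheet_pi` is a hypothetical helper. Unlike the lxml parser settings above, minidom does not strip existing PIs.

```python
from xml.dom import minidom


def add_stylesheet_pi(xml_text, href='/sheet.xsl'):
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # only add the PI when nothing precedes the root (mirrors the getprevious() check)
    if root.previousSibling is None:
        pi = doc.createProcessingInstruction(
            'xml-stylesheet', 'type="text/xsl" href="%s"' % href)
        doc.insertBefore(pi, root)
    return doc.toxml()


print(add_stylesheet_pi('<rss version="2.0"><channel/></rss>'))
```

The "nothing precedes the root" guard makes the call idempotent: serializing an already-processed feed a second time does not add a second stylesheet PI.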
--- a/morss/morss.py
+++ b/morss/morss.py
@@ -204,7 +204,7 @@ def ItemFill(item, options, feedurl='/', fast=False):
    # twitter
    if urlparse(feedurl).netloc == 'twitter.com':
-        match = lxml.html.fromstring(item.content).xpath('//a/@data-expanded-url')
+        match = lxml.html.fromstring(item.desc).xpath('//a/@data-expanded-url')
        if len(match):
            link = match[0]
            log(link)
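The Twitter fix reads the expanded link from the item description instead of its content. The lxml XPath `//a/@data-expanded-url` collects that attribute from every `<a>` tag. Below is a standard-library equivalent built on `html.parser.HTMLParser`. `ExpandedURLFinder` and `expanded_urls` are hypothetical names.

```python
from html.parser import HTMLParser


class ExpandedURLFinder(HTMLParser):
    """Collect data-expanded-url attributes of <a> tags
    (stdlib stand-in for the lxml XPath '//a/@data-expanded-url')."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased from HTMLParser
        if tag == 'a':
            for name, value in attrs:
                if name == 'data-expanded-url' and value is not None:
                    self.urls.append(value)


def expanded_urls(html_text):
    finder = ExpandedURLFinder()
    finder.feed(html_text)
    finder.close()
    return finder.urls
```

Like `match[0]` in the hunk, a caller would take the first URL only when the returned list is non-empty.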
@@ -341,6 +341,8 @@ def FeedFetch(url, options):
    else:
        try:
            rss = feeds.parse(xml, url, contenttype)
            rss = rss.convert(feeds.FeedXML)
                # contains all fields, otherwise much-needed data can be lost
        except TypeError:
            log('random page')
@@ -435,8 +437,10 @@ def FeedFormat(rss, options):
    if options.callback:
        if re.match(r'^[a-zA-Z0-9\.]+$', options.callback) is not None:
            return '%s(%s)' % (options.callback, rss.tojson())
        else:
            raise MorssException('Invalid callback var name')
    elif options.json:
        if options.indent:
            return rss.tojson(encoding='UTF-8', indent=4)
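The `callback` branch wraps the JSON output in a JavaScript function call (JSONP). It only accepts callback names made of letters, digits and dots. Below is a minimal sketch of the same check. `to_jsonp` is a hypothetical helper, and `ValueError` stands in for `MorssException`.

```python
import json
import re


def to_jsonp(data, callback):
    # only letters, digits and dots are allowed in the callback name
    if re.fullmatch(r'[a-zA-Z0-9.]+', callback) is None:
        raise ValueError('Invalid callback var name')
    return '%s(%s)' % (callback, json.dumps(data))


print(to_jsonp({'a': 1}, 'cb'))  # cb({"a": 1})
```

The sketch uses `re.fullmatch` rather than `re.match` with `^...$`. With `re.match`, `$` also matches just before a trailing newline, so a callback such as `'cb\n'` would pass the original pattern.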
@@ -448,6 +452,10 @@ def FeedFormat(rss, options):
        return rss.tocsv(encoding='UTF-8')
    elif options.reader:
        if options.indent:
            return rss.tohtml(encoding='UTF-8', pretty_print=True)
        else:
            return rss.tohtml(encoding='UTF-8')
    else:
--- a/www/index.html
+++ b/www/index.html
@@ -2,7 +2,7 @@
 <html>
 	<head>
 		<title>morss</title>
-		<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">
+		<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;" />
 		<meta charset="UTF-8" />
 		<style type="text/css">
 			body
--- a/www/sheet.xsl
+++ b/www/sheet.xsl
@@ -0,0 +1,122 @@
 <?xml version="1.0" encoding="utf-8"?>
 <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 	<xsl:output method="html"/>
 	<xsl:template match="/">
 		<html>
 		<head>
 			<title>RSS feed by morss</title>
 			<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;" />
 			<style type="text/css">
 				body {
 					overflow-wrap: anywhere;
 					word-wrap: anywhere;
 				}
 				#url {
 					background-color: rgba(255, 165, 0, 0.25);
 					padding: 1% 5%;
 					display: inline-block;
 					max-width: 100%;
 				}
 				body > ul {
 					background-color: #FFFAF4;
 					padding: 1%;
 					max-width: 100%;
 				}
 				ul {
 					list-style-type: none;
 				}
 				.tag {
 					color: darkred;
 				}
 				.attr {
 					color: darksalmon;
 				}
 				.value {
 					color: darkblue;
 				}
 				.comment {
 					color: lightgrey;
 				}
 				pre {
 					margin: 0;
 					max-width: 100%;
 					white-space: normal;
 				}
 			</style>
 		</head>
 		<body>
 			<h1>RSS feed by morss</h1>
 			<p>Your RSS feed is <strong style="color: green">ready</strong>. You
 			can enter the following url in your newsreader:</p>
 			<div id="url"></div>
 			<ul>
 				<xsl:apply-templates/>
 			</ul>
 			<script>
 				document.getElementById("url").innerHTML = window.location.href;
 			</script>
 		</body>
 		</html>
 	</xsl:template>
 	<xsl:template match="*">
 		<li>
 			<span class="element">
 				&lt;
 					<span class="tag"><xsl:value-of select="name()"/></span>
 					<xsl:for-each select="@*">
 						<span class="attr"> <xsl:value-of select="name()"/></span>
 						=
 						"<span class="value"><xsl:value-of select="."/></span>"
 					</xsl:for-each>
 				&gt;
 			</span>
 			<xsl:if test="node()">
 				<ul>
 					<xsl:apply-templates/>
 				</ul>
 			</xsl:if>
 			<span class="element">
 				&lt;/
 					<span class="tag"><xsl:value-of select="name()"/></span>
 				&gt;
 			</span>
 		</li>
 	</xsl:template>
 	<xsl:template match="comment()">
 		<li>
 			<pre class="comment"><![CDATA[<!--]]><xsl:value-of select="."/><![CDATA[-->]]></pre>
 		</li>
 	</xsl:template>
 	<xsl:template match="text()">
 		<li>
 			<pre>
 				<xsl:value-of select="normalize-space(.)"/>
 			</pre>
 		</li>
 	</xsl:template>
 	<xsl:template match="text()[not(normalize-space())]"/>
 </xsl:stylesheet>
Author	SHA1	Message	Date
pictuga	fbcb23cf88	index.Html: fix meta tag	2020-03-20 20:21:08 +01:00
pictuga	a0e8e84a67	sheet.xsl: mobile-friendly view & url fix	2020-03-20 19:47:43 +01:00
pictuga	a90fd682db	sheet.xsl: show feed url (js-based)	2020-03-20 19:46:20 +01:00
pictuga	2c245f9506	sheet.xsl: improve output formatting Include tags, better CSS	2020-03-20 16:44:49 +01:00
pictuga	3d45451fef	sheet.xsl: improve element content output	2020-03-20 15:44:25 +01:00
pictuga	4d785820d9	feeds: ignore provided stylesheets and add ours Provided sheets usually create errors. Ours is (hopefully) more informative for users not familiar with RSS feeds	2020-03-20 15:32:44 +01:00
pictuga	6a01fc439e	feeds: better handle "empty" datetime	2020-03-20 12:30:42 +01:00
pictuga	d24734110a	morss: convert all feeds to RSS As html feeds might not contain some feeds, leading to data loss	2020-03-20 12:26:34 +01:00
pictuga	a41c2a3a62	morss: fix twitter link detection	2020-03-20 12:26:19 +01:00
pictuga	dd2651061f	feeds & morss: clean up comments/empty lines	2020-03-20 12:25:48 +01:00
pictuga	912c323c40	feeds: make function output more consistent e.g. setters return nothing, getters return sth relevant or None (i.e. no empty strings)	2020-03-20 12:23:15 +01:00
pictuga	5705a0be17	feeds: fix delete/rmv code	2020-03-20 12:22:07 +01:00
pictuga	4735ffba45	feeds: fix .convert auto-convert To fix inheritance loophole	2020-03-20 12:20:41 +01:00
pictuga	08e39f5631	feeds: give simpler name to helper functions	2020-03-20 12:20:15 +01:00
pictuga	765a43511e	feeds: remove unused import	2020-03-20 12:19:08 +01:00
pictuga	5865af64f9	Fix indent output for html/xml	2020-03-20 12:18:13 +01:00
pictuga	ae3bd58386	README: clarify newsreader hook syntax	2020-03-20 11:43:19 +01:00
pictuga	e3be9b5a9e	README: improve layout	2020-03-20 11:41:43 +01:00
pictuga	f8c09af563	README: improve syntax highlighting	2020-03-20 11:33:52 +01:00