Compare commits

19 Commits

Author SHA1 Message Date
fbcb23cf88 index.Html: fix meta tag 2020-03-20 20:21:08 +01:00
a0e8e84a67 sheet.xsl: mobile-friendly view & url fix 2020-03-20 19:47:43 +01:00
a90fd682db sheet.xsl: show feed url (js-based) 2020-03-20 19:46:20 +01:00
2c245f9506 sheet.xsl: improve output formatting
Include tags, better CSS
2020-03-20 16:44:49 +01:00
3d45451fef sheet.xsl: improve element content output 2020-03-20 15:44:25 +01:00
4d785820d9 feeds: ignore provided stylesheets and add ours
Provided sheets usually create errors. Ours is (hopefully) more informative for users not familiar with RSS feeds
2020-03-20 15:32:44 +01:00
6a01fc439e feeds: better handle "empty" datetime 2020-03-20 12:30:42 +01:00
d24734110a morss: convert all feeds to RSS
As HTML feeds might not contain some fields, leading to data loss
2020-03-20 12:26:34 +01:00
a41c2a3a62 morss: fix twitter link detection 2020-03-20 12:26:19 +01:00
dd2651061f feeds & morss: clean up comments/empty lines 2020-03-20 12:25:48 +01:00
912c323c40 feeds: make function output more consistent
e.g. setters return nothing, getters return something relevant or None (i.e. no empty strings)
2020-03-20 12:23:15 +01:00
5705a0be17 feeds: fix delete/rmv code 2020-03-20 12:22:07 +01:00
4735ffba45 feeds: fix .convert auto-convert
To fix inheritance loophole
2020-03-20 12:20:41 +01:00
08e39f5631 feeds: give simpler name to helper functions 2020-03-20 12:20:15 +01:00
765a43511e feeds: remove unused import 2020-03-20 12:19:08 +01:00
5865af64f9 Fix indent output for html/xml 2020-03-20 12:18:13 +01:00
ae3bd58386 README: clarify newsreader hook syntax 2020-03-20 11:43:19 +01:00
e3be9b5a9e README: improve layout 2020-03-20 11:41:43 +01:00
f8c09af563 README: improve syntax highlighting 2020-03-20 11:33:52 +01:00
5 changed files with 213 additions and 55 deletions

@@ -55,7 +55,9 @@ You do need:
 Simplest way to get these:
+```shell
 pip install -r requirements.txt
+```
 You may also need:
@@ -140,7 +142,9 @@ ensure that the provided `/www/.htaccess` works well with your server.
 Running this command should do:
+```shell
 uwsgi --http :9090 --plugin python --wsgi-file main.py
+```
 However, one problem might be how to serve the provided `index.html` file if it
 isn't in the same directory. Therefore you can add this at the end of the
@@ -156,8 +160,12 @@ You can change the port and the location of the `www/` folder like this `python
 #### Passing arguments
-Then visit: **`http://PATH/TO/MORSS/[main.py/][:argwithoutvalue[:argwithvalue=value[...]]]/FEEDURL`**
+Then visit:
+```
+http://PATH/TO/MORSS/[main.py/][:argwithoutvalue[:argwithvalue=value[...]]]/FEEDURL
+```
 For example: `http://morss.example/:clip/https://twitter.com/pictuga`
 *(Brackets indicate optional text)*
 The `main.py` part is only needed if your server doesn't support the Apache redirect rule set in the provided `.htaccess`.
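Review note: the URL pattern in this hunk can be assembled programmatically. The following is a minimal sketch, not part of morss: `build_morss_url` and its parameters are hypothetical, and it assumes that each argument is prefixed with `:` and that arguments are simply concatenated, as the bracketed pattern suggests.

```python
def build_morss_url(base, feed_url, flags=(), options=None, use_main_py=False):
    """Assemble a morss URL following the pattern shown in the README hunk.

    Hypothetical helper for illustration only; morss does not ship it.
    """
    parts = [base.rstrip('/')]
    if use_main_py:
        # only needed when the server does not apply the .htaccess redirect rule
        parts.append('main.py')
    # every argument starts with ':'; arguments with a value use key=value
    args = [':' + flag for flag in flags]
    args += [':%s=%s' % (key, value) for key, value in (options or {}).items()]
    if args:
        parts.append(''.join(args))
    parts.append(feed_url)
    return '/'.join(parts)


print(build_morss_url('http://morss.example', 'https://twitter.com/pictuga', flags=['clip']))
```

With the README's own example values this reproduces `http://morss.example/:clip/https://twitter.com/pictuga`.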
@@ -166,8 +174,12 @@ Works like a charm with [Tiny Tiny RSS](http://tt-rss.org/redmine/projects/tt-rs
 ### As a CLI application
-Run: **`python[2.7] -m morss [argwithoutvalue] [argwithvalue=value] [...] FEEDURL`**
+Run:
+```
+python[2.7] -m morss [argwithoutvalue] [argwithvalue=value] [...] FEEDURL
+```
 For example: `python -m morss debug http://feeds.bbci.co.uk/news/rss.xml`
 *(Brackets indicate optional text)*
 ### As a newsreader hook
@@ -177,8 +189,12 @@ To use it, the newsreader [Liferea](http://lzone.de/liferea/) is required
 scripts can be run on top of the RSS feed, using its
 [output](http://lzone.de/liferea/scraping.htm) as an RSS feed.
-To use this script, you have to enable "(Unix) command" in liferea feed settings, and use the command: **`[python2.7] PATH/TO/MORSS/main.py [argwithoutvalue] [argwithvalue=value] [...] FEEDURL`**
+To use this script, you have to enable "(Unix) command" in liferea feed settings, and use the command:
+```
+[python[2.7]] PATH/TO/MORSS/main.py [argwithoutvalue] [argwithvalue=value] [...] FEEDURL
+```
 For example: `python2.7 PATH/TO/MORSS/main.py http://feeds.bbci.co.uk/news/rss.xml`
 *(Brackets indicate optional text)*
 ### As a python library

@@ -21,12 +21,10 @@ json.encoder.c_make_encoder = None
 try:
     # python 2
     from StringIO import StringIO
-    from urllib2 import urlopen
     from ConfigParser import RawConfigParser
 except ImportError:
     # python 3
     from io import StringIO
-    from urllib.request import urlopen
     from configparser import RawConfigParser
 try:
@@ -164,7 +162,7 @@ class ParserBase(object):
         return self.convert(FeedHTML).tostring(**k)
     def convert(self, TargetParser):
-        if isinstance(self, TargetParser):
+        if type(self) == TargetParser:
             return self
         target = TargetParser()
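Review note: the commit message calls this an "inheritance loophole". The diff shows `ParserHTML` deriving from `ParserXML`, so with `isinstance` an HTML parser asked to convert to XML would be returned unchanged. The sketch below illustrates this with hypothetical stand-in classes (`XMLLike`, `HTMLLike`), not the real morss classes.

```python
class XMLLike(object):
    """Stand-in for a base parser class (hypothetical)."""


class HTMLLike(XMLLike):
    """Stand-in for a subclass, mirroring `class ParserHTML(ParserXML)`."""


def is_already_target_isinstance(obj, target):
    # old check: also true for instances of any subclass of target
    return isinstance(obj, target)


def is_already_target_type(obj, target):
    # new check: true only for the exact class
    return type(obj) == target


html = HTMLLike()
print(is_already_target_isinstance(html, XMLLike))  # conversion would be skipped
print(is_already_target_type(html, XMLLike))        # conversion would run
```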
@@ -208,11 +206,11 @@ class ParserBase(object):
         pass
     def rule_remove(self, rule):
-        # remove node from its parent
+        # remove node from its parent. Returns nothing
         pass
     def rule_set(self, rule, value):
-        # value is always a str?
+        # set the value. Returns nothing
         pass
     def rule_str(self, rule):
@@ -247,25 +245,30 @@ class ParserBase(object):
         return self.rule_search_all(self.rules[rule_name])
-    def get_str(self, rule_name):
+    def get(self, rule_name):
         # simple function to get nice text from the rule name
-        # for use in @property, ie. self.get_str('title')
+        # for use in @property, ie. self.get('title')
         if rule_name not in self.rules:
             return None
-        return self.rule_str(self.rules[rule_name])
+        return self.rule_str(self.rules[rule_name]) or None
-    def set_str(self, rule_name, value):
+    def set(self, rule_name, value):
+        # simple function to set nice text from the rule name. Returns nothing
         if rule_name not in self.rules:
-            return None
+            return
+        if value is None:
+            self.rmv(rule_name)
+            return
         try:
-            return self.rule_set(self.rules[rule_name], value)
+            self.rule_set(self.rules[rule_name], value)
         except AttributeError:
             # does not exist, have to create it
             self.rule_create(self.rules[rule_name])
-            return self.rule_set(self.rules[rule_name], value)
+            self.rule_set(self.rules[rule_name], value)
     def rmv(self, rule_name):
         # easy deleter
@@ -291,7 +294,7 @@ class ParserXML(ParserBase):
             'rssfake': 'http://purl.org/rss/1.0/'}
     def parse(self, raw):
-        parser = etree.XMLParser(recover=True)
+        parser = etree.XMLParser(recover=True, remove_blank_text=True, remove_pis=True) # remove_blank_text needed for pretty_print
         return etree.fromstring(raw, parser)
     def remove(self):
@@ -369,10 +372,6 @@ class ParserXML(ParserBase):
                 match.getparent().append(element)
                 return element
-        # try duplicating from template
-        # FIXME
-        # >>> self.xml.getroottree().getpath(ff.find('a'))
         return None
     def rule_remove(self, rule):
@@ -432,7 +431,7 @@ class ParserXML(ParserBase):
             return etree.tostring(match, method='text', encoding='unicode').strip()
         else:
-            return match or ""
+            return match # might be None if no match
 class ParserHTML(ParserXML):
@@ -441,7 +440,8 @@ class ParserHTML(ParserXML):
     mimetype = ['text/html', 'application/xhtml+xml']
     def parse(self, raw):
-        return lxml.html.fromstring(raw)
+        parser = etree.HTMLParser(remove_blank_text=True) # remove_blank_text needed for pretty_print
+        return etree.fromstring(raw, parser)
     def tostring(self, encoding='unicode', **k):
         return lxml.html.tostring(self.root, encoding=encoding, **k)
@@ -467,11 +467,12 @@ class ParserHTML(ParserXML):
             element = deepcopy(match)
             match.getparent().append(element)
-    # TODO def rule_set for the html part
 def parse_time(value):
-    if isinstance(value, basestring):
+    if value is None or value == 0:
+        return None
+    elif isinstance(value, basestring):
         if re.match(r'^[0-9]+$', value):
             return datetime.fromtimestamp(int(value), tz.UTC)
@@ -483,8 +484,9 @@ def parse_time(value):
     elif isinstance(value, datetime):
         return value
     else:
-        return False
+        return None
 class ParserJSON(ParserBase):
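Review note: after this change `parse_time` reports every "no usable date" case as `None` instead of mixing `None` and `False`. The sketch below mirrors only the branches visible in the two hunks, using stdlib substitutes: `timezone.utc` stands in for `tz.UTC` and `str` for `basestring`. The branch for non-numeric date strings is elided in the diff, so `parse_time_sketch` (a hypothetical name) simply gives up on them.

```python
import re
from datetime import datetime, timezone


def parse_time_sketch(value):
    # empty values: nothing to parse
    if value is None or value == 0:
        return None
    elif isinstance(value, str):
        if re.match(r'^[0-9]+$', value):
            # purely numeric string: treat as a Unix timestamp in UTC
            return datetime.fromtimestamp(int(value), timezone.utc)
        # other string formats are elided in the diff; not handled in this sketch
        return None
    elif isinstance(value, datetime):
        return value
    else:
        # unknown type: None rather than False, so callers get one "empty" value
        return None


print(parse_time_sketch(None))
print(parse_time_sketch('1584706868'))
```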
@@ -496,8 +498,9 @@ class ParserJSON(ParserBase):
         return json.loads(raw)
     def remove(self):
-        # delete oneself FIXME
-        pass
+        # impossible to "delete" oneself per se but can clear all its items
+        for attr in self.root:
+            del self.root[attr]
     def tostring(self, encoding='unicode', **k):
         dump = json.dumps(self.root, ensure_ascii=False, **k) # ensure_ascii = False to have proper (unicode) string and not \u00
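Review note: assuming `self.root` is a non-empty `dict`, the added loop deletes keys while iterating over the same dict, which raises `RuntimeError` in Python. The sketch below reproduces that with plain dicts and shows one way around it; `clear_unsafe` and `clear_safe` are hypothetical names, not morss functions.

```python
def clear_unsafe(root):
    # same shape as the loop in the diff: mutates the dict while iterating it
    for attr in root:
        del root[attr]


def clear_safe(root):
    # iterate over a snapshot of the keys so deletion cannot disturb iteration
    for attr in list(root):
        del root[attr]


data = {'title': 'a', 'link': 'b'}
try:
    clear_unsafe(data)
except RuntimeError as error:
    print('unsafe loop failed:', error)

data = {'title': 'a', 'link': 'b'}
clear_safe(data)
print(data)
```

For a plain dict, `root.clear()` would achieve the same result in one call.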
@@ -557,11 +560,16 @@ class ParserJSON(ParserBase):
         rrule = self._rule_parse(rule)
         cur = self.root
-        for node in rrule[:-1]:
-            cur = cur[node]
-        del cur[rrule[-1]]
+        try:
+            for node in rrule[:-1]:
+                cur = cur[node]
+            del cur[rrule[-1]]
+        except KeyError:
+            # nothing to delete
+            pass
     def rule_set(self, rule, value):
         if '[]' in rule:
             raise ValueError('not supported') # FIXME
@@ -608,12 +616,12 @@ class Feed(object):
         return [itemsClass(x, self.rules, self) for x in items]
     title = property(
-        lambda f: f.get_str('title'),
-        lambda f,x: f.set_str('title', x),
+        lambda f: f.get('title'),
+        lambda f,x: f.set('title', x),
         lambda f: f.rmv('title') )
     description = desc = property(
-        lambda f: f.get_str('desc'),
-        lambda f,x: f.set_str('desc', x),
+        lambda f: f.get('desc'),
+        lambda f,x: f.set('desc', x),
         lambda f: f.rmv('desc') )
     items = property(
         lambda f: f )
@@ -660,28 +668,28 @@ class Item(Uniq):
         return id(xml)
     title = property(
-        lambda f: f.get_str('item_title'),
-        lambda f,x: f.set_str('item_title', x),
+        lambda f: f.get('item_title'),
+        lambda f,x: f.set('item_title', x),
         lambda f: f.rmv('item_title') )
     link = property(
-        lambda f: f.get_str('item_link'),
-        lambda f,x: f.set_str('item_link', x),
+        lambda f: f.get('item_link'),
+        lambda f,x: f.set('item_link', x),
         lambda f: f.rmv('item_link') )
     description = desc = property(
-        lambda f: f.get_str('item_desc'),
-        lambda f,x: f.set_str('item_desc', x),
+        lambda f: f.get('item_desc'),
+        lambda f,x: f.set('item_desc', x),
         lambda f: f.rmv('item_desc') )
     content = property(
-        lambda f: f.get_str('item_content'),
-        lambda f,x: f.set_str('item_content', x),
+        lambda f: f.get('item_content'),
+        lambda f,x: f.set('item_content', x),
         lambda f: f.rmv('item_content') )
     time = property(
-        lambda f: f.time_prs(f.get_str('item_time')),
-        lambda f,x: f.set_str('item_time', f.time_fmt(x)),
+        lambda f: f.time_prs(f.get('item_time')),
+        lambda f,x: f.set('item_time', f.time_fmt(x)),
         lambda f: f.rmv('item_time') )
     updated = property(
-        lambda f: f.time_prs(f.get_str('item_updated')),
-        lambda f,x: f.set_str('item_updated', f.time_fmt(x)),
+        lambda f: f.time_prs(f.get('item_updated')),
+        lambda f,x: f.set('item_updated', f.time_fmt(x)),
         lambda f: f.rmv('item_updated') )
@@ -690,6 +698,10 @@ class FeedXML(Feed, ParserXML):
     def tostring(self, encoding='unicode', **k):
         # override needed due to "getroottree" inclusion
+        if self.root.getprevious() is None:
+            self.root.addprevious(etree.PI('xml-stylesheet', 'type="text/xsl" href="/sheet.xsl"'))
         return etree.tostring(self.root.getroottree(), encoding=encoding, **k)
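Review note: the added lines put an `xml-stylesheet` processing instruction in front of the root element, but only when nothing precedes it yet, so repeated `tostring` calls do not stack duplicates. The sketch below shows the same idea with the standard library's `xml.dom.minidom` instead of lxml (a swapped-in library, chosen only to avoid a third-party dependency); `add_stylesheet` is a hypothetical helper.

```python
from xml.dom import minidom


def add_stylesheet(doc, href='/sheet.xsl'):
    """Insert an xml-stylesheet instruction before the root element, once."""
    root = doc.documentElement
    if root.previousSibling is None:
        # nothing precedes the root yet, so the instruction is still missing
        pi = doc.createProcessingInstruction(
            'xml-stylesheet', 'type="text/xsl" href="%s"' % href)
        doc.insertBefore(pi, root)
    return doc


doc = minidom.parseString('<rss version="2.0"><channel/></rss>')
add_stylesheet(doc)
add_stylesheet(doc)  # second call finds the instruction and does nothing
print(doc.toxml())
```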

@@ -204,7 +204,7 @@ def ItemFill(item, options, feedurl='/', fast=False):
     # twitter
     if urlparse(feedurl).netloc == 'twitter.com':
-        match = lxml.html.fromstring(item.content).xpath('//a/@data-expanded-url')
+        match = lxml.html.fromstring(item.desc).xpath('//a/@data-expanded-url')
         if len(match):
             link = match[0]
             log(link)
@@ -341,6 +341,8 @@ def FeedFetch(url, options):
     else:
         try:
             rss = feeds.parse(xml, url, contenttype)
+            rss = rss.convert(feeds.FeedXML)
+            # contains all fields, otherwise much-needed data can be lost
         except TypeError:
             log('random page')
@@ -435,8 +437,10 @@ def FeedFormat(rss, options):
     if options.callback:
         if re.match(r'^[a-zA-Z0-9\.]+$', options.callback) is not None:
             return '%s(%s)' % (options.callback, rss.tojson())
         else:
             raise MorssException('Invalid callback var name')
     elif options.json:
         if options.indent:
             return rss.tojson(encoding='UTF-8', indent=4)
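Review note: the callback branch in this hunk whitelists the characters of the JSONP callback name before interpolating it into the response. The sketch below isolates that check; `wrap_jsonp` is a hypothetical helper, `ValueError` stands in for `MorssException`, and `json.dumps` stands in for `rss.tojson()`.

```python
import json
import re

# same character whitelist as in the diff: letters, digits and dots
CALLBACK_RE = re.compile(r'^[a-zA-Z0-9\.]+$')


def wrap_jsonp(callback, payload):
    # reject anything that could inject code into the generated script
    if CALLBACK_RE.match(callback) is None:
        raise ValueError('Invalid callback var name')
    return '%s(%s)' % (callback, json.dumps(payload))


print(wrap_jsonp('app.onFeed', {'title': 'morss'}))
```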
@@ -448,6 +452,10 @@ def FeedFormat(rss, options):
         return rss.tocsv(encoding='UTF-8')
     elif options.reader:
+        if options.indent:
+            return rss.tohtml(encoding='UTF-8', pretty_print=True)
+        else:
-        return rss.tohtml(encoding='UTF-8')
+            return rss.tohtml(encoding='UTF-8')
     else:

@@ -2,7 +2,7 @@
 <html>
 <head>
 <title>morss</title>
-<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;">
+<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;" />
 <meta charset="UTF-8" />
 <style type="text/css">
 body

www/sheet.xsl (new file, 122 lines)

@@ -0,0 +1,122 @@
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<head>
<title>RSS feed by morss</title>
<meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0;" />
<style type="text/css">
body {
overflow-wrap: anywhere;
word-wrap: anywhere;
}
#url {
background-color: rgba(255, 165, 0, 0.25);
padding: 1% 5%;
display: inline-block;
max-width: 100%;
}
body > ul {
background-color: #FFFAF4;
padding: 1%;
max-width: 100%;
}
ul {
list-style-type: none;
}
.tag {
color: darkred;
}
.attr {
color: darksalmon;
}
.value {
color: darkblue;
}
.comment {
color: lightgrey;
}
pre {
margin: 0;
max-width: 100%;
white-space: normal;
}
</style>
</head>
<body>
<h1>RSS feed by morss</h1>
<p>Your RSS feed is <strong style="color: green">ready</strong>. You
can enter the following url in your newsreader:</p>
<div id="url"></div>
<ul>
<xsl:apply-templates/>
</ul>
<script>
document.getElementById("url").innerHTML = window.location.href;
</script>
</body>
</html>
</xsl:template>
<xsl:template match="*">
<li>
<span class="element">
&lt;
<span class="tag"><xsl:value-of select="name()"/></span>
<xsl:for-each select="@*">
<span class="attr"> <xsl:value-of select="name()"/></span>
=
"<span class="value"><xsl:value-of select="."/></span>"
</xsl:for-each>
&gt;
</span>
<xsl:if test="node()">
<ul>
<xsl:apply-templates/>
</ul>
</xsl:if>
<span class="element">
&lt;/
<span class="tag"><xsl:value-of select="name()"/></span>
&gt;
</span>
</li>
</xsl:template>
<xsl:template match="comment()">
<li>
<pre class="comment"><![CDATA[<!--]]><xsl:value-of select="."/><![CDATA[-->]]></pre>
</li>
</xsl:template>
<xsl:template match="text()">
<li>
<pre>
<xsl:value-of select="normalize-space(.)"/>
</pre>
</li>
</xsl:template>
<xsl:template match="text()[not(normalize-space())]"/>
</xsl:stylesheet>