
ttrss_plugin-af_feedmod's People

Contributors

dkopitsa, ldidry, mbirth, rangerer

ttrss_plugin-af_feedmod's Issues

German Umlaute are broken

I added this plugin to my installation of TTRSS and enabled "heise security" (http://www.heise.de/security/news/news-atom.xml) as a test. Now the German umlauts inside the content are broken.

Apache is running UTF-8

Title of article is OK

Database is also UTF8

Example: "Draußen meldet sich langsam der Frühling zurück, drinnen rufen Adobe und Microsoft zum Frühjahrsputz des Rechners auf."

UTF-8 problem

I'm having an issue with UTF-8 encoding on http://bankier.pl/ - the feed is at http://feeds.feedburner.com/bankier-wiadomosci-dnia

The XPath expression I'm using is:

"bankier.pl": {
 "type": "xpath",
 "xpath": "div[@id='articleContent']"
},

and the resulting article is encoded like this:

Raport wydano przy cenie 43 zł, a w poniedziałek na zamknięciu akcje PCM kosztowały 42,98 zł. (PAP)

The original article I used in this example is http://www.bankier.pl/wiadomosc/BESI-rozpoczal-wydawanie-rekomendacji-dla-PCM-od-kupuj-z-cena-docelowa-55-zl-3147285.html
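
One common cause of this kind of mojibake in PHP is that DOMDocument::loadHTML() does not assume UTF-8 when the page carries no charset declaration of its own. Whether that is what happens inside af_feedmod for this site is an assumption; the sketch below only shows the effect of passing an explicit charset hint. The helper name load_utf8_html() and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper, not part of af_feedmod: parse a UTF-8 HTML string
// while telling libxml explicitly which charset the bytes are in.
function load_utf8_html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // A charset declaration in front of the markup keeps loadHTML() from
    // falling back to a single-byte default encoding.
    $hint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    // The @ suppresses libxml warnings about sloppy real-world markup.
    @$doc->loadHTML($hint . $html);
    return $doc;
}

// Made-up sample page using the id from the config above.
$html = '<html><body><div id="articleContent">cena 43 zł</div></body></html>';
$doc = load_utf8_html($html);

$xpath = new DOMXPath($doc);
$node = $xpath->query("//div[@id='articleContent']")->item(0);
echo $node->textContent, "\n";
```

If the page really is UTF-8, the extracted text should then come back with "zł" intact instead of the garbled form shown above.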

Modify after filtering

tt-rss applies plugins before its built-in filters, so when using feedmod you cannot filter articles based on text outside the main content, like a category heading.

It'd be nice if it was the other way around...

Modify article content

First of all, thanks for an excellent plugin! It is the reason I'm now using tt-rss :)

I do have a suggestion though: a way to modify the article content after it's been fetched. This would allow for all kinds of nice things, but the problem I have at the moment is a site that uses scheme-less img src URLs, which aren't handled properly by the feed reader I'm using.

I took a stab at implementing a fix for this particular issue, something like this seems to work very well (to be placed right after the cleanup code):

// Prefix protocol-relative image URLs ("//host/path") with an explicit scheme.
$nodelist = $basenode->getElementsByTagName("img");
foreach ($nodelist as $node) {
    $imgsrc = $node->getAttribute("src");
    if (substr($imgsrc, 0, 2) === '//') {
        $node->setAttribute("src", "http:" . $imgsrc);
    }
}

Feeds with updates

On some sites, an article reappears in the RSS feed because it has been updated. Unfortunately, af_feedmod does not reload the article content in that case.

Example, Tagesspiegel:
Old URL (apparently also the URL that stays in the feed): http://www.tagesspiegel.de/berlin/polizei-justiz/kaputte-gasleitung-in-berlin-mitte-s-bahnverkehr-lahm-gelegt/9458990.html

URL shown when the link is opened in a browser: http://www.tagesspiegel.de/berlin/polizei-justiz/zwischen-friedrichstrasse-und-alexanderplatz-nach-vollsperrung-bahnverkehr-in-mitte-rollt-wieder/9458990.html

Warning about feedmod during update_daemon2.php

I have feedmod installed but not configured yet. I was watching update_daemon2.php run and noticed this warning:

Warning: Invalid argument supplied for foreach() in /home/.../plugins/af_feedmod/init.php on line 55

My environment is shared hosting through Dreamhost. All of my .php files are vanilla.
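
PHP raises "Invalid argument supplied for foreach()" when the value being looped over is not an array, for example when json_decode() returns null for an empty or unset setting. That the plugin loops over a decoded JSON setting at that line is an assumption, since init.php is not shown here. A minimal guard, with the hypothetical helper name decode_mods(), could look like this:

```php
<?php
// Hypothetical helper: always hand foreach an array, even when the stored
// configuration is empty, missing, or not valid JSON.
function decode_mods(?string $json): array
{
    $data = json_decode($json ?? '', true);
    // json_decode() returns null for an empty or invalid string.
    return is_array($data) ? $data : [];
}

$seen = 0;
// With an unconfigured plugin the loop body simply never runs.
foreach (decode_mods('') as $urlpart => $config) {
    $seen++;
}
echo "entries processed: $seen\n";
```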

Evolution of feedmod

Hi mbirth,

I've played around with feedmod for a while, adding new features in my fork, restructuring the code and so on. Now I've made a new repository which contains all the changes. They seem to me to be too much of an evolution to submit as pull requests. I hope this is OK with you. If not, please contact me and I'll delete it.

The repository is located here:
https://github.com/m42e/ttrss_plugin-feediron

Thank you @mbirth

Select /html/body/

Any way to select an entire body of a page? I'm working on one that has no div or classes or much of anything except text wrapped in a body tag.
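
In plain XPath the whole body can be addressed as //body. How the plugin combines the configured "xpath" value with its own prefix is not shown in this thread, so the sketch below uses PHP's DOMXPath directly on a made-up page rather than a feedmod config entry.

```php
<?php
// Made-up page with nothing but text and inline markup inside <body>.
$html = '<html><body>Plain text with <b>no</b> wrapper elements.</body></html>';

$doc = new DOMDocument();
// The @ suppresses libxml warnings about sloppy real-world markup.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// //body matches the body element wherever it sits in the tree.
$body = $xpath->query('//body')->item(0);

// textContent concatenates all text below the selected node.
echo $body->textContent, "\n";
```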

Problem with Golem.de

I can no longer get the full texts saved in tt-rss.
No matter which setting I use, no text comes through since I moved to my own NAS and updated to v1.11 along the way.
With v1.7.9 it still worked.

Even in combination, neither
"rss.feedsportal.com" nor "golem0Bde0C" works,
with "xpath" set to
"article" or "div[@class='g g4 g-ie6']"}
et cetera...
At a loss.

German Umlaut not properly displayed

I love your tt-rss plugin and while it works for me most of the time without any issues, there are some sites where German umlauts are not properly displayed (e.g. iphoneblog.de). My config for this blog is as follows:

"iphoneblog.de": {
"type": "xpath",
"xpath": "div[@class='beitragstext']"
}

I already tried the "force_charset": "utf-8" option but this does not work either. A post where you can see the wrong encoding is the article from the 10th of November ("Besserer Dateitausch ...") where the Öffnen-In is not correctly encoded.

I would deeply appreciate your help on this issue.

Best regards

Andy

Feature Request: Regex replacements

The feed http://appleinsider.com/appleinsider.rss points to article content on the main site that uses an anti-image-theft mechanism, where images are replaced by a 1x1 pixel placeholder.

<div class="article-img"><img src="http://photos.appleinsider.com/v9/images/1x1-white.jpg" width="660" height="362" alt="only cost matters" class="lazy" data-original="http://cdn1.appleinsider.com/JDPower203113.png"><noscript><img src="http://cdn1.appleinsider.com/JDPower203113.png"></noscript></div>

Is it possible to run a regex replace on the content?

$articlebody=~s/<div class="article-img"><img src=".+?" (.+?) class=".+?" data-original="(.+?)"><noscript><img[^>]+><\/noscript><\/div>/<img $1 src="$2">/g;
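
For reference, the Perl substitution above translates fairly directly to PHP's preg_replace(). This is only a sketch of the requested behaviour, not an existing feedmod option; the variable names are hypothetical and the sample markup is the snippet quoted above.

```php
<?php
// The lazy-loading markup quoted in the issue.
$html = '<div class="article-img"><img src="http://photos.appleinsider.com/v9/images/1x1-white.jpg" '
      . 'width="660" height="362" alt="only cost matters" class="lazy" '
      . 'data-original="http://cdn1.appleinsider.com/JDPower203113.png">'
      . '<noscript><img src="http://cdn1.appleinsider.com/JDPower203113.png"></noscript></div>';

// Same pattern as the Perl version; # is used as delimiter so that the
// slashes in the closing tags need no escaping.
$pattern = '#<div class="article-img"><img src=".+?" (.+?) class=".+?" '
         . 'data-original="(.+?)"><noscript><img[^>]+></noscript></div>#';

// $1 keeps width/height/alt, $2 is the real image URL from data-original.
$result = preg_replace($pattern, '<img $1 src="$2">', $html);
echo $result, "\n";
```

A regex like this is tied to the exact attribute order on the page, so it would need adjusting whenever the site changes its markup.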

suggestion: URL_REWRITE Type

I'm currently trying to build a good XPath extraction for a local newspaper website, but their markup contains a lot of unnecessary stuff and no single div tag or similar around the pure article text.

My suggestion for cases like this would be a URL rewrite feature that fetches the print version instead of the normal article version.
A simple regex rewrite of the URL could then fetch a very slim and clean version of the article.
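
A rough sketch of what such a rewrite step could look like. Everything here is hypothetical: the function name rewrite_url(), the example URL, and the pattern are made up for illustration and do not correspond to an existing feedmod option or a real site.

```php
<?php
// Hypothetical helper: apply one regex rule to an article URL before fetching.
function rewrite_url(string $url, string $pattern, string $replacement): string
{
    $result = preg_replace($pattern, $replacement, $url);
    // preg_replace() returns null on a regex error; keep the original URL then.
    return $result === null ? $url : $result;
}

// Made-up rule: map /local/story-<id>.html to /print/story-<id>.html.
$print = rewrite_url(
    'http://news.example.com/local/story-123.html',
    '#/local/(story-\d+)\.html$#',
    '/print/$1.html'
);
echo $print, "\n";
```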

the guardian feed

I seem to be unable to pull the Guardian's full article.
Example:
feed: http://www.theguardian.com/world/rss
article: http://www.theguardian.com/world/2014/jul/30/wikileaks-australia-super-injunction-bribery-allegations

This is the XPath needed:
/html[@id='js-context']/body[@id='top']/div[@class='l-side-margins l-side-margins--layout-content']/article[@id='article']/div[@class='gs-container']/div[@class='content__main-column content__main-column--article']/div[@class='from-content-api js-article__body']

But even using this
"theguardian": {
"type": "xpath",
"xpath": "div[@class='gs-container']"
},
doesn't pull anything from their website.
Any idea what I'm doing wrong?

error: Invalid JSON!

I get an "error: Invalid JSON!", even when http://jsonlint.com/ tells me everything is correct. This also happens with the examples on here. What am I doing wrong?
Thanks for any advice!
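
To narrow this down outside the plugin, PHP can report why it rejects a string via json_last_error_msg(). How feedmod itself validates the config is not shown in this thread, so the helper decode_config() below is a hypothetical stand-in for debugging on the server's own PHP version.

```php
<?php
// Hypothetical debugging helper: decode a config string and report PHP's
// own reason when decoding fails.
function decode_config(string $json): array
{
    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['ok' => false, 'error' => json_last_error_msg(), 'data' => null];
    }
    return ['ok' => true, 'error' => null, 'data' => $data];
}

// A well-formed entry, and the same entry with a trailing comma.
$good = decode_config('{"example.com": {"type": "xpath", "xpath": "article"}}');
$bad  = decode_config('{"example.com": {"type": "xpath", "xpath": "article"},}');

var_dump($good['ok']);
echo $bad['error'], "\n";
```

Running the exact string that is stored on the server through a check like this shows whether PHP itself considers it invalid, or whether the text is altered somewhere before it reaches the decoder.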

Trim article (read more option)

Hi,

really useful and nice plugin - helps me a lot :D

Now I have one big wish: an option to trim / shorten the parsed article to a specific length in characters (like 200), followed by a text like "read more" that opens the full article (in the same frame/div).

Greetings
Marco

PS: the same again, originally written in German, since (given the predefined sites) I assume you can read it ;)

I would like an option to shorten the text (extracted from the page) to a defined length and to show it in full in the same window via a "read more" link. It would be great if you could implement that :D

Best regards
Marco

Call to undefined method PluginHost::getInstance()

Hi!

After cloning the plugin on a default Debian installation I get the following error.
Any idea?

Thanks.


[Sat Jun 22 02:17:37 2013] [error] [client 91.119.71.125] PHP Fatal error: Call to undefined method PluginHost::getInstance() in /usr/share/tt-rss/www/plugins/af_feedmod/init.php on line 162, referer: https://opossum.htu.tuwien.ac.at/tt-rss/prefs.php

tt-rss package:

apt-cache showpkg tt-rss
Package: tt-rss
Versions:
1.7.8+dfsg-2 (/var/lib/apt/lists/gd.tuwien.ac.at_opsys_linux_debian_dists_testing_main_binary-amd64_Packages) (/var/lib/dpkg/status)
Description Language:
File: /var/lib/apt/lists/gd.tuwien.ac.at_opsys_linux_debian_dists_testing_main_binary-amd64_Packages
MD5: 02bd340a64d29c6b17e906e3b16d5f62
Description Language: en
File: /var/lib/apt/lists/gd.tuwien.ac.at_opsys_linux_debian_dists_testing_main_i18n_Translation-en
MD5: 02bd340a64d29c6b17e906e3b16d5f62

Reverse Depends:
Dependencies:
1.7.8+dfsg-2 - debconf (18 0.5) debconf-2.0 (0 (null)) dbconfig-common (0 (null)) libjs-dojo-core (2 1.5.0) libjs-dojo-dijit (2 1.5.0) libjs-scriptaculous (0 (null)) libphp-phpmailer (0 (null)) libphp-simplepie (0 (null)) php-gettext (0 (null)) libapache2-mod-php5 (18 5.3.0) php5-cgi (18 5.3.0) php5 (2 5.3.0) php5-cli (0 (null)) php5-mysql (16 (null)) php5-pgsql (0 (null)) phpqrcode (0 (null)) mysql-server (16 (null)) postgresql (0 (null)) mysql-client (16 (null)) postgresql-client (0 (null)) sphinxsearch (0 (null)) php-apc (0 (null)) apache2 (16 (null)) lighttpd (16 (null)) httpd (0 (null)) php5-gd (0 (null))
Provides:
1.7.8+dfsg-2 -
Reverse Provides:

Error after installation in settings dialog

Hi,

first of all thanks for the nice plugin! Should come in handy now as Google Reader shuts down...

But I have a problem with it: After installing the plugin as instructed in the README, I only get a very generic error message in the settings dialog: "Es ist ein Fehler aufgetreten." ("An error has occurred.")

Any clues what causes this? Can I somehow debug it?

Use "readability" to auto-select article body

The "readability" library can extract the article content of a html page. With that, the configuration file would no longer be needed.

More info about readability: it was at first a js lib, which was then turned into a proprietary service (with an API). However there are now a lot of open source ports of the original js library to other languages, including PHP.

php lib: http://code.fivefilters.org/php-readability (or just google "readability php")

Maybe using this lib in this plugin could help :)

Regards

cookies

After some frustration, it appears that the site I'm trying to get content from, the New England Journal of Medicine (e.g. http://www.nejm.org/doi/full/10.1056/NEJMp1306065), won't talk to browsers at all unless they accept cookies, which poses a problem. Even when following a link to a specific page, you have to go through 2 redirects and only get through if cookies are accepted.

This seems trivial to fix if curl does the downloading, just by adding two lines. But the fetch function within tt-rss currently doesn't do that.

Not accepting cookies might not affect anyone else, but it was quite tricky to find out that this was the problem, and it may be the reason other people's seemingly correct XPaths don't work.
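
For illustration, this is roughly what cookie handling looks like with PHP's curl extension. It is a sketch of the idea described above, not the fetch function tt-rss ships with; fetch_with_cookies() is a hypothetical name, and the redirect limit of 5 is an arbitrary choice.

```php
<?php
// Hypothetical helper, not the tt-rss fetch function.
// Needs PHP's curl extension at the moment it is actually called.
function fetch_with_cookies($url)
{
    $ch = curl_init($url);
    // Return the response body instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow the redirects the site sends before serving the article.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    // An empty file name switches on curl's in-memory cookie engine, so
    // cookies set during one redirect are sent back on the next request.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');
    // Returns the body as a string, or false on failure.
    return curl_exec($ch);
}

// Usage (performs a real network request, so it is left commented out):
// $html = fetch_with_cookies('http://www.nejm.org/doi/full/10.1056/NEJMp1306065');
```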

Using more than one element

There are articles out there that have two parts, a short and a detailed version, but the detailed version has some important context missing. I tried to get the content with the following:

"xpath": [ "div[@id='artdetail_short']", "div[@id='artdetail_text']" ]

This only extracts the short version of the article. Is there a way to get two parts? I looked through your examples and this did not seem to come up.
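
In XPath itself, two selections can be combined with the union operator |. Whether the plugin's "xpath" option accepts such an expression depends on how it builds its query, which is not shown here, so the sketch below uses PHP's DOMXPath directly. The sample page is made up; the two ids are the ones from the question.

```php
<?php
// Made-up sample page with an unrelated block between the two parts.
$html = '<html><body>'
      . '<div id="artdetail_short">Short version.</div>'
      . '<div id="nav">Menu</div>'
      . '<div id="artdetail_text">Detailed version.</div>'
      . '</body></html>';

$doc = new DOMDocument();
// The @ suppresses libxml warnings about sloppy real-world markup.
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The | operator returns the union of both node sets in document order.
$nodes = $xpath->query("//div[@id='artdetail_short'] | //div[@id='artdetail_text']");

$parts = [];
foreach ($nodes as $node) {
    // saveHTML($node) serialises just that node and its children.
    $parts[] = $doc->saveHTML($node);
}
$combined = implode("\n", $parts);
echo $combined, "\n";
```

A single-step form such as div[@id='artdetail_short' or @id='artdetail_text'] expresses the same selection and may be easier to fit into a config that expects one expression.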
