Comments (11)

bfly75 commented on July 25, 2024

I expect this can be done using a more extensively defined XPath query. Below are some examples (not for N24.de) that might be useful. Today is a slow news day, so I don't know yet whether tt-rss works well with these queries. Based on XPath validators, they should.

edit: ahh, unfortunately this does not seem to work with your code. So far you only use the first entry from the query, instead of adding all of them to the article text.

Selecting several specific divs / tags:
//h1 | //h2 | //h3
//div[@id='artikelKolom']/div[@class='zaktxt clear']/div[@class='zak_normal'] | //div[@id='artikelKolom']/p
Note: sequence matters when doing it like this! //h1 | //h2 | //h3 will first show all h1's, followed by all h2's and then all h3's.
//div[@id='artikelKolom']/*[contains(@class,'zaktxt') or name()='p']
Note: sequence does not seem to matter here; the output order follows the order in the file.

Select all divs with certain classes. The divs do not need to share the same parent.
//div[@class='content illustrated' or @class='post-body']
//div[contains(@class,'illustration top')] | //div[contains(@class,'post-body')]
//div[contains(@class,'illustration top') or contains(@class,'post-body')]
Note: not sure whether sequence matters

Select all children of div id='artikelKolom', except children with class='broodtxt' or class='bannercenter ...'
//div[@id='artikelKolom']/*[@class!='broodtxt']
//div[@id='artikelKolom']/*[not(@class='broodtxt')]
//div[@id='artikelKolom']/*[not(contains(@class, 'broodtxt'))]
//div[@id='artikelKolom']/*[not(contains(@class, 'broodtxt')) and not(contains(@class, 'bannercenter'))]

from ttrss_plugin-af_feedmod.

mbirth commented on July 25, 2024

I think it'll get too complicated if you need to "puzzle" the result together like this. Also it'll get worse when the source changes its layout (like N24 did some days ago).

Maybe I'll implement a blacklist which will remove certain XPath elements from the result. I think this is more robust.
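
To make the blacklist idea concrete, here is a minimal sketch in Python using only the standard library. It is not the plugin's code (the plugin is PHP); the sample markup, the BLACKLIST list and the remove_blacklisted helper are all hypothetical. ElementTree supports only a subset of XPath (no contains(), no | unions), so the queries use exact class matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the node selected by the main query.
PAGE = """
<div id="artikelKolom">
  <p>Article text.</p>
  <div class="broodtxt">Breadcrumb</div>
  <div class="bannercenter">Advert</div>
  <p>More article text.</p>
</div>
"""

# Hypothetical blacklist: each entry is a query for nodes to drop.
BLACKLIST = [".//div[@class='broodtxt']", ".//div[@class='bannercenter']"]

def remove_blacklisted(root, blacklist):
    """Remove every element matching a blacklist query from the tree, in place."""
    for query in blacklist:
        # ElementTree has no parent pointers, so map each child to its parent.
        parent_of = {child: parent for parent in root.iter() for child in parent}
        for node in root.findall(query):
            # Note: remove() also discards any text that directly follows the node.
            parent_of[node].remove(node)
    return root

root = remove_blacklisted(ET.fromstring(PAGE), BLACKLIST)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The main query stays simple this way, and a layout change on the source site only requires touching the blacklist entries.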

Kasad commented on July 25, 2024

A blacklist would be really nice :D
Also, I have a big problem with welt.de: their feed URL links to an overview page. There should be a rewrite of the source URL, so that:
http://www.welt.de/?config=articleidfromurl&artid=115415142
becomes
http://www.welt.de/article115415142

It would be fantastic to see these features :D
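
The requested rewrite could look roughly like the following Python sketch. It only covers the one URL shape quoted above; the function name rewrite_welt_url and the accepted host names are assumptions, and nothing here is part of the plugin.

```python
from urllib.parse import urlsplit, parse_qs

def rewrite_welt_url(url):
    """Map a welt.de overview-style link to the direct article URL.

    URLs that do not match the expected shape are returned unchanged.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    art_ids = query.get("artid")
    if (
        parts.hostname in ("welt.de", "www.welt.de")   # assumed host names
        and query.get("config") == ["articleidfromurl"]
        and art_ids
        and art_ids[0].isdigit()
    ):
        return f"{parts.scheme}://{parts.netloc}/article{art_ids[0]}"
    return url

rewritten = rewrite_welt_url(
    "http://www.welt.de/?config=articleidfromurl&artid=115415142"
)
print(rewritten)
```

Parsing the query string instead of pattern-matching the whole URL keeps the rewrite working if the parameters appear in a different order.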

Kasad commented on July 25, 2024

Hi,

Is there a way to use all entries from the query, instead of adding only the first one to the article text?

div[@class='news-single-item']/p ==> only returns the content of the first p found

div[@id='news-single-item']/*[not(div[@class='comments'])] ==> doesn't work :(

Thank you for your answer.

Kasad

bfly75 commented on July 25, 2024

Yes, but you need to make some changes to the init.php file. I did this last weekend, and this week it seems to work as expected. See https://github.com/bfly75/ttrss_plugin-af_feedmod.
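
The change itself is in the fork linked above and is not reproduced here. As a rough illustration of the idea only (use every match of the query rather than just the first), here is a Python sketch with the standard library; the sample markup and the extract_all helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fetched article page.
PAGE = """
<html><body>
  <div class="news-single-item">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
    <div class="comments"><p>A reader comment.</p></div>
  </div>
</body></html>
"""

def extract_all(page, query):
    """Return the markup of every node matching the query, in document order."""
    root = ET.fromstring(page)
    # findall() returns all matches; find() would return only the first one.
    matches = root.findall(query)
    # strip() drops the whitespace that follows each node in the source.
    return "".join(ET.tostring(node, encoding="unicode").strip() for node in matches)

article = extract_all(PAGE, ".//div[@class='news-single-item']/p")
print(article)
```

Because the query selects only direct p children, the paragraph nested inside the comments div is left out without any extra exclusion predicate.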

Kasad commented on July 25, 2024

Wow, thank you very much - this works awesome :D

uusijani commented on July 25, 2024

I think post-processing should also rip out (at least) id, class and style attributes from the content. Some pages I fetch using feedmod have elements with ids such as "overlay" in them that pick up tt-rss's styling, making things look wonky.
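
A post-processing step like that might be sketched as follows in Python with the standard library. The strip_attributes helper and the sample fragment are hypothetical, and the fragment is assumed to be well-formed XML, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Attributes that can pick up the host application's styling.
STRIP = ("id", "class", "style")

def strip_attributes(fragment, names=STRIP):
    """Return the fragment with the named attributes removed from every element."""
    root = ET.fromstring(fragment)
    for node in root.iter():            # iter() visits the root and all descendants
        for name in names:
            node.attrib.pop(name, None)  # ignore elements that lack the attribute
    return ET.tostring(root, encoding="unicode")

cleaned = strip_attributes(
    '<div id="overlay" class="box"><p style="color:red">Text</p>'
    '<a href="/x">link</a></div>'
)
print(cleaned)
```

Attributes that carry content, such as href and src, are left untouched.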

tbar commented on July 25, 2024

@bfly75: Thanks for that modification!
@mbirth: You should consider incorporating bfly75's modification, maybe by creating a new type (e.g. xpath-all-matches).

mbirth commented on July 25, 2024

I just merged changes from @rangerer which add a new "cleanup" option to remove unwanted parts from the main XPath node. He has also provided a lot of examples.

mbirth commented on July 25, 2024

Another thing this one should do: make all URLs absolute (i.e. fully qualified, including "http://www.example.org/"), because, as in #22, images with relative URLs are not shown.
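
One way to do this, sketched in Python with the standard library: resolve every href and src against the article's own URL. The absolutize helper, the sample fragment and the base URL are hypothetical, and the fragment is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

def absolutize(fragment, base_url):
    """Resolve href and src attributes in the fragment against base_url."""
    root = ET.fromstring(fragment)
    for node in root.iter():
        for attr in ("href", "src"):
            if attr in node.attrib:
                # urljoin leaves already-absolute URLs unchanged.
                node.set(attr, urljoin(base_url, node.get(attr)))
    return ET.tostring(root, encoding="unicode")

fixed = absolutize(
    '<div><img src="images/a.png" /><a href="/news/1">one</a>'
    '<a href="http://other.example/x">two</a></div>',
    "http://www.example.org/articles/today.html",
)
print(fixed)
```

Relative paths are resolved against the directory of the article URL, while paths starting with a slash are resolved against the site root.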

Kasad commented on July 25, 2024

Hi,

After my tt-rss crashed, I couldn't use bfly75's version any longer. Could you please add his way of displaying more than one div?

Greetings
K
