
newscatcher's People

Contributors

dorafmon, dwardu89, kotartemiy


newscatcher's Issues

Error message

Am I doing something wrong to receive this error?

(screenshot: newscatcher error)

Article body text

Hi

I'm wondering whether you plan to provide the article body text in the future. Looking at a handful of publishers, I wasn't able to find it in the feeds returned for them.
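
Until the package provides it, a common workaround is to fetch the full text yourself from each entry's link. A minimal sketch, assuming get_news() returns a dict with an 'articles' list of feedparser-style entries (each carrying link and title) and that the separate newspaper3k package is installed:

from newscatcher import Newscatcher
from newspaper import Article  # newspaper3k, a separate install

nc = Newscatcher(website='nytimes.com')
results = nc.get_news()

# 'articles' and the per-entry 'link'/'title' fields are assumptions based on
# the feedparser entries the package appears to return.
for entry in results['articles'][:5]:
    article = Article(entry.link)
    article.download()   # fetch the HTML
    article.parse()      # extract the body text
    print(entry.title)
    print(article.text[:200])  # first 200 characters of the body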

Exception raised on 'news.ycombinator.com'

Cool project, everything seems to work as intended, but for some reason trying to use news.ycombinator.com as the news_source raises Exception: check internet connection / website is not supported.

It's odd, because other sources with subdomains work without issue, such as wired.co.uk.
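
As a possible workaround while the package entry is sorted out, the Hacker News feed can be parsed directly with feedparser; https://news.ycombinator.com/rss is the feed the site itself publishes:

import feedparser

# Parse the Hacker News front-page feed directly, bypassing Newscatcher.
feed = feedparser.parse('https://news.ycombinator.com/rss')
for entry in feed.entries[:5]:
    print(entry.title, '-', entry.link)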

[QUESTION] Rate Limiting?

When attempting to retrieve a large number of items, is there any kind of rate limiting or request cool-down?
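
If it turns out there is no built-in cool-down, a simple client-side delay between feed requests is easy to add. A sketch, assuming one Newscatcher call per website and that get_news() returns None when nothing is found:

import time
from newscatcher import Newscatcher

websites = ['nytimes.com', 'wired.co.uk']  # example list of sources

for site in websites:
    nc = Newscatcher(website=site)
    results = nc.get_news()
    count = len(results['articles']) if results else 0  # 'articles' key assumed
    print(site, count)
    time.sleep(2)  # client-side cool-down between requests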

Cannot pull data.

I am unable to get a news feed even though my internet connection is working fine. My requests are below.
Python version: 3.7.3

from newscatcher import Newscatcher
nc = Newscatcher(website = 'nytimes.com')
results = nc.get_news()

No results found check internet connection or query parameters

An additional sample request to confirm that my internet connection works:

import requests
r = requests.get('https://github.com/timeline.json')
r.json()
{'message': 'Hello there, wayfaring stranger. If you’re reading this then you probably didn’t see our blog post a couple of years back announcing that this API would go away: http://git.io/17AROg Fear not, you should be able to get what you need from the shiny new Events API instead.', 'documentation_url': 'https://developer.github.com/v3/activity/events/#list-public-events'}
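
A quick sanity check before calling get_news() is to ask the package whether it knows the site at all; urls() and describe_url() are the helpers used in the Hacker News issue below:

from newscatcher import Newscatcher, urls, describe_url

# Confirm the site is in the bundled list of sources before fetching.
known = urls(language='en')
print('nytimes.com known:', 'nytimes.com' in known)
describe_url('nytimes.com')

nc = Newscatcher(website='nytimes.com')
results = nc.get_news()
print('got results:', results is not None)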

RSS feeds

Would it be possible to store the URLs in a compressed flat file? The SQLite setup is convoluted and hard to work with: I would like to just open the list of feeds and see what you have, but it takes about 20 steps to get at the data.
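
Until such an export exists, the bundled database can be dumped to CSV in a few lines; the file name below is an assumption, so point it at whatever SQLite file ships inside the installed package:

import csv
import sqlite3

DB_PATH = 'package_rss.db'  # assumed name of the bundled SQLite file

con = sqlite3.connect(DB_PATH)
tables = [row[0] for row in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]

for table in tables:
    cur = con.execute(f'SELECT * FROM {table}')
    with open(f'{table}.csv', 'w', newline='') as fh:
        writer = csv.writer(fh)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur)

con.close()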

Not compatible with Python 3.9

It looks like the dependencies in this project need to be updated. Feedparser was patched for Python 3.9, but the patched version hasn't made it into this project yet, so it throws a base64 error.
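
If the base64 error comes from feedparser 5.x (which used base64.decodestring, removed in Python 3.9), upgrading the pinned dependency to feedparser 6.x should resolve it. A quick check of what is actually installed:

import sys
import feedparser

print(sys.version)             # expect a 3.9.x interpreter here
print(feedparser.__version__)  # 6.0+ no longer relies on base64.decodestring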

Hacker News not working

I can't seem to get YC or YC News to work, even though the former is in urls() and the latter appears in the front-page README. Thank you for this amazing open-source project!

from newscatcher import Newscatcher, urls, describe_url

url = 'news.ycombinator.com'
url2 = 'ycombinator.com'
eng_lnx = urls(language='en')  # all English-language sources the package knows

nc = Newscatcher(website=url)
try:
    print("looking for " + url + "...")
    nc.get_news()
except Exception as e:
    print(repr(e))

describe_url(url)
print(url + ' in urls: ' + str(url in eng_lnx))
print(url2 + ' in urls: ' + str(url2 in eng_lnx))

nc2 = Newscatcher(website=url2)
try:
    print("looking for " + url2 + "...")
    nc2.get_news()
except Exception as e:
    print(repr(e))

Support various TLDs for Google News in the database

It would be helpful to support the local URLs of Google News, e.g. news.google.co.uk and news.google.com.au (see the probing sketch after the list below).

The full list of country TLDs follows, though I am not sure whether every country has a Google News TLD.

.ac
.ad
.ae
.af
.ag
.ai
.al
.am
.ao
.aq
.ar
.as
.at
.au
.aw
.ax
.az
.ba
.bb
.bd
.be
.bf
.bg
.bh
.bi
.bj
.bm
.bn
.bo
.br
.bs
.bt
.bw
.by
.bz
.ca
.cc
.cd
.cf
.cg
.ch
.ci
.ck
.cl
.cm
.cn
.co
.cr
.cu
.cv
.cw
.cx
.cy
.cz
.de
.dj
.dk
.dm
.do
.dz
.ec
.ee
.eg
.er
.es
.et
.eu
.fi
.fj
.fk
.fm
.fo
.fr
.ga
.gd
.ge
.gf
.gg
.gh
.gi
.gl
.gm
.gn
.gp
.gq
.gr
.gs
.gt
.gu
.gw
.gy
.hk
.hm
.hn
.hr
.ht
.hu
.id
.ie
.il
.im
.in
.io
.iq
.ir
.is
.it
.je
.jm
.jo
.jp
.ke
.kg
.kh
.ki
.km
.kn
.kp
.kr
.kw
.ky
.kz
.la
.lb
.lc
.li
.lk
.lr
.ls
.lt
.lu
.lv
.ly
.ma
.mc
.md
.me
.mg
.mh
.mk
.ml
.mm
.mn
.mo
.mp
.mq
.mr
.ms
.mt
.mu
.mv
.mw
.mx
.my
.mz
.na
.nc
.ne
.nf
.ng
.ni
.nl
.no
.np
.nr
.nu
.nz
.om
.pa
.pe
.pf
.pg
.ph
.pk
.pl
.pm
.pn
.pr
.ps
.pt
.pw
.py
.qa
.re
.ro
.rs
.ru
.rw
.sa
.sb
.sc
.sd
.se
.sg
.sh
.si
.sk
.sl
.sm
.sn
.so
.sr
.ss
.st
.su
.sv
.sx
.sy
.sz
.tc
.td
.tf
.tg
.th
.tj
.tk
.tl
.tm
.tn
.to
.tr
.tt
.tv
.tw
.tz
.ua
.ug
.uk
.us
.uy
.uz
.va
.vc
.ve
.vg
.vi
.vn
.vu
.wf
.ws
.ye
.yt
.za
.zm
.zw
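
As a rough way to see which of these actually resolve for Google News, one could probe candidate hosts. The URL pattern below is an assumption (many countries use news.google.co.<cc> or news.google.com.<cc> rather than news.google.<cc>), so treat it as a starting point only:

import requests

# Candidate country codes to probe; extend with entries from the list above.
CODES = ['de', 'fr', 'co.uk', 'com.au', 'co.jp']

for code in CODES:
    url = f'https://news.google.{code}'
    try:
        r = requests.head(url, allow_redirects=True, timeout=5)
        print(code, r.status_code, '->', r.url)
    except requests.RequestException as exc:
        print(code, 'failed:', exc)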

Offer language option

Since some news sources publish feeds in more than one language (e.g., spiegel.de), there should be an option to choose a language.
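
A hypothetical interface sketch for this request; the language keyword below does not exist in the current API and is shown only to illustrate the desired behaviour:

from newscatcher import Newscatcher

# Hypothetical 'language' parameter -- not part of the current API.
nc_de = Newscatcher(website='spiegel.de', language='de')  # German edition
nc_en = Newscatcher(website='spiegel.de', language='en')  # English ("International") edition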

Keeping addresses of RSS feeds up-to-date

Thanks for this great package and for the big collection of RSS feeds for so many news sites.

But how and when did you collect the addresses? The two German sites I tried both had problems: one linked to a broken website (the catching did not work at all), and the other is not the feed you want (an outdated podcast feed). Perhaps it would be a good idea to have native speakers review the corresponding feeds; I would volunteer for the German ones.

The correct addresses are:

This leads to a second issue: using an SQLite database might be convenient, but it is not practical to track in Git, as mentioned in another issue. Because of that, I could not contribute a pull request.
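
A lightweight way to keep the addresses reviewed would be a small checker that flags feeds that no longer parse; the FEEDS list below is a placeholder for whatever the database (or a future flat file) contains:

import feedparser

FEEDS = ['https://example.com/rss']  # placeholder -- fill in the stored feed URLs

for url in FEEDS:
    parsed = feedparser.parse(url)
    status = parsed.get('status', 'n/a')            # HTTP status when fetched remotely
    broken = bool(parsed.bozo) and not parsed.entries
    print(url, status, 'BROKEN' if broken else f'{len(parsed.entries)} entries')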

Date available?

This is awesome, but it would make even more sense if it also captured the article date along with the data. Is it there and I missed it?
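
If a feed exposes it, the date should already be on each entry; the field names here (published, title) are the usual feedparser ones, and 'articles' is an assumption about the shape of get_news()'s return value:

from newscatcher import Newscatcher

nc = Newscatcher(website='nytimes.com')
results = nc.get_news()

# 'published' is the standard feedparser field and is absent when a feed omits dates.
for entry in results['articles'][:5]:
    print(entry.get('published', 'no date'), '-', entry.title)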
