Scrapes entire blogs, then summarizes, translates their articles to English and produces a list of keywords.

Python 82.87% JavaScript 5.93% HTML 10.38% CSS 0.82%

Newsroom

This application is intended for educational purposes only.

Newsroom scrapes entire blogs, then summarizes their articles, translates them to English, and produces a list of keywords. Articles are categorized by topic: the user creates topics and assigns sources to each one for Newsroom to scrape.

Newsroom disguises itself by changing its user agent at random intervals and scrapes the blogs through the Tor network, which gives it a new IP address every 10 minutes. This is done for educational purposes only; please respect the request limits of every blog.
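As a rough illustration of the idea above, the sketch below rotates the user agent after a random number of seconds and sends requests through a local Tor SOCKS proxy. It is not Newsroom's actual code: the names `UserAgentRotator`, `request_kwargs`, and `fetch`, the user-agent strings, and the 30 to 300 second interval are all assumptions made for the example. The proxy address assumes Tor is running locally on its default SOCKS port, 9050.

```python
import random
import time

# Hypothetical user-agent pool; the strings Newsroom really uses are not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/115.0",
]

# Tor's default local SOCKS port. "socks5h" lets the proxy resolve hostnames as well.
TOR_PROXY = "socks5h://127.0.0.1:9050"


class UserAgentRotator:
    """Hands out a user agent and swaps it after a random number of seconds."""

    def __init__(self, agents, min_s=30.0, max_s=300.0, clock=time.monotonic, rng=None):
        self.agents = list(agents)
        self.min_s, self.max_s = min_s, max_s  # assumed interval bounds
        self.clock = clock
        self.rng = rng or random.Random()
        self._rotate()

    def _rotate(self):
        # Pick a new agent and schedule the next change at a random point in the future.
        self.current_agent = self.rng.choice(self.agents)
        self.next_change = self.clock() + self.rng.uniform(self.min_s, self.max_s)

    def current(self):
        if self.clock() >= self.next_change:
            self._rotate()
        return self.current_agent


def request_kwargs(rotator, proxy=TOR_PROXY):
    """Builds keyword arguments that can be passed to requests.get()."""
    return {
        "headers": {"User-Agent": rotator.current()},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 30,
    }


def fetch(url, rotator):
    """Fetches a page through Tor; needs `pip install 'requests[socks]'`."""
    import requests  # imported lazily so the helpers above work without it

    return requests.get(url, **request_kwargs(rotator))
```

A caller would create one `UserAgentRotator(USER_AGENTS)` and pass it to `fetch` for every page, adding its own delay between requests to stay within each blog's limits.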

Uses: Tor, Gensim, BeautifulSoup, Newspaper, IBM's Watson API, and the MyMemory API.

CONFIGURATION:

***IBM Watson API***
    See settings.py for the Watson settings.
    Go to [IBM Watson translator](https://www.ibm.com/cloud/watson-language-translator) to create a free account
    and get an IAM secret key.

***Tor***
    $ sudo apt-get install tor
    Then run tor:
    $ tor
    If you get this error when running the scripts:
        requests.exceptions.InvalidSchema: Missing dependencies for SOCKS support.
    then install SOCKS support for requests:
    $ pip install 'requests[socks]'
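Before running the scripts, it can help to confirm that something is listening on Tor's SOCKS port. The following is a minimal sketch using only the standard library; `tor_port_open` is a hypothetical helper, not part of Newsroom, and it assumes Tor's default port 9050 on localhost. It only checks that the port accepts TCP connections, not that the listener is actually Tor.

```python
import socket


def tor_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Returns True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `tor_port_open()` returns `False`, Tor is probably not running yet, or it is configured to use a different port.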

If you're having any trouble setting it up, Google is your best friend. I hope you find this project
insightful.
