NYAN

NYAN is a news filtering engine written in Python and some Ruby.

It was mainly written for my master's thesis. You can find the details here.

The filter is made up of several programs which pass messages to each other over STOMP. I used CoilMQ as a broker.
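As a rough illustration of what travels between the programs, the sketch below builds and parses a STOMP `SEND` frame by hand using only the standard library. It is not code from this repository: the queue name `/queue/articles`, the JSON payload, and the helper names `build_send_frame` and `parse_frame` are assumptions made for the example, and header escaping from later STOMP versions is left out. In practice a client library such as stomp.py handles the framing.

```python
import json


def build_send_frame(destination: str, body: str) -> bytes:
    """Encode a STOMP SEND frame: command, headers, blank line, body, NUL byte."""
    payload = body.encode("utf-8")
    headers = {
        "destination": destination,
        "content-type": "application/json",
        "content-length": str(len(payload)),
    }
    head = "SEND\n" + "".join(f"{k}:{v}\n" for k, v in headers.items()) + "\n"
    return head.encode("utf-8") + payload + b"\x00"


def parse_frame(frame: bytes):
    """Split a frame back into (command, headers, body)."""
    # The first blank line separates the header block from the body.
    head, _, rest = frame.partition(b"\n\n")
    lines = head.decode("utf-8").split("\n")
    command = lines[0]
    headers = dict(line.split(":", 1) for line in lines[1:])
    # content-length tells us where the body ends, before the trailing NUL.
    length = int(headers["content-length"])
    body = rest[:length].decode("utf-8")
    return command, headers, body


if __name__ == "__main__":
    # Hypothetical message: a crawler announcing a newly stored article.
    frame = build_send_frame("/queue/articles", json.dumps({"article_id": 1}))
    print(parse_frame(frame))
```

Because each program only needs to agree on the destination and the message body, any of them can be replaced or run alone as long as it speaks to the same broker.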

  • The feed_crawler crawls news sites for news articles.
  • The feature_extractor converts news articles to a gensim feature model.
  • The article_ranker ranks incoming news articles according to a learned user model.
  • The user_model_trainer captures the user's interests in a user model which is used by the article_ranker.
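To make the idea of ranking against a user model concrete, here is a minimal sketch. It assumes the user model is simply the centroid of the feature vectors of articles the user liked and that articles are scored by cosine similarity. This is an illustrative stand-in, not the model described in the thesis. Vectors use the sparse `(feature_id, weight)` list format that gensim produces, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Average sparse vectors given as lists of (feature_id, weight) pairs."""
    total = defaultdict(float)
    for vec in vectors:
        for fid, weight in vec:
            total[fid] += weight
    n = len(vectors)
    return {fid: weight / n for fid, weight in total.items()}


def cosine(model, vec):
    """Cosine similarity between a dict-based model and a sparse vector."""
    dot = sum(model.get(fid, 0.0) * weight for fid, weight in vec)
    norm_model = math.sqrt(sum(w * w for w in model.values()))
    norm_vec = math.sqrt(sum(w * w for _, w in vec))
    if norm_model == 0 or norm_vec == 0:
        return 0.0
    return dot / (norm_model * norm_vec)


def rank_articles(model, articles):
    """Return article ids sorted from most to least similar to the model."""
    return sorted(articles, key=lambda a: cosine(model, articles[a]), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: two liked articles and three incoming ones.
    liked = [[(0, 1.0), (1, 1.0)], [(0, 1.0)]]
    incoming = {"a": [(0, 1.0)], "b": [(2, 1.0)], "c": [(1, 1.0)]}
    print(rank_articles(centroid(liked), incoming))
```

A trained classifier or topic model could replace the centroid without changing the interface: the ranker only needs a function that scores a feature vector against the user model.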

The programs can be easily used on their own.

The frontend is good for fast prototyping.

Note that most of the code was written for academic purposes and was never used commercially. Saving complete news articles might not be legal.

Setup and usage

Basically, the crawler, the feature extractor, and the ranker have to be started. Each uses a config file to connect to the STOMP broker. Read the corresponding chapter in my thesis for the complete setup.

The frontend can be run on Apache with FastCGI. You can find a German how-to here.

I used daemontools to make each program a daemon. Some can be run as a daemon without daemontools. However, I don't recommend that.

For model training see shared_models/learn_on_articles.

Dependencies and Requirements

The system uses a MongoDB database.

All programs depend on several libraries. The following list might not be complete.

feed_crawler

  • nokogiri
  • feedzirra
  • log4r
  • psych
  • dbi
  • stomp

article_ranker, feature_extractor

  • yaml
  • gensim
  • numpy, scipy
  • stomp.py
  • scikit-learn

frontend

  • flask
  • flask-login plugin
  • mongoengine

Licensing

Most code is licensed under the MIT License.

Some code is taken from other libraries with different licenses. Such cases are marked.
