Beagle

Beagle is software used to track changes to web resources. It reads site URLs from a MongoDB database and runs a scraper (called beagleboy) to check whether the sites have changed. It also looks at resources the site links to (in case content is being served with an iframe, swf file, etc.).
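As a rough illustration of the idea (not beagleboy's actual code, which this README does not describe), change tracking can be sketched as two steps: collect the embedded resources of a page and compare a digest of each response body with the one seen last time. The class name, the function names and the choice of SHA-256 below are all assumptions made for the example.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkedResourceParser(HTMLParser):
    """Collect URLs of resources embedded in a page (hypothetical helper)."""

    # Tag -> attribute holding the resource URL (an illustrative selection).
    SOURCES = {"iframe": "src", "embed": "src", "object": "data"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.SOURCES.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.resources.append(urljoin(self.base_url, value))


def linked_resources(base_url, html):
    """Return the absolute URLs of resources embedded in the given HTML."""
    parser = LinkedResourceParser(base_url)
    parser.feed(html)
    return parser.resources


def fingerprint(body):
    """Return a digest of the raw bytes; a different digest means a change."""
    return hashlib.sha256(body).hexdigest()
```

Calling `linked_resources("http://scrooge.mcduck.com/", page)` on a page containing an iframe and an embedded swf file returns both URLs in absolute form, and each of them can then be fetched and fingerprinted in the same way as the page itself.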

Installation

The recommended way to install the software is in a virtual environment. The steps below assume you have installed virtualenv and git:

> git clone https://github.com/tryggvib/beagle.git
> cd beagle
> virtualenv venv
> source venv/bin/activate
> pip install -r requirements.txt

Usage

Beagleboy

The scraper is a Python program built on Scrapy and is used like any other Scrapy scraper.

This assumes you're in the beagle directory (from step 2 in the installation). If you haven't activated the virtual environment (assuming you called it venv), start by activating it:

> source venv/bin/activate

Before running beagleboy for the first time, put your email server settings in beagleboy/beagleboy/settings.py. After that, running it is always the same:

> cd beagleboy
> scrapy crawl webresources
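The README does not list which email settings beagleboy expects. As a sketch, assuming it relies on Scrapy's standard mail settings, the relevant part of settings.py might look like the following. The host, addresses and credentials are placeholders, and the existing settings.py should be checked for the names it actually uses.

```python
# Hypothetical excerpt for beagleboy/beagleboy/settings.py.
# These are Scrapy's standard mail settings; whether beagleboy reads exactly
# these names is an assumption, so check the existing file first.
MAIL_HOST = "smtp.example.org"    # placeholder SMTP server
MAIL_PORT = 587                   # placeholder port
MAIL_FROM = "beagle@example.org"  # placeholder sender address
MAIL_USER = "beagle"              # placeholder account name
MAIL_PASS = "change-me"           # placeholder password
MAIL_TLS = True                   # upgrade the connection with STARTTLS
MAIL_SSL = False                  # do not open the connection over SSL
```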

Database structure

Beagleboy fetches the sites from a users collection in the MongoDB database (the database name defaults to beagle). A users collection document has the following structure:

{
    _id: <email address of user, e.g. [email protected]>,
    name: <name of user, e.g. Bigtime Beagle>,
    sites: [
        {
            url: <url of a budget page to be scraped>,
            last_modified: <date when change was last seen>
        },
    ]
}

So to add a page that should be scraped, one only needs to push a document like:

{
    url: 'http://scrooge.mcduck.com'
}

to a specific user's sites array. Beagleboy will pick this up and notify that particular user when a change is noticed at that URL.
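The push described above can be sketched in Python as follows. The helper name add_site_update and the example email address are hypothetical, and the pymongo call in the comment assumes a local MongoDB, the default beagle database and a collection named users.

```python
def add_site_update(email, url):
    """Build the (filter, update) pair that appends a site to one user's list."""
    query = {"_id": email}                       # users are keyed by email address
    update = {"$push": {"sites": {"url": url}}}  # append to the sites array
    return query, update


# With a running MongoDB and pymongo installed, this could be applied as:
#   from pymongo import MongoClient
#   users = MongoClient()["beagle"]["users"]
#   users.update_one(*add_site_update("user@example.org",
#                                     "http://scrooge.mcduck.com"))
```

Only the url field is pushed here, matching the example document above.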

Since Beagleboy is built using Scrapy, it can use scrapyd to schedule scraping jobs through scrapyd's JSON API.

Please read the scrapyd documentation, but it's really easy: you install it, and it exposes a web service where you can schedule scraping with a curl request. This would be the curl request for beagleboy:

> curl http://localhost:6800/schedule.json -d project=beagleboy -d spider=webresources
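The same request can be sent from Python with the standard library. This is a minimal sketch: the function names are made up for the example, and it assumes scrapyd is listening on localhost:6800 with the beagleboy project already deployed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def schedule_request(base_url="http://localhost:6800"):
    """Build the POST request that the curl command above sends."""
    data = urlencode({"project": "beagleboy", "spider": "webresources"})
    # Passing data makes urllib issue a POST, just like curl -d.
    return Request(base_url + "/schedule.json", data=data.encode("ascii"))


def schedule(base_url="http://localhost:6800"):
    """Send the request and return scrapyd's decoded JSON reply."""
    # Requires a running scrapyd instance; not called on import.
    with urlopen(schedule_request(base_url)) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the URL and form data without a scrapyd server at hand.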

You can expose the scrapyd web server if you want, but then you should definitely put some authentication in front of it.

Hacking on Beagle

Translation process

  1. Extract messages
  2. Initialise or update translation files
  3. Translate
  4. Compile translations

The process assumes you're in the beagle directory as described in step 2 of the installation.

Extracting messages

Even though all messages are stored in beagleboy/messages.py, pybabel works on directories, so to extract them run the following command:

> pybabel extract -F babel.cfg -o locale/beagleboy.pot .

Initialise or update translations

If you want to create a new language to translate messages into you need to initialise it with the following command (where language code is something like is_IS):

> pybabel init -D beagle -i locale/beagleboy.pot -d locale/ -l <language code>

However, if you're updating an existing translation, you don't initialise the language again but update it with the following command (again where language code is something like en_GB):

> pybabel update -D beagle -i locale/beagleboy.pot -d locale/ -l <language code>

Translate

Translate with your favourite po file translator, e.g. poedit. The project could also be uploaded to Transifex with little effort (not supported at the moment). The po file to be translated will be available in locale/<language code>/LC_MESSAGES/beagle.po

Compile translations

To compile translations (and thus make them available to the software) one just runs the following command:

> pybabel compile -D beagle -d locale/

This compiles all of the translations in one go and everybody is happy.
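How beagleboy loads the compiled catalogues is not shown here. As a sketch using only the standard library's gettext module, a catalogue for the beagle domain could be loaded like this; load_translator is a hypothetical helper, and the default locale directory assumes the layout produced by the commands above.

```python
import gettext


def load_translator(language, localedir="locale"):
    """Return a gettext function for the 'beagle' domain.

    fallback=True returns untranslated messages when no compiled
    catalogue exists for the language, instead of raising an error.
    """
    translation = gettext.translation(
        "beagle", localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext
```

For example, `_ = load_translator("is_IS")` gives a function that translates any message present in the compiled catalogue for that language and returns the original string otherwise.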
