hdx-scraper-template's Introduction

Template Usage

Replace scrapername everywhere with your scraper's name, e.g. worldbank.

Replace ScraperName everywhere with your scraper's name, e.g. World Bank.

Look for xxx and ... and replace or add text accordingly.
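The two name replacements can be sketched in Python as follows. This is only an illustration: the function name apply_template_names is hypothetical and not part of the template, and the xxx and ... markers still need to be filled in by hand.

```python
def apply_template_names(text, short_name, display_name):
    """Substitute the two template placeholders in a piece of text.

    short_name replaces "scrapername" (e.g. "worldbank") and
    display_name replaces "ScraperName" (e.g. "World Bank").
    Both replacements are case-sensitive, so they do not interfere.
    """
    text = text.replace("ScraperName", display_name)
    text = text.replace("scrapername", short_name)
    return text


# Example: a line similar to those found in the template
line = "hdx-scraper-scrapername: Collector for ScraperName's Datasets"
print(apply_template_names(line, "worldbank", "World Bank"))
```

Applying the same function to every text file in the copied template (and renaming any files or folders that contain scrapername) covers the first two steps.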

Scrapers can be installed on Jenkins and set up to run on a schedule.

Collector designed to collect ScraperName datasets from the ScraperName website and to automatically register datasets on the Humanitarian Data Exchange project.

For full scrapers following this template see: ACLED, FTS, WHO, World Bank, WorldPop

For a scraper that also creates datasets disaggregated by indicator (not just by country) and reads metadata from a Google spreadsheet exported as CSV, see: IDMC

Collector for ScraperName's Datasets

Usage

python run.py

For the script to run, you will need to either pass in your HDX API key as a parameter or have a file called .hdx_configuration.yml in your home directory containing your HDX key, e.g.

hdx_key: "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
hdx_read_only: false
hdx_site: test

You will also need to pass in your user agent as a parameter or pass a parameter user_agent_config_yaml specifying where your user agent file is located. It should be of the form:

user_agent: MY_USER_AGENT

If you have many user agents, you can create a file of this form, put its location in user_agent_config_yaml and specify the lookup in user_agent_lookup:

myscraper:
    user_agent: MY_USER_AGENT
myscraper2:
    user_agent: MY_USER_AGENT2

Note for HDX scrapers: there is a universal .useragents.yml file you should use.

Alternatively, you can set up environment variables, e.g. for production runs: USER_AGENT, HDX_KEY, HDX_SITE, BASIC_AUTH, EXTRA_PARAMS, TEMP_DIR, LOG_FILE_ONLY
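As a rough sketch of how a run might check which of these variables are set, the snippet below collects them into a dictionary using only the standard library. The function settings_from_env is hypothetical; it does not reproduce how the HDX library itself reads these variables, and the meaning of each variable is not interpreted here.

```python
import os

# Variable names as listed above
ENV_NAMES = (
    "USER_AGENT",
    "HDX_KEY",
    "HDX_SITE",
    "BASIC_AUTH",
    "EXTRA_PARAMS",
    "TEMP_DIR",
    "LOG_FILE_ONLY",
)


def settings_from_env(environ=None):
    """Return the listed variables that are present in the environment.

    environ defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in ENV_NAMES if name in environ}


# Example with an explicit mapping instead of the real environment
example = {"HDX_SITE": "test", "USER_AGENT": "MY_USER_AGENT", "PATH": "/usr/bin"}
print(settings_from_env(example))
```

Unrelated variables such as PATH are ignored, so the result contains only the settings relevant to the scraper.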
