dap-scrapers's People

Contributors

mcarans

dap-scrapers's Issues

Non-violent incidents counted in PVX040

You guys may have handled this already, but just in case:

There are 3 incident types in ACLED that are explicitly non-violent (Non-Violent Conflict Event, Non-Violent Transfer of Location Control, and Headquarters or Base Establishment). We should be excluding those from the total calculated for PVX040.

copying @ochastats (Javier) for reference.
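A minimal sketch of the exclusion, assuming the ACLED rows are available as dicts and that the event type sits in a column called `event_type` (that column name and the sample rows are assumptions for illustration, not taken from the scraper):

```python
# Event types the issue names as explicitly non-violent, lowercased for matching.
NON_VIOLENT_TYPES = {
    "non-violent conflict event",
    "non-violent transfer of location control",
    "headquarters or base establishment",
}


def count_violent_incidents(events, type_field="event_type"):
    """Count events whose type is not one of the non-violent types.

    `events` is an iterable of dicts; `type_field` is an assumed column name.
    """
    return sum(
        1
        for event in events
        if event.get(type_field, "").strip().lower() not in NON_VIOLENT_TYPES
    )


# Made-up sample rows, only to show the filter at work.
sample = [
    {"event_type": "Battle-No change of territory"},
    {"event_type": "Headquarters or Base Establishment"},
    {"event_type": "Violence against civilians"},
    {"event_type": "Non-Violent Conflict Event"},
]
print(count_violent_incidents(sample))  # 2
```

Matching is case-insensitive here because the exact capitalisation in the ACLED export may differ from the spelling used in this issue.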

PVX040 has is_number 0, should be 1

Just a minor issue I stumbled upon while looking at recently scraped data. This indicator records ACLED's count of incidents per year, yet is marked as non-numeric.

I've written a little script to check other indicators, and PVX040 is the only one with this mismatch (my script also turns up CG060, which could also be considered numeric except it's a "code" with leading zeros).
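A check along these lines could look like the sketch below. It is not the script mentioned above. It assumes the indicators are available as a mapping from indicator ID to `is_number`, and the values as `(indicator ID, value string)` pairs; the sample data is made up. Unlike the original script, it treats zero-padded values as codes, so an indicator like CG060 is not flagged.

```python
def looks_numeric(text):
    """True if text parses as a float and is not a zero-padded code like '004'."""
    text = text.strip()
    try:
        float(text)
    except ValueError:
        return False
    digits = text.lstrip("+-")
    # A leading zero followed by another digit suggests a code, not a number.
    return not (len(digits) > 1 and digits[0] == "0" and digits[1].isdigit())


def find_mismatches(indicators, values):
    """Return indicator IDs marked non-numeric whose values all look numeric.

    indicators: dict of indicator ID -> is_number (0 or 1)
    values: iterable of (indicator ID, value string) pairs
    """
    all_numeric = {}
    for ind_id, value in values:
        all_numeric[ind_id] = all_numeric.get(ind_id, True) and looks_numeric(value)
    return sorted(
        ind_id
        for ind_id, numeric in all_numeric.items()
        if numeric and indicators.get(ind_id) == 0
    )


# Made-up sample data for illustration.
indicators = {"PVX040": 0, "CG060": 0, "TT010": 0, "NN020": 1}
values = [
    ("PVX040", "12"), ("PVX040", "3"),
    ("CG060", "004"), ("CG060", "012"),
    ("TT010", "some text"),
    ("NN020", "7.5"),
]
print(find_mismatches(indicators, values))  # ['PVX040']
```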

faosec: format change

faosec:
  File "faosec.py", line 50, in <module>
    do_file()
  File "faosec.py", line 35, in do_file
    v12 = mts['V12']
  File "/home/lib/messytables/messytables/core.py", line 157, in __getitem__
    raise KeyError("No RowSet called '%s'" % name)
KeyError: "No RowSet called 'V12'"

The sheet is now called "V7.1", suggesting major format changes.
The title "Prevalence of Undernourishment" is on row 1.
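One way to make this kind of breakage easier to diagnose is to look the sheet up by name and report the similar-looking names when it is missing. The sketch below works on a plain list of sheet names and does not use the messytables API; `pick_sheet` and its parameters are hypothetical.

```python
def pick_sheet(sheet_names, expected="V12", prefix="V"):
    """Return `expected` if present, else raise with the similar-looking names."""
    if expected in sheet_names:
        return expected
    candidates = [name for name in sheet_names if name.startswith(prefix)]
    raise KeyError(
        "No sheet called %r; sheets starting with %r: %s"
        % (expected, prefix, ", ".join(candidates) or "none")
    )


try:
    pick_sheet(["Notes", "V7.1"])
except KeyError as error:
    print(error)
```

The error then names "V7.1" directly, so a renamed sheet is visible from the log alone.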

Zeros seem to be excluded from scraped data

It seems that values reported as "0" are excluded from ScraperWiki data, although they may be present in source data.

I did an analysis of the minimum value of the various numeric indicators, and many indicators have a minimum of "1", which seems suspect.

See the "min value" tab of this spreadsheet:
https://docs.google.com/spreadsheet/ccc?key=0AgxtRla5zLd_dDJpWGwzRldCMGRFaFZXVWl3eXE3NXc&usp=sharing#gid=1

which was built from the data in the CSV link in

https://github.com/OCHA-DAP/ProjectWiki/wiki/ScraperWiki-Download-Links

on 2014-01-18.

Looking over the source data for some of these (e.g. EM-DAT) it does seem that zeros are likely "filled in" for missing data at the source level, but I think that inclusion of those zeros is desired.
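The minimum-value analysis can be sketched as follows, assuming the CSV has already been read into `(indicator ID, value string)` pairs for numeric indicators only. The indicator IDs in the sample are made up.

```python
def minimum_by_indicator(rows):
    """Map each indicator ID to the smallest value seen for it."""
    minimums = {}
    for ind_id, value in rows:
        number = float(value)
        if ind_id not in minimums or number < minimums[ind_id]:
            minimums[ind_id] = number
    return minimums


def suspicious_minimums(minimums):
    """Indicators whose minimum is exactly 1, which may hint at dropped zeros."""
    return sorted(ind_id for ind_id, low in minimums.items() if low == 1)


# Made-up sample rows for illustration.
rows = [("AA001", "1"), ("AA001", "14"), ("BB002", "0"), ("BB002", "3")]
minimums = minimum_by_indicator(rows)
print(suspicious_minimums(minimums))  # ['AA001']
```

A minimum of 1 is only a hint: some indicators may legitimately never be zero, so each flagged one still needs a look at the source data.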

The page you are searching for may have been moved

I have problems trying to reproduce these two series:

Impact of natural disasters: number of deaths
Impact of natural disasters: population affected (average per year/million)

The link seems to have been removed. Any advice?

Examine errors

m49.py
EXIT: 0
acled.py
EXIT: 0
echo.py
EXIT: 0
emdat.py
EXIT: 1
esa.py
EXIT: 0
faosec.py
EXIT: 1
faostat.py
EXIT: 0
hdr-disaster.py
EXIT: 0
hdrstats.py
EXIT: 0
mdg.py
EXIT: 0
unicef.py
EXIT: 0
unterm.py
EXIT: 1
weather.py
EXIT: 0
who-athena.py
EXIT: 2
who-athena2.py
EXIT: 2
wikipedia.py
EXIT: 0
worldbank-lendinggroups.py
EXIT: 0
worldbank.py
EXIT: 0
worldaerodata.py
EXIT: 0
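To pull the failing scrapers out of a log in this shape (a script name followed by an `EXIT:` line), a small parser is enough. This is a sketch that assumes the two-line pattern holds throughout; the embedded log is a shortened excerpt of the listing above.

```python
def failed_scrapers(log_text):
    """Return (script, exit code) pairs for every non-zero exit in the log."""
    failures = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("EXIT:"):
            code = int(line.split(":", 1)[1])
            if code != 0 and current is not None:
                failures.append((current, code))
            current = None
        elif line:
            # Any other non-empty line is taken to be a script name.
            current = line
    return failures


log = """emdat.py
EXIT: 1
esa.py
EXIT: 0
who-athena.py
EXIT: 2
"""
print(failed_scrapers(log))  # [('emdat.py', 1), ('who-athena.py', 2)]
```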

Requirements not pinned

Unpinned requirements may well make scraper runs difficult to reproduce.

For reference, current state of the ScraperWiki box is:

Mako==1.0.0
MarkupSafe==0.23
PyHamcrest==1.8.0
PyYAML==3.10
SQLAlchemy==0.8.3
Tempita==0.5.1
Unidecode==0.04.14
alembic==0.6.5
chardet==2.1.1
-e git+https://github.com/pudo/dataset@9a91f3d1139a022b8c29f7c4215f6500b9e39b75#egg=dataset-master
decorator==3.4.0
json-table-schema==0.1
lxml==3.2.4
-e git+https://github.com/scraperwiki/messytables@d7b24c85a6216603a2b49a28a857397606f68c1e#egg=messytables-master
nose==1.3.0
pbr==0.5.23
python-dateutil==1.5
python-magic==0.4.3
python-slugify==0.0.6
requests==2.0.1
requests-cache==0.4.4
-e git+https://github.com/scraperwiki/scrumble@45cbf773ff7a3710493f63c82212cbba31c65bcd#egg=scrumble-master
sqlalchemy-migrate==0.8.2
wsgiref==0.1.2
xlrd==0.9.2
-e git+https://github.com/scraperwiki/xypath@b73e47b30e55d8683f3d7656b4063c46c33f1501#egg=xypath-master

This includes the dependencies of dependencies.

Note that the messytables commit listed above doesn't seem to exist anymore (neither in the scraperwiki repo nor in the upstream okfn one). Also note that the requirements.txt in the repo has various dependencies just set to pull from GitHub master; it would be better if these were pinned.
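A quick way to spot unpinned entries is to scan the requirements lines. The sketch below assumes a line is pinned if it uses `==`, or, for a git URL, if it carries a full 40-character commit hash after `@`. The unpinned git line in the sample is hypothetical, not copied from the repo's requirements.txt.

```python
import re

# A git requirement counts as pinned when it names a full commit hash.
PINNED_VCS = re.compile(r"@[0-9a-f]{40}(#|$)")


def unpinned(requirement_lines):
    """Return the requirement lines that do not pin an exact version or commit."""
    result = []
    for line in requirement_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-e ") or "git+" in line:
            if not PINNED_VCS.search(line):
                result.append(line)
        elif "==" not in line:
            result.append(line)
    return result


sample = [
    "lxml==3.2.4",
    "requests",
    "-e git+https://github.com/scraperwiki/xypath@b73e47b30e55d8683f3d7656b4063c46c33f1501#egg=xypath-master",
    "-e git+https://github.com/scraperwiki/xypath#egg=xypath",  # hypothetical unpinned line
]
for line in unpinned(sample):
    print(line)
```

This only checks the form of each line. It cannot tell whether a pinned commit still exists upstream, which is the problem noted for messytables.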

PVX040 disappeared

As of 2014-02-18, I see it in the indicators table but not in the values table?

It was there in data I downloaded on 2014-01-28, and at first glance I can't find any obvious commits or github issues that would point to its removal.

I see a few other indicators also disappeared during this time. For example, PVX060 is gone too, but it has disappeared from the indicators table as well, so maybe that was intentional?
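Both checks described here reduce to set differences. The sketch below assumes the indicator IDs from each table (or each download) have already been loaded into Python collections; the sample IDs other than PVX040 and PVX060 are made up.

```python
def indicators_without_values(indicator_ids, value_rows):
    """Indicator IDs present in the indicators table but absent from values."""
    with_values = {ind_id for ind_id, _ in value_rows}
    return sorted(set(indicator_ids) - with_values)


def disappeared(old_ids, new_ids):
    """Indicator IDs present in an older download but missing from a newer one."""
    return sorted(set(old_ids) - set(new_ids))


# Made-up snapshots for illustration.
indicators_new = ["PVX040", "AA001"]
values_new = [("AA001", "5"), ("AA001", "7")]
indicators_old = ["PVX040", "PVX060", "AA001"]

print(indicators_without_values(indicators_new, values_new))  # ['PVX040']
print(disappeared(indicators_old, indicators_new))  # ['PVX060']
```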

Routine EPI vaccines financed by government

Cyprus was omitted from this series. It is included in the one-country data series _% of routine EPI vaccines financed by government. Once the two are merged, _% of routine EPI vaccines financed by government can be removed.
