wireservice / lookup

A repository of journalist's lookup tables.

lookup csv wireservice journalism tables agate python r

lookup's Introduction

wireservice

Name	Badges
agate
agate-charts
agate-dbf
agate-excel
agate-lookup
agate-remote
agate-sql
agate-stats
csvkit
leather
proof

lookup's People

newsroomdev datadesk tonypapousek rotsee aaronwe dannguyen bluengreen mrsweaters radovankavicky gapdata abeusher ws-pittman amccartney isabella232 jeffreyguntzel

lookup's Issues

Correlates of War country codes

http://www.correlatesofwar.org/data-sets/cow-country-codes

Use an existing CSV schema format?

This is pretty awesome, and what I'm suggesting is possibly overkill, but I was wondering if you had considered using one of the CSV schema formats for specifying the fields in the CSV. These seem to be the two biggest ones out there:

I admit this seems like overkill for a CSV of states, but it could be useful if you wanted to validate future changes or additions with an automated test, which would give you CI for your CSV. For instance, Goodtables is a validator that uses the JSON schema format (although it needs some work). CSVLint is another new entrant; I haven't evaluated it, but it also uses the JSON schema format (which seems like the one to consider now).
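To make the idea of an automated check concrete, here is a minimal sketch using only Python's standard `csv` module. It does not use Goodtables, CSVLint, or any real schema format; the `SCHEMA` dictionary, the `validate_csv` helper, and the column names `name` and `usps` are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical schema: column name -> function that returns True for a valid value.
SCHEMA = {
    "name": lambda v: len(v) > 0,
    "usps": lambda v: len(v) == 2 and v.isalpha() and v.isupper(),
}


def validate_csv(text, schema):
    """Return a list of (line_number, message) problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))

    # The header must match the schema's columns exactly, in order.
    if reader.fieldnames != list(schema):
        problems.append((1, f"expected header {list(schema)}, got {reader.fieldnames}"))
        return problems

    for row in reader:
        for column, is_valid in schema.items():
            value = row[column]
            # A short row yields None for its missing columns.
            if value is None or not is_valid(value):
                problems.append((reader.line_num, f"bad {column}: {value!r}"))
    return problems


GOOD = "name,usps\nAlabama,AL\nAlaska,AK\n"
BAD = "name,usps\nAlabama,Ala\n"

print(validate_csv(GOOD, SCHEMA))
print(validate_csv(BAD, SCHEMA))
```

A CI job could run a check like this over every table in the repository and fail the build whenever the returned list is non-empty.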

How would you feel about more state metadata, like Associated Press abbreviations?

We have a bunch of stuff like that over in latimes-statestyle that might fit here.

naics/description.2002.csv

state/usps.csv

Scope of this repository

I find the stated description of this repo "A repository of journalist's lookup tables." quite ambiguous.

What types of open data are the maintainers willing to accept?

Should we have an "overflow" repository for other open data which is beyond the scope of this repository, with a more permissive merging strategy?

state/ap.csv

iso2/country and iso3/country are not proper, unicode names

Sao Tome, for instance, which should be São Tomé.

Should `columns` allow for more columns, and/or more metadata?

The documentation makes it seem as if columns will only allow for a key and value pair. But what if there's a 3-way lookup, e.g. "New York", "NY", "N.Y.", etc.? I'm guessing that's alluded to here:

but is key/colname: datatype enough? Or rather, is the succinctness worth the limitation in expanding the format?

I'm thinking of Census decade-to-decade lookup tables, in which later tracts sometimes combine several past tracts; that complexity would seemingly need to be stated at the columns level of the metadata.

Also, having a "human readable full name" attribute for each column would be nice.
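As a rough illustration of the 3-way lookup question, the sketch below treats a single three-column table as a source for any key-to-value pairing. The column names `name`, `usps`, and `ap`, the inline `TABLE` data, and the `build_lookup` helper are all hypothetical; this is not how the repository's tables or metadata are actually defined.

```python
import csv
import io

# Hypothetical three-column table: full name, USPS code, AP abbreviation.
TABLE = "name,usps,ap\nNew York,NY,N.Y.\nCalifornia,CA,Calif.\n"


def build_lookup(text, key, value):
    """Build a dict mapping one column of the table to another."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row[value] for row in reader}


# Any pair of columns can act as key and value.
usps_to_ap = build_lookup(TABLE, "usps", "ap")
ap_to_name = build_lookup(TABLE, "ap", "name")

print(usps_to_ap["NY"])
print(ap_to_name["Calif."])
```

A many-to-one case such as merged Census tracts would not fit this shape, since one key would need to map to several values, which is part of the tradeoff raised above.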

Anyway, I know these aren't easy questions and that every answer involves tradeoffs... but thanks for taking charge on this!

Should FIPS code columns be a Number data type?

I say this because I often see FIPS codes provided with leading zeros. Forcing everything to integers might be a workaround for that problem.
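A small sketch of the two options may help weigh them. Converting to integers makes differently formatted codes compare equal but drops the leading zero; zero-padding to a fixed width also makes them equal and keeps the conventional form. The `normalize_fips` helper and its `width` parameter are assumptions for illustration (2 digits for state codes, 5 for county codes).

```python
def normalize_fips(code, width=2):
    """Return a FIPS code as a zero-padded string of the given width."""
    return str(int(code)).zfill(width)


# The same state code as it might appear in three different sources.
variants = ["06", "6", 6]

# Option 1: integers. All variants collapse to one value, without the zero.
as_integers = {int(v) for v in variants}

# Option 2: padded strings. All variants collapse to one value, zero intact.
as_strings = {normalize_fips(v) for v in variants}

print(as_integers)
print(as_strings)
print(normalize_fips("1001", width=5))
```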

Dates?

I was just grabbing this month's Canadian house price index data, and of course they decided to encode their dates like this:

Date	Index
Jan-2015	167.110
Feb-2015	167.320
Mar-2015	167.830
Apr-2015	168.090
May-2015	169.750
Jun-2015	172.220
Jul-2015	174.530
Aug-2015	176.590
Sep-2015	177.760
Oct-2015	177.960
Nov-2015	178.350
Dec-2015	178.260
Jan-2016	178.010
Feb-2016	179.200

It'd be nice to be able to automatically re-encode these to ISO 8601. Would this be a good application of lookup? There's bound to be some variation in how the months are abbreviated, so I'm not entirely sure. Also, days of the month might not always be in the dataset…
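One possible shape for this, sketched under assumptions: a small month-abbreviation table plus a parser that emits the reduced-precision ISO 8601 form `YYYY-MM`, since the data carries no day. The `MONTHS` table, the `to_iso_month` function, and the accepted variants (such as "Sept" and a trailing period) are hypothetical and would need extending for other sources.

```python
import re

# Hypothetical lookup table of month abbreviations, including one common variant.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "sept": 9, "oct": 10, "nov": 11, "dec": 12,
}


def to_iso_month(value):
    """Convert a string like 'Jan-2015' to the ISO 8601 form '2015-01'."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\.?-(\d{4})\s*", value)
    if match is None:
        raise ValueError(f"unrecognized date: {value!r}")
    name, year = match.groups()
    month = MONTHS.get(name.lower())
    if month is None:
        raise ValueError(f"unrecognized month: {name!r}")
    return f"{year}-{month:02d}"


print(to_iso_month("Jan-2015"))
print(to_iso_month("Sept-2015"))
```

Using an explicit table rather than `datetime.strptime` with `%b` avoids depending on the process locale and makes the accepted abbreviations easy to audit, at the cost of maintaining the table by hand.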