Comments (26)
Popolo doesn't define a CSV representation yet - there is RDF and JSON so far. On the RDF path, I'm not sure if Linked CSV is ready. On the JSON path, it should be straightforward to re-use JSON fields as CSV headers.
Is there a documented version of the CSV schema? The datapackage.json doesn't describe the difference between `parent` and `parent_key`, for example. Once I understand the schema, I can propose one that uses Popolo terms.
from publicbodies.
@jpmckinney list of fields set out above and some initial suggested changes.
@markbrough your thoughts here re IATI very useful ...
from publicbodies.
I'll review more closely in a bit, but to clarify one point, you can use fields outside of those within Popolo while still being conformant: http://popoloproject.com/specs/#conformance So, if you want to keep `created_at`, that should be fine (I'll actually be adding it to Popolo as it came up in the previous round of feedback).
from publicbodies.
@jpmckinney any thoughts here? I'm aiming to do a rev of this (and possibly finalize it) asap. I guess the big question here is CSV vs JSON (I mean for JSON we'd just take the full Popolo version I think). If CSV, how do we map, and how do we handle things like fields with multiple possible values? Options are:
- Inline into field in a hacky way (e.g. aliases could be ; separated)
- Inline JSON into a field :-/
- Have a separate "table" joined to main table
- ...?
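To make the third option concrete, here is a minimal sketch of a separate alias table joined to the main table by id. The column names (`body_id`, `alias`) and the sample rows are hypothetical and not part of the existing schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: ids, names and column names are illustrative only.
BODIES_CSV = """id,name
xx/example-agency,Example Agency
xx/example-office,Example Office
"""

ALIASES_CSV = """body_id,alias
xx/example-agency,EA
xx/example-agency,Agency of Examples
"""


def load_bodies_with_aliases(bodies_text, aliases_text):
    """Join the alias table onto the main table using the body id."""
    aliases = defaultdict(list)
    for row in csv.DictReader(io.StringIO(aliases_text)):
        aliases[row["body_id"]].append(row["alias"])

    bodies = []
    for row in csv.DictReader(io.StringIO(bodies_text)):
        # Bodies without any alias rows get an empty list.
        row["aliases"] = aliases.get(row["id"], [])
        bodies.append(row)
    return bodies


bodies = load_bodies_with_aliases(BODIES_CSV, ALIASES_CSV)
```

The main table stays a plain one-row-per-body CSV, at the cost of a second file that contributors have to keep in sync.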
from publicbodies.
Sorry for delay, I'll look at this within the next day.
from publicbodies.
The "abbr" column in the CSV would be the "other_names" array in the JSON. Maybe rename "abbr" to "other_name"? Otherwise I think all the other header names conform.
CSV has the big advantage of more people being able to understand, create and use it. Is it anticipated that many fields will be multi-value? Has that come up already? How much detailed info are these lists expected to contain?
If the project is expected to maintain a fairly narrow scope with only essential/primary data, then CSV should be enough. If it's expected to expand to provide detailed info for at least some jurisdictions, then JSON is necessary.
A hybrid approach may allow people to submit CSVs (for those jurisdictions that don't (yet) have detailed info), and a script would be run to convert those CSVs to JSON. Thoughts?
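A rough sketch of what such a conversion script might look like, assuming a flat CSV with an `abbr` column that maps to Popolo's `other_names` array. The sample rows and the function names are made up for illustration; this is not the project's actual script.

```python
import csv
import io
import json

# Hypothetical sample rows; only the "abbr" header comes from the discussion above.
SAMPLE_CSV = """id,name,abbr
xx/example-agency,Example Agency,EA
xx/example-office,Example Office,
"""


def row_to_organization(row):
    """Turn one CSV row into a Popolo-style organization dict."""
    # Copy non-empty columns as-is, except "abbr", which is mapped below.
    org = {key: value for key, value in row.items() if value and key != "abbr"}
    if row.get("abbr"):
        org["other_names"] = [{"name": row["abbr"]}]
    return org


def csv_to_json(text):
    """Convert CSV text to a JSON array of organizations."""
    rows = csv.DictReader(io.StringIO(text))
    return json.dumps([row_to_organization(row) for row in rows], indent=2)


print(csv_to_json(SAMPLE_CSV))
```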
Re: multi-value columns in CSV:
- Within-column separators like ";" or "|" have a small risk of causing parsing issues, and are a real headache to escape if they ever occur within one of the multiple values. Not that bad, on the whole.
- Inline JSON is worse than within-column separators, I think.
- Depending on how important the additional values are, this may be acceptable.
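One way to reduce the escaping headache with within-column separators is to let a CSV parser handle the inner values too. The sketch below uses Python's standard `csv` module with ";" as the inner delimiter; the helper names `join_values` and `split_values` are hypothetical.

```python
import csv
import io


def join_values(values, sep=";"):
    """Pack several values into one cell, quoting any value that contains sep."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=sep).writerow(values)
    # Drop the line terminator that the writer appends.
    return buf.getvalue().rstrip("\r\n")


def split_values(cell, sep=";"):
    """Unpack a cell produced by join_values back into a list of values."""
    if not cell:
        return []
    return next(csv.reader([cell], delimiter=sep))


cell = join_values(["Dept; of Examples", "DoE"])
values = split_values(cell)
```

A plain `cell.split(";")` would break the first value in two, which is the parsing risk mentioned above; the trade-off is that cells with quoted values are less pleasant to edit by hand.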
from publicbodies.
OK, so I think we'll go for plain CSV and see how we do. I've made another tweak to include other_names.
from publicbodies.
I don't know if a new `other_names` field will be used that frequently - I was just suggesting renaming `abbr`, but in retrospect I guess there's utility to picking out the shortest version of a name, e.g. for display on mobiles or other space-constrained places. Why not rename to `abbreviation`, though, since no other field name is abbreviated?
For `source_url`, I think it may be useful to keep. I write scrapers for public bodies, and assign the source to the page on the authoritative source's website that was scraped.
from publicbodies.
@jpmckinney all good suggestions (as usual!) - let's run with both of them. I've updated the change proposal above to reflect these.
from publicbodies.
Added founding_date, dissolution_date and image to the list of fields to add.
@stefanw could you clarify what `contact` is used for versus `address` in the de data - see http://datapipes.okfnlabs.org/csv/head%2010/html?url=https://github.com/okfn/publicbodies/raw/master/data/de.csv
from publicbodies.
FIXED.
from publicbodies.
`contact` is a text field that contains phone/fax numbers, while `address` contains one or more of the physical addresses of the public body.
from publicbodies.
Awesome! Where can I find docs for the schema? Is it `datapackage.json`?
from publicbodies.
@stefanw wouldn't it make sense to split phone numbers into `voice`, `fax`, etc. instead of having an ambiguously named `contact` field?
from publicbodies.
@jpmckinney this distinction comes from the German public body dataset out of FragDenStaat.de. The fields were modeled after the original federal data source which was not structured enough to make an easy distinction between voice/fax. Surely this can be inferred from prefixes ("Tel.", "Fax:" etc.). The contact data was never needed, we were only after emails.
This should in no way dictate the structure of an ideal dataset.
from publicbodies.
@jpmckinney for docs of schema see https://github.com/okfn/publicbodies#data which links to http://data.okfn.org/community/okfn/publicbodies (that is nicer than looking at the datapackage.json)
from publicbodies.
@stefanw so could I drop the contact field in the de dataset in favour of address and email (already in the dataset)?
from publicbodies.
Depends on what you want the publicbodies dataset to contain, I don't mind either. I could also parse out voice/fax if it helps, should be an easy regex.
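As a rough idea of what that regex could look like, here is a minimal sketch that splits a free-text contact string into voice and fax numbers. The prefixes are assumed from the examples mentioned above ("Tel.", "Fax:"), and the sample string and `parse_contact` name are made up; real data will likely need more prefixes and cleanup.

```python
import re

# Assumed prefixes: "Tel", "Telefon" or "Fax", optionally followed by "." and/or ":".
CONTACT_PATTERN = re.compile(
    r"(?P<kind>Tel(?:efon)?|Fax)[.:]*\s*(?P<number>\+?[\d\s()/-]*\d)",
    re.IGNORECASE,
)


def parse_contact(contact):
    """Split a free-text contact field into lists of voice and fax numbers."""
    result = {"voice": [], "fax": []}
    for match in CONTACT_PATTERN.finditer(contact):
        key = "voice" if match.group("kind").lower().startswith("tel") else "fax"
        # Collapse internal whitespace so numbers are stored consistently.
        number = " ".join(match.group("number").split())
        result[key].append(number)
    return result


# Hypothetical sample value, not taken from the real dataset.
parsed = parse_contact("Tel.: +49 30 12345-0, Fax: +49 30 12345-99")
```

Returning lists keeps open the possibility of several numbers per public body.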
from publicbodies.
+1 for specific voice / fax etc. fields, with the possibility to have several per line.
from publicbodies.
+1 for specific voice/fax fields.
`contact`, as suggested by @stefanw, is not appropriate for phone numbers. According to the reference and to Popolo, it is for an address to send letters to.
from publicbodies.
@augusto-herrmann I did not suggest anything, I merely answered the question and explained the existing fields. Popolo supports many types of contact info (postal address, email, phone, fax etc.) under "contact_details".
from publicbodies.
I'm very happy for a new set of fields to go in: @augusto-herrmann could you distill a core set of changes with descriptions of the fields and we'll review. Also very much welcome input from @jpmckinney here so we keep aligned with Popolo on this.
from publicbodies.
I'll be happy to review any proposed changes to the schema, just @-mention me in any new issues.
from publicbodies.
@rgrp, the link http://data.okfn.org/community/okfn/publicbodies (also referenced in the README) has since become broken. Has the schema documentation been moved somewhere else? If so, it would be nice to have a redirect.
from publicbodies.
@augusto-herrmann that's a bug in data.okfn.org which is getting fixed now.
from publicbodies.
@augusto-herrmann ok - the issue was that the data package is actually named public-bodies whilst repo is named publicbodies so redirect was not working correctly. Now fixed.
from publicbodies.