Comments (26)

jpmckinney commented on June 9, 2024

Popolo doesn't define a CSV representation yet - there is RDF and JSON so far. On the RDF path, I'm not sure if Linked CSV is ready. On the JSON path, it should be straightforward to re-use JSON fields as CSV headers.

Is there a documented version of the CSV schema? The datapackage.json doesn't describe the difference between parent and parent_key for example. Once I understand the schema, I can propose one that uses Popolo terms.

from publicbodies.

rufuspollock commented on June 9, 2024

@jpmckinney list of fields set out above and some initial suggested changes.

@markbrough your thoughts here re IATI very useful ...

jpmckinney commented on June 9, 2024

I'll review more closely in a bit, but to clarify one point, you can use fields outside of those within Popolo while still being conformant: http://popoloproject.com/specs/#conformance So, if you want to keep created_at, that should be fine (I'll actually be adding it to Popolo as it came up in the previous round of feedback).

rufuspollock commented on June 9, 2024

@jpmckinney any thoughts here? I'm aiming to do a rev (and possibly finalize) this asap. I guess the big question here is CSV vs JSON (I mean for JSON we'd just take the full Popolo version I think). If CSV, how do we map, and how do we handle things like fields with multiple possible values? Options are:

  • Inline into field in a hacky way (e.g. aliases could be ; separated)
  • Inline JSON into a field :-/
  • Have a separate "table" joined to main table
  • ...?

jpmckinney commented on June 9, 2024

Sorry for delay, I'll look at this within the next day.

jpmckinney commented on June 9, 2024

The "abbr" column in the CSV would be the "other_names" array in the JSON. Maybe rename "abbr" to "other_name"? Otherwise I think all the other header names conform.

CSV has the big advantage of more people being able to understand, create and use it. Is it anticipated that many fields will be multi-value? Has that come up already? How much detailed info are these lists expected to contain?

If the project is expected to maintain a fairly narrow scope with only essential/primary data, then CSV should be enough. If it's expected to expand to provide detailed info for at least some jurisdictions, then JSON is necessary.

A hybrid approach may allow people to submit CSVs (for those jurisdictions that don't (yet) have detailed info), and a script would be run to convert those CSVs to JSON. Thoughts?
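To make the hybrid idea concrete, here is a minimal sketch of such a conversion script. It is not the project's actual tooling: the function name `csv_to_records`, the column names and the sample rows are all made up for illustration, and a real version would need to map columns onto Popolo terms.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Turn CSV text into a list of JSON-ready dicts, dropping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Keep only the cells that actually contain a value.
        record = {key: value for key, value in row.items() if value}
        records.append(record)
    return records


# Hypothetical sample data; not taken from any real publicbodies file.
sample = "id,name,abbr\nxx/example,Example Ministry,EM\nxx/other,Other Agency,\n"
print(json.dumps(csv_to_records(sample), indent=2))
```

Contributors would only ever touch the CSV; the JSON would be a generated artifact.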

Re: multi-value columns in CSV:

  1. Within-column separators like ";" or "|" have a small risk of causing parsing issues, and are a real headache to escape if they ever occur within one of the multiple values. Not that bad, on the whole.
  2. Inline JSON is worse than within-column separators, I think.
  3. A separate table joined to the main table: depending on how important the additional values are, this may be acceptable.
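As a rough illustration of option 1 and its escaping problem, the sketch below splits and joins a multi-value cell using ";" as the separator and a backslash as the escape character. Both choices, and the helper names `split_multi` and `join_multi`, are assumptions for the example, not anything the dataset defines.

```python
def split_multi(cell, sep=";", escape="\\"):
    """Split a multi-value cell on sep, honouring escape-prefixed characters."""
    values, current, escaped = [], [], False
    for ch in cell:
        if escaped:
            # The previous character was the escape: take this one literally.
            current.append(ch)
            escaped = False
        elif ch == escape:
            escaped = True
        elif ch == sep:
            values.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    values.append("".join(current).strip())
    # Drop empty entries, e.g. from a trailing separator.
    return [value for value in values if value]


def join_multi(values, sep=";", escape="\\"):
    """Inverse of split_multi: escape the escape character, then the separator."""
    escaped = [
        value.replace(escape, escape * 2).replace(sep, escape + sep)
        for value in values
    ]
    return sep.join(escaped)


print(split_multi("MoJ;Justice Ministry"))
print(join_multi(["A;B", "C"]))
```

Anyone editing the CSV by hand would have to know the escape rule, which is the headache mentioned in point 1.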

rufuspollock commented on June 9, 2024

OK, so I think we'll go for plain CSV and see how we do. I've made another tweak to include other_names.

jpmckinney commented on June 9, 2024

I don't know if a new other_names field will be used that frequently - I was just suggesting renaming abbr, but in retrospect I guess there's utility to picking out the shortest version of a name, e.g. for display on mobiles or other space-constrained places. Why not rename to abbreviation, though, since no other field name is abbreviated?

For source_url, I think it may be useful to keep. I write scrapers for public bodies, and assign the source to the page on the authoritative source's website that was scraped.

rufuspollock commented on June 9, 2024

@jpmckinney all good suggestions (as usual!) - let's run with both of them. I've updated the change proposal above to reflect these.

rufuspollock commented on June 9, 2024

Added founding_date, dissolution_date and image to the list of fields to add.

@stefanw could you clarify what contact is used for versus address in the de data - see http://datapipes.okfnlabs.org/csv/head%2010/html?url=https://github.com/okfn/publicbodies/raw/master/data/de.csv

rufuspollock commented on June 9, 2024

FIXED.

stefanw commented on June 9, 2024

contact is a text field that contains phone/fax numbers, while address contains one or more of the physical addresses of the public body.

jpmckinney commented on June 9, 2024

Awesome! Where can I find docs for the schema? Is it datapackage.json?

jpmckinney commented on June 9, 2024

@stefanw wouldn't it make sense to split phone numbers into voice, fax, etc. instead of having an ambiguously named contact field?

stefanw commented on June 9, 2024

@jpmckinney this distinction comes from the German public body dataset out of FragDenStaat.de. The fields were modeled after the original federal data source which was not structured enough to make an easy distinction between voice/fax. Surely this can be inferred from prefixes ("Tel.", "Fax:" etc.). The contact data was never needed, we were only after emails.

This should in no way dictate the structure of an ideal dataset.

rufuspollock commented on June 9, 2024

@jpmckinney for docs of schema see https://github.com/okfn/publicbodies#data which links to http://data.okfn.org/community/okfn/publicbodies (that is nicer than looking at the datapackage.json)

rufuspollock commented on June 9, 2024

@stefanw so could I drop the contact field in the de dataset in favour of address and email (already in the dataset)?

stefanw commented on June 9, 2024

Depends on what you want the publicbodies dataset to contain; I don't mind either way. I could also parse out voice/fax if it helps, should be an easy regex.
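A rough sketch of what that regex could look like, assuming the contact text uses "Tel" and "Fax" prefixes as described earlier in the thread. The pattern, the `parse_contact` helper and the sample string are illustrative guesses, not tested against the real de.csv data, and other labels (e.g. "Telefon") would need extra handling.

```python
import re

# Map the matched prefix to a hypothetical output field name.
LABELS = {"tel": "voice", "fax": "fax"}

# A "Tel" or "Fax" prefix, optional "." / ":" punctuation, then a number made
# of digits, spaces, brackets, slashes and hyphens, ending on a digit.
PATTERN = re.compile(r"\b(Tel|Fax)\s*[.:]*\s*([+\d][\d\s()/-]*\d)", re.IGNORECASE)


def parse_contact(contact):
    """Split a free-text contact field into lists of voice and fax numbers."""
    result = {"voice": [], "fax": []}
    for label, number in PATTERN.findall(contact):
        # Collapse runs of whitespace inside the number.
        result[LABELS[label.lower()]].append(" ".join(number.split()))
    return result


# Made-up sample value, not a real public body's contact details.
print(parse_contact("Tel.: 030 18 555-0, Fax: 030 18 555-4400"))
```

Returning lists rather than single values leaves room for several numbers of each kind per body.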

hannesgassert commented on June 9, 2024

+1 for specific voice / fax etc. fields, with the possibility to have several per line.

augusto-herrmann commented on June 9, 2024

+1 for specific voice/fax fields.
contact, as suggested by @stefanw, is not appropriate for phone numbers. According to the reference and to Popolo, it is for an address to send letters to.

stefanw commented on June 9, 2024

@augusto-herrmann I did not suggest anything, I merely answered the question and explained the existing fields. Popolo supports many types of contact info (postal address, email, phone, fax etc.) under "contact_details".

rufuspollock commented on June 9, 2024

I'm very happy for a new set of fields to go in: @augusto-herrmann could you distill a core set of changes with a description of the fields and we'll review. Also very much welcome input from @jpmckinney here so we keep aligned with Popolo on this.

jpmckinney commented on June 9, 2024

I'll be happy to review any proposed changes to the schema, just @-mention me in any new issues.

augusto-herrmann commented on June 9, 2024

@rgrp, the link http://data.okfn.org/community/okfn/publicbodies (also referenced in the README) has since become broken. Has the schema documentation been moved somewhere else? If so, it would be nice to have a redirect.

rufuspollock commented on June 9, 2024

@augusto-herrmann that's a bug in data.okfn.org which is getting fixed now.

rufuspollock commented on June 9, 2024

@augusto-herrmann ok - the issue was that the data package is actually named public-bodies whilst the repo is named publicbodies, so the redirect was not working correctly. Now fixed.
