docs.opencivicdata.org's Introduction

Open Civic Data Technical Documentation

This repository contains documentation for developers including:

  • Writing Scrapers using Pupa
  • Open Civic Data's Data Type Specifications
  • Open Civic Data Proposals

Read these docs at https://open-civic-data.readthedocs.io/en/latest/

docs.opencivicdata.org's People

Contributors

aepton, boblannon, dependabot[bot], djbridges, dsiddy, fgregg, gordonje, hancush, jamesturk, jpmckinney, kaitlin, konklone, mbacchi, mileswwatkins, patcon, paultag, ppival, rshorey, showerst, twneale, waldoj

docs.opencivicdata.org's Issues

is elections' Party entity necessary?

in https://github.com/opencivicdata/docs.opencivicdata.org/blob/master/proposals/drafts/elections.rst

a new Party type is created that is effectively a subclass of Organization that adds:

  • color
  • is_write_in
  • abbreviation

I'm not convinced on any of these fields being appropriate for inclusion and want to discuss their purpose & usage.

color
Parties (in the US at least) don't have official colors, red & blue have only been in use for the past. This seems like a property that should exist in applications using colors to represent parties, not in the core metadata.

Looking at Wikipedia, it does seem that some parties in other countries have official colors, but it is often more than one.

is_write_in
This doesn't feel like a permanent feature of a party either, as it would vary a lot from election to election. Ballot access is complicated: in general, no party is guaranteed ballot access in a particular race, and access usually depends on the party's showing in the last election. A good example here would be the US Green Party, which currently has ballot access in 19 states, but that changes year to year.

abbreviation
This could be a useful addition, but I also wonder if it is redundant w/ alternate names.

It might make more sense to have a recommendation that organizations have an alternate name added with a particular note set.

Open to being convinced on any of these, but wanted to start the discussion before we had an implementation.

Handling unitemized totals and other summary amounts

Something @palewire and I were just discussing. This is something we touched on in the previous conversation, but I don't think we fully resolved.

In CA, filers don't have to itemize contributions under $100. Instead, they can report a total for all unitemized contributions. It isn't clear where this would be recorded in our current schema, and it's pretty essential for figuring out the total amount raised.

We think we need a place to store this and other total or summary amounts. What we have in mind is an optional, repeating .totals or .summaries field on Filing that would have the following properties:

  • note: Description of the total (e.g., "Unitemized contributions" or "Total expenditures")
  • amount: Decimal amount of the total.
  • currency: Currency denomination of amount.

Would be interested to hear from others whether this would adequately cover other kinds of filings in other jurisdictions. Maybe in some cases we might also need to link the totals to specific elections?
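A minimal sketch of the proposed field, assuming plain Python dataclasses rather than the project's actual models. The class names `FilingTotal` and `Filing`, the attribute name `totals`, and the dollar amounts are illustrative only; the three properties come from the list above.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class FilingTotal:
    """One summary amount reported on a filing (hypothetical class name)."""
    note: str        # description, e.g. "Unitemized contributions"
    amount: Decimal  # decimal amount of the total
    currency: str    # currency denomination of amount, e.g. "USD"


@dataclass
class Filing:
    """Stand-in for the Filing type; only the proposed field is shown."""
    totals: List[FilingTotal] = field(default_factory=list)  # optional, repeating


# Made-up amounts, for illustration only.
filing = Filing(totals=[
    FilingTotal("Unitemized contributions", Decimal("1250.00"), "USD"),
    FilingTotal("Total expenditures", Decimal("980.50"), "USD"),
])

# Look up a total by its note.
unitemized = next(t for t in filing.totals if t.note == "Unitemized contributions")
```

Because the field repeats, a filing with no summary amounts simply carries an empty list.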

Jurisdiction description misworded

"A governing body that exists within a division. While ‘Florida’ would be a Jurisdiction, the Florida State Legislature would be a jurisdiction."

I assume the second "jurisdiction" is supposed to be "organization"?

Campaign Finance Filing: Review attribute updates

It's not clear that CommitteeStatusUpdate can't be rolled into CommitteeAttributeUpdate - just add description to CommitteeAttributeUpdate.

Also, the class name already establishes the semantics, so I'd change attribute_to_update to property and new_attribute_value to value.

Instead of many-to-many relationship between Filings and Elections, maybe a many-to-one relationship between Transactions and Elections?

In CA, I believe all transactions in a filing apply to the same election, but seems like this isn't the case in other jurisdictions. For example, in our earlier convo, @LindsayYoung raised the point that in the FEC schema one filing could apply to multiple elections, especially during primary season.

However, Lindsay also suggested that "election is generally more useful on the transaction level". If so, modeling that transaction-to-election relationship feels more straightforward and precise.

Would it be an improvement, then, if we:

  1. Remove the repeating .election field from Filing
  2. Add a non-repeating .election field to Transaction (and maybe CommitteeAttributeUpdate too)?
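The two changes above can be sketched as follows, assuming simple Python dataclasses in place of the real models. The `elections()` helper and the election identifiers are hypothetical; the point is only that a filing's elections become derivable from its transactions.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Transaction:
    amount: Decimal
    election: Optional[str] = None  # non-repeating; None when no election applies


@dataclass
class Filing:
    # No .election field here: it moves down to Transaction.
    transactions: List[Transaction] = field(default_factory=list)

    def elections(self) -> List[str]:
        """Derive the distinct elections covered by this filing."""
        return sorted({t.election for t in self.transactions if t.election})


# Hypothetical identifiers for a filing that spans two elections.
filing = Filing(transactions=[
    Transaction(Decimal("500.00"), election="primary-2016"),
    Transaction(Decimal("250.00"), election="general-2016"),
    Transaction(Decimal("75.00")),  # not tied to an election
])
```

Under this layout the FEC-style case (one filing, several elections) needs no many-to-many link between Filing and Election.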

Add continuous integration docs

Once you have a working scraper, you need a way to run it regularly and monitor its status. Could you add some docs on your Jenkins setup (or whatever else)?

Server returning 500 error as text

The /people endpoint is returning a 500 error this morning, but it appears the HTTP status code is 200, while the text says "Server Error (500)".

All errors should be returned as an HTTP status code so clients will act correctly.
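To illustrate why this matters to clients, here is a small sketch in plain Python. The function names are hypothetical, and the body check is a client-side workaround rather than a fix; the response values are the ones described in this report.

```python
ERROR_MARKER = "Server Error (500)"  # text quoted in the report above


def naive_ok(status_code: int) -> bool:
    """What a typical client does: trust the status code alone."""
    return 200 <= status_code < 300


def masked_error(status_code: int, body: str) -> bool:
    """Detect an error page delivered with a success status (a workaround)."""
    return naive_ok(status_code) and ERROR_MARKER in body


# The response described above: status 200, error text in the body.
status, body = 200, "Server Error (500)"
```

A client that relies on `naive_ok` alone would treat this response as valid data, which is the behaviour the report asks the server to prevent.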

Multiple links between `Committee` and `Jurisdiction`

I'm working on filling out @aepton's implementation of the Campaign Finance enhancement proposal, and I've got a few questions regarding the spec.

Currently, a Committee can be linked to a Jurisdiction in at least two ways:

  1. A Committee references a CommitteeType which references a Jurisdiction.
  2. By subclassing Organization, Committee also inherits an optional reference to a Jurisdiction.

Feels redundant. Looking back at our earlier convo, seems like Committee Type was defined as its own data type because we believe the available types and their regulatory meanings will vary across jurisdictions.

I don't doubt this, but I wonder how much bearing it should have on the specification. Why does it matter if "candidate" type committees in WA file at a different frequency, disclose different transactions or otherwise have different rules from "candidate" type filing committees in IL? What's wrong with allowing the filings, transactions, etc. associated with these committees to differ depending on the committee's jurisdiction, even while they're all grouped under the same label?

This of course wouldn't rule out jurisdiction-specific ETL code for integrating the committees and filings into the jurisdictionally agnostic models. I'm just saying the rules surrounding committee types might not need to be reflected in the schema.

For now, I've left the .jurisdiction field off of the Committee model. But my proposal would be:

  1. Not include Committee Type as its own data type and, instead, have an open text .type or .classification field on Committee
  2. Just use the .jurisdiction field Committee inherits from Organization.

Event: Fix OCDEP

  • Missing classification and order on Agenda Items
  • Missing date, text, links on Documents
  • name was renamed to note on Media
  • Nothing implements note on Location, remove it?
  • Nothing implements type on Media, remove it?
  • Confirm field names on Participants and Related Entities

modified flags

Sometimes the source text for motions and bills is incomprehensible. It's been Open States practice to rewrite it for human readability.

As a data user, I want to know the provenance of the information. If motion or bill title has been rewritten for clarity, I want to know that, and I'd also like to know the original text.

@showerst proposed adding an attribute like this to objects with modified texts (in his example a motion). { "modified" : {"motion": "cp/h lwr"}}

@jamesturk suggested that we use the existing extras field for this, since we may not be ready to standardize on this practice.
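A rough sketch of how the extras-based variant might look, assuming objects are plain Python dicts. The helper name `rewrite_with_provenance` and the nesting of a "modified" key inside extras are assumptions that combine the two suggestions above, not an agreed convention; the rewritten text is a placeholder.

```python
def rewrite_with_provenance(obj: dict, field: str, new_text: str) -> dict:
    """Return a copy of obj with obj[field] rewritten and the source text kept in extras."""
    updated = dict(obj)
    extras = dict(updated.get("extras", {}))
    modified = dict(extras.get("modified", {}))
    # Keep the earliest original if the field is rewritten more than once.
    modified.setdefault(field, obj[field])
    extras["modified"] = modified
    updated["extras"] = extras
    updated[field] = new_text
    return updated


source = {"motion": "cp/h lwr"}  # original text from the example above
cleaned = rewrite_with_provenance(source, "motion", "Human-readable rewrite goes here")
```

A data user can then tell that a field was rewritten by checking whether it appears under `extras["modified"]`, and recover the original text from the same place.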

I'm curious to hear @jpmckinney's thoughts, as this would affect objects that are part of Popolo (which we strive to maintain compatibility with).

Scope attribute for events

Right now, the Event model in opencivicdata-django requires a jurisdiction attribute.

This is so that all events related to a particular legislature can be easily grouped together.

However, not all the things that we want to model within the wider OCD world have jurisdictions, e.g., Election days.

I would like to propose that OCD Events have a scope attribute that can be a jurisdiction_id, division_id or None.

This would allow for current pupa practice to be largely unchanged, but also allow for events that are not associated with jurisdictions.
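As a sketch of what consumers of such an attribute might do, the function below tells the three cases apart by ID prefix. It assumes the usual `ocd-jurisdiction/` and `ocd-division/` prefixes; the function name `scope_kind` and the example IDs are illustrative.

```python
from typing import Optional


def scope_kind(scope: Optional[str]) -> str:
    """Classify an event scope as 'jurisdiction', 'division' or 'none'."""
    if scope is None:
        return "none"
    if scope.startswith("ocd-jurisdiction/"):
        return "jurisdiction"
    if scope.startswith("ocd-division/"):
        return "division"
    raise ValueError(f"unrecognized scope: {scope!r}")


# A council meeting scoped to a jurisdiction, an election day scoped to a division.
meeting_scope = "ocd-jurisdiction/country:us/state:il/place:chicago/government"
election_day_scope = "ocd-division/country:us/state:il"
```

Existing pupa scrapers would keep passing a jurisdiction ID, so only the new division and None cases need extra handling.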

I don't love the name scope.

Thoughts? @jamesturk @gordonje @jpmckinney

0006: Bill feedback

Consistency:

  • Why not use organization to match other schema, instead of from_organization?
  • Would it be consistent with other parts of the schema to rename entity_type to _type?

Choice of terms:

  • mimetype is old-fashioned. This has typically been called content_type for some time.
  • versions.links and documents.links: These refer to different forms of the version/document. DCAT uses the term distributions. Whatever term you decide on, DCAT's definition of Distribution is very clear and can maybe be reused. Strictly speaking, links don't have a content type, but distributions do.
  • versions.name and documents.name: Based on the examples, these are not really the names/titles of the documents - maybe note is closer to the intended meaning?
  • For summaries and other_titles, it's not clear why the property name is text. I would either expect value (as in ContactDetail, Count, and most future Popolo subdocuments) or the singular form (as in Identifier, OtherName) - in this case summary or title.

Here are some suggested terms from the Dublin Core Metadata Terms, which Popolo is likely to adopt in a generic Document class, since they are the most broadly used metadata terms:

  • identifier instead of name, since HB 2117 is better described as an identifier than as a name. The docs already acknowledge that name is easily confused with title.
  • abstract or abstracts instead of summaries

Questions:

  • Isn't primary a classification?
  • "The type of the version" and "The type of the document": Could you provide example values?
  • Is actions.text a description of the action ("Referred to committee"), or the actual text of the action, which may be identical to the text of a motion ("That Bill HB-1 be referred to the Committee on House Administration"), or both? text suggests that the action text is taken from official proceedings, but the definition of the term suggests it's a description of the action, not its official text. Depending on the most common case, it may be clearer to name it description.

Person level participation data in events

In Chicago we have individual-level participation in events (which aldermen went to which council meeting or committee meeting). @paultag asked me to open up an issue for extending the Events OCDEP for this data.

Can `Transaction.date` be optional?

In California, campaign expenditures are reported on Schedule E of Form 460 which does not provide an obvious place for filers to report the date of a payment made. Thus, about 40% of Form 460 Schedule E Items are missing an expense date.

In the current version of the draft spec Transaction.date isn't labeled as optional, but maybe it should be?

In the draft implementation in python-opencivicdata, I'm going to allow NULLs in this field, for now.

This sort of overlaps with #98 in terms of how these decisions facilitate longitudinal analysis of transaction data.

meta-proposal: doc refactor

Right now this repo has:

  • somewhat halfhearted docs on how cities can adopt OCD as a format
  • docs on how to write a scraper with pupa
  • somewhat out-of-date descriptions of the various OCD types
  • OCDEPs
  • a Python style guide

I think in reconsidering the purpose of this repository it should have:

  • an intro page explaining what Open Civic Data is, who is involved, etc.
  • the canonical docs on each OCD type (updated as amended/created by OCDEPs)
  • OCDEPs

The "how cities can adopt OCD" stuff is stale and not worth keeping IMO. If the need arises I imagine we'd take a different approach now than what Sunlight was pursuing when those were started.

And the pupa docs should probably be moved to the pupa repository (& linked from the intro page)

If others are OK with this I'd like to start on this soon so that we can have Open States docs reference good OCD docs where appropriate

Allow people to be responsible for bill actions

Right now, all bill actions have an organization attribute

organization, organization_id
    The organization that this action took place within.

This seems not quite right for the actions of 'signing' or 'vetoing' done by the executive.

Here are the two ways I've approached this. Neither seems quite right to me.

            # action_date is assumed to be set earlier by the scraper
            bill_action = {'description': 'Veto',
                           'date': action_date,
                           'organization': 'Office of the Mayor',
                           'classification': 'executive-veto'}

I don't really like this approach because the mayor is not the one vetoing legislation because he holds a position in the office of the mayor. He can do it because he is the mayor of the "City of Chicago".

So this is my current approach.

            # action_date is assumed to be set earlier by the scraper
            bill_action = {'description': 'Veto',
                           'date': action_date,
                           'organization': 'City of Chicago',
                           'classification': 'executive-veto',
                           'related_entities': [{'name': 'Rahm Emanuel',
                                                 'entity_type': 'person'}]}

I like this better, but "City of Chicago" also doesn't quite seem like the right container.

I think I would like the following to be legal. Note the absence of organization:

            # action_date is assumed to be set earlier by the scraper
            bill_action = {'description': 'Veto',
                           'date': action_date,
                           'person': 'Rahm Emanuel',
                           'classification': 'executive-veto'}

Thoughts?

0005: Area versus Division

@jpmckinney wrote:

The "Areas vs. Jurisdictions and Divisions" section conflates two questions: "Jurisdiction versus Division" and "Why we call it Division and not Area". An Area in Popolo is the same thing as a Division. There is nothing in Popolo for a Jurisdiction, and there will probably never be †. If you find it necessary to describe the difference between Jurisdictions and Divisions, that's probably best done in 0003 where Jurisdictions are introduced.

So, for the remaining question of "Why we call it Division and not Area". The justification given is "Area does (in essence) equate to Division however, and the different terminology a remnant of decisions made prior to Area being introduced in Popolo." My reasons for "area":

  1. Popolo uses the term "area" because no one calls these things "divisions".
  2. The terms division and division_id are fairly recent in OCD and not broadly deployed. It would not be expensive to switch to area and area_id.
  3. I don’t think it is overly confusing to have the property area_id have an OCD Division ID as a value; OCD-ID is an identifier scheme. As long as we treat it as such, and not talk about "OCD Divisions" as things that exist separately from their identifiers, then confusion will be avoided.
  4. There is no such thing as 95% conformance to Popolo. A spec either conforms or it does not. Using division_id makes OCD non-conformant. I don't think the argument for division_id is a persuasive reason for breaking conformance.

If area_id is adopted, this should be used in 0003 as well.

†: FYI, the reason jurisdictions will likely never be part of Popolo is that I don't think a jurisdiction actually exists distinctly from its top-level organization. I understand that, for OCD, having jurisdictions makes organizing data and APIs and writing code easier. However, from a pure modeling perspective, there's no real thing as a "jurisdiction".


@jamesturk wrote:

We can clarify this better in that paragraph, but I disagree about Area & Division being the same thing, and this is unlikely to change.

In response to specific points:
1 & 2. We actually do use the term division and have since day one, the use is integrated into other APIs that use Open Civic Data division identifiers too.
3. I'd disagree as to whether or not it is confusing, but more importantly OCD Divisions do exist and have properties (ones that do not match the Area schema Popolo added).
4. We have to be pragmatic and value backwards compatibility and practicality more than conformance, I think 100% adherence is unlikely especially given some fundamental differences in how Votes will be handled. If you prefer we stop using the term Popolo we can, or just give a nod to the fact that we were inspired by Popolo, but we aren't going to be adding/changing things like this for the sake of compliance.


@jpmckinney wrote:

It sounds like I need more information than I currently have, in order to evaluate the best way forward.

  1. By "no one" I meant "no one outside of people adopting OCD"...
  2. Can you point to the APIs that use OCD-IDs and specifically their use of the terms division and/or division_id?
  3. Can you describe the properties that create a mismatch between Division and Area?
  4. To my knowledge, there are no Posts using division_id (except maybe as of a week ago?) and I have yet to see a Jurisdiction using division_id (outside of my own code). Those are the two places where the division_id property is used, according to the OCDEPs. So, what backwards compatibility is being broken if those are renamed to area_id?
  5. I need you to share your feedback on votes, because that way I actually have an opportunity to integrate your feedback and make changes to Popolo so that everyone has the option to conform. If you don't tell me anything, then how can I possibly get us to a place where everyone interoperates? Popolo is not set in stone; you can make changes. But you need to share information for us to get there.

@jamesturk wrote:

2- Google uses the term division:
https://developers.google.com/civic-information/docs/v1/divisions
As does OpenElections: https://github.com/openelections/specs/wiki/Elections-Data-Spec-Version-2
The endpoint for divisions has been http://api.opencivicdata.org/divisions/ since it was published.

3- Since the properties are all optional, I suppose this part is manageable, but there's a mismatch in thinking between Divisions & Areas and how they relate to boundaries. Divisions do not have a boundary but instead have a relationship to a boundary with start & end times. This seems like it'd be another noncompliance point since we'd have a field with (arguably) the same purpose but a different name & structure.

4- These two cases are recent enough revisions and were they the only place I'd be OK with it. I'm more concerned about us being a moving target for others, we've used the term division in our endpoints and IDs for over a year, changing things on them now just isn't practical.

5- Point well taken, we're behind on getting you that feedback but I've just asked Paul to chime in with that today.


@jamesturk wrote:

there is also at least one vendor API that is using the term division_id internally, that's less of an issue (esp. as they haven't published it yet) but worth noting

anthropod docs:

write a general introduction to contributing as a non-developer - pointing people at anthropod, etc.

maybe a good time to come up with a non-anthropod name for the deployment

Update to docs?

The docs currently have a note on them:

 Parts of Open Civic Data underwent a large refactor as of mid-2014, some information on this page may be out of date. 
 We’re working on updating this documentation as soon as possible.

Has the documentation been updated yet? If not, any thoughts on when it might be?

VoteEvent: Add `requirement` attribute

Different kinds of votes in different legislatures require different percentages of support to pass. Seems like important information to store about a vote - more important than shoving it into extra attributes. The @opencongress congress scrapers include this attribute in votes as you can see in the following example.

{
  "requires": "1/2", 
  "result": "Failed", 
  "result_text": "Failed"
}

The result_text may also be relevant, since we're storing passing as a boolean value instead of the actual text specific to the vote type. It may make more sense, though, to push this to extra attributes.
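For illustration, a requirement stored as a fraction string like the "requires" value above can be evaluated directly. This is a sketch only: the function name and the `inclusive` flag are assumptions, and whether a threshold is "more than" or "at least" varies by legislature, so the caller has to say which applies.

```python
from fractions import Fraction


def meets_requirement(yes: int, no: int, requires: str, inclusive: bool = False) -> bool:
    """Check whether the yes share of yes+no votes clears the required fraction.

    inclusive=False means strictly more than the fraction (assumed default);
    inclusive=True means at least the fraction.
    """
    threshold = Fraction(requires)  # Fraction parses strings such as "1/2"
    total = yes + no
    if total == 0:
        return False
    share = Fraction(yes, total)
    return share >= threshold if inclusive else share > threshold
```

For example, `meets_requirement(50, 50, "1/2")` is false under the strict reading, which is information a bare passed boolean cannot reconstruct on its own.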

via @crdunwel opencivicdata/python-opencivicdata#38

Improve naming of Bill#from_organization field

(Creating this as a placeholder for revisiting during a major revision.)

Related Slack conversation: https://opencivicdata.slack.com/archives/pupa/p1454452385000025

On my reading, from_organization struck me as the organization (in my case, a committee) from which the Bill originated. The actual intention is that it represents the parent legislature organization. My thought is that this should probably be reflected in a better name, perhaps from_legislature.

cc: @fgregg

Campaign Finance Filing: "person" <> contributor

I don't think I get the logic of making contributors "persons"--is this an optional designation?

@aepton, @jpmckinney : Consider the case of a committee giving to another committee. Wouldn't that mean the committee is a committee (and hence a subtype of a popolo org) when it is receiving money and other times the committee is a person (and then a subtype of a popolo person) when it is donating? To my nose that doesn't smell right and makes tracking the flow of money harder, not easier.

Moreover, differentiating contributor types is often the point of this sort of work, even if there aren't easy answers available in the source data. Being able to say that XX percent of funds came from corporate donors is pretty powerful... I don't really understand the rules here, but I'd make donor type its own field, where person and organization are options, but only assigned if there's solid reason for thinking this (in many jurisdictions this info can be gleaned, at least in part, though I'm sure that's not true everywhere). And, of course, detailed local knowledge may be the only way to know for sure...

Policy for code lists

To make code list changes easier to review independently, I propose the following policy.

Other issues:

  • Define the existing classifications.
    • Do we put the definition as a comment above the classification?
    • This will make it easier to determine whether a new classification would overlap with an existing one.

For all lists:

  • Format:
    • Classifications are in American English.
    • Classification keys are lowercase.
    • Classification values are titlecase.
      • Note: Some BILL_ACTION_CLASSIFICATION_CHOICES are not titlecase.
  • Criteria for inclusion
    • A new classification must not be added if an existing classification would suffice.
    • There must be no words indicating the jurisdiction.
    • The classification must communicate a fact in the real world.
      • It must not be used to indicate some software state (e.g. "pending validation"), or some epistemological status (e.g. "unknown", "waiting for information").

For ORGANIZATION_CLASSIFICATION_CHOICES and BILL_CLASSIFICATION_CHOICES:

  • Criteria for inclusion:
    • An organization classification must describe an executive, legislative or political organization.

For BILL_ACTION_CLASSIFICATION_CHOICES:

  • Format:
    • Classification keys are hyphenated (no spaces).
    • The hyphenated parts should be ordered from general to specific, e.g. committee-passage-unfavorable.
  • Criteria for inclusion:
    • A classification should only be prefixed by an organization classification if that additional information identifies a specific step in a predictable process.
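The mechanical parts of the format rules above could be checked automatically. The sketch below covers only classification keys (lowercase, and hyphen-separated for bill action lists); the function name is hypothetical, and values are left out because title case is harder to define mechanically. Digits are allowed in keys as an assumption.

```python
import re

# Letters or digits in hyphen-separated parts, no spaces (case checked separately).
HYPHENATED = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def key_problems(key: str, hyphenated: bool = False) -> list:
    """Return a list of format problems for a classification key."""
    problems = []
    if key != key.lower():
        problems.append("not lowercase")
    if hyphenated and not HYPHENATED.fullmatch(key):
        problems.append("not hyphen-separated")
    return problems


# Example from the policy above: ordered from general to specific.
ok = key_problems("committee-passage-unfavorable", hyphenated=True)
bad = key_problems("Committee Passage", hyphenated=True)
```

The inclusion criteria (no jurisdiction words, real-world facts only, no overlap with existing classifications) still need human review.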

Do Filing coverage start and end need to allow precise times?

On the Filing data type in the campaign finance enhancement proposal, coverage_start_date and coverage_end_date are defined as date and (possibly) time values.

None of the filings we'll be loading from California will have time data associated with these values. Obviously, that's just one state, but do we have a sense of how many jurisdictions will have precise start and end times for each filing's coverage? And even if these precise times are often available, how important is this precision beyond the start and end date?

Maybe I'm getting too far into the implementation details in this space, but I want to be clear what I'm doing. For now, I'll leave these as DateTime fields and plan to set the time portion to midnight UTC when unknown.
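The interim approach described above (midnight UTC when the time is unknown) can be sketched with the standard library. The function name `coverage_datetime` is illustrative and not part of python-opencivicdata.

```python
from datetime import date, datetime, time, timezone


def coverage_datetime(d: date) -> datetime:
    """Turn a date-only coverage value into a DateTime at midnight UTC."""
    return datetime.combine(d, time.min, tzinfo=timezone.utc)


# Example: a coverage start date with no time component in the source.
start = coverage_datetime(date(2017, 1, 1))
```

One consequence of this choice is that a stored midnight value cannot be told apart from a source that really reported midnight, which is part of the case for a "fuzzy" field.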

If the precision is important, then we might consider, someday, converting these to "fuzzy" DateTime fields that don't require time parts. OCD is already doing something like this for date fields where parts of the date are missing. But it would be better if we converted these to a custom model field with field lookups allowing users to query these fields as if they were regular Date or DateTime fields.

Add database setup instructions

The docs should include or link to instructions for running the database locally--something like

  • Install postgres
  • Install postgis
  • Create pupa database user
  • Create opencivicdata db
  • Run dbinit

Adding docker instructions would be neat too!

Modeling negated transactions

California campaign finance committees are required to itemize returned contributions on the same schedule that includes received contributions. The real world situation would be something like:

  • In May, John Doe gives gubernatorial candidate Gavin Newsom $1,000
  • In October, after learning that John Doe is a total scumbag, Gavin is embarrassed into returning the $1,000

In our source data, we have a line item for the original contribution and another with a negative amount for the returned contribution:

filer        contributor  amount    date       type
GAVIN 4 GOV  JOHN DOE     1000.00   5/1/2017   Contribution
GAVIN 4 GOV  JOHN DOE     -1000.00  10/1/2017  Returned

In mapping these records to the Transaction model, my initial thought was to flip the sender and receiver and take the absolute value of the amount. So the above source records would become:

sender       recipient    amount    date       classification
JOHN DOE     GAVIN 4 GOV  1000.00   5/1/2017   Contribution
GAVIN 4 GOV  JOHN DOE     1000.00   10/1/2017  Returned

However, @palewire and I discussed further and decided against this approach. We're worried about the potential for inaccuracies when summing the amount field. Instead, we're planning to leave the source values more or less unchanged in loading the Transaction model:

sender       recipient    amount    date       classification
GAVIN 4 GOV  JOHN DOE     1000.00   5/1/2017   Contribution
GAVIN 4 GOV  JOHN DOE     -1000.00  10/1/2017  Returned

Have others seen similar use cases in other jurisdictions and, if so, does our approach for fitting it into our shared models make sense?

If we agree this is proper use, then we might expand the description on Transaction.amount to say that negative numbers are allowed and why.
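The summing concern can be shown with the two example records. This is a sketch with plain tuples rather than the Transaction model: keeping the signed amounts lets a simple sum give the net amount, while taking absolute values double-counts the returned contribution.

```python
from decimal import Decimal

# (amount, classification), taken from the source rows above.
rows = [
    (Decimal("1000.00"), "Contribution"),
    (Decimal("-1000.00"), "Returned"),
]

# Signed amounts, as in the chosen approach: the sum is the net amount raised.
signed_total = sum((amount for amount, _ in rows), Decimal("0"))

# Absolute amounts, as in the rejected approach: a naive sum overstates it.
absolute_total = sum((abs(amount) for amount, _ in rows), Decimal("0"))
```

With absolute values, anyone summing the amount field would also need to know the direction of each transaction to reach the correct net figure.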

Votes "requires" attribute

Not sure where suggestions for data format modifications should be raised, so raising it in this project. Also, the spec in the docs may be outdated, so apologies if these things were later included.

Different kinds of votes in different legislatures require different percentages of support to pass. Seems like important information to store about a vote. The @opencongress congress scrapers include this attribute in votes as you can see in the following example.

{
  "requires": "1/2", 
  "result": "Failed", 
  "result_text": "Failed"
}

The result_text may also be relevant, since we're storing passing as a boolean value instead of the actual text specific to the vote type. For instance, "Nomination Confirmed" for federal votes in OCD would be reduced to a boolean value on passed, so we'd lose how the legislature labels the passing of the vote.

Arguments for / against including these attributes?
