quorum-legislative-data's Introduction

Quorum Legislative Data test


Requirements

  • Python 3

Datasets

  • /datasets folder

Problem statement

We’re analyzing publicly available government data. We need to provide our clients with the ability to visualize all of the bills that legislators voted for or against.

Goals

Given the datasets, we must answer the following questions:

  1. For every legislator in the dataset, how many bills did the legislator support (voted for the bill)? How many bills did the legislator oppose?
  2. For every bill in the dataset, how many legislators supported the bill? How many legislators opposed the bill? Who was the primary sponsor of the bill?

Non-goals

  • Spending more than 2 to 3 hours on this project
  • Building a complicated solution
  • Adding complex components such as a database

Proposed solution

Using Pandas as our data analysis tool, we join the datasets and then count votes, grouped by bill and by legislator.
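As a rough illustration of the join-and-count idea for the legislator question, here is a minimal sketch. The column names (`id`, `name`, `legislator_id`, `vote_id`, `vote_type`) and the encoding `1` = supported, `2` = opposed are assumptions made for this example; the actual files in /datasets may differ. It also assumes each vote maps to exactly one bill, so counting votes is the same as counting bills.

```python
import pandas as pd

# Assumed encoding of vote_type; not confirmed by the datasets.
YEA, NAY = 1, 2

# Tiny in-memory stand-ins for the .csv files (hypothetical schema).
legislators = pd.DataFrame({"id": [10, 20, 30], "name": ["Ann", "Bob", "Cy"]})
vote_results = pd.DataFrame({
    "legislator_id": [10, 10, 20, 20],
    "vote_id": [1, 2, 1, 2],
    "vote_type": [YEA, NAY, YEA, YEA],
})


def count_votes_per_legislator(legislators, vote_results):
    """Return one row per legislator with supported/opposed counts."""
    counts = (
        vote_results
        .assign(
            num_supported_bills=vote_results["vote_type"].eq(YEA),
            num_opposed_bills=vote_results["vote_type"].eq(NAY),
        )
        .groupby("legislator_id")[["num_supported_bills", "num_opposed_bills"]]
        .sum()  # summing booleans counts the True values
        .reset_index()
    )
    # Left join keeps legislators who never voted; their counts become 0.
    report = legislators.merge(
        counts, how="left", left_on="id", right_on="legislator_id"
    ).drop(columns="legislator_id")
    count_cols = ["num_supported_bills", "num_opposed_bills"]
    report[count_cols] = report[count_cols].fillna(0).astype(int)
    return report


legislator_report = count_votes_per_legislator(legislators, vote_results)
```

The left join is a deliberate choice here: a legislator with no recorded votes still appears in the report, with both counts set to zero.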

Our program must output two .csv files, one for each question described in the Goals section.
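The bill report could be produced along the following lines. This is only a sketch under assumptions: the table and column names (`bills.sponsor_id`, `votes.bill_id`, `vote_results.vote_type`, and so on), the `1`/`2` vote encoding, and the `"Unknown"` placeholder for a sponsor missing from the legislators table are all hypothetical. The example writes to an in-memory buffer; a real run would pass a file path to `to_csv`.

```python
import io

import pandas as pd

YEA, NAY = 1, 2  # assumed encoding of vote_type

# Hypothetical stand-ins for the .csv files.
bills = pd.DataFrame(
    {"id": [100, 200], "title": ["Bill A", "Bill B"], "sponsor_id": [10, 99]}
)
votes = pd.DataFrame({"id": [1, 2], "bill_id": [100, 200]})
vote_results = pd.DataFrame({
    "legislator_id": [10, 20, 10, 20],
    "vote_id": [1, 1, 2, 2],
    "vote_type": [YEA, YEA, NAY, YEA],
})
legislators = pd.DataFrame({"id": [10, 20], "name": ["Ann", "Bob"]})


def build_bill_report(bills, votes, vote_results, legislators):
    """Return one row per bill with vote counts and the primary sponsor."""
    # Attach the bill_id to every individual vote result.
    results = vote_results.merge(
        votes.rename(columns={"id": "vote_id"}), on="vote_id", how="left"
    )
    counts = (
        results
        .assign(
            supporter_count=results["vote_type"].eq(YEA),
            opposer_count=results["vote_type"].eq(NAY),
        )
        .groupby("bill_id")[["supporter_count", "opposer_count"]]
        .sum()
        .reset_index()
    )
    sponsors = legislators.rename(
        columns={"id": "sponsor_id", "name": "primary_sponsor"}
    )
    report = (
        bills
        .merge(counts, how="left", left_on="id", right_on="bill_id")
        .merge(sponsors, how="left", on="sponsor_id")
    )
    # A sponsor id with no matching legislator gets a placeholder name.
    report["primary_sponsor"] = report["primary_sponsor"].fillna("Unknown")
    count_cols = ["supporter_count", "opposer_count"]
    report[count_cols] = report[count_cols].fillna(0).astype(int)
    return report[["id", "title", *count_cols, "primary_sponsor"]]


bill_report = build_bill_report(bills, votes, vote_results, legislators)

# Write the report as CSV (to a buffer here instead of a file).
buffer = io.StringIO()
bill_report.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```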

Risks

All the data comes from .csv files. If their size increases, we must watch the time needed to load the data and to perform operations on it.

Each operation (merging tables, pivoting, writing .csv files) has a different time complexity, which depends on the size and grouping of the data it works on.

As of now, the risks are low.
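If the files did grow large, one possible mitigation is to read them in chunks and aggregate incrementally, so the whole file never has to sit in memory at once. The sketch below uses the `chunksize` parameter of `pd.read_csv`; the column names and the sample data are assumptions for illustration only.

```python
import io
from collections import Counter

import pandas as pd

# In-memory stand-in for a large vote results file (hypothetical columns).
csv_data = io.StringIO(
    "legislator_id,vote_id,vote_type\n"
    "10,1,1\n"
    "10,2,2\n"
    "20,1,1\n"
    "20,2,1\n"
)


def count_vote_types_in_chunks(source, chunksize=2):
    """Count rows per (legislator_id, vote_type) without loading it all."""
    totals = Counter()
    # With chunksize set, read_csv yields DataFrames of at most that many rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        partial = chunk.groupby(["legislator_id", "vote_type"]).size()
        for (legislator_id, vote_type), n in partial.items():
            totals[(int(legislator_id), int(vote_type))] += int(n)
    return totals


totals = count_vote_types_in_chunks(csv_data)
```

Counts are additive across chunks, which is why this works for the questions in this project; aggregations that are not additive (a median, for example) would need a different approach.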

Time complexity of operations

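  • pd.read_csv() and df.to_csv() are O(n), where n is the number of rows in the file.
  • pd.merge() depends on the sizes of the DataFrames being merged and on how many rows match. Pandas implements merges as hash joins, so for inner and left joins alike the cost is roughly O(n + m) plus the number of output rows, where n and m are the numbers of rows in the two DataFrames. It approaches O(n * m) only when many rows share the same key on both sides, because every matching pair is emitted. For a left join (which is our case), if each key appears at most once in the right DataFrame, the output has exactly as many rows as the left DataFrame.
  • df.groupby() is hash-based: assigning rows to groups is roughly O(n) on average, where n is the number of rows in the DataFrame being grouped, and sorting the g group keys (the default sort=True) adds O(g log g). O(n log n) is a safe upper bound.
  • df.pivot_table() is built on groupby, so the same bound applies, where n is the number of rows in the DataFrame being pivoted.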
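Because the cost of a merge grows with the number of output rows, duplicate keys on the right side of a left join can silently multiply rows. One way to guard against that is the `validate` argument of `merge`, which raises `pandas.errors.MergeError` when the key relationship is not what was declared. The helper name `safe_left_join` and the sample tables are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

vote_results = pd.DataFrame({"vote_id": [1, 1, 2], "vote_type": [1, 2, 1]})
votes_ok = pd.DataFrame({"vote_id": [1, 2], "bill_id": [100, 200]})
# Same data, but vote_id 1 is accidentally listed twice.
votes_dup = pd.DataFrame({"vote_id": [1, 1, 2], "bill_id": [100, 100, 200]})


def safe_left_join(left, right, key):
    """Left join that fails fast if `key` is not unique in `right`."""
    return left.merge(right, how="left", on=key, validate="many_to_one")


# Unique keys on the right: output has the same row count as the left side.
joined = safe_left_join(vote_results, votes_ok, "vote_id")

# Duplicate keys on the right: the merge is rejected instead of growing.
try:
    safe_left_join(vote_results, votes_dup, "vote_id")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
```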

Does this project have any dependencies?

  • Python 3
  • Pandas library
  • NumPy

Future improvements

New columns

To account for new columns, such as Bill Voted On Date and Co-Sponsors, we have to consider where these columns will come from.

We can assume that Bill Voted On Date will be in bills.csv or in the bills' data source. The same goes for Co-Sponsors (an integer that references the id of the sponsoring Person).
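If the new columns did arrive in bills.csv, loading them might look like the sketch below. The column names `voted_on_date` and `co_sponsor_id`, the date format, and the sample rows are all assumptions; nothing in the current datasets confirms them. The nullable `Int64` dtype is used so that a bill without a co-sponsor keeps an integer id column instead of being converted to floats.

```python
import io

import pandas as pd

# Hypothetical future version of bills.csv with the two new columns.
bills_csv = io.StringIO(
    "id,title,sponsor_id,voted_on_date,co_sponsor_id\n"
    "100,Bill A,10,2023-01-15,20\n"
    "200,Bill B,20,2023-02-01,\n"
)
legislators = pd.DataFrame({"id": [10, 20], "name": ["Ann", "Bob"]})

bills = pd.read_csv(
    bills_csv,
    parse_dates=["voted_on_date"],     # parse the date column on load
    dtype={"co_sponsor_id": "Int64"},  # nullable integer: missing stays <NA>
)

# Resolve the co-sponsor id to a name, the same way as the primary sponsor.
co_sponsors = legislators.rename(
    columns={"id": "co_sponsor_id", "name": "co_sponsor"}
)
co_sponsors["co_sponsor_id"] = co_sponsors["co_sponsor_id"].astype("Int64")
bills_extended = bills.merge(co_sponsors, how="left", on="co_sponsor_id")
```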

Different types of input

We might need to modify our code to receive a list of legislators or bills and generate a .csv report for them.

To account for this new requirement, we need to change how the data is input and how we process it.

Given a list of legislators or bills, we would need to load additional data from a data source (a database or a server) and then process it as we do here.
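Once the data is loaded, restricting a report to a requested list of ids could be as simple as a filter with `isin`. The function name `filter_report` and the sample report below are hypothetical; how the extra data is fetched from a database or server is left out of this sketch.

```python
import pandas as pd


def filter_report(report, ids, id_column="id"):
    """Keep only the rows whose id is in the requested list."""
    return report[report[id_column].isin(ids)].reset_index(drop=True)


# Hypothetical legislator report, shaped like the one this project outputs.
legislator_report = pd.DataFrame({
    "id": [10, 20, 30],
    "name": ["Ann", "Bob", "Cy"],
    "num_supported_bills": [1, 2, 0],
    "num_opposed_bills": [1, 0, 0],
})

selected = filter_report(legislator_report, [10, 30])
```

The filtered DataFrame can then be written with `to_csv` like any other report.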

quorum-legislative-data's People

Contributors

jonathangaldino
