census-data-downloader

Download American Community Survey data from the U.S. Census Bureau and reformat it for humans.

What's available

All of the data files processed by this repository are published in the data/processed/ folder. They can be pulled into applications via their raw URLs, like https://raw.githubusercontent.com/datadesk/census-data-downloader/master/data/processed/acs5_2017_population_counties.csv
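
As a minimal sketch of pulling one of these files into a Python script, the snippet below builds a raw URL and reads it with the standard library. The source_year_table_geotype.csv naming pattern is an assumption inferred from the single example URL in this section, and processed_url and load_rows are hypothetical helper names, not part of this library.

```python
import csv
import io
import urllib.request

BASE_URL = (
    "https://raw.githubusercontent.com/datadesk/census-data-downloader/"
    "master/data/processed/"
)


def processed_url(table, geotype, year=2017, source="acs5"):
    """Build the raw URL of a processed file (the naming pattern is assumed)."""
    return f"{BASE_URL}{source}_{year}_{table}_{geotype}.csv"


def load_rows(url):
    """Download a processed CSV and return its rows as a list of dicts."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Calling load_rows(processed_url("population", "counties")) would then fetch the example file, provided that file and year are actually published.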

The command-line interface

The library can be installed as a command-line interface that lets you download files on demand.

Installation

$ pipenv install census-data-downloader

Command-line usage

There's now a tool named censusdatadownloader ready for you.

Usage: censusdatadownloader [OPTIONS] TABLE COMMAND [ARGS]...

  Download Census data and reformat it for humans

Options:
  --data-dir TEXT  The folder where you want to download the data
  --year [2009-2021]   The years of data to download. By default it gets only the
                   latest year. Not all data are available for every year. Submit 'all' to get every year.
  --force          Force the downloading of the data
  --help           Show this message and exit.

Commands:
  aiannhhomelands            Download American Indian, Alaska Native and...
  cnectas                    Download combined New England city and town...
  congressionaldistricts     Download Congressional districts
  counties                   Download counties in all states
  countysubdivision          Download county subdivisions
  csas                       Download combined statistical areas
  divisions                  Download divisions
  elementaryschooldistricts  Download elementary school districts
  everything                 Download everything from everywhere
  msas                       Download metropolitian statistical areas
  nationwide                 Download nationwide data
  nectas                     Download New England city and town areas
  places                     Download Census-designated places
  pumas                      Download public use microdata areas
  regions                    Download regions
  secondaryschooldistricts   Download secondary school districts
  statelegislativedistricts  Download statehouse districts
  states                     Download states
  tracts                     Download Census tracts
  unifiedschooldistricts     Download unified school districts
  urbanareas                 Download urban areas
  zctas                      Download ZIP Code tabulation areas

Before you can use it, you will need to add your CENSUS_API_KEY to your environment. If you don't have an API key, you can request one from the Census Bureau. One quick way to add your key:

$ export CENSUS_API_KEY='<your API key>'

Using it is as simple as providing one of our processed table names to one of the download subcommands.

Here's an example of downloading all state-level data from the medianage dataset.

$ censusdatadownloader medianage states

You can specify the download directory with --data-dir.

$ censusdatadownloader --data-dir ./my-special-folder/ medianage states

And you can change the year you download with --year.

$ censusdatadownloader --year 2010 medianage states

That's it. Mix and match tables and subcommands to get whatever you need.
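
To mix and match in bulk, one option is to drive the CLI from a short script. The sketch below uses only the options shown in the help text above. build_command and download_all are hypothetical helper names, and the tables and geographies you pass in are up to you.

```python
import subprocess


def build_command(table, geotype, year=None, data_dir=None):
    """Assemble an argument list: options first, then TABLE, then COMMAND."""
    command = ["censusdatadownloader"]
    if data_dir is not None:
        command += ["--data-dir", str(data_dir)]
    if year is not None:
        command += ["--year", str(year)]
    command += [table, geotype]
    return command


def download_all(tables, geotypes, **options):
    """Run one download per table and geography pair, stopping on the first failure."""
    for table in tables:
        for geotype in geotypes:
            subprocess.run(build_command(table, geotype, **options), check=True)
```

For example, download_all(["medianage"], ["states", "counties"], year=2010) would run the CLI twice, once per geography.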

Python usage

You can also download tables from Python scripts. Import the class of the processed table you wish to retrieve and pass in your API key. Then call one of the download methods.

This example brings in all state-level data from the medianhouseholdincomeblack dataset.

>>> from census_data_downloader.tables import MedianHouseholdIncomeBlackDownloader
>>> downloader = MedianHouseholdIncomeBlackDownloader('<YOUR KEY>')
>>> downloader.download_states()

You can specify the data directory and the years by passing in the data_dir and years keyword arguments.

>>> downloader = MedianHouseholdIncomeBlackDownloader('<YOUR KEY>', data_dir='./', years=2016)
>>> downloader.download_states()
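
If the key is already exported as CENSUS_API_KEY for the command-line tool, a script can reuse it rather than hard-coding it. This is a minimal sketch: get_api_key and download_state_income are hypothetical helpers, and only the downloader class, its keyword arguments and download_states() come from the examples above.

```python
import os


def get_api_key(env=os.environ):
    """Return the Census API key from the environment or fail with a clear message."""
    key = env.get("CENSUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set CENSUS_API_KEY before downloading data.")
    return key


def download_state_income(data_dir="./", years=2016):
    """Download state-level data using the key found in the environment."""
    # Imported lazily so the key check works even before the library is installed.
    from census_data_downloader.tables import MedianHouseholdIncomeBlackDownloader

    downloader = MedianHouseholdIncomeBlackDownloader(
        get_api_key(), data_dir=data_dir, years=years
    )
    downloader.download_states()
```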

Usage examples

A gallery of graphics powered by our data is available on Observable.

Black and Latino U.S. population shares

The Los Angeles Times used this library for an analysis of Census undercounts on Native American reservations. The code that powers it is available as an open-source computational notebook.

The 2020 census is coming. Will Native Americans be counted?

Contributing to the library

Adding support for a new table

Subclass our downloader and provide it with its required inputs.

import collections
from census_data_downloader.core.tables import BaseTableConfig
from census_data_downloader.core.decorators import register


@register
class MedianHouseholdIncomeDownloader(BaseTableConfig):
    PROCESSED_TABLE_NAME = "medianhouseholdincome"  # Your humanized table name
    UNIVERSE = "households"  # The universe value for this table
    RAW_TABLE_NAME = 'B19013'  # The id of the source table
    RAW_FIELD_CROSSWALK = collections.OrderedDict({
        # A crosswalk between the raw field name and our humanized field name.
        "001": "median"
    })

Add it to the imports in the __init__.py file and it's good to go.
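
To make the crosswalk more concrete: in the Census API, ACS variables are named by joining the table ID and a zero-padded field number, with an E suffix for estimates and an M suffix for margins of error (for example B19013_001E). The sketch below illustrates that mapping. It is not the library's internal code, expand_crosswalk is a hypothetical helper, and the _moe column suffix is an assumption.

```python
import collections


def expand_crosswalk(raw_table_name, crosswalk):
    """Map API variable names to humanized column names (illustrative only)."""
    columns = collections.OrderedDict()
    for field_id, human_name in crosswalk.items():
        # Estimate variable, e.g. B19013_001E
        columns[f"{raw_table_name}_{field_id}E"] = human_name
        # Margin-of-error variable; the "_moe" suffix is assumed
        columns[f"{raw_table_name}_{field_id}M"] = f"{human_name}_moe"
    return columns


columns = expand_crosswalk("B19013", {"001": "median"})
```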

Developing the CLI

The command-line interface is implemented using Click and setuptools. To install it locally for development inside your virtual environment, run the following installation command, as prescribed by the Click documentation.

$ pip install --editable .

That's it. If you make some good additions, please consider submitting them as pull requests so everyone can benefit.

census-data-downloader's People

Contributors

chrislkeller, dependabot[bot], gabriellelamarrlemee, ghing, ian-r-rose, irisslee, joegermuska, palewire, pandringa, rdmurphy, ryanpitts, sandhya-k, wcraft

census-data-downloader's Issues

Possible error in column headers

The two income columns for acs5_2017_poverty_zctas.csv both say that the column is counting the number of people below poverty level: income_past12months_below_poverty_level and income_past12months_at_or_below_poverty_level

Estimate and MOE Maps

-5..., -3..., -2... --> margin of error

look for + or - --> estimate (not just single +/-)

"c" in estimate N/A in moe

Processed data contains duplicate data for multiple geographies

Bug/Issue

Census data downloader correctly downloads raw data but creates a CSV with duplicated data in the processed directory.

Environment

  • Python 3.8
  • Pipenv version 2018.11.27.dev0
  • Latest version of censusdatadownloader

Reproduce

Install the package and then try to download a data set.

pipenv install census-data-downloader
censusdatadownloader --data-dir data/census race states

Expected behavior

A 52-row CSV file with total population by race in the processed directory.

Actual behavior

A 52-row CSV with the same data in each column in the processed directory.

Possible issues/solutions

It looks like the data is correctly downloaded in the raw directory which makes me think something's happening in the process step. I'm seeing this behavior specifically with the race [geography] arguments.

I noticed the same behavior for internet counties but did get the correct data when I used internet states.

I'll see if I can debug what's happening at the process step but in the meantime I'll rely on the raw data. Thanks for your work on this!
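
For anyone trying to confirm the symptom, a rough check is to test whether every data column in a processed file carries the same values. This is a sketch with made-up column names and values, not code from the library, and columns_all_identical is a hypothetical helper.

```python
import csv
import io


def columns_all_identical(rows, id_columns):
    """Return True if, in every row, all non-identifier columns hold the same value."""
    for row in rows:
        values = {value for name, value in row.items() if name not in id_columns}
        if len(values) > 1:
            return False
    return True


# Made-up sample that mimics the reported symptom.
broken = io.StringIO("geoid,name,universe,white\n01,Alabama,100,100\n02,Alaska,50,50\n")
broken_rows = list(csv.DictReader(broken))

# Made-up sample where the data columns differ, as expected.
healthy = io.StringIO("geoid,name,universe,white\n01,Alabama,100,70\n02,Alaska,50,30\n")
healthy_rows = list(csv.DictReader(healthy))
```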

Support more data sources

Sources available via the Census API include:

  • Five-year ACS
  • Three-year ACS
  • One-year ACS
  • Five-year ACS data profiles
  • Three-year ACS data profiles
  • One-year ACS data profiles
  • Decennial Summary File 1
  • Decennial Summary File 3

Am I going crazy or is DC no longer pulling at the tract level?

I've had a set of scripts using census-data-downloader to pull tract data for the U.S. (thanks for the awesome library btw).

I noticed today that DC is suddenly missing. I thought it was possibly related to python-us's handling of DC. I added a DC_STATEHOOD=1 environment variable but I'm still not pulling any data for DC.

I swore the data was always there but maybe I'm misremembering. Regardless, is there a way to debug how census-data-downloader pulls records by state? If so, I'm happy to fork and debug further. Thanks!

Some system for merging with the Census SHP files

It would be nice to quickly be able to inspect a map of each dataset.

This could be accomplished by providing a ".csvt" companion for each processed CSV file that includes the data types in a format QGIS will respect. Then the files could be manually merged with the shapes.

Another approach would be to have the downloader, or some downstream module based on it, do the merger automatically. Such a system could create a companion shapefile for each set with the data already merged in.
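
The ".csvt" idea could be sketched roughly as follows. A .csvt file is a single line of quoted, comma-separated type names (such as "Integer", "Real" and "String") that the GDAL CSV driver, and therefore QGIS, reads alongside a CSV of the same name. The helpers infer_csvt_type and build_csvt_line are hypothetical, the sample row is made up, and the choice to force a geoid column to String is an assumption meant to preserve leading zeros.

```python
import csv
import io


def infer_csvt_type(values):
    """Pick the narrowest type that fits every non-empty value in a column."""
    kinds = set()
    for value in values:
        if value == "":
            continue
        try:
            int(value)
            kinds.add("Integer")
        except ValueError:
            try:
                float(value)
                kinds.add("Real")
            except ValueError:
                return "String"
    if not kinds:
        return "String"
    return "Real" if "Real" in kinds else "Integer"


def build_csvt_line(csv_text, force_string=("geoid",)):
    """Return the single line of a .csvt file describing each column's type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    types = []
    for index, name in enumerate(header):
        if name in force_string:
            types.append("String")  # keep leading zeros in identifiers
        else:
            types.append(infer_csvt_type([row[index] for row in body]))
    return ",".join(f'"{kind}"' for kind in types)


# Made-up sample row, for illustration only.
sample = "geoid,name,universe,median\n01,Alabama,1000,38.5\n"
csvt_line = build_csvt_line(sample)
```

Writing csvt_line to a file with the same base name as the CSV and a .csvt extension would complete the companion file.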

Support more geographies

Here's what the API now supports:

  • Nationwide
  • States
  • Congressional districts
  • Counties
  • Places
  • Tracts
  • State legislative districts
  • ZIP codes
  • MSAs
  • CSAs
  • Combined New England city and town area
  • New England city and town area
  • School district (elementary)
  • School district (secondary)
  • School district (unified)
  • Metropolitan division
  • County subdivision
  • Tribal census tract
  • Alaska Native regional corporation

add in csvt

add this in so it downloads with data types automatically

Jam value information

As we think about integrating the output of census-data-downloader to be an input into the census-data-aggregator, is it possible to include the jam value information in the headers of the processed files?

For example, in the household income table the column name "10_and_under" would be replaced with "2499_10." The convention of having the headers in units of 1000s makes the 2499 representation a bit awkward though.

Add additional methods to base classes to let users support additional sources

This is somewhat related to #2.

I find this project to be extremely useful and a great framework for a task that I have to do often. In my projects, I've found myself using the base classes and concepts from this project when I want to download and process data from other Census Bureau API sources.

However, for non-ACS sources, I find myself entirely reimplementing many of the methods on my geotype downloader classes because the changes in functionality aren't possible by just calling super() and then adding additional logic.

I think adding these methods to BaseGeoTypeDownloader could make adding additional data sources easier, both in this project, and for other users in their own projects:

  • BaseGeoTypeDownloader.get_api_client(): This would be called from the constructor to set self.api and allow subclasses to specify a customized subclass of census.Census that supports additional API endpoints.
  • BaseGeoTypeDownloader.get_field_type_map(): This would be similar to BaseGeoTypeDownloader.get_raw_field_map() except it would map from raw field names to types that would be passed to pd.Series.astype(). Like BaseGeoTypeDownloader.get_raw_field_map(), this would be called from BaseGeoTypeDownloader.process() when setting the column types after reading in the raw table. The implementation could check for the existence of a FIELD_TYPES attribute on the table configuration class, and if that doesn't exist, default to the existing logic for ACS tables that checks the field name suffix. Adding the ability to explicitly set type conversions allows supporting non-ACS tables that might have field names that don't have the same suffix convention as ACS tables.
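
The proposed get_field_type_map() lookup could be sketched roughly as below. This is a standalone illustration, not a patch: SketchGeoTypeDownloader stands in for BaseGeoTypeDownloader, the two config classes are hypothetical, and the suffix rule in the fallback is only a placeholder for whatever the existing ACS logic does.

```python
import re

# Placeholder pattern for ACS-style fields such as B19013_001E or B19013_001M.
ACS_FIELD_PATTERN = re.compile(r".+_\d{3}[EM]")


class SketchGeoTypeDownloader:
    """Stand-in for BaseGeoTypeDownloader, holding only the proposed method."""

    def __init__(self, config):
        self.config = config

    def get_field_type_map(self, raw_field_names):
        """Map raw field names to types suitable for pd.Series.astype()."""
        explicit = getattr(self.config, "FIELD_TYPES", None)
        if explicit is not None:
            # An explicit FIELD_TYPES attribute wins when the table config has one.
            return {name: explicit[name] for name in raw_field_names if name in explicit}
        # Otherwise fall back to a suffix check (assumed stand-in for the ACS logic).
        return {name: float for name in raw_field_names if ACS_FIELD_PATTERN.fullmatch(name)}


class AcsLikeConfig:
    """Hypothetical ACS table config with no explicit types."""


class NonAcsConfig:
    """Hypothetical non-ACS table config that declares its own types."""

    FIELD_TYPES = {"POP": int}
```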

Error when installing CLI

Hi all,

Excited to use this tool! I noticed this error when I just tried installing census-data-downloader:

Traceback (most recent call last):
  File "/Users/williamsa/.local/share/virtualenvs/fema-X3N8u1-d/bin/censusdatadownloader", line 6, in <module>
    from census_data_downloader.cli import cmd
  File "/Users/williamsa/.local/share/virtualenvs/fema-X3N8u1-d/lib/python3.7/site-packages/census_data_downloader/__init__.py", line 5, in <module>
    from .tables import TABLE_LIST
ModuleNotFoundError: No module named 'census_data_downloader.tables'

I tried installing via pipx and pipenv and received the same error.

Clarify that this tool works only for ACS tables in README

I could be wrong about this, but looking through the source code quickly, it seems like all of the tables are ACS tables. I love this tool, but always forget whether or not it supports other kinds of census data, e.g. population estimates. It would be selfishly helpful to have a note in the README that clarifies this.

If I'm wrong about this only covering ACS tables, please let me know.

If my request makes sense to you all, I'm happy to make a pull request for this little documentation change.
