AMC_Confidence_Intervals

This repository contains code illustrating the Approximate Monte Carlo (AMC) method, applied to 2010 Decennial Census data. It serves both as an empirical evaluation of the method's reliability and as an illustration of how the method is expected to work with 2020 Decennial Census data. In the future, we expect to provide products based on the same AMC method, applied directly to the 2020 Decennial Census. The AMC approach was designed by United States Census Bureau research staff to estimate the amount of uncertainty introduced by the 2020 Decennial Census Top Down Algorithm (TDA), the formally private mechanism used to protect the confidentiality of individuals' census responses in the 2020 Census Redistricting (P.L. 94-171) and Demographic and Housing Characteristics data products.

The AMC method was inspired by traditional Monte Carlo methods. It starts from a Privacy-Protected Microdata File generated by TDA, referred to as PPMF0, and generates a set of replicates by executing TDA repeatedly, with each iteration treating PPMF0 as the ground truth. That is, PPMF0 is substituted for the confidential 2020 Census Edited File (CEF), and TDA is then run repeatedly in this mode. As a demonstration, the files this illustration code is based on were constructed with this procedure but using the 2010 CEF in lieu of the 2020 CEF. This generates a series of iterates, PPMFi, for i=1,2,...,25, where 25 was determined empirically to be a reasonable number of iterates, as discussed in the AMC paper. These PPMF files, based on 2010 Decennial Census data, can be downloaded from this page. Comparisons between PPMF0 and the PPMFi, and the variability in these comparisons, can then be used to construct estimates of uncertainty, including intervals that behave like traditional confidence intervals. The AMC paper also discusses appropriate use of the method, using the 2010 Decennial Census to examine when the method works well and when its results should be interpreted cautiously.
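The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, the counts are made up, and only five replicates are shown where the real procedure uses 25. For a single query, each replicate's count is compared with the PPMF0 count, and the resulting errors are summarized by their mean (an estimate of bias) and their root mean squared error (RMSE).

```python
import math

def replicate_errors(ppmf0_count, replicate_counts):
    """Error of each replicate relative to PPMF0, which plays the role of ground truth."""
    return [count - ppmf0_count for count in replicate_counts]

def summarize_errors(errors):
    """Return (mean error, RMSE) of the replicate errors."""
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, rmse

# Made-up counts for one hypothetical query (for example, one population
# count in one geography). Not taken from the PPMF files.
ppmf0_count = 100
replicate_counts = [98, 103, 101, 97, 104]

errors = replicate_errors(ppmf0_count, replicate_counts)
bias, rmse = summarize_errors(errors)
print(f"errors={errors}, estimated bias={bias:.2f}, RMSE={rmse:.2f}")
```

The variability of these errors across replicates is what stands in for the unobservable error between the published PPMF and the confidential CEF.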

The full AMC paper examines multiple approaches for generating confidence intervals. This repository does not illustrate every method in the paper. The Jupyter Notebook jupyter_Approx_Monte_Carlo_confidence_interval_notebook.ipynb provides a full working code example for constructing a conditionally bias-adjusted, RMSE-based confidence interval assuming a Student's t distribution with 5 degrees of freedom. We hope external groups will find this example useful as a starting point for building their own uncertainty estimates. The Jupyter Notebook reads an example data file, jupyter_data.csv, which contains aggregated counts from the PPMF files for several example queries. Data users who wish to view the example confidence interval code without using Jupyter can open jupyter_Approx_Monte_Carlo_Confidence_Intervals.html in a browser. The requirements.txt and spark_requirements.txt files list the dependencies required to run the pandas and pyspark code sections of the Jupyter Notebook.
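As a rough sketch of what an RMSE-based interval with a Student's t critical value might look like, consider the simplified Python example below. It is an assumption-laden illustration and does not reproduce the notebook's formula. In particular, the bias adjustment here is applied unconditionally, the 90% confidence level is chosen arbitrarily, and the function name, variable names, and counts are all hypothetical. The critical value is the standard 0.95 quantile of the t distribution with 5 degrees of freedom, hardcoded to keep the example free of third-party dependencies.

```python
import math

# 0.95 quantile of Student's t with 5 degrees of freedom, which gives a
# two-sided 90% interval. The 90% level is an assumption of this sketch.
T5_CRIT_90 = 2.015048

def amc_interval_sketch(ppmf0_count, replicate_counts, t_crit=T5_CRIT_90):
    """Simplified bias-adjusted, RMSE-based interval for one query.

    Errors are measured as replicate count minus PPMF0 count. The interval
    is centred on the PPMF0 count minus the mean error and has half-width
    t_crit * RMSE. This is a hypothetical simplification, not the
    conditional adjustment implemented in the notebook.
    """
    errors = [count - ppmf0_count for count in replicate_counts]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    centre = ppmf0_count - bias
    half_width = t_crit * rmse
    return centre - half_width, centre + half_width

# Made-up counts; the real procedure uses 25 replicates per query.
lower, upper = amc_interval_sketch(100, [98, 103, 101, 97, 104])
print(f"approximate 90% interval: [{lower:.2f}, {upper:.2f}]")
```

Passing a different `t_crit` changes the confidence level. For the method as actually evaluated, refer to the notebook and the AMC paper.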

The Jupyter Notebook was run and tested using EMR 6.15.0.

Contributors

leclercp, haase, michaelhawes