census2020-das-2010ddp's People

Contributors

garfi303, leclercp

census2020-das-2010ddp's Issues

Available AMI?

Is there an Amazon Machine Image publicly available with the configuration used to generate the DDPs?

How to understand the L1 metric?

Thank you for posting such a wonderful collection of code for exploring the effects of differential privacy algorithms on census data. I'm trying to understand what is captured by the L1 metric that is applied to compare the post-processed data with the incoming data. There is some pretty good documentation in programs/validation.py but I wanted to ask these questions to make sure I'm understanding it as well as I can.

TIA....

  • What is compared with the L1 metric? Histograms for each block?
  • What does it mean to move a person from one cell to another? Is the cell one of the bins of a histogram?
  • Does a reading of .04 mean that 2% of the people were moved from one cell to another? Or is it 4%?
  • Does a reading of 1.0 mean EVERYONE has moved, somehow?
  • Can movement cancel out? So people are moved and yet the L1 distance is zero?
  • Is a low L1 number intrinsically better? Why?
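To make the arithmetic behind these questions concrete, here is a minimal sketch of a plain L1 distance between two histograms. It is only an illustration of the general definition: the function name and the toy counts are made up, and whether programs/validation.py normalizes the distance (and by what) is an assumption I have not confirmed.

```python
def l1_distance(hist_a, hist_b):
    """Sum of absolute cell-by-cell differences between two histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of cells")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Toy histogram with four cells (hypothetical counts, 18 people in total).
original = [10, 5, 0, 3]

# One person moved from cell 0 to cell 1: two cells change by 1 each.
one_move = [9, 6, 0, 3]

# One person moved 0 -> 1 and another moved 1 -> 0: the counts are unchanged.
swapped = [10, 5, 0, 3]

total = sum(original)
print(l1_distance(original, one_move))                # raw L1 for one move
print(l1_distance(original, swapped))                 # raw L1 when moves cancel
print(l1_distance(original, one_move) / total)        # normalized by N
print(l1_distance(original, one_move) / (2 * total))  # normalized by 2N
```

Under this definition a single move contributes 2 to the raw distance, so a value normalized by N is twice the fraction of people moved, while a value normalized by 2N equals that fraction. Opposite moves cancel, so a distance of zero does not prove that nobody moved.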

Where is the setup script?

The documentation in census2020-das-2010ddp/das_decennial/README.md says that there is a script, das_decennial/etc/setup_external, but the only thing I can find in das_decennial/etc/ is a README.md file.

How to configure the right host for dasexperimental?

Is there any good documentation on how to configure the code so that it can reach dasexperimental?

~/Census/census2020-das-2010ddp/das_decennial$ python3 das20*.py configs/census_1940/ipums_1940.ini

=== DAS RUN AT Wed Apr 1 07:48:09 2020 ===
=== UNKNOWN MISSION -- dasexperimental is down===

Traceback (most recent call last):
File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/usr/lib/python3.6/http/client.py", line 1254, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output
self.send(msg)
File "/usr/lib/python3.6/http/client.py", line 974, in send
self.connect()
File "/usr/lib/python3.6/http/client.py", line 946, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/lib/python3.6/socket.py", line 724, in create_connection
raise err
File "/usr/lib/python3.6/socket.py", line 713, in create_connection
sock.connect(sa)
OSError: [Errno 113] No route to host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "das2020_driver.py", line 133, in <module>
dashboard.das_log(mission_name + ' starting', extra={'start':'now()'})
File "/home/pcw/Census/census2020-das-2010ddp/das_decennial/programs/dashboard.py", line 87, in das_log
'instanceId': aws.instanceId()}, **extra}
File "/home/pcw/Census/census2020-das-2010ddp/das_decennial/das_framework/ctools/aws.py", line 78, in instanceId
return instance_identity()['instanceId']
File "/home/pcw/Census/census2020-das-2010ddp/das_decennial/das_framework/ctools/aws.py", line 63, in instance_identity
return get_url_json('http://169.254.169.254/latest/dynamic/instance-identity/document')
File "/home/pcw/Census/census2020-das-2010ddp/das_decennial/das_framework/ctools/aws.py", line 57, in get_url_json
return json.loads(get_url(url, **kwargs))
File "/home/pcw/Census/census2020-das-2010ddp/das_decennial/das_framework/ctools/aws.py", line 53, in get_url
with urllib.request.urlopen(url, context=context) as response:
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
'_open', req)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 1346, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.6/urllib/request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 113] No route to host>
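The second traceback shows the failure coming from the request to the EC2 instance metadata address (169.254.169.254), which is not reachable from a machine outside AWS. Below is a minimal sketch of a fallback that returns a placeholder instead of raising. The helper name `instance_id_or_default`, the timeout, and the placeholder string are all hypothetical; this is not code from das_framework/ctools/aws.py and not necessarily how the maintainers would want it handled.

```python
import json
import urllib.error
import urllib.request

# Same metadata URL that appears in the traceback above.
METADATA_URL = "http://169.254.169.254/latest/dynamic/instance-identity/document"


def instance_id_or_default(url=METADATA_URL, timeout=2.0, default="not-on-ec2"):
    """Return the EC2 instance id, or `default` when the metadata
    service cannot be reached or returns something unexpected.

    Hypothetical helper for illustration only.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            document = json.loads(response.read().decode("utf-8"))
        return document["instanceId"]
    except (urllib.error.URLError, OSError, ValueError, KeyError):
        # URLError / OSError: no route to host, timeout, refused connection.
        # ValueError / KeyError: malformed URL, invalid JSON, missing field.
        return default
```

Calling `instance_id_or_default()` on a laptop would then return the placeholder after the timeout rather than aborting the run, at the cost of hiding real network problems when the code does run on EC2.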

dirs `4j` and `ark`

Running the program gives me this error:

FileNotFoundError: [Errno 2] No such file or directory: '4j/__init__.py'

It happens when the program is trying to make a release file. The --print_bom option also lists 4j and ark directories that are not in this repo.

Are they missing, or should I be getting them from somewhere else?

Hierarchy ordering in .ini file is inconsistent.

In the .ini file, these lines are in topdown order:

#budget in topdown order (e.g. County, Tract, Block Group, Block)
geolevel_budget_prop: 0.25,0.25,0.25,0.25

But these are in bottom-up order. So does the rightmost value above correspond to the leftmost name below, and vice versa?

# Names of smallest to largest geocode (no spaces)
geolevel_names: Enumdist,County,State,National
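To spell out the pairing I am asking about, here is a small sketch that matches each budget proportion to a geolevel name, assuming the comments in the .ini file are accurate (names listed smallest to largest, budgets listed top-down). The function name and the non-uniform proportions are hypothetical, chosen only so that the ordering is visible; with 0.25 everywhere the order makes no difference. This is not taken from the DAS code.

```python
def pair_budgets(names_smallest_first, budgets_topdown):
    """Map each geolevel name to its budget share, assuming the names run
    bottom-up and the budgets run top-down (hypothetical helper)."""
    if len(names_smallest_first) != len(budgets_topdown):
        raise ValueError("need exactly one budget proportion per geolevel")
    # Reverse the names so both sequences run from largest to smallest.
    return dict(zip(reversed(names_smallest_first), budgets_topdown))


geolevel_names = "Enumdist,County,State,National".split(",")

# Hypothetical non-uniform split, written in top-down order.
geolevel_budget_prop = [float(x) for x in "0.4,0.3,0.2,0.1".split(",")]

for name, share in pair_budgets(geolevel_names, geolevel_budget_prop).items():
    print(f"{name}: {share}")
```

If this reading is right, the first proportion goes to National and the last one to Enumdist.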

Sample CEF file

My understanding is that the Census Edited File is a format from within the census and it is not made public.
Is it possible to get a sample CEF file in order to inspect what it looks like?

My specific use case is this:
I have a bunch of reconstructions of a 2010 state that I would like to run through DDP. I was planning to build them in the format suggested at the bottom of this file: https://github.com/uscensusbureau/census2020-das-e2e/blob/master/programs/reader/e2e_reader.py Even a key to what these fields mean would be helpful, I think.

But I don't understand where I would enter geographic data such as the Block Group a person is in.

I am familiar with the 1940s IPUMS data format, where the Household lines contain the geographic information.

Which levels get allocated the privacy budget in the DAS E2E example?

When using the DAS E2E example AMI, I noticed that the documentation embedded in the CONFIG.ini file in das_decennial says that the privacy budget is split "in topdown order (e.g. County, Tract, Block Group, Block)". But the certificate produced at the end of the run says that it is split between Enumdist, county, state and national. Is one of them correct?

Noising 0 population blocks

Is it correct to say that blocks that have a population of 0 are not treated with the DP mechanism?

It seems like the reader works by reading in the person lines from the input file, which means that 0 population blocks never get represented anywhere in the system. This also means that 0 population blocks do not get noised at all, and stay 0.

Is this the correct way to interpret the code, and if not, where are the 0 population blocks being accounted for?
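The concern can be illustrated with a toy example. This is a sketch of the general effect only, with hypothetical block ids and records; it does not reproduce the actual reader. When counts are built purely from person records, a block with nobody in it never appears, so nothing downstream could add noise to it unless the full list of blocks is supplied separately.

```python
from collections import Counter

# Hypothetical person records: (block id, age). Nobody lives in block_B.
person_records = [("block_A", 34), ("block_A", 7), ("block_C", 51)]

# Hypothetical list of every block in the geography, including empty ones.
all_blocks = ["block_A", "block_B", "block_C"]


def counts_from_persons(records):
    """Count people per block using only the person records."""
    return Counter(block for block, _age in records)


def counts_with_empty_blocks(records, block_ids):
    """Count people per block, keeping an explicit 0 for empty blocks."""
    counts = counts_from_persons(records)
    return {block: counts.get(block, 0) for block in block_ids}


print(sorted(counts_from_persons(person_records)))        # blocks seen in the records
print(counts_with_empty_blocks(person_records, all_blocks))  # every block, zeros included
```

In the first result block_B is simply absent; in the second it is present with a count of 0 and could be treated like any other block.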
