opendp / opendp Goto Github PK


284.0 12.0 46.0 132.3 MB

The core library of differential privacy algorithms powering the OpenDP Project.

Home Page: https://opendp.org

License: MIT License

Python 25.14% Rust 48.52% Shell 0.29% TeX 5.75% HTML 0.06% R 11.39% C 8.84%

opendp differential-privacy privacy dp-programming-framework opendp-commons

opendp's Issues

Type checking for FFI functions with Measurement/Transformation args

Add type checking for FFI functions with Measurement/Transformation args.

FFI functions like make_chain_mt() don't currently validate the type of their Measurement or Transformation arguments. This is error-prone, because it's easy to supply a Transformation instead of a Measurement, or vice versa. (This was part of the problem in #36.) We should add some type checking like is done for arguments to measurement_invoke() and transformation_invoke().

The naive solution would be to embed the FfiMeasurement or FfiTransformation in an FFIObject, which has a type slot, but that probably won't be workable, because that'll capture the concrete type with all type args resolved. I suspect instead we'll want some way to look at the generic type Measurement<...> or Transformation<...>, not the concrete type.
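A minimal sketch of the idea, written in Python for illustration rather than in the Rust FFI layer: each handle carries a generic kind tag (Measurement or Transformation) that is separate from its concrete type arguments, and the combinator checks that tag before chaining. The names Kind, Handle and check_kind, and the argument order of make_chain_mt() used here, are hypothetical and not taken from the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Kind(Enum):
    MEASUREMENT = "Measurement"
    TRANSFORMATION = "Transformation"


@dataclass(frozen=True)
class Handle:
    """Hypothetical stand-in for an FFI pointer plus a generic kind tag."""
    kind: Kind
    type_args: str       # concrete type args, e.g. "<i32>"; deliberately not checked
    function: Callable


def check_kind(handle: Handle, expected: Kind, arg_name: str) -> None:
    """Reject a handle whose generic kind differs from the expected one."""
    if handle.kind is not expected:
        raise TypeError(
            f"{arg_name}: expected {expected.value}, got {handle.kind.value}"
        )


def make_chain_mt(measurement: Handle, transformation: Handle) -> Handle:
    """Chain a transformation into a measurement after validating both kinds."""
    check_kind(measurement, Kind.MEASUREMENT, "measurement")
    check_kind(transformation, Kind.TRANSFORMATION, "transformation")
    return Handle(
        Kind.MEASUREMENT,
        measurement.type_args,
        lambda x: measurement.function(transformation.function(x)),
    )
```

Swapping the two arguments then fails with a TypeError instead of reaching native code with the wrong pointer type.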

Clamp Transformation -- Implementation

Implement function
Include comments
Write tests

End-to-end Python code for all major library APIs

We need Python code that exercises all library entry points. There's a start for this in python/test.py, but it doesn't cover all constructors.

Ideally, this would take the form of an integration test we could run in CI. But something that does a minimal sanity check to make sure we haven't broken any signatures would be a good start.
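As a rough sketch of such a minimal signature check, the helper below compares the parameter names of a set of callables against an expected table. The helper name check_signatures and the shape of the expected table are assumptions for illustration; the real entry points would come from the Python bindings.

```python
import inspect
from typing import Callable, Dict, List


def check_signatures(entry_points: Dict[str, Callable],
                     expected: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means every signature matches."""
    problems = []
    for name, params in expected.items():
        func = entry_points.get(name)
        if func is None:
            problems.append(f"{name}: missing")
            continue
        # Parameter names in declaration order.
        actual = list(inspect.signature(func).parameters)
        if actual != params:
            problems.append(f"{name}: expected {params}, got {actual}")
    return problems
```

A CI job could fail whenever the returned list is non-empty, which catches renamed or reordered arguments without exercising the native library.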

Python Bindings

Python bindings for all library APIs. These should be as close as possible to idiomatic Python code. Ideally, they would be generated automatically from metadata.

Calling generic functions
Loading data
Invoking operations
Memory management

Sum Transformation -- Implementation

Implement function
Include comments
Write tests

FFI Macros and Utilities

Tools to make life easier and code cleaner in FFI layer:

Dispatch based on type parameters
Marshaling data
Memory management

Detailed articulation of the choice of Rust

1/11 - Add this to the documentation site

Make the case for Rust and memory safety.

Initial document: https://docs.google.com/document/d/16LFjllHI6jAtgURweasJ733X4l-XI8Fq3Ryy0b98Y0w/edit

What is needed for a public doc?

Data Model Design

High-level design for data objects consumed and produced by Measurements and Transformations:

Enumerate common use cases
Define Rust data structures
Prototype a few examples
Write design overview

Gaussian Mechanism -- Implementation

Implement function
Include comments
Write tests

Error Handling Cleanup -- Rust

Implement the strategy in #22:

Make a pass through public APIs, add error annotations as appropriate.
Audit code for existing uses of unwrap(), assert(), other things that panic, convert to proper error handling.
Add plumbing to expose errors at FFI layer.
Write unit tests to check that errors are propagated correctly.

Histogram Transformation -- Implementation

Implement constructor
Write tests
Add documentation

(Note: This issue originally covered both category-based and stability-based histograms. In the interest of modularity, and because the proofs will presumably be separate, I've split off the stability part into #116.)

Mean Transformation -- Implementation

Implement function
Include comments
Write tests

CountBy (Histograms)

Inconsistent chain constructors

In the changes for #30, we messed up something, and now test.py is crashing:

/usr/local/bin/python3.8 /Users/av/Projects/opendp/python/test.py
Initialized OpenDP Library
"hello, world!"

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The crash happens at line 38:

    everything = odp.core.make_chain_tt(composition, parse_dataframe)

Add GitHub CI/Actions for Rust Unit Tests

Error Handling Strategy

Strategy for error handling in OpenDP, especially across the FFI boundary:

Develop general approach for signaling errors from library functions.
Define a way to expose this to FFI in safe manner.
Write some example code for how this should be applied in the library.

Impute Transformation -- Implementation

Implement function
Include comments
Write tests

Chain_MT Combinator -- Implementation

Implement function
Include comments
Write tests

Untrusted mode

Provide a way to activate "untrusted" mode, where privacy guarantees are loosened, and more features are available. This could be used to enable things outside the strict OpenDP constraints:

newly contributed components that haven't been validated yet
user-supplied functions (e.g. row_transform)
permissive floating point calculations

Need to figure out the mechanics of this. Some things we could leverage:

Rust module(s)
Rust conditional compilation
Separate Rust crate
Python package flags that load different versions of the library

Facility for using callbacks implemented in client code

It'd be very nice to have a facility whereby functions implemented in client code (i.e., outside FFI, in Python) could be passed into the library and used as callbacks. This would allow us to support custom transformations and relations. (This would be available only in an explicit "unsafe" mode.)
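On the Python side, one way this could work is with the standard ctypes callback machinery, which wraps a Python function in a C-compatible function pointer. The sketch below only shows the wrapping step; the callback signature (double to double) and the names ROW_CALLBACK and wrap_callback are assumptions, and no library entry point that accepts such a pointer is implied.

```python
import ctypes

# C-compatible signature: double (*)(double)
ROW_CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def wrap_callback(func):
    """Wrap a Python function so native code could call it via a C function pointer."""
    return ROW_CALLBACK(func)


def double_it(x):
    return 2.0 * x


# The wrapper must stay referenced for as long as native code may call it;
# if it is garbage collected, the function pointer dangles.
c_callback = wrap_callback(double_it)

# The wrapped object is itself callable, which is handy for testing.
result = c_callback(21.0)
```

The lifetime comment is the main design concern: the bindings would need to hold a reference to every callback for the lifetime of the object that uses it.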

Dependencies

Metrics and Measures Design

High-level design for Metrics, Measures and Distances from the framework paper:

Define Rust data structures
Prototype a few examples
Write design overview

(This will likely fall out of https://github.com/opendifferentialprivacy/OpenDP-Experimental/issues/21, but opening a separate issue just in case there are some other bits.)

Numerical Instability Audit

We need to audit the code for privacy issues caused by numerical instability. Some of this will likely happen as a result of writing proofs for components. But we should also have a system-level view of this.

Where appropriate, we have facilities for doing arbitrary-precision math with MPFR, and some of the mechanisms make use of this via the sampler abstraction.

There will probably be a lot of individual tasks for this. We might want to fork off separate issues for the different components. For now, this issue can serve as a placeholder.

FFI constructor dispatch for Metrics & Measures

We don't have a way for FFI constructors to dispatch on different metrics. Currently, this is handled in a clumsy way by having separate entry points. (E.g., opendp_trans__make_bounded_sum_l1() & opendp_trans__make_bounded_sum_l2().) This should be cleaned up, so that FFI clients can specify the Metrics/Measures they want.

trait constructor calling convention

switch calling convention from individual functions to impls of constructor traits in trans.rs, meas.rs, core.rs
refactor trans.rs and meas.rs into folders
split trans.rs code between mod.rs and dataframe.rs
add num crate and remove OpenDPInto trait

Rationalize types for LaplaceMechanism and GaussianMechanism

LaplaceMechanism and GaussianMechanism currently support any primitive types (including integers), which is probably not what we want. We need to rationalize this. Simplest solution would be to support f64 only, but we should think this through.

Count Transformation -- Implementation

Implement function
Include comments
Write tests

Data Loading

This is a placeholder for some basic means to get data in/out of the library. Specific instances TBD.

Read/write CSV
???

Make DistanceCast properly handle f64 -> f32

DistanceCast properly handles rounding for size change and int -> float changes. But in the corner case of f64 -> f32, it's possible that the resulting distance will be smaller.
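The corner case can be reproduced in plain Python with the struct module, which rounds to the nearest f32 when packing. The sketch below also shows one possible remedy, rounding up to the next representable f32 whenever the nearest one is smaller. It is an illustration under assumptions (non-negative finite distances that fit in f32; helper names are made up), not the library's DistanceCast.

```python
import struct


def to_f32(x: float) -> float:
    """Round an f64 to the nearest f32, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_up_f32(x: float) -> float:
    """Smallest f32 strictly greater than the non-negative finite f32 `x`."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def cast_distance_up(d: float) -> float:
    """Cast a non-negative f64 distance to f32 without ever shrinking it."""
    if d < 0:
        raise ValueError("distances must be non-negative")
    nearest = to_f32(d)
    return nearest if nearest >= d else next_up_f32(nearest)


# Round-to-nearest shrinks 0.7, so a plain cast would understate the distance.
shrunk = to_f32(0.7) < 0.7
safe = cast_distance_up(0.7) >= 0.7
```

Rounding a distance up is the conservative direction here, since an understated distance could make a relation appear to hold when it does not.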

Split combinators into separate module

The amount of code in core.rs is becoming unwieldy. It's not totally clear what's the best organization, but a first step would be to take all the combinator-related stuff (make_chain_xx, make_composition, etc) and put it into a separate top-level module. Proposed name of comb.rs.

Integrate proofs on Transformations/Measurements

Write LaTeX documents and check them into this repository next to the associated Rust constructor.
We are, at minimum, using the PR review as a record of the vetting process.

Administrative issues:

#72
#331

Individual Proof Issues:

https://github.com/opendp/opendp/issues?q=is%3Aissue+label%3A%22DP+Proof%22

Cast Transformation -- Implementation

For flexibility, we need a cast operation, to convert T -> U and Vec -> U, where T, U are primitives.

Remove obsolete module data

Since we went with full-on generics everywhere, the ADT model in module data.rs is now obsolete. This needs a sanity check, but I believe that the entire module (Data, Form, Element, TraitObject, etc) can all be removed. Same for the parallel module in opendp-ffi (though opendp_data__from_string() & opendp_data__to_string() will need to live somewhere, see #39).

Python unit test harness

Set up the infrastructure for running unit tests of the Python APIs.

Dependencies:

Chaining hints

Combinator functions make_chain_mt() and make_chain_tt() need an additional argument for a hint function from the framework paper. This function chooses an intermediate distance so that relations can be chained.

Specify APIs
Implement chaining logic
Add forward/backward map functionality
Add convenience constructors using stability constant
Update existing operations to generate relations with maps
Write unit tests
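To illustrate the forward-map and stability-constant items, here is a simplified Python model in which every operation carries a forward map from input distance to output distance. With such maps, the intermediate distance that a hint would otherwise have to supply can be computed directly. The class layout, the (outer, inner) argument order, and stability_map are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transformation:
    function: Callable
    forward_map: Callable[[float], float]   # d_in -> d_out


@dataclass(frozen=True)
class Measurement:
    function: Callable
    forward_map: Callable[[float], float]   # d_in -> privacy loss


def stability_map(c: float) -> Callable[[float], float]:
    """Convenience constructor: a c-stable operation maps d_in to c * d_in."""
    return lambda d_in: c * d_in


def make_chain_tt(outer: Transformation, inner: Transformation) -> Transformation:
    """Run `inner` first, then `outer`; the forward maps compose the same way."""
    return Transformation(
        lambda x: outer.function(inner.function(x)),
        lambda d_in: outer.forward_map(inner.forward_map(d_in)),
    )


def make_chain_mt(measurement: Measurement, transformation: Transformation) -> Measurement:
    """Run `transformation` first, then `measurement`."""
    return Measurement(
        lambda x: measurement.function(transformation.function(x)),
        lambda d_in: measurement.forward_map(transformation.forward_map(d_in)),
    )
```

In this model the intermediate distance is simply the inner operation's forward map evaluated at d_in.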

Refactor samplers into separate module

Library Contributor Guide

Create docs to help people developing contributions to the library.

Technical stuff (how the library works, how to structure code)
Logistical stuff (how contributions are validated)

Automated generation of FFI metadata

Create an automated mechanism that can generate FFI metadata from annotations in the code.

Currently, the Python bindings are generated from metadata describing the FFI wrappers. These metadata are contained in JSON files (bootstrap.json). This works very well, but it requires manual creation of the metadata, and duplication of information between Rust code and JSON. It'd be great to have a more robust mechanism.

This could be done with a build script in a couple of ways:

Parse the code and annotations in opendp-ffi to get the metadata directly.
Parse the code and annotations in opendp to infer the metadata. This is more work, but has better long-term potential.

This could be a first step towards fully automatic generation of everything from the core Rust functions. Issue #131 is for the fuller solution (if we get there).

Python library project structure

Organize the Python code into a rational structure for a library.

Currently, the Python wrapper code is just sitting in bare scripts. This should be reorganized into a proper library project. Proposed layout:

opendp/
    python/
        docs/
        opendp/
            __init__.py
            opendp.py
            ...
        requirements.txt
        tests/
            ....

SQL: Mark components needed for full SQL support

Meet and mark Column C on the components list:
https://docs.google.com/spreadsheets/d/132rAzbSDVCKqFZWeE-P8oOl9f23PzkvNwsrDV5LPkw4/edit#gid=0

Basic Composition Combinator -- Implementation

Implement function
Include comments
Write tests

We have a simple implementation of this, but it only accepts exactly two Measurements, and constructs a function returning a 2-tuple. Now that we have the AnyXXX facilities, it should accept an arbitrary number of Measurements, and construct a function returning Vec.
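A minimal Python sketch of the generalized shape: the combinator takes any number of measurements and returns a function producing a list of results. Each measurement is modelled here as a (function, epsilon) pair and the epsilons are summed, which is the basic composition rule for pure differential privacy. That modelling is an assumption of the sketch and does not reflect the AnyXXX types.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: a measurement is a (function, epsilon) pair.
Measurement = Tuple[Callable, float]


def make_basic_composition(measurements: Sequence[Measurement]) -> Measurement:
    """Compose any number of measurements over the same input data."""
    if not measurements:
        raise ValueError("need at least one measurement")
    functions = [function for function, _ in measurements]
    # Basic composition under pure DP: privacy losses add up.
    total_epsilon = sum(epsilon for _, epsilon in measurements)

    def function(data) -> List:
        # One release per measurement, in the order given.
        return [f(data) for f in functions]

    return function, total_epsilon
```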

Operation Design

High-level design for Measurements and Transformations from the framework paper:

Define Rust data structures
Develop strategy for constructors of specific instances
Prototype a few examples
Write design overview

RowTransform Transformation -- Implementation

Implement the RowTransform() concept from the programming framework paper.

This is a Transformation constructor that takes a user-defined function and applies it to every member of a dataset.
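In Python terms, the constructor could look roughly like the sketch below: it captures the user-defined function and returns a function that maps it over the dataset. The name make_row_transform and the list-based dataset are assumptions for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def make_row_transform(
    row_function: Callable[[T], U]
) -> Callable[[Iterable[T]], List[U]]:
    """Build a function that applies `row_function` to every member of a dataset."""

    def function(dataset: Iterable[T]) -> List[U]:
        # Each output row depends on exactly one input row.
        return [row_function(row) for row in dataset]

    return function
```

Because each output row depends on a single input row, adding or removing one input row changes at most one output row, provided the user-defined function is pure. The library cannot verify that assumption for arbitrary client code.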

Dependencies

FFI Strategy

Strategy for exposing OpenDP Library functionality via FFI, so that bindings can be created for different languages:

Work out approach for constructors, combinators and invocation.
Define C-compatible data structures
Decide on support for generic types and functions
Specify policy around memory management
Prototype a few examples
Write design overview

Error Handling Glue -- Python

Implement the strategy in #22:

Implement a set of Python Exception classes to mirror the Rust error cases.
Write glue to check for errors returned through FFI, raise Python Exceptions.
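The two steps above could be sketched as follows. The exception names, the error variant strings, and the FfiResult layout are all hypothetical placeholders; the real ones would mirror whatever the Rust error type defines.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


class OpenDPError(Exception):
    """Base class for errors reported by the library (hypothetical name)."""


class FailedFunctionError(OpenDPError):
    """Raised when a function fails at invocation time (hypothetical variant)."""


class TypeParseError(OpenDPError):
    """Raised when a type descriptor cannot be parsed (hypothetical variant)."""


# Maps error variant names coming across FFI to Python exception classes.
_EXCEPTIONS = {
    "FailedFunction": FailedFunctionError,
    "TypeParse": TypeParseError,
}


@dataclass
class FfiResult:
    """Hypothetical Python-side view of a result returned through FFI."""
    value: Any = None
    error: Optional[Tuple[str, str]] = None   # (variant, message)


def unwrap(result: FfiResult) -> Any:
    """Return the value, or raise the Python exception matching the error variant."""
    if result.error is None:
        return result.value
    variant, message = result.error
    # Unknown variants fall back to the base class rather than being dropped.
    raise _EXCEPTIONS.get(variant, OpenDPError)(message)
```

Every generated binding would route its return value through unwrap(), so callers only ever see ordinary Python exceptions.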

Resize Transformation -- Implementation

Implement function
Include comments
Write tests

Relations Design

High-level design for Privacy Relations and Stability Relations from the framework paper:

Define Rust data structures
Prototype a few examples
Write design overview

Laplace Mechanism -- Implementation

Implement function
Include comments
Write tests

Library User Guide

Create docs to help people developing applications that use the library.

This is a big undertaking. Here are some initial tasks (add more once we have an outline):

#239

Chain_TT Combinator -- Implementation

Implement function
Include comments
Write tests

FFI data constructors and accessors

When calling OpenDP FFI Measurements/Transformations/Relations, the system requires that values are wrapped as an FFIObject (which ensures type compatibility). We need convenience functions to construct and access these from primitive values. These will be used a lot in FFI contexts, so we should think carefully about signatures. Perhaps something like this:

pub extern "C" fn opendp_data__new_scalar(type_args: *const c_char, val: *const c_void) -> FfiResult<*mut FfiObject> ...

(This would be analogous to the existing opendp_data__from_string() & opendp_data__to_string() functions, which should be folded into this.)

This should also include convenience wrappers in Python that automatically convert Python objects.
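A rough sketch of what the Python-side conversion could look like, using ctypes to pair each Python scalar with a type descriptor and a C-compatible value. The mapping of Python int to i64 and float to f64, and the helper names, are assumptions; the actual call into opendp_data__new_scalar() is left out.

```python
import ctypes

# Python type -> (Rust type name, ctypes type). The choice of i64/f64 is an assumption.
_SCALAR_TYPES = {
    bool: ("bool", ctypes.c_bool),
    int: ("i64", ctypes.c_int64),
    float: ("f64", ctypes.c_double),
}


def py_to_scalar(value):
    """Return (type_args, c_value) ready to hand to a scalar constructor."""
    # Look up the exact type so that bool is not treated as int.
    entry = _SCALAR_TYPES.get(type(value))
    if entry is None:
        raise TypeError(f"unsupported scalar type: {type(value).__name__}")
    type_name, c_type = entry
    return type_name.encode("ascii"), c_type(value)


def scalar_to_py(c_value):
    """Convert a ctypes scalar back into the corresponding Python object."""
    return c_value.value
```

The byte string can be passed as the type_args argument, and ctypes.byref(c_value) would supply the pointer for val.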
