opendp / opendp
The core library of differential privacy algorithms powering the OpenDP Project.
Home Page: https://opendp.org
License: MIT License
Add type checking for FFI functions with Measurement/Transformation args.
FFI functions like make_chain_mt() don't currently validate the type of their Measurement or Transformation arguments. This is error-prone, because it's easy to supply a Transformation instead of a Measurement, or vice versa. (This was part of the problem in #36.) We should add some type checking like is done for arguments to measurement_invoke() and transformation_invoke().
The naive solution would be to embed the FfiMeasurement or FfiTransformation in an FFIObject, which has a type slot, but that probably won't be workable, because that'll capture the concrete type with all type args resolved. I suspect instead we'll want some way to look at the generic type Measurement<...> or Transformation<...>, not the concrete type.
We need Python code that exercises all library entry points. There's a start for this in python/test.py, but it doesn't cover all constructors.
Ideally, this would take the form of an integration test we could run in CI. But something that does a minimal sanity check to make sure we haven't broken any signatures would be a good start.
Python bindings for all library APIs. These should be as close as possible to idiomatic Python code. Ideally, they would be generated automatically from metadata.
Tools to make life easier and code cleaner in FFI layer:
1/11 - Add this to the documentation site
Make the case for Rust and memory safety.
Initial document: https://docs.google.com/document/d/16LFjllHI6jAtgURweasJ733X4l-XI8Fq3Ryy0b98Y0w/edit
High-level design for data objects consumed and produced by Measurements and Transformations:
Implement the strategy in #22:
(Note: This issue originally covered both category-based and stability-based histograms. In the interest of modularity, and because the proofs will presumably be separate, I've split off the stability part into #116.)
In the changes for #30, we messed up something, and now test.py is crashing:
/usr/local/bin/python3.8 /Users/av/Projects/opendp/python/test.py
Initialized OpenDP Library
"hello, world!"
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
The crash happens at line 38:
everything = odp.core.make_chain_tt(composition, parse_dataframe)
Strategy for error handling in OpenDP, especially across the FFI boundary:
Provide a way to activate "untrusted" mode, where privacy guarantees are loosened, and more features are available. This could be used to enable things outside the strict OpenDP constraints:
Need to figure out the mechanics of this. Some things we could leverage:
It'd be very nice to have a facility whereby functions implemented in client code (i.e., outside FFI, in Python) could be passed into the library and used as callbacks. This would allow us to support custom transformations and relations. (This would be available only in an explicit "unsafe" mode.)
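As a rough illustration of the mechanics on the Python side, the sketch below uses ctypes to turn a Python function into a C-callable function pointer. The callback signature (double in, double out) and the names ROW_CALLBACK and clamp_unit are assumptions made for this example; no callback entry point in the library is implied.

```python
import ctypes

# Hypothetical callback signature: double (*)(double).
ROW_CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def clamp_unit(x: float) -> float:
    """Client-side function that native code could call back into."""
    return min(max(x, 0.0), 1.0)


# Keep a reference to the wrapper for as long as native code may call it;
# if it is garbage collected, the function pointer becomes invalid.
c_clamp_unit = ROW_CALLBACK(clamp_unit)

# Calling the wrapper goes through the same C-callable thunk that a
# native library would invoke.
print(c_clamp_unit(1.5))  # 1.0
```

A pointer like c_clamp_unit could be passed as an argument to any foreign function declared with a matching argtypes entry, which is presumably the shape an "unsafe" mode entry point would take.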
High-level design for Metrics, Measures and Distances from the framework paper:
(This will likely fall out of https://github.com/opendifferentialprivacy/OpenDP-Experimental/issues/21, but opening a separate issue just in case there are some other bits.)
We need to audit the code for privacy issues because of numerical instability. Some of this will likely happen as a result of writing proofs for components. But we should also have a system-level view of this.
Where appropriate, we have facilities for doing arbitrary-precision math with MPFR, and some of the mechanisms make use of this via the sampler abstraction.
There will probably be a lot of individual tasks for this. We might want to fork off separate issues for the different components. For now, this issue can serve as a placeholder.
We don't have a way for FFI constructors to dispatch on different metrics. Currently, this is handled in a clumsy way by having separate entry points. (E.g., opendp_trans__make_bounded_sum_l1() & opendp_trans__make_bounded_sum_l2().) This should be cleaned up, so that FFI clients can specify the Metrics/Measures they want.
LaplaceMechanism and GaussianMechanism currently support any primitive types (including integers), which is probably not what we want. We need to rationalize this. Simplest solution would be to support f64 only, but we should think this through.
This is a placeholder for some basic means to get data in/out of the library. Specific instances TBD.
DistanceCast properly handles rounding for size change and int -> float changes. But in the corner case of f64 -> f32, it's possible that the resulting distance will be smaller.
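The corner case can be reproduced outside the library. The sketch below uses Python's struct module to round an f64 to the nearest f32, and shows one possible conservative cast that never shrinks a non-negative distance. The helper names to_f32 and to_f32_round_up are made up for this illustration and are not part of DistanceCast.

```python
import struct


def to_f32(x: float) -> float:
    """Round an f64 to the nearest f32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_round_up(x: float) -> float:
    """Cast a non-negative finite distance to f32 without shrinking it."""
    y = to_f32(x)
    if y >= x:
        return y
    # For non-negative floats, adding 1 to the bit pattern yields the
    # next representable f32 above y.
    bits = struct.unpack("<I", struct.pack("<f", y))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


print(to_f32(0.7) < 0.7)            # True: nearest rounding shrank the value
print(to_f32_round_up(0.7) >= 0.7)  # True: the conservative cast did not
```

Rounding to nearest can land on either side of the original value, so a distance cast that must stay an upper bound has to round upward explicitly.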
The amount of code in core.rs is becoming unwieldy. It's not totally clear what's the best organization, but a first step would be to take all the combinator-related stuff (make_chain_xx, make_composition, etc) and put it into a separate top-level module. Proposed name of comb.rs.
Write LaTeX documents and check them in to this repository next to the associated Rust constructor.
We are, at minimum, using the PR review as a record of the vetting process.
https://github.com/opendp/opendp/issues?q=is%3Aissue+label%3A%22DP+Proof%22
For flexibility, we need a cast operation, to convert T -> U and Vec -> U, where T, U are primitives.
Since we went with full-on generics everywhere, the ADT model in module data.rs is now obsolete. This needs a sanity check, but I believe that the entire module (Data, Form, Element, TraitObject, etc) can all be removed. Same for the parallel module in opendp-ffi (though opendp_data__from_string() & opendp_data__to_string() will need to live somewhere, see #39).
Combinator functions make_chain_mt() and make_chain_tt() need an additional argument for a hint function from the framework paper. This function chooses an intermediate distance so that relations can be chained.
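A toy model of what the hint does, assuming relations are plain predicates over (d_in, d_out) and the hint maps (d_in, d_out) to an intermediate distance. All names here (chain_relations, the example relations, the constants) are hypothetical and only meant to show the idea, not the library's signatures.

```python
from typing import Callable

Relation = Callable[[float, float], bool]
Hint = Callable[[float, float], float]


def chain_relations(first: Relation, second: Relation, hint: Hint) -> Relation:
    """Chain two relations by letting the hint pick the intermediate distance."""
    def chained(d_in: float, d_out: float) -> bool:
        d_mid = hint(d_in, d_out)
        return first(d_in, d_mid) and second(d_mid, d_out)
    return chained


def stability_relation(d_in: float, d_mid: float) -> bool:
    # Toy transformation that at most doubles distances.
    return d_mid >= 2.0 * d_in


def privacy_relation(d_mid: float, epsilon: float) -> bool:
    # Toy measurement with noise scale 4: epsilon must cover d_mid / scale.
    return epsilon >= d_mid / 4.0


def hint(d_in: float, d_out: float) -> float:
    # Choose the tightest intermediate distance the first relation allows.
    return 2.0 * d_in


chained = chain_relations(stability_relation, privacy_relation, hint)
print(chained(1.0, 0.5))  # True
print(chained(1.0, 0.4))  # False
```

Without the hint, the chained relation would have to search for a workable intermediate distance; with it, checking the chain is two direct evaluations.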
Create docs to help people developing contributions to the library.
Create an automated mechanism that can generate FFI metadata from annotations in the code.
Currently, the Python bindings are generated from metadata describing the FFI wrappers. These metadata are contained in JSON files (bootstrap.json). This works very well, but it requires manual creation of the metadata, and duplication of information between Rust code and JSON. It'd be great to have a more robust mechanism.
This could be done with a build script in a couple of ways:
opendp-ffi to get the metadata directly.
opendp to infer the metadata. This is more work, but has better long-term potential.
This could be a first step towards fully automatic generation of everything from the core Rust functions. Issue #131 is for the fuller solution (if we get there).
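To make the metadata-driven approach concrete, here is a minimal sketch of rendering Python wrapper source from a JSON description. The schema (symbol, args, ret), the argument names, and the _call helper referenced in the generated code are all assumptions for illustration; the actual bootstrap.json format is not shown in this issue.

```python
import json

# Hypothetical metadata entry; the real bootstrap.json schema may differ.
METADATA = json.loads("""
{
  "make_bounded_sum_l1": {
    "symbol": "opendp_trans__make_bounded_sum_l1",
    "args": [{"name": "lower"}, {"name": "upper"}],
    "ret": "FfiResult<Transformation *>"
  }
}
""")


def render_wrapper(name: str, spec: dict) -> str:
    """Render the source of one Python wrapper from its metadata entry."""
    params = ", ".join(arg["name"] for arg in spec["args"])
    lines = [
        f"def {name}({params}):",
        f'    """Wraps {spec["symbol"]} -> {spec["ret"]}."""',
        # _call is a placeholder for whatever performs the FFI call.
        f'    return _call("{spec["symbol"]}", {params})',
    ]
    return "\n".join(lines)


source = "\n\n".join(render_wrapper(n, s) for n, s in METADATA.items())
print(source)
```

Whichever way the metadata is produced, by hand or by a build script, a generator of this shape stays the same; only the source of the dictionary changes.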
Organize the Python code into a rational structure for a library.
Currently, the Python wrapper code is just sitting in bare scripts. This should be reorganized into a proper library project. Proposed layout:
opendp/
  python/
    docs/
    opendp/
      __init__.py
      opendp.py
      ...
    requirements.txt
    tests/
      ...
Meet and mark Column C on the components list:
https://docs.google.com/spreadsheets/d/132rAzbSDVCKqFZWeE-P8oOl9f23PzkvNwsrDV5LPkw4/edit#gid=0
We have a simple implementation of this, but it only accepts exactly two Measurements, and constructs a function returning a 2-tuple. Now that we have the AnyXXX facilities, it should accept an arbitrary number of Measurements, and construct a function returning Vec.
High-level design for Measurements and Transformations from the framework paper:
Implement the RowTransform() concept from the programming framework paper.
This is a Transformation constructor that takes a user-defined function and applies it to every member of a dataset.
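A minimal sketch of the idea in Python, assuming a dataset is simply a list of rows. The name make_row_transform and its signature are hypothetical, and the sketch covers only the function part of a Transformation, not its domains, metrics, or stability relation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def make_row_transform(row_fn: Callable[[T], U]) -> Callable[[List[T]], List[U]]:
    """Lift a per-row function to a function over a whole dataset."""
    def function(dataset: List[T]) -> List[U]:
        # Each output row depends on exactly one input row.
        return [row_fn(row) for row in dataset]
    return function


clamp = make_row_transform(lambda x: min(max(x, 0.0), 10.0))
print(clamp([-3.0, 4.5, 12.0]))  # [0.0, 4.5, 10.0]
```

Because every output row is computed from a single input row, adding or removing one input row changes at most one output row, which is the property that makes row-by-row transformations straightforward to reason about.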
Strategy for exposing OpenDP Library functionality via FFI, so that bindings can be created for different languages:
Implement the strategy in #22:
High-level design for Privacy Relations and Stability Relations from the framework paper:
Create docs to help people developing applications that use the library.
This is a big undertaking. Here are some initial tasks (add more once we have an outline):
When calling OpenDP FFI Measurements/Transformations/Relations, the system requires that values are wrapped as an FFIObject (which ensures type compatibility). We need convenience functions to construct and access these from primitive values. These will be used a lot in FFI contexts, so we should think carefully about signatures. Perhaps something like this:
pub extern "C" fn opendp_data__new_scalar(type_args: *const c_char, val: *const c_void) -> FfiResult<*mut FfiObject> ...
(This would be analogous to the existing opendp_data__from_string() & opendp_data__to_string() functions, which should be folded into this.)
This should also include convenience wrappers in Python that automatically wrap Python objects.
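On the Python side, such a wrapper might start by preparing the two arguments the proposed signature takes: a type name and an untyped pointer to the value. The sketch below does only that preparation with ctypes and does not call the library. The type-name strings, the mapping table, and the helper scalar_to_ffi_args are assumptions for illustration.

```python
import ctypes

# Hypothetical mapping from type-argument strings to ctypes types.
_CTYPES = {
    "bool": ctypes.c_bool,
    "i32": ctypes.c_int32,
    "f64": ctypes.c_double,
}


def scalar_to_ffi_args(value, type_arg=None):
    """Build (type_args, val, keepalive) for a scalar-constructor style call."""
    if type_arg is None:
        # bool must be checked before int, since bool is a subclass of int.
        if isinstance(value, bool):
            type_arg = "bool"
        elif isinstance(value, int):
            type_arg = "i32"
        elif isinstance(value, float):
            type_arg = "f64"
        else:
            raise TypeError(f"unsupported scalar type: {type(value).__name__}")
    c_value = _CTYPES[type_arg](value)
    pointer = ctypes.cast(ctypes.pointer(c_value), ctypes.c_void_p)
    # Return c_value as well so the caller keeps the memory alive.
    return type_arg.encode("ascii"), pointer, c_value


type_args, val, keepalive = scalar_to_ffi_args(1.5)
print(type_args)  # b'f64'
```

Inferring the type from the Python value keeps call sites short, while the optional type_arg lets callers override the default when, for example, an integer should be sent as a float.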