opendp's People

Contributors

alexwhitworth, andrewvyrros, ankke, chikeabuah, christianlebeda, clairemckaybowen, ecowan, matchaginseng, mccalluc, michaeleliot, orespo, paulinemauryl, pdurbin, raprasad, shoeboxam, silviacasac

opendp's Issues

Type checking for FFI functions with Measurement/Transformation args

Add type checking for FFI functions with Measurement/Transformation args.

FFI functions like make_chain_mt() don't currently validate the type of their Measurement or Transformation arguments. This is error-prone, because it's easy to supply a Transformation instead of a Measurement, or vice versa. (This was part of the problem in #36.) We should add type checking like that already done for arguments to measurement_invoke() and transformation_invoke().

The naive solution would be to embed the FfiMeasurement or FfiTransformation in an FFIObject, which has a type slot, but that probably won't be workable, because that'll capture the concrete type with all type args resolved. I suspect instead we'll want some way to look at the generic type Measurement<...> or Transformation<...>, not the concrete type.
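
To make the second idea concrete, here is a rough Python sketch of checking only the generic "head" of a type and ignoring its type arguments. The descriptor strings and the helper names (generic_head, check_kind) are assumptions made up for illustration; they are not part of the library.

```python
def generic_head(type_descriptor: str) -> str:
    """Return the generic type name with any type arguments dropped."""
    return type_descriptor.split("<", 1)[0].strip()


def check_kind(type_descriptor: str, expected: str) -> None:
    """Raise TypeError unless the descriptor names the expected generic type."""
    head = generic_head(type_descriptor)
    if head != expected:
        raise TypeError(f"expected a {expected}, got a {head}")


# Hypothetical descriptors: A and B stand for arbitrary type arguments,
# which the check deliberately ignores.
check_kind("Measurement<A, B>", "Measurement")  # passes silently
```

With this shape, passing "Transformation<A, B>" where a Measurement is expected fails with a TypeError instead of crashing later.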

End-to-end Python code for all major library APIs

We need Python code that exercises all library entry points. There's a start for this in python/test.py, but it doesn't cover all constructors.

Ideally, this would take the form of an integration test we could run in CI. But something that does a minimal sanity check to make sure we haven't broken any signatures would be a good start.
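
A minimal signature check could look roughly like the following. This is only a sketch: fake_core is a stand-in for the real bindings module, and the parameter names in the expected table are invented for the example.

```python
import inspect
from types import SimpleNamespace


def check_signatures(namespace, expected):
    """Return a list of mismatch descriptions; empty if everything matches."""
    problems = []
    for name, params in expected.items():
        func = getattr(namespace, name, None)
        if func is None:
            problems.append(f"{name}: missing")
            continue
        actual = list(inspect.signature(func).parameters)
        if actual != params:
            problems.append(f"{name}: expected {params}, got {actual}")
    return problems


# Stand-in for a bindings module; the parameter names are hypothetical.
def make_chain_tt(transformation1, transformation0):
    raise NotImplementedError


fake_core = SimpleNamespace(make_chain_tt=make_chain_tt)
expected = {
    "make_chain_tt": ["transformation1", "transformation0"],
    "make_chain_mt": ["measurement", "transformation"],
}
problems = check_signatures(fake_core, expected)
```

Because it never calls into the library, a check like this is cheap to run and cannot segfault; it only catches renamed, removed, or reordered parameters.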

Python Bindings

Python bindings for all library APIs. These should be as close as possible to idiomatic Python code. Ideally, they would be generated automatically from metadata.

  • Calling generic functions
  • Loading data
  • Invoking operations
  • Memory management

FFI Macros and Utilities

Tools to make life easier and code cleaner in FFI layer:

  • Dispatch based on type parameters
  • Marshaling data
  • Memory management

Data Model Design

High-level design for data objects consumed and produced by Measurements and Transformations:

  • Enumerate common use cases
  • Define Rust data structures
  • Prototype a few examples
  • Write design overview

Error Handling Cleanup -- Rust

Implement the strategy in #22:

  • Make a pass through public APIs, add error annotations as appropriate.
  • Audit code for existing uses of unwrap(), assert(), other things that panic, convert to proper error handling.
  • Add plumbing to expose errors at FFI layer.
  • Write unit tests to check that errors are propagated correctly.

Histogram Transformation -- Implementation

  • Implement constructor
  • Write tests
  • Add documentation

Note: This issue originally covered both category-based and stability-based histograms. In the interest of modularity, and because the proofs will presumably be separate, I've split off the stability part into #116.

Inconsistent chain constructors

In the changes for #30, we messed up something, and now test.py is crashing:

/usr/local/bin/python3.8 /Users/av/Projects/opendp/python/test.py
Initialized OpenDP Library
"hello, world!"

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

The crash happens at line 38:

    everything = odp.core.make_chain_tt(composition, parse_dataframe)

Error Handling Strategy

Strategy for error handling in OpenDP, especially across the FFI boundary:

  • Develop general approach for signaling errors from library functions.
  • Define a way to expose this to FFI in safe manner.
  • Write some example code for how this should be applied in the library.

Untrusted mode

Provide a way to activate "untrusted" mode, where privacy guarantees are loosened, and more features are available. This could be used to enable things outside the strict OpenDP constraints:

  • newly contributed components that haven't been validated yet
  • user-supplied functions (e.g. row_transform)
  • permissive floating point calculations

Need to figure out the mechanics of this. Some things we could leverage:

  • Rust module(s)
  • Rust conditional compilation
  • Separate Rust crate
  • Python package flags that load different versions of the library

Facility for using callbacks implemented in client code

It'd be very nice to have a facility whereby functions implemented in client code (i.e., outside FFI, in Python) could be passed into the library and used as callbacks. This would allow us to support custom transformations and relations. (This would be available only in an explicit "unsafe" mode.)
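
On the Python side, ctypes already provides the basic building block for this. The sketch below assumes a callback of C type double (*)(double); that signature, and the names ROW_CALLBACK and clamp_to_unit, are illustrative only and not an agreed interface.

```python
import ctypes

# Assumed C signature for illustration: double (*)(double)
ROW_CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def clamp_to_unit(x):
    """Example client-side function: clamp a value to [0, 1]."""
    return max(0.0, min(1.0, x))


# Wrap the Python function as a C-callable function pointer.
# Keep a reference to the wrapper for as long as native code may call it,
# otherwise it can be garbage-collected while still in use.
c_clamp = ROW_CALLBACK(clamp_to_unit)

# The wrapper can also be called from Python, which is handy for testing.
result = c_clamp(2.5)
```

The wrapper object is what would be handed to a library entry point that accepts a function pointer.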

Dependencies

Numerical Instability Audit

We need to audit the code for privacy issues caused by numerical instability. Some of this will likely happen as a result of writing proofs for components. But we should also have a system-level view of this.

Where appropriate, we have facilities for doing arbitrary-precision math with MPFR, and some of the mechanisms make use of this via the sampler abstraction.

There will probably be a lot of individual tasks for this. We might want to fork off separate issues for the different components. For now, this issue can serve as a placeholder.

FFI constructor dispatch for Metrics & Measures

We don't have a way for FFI constructors to dispatch on different metrics. Currently, this is handled in a clumsy way by having separate entry points. (E.g., opendp_trans__make_bounded_sum_l1() & opendp_trans__make_bounded_sum_l2().) This should be cleaned up, so that FFI clients can specify the Metrics/Measures they want.
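
From the client's point of view, the cleaned-up shape might look something like this sketch: one entry point, with the metric passed as an argument. The two private functions are stand-ins for the existing separate entry points, and the metric names "L1" and "L2" are placeholders, not a proposal for the real naming.

```python
# Stand-ins for the two separate entry points that exist today.
def _make_bounded_sum_l1(lower, upper):
    return {"op": "bounded_sum", "metric": "L1", "bounds": (lower, upper)}


def _make_bounded_sum_l2(lower, upper):
    return {"op": "bounded_sum", "metric": "L2", "bounds": (lower, upper)}


# Hypothetical metric names mapped to constructors.
_BOUNDED_SUM_BY_METRIC = {
    "L1": _make_bounded_sum_l1,
    "L2": _make_bounded_sum_l2,
}


def make_bounded_sum(metric, lower, upper):
    """Single entry point that dispatches on the requested metric."""
    try:
        constructor = _BOUNDED_SUM_BY_METRIC[metric]
    except KeyError:
        raise ValueError(f"unsupported metric: {metric!r}") from None
    return constructor(lower, upper)
```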

trait constructor calling convention

  • switch calling convention from individual functions to impls of constructor traits in trans.rs, meas.rs, core.rs
  • refactor trans.rs and meas.rs into folders
  • split trans.rs code between mod.rs and dataframe.rs
  • add num crate and remove OpenDPInto trait

Rationalize types for LaplaceMechanism and GaussianMechanism

LaplaceMechanism and GaussianMechanism currently support any primitive types (including integers), which is probably not what we want. We need to rationalize this. Simplest solution would be to support f64 only, but we should think this through.

Data Loading

This is a placeholder for some basic means to get data in/out of the library. Specific instances TBD.

  • Read/write CSV
  • ???

Make DistanceCast properly handle f64 -> f32

DistanceCast properly handles rounding for size change and int -> float changes. But in the corner case of f64 -> f32, it's possible that the resulting distance will be smaller.
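
The problem is easy to reproduce outside Rust. The sketch below uses Python's struct module to emulate an f64 -> f32 cast with round-to-nearest, then shows a round-up variant. It assumes distances are finite and non-negative; the function names are made up for this example and are not DistanceCast itself.

```python
import struct


def to_f32_nearest(x: float) -> float:
    """Round an f64 to the nearest f32 (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f32_round_up(x: float) -> float:
    """Smallest f32 that is >= x, for finite x >= 0 within f32 range."""
    if x < 0:
        raise ValueError("distances are assumed non-negative")
    y = to_f32_nearest(x)
    if y >= x:
        return y
    # Nearest rounding went down: step to the next f32 above y.
    # For non-negative floats, adding 1 to the bit pattern does this.
    bits = struct.unpack("<I", struct.pack("<f", y))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


x = 1.0 + 2.0 ** -30             # exactly representable in f64, not in f32
print(to_f32_nearest(x) < x)     # True: the plain cast shrinks the distance
print(to_f32_round_up(x) >= x)   # True: rounding up never shrinks it
```

A distance that shrinks during the cast understates sensitivity, so the conservative direction here is always up.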

Split combinators into separate module

The amount of code in core.rs is becoming unwieldy. It's not totally clear what's the best organization, but a first step would be to take all the combinator-related stuff (make_chain_xx, make_composition, etc.) and put it into a separate top-level module. Proposed name: comb.rs.

Remove obsolete module data

Since we went with full-on generics everywhere, the ADT model in module data.rs is now obsolete. This needs a sanity check, but I believe that the entire module (Data, Form, Element, TraitObject, etc.) can be removed. The same goes for the parallel module in opendp-ffi (though opendp_data__from_string() & opendp_data__to_string() will need to live somewhere, see #39).

Chaining hints

Combinator functions make_chain_mt() and make_chain_tt() need an additional argument for a hint function from the framework paper. This function chooses an intermediate distance so that relations can be chained.

  • Specify APIs
  • Implement chaining logic
  • Add forward/backward map functionality
  • Add convenience constructors using stability constant
  • Update existing operations to generate relations with maps
  • Write unit tests
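
As a rough illustration of the chaining logic, here is a Python sketch where relations are plain predicates and the hint picks the intermediate distance. Everything here (make_chain_relation, the example relations, the stability constant of 2, the unit noise scale) is an assumption for the example, not the planned API.

```python
def make_chain_relation(relation0, relation1, hint):
    """Chain two relations through an intermediate distance.

    relation0(d_in, d_mid) and relation1(d_mid, d_out) are predicates;
    hint(d_in, d_out) proposes the intermediate distance d_mid.
    """
    def chained(d_in, d_out):
        d_mid = hint(d_in, d_out)
        return relation0(d_in, d_mid) and relation1(d_mid, d_out)
    return chained


# Example: a transformation with stability constant 2 ...
relation0 = lambda d_in, d_mid: d_mid >= 2 * d_in
# ... followed by a measurement whose privacy loss is d_mid / scale.
scale = 1.0
relation1 = lambda d_mid, d_out: d_out >= d_mid / scale
# Hint derived from the stability constant (a forward map).
hint = lambda d_in, d_out: 2 * d_in

chained = make_chain_relation(relation0, relation1, hint)
```

Here chained(1, 2.0) holds because the hint yields d_mid = 2, which satisfies both relations, while chained(1, 1.5) does not.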

Library Contributor Guide

Create docs to help people developing contributions to the library.

  • Technical stuff (how the library works, how to structure code)
  • Logistical stuff (how contributions are validated)

Automated generation of FFI metadata

Create an automated mechanism that can generate FFI metadata from annotations in the code.

Currently, the Python bindings are generated from metadata describing the FFI wrappers. These metadata are contained in JSON files (bootstrap.json). This works very well, but it requires manual creation of the metadata, and duplication of information between Rust code and JSON. It'd be great to have a more robust mechanism.

This could be done with a build script in a couple of ways:

  • Parse the code and annotations in opendp-ffi to get the metadata directly.
  • Parse the code and annotations in opendp to infer the metadata. This is more work, but has better long-term potential.

This could be a first step towards fully automatic generation of everything from the core Rust functions. Issue #131 is for the fuller solution (if we get there).

Python library project structure

Organize the Python code into a rational structure for a library.

Currently, the Python wrapper code is just sitting in bare scripts. This should be reorganized into a proper library project. Proposed layout:

opendp/
    python/
        docs/
        opendp/
            __init__.py
            opendp.py
            ...
        requirements.txt
        tests/
            ....

Basic Composition Combinator -- Implementation

  • Implement function
  • Include comments
  • Write tests

We have a simple implementation of this, but it only accepts exactly two Measurements, and constructs a function returning a 2-tuple. Now that we have the AnyXXX facilities, it should accept an arbitrary number of Measurements, and construct a function returning Vec.
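
The target shape, sketched in Python with Measurements modeled as plain callables (an assumption for illustration; privacy accounting across the composed Measurements is left out entirely):

```python
def make_basic_composition(measurements):
    """Compose any number of measurements into one function.

    The composed function applies each measurement to the same data and
    returns the results as a list, in the order the measurements were given.
    """
    measurements = list(measurements)

    def function(data):
        return [measurement(data) for measurement in measurements]

    return function


# Deterministic stand-ins for measurements, so the output is predictable.
composed = make_basic_composition([len, sum, max])
releases = composed([1, 2, 3])
```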

Operation Design

High-level design for Measurements and Transformations from the framework paper:

  • Define Rust data structures
  • Develop strategy for constructors of specific instances
  • Prototype a few examples
  • Write design overview

RowTransform Transformation -- Implementation

Implement the RowTransform() concept from the programming framework paper.

This is a Transformation constructor that takes a user-defined function and applies it to every member of a dataset.
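
In its simplest form the function half of this is a map over rows. The Python sketch below shows only that part; make_row_transform is a hypothetical name, and the domains, metrics, and stability relation a real Transformation needs are omitted.

```python
def make_row_transform(row_function):
    """Build a function that applies row_function to every member of a dataset."""
    def function(dataset):
        return [row_function(row) for row in dataset]
    return function


# Example: a user-defined function that doubles each row.
double_rows = make_row_transform(lambda row: row * 2)
transformed = double_rows([1, 2, 3])
```

Since each output row depends on exactly one input row, the output has the same number of rows as the input.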

Dependencies

FFI Strategy

Strategy for exposing OpenDP Library functionality via FFI, so that bindings can be created for different languages:

  • Work out approach for constructors, combinators and invocation.
  • Define C-compatible data structures
  • Decide on support for generic types and functions
  • Specify policy around memory management
  • Prototype a few examples
  • Write design overview

Error Handling Glue -- Python

Implement the strategy in #22:

  • Implement a set of Python Exception classes to mirror the Rust error cases.
  • Write glue to check for errors returned through FFI, raise Python Exceptions.
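
A possible shape for the Python side is sketched below. The exception class names, the variant names, and the dict-based result are all placeholders chosen for illustration; the real layout of errors returned through FFI is whatever #22 settles on.

```python
class OpenDPError(Exception):
    """Base class; carries the error variant name and message."""

    def __init__(self, variant, message):
        super().__init__(f"{variant}: {message}")
        self.variant = variant
        self.message = message


class FailedFunctionError(OpenDPError):
    """Hypothetical mirror of one library error case."""


class TypeParseError(OpenDPError):
    """Hypothetical mirror of another library error case."""


# Hypothetical variant names mapped to exception classes.
_EXCEPTION_BY_VARIANT = {
    "FailedFunction": FailedFunctionError,
    "TypeParse": TypeParseError,
}


def unwrap(result):
    """Return the value of a result, or raise the matching exception.

    `result` is a stand-in for a decoded FFI result: a dict holding either
    an "ok" value or an "err" dict with "variant" and "message" keys.
    """
    if "err" in result:
        err = result["err"]
        cls = _EXCEPTION_BY_VARIANT.get(err["variant"], OpenDPError)
        raise cls(err["variant"], err["message"])
    return result["ok"]
```

Unknown variants fall back to the base class, so a newer library version with extra error cases still raises something catchable.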

Relations Design

High-level design for Privacy Relations and Stability Relations from the framework paper:

  • Define Rust data structures
  • Prototype a few examples
  • Write design overview

Library User Guide

Create docs to help people developing applications that use the library.

This is a big undertaking. Here are some initial tasks (add more once we have an outline):

FFI data constructors and accessors

When calling OpenDP FFI Measurements/Transformations/Relations, the system requires that values are wrapped as an FFIObject (which ensures type compatibility). We need convenience functions to construct and access these from primitive values. These will be used a lot in FFI contexts, so we should think carefully about signatures. Perhaps something like this:

pub extern "C" fn opendp_data__new_scalar(type_args: *const c_char, val: *const c_void) -> FfiResult<*mut FfiObject> ...

(This would be analogous to the existing opendp_data__from_string() & opendp_data__to_string() functions, which should be folded into this.)

This should also include convenience wrappers in Python that automatically convert Python objects.
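
For the Python side, the conversion step could look roughly like this ctypes sketch, which prepares the (type_args, val) pair a function shaped like opendp_data__new_scalar() would take, and reads a scalar back from a pointer. The type names in the table and the helper names are assumptions; no library call is made here.

```python
import ctypes

# Hypothetical mapping from type-argument names to ctypes types.
_CTYPES_BY_NAME = {
    "i32": ctypes.c_int32,
    "f64": ctypes.c_double,
    "bool": ctypes.c_bool,
}


def scalar_to_c(type_name, value):
    """Return (type_args, c_value, void_ptr) for a Python scalar.

    The caller must keep c_value alive for as long as void_ptr is in use.
    """
    c_value = _CTYPES_BY_NAME[type_name](value)
    void_ptr = ctypes.cast(ctypes.pointer(c_value), ctypes.c_void_p)
    return type_name.encode("utf-8"), c_value, void_ptr


def scalar_from_c(type_name, void_ptr):
    """Read a scalar of the named type back out of a void pointer."""
    typed_ptr = ctypes.cast(void_ptr, ctypes.POINTER(_CTYPES_BY_NAME[type_name]))
    return typed_ptr.contents.value


type_args, keep_alive, ptr = scalar_to_c("f64", 1.5)
round_tripped = scalar_from_c("f64", ptr)
```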
