differential-privacy's Introduction

Differential Privacy

Note
If you are unfamiliar with differential privacy (DP), you might want to go through "A friendly, non-technical introduction to differential privacy".

This repository contains libraries to generate ε- and (ε, δ)-differentially private statistics over datasets. It contains the following tools.

  • Privacy on Beam is an end-to-end differential privacy framework built on top of Apache Beam. It is intended to be easy to use, even by non-experts.
  • Three "DP building block" libraries, in C++, Go, and Java. These libraries implement basic noise addition primitives and differentially private aggregations. Privacy on Beam is implemented using these libraries.
  • A stochastic tester, used to help catch regressions that could make the differential privacy property no longer hold.
  • A differential privacy accounting library, used for tracking privacy budget.
  • A command line interface for running differentially private SQL queries with ZetaSQL.
  • DP Auditorium is a library for auditing differential privacy guarantees.

To get started on generating differentially private data, we recommend you follow the Privacy on Beam codelab.

Currently, the DP building block libraries support the following algorithms:

Algorithm                         C++        Go         Java
Laplace mechanism                 Supported  Supported  Supported
Gaussian mechanism                Supported  Supported  Supported
Count                             Supported  Supported  Supported
Sum                               Supported  Supported  Supported
Mean                              Supported  Supported  Supported
Variance                          Supported  Supported  Supported
Standard deviation                Supported  Supported  Planned
Quantiles                         Supported  Supported  Supported
Automatic bounds approximation    Supported  Planned    Supported
Truncated geometric thresholding  Supported  Supported  Supported
Laplace thresholding              Supported  Supported  Supported
Gaussian thresholding             Planned    Supported  Supported
Pre-thresholding                  Supported  Supported  Supported

Implementations of the Laplace mechanism and the Gaussian mechanism use secure noise generation. These mechanisms can be used to perform computations that aren't covered by the algorithms implemented in our libraries.
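For readers new to these two mechanisms, the following Python sketch shows only the textbook noise calibrations: Laplace noise with scale b = Δ1/ε, and the classic Gaussian calibration σ = Δ2·sqrt(2 ln(1.25/δ))/ε, which is stated for 0 < ε < 1. It draws from Python's random module, which is not a secure noise source and has none of the floating-point protections the libraries implement, so treat it as an illustration, not a substitute. All function names are hypothetical and do not belong to any of the libraries.

```python
import math
import random


def laplace_scale(l1_sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace mechanism: b = L1 sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return l1_sensitivity / epsilon


def gaussian_sigma(l2_sensitivity: float, epsilon: float, delta: float) -> float:
    """Classic Gaussian-mechanism sigma, valid for 0 < epsilon < 1."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def add_laplace_noise(value, l1_sensitivity, epsilon, rng=random):
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    # NOT secure: rng is a plain pseudo-random generator.
    b = laplace_scale(l1_sensitivity, epsilon)
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


def add_gaussian_noise(value, l2_sensitivity, epsilon, delta, rng=random):
    # NOT secure: rng is a plain pseudo-random generator.
    return value + rng.gauss(0.0, gaussian_sigma(l2_sensitivity, epsilon, delta))
```

For example, add_laplace_noise(100.0, 1.0, 0.5) returns 100 plus noise whose expected absolute value is 2. The libraries' own calibrations may differ from (and be tighter than) these classic formulas.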

The DP building block libraries and Privacy on Beam are suitable for research, experimental, or production use cases, while the other tools are currently experimental and subject to change.

How to Build

To build the differential privacy libraries, you need Bazel version 5.3.2. If you don't have it already, follow the instructions for your platform on the Bazel website.

You also need to install Git, if you don't have it already. Follow the instructions for your platform on the Git website.

Once you've installed Bazel and Git, open a terminal and clone the differential privacy repository into a local folder:

git clone https://github.com/google/differential-privacy.git

Navigate into the differential-privacy folder you just created, then build each library and its dependencies using Bazel. Each cd below starts from the repository root (note: ... is part of the command, not a placeholder):

To build the C++ library, run:

cd cc
bazel build ...

To build the Go library, run:

cd go
bazel build ...

To build the Java library, run:

cd java
bazel build ...

To build Privacy on Beam, run:

cd privacy-on-beam
bazel build ...

You may need to install additional dependencies when building the PostgreSQL extension. For example, on Ubuntu you need these packages:

sudo apt-get install make libreadline-dev bison flex

Caveats of the DP building block libraries

Differential privacy requires a bound on the maximum number of contributions each user can make to a single aggregation. The DP building block libraries don't perform such bounding: their implementation assumes that each user contributes at most a fixed number of rows to each partition. That number can be configured by the user. The library neither verifies nor enforces this limit; it is the caller's responsibility to pre-process the data to enforce it.

We chose not to implement this step at the DP building block level because it requires some global operation over the data: group by user, and aggregate or subsample the contributions of each user before passing them on to the DP building block aggregators. Given scalability constraints, this pre-processing must be done by a higher-level part of the infrastructure, typically a distributed processing framework: for example, Privacy on Beam relies on Apache Beam for this operation.
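The pre-processing described above (group by user, then subsample each user's contributions before aggregating) can be sketched on a single machine in Python. This is a hypothetical illustration, not how Privacy on Beam implements it: bound_contributions, max_partitions_per_user and max_rows_per_partition are names chosen for this example, and a real pipeline would run the group-by in a distributed framework.

```python
import random
from collections import defaultdict


def bound_contributions(rows, max_partitions_per_user, max_rows_per_partition,
                        rng=random):
    """rows: iterable of (user_id, partition_key, value) tuples.

    Returns (partition_key, value) pairs in which every user contributes to at
    most max_partitions_per_user partitions and at most max_rows_per_partition
    rows per partition, both chosen uniformly at random.
    """
    # Global step: group every row by user, then by partition.
    by_user = defaultdict(lambda: defaultdict(list))
    for user_id, partition, value in rows:
        by_user[user_id][partition].append(value)

    bounded = []
    for partitions in by_user.values():
        keys = list(partitions)
        # Keep a random subset of this user's partitions.
        kept = rng.sample(keys, min(len(keys), max_partitions_per_user))
        for partition in kept:
            values = partitions[partition]
            # Keep a random subset of this user's rows in the partition.
            for value in rng.sample(values, min(len(values), max_rows_per_partition)):
                bounded.append((partition, value))
    return bounded
```

The bounded pairs could then be fed to per-partition aggregators configured with the same two limits. Sampling at random, rather than keeping the first rows, avoids tying the output to the input order.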

For more detail about our approach to building scalable end-to-end differential privacy frameworks, we recommend reading:

  1. Differential privacy computations in data pipelines reference doc, which describes how to build such a system using any data pipeline framework (e.g. Apache Beam).
  2. Our paper about differentially private SQL, which describes such a system. Even though the interface of Privacy on Beam is different, it conceptually uses the same framework as the one described in this paper.

Known issues

Our floating-point implementations are subject to the vulnerabilities described in Casacuberta et al. "Widespread Underestimation of Sensitivity in Differentially Private Libraries and How to Fix it" (specifically the rounding, repeated rounding, and re-ordering attacks). These vulnerabilities are particularly concerning when an attacker can control some of the contents of a dataset and/or its order. Our integer implementations are not subject to the vulnerabilities described in the paper (though note that Java does not have an integer implementation).
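The re-ordering problem can be seen with plain IEEE 754 doubles: every addition rounds, so the same values summed in a different order can give a different result, while integer addition is exact. The Python sketch below uses an explicit left-to-right loop, because since Python 3.12 the built-in sum() compensates float rounding and would hide the effect. It does not use any library code.

```python
values_in_one_order = [1e16, 1.0, -1e16]
values_in_other_order = [1e16, -1e16, 1.0]


def left_to_right_sum(values):
    total = 0.0
    for v in values:
        total += v  # each addition rounds to the nearest double
    return total


# Doubles near 1e16 are spaced 2 apart, so 1e16 + 1.0 rounds back to 1e16.
print(left_to_right_sum(values_in_one_order))    # 0.0
print(left_to_right_sum(values_in_other_order))  # 1.0

# Python ints are exact, so the order does not matter.
print(sum([10**16, 1, -10**16]) == sum([10**16, -10**16, 1]))  # True
```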

Support

We will continue to publish updates and improvements to the library. We are happy to accept contributions to this project; please follow our guidelines when sending pull requests. We will respond to issues filed in this project. If we intend to stop publishing improvements and responding to issues, we will publish a notice here at least 3 months in advance.

License

Apache License 2.0

Support Disclaimer

This is not an officially supported Google product.

Reach out

We are always keen to learn how you use this library and which use cases it helps you solve. We have two communication channels:

  • A public discussion group where we will also share our preliminary roadmap, updates, events, etc.

  • A private email alias at [email protected] where you can reach us directly about your use cases and what more we can do to help.

Please refrain from sending any personally identifiable information. If you wish to delete a message you've previously sent, please contact us.

Related projects

  • PyDP, a Python wrapper of our C++ DP building block library, driven by the OpenMined open-source community.
  • PipelineDP, an end-to-end differential privacy framework (similar to Privacy on Beam) that works with Apache Beam & Apache Spark in Python, co-developed by Google and OpenMined.
  • OpenDP, a community effort around tools for statistical analysis of sensitive private data.
  • TensorFlow Privacy, a library to train machine learning models with differential privacy.

differential-privacy's People

Contributors

acheam0, bamnet, benjamindev, brandonedmunds2, celiayz, dibakch, fbalicchia, kaelhawaty, laviniamnedelea, macjohnny, osuketh, rossy312, whimsicottmoon

differential-privacy's Issues

Build error on Windows

I receive the following error after running "bazel build differential_privacy/..."

/differential-privacy/differential_privacy/base/testing/BUILD:38:1: C++ compilation of rule '//differential_privacy/base/testing:status_matchers' failed (Exit 2)
cl : Command line error D8021 : invalid numeric argument '/Wno-sign-compare'
INFO: Elapsed time: 137.683s, Critical Path: 20.36s
INFO: 157 processes: 157 local.
FAILED: Build did NOT complete successfully

Add CMake support to the C++ library

Compiling the C++ library currently requires the use of Bazel. Most people outside Google use CMake instead. It would be good to add support for CMake.

Compile with debug

Hello, I am writing some additions to this library and would like to compile it in debug mode.

I can compile it successfully in the 'normal' mode with bazel, but when I run the build with "-c dbg" I get the following error:

bazel build differential_privacy/... --compilation_mode=dbg
Starting local Bazel server and connecting to it...
INFO: Analyzed 65 targets (55 packages loaded, 7297 targets configured).
INFO: Found 65 targets...
INFO: Deleting stale sandbox base /home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/sandbox
ERROR: /home/rgentz/differential-privacy/differential_privacy/postgres/BUILD:30:1: C++ compilation of rule '//differential_privacy/postgres:anon_func.so' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g '-std=c++0x' -MD -MF ... (remaining 66 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
differential_privacy/postgres/anon_func.cc:19:10: fatal error: postgres.h: No such file or directory
19 | #include "postgres.h"
| ^~~~~~~~~~~~
compilation terminated.
INFO: Elapsed time: 26.741s, Critical Path: 1.84s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

I do have postgres installed from source, and postgres.h is on the system:

find / -name "postgres.h" 2>/dev/null
/home/rgentz/postgresql-11.7/src/include/postgres.h
/home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/execroot/com_google_differential_privacy/bazel-out/k8-fastbuild/bin/external/postgres/copy_postgres/postgres/include/server/postgres.h
/home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/execroot/com_google_differential_privacy/bazel-out/k8-fastbuild/bin/external/postgres/postgres/include/server/postgres.h
/home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/execroot/com_google_differential_privacy/bazel-out/k8-dbg/bin/external/postgres/copy_postgres/postgres/include/server/postgres.h
/home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/execroot/com_google_differential_privacy/bazel-out/k8-dbg/bin/external/postgres/postgres/include/server/postgres.h
/home/rgentz/.cache/bazel/_bazel_rgentz/1ef760ce6fbad04dec10864523b7ab13/external/postgres/src/include/postgres.h
/tmp/tmp.jloOVUmkan/postgres/include/server/postgres.h
/tmp/tmp.E3Da4NQGti/postgres/include/server/postgres.h
/usr/local/pgsql/include/server/postgres.h

Is there some path issue? I just don't understand why it compiles successfully without debug mode...

Is the snapping mechanism in LaplaceMechanism->AddNoise implemented correctly?

Hello,
I think I found a discrepancy between the implementation of the snapping mechanism in the method

virtual double AddNoise(double result, double privacy_budget)

of the class LaplaceMechanism in the file

differential_privacy/algorithms/numerical-mechanisms.h

and the theory given in Mironov, 2012, (which is cited in the code):

The rounding is done toward 0, while Mironov rounds to the closest multiple of nearest_power, with ties resolved toward +infinity.

This could (not very elegantly) be achieved by

virtual double AddNoise(double result, double privacy_budget) {
  CHECK_GT(privacy_budget, 0);
  // Implements the snapping mechanism defined by
  // Mironov (2012, "On Significance of the Least Significant Bits For
  // Differential Privacy").
  double noise = distro_->Sample(1.0 / privacy_budget);
  double noised_result =
      Clamp<double>(LowerBound<double>(), UpperBound<double>(), result) +
      noise;
  double nearest_power = GetNextPowerOfTwo(diversity_ / privacy_budget);
  // fmod keeps the sign of noised_result, so subtracting the remainder
  // truncates toward zero (the current behavior).
  double remainder =
      (nearest_power == 0.0) ? 0.0 : fmod(noised_result, nearest_power);
  double rounded_result = noised_result - remainder;
  // Move to the nearest multiple of nearest_power instead; a remainder of
  // exactly +/- nearest_power / 2 ends up rounded toward +infinity.
  if (remainder >= nearest_power / 2) {
    rounded_result += nearest_power;
  }
  if (remainder < -nearest_power / 2) {
    rounded_result -= nearest_power;
  }
  return ClampDouble<double>(LowerBound<double>(), UpperBound<double>(),
                             rounded_result);
}

But maybe I am overlooking some detail of the implementation :-)

Regards,

Sebastian.
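To make the discrepancy concrete, the Python sketch below models only the rounding step on a grid of width Λ (nearest_power in the code above): truncation toward zero, which is what subtracting fmod(noised_result, nearest_power) does, versus rounding to the nearest multiple with ties toward +infinity, following the adjustment proposed in the issue. Noise generation and clamping are omitted, and the helper names are hypothetical.

```python
import math


def round_toward_zero(x: float, grid: float) -> float:
    # Subtracting fmod(x, grid): the remainder keeps the sign of x,
    # so the result is truncated toward zero.
    return x - math.fmod(x, grid)


def round_to_nearest_ties_up(x: float, grid: float) -> float:
    # The adjustment proposed in the issue: move to the nearest multiple of
    # grid; an exact tie (remainder of +/- grid / 2) goes to the larger one.
    remainder = math.fmod(x, grid)
    rounded = x - remainder
    if remainder >= grid / 2:
        rounded += grid
    elif remainder < -grid / 2:
        rounded -= grid
    return rounded


for x in (2.75, 2.5, 2.25, -2.25, -2.5, -2.75):
    print(x, round_toward_zero(x, 1.0), round_to_nearest_ties_up(x, 1.0))
```

With Λ = 1.0, for example, 2.75 truncates to 2.0 but rounds to 3.0, and the tie -2.5 goes to -2.0 (toward +infinity) rather than -3.0.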

Difference between anon function results and normal function results: the anon function gives 0, whereas the normal function result has a much larger magnitude (nowhere near 0)

The anon function returns 0, whereas the normal function returns a value of much larger magnitude (for example: 20, 30, -25, -40).

Example: difference between anon function and normal function results
anon function : anon_F(D)-> 0   
normal function : F(D) -> 20 

Providing differentially private aggregated data with a difference like the one above to an end user might mislead them in their analysis.

Is there a way to handle this?

When `epsilon==0`, `PrivacyLossDistribution.from_privacy_parameters()` fails.

If epsilon == 0 (or small compared to value_discretization_interval), rounded_probability_mass_function is assigned a dict literal containing the same key twice, so the second entry overwrites the first (Python arguably should not do that silently).

rounded_probability_mass_function = {
    math.ceil(epsilon / value_discretization_interval):
        (1 - delta) / (1 + math.exp(-epsilon)),
    math.ceil(-epsilon / value_discretization_interval):
        (1 - delta) / (1 + math.exp(epsilon))
}
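The overwrite is easy to reproduce without the accounting library. The Python sketch below copies the dict literal from the issue into a standalone function and contrasts it with one possible fix that adds up the probability mass of colliding keys; naive_pmf and accumulated_pmf are illustration names, not functions in the library.

```python
import math
from collections import defaultdict


def naive_pmf(epsilon, delta, value_discretization_interval):
    # Same dict literal as quoted in the issue: a duplicate key silently
    # replaces the earlier entry.
    return {
        math.ceil(epsilon / value_discretization_interval):
            (1 - delta) / (1 + math.exp(-epsilon)),
        math.ceil(-epsilon / value_discretization_interval):
            (1 - delta) / (1 + math.exp(epsilon)),
    }


def accumulated_pmf(epsilon, delta, value_discretization_interval):
    # Hypothetical fix: add the masses of colliding keys instead of overwriting.
    pmf = defaultdict(float)
    pmf[math.ceil(epsilon / value_discretization_interval)] += (
        (1 - delta) / (1 + math.exp(-epsilon)))
    pmf[math.ceil(-epsilon / value_discretization_interval)] += (
        (1 - delta) / (1 + math.exp(epsilon)))
    return dict(pmf)
```

With epsilon = 0 both keys are math.ceil(0.0) == 0, so naive_pmf(0.0, 0.0, 1e-4) keeps a single entry of mass 0.5 and silently loses the other half, while accumulated_pmf(0.0, 0.0, 1e-4) returns {0: 1.0}.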

Error when attempting to add to conda-forge

I'm attempting to add this as a package to conda-forge but am getting an error that I'm unsure how to fix.

I download the source from https://github.com/google/differential-privacy/archive/master.tar.gz

Then a script executes bazel build differential_privacy/..., which returns this error:

[start of symlink cycle]
/home/conda/staged-recipes/build_artifacts/google-differential-privacy_1571944966203/_build_env/share/terminfo/N/NCR260VT300WPP
[end of symlink cycle]
Internal error thrown during build. Printing stack trace: java.lang.IllegalStateException: //differential_privacy/algorithms:bounded-algorithm_test BuildConfigurationValue.Key[bb189304b1c2d885f4d7b75c97be3b24] false -> ErrorInfo{exception=com.google.devtools.build.lib.skyframe.FileSymlinkCycleException: Symlink cycle, rootCauses={FILE:[/home/conda/.cache/bazel/_bazel_conda/09cbb6ed10102e8e294ff82d8a8f8c67/external/embedded_jdk]/[lib/terminfo/N/ncr260vt300wpp]}, cycles=[], isCatastrophic=false, rootCauseOfException=FILE:[/home/conda/.cache/bazel/_bazel_conda/09cbb6ed10102e8e294ff82d8a8f8c67/external/embedded_jdk]/[lib/terminfo/N/ncr260vt300wpp], isDirectlyTransient=false, isTransitivelyTransient=false}
	at com.google.common.base.Preconditions.checkState(Preconditions.java:823)
	at com.google.devtools.build.lib.skyframe.SkyframeBuildView.assertSaneAnalysisError(SkyframeBuildView.java:548)
	at com.google.devtools.build.lib.skyframe.SkyframeBuildView.configureTargets(SkyframeBuildView.java:365)
	at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:368)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.runAnalysisPhase(AnalysisPhaseRunner.java:212)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.execute(AnalysisPhaseRunner.java:121)
	at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:143)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:253)
	at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:83)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:482)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:204)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:750)
	at com.google.devtools.build.lib.server.GrpcServerImpl.access$1600(GrpcServerImpl.java:103)
	at com.google.devtools.build.lib.server.GrpcServerImpl$2.lambda$run$0(GrpcServerImpl.java:819)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

INFO: Elapsed time: 33.934s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (48 packages loaded, 1191 targets configured)
Traceback (most recent call last):
  File "/home/conda/.ci_support/build_all.py", line 134, in <module>
    build_all(args.recipes_dir, args.arch)
  File "/home/conda/.ci_support/build_all.py", line 68, in build_all
    build_folders(recipes_dir, new_comp_folders, arch, channel_urls)
  File "/home/conda/.ci_support/build_all.py", line 124, in build_folders
    conda_build.api.build([recipe], config=get_config(arch, channel_urls))
  File "/opt/conda/lib/python3.7/site-packages/conda_build/api.py", line 209, in build
    notest=notest, need_source_download=need_source_download, variants=variants)
  File "/opt/conda/lib/python3.7/site-packages/conda_build/build.py", line 2339, in build_tree
    notest=notest,
  File "/opt/conda/lib/python3.7/site-packages/conda_build/build.py", line 1487, in build
    cwd=src_dir, stats=build_stats)
  File "/opt/conda/lib/python3.7/site-packages/conda_build/utils.py", line 399, in check_call_env
    return _func_defaulting_env_to_os_environ('call', *popenargs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/conda_build/utils.py", line 379, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-e', '/home/conda/staged-recipes/build_artifacts/google-differential-privacy_1571944966203/work/conda_build.sh']' returned non-zero exit status 37.

I'm not sure which is causing the error: conda-forge, bazel, or differential-privacy.
Please advise if you've encountered this error before when building.

Suggested README updates to cover some postgres build pitfalls

Thanks for open sourcing this library!
As a newcomer to bazel, I've spent quite a while getting it to build and then getting CREATE EXTENSION anon_func to work, and I think highlighting two things in the README would help:

In the main README:

You may need to update the paths in differential-privacy/postgres/BUILD depending on the system building the binary (e.g., k8-fastbuild may need to be changed to darwin-fastbuild).

In postgres/README:

The install_extension script assumes postgres is configured to look for extensions in the directory $PG_DIR/share/extension/, and that its $libdir is $PG_DIR/lib/. If you have a pre-existing installation of postgres, your configuration may not fit these path patterns.

If CREATE EXTENSION anon_func; fails with an error such as could not open extension control file "/usr/local/share/postgresql/extension/anon_func.control": No such file or directory, you will need to move the extensions to that location manually, e.g.:

mv $PG_DIR/share/extension/anon_func* /usr/local/share/postgresql/extension/

To find where the anon_func.so file should go, run pg_config --pkglibdir and move the file there, e.g.:

mv $PG_DIR/lib/anon_func.so "$(pg_config --pkglibdir)"

[Alternatively you might consider generalizing the script to handle this case.]
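The manual steps above could also be scripted. The Python sketch below asks pg_config for the directories PostgreSQL actually uses (--sharedir, whose extension/ subdirectory holds control and SQL script files, and --pkglibdir for shared libraries) and moves the built files there. It assumes the layout described above ($PG_DIR/share/extension/anon_func* and $PG_DIR/lib/anon_func.so); plan_moves and install are hypothetical names, and this is not the repository's install_extension script.

```python
import glob
import os
import shutil
import subprocess


def pg_config(flag: str) -> str:
    """Return the output of `pg_config <flag>`, e.g. --sharedir or --pkglibdir."""
    return subprocess.run(["pg_config", flag], check=True,
                          capture_output=True, text=True).stdout.strip()


def plan_moves(pg_dir, sharedir, pkglibdir, files):
    """Map each built file (path relative to pg_dir) to its destination directory."""
    moves = {}
    for rel in files:
        src = os.path.join(pg_dir, rel)
        if rel.endswith(".so"):
            moves[src] = pkglibdir  # shared library goes to $libdir
        else:
            # Control and SQL script files go to SHAREDIR/extension.
            moves[src] = os.path.join(sharedir, "extension")
    return moves


def install(pg_dir):
    pattern = os.path.join(pg_dir, "share", "extension", "anon_func*")
    files = [os.path.relpath(p, pg_dir) for p in glob.glob(pattern)]
    files.append(os.path.join("lib", "anon_func.so"))
    moves = plan_moves(pg_dir, pg_config("--sharedir"),
                       pg_config("--pkglibdir"), files)
    for src, dest_dir in moves.items():
        shutil.move(src, dest_dir)  # may need root, like the mv commands above
```

Separating plan_moves from install makes it possible to print the planned moves and check them before touching a live PostgreSQL installation.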

[Question] Generate a differentially private release of a dataset?

Hi,

I can see the examples for different statistics queries.
I am wondering if the Google DP library could be used to generate a differentially private release of a dataset, i.e., to transform the original dataset into an "anonymized" one.
Is this planned for the future maybe?

Thanks,
Toni

Improving comments in privacy-critical code blocks.

The lack of comments makes this code harder to audit. Consider https://github.com/google/differential-privacy/blob/master/differential_privacy/algorithms/rand.cc

It is not clear from reading the code what SecureURBG::RefreshCache does. Is it that you are computing a whole buffer of randomness at once and then feeding it out byte-by-byte? That is not a cache, then, but a buffer.

In SecureURBG::result_type SecureURBG::operator()(), it is not clear why old_index is copied from current_index before the memcpy operation:

SecureURBG::result_type SecureURBG::operator()() {
  absl::WriterMutexLock lock(&mutex_);
  if (current_index_ + sizeof(result_type) > kCacheSize) {
    RefreshCache();
  }
  int old_index = current_index_;
  current_index_ += sizeof(result_type);
  result_type result;
  std::memcpy(&result, cache_ + old_index, sizeof(result_type));
  return result;
}

Why the extra copy, and not simply copy it like this:

SecureURBG::result_type SecureURBG::operator()() {
  absl::WriterMutexLock lock(&mutex_);
  if (current_index_ + sizeof(result_type) > kCacheSize) {
    RefreshCache();
  }
  result_type result;
  std::memcpy(&result, cache_ + current_index_, sizeof(result_type));
  current_index_ += sizeof(result_type);
  return result;
}

The programmer's intent is not clear from the C++ code, so it would be useful to have comments to explain it. It would also be nice to know what URBG stands for.
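(In C++ standard-library terminology, a URBG is a uniform random bit generator, the named requirement that distributions such as std::uniform_int_distribution accept.) The buffering pattern in the quoted code can be sketched in Python to make the index bookkeeping explicit: fill a fixed-size buffer from a secure source, hand out fixed-width chunks, and refill when fewer than a full chunk remains. BufferedBitSource, BUFFER_SIZE and the injectable source are hypothetical, with os.urandom standing in for the secure source. Because the lock is held for the whole call, reading before advancing the index (as in the second version above) returns the same bytes as the old_index version.

```python
import os
import threading

BUFFER_SIZE = 64   # bytes; hypothetical, far smaller than a real cache
RESULT_BYTES = 8   # width of one result, like a 64-bit result_type


class BufferedBitSource:
    def __init__(self, source=os.urandom):
        self._source = source  # callable(n) -> n random bytes
        self._lock = threading.Lock()
        self._refill()

    def _refill(self):
        # Fetch a whole buffer of randomness at once.
        self._buffer = self._source(BUFFER_SIZE)
        self._index = 0

    def __call__(self) -> int:
        with self._lock:
            # Refill if the remaining bytes cannot form a whole result.
            if self._index + RESULT_BYTES > BUFFER_SIZE:
                self._refill()
            chunk = self._buffer[self._index:self._index + RESULT_BYTES]
            self._index += RESULT_BYTES  # never hand out the same bytes twice
            return int.from_bytes(chunk, "little")
```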

C++ Proposal: Remove privacy_budget parameter

This is a proposed change to the C++ building blocks library. I'm posting it here to solicit feedback and suggestions before we make a final call on implementing it.

Currently all Algorithms offer the ability to get a result while only spending part of an internal "privacy budget." This is implemented by using a user-specified fraction of the Algorithm's epsilon for each calculation, and tracking how much epsilon remains. In our experience this functionality doesn't get much use, and it adds the extra complexity of tracking each internal "privacy budget." It's also just plain not that useful: anyone who wants to track overall expenditure of privacy loss budget will need to do extra work as soon as they're using more than one Algorithm.

As a simplification, we'd like to remove the ability to specify a privacy budget fraction when getting a result. Algorithms will only be able to return a single result for a set of input, and will have to be reset before being used again.
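Under the proposal, an Algorithm spends its whole epsilon on one result and must be reset before reuse. A hypothetical Python sketch of that contract for a count follows; the class and method names are illustrative, not the C++ library's API, and the noise uses Python's non-secure random module. It assumes each user adds at most one entry, so the count's sensitivity is 1.

```python
import random


class SingleUseCount:
    """Counts entries and releases exactly one noisy result until reset()."""

    def __init__(self, epsilon: float, rng=random):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self._epsilon = epsilon
        self._rng = rng
        self.reset()

    def add_entry(self):
        self._count += 1

    def result(self) -> float:
        # The whole epsilon is spent here, so a second call is refused.
        if self._consumed:
            raise RuntimeError("budget already spent; call reset() first")
        self._consumed = True
        b = 1.0 / self._epsilon  # Laplace scale for sensitivity 1 (assumed)
        return (self._count + self._rng.expovariate(1 / b)
                - self._rng.expovariate(1 / b))

    def reset(self):
        # Clears the data as well as the spent flag.
        self._count = 0
        self._consumed = False
```

Compared with per-call budget fractions, the only state to track is one boolean, and anyone accounting across several Algorithms simply adds up their epsilons.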

[privacy-on-beam] bazel + gazelle import problems

As a Go developer, I expect to be able to take a working directory using go.mod and run commands like the following to generate a valid BUILD.bazel file and update my WORKSPACE to pull in the correct dependencies.

bazel run //:gazelle -- update-repos -from_file=go.mod
bazel run //:gazelle -- update .

When you try this with code that depends on pbeam, bazel build fails with errors:

$ bazel build :pipeline
DEBUG: /home/bamnet/.cache/bazel/_bazel_bamnet/4e37c13c4fe56234ff0794b9b267f671/external/bazel_gazelle/internal/go_repository.bzl:184:18: com_github_apache_beam: gazelle: /home/bamnet/.cache/bazel/_bazel_bamnet/4e37c13c4fe56234ff0794b9b267f671/external/com_github_apache_beam/model/job-management/src/main/proto: directory contains multiple proto packages. Gazelle can only generate a proto_library for one package.
gazelle: /home/bamnet/.cache/bazel/_bazel_bamnet/4e37c13c4fe56234ff0794b9b267f671/external/com_github_apache_beam/runners/google-cloud-dataflow-java/worker/windmill/src/main/proto: directory contains multiple proto packages. Gazelle can only generate a proto_library for one package.
gazelle: /home/bamnet/.cache/bazel/_bazel_bamnet/4e37c13c4fe56234ff0794b9b267f671/external/com_github_apache_beam/sdks/java/extensions/protobuf/src/test/proto: directory contains multiple proto packages. Gazelle can only generate a proto_library for one package.
ERROR: /home/bamnet/.cache/bazel/_bazel_bamnet/4e37c13c4fe56234ff0794b9b267f671/external/com_github_google_differential_privacy_privacy_on_beam/pbeam/BUILD.bazel:23:11: no such package '@com_google_go_differential_privacy//noise': The repository '@com_google_go_differential_privacy' could not be resolved and referenced by '@com_github_google_differential_privacy_privacy_on_beam//pbeam:go_default_library'
ERROR: Analysis of target ':pipeline' failed; build aborted: Analysis failed
INFO: Elapsed time: 7.471s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (34 packages loaded, 26 targets configured)

I think the problem is the reference to '@com_google_go_differential_privacy' specified in pbeam's Bazel file, but I am still investigating.

How to set up bounds and epsilon in the Anonymous Functions PostgreSQL extension?

@celiayz When I run the example query below, taken from the GitHub page, I get the following INFO message: Bin count threshold was too large to find approximate bounds. Either run over a larger dataset or decrease success_probability and try again. Returning NULL.

SELECT result.fruit, result.number_eaten
FROM (
  SELECT per_person.fruit,
    ANON_SUM(per_person.fruit_count, LN(3)/2) as number_eaten,
    ANON_COUNT(uid, LN(3)/2) as number_eaters
    FROM(
      SELECT * , ROW_NUMBER() OVER (
        PARTITION BY uid
        ORDER BY random()
      ) as row_num
      FROM (
        SELECT fruit, uid, COUNT(fruit) as fruit_count
        FROM FruitEaten
        GROUP BY fruit, uid
      ) as per_person_raw
    ) as per_person
  WHERE per_person.row_num <= 5
  GROUP BY per_person.fruit
) as result
WHERE result.number_eaters > 50;


Could someone give me some intuition on how to set bounds and epsilon?

Thanks
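As rough intuition only, assuming basic sequential composition and looking at a single fruit partition, the epsilons passed to the two aggregate functions in the query add up:

```latex
\varepsilon_{\text{total}}
  = \varepsilon_{\texttt{ANON\_SUM}} + \varepsilon_{\texttt{ANON\_COUNT}}
  = \frac{\ln 3}{2} + \frac{\ln 3}{2}
  = \ln 3 \approx 1.099
```

Under ε-differential privacy, this means adding or removing one person changes the probability of any output for that partition by a factor of at most e^(ln 3) = 3; a smaller epsilon means more noise. A person can appear in up to five fruit partitions (the row_num <= 5 filter), so the person-level guarantee may be weaker than this single-partition figure unless the extension accounts for it. The INFO message above appears to come from the automatic bounds approximation that ANON_SUM runs when no explicit bounds are given.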

Generate First Release

Hello there,
Would it be possible for you to generate a release for differential-privacy? Thanks!

A small typo

Typo in differential-privacy/cc/testing/README.md

Wrong: bazel test testing:stochastic_test_test.cc
Correct: bazel test testing:stochastic_tester_test.cc
:shipit:

Build error on differential-privacy/cc on ubuntu18.04

Do you have any suggestions for fixing the following error?

cd cc
bazel build //...

INFO: Analyzed 56 targets (0 packages loaded, 0 targets configured).
INFO: Found 56 targets...
ERROR: /home/username/differential-privacy/cc/algorithms/BUILD:37:8: C++ compilation of rule '//algorithms:algorithm_test' failed (Exit 1) gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 76 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from ./algorithms/algorithm.h:27,
from algorithms/algorithm_test.cc:17:
./algorithms/numerical-mechanisms.h: In member function 'double differential_privacy::GaussianMechanism::CalculateDelta(double, double)':
./algorithms/numerical-mechanisms.h:517:9: error: 'isinf' was not declared in this scope
if (isinf(b)) {
^~~~~
./algorithms/numerical-mechanisms.h:517:9: note: suggested alternative:
In file included from external/com_google_absl/absl/hash/internal/hash.h:24,
from external/com_google_absl/absl/hash/hash.h:73,
from external/com_google_absl/absl/container/internal/hash_function_defaults.h:55,
from external/com_google_absl/absl/container/node_hash_map.h:45,
from ./base/status.h:26,
from ./algorithms/algorithm.h:26,
from algorithms/algorithm_test.cc:17:
/usr/include/c++/8/cmath:605:5: note: 'std::isinf'
isinf(Tp x)
^~~~~
In file included from ./algorithms/bounded-standard-deviation.h:26,
from postgres/dp_func.cc:21:
./algorithms/bounded-variance.h: In instantiation of 'differential_privacy::BoundedVariance<T, >::BoundedVariance(double, T, T, double, double, std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> >, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::ApproxBounds >) [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0]':
./algorithms/bounded-variance.h:135:31: required from 'differential_privacy::base::StatusOr<std::unique_ptr<differential_privacy::BoundedVariance > > differential_privacy::BoundedVariance<T, >::Builder::BuildAlgorithm() [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0; std::enable_if_t<std::is_arithmetic<Tp>::value> = void]'
./algorithms/bounded-variance.h:93:57: required from here
./algorithms/bounded-variance.h:481:16: warning: 'differential_privacy::BoundedVariance<double, 0>::linf_sensitivity' will be initialized after [-Wreorder]
const double linf_sensitivity;
^~~~~~~~~~~~~~~~~
./algorithms/bounded-variance.h:479:46: warning: 'std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> > differential_privacy::BoundedVariance<double, 0>::mechanism_builder' [-Wreorder]
std::unique_ptr<LaplaceMechanism::Builder> mechanism_builder;
^~~~~~~~~~~~~~~~~~
./algorithms/bounded-variance.h:276:3: warning: when initialized here [-Wreorder]
BoundedVariance(const double epsilon, const T lower, const T upper,
^~~~~~~~~~~~~~~
In file included from postgres/dp_func.cc:20:
./algorithms/bounded-mean.h: In instantiation of 'differential_privacy::BoundedMean<T, >::BoundedMean(double, T, T, double, double, std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> >, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::ApproxBounds >) [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0]':
./algorithms/bounded-mean.h:107:11: required from 'differential_privacy::base::StatusOr<std::unique_ptr<differential_privacy::BoundedMean > > differential_privacy::BoundedMean<T, >::Builder::BuildAlgorithm() [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0; std::enable_if_t<std::is_arithmetic<Tp>::value> = void]'
./algorithms/bounded-mean.h:72:53: required from here
./algorithms/bounded-mean.h:325:16: warning: 'differential_privacy::BoundedMean<double, 0>::linf_sensitivity' will be initialized after [-Wreorder]
const double linf_sensitivity;
^~~~~~~~~~~~~~~~~
./algorithms/bounded-mean.h:323:46: warning: 'std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> > differential_privacy::BoundedMean<double, 0>::mechanism_builder' [-Wreorder]
std::unique_ptr<LaplaceMechanism::Builder> mechanism_builder;
^~~~~~~~~~~~~~~~~~
./algorithms/bounded-mean.h:210:3: warning: when initialized here [-Wreorder]
BoundedMean(const double epsilon, T lower, T upper,
^~~~~~~~~~~
In file included from postgres/dp_func.cc:22:
./algorithms/bounded-sum.h: In instantiation of 'differential_privacy::BoundedSum<T, >::BoundedSum(double, T, T, double, double, std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> >, std::unique_ptr<differential_privacy::NumericalMechanism>, std::unique_ptr<differential_privacy::ApproxBounds >) [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0]':
./algorithms/bounded-sum.h:82:11: required from 'differential_privacy::base::StatusOr<std::unique_ptr<differential_privacy::BoundedSum > > differential_privacy::BoundedSum<T, >::Builder::BuildAlgorithm() [with T = double; std::enable_if_t<std::is_arithmetic<Tp>::value>* = 0; std::enable_if_t<std::is_arithmetic<Tp>::value> = void]'
./algorithms/bounded-sum.h:60:52: required from here
./algorithms/bounded-sum.h:335:16: warning: 'differential_privacy::BoundedSum<double, 0>::linf_sensitivity' will be initialized after [-Wreorder]
const double linf_sensitivity;
^~~~~~~~~~~~~~~~~
./algorithms/bounded-sum.h:333:46: warning: 'std::unique_ptr<differential_privacy::LaplaceMechanism::Builder, std::default_delete<differential_privacy::LaplaceMechanism::Builder> > differential_privacy::BoundedSum<double, 0>::mechanism_builder' [-Wreorder]
std::unique_ptr<LaplaceMechanism::Builder> mechanism_builder;
^~~~~~~~~~~~~~~~~~
./algorithms/bounded-sum.h:199:3: warning: when initialized here [-Wreorder]
BoundedSum(double epsilon, T lower, T upper, const double l0_sensitivity,
^~~~~~~~~~
INFO: Elapsed time: 3.807s, Critical Path: 3.63s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

Use DP library in make project

Hello,
I am sorry to bother you with this question, but I have been struggling with it for the past few days and have not found a solution yet.

I want to use this differential privacy C++ library in a larger project that relies on make.

My first idea was to create a .so or .a file, since those are mostly self-contained. But apparently this is not possible, as all the important code is contained in .h files. Those do not produce .o files, so from my understanding no .so or .a files can be produced.
Simply copying all files from the library into the new make project and including them does not work because of the dependencies, e.g. Abseil and protobuf.

Does anyone have a suggestion on how this could (or should) be done? I do not want to switch my whole project to Bazel just because a single dependency relies on it.

Edit: It turns out it can be included as source, just like any other third-party C++ library. I really confused myself trying to create a .so file, which is not possible.

Build failure on Arch Linux

I followed the build instructions and encountered a build failure.

~/differential-privacy/cc main ❯ bazel build "..."                                                            13:08:04
INFO: Analyzed 57 targets (58 packages loaded, 7301 targets configured).
INFO: Found 57 targets...
INFO: From Compiling com_google_protobuf/src/google/protobuf/generated_message_reflection.cc [for host]:
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc: In member function 'void google::protobuf::Reflection::SwapOneofField(google::protobuf::Message*, google::protobuf::Message*, const google::protobuf::OneofDescriptor*) const':
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:2131:37: warning: 'temp_float' may be used uninitialized in this function [-Wmaybe-uninitialized]
 2131 |   *MutableRaw<Type>(message, field) = value;
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:494:9: note: 'temp_float' was declared here
  494 |   float temp_float;
      |         ^~~~~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:2131:37: warning: 'temp_uint64' may be used uninitialized in this function [-Wmaybe-uninitialized]
 2131 |   *MutableRaw<Type>(message, field) = value;
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:493:10: note: 'temp_uint64' was declared here
  493 |   uint64 temp_uint64;
      |          ^~~~~~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:2131:37: warning: 'temp_uint32' may be used uninitialized in this function [-Wmaybe-uninitialized]
 2131 |   *MutableRaw<Type>(message, field) = value;
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:492:10: note: 'temp_uint32' was declared here
  492 |   uint32 temp_uint32;
      |          ^~~~~~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:2131:37: warning: 'temp_int64' may be used uninitialized in this function [-Wmaybe-uninitialized]
 2131 |   *MutableRaw<Type>(message, field) = value;
      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
external/com_google_protobuf/src/google/protobuf/generated_message_reflection.cc:491:9: note: 'temp_int64' was declared here
  491 |   int64 temp_int64;
      |         ^~~~~~~~~~
ERROR: /home/shpark/differential-privacy/cc/algorithms/BUILD:294:11: C++ compilation of rule '//algorithms:util' failed (Exit 1): gcc failed: error executing command /opt/cuda/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 22 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox gcc failed: error executing command /opt/cuda/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-std=c++0x' -MD -MF ... (remaining 22 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
algorithms/util.cc: In function 'absl::lts_2020_09_23::Status differential_privacy::ValidateIsSet(std::optional<double>, absl::lts_2020_09_23::string_view, absl::lts_2020_09_23::StatusCode)':
algorithms/util.cc:125:7: error: 'isnan' was not declared in this scope; did you mean 'std::isnan'?
  125 |   if (isnan(d)) {
      |       ^~~~~
      |       std::isnan
In file included from ./algorithms/util.h:21,
                 from algorithms/util.cc:17:
/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/cmath:632:5: note: 'std::isnan' declared here
  632 |     isnan(_Tp __x)
      |     ^~~~~
INFO: Elapsed time: 100.222s, Critical Path: 18.31s
INFO: 595 processes: 107 internal, 488 linux-sandbox.
FAILED: Build did NOT complete successfully

GCC version is 10.2.0, and bazel version is 3.7.0.

After replacing isnan with std::isnan, build succeeded without problems.

JAR release to Maven Central

Hello,

It would be nice if the Java library were available for download from Maven Central. Using the library currently requires downloading the source code and building it with Bazel. Having a JAR on Maven Central would make it much easier to install.

ctx has no default value error

I tried to build differential_privacy locally but received the following error:

INFO: Invocation ID: cb40f9d2-f60a-4725-bce4-9ec138569042
ERROR: /private/var/tmp/_bazel_mwilson/1c7c4b22e8073d90fabc5b69394571b5/external/postgres/BUILD.bazel:30:1: in configure_make rule @postgres//:postgres:
Traceback (most recent call last):
	File "/private/var/tmp/_bazel_mwilson/1c7c4b22e8073d90fabc5b69394571b5/external/postgres/BUILD.bazel", line 30
		configure_make(name = 'postgres')
	File "/private/var/tmp/_bazel_mwilson/1c7c4b22e8073d90fabc5b69394571b5/external/rules_foreign_cc/tools/build_defs/configure.bzl", line 29, in _configure_make
		cc_external_rule_impl(ctx, attrs)
	File "/private/var/tmp/_bazel_mwilson/1c7c4b22e8073d90fabc5b69394571b5/external/rules_foreign_cc/tools/build_defs/framework.bzl", line 209, in cc_external_rule_impl
		_define_out_cc_info(ctx, attrs, inputs, outputs)
	File "/private/var/tmp/_bazel_mwilson/1c7c4b22e8073d90fabc5b69394571b5/external/rules_foreign_cc/tools/build_defs/framework.bzl", line 636, in _define_out_cc_info
		cc_common.create_compilation_context(headers = depset([outputs.out_in...]), <4 more arguments>)
parameter 'ctx' has no default value, in method call create_compilation_context(depset headers, depset system_includes, depset includes, depset quote_includes, depset defines) of 'cc_common'
ERROR: Analysis of target '//differential_privacy/postgres:anon_func.so' failed; build aborted: Analysis of target '@postgres//:postgres' failed; build aborted
INFO: Elapsed time: 0.200s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)

Steps to reproduce:
Download .zip from master branch
In parent dir, $ bazel build differential_privacy/...

Other info:

$ bazel version
INFO: Invocation ID: 8616bdf6-873e-4dcc-bfb4-033604013b14
Build label: 0.20.0- (@non-git)
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sun Dec 2 07:51:13 2018 (1543737073)
Build timestamp: 1543737073
Build timestamp as int: 1543737073

Broken links to "Privacy Loss Distribution tool"

Dear Google differential privacy team,

I just noticed that the link in the README.md to your "Privacy Loss Distribution tool" repository is broken. The current link to that repository is

https://github.com/google/differential-privacy/tree/main/python/dp_accounting

I believe the problem stems from the library having been moved to a new path, as the same inconsistency occurs in the Google blog entry

https://opensource.googleblog.com/2020/06/expanding-our-differential-privacy.html

and the PDF document

https://raw.githubusercontent.com/google/differential-privacy/main/common_docs/Privacy_Loss_Distributions.pdf

It would be great if you could fix that.

Best wishes,
Esfandiar

Expose RNG or Seeded mechanisms outside of `-testing`

Problem Description

Downstream libraries or academic researchers using differential-privacy may want to implement deterministic tests or substitute an alternative source of randomness, e.g., an external random source.

However, RNG and seed setting only appear to be implemented within algorithms::numerical-mechanisms-testing. These classes are not designed for external use and cannot be linked against in some contexts (e.g., bzl testonly).

Proposed Solution

  1. Create a new RNGNumericalMechanism abstract class and non-testing RNGNumericalMechanismBuilder
  2. Implement non-testing RNGXYZMechanism and, optionally, SeededXYZMechanism as convenience
  3. Re-implement test SeededXYZMechanism classes with non-testing mechanisms
  4. Optionally implement warning/notice for use
  5. Implement safe downstream deterministic test cases, e.g., in PyDP

Alternative

Rely on asymptotic test cases or non-reproducible examples/research.

Out of memory with ZetaSQL

Hello all,

I'm just playing with the DP support in ZetaSQL to prototype a kind of wrapper to a SQL data warehouse with DP.

I'm quickly hitting out of memory errors with datasets of just one column and more than 200k rows.

ERROR: RESOURCE_EXHAUSTED: Out of memory: requested 752 bytes but only 27 are available out of a total of 134217728

I'm running ZetaSQL in a Docker container with 16 GB of RAM. Is there a configuration option I can tune to allow it to use more RAM? The limit seems to be set very low.

Thank you.

Support bazel 4.x.x

In #70, bazel was pinned to version 3.7.2 because of challenges with some dependencies.

The recently released fully-homomorphic-encryption library requires bazel 4.0.0, which means I now need two competing bazel installations to experiment with Google's privacy building blocks. It would be great if this differential-privacy library could catch up so I could blow away the old bazel version.

Confusing fragment in the java example

In the java example it reads:

Next, we will demonstrate how to use the library in scenarios where:

  • (...)
  • Visitors can contribute to a partition multiple times.

I believe the example assumes throughout that each visitor contributes at most once to a partition (.maxContributionsPerPartition is never used in the example).

Support for RDRAND

It appears that this project uses openssl/rand.h as its randomness source. Unfortunately, OpenSSL's random number generator is frequently not sufficient for providing privacy protection for many statistics. In our analysis, we have found that the only secure way to get private random numbers is to call the RDRAND instruction directly, or to use a pluggable random number generator that supports RDRAND.

You can read more about the issue of OpenSSL and RDRAND here.

So it would be useful if you could support RDRAND directly.

ps: I just discovered this project. Congrats on putting it out!

Postgres install_extension.sh not working

Hi, I tried to follow the steps in cc/postgres, but the install_extension.sh script doesn't seem to work. I think this is because DP_DIR is wrong (was the path differential_privacy/postgres before?).

I currently get this error

$ ./cc/postgres/install_extension.sh
Currently set postgres directory: /usr/local/pgsql
ERROR: Skipping 'differential_privacy/postgres:anon_func.so': no such package 'differential_privacy/postgres': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /home/anubhav/cs860/differential-privacy/differential_privacy/postgres
WARNING: Target pattern parsing failed.
ERROR: no such package 'differential_privacy/postgres': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /home/anubhav/cs860/differential-privacy/differential_privacy/postgres
INFO: Elapsed time: 0.094s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
/usr/bin/install: cannot stat 'bazel-bin/differential_privacy/postgres/anon_func.so': No such file or directory
/usr/bin/install: cannot stat 'differential_privacy/postgres/anon_func.control': No such file or directory
/usr/bin/install: cannot stat 'differential_privacy/postgres/anon_func--1.0.0.sql': No such file or directory

I tried changing DP_DIR to cc/postgres and now I get the following error, which I don't know how to resolve.

$ ./cc/postgres/install_extension.sh
Currently set postgres directory: /usr/local/pgsql
ERROR: /home/anubhav/cs860/differential-privacy/cc/postgres/BUILD:30:10: no such package '@postgres//': The repository '@postgres' could not be resolved and referenced by '//cc/postgres:anon_func.so'
ERROR: Analysis of target '//cc/postgres:anon_func.so' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.098s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
/usr/bin/install: cannot stat 'bazel-bin/cc/postgres/anon_func.so': No such file or directory
/usr/bin/install: cannot create regular file '/usr/local/pgsql/share/extension/anon_func.control': Permission denied
/usr/bin/install: cannot create regular file '/usr/local/pgsql/share/extension/anon_func--1.0.0.sql': Permission denied

Build error on macos

I get the following error when building on macOS Mojave:

error: use of undeclared identifier 'SYNC_FILE_RANGE_WRITE'
        (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
                                         ^
1 warning and 1 error generated.
make[2]: *** [file_utils.o] Error 1
make[1]: *** [all-common-recurse] Error 2
make: *** [all-src-recurse] Error 2

_____ END BUILD LOGS _____
rules_foreign_cc: Build script location: bazel-out/darwin-fastbuild/bin/external/postgres/postgres/logs/Configure_script.sh
rules_foreign_cc: Build log location: bazel-out/darwin-fastbuild/bin/external/postgres/postgres/logs/Configure.log

INFO: Elapsed time: 58.778s, Critical Path: 57.91s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

Any ideas?

Bounded Functions fail when large dataset given

When I use BoundedMean<int> and pass it a large dataset, the int sum overflows and the output is the lower bound.

How to reproduce:

#include <vector>
#include <iostream>
#include "absl/flags/flag.h"
#include "absl/strings/str_format.h"
#include "algorithms/util.h"
#include "algorithms/bounded-mean.h"
#include "proto/util.h"
#include "base/statusor.h"
#include "proto/confidence-interval.pb.h"
#include "proto/data.pb.h"
using absl::PrintF;
using differential_privacy::GetValue;
namespace dp=differential_privacy;
int main(int argc, char **argv) {
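  // Bounds [0, 10] and epsilon 1.0; every entry below lies within the bounds.
  auto mean_algorithm = dp::BoundedMean<int>::Builder().SetEpsilon(1.0).SetLower(0).SetUpper(10).Build().ValueOrDie();
  // 600,000,000 entries of 5: the true mean is 5, but the internal int sum
  // (3,000,000,000) does not fit in a 32-bit int. A plain loop avoids
  // allocating a 2.4 GB vector just to reproduce the problem.
  for (int i = 0; i < 600000000; ++i) {
    mean_algorithm->AddEntry(5);
  }
  // Observed: prints the lower bound instead of a value close to 5.
  std::cout<< dp::GetValue<double>(mean_algorithm->PartialResult().ValueOrDie())<<std::endl;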
}

Possible fix:
Use SafeAdd() when adding each element to the running sum, and return an error whenever it reports an overflow.

Cannot build the C++ library

Hi,
I am trying to build the C++ library on Arch Linux, and I keep running into the same problem. I attached the log file; the error occurs at line 39809. Any ideas why this happens? Thanks in advance for the help.

Log.txt

Windows - Cannot open include file: 'libgen.h': No such file or directory

I am using Windows 10 and got the following error:
ERROR: C:/differentialprivacy/differential-privacy/differential_privacy/base/BUILD:37:1: C++ compilation of rule '//differential_privacy/base:logging' failed (Exit 2)
cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
differential_privacy/base/logging.cc(22): fatal error C1083: Cannot open include file: 'libgen.h': No such file or directory

C++ Proposal: Return doubles

This is a proposed change to the C++ building blocks library. I'm posting it here to solicit feedback and suggestions before we make a final call on implementing it.

Currently all our Algorithms return their results in the form of Output protocol buffers. Most users use our custom utility functions to retrieve the result. Output protocol buffers can contain any one of a number of different data types, which are represented as different fields.

We've seen this lead to a lot of confusion. Some Algorithms don't return the same type as their input (e.g. Count), and if you try to fetch the wrong type of result you'll get a default value rather than an error. "Why does the algorithm always return 0?" is a frequent question for users of Count.

We'd like to stop returning multiple types and return only doubles instead. We'll still use the proto structure (the ability to return more than one double is useful in, e.g. ApproxBounds), but will replace the inner Value proto (which can hold multiple types) with a double. We're doing all the math as doubles anyway, so there's no precision loss. If you want your differentially private count to be an integer rather than a double, you can cast it yourself rather than having us do it under the hood. We'll also modify the GetValue<T> methods to be simple casts, so if you're currently doing GetValue<T>(algorithm.PartialResult()) your code will keep working without any changes. If you are handling the Output protos yourself, though, this will be a breaking change for you.

addGaussianInt64 from scale for C++

Dear privacy team,

I need a function like addGaussianInt64, but in C/C++, for use in a version of DPSGD.

Ideally the interface would expose an additive mechanism taking only an input value and scale at execution, giving a noised value and epsilon (to track with an accountant) as output. L2 sensitivity or stddev and delta are known at initialization.

The closest I could find was AddInt64Noise, but this takes the privacy_budget as an argument.

Is there a way around this? If not, I'd be happy to port the Go function, if it is wanted for the C++ library. If so, should it be integrated into the existing mechanism builder or added as a separate one?

I think it might be useful as a building block for iterated training procedures, but the above would be quite specific. It could be extended to double types and other distributions later though.

In any case, thanks for this amazing library! I learned a lot already from reading the code.

UPDATE: I started work on the wrapper, once it's presentable we can discuss whether generalizing it would make sense :)

java version missing class

The DpPreconditions.java file imports the following class:
import com.google.differentialprivacy.SummaryOuterClass.MechanismType;

But I can't find it (MechanismType.java) in this repository.

Postgres install not working - part 2

I'm getting the following error after following the documentation explicitly. I'm not a Bazel expert, so a little more information would be helpful if this traces back to a Bazel issue. Thank you.

One issue I think I'm running into is that the Postgres 11.x source apparently points to v12 regardless of how I try to set it through ./configure. I've tried to symlink these, but I still get the same error. Thank you in advance.

PKGLIBDIR = /usr/lib/postgresql/12/lib
SHAREDIR = /usr/share/postgresql/12

ERROR: /root/.cache/bazel/_bazel_root/30b75978bb371428f4a3fb652d2a1ed8/external/com_google_protobuf/BUILD:1006:21: in proto_lang_toolchain rule @com_google_protobuf//:cc_toolchain: '@com_google_protobuf//:cc_toolchain' does not have mandatory provider 'ProtoInfo'.
ERROR: Analysis of target '//postgres:anon_func.so' failed; build aborted: Analysis of target '@com_google_protobuf//:cc_toolchain' failed

Set User Contribution for Mean

Hello,
I am trying to calculate a Differentially Private Mean over a data set where each user contributes about 10 entries.
According to the paper referenced in the Readme this library supports this operation.

I found the function SetMaxContributionsPerPartition() in algorithm.h, but from the docs and the comments it is not clear to me whether this function does what I need.

It would be great if you could help me with this.

ANON_AVG has a strong bias in some situation

Hello,
I tested the DP aggregation functions of the Postgres plugin on a table with uniformly distributed numbers 0.00, 0.01, ..., 9.99, 10.00.

In the following situation I get results for ANON_AVG that are obviously incorrect:

dp_test=> select
        avg(a),
        anon_avg(a),
        anon_avg_with_bounds(a, 0.0, 10.0),
        anon_sum(a) / anon_count(a) as anan_avg_manual,
        count(a),
        anon_count(a),
        sum(a),
        anon_sum(a),
        anon_ntile(a, 0.5, 0.0, 10.0)
from
        A
where
        b between 3.00 and 3.01
;
        avg         |     anon_avg     | anon_avg_with_bounds | anan_avg_manual  | count | anon_count |   sum    | anon_sum |    anon_ntile    
--------------------+------------------+----------------------+------------------+-------+------------+----------+----------+------------------
 5.1141938674579624 | 4.66269841269841 |     5.11089108910891 | 5.11177052423343 |  2022 |       2022 | 10340.90 |    10336 | 5.13447173044956
(1 row)

Obviously, the anon_avg value of 4.66269841269841 is much too small compared with the non-anonymized values, as well as with the values of anon_avg_with_bounds and anon_sum() / anon_count().
I saw this strong bias over many repeated runs.

Strangely, for a bigger sample this "outlier" does not appear:

dp_test=> select
        avg(a),
        anon_avg(a),
        anon_avg_with_bounds(a, 0.0, 10.0),
        anon_sum(a) / anon_count(a) as anan_avg_manual,
        count(a),
        anon_count(a),
        sum(a),
        anon_sum(a),
        anon_ntile(a, 0.5, 0.0, 10.0)
from
        A
where
        b between 3.00 and 3.10
;

        avg         |     anon_avg     | anon_avg_with_bounds | anan_avg_manual  | count | anon_count |   sum    | anon_sum |    anon_ntile    
--------------------+------------------+----------------------+------------------+-------+------------+----------+----------+------------------
 5.0534345859108116 | 5.05230652013299 |     5.05318559556787 | 5.05014313417675 | 10831 |      10829 | 54733.75 |    54688 | 5.08097481360734
(1 row)

Strange.

Regards,
Sebastian.

[privacy-on-beam] Broken go.mod

Currently it's only possible to use the pbeam package via Bazel. The go.mod file is broken and generates errors when you import github.com/google/differential-privacy/privacy-on-beam into a codebase that uses native go modules.

$ go run main.go 
go: finding module for package github.com/google/differential-privacy/privacy-on-beam/pbeam
go: found github.com/google/differential-privacy/privacy-on-beam/pbeam in github.com/google/differential-privacy/privacy-on-beam v0.0.0-20200713153156-006ab1f8f903
go: github.com/google/differential-privacy/privacy-on-beam/pbeam: github.com/google/differential-privacy/privacy-on-beam@v0.0.0-20200713153156-006ab1f8f903: parsing go.mod: go.mod:8: require github.com/apache/beam: version "v2.22.0" invalid: should be v0 or v1, not v2

[privacy-on-beam] codelab uses 2 packages

Currently the privacy-on-beam codelab uses two different Go packages in the same folder, which works with Bazel but not with the native go command. As a result, you can bazel run the codelab, but not go run it.

Have we considered condensing the codelab into one package (or splitting it into two folders) so it can run using native Go commands?

I'm not suggesting removing Bazel entirely, but making sure the barrier to entry is as low as possible for people who just want to run a codelab and see what happens.

Unable to build on Fedora 34

I'm building the C++ library for the DP building blocks and get the following errors:

INFO: Analyzed 80 targets (75 packages loaded, 19049 targets configured).
INFO: Found 80 targets...
ERROR: /home/mikerah/.cache/bazel/_bazel_mikerah/dbaf4a023d8e28c5039358720ba4a0cb/external/zlib/BUILD.bazel:31:11: Compiling inffast.c failed: (Exit 1): gcc failed: error executing command /usr/lib64/ccache/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 25 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
ccache: error: Failed to create temporary file for /run/user/1000/ccache-tmp/tmp.cpp_stdout.FNkvly: Read-only file system
INFO: Elapsed time: 50.929s, Critical Path: 0.40s
INFO: 18 processes: 16 internal, 2 linux-sandbox.
FAILED: Build did NOT complete successfully

I previously installed bazelisk and am able to run bazel.

Removing DeltaForThreshold functions from Go

We are considering removing the DeltaForThreshold functions from the Laplace and Gaussian noise.Noise implementations. Please let us know if you are using them or are interested in using them.

postgres: "anon_count" crashes on empty subquery

Hello,
The connection is lost when, for example, ANON_COUNT is applied to a subquery that yields no results:

dp_test=> select ANON_COUNT(ID) from ( select ID from  ( select 1 as ID ) as A where ID = 2 ) as A;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 

In SQuirreL (JDBC):

Error: An I/O error occurred while sending to the backend.
SQLState:  08006
ErrorCode: 0

In my opinion, this violates the basic principle of differential privacy: you find out that no entry with the given signature exists.

Regards,
Sebastian.
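The reporter's point can be illustrated with a DP count that treats empty input exactly like any other input: the true count (0) plus noise, rather than an error or crash whose presence reveals emptiness. The Go sketch below is a minimal illustration under stated assumptions, not the PostgreSQL extension's code; noisyCount, laplace, and noiseFunc are hypothetical names, the noise source is injected so the logic can be checked deterministically, and math/rand is not a secure noise source.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// noiseFunc returns one noise sample; injecting it keeps the sketch testable.
type noiseFunc func() float64

// laplace returns a noiseFunc sampling Laplace(0, 1/epsilon), i.e. noise
// calibrated for a count with sensitivity 1. math/rand is NOT secure and is
// used here only for illustration.
func laplace(rng *rand.Rand, epsilon float64) noiseFunc {
	return func() float64 {
		// The difference of two i.i.d. Exp(1) variables is Laplace(0, 1).
		return (rng.ExpFloat64() - rng.ExpFloat64()) / epsilon
	}
}

// noisyCount releases a noisy row count. An empty input takes exactly the
// same path as a non-empty one (true count 0 plus noise).
func noisyCount(rows []int64, noise noiseFunc) int64 {
	raw := float64(len(rows)) + noise()
	// Rounding and clamping at zero are post-processing and do not weaken
	// the privacy guarantee.
	return int64(math.Max(0, math.Round(raw)))
}

func main() {
	rng := rand.New(rand.NewSource(42))
	fmt.Println(noisyCount(nil, laplace(rng, 1.0))) // empty input: still returns a number
	fmt.Println(noisyCount([]int64{1, 2, 3}, laplace(rng, 1.0)))
}
```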
