PrivateKube

PrivateKube is an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. A description of the project can be found on our webpage and in our OSDI'21 paper, titled Privacy Budget Scheduling (PDF locally available here and extended version available on arXiv).

Repo structure

This repository contains the artifact release for the OSDI paper:

  • system: The PrivateKube system, which implements the privacy resource and a new scheduling algorithm for it, called Dominant Privacy Fairness (DPF).
  • privatekube: A Python client for interaction with the PrivateKube system and performing macrobenchmark evaluation.
  • simulator: A simulator for microbenchmarking privacy scheduling algorithms in tightly controlled settings.
  • examples: Usage examples for various components; please refer to its README for details.
  • evaluation: Scripts to reproduce the macrobenchmark and microbenchmark evaluation results from our paper.

Instruction structure

1. Getting started with PrivateKube

This section explains how to install the system and walks through a simple example of interacting with the privacy resource. It should take less than 30 minutes to complete.

1.1 Requirements

PrivateKube needs a Kubernetes cluster to run. If you don't have a cluster, you can install a lightweight MicroK8s cluster on a decent laptop. Kubeflow needs more resources, but it is not required for this section.

Below are the instructions to install and configure a lightweight cluster on Ubuntu. For other platforms, see https://microk8s.io/.

sudo snap install microk8s --classic

Check that it is running:

microk8s status --wait-ready

You can add your user to the microk8s group if you don't want to type sudo for every command (log out and log in again for the change to take effect):

sudo usermod -a -G microk8s $USER

mkdir -p ~/.kube

sudo chown -f -R $USER ~/.kube

(You can learn more about how to use Microk8s without sudo here)

You can now start and stop your cluster with:

microk8s start 

microk8s stop

Export your configuration:

microk8s config > ~/.kube/config

Declare an alias to use kubectl (you can add this line to your .bash_profile or equivalent):

alias kubectl=microk8s.kubectl

Check that you can control your cluster:

kubectl get pods -A

1.2. Deploying PrivateKube

Download the code

Clone this repository on your machine. Our scripts will only affect this repository (e.g. dataset, logs, etc.) and your cluster, not the rest of your machine.

git clone https://github.com/columbia/PrivateKube.git

Enter the repository:

cd PrivateKube

All the other instructions in this file have to be run from this PrivateKube directory, unless specified otherwise.

Create a Python environment

Create a new virtual environment to interact with PrivateKube, for instance with:

conda create -n privatekube python=3.8

conda activate privatekube

Install the dependencies:

pip install -r privatekube/requirements.txt

Install the PrivateKube package:

pip install -e privatekube

Deploy PrivateKube to your cluster

You can deploy PrivateKube directly with a single command:

source system/deploy.sh

If you prefer to understand what is going on, you can run the following commands one by one:

First, let's create a clean namespace to separate PrivateKube from the rest of the cluster:

kubectl create ns privatekube

Then, create the custom resources:

kubectl apply -f system/privacyresource/artifacts/privacy-budget-claim.yaml

kubectl apply -f system/privacyresource/artifacts/private-data-block.yaml

You can now interact with the privacy resource as you would with any other resource (e.g. pods). pb is a short name for private data block, and pbc is a short name for privacy budget claim. You can list blocks and see how much budget they have with: kubectl get pb -A. So far, there are neither blocks nor claims, but we will add some in the next section (1.3.).
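Because blocks and claims are ordinary Kubernetes resources, they can also be read from a script. The sketch below is one way to do that in Python, assuming kubectl is on the PATH (a shell alias is not visible to subprocess, so on a MicroK8s setup you would pass kubectl="microk8s.kubectl"). The helper names parse_resource_list and list_resources are illustrative and are not part of the privatekube client. The sample JSON is hand-written and relies only on the metadata.name and metadata.namespace fields that every Kubernetes object carries.

```python
import json
import subprocess


def parse_resource_list(json_text):
    """Return (namespace, name) pairs from `kubectl get ... -o json` output."""
    doc = json.loads(json_text)
    return [
        (item["metadata"].get("namespace", ""), item["metadata"]["name"])
        for item in doc.get("items", [])
    ]


def list_resources(kind, kubectl="kubectl"):
    """Run `kubectl get <kind> -A -o json` and parse the result.

    `kind` is a resource name or short name such as "pb" or "pbc".
    """
    completed = subprocess.run(
        [kubectl, "get", kind, "-A", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_resource_list(completed.stdout)


# Hand-written sample shaped like a Kubernetes list; not real cluster output.
SAMPLE = (
    '{"apiVersion": "v1", "kind": "List", "items": '
    '[{"metadata": {"name": "block-1", "namespace": "privacy-example"}}]}'
)
print(parse_resource_list(SAMPLE))
```

Against a live cluster, list_resources("pb") would return the same kind of list for all namespaces.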

We already compiled the controllers and the scheduler and prepared a Kubernetes deployment that will pull them from DockerHub. Launch the privacy controllers and the scheduler:

kubectl apply -f system/dpfscheduler/manifests/cluster-role-binding.yaml

kubectl apply -f system/dpfscheduler/manifests/scheduler.yaml

There are additional instructions in the system directory if you want to modify the scheduler or run it locally.

1.3. Hello World

Open a first terminal. We are going to monitor the logs of the scheduler to see it in action. Find the scheduler pod with:

kubectl get pods -A | grep scheduler

Then, in the same terminal, monitor the logs of the scheduler with something similar to:

kubectl logs --follow dpf-scheduler-5fb6886497-w7x49 -n privatekube 

(alternatively, you can directly use: kubectl logs --follow "$(kubectl get pods -n privatekube | grep scheduler | awk -F ' ' '{print $1}')" -n privatekube)
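The same lookup can be done from Python if you are scripting the walkthrough. This is a minimal sketch of the grep and awk step only: it takes the text printed by kubectl get pods and returns the first column of the first matching line (the shell one-liner would print every match). The function name find_pod is illustrative, and the sample text is hand-written around the pod name shown above.

```python
def find_pod(pods_output, needle="scheduler"):
    """Return the first column of the first line containing `needle`.

    Mirrors `grep scheduler | awk '{print $1}'`, but stops at the first match.
    Returns None when no line matches.
    """
    for line in pods_output.splitlines():
        if needle in line:
            fields = line.split()
            if fields:
                return fields[0]
    return None


# Hand-written sample in the usual `kubectl get pods` layout.
SAMPLE = """\
NAME                             READY   STATUS    RESTARTS   AGE
dpf-scheduler-5fb6886497-w7x49   1/1     Running   0          2m
"""
print(find_pod(SAMPLE))
```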

Open another terminal. We are going to create a block and a claim and see how they are being scheduled.

Create a new namespace for this example:

kubectl create ns privacy-example

Check that there are no datablocks or claims:

kubectl get pb -A

kubectl get pbc -A

Add a first datablock:

kubectl apply -f examples/privacyresource/dpf-base/add-block.yaml

List the datablocks to confirm that your new block appears:

kubectl get pb --namespace=privacy-example

Check the initial budget of your block:

kubectl describe pb/block-1 --namespace=privacy-example

Add a privacy claim:

kubectl apply -f examples/privacyresource/dpf-base/add-claim-1.yaml

Describe the claim:

kubectl describe pbc/claim-1 --namespace=privacy-example

In your first terminal, you should see that the scheduler has detected the claim and is trying to allocate it. Wait a bit, then describe the claim again to see whether it has been allocated. You can also describe the block again to see how its budget has changed.
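If you would rather not re-run the describe command by hand, the "wait a bit and check again" step can be written as a small polling loop. The sketch below is generic Python and makes no assumption about how a claim reports its status: you supply a check function (hypothetical here) that returns a truthy value once the claim looks allocated to you. The fake check and the "allocated" string are placeholders for the demo only.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0):
    """Call check() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demo with a fake check that succeeds on its third call.
# A real check would inspect the output of `kubectl describe pbc/claim-1`.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return "allocated" if calls["n"] >= 3 else None


print(wait_until(fake_check, timeout=5.0, interval=0.0))
```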

Finally, clean up:

kubectl delete -f examples/privacyresource/dpf-base/add-claim-1.yaml 
kubectl delete -f examples/privacyresource/dpf-base/add-block.yaml
kubectl delete namespace privacy-example

We now have a proper abstraction to manage privacy as a native Kubernetes resource. The next section provides an end-to-end example of how to interact with the privacy resource through a real machine learning pipeline. You can also refer to evaluation/macrobenchmark to reproduce part of our evaluation of this resource and of the DPF algorithm we developed for it.

1.4. Example usage in a DP ML pipeline

The examples/pipeline directory contains a step-by-step guide to build a DP ML pipeline with PrivateKube.

2. Getting started with the simulator

This simulator is used for prototyping and microbenchmark evaluation of privacy budget scheduling algorithms. It supports controlled evaluation of DPF against baseline algorithms, including round-robin and first-come-first-served.
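To give a feel for what such a comparison measures, here is a toy sketch that does not use the simulator. It assumes, by analogy with Dominant Resource Fairness, that DPF favors claims with the smallest dominant share (the largest fraction of any one block's budget that a claim asks for), and it compares that ordering with arrival order. It leaves out everything else the real scheduler does, so consult the paper for the actual algorithm. All names and numbers are made up for illustration.

```python
def dominant_share(demand, capacity):
    """Largest fraction of any block's total budget that this claim asks for."""
    return max(eps / capacity[block] for block, eps in demand.items())


def schedule(claims, capacity, order):
    """Grant claims all-or-nothing in the given order; return granted names."""
    remaining = dict(capacity)
    granted = []
    for name, demand in order(claims, capacity):
        if all(remaining[b] >= eps for b, eps in demand.items()):
            for b, eps in demand.items():
                remaining[b] -= eps
            granted.append(name)
    return granted


def fcfs(claims, capacity):
    """First-come-first-served: keep arrival order."""
    return list(claims)


def smallest_dominant_share_first(claims, capacity):
    """Toy stand-in for DPF ordering (an assumption, see the text above)."""
    return sorted(claims, key=lambda c: dominant_share(c[1], capacity))


# Made-up budgets and demands, listed in arrival order.
capacity = {"block-1": 10, "block-2": 10}
claims = [
    ("big", {"block-1": 8, "block-2": 8}),
    ("small-a", {"block-1": 3}),
    ("small-b", {"block-1": 3, "block-2": 2}),
    ("small-c", {"block-2": 4}),
]

print("FCFS:", schedule(claims, capacity, fcfs))
print("Smallest dominant share first:",
      schedule(claims, capacity, smallest_dominant_share_first))
```

With these numbers, arrival order grants only the large claim, while the share-based order grants the three small ones.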

2.1 Setup

Set up a Python environment

Install Conda, then create and activate an isolated Python environment named "ae":

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
conda init
conda create -n ae -c conda-forge pypy3.6 pip python=3.6 seaborn notebook -y
conda activate ae

Installation from source

Install the dpsched Python package from the simulator directory:

cd ./simulator
pip install -r ./requirements.txt
pip install ".[plot]"

2.2 Examples

The minimal simulation example

examples/simulator/minimal_example.py gives a quick start. There are two key concepts in the simulation program:

  1. The simulation model: this implements how the different components of the system behave and interact with each other. You can import it with from dpsched import Top.
  2. The configuration dictionary: a dictionary that specifies many aspects of the simulation behavior. For configuration details, please refer to the comments in minimal_example.py.

minimal_example.py performs two steps:

  1. Preparing the config dictionary
  2. Calling simulate(config, Top), where config is the config dict and Top is the simulation model.

To run the minimal example:

cd ./examples/simulator
python ./minimal_example.py

Or use PyPy instead of CPython for better performance:

cd ./examples/simulator
pypy ./minimal_example.py

The simulation program saves experiment results in a workspace specified in the config dictionary. By default, results are saved under ./examples/exp_results/some_work_space_name.
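After a few runs it can be handy to see which workspaces exist. The sketch below lists the subdirectories of a results directory using only the standard library. It assumes each workspace is a directory directly under the default location mentioned above; list_workspaces is an illustrative helper, not part of dpsched.

```python
from pathlib import Path


def list_workspaces(results_root):
    """Return the sorted names of the subdirectories of `results_root`.

    Returns an empty list if the directory does not exist yet.
    """
    root = Path(results_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Path taken from the default location mentioned in the text,
# relative to the repository root.
print(list_workspaces("./examples/exp_results"))
```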

How to analyze simulation results

dpsched.analysis contains modules for collecting experiment results from a workspace directory and plotting various figures. evaluation/microbenchmark/microbenchmark_figures_single_block.ipynb gives commented examples of how to use the dpsched.analysis module.

2.3 How to reproduce microbenchmark evaluation

Instructions and code for how to use the simulator to reproduce the microbenchmark results in the PrivateKube paper are in evaluation/microbenchmark/README.md.
