Contributors: chmac

brainstorming-encrypted-git's Issues

Encrypt every ref and object into a "new" repository

Following this comment I'm breaking this out into a separate issue.

Idea: Encrypt each ref and object from the source repo as a file in the encrypted repo

When pushing the unencrypted repo:

  • Create .git/encrypted
    • cd .git/encrypted && git init && git remote add ... && cd ../..
  • Encrypt the output of git show-ref
    • Save this into .git/encrypted/refs
  • Iterate over each ref and recursively over every linked object
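    • Encrypt the raw object, i.e. the output of git cat-file <type> $object (cat-file needs the type, e.g. from git cat-file -t $object, as its first argument)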
    • Save this into .git/encrypted/objects/xx/xx/xxxx
  • Add a commit with all changes from above inside .git/encrypted
    • cd .git/encrypted && git add . && git commit -m updates && git push
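A rough Python sketch of the push steps above, assuming git is on the PATH. The encrypt callable is a hypothetical placeholder for whatever cipher ends up being chosen; no real encryption is included here. The "<type>\0<contents>" payload format is my own assumption, added so the pull side knows which object type to recreate. Rather than walking refs by hand, it uses git rev-list --objects --all to find every reachable object.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Tuple


def encrypted_object_path(oid: str) -> str:
    """Map an object id onto the objects/xx/xx/xxxx layout."""
    return f"objects/{oid[:2]}/{oid[2:4]}/{oid[4:]}"


def git(repo: str, *args: str) -> bytes:
    """Run a git command inside `repo` and return its raw stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          check=True, capture_output=True).stdout


def reachable_objects(repo: str) -> List[str]:
    # `rev-list --objects --all` prints each object reachable from any ref,
    # one per line as "<oid>" or "<oid> <path>"; keep just the id.
    lines = git(repo, "rev-list", "--objects", "--all").decode().splitlines()
    return [line.split(" ", 1)[0] for line in lines if line]


def read_object(repo: str, oid: str) -> Tuple[str, bytes]:
    # `cat-file -t` gives the type; `cat-file <type> <oid>` gives the raw
    # contents (unlike `-p`, which pretty-prints trees).
    obj_type = git(repo, "cat-file", "-t", oid).decode().strip()
    return obj_type, git(repo, "cat-file", obj_type, oid)


def encrypt_repo(repo: str, out_dir: str,
                 encrypt: Callable[[bytes], bytes]) -> int:
    """Write encrypted refs and objects into out_dir; return the object count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # All refs go into one file. Note: show-ref exits non-zero when the repo
    # has no refs yet, so this raises CalledProcessError on an empty repo.
    (out / "refs").write_bytes(encrypt(git(repo, "show-ref")))
    count = 0
    for oid in reachable_objects(repo):
        obj_type, body = read_object(repo, oid)
        dest = out / encrypted_object_path(oid)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Assumed payload format: "<type>", a NUL byte, then the raw contents,
        # so the pull side can recreate the object with the right type.
        dest.write_bytes(encrypt(obj_type.encode() + b"\0" + body))
        count += 1
    return count
```

Usage would look like encrypt_repo(".", ".git/encrypted", my_encrypt), followed by the add/commit/push step above. Here my_encrypt is a hypothetical stand-in for the chosen cipher. Naming each file after its plaintext object id means the ids themselves are visible on the remote, which is one of the privacy tradeoffs to weigh.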

Pulling an encrypted repo would then look like:

  • Create .git/encrypted
    • cd .git/encrypted && git init && git remote add ... && git pull
  • Iterate over every object
    • find objects/ -type f
    • Decrypt the object
    • Copy the object into the parent git object store
  • Decrypt .git/encrypted/refs and recreate the refs
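The pull side can be sketched the same way, again with a hypothetical decrypt callable and the same assumed "<type>\0<contents>" payload. In a default SHA-1 repository, an object id is the SHA-1 of "<type> <size>", a NUL byte, and then the contents. That means each decrypted object can be checked against the id in its file path before it is written into the parent store with git hash-object -w. Once all the objects are in place, the refs are recreated with git update-ref.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable, List, Tuple


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 id of an object: hash of '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def oid_from_path(rel_path: str) -> str:
    # Inverse of objects/xx/xx/xxxx: join everything after "objects/".
    return "".join(Path(rel_path).parts[1:])


def split_payload(payload: bytes) -> Tuple[str, bytes]:
    # Assumed payload format, matching the push side: type, NUL, raw contents.
    obj_type, _, body = payload.partition(b"\0")
    return obj_type.decode(), body


def parse_show_ref(listing: str) -> List[Tuple[str, str]]:
    # Each `git show-ref` line is "<oid> <refname>".
    pairs = []
    for line in listing.splitlines():
        if line:
            oid, name = line.split(" ", 1)
            pairs.append((oid, name))
    return pairs


def import_encrypted(repo: str, enc_dir: str,
                     decrypt: Callable[[bytes], bytes]) -> None:
    enc = Path(enc_dir)
    for path in sorted((enc / "objects").rglob("*")):
        if not path.is_file():
            continue
        obj_type, body = split_payload(decrypt(path.read_bytes()))
        expected = oid_from_path(path.relative_to(enc).as_posix())
        # Refuse anything whose contents don't hash to the id in its path.
        if git_object_id(obj_type, body) != expected:
            raise ValueError(f"object {expected} failed its integrity check")
        # Write the object into the parent repo's object store.
        subprocess.run(
            ["git", "-C", repo, "hash-object", "-w", "-t", obj_type, "--stdin"],
            input=body, check=True, capture_output=True)
    # Recreate the refs only once every object is in place.
    listing = decrypt((enc / "refs").read_bytes()).decode()
    for oid, name in parse_show_ref(listing):
        subprocess.run(["git", "-C", repo, "update-ref", name, oid], check=True)
```

Because the checked ids are recomputed from the decrypted contents, a tampered or mis-decrypted file fails loudly instead of silently corrupting the parent repo.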

Brainstorming privacy designs

Following #1, what privacy tradeoffs make sense?

git-remote-gcrypt obscures everything by encrypting the packfiles and replacing the remote's single commit with a new one on each push. This is very far towards the privacy end of the spectrum, but it introduces tradeoffs.

For mobile-first applications that use git to store data, what tradeoffs would make sense?

How does git-remote-gcrypt work?

This issue is to track discussion around git-remote-gcrypt and whether we could achieve interop with its encryption format.

Having dug into the code, read the docs, and run some local tests, I think it works like this:

  • The git objects and pack files get encrypted
  • Their encrypted filename is the hash of their contents
  • The keys to encrypt those files are put into a "manifest" file
    • This file is always called 91bd0c092128cf2e60e1a608c31e92caf1f9c1595f83f2890ef17c0e4881aa0a
  • The encrypted pack file, object files, maybe some others, and the manifest are put into a directory
  • This directory is then converted into a git repository with a single commit
    • Put differently, if you push this repo to something like GitHub you get a repo with a single commit, with a fixed, hardcoded commit author, time, and a list of files which are the encrypted contents
    • There is no commit history visible, and only one commit

Effectively, the git hosting service is used only as a store for the latest "encrypted" git repository. Every push must therefore upload the whole commit history again, because from GitHub's perspective there is only ever a single commit.

Pros

  • Strong privacy, not much info is leaked
  • Not even commit history, frequency, etc, is leaked
  • Long-standing codebase that's probably been well tested

Cons

  • Brutal uploads for each push, especially harsh on mobile
  • Doesn't use any "git"ness of the host
  • Slightly complex to pack up and unpack

Building a git remote helper

I have a working prototype now that takes a git repo and encrypts it by encrypting each object independently. I can push data to it and pull data back. That much works.

https://github.com/GenerousLabs/git-remote-encrypted

Now, I'm looking at how it would work if implemented as a git remote helper.

Firstly, reading this: https://git-scm.com/docs/gitremote-helpers

Then this is a very helpful guide: https://rovaughn.github.io/

Which in turn links to: https://github.com/git/git/blob/master/t/t5801/git-remote-testgit

The basics are:

  • The helper gets invoked with 2 arguments
    • First: The remote name, or its URL if there is no name
    • Second: The remote URL
  • git passes data to the stdin of the helper and reads from the stdout
    • Commands are sent one per line, and batches (e.g. a run of fetch or push lines) end with a blank line, i.e. \n\n
    • It sends a capabilities command first
    • We probably want to support fetch and push only at first
    • This actually means supporting several commands: capabilities, list (including list for-push), fetch and push
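Here is a minimal skeleton of that stdin/stdout loop, supporting only capabilities, list, fetch and push. The list_refs, fetch and push callbacks are hypothetical stand-ins for the encrypted storage. The response formats follow my reading of the gitremote-helpers docs:

  • capabilities: one per line, ending with a blank line
  • list: "<oid> <refname>" lines, ending with a blank line
  • fetch: a single blank line once the batch is done
  • push: "ok <dst>" or "error <dst> <why>" per ref, then a blank line

```python
from typing import Callable, Dict, Iterator, List, Optional, TextIO, Tuple

PushResult = List[Tuple[str, Optional[str]]]  # (dst ref, error message or None)


def parse_args(argv: List[str]) -> Tuple[str, str]:
    # argv[1]: remote name, or the URL when there is no configured remote.
    # argv[2]: the URL, when git passes one.
    remote = argv[1]
    url = argv[2] if len(argv) > 2 else argv[1]
    return remote, url


class RemoteHelper:
    def __init__(self,
                 list_refs: Callable[[], Dict[str, str]],
                 fetch: Callable[[List[str]], None],
                 push: Callable[[List[str]], PushResult]) -> None:
        self.list_refs = list_refs
        self.fetch = fetch
        self.push = push

    @staticmethod
    def _read_batch(first: str, lines: Iterator[str]) -> List[str]:
        # fetch/push commands arrive as a batch terminated by a blank line.
        batch = [first]
        for raw in lines:
            line = raw.rstrip("\n")
            if line == "":
                break
            batch.append(line)
        return batch

    def run(self, inp: TextIO, out: TextIO) -> None:
        lines = iter(inp.readline, "")
        for raw in lines:
            line = raw.rstrip("\n")
            if line == "":
                break  # a lone blank line means git has nothing more to send
            if line == "capabilities":
                out.write("fetch\npush\n\n")
            elif line in ("list", "list for-push"):
                for name, oid in self.list_refs().items():
                    out.write(f"{oid} {name}\n")
                out.write("\n")
            elif line.startswith("fetch "):
                self.fetch(self._read_batch(line, lines))
                out.write("\n")  # one blank line once the whole batch is fetched
            elif line.startswith("push "):
                for dst, error in self.push(self._read_batch(line, lines)):
                    out.write(f"ok {dst}\n" if error is None
                              else f"error {dst} {error}\n")
                out.write("\n")
            else:
                raise ValueError(f"unsupported command: {line!r}")
            out.flush()  # make sure git sees each response promptly
```

git runs an executable named git-remote-<transport> from the PATH for URLs of the form <transport>::<address>. A script with a hypothetical name such as git-remote-encrypted would therefore call parse_args(sys.argv) and then RemoteHelper(...).run(sys.stdin, sys.stdout).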

That's about as much as I've understood thus far. Some additional useful reading:
