
ipcs's Introduction

ipcs


Containerd meets IPFS. Peer-to-peer distribution of content blobs.

Getting started

Converting a manifest from DockerHub to p2p manifest:

# Term 1: Start an IPFS daemon
$ make ipfs

# Term 2: Start a rootless containerd backed by ipcs.
$ make containerd

# Term 3: Convert alpine to a p2p manifest
$ make convert
2019/06/04 13:54:40 Resolved "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest@sha256:769fddc7cc2f0a1c35abb2f91432e8beecf83916c421420e6a6da9f8975464b6"
2019/06/04 13:54:40 Original Manifest [456] sha256:769fddc7cc2f0a1c35abb2f91432e8beecf83916c421420e6a6da9f8975464b6:
// ...
2019/06/04 13:54:41 Converted Manifest [456] sha256:9181f3c247af3cea545adb1b769639ddb391595cce22089824702fa22a7e8cbb:
// ...
2019/06/04 13:54:41 Successfully pulled image "localhost:5000/library/alpine:p2p"

Converting two manifests from DockerHub to p2p manifests, and then comparing the number of shared IPLD nodes (layers chunked into 262KiB blocks):

# Term 1: Start an IPFS daemon
$ make ipfs

# Term 2: Start a rootless containerd backed by ipcs.
$ make containerd

# Term 3: Convert ubuntu:bionic and ubuntu:xenial into p2p manifests, then bucket IPLD nodes into nodes unique to each image and nodes in the intersection.
$ make compare
// ...
2019/06/04 13:51:33 Comparing manifest blocks for "docker.io/library/ubuntu:xenial" ("sha256:8d382cbbe5aea68d0ed47e18a81d9711ab884bcb6e54de680dc82aaa1b6577b8")
2019/06/04 13:51:34 Comparing manifest blocks for "docker.io/titusoss/ubuntu:latest" ("sha256:cfdf8c2f3d5a16dc4c4bbac4c01ee5050298db30cea31088f052798d02114958")
2019/06/04 13:51:34 Found 322 blocks
docker.io/library/ubuntu:xenial: 4503
docker.io/library/ubuntu:xenial ∩ docker.io/titusoss/ubuntu:latest: 87550251
docker.io/titusoss/ubuntu:latest: 76117824
// 87550251 shared bytes in IPLD nodes
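The bucketing itself is simple set arithmetic over block digests. A minimal sketch of the idea (with a hypothetical `Block` type standing in for the real IPLD node references; this is not the repo's actual implementation):

```go
package main

import "fmt"

// Block is a hypothetical stand-in for an IPLD node reference:
// its CID and the size of its payload in bytes.
type Block struct {
	CID  string
	Size int
}

// bucket sums the bytes unique to each image and the bytes shared
// by both, keyed by CID.
func bucket(a, b []Block) (onlyA, onlyB, shared int) {
	inA := make(map[string]Block, len(a))
	for _, blk := range a {
		inA[blk.CID] = blk
	}
	seen := make(map[string]bool, len(b))
	for _, blk := range b {
		if _, ok := inA[blk.CID]; ok {
			shared += blk.Size
			seen[blk.CID] = true
		} else {
			onlyB += blk.Size
		}
	}
	for _, blk := range a {
		if !seen[blk.CID] {
			onlyA += blk.Size
		}
	}
	return onlyA, onlyB, shared
}

func main() {
	xenial := []Block{{"Qm1", 100}, {"Qm2", 200}}
	titus := []Block{{"Qm2", 200}, {"Qm3", 300}}
	a, b, s := bucket(xenial, titus)
	fmt.Println(a, b, s) // 100 300 200
}
```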

Design

IPFS backed container image distribution is not new. Here is a non-exhaustive list of in-the-wild implementations:

P2P container image distribution is also implemented with different P2P networks:

The previous IPFS implementations all distribute through the Docker Registry HTTP API V2. However, the connection between containerd and the registry it pulls from is not peer-to-peer, and if the registry were run as a sidecar, the content would be duplicated on the local system. Instead, I chose to implement it as a containerd content plugin for the following reasons:

  • Containerd natively uses IPFS as a content.Store, with no duplication.
  • Allows p2p and non-p2p manifests to live together.
  • Potentially enables file-granularity chunking by introducing a new layer mediatype.
  • Fulfilling the content.Store interface will allow ipcs to also back the buildkit cache.

IPFS imposes a 4 MiB limit on blocks because it may be run on a public network with adversarial peers. Since it cannot verify hashes until all of the content has arrived, an attacker could flood connections with gibberish and consume bandwidth. Chunking data into smaller blocks also aids deduplication:

Chunking blocks in IPFS
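To illustrate fixed-size chunking, here is a minimal sketch using only the standard library (go-ipfs has its own chunker implementations; the 262144-byte size matches its default):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"strings"
)

const chunkSize = 262144 // 256 KiB, the default IPFS chunk size

// chunk splits r into fixed-size blocks and returns each block's
// SHA-256 digest; IPFS wires such leaves into a merkle DAG whose
// root hash identifies the whole blob.
func chunk(r io.Reader) ([]string, error) {
	var digests []string
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			digests = append(digests, fmt.Sprintf("%x", sha256.Sum256(buf[:n])))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return digests, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

func main() {
	digests, _ := chunk(strings.NewReader(strings.Repeat("layer bytes ", 100000)))
	fmt.Println(len(digests), "chunks") // 1.2 MB input -> 5 chunks
}
```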

IPCS implements containerd's content.Store interface and can be built as a Go plugin to override containerd's default local store. A converter implementation is also provided, which converts a regular OCI image manifest into a manifest where every descriptor is replaced with the descriptor of the root DAG node added to IPFS. The root node is the Merkle root of the layer's 262KiB chunks.

Converting to P2P
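A rough sketch of that conversion, with a hypothetical addToIPFS helper standing in for the actual chunk-and-add call (this is not the repo's real converter):

```go
package converter

import (
	"context"

	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// addToIPFS is a hypothetical helper: it chunks and adds a blob to
// IPFS and returns a descriptor pointing at the root DAG node.
func addToIPFS(ctx context.Context, desc ocispec.Descriptor) (ocispec.Descriptor, error) {
	// ... chunk the blob, add the leaves, return the merkle root ...
	return desc, nil
}

// convertManifest replaces every descriptor with the descriptor of
// its root IPLD node, leaving the structure of the OCI manifest
// intact.
func convertManifest(ctx context.Context, m ocispec.Manifest) (ocispec.Manifest, error) {
	var err error
	if m.Config, err = addToIPFS(ctx, m.Config); err != nil {
		return m, err
	}
	for i, layer := range m.Layers {
		if m.Layers[i], err = addToIPFS(ctx, layer); err != nil {
			return m, err
		}
	}
	return m, nil
}
```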

Although the IPFS daemon or its network may already have the bytes for all of an image's P2P content, containerd wraps the underlying content.Store with a boltdb metadata store, so a pull still has to register the image there.

An image pull, starting from the client side, goes through the following layers:

  • proxy.NewContentStore (content.Store)
  • content.ContentClient (gRPC client)
  • content.NewService (gRPC server: plugin.GRPCPlugin "content")
  • content.newContentStore (content.Store: plugin.ServicePlugin, services.ContentService)
  • metadata.NewDB (bolt *metadata.DB: plugin.MetadataPlugin "bolt")
  • ipcs.NewContentStore (content.Store: plugin.ContentPlugin, "ipcs")

So in the case of ipcs, a pull simply flushes through these content.Store layers to register the image in containerd's metadata store. Note that the majority of the blocks don't need to be downloaded into IPFS's local storage to complete a pull; downloading can be delayed until the layers are unpacked into snapshots.
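The bottom of that stack is wired in through containerd's plugin system. Roughly, registration looks like the following sketch against containerd v1.2's plugin API, where newIPFSStore is a hypothetical stand-in for ipcs's actual constructor:

```go
package ipcs

import (
	"github.com/containerd/containerd/content"
	"github.com/containerd/containerd/plugin"
)

// newIPFSStore is hypothetical: it would dial the local IPFS daemon
// and return a content.Store backed by it.
func newIPFSStore() (content.Store, error) {
	// ... connect to the IPFS API, wrap it as a content.Store ...
	return nil, nil
}

func init() {
	// Registering under plugin.ContentPlugin overrides containerd's
	// default local content store; the boltdb metadata plugin then
	// wraps whatever store is registered here.
	plugin.Register(&plugin.Registration{
		Type: plugin.ContentPlugin,
		ID:   "ipcs",
		InitFn: func(ic *plugin.InitContext) (interface{}, error) {
			return newIPFSStore()
		},
	})
}
```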

Results

Collected data on: 7/11/2019

Systems:

  • m5.large x 3
  • 8.0 GiB Memory
  • 2 vCPUs
  • Up to 10 Gigabit (Throttled by AWS network credits)
  • Linux kernel 4.4.0
  • Ubuntu 16.04.6 LTS
  • Containerd v1.2.6
  • IPFS v0.4.21

Configuration:

  • Switch libp2p mux from yamux to mplex: export LIBP2P_MUX_PREFS="/mplex/6.7.0"
  • Set flatfs sync to false
  • Enable experimental StrategicProviding

Comparison:

  • Pull from DockerHub / Private docker registries
  • Shard content chunks evenly to 3 nodes such that each node has roughly 33% of IPFS blocks.
| Image | Total size (bytes) | IPFS blocks | DockerHub pull (secs) | IPFS pull (secs) | Diff (Hub/IPFS) |
| --- | ---: | ---: | ---: | ---: | ---: |
| docker.io/library/alpine:latest | 2759178 | 14 | 1.744165732 | 0.7662775298 | 227.62% |
| docker.io/ipfs/go-ipfs:latest | 23545678 | 103 | 1.791054265 | 1.633165299 | 109.67% |
| docker.io/library/ubuntu:latest | 28861894 | 38 | 2.720580011 | 1.629809674 | 116.93% |
| docker.io/library/golang:latest | 296160075 | 380 | 4.687380759 | 6.015498289 | 77.92% |

IPFS's performance seems to degrade as the number of nodes (i.e. the total image size) grows. There was a recent regression in go-ipfs v0.4.21 that was fixed in a commit on master:

As seen from make compare, IPFS chunking also doesn't seem to deduplicate any better than whole OCI layers do:

$ GO111MODULE=on IPFS_PATH=./tmp/ipfs go run ./cmd/compare docker.io/library/alpine:latest docker.io/library/ubuntu:latest docker.io/library/golang:latest docker.io/ipfs/go-ipfs:latest
// ...
2019/06/04 13:39:55 Found 1381 blocks
docker.io/ipfs/go-ipfs:latest: 46891351
docker.io/library/alpine:latest: 5516903
docker.io/library/golang:latest: 828096081
docker.io/library/ubuntu:latest: 57723854
// Zero block intersection, though they are very different images.

Serious usage of p2p container image distribution should consider Dragonfly or Kraken instead, because IPFS suffers from performance issues:

Related benchmarking:

Next steps

Explore deduplication by adding each layer's uncompressed, untarred files into IPFS to get chunked-file-granular deduplication. IPFS's Unixfs (UNIX/POSIX fs features implemented via IPFS) needs the following:

  • Support for tar file metadata (uid, gid, modtime, xattrs, executable bit, etc):
  • Support for hard links, character/block devices, fifo:
  • Implementation of diff.Comparer and diff.Applier to apply a custom IPFS layer mediatype to containerd's tmpmount (a skeleton sketch follows this list).
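For reference, a skeleton of what those implementations would have to satisfy, written against containerd v1.2's diff interfaces (the method bodies are placeholders, and the mediatype handling is left abstract):

```go
package ipcsdiff

import (
	"context"

	"github.com/containerd/containerd/diff"
	"github.com/containerd/containerd/mount"
	ocispec "github.com/opencontainers/image-spec/specs-go/v1"
)

// ipfsDiff would translate between IPFS-chunked layer blobs and
// mounted snapshot directories. Everything below is a skeleton.
type ipfsDiff struct{}

// Apply unpacks an IPFS-mediatype layer described by desc onto the
// given mounts (containerd hands us a tmpmount of the snapshot).
func (d *ipfsDiff) Apply(ctx context.Context, desc ocispec.Descriptor, mounts []mount.Mount) (ocispec.Descriptor, error) {
	// ... fetch the DAG for desc, walk its files onto the mounts ...
	return ocispec.Descriptor{}, nil
}

// Compare produces a new IPFS-mediatype layer from the difference
// between the lower and upper mounts.
func (d *ipfsDiff) Compare(ctx context.Context, lower, upper []mount.Mount, opts ...diff.Opt) (ocispec.Descriptor, error) {
	// ... diff the mounts, add changed files to IPFS, return root ...
	return ocispec.Descriptor{}, nil
}

// Compile-time interface checks against containerd v1.2 signatures.
var (
	_ diff.Applier  = (*ipfsDiff)(nil)
	_ diff.Comparer = (*ipfsDiff)(nil)
)
```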

Explore IPFS-FUSE mounted layers for lazy container rootfs:

Explore IPFS tuning to improve performance:

  • Tune goroutine/parallelism in various IPFS components.
  • Tune datastore (use experimental go-ds-badger?)
    • badgerds is hard to shard for benchmarking purposes, because GC doesn't remove data on disk: ipfs/go-ds-badger#54
  • Profile / trace performance issues and identify hotspots.


ipcs's Issues

Question about your content store plugin

When I build a plugin, containerd complains with "plugin was built with a different version ...". I am building with the same version of Go as containerd, and my go.mod pins the same version of containerd that I am running, but I cannot get rid of this. How did you avoid it?

Document 2-tier GC

Add documentation on how I envision GC working for ipcs:

  • Containerd GC unpins root IPLD nodes
    • If the IPFS daemon is shared with other applications, how do I ensure I'm the last one to unpin that CID? This needs some kind of pin refcounting (a higher-level IPFS proxy? see the sketch below)
  • IPFS GC removes unreferenced IPLD nodes (and their unreferenced children)
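A minimal sketch of the pin-refcounting idea (hypothetical; the pin/unpin hooks stand in for real IPFS daemon calls):

```go
package pinproxy

import "sync"

// RefcountPinner sketches the "higher level IPFS proxy" idea: it
// counts pins per CID so a shared IPFS daemon only unpins a block
// once every application has released it. The pin/unpin hooks are
// hypothetical stand-ins for real IPFS daemon calls.
type RefcountPinner struct {
	mu    sync.Mutex
	refs  map[string]int
	pin   func(cid string) error
	unpin func(cid string) error
}

func NewRefcountPinner(pin, unpin func(string) error) *RefcountPinner {
	return &RefcountPinner{refs: map[string]int{}, pin: pin, unpin: unpin}
}

// Pin only pins on the daemon for the first reference.
func (p *RefcountPinner) Pin(cid string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.refs[cid] == 0 {
		if err := p.pin(cid); err != nil {
			return err
		}
	}
	p.refs[cid]++
	return nil
}

// Unpin only unpins on the daemon when the last reference is gone,
// leaving the CID eligible for IPFS GC.
func (p *RefcountPinner) Unpin(cid string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.refs[cid] <= 1 {
		delete(p.refs, cid)
		return p.unpin(cid)
	}
	p.refs[cid]--
	return nil
}
```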

Open source test infrastructure

Currently I'm benchmarking p2p pull using a variety of internal code to:

  • Run containerd with ipcs
  • Populate a containerd-ipcs cluster by sharding an image across the nodes
  • Compare a normal pull from a Docker Registry against a p2p pull via sharding the chunked IPLD nodes across the cluster

I need to open source those components and write an automated performance test suite, because I want upstream to include ipcs as a "real world use case".

Things that would help upstream to test:

  • An easy way to convert DockerHub images like alpine or ubuntu into IPLD graphs (a variety of small and large images, with few and many layers).
  • Test harness for ipcs

Move to proxy content store

Currently, ipcs is implemented as an in-tree plugin for containerd, which comes with multiple problems:

  • Need to recompile containerd for ipcs
  • If the IPFS daemon is not listening when containerd starts, ipcs won't be initialized with a working client
  • If the IPFS daemon restarts, the client needs to be recreated, though we could write a fault-tolerant client wrapper
  • IPFS daemon includes a lot of subsystems not necessary to operate ipcs, increasing resource consumption
  • We want to fine-tune and optimize distribution specific to container distribution, IPFS adds layers and complexity compared to just libp2p + bitswap

We should manage the ipcs lifecycle outside of containerd with a proxy content store (a sketch of the serving side follows the list below).

  • Implement a proxy content store
  • Embed libp2p + bitswap instead of sidecar IPFS daemon
  • Possibly wrap IPFS content store with boltdb metadata store like containerd
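Containerd already ships building blocks for the serving side: contentserver wraps any content.Store as the content GRPC service. A rough sketch, assuming a hypothetical newIPFSStore constructor and that containerd can be pointed at the resulting socket as a content proxy plugin:

```go
package main

import (
	"errors"
	"log"
	"net"

	contentapi "github.com/containerd/containerd/api/services/content/v1"
	"github.com/containerd/containerd/content"
	"github.com/containerd/containerd/services/content/contentserver"
	"google.golang.org/grpc"
)

// newIPFSStore is a hypothetical stand-in for an IPFS-backed
// content.Store constructor (e.g. embedding libp2p + bitswap).
func newIPFSStore() (content.Store, error) {
	return nil, errors.New("not implemented: wrap IPFS as a content.Store")
}

func main() {
	store, err := newIPFSStore()
	if err != nil {
		log.Fatal(err)
	}

	// Serve the containerd content API over a unix socket so the
	// store's lifecycle is independent of containerd's.
	l, err := net.Listen("unix", "/run/ipcs/content.sock")
	if err != nil {
		log.Fatal(err)
	}
	rpc := grpc.NewServer()
	contentapi.RegisterContentServer(rpc, contentserver.New(store))
	log.Fatal(rpc.Serve(l))
}
```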

Outstanding questions

  1. Transport of container images via IPFS:
    a. Consume the API of an IPFS client?
    b. Embed an IPFS daemon?
    c. Use the IPFS components (libp2p, go-bitswap/go-graphsync) directly? ipfs/go-graphsync#11
  2. IPLD tuning
    a. What hash function to use?
    b. Should I use raw leaves?
    c. Should I inline small blocks?
    d. What chunking algorithm to use? (constant vs rabin)
  3. IPFS tuning
    a. What datastore to use? flatfs vs badgerds?
    b. How should the datastore be configured?
    c. Should I use a DHT? (dhtclient, dht, none)

go: error loading module requirements

63fc57b cannot be compiled with Go 1.12.4

$ make ipcs
go: finding github.com/libp2p/go-libp2p-crypto v2.0.1+incompatible
go: finding github.com/libp2p/go-flow-metrics v0.2.0
go: finding github.com/libp2p/go-libp2p-protocol v1.0.0
go: finding github.com/libp2p/go-libp2p-metrics v2.1.7+incompatible
go: finding github.com/multiformats/go-multiaddr-net v1.6.3
go: finding github.com/libp2p/go-libp2p-peer v2.4.0+incompatible
go: finding github.com/multiformats/go-multihash v1.0.8
go: finding github.com/ipfs/go-ipfs-delay v0.0.0-20181109222059-70721b86a9a8
go: finding github.com/Kubuxu/go-os-helper v0.0.1
go: finding github.com/AndreasBriese/bbloom v0.0.0-20180913140656-343706a395b7
go: github.com/libp2p/go-flow-metrics@v0.2.0: unknown revision v0.2.0
go: finding github.com/go-check/check v0.0.0-20180628173108-788fd7840127
go: github.com/libp2p/go-libp2p-protocol@v1.0.0: unknown revision v1.0.0
go: github.com/multiformats/go-multiaddr-net@v1.6.3: unknown revision v1.6.3
go: finding github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f
go: finding github.com/gxed/go-shellwords v1.0.3
go: github.com/libp2p/go-libp2p-crypto@v2.0.1+incompatible: unknown revision v2.0.1
go: finding github.com/dgraph-io/badger v1.5.5-0.20190226225317-8115aed38f8f
go: github.com/multiformats/go-multihash@v1.0.8: unknown revision v1.0.8
go: github.com/libp2p/go-libp2p-metrics@v2.1.7+incompatible: unknown revision v2.1.7
go: finding github.com/ipfs/go-ipfs-blockstore v0.0.1
go: github.com/libp2p/go-libp2p-peer@v2.4.0+incompatible: unknown revision v2.4.0
go: finding github.com/google/uuid v1.1.1
go: finding github.com/dgryski/go-farm v0.0.0-20190104051053-3adb47b1fb0f
go: finding github.com/ipfs/go-ipfs-exchange-offline v0.0.1
go: finding github.com/ipfs/go-blockservice v0.0.1
go: error loading module requirements
Makefile:11: recipe for target 'ipcs' failed
make: *** [ipcs] Error 1


Benchmarks

Collected data on: 6/13/2019

Systems:

  • m5.large x 3
  • 8.0 GiB Memory
  • 2 vCPUs
  • Up to 10 Gigabit (Throttled by AWS network credits)
  • Linux kernel 4.4.0
  • Ubuntu 16.04.6 LTS
  • Containerd v1.2.6
  • IPFS v0.4.21

Comparison:

  • Pull from DockerHub / Private docker registries
  • Shard content chunks evenly to 3 nodes such that each node has roughly 33% of IPFS blocks.
| Image | Total size (bytes) | IPFS blocks | DockerHub pull (secs) | IPFS pull (secs) | Diff (Hub/IPFS) |
| --- | ---: | ---: | ---: | ---: | ---: |
| docker.io/library/alpine:latest | 2759178 | 14 | 1.430587576 | 0.700885049 | 204.11% |
| docker.io/ipfs/go-ipfs:latest | 23545678 | 103 | 1.182348947 | 8.134937865 | 33.47% |
| docker.io/library/ubuntu:latest | 28861894 | 38 | 2.079848393 | 1.86365884 | 111.60% |
| docker.io/library/golang:latest | 296160075 | 380 | 4.817960124 | 11.8802867 | 40.55% |

Integrate with buildkit

POC buildkit using ipcs as a pseudo-local build cache.

  1. How do we define a "cache miss"? For instance, if an LLB vertex is cacheable with hash sha256:foobar, it's impossible to know whether the IPFS network simply doesn't have content for that hash. We could traverse the full network, limit hops to a certain depth, or set a timeout on fetching the content. On a separate topic, can I race an LLB vertex execution against fetching its cache?
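One way to frame both sub-questions is a deadline-bounded race: treat a fetch that misses its deadline as a cache miss, and start executing the vertex while the fetch is still in flight. A generic sketch (the fetch and exec function parameters are hypothetical stand-ins, not buildkit APIs):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// raceCache tries to fetch a cached result for digest, treating a
// missed deadline as a cache miss, while the vertex executes
// concurrently; whichever finishes first with a success wins.
func raceCache(
	ctx context.Context,
	digest string,
	fetch func(context.Context, string) ([]byte, error), // hypothetical IPFS cache lookup
	exec func(context.Context) ([]byte, error), // hypothetical LLB vertex execution
) ([]byte, error) {
	fetchCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	type result struct {
		data []byte
		err  error
	}
	ch := make(chan result, 2) // buffered so neither goroutine leaks
	go func() {
		d, err := fetch(fetchCtx, digest)
		ch <- result{d, err}
	}()
	go func() {
		d, err := exec(ctx)
		ch <- result{d, err}
	}()

	// Take the first success; fall back to the second result if the
	// first attempt failed (e.g. the fetch timed out: a cache miss).
	first := <-ch
	if first.err == nil {
		return first.data, nil
	}
	second := <-ch
	return second.data, second.err
}

func main() {
	data, err := raceCache(context.Background(), "sha256:foobar",
		func(ctx context.Context, d string) ([]byte, error) {
			<-ctx.Done() // simulate a network that never answers
			return nil, errors.New("cache miss")
		},
		func(ctx context.Context) ([]byte, error) {
			time.Sleep(100 * time.Millisecond)
			return []byte("built"), nil
		},
	)
	fmt.Println(string(data), err)
}
```

The tradeoff is wasted work: racing always starts the execution, so a cache hit only saves time, not compute.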
