
soci-snapshotter's Introduction

SOCI Snapshotter


SOCI Snapshotter is a containerd snapshotter plugin. It enables standard OCI images to be lazily loaded without requiring a build-time conversion step. "SOCI" is short for "Seekable OCI", and is pronounced "so-CHEE".

The standard method for launching containers starts with a setup phase during which the container image data is completely downloaded from a remote registry and a filesystem is assembled. The application is not launched until this process is complete. Using a representative suite of images, Harter et al. (FAST '16) found that image download accounts for 76% of container startup time, but on average only 6.4% of the fetched data is actually needed for the container to start doing useful work.

One approach for addressing this is to eliminate the need to download the entire image before launching the container, and to instead lazily load data on demand, and also prefetch data in the background.

Design considerations

No image conversion

Existing lazy loading snapshotters rely on a build-time conversion step, to produce a new image artifact. This is problematic for container developers who won't or can't modify their CI/CD pipeline, or don't want to manage the cost and complexity of keeping copies of images in two formats. It also creates problems for image signing, since the conversion step invalidates any signatures that were created against the original OCI image.

SOCI addresses these issues by loading from the original, unmodified OCI image. Instead of converting the image, it builds a separate index artifact (the "SOCI index"), which lives in the remote registry, right next to the image itself. At container launch time, SOCI Snapshotter queries the registry for the presence of the SOCI index using the mechanism developed by the OCI Reference Types working group.

Workload-specific load order optimization

Another big consideration that we haven't implemented or integrated into SOCI is optimizing the image load order based on your specific workload. See the design README for more details.

Documentation

  • Getting Started: walk through SOCI setups and features.
  • Build: how to build SOCI from source, test SOCI (and contribute).
  • Install: how to install SOCI as a systemd unit.
  • Debug: accessing logs/metrics and debugging common errors.
  • Glossary: glossary we use in the project.

Project Origin

There are a few different lazy loading projects in the containerd snapshotter community. This project began as a fork of the popular Stargz-snapshotter project from commit 743e5e70a7fdec9cd4ab218e1d4782fbbd253803 with the intention of an upstream patch. During development the changes were fundamental enough that the decision was made to create soci-snapshotter as a standalone project. Soci-snapshotter builds on stargz's success and innovative ideas. Long term, this project intends and hopes to join containerd as a non-core project and intends to follow CNCF best practices.

soci-snapshotter's People

Contributors

akihirosuda, amazon-auto, austinvazquez, coderbirju, dependabot[bot], dims, djdongjin, dvnguyen-amzn, estesp, fangn2, haddscot, hanyuel, henry118, iain-macdonald, juneezee, kern--, ktock, kzys, manujgrover71, rdpsin, sbuckfelder, seanrmurphy, sondavidb, sparr, subzidion, tuananh, turan18, vkuzniet, wmesard


soci-snapshotter's Issues

Change the default value for min-layer-size

Is your feature request related to a problem? Please describe.
Currently, the default value for min-layer-size is 0, meaning that a zTOC will be built for every layer.
This is obviously not the right default value. This issue is to do some performance testing to pick a better default.

Describe the solution you'd like
We have a better default value for min-layer-size. The default doesn't have to be perfect. It just has to get us in the ballpark.

Describe alternatives you've considered

Additional context

Reading a file can take a long time w/ prefetch enabled

Describe the bug
When prefetch is enabled, reading a file can take multiple seconds. The culprit seems to be the spanmanager.GetSpanContent method; there is a significant amount of lock contention on the span mutex (https://github.com/awslabs/soci-snapshotter/blob/main/fs/span-manager/span_manager.go#L268). As a result, the runtime of a container workload deteriorates. The following instrumentation, added around the lock, prints how long each caller waited to acquire it (in nanoseconds):

start := time.Now()
s.mu.Lock()
fmt.Printf("spanId = %d, time = %d\n", spanId, time.Since(start).Nanoseconds())
defer s.mu.Unlock()

With prefetch enabled:

spanId = 0, time = 5206167568
spanId = 1, time = 5120533270
spanId = 7, time = 140
spanId = 8, time = 143
spanId = 5, time = 114
spanId = 4, time = 145
spanId = 12, time = 136
spanId = 9, time = 187
spanId = 11, time = 104
spanId = 10, time = 132
spanId = 5, time = 135
spanId = 16, time = 132
spanId = 21, time = 133
spanId = 21, time = 84547859
spanId = 21, time = 84559003
spanId = 21, time = 84525974
spanId = 21, time = 84467821

With prefetch disabled (no_background_fetch = true in /etc/soci-snapshotter-grpc/config.toml):

spanId = 0, time = 85
spanId = 0, time = 135
spanId = 1, time = 108
spanId = 7, time = 140
spanId = 0, time = 204
spanId = 8, time = 137
spanId = 5, time = 90
spanId = 4, time = 84
spanId = 0, time = 125
spanId = 12, time = 183
spanId = 9, time = 174
spanId = 11, time = 129
spanId = 10, time = 144
spanId = 0, time = 74
spanId = 5, time = 154
spanId = 0, time = 119
spanId = 16, time = 146
spanId = 1, time = 164
spanId = 2, time = 126

Steps To Reproduce

  • Run the snapshotter with prefetch enabled (i.e. the default behavior).

Expected behavior

There should not be this much mutex contention when running the snapshotter with prefetch enabled.

Configuration (please complete the following information):

  • OS: AL2
  • Snapshotter Version: ea7b497
  • Containerd Version: 1.6.6

Additional context

Improve `check-flatc` rule and its invocation

Is your feature request related to a problem? Please describe.

As discussed in PR #96, the check-flatc rule in the Makefile could use some fixing. Also, it should be invoked by the build workflow.

Describe the solution you'd like

  1. It is surprising (and I would argue, fundamentally broken) for a check-* rule to depend on another rule (flatc) that modifies the workspace. It should stop doing that.
  2. flatc does not have a --dryrun option. So to detect if the generated files need to be updated, check-flatc could run it with -o /tmp/$$ and then compare the resulting files to the existing files, and error out if they differ.
  3. check-flatc is not being invoked in CI today. Workflows should invoke make check instead of individual check-* rules, to fix this problem and prevent it from occurring for future check-* rules.

Additional context

See discussion in #96.

Fix make check-dco commit range

When initially adding the make check-dco target, we set the validation commit range to ignore the first couple of commits that didn't have the signed-off line, but otherwise verify all commits up to HEAD. This will work for now, but we may want to limit the check to the last N commits so that we don't have to fetch the entire history in pull request workflows. E.g. #5 only pulls the last 20 commits.

The line in question:

$(shell go env GOPATH)/bin/git-validation -run DCO -range 1374574271f4f14126c1d33735339f765f44f0a0..HEAD

[FEATURE] Allow CLI to discover soci artifacts that it doesn't create

Is your feature request related to a problem? Please describe.
Currently, soci uses a bbolt metadata DB as its source of truth about what SOCI artifacts exist locally. This DB is only updated by soci create. That means that any indices/ztocs that are added to the store outside of the soci create command are invisible to the inspection tools provided by soci (e.g. soci index list will not see an index pulled with soci rpull).

Describe the solution you'd like
I would like to see a command like:

soci rebuild-db

which would rediscover all of the soci artifacts in the local storage and add the necessary metadata to the db.

Describe alternatives you've considered
One alternative would be to have the snapshotter update the local DB when it pulls SOCI artifacts. There are a couple of downsides to this approach:

  1. SOCI artifacts could be deleted outside of either soci or the soci snapshotter which would require a rebuild-like command to avoid a full purge.
  2. bbolt locks the DB file while a process is interacting with it. If the soci snapshotter opens the DB at launch, then it must not be running when soci is invoked. If the soci snapshotter opens/closes the DB around each artifact fetch, then it may see an unacceptable performance hit.

The second issue could be solved by having the snapshotter own the metadata DB and expose a gRPC (or similar) service that the soci command could interact with. This has the downside that the snapshotter is required to be running in order to build SOCI artifacts which is not desirable in cases like a managed index builder where the images will never be run locally.

Additional context

Implement Reference Types alongside ORAS

Is your feature request related to a problem? Please describe.

This project currently relies on the image and distribution specifications from the ORAS project. But that project is being deprecated in favor of the standards out of the Reference Types working group image-spec and distribution-spec, respectively.

Describe the solution you'd like

  • soci-snapshotter and soci create will support both ORAS and Reference Types. Which one is used will be selected at runtime by one of: a config file option, command line option, or environment variable. (Implementor's choice. But the default should probably change to Reference Types as part of this work.)
  • References to ORAS in the documentation are updated, accordingly. For example, docs/GETTING_STARTED.md should describe how to set up a local Reference Types distro.

Note: This is a breaking change, since the SOCI index manifest format will be changing slightly.

We should get this work done early, so we break as few people as possible.
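
One possible shape for the runtime selection is sketched below. Everything named here is an assumption made for illustration: the type artifactFormat, the function resolveFormat, the string values "oras" and "reference-types", and the environment variable SOCI_ARTIFACT_FORMAT do not exist in the project. The issue asks for only one selection mechanism; the sketch accepts all three with a fixed precedence (flag, then environment, then config file) simply to show how a Reference Types default would fall out.

```go
package main

import (
	"fmt"
	"os"
)

// artifactFormat names the manifest flavor used for SOCI artifacts.
// The type and its values are hypothetical.
type artifactFormat string

const (
	formatORAS           artifactFormat = "oras"
	formatReferenceTypes artifactFormat = "reference-types"
)

// resolveFormat picks the first non-empty setting in the order
// flag, environment, config file, and defaults to Reference Types.
func resolveFormat(flagVal, envVal, configVal string) (artifactFormat, error) {
	for _, v := range []string{flagVal, envVal, configVal} {
		if v == "" {
			continue
		}
		switch artifactFormat(v) {
		case formatORAS, formatReferenceTypes:
			return artifactFormat(v), nil
		default:
			return "", fmt.Errorf("unknown artifact format %q", v)
		}
	}
	return formatReferenceTypes, nil
}

func main() {
	// SOCI_ARTIFACT_FORMAT is a made-up variable name for this sketch.
	f, err := resolveFormat("", os.Getenv("SOCI_ARTIFACT_FORMAT"), "")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("using artifact format:", f)
}
```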

Describe alternatives you've considered
We considered just ripping out ORAS outright. That would be less work and more satisfying. But we decided to support both for a time to make the transition smoother.

Additional context
N/A

Don't use `int` for `bits` field in `gzip_index_point`

Is your feature request related to a problem? Please describe.
Currently, the bits field in gzip_index_point has the type int, which is 4 bytes on most implementations. However, it only needs to hold values from 0 to 7. The problem is compounded because we only serialize/deserialize 1 byte out of the 4. Changing it to a smaller integer type would shave off a few bytes but, more importantly, help get rid of pesky bugs like #24.

Describe the solution you'd like
Use something like uint8_t or int8_t to represent bits.

Describe alternatives you've considered
N/A

Additional context
N/A

Define SOCI terminology

Is your feature request related to a problem? Please describe.
SOCI introduces a handful of new terms. Some, such as SOCI itself, are defined in the README. Others, such as zTOC, are mentioned in various places, but never explicitly defined. Still others, e.g. SOCI index and SOCI index manifest, are neither defined nor obvious in their distinction.

We should add documentation with specific definitions for these terms both to help new users/contributors understand the project and to avoid talking past each other by using terms inconsistently.

Describe the solution you'd like
A document (possibly the readme) with specific definitions for:

SOCI
SOCI index
SOCI index manifest
zTOC

Describe alternatives you've considered
N/A

Additional context
N/A

Fast restart during uncompression of a span

Is your feature request related to a problem? Please describe.

Today, the entire span is uncompressed at the time of the first file access from that span. The application thread that issued the read request is blocked while this operation occurs. But on average, almost half of the data in the span will not be needed to satisfy this initial read request.

This acts as a limiter on how large a span size we use. The larger the span size, the more time we spend synchronously uncompressing data that the application may not even need.

Describe the solution you'd like

An optimization would be for the Span Manager to uncompress the data needed to satisfy the call to GetSpanContent(), and then immediately return to the caller so that the application can restart sooner. The rest of the uncompression can happen in the background.

This would reduce the span size performance penalty almost in half, allowing us to use larger spans. Larger spans mean smaller zTOCs, which means faster SOCI index downloads, which means faster launch times.

Describe alternatives you've considered

An alternative to completing the uncompression in background would be to maintain a per-span 'high-water mark' to track how far into the span we've gotten, and then only uncompress when that specific data is requested. This is likely to be a performance loser except in the rare cases where file access patterns are pathologically random, or when uncompression is very expensive (e.g., when running on a 1982 vintage TRS-80).

Additional context

This will further complicate the span state machine and locking model. Specifically, we'll need an uncompressing state. And we'll need some business logic to handle the case where a second request comes in while in that state. The naive approach to handling that would be to just block until uncompressed. A more aggressive approach would be to try to handle the request if the requested range is already available. If the data is not yet available, the really aggressive approach would be to wait until it is, and then have this second request fast restart as well.

I recommend doing the naive implementation now, and considering the more aggressive optimizations as follow-on work.

[FEATURE] soci index builder as a library

Is your feature request related to a problem? Please describe.
Currently the index builder code is tightly coupled with containerd. This blocks us from using the index builder module in environments where the containerd daemon is not running.

Describe the solution you'd like
We hope to move the index builder code into a separate module that works without requiring the presence of the containerd daemon. The new index builder module can then be imported by the soci-cli or other projects.

Describe alternatives you've considered
The alternative is the status quo: the containerd daemon has to be running if we want to build SOCI index files.

Use annotations to map a zTOC to its corresponding layer

Is your feature request related to a problem? Please describe.

For very small layers, it's not worth creating a zTOC; it's more performant to just pull the layer at launch time, and untar it. Hence, the soci create --min-layer-size option.

The problem is that today we use null values in the descriptors array to represent these "missing" zTOCs.
In other words, soci.descriptors[i] always corresponds to image.layers[i]. The problem is that ORAS doesn't like that.
This wasn't necessarily a deal breaker, since we will soon be moving to Reference Types. But, they don't really like it either.

Describe the solution you'd like

Rather than fight this fight, let's just go with the annotation based approach. That is, each entry in the descriptors array can have an annotation containing the digest of the corresponding layer.
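
The annotation-based lookup could look roughly like this. The descriptor struct below only mirrors the relevant fields of an OCI descriptor, and the annotation key example.soci.layer.digest is a made-up placeholder, not the key the project uses. The point of the sketch is that the descriptors array no longer needs null entries or positional alignment with image.layers.

```go
package main

import "fmt"

// descriptor mirrors the fields of an OCI content descriptor used here.
type descriptor struct {
	MediaType   string
	Digest      string
	Size        int64
	Annotations map[string]string
}

// layerDigestAnnotation is a hypothetical annotation key for this sketch.
const layerDigestAnnotation = "example.soci.layer.digest"

// ztocsByLayer builds a lookup from layer digest to zTOC descriptor.
// Descriptors without the annotation are ignored.
func ztocsByLayer(descs []descriptor) map[string]descriptor {
	m := make(map[string]descriptor, len(descs))
	for _, d := range descs {
		if layer, ok := d.Annotations[layerDigestAnnotation]; ok {
			m[layer] = d
		}
	}
	return m
}

func main() {
	layers := []string{"sha256:big1", "sha256:tiny", "sha256:big2"}
	// Only the two large layers have a zTOC; the small one is simply absent.
	index := []descriptor{
		{Digest: "sha256:ztocA", Annotations: map[string]string{layerDigestAnnotation: "sha256:big1"}},
		{Digest: "sha256:ztocB", Annotations: map[string]string{layerDigestAnnotation: "sha256:big2"}},
	}
	byLayer := ztocsByLayer(index)
	for _, l := range layers {
		if z, ok := byLayer[l]; ok {
			fmt.Printf("%s: lazy load using zTOC %s\n", l, z.Digest)
		} else {
			fmt.Printf("%s: no zTOC, pull the whole layer\n", l)
		}
	}
}
```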

Re-enable nginx test in `TestOptimizeConsistentSociArtifact`

Is your feature request related to a problem? Please describe.
#48 disables the nginx test in TestOptimizeConsistentSociArtifact because of random failures caused by how gob encodes FileMetadata.Xattrs (a map). Once we move away from gob, we should re-enable that test. This issue is a placeholder so that the TODO isn't lost in the comments.

Describe the solution you'd like
nginx test in TestOptimizeConsistentSociArtifact is re-enabled.

check-flatc has to verify if flatbuf generated files have been modified

Describe the bug
check-flatc does not identify that flatbuf generated files have been modified without using flatc.

Steps To Reproduce

  1. Modify any of the files under ztoc/fbs
  2. Run make check-flatc
  3. Observe nothing happens

Expected behavior

  1. make check-flatc needs to report the difference between the contents of ztoc/fbs and the newly generated fbs files.

Configuration (please complete the following information):
N/A

Additional context
N/A

Ztoc should be separated from compression

Is your feature request related to a problem? Please describe.
Right now the implementation of Ztoc handling is very dependent on Gzip. This makes it really hard to plug in a new compression algorithm, e.g. zstd.

Describe the solution you'd like

  1. A design exists that proposes how to decouple the Ztoc from compression.
  2. Once the design is agreed on, create the tickets to implement the changes.

Describe alternatives you've considered
Keep everything as is. But this tight coupling is hardly extensible.

Additional context
N/A

"Chunk" is not a thing in the SOCI world

Is your feature request related to a problem? Please describe.
The Stargz and eStargz file formats have a notion of "chunks". During the conversion step, large files are split into multiple chunks, so that they can be pulled, verified and uncompressed separately. SOCI has no such notion. In the SOCI world, layers are divided into variable sized "spans" for pulling and checksumming purposes.

Describe the solution you'd like
Any code in SOCI that refers to chunks is dead code and needs to be removed.

The solution should satisfy the following criteria:

  • cd SociSnapshotter; ag chunk | wc -l returns 0
  • If the above criterion cannot be satisfied, a comment is added to this story explaining why.

Describe alternatives you've considered

Additional context
See the following links for reference:

Span Size Impact

Investigate the impact of using different span sizes on benchmark workloads.

Prefetch unit tests are flaky

Describe the bug
The prefetcher unit tests randomly fail with the error: spans digest do not match.

Steps To Reproduce

  • cd into fs

  • Run the tests multiple times.

$ go clean -testcache && go test ./...

ok      github.com/awslabs/soci-snapshotter/fs    0.009s
?       github.com/awslabs/soci-snapshotter/fs/config    [no test files]
ok      github.com/awslabs/soci-snapshotter/fs/layer    8.559s
?       github.com/awslabs/soci-snapshotter/fs/metrics/common    [no test files]
?       github.com/awslabs/soci-snapshotter/fs/metrics/layer    [no test files]
ok      github.com/awslabs/soci-snapshotter/fs/reader    3.878s
ok      github.com/awslabs/soci-snapshotter/fs/remote    12.936s
?       github.com/awslabs/soci-snapshotter/fs/source    [no test files]
ok      github.com/awslabs/soci-snapshotter/fs/span-manager    0.629s

$ go clean -testcache && go test ./...
ok      github.com/awslabs/soci-snapshotter/fs    0.008s
?       github.com/awslabs/soci-snapshotter/fs/config    [no test files]
ok      github.com/awslabs/soci-snapshotter/fs/layer    8.671s
?       github.com/awslabs/soci-snapshotter/fs/metrics/common    [no test files]
?       github.com/awslabs/soci-snapshotter/fs/metrics/layer    [no test files]
ok      github.com/awslabs/soci-snapshotter/fs/reader    3.910s
ok      github.com/awslabs/soci-snapshotter/fs/remote    12.939s
?       github.com/awslabs/soci-snapshotter/fs/source    [no test files]
--- FAIL: TestStateTransition (0.00s)
    --- FAIL: TestStateTransition/max_span_-_prefetch (0.00s)
        span_manager_test.go:211: failed resolving the span for prefetch
    --- FAIL: TestStateTransition/max_span_-_on_demand_fetch (0.00s)
        span_manager_test.go:220: failed getting the span for on-demand fetch
FAIL
FAIL    github.com/awslabs/soci-snapshotter/fs/span-manager    0.631s
FAIL

Expected behavior
The prefetcher unit tests pass consistently.

Configuration (please complete the following information):

  • OS: N/A
  • Snapshotter Version: Commit ID #714f1a971e6
  • Containerd Version: N/A

Remove ORAS support from the snapshotter

Is your feature request related to a problem? Please describe.
With oras-go migration from ORAS to OCI Artifact completed, we should look into removing ORAS support from the snapshotter.

Describe the solution you'd like
ORAS support is removed.

Ztoc struct has to be consistent with the structure defined in ztoc.fbs

Is your feature request related to a problem? Please describe.
The Ztoc struct has to be consistent with the structure defined in ztoc.fbs. We have two Ztoc definitions, and the serialization logic copies data from the Ztoc in ztoc.go to the fbs-generated Ztoc. To avoid bugs, the two must be consistent. The fbs Ztoc has the more logical structure, which we all agreed on, so the Ztoc struct must follow its lead.

Describe the solution you'd like
Ztoc struct in ztoc.go needs to be updated to reflect the structural changes agreed on when defining the serialization format in .fbs.

Describe alternatives you've considered

Additional context

go-fuse: set AttrTimeout, EntryTimeout and NegativeTimeout to 600s

Is your feature request related to a problem? Please describe.
The fuse process is not tuned. To achieve better performance for metadata operations, one can try tuning the following parameters: AttrTimeout, EntryTimeout and NegativeTimeout.

Describe the solution you'd like
User can set the above parameters in config.toml. The default values for those should be set to 600s.
The PR should contain a comparison of workload performance with the values set to 600s against baseline performance (all values set to 0).

Remove 'FirstSpanHasBits' field from `FileMetadata`

Is your feature request related to a problem? Please describe.
We should remove the FirstSpanHasBits field from FileMetadata. Firstly, it's a terrible name which leaves readers more confused than enlightened. More importantly, the field exists so that the remote fetcher knows to adjust the fetch range when fetching the spans in case there is partially uncompressed data in the previous byte. We don't actually need this field, as it can be computed from C.has_bits.

Describe the solution you'd like
FirstSpanHasBits is removed.

Describe alternatives you've considered
N/A

Additional context

Add `soci index rm` command

Is your feature request related to a problem? Please describe.
We currently don't have any way to remove ind(exes/ices) from the local store. We should add commands to do so in the SOCI CLI.

Describe the solution you'd like
soci index rm <digest> - allows you to remove a particular index by specifying the index digest (as presented by soci index list) and its entry in the artifact database.

soci index rm --all - removes all ind(exes/ices) from the local store and their entries in the artifact database.

Describe alternatives you've considered
I'm a little torn on soci index rm --all. Maybe it should be a separate command, i.e., something like soci index prune?

Additional context

Cache buffers for reuse

Is your feature request related to a problem? Please describe.
A recent CPU profile of the snapshotter showed that a non-trivial amount of time is being spent on allocating and garbage collecting byte buffers to store compressed/uncompressed data.

One of the major culprits (although, not the only one) is the ExtractDataFromBuffer function.

(This is especially problematic for large images or workloads that access a bunch of small files at once.)

Describe the solution you'd like
We should cache these buffers so we can reuse them and relieve some of the GC pressure. (Maybe using sync.Pool?)

Describe alternatives you've considered

Additional context

[FEATURE] Make a small package for SOCI Snapshotter clients

Is your feature request related to a problem? Please describe.

SOCI Snapshotter clients most likely need AppendDefaultLabelsHandlerWrapper which doesn't have a lot of dependencies (OCI specs and containerd, I believe).

However, consuming SOCI Snapshotter as a Go module will bring a lot of dependencies.

Describe the solution you'd like

Can you provide a small Go module that only has AppendDefaultLabelsHandlerWrapper?

Describe alternatives you've considered

Copy-and-paste the function, assuming the annotations wouldn't change much.

Additional context

[BUG] Soci snapshotter only lazy loads one image

Describe the bug
tl;dr the snapshotter falls back to ahead-of-time pulls after the first image pull because fs.filesystem caches the first SOCI index it sees and tries to use that data for all other image pulls.

fs.filesystem is responsible for reading labels and setting up FUSE mounts for each layer of an image when pulled using a SOCI index. When the first layer of an image is passed to the filesystem, the filesystem does a one-time SOCI index fetch to download the SOCI index from the remote registry. This one-time setup also assigns the index and a mapping of layer -> ztoc descriptor in the filesystem itself so we have that information ready for the next layer without an additional network call.

The problem is that fs.filesystem is not unique for each image; rather, there is a single instance for the entire snapshotter. As a result, the first image that gets pulled using a SOCI index will load that index and its layers into the filesystem data structure, and future image pulls will not change this data. For each layer in the second image, a lookup of the ztoc digest will fail because it is not in the first image's layer -> ztoc map, which causes Resolve to fail as it's trying to resolve an empty descriptor. The resolution error bubbles up to the snapshotter service, which falls back to ahead-of-time pulling with overlayfs.

We still want to hold on to the index and layer -> ztoc map, but we need to be able to hold on to multiple instances of these at a time. Wrapping those two components up into an IndexInfo struct (ugh. naming is hard) and then storing those in the filesystem in a sync.Map of indexDigest string -> IndexInfo seems like a good way to handle this.
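
The proposed structure might look roughly like the sketch below. The field names, the indexFor method, and the fetch callback are hypothetical and only illustrate the sync.Map of index digest -> IndexInfo idea; the real filesystem type has many more responsibilities.

```go
package main

import (
	"fmt"
	"sync"
)

// indexInfo bundles what is cached per SOCI index.
type indexInfo struct {
	indexDigest string
	ztocByLayer map[string]string // layer digest -> zTOC digest
}

// filesystem is shared by all images, so it keeps one entry per index
// instead of a single cached index.
type filesystem struct {
	indexes sync.Map // index digest (string) -> *indexInfo
}

// indexFor returns the cached info for indexDigest, fetching it on first use.
// Two concurrent callers may both fetch, but LoadOrStore ensures that they
// end up sharing a single stored value.
func (f *filesystem) indexFor(indexDigest string, fetch func(string) (*indexInfo, error)) (*indexInfo, error) {
	if v, ok := f.indexes.Load(indexDigest); ok {
		return v.(*indexInfo), nil
	}
	info, err := fetch(indexDigest)
	if err != nil {
		return nil, err
	}
	actual, _ := f.indexes.LoadOrStore(indexDigest, info)
	return actual.(*indexInfo), nil
}

func main() {
	fsys := &filesystem{}
	fetch := func(d string) (*indexInfo, error) {
		fmt.Println("fetching index", d)
		return &indexInfo{
			indexDigest: d,
			ztocByLayer: map[string]string{"layer-of-" + d: "ztoc-of-" + d},
		}, nil
	}
	// The second image gets its own entry; the third call hits the cache.
	for _, d := range []string{"sha256:aaa", "sha256:bbb", "sha256:aaa"} {
		info, err := fsys.indexFor(d, fetch)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(d, "has", len(info.ztocByLayer), "zTOC mapping(s)")
	}
}
```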

Steps To Reproduce
Start the snapshotter and soci rpull any two images for which there are SOCI indices.

Expected behavior
Both images should be pulled lazily using their respective

Configuration (please complete the following information):

  • OS: AmazonLinux 2
  • Snapshotter Version: soci-snapshotter-grpc b32d03a b32d03a
  • Containerd Version: 1.6.0

Additional context

Model the gzip_index data in the fbs file instead of treating it as a binary blob

Is your feature request related to a problem? Please describe.
Right now gzip_index data is treated as a binary blob. This creates a lot of inconvenience during serialization, since we are required to intervene to encode/decode integer types with respect to the configuration of the host. Modeling this in fbs will let Flatbuffers do this for us, which is the right thing to do long term.

Describe the solution you'd like
ztoc.fbs gets rid of IndexByteData as [ubyte] in favor of the field of type table gzip_index. The respective serialization and de-serialization logic is added.

Describe alternatives you've considered

Additional context

[BUG] Not usable as a Go module

Describe the bug
I am trying to build a project that uses soci-snapshotter. I ran go get github.com/awslabs/soci-snapshotter@latest, used some of the exported functions and tried go build. It failed with the following error:

/usr/local/go/pkg/tool/linux_arm64/link: running gcc failed: exit status 1
/usr/bin/ld: cannot find -lindexer
collect2: error: ld returned 1 exit status

I think this is because right now the code expects the indexer C library to be built using make.

Is the intention for this package to be usable as a Go module? I would find it quite useful, but maybe it's not a goal for this project?

indexer C component needs to be rewritten to Go

Is your feature request related to a problem? Please describe.
Right now soci snapshotter has an indexer component, which is implemented in pure C. It needs to be rewritten in Go, which will provide a convenient abstraction layer over the zlib library. The reasoning: C code is less maintainable than the respective Go implementation, and that may lead to bugs eventually sneaking in.

Describe the solution you'd like
The indexer component needs to be rewritten in Go.

Investigate Create Container time

Create container is taking significantly longer for soci-snapshotter than for overlayfs. Let's understand why this is happening and whether the impact can be reduced. Add a benchmark test specifically for Create Container.

Add integration tests to github actions

#5 adds github actions workflows to run unit tests. Integration tests should be a part of this as well, but when I tried they timed out after ~2 hours.

We should do some digging into what exactly is going wrong there.

AWS S3 cache and WSL/Mac/Linux cache mount

Is your feature request related to a problem? Please describe.
I want the lazy load cache in AWS S3 or a localhost shared mount path.

Preferably sqlite blobs where you can walk the B-tree with HTTP range queries and sip just the bytes you need.

See https://github.com/dacort/athena-sqlite/blob/master/lambda-function/vfs.py

Describe the solution you'd like
Long tail binaries get pulled from S3.

Describe alternatives you've considered
It doesn't have to be S3. The implementation could also work for WSL, so the cache can be stored in the /mnt/c/WSL/cache directory for Windows boxen or a similar directory on Mac/Linux boxen.

Additional context

Integration tests frequently fail during the first run in github actions

Describe the bug
Integration tests frequently fail during the first run in github actions. This is not happening when running them locally.
The log looks as follows:

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
testing: 2022/10/26 17:23:14 failed to run piped commands [[ctr run --rm registry-cdcmpp7350hfc9pi81pg.test/ubuntu:latest test tar -c /usr] [tar -xC /tmp/tmp.yMfRujYlv3]]: exit status 2

Steps To Reproduce

  1. Create a pull request on github and observe the tests failing.
    No steps to reproduce locally.

Expected behavior
Integration tests execution in github actions is consistent with local behavior.

Allow manual removal of invalid snapshots on restore

Is your feature request related to a problem? Please describe.
On restart, soci-snapshotter-grpc doesn't start if one of the snapshots cannot be restored. This makes it impossible to manually remove images and snapshots (e.g. by using ctr).

Describe the solution you'd like
Cherrypick containerd/stargz-snapshotter#901

Describe alternatives you've considered

Additional context

soci-snapshotter should optionally use the Referrers API to discover the remote SOCI index

Is your feature request related to a problem? Please describe.

Today, soci-snapshotter has to be explicitly told the digest of the SOCI index that is associated with the image in the remote repository.

That's fine for some installations, where an outside agent will be doing that query anyway (via the Referrers API) to figure out which snapshotter to use. But for some installations--e.g., anyone running containerd themselves--forcing them to feed us the SOCI index digest (using soci rpull) is...um...stupid. Referrers API was developed specifically to relieve clients of this burden.

So let's fix it.

Describe the solution you'd like

The SOCI index digest should be optional in both the client.Pull call and the soci rpull command line. If it is not specified, then the snapshotter itself will use Referrers API to try to look it up.

If Referrers API returns more than one result, the snapshotter will use the first one in the list. (For now. We can iterate on this later.)
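A minimal sketch of the "use the first one in the list" rule, assuming the Referrers API response is an OCI image index whose manifests array holds the referrer descriptors and that the query was already filtered to SOCI indices. The type and function names are hypothetical, and the digests in the sample are placeholders.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// descriptor holds the subset of OCI descriptor fields used in this sketch.
type descriptor struct {
	MediaType    string `json:"mediaType"`
	ArtifactType string `json:"artifactType,omitempty"`
	Digest       string `json:"digest"`
	Size         int64  `json:"size"`
}

// referrersResponse mirrors the image index returned by a referrers query.
type referrersResponse struct {
	Manifests []descriptor `json:"manifests"`
}

// errNoReferrers is returned when the registry lists no referrers at all.
var errNoReferrers = errors.New("no referrers found for image")

// firstReferrerDigest parses a referrers response body and returns the
// digest of the first entry, ignoring any further entries.
func firstReferrerDigest(body []byte) (string, error) {
	var resp referrersResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Manifests) == 0 {
		return "", errNoReferrers
	}
	return resp.Manifests[0].Digest, nil
}

func main() {
	// Placeholder response with two candidate indices.
	body := []byte(`{"schemaVersion":2,"manifests":[
		{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:aaaa","size":10},
		{"mediaType":"application/vnd.oci.image.manifest.v1+json","digest":"sha256:bbbb","size":12}]}`)
	digest, err := firstReferrerDigest(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(digest)
}
```

When the digest is given explicitly on the command line or in the pull call, this lookup would simply be skipped.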

Describe alternatives you've considered

If I were running the world, containerd would be allowed to have multiple snapshotters installed, and its snapshotter plugin interface would have a new call which would happen before Prepare to do dynamic snapshotter selection. That is, containerd would ask each configured snapshotter "Hey, do you want to handle this image?" In that world, soci-snapshotter would do the Referrers API call at that time.

Alas, that is not the world in which we live. It could be. containerd #6657 is trying to solve a similar problem.

Additional context

As part of this work, references to the soci rpull command will be removed from the GETTING_STARTED.md guide. In other words, the only thing you need to do on the runtime host is launch the snapshotter and then ctr i run the image. No soci CLI commands needed. That way, the demo is more wicked cool, and it is more obvious how SOCI can be used for production workloads.

If this fix is implemented after soci-snapshotter #42, then it would be fine if it only works for Reference Types and not ORAS. ORAS is going away, so there's no point in writing throwaway code.

Enable dependabot for the repo

Is your feature request related to a problem? Please describe.
We need to be able to automatically update dependencies in snapshotter.

Describe the solution you'd like
Enable dependabot.

Describe alternatives you've considered

Additional context

Check if bbolt db for metadata can be tuned

The bbolt db, which stores file metadata information, may be one of the reasons the snapshotter's workflows are slow. It would be good to have a look at what can be done to speed things up.

SOCI CLI integration test expansion

Is your feature request related to a problem? Please describe.
The SOCI CLI is tested for integration around the core user experience of creating/pushing/pulling SOCI indices, but it is not tested around the debug tooling for indices and ztocs. We should add integration tests to verify these debugging tools.

Describe the solution you'd like
The following integration tests:

  • index list #250
  • index list with image ref #250
  • index info #250
  • ztoc list #265
  • ztoc list with digest #265
  • ztoc info #334
  • ztoc get-file #334
  • index rm with digest #260
  • index rm with image ref #260
  • index rm everything #260
  • create with all platforms
  • push with all platforms
  • image rpull with all platforms

Describe alternatives you've considered
N/A

Additional context
N/A

Move build-tool installation logic out of Makefile

Is your feature request related to a problem? Please describe.

Today, the Makefile contains a growing number of rules for installing build-time dependencies. This is an antipattern that hurts maintainability and portability. Makefiles should contain the logic to build project artifacts, and that is it. Dependencies should be documented in README.md or BUILDING.md and/or managed by a configure.sh script.

Unit tests fail using Go 1.19 because of data race

Describe the bug
The unit tests inside the soci/ directory fail due to a data race. This only happens when using Go 1.19.x, not with previous versions of Go.

Steps To Reproduce

$ go version
go version go1.19 linux/amd64

$ make test 
...
==================
WARNING: DATA RACE
Write at 0x00c001509038 by goroutine 55:
  runtime.racewriterange()
      <autogenerated>:1 +0x29
  internal/poll.(*FD).Pread()
      /usr/local/go/src/internal/poll/fd_unix.go:193 +0x169
  os.(*File).pread()
      /usr/local/go/src/os/file_posix.go:40 +0x335
  os.(*File).ReadAt()
      /usr/local/go/src/os/file.go:136 +0x2de
  io.(*SectionReader).ReadAt()
      /usr/local/go/src/io/io.go:552 +0x205
  github.com/awslabs/soci-snapshotter/soci.ExtractFile.func1()
      /home/ec2-user/soci-snapshotter/soci/ztoc.go:141 +0x1db
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/ec2-user/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57 +0x91

Previous write at 0x00c001509039 by goroutine 56:
  runtime.racewriterange()
      <autogenerated>:1 +0x29
  internal/poll.(*FD).Pread()
      /usr/local/go/src/internal/poll/fd_unix.go:193 +0x169
  os.(*File).pread()
      /usr/local/go/src/os/file_posix.go:40 +0x335
  os.(*File).ReadAt()
      /usr/local/go/src/os/file.go:136 +0x2de
  io.(*SectionReader).ReadAt()
      /usr/local/go/src/io/io.go:552 +0x205
  github.com/awslabs/soci-snapshotter/soci.ExtractFile.func1()
      /home/ec2-user/soci-snapshotter/soci/ztoc.go:141 +0x1db
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/ec2-user/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57 +0x91

Goroutine 55 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /home/ec2-user/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:54 +0xee
  github.com/awslabs/soci-snapshotter/soci.ExtractFile()
      /home/ec2-user/soci-snapshotter/soci/ztoc.go:135 +0x6eb
  github.com/awslabs/soci-snapshotter/soci.TestDecompress()
      /home/ec2-user/soci-snapshotter/soci/ztoc_test.go:138 +0xf88
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1493 +0x47

Goroutine 56 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /home/ec2-user/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:54 +0xee
  github.com/awslabs/soci-snapshotter/soci.ExtractFile()
      /home/ec2-user/soci-snapshotter/soci/ztoc.go:135 +0x6eb
  github.com/awslabs/soci-snapshotter/soci.TestDecompress()
      /home/ec2-user/soci-snapshotter/soci/ztoc_test.go:138 +0xf88
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1493 +0x47
==================
--- FAIL: TestDecompress (12.08s)
    testing.go:1319: race detected during execution of test

Expected behavior
All unit tests pass.

Configuration (please complete the following information):

  • OS: AL2 (Linux 5.10.130-118.517.amzn2.x86_64)
  • Snapshotter Version: c5a2474
  • Containerd Version: v1.6.6
  • Go Version: go version go1.19 linux/amd64

Span access metrics

Is your feature request related to a problem? Please describe.

  • Today a SOCI index is built with a specific span-size and min-layer-size.
  • Eventually we will have config parameters to control background fetch behavior (request rate, request size, request throughput limit, etc. etc.)

For customers to tune those parameters, they need visibility into how spans are being accessed.

Describe the solution you'd like

SOCI snapshotter should generate the following metrics

  1. span_synchronous_fetches - the number of spans fetched due to synchronous read requests (causing application stalls).
  2. span_background_fetches - the number of spans fetched by the background fetcher.
  3. span_first_use - this is the number of spans that were uncompressed and accessed for the first time, regardless of how they were downloaded. (This probably needs a better name.)

Note that span_synchronous_fetches + span_background_fetches - span_first_use is the number of spans downloaded and never accessed.
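The relationship between the three counters can be sketched as follows. The struct and method names are hypothetical; only the metric meanings and the formula come from the proposal above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spanMetrics holds the three proposed counters. Fields are updated with
// atomic operations so that concurrent fetchers can record events safely.
type spanMetrics struct {
	synchronousFetches int64 // span_synchronous_fetches
	backgroundFetches  int64 // span_background_fetches
	firstUse           int64 // span_first_use
}

func (m *spanMetrics) recordSynchronousFetch() { atomic.AddInt64(&m.synchronousFetches, 1) }
func (m *spanMetrics) recordBackgroundFetch()  { atomic.AddInt64(&m.backgroundFetches, 1) }
func (m *spanMetrics) recordFirstUse()         { atomic.AddInt64(&m.firstUse, 1) }

// neverAccessed returns the number of spans that were downloaded but have
// not been accessed: synchronous + background - first use.
func (m *spanMetrics) neverAccessed() int64 {
	return atomic.LoadInt64(&m.synchronousFetches) +
		atomic.LoadInt64(&m.backgroundFetches) -
		atomic.LoadInt64(&m.firstUse)
}

func main() {
	var m spanMetrics
	m.recordSynchronousFetch() // application read stalled on this span
	m.recordFirstUse()
	m.recordBackgroundFetch() // prefetched and later used
	m.recordFirstUse()
	m.recordBackgroundFetch() // prefetched but never read
	fmt.Println("downloaded but never accessed:", m.neverAccessed())
}
```

A large never-accessed count relative to the background fetch count would suggest that background fetching is pulling more data than the workload needs.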

Check how chunk_size option affects the performance of downloading spans from remote

When the span manager requests span data from the remote, the span is split into chunks of size chunk_size. With the default chunk_size of 50KiB, it may make multiple requests to the remote registry to fetch the compressed data for a single span.
We need to evaluate how this affects performance and whether it needs to be tuned.

#99 should take care of chunks in soci, but until it's done, this is worth exploring.
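To get a feel for the numbers, the sketch below estimates how many requests one span needs, assuming one request per chunk and no merging of adjacent chunks. The function name and the 4 MiB span size in the example are illustrative assumptions, not measured values.

```go
package main

import "fmt"

// requestsPerSpan returns how many chunk-sized requests are needed to cover
// a span of spanSize bytes, rounding up for a final partial chunk.
func requestsPerSpan(spanSize, chunkSize int64) int64 {
	if spanSize <= 0 || chunkSize <= 0 {
		return 0
	}
	return (spanSize + chunkSize - 1) / chunkSize
}

func main() {
	const chunkSize = 50 << 10 // 50 KiB, the default mentioned above
	const spanSize = 4 << 20   // 4 MiB, an example compressed span size
	fmt.Printf("%d requests for a %d-byte span\n",
		requestsPerSpan(spanSize, chunkSize), int64(spanSize))
}
```

Under these assumptions a 4 MiB span takes 82 requests, which is why the per-request overhead of the registry matters here.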

Re-design the background fetcher

Is your feature request related to a problem? Please describe.
The background fetcher is the component of the snapshotter that fetches spans in the background. In its current state, it is not a particularly well-thought-out component, leaking abstractions and degrading snapshotter performance.

Some of the issues:

  • The background fetcher acquires the lock and only releases it when it's done fetching the span from remote. This can lead to situations like #105, wherein on-demand fetches suffer due to lock contention.

  • The background fetcher is very aggressive; currently, it tries to fetch all spans from all layers as fast as possible. This is undesirable (e.g., there might be cases we want to rate limit the background fetcher to avoid throttling, or prioritize certain layers' prefetch over others).

  • The background fetcher fetches a single span at a time. This is fine when the span size is a desirable value (e.g., 8-10MiB for S3), but not so much when it's not. For some span sizes and some repos, it may be more performant to fetch multiple spans at once.

Describe the solution you'd like
The background fetcher is re-architected and takes into consideration all the above points.
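One possible shape for such a redesign is sketched below: spans are grouped into batches, the lock is held only while claiming the next batch, and a fixed pause between batches acts as a crude rate limit. All names here (batchSpans, fetcher, its fields) are hypothetical and do not reflect the snapshotter's actual code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batchSpans groups consecutive span IDs so that each batch covers at most
// maxBytes. A single span larger than maxBytes gets a batch of its own.
func batchSpans(sizes []int64, maxBytes int64) [][]int {
	var batches [][]int
	var cur []int
	var curBytes int64
	for id, sz := range sizes {
		if len(cur) > 0 && curBytes+sz > maxBytes {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, id)
		curBytes += sz
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

// fetcher drains pending batches. The mutex protects only the queue, so
// on-demand readers are never blocked for the duration of a remote fetch.
type fetcher struct {
	mu       sync.Mutex
	pending  [][]int
	fetch    func(batch []int) error // hypothetical remote fetch
	interval time.Duration           // pause between batches
}

// next claims the next batch while holding the lock briefly.
func (f *fetcher) next() ([]int, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if len(f.pending) == 0 {
		return nil, false
	}
	b := f.pending[0]
	f.pending = f.pending[1:]
	return b, true
}

// run fetches batches one by one; the lock is not held during fetch.
func (f *fetcher) run() error {
	for {
		b, ok := f.next()
		if !ok {
			return nil
		}
		if err := f.fetch(b); err != nil {
			return err
		}
		time.Sleep(f.interval)
	}
}

func main() {
	sizes := []int64{4 << 20, 4 << 20, 4 << 20, 4 << 20, 4 << 20}
	f := &fetcher{
		pending:  batchSpans(sizes, 8<<20),
		fetch:    func(b []int) error { fmt.Println("fetching spans", b); return nil },
		interval: 10 * time.Millisecond,
	}
	if err := f.run(); err != nil {
		panic(err)
	}
}
```

Layer prioritization is not covered; it could be added by ordering the pending queue before the fetcher starts.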

Encapsulate Ztoc serialization logic into ZtocSerializer

Is your feature request related to a problem? Please describe.
Ztoc has very complex serialization logic, which involves manually placing data to compose the flatbuffer byte stream. This calls for a separate entity to do that, instead of relying on ztocToFlatbuf and flatbufToZtoc.

Describe the solution you'd like
Implement ZtocSerializer and replace ztocToFlatbuf and flatbufToZtoc with the respective Serialize/Deserialize methods.
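A sketch of what such an interface could look like. The Ztoc struct here is a heavily reduced, hypothetical stand-in, and the implementation uses encoding/json purely as a placeholder for the flatbuffer logic, which is not reproduced.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ztoc is a reduced, hypothetical stand-in for the real ztoc type.
type Ztoc struct {
	Version   string   `json:"version"`
	FileNames []string `json:"file_names"`
}

// ZtocSerializer hides the wire format behind two methods, so callers no
// longer depend on free functions such as ztocToFlatbuf and flatbufToZtoc.
type ZtocSerializer interface {
	Serialize(z *Ztoc) ([]byte, error)
	Deserialize(data []byte) (*Ztoc, error)
}

// jsonSerializer is a placeholder implementation. A real implementation
// would wrap the existing flatbuffer encoding instead of JSON.
type jsonSerializer struct{}

func (jsonSerializer) Serialize(z *Ztoc) ([]byte, error) {
	return json.Marshal(z)
}

func (jsonSerializer) Deserialize(data []byte) (*Ztoc, error) {
	var z Ztoc
	if err := json.Unmarshal(data, &z); err != nil {
		return nil, err
	}
	return &z, nil
}

func main() {
	var s ZtocSerializer = jsonSerializer{}
	data, err := s.Serialize(&Ztoc{Version: "example", FileNames: []string{"usr/bin/sh"}})
	if err != nil {
		panic(err)
	}
	z, err := s.Deserialize(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(z.Version, z.FileNames)
}
```

Keeping the format behind an interface also makes round-trip tests straightforward, since a test can serialize and deserialize without knowing the byte layout.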

Describe alternatives you've considered
Keep it as is.

Additional context
N/A

Remove libindexer

Is your feature request related to a problem? Please describe.
soci snapshotter has an indexer C component, which at the moment needs to be built separately into libindexer.a and then statically linked during the snapshotter's compilation. It shouldn't be this way.

Describe the solution you'd like
gzip_indexer.go and C files need to be located in the same directory. This will allow cgo to search for C files within the same directory where gzip_index.go is located and compile/link them on the fly without creating the additional artifact.

Describe alternatives you've considered

Additional context

Remove SpanStart and SpanEnd from Ztoc's FileMetadata

Is your feature request related to a problem? Please describe.
The SpanStart and SpanEnd fields in ztoc.FileMetadata seem to be redundant. The logic they are used in can be changed to use UncompressedOffset and UncompressedSize. This way, the TOC building logic can be simplified and no longer needs to know about compression at that stage.

Describe the solution you'd like
Remove SpanStart and SpanEnd fields from ztoc.FileMetadata and adjust the implementation to use UncompressedOffset and UncompressedSize instead.
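A sketch of how the span range could be derived on demand instead of being stored. It assumes the compression info exposes a sorted list of uncompressed offsets at which each span begins; the spanStarts slice and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// spanFor returns the index of the span containing the uncompressed offset
// off. spanStarts[i] is the uncompressed offset where span i begins; the
// slice must be sorted ascending and start at 0.
func spanFor(spanStarts []int64, off int64) int {
	// Find the first span that begins after off, then step back one.
	i := sort.Search(len(spanStarts), func(i int) bool { return spanStarts[i] > off })
	return i - 1
}

// spanRange derives the first and last span touched by a file from its
// uncompressed offset and size.
func spanRange(spanStarts []int64, uncompressedOffset, uncompressedSize int64) (start, end int) {
	start = spanFor(spanStarts, uncompressedOffset)
	if uncompressedSize == 0 {
		return start, start // empty file touches only its starting span
	}
	end = spanFor(spanStarts, uncompressedOffset+uncompressedSize-1)
	return start, end
}

func main() {
	spanStarts := []int64{0, 100, 250} // three spans, example offsets
	start, end := spanRange(spanStarts, 90, 20)
	fmt.Printf("file at offset 90, size 20 touches spans %d..%d\n", start, end)
}
```

With a lookup like this, file metadata only records where a file lives in the uncompressed stream, and the mapping to spans stays with the compression-specific code.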

Describe alternatives you've considered
Keep as is. The problems with the current state of things:

  • TOC part of ztoc knows about compression
  • Really hard to consolidate ztoc building logic to account for easy addition of different compression algorithms

Additional context
N/A
