
buildkit-cache-dance's People

Contributors

akihirosuda, aminya, borchero, chagui-, davids-ovm, dylanratcliffe, henryjw, johanneswuerbach, kskalski, mabrikan, ngokimphu, rose-m, shyim


buildkit-cache-dance's Issues

Allow skipping extraction step

The extraction step takes several minutes for large caches, such as ~500MB of npm modules. If the actions/cache step had a cache hit, extraction is pointless, since the cache will not be rewritten in that case anyway. This is actually fairly common, because lockfiles don't change often. If this action accepted a "should-extract" input, it could be tied to the "cache-hit" output of actions/cache and save several minutes on many runs.
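As a sketch of the proposed wiring (the skip-extraction input shown here matches the interface that appears in other reports on this page; the cache names are illustrative):

```yaml
# Skip the extraction post-step when actions/cache already had an exact hit:
# on a hit, the extracted contents would be identical to what is stored anyway.
- name: Cache npm modules
  id: npm-cache
  uses: actions/cache@v4
  with:
    path: npm-cache
    key: npm-${{ hashFiles('package-lock.json') }}
- name: Inject npm cache into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      { "npm-cache": "/root/.npm" }
    skip-extraction: ${{ steps.npm-cache.outputs.cache-hit }}
```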

Process multiple targets in single action call and support S3 backend

Hi @AkihiroSuda thank you for picking up maintenance of this important action!

We have added two features on a fork over at https://github.com/dcginfra/buildkit-cache-dance and I wonder if you would be interested in PRs to add these features to v2 of the action, now that its use is recommended in the official Docker documentation. We have two main changes:

  • Process multiple cache mounts in a single pass by specifying an ID for each mount
  • Support AWS S3 as an alternative cache storage backend

The changes require the user's Dockerfile to be modified with cache IDs like this:

FROM ubuntu:22.04
RUN \
  --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-cache \
  --mount=type=cache,target=/var/lib/apt,sharing=locked,id=apt-lib \
  apt-get update && apt-get install -y gcc

And the action is called something like this:

- name: inject cache mounts into docker
  uses: reproducible-containers/buildkit-cache-dance@mount-id-example
  with:
    mounts: |
      apt-cache
      apt-lib

The main change is in the Dancefile, which is generated on the fly with as many mounts and copy operations as necessary. There is no need to pass cache-source and cache-target separately anymore, because each cache is identified by its unique ID instead, like this:

- name: Prepare list of cache mounts for Dancefile
  uses: actions/github-script@v6
  id: mounts
  with:
    script: |
      const mountIds = `${{ inputs.mounts }}`.split(/[\r\n,]+/)
        .map((mount) => mount.trim())
        .filter((mount) => mount.length > 0);
      
      const cacheMountArgs = mountIds.map((mount) => (
        `--mount=type=cache,sharing=shared,id=${mount},target=/cache-mounts/${mount}`
      )).join(' ');
      
      const s3commands = mountIds.map((mount) => (
        `aws s3 sync --no-follow-symlinks --quiet s3://${{inputs.bucket}}/cache-mounts/${mount} /cache-mounts/${mount}`
      )).join('\n');

      core.setOutput('cacheMountArgs', cacheMountArgs);
      core.setOutput('s3commands', s3commands);

- name: Inject cache data into buildx context
  shell: bash
  run: |
    docker build ${{ inputs.cache-source }} --file - <<EOF
    FROM amazon/aws-cli:2.13.17
    COPY buildstamp buildstamp
    RUN ${{ steps.mounts.outputs.cacheMountArgs }} <<EOT
        echo -e '${{ steps.mounts.outputs.s3commands }}' | sh && \
        chmod 777 -R /cache-mounts || true
    EOT
    EOF

The code is currently still written in JS, and is quite tightly bound to S3 (since that is what we need) but I'd love to see features like this supported in the maintained version of the action, since there has been a lot of discussion about this (as I'm sure you're aware). Thoughts?

UID and GID are Not Preserved When Injecting Cache

How to reproduce:

  1. Create directory test_ownership
  2. Add the following Dockerfile
FROM ubuntu

RUN groupadd -g 9999 app && useradd -m -g 9999 -u 9999 app

USER app

WORKDIR /home/app

COPY . /home/app

RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data" \
    echo "Listing BEFORE writing." &&\
    (ls -l /home/app/tmp_data/test.txt || echo "File not yet created") &&\
    echo "THIS IS A TEST" > /home/app/tmp_data/test.txt &&\
    echo "Listing file AFTER writing." &&\
    ls -l /home/app/tmp_data/test.txt
  3. Run docker buildx build --progress plain -t test-ownership .
  4. Extract the cache (you must be the root user to preserve ownership when extracting) by running node ./buildkit-cache-dance/dist/index.js --extract --cache-map '{"<path-to-cache-directory>/cache_dir": {"target": "/home/app/tmp_data", "uid": "9999", "gid": "9999"}}'.
  5. Invalidate the build cache by adding a file, e.g. touch invalidate_cache.
  6. Remove the build cache: docker buildx prune.
  7. Inject the cached layer: node ./buildkit-cache-dance/dist/index.js --cache-map '{"<path-to-cache-directory>/cache_dir": {"target": "/home/app/tmp_data", "uid": "9999", "gid": "9999"}}'
  8. Build the image again: docker buildx build --progress plain -t test-ownership .

You should see an error like this:

 > [stage-0 5/5] RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data"     echo "Listing BEFORE writing." &&    (ls -l /home/app/tmp_data/test.txt || echo "File not yet created") &&    echo "THIS IS A TEST" > /home/app/tmp_data/test.txt &&    echo "Listing file AFTER writing." &&    ls -l /home/app/tmp_data/test.txt:
0.379 Listing BEFORE writing.
0.380 -rw-r--r-- 1 root root 15 Jun  3 15:42 /home/app/tmp_data/test.txt
0.381 /bin/sh: 1: cannot create /home/app/tmp_data/test.txt: Permission denied
------
Dockerfile:11
--------------------
  10 |
  11 | >>> RUN --mount=type=cache,uid=9999,gid=9999,target="/home/app/tmp_data" \
  12 | >>>     echo "Listing BEFORE writing." &&\
  13 | >>>     (ls -l /home/app/tmp_data/test.txt || echo "File not yet created") &&\
  14 | >>>     echo "THIS IS A TEST" > /home/app/tmp_data/test.txt &&\
  15 | >>>     echo "Listing file AFTER writing." &&\
  16 | >>>     ls -l /home/app/tmp_data/test.txt
  17 |
--------------------
ERROR: failed to solve: process "/bin/sh -c echo \"Listing BEFORE writing.\" &&    (ls -l /home/app/tmp_data/test.txt || echo \"File not yet created\") &&    echo \"THIS IS A TEST\" > /home/app/tmp_data/test.txt &&    echo \"Listing file AFTER writing.\" &&    ls -l /home/app/tmp_data/test.txt" did not complete successfully: exit code: 2

Also, the output of the ls command shows that the owner is root instead of app (uid 9999).

scratch/buildstamp: No such file or directory

I'm running into this in the post inject step, and I'm not sure how to resolve it:

Post job cleanup.
+ : 'Argv0: /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/post'
++ dirname /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/post
+ dir=/home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4
++ read_action_input skip-extraction
++ /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/read-action-input skip-extraction
+ '[' '' == true ']'
+ : 'Prepare Timestamp for Layer Cache Busting'
+ date --iso=ns
++ read_action_input scratch-dir
++ /home/runner/work/_actions/reproducible-containers/buildkit-cache-dance/v2.1.4/read-action-input scratch-dir
+ tee scratch/buildstamp
tee: scratch/buildstamp: No such file or directory
2024-03-16T01:00:08,006374192+00:00

How is this supposed to work with Go?

Hi everyone,

I'm having some trouble getting this working for both the Go module cache (go env GOMODCACHE) and the Go build cache (go env GOCACHE). The GitHub Actions setup-go action currently takes care of saving/restoring both of those caches on the runner machine, but does nothing to help with BuildKit cache mounts. I'm not entirely sure how to use this action to achieve BuildKit mount caching -- does anyone have a simple, working example? Thanks!
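One possible shape for this (a sketch assuming the v3 cache-map interface shown in other reports on this page; the runner-side directory names "go-build-cache" and "go-mod-cache" are illustrative):

```yaml
# Persist both Go caches on the runner, then map them onto the
# Dockerfile's BuildKit cache mounts.
- name: Cache Go caches for Docker
  uses: actions/cache@v4
  with:
    path: |
      go-build-cache
      go-mod-cache
    key: go-docker-${{ hashFiles('go.sum') }}
- name: Inject Go caches into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      {
        "go-build-cache": "/root/.cache/go-build",
        "go-mod-cache": "/go/pkg/mod"
      }
```

The Dockerfile then needs matching mounts on the build step, e.g. RUN --mount=type=cache,target=/root/.cache/go-build --mount=type=cache,target=/go/pkg/mod go build ./... -- the targets must match the cache-map values exactly. Note that setup-go's runner-side cache is independent of these BuildKit mounts.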

[v2] "post" steps are executed in a random order

https://github.com/reproducible-containers/buildkit-cache-dance/actions/runs/6175782567/job/16763351153


Post inject var-lib-apt into docker is executed before Post Cache var-lib-apt, but Post inject var-cache-apt is executed after Post Cache var-cache-apt.

- name: Cache var-cache-apt
  uses: actions/cache@v3
  with:
    path: var-cache-apt
    key: var-cache-apt-${{ hashFiles('.github/workflows/test/Dockerfile') }}
- name: Cache var-lib-apt
  uses: actions/cache@v3
  with:
    path: var-lib-apt
    key: var-lib-apt-${{ hashFiles('.github/workflows/test/Dockerfile') }}
- name: inject var-cache-apt into docker
  uses: ./
  with:
    cache-source: var-cache-apt
    cache-target: /var/cache/apt
- name: inject var-lib-apt into docker
  uses: ./
  with:
    cache-source: var-lib-apt
    cache-target: /var/lib/apt

Err: cleaning cache source directory: Error: EACCES: permission denied

dockerfile

RUN --mount=type=cache,target=/root/.cache/go-build,sharing=locked,id=go-build-cache \
    --mount=type=cache,target=/go/pkg/mod,sharing=locked,id=go-pkg-mod \
     go mod download

COPY . .

RUN --mount=type=cache,target=/root/.cache/go-build \
    --mount=type=cache,target=/go/pkg/mod \
    go build -ldflags="-s -w" -o business main.go

git.yml

      - name: Go Build Cache for Docker
        uses: actions/cache@v4
        with:
          path: |
            go-build-cache
            go-pkg-mod
          key: go-cache-multiarch-${{ hashFiles('go.mod') }}
          restore-keys: |
            go-cache-multiarch-

      - name: inject go-build-cache into docker
        # v1 was composed of two actions: "inject" and "extract".
        # v2 is unified to a single action.
        uses: reproducible-containers/buildkit-cache-dance@<version>
        with:
          cache-map: |
            {
              "go-build-cache": {
                "target": "/root/.cache/go-build",
                "id": "go-build-cache"
              },
              "go-pkg-mod": {
                  "target": "/go/pkg/mod",
                  "id": "go-pkg-mod"
                }
            }
          skip-extraction: true

The log:

Error while cleaning cache source directory: Error: EACCES: permission denied, unlink 'go-build-cache/[email protected]/.gitignore'. Ignoring...

The build takes about as long as it would without the cache.

Locally, however, the cached build is very quick.

Having separate inject steps causes caches to be overwritten

I have two separate caches (one for dependencies, and one for next build caches within a monorepo) that I want to use in my docker build.

The problem is that the cache-dance action seems to use the same source folder when extracting the caches. When I added my Next build cache to the workflow, it overwrote the mounted directory from the first inject step, and the yarn dependencies no longer get detected by the docker build.

If I combine all my caches into one, I'll lose the ability to skip extraction for the dependencies, which change much less frequently. I will waste time extracting my yarn cache even when it has not changed.

      - name: Set up Yarn build cache
        id: yarn-cache
        uses: actions/cache@v4
        with:
          path: yarn-build-cache
          key: ${{ matrix.platform }}-yarn-${{ hashFiles('yarn.lock') }}
          restore-keys: |
            ${{ matrix.platform }}-yarn-
      - name: Set up Next build cache
        id: next-cache
        uses: actions/cache@v4
        with:
          path: |
            next-build-cache
            nx-build-cache
          key: ${{ matrix.platform }}-next-${{ matrix.app.name }}-${{ hashFiles('yarn.lock') }}-${{ hashFiles(format('apps/{0}/**', matrix.app.name)) }}
          restore-keys: |
            ${{ matrix.platform }}-next-${{ matrix.app.name }}-${{ hashFiles('yarn.lock') }}-
      - name: Inject caches into Docker
        uses: reproducible-containers/buildkit-cache-dance@v3
        with:
          cache-map: |
            {
              "yarn-build-cache": "/fe/.yarn/cache"
            }
          skip-extraction: ${{ steps.yarn-cache.outputs.cache-hit }}
      - name: Inject next cache into Docker
        uses: reproducible-containers/buildkit-cache-dance@v3
        with:
          cache-map: |
            {
              "next-build-cache": "/fe/apps/${{ matrix.app.name }}/.next/cache",
              "nx-build-cache": "/fe/.nx"
            }
          skip-extraction: ${{ steps.next-cache.outputs.cache-hit }}

Large cache takes a long time to inject/extract

Mostly a discussion point, but I've noticed large caches take a very long time to inject/extract. In my case, I have a 467MB cache of npm modules. It is downloaded from GitHub's cache onto the workflow runner in 8s, so it is often worth caching in GH workflows. However, the injection step takes 53s. Much of this time is spent transferring context, supposedly 1.8GB worth (taking 30s). I'm not sure if this is because the context is no longer compressed, as it is when GH downloads it, or if something else causes the 500MB to triple in size. The extraction step takes 2m56s. Happy to provide any logs as needed.

Allow setting id of mount

When using docker's --mount=type=cache, there is an optional id argument. It can be used to cache the same directory under a different volume to prevent collisions. I'm not sure it makes a ton of sense in this context, but the guide I was following provided an id for the mount. This action, however, does not accept an id, which was confusing because it caused the action not to work: its injection/extraction steps don't set the id, so it won't match during the real run. I'm not sure how important this is to add, but it could be nice for consistency and for preventing confusion.
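For reference, the v3 cache-map syntax shown in other reports on this page does carry an "id" field; a minimal sketch of matching it against the Dockerfile might look like this (names are illustrative):

```yaml
# The "id" in cache-map must equal the id= in the Dockerfile's --mount,
# otherwise the injected cache volume will not be the one the build uses.
- uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      {
        "var-cache-apk": {
          "target": "/var/cache/apk",
          "id": "apk-cache"
        }
      }
```

The corresponding Dockerfile line would be RUN --mount=type=cache,id=apk-cache,target=/var/cache/apk ... -- if the two id values differ, BuildKit treats them as unrelated cache volumes.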

Stored cache is empty

I am probably missing something, but doing the same as the example results in an empty cache:

      - name: Cache apk
        uses: actions/cache@v4
        id: cache-apk
        with:
          path: |
            var-cache-apk
          key: ${{ runner.os }}-apk-cache-${{ hashFiles('Dockerfile') }}
          save-always: true
          restore-keys: |
            ${{ runner.os }}-apk-cache-

      - name: Inject apk cache into Docker
        uses: reproducible-containers/buildkit-cache-dance@<version>
        with:
          cache-map: |
            {
              "var-cache-apk": {
                "target": "/var/cache/apk",
                "id": "apk-cache"
              }
            }
          save-always: true
          skip-extraction: ${{ steps.cache-apk.outputs.cache-hit }}

Dockerfile snippet using this cache:

RUN --mount=type=cache,id=apk-cache,target=/var/cache/apk \
<<EOT
set -e

echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories

apk update
apk upgrade
apk add ${PACKAGES}
EOT

I have the same issue with other caches I tried (pnpm store, build dist folders, etc.), so it's not specific to this path but to how I'm using the action (which follows the documentation).

Support explicit extract step

Not sure if this is within the realm of what this action intends to support, but I think there's a valid use case for an explicit step that extracts a folder from the build. Currently, extraction only happens as a "post" step of the combined inject/extract action.

My use case is extracting files for upload to a CDN, where I want to run an explicit upload action after the extraction step. Since extraction currently runs in the "post" stage after all non-post actions, this is hard to do cleanly (the action is very much designed to pair with a caching action that also runs during the "post" stage).

I saw this was possible in v1, but removed in v2 in favor of the simpler combined model.

Cache restore fails after injection while trying to clean up

I'm trying to update to v3 of this action (thanks again @aminya) on a small repo handling Go builds. The cache is successfully created on the first run, but fails on restore during the second run with the following error:

Run reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: {
    "cache-go-build": "/root/.cache/go-build",
    "go-pkg-mod": "/go/pkg/mod"
  }
  
    skip-extraction: true
    scratch-dir: scratch
FROM busybox:1
COPY buildstamp buildstamp
RUN --mount=type=cache,target=/root/.cache/go-build     --mount=type=bind,source=.,target=/var/dance-cache     cp -p -R /var/dance-cache/. /root/.cache/go-build || true
FROM busybox:1
COPY buildstamp buildstamp
RUN --mount=type=cache,target=/go/pkg/mod     --mount=type=bind,source=.,target=/var/dance-cache     cp -p -R /var/dance-cache/. /go/pkg/mod || true
[Error: EACCES: permission denied, rmdir 'go-pkg-mod/go.uber.org/[email protected]/internal'] {
  errno: -13,
  code: 'EACCES',
  syscall: 'rmdir',
  path: 'go-pkg-mod/go.uber.org/[email protected]/internal'
}
Error: EACCES: permission denied, rmdir 'go-pkg-mod/go.uber.org/[email protected]/internal'

Maybe we need to sync/flush the filesystem before deletion is possible? And why is this cleanup step necessary at all? Perhaps we could just continue without deleting the source directory.
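A likely explanation (an illustration, not the action's own code): Go writes its module cache read-only, so a non-root cleanup hits EACCES until write permission is restored. The directory names below are made up for the demo.

```shell
# Demonstration: a read-only tree, like Go's module cache, cannot be
# removed by a non-root user until write permission is restored.
set -u
mkdir -p demo-mod-cache/internal
echo data > demo-mod-cache/internal/file.go
chmod -R a-w demo-mod-cache                    # simulate Go's read-only module cache
rm -rf demo-mod-cache 2>/dev/null \
  || echo "cleanup failed (EACCES), as in the report"
chmod -R u+w demo-mod-cache 2>/dev/null || true  # restore write permission first...
rm -rf demo-mod-cache                            # ...then cleanup succeeds
```

So one fix on the runner side is a chmod -R u+w (or equivalent) over the extracted cache directory before attempting to delete or re-save it.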

Support Glob Injection/Extraction

I'm attempting to load a cache I know exists into a cache mount

      - name: Cache All node_modules folders
        uses: actions/cache@v3
        with:
          path: ${{ github.workspace }}/**/node_modules
          key: ${{ runner.os }}-node_modules-${{ env.cache-name }}-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-node_modules-${{ env.cache-name }}-
            ${{ runner.os }}-node_modules-
            ${{ runner.os }}-

I know the above works because the output of the pnpm install is as follows

Run pnpm install --frozen-lockfile --prefer-offline
Scope: all 11 workspace projects
Lockfile is up to date, resolution step is skipped
Packages: +3757
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

. postinstall$ rm -rf node_modules/@types/react-native
. postinstall: Done
Done in 3.4s

I've tried the following, but it doesn't seem to mount correctly:

      - name: Load pnpm cache into Docker Container
        uses: reproducible-containers/buildkit-cache-dance@<version>
        with:
          cache-source: ${{ runner.os }}-pnpm-${{ env.cache-name }}-${{ hashFiles('**/pnpm-lock.yaml') }}
          cache-target: /mono/**/node_modules

RUN --mount=type=cache,id=pnpm,target=/mono pnpm install --prod --frozen-lockfile

But alas, no luck. Any pointers?
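One pointer, hedged: cache mounts and the cache-source/cache-target inputs take a single literal path, not a glob, so `/mono/**/node_modules` cannot work as a mount target. A common workaround is to cache pnpm's single content-addressable store directory instead of the scattered node_modules folders (path names below are illustrative assumptions):

```yaml
# Cache pnpm's store at one path instead of globbing node_modules.
- name: Cache pnpm store
  uses: actions/cache@v4
  with:
    path: pnpm-store
    key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
- name: Inject pnpm store into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      { "pnpm-store": "/pnpm/store" }
```

In the Dockerfile the install step would then mount that same path, e.g. RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile, with pnpm configured (assumption) to use /pnpm/store as its store-dir so installs become hard links from the cached store.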

dancing with pip cache on a python image

I am attempting to use buildkit-cache-dance to cache pip dependencies in a GitHub Actions workflow but am encountering issues where the cache is not being used.

My example repo: mgaitan/pip-docker-cache-dance

Consider this commit, where I removed a dependency while supposedly the rest are available in the cache.

However, the logs indicate that, despite the cache directive, pip dependencies are being downloaded again.

I'd appreciate any insights or assistance to resolve this issue.
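For comparison, a minimal pip wiring that has to line up in both files might look like this (a sketch; the "pip-cache" name is an illustrative assumption, while /root/.cache/pip is pip's default cache location for root):

```yaml
# Persist pip's HTTP/wheel cache on the runner and inject it into the build.
- name: Cache pip downloads
  uses: actions/cache@v4
  with:
    path: pip-cache
    key: pip-${{ hashFiles('requirements.txt') }}
- name: Inject pip cache into Docker
  uses: reproducible-containers/buildkit-cache-dance@v3
  with:
    cache-map: |
      { "pip-cache": "/root/.cache/pip" }
```

The Dockerfile must mount the same target on the install step, e.g. RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt, and must not pass --no-cache-dir, which silently disables pip's cache and forces re-downloads even when the mount is populated.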
