Giter VIP home page Giter VIP logo

Comments (36)

aminya avatar aminya commented on August 25, 2024 26

This cache miss across different branches is creating anti-patterns in the way people work. It has been my experience that people do not create different branches or smaller pull requests because the build would start from scratch!

This should be fixed. It is not just a miss-documentation.

from cache.

dkubb avatar dkubb commented on August 25, 2024 11

@chrispat

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo

My assumption (before finding this thread) was that caches would be shared between branches in a single repository. I kept wondering why my merge commits were not using cache even though the cache key was identical to the parent commit's build.

I would definitely love to see caches be shared between branches. I can understand wanting to avoid fork PRs from "poisoning the cache", but I think it should be safe for all branches in a given repository to use a shared cache.

EDIT: A compromise might be that when a feature branch is merged into the default branch it's cache could be merged into the default branch's cache, and then the least recently used of this cache can be purged to bring it down under the limit. Worst case this is no worse than now, but best case we can use the same cache as the final commit in the feature branch.

from cache.

github-actions avatar github-actions commented on August 25, 2024 9

This issue is stale because it has been open for 365 days with no activity. Leave a comment to avoid closing this issue in 5 days.

from cache.

moh-eulith avatar moh-eulith commented on August 25, 2024 8

Screenshot from 2023-03-30 19-07-42
It should be obvious from the above screenshot that this is huge waste of time for workflows as well as waste of space for github.
The cache key is not chosen randomly. It means something regarding the contents of the cache. There should be a configuration option that allows a repository to opt-in to trusting its own cache across branches.

from cache.

jbergstroem avatar jbergstroem commented on August 25, 2024 7

Also need to voice my concern here. It's great that Github provides security support wheels for every repository on Github with a thoughtful set of sane defaults but additionally expose the ability to control these things! That said, If we can understand the security implications of reusing caches, we can also understand how to layer and separate them.

The impact across every repo seems detremental: developers wait longer for PR's to land (wastes time) and pays more for accrued CI minutes (wastes money).

A solution that comes to mind would be adding a scope or flag for how the cache would be accessible (type: private - only accessible to same ref or perhaps scope: $ref or glob).

from cache.

gyfis avatar gyfis commented on August 25, 2024 6

Interesting! I’m glad there’s a way for it to work in its current state, but I feel like sharing (writable) caches between PRs would also be nice. Are you thinking about expanding this functionality, e.g. by making the scope a parameter of the cache action?

from cache.

gyfis avatar gyfis commented on August 25, 2024 6

I want to confirm that adding the cache persist to master push indeed gives pull request access to the same cache. This is the workflow I used (filename cache_persist.yml):

name: Cache persist

on: 
  push:
    branches:
      - master

env:
  BUNDLE_GEMFILE: .github/Gemfile

jobs:
  cache_persist:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v1

      - uses: actions/setup-ruby@v1
        with:
          ruby-version: '2.6'

      - uses: actions/cache@v1
        with:
          path: .github/vendor/bundle
          key: github-gems-${{ hashFiles('**/.github/Gemfile.lock') }}
          restore-keys: |
            github-gems-

      - name: bundle install
        run: |
          gem install bundler --no-document
          bundle check --path=vendor/bundle || bundle install --jobs=4 --retry=3 --path=vendor/bundle

Some good next steps for this issue:

  • acknowledging that this is the expected behavior by updating README.md or another document describing the scope functionality and the possibility to add read-only cache by using the master push, (and closing this issue), or
  • making the cache scope mechanism publicly modifiable with appropriate description and API, (and closing this issue), or
  • (the most low-effort one) using this issue as the reference for this functionality, and closing it with such confirmation

@dhadka I’m down for adding a PR to a documentation somewhere describing what’s up with the scoping, if you approve.

from cache.

pkerschbaum avatar pkerschbaum commented on August 25, 2024 5

I also want to add that this cache isolation costs many compute minutes and does not make any sense IMO for organizational private repositories, where every branch is from a trusted developer anyways (so the code-injection-via-cache problem does not exist).
We @hokify use the cache for our monorepo for pnpm store caching and Docker layer caching, and we already experience heavy cache thrashing due to this isolation, which is costly (compute minutes) and reduces productivity (developers have to wait longer for CI/CD to finish).
I would even go so far to say we would have chosen another CI/CD solution if we had known this beforehand, we struggle to get fast CI/CD now because of this...

from cache.

chrispat avatar chrispat commented on August 25, 2024 4

Here is the excerpt from the documentation

A workflow can access and restore a cache created in the current branch, the base branch (including base branches of forked repositories), or the default branch (usually master). For example, a cache created on the default branch master would be accessible from any pull request. Also, if the branch feature-b has the base branch feature-a, a workflow triggered on feature-b would have access to caches created in the default branch (master), feature-a, and feature-b.

Access restrictions provide cache isolation and security by creating a logical boundary between different workflows and branches. For example, a cache created for the branch feature-a (with the base master) would not be accessible to a pull request for the branch feature-b (with the base master).

I think it pretty clearly lays out the access restrictions but I would be happy to take feedback on how to improve it.

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo, however, I don't think I would ever allow that for PRs coming from forks of public repos.

from cache.

LubomirGeorgiev avatar LubomirGeorgiev commented on August 25, 2024 4

+1

from cache.

dhadka avatar dhadka commented on August 25, 2024 3

There is some documentation on scopes in the help docs - https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows#cache-scope

The scope was designed this way to prevent poisoning the cache of other branches. For example, without scoping, someone could create a PR and inject malicious libraries into the cache, which are subsequently read by other branches.

@chrispat I think we can help make the documentation clearer on how the scope works, particularly with respect to the benefit of creating the cache in the default branch so it can be shared among PRs.

from cache.

flippidippi avatar flippidippi commented on August 25, 2024 2

This is still a problem for us too. We don't want to use restore-keys because we would like to use npm ci. We would like to share node module caching across all branches with the package-lock hash as the key to minimize installing packages as some of our private packages cost per install.

from cache.

dhadka avatar dhadka commented on August 25, 2024 1

I've created an internal issue to update the help docs.

from cache.

gyfis avatar gyfis commented on August 25, 2024 1

@YashMathur Yes, that’s exactly right.

This may be obvious but if the cache doesn’t change often, all workflows are fast (PR workflows read the master cache, and the master push workflows also read the master cache). If we don’t count parallelism, a new cache key in this setup is build exactly two times - once in a new feature branch that changes it, and once in a master push branch after merge - and then it’s saved and accessible everywhere.

(Trying to paraphrase the docs here to see if I’m understanding them correctly): Master branch workflow never reads any cache from feature branches, and there’s no cache passing between feature branches, either - that way no branch can pollute cache, unless merged into master.

from cache.

jfo84 avatar jfo84 commented on August 25, 2024 1

I think it might be possible to allow for sharing writable caches across PRs targeting the same base within a repo, however, I don't think I would ever allow that for PRs coming from forks of public repos.

@chrispat This would be great for us. I would prefer to not have to trick the flow by checking into master after each test run to get this kind of functionality. I would imagine forks are an edge case that few people care about in practice.

from cache.

dhadka avatar dhadka commented on August 25, 2024

Yes, it sounds like this behavior is caused by the scope. The PR refs/pull/1660/merge has read-write permissions to caches in the scope refs/pull/1660/merge and read-only permissions on the default branch scope (typically master). PR refs/pull/1661/merge does not have permissions to read the cache from refs/pull/1660/merge and vice versa. Essentially, you would need some action to trigger after merging the change into master which saves the cache. Once saved to the default (master) branch scope, it will be readable by PRs.

from cache.

dhadka avatar dhadka commented on August 25, 2024

Actually, the linked docs is slightly wrong in saying "You can restore a cache created in any branch of a repository, including base branches of forked repositories."

from cache.

YashMathur avatar YashMathur commented on August 25, 2024

@gyfis does the cache persist workflow that you posted copy the cache from a feature branch to master?

from cache.

gyfis avatar gyfis commented on August 25, 2024

@YashMathur I wouldn’t say copy, really -
the push if: master workflow is simply using the same cache as the pull request ones from feature branches, and in so making it available for newer pull request workflows.

from cache.

YashMathur avatar YashMathur commented on August 25, 2024

@gyfis If I understand this correctly, you're running the bundle install every time you merge a PR to master so the cache lives in master's scope. It can then be accessed by other branches.

I don't think the Cache persist workflow will be able to restore any cache that is created by pull requests. Is that correct?

from cache.

mikekaminsky avatar mikekaminsky commented on August 25, 2024

WHOAH -- I will just add on that I found this very confusing and have been spending a lot of hours trying to figure out why my cache hasn't been working as I'd expected. I definitely think the docs could be clearer that the cache is not accessible across branches by default, and adding an example of the job shown here as a way to achieve cache-sharing would be super helpful.

Thanks to those in the thread for laying this out so clearly -- this was driving me bananas coming from gitlab where cross-branch caching is the default. Since it takes about an hour to build the cache for my project, this is going to be a huge win for us!

from cache.

gyfis avatar gyfis commented on August 25, 2024

I don't think this issue is solved yet... but since GitHub has a way of approving running checks for external PRs, there may be a possibility to condition access to cache in some way as well

from cache.

lhotari avatar lhotari commented on August 25, 2024

I'm sorry to spam this issue, but I have a question that is related to caches in pull requests.

The question is about the quota that caches for pull requests consume. When a pull request build uses GitHub Actions Cache, is the quota consumed from the upstream repository where the Pull Request was created, or does the pull request build consume the quota of the forked repository?

I have posted the question in GitHub Community forum for GitHub Actions: https://github.community/t/how-does-the-github-actions-cache-size-limit-work-for-pull-requests/240546/3 .

Would someone be able to reply to this question? Thanks.

(update: #534 seems to be related to my question, I'm sorry about the noise.)

from cache.

chrispat avatar chrispat commented on August 25, 2024

The cache lives in the repo the workflow runs are in which in the case of a pull request from a fork is the upstream repo.

from cache.

chrispat avatar chrispat commented on August 25, 2024

This is still a problem for us too. We don't want to use restore-keys because we would like to use npm ci. We would like to share node module caching across all branches with the package-lock hash as the key to minimize installing packages as some of our private packages cost per install.

If the branch your PR is targeting has the same package lock has as your branch then it should pull the cache from the branch the PR is targeting. The way the cache scopes work is you get a R/W to your scope (your branch) and a RO to the PR target branch and the repos default branch.

from cache.

bishal-pdMSFT avatar bishal-pdMSFT commented on August 25, 2024

@chrispat @N-Usha looks like the doc is still not super helpful in calling out the scopes around PR branches. Should we update the doc with details for PR specifically?
Another possibility now that we have list API is to present a better message as to why no cache was not found (i.e. was it key, or the scope or the version)

from cache.

avivek avatar avivek commented on August 25, 2024

This is something that we are missing too. The caches being scoped to the PR's merge commit branch leads to a lot of wasted action minutes in our setup too. We are using the org enforced required workflows(Beta right now), which at this point do not run on pushes but only on PR's. With the current cache scoping we are basically unable to use any kind of caching.

from cache.

patrikhuber avatar patrikhuber commented on August 25, 2024

I was also wondering why my caching wouldn't work for different branches/PRs, until I eventually stumbled upon this issue.

This is a particular issue when using CMake builds with vcpkg (C++) and vcpkg's binary caching. For every new branch or PR, it kicks off vcpkg's build process which source-builds the required dependencies, and that takes over an hour on the CI runner.

Is there any way to share the caches across branches / PRs yet?

from cache.

techi602 avatar techi602 commented on August 25, 2024

Yea, it is absurd restriction. I understand this is enabled by default to avoid some cache poisoning, but you should be able to opt-out. Maybe github wants your build to run longer in order to charge you more money 😄
Otherwise I am forced to write my own cache workflow which will sync the content of the cache dir to some persistant external storage. 😞

from cache.

ImanMahmoudinasab avatar ImanMahmoudinasab commented on August 25, 2024

@t-dedah @vsvipul A lot of people including me have issues with the current limitation on cache scope. Could you please provide some explanation on what's your plan for addressing this?

from cache.

jbergstroem avatar jbergstroem commented on August 25, 2024

I created a github action that only checks the cache instead of checking and restoring it. It can be useful to run as a job when you merge to a main branch so that consequent PR's can use the github cache action to restore a package cache.

Take a look here: https://github.com/jbergstroem/cache-key-exists

from cache.

miluoshi avatar miluoshi commented on August 25, 2024

I created a github action that only checks the cache instead of checking and restoring it. It can be useful to run as a job when you merge to a main branch so that consequent PR's can use the github cache action to restore a package cache.

Take a look here: jbergstroem/cache-key-exists

Your action can be replaced with actions/cache/restore:

lookup-only - If true, only checks if cache entry exists and skips download. Default: false

  - uses: actions/cache/restore@v3
    with:
      path: path/to/dependencies
      key: ${{ runner.os }}-...
      lookup-only: true # <---- THIS

from cache.

jbergstroem avatar jbergstroem commented on August 25, 2024

Your action can be replaced with actions/cache/restore:

Oh! Never saw this. Thanks!

from cache.

Aiasp1985P avatar Aiasp1985P commented on August 25, 2024

from cache.

cncolder avatar cncolder commented on August 25, 2024

If the cache is tied to the branch, please delete the cache after the branch is deleted.

from cache.

github-actions avatar github-actions commented on August 25, 2024

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

from cache.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.