When a requester node sees a request for a docker job with an image tag, no check is m

Auto-resolve docker latest tag to real digest about bacalhau HOT 3 CLOSED

rossjones commented on June 1, 2024

from bacalhau.

Comments (3)

simonwo commented on June 1, 2024

I think this includes all images without a digest, even if the tag isn't latest.

simonwo commented on June 1, 2024

@simonwo:

This is a bug that our real users have already hit multiple times: they make several pushes to mytag:latest and are then baffled as to why the previous version of the image is being used. It also has the same issues discussed above for URL inputs, because different compute nodes will have different cache states, so if you are running a job concurrently you may end up with nodes running the job in different images, which is bad. It is also an obvious issue for reproducibility, in that re-running a Docker job with a latest tag is not generally possible without knowing what the state of latest was when the job was run.

We have already had some back and forth on this issue, but I've not heard any compelling arguments as to why the requester should not be able to do this, optionally. I don't think telling users "you should use a sha256 digest of the image, actually" is practical, and requiring the client to do this would again require them to have Docker tools installed on their machine or in their web browser, which is not practical either.
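To make the optional requester-side step concrete, here is a minimal sketch of pinning an image reference before a job is handed to compute nodes. It is only an illustration under stated assumptions: `pin_image`, `strip_tag` and the `resolve` callback are hypothetical names, not part of bacalhau, and `resolve` stands in for whatever registry lookup the requester would actually perform.

```python
from typing import Callable


def has_digest(image: str) -> bool:
    """Return True if the reference is already pinned, e.g. name@sha256:..."""
    return "@" in image


def strip_tag(image: str) -> str:
    """Drop a trailing :tag without mistaking a registry port for a tag."""
    last_slash = image.rfind("/")
    last_colon = image.rfind(":")
    # A colon only marks a tag when it comes after the final path separator.
    if last_colon > last_slash:
        return image[:last_colon]
    return image


def pin_image(image: str, resolve: Callable[[str], str]) -> str:
    """Rewrite an unpinned reference to name@digest.

    `resolve` is a hypothetical callback that asks the registry which digest
    the reference currently points at. Assumption: it returns a string of
    the form "sha256:<hex>".
    """
    if has_digest(image):
        return image  # already explicit, nothing to resolve
    digest = resolve(image)
    if not digest.startswith("sha256:"):
        raise ValueError(f"unexpected digest format: {digest!r}")
    return f"{strip_tag(image)}@{digest}"
```

Because the rewrite would happen once, before the job is distributed, every compute node would receive the same pinned reference regardless of its local cache state, and the client would not need Docker tooling of its own.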

@wdbaruni:

Wouldn't querying the registry from the compute nodes and refreshing the cache if it is stale solve this issue without adding a dependency on the requester node? It would still be the user's responsibility to make sure the image is not updated while submitting a job with high concurrency, which doesn't sound bad to me, especially since 99% of our jobs run with no concurrency. We can also improve the user experience by sharing the digest of the image that was used to run the job after the job has completed, to help them debug incidents.

@frrist:

Possibly unpopular opinion: we should disallow the usage of latest in the DockerEngine spec (and just in general). This doesn't mean the Requester couldn't resolve this before running the job, but we should always be explicit about versioning, especially in the protocol, otherwise things will (probably) break or behave unpredictably.

simonwo commented on June 1, 2024

Part of the issue is that even querying the Docker Hub registry costs a compute node one of its 100 requests per hour, even without a pull. So unless they pay for Docker Hub Pro or whatever, they are quickly overwhelmed and at worst can only handle 100 Docker jobs per hour. So we have to cache queries as well, as discussed in #2254.
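As a rough illustration of caching the queries themselves, the sketch below remembers each tag-to-digest answer for a fixed time so repeated jobs do not each spend a registry request. `DigestCache`, its `resolve` callback and the 300-second default are assumptions made for this example; they are not taken from bacalhau or from #2254.

```python
import time
from typing import Callable, Dict, Tuple


class DigestCache:
    """Cache tag-to-digest lookups for a limited time (hypothetical helper)."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 300.0,  # assumed default, tune to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # the real (rate-limited) registry query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, image: str) -> str:
        now = self._clock()
        entry = self._entries.get(image)
        # Serve from the cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Otherwise spend one registry request and remember the answer.
        digest = self._resolve(image)
        self._entries[image] = (now, digest)
        return digest
```

The trade-off is staleness: a push to the tag within the TTL window goes unnoticed until the entry expires, so a shorter TTL buys freshness at the cost of more registry requests.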

Again, this feature is mainly about reproducibility and concurrency issues. Agreed that concurrency-based errors should be rare, but if we have the option to remove them entirely, why not do that? Is an optional uplift at the requester node problematic?

Reproducibility is something that we talk about as being important, and it certainly is for some users (e.g. Desci-ers). I would be interested in a discussion about "how reproducible is enough" for a results cache like Octostore; I had been assuming that unless the job could be condensed into a completely canonical form, it wouldn't be good enough.
