
Comments (6)

garlick commented on September 9, 2024

Pondering a unified utility... maybe:

flux archive create ARCHIVE-KEY [--mmap] PATH ...
flux archive remove ARCHIVE-KEY
flux archive list ARCHIVE-KEY [pattern]
flux archive get ARCHIVE-KEY [pattern]

where the create subcommand would work on any rank, unless the --mmap option is specified, in which case it would work on rank 0 only.

In other words, do away with the tags used in flux filemap and use KVS keys instead. What's actually stored under the keys could be debated, but it might be OK to move at least the metadata out of rank 0 broker memory into the KVS...
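
For concreteness, here's a rough sketch of how that might be used from a job or script (the key name, path, and pattern are purely illustrative, and the interface itself is still just a proposal):

$ flux archive create mydata --mmap /scratch/input    # rank 0 only (because of --mmap)
$ flux archive list mydata                            # any rank: list the staged files
$ flux archive get mydata '*.dat'                     # any rank: fetch matching files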


garlick commented on September 9, 2024

Storing files in the KVS might be another way to go: storage would be consumed on rank 0 and the content cache would be leveraged for parallel reads. The net result is not that different from copying files to storage on rank 0, mmapping them, and then fetching them through the content cache.
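
As a rough illustration of that idea, assuming flux kvs put/get accept raw values and that "-" reads the value from stdin (key and file names are made up):

$ flux kvs put --raw mydata.input.bin=- < input.bin    # rank 0: file bytes land in the content store
$ flux kvs get --raw mydata.input.bin > input.bin      # any rank: read back through the content cache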


grondo commented on September 9, 2024

One thought is that it might be nicer for users if there were one set of commands that works in both use cases here: distributing files from rank 0 vs. from other ranks. I wonder if we could offer a command that does the Right Thing in either case?

Another thought: when running many jobs, each of which uses this facility, content store usage on rank 0 could grow quickly, and there is no way to remove the archives.

Otherwise, I think the flux kvs archive approach could work and is perhaps a handy tool nonetheless (maybe someone wants to archive results or something in the KVS for provenance, etc.).
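
If a remove subcommand like the one sketched above existed, one way to contain that growth would be to key each archive by job and drop it when the job finishes (the key name and use of FLUX_JOB_ID are illustrative):

$ flux archive create job-$FLUX_JOB_ID.files input/     # stage files for this job
$ flux archive get job-$FLUX_JOB_ID.files               # other ranks fetch during the job
$ flux archive remove job-$FLUX_JOB_ID.files            # clean up when the job completes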


garlick commented on September 9, 2024

Great points!

If we go forward, then I agree, we probably should take a look at redesigning flux filemap (possibly renamed) to incorporate this rather than tucking it away in flux kvs.

A TODO for the prototype is to figure out how to reference the content blobs so the archive would be complete on a dump/restore. I thought maybe, if the key is data.foo, we could optionally write a data.foo.blobs directory containing keys that are just the blobref strings, pointing to the actual blobs. As it is, the archive references blobs that might not be included in the dump.
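
A sketch of what that layout might look like in the KVS (blobrefs abbreviated; the .blobs convention is just the idea floated above, not an implemented format):

data.foo                             archive object that references blobs by blobref
data.foo.blobs.sha1-aaf4c61d...      key named for a blobref; its value keeps that blob in a dump
data.foo.blobs.sha1-9993e364...      one such key per blob referenced by the archive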

The caveats would need to be documented, of course, but I'm liking this because it leverages a lot of existing work.


grondo commented on September 9, 2024

Yeah, nice work!


garlick commented on September 9, 2024

I think I have an OK solution to the Cray problem, where shell 0 puts data into an archive and the other shells take it out:

$ flux archive create [-k KEY] --no-force-primary PATH ...
$ flux archive extract [-k KEY] --no-force-primary --waitcreate [PATTERN]

but it makes me wonder whether we should have a programmatic interface for shell plugins, rather than requiring a shell plugin to exec a command.

Not sure what that would look like, but I thought I'd put the idea out here for morning-me, or anybody else who wants to weigh in 😃
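
For reference, a plugin that just exec'd the CLI would effectively run something like this (key and path are illustrative):

$ flux archive create -k stage-in --no-force-primary /tmp/input        # shell rank 0
$ flux archive extract -k stage-in --no-force-primary --waitcreate     # all other shell ranks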

