Comments (9)

ScottTodd avatar ScottTodd commented on August 28, 2024

Here's another failure: https://github.com/iree-org/iree/actions/runs/10530365837/job/29180735869#step:6:180

process = <Popen: returncode: 1 args: ['iree-run-module', '--device=hip', '--module=/h...>
stdout = b''
stderr = b"iree/runtime/src/iree/vm/bytecode/archive.c:106: INVALID_ARGUMENT; FlatBuffer data is not present or less than 16 by...iree_tests_cache/artifacts/sdxl_unet/model.rocm_gfx942.vmfb'; loading modules and dependencies; creating run context\n"

Maybe another run started compiling to the same path, overwriting/deleting the file that this run tried to use.

from iree.

ScottTodd avatar ScottTodd commented on August 28, 2024

Reading through the code more, I have a fix in mind... but it's going to be tricky to test.

We can keep using the ArtifactGroup class, but change the directories used to point at two locations: the persistent cache for FetchedArtifact and a local workspace for ProducedArtifact. The locations of those can come in through environment variables.
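
A minimal sketch of that split, assuming two environment variables whose names (`IREE_TEST_CACHE_DIR`, `IREE_TEST_WORKSPACE_DIR`) and default locations are invented here for illustration and are not taken from the regression suite. The helper functions are hypothetical too; the point is only that fetched inputs and produced outputs resolve to different roots.

```python
import os
from pathlib import Path

# Hypothetical environment variable names, for illustration only.
CACHE_DIR_ENV = "IREE_TEST_CACHE_DIR"
WORKSPACE_DIR_ENV = "IREE_TEST_WORKSPACE_DIR"


def fetched_artifact_dir(group: str) -> Path:
    """Directory for downloaded inputs; persistent and shared across runs."""
    root = Path(os.environ.get(CACHE_DIR_ENV, Path.home() / ".cache" / "iree_tests"))
    return root / group


def produced_artifact_dir(group: str) -> Path:
    """Directory for compiler outputs; local to the current workspace."""
    root = Path(os.environ.get(WORKSPACE_DIR_ENV, Path.cwd() / "artifacts"))
    return root / group
```

With this shape, two concurrent runs share downloads but each writes its compiled outputs under its own workspace, as long as the workspace variable points somewhere run-specific.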

Bugs are still possible if we change the remote file contents though. Using a cache implementation like huggingface's (git LFS) would be better - that downloads versions from git hashes and then creates symlinks into the refs.

ScottTodd avatar ScottTodd commented on August 28, 2024

Alternate approach (closer to huggingface): always use a workspace-relative location, but create symlinks from the cache into that directory before the tests are run.
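
Roughly what the symlink step could look like, as a sketch only. `link_cache_into_workspace` is a hypothetical helper, not something in the suite, and it assumes a platform where unprivileged symlinks work (Linux CI runners, for example).

```python
import os
from pathlib import Path


def link_cache_into_workspace(cache_dir: Path, workspace_dir: Path) -> list[Path]:
    """Symlink every file under cache_dir into workspace_dir, keeping the layout."""
    links = []
    for src in sorted(cache_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = workspace_dir / src.relative_to(cache_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.is_symlink() or dst.exists():
            dst.unlink()  # replace a stale link or file from an earlier run
        os.symlink(src.resolve(), dst)
        links.append(dst)
    return links
```

Tests then read inputs through the links, and anything they write next to those links lands in the workspace directory rather than in the shared cache.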

benvanik avatar benvanik commented on August 28, 2024

that would be neat - especially if the cache paths are uniqued per hash such that you can be updating the cache with live symlinks to older versions concurrently (similar to what bazel does when linking from the sandbox to its storage)
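
A sketch of per-hash cache paths, under assumptions: the `blobs/` and `refs/` directory names and the `store_blob` helper are illustrative only, and this is not how bazel or any existing cache is implemented. Content is stored under its SHA-256, and a named ref is swapped with an atomic rename, so links that already point at an older blob keep working while the cache is updated.

```python
import hashlib
import os
from pathlib import Path


def store_blob(cache_dir: Path, name: str, data: bytes) -> Path:
    """Store data at a path derived from its SHA-256, then repoint refs/<name>."""
    digest = hashlib.sha256(data).hexdigest()
    blob = cache_dir / "blobs" / digest
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():
        tmp = blob.parent / f"{digest}.tmp.{os.getpid()}"
        tmp.write_bytes(data)
        os.replace(tmp, blob)  # atomic rename: readers never see a partial file

    ref = cache_dir / "refs" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    tmp_ref = ref.parent / f"{name}.tmp.{os.getpid()}"
    if tmp_ref.is_symlink():
        tmp_ref.unlink()
    os.symlink(blob.resolve(), tmp_ref)
    os.replace(tmp_ref, ref)  # swap the ref; older blobs stay on disk
    return blob
```

Old blobs are never overwritten here, so some separate cleanup policy would be needed to reclaim space.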

ScottTodd avatar ScottTodd commented on August 28, 2024

See what huggingface does here: https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache

<CACHE_DIR>
├─ datasets--glue
│  ├─ refs
│  ├─ blobs
│  ├─ snapshots
...

They have tools for choosing where you want files to appear: https://huggingface.co/docs/huggingface_hub/en/guides/download#download-files-to-a-local-folder . The recommended options use symlinks with all the details about refs and blobs hidden from the user.

ScottTodd avatar ScottTodd commented on August 28, 2024

Sketches of what a fix could look like here: https://github.com/iree-org/iree/compare/main...ScottTodd:infra-regression-suite-fix?expand=1. This is messy though - needs some deeper thinking. I also haven't tried running these tests locally - setup steps are complicated x_x.

We've been bouncing these tests back and forth between this repo and https://github.com/nod-ai/SHARK-TestSuite/ (https://github.com/iree-org/iree-test-suites could also be an option, if the Azure paths were replaced with something easier to modify for contributors). The repository-specific things are compile flags / spec files and expected dispatch counts / benchmark metrics.

saienduri avatar saienduri commented on August 28, 2024

We can also just rework this a little bit: https://github.com/iree-org/iree/blob/main/experimental/regression_suite/ireers_tools/fixtures.py. Instead of making it output a produced artifact, it can just save the vmfb in the local dir. We basically don't use ProducedArtifact at all. I can get a fix out for that. Should be pretty easy and clean.
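
For illustration, the "save the vmfb in the local dir" idea might look something like the sketch below. `compile_command` is a hypothetical helper, not the code in fixtures.py, and the output naming just mirrors the `model.rocm_gfx942.vmfb` pattern from the error log above. The only `iree-compile` usage assumed is an input path plus `-o`; any other flags are passed through by the caller.

```python
from pathlib import Path


def compile_command(
    source: Path, target: str, work_dir: Path, flags: list[str]
) -> tuple[list[str], Path]:
    """Build an iree-compile command that writes the .vmfb under work_dir."""
    # The output goes next to the test run, never into the shared cache.
    out = work_dir / f"{source.stem}.{target}.vmfb"
    cmd = ["iree-compile", str(source), *flags, "-o", str(out)]
    return cmd, out
```

The source `.mlir` can still live in the persistent cache; only the compiled output moves to a run-local directory.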

ScottTodd avatar ScottTodd commented on August 28, 2024

Okay yeah, I think that sounds good.

saienduri avatar saienduri commented on August 28, 2024

We probably do need a better long-term solution for the mlirs and weights changing in the persistent cache (huggingface seems good), but for now we can be strategic about when we update the mlirs and weights in Azure (that also doesn't happen often).
