
Comments (4)

sjlcwn commented on May 27, 2024

When I start VectorVisor with the following command and then run the same testing command:

cargo run --release -- --ip=0.0.0.0 --heap=3145728 --stack=262144 --hcallsize=131072 --partition=false --serverless=true --vmcount=1 --vmgroups=1 --interleave=1 --pinput=true --fastreply=true --lgroup=1 --nvidia=true --input=benchmarks/scrypt/target/wasm32-wasi/release/scrypt.wasm
go run run_scrypt.go 127.0.0.1 8000 1 1 300 256

VectorVisor complains about CL_NV_INVALID_MEM_ACCESS. The full log is in the attachment.
invalid_memory.txt

Do you have any idea why this happens, or any advice on debugging the generated kernel?

from vectorvisor.

SamGinzburg commented on May 27, 2024

Hi,

Thanks for taking the time to check out VectorVisor! Those are all great questions.

First, I should probably explain the "partitions" concept as it wasn't included in the final paper (and it was also excluded from the evaluation). Early on (and without the partitioner in general) I ran into problems with register spilling and was trying to figure out ways to reduce the overhead from this for the comparatively large programs I was trying to run. The idea is that the overhead from calling into other kernels via the CPS transform was cheaper in some cases than the extra register spilling, and we did have positive results.

The solution I came up with was to partition the resulting OpenCL kernels by function size/register usage based on the control flow graph (which functions call which other functions + some other heuristics). The partitioner does actually work despite not being in the paper, but when the --partition=false flag is present, it needs to be paired with a --maxdup=0 flag to ensure that functions aren't duplicated (as they would be if partitioning were enabled). Alternatively, you can enable partitioning and play around with the maxdup value. The default value for maxdup is "1" (1 extra include per function), which is why this error is popping up.

I think you brought up a good point though, which is that this mismatch between --partition=false and maxdup could be checked in src/main and an error returned at that point (I was the only user for quite some time so never ran into this haha).
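A minimal sketch of what such a check might look like, assuming the two flags have already been parsed into a `bool` and an integer. The function name `check_partition_flags` and the error message are hypothetical and not taken from VectorVisor's actual src/main:

```rust
/// Hypothetical validation helper (not VectorVisor's real code):
/// rejects --partition=false unless --maxdup=0 is also given.
fn check_partition_flags(partition: bool, maxdup: u32) -> Result<(), String> {
    if !partition && maxdup != 0 {
        return Err(format!(
            "--partition=false requires --maxdup=0 (got --maxdup={})",
            maxdup
        ));
    }
    Ok(())
}

fn main() {
    // maxdup defaults to 1 according to the discussion above,
    // so passing only --partition=false would be rejected here.
    match check_partition_flags(false, 1) {
        Ok(()) => println!("flags ok"),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Failing early like this would surface the flag mismatch at startup rather than as a later error.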

For the invalid memory access, I suspect it is a result of the lack of compiler optimizations applied to the WASM binary, which causes extra memory usage in the final program. It's also possible that commenting those lines out caused weird behavior/bugs elsewhere. The set entry point debug line only prints the expected value when you are running 1 function per partition (an artifact from a debug configuration I used often). You can run with the debugcallprint flag enabled and that will correctly log the entry point in any configuration, although it will dump a large volume of data to stdout (all function calls, etc...).

If you are testing locally, you can make use of the run_cached_bin.sh script (poorly named, I admit) in the benchmarks dir, which will run all of the compiler optimizations we ran for the final paper + invoke VectorVisor with the correct CLI arguments.

To run the scrypt benchmark locally you would add the following line to the script:

# format is:
# command, benchmark, heap, stack, hypercall buffer, vmcount, ignore last arg
comp "scrypt" "3145728" "131072" "131072" "VMCOUNT" "5120"

and replace VMCOUNT with a VM count value that fits your local GPU (ignore the last argument, as the script was copy-pasted from the script I used in the final evaluation).

After the benchmark has been run at least once, you can replace the comp command with runbin and VV should load much faster.

I just reran the benchmark on an RTX 2080 Ti + 2048 VMs and it works for me. Let me know if it still doesn't work after.

- Sam


sjlcwn commented on May 27, 2024

With the help of run_cached_bin.sh, I was able to run the scrypt benchmark and get some results.

# added to run_cached_bin.sh
comp "scrypt" "3145728" "131072" "131072" "64" "5120"
go run run_scrypt.go 127.0.0.1 8000 64 1 300
server is active... starting benchmark
Benchmark complete: 232249 requests completed
duration: 300.000000
Total RPS: 774.163333
On device execution time: 39445040.440148
Average request latency: 82649467.696640
queue submit time: 9007.574870
submit count: 1.000000
unique fns: 1.000000
Request Queue Time: 3900.823155
Device Time: 82170664.866686
overhead: 488422.273129
compile time: 326291776113.782898
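
As a quick sanity check on the output above, the reported Total RPS is consistent with the number of completed requests divided by the benchmark duration in seconds. The helper name `total_rps` below is hypothetical and is not part of run_scrypt.go or VectorVisor:

```rust
/// Requests per second from a completed-request count and a duration in seconds.
/// Hypothetical helper for illustration only.
fn total_rps(requests: u64, duration_secs: f64) -> f64 {
    requests as f64 / duration_secs
}

fn main() {
    // Values copied from the benchmark output above:
    // 232249 requests completed over a 300 second run.
    let rps = total_rps(232_249, 300.0);
    println!("Total RPS: {:.6}", rps); // prints "Total RPS: 774.163333"
}
```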

I also realized that some parameters of the testing script are tied to the VV settings.

Thanks for the explanation and suggestions.


SamGinzburg commented on May 27, 2024

No problem. I'll close this issue for now as it seems to be resolved.

