Comments (15)

TheCodez commented on August 28, 2024

I plan to add clustering to compare cluster velocity with the simulator ground truth at some point (not in the near future). This way we have a qualitative comparison of this project's performance. I'll add an issue to track this.

from dynamic-occupancy-grid-map.

cbachhuber commented on August 28, 2024

+1 for CI! I just read that GitHub hosts CI for free for open source projects; I didn't expect that. I guess setting up CI will be interesting with the CUDA dependency; let's see. We can also take the first baby steps with CI once the first unit tests for the utils library are working.

Cool, I'm adding the time logging to my plans 👍

TheCodez commented on August 28, 2024

Moved to here #31 :)

TheCodez commented on August 28, 2024

Might take some time. I'm thinking of using this abstraction to keep the AoS syntax: https://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/St11ASX_CUDA.pdf
But at least with the current implementation I know that SoA is the way to go :)

TheCodez commented on August 28, 2024

Conversation moved here from #19

@cbachhuber wrote:

Good question! I'm on an Nvidia Quadro P2000, driver 435.21. I think that performance-wise, it's roughly equivalent to a GTX 1050. I see the following execution times of the demo on the current master branch:

| Parameter modification | Cycle time | Setup duration (until the for loop in main) | Overall execution time |
| --- | --- | --- | --- |
| none | 138-156ms | 44s | 47s |
| particle_count = 1*10e5 | 52-63ms | 12s | 13s |
| particle_count = 6*10e4 | 35-39ms | 6.8s | 8.4s |

I don't see a significant time difference between release and debug, tested with a few runs. I saw a strong influence of particle_count on the execution time of setupRandomStatesKernel when playing around with the code earlier. Is the same true for you? Which execution times do you see? Also, is such a high particle count necessary? When I used lower numbers (>5*10e4), I didn't see (subjectively) worse results.

Since execution time is one of the main motivations for having this project in addition to mitkina/dogma, I would suggest documenting both your and my cycle/iteration times somewhere quickly accessible (in the top-level readme, or in a file linked from it). What do you think?

@TheCodez wrote:

Thanks for the detailed benchmark. I’m seeing similar timings (slightly slower) using a GTX 1050 on my laptop.

I have an idea to improve the random setup time. Overall my goal is to improve the performance to get close to the timings in the paper, from which I'm still far away considering the number of particles they use. I hope that switching from AoS to SoA will give a ~3x performance boost.

Yes, the particle count might be too high for this basic scenario and grid size. I just tried setting the particle count about as high as my system supports (only 2 GB of video RAM). Actually, if you set the resolution to 0.1 you'll see that the particle count is too low for that grid size.

I will add your timings to the readme plus a comparison with the paper timings/particle count.

@cbachhuber wrote:

I'm looking forward to your setup time improvement!

For the paper, they use a GTX 980, which is more than twice as powerful as our GPUs. Therefore, if you achieve approximately double the cycle time reported in the paper, you should have an equally optimized algorithm, right? Of course there is still some way to go.

I see, so the particle count is well motivated 👍

Cool; I would also mention the GPU performance difference between the paper and our experiments.

TheCodez commented on August 28, 2024

@cbachhuber I moved the conversation to this issue, instead of a closed PR :)

Commit 36232e0 should fix the long init times and also slightly improve runtime performance. Init times went from ~44s to ~1.5s with no visible degradation in quality. Calling a CUDA function before calling the DOGM init further reduces the time to about 700ms.

cbachhuber commented on August 28, 2024

> @cbachhuber I moved the conversation to this issue, instead of a closed PR :)

The one and only correct thing to do 😅 👍

> Commit 36232e0 should fix the long init times and also slightly improve runtime performance. Init times went from ~44s to ~1.5s with no visible degradation in quality. Calling a CUDA function before calling the DOGM init further reduces the time to about 700ms.

Now that's amazing stuff! Thanks for improving this so drastically in no time, I'm impressed! 😮 I can confirm these numbers; I now see ~740ms init time.

Also thanks for adding performance to the readme! 👍

cbachhuber commented on August 28, 2024

> no visible degradation in quality

Let me just ask this one off-topic question here: do you have plans for an objective performance measurement? I.e. a module that compares the result of the grid to the ground truth input, computes noise etc.

I think this would be valuable to have; we could open another issue for that.
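
One form such an objective measure could take, as a minimal sketch only: the root-mean-square error between estimated and ground-truth velocities. `Velocity2D` and `velocityRmse` are hypothetical names chosen for illustration and are not taken from the repository. The sketch also assumes that each estimate has already been matched to its ground-truth counterpart, which is a separate problem.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D velocity type (m/s); not the repository's own data type.
struct Velocity2D {
    double vx;
    double vy;
};

// Root-mean-square error between estimated and ground-truth velocities.
// Assumes estimated[i] corresponds to ground_truth[i].
double velocityRmse(const std::vector<Velocity2D>& estimated,
                    const std::vector<Velocity2D>& ground_truth) {
    assert(estimated.size() == ground_truth.size());
    if (estimated.empty()) {
        return 0.0;  // Nothing to compare.
    }

    double sum_squared_error = 0.0;
    for (std::size_t i = 0; i < estimated.size(); ++i) {
        const double dx = estimated[i].vx - ground_truth[i].vx;
        const double dy = estimated[i].vy - ground_truth[i].vy;
        sum_squared_error += dx * dx + dy * dy;
    }
    return std::sqrt(sum_squared_error / static_cast<double>(estimated.size()));
}
```

A single scalar like this would make "no visible degradation" checkable across commits, though other metrics (e.g. per-cell occupancy error) might suit the project better.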

TheCodez commented on August 28, 2024

@cbachhuber do you get a speedup with either
https://github.com/TheCodez/dynamic-occupancy-grid-map/tree/reduce_thread_divergence
or
https://github.com/TheCodez/dynamic-occupancy-grid-map/tree/vectorized_types

I see no difference.

cbachhuber commented on August 28, 2024

I'm also not seeing a significant difference, unfortunately.

| Branch | Init time | Iteration time |
| --- | --- | --- |
| master | 1384ms | 132-174ms |
| reduce_thread_divergence | 1384ms | 136-185ms |
| vectorized_types | 1360ms | 137-202ms |

I always see an outlier iteration time during the second iteration. It is 40-60ms higher than the other values I see; see the maxima in the table above. What could be the reason for this?

What do you think about logging iteration time during execution, and printing min/mean/median/max at the end? Similar to what the precision evaluator is doing?
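
The logging idea above could be sketched roughly as follows. `IterationTimer` and `TimingStats` are hypothetical names, not existing code in the repository. Since CUDA kernel launches are asynchronous with respect to the host, a real measurement would presumably need a `cudaDeviceSynchronize()` (or CUDA events) before `stop()`; the sketch only covers the host-side bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of the recorded iteration times, in milliseconds.
struct TimingStats {
    double min_ms = 0.0;
    double mean_ms = 0.0;
    double median_ms = 0.0;
    double max_ms = 0.0;
};

// Hypothetical helper: records one sample per start()/stop() pair.
class IterationTimer {
    using Clock = std::chrono::steady_clock;

public:
    void start() {
        begin_ = Clock::now();
        running_ = true;
    }

    void stop() {
        assert(running_ && "stop() called without start()");
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - begin_;
        addSample(elapsed.count());
        running_ = false;
    }

    // Also allows feeding in externally measured times (e.g. from CUDA events).
    void addSample(double ms) { samples_ms_.push_back(ms); }

    TimingStats summarize() const {
        TimingStats stats;
        if (samples_ms_.empty()) {
            return stats;  // All zeros when nothing was recorded.
        }
        std::vector<double> sorted = samples_ms_;
        std::sort(sorted.begin(), sorted.end());
        const std::size_t n = sorted.size();
        stats.min_ms = sorted.front();
        stats.max_ms = sorted.back();
        stats.mean_ms = std::accumulate(sorted.begin(), sorted.end(), 0.0) / static_cast<double>(n);
        // Median: middle element, or the average of the two middle elements.
        stats.median_ms = (n % 2 == 1) ? sorted[n / 2]
                                       : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
        return stats;
    }

private:
    Clock::time_point begin_{};
    bool running_ = false;
    std::vector<double> samples_ms_;
};
```

In the demo loop, `start()` and `stop()` would wrap each iteration, and the result of `summarize()` would be printed after the loop. Reporting the median next to the mean keeps a single outlier, such as the slow second iteration, from dominating the picture.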

TheCodez commented on August 28, 2024

Thanks for testing, I'll take a look.

That's a pretty good idea 👍
I'm thinking of adding CI at some point. This way we could always see if we're regressing on quality/performance on each change.

TheCodez commented on August 28, 2024

@cbachhuber
(Your comment was duplicated so I removed it for clarity)

I added CI for Ubuntu. It's not compiling successfully atm because of some linker errors, e.g. undefined reference to __glewBindVertexArray. Any ideas how to fix that?

As the CI machines have no GPU we can only compile the code, not run it, but I think this is fine.

cbachhuber commented on August 28, 2024

Thanks for removing, I had a weird connection issue yesterday and actually clicked 'Comment' twice 😅

I don't know why this happens on CI only. I'm also playing around with CI at the moment (though I don't have much time today).

  • I see that you already tried to fix the issue with static linking. This would also have been my first suggestion.
  • In general, I suggest replacing ubuntu-latest with ubuntu-18.04 for stability. I'm on 18.04 and it's running. Currently this should not make a difference, as the two are equivalent.

I will also try to get this to compile in the next few days, let's see who solves this first ;)

And I agree, compiling is for now enough.

TheCodez commented on August 28, 2024

Using SoA instead of AoS particles reduced the runtime by around 50ms. Changing the grid cells to SoA gives a small improvement.

See https://github.com/TheCodez/dynamic-occupancy-grid-map/tree/soa

Big improvements coming soon 😄
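
For readers unfamiliar with the AoS-to-SoA change mentioned above, here is a minimal host-side sketch of the two layouts. The field names (`x`, `y`, `vx`, `vy`, `weight`) and the `predict*` functions are assumptions for illustration; the actual particle structure in the repository may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AoS): all fields of one particle sit next to each other.
struct ParticleAoS {
    float x, y, vx, vy, weight;
};

// Structure of Arrays (SoA): each field is stored in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy, weight;

    explicit ParticlesSoA(std::size_t n) : x(n), y(n), vx(n), vy(n), weight(n) {}
    std::size_t size() const { return x.size(); }
};

// Constant-velocity prediction step on the AoS layout.
void predictAoS(std::vector<ParticleAoS>& particles, float dt) {
    for (ParticleAoS& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

// The same step on the SoA layout: index i touches element i of each array.
void predictSoA(ParticlesSoA& particles, float dt) {
    const std::size_t n = particles.size();
    assert(particles.y.size() == n && particles.vx.size() == n && particles.vy.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        particles.x[i] += particles.vx[i] * dt;
        particles.y[i] += particles.vy[i] * dt;
    }
}
```

In a CUDA kernel where thread i handles particle i, the SoA layout means that neighboring threads read neighboring floats, so the loads can be coalesced. With AoS, the same reads are strided by the size of the struct, which is the usual explanation for a speedup of this kind.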

cbachhuber commented on August 28, 2024

Awesome, looking forward to that! 😃 👍
