
Comments (23)

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

Probably not, but I don't see how it can be connected.
I can try if you want.

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

I just pulled the latest commit on master and I see no change in the memory usage.

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

I did, and according to massif the memory usage remains constant (around 50MB, with peaks at 200MB). With --pages-as-heap=yes, the profiler shows the real usage, and it says that most of the memory is used by tbb.

  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 61 40,068,585,769    2,334,482,432    2,334,482,432             0            0
100.00% (2,334,482,432B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->93.47% (2,181,947,392B) 0x82A46B9: mmap (mmap.c:34)
| ->48.87% (1,140,854,784B) 0x82203CF: new_heap (arena.c:438)
| | ->48.87% (1,140,854,784B) 0x8220C1F: arena_get2.part.3 (arena.c:646)
| |   ->48.87% (1,140,854,784B) 0x8227248: malloc (malloc.c:2911)
| |     ->25.87% (603,979,776B) 0x798FE76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| |     | ->25.87% (603,979,776B) 0x798FF17: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| |     |   ->25.87% (603,979,776B) 0x4C462AC: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |     ->25.87% (603,979,776B) 0x4C448D7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |       ->25.87% (603,979,776B) 0x4C44A55: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |         ->25.87% (603,979,776B) 0x4C44CF7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |           ->25.87% (603,979,776B) 0x87786B8: start_thread (pthread_create.c:333)
| |     |             ->25.87% (603,979,776B) 0x82AA41B: clone (clone.S:109)
| |     |               
| |     ->23.00% (536,875,008B) 0x4C3EDC6: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |       ->23.00% (536,875,008B) 0x4C4893B: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |         ->23.00% (536,875,008B) 0x4C44A63: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |           ->23.00% (536,875,008B) 0x4C44CF7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |             ->23.00% (536,875,008B) 0x87786B8: start_thread (pthread_create.c:333)
| |               ->23.00% (536,875,008B) 0x82AA41B: clone (clone.S:109)
| |                 
| ->37.37% (872,415,232B) 0x822035B: new_heap (arena.c:427)
| | ->37.37% (872,415,232B) 0x8220C1F: arena_get2.part.3 (arena.c:646)
| |   ->37.37% (872,415,232B) 0x8227248: malloc (malloc.c:2911)
| |     ->20.12% (469,762,048B) 0x4C3EDC6: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     | ->20.12% (469,762,048B) 0x4C4893B: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |   ->20.12% (469,762,048B) 0x4C44A63: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |     ->20.12% (469,762,048B) 0x4C44CF7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |     |       ->20.12% (469,762,048B) 0x87786B8: start_thread (pthread_create.c:333)
| |     |         ->20.12% (469,762,048B) 0x82AA41B: clone (clone.S:109)
| |     |           
| |     ->17.25% (402,653,184B) 0x798FE76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| |       ->17.25% (402,653,184B) 0x798FF17: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| |         ->17.25% (402,653,184B) 0x4C462AC: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |           ->17.25% (402,653,184B) 0x4C448D7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |             ->17.25% (402,653,184B) 0x4C44A55: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |               ->17.25% (402,653,184B) 0x4C44CF7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |                 ->17.25% (402,653,184B) 0x87786B8: start_thread (pthread_create.c:333)
| |                   ->17.25% (402,653,184B) 0x82AA41B: clone (clone.S:109)
| |                     
| ->05.58% (130,150,400B) 0x87791D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
| | ->05.58% (130,150,400B) 0x4C44923: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |   ->05.22% (121,753,600B) 0x4C44A55: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |   | ->05.22% (121,753,600B) 0x4C44CF7: ??? (in /usr/lib/x86_64-linux-gnu/libtbb.so.2)
| |   |   ->05.22% (121,753,600B) 0x87786B8: start_thread (pthread_create.c:333)
| |   |     ->05.22% (121,753,600B) 0x82AA41B: clone (clone.S:109)
| |   |       
| |   ->00.36% (8,396,800B) in 1+ places, all below ms_print's threshold (01.00%)
| |   
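One way to read the new_heap figures in this trace is to express them in units of 64 MiB. This is a minimal sketch that assumes glibc reserves address space for each thread arena in 64 MiB heaps on 64-bit Linux; that size is an assumption, not something stated in the thread, and the names `kAssumedHeapBytes` and `split_into_heaps` are hypothetical.

```cpp
// Sketch: express the mmap totals from the massif trace in units of
// 64 MiB, the size glibc is ASSUMED to reserve per arena heap here.
#include <cstdint>
#include <cstdio>

constexpr std::uint64_t kAssumedHeapBytes = 64ull * 1024 * 1024; // assumption

struct HeapCount {
    std::uint64_t whole_heaps; // number of full 64 MiB reservations
    std::uint64_t remainder;   // bytes left over
};

// Splits a byte count into whole assumed-size heaps plus a remainder.
constexpr HeapCount split_into_heaps(std::uint64_t bytes)
{
    return HeapCount{bytes / kAssumedHeapBytes, bytes % kAssumedHeapBytes};
}

// Prints the breakdown for the two new_heap branches of the trace.
void print_trace_breakdown()
{
    // Byte counts copied from the new_heap lines of the trace above.
    const std::uint64_t totals[] = {1140854784ull, 872415232ull};
    for (std::uint64_t bytes : totals) {
        const HeapCount c = split_into_heaps(bytes);
        std::printf("%llu B = %llu heaps + %llu B\n",
                    static_cast<unsigned long long>(bytes),
                    static_cast<unsigned long long>(c.whole_heaps),
                    static_cast<unsigned long long>(c.remainder));
    }
}
```

Under that assumption, the two branches come out as 17 heaps plus 4,096 bytes and exactly 13 heaps. That would be consistent with --pages-as-heap=yes counting address space reserved per thread rather than memory actually in use, but it does not prove it.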

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

costashatz avatar costashatz commented on May 29, 2024

@Aneoshun any news on this one?

I tried the following very simple code and I see excessive memory usage (mainly from mmap) on different computer configurations and versions of tbb:

#include <iostream>
#include <unistd.h> // declares usleep(), used in _parallel_evaluate below

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/parallel_sort.h>
#include <tbb/partitioner.h>
#include <tbb/task_scheduler_init.h>

namespace parallel {

    typedef tbb::blocked_range<size_t> range_t;

    static void init()
    {
        static tbb::task_scheduler_init init;
    }

    template <typename Range, typename Body>
    inline void p_for(const Range& range, const Body& body)
    {
        tbb::parallel_for(range, body);
    }

    template <typename Range, typename Body>
    inline void p_for(const Range& range, Body& body)
    {
        tbb::parallel_for(range, body);
    }

    template <typename T1, typename T2, typename T3>
    void sort(T1 i1, T2 i2, T3 comp)
    {
        tbb::parallel_sort(i1, i2, comp);
    }
} // namespace parallel

struct _parallel_evaluate {
    void operator()(const parallel::range_t& r) const
    {
        for (size_t i = r.begin(); i != r.end(); ++i) {
            usleep(1000);
        }
    }
};

int main()
{
    int N = 20000;
    parallel::init();

    parallel::p_for(parallel::range_t(0, N), _parallel_evaluate()); //(my_data));

    return 0;
}

It seems weird to me that we've never seen this kind of issue before; I have personally run a few jobs for over 2-3 weeks with DART and sferes and never had an issue. But lately, I get issues similar to yours. On the other hand, I tried on a machine with an older Ubuntu version and we still see high memory usage!

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

Hi @costashatz,

I continue to investigate this.
My current observations:

  • Same problem with NSGA-II (so not related to modular-QD)
  • The simplest way to replicate this in Sferes is to add Eigen::MatrixXd test=Eigen::MatrixXd::Random(2000,2000); in the eval function of ex_nsga2.cpp.

Preloading tbbmalloc_proxy seems to improve the situation. I ran the ex_nsga2 example described above, and the run with tbbmalloc_proxy stagnates at 980MB while the other one has already reached double that (and keeps growing).

Your example is even more surprising because you have no memory allocation. Can you try with tbbmalloc?

I suspect that this problem has been here for a long time and that the only difference is that I am currently allocating a lot of memory for each evaluation (a clone of the Baxter robot), which makes the problem more noticeable. Are you trying more challenging evaluations too?

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

... but it does not seem to be linked to sferes... Are you sure we should use --pages-as-heap=yes? Maybe there is a different memory issue.

from sferes2.

costashatz avatar costashatz commented on May 29, 2024

Your example is even more surprising because you have no memory allocation. Can you try with tbbmalloc?

What do you mean try with tbbmalloc? I am not allocating anything..!

Are you trying more challenging evaluations too?

I've always been cloning the hexapod robot and several other robots that we are using. We also used it with the iCub robot in our crawling paper and we had no memory issue. I am investigating my code now to see if this is not related to Sferes, because it doesn't seem to be coming from Sferes..

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

Like you, I don't think it comes from Sferes, but probably from our usage of TBB.
I am not sure we should use pages-as-heap; it is just that without it, massif does not report the actual memory usage that I am seeing in top.
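A way to cross-check what top shows without relying on massif is to log the peak resident set size from inside the process, for example once per generation. This is a minimal sketch assuming a POSIX system (on Linux, `ru_maxrss` is reported in kilobytes); `peak_rss_kb`, `fake_generation` and `log_peak_rss` are hypothetical names, and `fake_generation` only stands in for a real evaluation step.

```cpp
// Sketch: log the process's peak resident set size once per generation.
// Assumes a POSIX system; on Linux, ru_maxrss is reported in kilobytes.
#include <sys/resource.h>

#include <cstddef>
#include <cstdio>
#include <vector>

// Peak resident set size of the current process, or -1 on failure.
long peak_rss_kb()
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}

// Hypothetical stand-in for one generation of evaluations: allocates a
// buffer and writes to every 4096th byte so the memory becomes resident.
void fake_generation(std::size_t bytes)
{
    std::vector<char> buffer(bytes);
    volatile char* p = buffer.data();
    for (std::size_t i = 0; i < bytes; i += 4096)
        p[i] = 1;
}

// Runs a few fake generations and prints the peak RSS after each one.
void log_peak_rss(int generations, std::size_t bytes_per_generation)
{
    for (int gen = 0; gen < generations; ++gen) {
        fake_generation(bytes_per_generation);
        std::printf("gen %d: peak RSS = %ld kB\n", gen, peak_rss_kb());
    }
}
```

Because this value is a peak, it never decreases. A run whose memory stabilises should show the number flattening after the first generations, while a run with the problem discussed here would keep climbing.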

I continue to suspect that TBB is doing some sort of memory optimization, and that the frequent and rapid allocation/freeing of large amounts of RAM can lead to suboptimal situations.

I have seen that Eigen is doing similar things with the allocators and in particular with the STL containers, but not sure if that's actually related to the issue.

Currently, I am still uncertain about the origin of the issue.

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

Ok, here are my latest thoughts and observations:

  • The TBB allocator does not give memory back to the OS; it creates a big cache and tries to reuse that space for future allocations.
  • I think that for some reason the allocator is unable to reuse this space and requests more from the OS.
  • Some people have the same issue:
    https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/299869
    https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/777497
  • The example that I previously suggested (ex_nsga2 plus a big matrix allocation) actually does not reproduce the bug: the memory usage stagnates at some point (after a couple of generations). However, I noticed that preloading tbbmalloc halves the total amount of memory.
  • I expect to see the memory rising during the first generations and then staying at the same level. However, when I use DART (and complex robots), the memory never stops rising, even after 1000 generations.
  • Surprisingly, I cannot preload tbbmalloc with DART. It crashes with a bad_alloc exception or an Eigen allocator exception during model loading. Maybe DART is doing something unusual with memory allocation that conflicts with tbb?
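When tbbmalloc is not preloaded, the massif trace earlier in the thread goes through glibc's malloc (the arena.c frames), so one experiment for the "allocator keeps a cache" hypothesis is to cap the number of glibc arenas and explicitly ask glibc to hand free memory back. This is a minimal, glibc-only sketch; `limit_malloc_arenas` and `release_free_heap` are hypothetical wrapper names, and whether this changes the growth seen here is an assumption to be tested, not an established fix.

```cpp
// Sketch (glibc only): cap the number of malloc arenas and return free
// heap memory to the OS. Whether this affects the growth discussed in
// this thread is an assumption to be tested, not a known fix.
#include <malloc.h> // glibc extensions: mallopt, malloc_trim, M_ARENA_MAX

// Ask glibc to create at most `max_arenas` arenas. Intended to be called
// early, before worker threads start allocating. Returns true on success.
bool limit_malloc_arenas(int max_arenas)
{
    return mallopt(M_ARENA_MAX, max_arenas) == 1;
}

// Ask glibc to release free heap memory back to the OS, e.g. once per
// generation. Returns true if some memory was actually released.
bool release_free_heap()
{
    return malloc_trim(0) == 1;
}
```

The same arena cap can be tried without recompiling by setting the MALLOC_ARENA_MAX environment variable. If memory still grows with a small arena count and regular trimming, allocator caching becomes a less likely explanation.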

I will continue my investigations.

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

costashatz avatar costashatz commented on May 29, 2024

Your current guess is that this comes from DART?

This might be possible. Which version of DART are you using? They had a few issues in previous versions. And most importantly, which version of Assimp do you have? Older versions of Assimp had huge memory leak issues..

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

To rule out the potential issue of a growing archive, I moved my test to NSGA-II.
I did not manage to reproduce the bug without DART (for instance with Eigen only).
Here is a small experiment that I use for my tests (I am using Baxter from my recent commit).
test_tbb_malloc.zip

I am on DART master and Assimp master.
I have fixed 3 memory leaks in DART dartsim/dart#1063 and one in Assimp assimp/assimp#1934 .

If I run valgrind (latest release) on the example above (after 5 generations), I see no strict memory leak:

==40497==    definitely lost: 0 bytes in 0 blocks
==40497==    indirectly lost: 0 bytes in 0 blocks
==40497==      possibly lost: 9,424 bytes in 31 blocks
==40497==    still reachable: 85,176 bytes in 6 blocks
==40497==                       of which reachable via heuristic:
==40497==                         newarray           : 12,312 bytes in 3 blocks
(With more generations, the valgrind report is exactly the same.)

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

I just tested with the robotic arm from robot_simu ... no memory increase. I know that I had a similar issue with the NAO, so maybe it's related to the loading of assets (stl or dae files).

from sferes2.

costashatz avatar costashatz commented on May 29, 2024

I know that I had a similar issue with the NAO, so maybe it's related to the loading of assets (stl or dae files).

We have been using this extensively with the hexapod, which has no assets (everything is a cylinder or cube). This strengthens my fear that it comes from Assimp. DART made the choice to use Assimp to keep all the 3D models in memory, so that's where we should investigate the issue. Thanks for the update! Let's both try to see what's going on in DART/Assimp and see where we get.

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

Hi all,
Quick update: I did not find anything weird with the mesh loading using Assimp. However, I discovered that the problem comes from the collision detection when using Bullet. If you set no collision detector, the memory remains stable. The DART collision detector does not implement mesh collision, and I did not try the ODE one.

More specifically, the memory increases when calling these two functions:

  castedGroup->updateEngineData();
  collisionWorld->performDiscreteCollisionDetection();

which are here: https://github.com/dartsim/dart/blob/36a6a48aeec4c4e586166c254011d1959ff9f27a/dart/collision/bullet/BulletCollisionDetector.cpp#L233
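Growth can be attributed to individual calls like these by wrapping each suspect in a scoped probe that measures the change in resident memory. This is a minimal, Linux-only sketch that reads /proc/self/statm; `current_rss_bytes`, `RssProbe` and `retaining_step` are hypothetical names, and `retaining_step` is only a stand-in for a call that keeps memory reachable without leaking it in valgrind's sense.

```cpp
// Sketch: attribute resident-memory growth to a suspect call (Linux only).
#include <unistd.h>

#include <cstddef>
#include <fstream>
#include <utility>
#include <vector>

// Current resident set size in bytes, read from /proc/self/statm.
// Returns 0 if the file cannot be read (e.g. on non-Linux systems).
long current_rss_bytes()
{
    std::ifstream statm("/proc/self/statm");
    long total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages))
        return 0;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

// RAII probe: adds the RSS change observed during its lifetime to `total`.
class RssProbe {
public:
    explicit RssProbe(long& total) : total_(total), start_(current_rss_bytes()) {}
    ~RssProbe() { total_ += current_rss_bytes() - start_; }
    RssProbe(const RssProbe&) = delete;
    RssProbe& operator=(const RssProbe&) = delete;

private:
    long& total_;
    long start_;
};

// Memory that stays reachable, so a leak checker would not flag it.
std::vector<std::vector<char>> g_retained;

// Hypothetical stand-in for the suspect call: keeps `bytes` alive and
// writes to every 4096th byte so the pages count as resident.
void retaining_step(std::size_t bytes)
{
    std::vector<char> block(bytes);
    volatile char* p = block.data();
    for (std::size_t i = 0; i < bytes; i += 4096)
        p[i] = 1;
    g_retained.push_back(std::move(block));
}
```

In a real run, one probe per suspect call, each with its own accumulator, would show which call the growth follows. RSS is a process-wide figure, so the numbers are only meaningful when other threads are not allocating at the same time.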

I will continue my investigations and keep you posted.

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.

Aneoshun avatar Aneoshun commented on May 29, 2024

I found a solution! If you use libbullet 2.88 (from git), the problem is gone.
In recent versions of Bullet, they worked on optional multi-threading, and they have probably cleaned up some of the memory-management code too.

Now I can run my sferes experiments with Baxter using only 240MB of RAM, instead of using 64GB and crashing!

For the record, I am now on Assimp master (plus leak fix), DART master (plus leak fix) and Bullet master.

I hope that will help you!

from sferes2.

jbmouret avatar jbmouret commented on May 29, 2024

from sferes2.
