Comments (11)

iche033 avatar iche033 commented on July 17, 2024

Are there any errors printed in the console?

Dome -> Fortress: we upgraded ogre from 2.1 to 2.2 but not sure if that's the reason or not.

> 17:33:42: Can't assign material scene::Material(55154) because this Material does not exist. Have you forgotten to define it in a .material script?
> 17:33:42: WARNING: Deleting mapped buffer without having it unmapped. This is often sign of a resource leak or a bad pattern. Umapping the buffer for you...

These warnings should be ok to ignore. They still happen in newer version of gazebo.

> 17:28:41: OGRE EXCEPTION(3:RenderingAPIException): eglInitialize failed for device EGL_EXT_device_drm [ /dev/dri/card1 in EGLSupport::getGLDisplay at /var/lib/jenkins/workspace/ogre-2.2-debbuilder/repo/RenderSystems/GL3Plus/src/windowing/EGL/PBuffer/OgreEglPBufferSupport.cpp (line 322)

Related issue: gazebosim/gz-rendering#587 - Logged when OGRE tries to query EGL devices - the OGRE 2 dev says it should be harmless.

> Using Ogre1. Ogre1 did the same thing without the material error in the log, but the same time hanging whilst the models are being loaded into the world.

Maybe it's a Fuel server issue (https://app.gazebosim.org/). Some older models on Fuel point to ignitionrobotics.org instead of gazebosim.org and may no longer work. If that is the cause, launching gz sim should print errors in the console saying that it timed out downloading these models.

from gz-sim.

Space-Swarm avatar Space-Swarm commented on July 17, 2024

Thanks so much for the swift reply! This is the warning printed to the console while the initial models load:

"[Wrn] [FuelClient.cc:1978] The fuel.ignitionrobotics.org URL is deprecrated. Pleasse change https://fuel.ignitionrobotics.org/1.0/OpenRobotics/models/Tunnel Tile 5 to https://fuel.gazebosim.org/1.0/OpenRobotics/models/Tunnel Tile 5

However, that only lasts 30 seconds and is not much of an issue in terms of load time.

The real delay starts after the following is printed:

"[Dbg] [Sensors.cc:270] Initializing render context
[Msg] Loading plugin [ignition-rendering-ogre2]"

These warnings follow soon after, and this is where the major delay happens:

     "[Wrn] [Component.hh:144] Trying to serialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator<<`. Component will not be serialized.
      [Wrn] [Component.hh:144] Trying to serialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator<<`. Component will not be serialized.
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Model.hh:98] Unable to deserialize sdf::Model
      [GUI] [Wrn] [Component.hh:189] Trying to deserialize component with data type [St6vectorIdSaIdEE], which doesn't have `operator>>`. Component will not be deserialized."

Directly after this line, which is when the rendering memory buffer is deleted, the models load, and the world is shown with the robot in the GUI.

I've already tried replacing the deprecated ignitionrobotics.org URLs, and also tried switching to local model paths. All that did was stop the deprecation warning from being printed.
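For anyone repeating that step, here is a minimal sketch of the URL replacement. The old and new host names come from the warning quoted above. The function name `update_fuel_urls` and the sample string are only illustrative, and as noted this silences the warning without shortening the load time.

```python
import re
from typing import Tuple

# Old and new Fuel hosts, as named in the deprecation warning.
OLD_HOST = "fuel.ignitionrobotics.org"
NEW_HOST = "fuel.gazebosim.org"


def update_fuel_urls(sdf_text: str) -> Tuple[str, int]:
    """Return the text with the deprecated host replaced, plus the replacement count."""
    return re.subn(re.escape(OLD_HOST), NEW_HOST, sdf_text)


# Illustrative input: a single <uri> line using the deprecated host.
world = "<uri>https://fuel.ignitionrobotics.org/1.0/OpenRobotics/models/Tunnel Tile 5</uri>"
updated, count = update_fuel_urls(world)
print(count, updated)
```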

From what you've said it sounds like the error is most likely an issue created by Ogre2.2. Is there a way to implement the memory buffer deletion sooner with Ogre2.2?

An alternative reason: Ignition Fuel is trying to load models that are no longer accessible. Is there a way to setup Ignition Fortress to use local files by default prior to attempting to download?


iche033 avatar iche033 commented on July 17, 2024

> An alternative reason: Ignition Fuel is trying to load models that are no longer accessible. Is there a way to setup Ignition Fortress to use local files by default prior to attempting to download?

Gazebo looks at the local cache first before downloading them from fuel. So if the models are available in ~/.gz/fuel/.. it should just load those.
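To check whether a given model is already in the local cache, a rough sketch like the one below may help. It assumes the cache is laid out as `<cache root>/<server>/<owner>/models/<model name>`, which is my understanding and not something confirmed in this thread. The helper name `find_cached_model` is made up for illustration. The cache root may also differ by release (Fortress-era installs may use `~/.ignition/fuel` rather than `~/.gz/fuel`), so adjust the path as needed.

```python
from pathlib import Path
from typing import List


def find_cached_model(cache_root: Path, model_name: str) -> List[Path]:
    """List cache directories whose name matches the model, ignoring case."""
    if not cache_root.is_dir():
        return []
    wanted = model_name.lower()
    # Assumed layout: <server>/<owner>/models/<model name>
    return sorted(
        path
        for path in cache_root.glob("*/*/models/*")
        if path.is_dir() and path.name.lower() == wanted
    )


# Prints an empty list when the cache root or the model is missing.
print(find_cached_model(Path.home() / ".gz" / "fuel", "Tunnel Tile 5"))
```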

> From what you've said it sounds like the error is most likely an issue created by Ogre2.2. Is there a way to implement the memory buffer deletion sooner with Ogre2.2?

I'm not sure that's the reason for the slowdown. You mentioned that it also hangs with Ogre1, which makes me think it's caused by something else.

One thing I would try first is to comment out some models in the world and see if it's because certain models are taking too long to load.
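If editing the world file by hand gets tedious, the trimming can be scripted. This is only a sketch: it assumes the models are pulled in through `<include>` elements with a `<uri>` child, and the function name `drop_includes` and the tiny sample world are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def drop_includes(sdf_text: str, uri_fragments: Iterable[str]) -> str:
    """Remove every <include> whose <uri> contains one of the given fragments."""
    root = ET.fromstring(sdf_text)
    fragments = list(uri_fragments)
    # ElementTree elements do not know their parent, so build a lookup first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for include in list(root.iter("include")):
        uri = include.findtext("uri", default="")
        if any(fragment in uri for fragment in fragments):
            parents[include].remove(include)
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-model world, only to show the call.
world = """<sdf version="1.6"><world name="demo">
  <include><uri>model://tile_a</uri></include>
  <include><uri>model://robot</uri></include>
</world></sdf>"""
trimmed = drop_includes(world, ["tile_a"])
print(trimmed)
```

Writing each trimmed variant to its own file and timing the launch should show whether a particular model dominates the load time.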


Space-Swarm avatar Space-Swarm commented on July 17, 2024

Thanks for the suggestion @iche033! I have been using the simple_tunnel_02 map and one other map, and have commented out different model files. I removed the model files one by one until only the robot was left, and tried different robots (it is very funny to watch a single robot falling through a nonexistent floor!). Any of the model files causes this error to crop up in the Ogre2 rendering log:

"Can't assign material scene::Material(65447) because this Material does not exist. Have you forgotten to define it in a .material script?"

It is only after the memory is cleared that the GUI loads with the robot:
WARNING: Deleting mapped buffer without having it unmapped. This is often sign of a resource leak or a bad pattern. Umapping the buffer for you...

After digging into it further and trying different approaches, I have reached the following conclusion which likely explains the situation:

There is a porting problem between Ogre2.1 and Ogre2.2 that was introduced when the updates were rolled out to gz-rendering6. The error still happens with Ogre1 because the version of gz-rendering6 stays the same, so the code still hangs when launched with the older Ogre1 engine.

I have identified the particular push when related areas were discussed in gz-rendering6: gazebosim/gz-rendering#223

Tagging the relevant contributors from that push - @mjcarroll @ahcorde @chapulina @darksylinc

This also is in line with what I am experiencing with EGL support with Ignition Fortress - the headless version is actually far slower than the normal GUI version, which it shouldn't be. This means that the port from Ogre2.1 to Ogre2.2 has a fundamental bug of some kind in gz-rendering6 and the EGL rendering is not happening as it should. It could be that there are inefficiencies in Ogre2.2's way of rendering but these bugs I'm experiencing are leading me to think it's a bug based in gz-rendering6/ignition fortress, rather than Ogre2.2 based. I think this may solve related bugs which cropped up with Ignition Fortress (e.g. #1116 and #1370), so I think it's worth other people taking a look at. It's a bit beyond my skill level to solve myself unfortunately.

I believe there also may be a difference in how the two different .materials files are structured in the fuel models vs what comes loaded in Ogre2.2 and works already. I have attached the two corresponding files which highlight that difference, although it may not be a contributing factor.

skybox.material (ogre2.2).txt
tunnel_tile.material (fuel model material).txt


peci1 avatar peci1 commented on July 17, 2024

I'm now running a very similar installation and I don't observe this delay. I run it on a NUC with an Intel GPU, though, and also with ctu_cras_norlab_absolem_sensor_config_1.


peci1 avatar peci1 commented on July 17, 2024

How long is the delay you observe? Mine is about 1 minute from ign-rendering-ogre2 loading start (with a single robot model).


Space-Swarm avatar Space-Swarm commented on July 17, 2024

I've found this very dependent on the map in question and possibly the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open up a separate issue regarding EGL processing headless being slower than the normal run through with fortress?


darksylinc avatar darksylinc commented on July 17, 2024

> I've found this very dependent on the map in question and possibly the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open up a separate issue regarding EGL processing headless being slower than the normal run through with fortress?

Is that on Linux or on Windows (or Windows via WSL)?

When you mentioned headless, I remembered that if no monitors are plugged to Windows, the OS will disable the GPUs and tons of problems appear. You could be running on full SW emulation.


Space-Swarm avatar Space-Swarm commented on July 17, 2024

> > I've found this very dependent on the map in question and possibly the specs of the PC. It can range from 1 minute to 10-15 minutes with a map like tunnel_circuit_01. Should I open up a separate issue regarding EGL processing headless being slower than the normal run through with fortress?
>
> Is that on Linux or on Windows (or Windows via WSL)?
>
> When you mentioned headless, I remembered that if no monitors are plugged to Windows, the OS will disable the GPUs and tons of problems appear. You could be running on full SW emulation.

Thanks so much for the feedback @darksylinc, this is a bit of a headscratcher! It is on native Linux 20.04 with ROS Noetic and Ignition Fortress and a monitor plugged in. I am running headless via the following launch script using the "headless:=true" option: https://github.com/osrf/subt/blob/26fd5da5cc0d7dbbcd269b30752ca305d2bba3d5/subt_ign/launch/competition.ign#L387C13-L387C14

I've tested the headless on different run throughs using 1, 3 and 5 robots for 1 hour worth of elapsed time (real time, rather than sim time) and the RTF for the headless run throughs are slower than the normal run with the GUI in every case. I believe it indicates 1 of 3 things:

  1. The port of gz-rendering6 from Ogre2.1 to Ogre2.2 was implemented with a bug which makes headless slower
  2. The setup using that particular launch file and the SubT simulator is configured wrong
  3. Ogre2.2 renders things less effectively in headless mode

I believe the first is the most likely, given the material-loading error we keep reproducing in the Gazebo rendering log. Any help would be much appreciated, as solving this headless slowdown is really important for moving forward with my PhD work.
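For clarity on how the runs are being compared: RTF here means elapsed simulation time divided by elapsed wall-clock time. A minimal sketch of that calculation follows. The numbers are hypothetical placeholders for a 1-hour wall-clock run, not measurements from my tests.

```python
def real_time_factor(sim_seconds: float, wall_seconds: float) -> float:
    """Real-time factor: simulated seconds advanced per wall-clock second."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return sim_seconds / wall_seconds


# Hypothetical sim-time totals for a 3600 s wall-clock run; not real results.
runs = {"gui": 1800.0, "headless": 1500.0}
for name, sim_seconds in runs.items():
    print(name, round(real_time_factor(sim_seconds, 3600.0), 3))
```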


iche033 avatar iche033 commented on July 17, 2024

AFAIK the "headless:=true" option through ROS launch script just disables GUI window but does not actually enable EGL (a little confusing because both are sometimes referred to as headless).

> the RTF for the headless run throughs are slower than the normal run with the GUI in every case

This is really weird. I haven't seen this issue before.

Are you able to share your ogre2.log?
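When going through the log, one quick check is where the timestamps jump. The sketch below assumes each entry starts with an `HH:MM:SS:` prefix, as in the lines quoted earlier in this thread. The function name `largest_gaps` is made up, and the sample messages are abbreviated copies of those quoted lines, which may not be adjacent in the real file. It does not handle a log that crosses midnight.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches lines such as "17:33:42: some message".
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}): (.*)$")


def largest_gaps(lines: List[str], top: int = 3) -> List[Tuple[float, str]]:
    """Return (seconds, message) for the entries preceded by the longest pauses."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line.strip())
        if not match:
            continue  # lines without a timestamp are skipped
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            gaps.append(((stamp - previous).total_seconds(), match.group(2)))
        previous = stamp
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


# Abbreviated versions of log lines quoted in this thread.
sample = [
    "17:28:41: OGRE EXCEPTION(3:RenderingAPIException): eglInitialize failed",
    "17:33:42: Can't assign material",
    "17:33:42: WARNING: Deleting mapped buffer without having it unmapped.",
]
result = largest_gaps(sample, top=1)
print(result)
```

To run it on a real file, pass `open(path).read().splitlines()` as `lines`.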


darksylinc avatar darksylinc commented on July 17, 2024

The simplest way to address this is to debug it yourself:

  1. OPTIONAL: Install debug symbols for system libs: sudo apt install libc6-dbg libstdc++6-10-dbg (note: libstdc++X-Y-dbg may be different depending on your distro version).
  2. Build gazebo from sources in Debug mode. To do so, use colcon build --cmake-args -DBUILD_TESTING=OFF -DCMAKE_BUILD_TYPE=Debug --merge-install at the colcon step (I disabled tests to speed up compilation since they shouldn't be needed).
  3. Verify your debug gazebo build works as expected.
  4. Install QtCreator.
  5. Start a terminal.
  6. Just like in the documentation, run the command (with the period at the beginning) . ~/workspace/install/setup.bash so that environment variables are set.
  7. Launch QtCreator FROM WITHIN THIS TERMINAL (so that QtCreator inherits the environment variables from the previous step).
  8. Go to Debug -> Start and Debug external application.
  9. Enter gazebo's executable and parameters as in this picture (look at Local executable, Cmd line arguments, and Working directory):
    Screenshot_2024-04-24_22-19-42
  10. Launch it

Once it has launched, start doing whatever triggers this bug (assuming it requires further interaction).
When it starts taking too long (i.e. the bug finally manifests), hit "Pause":

Screenshot_2024-04-24_22-24-21

Once paused it may look like this:
Screenshot_2024-04-24_22-26-03

What's relevant is the call stack and the threads enumeration.
This is the callstack, please Right Click -> "Copy contents to Clipboard" and paste it here:

Screenshot_2024-04-24_22-27-14

The other thing is the Threads button:

Untitled

In this example we have very few threads, but Gazebo has A LOT of threads. Go one by one (you can use the mouse wheel to scroll through the Threads very quickly) and see if there is something suspicious that looks stuck. Each time you change threads, the call stack changes. Anything that looks relevant or "stuck" to you, please paste it here.

Then you can click on the Pause button again to unpause it; and repeat this step to see if it's stuck in the same place or somewhere else. It is relevant to know if the app is stuck in the same place or keeps jumping between different locations.

If successful, the call stack you paste for each thread should look like this:

1  futex_abstimed_wait_cancelable                         futex-internal.h           320  0x7ffff7d1d7d1 
2  __pthread_cond_wait_common                             pthread_cond_wait.c        520  0x7ffff7d1d7d1 
3  __pthread_cond_timedwait                               pthread_cond_wait.c        665  0x7ffff7d1d7d1 
4  ??                                                                                     0x7fffb4798f4a 
5  ??                                                                                     0x7fffb4793ec1 
6  ??                                                                                     0x7fffb4793e73 
7  Ogre::VulkanWindowSwapChainBased::acquireNextSwapchain OgreVulkanWindow.cpp       538  0x7fffc59826f7 
8  Ogre::VulkanQueue::commitAndNextCommandBuffer          OgreVulkanQueue.cpp        1295 0x7fffc5943978 
9  Ogre::VulkanDevice::commitAndNextCommandBuffer         OgreVulkanDevice.cpp       685  0x7fffc5911f35 
10 Ogre::VulkanRenderSystem::_endFrameOnce                OgreVulkanRenderSystem.cpp 2028 0x7fffc595b109 
11 Ogre::CompositorManager2::_swapAllFinalTargets         OgreCompositorManager2.cpp 828  0x7ffff70e2e7a 
12 Ogre::Root::_updateAllRenderTargets                    OgreRoot.cpp               1575 0x7ffff6ee6201 
13 Ogre::Root::renderOneFrame                             OgreRoot.cpp               1104 0x7ffff6ee6108 
14 Demo::GraphicsSystem::update                           GraphicsSystem.cpp         427  0x22be68       
15 Demo::MainEntryPoints::mainAppSingleThreaded           MainLoopSingleThreaded.cpp 135  0x23b865       
16 mainApp                                                PbsMaterials.cpp           25   0x222ebb       
17 main                                                   MainEntryPointHelper.h     40   0x222db9       
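To tell whether two pauses landed in the same place, the pasted stacks can be compared mechanically. This is a rough sketch, not part of QtCreator: it assumes each line starts with the frame number followed by the function name, as in the listing above. Unresolved `??` frames are ignored, and function names containing spaces (some templates) would be cut short. The helper names are made up, and `pause_2` is a hypothetical variation for illustration.

```python
import re
from typing import List

# Frame number, then the function name up to the first whitespace.
FRAME = re.compile(r"^\s*(\d+)\s+(\S+)")


def frame_names(stack_text: str) -> List[str]:
    """Extract function names, innermost frame first, from a pasted call stack."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME.match(line)
        if match:
            names.append(match.group(2))
    return names


def stuck_in_same_place(first: str, second: str, depth: int = 5) -> bool:
    """True when both stacks share the same innermost resolved frames."""
    a = [name for name in frame_names(first) if name != "??"][:depth]
    b = [name for name in frame_names(second) if name != "??"][:depth]
    return bool(a) and a == b


# Three frames copied from the example stack above.
pause_1 = """
1  futex_abstimed_wait_cancelable  futex-internal.h     320  0x7ffff7d1d7d1
4  ??                                                        0x7fffb4798f4a
13 Ogre::Root::renderOneFrame      OgreRoot.cpp         1104 0x7ffff6ee6108
"""
# Hypothetical second pause that ended up in a different function.
pause_2 = pause_1.replace("Ogre::Root::renderOneFrame", "Ogre::Root::_updateAllRenderTargets")

print(stuck_in_same_place(pause_1, pause_1))
print(stuck_in_same_place(pause_1, pause_2))
```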
