nv-tlabs / ase
License: Other
When I ran the pre-trained model:
python ase/run.py --test --task HumanoidReach --num_envs 16 --cfg_env ase/data/cfg/humanoid_sword_shield_reach.yaml --cfg_train ase/data/cfg/train/rlg/hrl_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/RL_Avatar_Idle_Ready_Motion.npy --llc_checkpoint ase/data/models/ase_llc_reallusion_sword_shield.pth --checkpoint ase/data/models/ase_hlc_reach_reallusion_sword_shield.pth
I got "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)"
The details:
Traceback of TorchScript (most recent call last):
File "/home/xhc/ASE/ASE-main/ase/env/tasks/humanoid.py", line 620, in compute_humanoid_observations_max
root_h = root_pos[:, 2:3]
heading_rot = torch_utils.calc_heading_quat_inv(root_rot)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
if (not root_height_obs):
File "/home/xhc/ASE/ASE-main/ase/utils/torch_utils.py", line 177, in calc_heading_quat_inv
# the heading is the direction on the xy plane
# q must be normalized
heading = calc_heading(q)
~~~~~~~~~~~~ <--- HERE
axis = torch.zeros_like(q[..., 0:3])
axis[..., 2] = 1
File "/home/xhc/ASE/ASE-main/ase/utils/torch_utils.py", line 153, in calc_heading
ref_dir = torch.zeros_like(q[..., 0:3])
ref_dir[..., 0] = 1
rot_dir = quat_rotate(q, ref_dir)
~~~~~~~~~~~ <--- HERE
heading = torch.atan2(rot_dir[..., 1], rot_dir[..., 0])
File "/home/xhc/IsaacGym_Preview_4_Package/isaacgym/python/isaacgym/torch_utils.py", line 68, in quat_rotate
shape[0], 3, 1))
c = q_vec * \
torch.bmm(q_vec.view(shape[0], 1, 3), v.view(
~~~~~~~~~ <--- HERE
shape[0], 3, 1)).squeeze(-1) * 2.0
return a + b + c
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
Hi Jason,
I am trying to train the tasks, but only Humanoid Heading works; the others fail with pretty low rewards. I trained the task following your script like this:
python ase/run.py --task HumanoidReach --cfg_env ase/data/cfg/humanoid_sword_shield_reach.yaml --cfg_train ase/data/cfg/train/rlg/hrl_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/RL_Avatar_Idle_Ready_Motion.npy --llc_checkpoint ase/data/models/ase_llc_reallusion_sword_shield.pth --headless
Is there anything wrong in the config? And can you share the trained model?
Thanks
Hi, Jason! Thanks for your great work!
I found there are 187 motion clips used for the low-level controller:
> "The low-level policy is trained using a custom motion dataset of 187 motion clips, provided by Reallusion [Reallusion 2022]."
However, from Reallusion there are only 87 clips. May I know where the other 100 clips are?
Another question would be, will you release the retargeting script for the motion files from Reallusion? Thank you so much!
Hello Jason,
I have some questions regarding the encoding of transitions in the latent space. The paper describes the encoding of transitions between states at t and t+1. In practice, however, you use multiple steps for both AMP and the encoding. I understand that this helps with learning complex behavior over long horizons (10 is the default here); for example, the humanoid in AMP cannot learn the backflip using only a transition of 2 steps. I think there might be two issues here, though:
1. The framework becomes non-Markovian with numAMPObsSteps > 2, as the reward is given for the past 9 steps, while the policy only takes the state at the current t.
2. The encoder also uses a sequence of numAMPObsSteps observations to encode into a latent z. This assumes that the policy was following the same z when producing them, but during training the latent z can be updated at resets or after some random latent_steps (sampled uniformly between 1 and 150), so some parts of the amp_observation could have been generated with a different latent from the one used at the current time step. (A sketch of the buffer I am describing is below.)
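To make the second point concrete, here is a minimal sketch of the rolling multi-step observation history I mean; the class and names are mine, not the repo's actual implementation:

import torch

class AMPObsHistory:
    # rolling history of the last num_steps observations per environment
    def __init__(self, num_envs, num_steps, obs_dim):
        self.hist = torch.zeros(num_envs, num_steps, obs_dim)

    def push(self, obs):
        # shift old observations back one slot and write the newest into slot 0;
        # the slots can straddle a latent switch, which is the issue above
        self.hist[:, 1:] = self.hist[:, :-1].clone()
        self.hist[:, 0] = obs

    def flat(self):
        # (num_envs, num_steps * obs_dim) vector fed to the discriminator/encoder
        return self.hist.reshape(self.hist.shape[0], -1)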
Thank you
When running the AMP example command:
python ase/run.py --task HumanoidAMP --cfg_env ase/data/cfg/humanoid_sword_shield.yaml --cfg_train ase/data/cfg/train/rlg/amp_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/RL_Avatar_Atk_2xCombo01_Motion.npy --headless
I get this error:
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx' FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
which comes from the file fbx_backend.py in poselib. The program continues running despite this error. Is poselib's fbx_backend.py still needed, even if we aren't loading an FBX file?
Hi! If I want to retarget other human models (such as SMPL-X), how can I get the right T-pose and other parameters like rotation_to_target_skeleton for retargeting motion? By the way, have you tried a more complex human model? Would a different human model affect performance?
Hello,
I am interested in implementing a version of ASE for quadruped robot models (A1, Anymal). I have started some preliminary work here: dtch1997#1
Let me know if this would be interesting to you! Thanks.
Thanks for the paper and code. We see that the skill latent is projected through an MLP in the actor but not in the critic. We wonder if this is by design?
In the pre-training stage, the latent code z is sampled according to the prior p(z); I noticed p(z) is a Gaussian distribution. I am confused about the "random" latent code: if z is randomly sampled from p(z), the same skill motion may map to different z, while the same or similar z may map to different skill motions. So after pre-training, if we randomly sample a latent code z from p(z), what motion does it imitate?
A walk? A strike? Or a jump?
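For concreteness, here is a minimal sketch of the sampling I mean, assuming the paper's prior of a normalized Gaussian on the unit hypersphere (the function name is mine):

import torch

def sample_latents(n, latent_dim=64):
    # draw z ~ N(0, I) and project onto the unit hypersphere
    z = torch.randn(n, latent_dim)
    return z / torch.norm(z, dim=-1, keepdim=True)

z = sample_latents(16)  # 16 "random" skill latents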
Thank you for your work; hoping for your reply.
Hello,
thank you for the amazing work. I noticed that when resetting the environment, the initial state is different each time, even if stateInit is set to Default or Start.
Looking deeper into the code, it seems the initial state differs because the positions of the rigid bodies are kept the same as before the reset. In the function _reset_env_tensors(self, env_ids) in humanoid.py, only the root state and dof states are reset. Apparently, the current version of Isaac Gym does not allow setting positions for rigid bodies, according to the docs.
I just want to thank you for publishing the source code for this amazing paper!
I have an FBX file with 2 frames of animation; both have the skeleton in T-pose. I am trying to get a T-pose for retargeting from the FBX file, so I tried:
fbx_file = "/home/bizon/Documents/to_ase/MM_T_Pose.FBX"
motion = SkeletonMotion.from_fbx(
    fbx_file_path=fbx_file,
    root_joint="pelvis",
    fps=60,
)
skeleton_fbx = motion.skeleton_tree
zero_pose_fbx = SkeletonState.zero_pose(skeleton_fbx)
plot_skeleton_state(zero_pose_fbx)
The result is a mangled skeleton.
If I try:
plot_skeleton_motion_interactive(motion)
I get a perfect skeleton, so I know the FBX file is good. What is the right way to get the skeleton in T-pose?
Other observations:
If I print the sample skeleton loaded from cmu_tpose.npy and the skeleton loaded from 07_01_cmu.fbx, they are identical (other than local_translation). What differs is the SkeletonState on each, so I assume I am missing some kind of transform. I wonder how cmu_tpose.npy was originally created from the FBX file; that answer is probably the solution.
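(In case it helps others: a hedged sketch of one possible approach, assuming poselib's SkeletonState.from_rotation_and_root_translation API and that frame 0 of the motion already holds the T-pose:)

from poselib.skeleton.skeleton3d import SkeletonState

# build the T-pose state from the motion's first frame instead of zero_pose,
# which zeroes all local rotations and can give a mangled skeleton
tpose = SkeletonState.from_rotation_and_root_translation(
    motion.skeleton_tree,
    motion.local_rotation[0],
    motion.root_translation[0],
    is_local=True,
)
plot_skeleton_state(tpose)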
I have trained the model to track a motion sequence with the command:
python ase/run.py --task HumanoidImitation --cfg_env ase/data/cfg/humanoid_imitation.yaml --cfg_train ase/data/cfg/train/rlg/amp_humanoid_task.yaml --motion_file xxx --checkpoint output/Humanoid_20-14-34-09/nn/Humanoid.pth
and test with
python ase/run.py --test --task HumanoidImitation --num_envs 1 --cfg_env ase/data/cfg/humanoid_imitation.yaml --cfg_train ase/data/cfg/train/rlg/amp_humanoid_task.yaml --motion_file xxx --checkpoint xxx
humanoid_imitation.py was changed to track a reference motion, but the robot jumps high at the beginning during testing.
Hi,
I am trying to train a low-level policy with an agent that has roughly 100 dofs.
But after a few hundred iterations the policy only predicts the zero action (almost no movement), and the discriminator accuracy on the agent data suddenly drops from around 0.99 to zero, while the accuracy on the demo data stays at 1. Also, sometimes the ratio in the actor loss is inf.
While the zero pose is part of the reference motion, the motion otherwise looks fine to me. Also, if I train the same agent reduced to around 50 dofs with the same reference motion, it works.
Could you share some insights if possible, maybe on which hyperparameters to tune when there are a lot of dofs?
Thank you!
Is it possible to know what values of the latent variable will produce a certain learned clip? If so, how do you determine them?
Thank you for sharing the nice paper and code,
but when I wanted to visualize the AMP humanoid data, e.g. amp_humanoid_jog.npy, I found that it could not be done with the current code. Did you use these data when training ASE? How can this data be used?
Is there a way to run ASE using Isaac Gym 1.0 Preview 4? Actually, I get an error while running:
python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless
There is an incompatibility between the torch requirements of ASE and the latest Isaac Gym version (1.0 Preview 4). I didn't find any way to download the earlier versions of Isaac Gym; the links are unavailable.
Thanks for any advice.
Hi Jason,
In order to get more contact information, I want to try running ASE on the CPU. However, the results on CPU and GPU are somewhat different. During the CPU simulation the friction between the robot's feet and the ground is smaller, there is a slipping behavior, and when I print self._contact_forces in humanoid.py, the values are very wrong, basically 0. Do you have any idea about this?
command:
python ase/run.py --test --task HumanoidStrike --num_envs 16 --cfg_env ase/data/cfg/humanoid_sword_shield_strike.yaml --cfg_train ase/data/cfg/train/rlg/hrl_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/RL_Avatar_Idle_Ready_Motion.npy --llc_checkpoint ase/data/models/ase_llc_reallusion_sword_shield.pth --checkpoint ase/data/models/ase_hlc_strike_reallusion_sword_shield.pth --sim_device cpu --rl_device cpu
Besides, I added config['device_name'] = 'cpu' in the __init__ function in ase/learning/common_player.py to make sure the policy also uses the CPU.
Thanks in advance!
Hi,
There is no RL_Avatar_Crouch_Idle_Motion.npy in data/motions. How do I specify motion files for each different task's training?
Also, I found weights in dataset_reallusion_sword_shield.yaml; how are the weights determined?
Thanks
I ran into CUDA out-of-memory errors using a GTX 1050 (4 GB) running the command:
python ase/run.py --task HumanoidAMP --cfg_env ase/data/cfg/humanoid_sword_shield.yaml --cfg_train ase/data/cfg/train/rlg/amp_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/RL_Avatar_Atk_2xCombo01_Motion.npy --headless
To avoid the out-of-memory errors I lowered:
Is there a formula to calculate the total memory used based on these variables and possibly some others?
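I don't know of an official formula, but here is a rough, unofficial back-of-envelope sketch of the rollout-buffer memory, which scales with num_envs * horizon_length; all numbers below are illustrative examples, not the exact config values:

num_envs = 4096      # numEnvs in the env cfg
horizon = 32         # horizon_length in the train cfg
obs_dim = 253        # policy observation size
amp_obs_dim = 1400   # multi-step AMP observation size
act_dim = 31         # action size
floats = num_envs * horizon * (obs_dim + amp_obs_dim + 2 * act_dim)
print(f"~{4 * floats / 2**30:.2f} GiB for the main rollout buffers")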
Great work! It is amazing. May I ask what the data structure of a motion clip file is? If I have skeleton coordinates for each frame, how should I create a motion clip file for training? Thanks
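(A hedged way to inspect one, assuming the .npy files are serialized poselib SkeletonMotion objects:)

from poselib.skeleton.skeleton3d import SkeletonMotion

motion = SkeletonMotion.from_file(
    "ase/data/motions/reallusion_sword_shield/RL_Avatar_Atk_2xCombo01_Motion.npy")
print(motion.skeleton_tree.node_names)  # joint names
print(motion.local_rotation.shape)      # (num_frames, num_joints, 4) quaternions
print(motion.root_translation.shape)    # (num_frames, 3)
print(motion.fps)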
This is the output when I run the command:
python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train \
ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file \
ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless
...
Loading 87/87 motion files: ase/data/motions/reallusion_sword_shield/RL_Avatar_Sword_ParryUp_Motion.npy
Error for key= global_root_yaw_rotation
Error for key= global_translation_xy
Error for key= global_translation_xz
Error for key= local_rotation_to_root
Error for key= local_translation_to_root
Error for key= root_translation_xy
Total added 17
Loaded 87 motions with a total length of 380.278s.
Segmentation fault (core dumped)
In the GUI I can see some green characters falling down, and after that this error occurs unexpectedly. What should I do about it? Thanks a lot!
The training was interrupted because it took too long. How do I resume an interrupted training run from a checkpoint path?
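(If it helps: per the README, the --checkpoint argument loads a saved model, so resuming should look something like the following, with a hypothetical output path:)
python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless --checkpoint output/<your_run>/nn/Humanoid.pth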
Loading 1/87 motion files: ase/data/motions/reallusion_sword_shield/RL_Avatar_Atk_2xCombo01_Motion.npy
Error for key= global_root_yaw_rotation
Error for key= global_translation_xy
Error for key= global_translation_xz
Error for key= local_translation_to_root
Error for key= root_translation_xy
Total added 17
What does this error mean? What impact will it have?
current_action = torch.squeeze(current_action.detach())
around line 91 in hrl_players.py causes a crash if you are testing with only one environment (presumably because torch.squeeze also drops the environment dimension when num_envs == 1). Changing it to:
current_action = current_action.detach()
fixes the crash.
Pre-trained low-level policy checkpoints are missing from ase/data/models.
When I load the motion_file, the values of global_root_yaw_rotation, global_translation_xy, global_translation_xz, local_rotation_to_root, local_translation_to_root, and root_translation_xy can't be read with getattr(obj, k). Are they unnecessary for training and running?
After training the heading task with a custom low-level policy, the character seems to move, but not like it did in the paper.
I trained my low-level policy with every possible locomotion direction.
The locomotion task showed good results, so I don't think re-training or adding data would be the solution.
What could be the problem?
I notice there is an AMP character (a robot character) in the project. How can I create the character from the paper for the project? Is an MJCF file OK?
Hi Jason, thanks for your great work! It's very important for humanoid robot simulation and development!
I want to model and simulate a humanoid robot with all revolute joints, so I need to translate the ASE/AMP character model's spherical joints into revolute joints. Could you give me some advice or an example?
The plot_skeleton_motion_interactive function always returns blank windows on the basic examples like retarget_motion or mjcf_importer. I have matplotlib installed.
I am trying to retarget some motions, but I can't see whether the skeletons look right.
Hello again.
Right now I am implementing my own motion data for low-level policy training, and I found that each mocap clip needs a different weight.
Are there any criteria or a formula for the weights in the dataset yaml file? I checked the paper, and there was no info on this. :(
Also, I tried one of your dataset_reallusion.yaml files with amp_humanoid.yaml, and it resulted in standing still rather than locomotion. It did work well when training with a single motion file, though. I wonder what's wrong with it.
Another random question: my GPU runs out of memory when training. Instead of tackling the batch-related parameters, I modified the AMP observation size in each task-related yaml file. Does this severely affect training?
The AMP agent (NN) receives the following observations:
root_h_obs, root_rot_obs, local_root_vel, local_root_ang_vel, dof_obs, dof_vel, flat_local_key_pos
On a real physical robot, root_h_obs, local_root_vel, local_root_ang_vel, and flat_local_key_pos would probably not be obtainable. More likely, root rotation, DOF positions, DOF velocities, and foot pressure would be the only ones available. From what I can tell, AMP compares these observations to what it sees from the motion capture. Is there an easy way to send a reduced set of observations to the agent that better matches what is observable in reality (see the sketch below)? Will the algorithm still work?
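A hedged sketch of what I mean by a reduced set, with made-up slice offsets (the real layout is defined in compute_humanoid_observations in humanoid.py):

import torch

def reduce_obs(obs, keep_slices):
    # keep only the observation components measurable on hardware
    return torch.cat([obs[..., s] for s in keep_slices], dim=-1)

# e.g. root rotation, dof positions, dof velocities (offsets are made up)
keep = [slice(1, 7), slice(13, 85), slice(85, 116)]
reduced = reduce_obs(torch.zeros(16, 253), keep)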
I tried to use the following command to train an AMP model to complete the task, similar to the Location task:
python3 ase/run.py --task HumanoidStrike --cfg_env ase/data/cfg/humanoid_sword_shield_strike.yaml --cfg_train ase/data/cfg/train/rlg/amp_humanoid_task.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_strike_amp.yaml
However, the result is worse than the one produced by ASE. For example, the agent trembles when walking. Do I need to adjust some parameters?
What is the meaning of _dof_body_ids, _dof_offsets, _dof_obs_size, _num_actions, and _num_obs?
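(My hedged reading of these fields from humanoid.py, with illustrative values rather than the repo's actual numbers:)

# _dof_body_ids: indices of the rigid bodies that carry actuated joints
_dof_body_ids = [1, 2, 3]
# _dof_offsets: start index of each joint's DoFs in the flat dof arrays;
# a spherical joint contributes 3 entries, a hinge joint contributes 1
_dof_offsets = [0, 3, 6, 7]
# _dof_obs_size: rotation observations, 6 numbers (tangent-normal) per joint
_dof_obs_size = 6 * len(_dof_body_ids)
# _num_actions: one action per DoF, i.e. the total DoF count
_num_actions = _dof_offsets[-1]
# _num_obs: total policy observation size (root height + root rotation +
# root velocities + dof obs + dof velocities + key-body positions)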
Hi @xbpeng, I have successfully converted my own BVH file to your DeepMimic data format, but when I try to convert my BVH to the npy file for ASE, the visualized result is abnormal; it seems the rotation values are not right.
Are there any differences in the 4D rotation of each joint between DeepMimic's and ASE's data formats? Or are there other details that I overlooked?
Hope to receive your response, thanks~
I see your requirements.txt contains ray. I searched the code but did not find where Ray is used. May I know why?
What is the trick to installing this on Ubuntu 20.04?
I have followed all the instructions and even compiled SIP from source (4.19.3), but I still can't get this working. I want to use the FBX importer to bring in some animations for ASE.
After installing the SDK and the Python bindings:
python PythonBindings.py Python3_x86 buildsip
......
In file included from /usr/lib/gcc/x86_64-linux-gnu/9/include/limits.h:194,
from /usr/lib/gcc/x86_64-linux-gnu/9/include/syslimits.h:7,
from /usr/lib/gcc/x86_64-linux-gnu/9/include/limits.h:34,
from /home/bizon/anaconda3/envs/rlase/include/python3.8/Python.h:11,
from ./sip.h:32,
from sipAPIfbx.h:41,
from sipfbxFbxBindingTable.cpp:38:
/usr/include/limits.h:26:10: fatal error: bits/libc-header-start.h: No such file or directory
26 | #include <bits/libc-header-start.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Any ideas? Has anyone else gotten FBX installed and working?
The character in https://youtu.be/hmV4v_EnB0E looks very cool!
I want to know what platform was used for the visualization in the video: UE, Unity, or Isaac Gym?
python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless
There is no file called ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml?
Hi, I have been playing around with your code.
I've noticed that there is an unused yaml for training, amp_humanoid_task.yaml, and I assume this file was maybe for AMP task training, I guess?
Is it possible to train a task from this file instead of invoking ase_humanoid_hrl.yaml inside hrl_humanoid.yaml? Or should I just call amp_humanoid_task.yaml for task training?
For some reason I am getting the following error:
humanoid_amp.py", line 294, in _compute_amp_observations
self._curr_amp_obs_buf[env_ids] = build_amp_observations(self._rigid_body_pos[env_ids][:, 0, :],
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "humanoid.py", line 569, in build_amp_observations
assert(False), "Unsupported joint type"
joint_dof_obs = torch_utils.quat_to_tan_norm(joint_pose_q)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
dof_obs[:, (j * joint_obs_size):((j + 1) * joint_obs_size)] = joint_dof_obs
RuntimeError: CUDA error: unspecified launch failure
Thank you for the paper and code.
I tried multi-GPU training using Horovod for faster training, but the results turn out to be much worse than single-GPU results. Would you release a version with multi-GPU training support?
Hi, I would like to ask about the difference between states, observations, and amp_observations.
My understanding is that the state space is not defined for the humanoid task and just the observation space is used. However, there is amp_obs in the extras, and I don't know where it is used. It is confusing because the ASE paper says "Combined, these features result in a 120D state space". Is it a feature (i.e., an observation) or a state? When I print the observation space of HumanoidAMPGetup, it reports 253D for the observation and 1400D for the amp_observation.
Hi,
I have a short question about the angle representation, as I must be missing something obvious.
In the amp_humanoid_sword_shield.xml file, the 3-DoF joints are defined with three hinge joints. However, the motion lib converts the joint angles to rotation vectors / the exponential map:
Lines 336 to 339 in de18a56
Why are these not Euler angles? (Does this mean the trained LLC outputs (scaled) rotation vectors as actions for the 3-DoF joints? See the sketch below.)
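(For reference, a hedged sketch of a quaternion-to-exponential-map conversion as I understand it; the function name is mine, and an (x, y, z, w) quaternion order is assumed:)

import torch

def quat_to_exp_map(q, eps=1e-6):
    # exponential map = rotation axis scaled by rotation angle (one 3D vector),
    # unlike Euler angles, which are three successive rotations
    angle = 2.0 * torch.acos(torch.clamp(q[..., 3:4], -1.0, 1.0))
    axis = q[..., 0:3]
    axis = axis / torch.clamp(torch.norm(axis, dim=-1, keepdim=True), min=eps)
    return angle * axis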
Thanks!
Would it be possible to release training logs (e.g., TensorBoard) for the pre-trained low-level controller ase/data/models/ase_llc_reallusion_sword_shield.pth?
This would help us understand things like how long a training run takes, the typical learning curve, GPU utilization, etc.
I would like to create a task where one humanoid tries to strike another. I imagine a simple reward, such as getting points when it strikes the other and losing points when it gets hit. I am interested in what kinds of emergent behaviors might appear as it tries to learn to both attack and defend.
From the code, it looks like humanoid.py handles the tensors assuming only one humanoid actor is present. It seems able to handle additional actors as long as they don't have DOFs, etc. How would you put two humanoids, both controlled by the NN, in one environment?
Another possibility is to allow the environments to interact with each other. For example, the actors from environments 1 and 2 could try to strike each other, and then 3 and 4 could do the same. One could continue that pattern for all actors as long as there is an even number of environments.
Is either of these feasible? If so, what would be the easiest approach to take?
When I tried to run a pre-trained ASE low-level controller using the command python ase/run.py --test --task HumanoidAMPGetup --num_envs 16 --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --checkpoint ase/data/models/ase_llc_reallusion_sword_shield.pth, I got the message "Cylinder is not natively supported, tessellated mesh will be used."
ase.learning.common_agent.CommonAgent inherits from rl_games.common.a2c_common.A2CBase, which stores all tensors on self.ppo_device. self.ppo_device is set by reading the device key from config; if there is no device key, it defaults to cuda:0 (see here). In run.py, config is supplied by cfg_train["params"]["config"]. You can print cfg_train["params"]["config"].keys() and confirm there is no device key.
To check this issue, simply run the original pre-training command with the --rl_device argument set to another CUDA device, such as cuda:1, and observe that it still consumes cuda:0 memory:
python ase/run.py --task HumanoidAMPGetup --cfg_env ase/data/cfg/humanoid_ase_sword_shield_getup.yaml --cfg_train ase/data/cfg/train/rlg/ase_humanoid.yaml --motion_file ase/data/motions/reallusion_sword_shield/dataset_reallusion_sword_shield.yaml --headless --rl_device cuda:1
To fix this, simply add cfg_train["params"]["config"]["device"] = args.rl_device in the function load_cfg().
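(A hedged sketch of where that one-liner could live; the body of load_cfg here is illustrative, not the repo's actual code:)

import yaml

def load_cfg(args):
    with open(args.cfg_train, "r") as f:
        cfg_train = yaml.safe_load(f)
    # route rl_games tensors to the requested device instead of the cuda:0 default
    cfg_train["params"]["config"]["device"] = args.rl_device
    return cfg_train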