
slahmr's Introduction

Decoupling Human and Camera Motion from Videos in the Wild

Official PyTorch implementation of the paper Decoupling Human and Camera Motion from Videos in the Wild

Project page | ArXiv

News

  • [2023/07] We updated the code to support tracking from 4D Humans! The original code remains in the release branch.
  • [2023/02] Original release!

Getting started

This code was tested on Ubuntu 22.04 LTS and requires a CUDA-capable GPU.

  1. Clone repository and submodules

    git clone --recursive https://github.com/vye16/slahmr.git
    

    or initialize submodules if already cloned

    git submodule update --init --recursive
    
  2. Set up conda environment. Run

    source install_conda.sh
    

    Alternatively, you can create a virtualenv environment:

    source install_pip.sh
    
    We also include the following steps for troubleshooting.
    • Create environment

      conda env create -f env.yaml
      conda activate slahmr
      

      We use PyTorch 1.13.0 with CUDA 11.7. Please modify according to your setup; we've also tested successfully with PyTorch 1.11. We've included env_build.yaml to speed up installation using already-solved dependencies, though it might not be compatible with your CUDA driver. A quick sanity check of the installed PyTorch/CUDA setup is included after this list.

    • Install PHALP

      pip install phalp[all]@git+https://github.com/brjathu/PHALP.git
      
    • Install current source repo

      pip install -e .
      
    • Install ViTPose

      pip install -v -e third-party/ViTPose
      
    • Install DROID-SLAM (will take a while)

      cd third-party/DROID-SLAM
      python setup.py install
      
  3. Download models from here. Run

    ./download_models.sh
    

    or

    gdown https://drive.google.com/uc?id=1GXAd-45GzGYNENKgQxFQ4PHrBp8wDRlW
    unzip -q slahmr_dependencies.zip
    rm slahmr_dependencies.zip
    

    All models and checkpoints should have been unpacked in _DATA.
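
Once the environment is set up, a quick sanity check (a minimal sketch, not part of the repository) is to confirm that PyTorch sees the GPU and reports the expected CUDA build:

    # Quick environment sanity check (not part of the slahmr codebase).
    import torch

    print("PyTorch:", torch.__version__)        # expect ~1.13.0 (1.11 also tested)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA build:", torch.version.cuda)    # expect ~11.7
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))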

Fitting to an RGB video:

For a custom video, you can edit the config file: slahmr/confs/data/video.yaml. Then, from the slahmr directory, you can run:

python run_opt.py data=video run_opt=True run_vis=True

We use hydra to launch experiments, and all parameters can be found in slahmr/confs/config.yaml. If you would like to update any aspect of logging or optimization tuning, update the relevant config files.

By default, we will log each run to outputs/video-val/<DATE>/<VIDEO_NAME>. Each stage of optimization will produce a separate subdirectory, each of which will contain outputs saved throughout the optimization and rendered videos of the final result for that stage of optimization. The motion_chunks directory contains the outputs of the final stage of optimization, root_fit and smooth_fit contain outputs of short, intermediate stages of optimization, and init contains the initialized outputs before optimization.
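
Concretely, a single run directory looks roughly like this (subdirectory names are taken from the description above; the exact files inside each stage vary):

    outputs/video-val/<DATE>/<VIDEO_NAME>/
        init/            # initialized outputs before optimization
        root_fit/        # short intermediate stage
        smooth_fit/      # short intermediate stage
        motion_chunks/   # final stage: saved outputs and rendered videos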

We've provided a run_vis.py script for running visualization from logs after optimization. From the slahmr directory, run

python run_vis.py --log_root <LOG_ROOT>

and it will visualize all log subdirectories in <LOG_ROOT>. Each output npz file will contain the SMPL parameters for all optimized people, the camera intrinsics and extrinsics. The motion_chunks output will contain additional predictions from the motion prior. Please see run_vis.py for how to extract the people meshes from the output parameters.
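
For orientation only, below is a hedged sketch of what that extraction can look like: it loads a world_results.npz using the key names listed in the "Output Explanation" issue further down, and poses a SMPL+H body model through the third-party smplx package as a stand-in for the repository's own body-model wrapper. The file name, model directory, and choice of smplx are assumptions; run_vis.py remains the authoritative reference.

    # Hedged sketch (not repository code): pose a SMPL+H body model from saved parameters.
    # Key names follow the *_world_results.npz listing in the issues below; the model
    # directory and the use of the smplx package are assumptions for illustration.
    import numpy as np
    import torch
    import smplx

    res = np.load("xxx_world_results.npz")                  # hypothetical file name
    P, T = res["trans"].shape[:2]                            # P people, T frames

    body = smplx.create(
        "_DATA/body_models",                                 # hypothetical model directory
        model_type="smplh",
        num_betas=res["betas"].shape[-1],
        batch_size=T,
        use_pca=False,
    )
    p = 0  # person index
    out = body(
        betas=torch.as_tensor(res["betas"][p:p+1], dtype=torch.float32).expand(T, -1),
        global_orient=torch.as_tensor(res["root_orient"][p], dtype=torch.float32),
        body_pose=torch.as_tensor(res["pose_body"][p], dtype=torch.float32),
        left_hand_pose=torch.as_tensor(res["pose_hand"][p, :, :45], dtype=torch.float32),
        right_hand_pose=torch.as_tensor(res["pose_hand"][p, :, 45:], dtype=torch.float32),
        transl=torch.as_tensor(res["trans"][p], dtype=torch.float32),
    )
    verts = out.vertices.detach().cpu().numpy()              # (T, V, 3) world-frame vertices
    faces = body.faces                                        # (F, 3) triangle indices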

Fitting to specific datasets:

We provide configurations for dataset formats in slahmr/confs/data:

  1. Posetrack in slahmr/confs/data/posetrack.yaml
  2. Egobody in slahmr/confs/data/egobody.yaml
  3. 3DPW in slahmr/confs/data/3dpw.yaml
  4. Custom video in slahmr/confs/data/video.yaml

Please make sure to update all paths to data in the config files.

We include tools both to process the existing datasets we evaluated on in the paper and to process custom data and videos. We include experiments from the paper on the Egobody, Posetrack, and 3DPW datasets.

If you want to run on a large number of videos, or if you want to select specific people tracks for optimization, we recommend preprocessing in advance. For a single downloaded video, there is no need to run preprocessing in advance.

From the slahmr/preproc directory, run PHALP on all your sequences

python launch_phalp.py --type <DATASET_TYPE> --root <DATASET_ROOT> --split <DATASET_SPLIT> --gpus <GPUS>

and run DROID-SLAM on all your sequences

python launch_slam.py --type <DATASET_TYPE> --root <DATASET_ROOT> --split <DATASET_SPLIT> --gpus <GPUS>

You can also update the paths to datasets in slahmr/preproc/datasets.py for repeated use.

Then, from the slahmr directory,

python run_opt.py data=<DATA_CFG> run_opt=True run_vis=True

We've provided a helper script launch.py for launching many optimization jobs in parallel. You can specify job-specific arguments with a job spec file, such as the example files in job_specs, and batch-specific arguments shared across all jobs, for example:

python launch.py --gpus 1 2 -f job_specs/pt_val_shots.txt -s data=posetrack exp_name=posetrack_val

Evaluation on 3D datasets

After launching and completing optimization on either the Egobody or 3DPW datasets, you can evaluate the outputs with scripts in the eval directory. Before running, please update EGOBODY_ROOT and TDPW_ROOT in eval/tools.py. Then, run

python run_eval.py -d <DSET_TYPE> -i <RES_ROOT> -f <JOB_FILE>

where <JOB_FILE> is the same job file used to launch all optimization runs.

BibTeX

If you use our code in your research, please cite the following paper:

@inproceedings{ye2023slahmr,
    title={Decoupling Human and Camera Motion from Videos in the Wild},
    author={Ye, Vickie and Pavlakos, Georgios and Malik, Jitendra and Kanazawa, Angjoo},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2023}
}

slahmr's People

Contributors

brentyi, geopavlakos, songweige, vye16


slahmr's Issues

evaluate on 3DPW but got incorrect camera visualization

I ran fitting on the 3DPW dataset with the command line python run_opt.py data=3dpw run_opt=True run_vis=True, but the visualization of the final result has an incorrect camera, and the optimized joints are not aligned with the image.

downtown_arguing_00_input_grid.mp4

Questions about 3DPW eval

Thank you for your great work.
The question is: when I optimized the 3DPW dataset with the example command of launch.py, which person tracks were used, the PHALP predictions or the ground truth? I ask because I saw "3DPW* uses the ground truth person tracks" in Table 2.
If the program uses the predicted values, what should I do to make it use the ground-truth values?

Couldn't install all dependencies

Hi,

I am very interested in your project and I would like to try it out on my own data. However, I am having trouble installing the environment on my system. I first tried conda env create -f env.yml as suggested in the README, but it took forever to solve the environment, so I started installing the dependencies one by one. However, it seems like some of the packages conflict with each other.

I was wondering if you could provide a docker image of the project that contains everything needed to run it. This would make it much easier for me and other users to use your code without worrying about installation issues.

I appreciate your work and I hope you can consider my request. Thank you very much.

My system information:
Operating system: Ubuntu 22.04
Python version: 3.9
Conda version: 23.1.0
Pip version: 23.0.1

FileNotFound error model_config.yaml

Hello, I have been trying to install slahmr. I went through the installation instructions on GitHub, and I think I got everything. However, I am getting this error when running python run_opt.py data=video data.seq=022691_mpii_test data.root={my_root_dir} run_opt=False run_vis=False

FileNotFoundError: [Errno 2] No such file or directory: '{root}/.cache/4DHumans/logs/train/multiruns/hmr2/0/model_config.yaml'

and

FileNotFoundError: [Errno 2] No such file or directory: '{root}/slahmr/demo/slahmr/phalp_out/022691_mpii_test/results/demo_022691_mpii_test.pkl' -> '{root}/slahmr/demo/slahmr/phalp_out/022691_mpii_test/022691_mpii_test.pkl'

Does anyone have any experience with this? Any help would be appreciated. Thanks!

colab error in the last step

the cells works fine till the last cell i get this error

the code line

def show_local_mp4_video(file_name, width=640, height=480):
    import io
    import base64
    from IPython.display import HTML
    video_encoded = base64.b64encode(io.open(file_name, 'rb').read())
    return HTML(data='''<video width="{0}" height="{1}" controls>
                          <source src="data:video/mp4;base64,{2}" type="video/mp4" />
                        </video>'''.format(width, height, video_encoded.decode('ascii')))

import glob
mp4s = glob.glob('/content/slahmr/outputs/logs/video-val/*/*/*_motion_chunks_grid.mp4')
show_local_mp4_video(mp4s[0], width=960, height=720)

here is the error that shows when running

IndexError Traceback (most recent call last)
in <cell line: 12>()
10 import glob
11 mp4s = glob.glob('/content/slahmr/outputs/logs/video-val/*/*/*_motion_chunks_grid.mp4')
---> 12 show_local_mp4_video(mp4s[0], width=640, height=480)

IndexError: list index out of range

NaN Values when optimizing over larger batch sizes or odd pose videos

Hi!

I've been experimenting with ways to make the motion chunk optimization step faster when running inference on a single video (it takes 1-2+ hours on an A10 GPU, whereas without the motion chunk step it takes 10 minutes).
I tried experimenting with a batch size of 10. The results are still to be determined, but I've noticed that I sometimes get NaN values in the forward pass:

File "/home/ubuntu/slahmr/slahmr/optim/losses.py", line 258, in forward
    cur_loss = self.init_motion_prior_loss(
  File "/home/ubuntu/mambaforge/envs/slahmr/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/slahmr/slahmr/optim/losses.py", line 550, in forward
    loss = -self.init_motion_prior["gmm"].log_prob(init_state)
  File "/home/ubuntu/mambaforge/envs/slahmr/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py", line 150, in log_prob
    self._validate_sample(x)
  File "/home/ubuntu/mambaforge/envs/slahmr/lib/python3.10/site-packages/torch/distributions/distribution.py", line 294, in _validate_sample
    raise ValueError(
ValueError: Expected value argument (Tensor of shape (12, 138)) to be within the support (IndependentConstraint(Real(), 1)) of the distribution MixtureSameFamily(
  Categorical(probs: torch.Size([12]), logits: torch.Size([12])),
  MultivariateNormal(loc: torch.Size([12, 138]), covariance_matrix: torch.Size([12, 138, 138]))), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<CatBackward0>)

This happens when I either increase the batch size or use a video where someone starts in an unnatural pose (e.g. hanging upside down). Is this because the optimization is failing to converge? Is there a way to remedy this? If increasing the batch size is not possible, what are the recommended ways to make this run faster on a single video? The motion chunk optimization step takes a huge amount of time on a good GPU for a single video.

Note: without the motion chunk optimization step, all videos run fine so this issue only arises at that final step.

Thanks a lot!
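
A minimal way to localize where the NaNs first appear (a debugging sketch, not part of the repository; the placement mirrors the traceback above) is to validate the tensor before it reaches the GMM prior:

    # Hedged debugging sketch: catch non-finite values before the prior term fails.
    import torch

    def check_finite(name, x):
        # Fail early with a readable message instead of inside the distribution call.
        if not torch.isfinite(x).all():
            bad = int((~torch.isfinite(x)).sum())
            raise RuntimeError(
                f"{name} has {bad} non-finite values (shape {tuple(x.shape)}); "
                "consider a smaller batch/chunk size or a lower learning rate for this stage."
            )
        return x

    # Hypothetical placement, mirroring the traceback above:
    # init_state = check_finite("init_state", init_state)
    # loss = -self.init_motion_prior["gmm"].log_prob(init_state)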

Dual steps in Colab notebook

Hi. I was playing around with the Colab notebook and was confused by these two commands (preprocessing and optimizing):

python run_opt.py data=video data.seq=test data.root=/content/slahmr/demo run_opt=False run_vis=False
python run_opt.py data=video data.seq=test data.root=/content/slahmr/demo run_opt=True run_vis=False

Are both necessary to run for every video?

Segmentation fault while running run_vis.py

Setting ground pose to be tensor([[ 1.0000e+00, 7.3667e-04, -1.2146e-04, -1.4130e-01],
[-7.3667e-04, 9.4707e-01, -3.2104e-01, -9.4225e-01],
[-1.2146e-04, 3.2104e-01, 9.4707e-01, -4.9819e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]])
ADDING CAMERA MARKERS
RENDERING VIEWS dict_keys(['src_cam', 'front', 'above', 'side'])
src_cam has 130 poses
Adding camera sequence length 130
ANIMATION LENGTH 130
Fatal Python error: Segmentation fault

Current thread 0x00007f2098654100 (most recent call first):
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/OpenGL/platform/baseplatform.py", line 409 in call
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/pyrender/texture.py", line 254 in _bind_as_depth_attachment
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/pyrender/renderer.py", line 1027 in _configure_shadow_mapping_viewport
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/pyrender/renderer.py", line 412 in _shadow_mapping_pass
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/pyrender/renderer.py", line 141 in render
File "/root/ENTER/envs/slahmr/lib/python3.10/site-packages/pyrender/offscreen.py", line 102 in render
File "/data/changjian/slahmr/slahmr/vis/viewer.py", line 292 in render
File "/data/changjian/slahmr/slahmr/vis/viewer.py", line 394 in render_frames
File "/data/changjian/slahmr/slahmr/vis/viewer.py", line 370 in animate
File "/data/changjian/slahmr/slahmr/vis/output.py", line 146 in animate_scene
File "/data/changjian/slahmr/slahmr/run_vis.py", line 194 in render_results
File "/data/changjian/slahmr/slahmr/run_vis.py", line 95 in run_vis
File "/data/changjian/slahmr/slahmr/run_vis.py", line 217 in visualize_log
File "/data/changjian/slahmr/slahmr/run_vis.py", line 232 in launch_vis
File "/data/changjian/slahmr/slahmr/run_vis.py", line 271 in main
File "/data/changjian/slahmr/slahmr/run_vis.py", line 297 in

CUDA out of memory

Hi, I get a CUDA out-of-memory error when running your model. Specifically, it happens when running PHALP_plus with the following command on a GTX 1080 Ti GPU with 11 GB of memory:

cd slahmr/third-party/PHALP_plus; CUDA_VISIBLE_DEVICES=0 python run_phalp.py --base_path slahmr/videos/demo/images/ --video_seq 022691_mpii_test --sample '' --storage_folder slahmr/videos/demo/slahmr/phalp_out --track_dataset posetrack-val --predict TPL --distance_type EQ_010 --encode_type 4c --detect_shots True --track_history 7 --past_lookback 1 --max_age_track 50 --n_init 5 --low_th_c 0.8 --alpha 0.1 --hungarian_th 100 --render_type HUMAN_FULL_FAST --render True --store_mask True --res 256 --render_up_scale 2 --verbose False --overwrite False --use_gt False --batch_id -1 --detection_type mask --start_frame -1

I use the sample video 022691_mpii_test.mp4. The problem is with the detector, the model name seems to be GeneralizedRCNN. Is it normal to require high memory for running PHALP?

Best,

What is the use of cameras.json?

Dear authors,

Thank you for your great work. I am confused about this cameras.json produced while running run_opt.py.

save_camera_json(f"cameras.json", cam_R, cam_t, intrins)

  1. What is the use of this cameras.json? It differs from all the other camera parameters stored in /smooth_fit, /root_fit, or /motion_chunk (e.g. XXX_cameras_000000.json), and it doesn't seem to be used anywhere.
  2. Also, the camera parameters in XXX_cameras_000060.json stored in /smooth_fit/ seem to be different from the camera parameters stored in XXX_000060_world_results.npz. Could you tell me why the parameters in the JSON file need a transformation like the one below (it also isn't used anywhere)?
    (screenshot of the transformation omitted)

How can I speed up the inference when testing on large-scale videos?

Hi, thanks for your excellent work! I have a question regarding speeding up inference when testing on large-scale videos. In the README, you recommend preprocessing in advance. However, the tracking is still run with batch_size=1, which is quite slow. Is it possible to use a larger batch_size to speed up the tracking? Also, could you please tell me the advantage of preprocessing in advance?

Unable to run on Cloud GPU

Hi. I try to run the preprocessing command (on a random test video I have) on an A10 cloud GPU but I keep getting this (I verified that all works well on Colab already):

Traceback (most recent call last):
  File "/home/ubuntu/slahmr/slahmr/preproc/track.py", line 101, in main
    phalp_tracker = PHALP_Prime_HMR2(cfg)
  File "/home/ubuntu/slahmr/slahmr/preproc/track.py", line 53, in __init__
    super().__init__(cfg)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/phalp/trackers/PHALP.py", line 52, in __init__
    self.setup_hmr()
  File "/home/ubuntu/slahmr/slahmr/preproc/track.py", line 58, in setup_hmr
    self.HMAR = HMR2Predictor(self.cfg)
  File "/home/ubuntu/slahmr/slahmr/preproc/track.py", line 30, in __init__
    model, _ = load_hmr2()
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/hmr2/models/__init__.py", line 36, in load_hmr2
    model = HMR2.load_from_checkpoint(checkpoint_path, strict=False, cfg=model_cfg)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1520, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 90, in _load_from_checkpoint
    model = _load_state(cls, checkpoint, strict=strict, **kwargs)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pytorch_lightning/core/saving.py", line 143, in _load_state
    obj = cls(**_cls_kwargs)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/hmr2/models/hmr2.py", line 59, in __init__
    self.mesh_renderer = MeshRenderer(self.cfg, faces=self.smpl.faces)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/hmr2/utils/mesh_renderer.py", line 49, in __init__
    self.renderer = pyrender.OffscreenRenderer(viewport_width=self.img_res,
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pyrender/offscreen.py", line 137, in _create
    egl_device = egl.get_device_by_index(device_id)
  File "/home/ubuntu/anaconda3/envs/slahmr/lib/python3.10/site-packages/pyrender/platforms/egl.py", line 83, in get_device_by_index
    raise ValueError('Invalid device ID ({})'.format(device_id, len(devices)))
ValueError: Invalid device ID (0)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Error executing job with overrides: ['data=video', 'data.seq=test', 'data.root=/home/ubuntu/slahmr/demo', 'run_opt=False', 'run_vis=False']
Traceback (most recent call last):
  File "/home/ubuntu/slahmr/slahmr/run_opt.py", line 175, in main
    dataset = get_dataset_from_cfg(cfg)
  File "/home/ubuntu/slahmr/slahmr/data/dataset.py", line 41, in get_dataset_from_cfg
    check_data_sources(args)
  File "/home/ubuntu/slahmr/slahmr/data/dataset.py", line 70, in check_data_sources
    preprocess_tracks(args.sources.images, args.sources.tracks, args.sources.shots)
  File "/home/ubuntu/slahmr/slahmr/data/vidproc.py", line 40, in preprocess_tracks
    phalp.process_seq(
  File "/home/ubuntu/slahmr/slahmr/preproc/launch_phalp.py", line 54, in process_seq
    os.rename(f"{res_dir}/results/demo_{seq}.pkl", res_path)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/slahmr/demo/slahmr/phalp_out/results/demo_test.pkl' -> '/home/ubuntu/slahmr/demo/slahmr/phalp_out/results/test.pkl'

I'm confused on how to solve this. I verified that my GPU works already.

Output Explanation

Hello!

I was wondering if someone could provide me with some details about the outputs from slahmr. I'm currently getting the outputs from the world_results.npz files, and am printing out all the outputs as well as their shapes. Here is what I'm getting.

world_scale: (1, 1)
joints_vel: (1, 1, 22, 3)
trans_vel: (1, 1, 3)
hand_pose: (1, 146, 90)
latent_motion: (1, 145, 48)
floor_plane: (1, 3)
floor_idcs: (1,)
trans: (1, 146, 3)
root_orient: (1, 146, 3)
betas: (1, 16)
latent_pose: (1, 1, 32)
root_orient_vel: (1, 1, 3)
pose_body: (1, 146, 63)
cam_R: (1, 146, 3, 3)
cam_t: (1, 146, 3)
intrins: (4,)
pose_hand: (1, 146, 90)
track_mask: (1, 146)

I'm assuming 146 represents the number of frames in the video, but correct me if I'm wrong. I have a lot of assumptions, based on reading the code and the paper, about what these mean, but I just wanted to clarify. I am mostly curious about which outputs are positions versus orientations, and in what format.

I'm especially interested in the outputs hand_pose, trans, trans_vel, pose_body, root_orient, root_orient_vel, joints_vel, and latent_motion. Thanks a lot for the help!
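
Not an authoritative answer, but based on the names and shapes above, a hedged reading is: trans is the per-frame root translation in the world frame; root_orient, pose_body, and pose_hand are axis-angle rotations; betas is the per-person shape; cam_R, cam_t, and intrins are the per-frame camera extrinsics and intrinsics; track_mask marks the frames where each track is valid; and the *_vel and latent_* entries appear to be the motion-prior state. A small sketch of slicing out the valid world-frame root trajectory for one person (the interpretation in the comments is an assumption, not official documentation):

    # Hedged sketch: the interpretation of the arrays is inferred from names/shapes above.
    import numpy as np

    res = np.load("xxx_world_results.npz")     # hypothetical file name
    p = 0                                       # person index (leading dim assumed = people)

    valid = res["track_mask"][p].astype(bool)   # (T,) frames where this track exists
    root_xyz = res["trans"][p][valid]           # (T_valid, 3) world-frame root positions
    root_aa = res["root_orient"][p][valid]      # (T_valid, 3) root rotations, axis-angle
    print(f"{valid.sum()} valid frames; first root position: {root_xyz[0]}")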

Solution to omit frames where there are no people? It is causing a zero batch size issue

Hi, I am running SLAHMR on a video where for the first couple seconds there is no person in the video (person shows up later).

I saw the PHALP results and confirmed they work fine. However, run_opt.py fails with the error below:

saved 1 keyframes to /home/ubuntu/slahmr/slahmr/cameras/testVid1/shot-0
FOUND 9/1453 FRAMES FOR SHOT 0
USING TOTAL 9 888x1920 IMGS
TRACK LENGTHS [] [0]
TRACK IDS []
START 0 END 0
SAVED TRACK INFO
Error executing job with overrides: ['data=video', 'run_opt=True', 'run_vis=True']
Traceback (most recent call last):
  File "/home/ubuntu/slahmr/slahmr/run_opt.py", line 144, in main
    run_opt(cfg, dataset, out_dir, device)
  File "/home/ubuntu/slahmr/slahmr/run_opt.py", line 53, in run_opt
    loader = DataLoader(dataset, batch_size=B, shuffle=False)
  File "/home/ubuntu/mambaforge/envs/slahmr/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 350, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/home/ubuntu/mambaforge/envs/slahmr/lib/python3.10/site-packages/torch/utils/data/sampler.py", line 232, in __init__
    raise ValueError("batch_size should be a positive integer value, "
ValueError: batch_size should be a positive integer value, but got batch_size=0

Is there a way to automatically slice the parts of the video where no people are detected and fix this issue?

SMPL fittings in motion_chunks_final looks worse than input

Hi,
Nice work! I tested your method on a custom video. The 3D positions look good. However, the SMPL fittings in the final visualization video (motion_chunks_final_000260_src_cam.mp4), as shown in the first screenshot, somehow look worse than those in the input video (input_final_000000_src_cam.mp4), as shown in the second screenshot, especially the feet poses.

(screenshots omitted)

I also ran the optimization on the demo video you provided in the Colab notebook, and the SMPL fittings do look better in the motion_chunks_final video. Do you have any idea why the SMPL fittings are worse after optimization on my custom video?

Conda env create stalls and loops indefinitely

Current Behavior

Conda install hangs and loops indefinitely

Steps to Reproduce

It stalls out when executing the following command:
conda env create -f env.yaml

It runs correctly if you comment out conda-forge in env.yaml but it doesn't create the environment.

Expected Behavior

Conda installs all packages successfully and activates the environment without errors

Environment Information

conda info

     active environment : base
    active env location : /home/shawn/anaconda3
            shell level : 1
       user config file : /home/shawn/.condarc
 populated config files :
          conda version : 23.1.0
    conda-build version : 3.22.0
         python version : 3.9.13.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.35=0
                          __linux=5.19.0=0
                          __unix=0=0
       base environment : /home/shawn/anaconda3  (writable)
      conda av data dir : /home/shawn/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/shawn/anaconda3/pkgs
                          /home/shawn/.conda/pkgs
       envs directories : /home/shawn/anaconda3/envs
                          /home/shawn/.conda/envs
               platform : linux-64
             user-agent : conda/23.1.0 requests/2.28.1 CPython/3.9.13 Linux/5.19.0-32-generic ubuntu/22.04.1 glibc/2.35
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

Please let me know if you need any further information to help reproduce this problem.

Thanks!
Shawn

Camera Data Loading: Move first Camera to Origin

I have a question about the meaning of a particular code snippet in the CameraData.load_data function, lines 325-327:

t0 = -cam_t[sidx:sidx+1] + torch.randn(3) * 0.1
self.cam_R = cam_R[sidx:eidx]
self.cam_t = cam_t[sidx:eidx] - t0

My understanding is that cam_R and cam_t together form the world-2-cam matrix, so cam_t does not represent the position of the camera in world coordinates. Given this, I am confused what this code snippet really is meant to do and if this is not a bug in the implementation.
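
As a side note (not from the repository): under the world-to-camera convention described here, a world point X maps to R @ X + t, so the camera center in world coordinates is C = -R.T @ t rather than cam_t itself. A tiny illustration:

    # Illustration of the world-to-camera convention discussed above (not repo code).
    import numpy as np

    R = np.eye(3)                      # world-to-camera rotation (example values)
    t = np.array([0.0, 0.0, 5.0])      # world-to-camera translation

    C = -R.T @ t                       # camera center expressed in world coordinates
    assert np.allclose(R @ C + t, 0)   # the camera center maps to the camera origin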

Point cloud coordinates are all zeros

Hi, thanks a lot for releasing the code for SLAHMR and providing such detailed instructions.

I was just wondering why this line of code in the DROID-SLAM module outputs a point cloud with all-zero coordinates, which seems odd. The original DROID-SLAM seems to behave properly.

Thank you in advance.

Why is it super slow? The rendering time is too much

Hi

SLAHMR gives excellent results! It is probably one of the best, if not the best, mocap methods on GitHub, but the rendering time is extremely slow even with an RTX 4090.

There is WHAM, which is much faster:
https://github.com/yohanshin/WHAM
but SLAHMR gives better mocap results.

I just hope you can improve the rendering speed, because it's truly excessive: a 20-second video rendered with WHAM takes 1 minute, but with SLAHMR it takes at least 10 hours.

Thank you!

Why do I have no output?

(slahmr) iot@iot-Precision-7920-Tower:~/zjq/slahmr/slahmr$ python run_opt.py data=video data.seq=hyt data.root=/home/iot/zjq/slahmr run_opt=True run_vis=True
out_dir /home/iot/zjq/slahmr/outputs/logs/video-val/2023-12-10/hyt-all-shot-0-0-180
/home/iot/zjq/slahmr/slahmr/cameras/hyt/shot-0 does not exist
/home/iot/zjq/slahmr/slahmr/track_preds/hyt does not exist
/home/iot/zjq/slahmr/slahmr/shot_idcs/hyt.json does not exist
SOURCES {'images': '/home/iot/zjq/slahmr/images/hyt', 'cameras': '/home/iot/zjq/slahmr/slahmr/cameras/hyt/shot-0', 'tracks': '/home/iot/zjq/slahmr/slahmr/track_preds/hyt', 'shots': '/home/iot/zjq/slahmr/slahmr/shot_idcs/hyt.json'}
/home/iot/zjq/slahmr/slahmr/cameras/hyt/shot-0 does not exist
/home/iot/zjq/slahmr/slahmr/track_preds/hyt does not exist
/home/iot/zjq/slahmr/slahmr/shot_idcs/hyt.json does not exist
DATA SOURCES {'images': '/home/iot/zjq/slahmr/images/hyt', 'cameras': '/home/iot/zjq/slahmr/slahmr/cameras/hyt/shot-0', 'tracks': '/home/iot/zjq/slahmr/slahmr/track_preds/hyt', 'shots': '/home/iot/zjq/slahmr/slahmr/shot_idcs/hyt.json'}
FOUND 717 FRAMES in /home/iot/zjq/slahmr/images/hyt
RUNNING PHALP ON /home/iot/zjq/slahmr/images/hyt
PROCESS MainProcess ()
PHALP DIR /home/iot/zjq/slahmr/slahmr/preproc
cd /home/iot/zjq/slahmr/slahmr/preproc; CUDA_VISIBLE_DEVICES=0 python track.py video.source=/home/iot/zjq/slahmr/images/hyt video.output_dir=/home/iot/zjq/slahmr/slahmr/phalp_out/hyt overwrite=False detect_shots=True video.extract_video=False render.enable=False
[12/10 20:46:50] INFO OpenGL_accelerate module loaded

Issues running Colab

Hi! I wanted to try out the SLAHMR project on my own video and was trying to run the provided Colab notebook.

However, I've been running into an issue on the "run on demo video" block ("AttributeError: 'Namespace' object has no attribute 'vis'"). I've been trying workarounds including using more absolute paths and manipulating arguments but I haven't had luck in bypassing this just yet.

Does anyone know how to get around this? Thanks!


Error when running long sequence

Hi, I had some problems: the code crashed when I tried to run 1000 frames of my video. I saw that the default number of input frames is 120. I am wondering if you have tried running the code on a longer sequence.

Thank you very much!

Could you please provide information about output?

Thank you for sharing your work as an open-source project. I appreciate your efforts and the quality of your code.

I recently ran the code and obtained a lot of output, but I'm not sure what it means. Would it be possible to provide some introductory information about the output?

Thank you for your help, and I look forward to your response.

run_slam.py error for the video with fixed camera

Hi,

As I mentioned in issue #2, I ran into the following error when I tried to use my own custom video.
File "/home/eric-gtxrar/documents/slahmr/third-party/DROID-SLAM/droid_slam/factor_graph.py", line 374, in add_proximity_factors ii, jj = torch.as_tensor(es, device=self.device).unbind(dim=-1) ValueError: not enough values to unpack (expected 2, got 0)

After looking into the code and testing with different videos, I realized that if the camera is fully fixed in the video (e.g. mounted on a tripod), it will hit the above error. If the camera is not fixed, I can successfully generate the result below. Any suggestion to make it work for videos with a fixed camera would be appreciated. Thanks!

(result screenshot omitted)

mmpose version

Hi,

Thanks for sharing your great work! I noticed that your phalp_plus code uses mmpose, which has many versions. Could you share the versions of mmpose, mmdet, and mmcv on your server to make it easier for me to configure the conda environment?

Issue in code?

Hi,

I am running SLAHMR on some long-sequence videos and I realized that there seems to be a small mistake in your code.

In dataset.py, lines 325-327, I think you intended to normalize the camera translation to the origin when splitting the sequence into small batches. However, since t0 is defined as:

t0 = -cam_t[sidx:sidx+1] + torch.randn(3) * 0.1

I believe the code at line 327 needs to be changed from
self.cam_t = cam_t[sidx:eidx] - t0 to self.cam_t = cam_t[sidx:eidx] + t0.

When I fix this, it works well on long-range videos. Let me know if this is the correct way to fix the code.
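
For reference, the sign issue can be checked with a few lines of toy arithmetic (a sketch only, not the repository code): with t0 = -cam_t[0] + noise, subtracting t0 roughly doubles the first translation, while adding t0 moves the first camera translation to approximately zero.

    # Sign check for the normalization discussed above (toy arithmetic, not repo code).
    import torch

    cam_t = torch.tensor([[2.0, 0.0, 1.0],
                          [2.5, 0.1, 1.2]])          # toy world-to-camera translations
    t0 = -cam_t[0:1] + torch.randn(3) * 0.1          # as defined in dataset.py

    minus = cam_t - t0   # first row ends up near 2 * cam_t[0], not at the origin
    plus  = cam_t + t0   # first row ends up near zero, i.e. first camera at the origin
    print(minus[0], plus[0])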

Google Colab not working

Hi,

I was trying to run the official Colab file provided for this project. However, it took a really long time during this phase.

(As shown in the attached screenshot, this installation step alone took 56 minutes.)

Moreover, at the end of the notebook there is no video output to be played, even though no error is produced when I run the model.

(Screenshots omitted; this error happens because there is no mp4 video to be played.)

Camera motion information

I noticed from the examples that it is possible to get the motion of the camera.
Where can I get it?

I've found what seems to be camera data in several places, but there doesn't seem to be any useful information in them (screenshot omitted),

and the world npz files also don't seem to contain useful info, for example (screenshot omitted).

Is it stored anywhere else, or is this maybe a problem with my installation?

thanks
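
Since the camera parameters are saved in the world_results.npz files described in the "Output Explanation" issue above, one way to get the camera trajectory is to assemble per-frame matrices from cam_R, cam_t, and intrins. The sketch below assumes cam_R/cam_t are world-to-camera and that intrins stores (fx, fy, cx, cy); please verify both against run_vis.py.

    # Hedged sketch: build per-frame camera matrices from the npz outputs.
    import numpy as np

    res = np.load("xxx_world_results.npz")       # hypothetical file name
    R = res["cam_R"][0]                           # (T, 3, 3) world-to-camera rotations
    t = res["cam_t"][0]                           # (T, 3)    world-to-camera translations
    fx, fy, cx, cy = res["intrins"]               # assumed ordering

    T = R.shape[0]
    world_to_cam = np.tile(np.eye(4), (T, 1, 1))
    world_to_cam[:, :3, :3] = R
    world_to_cam[:, :3, 3] = t

    # Camera pose (camera-to-world), i.e. the camera's trajectory in the world frame.
    cam_to_world = np.linalg.inv(world_to_cam)
    cam_centers = cam_to_world[:, :3, 3]          # (T, 3) camera positions over time

    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])  # pinhole intrinsics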

basicModel_neutral_lbs_10_207_0_v1.0.0.pkl 404

Hi!

Great work! I am trying to use the Colab. I found that https://github.com/classner/up/raw/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl now returns a 404. I tried to replace this file from another source but PHALP is not working: python run_opt.py data=video data.seq=022691_mpii_test data.root=/content/slahmr/demo run_opt=False run_vis=False finishes in 30 seconds, and it just fails and returns nothing. Thank you for the help!

RUNNING PHALP ON /content/slahmr/demo/images/022691_mpii_test
PROCESS MainProcess ()
PHALP DIR /content/slahmr/slahmr/preproc
cd /content/slahmr/slahmr/preproc; CUDA_VISIBLE_DEVICES=0 python track.py video.source=/content/slahmr/demo/images/022691_mpii_test video.output_dir=/content/slahmr/demo/slahmr/phalp_out/022691_mpii_test overwrite=False detect_shots=True video.extract_video=False render.enable=False
[11/21 18:38:31] INFO No OpenGL_accelerate module loaded: No module

ValueError: Expected value argument (Tensor of shape (1, 138))

I ran it several times without problems, but sometimes it gives this sort of error in the middle of processing:

ValueError: Expected value argument (Tensor of shape (1, 138)) to be within the support (IndependentConstraint(Real(), 1)) of the distribution MixtureSameFamily(
Categorical(probs: torch.Size([12]), logits: torch.Size([12])),
MultivariateNormal(loc: torch.Size([12, 138]), covariance_matrix: torch.Size([12, 138, 138]))), but found invalid values:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]],
device='cuda:0', grad_fn=)


Why is camera-coordinate data passed to SMPLLoss when computing MotionLoss?

Dear authors,

In SmoothOptimizer and RootOptimizer, I think the joints3d/joints3d_op in SMPLLoss/RootLoss are in the world coordinate system, and the re-projection is performed to get the proj_2d joints (world -> pixel coordinates).

However, it seems like in MotionLoss, the cam_pred_data is passed into RootLoss:

loss, stats_dict = super().forward(
observed_data, cam_pred_data, nsteps, valid_mask=valid_mask
)

However, from the logic in rollout_latent_motion(), it looks like cam_pred_data is actually still in the world coordinate system, and is transformed back from the local coordinate system used in the rollout procedure. I am therefore confused about the naming of the variables: why is this data named cam_pred_data, with a comment above saying "must return trans and root orient in camera frame...", while the function name is apply_world2prior? It doesn't look like camera-coordinate data to me.

I just would like to confirm whether this is a naming issue or I misunderstood it. Thank you!

Colab not working

Hey, thank you very much for providing a Google Colab for exploration.

However, the Colab does not work for me.

For example, when running the demo video, I encounter the

ModuleNotFoundError: No module named 'tensorboard'

I can see that this is actually in the requirements.txt. I tried several times to start the Colab from scratch, and I probably could resolve the issues, but I think it would be good for the broader community if you fixed the Colab.

Kind regards

issues with run_vis

When I check the output directory of my video, all the video renderings use frames that jump from place to place along the original video, and the same thing loops continuously for the duration of the original video. This leads me to believe that the optimization is being done on this incorrect, repeated sequence of images instead of the original video. Yet the 'images' directory at the root of the repository, generated by the slahmr run, contains the correct image files for the video input (no jumps, no repeated images).

Information on Output Files

Hi, thank you for the amazing work.
Could you please provide a short description in Readme to explain what information each output file contains?
It is a bit hard to backtrack what each output file is for and which conventions are used (keypoints).

about scipy version

Hi, very nice work! I want to know which version of scipy you have installed. I installed scipy==1.11.0 and got an error (screenshot omitted).

File with motion information

Hello. I was able to install SLAHMR on my Windows machine and processed a video.
I would like to know where I can get the file with the information about the motion of the characters and the environment.

And how could I load it for analysis, so I can try to bring that data into Blender?
I already wrote an importer for 4D Humans and was expecting a similar data output, but when the process ended I was left with these files (screenshot omitted),

and I couldn't find any information on how to load them and what the data inside means.

Thanks a lot for everything.

Project Instructions

I want to try your code on my video. Can you provide more detailed instructions? It's hard to run your code.

Can this project not be configured in a Windows environment?

Can this project not be configured in a Windows environment?
Finally, an error occurred:
ImportError: ('Unable to load EGL library', "Could not find module 'EGL' (or one of its dependencies). Try using the full path with constructor syntax.", 'EGL', None)

Preprocessing vs optimizing

Hi. I was playing around with the Colab notebook and was confused by these two commands:

  1. python run_opt.py data=video data.seq=test data.root=/content/slahmr/demo run_opt=False run_vis=False
  2. python run_opt.py data=video data.seq=test data.root=/content/slahmr/demo run_opt=True run_vis=False

Would I always have to run preprocessing and then optimization for every video? Is there a more efficient way to run this?

Time limit on length of output?

It seems that the output tracking data is limited to ~6 seconds even when the input videos are longer. Is this just an artifact of the visualization videos? Is there a way to generate longer tracking videos?

Continue processing

Is there a possibility to stop processing and continue later?
I tried forcing the process to stop and then starting it again, but it usually ends up not continuing the processing; it seems to detect that something was running before and just wraps up.

I'm asking in case I need to stop the process and continue later, or in case I get a "nan" value, so that I could change the learning rate and run again without losing all the processing that was already done.

Thanks

About initial root translation

Thanks for your wonderful work.

I am quite confused about why the initial root translation is taken from this instead of being predicted by the HMR model (e.g. HMR 2.0).

Could you help me understand this step better?

How to fix demo code freezing when not detecting people?

Thank you for sharing such a great project.

However, I would like to ask you one thing.

I have noticed that the current demo code stops when it does not detect a person.

It is difficult for me to figure out how to modify the demo code to solve this problem.

Could you please help me?

I will share the video file I used.

Link: https://drive.google.com/drive/folders/1UKngAcv3nVajudup_h9xNZwaOryIjABO?usp=sharing

Description:
  • le_crop_1.mp4 (X): video without people at the beginning
  • le_crop_2.mp4 (O): video with all people detected

Cuda Out of Memory

This is a follow-up to issue #9. I am also using a 1080 Ti machine, similar to issue #9 (please refer to issue #9 for more details).

The steps mentioned there didn't work directly, as the repository has moved on from the file structure that existed when that solution was provided.
