
Vision-and-Language Navigation in Continuous Environments (VLN-CE)

Project Website · VLN-CE Challenge · RxR-Habitat Challenge

Official implementations:

  • Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments (paper)
  • Waypoint Models for Instruction-guided Navigation in Continuous Environments (paper, README)

Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. This repo is a launching point for interacting with the VLN-CE task and provides both baseline agents and training methods. Both the Room-to-Room (R2R) and the Room-Across-Room (RxR) datasets are supported. VLN-CE is implemented using the Habitat platform.

VLN-CE comparison to VLN

Setup

This project is developed with Python 3.6. If you are using miniconda or anaconda, you can create an environment:

conda create -n vlnce python=3.6
conda activate vlnce

VLN-CE uses Habitat-Sim 0.1.7 which can be built from source or installed from conda:

conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless

Then install Habitat-Lab:

git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
cd habitat-lab
# installs both habitat and habitat_baselines
python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all

Now you can install VLN-CE:

git clone git@github.com:jacobkrantz/VLN-CE.git
cd VLN-CE
python -m pip install -r requirements.txt

Data

Scenes: Matterport3D

Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract such that it has the form data/scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes.
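If you want to confirm the extraction before moving on, a quick check like the following (a sketch, not part of the repo) verifies the expected layout and scene count:

from pathlib import Path

# Expect data/scene_datasets/mp3d/{scene}/{scene}.glb for each of the 90 scenes.
scene_root = Path("data/scene_datasets/mp3d")
scenes = sorted(p for p in scene_root.iterdir() if p.is_dir())
missing = [p.name for p in scenes if not (p / f"{p.name}.glb").exists()]
print(f"found {len(scenes)} scene directories (expected 90)")
if missing:
    print("scenes missing a .glb file:", missing)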

Episodes: Room-to-Room (R2R)

The R2R_VLNCE dataset is a port of the Room-to-Room (R2R) dataset created by Anderson et al for use with the Matterport3DSimulator (MP3D-Sim). For details on porting to 3D reconstructions, please see our paper. R2R_VLNCE_v1-3 is a minimal version of the dataset and R2R_VLNCE_v1-3_preprocessed runs baseline models out of the box. See the dataset page for format, contents, and a changelog. We encourage use of the most recent version (v1-3).

Dataset                          Extract path                               Size
R2R_VLNCE_v1-3.zip               data/datasets/R2R_VLNCE_v1-3               3 MB
R2R_VLNCE_v1-3_preprocessed.zip  data/datasets/R2R_VLNCE_v1-3_preprocessed  250 MB

Downloading via CLI:

# R2R_VLNCE_v1-3
gdown https://drive.google.com/uc?id=1T9SjqZWyR2PCLSXYkFckfDeIs6Un0Rjm
# R2R_VLNCE_v1-3_preprocessed
gdown https://drive.google.com/uc?id=1fo8F4NKgZDH-bPSdVU3cONAkt5EW-tyr

Encoder Weights

Baseline models encode depth observations using a ResNet pre-trained on PointGoal navigation. Those weights can be downloaded from here (672M). Extract the contents to data/ddppo-models/{model}.pth.
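To confirm the weights downloaded correctly, you can load the checkpoint with torch and inspect it (a sketch; the filename is a placeholder for whichever {model}.pth you extracted):

import torch

# Placeholder filename; substitute the checkpoint you extracted to data/ddppo-models/.
ckpt = torch.load("data/ddppo-models/model.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    # DDPPO checkpoints are typically dictionaries; peek at the top-level keys.
    print(list(ckpt.keys())[:10])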

Episodes: Room-Across-Room (RxR)

Download: RxR_VLNCE_v0.zip

About the Room-Across-Room dataset (RxR):

  • multilingual instructions (English, Hindi, Telugu)
  • an order of magnitude larger than existing datasets
  • varied paths to break a shortest-path-to-goal assumption

RxR was ported to continuous environments originally for the RxR-Habitat Challenge. The dataset has train, val_seen, val_unseen, and test_challenge splits with both Guide and Follower trajectories ported. The starter code expects files in this structure:

data/datasets
├─ RxR_VLNCE_v0
|   ├─ train
|   |    ├─ train_guide.json.gz
|   |    ├─ train_guide_gt.json.gz
|   |    ├─ train_follower.json.gz
|   |    ├─ train_follower_gt.json.gz
|   ├─ val_seen
|   |    ├─ val_seen_guide.json.gz
|   |    ├─ val_seen_guide_gt.json.gz
|   |    ├─ val_seen_follower.json.gz
|   |    ├─ val_seen_follower_gt.json.gz
|   ├─ val_unseen
|   |    ├─ val_unseen_guide.json.gz
|   |    ├─ val_unseen_guide_gt.json.gz
|   |    ├─ val_unseen_follower.json.gz
|   |    ├─ val_unseen_follower_gt.json.gz
|   ├─ test_challenge
|   |    ├─ test_challenge_guide.json.gz
|   ├─ text_features
|   |    ├─ ...

The baseline models for RxR-Habitat use precomputed BERT instruction features which can be downloaded from here and saved to data/datasets/RxR_VLNCE_v0/text_features/rxr_{split}/{instruction_id}_{language}_text_features.npz.
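To spot-check the features, one of the .npz archives can be opened with numpy (a sketch; the split, instruction_id, and language in the path are placeholders, and the array names inside the archive depend on the download):

import numpy as np

# Placeholder path following the documented pattern:
# data/datasets/RxR_VLNCE_v0/text_features/rxr_{split}/{instruction_id}_{language}_text_features.npz
path = "data/datasets/RxR_VLNCE_v0/text_features/rxr_train/000000_en_text_features.npz"
with np.load(path) as feats:
    for name in feats.files:
        print(name, feats[name].shape)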

RxR-Habitat Challenge

RxR Challenge Teaser GIF

NEW: The 2023 RxR-Habitat Challenge is live!

The RxR-Habitat Challenge is hosted at the CVPR 2023 Embodied AI Workshop, set for June 19th, 2023. The leaderboard opens for challenge submissions on March 1. For official guidelines, please visit: ai.google.com/research/rxr/habitat. We encourage submissions on this difficult task!

The RxR-Habitat Challenge is hosted by Oregon State University, Google Research, and Meta AI. This is the third year of the RxR-Habitat Challenge, which previously appeared at the 2021 and 2022 CVPR Embodied AI workshops.

Timeline

Event                              Date
Challenge Launch                   Mar 17, 2023
Leaderboard Open                   Mar 20, 2023
Leaderboard Closes                 May 15, 2023
Workshop and Winners Announcement  Jun 19, 2023

Generating Submissions

Submissions are made by running an agent locally and submitting a jsonlines file (.jsonl) containing the agent's trajectories. Starter code for generating this file is provided in the function BaseVLNCETrainer.inference(). Here is an example of generating predictions for English using the Cross-Modal Attention baseline:

python run.py \
  --exp-config vlnce_baselines/config/rxr_baselines/rxr_cma_en.yaml \
  --run-type inference

If you use different models for different languages, you can merge their predictions with scripts/merge_inference_predictions.py. Only submissions containing all episodes from all three languages in the test-challenge split are accepted. Starter code for this challenge was originally hosted in the rxr-habitat-challenge branch but is now integrated into master.
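If you would rather not use scripts/merge_inference_predictions.py, merging per-language predictions amounts to concatenating jsonlines records; a minimal sketch (the file names are placeholders, and it assumes one JSON object per line, as produced by inference):

import json

# Hypothetical per-language prediction files from separate inference runs.
inputs = ["predictions_en.jsonl", "predictions_hi.jsonl", "predictions_te.jsonl"]

records = []
for path in inputs:
    with open(path) as f:
        records.extend(json.loads(line) for line in f if line.strip())

with open("predictions_merged.jsonl", "w") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")

print(f"wrote {len(records)} episode predictions")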

Required Task Configurations

As specified on the challenge webpage, submissions to the official challenge must use an action space of 30-degree turn angles, a 0.25m step size, and 30-degree look up / look down actions. The agent is given a 480x640 RGBD observation space. An example task configuration that loads the English portion of the dataset is given here.

The CMA baseline model (config) is an example of a valid submission. Existing waypoint models are not valid due to their panoramic observation space. Such models would need to be adapted to the challenge configuration.
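Before submitting, it can help to sanity-check your task config against these requirements. The snippet below is a sketch: the path is a placeholder, and the key names (SIMULATOR.TURN_ANGLE, FORWARD_STEP_SIZE, TILT_ANGLE, and the sensor WIDTH/HEIGHT) follow common Habitat config conventions, so confirm them against the config you actually use:

import yaml

# Placeholder path: point this at the task config your agent runs with.
with open("path/to/task_config.yaml") as f:
    cfg = yaml.safe_load(f)

sim = cfg.get("SIMULATOR", {})
print("TURN_ANGLE:", sim.get("TURN_ANGLE"), "(required: 30)")
print("FORWARD_STEP_SIZE:", sim.get("FORWARD_STEP_SIZE"), "(required: 0.25)")
print("TILT_ANGLE:", sim.get("TILT_ANGLE"), "(required: 30)")
for sensor in ("RGB_SENSOR", "DEPTH_SENSOR"):
    s = sim.get(sensor, {})
    print(sensor, s.get("HEIGHT"), "x", s.get("WIDTH"), "(required: 480 x 640)")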

Baseline Model

The official baseline for the RxR-Habitat Challenge is a monolingual cross-modal attention (CMA) model, labeled Monolingual CMA Baseline on the leaderboard. Configuration files for re-training or evaluating this model can be found in this folder under the name rxr_cma_{en|hi|te}.yaml. Weights for the pre-trained models: [en hi te] (196MB each).

Citing RxR-Habitat Challenge

To cite the challenge, please cite the following papers (RxR and VLN-CE):

@inproceedings{ku2020room,
  title={Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding},
  author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={4392--4412},
  year={2020}
}

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav-Graph: Vision and Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
 }

Questions?

Feel free to contact the challenge organizers with any questions, comments, or concerns. The corresponding organizer is Jacob Krantz (@jacobkrantz). You can also open an issue with [RxR-Habitat] in the title, which will also notify us.

VLN-CE Challenge (R2R Data)

The VLN-CE Challenge is live and taking submissions for public test set evaluation. This challenge uses the R2R data ported in the original VLN-CE paper.

To submit to the leaderboard, you must run your agent locally and submit a JSON file containing the generated agent trajectories. Starter code for generating this JSON file is provided in the function BaseVLNCETrainer.inference(). Here is an example of generating this file using the pretrained Cross-Modal Attention baseline:

python run.py \
  --exp-config vlnce_baselines/config/r2r_baselines/test_set_inference.yaml \
  --run-type inference

Predictions must be in a specific format. Please visit the challenge webpage for guidelines.

Baseline Performance

The baseline model for the VLN-CE task is the cross-modal attention model trained with progress monitoring, DAgger, and augmented data (CMA_PM_DA_Aug). As evaluated on the leaderboard, this model achieves:

Split       TL    NE    OS    SR    SPL
Test        8.85  7.91  0.36  0.28  0.25
Val Unseen  8.27  7.60  0.36  0.29  0.27
Val Seen    9.06  7.21  0.44  0.34  0.32

(TL: trajectory length in meters, NE: navigation error in meters, OS: oracle success rate, SR: success rate, SPL: success weighted by path length)

This model was originally presented with a val_unseen performance of 0.30 SPL; however, the leaderboard evaluates this same model at 0.27 SPL. The model was trained and evaluated on a hardware + Habitat build that gave slightly different results, as is the case for the other paper experiments. Going forward, the leaderboard contains the performance metrics that should be used for official comparison. In our tests, the installation procedure for this repo gives nearly identical evaluation to the leaderboard, but we recognize that compute hardware along with the version and build of Habitat are factors in reproducibility.
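For reference, SPL (success weighted by path length, Anderson et al. 2018) averages S_i * l_i / max(p_i, l_i) over episodes, where S_i is binary success, l_i the shortest geodesic distance to the goal, and p_i the agent's actual path length. A small sketch of the computation:

def spl(successes, shortest_dists, path_lengths):
    """Success weighted by Path Length, averaged over episodes."""
    total = 0.0
    for s, l, p in zip(successes, shortest_dists, path_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

# Toy example: one success with a slightly longer-than-optimal path, one failure.
print(spl([1, 0], [8.0, 10.0], [9.5, 4.0]))  # ~0.42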

For push-button replication of all VLN-CE experiments, see here.

Starter Code

The run.py script controls training and evaluation for all models and datasets:

python run.py \
  --exp-config path/to/experiment_config.yaml \
  --run-type {train | eval | inference}

For example, a random agent can be evaluated on 10 val-seen episodes of R2R using this command:

python run.py --exp-config vlnce_baselines/config/r2r_baselines/nonlearning.yaml --run-type eval

For lists of modifiable configuration options, see the default task config and experiment config files.

Training Agents

The DaggerTrainer class is the standard trainer and supports teacher forcing or dataset aggregation (DAgger). This trainer saves trajectories consisting of RGB, depth, ground-truth actions, and instructions to disk to avoid time spent in simulation.

The RecollectTrainer class performs teacher forcing using the ground truth trajectories provided in the dataset rather than a shortest path expert. Also, this trainer does not save episodes to disk, instead opting to recollect them in simulation.

Both trainers inherit from BaseVLNCETrainer.

Evaluating Agents

Evaluation on validation splits can be done by running python run.py --exp-config path/to/experiment_config.yaml --run-type eval. If EVAL.EPISODE_COUNT == -1, all episodes will be evaluated. If EVAL_CKPT_PATH_DIR is a directory, each checkpoint will be evaluated one at a time.
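Checkpoints are saved as ckpt.{index}.pth (as referenced in the issues below). If you want to preview which files a directory evaluation would cover, a small sketch (the ordering logic is illustrative, not the trainer's exact code):

from pathlib import Path

ckpt_dir = Path("data/checkpoints")  # corresponds to EVAL_CKPT_PATH_DIR
# Sort ckpt.{index}.pth files by their numeric index before evaluating.
ckpts = sorted(ckpt_dir.glob("ckpt.*.pth"), key=lambda p: int(p.stem.split(".")[-1]))
for ckpt in ckpts:
    print("would evaluate:", ckpt)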

Cuda

Cuda will be used by default if it is available. We find that using one GPU for the model and several GPUs for simulation works well.

SIMULATOR_GPU_IDS: [0]  # list of GPU IDs to run simulations
TORCH_GPU_ID: 0  # GPU for pytorch-related code (the model)
NUM_ENVIRONMENTS: 1  # Each GPU runs NUM_ENVIRONMENTS environments

The simulator and torch code do not need to run on the same device. For faster training and evaluation, we recommend running with as many NUM_ENVIRONMENTS as will fit on your GPU while assuming 1 CPU core per env.
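As a rough illustration of how these settings map to devices (the trainer handles this internally; this is only a sketch that follows the comments above), the policy lives on TORCH_GPU_ID while each simulator GPU hosts NUM_ENVIRONMENTS environment processes:

import torch

SIMULATOR_GPU_IDS = [0, 1]  # GPUs used for simulation
TORCH_GPU_ID = 0            # GPU used for the policy network
NUM_ENVIRONMENTS = 3        # environments per simulator GPU

model_device = torch.device(f"cuda:{TORCH_GPU_ID}" if torch.cuda.is_available() else "cpu")
# One entry per environment process, grouped by simulator GPU.
env_to_gpu = [gpu for gpu in SIMULATOR_GPU_IDS for _ in range(NUM_ENVIRONMENTS)]
print("model on:", model_device)  # e.g. cuda:0
print("env -> gpu:", env_to_gpu)  # e.g. [0, 0, 0, 1, 1, 1]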

License

The VLN-CE codebase is MIT licensed. Trained models and task datasets are considered data derived from the MP3D scene dataset. Matterport3D-based task datasets and trained models are distributed under the Matterport3D Terms of Use and the CC BY-NC-SA 3.0 US license.

Citing

If you use VLN-CE in your research, please cite the following paper:

@inproceedings{krantz_vlnce_2020,
  title={Beyond the Nav-Graph: Vision and Language Navigation in Continuous Environments},
  author={Jacob Krantz and Erik Wijmans and Arjun Majumdar and Dhruv Batra and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
 }

If you use the RxR-Habitat data, please additionally cite the following paper:

@inproceedings{ku2020room,
  title={Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding},
  author={Ku, Alexander and Anderson, Peter and Patel, Roma and Ie, Eugene and Baldridge, Jason},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages={4392--4412},
  year={2020}
}


vln-ce's Issues

Error(s) in loading state_dict for CMAPolicy:

Hello,

I downloaded the pretrained model and tried to run test_set_inference.yaml, but I receive this error:

Unexpected key(s) in state_dict: "critic.fc.weight", "critic.fc.bias"

Is there a mismatch between the code and the pretrained model? If so, could you update the model link? Or do you have other suggestions? Thanks in advance!

Instruction encoder

Could you please provide details on your instruction encoder?
I would like to test the agent on new data, but you only supply pre-computed text weights.
Thanks

questions about the VLNCE dataset

Hi, I feel quite confused about the dataset.
In the VLN-CE ground-truth dataset there are location and action sequences. However, when I use the simulator there is a vln_oracle_action_sensor action at each step, and I found that this action sequence is different from the action sequence in the gt file. Does anyone know which one is the correct action sequence?

how to render a random episode?

Hi Jacob! How do you render the GIF shown in the README? I want to closely watch what the agent is doing at each action within the reconstructed scenes for a random episode. Can you give me some suggestions? Thank you!

OSError: Could not load shared object file: libllvmlite.so

Hi, I've run into an issue while running the command:
python run.py --exp-config vlnce_baselines/config/paper_configs/seq2seq.yaml --run-type train

The error log is as below:

Traceback (most recent call last):
  File "run.py", line 10, in <module>
    import habitat
  File "/home/pp456/habitat-lab/habitat/__init__.py", line 8, in <module>
    from habitat.core.agent import Agent
  File "/home/pp456/habitat-lab/habitat/core/agent.py", line 13, in <module>
    from habitat.core.simulator import Observations
  File "/home/pp456/habitat-lab/habitat/core/simulator.py", line 16, in <module>
    from habitat.core.dataset import Episode
  File "/home/pp456/habitat-lab/habitat/core/dataset.py", line 31, in <module>
    from habitat.core.utils import not_none_validator
  File "/home/pp456/habitat-lab/habitat/core/utils.py", line 11, in <module>
    import quaternion
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/quaternion/__init__.py", line 28, in <module>
    from .quaternion_time_series import slerp, squad, integrate_angular_velocity, minimal_rotation, angular_velocity
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/quaternion/quaternion_time_series.py", line 8, in <module>
    from quaternion.numba_wrapper import njit
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/quaternion/numba_wrapper.py", line 11, in <module>
    from numba import njit, jit, vectorize, int64, float64, complex128
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/numba/__init__.py", line 14, in <module>
    from numba.core import config
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/numba/core/config.py", line 16, in <module>
    import llvmlite.binding as ll
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/llvmlite/binding/__init__.py", line 4, in <module>
    from .dylib import *
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/llvmlite/binding/dylib.py", line 3, in <module>
    from llvmlite.binding import ffi
  File "/home/pp456/anaconda3/envs/vlnce2/lib/python3.6/site-packages/llvmlite/binding/ffi.py", line 153, in <module>
    raise OSError("Could not load shared object file: {}".format(_lib_name))
OSError: Could not load shared object file: libllvmlite.so

The error seems to be a conflict between pytorch (1.6.0) and quaternion (2020.9.5.14.42.2) / numba (0.51.2): running import torch; import numba (or quaternion) leads to the error, while import numba (or quaternion) alone is fine.

Installation was done according to the "Habitat and Other Dependencies" section: installed habitat-sim with conda, and habitat-lab from branch v0.1.5 according to the given steps. The habitat versions are habitat: 0.1.5 and habitat-sim: 0.1.5

I was wondering if you might know how to resolve this issue? Thank you so much!

How to use multiple GPUs to train dagger models?

Thanks for the great work.

When I run training using dagger_trainer.py, I found that a large part of the training time is taken by 1) collecting data and 2) training the model on the collected data. The first process can be sped up by using more simulator GPUs (SIMULATOR_GPU_IDS). However, the second process can only use one GPU (TORCH_GPU_ID) by default.

Is there any easy way to use multiple GPUs to speed up the second process? Or should I use torch.distributed to reproduce the code by myself?

Many thanks!
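For reference, the generic torch.distributed pattern the question alludes to looks like this (a sketch of DistributedDataParallel in general, not the VLN-CE trainer's code); each process owns one GPU, and gradients are averaged across processes on backward():

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model_for_ddp(model):
    """Generic DDP wiring; launch one process per GPU (e.g. with torchrun)."""
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])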

habitat-sim problem: Platform::WindowlessEglApplication::tryCreateContext(): no EGL devices found

When I try this code, I must install habitat-sim first, but I encounter a big bug. I followed habitat-sim issue No. 288, but it did not solve my problem. Does anyone have good suggestions?

I follow the codes:

conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless

git clone --branch v0.1.7 [email protected]:facebookresearch/habitat-lab.git

python -m pip install -r requirements.txt
python -m pip install -r habitat_baselines/rl/requirements.txt
python -m pip install -r habitat_baselines/rl/ddppo/requirements.txt
python setup.py develop --all

wget http://dl.fbaipublicfiles.com/habitat/habitat-test-scenes.zip
unzip habitat-test-scenes.zip

python examples/example.py

Then get the following errors:

I1010 18:31:04.590993 11476 SceneGraph.h:93] Created DrawableGroup:
Platform::WindowlessEglApplication::tryCreateContext(): unable to find EGL device for CUDA device 0
WindowlessContext: Unable to create windowless context

OR

I1207 08:31:49.998020 1190 SceneGraph.h:93] Created DrawableGroup:
Platform::WindowlessEglApplication::tryCreateContext(): no EGL devices found, likely a driver issue; enable --magnum-gpu-validation to see additional info
WindowlessContext: Unable to create windowless context

I can confirm that my NVIDIA driver is correct and CUDA is also OK, since I can run other CNN-related code without problems.

In fact, I can run habitat correctly on my PC, but I always encounter this error in the docker container on my server.

I have tried multiple versions of the NVIDIA driver and CUDA, as well as some possibly relevant library versions (such as libGL) on my server, following habitat-sim issue No. 288.

The following are all the differences I can identify between the PC and the server's docker container:

With ldconfig -N -v | grep libEGL, in the server's docker:

/sbin/ldconfig.real: Can't stat /usr/local/cuda/compat/lib: No such file or directory
/sbin/ldconfig.real: Path `/usr/local/cuda/lib64' given more than once
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/nvidia/lib64: No such file or directory
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.27.so is the dynamic linker, ignoring
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libcuda.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.100 is empty, not checked.
/sbin/ldconfig.real: File /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.440.100 is empty, not checked.
libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
libEGL.so.1 -> libEGL.so.1.0.0
libEGL_nvidia.so.0 -> libEGL_nvidia.so.470.141.03

BUT in the PC:

/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
libEGL.so.1 -> libEGL.so.1.1.0
libEGL_nvidia.so.0 -> libEGL_nvidia.so.440.44

It seems the problem comes from this difference? If anyone has a good solution, please help me. Thanks.

How to use multi gpu?

How to use multi gpu?

The most time-consuming part of the code is loading the dataset, and I want to use multiple GPUs to speed it up.

Cuda

Cuda will be used by default if it is available. When training on large portions of the dataset, multiple GPUs is favorable.

SIMULATOR_GPU_IDS: [0,1]  # Each GPU runs NUM_ENVIRONMENTS environments
TORCH_GPU_ID: 0
NUM_ENVIRONMENTS: 1

I followed the instructions in the README. After making that change, the log showed repeated training entries, which seems wrong.

Does the code really support multiple GPUs?

2021-06-22 01:49:53,897 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 118.16s] [EpochTime: 118s] [Loss: 1.7911]
2021-06-22 01:49:54,260 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 118.52s] [EpochTime: 119s] [Loss: 1.7911]                   
2021-06-22 01:49:55,733 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 1.83s] [EpochTime: 120s] [Loss: 1.7777] 
2021-06-22 01:49:56,771 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 2.51s] [EpochTime: 121s] [Loss: 1.7777] 
2021-06-22 01:49:56,820 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.09s] [EpochTime: 121s] [Loss: 1.7504] 
2021-06-22 01:49:57,925 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.15s] [EpochTime: 122s] [Loss: 1.7504] 
2021-06-22 01:50:01,319 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 4.5s] [EpochTime: 126s] [Loss: 1.6842] 
2021-06-22 01:50:02,464 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 4.54s] [EpochTime: 127s] [Loss: 1.6842]
2021-06-22 01:50:03,281 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 1.96s] [EpochTime: 128s] [Loss: 1.5942] 
2021-06-22 01:50:04,933 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 2.47s] [EpochTime: 129s] [Loss: 1.5942] 
2021-06-22 01:50:08,206 [Epoch: 1/15] [Batch: 6/6568] [BatchTime: 4.92s] [EpochTime: 132s] [Loss: 1.4861] 
2021-06-22 01:50:09,176 [Epoch: 1/15] [Batch: 6/6568] [BatchTime: 4.24s] [EpochTime: 133s] [Loss: 1.4861]
2021-06-22 01:50:12,253 [Epoch: 1/15] [Batch: 7/6568] [BatchTime: 4.05s] [EpochTime: 137s] [Loss: 1.3834]
2021-06-22 01:50:12,607 [Epoch: 1/15] [Batch: 7/6568] [BatchTime: 3.43s] [EpochTime: 137s] [Loss: 1.3834] 
2021-06-22 01:50:13,432 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 114.51s] [EpochTime: 115s] [Loss: 1.7911] 
2021-06-22 01:50:15,151 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 116.23s] [EpochTime: 116s] [Loss: 1.7911] 
2021-06-22 01:50:15,406 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 1.81s] [EpochTime: 116s] [Loss: 1.7777]   
2021-06-22 01:50:15,726 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 116.8s] [EpochTime: 117s] [Loss: 1.7911] 
2021-06-22 01:50:15,832 [Epoch: 1/15] [Batch: 1/6568] [BatchTime: 116.91s] [EpochTime: 117s] [Loss: 1.7911]
2021-06-22 01:50:17,043 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.64s] [EpochTime: 118s] [Loss: 1.7504] 
2021-06-22 01:50:17,344 [Epoch: 1/15] [Batch: 8/6568] [BatchTime: 5.09s] [EpochTime: 142s] [Loss: 1.2995] 
2021-06-22 01:50:18,344 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 2.51s] [EpochTime: 119s] [Loss: 1.7777] 
2021-06-22 01:50:18,430 [Epoch: 1/15] [Batch: 8/6568] [BatchTime: 5.81s] [EpochTime: 143s] [Loss: 1.2995] 
2021-06-22 01:50:18,438 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 3.29s] [EpochTime: 120s] [Loss: 1.7777] 
2021-06-22 01:50:18,953 [Epoch: 1/15] [Batch: 2/6568] [BatchTime: 3.23s] [EpochTime: 120s] [Loss: 1.7777] 
2021-06-22 01:50:19,627 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.28s] [EpochTime: 121s] [Loss: 1.7504]  
2021-06-22 01:50:19,973 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.53s] [EpochTime: 121s] [Loss: 1.7504]  
2021-06-22 01:50:20,715 [Epoch: 1/15] [Batch: 3/6568] [BatchTime: 1.76s] [EpochTime: 122s] [Loss: 1.7504] 
2021-06-22 01:50:21,385 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 4.34s] [EpochTime: 122s] [Loss: 1.6842]
2021-06-22 01:50:23,499 [Epoch: 1/15] [Batch: 9/6568] [BatchTime: 6.15s] [EpochTime: 148s] [Loss: 1.2126] 
2021-06-22 01:50:23,876 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 4.25s] [EpochTime: 125s] [Loss: 1.6842] 
2021-06-22 01:50:25,250 [Epoch: 1/15] [Batch: 9/6568] [BatchTime: 6.82s] [EpochTime: 150s] [Loss: 1.2126]   
2021-06-22 01:50:25,257 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 3.4s] [EpochTime: 126s] [Loss: 1.5942]  
2021-06-22 01:50:26,637 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 2.76s] [EpochTime: 128s] [Loss: 1.5942] 
2021-06-22 01:50:26,908 [Epoch: 1/15] [Batch: 10/6568] [BatchTime: 3.41s] [EpochTime: 151s] [Loss: 1.1721]  
2021-06-22 01:50:27,124 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 7.15s] [EpochTime: 128s] [Loss: 1.6842] 
2021-06-22 01:50:27,997 [Epoch: 1/15] [Batch: 4/6568] [BatchTime: 6.14s] [EpochTime: 129s] [Loss: 1.6842] 
2021-06-22 01:50:28,676 [Epoch: 1/15] [Batch: 10/6568] [BatchTime: 3.42s] [EpochTime: 153s] [Loss: 1.1721] 
2021-06-22 01:50:31,479 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 2.48s] [EpochTime: 133s] [Loss: 1.5942]  
2021-06-22 01:50:31,666 [Epoch: 1/15] [Batch: 5/6568] [BatchTime: 4.54s] [EpochTime: 133s] [Loss: 1.5942] 
2021-06-22 01:50:31,779 [Epoch: 1/15] [Batch: 6/6568] [BatchTime: 6.52s] [EpochTime: 133s] [Loss: 1.4861]

Confusion of updating state encoder in DAgger algorithm

Hi, thanks for your great work. I am confused about training and evaluation for the DAgger model. The RNN states are initialized to zeros during training, but the predicted actions at evaluation time use the RNN states from previous steps. Is there a reason why training can still give a correct update signal to the state encoder?

code block
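For context, the two regimes being contrasted look roughly like this with a generic GRU (a sketch, not the CMA/seq2seq code): teacher forcing feeds the whole ground-truth sequence with a zero initial hidden state, while a rollout carries the hidden state forward step by step, and within one episode the two are equivalent:

import torch
import torch.nn as nn

rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
seq = torch.randn(1, 10, 32)  # (batch, time, features)

# Training / teacher forcing: full sequence, hidden state starts at zeros.
out_train, _ = rnn(seq)

# Evaluation / rollout: one step at a time, carrying the hidden state.
h = torch.zeros(1, 1, 64)
for t in range(10):
    out_step, h = rnn(seq[:, t : t + 1, :], h)

# Same outputs either way; only how the states are fed differs.
print(torch.allclose(out_train[:, -1], out_step[:, -1], atol=1e-6))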

DefaultCPUAllocator: can't allocate memory: you tried to allocate 1589575680 bytes.

Hi Jacob,

Following the setup instructions in https://github.com/jacobkrantz/VLN-CE to run the model, with

python run.py --exp-config=vlnce_baselines/config/rxr_baselines/rxr_cma_en.yml --run-type=train

I got the following error:

  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangsu/KrantzVLNCE/habitat-lab/VLN-CE/vlnce_baselines/models/encoders/resnet_encoders.py", line 199, in forward
    resnet_output = self.cnn(normalize(rgb_observations))
  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/wangsu/anaconda3/envs/vlnce_py3.6_h0.1.7/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: [enforce fail at CPUAllocator.cpp:68] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1589575680 bytes. Error code 12 (Cannot allocate memory)

My machine however does have enough memory:

              total        used        free      shared  buff/cache   available
Mem:    27390640128   574816256 25532850176     8773632  1282973696 26407305216

Could you help look into this please? Thanks!

VLNCE questions

Thanks for the incredible effort in putting this dataset together! I was wondering how I can find the continuous trajectory actions/camera poses for each episode. If I look into "RxR_VLNCE_v0/train/train_guide.json.gz", each episode has a trajectory_id field. Does this correspond to the keys in "RxR_VLNCE_v0/train/train_guide_gt.json.gz"? Or is it episode_id that corresponds?

In addition, where can I find the camera poses (location and rotation) for each trajectory? There's an "actions" field in "RxR_VLNCE_v0/train/train_guide_gt.json.gz"; how do the action integers map to actions (1 forward, 2 turn left, 3 turn right)? What does the "locations" field mean there? I would really appreciate it if you could help me understand the field structure a bit better.

Thanks!
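For anyone inspecting these files, they are gzipped JSON and can be browsed directly (a sketch; the exact key layout should be checked against your copy, and the field names noted below are only those mentioned in the question):

import gzip
import json

with gzip.open("data/datasets/RxR_VLNCE_v0/train/train_guide_gt.json.gz", "rt") as f:
    gt = json.load(f)

first_key = next(iter(gt))
print("example top-level key:", first_key)
print("fields for that entry:", list(gt[first_key].keys()))  # e.g. "actions", "locations", ...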

Action space?

I'm unclear on the action space, specifically the size and significance of the different dimensions of the predicted logits. The size appears to be fixed, but is the meaning absolute (0 is forward, 1 is left, 2 is right, etc.) or can the meaning change (e.g., for R2R certain agents predict over candidate images)?

In the code there are references to the action_space member of env, but I'm not too sure how it's initialized. If possible a pointer to the right spot in the code would be greatly appreciated as well!

Edit:

NVM, I took another look at the paper. There are 4 actions: move forward 0.25m, turn left or turn right 15 degrees, and stop.
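For readers landing on this thread, the integer-to-action mapping quoted in the dataset question above is usually written out as below (a sketch; STOP at index 0 is an assumption consistent with Habitat's default VLN action ordering, so confirm against the task config):

# Discrete VLN-CE action indices as quoted above; verify against your config.
ACTIONS = {
    0: "STOP",
    1: "MOVE_FORWARD",  # 0.25 m
    2: "TURN_LEFT",     # 15 degrees
    3: "TURN_RIGHT",    # 15 degrees
}

def describe(action_sequence):
    return [ACTIONS[a] for a in action_sequence]

print(describe([1, 1, 2, 1, 0]))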

Clarification on trainer and agent

Hello,

I am working in ML, but I am new to Habitat. I'm slightly confused about the purpose of some files: hierarchical_trainer.py, nonlearning_agents.py, and robo_vln_trainer.py.

From my understanding, these files import functions from Habitat that are used for training, given their filenames include "_trainer", while "agent" is for evaluation. However, the README mentions an offline data buffer, suggesting that the trainers extract data from this buffer without live interaction with the Habitat environment. This has led to some confusion.

Could you please clarify the following:

  1. The specific roles and functionalities of hierarchical_trainer.py, nonlearning_agents.py, and robo_vln_trainer.py.
  2. Whether my understanding is correct that the trainers operate by extracting data from an offline buffer and thus do not require Habitat for real-time simulation.

I might be missing some context or misunderstanding the architecture, so any suggestions you can give would help me understand the code.

Thank you very much for your time and assistance.

Josh

KeyError: 'instruction'

I successfully ran python run.py --exp-config vlnce_baselines/config/nonlearning.yaml --run-type train and got "data/checkpoints/ckpt.x.pth".
But an error occurred when I ran python run.py --exp-config vlnce_baselines/config/nonlearning.yaml --run-type eval:

I0414 23:24:12.502465 9994 simulator.py:146] Loaded navmesh data/scene_datasets/mp3d/2azQ1b91cZZ/2azQ1b91cZZ.navmesh
2021-04-14 23:24:12,506 Initializing task VLN-v0
2021-04-14 23:24:16,634 Loaded weights from checkpoint: data/checkpoints/ckpt.0.pth
2021-04-14 23:24:16,635 Finished setting up actor critic model.
Traceback (most recent call last):
  File "/home/wangzihao/VLN-CE/run.py", line 86, in <module>
    main()
  File "/home/wangzihao/VLN-CE/run.py", line 41, in main
    run_exp(**vars(args))
  File "/home/wangzihao/VLN-CE/run.py", line 80, in run_exp
    trainer.eval()
  File "/home/wangzihao/hab015/lab/habitat_baselines/common/base_trainer.py", line 109, in eval
    checkpoint_index=prev_ckpt_ind,
  File "/home/wangzihao/VLN-CE/vlnce_baselines/dagger_trainer.py", line 807, in _eval_checkpoint
    observations, config.TASK_CONFIG.TASK.INSTRUCTION_SENSOR_UUID
  File "/home/wangzihao/VLN-CE/vlnce_baselines/common/utils.py", line 25, in transform_obs
    instruction_sensor_uuid
KeyError: 'instruction'
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f4415a2d9e8>>
Traceback (most recent call last):
  File "/home/wangzihao/hab015/lab/habitat/core/vector_env.py", line 518, in __del__
    self.close()
  File "/home/wangzihao/hab015/lab/habitat/core/vector_env.py", line 400, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/wangzihao/.conda/envs/hab015/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/wangzihao/.conda/envs/hab015/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/wangzihao/.conda/envs/hab015/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

When I debugged, I found that the dict variable observations in line 806 of "dagger_trainer.py" had no instruction key, only the keys rgb and depth.
Further, I found that the dict variable observations_spaces in line 139 of "habitat/core/vector_env.py" differed between train and eval: it has the keys rgb, depth, instruction, progress, vln_oracle_action_sensor in train but only rgb, depth in eval.

What's wrong with the command?

Checkpoint restoration

Hello,
Thank you so much for putting together this dataset and codebase! I am trying to reproduce your results; however, I am unable to find any references to restoring training from a checkpoint. I am using the default config provided. Some pointers would be much appreciated!

Thank you very much!

Random env seed for multi-processes training on habitat v0.1.6

So habitat v0.1.6 removes the third parameter of make_env_fn, which served as the argument for the env.seed() function; instead, env.seed() now takes env.config.TASK_CONFIG.SEED as its argument.

I wonder if this breaks the training scheme, since env.config.TASK_CONFIG.SEED is a constant integer (default 100), whereas your implementation used the process number, which differs for each process. If so, how can I fix this?
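One common workaround (a sketch, not the repo's fix) is to derive a distinct seed per process from the base SEED before each environment is constructed:

# Sketch: offset the configured base seed by the process index so each
# simulator process gets a different seed. 100 is the default SEED noted above.
def seed_for_process(base_seed: int, process_index: int) -> int:
    return base_seed + process_index

for rank in range(4):
    print(f"process {rank}: TASK_CONFIG.SEED = {seed_for_process(100, rank)}")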

R2R HPN reproduction

Thank you for publishing this great work! I have a problem when trying to reproduce the results.

I downloaded the released checkpoint and tried to fine-tune the network using the 6-hpn-__.yaml file.

The network was trained on a node with 4 TITAN X GPUs and 8 environments for each process. I only edited the number of environments in the yaml file; nothing else was changed.

python -u -m torch.distributed.launch \
    --use_env \
    --nproc_per_node 4 \
    run.py \
    --exp-config vlnce_baselines/config/r2r_waypoint/6-hpn-__.yaml \
    --run-type train

The problem is that the success curve during training drops soon after a few iterations. Here is the curve.

Is there any problem with this drop?

BTW, I also tried to train from scratch using 6-hpn-__.yaml (4 TITAN X, 8 envs per process). It achieves about 0.09-0.15 SR after seeing 9M frames (here is the curve). That seems a little slow. Is it normal?

Thanks.

[RxR-Habitat] What kind of GPU is needed to train the cma policy with the original config?

Hi there,

While running the starter code for the RxR challenge, I found that a single NVIDIA 2080 Ti GPU's VRAM (11 GiB) could only fit batch_size 1 with the CMA policy and max_traj_len 250. We could set effective_batch_size, but it is set to -1 while batch_size is 3 in the original config. So I'm wondering what kind of GPU is needed to train the CMA policy with the original config?

Also, I found that with batch_size 1, max_traj_len 250, preload_size 30, and 9 simulated environments, the RAM usage is more than 40 GiB. Is that normal?

Thanks!
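For reference, an effective batch size larger than what fits in VRAM is usually emulated with gradient accumulation, trading memory for wall-clock time; a generic sketch (not the repo's exact training loop):

def train_with_accumulation(model, loss_fn, batches, optimizer, accumulation_steps=3):
    """Accumulate gradients over several micro-batches before each optimizer step."""
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets) / accumulation_steps  # keep gradient scale
        loss.backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()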

n (counts) have to be positive - assertion error

Hi, I'm getting this error when training seq2seq baseline:

miniconda3/envs/vlnce-py3.6/lib/python3.6/site-packages/gym/spaces/discrete.py", line 36, in __init__
    assert n > 0, "n (counts) have to be positive"
AssertionError: n (counts) have to be positive

Some info:
python: 3.6
habitat-lab & habitat-sim version: 0.1.7
gym.version: '0.26.2'

It seems originating from this line.

[RxR-Habitat Challenge] When will the challenge result be notified

Hi Jacob,

We are a team that took part in the RxR-Habitat Challenge. We will be grateful if you have time to answer our questions!

  1. We want to know whether our submission is valid and when the results will be announced.
  2. How will you notify the participants? By the email account registered for the challenge or by other means?

Thanks for your attention to this matter!
Best Regards,
Dong

what are _gt files for rxr?

Hello,

When I pull all the data for RxR, I do not get the _gt files. What are the {train,val_seen,val_unseen}_guide_gt.json.gz files, and where do I get them?

Question about heading difference between VLN-CE and VLN

Hi, thanks for this wonderful work!

It appears that the initial heading of the VLN-CE agent differs from the corresponding heading in the VLN environment. May I ask if this is by design, or is there something I set up incorrectly? Here is an example:

Instruction: Walk forward and enter the house. Veer right and enter the doorway on the left of the bookcase. Stop in the doorway.
path_id: 6000
episode: 619

[image: initial first-person view of the VLN-CE agent for this episode]

The image shown above is the initial heading of the VLN-CE agent. In this case, the heading is not correct and if the agent follows the instruction correctly (walk forward and enter the house, ...), it will not get to the final destination. Thanks for your help in advance!

[RxR-Habitat] Eval baseline reproduction

Hello,
I tried to reproduce the baseline performance using the same yaml file listed in the README:

python run.py \
  --exp-config vlnce_baselines/config/rxr_configs/rxr_cma_en.yaml \
  --run-type train

My experimental setup used 4 TITAN X GPUs with 4 environments each. Referring to issue #17, I set batch_size: 1 and effective_batch_size: 3 to train successfully. No other changes have been made to the codebase.

After evaluating my saved checkpoint, I found the following metrics (all average across episodes):

steps_taken: 350.443718
path_length: 6.737881
distance_to_goal: 11.082229
success: 0.066503
oracle_success: 0.180703
spl: 0.055868
ndtw: 0.358719

Comparing these to Table 2 entry for Seq2Seq w/ RGBD, Instructions, and History, I found my performance to be significantly lower for the matching metrics.

Is this the right config to be used to match the relevant baseline?

'RuntimeError: No scenes to load...' when using recollect_trainer

When I trained my model with TRAINER_NAME set to recollect_trainer, I got the error:

Traceback (most recent call last):
  File "run.py", line 92, in <module>
    main()
  File "run.py", line 43, in main
    run_exp(**vars(args))
  File "run.py", line 84, in run_exp
    trainer.train()
  File "/home/raven/hzt/vln-ce/vlnce_baselines/recollect_trainer.py", line 77, in train
    dataset = TeacherRecollectionDataset(self.config)
  File "/home/raven/hzt/vln-ce/vlnce_baselines/common/recollection_dataset.py", line 49, in __init__
    self.initialize_sims()
  File "/home/raven/hzt/vln-ce/vlnce_baselines/common/recollection_dataset.py", line 60, in initialize_sims
    episodes_allowed=list(self.trajectories.keys()),
  File "/home/raven/hzt/vln-ce/vlnce_baselines/common/env_utils.py", line 51, in construct_envs
    "No scenes to load, multi-process logic relies on being able"
RuntimeError: No scenes to load, multi-process logic relies on being able to split scenes uniquely between processes

I traced the error back to habitat_extensions/task.py and found that self.episodes in the VLNCEDatasetV1 class was empty after initialization. The cause seems to be a data-type mismatch between the elements of config.EPISODES_ALLOWED and episode_id in self.episodes.

My temporary solution is converting the allowed episode ids to integers before passing them to construct_envs (lines 57-61 in vlnce_baselines/common/recollection_dataset.py):

        self.envs = construct_envs(
            config,
            get_env_class(config.ENV_NAME),
            episodes_allowed=[int(v) for v in self.trajectories.keys()],
        )

the code gets stuck in Deconstructing Simulator

❓ Questions and Help

After I successfully installed habitat-sim, habitat-lab, and VLN-CE following the README.md and ran the command python run.py --exp-config vlnce_baselines/config/rxr_configs/rxr_cma_en.yaml --run-type train, I got a multiprocessing error.

So I switched to ThreadedVectorEnv.

Now the code gets stuck at Deconstructing Simulator.

Two warnings are notable:

eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=1): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=2): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=3): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
Platform::WindowlessEglApplication: found 18 EGL devices, choosing EGL device 3 for CUDA device 0
Renderer: Tesla V100-SXM2-32GB/PCIe/SSE2 by NVIDIA Corporation
······················································
Debug output: API (131185): Buffer detailed info: Buffer object 2381 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.                               
Debug output: API (131185): Buffer detailed info: Buffer object 2382 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations. 
Debug output: API (131185): Buffer detailed info: Buffer object 2383 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations. 
······················································

Here is the complete log:

······················································
I0608 16:28:13.042002 62079 AbstractObjectAttributesManagerBase.h:181] AbstractObjectAttributesManager<T>::createObject  (Stage) : Done making attributes with handle : data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.glb
I0608 16:28:13.042045 62079 AbstractObjectAttributesManagerBase.h:188] File (data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.glb) exists but is not a recognized config filename extension, so new default Stage attributes created and registered.
I0608 16:28:13.042096 62079 Simulator.cpp:156] Loading navmesh from data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.navmesh
I0608 16:28:13.045042 62079 Simulator.cpp:158] Loaded.
I0608 16:28:13.045125 62079 SceneGraph.h:93] Created DrawableGroup:
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=0): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=1): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=2): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In internal function: Additional INFO may be available
eglQueryDeviceAttribEXT(): EGL_BAD_ATTRIBUTE error: In function eglQueryDeviceAttribEXT(), attribute query failed at backend
Platform::WindowlessEglApplication: eglQueryDeviceStringEXT(EGLDevice=3): EGL_NV_device_cuda EGL_EXT_device_drm EGL_EXT_device_query_name
Platform::WindowlessEglApplication: found 18 EGL devices, choosing EGL device 3 for CUDA device 0
Renderer: Tesla V100-SXM2-32GB/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 460.73.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-forward-compatible-core-context
    nv-egl-incorrect-gl11-function-pointers
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
GL::Context: enabling GPU validation
Debug output: API (131185): Buffer detailed info: Buffer object 1 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.
Debug output: API (131185): Buffer detailed info: Buffer object 2 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.
I0608 16:28:13.786864 62079 ResourceManager.cpp:234] ResourceManager::loadStage : Not loading semantic mesh
I0608 16:28:13.786923 62079 ResourceManager.cpp:262] ResourceManager::loadStage : start load render asset data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.glb.
I0608 16:28:13.786948 62079 ResourceManager.cpp:569] ResourceManager::loadStageInternal : Attempting to load stage data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.glb
I0608 16:28:13.787017 62079 ResourceManager.cpp:1119] Importing Basis files as BC7 for B6ByNegPMKs.glb
2021-06-08 16:28:14,250 Initializing dataset RxR-VLN-CE-v1
2021-06-08 16:28:14,735 Initializing dataset RxR-VLN-CE-v1
2021-06-08 16:28:14,842 Initializing dataset RxR-VLN-CE-v1
······················································
Debug output: API (131185): Buffer detailed info: Buffer object 2381 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.                               
Debug output: API (131185): Buffer detailed info: Buffer object 2382 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations. 
Debug output: API (131185): Buffer detailed info: Buffer object 2383 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations. 
Debug output: API (131185): Buffer detailed info: Buffer object 2384 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.  
Debug output: API (131185): Buffer detailed info: Buffer object 2385 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations. 
Debug output: API (131185): Buffer detailed info: Buffer object 2386 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.  
Debug output: API (131185): Buffer detailed info: Buffer object 2387 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.  
Debug output: API (131185): Buffer detailed info: Buffer object 2388 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.  
Debug output: API (131185): Buffer detailed info: Buffer object 2389 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.                               
Debug output: API (131185): Buffer detailed info: Buffer object 2390 (bound to NONE, usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.
······················································

W0608 16:14:55.194553 55726 Simulator.cpp:248] :                                                                                                                                                                                                                  [24/1824]
---
 The active scene does not contain semantic annotations.
---
I0608 16:14:55.279703 54757 simulator.py:213] Loaded navmesh data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.navmesh
I0608 16:14:55.280652 54757 simulator.py:225] Recomputing navmesh for agent's height 0.88 and radius 0.18.
I0608 16:14:55.290956 55647 PathFinder.cpp:382] Building navmesh with 1631x1204 cells
I0608 16:14:55.725460 54758 simulator.py:213] Loaded navmesh data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.navmesh
I0608 16:14:55.726246 54758 simulator.py:225] Recomputing navmesh for agent's height 0.88 and radius 0.18.
I0608 16:14:55.733539 55734 PathFinder.cpp:382] Building navmesh with 1631x1204 cells
I0608 16:14:55.839206 55735 PathFinder.cpp:650] Created navmesh with 3399 vertices 1683 polygons
I0608 16:14:55.841809 55735 Simulator.cpp:710] reconstruct navmesh successful
2021-06-08 16:14:55,883 Initializing task VLN-v0
I0608 16:14:56.118773 54756 simulator.py:213] Loaded navmesh data/scene_datasets/mp3d/B6ByNegPMKs/B6ByNegPMKs.navmesh
I0608 16:14:56.119809 54756 simulator.py:225] Recomputing navmesh for agent's height 0.88 and radius 0.18.
I0608 16:14:56.130873 55726 PathFinder.cpp:382] Building navmesh with 1631x1204 cells
I0608 16:14:56.342190 55647 PathFinder.cpp:650] Created navmesh with 3399 vertices 1683 polygons
I0608 16:14:56.345774 55647 Simulator.cpp:710] reconstruct navmesh successful
2021-06-08 16:14:56,349 Initializing task VLN-v0
I0608 16:14:56.457960 55734 PathFinder.cpp:650] Created navmesh with 3399 vertices 1683 polygons
I0608 16:14:56.460407 55734 Simulator.cpp:710] reconstruct navmesh successful
2021-06-08 16:14:56,463 Initializing task VLN-v0
I0608 16:14:57.256770 55726 PathFinder.cpp:650] Created navmesh with 3399 vertices 1683 polygons
I0608 16:14:57.260226 55726 Simulator.cpp:710] reconstruct navmesh successful
2021-06-08 16:14:57,264 Initializing task VLN-v0
2021-06-08 16:15:18,202 Resizing observation of depth: from (480, 640) to (256, 341)
2021-06-08 16:15:18,204 Resizing observation of depth: from (480, 640) to (256, 341)
2021-06-08 16:15:18,554 Resizing observation of rgb: from (480, 640) to (256, 341)
2021-06-08 16:15:18,554 Resizing observation of rgb: from (480, 640) to (256, 341)
2021-06-08 16:15:18,667 Center cropping observation size of depth from (256, 341) to (256, 256)
2021-06-08 16:15:18,667 Center cropping observation size of depth from (256, 341) to (256, 256)
2021-06-08 16:15:18,668 Center cropping observation size of rgb from (256, 341) to (224, 224)
2021-06-08 16:15:18,668 Center cropping observation size of rgb from (256, 341) to (224, 224)
I0608 16:15:20.204214 55735 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I0608 16:15:20.204249 55734 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I0608 16:15:20.207165 55735 SemanticScene.h:41] Deconstructing SemanticScene
I0608 16:15:20.207171 55734 SemanticScene.h:41] Deconstructing SemanticScene
I0608 16:15:20.222903 55735 SceneManager.h:25] Deconstructing SceneManager
I0608 16:15:20.222925 55735 SceneGraph.h:26] Deconstructing SceneGraph
I0608 16:15:20.224483 55735 Sensor.h:87] Deconstructing Sensor
I0608 16:15:20.225446 55735 Sensor.h:87] Deconstructing Sensor
I0608 16:15:20.225834 55734 SceneManager.h:25] Deconstructing SceneManager
I0608 16:15:20.225854 55734 SceneGraph.h:26] Deconstructing SceneGraph
I0608 16:15:20.227872 55734 Sensor.h:87] Deconstructing Sensor
I0608 16:15:20.228914 55734 Sensor.h:87] Deconstructing Sensor
I0608 16:15:20.262079 55735 Renderer.cpp:34] Deconstructing Renderer
I0608 16:15:20.262105 55735 WindowlessContext.h:17] Deconstructing WindowlessContext
I0608 16:15:20.270624 55734 Renderer.cpp:34] Deconstructing Renderer
I0608 16:15:20.270648 55734 WindowlessContext.h:17] Deconstructing WindowlessContext
I0608 16:15:21.956271 55735 Simulator.cpp:49] Deconstructing Simulator
I0608 16:15:22.450003 55734 Simulator.cpp:49] Deconstructing Simulator
2021-06-08 16:15:23,076 Resizing observation of depth: from (480, 640) to (256, 341)
2021-06-08 16:15:23,077 Resizing observation of rgb: from (480, 640) to (256, 341)
2021-06-08 16:15:23,081 Center cropping observation size of depth from (256, 341) to (256, 256)
2021-06-08 16:15:23,081 Center cropping observation size of rgb from (256, 341) to (224, 224)
2021-06-08 16:15:23,157 Resizing observation of depth: from (480, 640) to (256, 341)
2021-06-08 16:15:23,159 Resizing observation of rgb: from (480, 640) to (256, 341)
2021-06-08 16:15:23,162 Center cropping observation size of depth from (256, 341) to (256, 256)
2021-06-08 16:15:23,163 Center cropping observation size of rgb from (256, 341) to (224, 224)
I0608 16:15:23.617830 55726 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I0608 16:15:23.617887 55726 SemanticScene.h:41] Deconstructing SemanticScene
I0608 16:15:23.633332 55726 SceneManager.h:25] Deconstructing SceneManager
I0608 16:15:23.633350 55726 SceneGraph.h:26] Deconstructing SceneGraph
I0608 16:15:23.634776 55726 Sensor.h:87] Deconstructing Sensor
I0608 16:15:23.635766 55726 Sensor.h:87] Deconstructing Sensor
I0608 16:15:23.665839 55726 Renderer.cpp:34] Deconstructing Renderer
I0608 16:15:23.665858 55726 WindowlessContext.h:17] Deconstructing WindowlessContext
I0608 16:15:23.704097 55647 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I0608 16:15:23.704145 55647 SemanticScene.h:41] Deconstructing SemanticScene
I0608 16:15:23.719182 55647 SceneManager.h:25] Deconstructing SceneManager
I0608 16:15:23.719203 55647 SceneGraph.h:26] Deconstructing SceneGraph
I0608 16:15:23.720651 55647 Sensor.h:87] Deconstructing Sensor
I0608 16:15:23.721441 55647 Sensor.h:87] Deconstructing Sensor
I0608 16:15:23.752156 55647 Renderer.cpp:34] Deconstructing Renderer
I0608 16:15:23.752175 55647 WindowlessContext.h:17] Deconstructing WindowlessContext
I0608 16:15:25.311139 55726 Simulator.cpp:49] Deconstructing Simulator
I0608 16:15:25.460878 55647 Simulator.cpp:49] Deconstructing Simulator

Evaluation takes 5 hours for one checkpoint.

So I trained a basic seq2seq model, resulting in 15 checkpoints named ckpt.{num}.pth. I modified the NUM_PROCESSES field to 4, and training takes about one day on a GTX 2080Ti graphics card (I used headless habitat v0.1.5).

However, when I run the eval script, it takes about 5 hours to evaluate one checkpoint. Is this normal?

And what is the point of evaluating all the checkpoints (starting from the earliest one), by the way?

[RxR-Habitat] About dagger_trainer.py

dagger

I was using the network to generate the episodes stored in \data\trajectories_dirs\...\trajectories.lmdb
when expert_uuid = self.config.IL.DAGGER.expert_policy_sensor_uuid = shortest_path_sensor.

I have a few questions:
  1. Which function in habitat defines shortest_path_sensor?
  2. What is the output of the shortest_path_sensor?
  3. What is the principle of operation of the shortest_path_sensor?
  4. How is the shortest_path_sensor used to judge that a trajectory is the shortest?

Sincerely!
Thanks!
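For background (not necessarily how the DAgger trainer wires it up), habitat-lab ships a greedy geodesic expert, ShortestPathFollower, which returns the next discrete action toward a goal position; a minimal sketch of its use, where env and goal_position are assumed to already exist in your setup:

from habitat.tasks.nav.shortest_path_follower import ShortestPathFollower

# `env` is an already-constructed habitat Env; `goal_position` is a 3D point on the navmesh.
follower = ShortestPathFollower(env.sim, goal_radius=0.5, return_one_hot=False)

observations = env.reset()
while not env.episode_over:
    action = follower.get_next_action(goal_position)
    if action is None:  # some habitat versions signal arrival this way
        break
    observations = env.step(action)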

RGB image quality

Hi Jacob,

I found that the image quality rendered by Habitat seems to be inferior to that of the MP3D simulator. Do you have any suggestions for improving the image quality? Thank you!

Here is a comparison of the two simulators. I set WIDTH=512, HEIGHT=512, HFOV=90, and saved the image in PNG format in Habitat:
[image: RGB frame rendered by Habitat]
While the MP3D simulator looks better:
[image: corresponding RGB frame from the MP3D simulator]

The ddppo download link is broken.

"Baseline models encode depth observations using a ResNet pre-trained on PointGoal navigation. Those weights can be downloaded from here (672M). Extract the contents to data/ddppo-models/{model}.pth." The ddppo download link is broken.

[RxR-Habitat] Using pose_trace data in RxR

Hello,

In the RxR dataset (which is based on a graph environment), they provide 'pose_trace' data, which includes a 'text_mask' indicating which words the agent saw at a specific viewpoint. I am curious whether the 'instruction_id' in RxR exactly matches the 'instruction_id' in RxR-CE, which would enable us to use the 'pose_trace' data with your RxR-CE dataset.

How to get panoramas?

Hi, I find that when I use env.observation I only get one image. However, the CMA model should use panoramas, right? Does anyone know how to get the panoramas?
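One way people approximate a panoramic observation in Habitat (a sketch, not how this repo's CMA baseline consumes observations) is to re-render at the agent's current position under several headings with the simulator's get_observations_at; the quaternion coefficient order below is an assumption to check against your habitat-lab version:

import math

import numpy as np

def panorama_at_current_pose(env, num_views=12):
    """Collect RGB frames around the agent by re-rendering at rotated headings."""
    state = env.sim.get_agent_state()
    frames = []
    for i in range(num_views):
        theta = 2 * math.pi * i / num_views
        # Rotation about the vertical (y) axis by theta, as [x, y, z, w] coefficients.
        rotation = [0.0, math.sin(theta / 2), 0.0, math.cos(theta / 2)]
        obs = env.sim.get_observations_at(
            position=state.position, rotation=rotation, keep_agent_at_new_pose=False
        )
        frames.append(obs["rgb"])
    return np.stack(frames)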
