
envbiasvln's People

Contributors

zhangybzbo

Forkers

yangsikai wdqin

envbiasvln's Issues

Need a bit of clarification about reproducing the results mentioned in the repository and the env-bias paper

Hi,

Thank you very much for sharing the features used in the paper! I am currently trying to reproduce the results reported in the env-bias paper and in this repository. I used the original R2R-EnvDrop code with ResNet-152-imagenet.tsv to train the ResNet agent, and the features provided in this repository, namely GT-Seg.tsv together with the modify.py code, to train the GT-Seg agent.
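For anyone checking their setup, the sketch below shows roughly how EnvDrop-style code reads such a .tsv feature file. This is a minimal sketch assuming the standard 36-view panorama layout and the 2048-dim ResNet-152 features; the GT-Seg features may use a different dimension, so treat feat_dim as an assumption to adjust:

import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)

# Standard columns in the R2R image-feature .tsv files (one row per panorama).
TSV_FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']

def load_features(tsv_path, views=36, feat_dim=2048):
    # Returns {scanId_viewpointId: (views, feat_dim) float32 array}.
    # feat_dim=2048 matches ResNet-152; adjust for GT-Seg features.
    features = {}
    with open(tsv_path, 'r') as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=TSV_FIELDNAMES)
        for item in reader:
            key = item['scanId'] + '_' + item['viewpointId']
            features[key] = np.frombuffer(
                base64.b64decode(item['features']),
                dtype=np.float32).reshape((views, feat_dim))
    return features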

The training script I used was:

name=agent
flag="--attn soft --train listener
      --featdropout 0.3
      --angleFeatSize 128
      --feedback sample
      --mlWeight 0.2
      --subout max --dropout 0.5 --optim rms --lr 1e-4
      --iters 80000 --maxAction 35"
mkdir -p snap/$name
CUDA_VISIBLE_DEVICES=$1 python3 r2r_src/train.py $flag --name $name

which I believe is the same as the original one provided in R2R-EnvDrop for training the agent module. After the two models were trained, I used the model that performed best on the val-unseen split for evaluation; the evaluation script was:

name=ResNet_result
flag="--attn soft --train validlistener
      --load snap/agent/state_dict/best_val_unseen_ResNet
      --angleFeatSize 128
      --featdropout 0.4
      --subout max --maxAction 35"
mkdir -p snap/$name
CUDA_VISIBLE_DEVICES=$1 python3 r2r_src/train.py $flag --name $name | tee snap/$name/log

(Of course, this is just the ResNet one; the 'name' and '--load' values are changed accordingly for the GT_Seg one.)

I then got the following two evaluation results:

ResNet:
Env name: val_unseen, nav_error: 5.8181, oracle_error: 3.8605, steps: 25.7867, lengths: 9.9428, success_rate: 0.4585, oracle_rate: 0.5338, spl: 0.4219
Env name: val_seen, nav_error: 4.5834, oracle_error: 2.8824, steps: 26.7630, lengths: 10.6334, success_rate: 0.5749, oracle_rate: 0.6552, spl: 0.5432
Env name: train, nav_error: 0.3455, oracle_error: 0.2661, steps: 25.4575, lengths: 9.9808, success_rate: 0.9736, oracle_rate: 0.9830, spl: 0.9609
GT_Seg:
Env name: val_unseen, nav_error: 4.7119, oracle_error: 2.8192, steps: 55.6573, lengths: 20.2547, success_rate: 0.5534, oracle_rate: 0.6705, spl: 0.4717
Env name: val_seen, nav_error: 4.7694, oracle_error: 2.9660, steps: 47.3888, lengths: 17.1864, success_rate: 0.5397, oracle_rate: 0.6494, spl: 0.4792
Env name: train, nav_error: 1.1375, oracle_error: 0.8062, steps: 31.0214, lengths: 11.8337, success_rate: 0.8880, oracle_rate: 0.9252, spl: 0.8499
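For reference: success_rate and spl in these logs presumably follow the standard R2R definitions, where an episode succeeds when nav_error is under 3 m and SPL weights success by the ratio of shortest-path length to the length actually traveled (per Anderson et al., "On Evaluation of Embodied Navigation Agents"). A minimal sketch, assuming per-episode distances in meters:

import numpy as np

def r2r_metrics(nav_errors, path_lengths, shortest_lengths, threshold=3.0):
    # nav_errors: final distance to goal per episode (m)
    # path_lengths: length of the path the agent actually took (m)
    # shortest_lengths: shortest-path distance from start to goal (m)
    nav_errors = np.asarray(nav_errors, dtype=np.float64)
    path_lengths = np.asarray(path_lengths, dtype=np.float64)
    shortest_lengths = np.asarray(shortest_lengths, dtype=np.float64)

    success = nav_errors < threshold   # per-episode success indicator
    sr = success.mean()                # success_rate
    spl = (success * shortest_lengths
           / np.maximum(path_lengths, shortest_lengths)).mean()
    return sr, spl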

Based on these SR results, I would like to ask two questions:

  1. Are these the correct evaluation results, i.e., what we are supposed to see when evaluating the models with the provided features?
  2. The reproduced ResNet results look close to those reported both in your paper (Table 4) and in the README of this repository. However, the GT_Seg results (55% and 53%) seem slightly worse than those reported in the paper (again Table 4: 55% and 56%) and noticeably better than those in the README (48% and 48%). It is also easy to see that the paper's results (55% and 56%) differ from the README's (48% and 48%). What causes the difference? Were they run under different settings?

Your clarification will be highly appreciated, thank you very much!

Illustrating the nav-graph and the R2R env

I've read your paper "Diagnosing the Environment Bias in Vision-and-Language Navigation". Could you give me some advice on illustrating the navigation graph like Figure 1 and Figure 3 in your paper?
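One common way to draw such a graph is to read the Matterport3D connectivity files and plot them top-down with networkx and matplotlib. A minimal sketch, assuming the standard {scan}_connectivity.json format from the Matterport3DSimulator repository (pose is a row-major 4x4 matrix, so x and y sit at indices 3 and 7) and an example scan id; this is only an illustration, not the paper's actual figure code:

import json

import matplotlib.pyplot as plt
import networkx as nx

def load_nav_graph(conn_file):
    # Build a top-down navigation graph from a Matterport3D
    # {scan}_connectivity.json file (Matterport3DSimulator format).
    with open(conn_file) as f:
        data = json.load(f)
    G = nx.Graph()
    for i, item in enumerate(data):
        if not item['included']:
            continue
        # Node position: x, y from the viewpoint's 4x4 pose matrix.
        G.add_node(item['image_id'], pos=(item['pose'][3], item['pose'][7]))
        for j, connected in enumerate(item['unobstructed']):
            if connected and data[j]['included']:
                G.add_edge(item['image_id'], data[j]['image_id'])
    return G

# Example scan id; replace with the scan you want to visualize.
G = load_nav_graph('connectivity/17DRP5sb8fy_connectivity.json')
pos = nx.get_node_attributes(G, 'pos')
nx.draw(G, pos, node_size=30, width=0.5)
plt.savefig('nav_graph.png', dpi=200)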

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.