
mlvr's Introduction

Multimodal Language Vehicle Retrieval (MLVR)

This repository contains the code for our paper for the 7th AI City Challenge Track 2, Tracked-Vehicle Retrieval by Natural Language Descriptions, which ranked 2nd on the public leaderboard.

A Unified Multi-modal Structure for Retrieving Tracked Vehicles through Natural Language Descriptions

Introduction

Driven by advances in multi-modal and contrastive learning, image and video retrieval have made immense progress in recent years. Organically fused text, image, and video knowledge opens up substantial opportunities for multi-dimensional and multi-view retrieval, especially in traffic scenes. This paper proposes a novel Multi-modal Language Vehicle Retrieval (MLVR) system for retrieving the trajectories of tracked vehicles based on natural language descriptions. The MLVR system combines an end-to-end text-video contrastive learning model, a CLIP few-shot domain adaptation method, and a semi-centralized control optimization system. By jointly modeling the vehicle type, color, maneuver, and surrounding environment, MLVR forms a robust method for recognizing the matching trajectory from a provided natural language description. With this structure, our approach achieved a Mean Reciprocal Rank (MRR) of 81.79% on the test dataset in the 7th AI City Challenge Track 2, Tracked-Vehicle Retrieval by Natural Language Descriptions, ranking 2nd on the public leaderboard.
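For readers unfamiliar with the metric, here is a minimal sketch of how Mean Reciprocal Rank can be computed. It is not taken from the challenge's evaluation code. The function name, the query and track IDs, and the assumption of exactly one correct track per query are illustrative only.

```python
def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the correct track over all queries.

    ranked_lists: dict mapping query id -> list of track ids, best first.
    ground_truth: dict mapping query id -> the single correct track id
                  (an assumption made for this sketch).
    """
    total = 0.0
    for query_id, ranked_ids in ranked_lists.items():
        target = ground_truth[query_id]
        if target in ranked_ids:
            # Ranks are 1-based, so the first position contributes 1.0.
            total += 1.0 / (ranked_ids.index(target) + 1)
        # A query whose target is missing contributes 0.
    return total / len(ranked_lists)


# Hypothetical toy data: three queries, three candidate tracks each.
results = {
    "q1": ["t3", "t1", "t2"],  # correct track at rank 2
    "q2": ["t5", "t4", "t6"],  # correct track at rank 1
    "q3": ["t9", "t8", "t7"],  # correct track at rank 3
}
truth = {"q1": "t1", "q2": "t5", "q3": "t7"}

mrr = mean_reciprocal_rank(results, truth)  # (1/2 + 1 + 1/3) / 3
print(f"MRR = {mrr:.4f}")
```

An MRR of 81.79% therefore means that, averaged over queries, the reciprocal of the rank at which the correct track appears is about 0.82.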

Requirements

pip install -r requirements.txt

Structure

MLVR
├── data                   # put aicity2023 track 2 data
├── docs                   # pictures and paper
├── preprocessing          # process the data for model
├── model                  # modules for MLVR                 
│   ├── vrm                # Video Recognition Module
│   ├── vct                # Vehicle Color and Type Modules
│   ├── vmm                # Vehicle Motion Module
│   └── vsm                # Vehicle Surrounding Module
├── postprocessing         # Model Postprocessing
│   ├── matrix             # vrm, vct, vmm, vsm score matrices
│   └── final_results.json # submitted result (81.79% MRR)
├── requirements.txt
└── README.md

Running

Preprocessing

  1. Get images from the video
cd ./preprocessing
python extract_vdo_frms.py
  2. Get background of the images
python generate_median.py
  3. Generate the video clip for video recognition module
python create_video_clip.py
  4. Format the text input for video recognition module
python create_vrm_data.py
  5. Crop the vehicle images for vehicle color and type modules
python crop_vehicle_bbox.py
  6. Format the text input for vehicle color and type modules
python create_vct_data.py
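The background step is run through generate_median.py, whose contents are not shown here. As a rough illustration of the general idea only, a per-pixel median over frames from a fixed camera suppresses moving vehicles and keeps the static scene. The function name and the synthetic frames below are assumptions for this sketch, not the repository's implementation.

```python
import numpy as np


def median_background(frames):
    """Estimate a static background as the per-pixel median of the frames.

    frames: list of H x W x 3 uint8 arrays from the same fixed camera.
    """
    stack = np.stack(frames, axis=0)  # shape: (num_frames, H, W, 3)
    return np.median(stack, axis=0).astype(np.uint8)


# Synthetic example: a uniform gray scene with a bright "vehicle" pixel
# that moves to a different row in most frames.
static_scene = np.full((4, 4, 3), 100, dtype=np.uint8)
frames = []
for i in range(5):
    frame = static_scene.copy()
    frame[i % 4, 0] = 255  # the moving object
    frames.append(frame)

background = median_background(frames)
print(background[:, 0, 0])  # the moving object is filtered out
```

The median is robust here because each pixel is covered by the moving object in fewer than half of the frames.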

Model

  1. Video Recognition Module (baseline)

    This part is modified from X-CLIP.

    Please download the pretrained model here for testing, and put it in ./model/vrm/ckpts/.

cd ./model/vrm
sh ./scripts/train.sh # train
sh ./scripts/test.sh  # test
  2. Vehicle Color and Type Modules

    This part is modified from Tip-Adapter.

cd ./model/vct
python train.py --config vehicle_color_train.yaml  # vehicle color module train
python test.py --config vehicle_color_test.yaml  # vehicle color module test

python train.py --config vehicle_type_train.yaml  # vehicle type module train
python test.py --config vehicle_type_test.yaml  # vehicle type module test
  3. Vehicle Motion Module
cd ./model/vmm
python main.py # vehicle motion module
  4. Vehicle Surrounding Module

    This part is modified from GLIP.

cd ./model/vsm/branch1
python vsm1.py # vehicle surrounding module branch 1

cd ./model/vsm/branch2
python get_candidates.py # vehicle surrounding module branch 2

Postprocessing

Run the following commands to generate the final submission result (81.79% MRR).

cd ./postprocessing
python mcs.py # match control system
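The logic inside mcs.py is not reproduced in this README. Purely as an illustration of how several module score matrices could be merged into one ranking, the sketch below uses a weighted sum. The weighted-sum rule, the weights, the function names, and the assumption that rows are queries and columns are tracks are all hypothetical and are not the authors' match control system.

```python
import json

import numpy as np


def fuse_scores(matrices, weights):
    """Weighted sum of same-shaped (num_queries x num_tracks) score matrices."""
    fused = None
    for name, matrix in matrices.items():
        term = weights.get(name, 1.0) * matrix
        fused = term if fused is None else fused + term
    return fused


def rank_tracks(fused, query_ids, track_ids):
    """For each query, list track ids from highest to lowest fused score."""
    order = np.argsort(-fused, axis=1)
    return {q: [track_ids[j] for j in order[i]] for i, q in enumerate(query_ids)}


# Toy stand-ins for matrices that would normally be loaded with np.load.
matrices = {
    "baseline": np.array([[0.9, 0.1, 0.3], [0.2, 0.5, 0.4]]),
    "vcm": np.array([[0.0, 0.0, 0.2], [0.0, 0.0, 0.3]]),
}
weights = {"baseline": 1.0, "vcm": 0.5}  # hypothetical weights

fused = fuse_scores(matrices, weights)
ranking = rank_tracks(fused, ["q0", "q1"], ["t0", "t1", "t2"])
print(json.dumps(ranking, indent=2))
```

In this toy case the color scores lift track t2 above t1 for both queries, which shows how an auxiliary module can reorder the baseline ranking.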

mlvr's People

Contributors

eadst

mlvr's Issues

crop_vehicle_bbox.py

Hi Den:

Regarding these lines:
path = "C:/MLVR-main/data/test-tracks.json"
get_imgs("C:/MLVR-main/data/train/S01/c001/img1")
The get_imgs call needs the image path, right?

I ask because I get the following error:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.2.2\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/MLVR-main/preprocessing/crop_vehicle_bbox.py", line 23, in <module>
get_imgs("C:/MLVR-main/data/train/S01/c001/img1")
File "C:/MLVR-main/preprocessing/crop_vehicle_bbox.py", line 7, in get_imgs
with open(path, "r") as f:
PermissionError: [Errno 13] Permission denied: 'C:/MLVR-main/data/train/S01/c001/img1'

thank you

create_vrm

with open('./aicity23/xclip/dataset/aicity/raw-captions.pkl', 'wb') as handle:
    pickle.dump(aicity_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open('./aicity23/xclip/dataset/aicity/train_list.txt', 'w') as f:
    f.write(train_list)
with open('./aicity23/xclip/dataset/aicity/val_list.txt', 'w') as f:
    f.write(val_list)
with open('./aicity23/xclip/dataset/aicity/test_list.txt', 'w') as f:
    f.write(test_list)

/postprocessing/mcs.py

Hi Den:
Regarding the postprocessing section:
path_dict = {"baseline": "./matrix/vrm.npy",  # video recognition module
             "vcm": "./matrix/vcm.npy",       # vehicle color module
             "vtm": "./matrix/vtm.npy",       # vehicle type module
             "vmm": "./matrix/vmm.npy",       # vehicle motion module
             "vsm1": "./matrix/vsm1.npy",     # vehicle surrounding module branch 1
             "vsm2": "./matrix/vsm2.npy",     # vehicle surrounding module branch 2
             }
You use these npy files to create the final result JSON. Could you briefly explain the key steps for creating these npy files, since the repository only points to others' reference code and not your modified version?
Thank you. @eadst

question about run vrm

Hi sir:
Thank you for a job well done.
When I run vrm again, the following error occurs. Do you have a solution?

FileNotFoundError: [Errno 2] No such file or directory: '/2t_disk/fly/MLVR/model/vrm/modules/bpe_simple_vocab_16e6.txt.gz'
