Giter VIP home page Giter VIP logo

bm-nas's Introduction

BM-NAS: Bilevel Multimodal Neural Architecture Search (AAAI 2022 Oral)

Yihang Yin, Siyu Huang, Xiang Zhang

Paper, Poster and Presentation

framework

Please check our arXiv version here for the full paper with supplementary. We also provide our poster in this repo. Our oral presentation video at AAAI-2022 can be found on YouTube, for both the brief introduction and the full presentation.

Requirements

The latest tested versions are:

pytorch==1.10.1
opencv-python==4.5.5.62
sklearn==1.10.1
tqdm
IPython
graphviz (you need excutebles, not only Python API)

Pre-trained Backbones and Pre-processed Datasets

The backbones (checkpoints) and pre-processed datasets (BM-NAS_dataset) are available at here, you can download them and put them in the root directory. For some weird reason, ego checkpoints are blocked by AliCloud, There is an alternative Google Drive link for checkpoints at here.

MM-IMDB Experiments

You can just use our pre-processed dataset, but you should cite the original MM-IMDB dataset.

Dataset Pre-processing

If you want to use the original one, you can follow these steps.

You can download multimodal_imdb.hdf5 from the original repo of MM-IMDB. Then use our pre-processing script to split the dataset.

$ python datasets/prepare_mmimdb.py

Run Experiments

$ python main_darts_searchable_mmimdb.py
$ python main_darts_found_mmimdb.py --search_exp_dir=<dir of search exp>

NTU RGB-D Experiments

You can just use our pre-processed dataset, but you should cite the original NTU RGB-D dataset.

Dataset Pre-processing

If you want to use the original one, you can follow these steps.

First, request and download the NTU RGB+D dataset (not NTU RGB+D 120) from the official site. We only use the 3D skeletons (body joints) and RGB videos modality.

Then, run the following script to reshape all RGB videos to 256x256 with 30 fps:

$ python datasets/prepare_ntu.py --dir=<dir of RGB videos>

Run Experiments

First search the hypernets. You can use --parallel for data-parallel. The default setting will require about 128GB of GPU memeroy, you may adjust the --batchsize according to your budget.

$ python main_darts_searchable_ntu.py --parallel

Then train the searched fusion network. You need to assign the searching experiment by --search_exp_dir.

$ python main_darts_found_ntu.py --search_exp_dir=<dir of search exp>

If you want to just run the test process (no training of the fusion network), you can also use this script, you need to assign both the searching and evaluation experiments directories.

$ python main_darts_found_ntu.py --search_exp_dir=<dir of search exp> --eval_exp_dir=<dir of eval exp>

EgoGesture Experiments

First search the hypernets. You can use --parallel for data-parallel. You may adjust the --batchsize according to your GPU memory budget.

$ python main_darts_searchable_ego.py --parallel

Then train the searched fusion network. You need to assign the searching experiment by --search_exp_dir.

$ python main_darts_found_ego.py --search_exp_dir=<dir of search exp>

If you want to just run the test process (no training of the fusion network), you can also use this script, you need to assign both the searching and evaluation experiments directories.

$ python main_darts_found_ego.py --search_exp_dir=<dir of search exp> --eval_exp_dir=<dir of eval exp>

Visualization

You can use structure_vis.ipynb to visualize the searched genotypes.

structure_vis_example

Citation

If you find this work helpful, please kindly cite our paper.

@article{yin2021bm,
  title={BM-NAS: Bilevel Multimodal Neural Architecture Search},
  author={Yin, Yihang and Huang, Siyu and Zhang, Xiang},
  journal={arXiv preprint arXiv:2104.09379},
  year={2021}
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.