
vdo_slam_modified

In dynamic environments, instead of simply discarding the feature points on moving objects, VDO_SLAM combines image segmentation, optical flow estimation, and depth estimation: besides the camera pose, it also estimates and tracks the poses of the dynamic objects.

Bilibili video

example

Preparation

  1. Download the KITTI test data
    Data: https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_drive_0047/2011_10_03_drive_0047_sync.zip
    Calibration parameters: https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_calib.zip
  2. Generate the optical flow estimates following VideoFlow
  3. Generate the depth maps following ZoeDepth
  4. Download a YOLOv8 segmentation ONNX model (e.g. yolov8 seg small) from yolov8 seg onnx and place it in the current directory
  5. Install opencv-4.8 and pangolin-0.6, and extract onnxruntime-linux-x64-1.16.3.tgz

Run

./example/vdo_slam example/kitti_10_03.yaml /home/spurs/dataset/kitti_raw/2011_10_03/2011_10_03_drive_0047_sync/image_02

Related papers


Official README (click to expand)

VDO-SLAM

Authors: Jun Zhang*, Mina Henein*, Robert Mahony and Viorela Ila (*equally contributed)

VDO-SLAM is a visual object-aware dynamic SLAM library for RGB-D cameras. It is able to track dynamic objects, estimate the camera poses along with the static and dynamic structure and the full SE(3) pose change of every rigid object in the scene, and extract velocity information, and it has been demonstrated in real-world outdoor scenarios. We provide examples to run the SLAM system on the KITTI Tracking Dataset and on the Oxford Multi-motion Dataset.


Click HERE to watch a demo video.

1. License

VDO-SLAM is released under a GPLv3 license. For a list of all code/library dependencies (and associated licenses), please see Dependencies.md.

If you use VDO-SLAM in an academic work, please cite:

@article{zhang2020vdoslam,
  title={{VDO-SLAM: A Visual Dynamic Object-aware SLAM System}},
  author={Zhang, Jun and Henein, Mina and Mahony, Robert and Ila, Viorela},
  year={2020},
  eprint={2005.11052},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
 }

Related Publications:

  • VDO-SLAM: A Visual Dynamic Object-aware SLAM System
    Jun Zhang*, Mina Henein*, Robert Mahony and Viorela Ila. ArXiv:2005.11052. [ArXiv/PDF] [Code] [Video] [BibTex] NOTE: an updated version of the manuscript is available; please check the ArXiv link above (Dec 2021).
  • Robust Ego and Object 6-DoF Motion Estimation and Tracking
    Jun Zhang, Mina Henein, Robert Mahony and Viorela Ila. The IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS 2020. [ArXiv/PDF] [BibTex]
  • Dynamic SLAM: The Need For Speed
    Mina Henein, Jun Zhang, Robert Mahony and Viorela Ila. The International Conference on Robotics and Automation. ICRA 2020. [ArXiv/PDF] [BibTex]

2. Prerequisites

We have tested the library on Mac OS X 10.14 and Ubuntu 16.04, but it should be easy to compile on other platforms.

C++11, gcc and clang

We use some C++11 functionality. The tested gcc version is 9.2.1 (Ubuntu) and the tested clang version is 1000.11.45.5 (Mac).

OpenCV

We use OpenCV to manipulate images and features. Download and install instructions can be found at: http://opencv.org. At least version 3.0 is required; tested with OpenCV 3.4.

Eigen3

Required by g2o (see below). Download and install instructions can be found at: http://eigen.tuxfamily.org. At least version 3.1.0 is required.

g2o (Included in dependencies folder)

We use a modified version of the g2o library to perform non-linear optimization. The modified library (which is BSD-licensed) is included in the dependencies folder.

Use Dockerfile for auto installation

For Ubuntu users, a Dockerfile is provided that automatically installs all dependencies for a reproducible environment; it has been built and tested with the KITTI dataset. (Thanks @satyajitghana for the contributions 👍 )

3. Building VDO-SLAM Library

Clone the repository:

git clone https://github.com/halajun/VDO_SLAM.git VDO-SLAM

We provide a script build.sh to build the dependency libraries and VDO-SLAM. Please make sure you have installed all required dependencies (see Section 2). Please also set the library file suffix in the main CMakeLists.txt, i.e., '.dylib' for Mac (default) or '.so' for Ubuntu. Then execute:

cd VDO-SLAM
chmod +x build.sh
./build.sh

This will create

  1. libObjSLAM.dylib (Mac) or libObjSLAM.so (Ubuntu) in the lib folder,

  2. libg2o.dylib (Mac) or libg2o.so (Ubuntu) in the dependencies/g2o/lib folder,

  3. and the executable vdo_slam in the example folder.

4. Running Examples

KITTI Tracking Dataset

  1. Download the demo sequence: kitti_demo, and uncompress it.

  2. Execute the following command.

./example/vdo_slam example/kitti-0000-0013.yaml PATH_TO_KITTI_SEQUENCE_DATA_FOLDER

Oxford Multi-motion Dataset

  1. Download the demo sequence: omd_demo, and uncompress it.

  2. Execute the following command.

./example/vdo_slam example/omd.yaml PATH_TO_OMD_SEQUENCE_DATA_FOLDER

5. Processing Your Own Data

You will need to create a settings (yaml) file with the calibration of your camera. See the settings files provided in the example/ folder. RGB-D input must be synchronized and depth registered. A list of timestamps for the images is needed for input.

The system also requires pre-processed images as input, namely instance-level semantic segmentation and optical flow estimation. In our experiments, we used Mask R-CNN for instance segmentation (for KITTI only; we applied a colour-based method to segment cuboids in OMD, see the MATLAB code in the tools folder) and PWC-Net (PyTorch version) for optical flow estimation. Other state-of-the-art methods can also be applied instead for better performance.

For evaluation purposes, ground truth camera poses and object poses are also needed as input. Details of the input formats are as follows.

Input Data Pre-processing

  1. The input segmentation mask is saved as a matrix, the same size as the image, in a .txt file. Each element of the matrix is an integer, where 0 stands for background and 1,2,...,n stand for different instance labels (see the loading sketch after this list). Note that, to easily compare with the ground truth object motions in the KITTI dataset, we align the estimated mask labels with the ground truth labels. The .txt file generation (from .mask) and the alignment code are in the tools folder.

  2. The optical flow input is a standard .flo file that can be read and processed directly using OpenCV.
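
To make these two input formats concrete, here is a minimal C++ loading sketch. It is not part of the library, and the file names and image size are made up for illustration:

// Minimal loading sketch (hypothetical file names and image size).
#include <fstream>
#include <string>
#include <opencv2/core.hpp>
#include <opencv2/video/tracking.hpp>  // cv::readOpticalFlow (OpenCV >= 4; OpenCV 3.x
                                       // has cv::optflow::readOpticalFlow in opencv_contrib)

// Read a whitespace-separated integer matrix of size rows x cols,
// where 0 = background and 1..n = instance labels.
cv::Mat ReadMaskTxt(const std::string& path, int rows, int cols) {
    cv::Mat mask(rows, cols, CV_32SC1);
    std::ifstream in(path);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            in >> mask.at<int>(r, c);
    return mask;
}

int main() {
    cv::Mat mask = ReadMaskTxt("semantic/000000.txt", 375, 1242);  // KITTI-sized image
    cv::Mat flow = cv::readOpticalFlow("flow/000000.flo");         // CV_32FC2, (dx, dy) per pixel
    return (mask.empty() || flow.empty()) ? 1 : 0;
}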

Ground Truth Input for Evaluation

  1. The ground truth camera pose input is saved as a .txt file. Each row is organized as follows:
FrameID R11 R12 R13 t1 R21 R22 R23 t2 R31 R32 R33 t3 0 0 0 1

Here Rij are the coefficients of the camera rotation matrix R and ti are the coefficients of the camera translation vector t.
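
A minimal parsing sketch for such a row (a hypothetical helper, not part of the library) might look like this:

// Hypothetical helper: parse one ground truth camera pose row into a 4x4 matrix.
// Row format: FrameID R11 R12 R13 t1 R21 R22 R23 t2 R31 R32 R33 t3 0 0 0 1
#include <sstream>
#include <string>
#include <opencv2/core.hpp>

bool ParseCameraPoseRow(const std::string& line, int& frame_id, cv::Mat& pose) {
    std::istringstream ss(line);
    if (!(ss >> frame_id)) return false;
    pose = cv::Mat::eye(4, 4, CV_64F);
    for (int r = 0; r < 4; ++r)          // the 16 values fill the 4x4 pose row by row
        for (int c = 0; c < 4; ++c)
            if (!(ss >> pose.at<double>(r, c))) return false;
    return true;
}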

  2. The ground truth object pose input is also saved as a .txt file. One example of such a file (KITTI Tracking Dataset) has each row organized as follows:
FrameID ObjectID B1 B2 B3 B4 t1 t2 t3 r1

Here ti are the coefficients of the 3D object location t in camera coordinates, and r1 is the rotation around the Y-axis in camera coordinates. B1-B4 is the 2D bounding box of the object in the image, used for visualization. Please refer to the KITTI Tracking Dataset documentation for details.
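
Analogously, a minimal sketch for parsing one object pose row (the struct and field names are hypothetical, not from the library):

// Hypothetical helper: parse one KITTI-style ground truth object pose row.
// Row format: FrameID ObjectID B1 B2 B3 B4 t1 t2 t3 r1
#include <sstream>
#include <string>

struct ObjectPoseRow {
    int frame_id = 0, object_id = 0;
    double bbox[4] = {0};  // B1-B4: 2D bounding box, used for visualization
    double t[3] = {0};     // object location in camera coordinates
    double ry = 0;         // r1: rotation around the camera Y-axis
};

bool ParseObjectPoseRow(const std::string& line, ObjectPoseRow& row) {
    std::istringstream ss(line);
    return static_cast<bool>(ss >> row.frame_id >> row.object_id
                                >> row.bbox[0] >> row.bbox[1] >> row.bbox[2] >> row.bbox[3]
                                >> row.t[0] >> row.t[1] >> row.t[2] >> row.ry);
}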

The provided object pose format of the OMD dataset is axis-angle + translation vector. Please see the provided demos for details. A user can input a custom data format, but needs to write a new function to read the data.
