
(pytorch) Gen-LaneNet: a generalized and scalable approach for 3D lane detection

Introduction

This is a pytorch implementation of Gen-LaneNet, which predicts 3D lanes from a single image. Specifically, Gen-LaneNet is a unified network solution that solves image encoding, spatial transform of features and 3D lane prediction simultaneously. The method refers to the ECCV 2020 paper:

'Gen-LaneNet: a generalized and scalable approach for 3D lane detection', Y. Guo, et al., ECCV 2020. [eccv][arxiv]

Key features:

  • A geometry-guided lane anchor representation generalizable to novel scenes.

  • A scalable two-stage framework that decouples the learning of image segmentation subnetwork and geometry encoding subnetwork.

  • A synthetic dataset for 3D lane detection [repo].

Another baseline

This repo also includes an unofficial implementation of '3D-LaneNet' in pytorch for comparison. The method refers to

"3d-lanenet: end-to-end 3d multiple lane detection", N. Garnet, etal., ICCV 2019. [paper]

Requirements

If you have Anaconda installed, you can directly import the provided environment file.

conda env update --file environment.yaml

The important packages include:

  • opencv-python 4.1.0.25
  • pytorch 1.4.0
  • torchvision 0.5.0
  • tensorboard 1.15.0
  • tensorboardx 1.7
  • py3-ortools 5.1.4041

Data preparation

The 3D lane detection method is trained and tested on the 3D lane synthetic dataset. Running the demo code on a single image should work directly. However, repeating the training, testing and evaluation requires preparing the dataset:

If you prefer to build your own data splits from the dataset, please follow the steps described in the 3D lane synthetic dataset repository. All the necessary code is already included here.

Run the Demo

python main_demo_GenLaneNet_ext.py

Specifically, this code predicts 3D lanes from a single image, given known camera height and pitch angle. Pretrained models for the segmentation subnetwork and the 3D geometry subnetwork are loaded, together with the anchor normalization parameters computed from the training set. The demo code produces lane predictions from a single image, visualized in the following figure.

The lane results are visualized in three coordinate frames: the image plane, the virtual top-view, and the ego-vehicle coordinate frame. The lane-lines are shown in the top row and the center-lines in the bottom row.
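For orientation, the following is a minimal sketch of the two-stage flow, with hypothetical handles seg_net and geo_net standing in for the two loaded subnetworks; the actual logic lives in main_demo_GenLaneNet_ext.py.

import torch

# A minimal sketch, assuming hypothetical handles seg_net and geo_net for the
# two pretrained subnetworks loaded by the demo.
def run_demo(seg_net, geo_net, image, cam_height, cam_pitch, args):
    with torch.no_grad():
        seg_out = seg_net(image)                                 # stage 1: lane segmentation
        geo_net.update_projection(args, cam_height, cam_pitch)   # fix the IPM with the known pose
        lanes_3d = geo_net(seg_out)                              # stage 2: anchor-based 3D lanes
    return lanes_3d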

How to train the model

Step 1: Train the segmentation subnetwork

Training Gen-LaneNet requires first training the segmentation subnetwork, ERFNet.

  • The training of the ERFNet is based on a pytorch implementation [repo] modified to train the model on the 3D lane synthetic dataset.

  • The trained model should be saved as 'pretrained/erfnet_model_sim3d.tar'. A pre-trained model is already included; a loading sketch follows below.
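Note that the '.tar' suffix is only a naming convention for a checkpoint written with torch.save, not an actual tar archive. A minimal loading sketch, where the 'state_dict' key is an assumption based on the common checkpoint layout:

import torch

# Load the pretrained ERFNet checkpoint; it is a torch.save file despite the
# '.tar' suffix, so torch.load is used rather than tar extraction.
checkpoint = torch.load('pretrained/erfnet_model_sim3d.tar', map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)  # assumed key layout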

Step 2: Train the 3D-geometry subnetwork

python main_train_GenLaneNet_ext.py
  • Set 'args.dataset_name' to the data split to train on (see the sketch after this list).
  • Set 'args.dataset_dir' to the folder containing the raw dataset.
  • The trained model will be saved in the directory corresponding to the chosen data split and model name, e.g. 'data_splits/illus_chg/Gen_LaneNet_ext/model*'.
  • The anchor offset std will be recorded for the chosen data split at the same time, e.g. 'data_splits/illus_chg/geo_anchor_std.json'.
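As a concrete illustration of the settings above, a hypothetical stand-in for the repo's option parser (the values are placeholders, not the repo's defaults):

import argparse

# Hypothetical sketch of the two settings mentioned in the list above; the
# actual argument definitions live in the repo's option parser.
parser = argparse.ArgumentParser()
parser.add_argument('--dataset_name', default='illus_chg')         # data split to train on
parser.add_argument('--dataset_dir', default='/path/to/dataset/')  # folder containing the raw dataset
args = parser.parse_args([])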

The training progress can be monitored with tensorboard, e.g. for the 'illus_chg' split:

cd data_splits/illus_chg/Gen_LaneNet_ext
tensorboard --logdir ./

Batch testing

python main_test_GenLaneNet_ext.py
  • Set 'args.dataset_name' to the data split to test on.
  • Set 'args.dataset_dir' to the folder containing the raw dataset.

The batch testing code not only produces the prediction results, e.g. 'data_splits/illus_chg/Gen_LaneNet_ext/test_pred_file.json', but also performs a full-range precision-recall evaluation to produce AP and max F-score.

Other methods

In './experiments', we include training code for other variants of Gen-LaneNet as well as for the baseline 3D-LaneNet and its extended version integrated with the new anchor proposed in Gen-LaneNet. Interested users are welcome to repeat the full set of ablation studies reported in the Gen-LaneNet paper. For example, to train 3D-LaneNet:

cd experiments
python main_train_3DLaneNet.py

Evaluation

Stand-alone evaluation can also be performed.

cd tools
python eval_3D_lane.py

Basically, you need to set 'method_name' and 'data_split' properly to compare the predicted lanes against the ground-truth lanes. For evaluation details, refer to the 3D lane synthetic dataset repository or the Gen-LaneNet paper. Overall, the evaluation metrics include the following (a sketch of how the scalar metrics relate to precision and recall follows the list):

  • Average Precision (AP)
  • max F-score
  • x-error in close range (0-40 m)
  • x-error in far range (40-100 m)
  • z-error in close range (0-40 m)
  • z-error in far range (40-100 m)
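As a rough sketch of how the scalar metrics relate to a sampled precision-recall curve (the official implementation is 'tools/eval_3D_lane.py'; the array inputs here are assumptions):

import numpy as np

# Hedged sketch: precisions and recalls are assumed to be arrays sampled over
# confidence thresholds by the evaluator.
def max_f_score(precisions, recalls):
    f = 2 * precisions * recalls / np.maximum(precisions + recalls, 1e-6)
    return f.max()

def average_precision(precisions, recalls):
    # AP as the area under the precision-recall curve
    order = np.argsort(recalls)
    return np.trapz(precisions[order], recalls[order])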

We show the evaluation results comparing two methods:

  • "3d-lanenet: end-to-end 3d multiple lane detection", N. Garnet, etal., ICCV 2019
  • "Gen-lanenet: a generalized and scalable approach for 3D lane detection", Y. Guo, etal., Arxiv, 2020 (GenLaneNet_ext in code)

Comparisons are conducted on three distinct splits of the dataset. For simplicity, only lane-line results are reported here. The results from this code may differ marginally from those reported in the paper due to different random splits.

  • Standard

    Method       AP    F-Score  x error near (m)  x error far (m)  z error near (m)  z error far (m)
    3D-LaneNet   89.3  86.4     0.068             0.477            0.015             0.202
    Gen-LaneNet  90.1  88.1     0.061             0.496            0.012             0.214

  • Rare Subset

    Method       AP    F-Score  x error near (m)  x error far (m)  z error near (m)  z error far (m)
    3D-LaneNet   74.6  72.0     0.166             0.855            0.039             0.521
    Gen-LaneNet  79.0  78.0     0.139             0.903            0.030             0.539

  • Illumination Change

    Method       AP    F-Score  x error near (m)  x error far (m)  z error near (m)  z error far (m)
    3D-LaneNet   74.9  72.5     0.115             0.601            0.032             0.230
    Gen-LaneNet  87.2  85.3     0.074             0.538            0.015             0.232

Visualization

Visual comparisons to the ground truth can be generated per image when setting 'vis = True' in 'tools/eval_3D_lane.py'. We show two examples for each method under the data split involving illumination change.

  • 3D-LaneNet

  • Gen-LaneNet

Citation

Please cite the paper in your publications if it helps your research:

@inproceedings{guo2020gen,
  title={Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection},
  author={Guo, Yuliang and Chen, Guang and Zhao, Peitao and Zhang, Weide and Miao, Jinghao and Wang, Jingao and Choe, Tae Eun},
  booktitle={Computer Vision - {ECCV} 2020 - 16th European Conference},
  year={2020}
}

Copyright and License

The copyright of this work belongs to Baidu Apollo, and the code is provided under the Apache-2.0 license.


Issues

Segmentation fault (core dumped) during training

I want the final result to include vehicle segmentation, so I want to retrain using erfnet_model_sim3d_7class.tar.

All dataset paths were checked, and the PyTorch, CUDA and cuDNN versions match.
But I only get the error below:
Segmentation fault (core dumped)

What is the problem?

If you had the same problem, please share a solution.

can't extract model from data_splits/illus_chg/Gen_LaneNet_ext

Hello author,
when I try to extract the model from the file below, I get these errors:

$ tar -xvf model_best_epoch_29.pth.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

How can I solve it? Thank you!
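For what it's worth, '.pth.tar' is a common PyTorch naming convention for checkpoints written with torch.save; assuming this file follows that convention, it should be loaded in Python rather than extracted:

import torch

# The checkpoint is a torch.save file, not a tar archive (assumption based on
# the common convention), which is why tar -xvf rejects it.
checkpoint = torch.load('model_best_epoch_29.pth.tar', map_location='cpu')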

Typo in README.md

In Batch Testing of README.md:

Set 'args.dataset_name' to a certain data split to train the model

Is it supposed to be "Set 'args.dataset_name' to a certain data split to test the model"?

cuDNN error

When I run main_demo_GenLaneNet_ext.py, I get the following error:

cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "main_demo_GenLaneNet_ext.py", line 147, in
output_geo = output_geo[0].data.cpu().numpy()
NameError: name 'output_geo' is not defined

How can I fix it?

3D-LaneNet benchmark source

Hello @yuliangguo

Which implementation of 3D-LaneNet did you use to generate the score?

Method       AP    F-Score  x error near (m)  x error far (m)  z error near (m)  z error far (m)
3D-LaneNet   89.3  86.4     0.068             0.477            0.015             0.202
Gen-LaneNet  90.1  88.1     0.061             0.496            0.012             0.214

Perhaps you could share it as well, since the 3D-LaneNet authors do not provide a reference implementation (as you did with Gen-LaneNet). Thanks

GPU issue

I have two GPUs on my desktop. However, even when I set os.environ["CUDA_VISIBLE_DEVICES"] = "1", the main_train_GenLaneNet_ext program still uses GPU 0. How can I make it use GPU 1?
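One common cause, offered as an assumption rather than a diagnosis of this specific repo: CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized in the process, e.g.:

# Set the variable before anything initializes CUDA; changing it afterwards
# has no effect on which devices are visible.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch                      # import torch only after setting the variable
print(torch.cuda.device_count())  # should now report a single visible device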

A question about estimating z for 3D lanes

Thanks for your amazing work.
I have read Part 3.1 and I cannot understand the process of estimating z for a 3D lane. After the first and second transformations, I think there should only be red curves in the following image. So where do these blue curves come from? Thank you very much for answering my questions.
[attached figure]

Need of gt_height and gt_pitch for inference: GeoNet3D_ext

Hey! I have explored your implementation quite thoroughly. The GeoNet3D_ext model takes as input the features from the lane segmentation model; those features are then fed to GeoNet3D_ext for 3D lane detection. From your implementation, I realized that you are not predicting camera height and camera pitch; instead, you use the ground-truth camera height and pitch, and in your training script you update M_inv using the same gt_camera_height and gt_camera_pitch.

if not args.fix_cam and not args.pred_cam:
    model2.update_projection(args, gt_hcam, gt_pitch)

I ran the training script for GeoNet3D_ext and found that gt_cam_height == pred_cam_height, and the same for pitch.

Whereas in the unofficial implementation of 3D-LaneNet, as per its training script, the height and pitch differ from the ground truth.

Consider a real-world scenario where I want to use the pre-trained models you provide for 3D lane detection: in that case, camera height and pitch are fixed, not actually predicted by the model. Will the results for 3D lane detection differ in that case? Have you tested your approach with any real-world data?

Question about converting the mask to top-view with IPM

Hi, I see that when training GeoNet, a fixed set of parameters is first used to transform the 2D lane-line mask into a top view via IPM. Where do these fixed parameters come from? Do they already include the camera extrinsics (rotation and translation matrices)? Or are they just pre-calibrated, not entirely accurate, extrinsics?

Question about how R_g2c is computed

In the file tools/utils.py:

def homograpthy_g2im(cam_pitch, cam_height, K):
    # transform top-view region to original image region
    R_g2c = np.array([[1, 0, 0],
                      [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch)],
                      [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch)]])
    H_g2im = np.matmul(K, np.concatenate([R_g2c[:, 0:2], [[0], [cam_height], [0]]], 1))
    return H_g2im

I understand that R_g2c rotates the vehicle-body frame about the x-axis by 90° + pitch, but why does H_g2im take only its first two columns?

Line 230 of the file GeoNet3D_ext.py:

        # homograph ground to camera
        # H_g2cam = np.array([[1,                             0,               0],
        #                     [0, np.cos(np.pi / 2 + cam_pitch), args.cam_height],
        #                     [0, np.sin(np.pi / 2 + cam_pitch),               0]])
        H_g2cam = np.array([[1,                  0,               0],
                            [0, np.sin(-cam_pitch), args.cam_height],
                            [0, np.cos(-cam_pitch),               0]])

Here H_g2cam is actually the result of np.concatenate([R_g2c[:, 0:2], [[0], [cam_height], [0]]], 1), but why is it defined differently from the version above?
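Not an authoritative answer, but the algebra can be checked directly. A ground point has z = 0, so R_g2c @ [x, y, 0]^T only ever uses the first two columns of R_g2c, which is why H_g2im = K [r1 r2 t] collapses to a 3x3 homography. And the two H_g2cam forms are the same matrix, since cos(pi/2 + pitch) = sin(-pitch) and sin(pi/2 + pitch) = cos(-pitch). A small numeric check with made-up intrinsics and pose:

import numpy as np

# Made-up intrinsics and pose, purely for verification.
K = np.array([[1000., 0., 640.],
              [0., 1000., 360.],
              [0., 0., 1.]])
cam_pitch, cam_height = 0.05, 1.5

R_g2c = np.array([[1, 0, 0],
                  [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch)],
                  [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch)]])
t = np.array([[0.], [cam_height], [0.]])

# Full 3x4 projection vs. the 3x3 homography built from columns 1, 2 and t.
P = K @ np.concatenate([R_g2c, t], axis=1)
H_g2im = K @ np.concatenate([R_g2c[:, 0:2], t], axis=1)

x, y = 2.0, 30.0  # a point on the ground plane (z = 0)
assert np.allclose(P @ [x, y, 0.0, 1.0], H_g2im @ [x, y, 1.0])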

Feeling uncertain about model robustness

Hi, when I change the camera calibration, whether the height or the pitch angle, the model output is totally wrong.
Can this algorithm cover different camera calibrations?
If we train this algorithm using real-world 3D labels with images from cameras with different calibrations, can it cover those calibrations, or will it simply fail to converge?

How to use a webcam?

Hello, I'm a student studying your code.

I want to use a webcam and a video file with test.py, but I don't know how.

Please teach me how to test with a webcam and a video file.

about pred_hcam and pred_pitch

Thank you very much for your code; I'm an employee of the National Intelligent Vehicle Innovation Center.
In LaneNet3D.py, pred_hcam and pred_pitch are obtained by training, but in GeoNet3D.py I found that they are both directly assigned without training.
I want to know why they are assigned directly without training. Maybe I misunderstood; I hope to get your reply.
Thank you again for your code!

version of py3-ortools 5.1.4041

ERROR: Could not find a version that satisfies the requirement py3-ortools==5.1.4041 (from versions: none).
Do I need to use ortools instead, e.g. ortools==7.2.6977?

Curvature computation

I was just curious whether the curvature computed in the camera coordinate system is sufficient to use for steering the vehicle, or whether it is recommended to compute the curvature in world coordinates as well. If yes, how can the XYZ of a particular lane point be used to compute the curvature of the lane? In traditional IPM-based curvature detection, just the X and Y are used, keeping Z=0, as the lane lies on the road surface. Since the camera-to-world transformation is rigid, can I just rely on the curvature values in the camera coordinate system?

Thanks in advance for any clarification or comments on my understanding. Please do let me know if i am missing something.
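Not from this repo, but a minimal sketch of one common approach, assuming lane_pts is an (N, 3) array of (x, y, z) points in the ego-vehicle frame with y pointing forward: project to the x-y plane, fit x as a polynomial of y, and evaluate the planar curvature kappa = |x''| / (1 + x'^2)^(3/2):

import numpy as np

# Hedged sketch: planar curvature of a lane, ignoring z (reasonable when the
# road surface is near-flat over the fitted range).
def lane_curvature(lane_pts, y_eval, deg=3):
    x, y = lane_pts[:, 0], lane_pts[:, 1]
    coeffs = np.polyfit(y, x, deg)                  # fit x(y)
    d1 = np.polyval(np.polyder(coeffs, 1), y_eval)  # dx/dy
    d2 = np.polyval(np.polyder(coeffs, 2), y_eval)  # d2x/dy2
    return np.abs(d2) / (1.0 + d1 ** 2) ** 1.5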

TuSimple

Hello, can Gen-LaneNet be trained on the TuSimple dataset? Would there be missing parts in the labels?

Question about H_g2cam

In LaneNet3D_ext.py there is the following formula. How is it derived?

# homograph ground to camera
# H_g2cam = np.array([[1,                             0,               0],
#                     [0, np.cos(np.pi / 2 + cam_pitch), args.cam_height],
#                     [0, np.sin(np.pi / 2 + cam_pitch),               0]])
H_g2cam = np.array([[1,                  0,               0],
                    [0, np.sin(-cam_pitch), args.cam_height],
                    [0, np.cos(-cam_pitch),               0]])

Where is the camera coordinate frame defined, and why does it take this form?

load model error

Running python main_demo_GenLaneNet_ext.py fails with:

Unexpected key(s) in state_dict: "encoder.0.weight", "encoder.0.bias", "encoder.1.weight", "encoder.1.bias", "encoder.1.running_mean", "encoder.1.running_var", "encoder.1.num_batches_tracked", "encoder.4.weight", "encoder.4.bias", "encoder.5.weight", "encoder.5.bias", "encoder.5.running_mean", "encoder.5.running_var", "encoder.5.num_batches_tracked", "encoder.8.weight", "encoder.8.bias", "encoder.9.weight", "encoder.9.bias", "encoder.9.running_mean", "encoder.9.running_var", "encoder.9.num_batches_tracked", "encoder.12.weight", "encoder.12.bias", "encoder.13.weight", "encoder.13.bias", "encoder.13.running_mean", "encoder.13.running_var", "encoder.13.num_batches_tracked", "lane_out.features.0.weight", "lane_out.features.0.bias", "lane_out.features.1.weight", "lane_out.features.1.bias", "lane_out.features.1.running_mean", "lane_out.features.1.running_var", "lane_out.features.1.num_batches_tracked", "lane_out.features.3.weight", "lane_out.features.3.bias", "lane_out.features.4.weight", "lane_out.features.4.bias", "lane_out.features.4.running_mean", "lane_out.features.4.running_var", "lane_out.features.4.num_batches_tracked", "lane_out.features.6.weight", "lane_out.features.6.bias", "lane_out.features.7.weight", "lane_out.features.7.bias", "lane_out.features.7.running_mean", "lane_out.features.7.running_var", "lane_out.features.7.num_batches_tracked", "lane_out.features.9.weight", "lane_out.features.9.bias", "lane_out.features.10.weight", "lane_out.features.10.bias", "lane_out.features.10.running_mean", "lane_out.features.10.running_var", "lane_out.features.10.num_batches_tracked", "lane_out.features.12.weight", "lane_out.features.12.bias", "lane_out.features.13.weight", "lane_out.features.13.bias", "lane_out.features.13.running_mean", "lane_out.features.13.running_var", "lane_out.features.13.num_batches_tracked", "lane_out.features.15.weight", "lane_out.features.15.bias", "lane_out.features.16.weight", "lane_out.features.16.bias", "lane_out.features.16.running_mean", "lane_out.features.16.running_var", "lane_out.features.16.num_batches_tracked", "lane_out.features.18.weight", "lane_out.features.18.bias", "lane_out.features.19.weight", "lane_out.features.19.bias", "lane_out.features.19.running_mean", "lane_out.features.19.running_var", "lane_out.features.19.num_batches_tracked", "lane_out.dim_rt.0.weight", "lane_out.dim_rt.0.bias", "lane_out.dim_rt.1.weight", "lane_out.dim_rt.1.bias", "lane_out.dim_rt.1.running_mean", "lane_out.dim_rt.1.running_var", "lane_out.dim_rt.1.num_batches_tracked", "lane_out.dim_rt.3.weight", "lane_out.dim_rt.3.bias".
