This is the reference PyTorch implementation for training and testing depth prediction models using the method described in our paper *LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR*.
If you find our work useful, please consider citing:
```
@misc{bartoccioni2021lidartouch,
      title={LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR},
      author={Florent Bartoccioni and Éloi Zablocki and Patrick Pérez and Matthieu Cord and Karteek Alahari},
      year={2021},
      eprint={2109.03569},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
First, clone the repo:

```bash
# clone project
git clone https://github.com/F-Barto/LiDARTouch
cd LiDARTouch
```
Then, create the conda environment, install the dependencies, and activate the environment:

```bash
# create conda env and install dependencies
conda env create -n LiDARTouch -f environment.yaml
conda activate LiDARTouch
pip install -e .
```
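To check that the environment is functional, you can, for example, verify that PyTorch sees your GPU:

```bash
# optional sanity check: print the PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```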
To train the model from scratch on KITTI, you first need to download both:
- the raw data
- the depth completion data
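For reference, the depth completion archives can be fetched along the lines below; treat the links and target paths as an illustration and double-check the current ones on the official KITTI website:

```bash
# depth completion data (ground-truth depth maps and raw LiDAR projections)
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_depth_annotated.zip
wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_depth_velodyne.zip
unzip data_depth_annotated.zip -d /path_to_kitti_root_folder/KITTI_depth/
unzip data_depth_velodyne.zip -d /path_to_kitti_root_folder/KITTI_depth/
```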
Once the data is downloaded, you need to preprocess it.
ℹ️ Note that we provide the data split files under `data_splits`.

Under the `scripts/kitti_data_preparation` folder you will find:
- `lidar_sparsification.py`
- `prepare_split_data.py`
The first script, `lidar_sparsification.py`, virtually sparsifies the raw 64-beam LiDAR down to a 4-beam LiDAR; use it as follows:
```bash
python lidar_sparsification.py KITTI_RAW_ROOT_DIR OUTPUT_DIR DATA_SPLIT_DIR SPLIT_FILE_NAMES [OPTIONS]
```

e.g.,

```bash
python ./lidar_sparsification.py \
    /path_to_kitti_root_folder/KITTI_raw/ \
    /path_to_sparsified_lidar_data/sparsified_lidar/ \
    /path_to_LiDARTouch_folder/LiDARTouch/data_splits \
    'eigen_train_files.txt,filtered_eigen_val_files.txt,filtered_eigen_test_files.txt' \
    --downsample_factor=16
```
The parameter `--downsample_factor=16` indicates that only 1 beam out of 16 will be kept (leading to 4 beams). Alternatively, you can select individual beams by their indexes with `--downsample_indexes='5,7,9,11,20'`.
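For intuition, the sketch below illustrates the idea behind the sparsification; it is a simplified, hypothetical version, not the repository's code: points are binned into 64 beams by elevation angle, and only 1 beam out of `downsample_factor` is kept.

```python
# Minimal sketch of virtual LiDAR sparsification (illustration only).
import numpy as np

def sparsify(points: np.ndarray, n_beams: int = 64, downsample_factor: int = 16) -> np.ndarray:
    """points: (N, 4) array of x, y, z, reflectance from a KITTI .bin scan."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # elevation angle of each point, used as a proxy for its physical beam
    elevation = np.arctan2(z, np.sqrt(x**2 + y**2))
    # quantize elevations into n_beams bins standing in for the 64 beams
    bins = np.linspace(elevation.min(), elevation.max(), n_beams + 1)
    beam_idx = np.clip(np.digitize(elevation, bins) - 1, 0, n_beams - 1)
    # keep 1 beam out of downsample_factor, e.g. beams 0, 16, 32, 48 -> 4 beams
    kept_beams = np.arange(0, n_beams, downsample_factor)
    return points[np.isin(beam_idx, kept_beams)]
```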
Then, with `prepare_split_data.py`, we create a pickle `split_data` containing, for each image listed in the split files:
- the available source views
- the relative pose between the source and target views, estimated using the IMU and/or Perspective-n-Point (PnP) with LiDAR (a sketch of the PnP idea follows the example below)
This script is used as follows:

```bash
python prepare_split_data.py KITTI_RAW_ROOT_DIR OUTPUT_PATH DATA_SPLIT_DIR SPLIT_FILE_NAMES SOURCE_VIEWS_INDEXES [OPTIONS]
```

e.g.,

```bash
python ./prepare_split_data.py \
    /path_to_kitti_root_folder/KITTI_raw/ \
    /path_to_output/split_data.pkl \
    /path_to_LiDARTouch_folder/LiDARTouch/data_splits \
    'eigen_train_files.txt,filtered_eigen_val_files.txt,filtered_eigen_test_files.txt' \
    '[-1,1]' \
    --imu \
    --pnp /path_to_sparsified_lidar_data/sparsified_lidar/factor_16
```
Use `--help` for more details.
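As background on the `--pnp` option, a relative pose can be recovered from 2D–3D correspondences with Perspective-n-Point, e.g. with OpenCV. The helper below is a hypothetical sketch, assuming correspondences between LiDAR points and source-image pixels are already available:

```python
# Sketch of the PnP idea: recover the pose of a source view from 3D LiDAR
# points expressed in the target frame and their 2D matches in the source image.
import cv2
import numpy as np

def pnp_pose(lidar_points_3d: np.ndarray, image_points_2d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """lidar_points_3d: (N, 3) float32, image_points_2d: (N, 2) float32,
    K: (3, 3) camera intrinsics. Returns a 4x4 relative pose matrix."""
    # RANSAC makes the estimate robust to bad correspondences
    _, rvec, tvec, _ = cv2.solvePnPRansac(lidar_points_3d, image_points_2d, K, distCoeffs=None)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T
```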
Copy and rename the file `.env_example` to `.env`. Change the paths present in the `.env` file to configure the saving directory and the paths to your dataset and the pre-processed data.
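For example (the variable name in the comment is a placeholder; use the ones actually defined in `.env_example`):

```bash
cp .env_example .env
# then edit .env and point each variable to your local paths, e.g.:
# KITTI_RAW_ROOT_DIR=/path_to_kitti_root_folder/KITTI_raw/
```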
Monodepth2 depth network with photometric supervision only (relative depth | infinite depth issue):

```bash
python train.py experiment=PoseNet_P_multiscale depth_net=monodepth2
```

Monodepth2 depth network with IMU supervision (metric depth | infinite depth issue):

```bash
python train.py experiment=PoseNet_P+IMU_multiscale depth_net=monodepth2
```

Monodepth2-L depth network with LiDARTouch supervision (metric depth | NO infinite depth issue):

```bash
python train.py experiment=PnP_P+ml1L4_multiscale depth_net=monodepth2lidar
```
Regarding the infinite depth problem, the two major factors alleviating it are the auto-masking and the LiDAR self-supervision. In practice, we found multi-scale supervision and the smoothness loss to be critical for stable training when using the LiDAR self-supervision.
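To make the auto-masking concrete, here is a minimal sketch in the spirit of Monodepth2 (a simplified formulation with our own names, not the repository's code): a pixel contributes to the photometric loss only if warping the source view reduces its error compared to not warping at all, which discards objects moving at the same speed as the camera, the main source of the infinite-depth failure.

```python
# Sketch of Monodepth2-style auto-masking on per-pixel photometric errors.
import torch

def automask(reprojection_loss: torch.Tensor, identity_loss: torch.Tensor) -> torch.Tensor:
    """Both inputs: (B, 1, H, W) photometric errors; returns a float mask.

    reprojection_loss: error between the target image and the warped source view.
    identity_loss: error between the target image and the un-warped source view.
    """
    # small noise breaks ties where the two losses are exactly equal
    identity_loss = identity_loss + torch.randn_like(identity_loss) * 1e-5
    # keep only pixels where warping actually helps
    return (reprojection_loss < identity_loss).float()
```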
This work and code base are based upon the papers and code bases of:

In particular, to structure our code we used:

Please consider giving these projects a star or citing their work if you use them.