
[ECCV '24] VEGS: View-Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors



Sungwon Hwang*¹, Min-Jung Kim*¹, Taewoong Kang¹, Jayeon Kang², Jaegul Choo¹
*: Equal contribution
¹KAIST, ²Ghent University
| Project Page | arXiv | Code |

Figure (teaser.gif): Our method aligns and flattens Gaussian covariances to scene surfaces estimated from monocular surface normal predictions.
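The caption above describes aligning each Gaussian's covariance with an estimated surface. The sketch below illustrates only the target geometry. It assumes the surface is described by a unit normal and that "flattening" means shrinking the scale along that normal to a small `eps`. Under those assumptions, the hypothetical helpers build Σ = R S Sᵀ Rᵀ with the normal as the short axis. This is not the repository's implementation; VEGS may realize the alignment through a loss during optimization rather than by constructing Σ directly.

```python
import numpy as np


def rotation_from_normal(normal):
    """Return a 3x3 rotation matrix whose third column is the unit normal."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Pick a helper axis that is not (nearly) parallel to n.
    helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(n, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(n, t1)  # already unit length because n and t1 are orthonormal
    return np.stack([t1, t2, n], axis=1)  # columns: tangent, tangent, normal


def flattened_covariance(normal, tangent_scale, eps=1e-3):
    """Covariance of a disk-like Gaussian lying in the plane with the given normal.

    Hypothetical helper: both tangent axes get `tangent_scale`,
    the normal axis gets the small scale `eps`.
    """
    R = rotation_from_normal(normal)
    S = np.diag([tangent_scale, tangent_scale, eps])
    return R @ S @ S.T @ R.T


if __name__ == "__main__":
    n = np.array([0.0, 0.6, 0.8])
    cov = flattened_covariance(n, tangent_scale=0.5)
    # Variance along the normal is eps**2; along the two tangents it is 0.25.
    print(np.sort(np.linalg.eigvalsh(cov)))
```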



Figures (dynamic.gif, object_relocation.gif): Our method jointly reconstructs the static scene together with dynamic objects such as cars, which can then be relocated arbitrarily.

Abstract

Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize views similar to the training camera trajectory, rendering novel views outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolated View Synthesis (EVS) problem by evaluating the reconstructions on views looking left, right, or downward with respect to the training camera distribution. To improve rendering quality for EVS, we initialize our model by constructing a dense LiDAR map, and propose to leverage prior scene knowledge such as a surface normal estimator and a large-scale diffusion model. Qualitative and quantitative comparisons demonstrate the effectiveness of our method on EVS. To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction. We will release the code upon acceptance.
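Extrapolated views that look left, right, or downward relative to the training cameras can be produced by rotating a training camera about its own center. The sketch below is a minimal illustration. It assumes 4×4 camera-to-world matrices in the OpenCV convention (x right, y down, z forward). The function names are hypothetical, and the exact angles used in the paper's evaluation are not given here.

```python
import numpy as np


def rot_x(deg):
    """Rotation about the camera x axis (pitch)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(deg):
    """Rotation about the camera y axis (yaw)."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def extrapolate_pose(c2w, yaw_deg=0.0, pitch_deg=0.0):
    """Rotate a 4x4 camera-to-world pose about its own center.

    Assuming x right, y down, z forward: positive yaw turns the view right,
    negative pitch tilts it downward. The camera center is left unchanged.
    """
    out = np.array(c2w, dtype=float, copy=True)
    out[:3, :3] = out[:3, :3] @ rot_y(yaw_deg) @ rot_x(pitch_deg)
    return out


def forward_direction(c2w):
    """World-space viewing direction (the camera's +z axis)."""
    return np.asarray(c2w)[:3, 2]


if __name__ == "__main__":
    train_pose = np.eye(4)
    train_pose[:3, 3] = [10.0, 1.5, 3.0]
    left = extrapolate_pose(train_pose, yaw_deg=-30.0)
    down = extrapolate_pose(train_pose, pitch_deg=-20.0)
    print(forward_direction(left), forward_direction(down))
```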

Installation

1. Requirements

The software requirements are the following:

  • Conda (recommended for easy setup)
  • C++ Compiler for PyTorch extensions
  • CUDA toolkit 11.8 for PyTorch extensions
  • C++ Compiler and CUDA SDK must be compatible

Please refer to the original 3D Gaussian Splatting repository for more details about requirements.
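The PyTorch extensions are built against CUDA toolkit 11.8, so it can help to confirm which toolkit `nvcc` reports before installing. The minimal sketch below parses the `release X.Y` field printed by `nvcc --version`. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

REQUIRED = (11, 8)  # CUDA toolkit version listed in the requirements


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def check_nvcc(required=REQUIRED):
    """Return True if nvcc is on PATH and reports the required release."""
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
        return False
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
    found = parse_nvcc_release(out)
    print(f"nvcc release: {found}, required: {required}")
    return found == required


if __name__ == "__main__":
    check_nvcc()
```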

2. Clone the repository

# HTTPS
git clone https://github.com/deepshwang/vegs.git --recursive

or

# SSH
git clone git@github.com:deepshwang/vegs.git --recursive

3. Install packages

Create and activate the environment with the required packages installed.

conda env create -f environment.yml
conda activate vegs

Dataset Preparation

We provide a training pipeline for the KITTI-360 dataset. Please refer to the data documentation for details on the data structure.

[1] Download Data

Register and log in on the KITTI-360 website, then download the following data.

KITTI-360
└───calibration
└───data_2d_raw
│   └───2013_05_28_drive_{seq:0>4}_sync
└───data_3d_semantics
│   └───train
│       └───static
│           └───{start_frame:0>10}_{end_frame:0>10}.ply
│       └───dynamic
│           └───{start_frame:0>10}_{end_frame:0>10}.ply
└───data_3d_bboxes
│   └───train
│       └───2013_05_28_drive_{seq:0>4}_sync.xml
│   └───train_full
│       └───2013_05_28_drive_{seq:0>4}_sync.xml
└───data_poses
│   └───2013_05_28_drive_{seq:0>4}_sync

Since each sequence is too large to reconstruct as a single scene model, we use scene segments pre-divided by frame range, given by start_frame and end_frame.
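Read literally, the directory tree above lets you locate the files for one scene segment using Python's format specs `{seq:0>4}` and `{start_frame:0>10}`. The minimal sketch below does this with a hypothetical helper, `segment_paths`. If your copy of the data uses a different nesting, adjust the paths accordingly.

```python
from pathlib import Path


def segment_paths(root, seq, start_frame, end_frame):
    """Paths for one KITTI-360 scene segment, following the tree above."""
    root = Path(root)
    drive = f"2013_05_28_drive_{seq:0>4}_sync"           # e.g. ..._0009_sync
    ply = f"{start_frame:0>10}_{end_frame:0>10}.ply"     # zero-padded frame range
    semantics = root / "data_3d_semantics" / "train"
    return {
        "images": root / "data_2d_raw" / drive,
        "static_ply": semantics / "static" / ply,
        "dynamic_ply": semantics / "dynamic" / ply,
        "bboxes": root / "data_3d_bboxes" / "train" / f"{drive}.xml",
        "poses": root / "data_poses" / drive,
    }


if __name__ == "__main__":
    for name, path in segment_paths("./KITTI-360", 9, 3972, 4258).items():
        print(f"{name:12s} {path}  exists={path.exists()}")
```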


[2-1] EITHER Triangulate 3D points from training images and known camera poses using COLMAP.

In addition to the LiDAR map, we use points triangulated from the training images. To prepare these points, run the following command (COLMAP must be installed):

python triangulate.py --data_dir ${KITTI360_DIR}

where ${KITTI360_DIR} is the KITTI-360 data directory. By default, the script triangulates all scene segments in the data and saves the results in the data_3d_colmap and data_3d_colmap_processed folders under the KITTI-360 data directory.
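With known camera poses, triangulation reduces to solving for the 3D point that best reprojects onto its 2D matches. COLMAP's triangulator is considerably more involved, handling multi-view tracks, outlier rejection, and refinement. The following is only a minimal two-view illustration of the principle using the linear DLT method. The function names and toy cameras are hypothetical.

```python
import numpy as np


def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 projection matrices.

    x1, x2 are the pixel coordinates (u, v) of the same point in each image.
    Each view contributes two rows from u * (P[2] . X) - P[0] . X = 0, etc.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Homogeneous solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    # Two cameras 1 m apart along x (world-to-camera extrinsics [R | t]).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.5, 0.2, 5.0])
    print(triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true)))
```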

[2-2] OR Download Triangulated points for KITTI-360.

You may download the points from here and save them into ${KITTI360_DIR}/data_3d_colmap_processed


[3-1] EITHER Prepare Monocular Surface Normal Estimations

We use omnidata for monocular surface normal estimation. Download the pretrained model and place it at omnidata/pretrained_models/omnidata_dpt_normal_v2.ckpt. The following script saves monocular surface normal estimations in data_2d_normal_omnidata_all under the KITTI-360 data directory. To prepare the data, run

bash bash_scripts/normal_preprocess_kitti360.sh ${GPU_NUM} ${KITTI360_DIR}
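This sketch shows how to decode saved normal maps into unit vectors and rotate them into world coordinates with a camera-to-world rotation. It assumes the normals are stored as 8-bit RGB images and that each channel c ∈ [0, 255] encodes a camera-frame component as n = 2c/255 − 1. Check the omnidata output convention, including axis signs, before relying on this. The helper names are hypothetical.

```python
import numpy as np


def decode_normal_map(rgb):
    """Map an H x W x 3 uint8 normal image to unit vectors in [-1, 1]^3.

    Assumes the per-channel encoding n = 2 * c / 255 - 1.
    """
    n = rgb.astype(np.float32) / 255.0 * 2.0 - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)  # renormalize after 8-bit quantization


def normals_to_world(normals_cam, R_c2w):
    """Rotate per-pixel camera-frame normals of shape (..., 3) into the world frame."""
    return normals_cam @ np.asarray(R_c2w).T


if __name__ == "__main__":
    # A 2 x 2 image whose pixels all encode the camera +z direction.
    img = np.full((2, 2, 3), (128, 128, 255), dtype=np.uint8)
    n_cam = decode_normal_map(img)
    print(n_cam[0, 0], np.linalg.norm(n_cam, axis=-1))
```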

[3-2] OR Download Monocular Surface Normal Estimations

You may download pre-calculated monocular surface normal estimations from here, and save them into ${KITTI360_DIR}/data_2d_normal_omnidata_all.

Note that this download only covers the frame segment from 3972 to 4258 in sequence 0009, since the files for all sequences are too large.


[4-1] EITHER Prepare training images & Fine-tune with LoRA

To prepare the dataset for LoRA training, run the following command.

bash bash_scripts/lora_preprocess_kitti360.sh

This prepares a square-cropped dataset and saves it into lora/data/kitti360.

By default, this prepares images for the scene segments listed in lora/data/kitti360/2013_05_28_drive_train_dynamic_vehicle_human_track_num_vehicles.txt, which includes scene fragments where vehicles are the only dynamic objects (our method cannot handle topologically varying dynamic objects such as walking people). You may edit the text file to process only the scene segments of interest.
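Stable Diffusion is usually trained and fine-tuned on square images, whereas KITTI-360 perspective frames are wide. The README does not specify how the square crops are chosen. As a hypothetical illustration only, the sketch below cuts `num_crops` H×H windows whose start columns are spaced evenly from the left edge to the right edge of the frame.

```python
import numpy as np


def square_crops(image, num_crops):
    """Cut num_crops H x H windows from an H x W (x C) image with W >= H.

    Window start columns are spaced evenly from the left edge to the right edge.
    """
    h, w = image.shape[:2]
    if w < h:
        raise ValueError("expected a landscape image (width >= height)")
    if num_crops == 1:
        starts = [(w - h) // 2]  # a single centered crop
    else:
        starts = np.linspace(0, w - h, num_crops).round().astype(int)
    return [image[:, int(s):int(s) + h] for s in starts]


if __name__ == "__main__":
    frame = np.zeros((376, 1408, 3), dtype=np.uint8)  # a wide frame, e.g. 376 x 1408
    crops = square_crops(frame, num_crops=4)
    print([c.shape for c in crops])
```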

We use diffusers to train Stable-Diffusion with LoRA. To train, run the following command.

bash bash_scripts/lora_train_kitti360.sh ${GPU_NUM}

By default, the script will train fine-tuned models for all scene segments listed in lora/data/kitti360/2013_05_28_drive_train_dynamic_vehicle_human_track_num_vehicles.txt.
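LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes W x + (α/r) B A x, with A of shape r × d_in and B of shape d_out × r. The NumPy sketch below shows only this general mechanism. It is not the diffusers implementation, which applies such adapters inside the Stable Diffusion network and handles the training for you. `LoRALinear` and its arguments are hypothetical.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer y = x W^T plus a trainable low-rank update."""

    def __init__(self, weight, rank, alpha, rng):
        self.weight = weight                                  # frozen, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))     # down-projection
        self.B = np.zeros((d_out, rank))                      # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, d_in); in training only A and B would be updated.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        """Fold the low-rank update into a single matrix for inference."""
        return self.weight + self.scale * self.B @ self.A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LoRALinear(rng.normal(size=(64, 32)), rank=4, alpha=4.0, rng=rng)
    x = rng.normal(size=(2, 32))
    print(np.allclose(layer(x), x @ layer.weight.T))  # identical to the base layer before training
    full, lora = 64 * 32, layer.A.size + layer.B.size
    print(f"trainable params: {lora} vs {full} for full fine-tuning")
```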

[4-2] OR download pre-trained LoRA weights for KITTI-360.

You may download pre-trained LoRA weights from here and unzip them under lora/models/kitti360. Again, we only provide models for scene segments listed in lora/data/kitti360/2013_05_28_drive_train_dynamic_vehicle_human_track_num_vehicles.txt.


Training

To train VEGS for a scene segment of interest, run the following command.

bash bash_scripts/train_kitti360.sh ${GPU_NUM} ${DATA_PATH} ${SEQUENCE} ${START_FRAME} ${END_FRAME} ${EXPERIMENT_NOTE}

| Parameter | Description | Default |
|---|---|---|
| ${GPU_NUM} | Index of the GPU to use | 0 |
| ${DATA_PATH} | Path to the KITTI-360 data directory | ./KITTI-360 |
| ${SEQUENCE} | Index of the sequence to train on | 0009 |
| ${START_FRAME} | Start frame number of the frame segment | 3972 |
| ${END_FRAME} | End frame number of the frame segment | 4258 |
| ${EXPERIMENT_NOTE} | Optional note for the run; it is included in the name of the folder where the model is saved | "" |

The trained model and the images rendered from conventional and extrapolated cameras are saved in output.
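You can assemble the documented positional arguments programmatically to train several segments in sequence. This minimal sketch assumes the script accepts exactly the six arguments in the order shown above. `train_command` and `train_segments` are hypothetical helpers, and with the default `dry_run=True` they only print the commands.

```python
import shlex
import subprocess


def train_command(gpu_num=0, data_path="./KITTI-360", sequence="0009",
                  start_frame=3972, end_frame=4258, note=""):
    """Argument list for bash_scripts/train_kitti360.sh (defaults from the table)."""
    return [
        "bash", "bash_scripts/train_kitti360.sh",
        str(gpu_num), str(data_path),
        f"{sequence:0>4}",              # sequence index, zero-padded to 4 digits
        str(start_frame), str(end_frame),
        note,                           # optional run note (may be empty)
    ]


def train_segments(segments, gpu_num=0, data_path="./KITTI-360", dry_run=True):
    """Run (or just print) one training job per (sequence, start, end) tuple."""
    for sequence, start, end in segments:
        cmd = train_command(gpu_num, data_path, sequence, start, end,
                            note=f"seg_{start}_{end}")
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train_segments([("0009", 3972, 4258)])
```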

Video Rendering

We also provide a script that renders and saves frames along camera trajectories, adding novel cameras interpolated between adjacent pairs of cameras in the trajectory for smooth video rendering.

bash bash_scripts/render_video.sh ${GPU_NUM} ${MODEL_PATH}

where ${MODEL_PATH} is the path of the trained Gaussian model. Running the script produces smooth video renderings from both interpolated and extrapolated views.
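The README does not detail how the in-between cameras are generated. One standard choice, shown here as a minimal sketch under that assumption, works on adjacent 4×4 camera-to-world poses. It interpolates positions linearly and rotations along the shortest arc, which is equivalent to quaternion slerp. The helper names are hypothetical, and adjacent rotations are assumed to differ by well under 180°.

```python
import numpy as np


def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula for a unit axis and an angle in radians."""
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def interpolate_pose(c2w0, c2w1, t):
    """Pose at fraction t in [0, 1] between two camera-to-world matrices."""
    R0, R1 = c2w0[:3, :3], c2w1[:3, :3]
    rel = R0.T @ R1                                   # rotation from pose 0 to pose 1
    angle = np.arccos(np.clip((np.trace(rel) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        R = R0.copy()
    else:
        # Axis of the relative rotation (ill-conditioned near 180 degrees).
        axis = np.array([rel[2, 1] - rel[1, 2],
                         rel[0, 2] - rel[2, 0],
                         rel[1, 0] - rel[0, 1]]) / (2.0 * np.sin(angle))
        R = R0 @ axis_angle_to_matrix(axis, t * angle)
    out = np.eye(4)
    out[:3, :3] = R
    out[:3, 3] = (1.0 - t) * c2w0[:3, 3] + t * c2w1[:3, 3]
    return out


def densify_trajectory(poses, steps):
    """Insert steps - 1 interpolated cameras between each adjacent pair."""
    out = []
    for a, b in zip(poses[:-1], poses[1:]):
        for i in range(steps):
            out.append(interpolate_pose(a, b, i / steps))
    out.append(np.array(poses[-1], dtype=float))
    return out


if __name__ == "__main__":
    a = np.eye(4)
    b = np.eye(4)
    b[:3, 3] = [1.0, 0.0, 0.0]
    frames = densify_trajectory([a, b], steps=5)
    print(len(frames), frames[2][:3, 3])
```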
