AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

This is the reference PyTorch implementation for training and testing AirPlanes, the method described in the following CVPR 2024 publication:

AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

Jamie Watson, Filippo Aleotti, Mohamed Sayed, Zawar Qureshi, Oisin Mac Aodha, Gabriel Brostow, Michael Firman, and Sara Vicente

Project Page, Paper (pdf), Supplementary Material (pdf), Video

(Figure: teaser)

This code is for non-commercial use only; please see the license file for terms. If you do find any part of this codebase helpful, please cite our paper using the BibTeX below. Thanks!


🗺️ Overview

AirPlanes takes as input posed RGB images, and outputs a 3D planar representation of the scene.

(Figure: pipeline overview)

Our pipeline consists of the following steps:

  • 2D Network inference, where the network predicts for each image:

    • A depth map.
    • A per-pixel planar probability score, indicating whether the pixel belongs to a planar or a non-planar region.
    • A per-pixel planar embedding, where pixels belonging to the same plane should have a similar embedding vector.

    This network extends SimpleRecon, a popular network for depth estimation from posed images.

  • Scene Optimisation, where a scene-specific MLP is trained for each scene. This step tackles the main limitation of the previous step: planar embeddings predicted from single images are not multi-view consistent. The MLP takes as input the coordinates of a 3D point (x,y,z) and predicts an embedding vector. A push-pull loss enforces that point pairs whose embeddings are similar/different in an image also have similar/different embeddings in 3D. After training, each per-scene MLP can predict a multi-view consistent embedding for any 3D point in the scene (a minimal sketch of this step is shown after this list).

  • 3D Plane Clustering, where a clustering algorithm groups mesh vertices into planes. Our custom sequential RANSAC implementation uses the 3D embeddings from the MLP to identify the inlier set of each 3D plane hypothesis.
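
To make the scene-optimisation step more concrete, here is a minimal PyTorch sketch of a per-scene MLP and a push-pull style embedding loss. The class names, layer sizes, embedding dimension and margins are illustrative assumptions, not the exact choices made in this repository.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SceneEmbeddingMLP(nn.Module):
        """Maps 3D points (x, y, z) to embedding vectors (illustrative sizes)."""
        def __init__(self, embedding_dim: int = 8, hidden_dim: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, embedding_dim),
            )

        def forward(self, points: torch.Tensor) -> torch.Tensor:
            # points: (N, 3) -> embeddings: (N, embedding_dim)
            return self.net(points)

    def push_pull_loss(emb_a, emb_b, same_plane, pull_margin=0.1, push_margin=1.0):
        """Pull embeddings of same-plane point pairs together and push
        different-plane pairs apart (hinge formulation, assumed)."""
        dist = (emb_a - emb_b).norm(dim=-1)
        pull = F.relu(dist - pull_margin) ** 2   # same-plane pairs should be close
        push = F.relu(push_margin - dist) ** 2   # different-plane pairs should be far
        return torch.where(same_plane, pull, push).mean()

During per-scene optimisation, the point pairs and their same-plane/different-plane labels are derived from the 2D embeddings and planar probabilities predicted for each keyframe.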

⚙️ Setup

You can create a new environment using mamba. To install mamba:

make install-mamba

To create the environment:

make create-mamba-env

Next, please activate the environment and install the project as a module:

conda activate airplanes
pip install -e .

We ran our experiments with PyTorch 2.0.1, CUDA 11.7 and Python 3.9.7.

📦 Pretrained models

Our 2D network pretrained on ScanNetv2 is available online here.

We suggest you place the downloaded model in a new checkpoints folder.

πŸƒ Running out of the box!

We made two captures available to make it easier to quickly try the code.

First, download these scans, place them in a new folder called arbitrary_captures and unzip them.

Then, run the full inference pipeline with our model:

echo "📸 Preparing keyframes for the sequences"
python -m scripts.inference_on_captures prepare-captures [--captures /path/to/captures]

echo "🚀 Generating TSDFs and 2D embeddings using our model for each scan"
python -m scripts.inference_on_captures model-inference [--checkpoint /path/to/model/checkpoint --output-dir /path/to/predicted/meshes]

echo "🚀 Training 3D embeddings (MLPs)..."
python -m scripts.inference_on_captures train-embeddings [--pred-root /path/to/predicted/meshes --captures /path/to/captures]

echo "🚀 Running RANSAC!"
python -m scripts.inference_on_captures run-ransac-ours [--pred-root /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes --captures /path/to/captures]

This script will infer per-image embeddings and scene geometry using the 2D network, optimise 3D embeddings and, finally, fit planar meshes using 3D embeddings in the RANSAC loop.

🗃️ Data preparation

This section explains how to prepare ScanNetv2 for training and testing. ScanNetv2 only provides meshes with semantic labels, not with plane annotations, so, following previous work, we process the dataset to extract planar information with RANSAC (a generic sketch of this step follows the note below).

🕒 Please note that the data preparation scripts will take a few hours to run.
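
For intuition, below is a generic sequential-RANSAC sketch that greedily extracts planes from a set of mesh vertices. The thresholds, iteration counts and acceptance criteria are placeholder assumptions and differ from the released processing scripts.

    import numpy as np

    def fit_plane(p0, p1, p2):
        """Plane (n, d) with n . x + d = 0 fitted to three points."""
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-8:
            return None  # degenerate (collinear) sample
        n = n / norm
        return n, -float(n @ p0)

    def sequential_ransac(vertices, dist_thresh=0.03, min_inliers=500, iters=500, max_planes=30):
        """Greedily extract planes: repeatedly RANSAC-fit a plane to the remaining vertices."""
        remaining = np.arange(len(vertices))
        labels = -np.ones(len(vertices), dtype=np.int64)  # -1 means "not assigned to any plane"
        planes = []
        for plane_id in range(max_planes):
            best_inliers, best_plane = None, None
            for _ in range(iters):
                sample = vertices[np.random.choice(remaining, 3, replace=False)]
                plane = fit_plane(*sample)
                if plane is None:
                    continue
                n, d = plane
                dists = np.abs(vertices[remaining] @ n + d)
                inliers = remaining[dists < dist_thresh]
                if best_inliers is None or len(inliers) > len(best_inliers):
                    best_inliers, best_plane = inliers, plane
            if best_inliers is None or len(best_inliers) < min_inliers:
                break  # no sufficiently supported plane left
            labels[best_inliers] = plane_id
            planes.append(best_plane)
            remaining = np.setdiff1d(remaining, best_inliers)
            if len(remaining) < 3:
                break
        return labels, planes

In the AirPlanes pipeline itself, the inlier test additionally uses the 3D-consistent embeddings predicted by the per-scene MLPs (see the overview above).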

ScanNetv2 download (training & testing)

Please follow the instructions provided in SimpleRecon.

At the end, you should have a ScanNetv2 root folder that looks like this:

SCANNET_ROOT
├── scans_test (test scans)
│   ├── scene0707
│   │   ├── scene0707_00_vh_clean_2.ply (gt mesh)
│   │   ├── sensor_data
│   │   │   ├── frame-000261.pose.txt
│   │   │   ├── frame-000261.color.jpg
│   │   │   └── frame-000261.depth.png (full res depth, stored scale *1000)
│   │   ├── scene0707.txt (scan metadata and image sizes)
│   │   └── intrinsic
│   │       ├── intrinsic_depth.txt
│   │       └── intrinsic_color.txt
│   └── ...
└── scans (val and train scans)
    ├── scene0000_00
    │   └── (see above)
    ├── scene0000_01
    └── ....

Ground-truth generation from ScanNetv2 meshes (training & testing)

Our ground-truth generation script is adapted from the one released by PlaneRCNN.

Run:

python -m scripts.run_scannet_processing --scannet path/to/ScanNetv2 --output-dir destination/path

NOTE: PlaneRCNN saves planes as (nx*d,ny*d,nz*d), while our project represents planes as (nx/d,ny/d,nz/d). We handle this difference later in our dataloaders.
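
For reference, a minimal sketch of this conversion (assuming each stored vector is the unit normal scaled by the offset, with a nonzero offset d):

    import numpy as np

    def planercnn_to_airplanes(planes_nd: np.ndarray) -> np.ndarray:
        """Convert planes stored as n*d (PlaneRCNN convention) to n/d.

        For a plane with unit normal n and offset d, the stored vector v = n*d
        satisfies |v| = |d|, so n/d = v / d**2 = v / |v|**2.
        """
        sq_norm = np.sum(planes_nd ** 2, axis=-1, keepdims=True)
        return planes_nd / np.clip(sq_norm, 1e-12, None)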

The code will generate the following structure in the root folder:

ROOT_FOLDER
├── scene0000_00
│   ├── planes.npy  (array with plane parameters; each plane is encoded as (nx*d,ny*d,nz*d))
│   └── planes.ply  (unsquashed mesh of the scene, where each colour encodes a plane ID)
├── scene0000_01
│   └── annotation
│       ├── planes.npy
│       └── planes.ply
└── ...

Unsquashed means that the geometry of the mesh is unchanged with respect to the ground truth mesh provided by ScanNetv2, i.e. the mesh is not composed of planes. The colour of each vertex in the mesh encodes the plane ID.

The RGB channels encode the plane IDs, which can be recovered with:

    # plane_mesh is the planes.ply mesh (e.g. loaded with trimesh);
    # cast to int32 first so the packed RGB value does not overflow uint8
    plane_ids = plane_mesh.visual.vertex_colors.copy().astype("int32")
    # pack R, G, B into one integer and undo the ID -> colour encoding
    plane_ids = (
        plane_ids[:, 0] * 256 * 256 + plane_ids[:, 1] * 256 + plane_ids[:, 2]
    ) // 100 - 1

The alpha channel stores 0 if the vertex lies on an edge between two planes.
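
For example, continuing from the snippet above, plane-boundary vertices can be masked out using the alpha channel (this assumes plane_mesh is a trimesh mesh with RGBA vertex colours):

    colours = plane_mesh.visual.vertex_colors     # (V, 4) RGBA per vertex
    on_plane_boundary = colours[:, 3] == 0        # alpha == 0 marks vertices on an edge between planes
    plane_ids[on_plane_boundary] = -1             # e.g. flag boundary vertices as unassigned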

Render ground truth plane images (training & testing)

At this point, we have RGB images, depth maps and plane-annotated ground-truth meshes. We are missing images annotated with plane IDs. To get those images, we have to render ground-truth meshes at different camera locations.

python -m scripts.run_rendering render-train-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder

python -m scripts.run_rendering render-val-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder

python -m scripts.run_rendering render-test-split --scannet /path/to/scannet --planar-meshes /path/to/mesh/with/planes/ids --output-dir /path/to/render/folder

The script generates the following structure in the DEST folder:

DEST
├── scene0000_00
│   └── frames
│       ├── 000000_planes.png   (image where pixels encode plane IDs)
│       ├── 000000_depth.npy    (depth map rendered from the mesh)
│       ├── 000001_planes.png
│       ├── 000001_depth.npy
│       └── ...
├── scene0000_01
│   └── frames
│       ├── 000000_planes.png
│       ├── 000000_depth.npy
│       ├── 000001_planes.png
│       ├── 000001_depth.npy
│       └── ...
└── ...

As before, we can recover the plane IDs from the colours in planes.png:

# read the image as an integer type (e.g. int32, not uint8) so the packed RGB value does not overflow
plane_ids = your_custom_image_reader("000000_planes.png")
plane_ids = (plane_ids[:, :, 0] * 256 * 256 + plane_ids[:, :, 1] * 256 + plane_ids[:, :, 2]) // 100 - 1

Note that rendered depth maps are not needed for training and validation, so we do not save them for those two splits.

Generate visibility volumes (testing)

We rely on visibility volumes for benchmarking, because the ground-truth mesh in ScanNetv2 has been obtained by post-processing sensor data. This mask allows us to remove voxels that are not visible in the scan. Please see the paper and supplementary material for more details.

The following script can generate such volumes using depth renders and ScanNetv2 data.

python -m scripts.run_creation_of_visibility_volumes --scannet /path/to/scannet --output-dir /path/to/planar/meshes --renders /path/to/render/folder

Here, --renders points to the directory with the ground-truth plane images created in the previous step. The script adds the visibility volumes to --output-dir, so that the final structure is:

├── scenexxxx_yy
│   ├── mesh_with_planes.ply
│   ├── scenexxxx_yy_planes.npy
│   ├── scenexxxx_yy_visibility_pcd.ply (a point cloud, for debugging purposes only)
│   └── scenexxxx_yy_volume.npz (visibility volume!)
└── ...

NOTE: our benchmarks expect to find the volumes in the same folder as the ground-truth meshes with planes. Please set --output-dir accordingly.

At the end of the process you can inspect the sampled visibility volume, saved as a point cloud. It should look like the following figure:

(Figure: sampled visibility volume)
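
One quick way to inspect it, assuming trimesh is available in the environment (the path below follows the structure shown above):

    import trimesh

    # Load the sampled visibility point cloud for one scene and open an interactive viewer
    pcd = trimesh.load("scenexxxx_yy/scenexxxx_yy_visibility_pcd.ply")
    pcd.show()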

🧪 Running and Evaluating on ScanNetV2

🚦 We need ground-truth meshes with planes and visibility volumes. Please run all the data preparation steps if you haven't already. 🙏

To evaluate AirPlanes, we first run inference with our 2D network and cache intermediate results. We then use these outputs to train our 3D embedding MLPs, one per scene. These embeddings are used in the RANSAC loop to extract a planar representation of each scene, saved as a planar mesh. Once we have all the planar meshes, we can evaluate them using our meshing, segmentation and planar benchmarks.

First, we have to update the testing configuration configs/data/scannet_test.yaml. Specifically:

  • dataset_path: path to ScanNetv2 data

The following script summarises all the steps. You can run (and customise) these steps individually if needed.

echo "🚀 Generating TSDFs and 2D embeddings using our model for each scan"
python -m scripts.evaluation model-inference [--output-dir /path/to/results/folder]

echo "🚀 Training 3D embeddings (MLPs)..."
python -m scripts.evaluation train-embeddings [--pred-root /path/to/predicted/meshes]

echo "🚀 Running RANSAC!"
python -m scripts.evaluation run-ransac-ours [--pred-root /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes]

echo "🧪 Benchmarking Geometry"
python -m scripts.evaluation meshing-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]

echo "🧪 Benchmarking Segmentation"
python -m scripts.evaluation segmentation-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]

echo "🧪 Benchmarking Planar"
python -m scripts.evaluation planar-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth --output-score-dir /path/to/scores/folder]

Similarly, you can evaluate the baseline model, i.e. SimpleRecon without embeddings, reusing the meshes saved by our model. The baseline does not need per-scene optimisation ⏩.

echo "🚀 Running RANSAC!"
python -m scripts.evaluation run-ransac-baseline [--pred-root /path/to/predicted/meshes --dest-planar-meshes /path/to/planar/meshes]

echo "🧪 Benchmarking Geometry"
python -m scripts.evaluation meshing-benchmark [--pred-root /path/to/planar/meshes --gt-root /path/to/ground/truth]
...

⏳ Optional: Training the 2D Network on ScanNetV2

We recommend using our pretrained 2D network when evaluating our method.

To retrain the 2D network, please first download the initial weights from here and place them in the checkpoints folder. You also need to generate the training data, following the instructions in data preparation.

This network estimates depth maps, 2D plane embeddings and per-pixel plane probabilities from posed images.
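
As a rough illustration of these three per-pixel outputs, a hypothetical set of prediction heads on top of a shared decoder feature map could look like the sketch below; the actual network extends SimpleRecon and its head design and channel sizes differ.

    import torch
    import torch.nn as nn

    class PlaneHeads(nn.Module):
        """Illustrative per-pixel heads: depth, planar probability and plane embedding."""
        def __init__(self, feat_channels: int = 64, embedding_dim: int = 8):
            super().__init__()
            self.depth_head = nn.Conv2d(feat_channels, 1, kernel_size=1)
            self.planar_prob_head = nn.Conv2d(feat_channels, 1, kernel_size=1)
            self.embedding_head = nn.Conv2d(feat_channels, embedding_dim, kernel_size=1)

        def forward(self, features: torch.Tensor):
            depth = self.depth_head(features)                              # (B, 1, H, W)
            planar_prob = torch.sigmoid(self.planar_prob_head(features))   # (B, 1, H, W), in [0, 1]
            embedding = self.embedding_head(features)                      # (B, D, H, W)
            return depth, planar_prob, embedding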

First, we have to update the training configuration configs/data/scannet_train.yaml. Specifically:

  • dataset_path: path to ScanNetv2 data
  • planes_path: path to meshes with planes IDs generated during the step Ground-truth generation from ScanNetv2 meshes
  • renders_path: path to the training renders generated during the step Render ground truth plane images (training & testing)

When the config is ready, please run:

python -m airplanes.train_2D_network

🙏 Acknowledgements

We thank the PlaneRCNN authors for sharing their data processing code.

📜 BibTeX

If you find our work useful in your research, please consider citing our paper:

@inproceedings{watson2024airplanes,
  title={AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings},
  author={Watson, Jamie and Aleotti, Filippo and Sayed, Mohamed and Qureshi, Zawar and Mac Aodha, Oisin and Brostow, Gabriel and Firman, Michael and Vicente, Sara},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024},
}

👩‍⚖️ License

Copyright © Niantic, Inc. 2024. Patent Pending. All rights reserved. Please see the license file for terms.