# Repo for TWM (Machine Vision Techniques) project @ WUT 24L semester
The following instructions describe:
- Setting up the environment and running notebooks
- Functions and purpose of the notebooks
- Sources and brief description of the data
## Environment setup

- Create a virtual environment (conda, venv, etc.), e.g.:

  ```
  python -m venv aerial_images
  ```

- Activate the virtual environment (depending on the operating system):
  - Windows:

    ```
    .\aerial_images\Scripts\activate
    ```

  - Linux:

    ```
    source aerial_images/bin/activate
    ```

- Download and install the `torch` library according to the instructions on [pytorch.org](https://pytorch.org).
- Install the remaining required libraries:

  ```
  pip install -r requirements.txt
  ```
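Optionally, the installation can be verified with a one-liner that prints the installed `torch` version and whether CUDA is available:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```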
## Repository structure

Generally, the repository is organized so that the `data` directory contains only raw data, the `src` directory contains code and helper functions, and the `notebooks` directory contains the notebooks with code. For better clarity, the notebooks are not stored in the root directory, but they should be moved there before running. The individual sub-directories contain what their names indicate; a detailed description follows:
- `data` - directory containing only data; directories ending with `Patches` contain datasets divided into patches
- `src` - directory containing source code, including helper functions
  - `callbacks` - callback functions assisting in model training management
  - `datasets` - dataset classes (inheriting from `torch.utils.data.Dataset`); a minimal sketch of such a class follows this list
  - `utils` - helper functions for datasets, mainly for converting masks to labels of type {0, 1, 2, ...} and transformations (`torchvision.transforms`)
  - `evaluation` - helper functions for model evaluation
  - `models` - helper baseline model class (inheriting from `torch.nn.Module`)
  - `utils.py` - general helper functions
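As a point of reference, a dataset class of this kind typically pairs each image with its label mask. The sketch below is illustrative only, not the repository's actual implementation; the class name, directory layout, and file pairing are assumptions made for the example:

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class AerialSegmentationDataset(Dataset):
    """Illustrative dataset pairing image files with same-named mask files."""

    def __init__(self, images_dir, masks_dir, transform=None):
        self.images_dir = images_dir
        self.masks_dir = masks_dir
        self.filenames = sorted(os.listdir(images_dir))
        self.transform = transform

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        name = self.filenames[idx]
        image = Image.open(os.path.join(self.images_dir, name)).convert("RGB")
        # Masks are assumed to be stored already converted to labels {0, 1, 2, ...}.
        mask = np.array(Image.open(os.path.join(self.masks_dir, name)))
        if self.transform is not None:
            image = self.transform(image)
        return image, torch.from_numpy(mask).long()
```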
- `notebooks` - directory containing notebooks with code
  - `datasets_to_patches` - notebooks demonstrating the division of datasets into patches (see the patching sketch after this list)
  - `masks_conversion` - notebooks demonstrating the conversion of masks to labels of type {0, 1, 2, ...}
  - `no_finetune` - notebooks demonstrating attempts to use models without training for aerial image segmentation (baseline; weights trained on ImageNet do not transfer to the new dataset)
  - `sanity_checks` - notebooks demonstrating sanity checks for datasets
  - `with_finetune` - notebooks demonstrating actual training of models on the new datasets
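The core of the patching step is cutting each large aerial image into fixed-size tiles. A minimal sketch, assuming non-overlapping patches and an illustrative patch size (the notebooks may use different settings):

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 512) -> list:
    """Split an (H, W, C) image into non-overlapping patch_size x patch_size tiles.

    Border regions too small to fill a whole patch are dropped.
    """
    height, width = image.shape[:2]
    patches = []
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches
```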
## Data

Number of classes in each dataset:

- INRIA: 2 (binary - building and non-building) (source)
- Dubai: 6 (source); the mask colors are listed below, and a conversion sketch follows the list
  - Building: `#3C1098`
  - Land (unpaved area): `#8429F6`
  - Road: `#6EC1E4`
  - Vegetation: `#FEDD3A`
  - Water: `#E2A929`
  - Unlabeled: `#9B9B9B`
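As an example of the mask-to-label conversion mentioned above, the Dubai colors can be mapped pixel-wise to indices {0, 1, 2, ...}. The class ordering and function name below are assumptions for the sketch:

```python
import numpy as np

# Dubai mask colors (RGB) from the list above; the index order is an assumption.
DUBAI_COLORS = {
    0: (0x3C, 0x10, 0x98),  # Building
    1: (0x84, 0x29, 0xF6),  # Land (unpaved area)
    2: (0x6E, 0xC1, 0xE4),  # Road
    3: (0xFE, 0xDD, 0x3A),  # Vegetation
    4: (0xE2, 0xA9, 0x29),  # Water
    5: (0x9B, 0x9B, 0x9B),  # Unlabeled
}


def rgb_mask_to_labels(mask: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class labels."""
    labels = np.zeros(mask.shape[:2], dtype=np.int64)
    for class_idx, color in DUBAI_COLORS.items():
        labels[np.all(mask == color, axis=-1)] = class_idx
    return labels
```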
- Aerial Drone: 20 (tree, grass, other vegetation, dirt, gravel, rocks, water, paved area, pool, person, dog, car, bicycle, roof, wall, fence, fence-pole, window, door, obstacle) (source)

  > **Warning**: it appears there are actually 23 classes.

- UAVid: 8 (source)
  - building: living houses, garages, skyscrapers, security booths, and buildings under construction.
  - road: road or bridge surfaces that cars can legally drive on. Parking lots are not included.
  - tree: tall trees that have canopies and main trunks.
  - low vegetation: grass, bushes, and shrubs.
  - static car: cars that are not moving, including static buses, trucks, automobiles, and tractors. Bicycles and motorcycles are not included.
  - moving car: cars that are moving, including moving buses, trucks, automobiles, and tractors. Bicycles and motorcycles are not included.
  - human: pedestrians, bikers, and all other humans engaged in various activities.
  - clutter: all objects not belonging to any of the classes above.
Number of images in each dataset (a train/val split sketch follows the list):

- INRIA
  - train: 180 (labels present, need to manually split into train and val)
  - test: 144 (no labels)
- Dubai
  - train: 72 (labels present, need to manually split into train and val)
- Aerial Drone
  - train: 400 (labels present, need to manually split into train and val)
- UAVid
  - train: 200
  - val: 70
  - test: 10
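For the datasets that ship only a labelled training set, the manual train/val split can be done with `torch.utils.data.random_split`, for example. The 80/20 ratio and the fixed seed below are assumptions for the sketch:

```python
import torch
from torch.utils.data import random_split


def split_train_val(dataset, val_fraction=0.2, seed=42):
    """Deterministically split a dataset into train and validation subsets."""
    val_size = int(len(dataset) * val_fraction)
    train_size = len(dataset) - val_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, val_size], generator=generator)
```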
## Links

- repos
- datasets
- kaggle solutions
- libraries
- papers