
Data format for single‐cell representation learning

Ziwen Liu edited this page Oct 25, 2024 · 1 revision

The training data for single-cell representation learning consists of images and tracking results. Specifically, viscy.data.triplet.TripletDataModule requires the data formats described below.

Images

The images should be stored in HCS OME-Zarr v0.4 format. See iohub documentation for instructions to write them. An example dataset can be found here.

Tracking

Tracking is done per field of view (FOV) with Ultrack, which produces segmentation results as arrays and tracking results as tables.

VisCy expects the tracking results in an HCS OME-Zarr store with additional metadata. The arrays in this store are the segmentation labels, and the FOV names must be consistent with those of the image arrays. Each FOV must also contain its tracking table as a CSV file at the same level as the FOV metadata. The directory tree should look like this:

tracks.zarr
├── 0
│   ├── 3
│   │   ├── 000002
│   │   │   ├── 0
│   │   │   ├── tracks_0_3_000002.csv
│   │   │   ├── .zattrs
│   │   │   └── .zgroup
│   │   ├── 001000
│   │   │   ├── 0
│   │   │   ├── tracks_0_3_001000.csv
│   │   │   ├── .zattrs
│   │   │   └── .zgroup
│   │   ├── .zattrs
│   │   └── .zgroup
│   ├── 6
│   │   ├── 000002
│   │   │   ├── 0
│   │   │   ├── tracks_0_6_000002.csv
│   │   │   ├── .zattrs
│   │   │   └── .zgroup
│   │   ├── 001000
│   │   │   ├── 0
│   │   │   ├── tracks_0_6_001000.csv
│   │   │   ├── .zattrs
│   │   │   └── .zgroup
│   │   ├── .zattrs
│   │   └── .zgroup
│   └── .zgroup
├── .zattrs
└── .zgroup
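
The layout above can be checked before training with a short script. The following is a minimal sketch using only the Python standard library, not part of VisCy. The function name `find_missing_track_tables` is hypothetical, and the CSV naming pattern `tracks_{row}_{column}_{fov}.csv` is an assumption inferred from the tree above; adjust it if your tracking output is named differently.

```python
from pathlib import Path


def find_missing_track_tables(store: Path) -> list[str]:
    """Return FOV names (row/column/fov) that lack the expected tracks CSV."""
    missing = []
    # FOV groups sit three levels below the store root: row/column/fov.
    for fov_dir in sorted(store.glob("*/*/*")):
        # Skip metadata files such as .zattrs and .zgroup.
        if not fov_dir.is_dir() or fov_dir.name.startswith("."):
            continue
        row, column, fov = fov_dir.relative_to(store).parts
        # Assumed naming pattern, inferred from the example tree.
        csv_path = fov_dir / f"tracks_{row}_{column}_{fov}.csv"
        if not csv_path.is_file():
            missing.append(f"{row}/{column}/{fov}")
    return missing
```

Calling `find_missing_track_tables(Path("tracks.zarr"))` on a store that matches the tree above should return an empty list. Any names it does return are FOVs whose tracking table is absent or named differently.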

An example dataset can be found here.
