This repository contains the code for the paper "Contrastive Pre-Training and Multiple Instance Learning for Predicting Tumor Microsatellite Instability" (EMBC 2024). The project focuses on enhancing microsatellite instability (MSI) prediction in Whole Slide Image (WSI) analysis of gastrointestinal cancers through a two-stage weakly supervised methodology.
We propose a framework that integrates Multiple Instance Learning (MIL) with a Contrastive Clustering Network (CCNet) for feature extraction. Combining the two improves MSI classification accuracy over existing methods.
The model was trained and evaluated on the Colorectal Cancer (CRC) and Stomach Adenocarcinoma (STAD) datasets, with performance measured by AUROC and F1 score.
| Dataset | Folder | # of WSIs (Train) | # of WSIs (Test) | # of Patches (Train) | # of Patches (Test) |
|---|---|---|---|---|---|
| CRC | MSI | 39 | 26 | 46,704 | 29,335 |
| CRC | MSS | 221 | 74 | 46,704 | 70,569 |
| STAD | MSI | 35 | 25 | 50,285 | 27,904 |
| STAD | MSS | 150 | 74 | 50,285 | 90,104 |
The datasets can be accessed here.
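As a rough illustration of the two evaluation metrics named above, the sketch below computes AUROC and F1 from slide-level labels and predicted MSI probabilities. The function names, the 0.5 decision threshold, and the toy values are assumptions made for illustration and are not taken from the repository. In practice, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.f1_score` compute the same quantities.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 of the positive (MSI) class after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = MSI, 0 = MSS (values are made up).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
print(f"AUROC: {auroc(labels, scores):.3f}")
print(f"F1:    {f1_at_threshold(labels, scores):.3f}")
```

AUROC is threshold-free, while F1 depends on the chosen cutoff, so the two can rank models differently on imbalanced data such as the MSI/MSS split above.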
The code is split into two folders. Section 1 covers feature extractor training and feature extraction. Section 2 covers bag construction from the extracted feature vectors and the Multiple Instance Learning classifiers.
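To make the bag-construction step concrete, here is a minimal sketch that groups patch-level feature vectors into one bag per WSI, with the slide label acting as the weak label for the whole bag. The record layout (`slide_id`, `label`, `feature`) and the function name `build_bags` are hypothetical; the repository's scripts may organize features differently.

```python
from collections import defaultdict


def build_bags(patch_records):
    """Group patch features by slide so that each WSI becomes one bag.

    patch_records: iterable of (slide_id, label, feature) tuples,
                   where label is 1 for MSI and 0 for MSS.
    Returns a dict: slide_id -> {"label": int, "features": list of vectors}.
    """
    features = defaultdict(list)
    labels = {}
    for slide_id, label, feature in patch_records:
        # All patches from one slide must share the slide-level label.
        if labels.setdefault(slide_id, label) != label:
            raise ValueError(f"conflicting labels for slide {slide_id}")
        features[slide_id].append(feature)
    return {
        slide_id: {"label": labels[slide_id], "features": feats}
        for slide_id, feats in features.items()
    }


# Toy records with 2-dimensional features (values are made up).
records = [
    ("slide_A", 1, [0.1, 0.7]),
    ("slide_B", 0, [0.3, 0.2]),
    ("slide_A", 1, [0.4, 0.5]),
]
bags = build_bags(records)
for slide_id, bag in bags.items():
    print(slide_id, bag["label"], len(bag["features"]))
```

Bags can hold different numbers of patches, which is why a MIL classifier needs a pooling step that accepts variable-length input.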
To train the feature extractor, run:
python train.py
Once training completes, the model is saved to the `model_path` specified in the arguments. To extract features with the trained model, run:
python extractor.py
You can start the 5-fold classification process by running:
python classifier.py
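Assuming the 5-fold process is standard k-fold cross-validation, the sketch below shows one way to split slides (not patches) into class-balanced folds, so that all patches from a WSI stay on the same side of every split. Whether `classifier.py` stratifies or shuffles this way is not stated here; `stratified_folds` and its parameters are illustrative names only. scikit-learn's `StratifiedKFold` provides a library version of the same idea.

```python
import random
from collections import defaultdict


def stratified_folds(slide_labels, n_folds=5, seed=0):
    """Split slides into n_folds folds with a similar class ratio in each.

    slide_labels: dict mapping slide_id -> label (1 = MSI, 0 = MSS).
    Returns a list of (train_ids, test_ids) pairs, one per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for slide_id, label in sorted(slide_labels.items()):
        by_class[label].append(slide_id)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        # Deal the slides of this class round-robin across the folds.
        for i, slide_id in enumerate(ids):
            folds[i % n_folds].append(slide_id)

    splits = []
    for k in range(n_folds):
        test_ids = sorted(folds[k])
        train_ids = sorted(
            s for j, fold in enumerate(folds) if j != k for s in fold
        )
        splits.append((train_ids, test_ids))
    return splits


# Toy example: 5 MSI slides and 10 MSS slides (identifiers are made up).
slide_labels = {
    **{f"msi_{i}": 1 for i in range(5)},
    **{f"mss_{i}": 0 for i in range(10)},
}
for k, (train_ids, test_ids) in enumerate(stratified_folds(slide_labels)):
    print(f"fold {k}: {len(train_ids)} train slides, {len(test_ids)} test slides")
```

Splitting at the slide level avoids leaking patches from the same WSI into both the training and the test fold.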
Requirements:
- Python 3.8+
- PyTorch 1.9.0
- torchvision 0.11.2
- scikit-learn 0.24.2
- pandas 1.1.5
- numpy 1.19.5
- scipy 1.5.4
If you find our work useful, please consider citing our paper:
@article{nap2024,
title={Contrastive Pre-Training and Multiple Instance Learning for Predicting Tumor Microsatellite Instability},
author={Nap, Ronald and Aburidi, Mohammed and Marcia, Roummel},
journal={EMBC},
year={2024}
}