In this module, we analyze the normalized training data features from 3.normalize_data.
We use UMAP to analyze the features. UMAP was introduced by McInnes, L., Healy, J., and Melville, J. (2018) as a manifold learning technique for dimension reduction. We use UMAP to reduce the feature data to 1 and 2 dimensions, and we use Matplotlib to visualize the resulting 1D and 2D UMAP embeddings.
For each UMAP reduction, we create two visualizations. The first colors every point by its phenotypic class. The second colors only the points that belong to selected phenotypic classes and shows all other classes in gray.
Note: The phenotypic classes colored in the second visualization can be changed with the classes_2 variable in analyze_data.ipynb.
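The coloring logic for the two visualizations can be sketched in plain Python. This is an illustration under assumptions: the helper names, the example class labels, and the selected set (which plays a role similar to classes_2) are hypothetical. The colors are Matplotlib cycle strings ("C0", "C1", ...), so the returned lists can be passed to matplotlib.pyplot.scatter through its c argument.

```python
def make_palette(labels):
    """Map each distinct class label to a Matplotlib cycle color ("C0", "C1", ...)."""
    classes = sorted(set(labels))
    return {label: f"C{i % 10}" for i, label in enumerate(classes)}


def color_all(labels, palette):
    """First visualization: every point takes its class color."""
    return [palette[label] for label in labels]


def color_selected(labels, selected, palette, background="lightgray"):
    """Second visualization: only selected classes keep their color; the rest are gray."""
    return [palette[label] if label in selected else background for label in labels]


def draw_order(labels, selected):
    """Point indices with unselected (gray) points first, so highlighted points are drawn on top."""
    return sorted(range(len(labels)), key=lambda i: labels[i] in selected)


# Hypothetical labels and selection, for illustration only.
labels = ["Interphase", "Apoptosis", "Prometaphase", "Interphase"]
selected = {"Apoptosis"}
palette = make_palette(labels)

print(color_all(labels, palette))                 # ['C1', 'C0', 'C2', 'C1']
print(color_selected(labels, selected, palette))  # ['lightgray', 'C0', 'lightgray', 'lightgray']
print(draw_order(labels, selected))               # [0, 2, 3, 1]
```

Building the palette from the sorted class names keeps each class's color identical across both plots. Given a 2D embedding emb (a NumPy array), the second plot could be drawn with plt.scatter(emb[order, 0], emb[order, 1], c=[colors[i] for i in order]), where order comes from draw_order and colors from color_selected.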
Use the commands below to analyze the training data.
# Make sure you are located in 4.analyze_data
cd 4.analyze_data
# Activate mitocheck_data conda environment
conda activate mitocheck_data
# Analyze data
bash analyze_data.sh