ZhangqiJiang07/Multi-view_Multi-class_Datasets

Multi-view Multi-class Datasets

This repo contains some benchmarks for evaluating Multi-view Multi-class machine learning algorithms.

📄 Statistics of Datasets

📢 More information about the datasets can be found in [Google Sheets | Tencent Docs].

| No. | Dataset | #Samples | #Classes | #Views | Tag | Reference |
|---|---|---|---|---|---|---|
| 1 | 100Leaves | 1,600 | 100 | 3 | | Plant leaf classification using probabilistic integration of shape, texture and margin features |
| 2 | Caltech101-7 | 1,474 | 7 | 6 | imbalance | Large-scale multi-view spectral clustering via bipartite graph |
| 3 | Caltech101-20 | 2,386 | 20 | 6 | imbalance | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 4 | Caltech101 | 9,144 | 102 | 6 | imbalance | Binary Multi-View Clustering |
| 5 | Deep Caltech101 | 8,677 | 101 | 2 | imbalance | Trusted Multi-View Classification |
| 6 | Caltech256 | 30,607 | 257 | 3 | imbalance | Auto-weighted Multi-view Clustering for Large-scale Data |
| 7 | Deep AWA_2views | 10,158 | 50 | 2 | imbalance | Deep Partial Multi-View Learning |
| 8 | Reuters_2views | 18,758 | 6 | 2 | imbalance | Multi-view Spectral Clustering Network |
| 9 | NoisyMNIST | 70,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 10 | NoisyMNIST | 30,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 11 | MNIST-USPS | 5,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 12 | Scene15 | 4,485 | 15 | 3 | | Ensemble projection for semi-supervised image classification |
| 13 | Out-Scene | 2,688 | 8 | 4 | | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 14 | NUS-WIDE | 30,000 | 31 | 5 | imbalance | Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity |

✨ We have collated some publicly available datasets, which you can download from Baidu Netdisk. Each dataset is stored as a .mat file in the following format:

xxx.mat
|---gnd: matrix, double, class labels starting from 1, shape (sample_number, 1).
|---X: cell, shape (1, view_num).
|---|---X{i}: matrix, double, shape (sample_number, feature_dimension).
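
The layout above can be read from Python roughly as follows. This is a minimal sketch, not code from the repository: the helper name `load_multiview_mat` is hypothetical, and it assumes the files are in a MATLAB format that `scipy.io.loadmat` can read (files saved as v7.3 are HDF5 and would need a different reader such as `h5py`). It also shifts the labels to start from 0, which is a convenience choice and not something the format requires.

```python
import numpy as np
from scipy.io import loadmat
from scipy.sparse import issparse


def load_multiview_mat(path):
    """Hypothetical helper: read `X` (cell of views) and `gnd` (labels) from a .mat file."""
    data = loadmat(path)

    # A MATLAB cell of shape (1, view_num) is loaded as an object ndarray of the same shape.
    cell = data["X"]
    views = []
    for i in range(cell.shape[1]):
        view = cell[0, i]
        if issparse(view):  # assumption: some views may be stored as sparse matrices
            view = view.toarray()
        views.append(np.asarray(view, dtype=np.float64))

    # `gnd` has shape (sample_number, 1) and starts from 1; flatten and shift to start from 0.
    labels = data["gnd"].ravel().astype(np.int64) - 1

    # Every view should have one row per sample.
    for i, view in enumerate(views):
        if view.shape[0] != labels.shape[0]:
            raise ValueError(f"view {i} has {view.shape[0]} rows, expected {labels.shape[0]}")
    return views, labels
```

Calling `views, labels = load_multiview_mat("xxx.mat")` would then give a list with one `(sample_number, feature_dimension)` array per view and a 1-D integer label vector.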

🔥 Update

  • [2024/08/12] The script for plotting label distributions has been uploaded: label_distribution/plot_label_distribution.ipynb!
  • [2024/08/08] Created a share link to the datasets we have collected from the Internet for public research. [Baidu Netdisk]

🌋 Modality Evaluation

We adopt an SVM as a simple baseline to evaluate how much each modality (view) contributes to the classification task.
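
A per-view SVM baseline of this kind could be sketched as below. The README does not state the kernel, preprocessing, or evaluation protocol, so everything here is an assumption for illustration: the function name `per_view_svm_accuracy` is hypothetical, and the RBF kernel, feature standardization, and stratified 5-fold cross-validation are choices made for the sketch, not the repository's settings.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def per_view_svm_accuracy(views, labels, n_splits=5, seed=0):
    """Hypothetical helper: mean cross-validated SVM accuracy for each view separately."""
    labels = np.asarray(labels).ravel()
    # Stratified folds keep class proportions similar across splits (assumed protocol).
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for X in views:
        # Train one SVM per view; standardization and the RBF kernel are assumptions.
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        fold_acc = cross_val_score(model, X, labels, cv=cv, scoring="accuracy")
        scores.append(float(fold_acc.mean()))
    return scores
```

Given a list of per-view feature matrices and a label vector, the function returns one accuracy per view, so a view with a noticeably higher score can be read as contributing more to classification under this baseline. For the imbalanced datasets, a metric other than plain accuracy may be more informative.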

📊 Label Distribution

📢 More figures can be found in the label_distribution folder!
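
For a quick numerical look at a label distribution without plotting, per-class counts can be computed directly from `gnd`. This sketch is not the repository's notebook: the function name `label_distribution` is hypothetical, and the max-to-min count ratio is only one possible way to quantify imbalance; the README does not say how the "imbalance" tag was assigned.

```python
import numpy as np


def label_distribution(labels):
    """Hypothetical helper: per-class sample counts and the max/min count ratio."""
    labels = np.asarray(labels).ravel().astype(np.int64)
    classes, counts = np.unique(labels, return_counts=True)
    # Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
    ratio = float(counts.max() / counts.min())
    return dict(zip(classes.tolist(), counts.tolist())), ratio
```

The returned dictionary maps each class label to its sample count, which can be passed to any plotting library to draw a bar chart.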

Acknowledgements

Some datasets were downloaded from these sites, for which we are very grateful:

[1] https://github.com/liujiyuan13/mvdata

[2] https://github.com/wangsiwei2010/large_scale_multi-view_clustering_datasets
