
Multimodal Representation Learning using Deep Multiset Canonical Correlation Analysis

Table of contents

  1. Introduction to CCA, GCCA and MCCA
    i. Summary
    ii. CCA
    iii. CCA for multiple datasets
  2. Proposed method: Deep MCCA
    i. Formulation
    ii. Algorithm
  3. Performance Evaluation : Affinity measures
  4. Synthetic data experiments
    i. Data generation
    ii. Effect of batch size and embedding dimensions
    iii. Comparison with a supervised DNN model
  5. Real world data experiments
    i. Baseline methods
    ii. Noisy MNIST - Latin script
    iii. Noisy MNIST - Bangla script
    iv. Cross-dataset - Train on MNIST, test on Bangla

Introduction - Canonical correlation variants : CCA, GCCA and MCCA

Summary

We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an extension of CCA-based representation learning to settings where the underlying signal is observed across multiple (more than two) modalities. We use a deep learning framework to learn non-linear transformations from the different modalities to a shared subspace such that the representations maximize the ratio of between-modality to within-modality covariance of the observations. Unlike linear discriminant analysis, we do not need class information to learn these representations, and we show that the model can be trained on complex data using mini-batches. Using synthetic data experiments, we show that dMCCA can effectively recover the common signal across the different modalities when it is corrupted by multiplicative and additive noise. We also analyze the sensitivity of our model in recovering the correlated components with respect to mini-batch size and embedding dimension. Performance evaluation on noisy handwritten digit datasets shows that our model outperforms other CCA-based approaches and is comparable to deep neural network models trained end-to-end on these datasets.

CCA, Generalized CCA and Multiset CCA

CCA finds the projection space that maximizes the correlation between two datasets. Its mathematical formulation and its prevalent extensions are described below:

[Image: CCA formulation and its extensions to multiple datasets (GCCA and MCCA)]
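Since the figure is not rendered here, a minimal sketch of the standard two-view CCA objective (in our own notation, with Σ denoting covariance matrices and w the projection vectors) is:

```latex
\max_{\mathbf{w}_1,\,\mathbf{w}_2}\;
\rho \;=\;
\frac{\mathbf{w}_1^{\top}\,\Sigma_{12}\,\mathbf{w}_2}
     {\sqrt{\mathbf{w}_1^{\top}\,\Sigma_{11}\,\mathbf{w}_1}\;
      \sqrt{\mathbf{w}_2^{\top}\,\Sigma_{22}\,\mathbf{w}_2}}
```

Extensions such as GCCA and MCCA generalize this objective to more than two datasets, either by introducing a shared representation that all views are mapped to (GCCA) or by combining the correlations across all pairs of views (MCCA).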

Proposed method : Deep Multiset CCA

[Image: Overview of the proposed deep multiset CCA (dMCCA) approach]

Deep MCCA formulation

[Image: dMCCA formulation]
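The formulation figure is not reproduced here. As a rough sketch in our own notation (following the between- and within-modality covariance description in the summary), let H_m = f_m(X_m) denote the network output for modality m = 1, ..., M, and define the within- and between-modality covariance matrices

```latex
\mathbf{R}_W = \frac{1}{M}\sum_{m=1}^{M}\operatorname{cov}(H_m),
\qquad
\mathbf{R}_B = \frac{1}{M(M-1)}\sum_{m=1}^{M}\sum_{\substack{n=1\\ n\neq m}}^{M}\operatorname{cov}(H_m, H_n).
```

Each canonical direction v_l is then chosen to maximize the ratio of between- to within-modality covariance,

```latex
\rho_l = \frac{\mathbf{v}_l^{\top}\mathbf{R}_B\,\mathbf{v}_l}{\mathbf{v}_l^{\top}\mathbf{R}_W\,\mathbf{v}_l},
```

which reduces to the generalized eigenvalue problem R_B v = ρ R_W v; dMCCA backpropagates through the networks f_m to maximize the top ρ_l values. The exact normalization constants may differ from those in the original figure.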

dMCCA Algorithm

[Image: dMCCA algorithm]
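The algorithm figure is likewise unavailable. The snippet below is an illustrative PyTorch sketch of a dMCCA-style loss under the formulation sketched above: it estimates the within- and between-modality covariances from a mini-batch of embeddings and maximizes the top generalized eigenvalues (inter-set correlations). Function names, the regularization constant, and the whitening-based eigensolver are our assumptions, not the original implementation.

```python
import torch

def dmcca_loss(views, n_components=10, eps=1e-6):
    """Negative sum of the top inter-set correlations for a list of
    per-modality embedding tensors, each of shape (batch, d)."""
    M = len(views)
    batch, d = views[0].shape

    # Center each modality's embeddings over the mini-batch.
    views = [v - v.mean(dim=0, keepdim=True) for v in views]

    # Within-modality covariance: average of the per-modality covariances.
    R_w = sum(v.t() @ v for v in views) / (M * (batch - 1))

    # Covariance of the summed embeddings contains both parts; subtract
    # the within-modality term to isolate the between-modality covariance.
    v_sum = sum(views)
    R_t = (v_sum.t() @ v_sum) / (batch - 1)
    R_b = (R_t - M * R_w) / (M * (M - 1))

    # Regularize R_w for numerical stability before whitening.
    R_w = R_w + eps * torch.eye(d, device=R_w.device)

    # Solve R_b v = rho * R_w v by whitening with R_w^{-1/2}.
    w_vals, w_vecs = torch.linalg.eigh(R_w)
    R_w_inv_sqrt = w_vecs @ torch.diag(w_vals.clamp_min(eps).rsqrt()) @ w_vecs.t()
    rho = torch.linalg.eigvalsh(R_w_inv_sqrt @ R_b @ R_w_inv_sqrt)  # ascending order

    # Maximize the top `n_components` inter-set correlations.
    return -rho[-n_components:].sum()
```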

Performance Evaluation : Affinity measures

[Image: Affinity measures used for performance evaluation]

Synthetic data experiments

To evaluate whether dMCCA learns highly correlated components, we generate synthetic observations in which the number of signal components common across the different modalities is known. Because the source signal is known, we can also train a supervised deep learning model to reconstruct it, which provides an empirical upper bound on performance in our experiments.

Data generation

[Image: Synthetic data generation]
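Since the exact generation procedure is given in the figure above, the snippet below is only an illustrative sketch of the setup described in the text: a common low-dimensional signal observed through modality-specific mixing and corrupted by multiplicative and additive noise. All dimensions and noise levels here are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_components, n_modalities, obs_dim = 10000, 10, 3, 50

# Common (correlated) signal shared by all modalities.
S = rng.standard_normal((n_samples, n_components))

views = []
for m in range(n_modalities):
    A = rng.standard_normal((n_components, obs_dim))                     # modality-specific mixing
    mult = 1.0 + 0.1 * rng.standard_normal((n_samples, n_components))    # multiplicative noise
    add = 0.5 * rng.standard_normal((n_samples, obs_dim))                # additive noise
    views.append((S * mult) @ A + add)
```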

Effect of batch size and embedding dimension

[Image: Effect of mini-batch size and embedding dimension on the affinity measures R_a and R_s]

We first analyzed the performance of the dMCCA algorithm by varying the embedding dimension and the mini-batch size. The figure above shows the reconstruction affinity measure R_a and the inter-set affinity measure R_s for these parameters. Notice that the maximum R_a is achieved for an embedding dimension of 10 (the number of correlated components used to generate the data), indicating that dMCCA retains some notion of the ambient dimension while maximizing correlation between modalities. The R_s measure consistently decreased with increasing embedding dimension. Because we estimate covariances in the loss function and use SGD with mini-batches for optimization, we also examined the performance for varying batch sizes. As shown in the figure above, a mini-batch size greater than 400 gives consistent results. Additionally, these measures were comparable when we used the tanh activation function for the layers.

Comparison with a supervised DNN model

We compared the performance of our system against an empirical upper bound obtained by training a DNN to reconstruct the source signals from the input observations. As shown in the table below, and as expected, the R_a measure for our system is lower than that of the supervised system. However, the affinity between the modalities is higher for the dMCCA embeddings than for the supervised system. This is perhaps the benefit of modeling the between-set covariance rather than only minimizing the reconstruction error, as is common to many deep representation learning methods as well as DGCCA.

[Table: Affinity measures for dMCCA compared with the supervised DNN]
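As a rough sketch of what such a supervised upper bound might look like (the architecture and layer sizes here are assumptions, not the original experiment), the baseline is simply a DNN trained with a reconstruction loss against the known source signal:

```python
import torch
import torch.nn as nn

# Hypothetical supervised baseline: reconstruct the known source signal
# from the concatenated multimodal observations.
class SourceReconstructor(nn.Module):
    def __init__(self, in_dim=3 * 50, hidden_dim=256, source_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, source_dim),
        )

    def forward(self, x):
        return self.net(x)

model = SourceReconstructor()
loss_fn = nn.MSELoss()  # minimized against the known source signal
```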

Real world data experiments

Baseline Experiments

[Image: Baseline methods used for comparison]

Noisy MNIST - Latin script

Dataset - 3 modalities

[Image: Noisy MNIST (Latin script) dataset with 3 modalities]

Performance Comparison

[Image: Performance comparison on noisy MNIST (Latin script)]

Clustering performance

[Image: Clustering performance on noisy MNIST (Latin script)]

Noisy MNIST - Bangla script

Dataset - 3 modalities

[Image: Noisy MNIST (Bangla script) dataset with 3 modalities]

Performance Comparison

[Image: Performance comparison on noisy MNIST (Bangla script)]

Cross-data generalizability - Train on MNIST, test on Bangla

Because the noise types in NMNIST and the Bangla digits are similar, we evaluate cross-dataset generalizability on a classification task: the model is trained on NMNIST and evaluated on the Bangla digits.

[Image: Cross-dataset classification results (train on NMNIST, test on Bangla)]
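The evaluation protocol can be sketched as follows (the classifier choice and the placeholder arrays are ours; in the actual experiment the features would be dMCCA embeddings from the NMNIST-trained network applied unchanged to the Bangla digits):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder arrays stand in for dMCCA embeddings and digit labels.
emb_nmnist, y_nmnist = np.random.randn(1000, 10), np.random.randint(0, 10, 1000)
emb_bangla, y_bangla = np.random.randn(1000, 10), np.random.randint(0, 10, 1000)

# Train a simple classifier on NMNIST embeddings, score on Bangla embeddings.
clf = LogisticRegression(max_iter=1000).fit(emb_nmnist, y_nmnist)
print("cross-script accuracy:", clf.score(emb_bangla, y_bangla))
```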
