Improved Diversity Measures #256

PaulWAyers · 2024-09-01T14:21:50Z

I am not 100% satisfied with the diversity measures we currently have in place.

One other diversity measure we could implement is the data volume: given the pairwise distances, the Cayley-Menger determinant gives the volume of the simplex spanned by the data points. The volume can be nonzero only if the dimension of the feature vectors is at least the number of data points minus one. When this is not true (or even if it is), this reference suggests forming a kernelized Gramian matrix with elements

$$ g_{ij} = e^{-\gamma d_{ij}^2} $$

with the induced squared distance,

$$ D_{ij}^2 = 2 - 2g_{ij} $$
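To make the volume idea concrete, here is a minimal NumPy sketch, not a proposed implementation. The function names `cayley_menger_volume` and `kernelized_distances` are hypothetical, and the default `gamma=1.0` is an arbitrary placeholder. It evaluates the Cayley-Menger determinant on a distance matrix, then repeats the calculation on the kernel-induced distances defined above.

```python
import math
import numpy as np


def cayley_menger_volume(dist):
    """Volume of the simplex spanned by m points, given their (m, m) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    m = dist.shape[0]   # number of points
    n = m - 1           # dimension of the simplex they span
    # Bordered matrix: zero in the corner, ones along the border, squared distances inside.
    cm = np.ones((m + 1, m + 1))
    cm[0, 0] = 0.0
    cm[1:, 1:] = dist ** 2
    vol_sq = (-1) ** (n + 1) * np.linalg.det(cm) / (2 ** n * math.factorial(n) ** 2)
    # Round-off can make a degenerate volume slightly negative; clip before the root.
    return math.sqrt(max(vol_sq, 0.0))


def kernelized_distances(dist, gamma=1.0):
    """Induced distances D_ij = sqrt(2 - 2 g_ij), with g_ij = exp(-gamma d_ij^2)."""
    dist = np.asarray(dist, dtype=float)
    gram = np.exp(-gamma * dist ** 2)
    return np.sqrt(np.clip(2.0 - 2.0 * gram, 0.0, None))


# Three collinear points: zero area in the original space, nonzero after kernelization.
x = np.array([0.0, 1.0, 2.0])
d = np.abs(x[:, None] - x[None, :])
print(cayley_menger_volume(d))
print(cayley_menger_volume(kernelized_distances(d, gamma=1.0)))
```

For more than a handful of points the raw determinant and the factorial prefactor will likely under- or overflow, so a log-determinant (for example via `np.linalg.slogdet`) would probably be the safer route in practice.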

These capture data volume, but do not capture "holes" in the data. A simple way to capture holes in the data is to look at the maximum gap between points that are included,

$$ div = \max_{i,j} d_{ij} $$

or we can look at all the points, and find the one which is most distant from all others,

$$ div = \max_i \min_{j} d_{ij} $$

The latter measure is directly optimized by the MaxMin algorithm.
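Both gap measures are cheap to evaluate from a precomputed distance matrix. The sketch below rests on one reading of the formulas, which is an assumption on my part: in the first measure `i` and `j` both run over the selected points, while in the second `i` runs over every point in the data set and `j` only over the selected ones. The function names and the `selected` index list are hypothetical.

```python
import numpy as np


def max_pairwise_distance(dist, selected):
    """max_{i,j} d_ij with i and j both restricted to the selected points."""
    dist = np.asarray(dist, dtype=float)
    return dist[np.ix_(selected, selected)].max()


def max_min_distance(dist, selected):
    """max_i min_j d_ij with i over all points and j over the selected points."""
    dist = np.asarray(dist, dtype=float)
    # Distance from every point to its nearest selected point, then the worst case.
    nearest = dist[:, selected].min(axis=1)
    return nearest.max()


# Four points on a line; the last one sits far from the rest.
x = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(x[:, None] - x[None, :])

print(max_pairwise_distance(dist, [0, 1, 2]))  # 2.0
print(max_min_distance(dist, [0, 1, 2]))       # 8.0, the unselected point at 10 is a "hole"
print(max_min_distance(dist, [0, 3]))          # 2.0 once the outlying point is selected
```

Under this reading a smaller value of the second measure means fewer unfilled holes, so whether it should be reported as-is or inverted to act as a "diversity" score is still an open design choice.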

FanwangM self-assigned this Sep 11, 2024
