I am not 100% satisfied with the diversity measures we currently have in place.
One other diversity measure we could implement is the data volume: given the pairwise distances, the Cayley-Menger determinant gives the volume of the simplex spanned by the data points. The volume is nonzero only if the points are affinely independent, which requires the dimension of the feature vectors to be at least the number of data points minus one. When this is not true (or even if it is), this reference suggests forming a kernelized Gramian matrix with elements
$$
g_{ij} = e^{-\gamma d_{ij}^2}
$$
with the induced distance squared,
$$
D_{ij}^2 = 2 - 2g_{ij}
$$
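As a rough illustration of the volume idea, here is a minimal NumPy sketch, not an existing function in this codebase. The names `kernel_squared_distances` and `cayley_menger_squared_volume` and the default `gamma=1.0` are made up for this example. It uses the standard Cayley-Menger formula for the squared volume of the simplex spanned by $k$ points, and it accepts either the raw squared distances $d_{ij}^2$ or the induced ones $D_{ij}^2$.

```python
import math

import numpy as np


def kernel_squared_distances(d, gamma=1.0):
    """Induced squared distances D_ij^2 = 2 - 2 * exp(-gamma * d_ij^2).

    `d` is a (k, k) matrix of pairwise distances; `gamma` is a
    hypothetical default, not a recommended value.
    """
    d = np.asarray(d, dtype=float)
    g = np.exp(-gamma * d**2)  # kernelized Gramian, g_ii = 1
    return 2.0 - 2.0 * g


def cayley_menger_squared_volume(sq_dists):
    """Squared volume of the simplex spanned by k points.

    `sq_dists` is a (k, k) matrix of squared pairwise distances.
    """
    sq = np.asarray(sq_dists, dtype=float)
    k = sq.shape[0]
    # Bordered matrix: zero in the corner, ones along the first row/column.
    cm = np.ones((k + 1, k + 1))
    cm[0, 0] = 0.0
    cm[1:, 1:] = sq
    n = k - 1  # dimension of the simplex
    coeff = (-1.0) ** (n + 1) / (2.0**n * math.factorial(n) ** 2)
    return coeff * np.linalg.det(cm)


# Three collinear points in 1D: the plain volume vanishes,
# while the kernelized volume does not.
x = np.array([0.0, 1.0, 2.0])
d = np.abs(x[:, None] - x[None, :])
plain = cayley_menger_squared_volume(d**2)
kernelized = cayley_menger_squared_volume(kernel_squared_distances(d))
```

For many points the determinant can underflow or lose precision, so a practical version would probably work with a log-determinant instead.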
These capture data volume, but do not capture "holes" in the data. A simple way to capture holes in the data is to look at the maximum gap between points that are included,
$$
\mathrm{div} = \max_{i,j} d_{ij}
$$
or we can look at all the points, and find the one which is most distant from all others,
$$
\mathrm{div} = \max_i \min_{j \neq i} d_{ij}
$$
The latter measure is directly optimized by the MaxMin algorithm.
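To make the two gap-based measures concrete, here is a small NumPy sketch that works on a precomputed pairwise distance matrix. The function names are hypothetical, and `greedy_maxmin` assumes the usual greedy formulation of MaxMin (repeatedly add the point farthest from the current selection), which may differ in detail from the implementation in this repository.

```python
import numpy as np


def max_pairwise_distance(d):
    """div = max_{i,j} d_ij for a (k, k) distance matrix."""
    return float(np.asarray(d, dtype=float).max())


def max_min_distance(d):
    """div = max_i min_{j != i} d_ij (largest nearest-neighbor distance)."""
    d = np.array(d, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)   # exclude j == i from the inner minimum
    return float(d.min(axis=1).max())


def greedy_maxmin(d, k, start=0):
    """Greedy MaxMin selection of k indices (assumes k <= number of points).

    At each step, add the point whose distance to its nearest
    already-selected point is largest.
    """
    d = np.asarray(d, dtype=float)
    selected = [start]
    min_dist = d[start].copy()  # distance of each point to the selection
    while len(selected) < k:
        min_dist[selected] = -np.inf  # never pick the same point twice
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, d[nxt])
    return selected


# Four points on a line, with one far from the rest.
x = np.array([0.0, 1.0, 2.0, 10.0])
d = np.abs(x[:, None] - x[None, :])
diameter = max_pairwise_distance(d)
largest_gap = max_min_distance(d)
subset = greedy_maxmin(d, k=3)
```

The first measure only sees the two most extreme points, while the second is sensitive to any single isolated point, which is why it is the more natural companion to MaxMin.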