Improved Diversity Measures #256

Open
PaulWAyers opened this issue Sep 1, 2024 · 0 comments

I am not 100% satisfied with the diversity measures we currently have in place.

One other diversity measure we could implement is the data volume: given the pairwise distances, the Cayley-Menger determinant gives the volume of the simplex spanned by the data points. The volume can be nonzero only if the dimension of the feature vectors is at least the number of data points minus one. When this is not true (or even if it is), this reference suggests forming a kernelized Gramian matrix with elements

$$ g_{ij} = e^{-\gamma d_{ij}^2} $$

with the induced squared distance,

$$ D_{ij}^2 = 2 - 2g_{ij} $$
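
As a rough sketch of how this could look with NumPy (the function names `cayley_menger_volume` and `kernel_distances` and the default `gamma=1.0` are illustrative assumptions, not existing code in this project): the first function evaluates the simplex volume from a pairwise distance matrix via the Cayley-Menger determinant, and the second builds the Gaussian-kernel Gramian and its induced distances, which can then be passed back to the first.

```python
import math

import numpy as np


def cayley_menger_volume(dist):
    """Volume of the simplex spanned by n points, given their pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    k = n - 1  # dimension of the simplex spanned by n points
    # Cayley-Menger matrix: squared distances bordered by a row/column of ones.
    cm = np.ones((n + 1, n + 1))
    cm[0, 0] = 0.0
    cm[1:, 1:] = dist**2
    # V^2 = (-1)^(k+1) * det(CM) / (2^k * (k!)^2)
    vol_sq = (-1) ** (k + 1) * np.linalg.det(cm) / (2**k * math.factorial(k) ** 2)
    # Clip tiny negative values caused by round-off before taking the root.
    return math.sqrt(max(vol_sq, 0.0))


def kernel_distances(dist, gamma=1.0):
    """Gaussian-kernel Gramian g_ij and induced distances D_ij (gamma is a free parameter)."""
    dist = np.asarray(dist, dtype=float)
    gram = np.exp(-gamma * dist**2)
    induced = np.sqrt(np.clip(2.0 - 2.0 * gram, 0.0, None))
    return gram, induced
```

For example, three collinear points have zero volume in the original distances, but a positive volume once the kernel-induced distances are used, because the Gaussian kernel matrix of distinct points is positive definite. For many points the determinant can underflow or become ill-conditioned, so a log-determinant may be preferable in practice.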

These capture data volume, but do not capture "holes" in the data. A simple way to capture holes in the data is to look at the maximum gap between points that are included,

$$ div = \max_{i,j} d_{ij} $$

or we can look at all the points and find the one that is farthest from its nearest neighbor,

$$ div = \max_i \min_{j \ne i} d_{ij} $$

The latter measure is directly optimized by the MaxMin algorithm.
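
Both gap-based measures are cheap to evaluate from a pairwise distance matrix. The following is a minimal sketch, assuming a square symmetric NumPy array of distances with at least two points; the function names are hypothetical and not part of the current code base.

```python
import numpy as np


def max_pairwise_distance(dist):
    """Largest distance between any two points (the diameter of the set)."""
    dist = np.asarray(dist, dtype=float)
    return float(dist.max())


def max_min_distance(dist):
    """Largest nearest-neighbor distance: max over i of min over j != i of d_ij."""
    dist = np.asarray(dist, dtype=float).copy()
    # Exclude the zero self-distances d_ii from the inner minimum.
    np.fill_diagonal(dist, np.inf)
    return float(dist.min(axis=1).max())
```

For points at 0, 1, and 5 on a line, the first measure is 5 (the distance between the two extremes) and the second is 4 (the point at 5 is 4 away from its nearest neighbor).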

FanwangM self-assigned this Sep 11, 2024