Some points of view on the number selected by DirectedSphereExclusion (related to #152, #159)
Comments
A possibility could be decreasing
@xychem Hello, This is Aditi. Can I give it a try? Can you please provide me more detail on it?
@FanwangM can you help @Aditish51 with this?
Hi, Aditi. I hope the information below helps!
Structure of software
When you run the DSE example in the notebook, the functions are called in the order below.
Structure of DSE algorithm
The idea of DSE is as follows (the code is in the algorithm method in DiverseSelector/methods/partition.py):
    # calculate distance of all samples from reference sample; distance is a (n_samples,) array
    distances = scipy.spatial.minkowski_distance(X[self.ref_index], X, p=self.p)
    # find index of all samples within radius of sample idx (this includes the sample
    # index itself)
    index_exclude = kdtree.query_ball_point(
        X[idx], self.r, eps=self.eps, p=self.p, workers=-1
    )
    # exclude samples within radius r of sample idx (measured by Minkowski p-norm) from
    # future consideration by setting their bitarray value to 1
    for index in index_exclude:
        bv[index] = 1
    # stop as soon as more than max_size samples have been selected
    if len(selected) > max_size:
        return selected
Problem
When r is larger, we select fewer points (whose pairwise distances are all larger than r). When r is smaller, we select more points (again with pairwise distances larger than r). So one important step is to optimize r (the radius) until the number of selected points equals the number we want; this is coded in DiverseSelector/methods/utils.py. This issue is about the fact that, for particular values of r, some points are "degenerate" (quoting @PaulWAyers). We can see in the list above that when r > 1.919372827, the selected number = 3; when r $\leq$ 1.919372826, the selected number = 5 (the values given in the issue description).
One solution
My idea is the same as @marco-2023's: I dropped the last selected samples after the iteration. However, I think that dropping different selected samples may lead to (more or less) different results when only a few samples are selected. In the notebook, for example, we select just 4 points in each cluster, so each sample carries more weight and the result may differ. The code below (in DiverseSelector/methods/utils.py) is one way to drop the last selected samples. I don't think it is a good one, because the if condition inside the while loop adds computation. I think you can instead drop the last selected samples directly with array.pop() outside the while loop; that way you can also drop specific selected samples, not just the last one.
    while (len(selected) < lower_size or len(selected) > upper_size) and (n_iter < obj.n_iter + 1):
        # change sphere radius based on the defined bounds
        if bounds[1] == np.inf:
            # make sphere radius larger by a factor of 2
            obj.r = bounds[0] * 2
        else:
            # set sphere radius to the midpoint of the current bounds (bisection)
            obj.r = (bounds[0] + bounds[1]) / 2
        # re-select samples with the new radius
        if n_iter < obj.n_iter:
            selected = obj.algorithm(X, upper_size)
        else:
            # last iteration: the selected number is sensitive to r, so stop the
            # selection early (algorithm returns once more than size - 1 samples
            # have been selected)
            selected = obj.algorithm(X, size - 1)
        # adjust lower/upper bounds of radius range
        if len(selected) > size:
            bounds[0] = obj.r
        else:
            bounds[1] = obj.r
        n_iter += 1
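The suggestion of trimming once, outside the radius search, could look roughly like the sketch below. This is not the DiverseSelector code: `sphere_exclusion` and `select_with_trim` are hypothetical names, the selection is a simplified pure-Python stand-in for the KDTree-based method, and the 1-D points are made-up illustration data.

```python
import math


def sphere_exclusion(points, r, ref_index=0):
    """Simplified stand-in for DSE (hypothetical helper, not the library code).

    Visit points from closest to farthest from the reference point, select
    every point that is not yet excluded, then exclude all points within r.
    """
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[ref_index], points[i]))
    excluded = [False] * len(points)
    selected = []
    for idx in order:
        if excluded[idx]:
            continue
        selected.append(idx)
        for j, other in enumerate(points):
            if math.dist(points[idx], other) <= r:
                excluded[j] = True
    return selected


def select_with_trim(points, size, r0=1.0, n_iter=30):
    """Bisect on r (r0 > 0, n_iter >= 1); trim once after the loop if needed."""
    lower, upper = 0.0, math.inf
    r = r0
    best = None                 # (selection, radius) of the smallest over-sized result
    selected, r_used = [], r0
    for _ in range(n_iter):
        selected, r_used = sphere_exclusion(points, r), r
        if len(selected) == size:
            return selected, r_used
        if len(selected) > size:
            # too many samples: the radius is too small, so raise the lower bound
            lower = r
            if best is None or len(selected) <= len(best[0]):
                best = (selected, r)
        else:
            # too few samples: the radius is too large, so lower the upper bound
            upper = r
        # double r while no upper bound is known, otherwise take the midpoint
        r = lower * 2 if upper == math.inf else (lower + upper) / 2
    if best is None:
        # every radius tried gave too few samples: nothing to trim
        return selected, r_used
    # drop the last-selected (farthest from the reference) samples, once
    return best[0][:size], best[1]


# Equally spaced points: no radius yields exactly 4 samples (only 5 or 3).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
selected, r = select_with_trim(points, size=4)
print(selected, r)
```

Because a subset of a valid selection is still valid, the trimmed points keep pairwise distances larger than the returned radius. Replacing the slice with a different rule would allow dropping specific samples rather than the last ones.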
Thanks @xychem!!
In the DirectedSphereExclusion method, the selected number depends on r (the radius). When r is larger, we get fewer molecules; when r is smaller, we get more. The function optimize_radius in utils.py optimizes r by iteration. (When the selected number is too large, we increase r; when it is too small, we decrease r.)
But in the case where we choose 12 points in 3 clusters using DirectedSphereExclusion, the selected number is so sensitive to r that it oscillates. We can see that when r > 1.919372827, the selected number = 3; when r $\leq$ 1.919372826, the selected number = 5. (This means there exist two points that are "close" enough.)
The previous situation (11 points)
The present situation (13 points)
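The jump in the selected number described above can be reproduced on a toy dataset. The sketch below is an illustration only: `count_selected` is a hypothetical pure-Python stand-in for the DirectedSphereExclusion selection, and the equally spaced 1-D points are invented to make the degeneracy obvious; they are not the data from the notebook.

```python
import math


def count_selected(points, r, ref_index=0):
    """Count samples picked by a simplified sphere exclusion (hypothetical helper)."""
    # visit points from closest to farthest from the reference point
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[ref_index], points[i]))
    excluded = [False] * len(points)
    n_selected = 0
    for idx in order:
        if excluded[idx]:
            continue
        n_selected += 1
        # exclude every point within distance r of the newly selected point
        for j, other in enumerate(points):
            if math.dist(points[idx], other) <= r:
                excluded[j] = True
    return n_selected


# Neighbouring points are all exactly 1.0 apart, so several points become
# excluded at the same radius and the count skips a value.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
counts = {r: count_selected(points, r) for r in (0.5, 0.999, 1.0, 1.5)}
print(counts)  # {0.5: 5, 0.999: 5, 1.0: 3, 1.5: 3}
```

In this toy case no radius gives exactly 4 samples, so a search on r alone oscillates between 5 and 3, much like the 5 and 3 reported here.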