Branch detection for a specific cluster or cut_distance #660
What you want is certainly not available by default. I would suggest reaching out to @JelmerBot, the author of `BranchDetector`, to work out how best to implement what you need.
What you could do now is clone the hdbscan object and partially overwrite it with the labels from your chosen cut:

```python
from copy import copy

import numpy as np


def clone_with_labels(clusterer, labels, probabilities=None):
    clone = copy(clusterer)  # Shallow copy
    # used to know which point is in which cluster
    clone.labels_ = labels.astype(clusterer.labels_.dtype)
    # used for weighted centroid computation
    if probabilities is not None:
        clone.probabilities_ = probabilities.astype(clusterer.probabilities_.dtype)
    else:
        clone.probabilities_ = np.ones_like(clusterer.probabilities_)
    # used to read the number of clusters
    clone.cluster_persistence_ = [None for _ in range(clone.labels_.max() + 1)]
    return clone
```

Assuming `cut_labels` holds the labels for the chosen cut, you can then run:

```python
from hdbscan import BranchDetector

branch_detector = BranchDetector(
    # parameters
).fit(clone_with_labels(clusterer, cut_labels))
```

or

```python
from hdbscan import detect_branches_in_clusters

result = detect_branches_in_clusters(
    clone_with_labels(clusterer, cut_labels),
    # other parameters
)
```

This approach is not very clean or robust. The cloned object holds references into the original hdbscan object, so changing the clone can change the original object. That can lead to subtle bugs if one is not careful. Facilitating this use case properly in the branch detection API could be fairly straightforward. We could add a …
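To illustrate why the shallow copy is fragile, here is a small self-contained sketch. `FittedModel` is a hypothetical stand-in for a fitted hdbscan object, not part of the library: rebinding an attribute on the copy (as `clone_with_labels` does for `labels_` and `probabilities_`) leaves the original alone, while mutating a shared array in place leaks back into the original.

```python
from copy import copy

import numpy as np


class FittedModel:
    """Hypothetical stand-in for a fitted clusterer with array attributes."""

    def __init__(self):
        self.labels_ = np.array([0, 0, 1, 1, -1])
        self.probabilities_ = np.array([1.0, 0.9, 1.0, 0.8, 0.0])


original = FittedModel()
clone = copy(original)  # shallow copy: attributes still point at the same arrays

# Rebinding an attribute on the clone leaves the original untouched ...
clone.labels_ = np.array([0, 0, 0, 0, 0])

# ... but mutating a shared array in place changes the original as well.
clone.probabilities_[:] = 1.0
```

Every attribute the clone does not rebind (for example, any precomputed data the branch detector reads) is still shared with the original. `copy.deepcopy` would avoid this sharing, at the cost of duplicating all of the object's arrays.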
Hello,
I am currently trying to combine HDBSCAN with `BranchDetector`, and I am not sure whether the current implementation supports my use case. I use HDBSCAN's `single_linkage_tree` to find all lambda values, and then I select a smaller subset of these values as candidate thresholds for the clustering's `cut_distance` parameter. This works well, and I can get a list of labels for a given lambda value, but I would also like to find the branching clusters that exist at the given `cut_distance`.

Both `BranchDetector` and `detect_branches_in_clusters` seem to operate on the whole fitted HDBSCAN model, which lacks this flexibility, and I would like to avoid recomputing HDBSCAN from scratch for each `cut_distance`. Alternatively, I would like to find branches for a specific cluster's subtree rather than passing the whole model, but I suspect this might not be possible given the nature of the algorithm.

Please let me know how I can do this, in case I am missing a crucial parameter in the documentation. Thanks!
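For reference, cutting a single-linkage tree at a distance amounts to applying every merge whose distance lies below the threshold and labelling the resulting connected groups, with groups smaller than a minimum size treated as noise. The pure-NumPy sketch below assumes a SciPy-style linkage matrix (rows of left child, right child, merge distance, size). The function name `labels_at_cut` is hypothetical, and this is an illustration of the idea, not hdbscan's own implementation.

```python
import numpy as np


def labels_at_cut(linkage, cut_distance, min_cluster_size=5):
    """Flat cluster labels from a SciPy-style linkage matrix cut at ``cut_distance``.

    Row i of ``linkage`` is (left child, right child, merge distance, size);
    leaves are numbered 0..n-1 and merge i creates node n + i. Groups smaller
    than ``min_cluster_size`` are labelled -1 (noise).
    """
    linkage = np.asarray(linkage, dtype=float)
    n_points = linkage.shape[0] + 1
    # Union-find parent pointers over all leaves and internal merge nodes.
    parent = np.arange(2 * n_points - 1)

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (left, right, dist, _size) in enumerate(linkage):
        if dist < cut_distance:
            # Attach both children's groups to the newly created merge node.
            node = n_points + i
            parent[find(int(left))] = node
            parent[find(int(right))] = node

    # Group points by root; relabel groups large enough to count as clusters.
    roots = np.array([find(p) for p in range(n_points)])
    _, inverse, counts = np.unique(roots, return_inverse=True, return_counts=True)
    labels = np.full(n_points, -1, dtype=np.intp)
    next_label = 0
    for k, count in enumerate(counts):
        if count >= min_cluster_size:
            labels[inverse.ravel() == k] = next_label
            next_label += 1
    return labels
```

With a fitted clusterer, the matrix would come from `clusterer.single_linkage_tree_.to_numpy()`, and a lambda value corresponds to `cut_distance = 1 / lambda`. In practice, `clusterer.single_linkage_tree_.get_clusters(cut_distance, min_cluster_size)` already returns such labels, which could then be passed as `cut_labels` to the cloning workaround.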