Automated HDBSCAN parameter selection #5

Open
danellecline opened this issue May 22, 2024 · 0 comments
Labels
enhancement New feature or request

Picking parameters can be challenging.

This paper has an interesting automated choice for DBSCAN.

Something like the following could be used to find epsilon for a given minpts/k (untested):

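import matplotlib.pyplot as plt
import numpy as np
from kneed import KneeLocator
from sklearn.neighbors import NearestNeighbors


def calculate_eps_minpts(data, k):
    # Fit NearestNeighbors model
    neighbors = NearestNeighbors(n_neighbors=k)
    neighbors_fit = neighbors.fit(data)

    # Compute the distances to the k nearest neighbors of each point.
    # Querying with the training data returns each point as its own nearest
    # neighbor (distance 0), so the count of k includes the point itself,
    # which matches how scikit-learn's DBSCAN counts min_samples.
    distances, _ = neighbors_fit.kneighbors(data)

    # Take the k-th nearest distance (i.e., the k-distance)
    k_distances = distances[:, k - 1]

    # Sort the k-distances in ascending order
    k_distances = np.sort(k_distances)

    # Plotting the k-distances to visualize the knee
    plt.plot(k_distances)
    plt.xlabel("Points sorted by distance")
    plt.ylabel(f"Distance to {k}-th nearest neighbor")
    plt.title(f"{k}-distance Graph")
    plt.show()

    # Use KneeLocator to find the knee point
    kneedle = KneeLocator(range(len(k_distances)), k_distances, curve="convex", direction="increasing")

    # KneeLocator sets knee to None when it cannot find one
    if kneedle.knee is None:
        raise ValueError("No knee found in the k-distance curve")

    # The knee point corresponds to the optimal epsilon
    eps = k_distances[kneedle.knee]

    return eps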
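
To try the knee idea without scikit-learn or kneed installed, here is a minimal standard-library sketch. It is not the Kneedle algorithm that KneeLocator implements: it just picks the point on the sorted k-distance curve that sits farthest below the straight line joining the first and last values, with no smoothing or sensitivity setting. The helper names `k_distances` and `knee_index` and the synthetic points are made up for illustration, and unlike the function above, `k` here counts other points only (the point itself is excluded).

```python
import math
import random


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest *other* point."""
    if not 1 <= k <= len(points) - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, O(n^2) overall
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_index(values):
    """Index of the value farthest below the chord from first to last value.

    Returns None when no value lies below the chord (e.g. a straight line).
    """
    n = len(values)
    if n < 3:
        return None
    first, last = values[0], values[-1]
    best_index, best_gap = None, 0.0
    for i, v in enumerate(values):
        chord = first + (last - first) * i / (n - 1)
        gap = chord - v  # positive when the curve is below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Hypothetical data: two tight clusters plus a few scattered noise points
random.seed(0)
cluster_a = [(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(40)]
cluster_b = [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(40)]
noise = [(random.uniform(-2, 5), random.uniform(-2, 5)) for _ in range(10)]
points = cluster_a + cluster_b + noise

curve = k_distances(points, k=4)
idx = knee_index(curve)
eps = curve[idx] if idx is not None else None
print("candidate eps:", eps)
```

Because this takes a single global maximum, it can be thrown off by noisy curves, so it is more useful as a sanity check on the KneeLocator result than as a replacement for it.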
@danellecline danellecline added the enhancement New feature or request label May 22, 2024