Automatically generate topics and keywords #47
Comments
It may be possible to bootstrap a learning corpus with this list of topics: https://github.com/github/explore.
A low-tech way for projects published to a package manager that supports keywords would be to pull the existing ones from the I did experiment with pulling interesting words from READMEs and descriptions in the Libraries.io codebase using a Ruby library called highscore, but removed it a while back as the results weren't great and it was pretty slow to be running as part of the critical path inside the Rails app. The main code was here: https://github.com/librariesio/libraries.io/blob/7a15048fe7135052dc3ac9383d13833b5cb1f85b/app/models/readme.rb#L75-L79
Yeah, I already do that for projects which have manifests. I'm trying to think of a better way to extract. I think not using the entire README - just the description and background sections - should help. I'm going to make a package now to automatically cross-check with topics from github/explore. Might be a solution while we don't have an API for suggesting topics yet from GitHub. Thanks for the help! Slowness isn't an issue for me; this will be pretty fast, I think.
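For illustration, the cross-check against known topics could look roughly like the sketch below. It assumes topic names are lowercase, hyphen-separated slugs (as GitHub topics are), and that the known topics have already been loaded into a set; the function names and the sample topic set are hypothetical, not taken from any existing package.

```python
import re


def candidate_slugs(text, max_words=3):
    """Build hyphen-joined n-grams (1..max_words words) from free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    slugs = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            slugs.add("-".join(words[i:i + n]))
    return slugs


def suggest_topics(text, known_topics):
    """Return the known topics that appear in the text, sorted by name."""
    return sorted(candidate_slugs(text) & set(known_topics))


if __name__ == "__main__":
    # Hypothetical sample; a real run would load topic names from github/explore.
    known = {"machine-learning", "python", "rust", "natural-language-processing"}
    description = "A machine learning toolkit for Python and natural language processing"
    print(suggest_topics(description, known))
```

This only finds exact slug matches, so it would miss synonyms and alternate spellings; it is a starting point rather than a full suggestion engine.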
I've started work on this, here: Katahdin.
This will involve a couple of things. First, parsing the README. Second, finding the Description or Background section. Then, either topic extraction or NER of that information, with the goal of seeing if you can automatically suggest topics for the README.
For now, noun phrases may do the trick, in the description, for suggestions. This would be greatly aided by a test database of repositories and topics, however.
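A rough sketch of the first two steps plus a crude stand-in for noun-phrase extraction is below. It assumes a Markdown README with ATX (`#`) headings, and it does not do real noun-phrase chunking: it simply groups consecutive non-stopwords, which a proper NLP library would replace. All function names and the stopword list are hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "or", "in", "on",
             "with", "is", "are", "this", "that", "it", "by", "from"}


def extract_section(readme, wanted=("description", "background")):
    """Return the body of the first section whose heading is in `wanted`."""
    collected = []
    level = None  # heading depth of the matched section, once found
    for line in readme.splitlines():
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1))
            title = match.group(2).strip().lower()
            if level is None:
                if title in wanted:
                    level = depth
                continue
            if depth <= level:
                break  # next sibling or parent heading ends the section
        if level is not None:
            collected.append(line)
    return "\n".join(collected).strip()


def candidate_phrases(text, stopwords=STOPWORDS):
    """Group consecutive non-stopwords into phrases (not true NP chunking)."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z0-9-]*", chunk) + [None]:
            if word is None or word in stopwords:
                phrase = " ".join(current)
                if phrase and phrase not in phrases:
                    phrases.append(phrase)
                current = []
            else:
                current.append(word)
    return phrases
```

Note that the heading matcher does not skip fenced code blocks, so a `#` comment inside a fence would be mistaken for a heading; a real Markdown parser would avoid that.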