Use non-blocking IO for non-wildcard norm queries #14

Open
RunDevelopment opened this issue Aug 30, 2021 · 0 comments
Labels
enhancement New feature or request

RunDevelopment commented Aug 30, 2021

Some innocent queries can generate thousands of non-wildcard norm queries.

E.g. the query #I #love #humans: #I is equivalent to 10 words, #love to 36, and #humans to 11, for a total of 10 × 36 × 11 = 3960 non-wildcard norm queries.
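To make the blow-up concrete, here is a minimal sketch of how the count comes about. The expansion sizes are the ones from the example above; the function names `count_norm_queries` and `expand` are hypothetical and not part of the project.

```python
from itertools import product
from math import prod

# Expansion sizes from the example above (the word lists themselves are not shown here).
expansion_sizes = {"#I": 10, "#love": 36, "#humans": 11}


def count_norm_queries(sizes):
    """The number of non-wildcard norm queries is the product of the per-term sizes."""
    return prod(sizes)


def expand(alternatives):
    """Yield every concrete query for a list of per-term alternative word lists."""
    for combo in product(*alternatives):
        yield " ".join(combo)


print(count_norm_queries(expansion_sizes.values()))  # 3960

# Tiny illustration with made-up alternatives: 2 * 2 = 4 concrete queries.
for query in expand([["I", "me"], ["love", "like"]]):
    print(query)
```

Every additional multi-word term multiplies the count, which is why short queries can still fan out into thousands of lookups.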

Each of those non-wildcard norm queries requires a BigHashMap lookup, i.e. 1 disk access. The problem is that all of these lookups are done sequentially, which isn't very fast even with IO caches.

The solution would be to use POSIX AIO to do these IO ops concurrently. We already do that for the phrase corpus.
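As a rough illustration of the idea, and not of the actual BigHashMap code, the sketch below contrasts sequential reads with reads issued concurrently. It uses a Python thread pool with `os.pread` as a stand-in for POSIX AIO, so it shows the access pattern rather than the real API. The `(offset, length)` request format and both function names are assumptions made for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_records_sequential(fd, requests):
    """Read each (offset, length) request one after another (roughly the current behaviour)."""
    return [os.pread(fd, length, offset) for offset, length in requests]


def read_records_concurrent(fd, requests, max_workers=32):
    """Issue all reads at once so the OS and the disk can reorder and overlap them.

    A thread pool stands in for POSIX AIO here; results come back in request order.
    """
    def read_one(request):
        offset, length = request
        return os.pread(fd, length, offset)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_one, requests))
```

Both functions return the same data in the same order, so the concurrent variant could be swapped in without changing the callers. How much it helps depends on the device and on how many requests are in flight.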


Just to illustrate how slow sequential access is: We have to do between 100 and 1000 random accesses per regex query. A typical HDD spins at about 100 revolutions per second, and a random access waits about half a revolution on average for the data to rotate under the head. So one (uncached) regex query will need 50 to 500 revolutions, or 0.5s to 5s, just for IO access. Doing everything asynchronously could reduce this to just a few revolutions, for a speedup of at least 10x.
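The estimate can be reproduced with a few lines of arithmetic. The sketch assumes an average of half a revolution of rotational latency per random access, which is what the 50 to 500 revolutions figure implies, and it ignores seek time and caching. The function name and parameter names are made up for this example.

```python
def sequential_io_seconds(accesses, revs_per_second=100.0, revs_per_access=0.5):
    """Estimated time spent waiting on the disk when every access is done one by one."""
    return accesses * revs_per_access / revs_per_second


for n in (100, 1000):
    print(n, "accesses:", sequential_io_seconds(n), "s")
# 100 accesses: 0.5 s
# 1000 accesses: 5.0 s
```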

@RunDevelopment RunDevelopment added the enhancement New feature or request label Aug 30, 2021