Some innocent-looking queries can generate thousands of non-wildcard norm queries.
E.g. `#I #love #humans`: `#I` is equivalent to 10 words, `#love` to 36, and `#humans` to 11, for a total of 10 × 36 × 11 = 3960 non-wildcard norm queries.
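For illustration, here is a minimal sketch of that expansion. It assumes each hashtag term has already been resolved into a list of concrete words; the `Alternatives` type and both functions are hypothetical, not the project's actual API. The number of non-wildcard queries is simply the product of the list sizes, and every combination becomes one lookup.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: the concrete words a single term such as "#love" expands to.
using Alternatives = std::vector<std::string>;

// Number of non-wildcard queries = product of the alternative counts.
std::size_t count_expansions(const std::vector<Alternatives>& terms) {
    std::size_t total = 1;
    for (const auto& alts : terms) total *= alts.size();
    return total;
}

// Enumerate the cartesian product, one space-separated query per combination.
std::vector<std::string> expand(const std::vector<Alternatives>& terms) {
    std::vector<std::string> out{""};
    for (const auto& alts : terms) {
        std::vector<std::string> next;
        next.reserve(out.size() * alts.size());
        for (const auto& prefix : out)
            for (const auto& word : alts)
                next.push_back(prefix.empty() ? word : prefix + " " + word);
        out = std::move(next);
    }
    return out;
}
```

With lists of 10, 36 and 11 words, `count_expansions` returns 3960, and `expand` produces that many queries, each of which currently costs one lookup.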
Each of those non-wildcard norm queries requires a BigHashMap lookup, i.e. one disk access. The problem is that all of these lookups are done sequentially, which is slow even with IO caches.
The solution would be to use POSIX AIO to perform these IO operations concurrently, as we already do for the phrase corpus.
Just to illustrate how slow sequential access is: we have to do between 100 and 1000 random accesses per regex query. A typical HDD spins at about 100 revolutions per second, and each uncached random access waits on average half a revolution for its data to come under the head. So one (uncached) regex query needs 50 to 500 revolutions, i.e. 0.5s to 5s, just for IO. Doing everything asynchronously could reduce this to just a few revolutions, for a speedup of at least 10x.
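The estimate above can be written as a small helper (the function name is hypothetical). It assumes each uncached random read costs on average half a revolution of rotational latency and ignores seek time, so it is a rough lower bound rather than a precise model.

```cpp
// Rough model: a random, uncached read waits on average half a revolution
// for its sector to rotate under the head. Seek time is ignored, so this
// underestimates the real cost of sequential lookups.
double sequential_io_seconds(double accesses, double revolutions_per_second) {
    const double avg_revolutions_per_access = 0.5;
    return accesses * avg_revolutions_per_access / revolutions_per_second;
}
```

`sequential_io_seconds(100, 100)` gives 0.5 and `sequential_io_seconds(1000, 100)` gives 5.0, matching the 0.5s to 5s range.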