Percolator is much slower than in ES1, and pre-selecting does not work #16285
Labels: bug, Performance, Search:Performance, Search
What is the bug?
We have been trying to migrate from Elasticsearch version 1.7.6 to the latest version (8.15) in our company and discovered that the latest version is much slower. To find the reason for this degradation, I conducted several experiments. During these experiments, I found that some claimed improvements likely do not work as expected. I have duplicated this issue from the Elasticsearch repository because I found the same problem in OpenSearch. I'm sure that the problem was carried over when OpenSearch was forked.
Experiment details
I created the following index mapping:
I filled the index with 10,000 duplicate queries, which contain only `must`, `should`, `term`, and `range` conditions.
As you can see, there are two main conditions inside `must`: the first is simple, and the second is a bit more complex. Logically, there is no reason to check the second condition if the first one is false. However, my experiments showed that when the first condition is false for a document, adding conditions inside `should` (the second condition) still increases the percolation time. Therefore, I conclude that the improvements claimed in this article https://www.elastic.co/blog/elasticsearch-percolator-continues-to-evolve do not work.
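To make the expectation concrete, here is a minimal sketch of the short-circuit behaviour described above. It is not the stored query from the experiment: the field names (`status`, `color`, `price`) and values are hypothetical, and the evaluator covers only a tiny subset of the query DSL (`bool` with `must`/`should`, `term`, `range`). It only shows that when the first `must` clause fails, the clauses inside `should` never need to be looked at.

```python
# Hypothetical stored query; field names and values are illustrative only.
stored_query = {
    "bool": {
        "must": [
            {"term": {"status": "active"}},  # simple first condition
            {"bool": {"should": [            # more complex second condition
                {"term": {"color": "red"}},
                {"range": {"price": {"gte": 10, "lte": 100}}},
            ]}},
        ]
    }
}

evaluated = []  # records every leaf clause that was actually checked


def matches(query, doc):
    """Evaluate a small subset of the query DSL against a flat dict."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        evaluated.append(("term", field))
        return doc.get(field) == value
    if kind == "range":
        field, bounds = next(iter(body.items()))
        evaluated.append(("range", field))
        value = doc.get(field)
        if value is None:
            return False
        low = bounds.get("gte", float("-inf"))
        high = bounds.get("lte", float("inf"))
        return low <= value <= high
    if kind == "bool":
        must = body.get("must", [])
        should = body.get("should", [])
        # all() stops at the first failing must clause.
        if not all(matches(clause, doc) for clause in must):
            return False
        # Without must clauses, at least one should clause has to match.
        if should and not must:
            return any(matches(clause, doc) for clause in should)
        return True
    raise ValueError(f"unsupported clause: {kind}")


# The first condition is false, so only one leaf clause is evaluated.
result = matches(stored_query, {"status": "inactive", "color": "red", "price": 50})
print(result, evaluated)
```

With a document that fails the first condition, only the `status` term is checked, which is the behaviour one would hope the percolator approximates when it pre-selects candidate queries.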
I ran the percolation with the following request:
As a result, I got the following percolation time with one document: ~0.157 seconds.
I conducted a similar experiment on Elasticsearch version 1.7.6 with identical data, and the result was ~0.008 seconds, which is roughly 20x faster.
We also tried percolating with real production data. The only improvement we saw was when we added additional filters to the percolate query, using metadata that we extracted from the primary query. For example, we took the query mentioned above and added metadata (the `meta_data.category` field). Then I sent the following request:
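A rough sketch of the shape such a filtered request could take, not the exact request from the experiment. It assumes the percolator field is named `query` and that each stored query carries a `meta_data.category` value; the helper name `build_filtered_percolate` and the sample document are hypothetical.

```python
import json


def build_filtered_percolate(document, category, field="query"):
    """Build a search body that restricts percolation to one category.

    `field` is the assumed name of the percolator field in the mapping.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Narrow the candidate stored queries by their metadata.
                    {"term": {"meta_data.category": category}},
                    # Percolate the document against the remaining queries.
                    {"percolate": {"field": field, "document": document}},
                ]
            }
        }
    }


body = build_filtered_percolate({"category": "books", "price": 25}, "books")
print(json.dumps(body, indent=2))
```

The `term` filter does the narrowing by hand, which is the extra maintenance burden discussed next.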
But this approach has a disadvantage: it becomes more difficult to percolate a large batch. If I need to percolate many documents, I have to separate them by the category field, resulting in smaller batches. This negates the improvement of percolating many documents in one query. I also tried using named percolation (the `name` field in the percolate query) and made a query with a few percolate queries inside (one for each category), but this approach did not have any advantage compared to separate requests (the percolation time was the same).
In general, extracting metadata and adding additional filters for this metadata seems like unnecessary work, forcing us to maintain those extra filters. It seems that the search engine should handle such optimizations itself. I suspect this is the "pre-selecting" feature.
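The per-category batching and the named-percolation variant can be sketched as follows. This is an illustration under assumptions, not the script used in the experiments: the percolator field name `query`, the document key `category`, and both helper names are hypothetical. It relies only on the `documents` and `name` parameters of the percolate query.

```python
from collections import defaultdict


def group_by_category(docs, key="category"):
    """Split a batch of documents into smaller per-category batches."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key]].append(doc)
    return dict(groups)


def build_named_percolate(docs, field="query", key="category"):
    """Build one search body with one named percolate clause per category."""
    clauses = []
    for category, group in group_by_category(docs, key).items():
        clauses.append({
            "bool": {
                "filter": [
                    {"term": {"meta_data.category": category}},
                    {"percolate": {
                        "field": field,
                        "name": str(category),  # distinguishes the clauses
                        "documents": group,
                    }},
                ]
            }
        })
    return {"query": {"bool": {"should": clauses}}}


batch = [
    {"category": "books", "price": 25},
    {"category": "toys", "price": 5},
    {"category": "books", "price": 40},
]
body = build_named_percolate(batch)
print(len(body["query"]["bool"]["should"]))
```

Either way the batch ends up split by category, either across requests or across clauses, which is why the benefit of percolating many documents at once is lost.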
Python scripts for the experiments (Python 3.12): scripts
Conclusion
Currently, percolation, even with queries made of simple filters, performs significantly slower than in older versions of Elasticsearch. It seems likely that the latest version lacks the pre-selecting optimization, or that it is not functioning correctly. Alternatively, I might have missed something, and the optimization can be enabled. I would appreciate any help you can provide to resolve this problem.