You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The performance can be massively improved by processing the schema beforehand. All enum values and patterns should be combined to a single pattern as shown in the example below:
Actually, you iteratively append the enum values and regex patterns to a single regex and compute for every iteration the intersection between the current pattern and ".*". This is very expensive and results in bad performance (for this specific kind of schema).
I added an example json file (anyOf.json) that shows the problem. anyOf.json takes on my machine about 50-60 seconds for the result (LHS :< RHS and RHS :< LHS) when checking the file against itself (command jsonsubschema anyOf.json anyOf.json). Applying preprocessing, it takes about 0.04 seconds. I also attached a python script (smaller_anyOf.py) that contains the preprocessing. The script combines the string-enum-values and all patterns to a single pattern as shown in the example above.
Hello,
I noticed a performance problem as soon as the schema contains the following structure:
... "anyOf": [ {"enum": ["aa", "bb", "cc"]}, {"pattern": "pattern1"}, {"pattern": "pattern2"}, {"pattern": "pattern3"}, ... ] ...
The performance can be massively improved by processing the schema beforehand. All enum values and patterns should be combined to a single pattern as shown in the example below:
... "anyOf": [ {"pattern": "^aa$|^bb$|^cc$|pattern1|pattern2|pattern3"} ] ...
Actually, you iteratively append the enum values and regex patterns to a single regex and compute for every iteration the intersection between the current pattern and ".*". This is very expensive and results in bad performance (for this specific kind of schema).
I added an example json file (anyOf.json) that shows the problem. anyOf.json takes on my machine about 50-60 seconds for the result (LHS :< RHS and RHS :< LHS) when checking the file against itself (command
jsonsubschema anyOf.json anyOf.json
). Applying preprocessing, it takes about 0.04 seconds. I also attached a python script (smaller_anyOf.py) that contains the preprocessing. The script combines the string-enum-values and all patterns to a single pattern as shown in the example above.AnyOf.zip
By transforming the string-enum-values to a regex, special regex characters (e.g. ".", "-", ...) are escaped to get an identical expression as regex.
... "enum": ["ab-c"] ...
will be transformed to
... "pattern": "^ab\\-c$" ...
Be careful, this can currently lead to another problem - see #6 .
Best Regards
Michael
The text was updated successfully, but these errors were encountered: