Add upper and lower max chunk size limits to ChunkingSettings #115130
+115
−42
Description
We currently do not enforce an upper limit on the max chunk size provided by the user in either the word or sentence boundary chunking settings. The lower limit is currently 1, which is not a useful chunk size to set. To give users more reasonable limits, this change introduces a lower limit of 10 words for word-based strategies and 20 words for sentence-based strategies, as well as an upper limit of 300 for both word-based and sentence-based strategies.
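As a rough illustration of the limits described above, the check could look something like the sketch below. It is not the code from this PR: the function and constant names are hypothetical, the strategy keys `"word"` and `"sentence"` are placeholders, and it assumes both bounds are inclusive, which the description does not state.

```python
# Illustrative sketch only; all names here are hypothetical, not the PR's code.
# Limits are the ones stated in the description.
MAX_CHUNK_SIZE_LOWER_LIMIT = {"word": 10, "sentence": 20}
MAX_CHUNK_SIZE_UPPER_LIMIT = 300


def validate_max_chunk_size(strategy: str, max_chunk_size: int) -> int:
    """Return max_chunk_size if it is within the limits for the strategy.

    Raises ValueError otherwise. Bounds are treated as inclusive (an assumption).
    """
    if strategy not in MAX_CHUNK_SIZE_LOWER_LIMIT:
        raise ValueError(f"unknown chunking strategy: {strategy!r}")
    lower = MAX_CHUNK_SIZE_LOWER_LIMIT[strategy]
    upper = MAX_CHUNK_SIZE_UPPER_LIMIT
    if not lower <= max_chunk_size <= upper:
        raise ValueError(
            f"max_chunk_size must be between {lower} and {upper} "
            f"for the {strategy} strategy, got {max_chunk_size}"
        )
    return max_chunk_size
```

With these limits, a word-based setting of 10 passes and a sentence-based setting of 19 is rejected; a value of 1, which was previously accepted, now fails for both strategies.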
Testing
Note: There is still an outstanding question of whether we want to allow an overlap of 0 in word-based chunking settings. We currently force the user to provide a positive integer, so 0 is not an option. We should hold off on merging this PR until we decide on this.