paice_method

This Jupyter notebook extends the Paice method for evaluating stemmers to search applications. The approach introduces a weight for comparing the overstemming index (OI) and understemming index (UI) scores. This additional weight is calculated from word statistics of the application's vocabulary.

img.png
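For reference, the standard Paice indices that this notebook builds on can be computed as below. This is a minimal sketch of Paice's original pair-counting definitions, not the notebook's weighted variant. The function name `paice_indices` and its input format (disjoint concept groups of distinct words, plus a function mapping a word to its stem) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def paice_indices(groups, stem):
    """Sketch of Paice's understemming (UI) and overstemming (OI) indices.

    groups: list of disjoint concept groups, each a list of distinct words
            that should ideally share one stem (assumed input format).
    stem:   function mapping a word to its stem.
    """
    total_words = sum(len(g) for g in groups)

    gdmt = gdnt = gumt = 0.0  # desired merges, desired non-merges, unachieved merges
    for g in groups:
        n = len(g)
        gdmt += 0.5 * n * (n - 1)            # pairs inside the group
        gdnt += 0.5 * n * (total_words - n)  # pairs with words outside the group
        # u: how many words of this group received each distinct stem
        counts = Counter(stem(w) for w in g)
        gumt += 0.5 * sum(u * (n - u) for u in counts.values())

    # Regroup every word by its stem and count pairs from different concept groups
    by_stem = defaultdict(Counter)
    for gid, g in enumerate(groups):
        for w in g:
            by_stem[stem(w)][gid] += 1
    gwmt = 0.0  # wrongly merged pairs
    for per_group in by_stem.values():
        ns = sum(per_group.values())
        gwmt += 0.5 * sum(v * (ns - v) for v in per_group.values())

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    return ui, oi


groups = [["connect", "connected", "connection"],
          ["generate", "generation"],
          ["general", "generally"]]
# Truncating to 5 characters wrongly merges "general*" with "generat*"
print(paice_indices(groups, lambda w: w[:5]))  # (0.0, 0.25)
```

Paice's own single-figure summary is the stemming weight SW = OI / UI. The notebook's approach replaces that plain ratio with an application-dependent weight.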

The advantage of this approach is that it yields a stemmer-quality metric that is sensitive to the search application the stemmer will be used in.

A second advantage of the proposed method is that words in large concept groups (for example, a group containing all verbal forms of a verb) do not dominate the results, because each word carries equal weight.
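In Paice's pair counts, a concept group of n words contributes n(n-1)/2 desired merges, so a group holding ten verbal forms outweighs a two-word group 45 to 1. The sketch below illustrates per-word weighting: each word gets one normalised understemming score and one overstemming score, and the indices are (optionally weighted) means of those scores. The function `per_word_indices` and its `weights` mapping (a stand-in for vocabulary statistics) are hypothetical and are not the notebook's actual formula.

```python
from collections import Counter


def per_word_indices(groups, stem, weights=None):
    """Hypothetical per-word variant of Paice's UI and OI.

    Each word contributes one normalised score, so a group of n words adds
    n terms instead of about n*n/2 pairs. `weights` is an optional dict
    word -> weight (e.g. vocabulary frequencies); missing words default to 1.0.
    """
    weights = weights or {}
    total_words = sum(len(g) for g in groups)
    stems = {w: stem(w) for g in groups for w in g}
    stem_sizes = Counter(stems.values())  # words per stem across all groups

    ui_num = ui_den = oi_num = oi_den = 0.0
    for g in groups:
        n = len(g)
        in_group = Counter(stems[w] for w in g)  # words per stem within this group
        for w in g:
            wt = weights.get(w, 1.0)
            same = in_group[stems[w]]  # group members (including w) sharing w's stem
            if n > 1:
                # share of w's group-mates that did NOT receive w's stem
                ui_num += wt * (n - same) / (n - 1)
                ui_den += wt
            if total_words > n:
                # share of out-of-group words that wrongly received w's stem
                oi_num += wt * (stem_sizes[stems[w]] - same) / (total_words - n)
                oi_den += wt
    ui = ui_num / ui_den if ui_den else 0.0
    oi = oi_num / oi_den if oi_den else 0.0
    return ui, oi


groups = [["connect", "connected", "connection"],
          ["generate", "generation"],
          ["general", "generally"]]
print(per_word_indices(groups, lambda w: w[:5]))
```

Because every word's score lies between 0 and 1 before averaging, enlarging one concept group only adds more equally weighted terms rather than a quadratically growing number of pairs.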

Contents

  • stemmers.py: stemming classes that extend a base Stemmer class.
  • solr_client.py: a Solr client with a basic API for text analysis.
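The interface of `stemmers.py` is not shown in this README. The sketch below assumes a base `Stemmer` class with a single `stem(word)` method, plus two illustrative subclasses: a truncation stemmer (the kind of baseline Paice compared stemmers against) and a simple suffix stripper. All class names, parameters and suffix lists here are hypothetical.

```python
from abc import ABC, abstractmethod


class Stemmer(ABC):
    """Hypothetical base class: maps a word to its stem."""

    @abstractmethod
    def stem(self, word: str) -> str:
        ...

    def stem_all(self, words):
        # Convenience helper inherited by every subclass
        return [self.stem(w) for w in words]


class TruncationStemmer(Stemmer):
    """Keeps the first `length` characters of the lowercased word."""

    def __init__(self, length: int = 5):
        self.length = length

    def stem(self, word: str) -> str:
        return word.lower()[: self.length]


class SuffixStemmer(Stemmer):
    """Strips the longest matching suffix from a fixed list."""

    def __init__(self, suffixes=("ations", "ation", "ions", "ion",
                                 "ings", "ing", "ed", "s")):
        # Try longer suffixes first so "ations" wins over "s"
        self.suffixes = sorted(suffixes, key=len, reverse=True)

    def stem(self, word: str) -> str:
        word = word.lower()
        for suffix in self.suffixes:
            # Keep at least three characters of the stem
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word


print(SuffixStemmer().stem_all(["connected", "connecting", "connections"]))
```

Any such subclass can then be passed to an evaluation function as `stemmer.stem`.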

Example

report1.png