-
Notifications
You must be signed in to change notification settings - Fork 0
Pragalbhv/NLP-Project
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
The following were implemented: __> Preprocessing and setting up eval metrics __> VSM model __>LSA model __>Kmeans and LDA ********************************************************************** This folder contains the template code for a search engine application. main.py - The main module that contains the outline of the Search Engine. Do not change anything in this file. util.py - An extra file where you can add any additional processing or utility functions that you may need for any of the sub-tasks. sentenceSegmentation.py, tokenization.py, inflectionReduction.py and stopwordRemoval.py - Implement the corresponding sub-tasks inside the functions in these files. More files corresponding to each sub-task will be provided as the assignment progresses, along with updated versions of main.py To test your code, run main.py with the appropriate arguments Usage: main.py [-custom] [-dataset DATASET FOLDER] [-out_folder OUTPUT FOLDER] [-segmenter SEGMENTER TYPE (naive|punkt)] [-tokenizer TOKENIZER TYPE (naive|ptb)] When the -custom flag is passed, the system will take a query from the user as input. When the flag is not passed, all the queries in the Cranfield dataset are considered, for example: > python main.py -custom > Enter query below > Papers on Aerodynamics This will generate *queries.txt files in the OUTPUT FOLDER after each stage of preprocessing of the query and *docs.txt files in the OUTPUT FOLDER after each stage of preprocessing of the documents.
About
NLP project
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published