This repository is a hub for tools and resources for classifying occupations with the Occupational Information Network (O*NET) taxonomy. O*NET provides a standardized framework for categorizing and organizing occupational information, which makes it useful for applications such as workforce development, career guidance, and labor market analysis.
Install all the project dependencies by running the following command in the project's root directory:
poetry install
To activate the virtual environment, run:
poetry shell
Note: Make sure you have Poetry installed on your system. If not, you can install it using:
pip install poetry
Before running anything, ensure the following keys are stored in the .env file:
- PINECONE_ENVIRONMENT
- PINECONE_API_KEY
- OPEN_AI_API_KEY
- OPEN_AI_API_TYPE
- OPEN_AI_API_VERSION
- OPEN_AI_ENDPOINT
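A quick way to confirm the .env file is complete is a small check like the sketch below. It is not part of this repository: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes the .env file sits in the project root and uses plain KEY=VALUE lines.

```python
from pathlib import Path

# Key names copied from the list above.
REQUIRED_KEYS = [
    "PINECONE_ENVIRONMENT",
    "PINECONE_API_KEY",
    "OPEN_AI_API_KEY",
    "OPEN_AI_API_TYPE",
    "OPEN_AI_API_VERSION",
    "OPEN_AI_ENDPOINT",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumption: .env lives in the project root
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```

Run from the project root, this prints the keys that still need a value and changes nothing on disk.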
Ensure you have the following folders created: checkpoints, data.
Ensure you have test_data.csv and train_data.csv files in the data folder.
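The folder and file requirements above can be checked up front with a sketch like the following. The helper `missing_paths` is hypothetical and not part of this repository. It only reports what is absent and does not create anything.

```python
from pathlib import Path

# Folder and file names taken from the two requirements above.
REQUIRED_DIRS = ["checkpoints", "data"]
REQUIRED_FILES = ["data/test_data.csv", "data/train_data.csv"]


def missing_paths(root="."):
    """Return the required folders and files that do not exist under root."""
    base = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (base / f).is_file()]
    return missing


if __name__ == "__main__":
    # Assumption: the script is run from the project's root directory.
    missing = missing_paths()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```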
To create the model checkpoint, the embeddings files, and the label encoder checkpoint, run:
python src/main.py
If you already have these artifacts, you can obtain predictions by running:
python src/predict.py
For more details on the implementation, see the "MEMO.md" file.
If you encounter any issues with the openai package version, install the pinned version:
pip install openai==0.28
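To see whether the pin is needed, you can check which openai version is installed. The sketch below uses only the standard library. The helper names are hypothetical, and the idea that the code targets the pre-1.0 openai interface is an assumption inferred from the pin, not something the repository states.

```python
from importlib import metadata


def installed_version(package="openai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_pre_1_0(version):
    """True when the major version is 0, for example '0.28.0'."""
    return version.split(".")[0] == "0"


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("openai is not installed")
    elif is_pre_1_0(version):
        print(f"openai {version} is a pre-1.0 release")
    else:
        print(f"openai {version} is 1.0 or later; consider pinning openai==0.28")
```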
If you encounter issues with relative paths or imports, run this from the project's root directory:
export PYTHONPATH=$PWD