The project includes Python scripts for various tasks such as importing data, preprocessing, partitioning data, computing similarities, training models, and making predictions. It consists of two main parts:
- Game Prediction:
  - Predict whether a user has played a particular game.
  - Build models, including logistic regression, for game prediction.
  - Evaluate model performance and test on provided data.
- Hours Played Prediction:
  - Predict the number of hours a user has played a game.
  - Perform preprocessing steps and define functions for iteration.
  - Train the model and make predictions on test data.
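
As a rough illustration of the game prediction part, the sketch below computes a Jaccard-similarity feature and fits a small logistic regression by gradient descent. It is written in plain Python so it runs without extra packages; it is not the code in `assignment1.py`, which may rely on `sklearn` instead. The feature choice, the function names (`jaccard`, `features`, `train_logistic`, `predict`), and the toy data are all assumptions made for this example.

```python
import math

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def features(user, game, games_per_user, users_per_game):
    """Hypothetical feature vector: a bias term plus the highest Jaccard
    similarity between `game` and any other game the user has played."""
    players = users_per_game.get(game, set())
    sims = [jaccard(players, users_per_game.get(g, set()))
            for g in games_per_user.get(user, set()) if g != game]
    return [1.0, max(sims, default=0.0)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 (played) when the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)

# Toy data (made up): the second feature is a similarity score in [0, 1].
X = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.9], [1.0, 1.0]]
y = [0, 0, 1, 1]
w = train_logistic(X, y)
print([predict(w, x) for x in X])
```

In a real run, the rows of `X` would come from `features(...)` applied to labelled (user, game) pairs, with negative examples sampled from games the user has not played.
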

Data/
    train.json.gz
    pairs_Played.csv
    pairs_Hours.csv
README.md
assignment1.py
predictions_Played.csv
predictions_Hours.csv

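
A gzipped training file such as `train.json.gz` can be read with the standard library alone. The sketch below assumes one JSON object per line, which this README does not confirm; if the lines are Python dict literals instead, `ast.literal_eval` would be the analogous parser. The helper name `read_records` and the field names in the commented usage are hypothetical.

```python
import gzip
import json

def read_records(path):
    """Yield one dict per non-empty line of a gzipped text file.

    Assumes each line holds a single JSON object.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Example usage (field names are assumptions, not taken from the data):
# for record in read_records("Data/train.json.gz"):
#     user, game = record["userID"], record["gameID"]
```

Reading lazily with a generator avoids holding the whole decompressed file in memory at once.
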
- Data Preparation:
  - Place the provided data files (`train.json.gz`, `pairs_Played.csv`, `pairs_Hours.csv`) in the `Data/` directory.
- Running the Code:
  - Execute the `assignment1.py` script to run the entire analysis pipeline.
  - Ensure all required libraries are available (`gzip` from the standard library, plus `scipy`, `sklearn`, `numpy`, etc.).
- Viewing Results:
  - The predictions for game plays (`predictions_Played.csv`) and hours played (`predictions_Hours.csv`) will be generated.
  - Explore the results and evaluate model performance based on accuracy metrics.
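
Before running the script, the libraries named in the steps above can be checked with a short script. This is only a convenience sketch: the module list mirrors the names given in this README, and `missing_modules` is a hypothetical helper. Note that `sklearn` is the import name of the `scikit-learn` package.

```python
import importlib.util

# Import names taken from the README; gzip ships with Python itself.
REQUIRED = ["gzip", "scipy", "sklearn", "numpy"]

def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are available.")
```
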