This repository contains the code and analysis for clustering visitors to the Google Merchandise Store using the Google Customer Revenue Prediction dataset from Kaggle. The goal is to gain insight into visitor behavior and to segment visitors by traffic source, device, and visit-level features.
- Channels and Affiliates: Explored the distribution of visitors across different channels and affiliates to understand the sources of traffic.
- Visitor Clustering: Utilized k-means clustering to segment visitors based on features such as Channel Grouping, device category, and datetime variables.
- Scaling Numerical Variables: Used the StandardScaler to scale numerical variables such as visitNumber, totals.pageviews, etc.
- One-Hot Encoding: Encoded categorical variables to facilitate clustering.
- K-Means Clustering: Applied k-means clustering to group visitors into distinct clusters.
- Visualization: Created visualizations to understand the distribution of visitors within each cluster, specifically focusing on the distribution of categorical variables.
- Cluster Interpretation: Analyzed the characteristics of each cluster to derive actionable insights into different visitor segments.
- Categorical Variable Analysis: Explored how categorical variables contribute to the clustering and identified patterns within each cluster.
- Data Setup: Ensure you have the Google Customer Revenue Prediction dataset available.
- Run the Code: Execute the provided Python code to reproduce the analysis and visualizations.
- Explore Results: Refer to the generated visualizations and analysis in Jupyter notebooks or other relevant files.
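
The scaling, one-hot encoding, and k-means steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame stands in for the real dataset, the column names `channelGrouping` and `device.deviceCategory` are assumed from the flattened Kaggle schema, and `n_clusters=3` is an arbitrary choice made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the flattened sessions table (all values are made up).
df = pd.DataFrame({
    "visitNumber": [1, 2, 1, 5, 1, 3, 8, 1],
    "totals.pageviews": [1, 12, 3, 25, 2, 7, 30, 1],
    "channelGrouping": ["Organic Search", "Direct", "Referral", "Direct",
                        "Social", "Organic Search", "Referral", "Social"],
    "device.deviceCategory": ["desktop", "mobile", "desktop", "desktop",
                              "mobile", "tablet", "desktop", "mobile"],
})

numeric_cols = ["visitNumber", "totals.pageviews"]
categorical_cols = ["channelGrouping", "device.deviceCategory"]  # assumed names

# Scale numeric columns and one-hot encode categorical columns in one step.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Chain preprocessing and k-means; n_clusters=3 is an illustrative assumption.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# Fit the pipeline and attach a cluster label to every row.
df["cluster"] = pipeline.fit_predict(df[numeric_cols + categorical_cols])

# Share of each channel within each cluster (each row sums to 1).
channel_mix = pd.crosstab(df["cluster"], df["channelGrouping"], normalize="index")
print(channel_mix.round(2))
```

A table like `channel_mix` is one simple way to inspect how a categorical variable is distributed within each cluster; the same `pd.crosstab` call can be repeated for device category or any other encoded column before plotting.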
- Python 3.x
- Jupyter Notebooks
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
Feel free to reach out if you have any questions or suggestions!