This project focuses on exploratory data analysis of the Zomato dataset. The dataset provides various details about restaurants, including location, cuisines, pricing, ratings, and more. Through this analysis, we aim to uncover key insights that can inform business decisions or form the basis for predictive models.
-
Data Cleaning and Preprocessing:
- Loaded and cleaned the Zomato dataset to handle missing values, remove duplicates, and standardize formats.
- Key columns analyzed:
City
,Cuisines
,Average Cost for two
,Aggregate rating
,Votes
,Price range
, and more.
-
Geographical Insights:
- City-Level Analysis: Analyzed the distribution of restaurants across cities and identified top cities by restaurant count and average ratings.
- Locality Analysis: Explored restaurant distribution and ratings by localities within cities.
-
Cuisine Analysis:
- Popular Cuisines: Identified the most popular cuisines in the dataset based on the frequency of occurrences.
- Cuisine vs Ratings: Investigated how different cuisines correlate with average ratings.
-
Price and Cost Analysis:
- Price Range Distribution: Visualized the distribution of restaurants across different price ranges.
- Cost Analysis by City: Compared the average cost for two across different cities.
- Price Range vs Rating: Analyzed the relationship between price range and aggregate ratings.
-
Booking and Delivery Services:
- Table Booking: Analyzed the impact of table booking availability on restaurant ratings.
- Online Delivery: Explored how the availability of online delivery services affects restaurant ratings.
-
Votes and Ratings Relationship:
- Votes vs Ratings: Investigated the relationship between the number of votes a restaurant receives and its aggregate rating.
- Top Cities by Restaurant Count: Cities like Delhi, Bengaluru, and Mumbai have the highest number of restaurants in the dataset.
- Top Cities by Average Ratings: Smaller cities like Pune and Hyderabad tend to have higher average ratings compared to larger metro areas.
- Locality-Based Analysis: Certain localities within cities stand out for their high concentration of highly-rated restaurants.
- Most Popular Cuisines: North Indian, Chinese, and Italian are among the most popular cuisines offered by restaurants in the dataset.
- Cuisine and Ratings: Fine-dining cuisines such as Italian and Mediterranean tend to have higher average ratings compared to fast food or street food options.
- Price Range Distribution: Most restaurants fall into the mid-range price bracket, with a few high-end and low-end outliers.
- Cost by City: Cities like Mumbai and Delhi have higher average costs for two, whereas smaller cities like Pune have relatively lower costs.
- Price vs Rating: There is a noticeable trend where higher-priced restaurants tend to have better ratings, indicating that price may be a reflection of quality.
- Table Booking vs Ratings: Restaurants that offer table booking tend to have higher average ratings compared to those that don't.
- Online Delivery vs Ratings: Restaurants offering online delivery also show a trend of higher ratings, possibly indicating customer preference for convenience.
- Votes Correlation: There is a positive correlation between the number of votes and ratings, suggesting that more popular restaurants (in terms of customer engagement) tend to have higher ratings.
The analysis includes several visualizations to help uncover patterns and trends:
- Bar charts showing restaurant distribution by city and locality.
- Scatter plots and pair plots to visualize relationships between variables like cost, votes, and ratings.
- Heatmaps to show correlations between numerical variables.
This exploratory data analysis of the Zomato dataset has provided valuable insights into the relationship between location, cuisine, pricing, and customer feedback. These insights can be used for strategic decision-making, marketing initiatives, or as features for machine learning models in predictive tasks such as rating prediction or customer segmentation.
- Sentiment Analysis: If customer reviews are available, sentiment analysis could provide further insights into customer satisfaction.
- Recommendation System: Build a recommendation system based on user preferences and restaurant attributes.
- Clustering Analysis: Use clustering techniques to group similar restaurants and understand key characteristics of different clusters.
- Python: For data analysis and visualization.
- Pandas: For data manipulation and aggregation.
- Seaborn & Matplotlib: For data visualization.
- Jupyter Notebook: For conducting and documenting the analysis.
Zomato_EDA.ipynb
: Jupyter notebook containing the entire analysis and code implementation.README.md
: Project overview, insights, and future directions.
Feel free to explore the notebook for detailed analysis and visualizations!