- Exploratory Data Analysis
Exploratory Data Analysis (EDA) is the process of examining and summarizing a dataset to understand its characteristics, identify patterns, and make informed decisions. It involves calculating summary statistics, visualizing data through plots and charts, identifying missing or inconsistent data, and exploring relationships between variables. EDA provides insights that guide further analysis and decision-making.
EDA helps determine how best to manipulate data sources to get the answers you need, making it easier for data scientists to discover patterns, spot anomalies, test a hypothesis, or check assumptions.
Performing EDA typically involves the following steps:
1. Understand the Data: Get familiar with the dataset's structure, variables, and potential missing values or inconsistencies.
2. Clean the Data: Handle missing values, erroneous data, and duplicates appropriately.
3. Transform the Data: Rename confusing columns for better readability and drop irrelevant columns.
4. Calculate Summary Statistics: Compute basic statistics like mean, median, and standard deviation for numeric variables, and frequency counts for categorical variables.
5. Visualize the Data: Create plots such as histograms, box plots, and scatter plots to visualize the data distribution, outliers, and relationships between variables.
6. Analyze Relationships: Identify correlations between numeric variables and visualize them using correlation matrices or scatter plots.
7. Identify Outliers and Anomalies: Spot unusual observations that deviate significantly from the norm.
8. Handle Categorical Variables: Analyze categorical variables using bar plots or pie charts to understand category distributions.
9. Iterate and Explore: Continuously explore the data, generate hypotheses, and delve deeper into specific aspects for further analysis.
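The non-plotting steps above can be sketched in a few lines of pandas. This is only a minimal illustration under stated assumptions: the table, its column names (`HP`, `MPG`, `Fuel`, `Notes`), and every value in it are invented for the example, and median imputation and the 1.5 × IQR rule are just one common choice each for handling missing values and flagging outliers.

```python
import pandas as pd

# Toy data invented purely for illustration (not taken from any real dataset).
raw = pd.DataFrame({
    "HP": [100.0, 150.0, None, 150.0, 120.0, 900.0],
    "MPG": [35, 28, 30, 28, 32, 12],
    "Fuel": ["gas", "gas", "diesel", "gas", "diesel", "gas"],
    "Notes": ["", "", "", "", "", ""],
})

# Step 1: understand the structure and count missing values per column.
n_rows, n_cols = raw.shape
missing_per_column = raw.isna().sum()

# Step 2: clean. Drop exact duplicate rows, then fill the missing HP value
# with the column median (an assumption; other strategies may suit better).
clean = raw.drop_duplicates().copy()
clean["HP"] = clean["HP"].fillna(clean["HP"].median())

# Step 3: transform. Rename columns for readability and drop an unused one.
clean = clean.rename(
    columns={"HP": "engine_hp", "MPG": "highway_mpg", "Fuel": "fuel_type"}
).drop(columns=["Notes"])

# Step 4: summary statistics for numeric columns, counts for categorical ones.
summary = clean.describe()
fuel_counts = clean["fuel_type"].value_counts()

# Step 6: correlation between the numeric variables.
corr = clean[["engine_hp", "highway_mpg"]].corr()

# Step 7: flag outliers with the 1.5 * IQR rule.
q1 = clean["engine_hp"].quantile(0.25)
q3 = clean["engine_hp"].quantile(0.75)
iqr = q3 - q1
is_outlier = (clean["engine_hp"] < q1 - 1.5 * iqr) | (clean["engine_hp"] > q3 + 1.5 * iqr)
outliers = clean[is_outlier]

print(summary)
print(fuel_counts)
print(corr)
print(outliers)
```

The visualization steps are left out to keep the sketch dependency-light. With a plotting library such as matplotlib installed, calls like `clean["engine_hp"].plot(kind="hist")` or `fuel_counts.plot(kind="bar")` would cover them.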
We have a dataset of cars that contains more than 10,000 rows and more than 10 columns. The columns describe features of each car, such as Engine Fuel Type, Engine HP, Transmission Type, highway MPG, city MPG, and many more.
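A first look at such a table might go as follows. This is a hedged sketch: the three rows below are invented stand-ins that only reuse the column names mentioned above, and the in-memory `sample_csv` object takes the place of the real file, whose name and location are not given here.

```python
import io

import pandas as pd

# Stand-in for the real file: three invented rows using the column names
# mentioned in the text. With the actual data, pass the CSV path to
# pd.read_csv instead of this StringIO object.
sample_csv = io.StringIO(
    "Engine Fuel Type,Engine HP,Transmission Type,highway MPG,city MPG\n"
    "regular unleaded,200,AUTOMATIC,30,22\n"
    "diesel,,MANUAL,40,31\n"
    "premium unleaded,335,MANUAL,26,19\n"
)
cars = pd.read_csv(sample_csv)

# Split columns by type so numeric and categorical features can be
# summarized separately later on.
numeric_cols = cars.select_dtypes(include="number").columns.tolist()
categorical_cols = cars.select_dtypes(exclude="number").columns.tolist()

# Count missing values per column.
missing = cars.isna().sum()

print(cars.shape)
print(numeric_cols)
print(categorical_cols)
print(missing)
```

On the full dataset, `cars.shape` is a quick way to confirm the row and column counts quoted above.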