added EDA
rohit-chandra committed Jan 21, 2024
1 parent 762ff4c commit d6f701d
Showing 2 changed files with 54 additions and 3 deletions.
57 changes: 54 additions & 3 deletions README.md
# Machine Learning School
## Penguin Classification

The main aim of this project is to implement end-to-end ML pipelines on AWS SageMaker :target:.


## 1: Training Pipeline

- In this session we’ll run Exploratory Data Analysis on the [Penguins dataset](https://www.kaggle.com/datasets/parulpandey/palmer-archipelago-antarctica-penguin-data) and we’ll build a simple [SageMaker Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-sdk.html) with one step to split and transform the data.

<p align="left">
<img src="program/images/training.png"/>
</p>

- We’ll use a Scikit-Learn Pipeline for the transformations and a Processing Step with an `SKLearnProcessor` to execute a preprocessing script. Check the SageMaker Pipelines Overview for an introduction to the fundamental components of a SageMaker Pipeline.
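
The transformations could look roughly like the sketch below. It assumes the preprocessing script imputes and scales the four numeric measurements, imputes and one-hot encodes `island`, leaves out `sex`, and splits the data 70/15/15; those choices and the helper name `split_and_transform` are illustrative assumptions, not the project’s actual script. The SageMaker wiring (the `SKLearnProcessor` and the Processing Step) is left out so the snippet runs locally.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Assumed feature groups for this sketch; `sex` is deliberately left out.
NUMERIC_FEATURES = ["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "body_mass_g"]
CATEGORICAL_FEATURES = ["island"]
TARGET = "species"


def split_and_transform(df: pd.DataFrame, seed: int = 42) -> dict:
    """Hypothetical helper: split 70/15/15, then fit the transformers on the training split only."""
    df = df.dropna(subset=[TARGET])
    train, rest = train_test_split(df, test_size=0.3, random_state=seed)
    validation, test = train_test_split(rest, test_size=0.5, random_state=seed)

    # Numeric columns: fill gaps with the mean, then standardize.
    # Categorical columns: fill gaps with the most frequent value, then one-hot encode.
    features = ColumnTransformer(
        transformers=[
            ("numeric", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()), NUMERIC_FEATURES),
            (
                "categorical",
                make_pipeline(SimpleImputer(strategy="most_frequent"), OneHotEncoder(handle_unknown="ignore")),
                CATEGORICAL_FEATURES,
            ),
        ],
        sparse_threshold=0,  # always return a dense array
    )
    # Encode the species names as integers 0..k-1.
    target = OrdinalEncoder()

    # Learn the statistics on the training split only...
    X_train = features.fit_transform(train)
    y_train = target.fit_transform(train[[TARGET]]).ravel()

    # ...and reuse them for the held-out splits.
    return {
        "X_train": X_train,
        "y_train": y_train,
        "X_validation": features.transform(validation),
        "y_validation": target.transform(validation[[TARGET]]).ravel(),
        "X_test": features.transform(test),
        "y_test": target.transform(test[[TARGET]]).ravel(),
    }
```

Fitting the transformers on the training split only, then reusing them on the validation and test splits, keeps statistics such as the imputation mean and the scaling parameters from leaking information out of the held-out data.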

### Step 1: EDA

- Let’s run Exploratory Data Analysis on the dataset. The goal of this section is to understand the data and the problem we are trying to solve.

- Let’s load the Penguins dataset:

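```python
import numpy as np
import pandas as pd

# DATA_FILEPATH must point to the Penguins CSV file.
penguins = pd.read_csv(DATA_FILEPATH)

# Display the first five rows of the dataset.
penguins.head()
```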
<p align="left">
<img src="program/images/eda1.PNG"/>
</p>

- We can see the dataset contains the following columns:

- `species`: The species of a penguin. This is the column we want to predict.
- `island`: The island where the penguin was found
- `culmen_length_mm`: The length of the penguin’s culmen (bill) in millimeters
- `culmen_depth_mm`: The depth of the penguin’s culmen in millimeters
- `flipper_length_mm`: The length of the penguin’s flipper in millimeters
- `body_mass_g`: The body mass of the penguin in grams
- `sex`: The sex of the penguin
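
To check these columns quickly, a small helper like the one below can report each column’s type, how many values are missing, and how many distinct values it holds. The name `summarize_columns` is hypothetical and not part of the original notebook; it is only a sketch built on standard pandas calls.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing-value count, and number of distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),  # e.g. "float64" for the measurements
            "missing": df.isna().sum(),  # count of NaN/None values per column
            "unique": df.nunique(),  # distinct non-missing values per column
        }
    )
```

After loading the data, `summarize_columns(penguins)` gives the overview, and `penguins["species"].value_counts()` shows how many samples belong to each class of the target column.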

- If you are curious, here is an illustration of a penguin’s culmen:

<p align="left">
<img src="program/images/culmen.jpeg"/>
</p>

This repository contains the source code of the [Machine Learning School](https://www.ml.school) program.

If you find any problems with the code or have any ideas on improving it, please open an issue and share your recommendations.

## Running the code

Binary file added program/images/eda1.PNG