changes in the basic structure #339

Open
wants to merge 7 commits into
base: master
2,237 changes: 2,237 additions & 0 deletions DataSet/WineQuality.ipynb

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions _config.yml
@@ -1,14 +1,14 @@
-title: Clean Blog
-email: your-email@example.com
-description: A Blog Theme by Start Bootstrap
-author: Start Bootstrap
-baseurl: "/startbootstrap-clean-blog-jekyll"
-url: "https://startbootstrap.github.io"
+title: Data Lab
+email: diazaguirrejohanna@gmail.com
+description: "Analytics and customized solutions for your business or project. <br> by Johanna Diaz Aguirre "
+author: "Start Bootstrap"
+baseurl: ""
+url: "https://Cjohanna.github.io"
 
 # Social Profiles
-twitter_username: SBootstrap
-github_username: StartBootstrap
-facebook_username: StartBootstrap
+twitter_username:
+github_username:
+facebook_username:
 instagram_username:
 linkedin_username:

1 change: 1 addition & 0 deletions _includes/head.html
@@ -12,6 +12,7 @@

<link href='https://fonts.googleapis.com/css?family=Lora:400,700,400italic,700italic' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800' rel='stylesheet' type='text/css'>


<script src="https://use.fontawesome.com/releases/v5.15.3/js/all.js" crossorigin="anonymous"></script>

2 changes: 1 addition & 1 deletion _includes/navbar.html
@@ -15,7 +15,7 @@
 <a class="nav-link" href="{{"/about" | relative_url }}">About</a>
 </li>
 <li class="nav-item">
-<a class="nav-link" href="{{ "/posts" | relative_url }}">Posts</a>
+<a class="nav-link" href="{{ "/posts" | relative_url }}">Projects</a>
 </li>
 <li class="nav-item">
 <a class="nav-link" href="{{"/contact" | relative_url }}">Contact</a>
9 changes: 0 additions & 9 deletions _layouts/home.html
@@ -41,15 +41,6 @@ <h3 class="post-subtitle">{{ post.subtitle }}</h3>
 <h3 class="post-subtitle">{{ post.excerpt | strip_html | truncatewords: 15 }}</h3>
 {% endif %}
 </a>
-<p class="post-meta">Posted by
-{% if post.author %}
-{{ post.author }}
-{% else %}
-{{ site.author }}
-{% endif %}
-on
-{{ post.date | date: '%B %d, %Y' }} &middot; {% include read_time.html content=post.content %}
-</p>
 </article>
 
 <hr>
5 changes: 1 addition & 4 deletions _layouts/post.html
@@ -17,10 +17,7 @@ <h1>{{ page.title }}</h1>
 {% if page.subtitle %}
 <h2 class="subheading">{{ page.subtitle }}</h2>
 {% endif %}
-<span class="meta">Posted by
-<a href="#">{% if page.author %}{{ page.author }}{% else %}{{ site.author }}{% endif %}</a>
-on {{ page.date | date: '%B %d, %Y' }} &middot; {% include read_time.html
-content=page.content %}
+<span class="meta">Posted by Johanna Diaz
 </span>
 </div>
 </div>
62 changes: 62 additions & 0 deletions _posts/.html
@@ -0,0 +1,62 @@
---
layout: post
title: "Wine Quality Predictor"
subtitle: "Exploring wine quality through analysis and prediction"
date: 2023-06-28 23:45:13 -0400
background: '/img/posts/03.jpg'
---

<p>This project applies machine learning techniques to build models that predict wine quality from physicochemical attributes. Visualization and exploratory analysis are used to understand how the attributes are distributed and how they affect prediction. The role of data preprocessing and transformation in the modeling phase is also highlighted.</p>

<h2 class="section-heading">Description of Dataset</h2>

<p><strong>Source:</strong> This dataset was created by P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. The "Wine Quality" dataset is available in the UCI Machine Learning Repository.</p>

<p>The "Wine Quality" dataset is commonly used in classification and regression problems to predict wine quality. It was collected to evaluate the quality of white and red wines based on physicochemical attributes. The inputs were gathered through physicochemical tests, and the output is based on sensory data obtained from evaluations conducted by wine experts. Each expert rated the wine quality on a scale from 0 (very poor) to 10 (excellent).</p>

<p><strong>Attribute Description:</strong> The "Wine Quality" dataset consists of instances of white and red wines, each described by 11 numerical attributes. These attributes include characteristics such as acidity, sugar levels, sulfates, and alcohol content. Wine quality is the target variable and is treated as categorical.</p>

<blockquote class="blockquote">Input Attributes (x):</blockquote>
<ul>
<li>Fixed acidity: g/dm³ (float)</li>
<li>Volatile acidity: g/dm³ (float)</li>
<li>Citric acid: g/dm³ (float)</li>
<li>Residual sugar: g/dm³ (float)</li>
<li>Chlorides: g/dm³ (float)</li>
<li>Free sulfur dioxide: mg/dm³ (float)</li>
<li>Total sulfur dioxide: mg/dm³ (float)</li>
<li>Density: g/cm³ (float)</li>
<li>pH (float)</li>
<li>Sulphates: g/dm³ (float)</li>
<li>Alcohol: % vol (float)</li>
</ul>

<blockquote class="blockquote">Output Attribute (y):</blockquote>
<ul>
<li>Wine quality: evaluated on a discrete scale from 0 to 10, where a higher value indicates better quality.</li>
</ul>
<h2 class="section-heading">Exploratory Data Analysis</h2>
<img class="img-fluid" src="https://images.unsplash.com/photo-1542903660-eedba2cda473?auto=format&fit=crop&q=80&w=2070&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" alt="Demo Image">
<p>Before predictive modeling, an initial step is Exploratory Data Analysis (EDA). The dataset can be loaded into a working environment, such as Python with pandas, to examine the distribution of each attribute and of the quality scores.</p>
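<p>As a rough illustration of this first look at the data, the sketch below counts quality scores and averages one attribute using only the Python standard library. The rows are a hypothetical excerpt, not real records, and the semicolon delimiter and column names are assumptions about the file layout; the analysis itself may have used pandas instead.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; the real file has 11 input columns plus "quality".
sample = io.StringIO(
    '"alcohol";"pH";"quality"\n'
    "8.8;3.0;6\n"
    "9.5;3.3;6\n"
    "10.1;3.26;6\n"
    "12.4;3.1;8\n"
    "9.2;3.4;4\n"
)

# Assumption: fields are separated by semicolons rather than commas.
rows = list(csv.DictReader(sample, delimiter=";"))

# How many wines received each quality score (reveals class imbalance).
quality_counts = Counter(int(row["quality"]) for row in rows)

# Mean of a single attribute as a basic summary statistic.
mean_alcohol = sum(float(row["alcohol"]) for row in rows) / len(rows)
```

<p>Counting the target values early is useful here because an uneven distribution of quality scores affects how the models should be evaluated later.</p>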

<p>This repository contains the results of an analysis of the white wine dataset published by the researchers cited above. The analysis involved three key stages: data preprocessing, modeling and evaluation, and performance improvement strategies.</p>

<p> <strong> Data Preprocessing</strong></p>
<p>In the data preprocessing phase, we standardized the features with StandardScaler, which rescales each feature to zero mean and unit variance. Putting all features on a comparable scale keeps attributes with large numeric ranges from dominating the others during modeling.</p>
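<p>The arithmetic behind this kind of scaling can be sketched in plain Python. This is a stand-in for illustration, not the project's code: the <code>standardize</code> helper and the sample values are hypothetical, and it uses the population standard deviation, which is what scikit-learn's StandardScaler does.</p>

```python
from statistics import fmean, pstdev

def standardize(columns):
    """Z-score each column: subtract the mean, divide by the population std."""
    scaled = []
    for values in columns:
        mean = fmean(values)
        std = pstdev(values)
        # Constant columns have std 0; map them to 0.0 rather than divide by zero.
        scaled.append([(v - mean) / std if std else 0.0 for v in values])
    return scaled

# Hypothetical values for two attributes on very different scales.
alcohol = [9.0, 10.0, 11.0, 12.0]
total_so2 = [90.0, 120.0, 150.0, 180.0]

scaled_alcohol, scaled_so2 = standardize([alcohol, total_so2])
```

<p>After scaling, both columns have mean 0 and standard deviation 1, so a distance-based model such as K-NN or SVM no longer weighs total sulfur dioxide more heavily simply because its raw values are larger.</p>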

<p> <strong> Modeling</strong></p>
<p>In the modeling stage, we selected four algorithms: Decision Tree, Random Forest, SVM, and K-NN, chosen for their demonstrated ability to handle classification problems on complex datasets. Although some of these algorithms were also used in the reference study, the methodologies differed in aspects such as hyperparameter tuning and how class imbalance was addressed.</p>

<p> <strong> Evaluation</strong></p>
<p>The reference study provided results that served as a starting point for model evaluation. It was observed that, despite achieving acceptable levels of accuracy, class imbalance affected the model's ability to generalize to minority classes. This finding was crucial in determining the need to implement an oversampling strategy.</p>
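<p>The gap between overall accuracy and minority-class performance can be made concrete with a small sketch. The labels below are hypothetical and <code>per_class_recall</code> is an illustrative helper, not a function from the project; it shows how a model that always predicts the majority class can still score well on accuracy.</p>

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels: quality 6 dominates, 4 and 8 are rare.
y_true = [6, 6, 6, 6, 6, 6, 6, 6, 4, 8]

# A model that always predicts the majority class.
y_pred = [6] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
```

<p>Here accuracy is 0.8 while recall for classes 4 and 8 is 0.0, which is the pattern that motivates rebalancing the classes.</p>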

<p> <strong> Performance Improvement</strong></p>
<p>The oversampling strategy was the key element in improving the model's performance. Increasing the number of instances in the minority classes balanced the class distribution, allowing the model to learn more effectively from all classes. As a result, accuracy on the test set rose to 92%.</p>
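<p>The post does not say which oversampling method was used, so the following is only a minimal sketch of one common variant, random oversampling with replacement. The <code>random_oversample</code> helper and the sample data are hypothetical, and the sketch assumes oversampling is applied to the training split only, so that duplicated rows do not leak into the test set.</p>

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical training split: quality 6 dominates, 3 and 9 are rare.
train_rows = [[i] for i in range(10)]
train_labels = [6, 6, 6, 6, 6, 6, 3, 3, 9, 6]

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
class_counts = Counter(balanced_labels)
```

<p>After the call, each of the three classes has seven rows. Synthetic methods such as SMOTE generate new points instead of duplicating existing ones; whether the project used one of those is not stated in the post.</p>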

<p> <strong> Conclusion</strong></p>
<p>While both methodologies shared the use of some common algorithms, they differed in their approach to addressing class imbalance and the specific model configurations. This distinction is fundamental and highlights the importance of adapting methodologies to the unique characteristics of each dataset and problem.</p>

<p>For more details, refer to the complete analysis and code in this repository.</p>
<a href="https://colab.research.google.com/drive/1CpCsCjCLkUBLazwJL5NQSS70D6kBgIbr?authuser=1" target="_blank" class="btn btn-primary">Open in Google Colab</a>

<p>Photographs by <a href="https://unsplash.com/">Unsplash</a>.</p>
40 changes: 0 additions & 40 deletions _posts/2020-01-26-dinosaurs.html

This file was deleted.

39 changes: 0 additions & 39 deletions _posts/2020-01-27-dreams.html

This file was deleted.

40 changes: 0 additions & 40 deletions _posts/2020-01-28-exploration.html

This file was deleted.
