Cameroon Air Quality Prediction - AutoGluon

Introduction

This study focuses on predicting air quality in Cameroon, specifically the concentration of particulate matter (PM2.5), using various machine learning techniques. The dataset includes weather and air quality features collected from different cities across Cameroon.

Leaderboard Achievement

At the end of the Cameroon Air Quality Prediction competition, the submission ranked 5th on the leaderboard. After further model improvements, its score surpassed the 1st-place score.

Methodology

The analysis began with an exploration of the dataset, where data inconsistencies were addressed. Features with a single value, such as 'sunrise', 'sunset', and 'snowfall_sum', were removed. Redundant variables, including city, longitude, and latitude, were also eliminated to reduce unnecessary complexity in the models.
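The cleaning step above can be sketched with pandas. This is a minimal illustration rather than the notebook's actual code: the helper name `drop_uninformative_columns` is hypothetical, the toy values are made up, and the columns `temperature_2m_mean` and `pm2_5` are assumed names used only for the example.

```python
import pandas as pd


def drop_uninformative_columns(df, redundant=("city", "longitude", "latitude")):
    """Return a copy of df without constant columns and the listed redundant ones."""
    # A column with at most one distinct value carries no signal for a model.
    constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    # Only drop redundant columns that are actually present in the frame.
    present = [col for col in redundant if col in df.columns]
    # dict.fromkeys removes duplicates while keeping the original order.
    return df.drop(columns=list(dict.fromkeys(constant + present)))


# Toy frame for illustration only; the values are made up.
toy = pd.DataFrame({
    "city": ["Bafoussam", "Douala", "Bafoussam"],
    "latitude": [5.47, 4.05, 5.47],
    "longitude": [10.42, 9.70, 10.42],
    "snowfall_sum": [0.0, 0.0, 0.0],
    "temperature_2m_mean": [21.3, 27.1, 20.8],
    "pm2_5": [48.2, 30.5, 51.0],
})
cleaned = drop_uninformative_columns(toy)
print(list(cleaned.columns))
```

Detecting constant columns with `nunique` avoids hard-coding names such as 'sunrise' or 'sunset', so the same helper keeps working if the dataset changes.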

Feature Engineering

Enhancing predictive power involved analyzing the distribution of PM2.5 concentrations across different cities, leading to the creation of a new feature:

Distance from Bafoussam, the city with the highest PM2.5 levels.
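One way such a distance feature could be computed is the haversine (great-circle) formula. The sketch below is an assumption: the source does not say which distance measure was used, the Bafoussam coordinates are approximate values not taken from the dataset, and the function name `distance_from_bafoussam_km` is hypothetical. It also assumes the feature is derived from latitude and longitude before those columns are dropped.

```python
import math

# Approximate coordinates of Bafoussam (assumed; not given in the source).
BAFOUSSAM_LAT, BAFOUSSAM_LON = 5.48, 10.42
EARTH_RADIUS_KM = 6371.0


def distance_from_bafoussam_km(lat, lon):
    """Great-circle (haversine) distance in km from (lat, lon) to Bafoussam."""
    phi1 = math.radians(lat)
    phi2 = math.radians(BAFOUSSAM_LAT)
    dphi = phi2 - phi1
    dlmb = math.radians(BAFOUSSAM_LON - lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Douala lies at roughly 4.05 N, 9.70 E (also an approximate value).
print(round(distance_from_bafoussam_km(4.05, 9.70), 1))
```

With a pandas DataFrame, the function could be applied row by row to the latitude and longitude columns to produce the new feature.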

Models

Several machine learning models were initially employed to predict PM2.5 levels, including:

CatBoost
LightGBM (LGBM)
XGBoost (XGB)
GradientBoostingRegressor
ExtraTreesRegressor
RandomForestRegressor
AdaBoostRegressor
MLPRegressor

These models were evaluated using a 9-split RepeatedKFold cross-validation strategy to ensure reliable results.
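A repeated k-fold evaluation of this kind can be sketched with scikit-learn. The text does not say how the 9 splits were configured, so the example assumes 3 folds repeated 3 times (9 train/validation splits in total); the synthetic data, the seeds, and the choice of ExtraTreesRegressor as the example model are illustrative only.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic regression data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

# Assumption: 3 folds x 3 repeats = 9 train/validation splits.
cv = RepeatedKFold(n_splits=3, n_repeats=3, random_state=42)
model = ExtraTreesRegressor(n_estimators=50, random_state=42)

# scikit-learn maximizes scores, so RMSE is exposed as a negated value.
neg_rmse = cross_val_score(
    model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
)
rmse_per_split = -neg_rmse
print(len(rmse_per_split), rmse_per_split.mean(), rmse_per_split.std())
```

Reporting the mean and standard deviation over all splits gives a steadier estimate than a single train/validation split, which is the usual reason for repeating the k-fold procedure.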

However, after initial testing, AutoGluon was introduced and ultimately provided the best performance, surpassing all other models in predictive accuracy.
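For readers unfamiliar with AutoGluon, a typical tabular regression run looks roughly like the sketch below. It is not the notebook's configuration: the wrapper `fit_autogluon`, the label name `pm2_5`, the `best_quality` preset, and the time limit are all assumptions chosen for illustration.

```python
def fit_autogluon(train_df, label="pm2_5", time_limit=600):
    """Fit an AutoGluon tabular regressor on train_df and return the predictor."""
    # Imported lazily so this module loads even where AutoGluon is not installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(
        label=label,
        problem_type="regression",
        eval_metric="root_mean_squared_error",
    )
    # The preset and time limit are illustrative, not the notebook's settings.
    predictor.fit(train_data=train_df, presets="best_quality", time_limit=time_limit)
    return predictor
```

After fitting, `predictor.leaderboard()` lists the models AutoGluon trained together with their validation scores, and `predictor.predict(test_df)` returns predictions for new rows.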

Results

Among all models tested, CatBoost performed well, achieving a root mean squared error (RMSE) of 3.11078. However, AutoGluon outperformed every other model, achieving the lowest RMSE of 2.97008.
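For reference, RMSE is the square root of the mean squared difference between observed and predicted values, so it is expressed in the same units as PM2.5. A minimal sketch follows; the function name `rmse` and the sample numbers are illustrative, not data from the competition.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observations and predictions, for illustration only.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
print(rmse(observed, predicted))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.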

Key Result Comparison

Model	RMSE
CatBoost	3.11078
AutoGluon	2.97008

AutoGluon's RMSE is 0.14070 lower than CatBoost's, a reduction of about 4.5%, which indicates stronger predictive performance on this particular task.

While other models, such as CatBoost and ExtraTrees, provided competitive results, AutoGluon’s automatic model selection and hyperparameter tuning led to the best performance, further validating its effectiveness as an AutoML tool for air quality prediction.

Conclusion

This study highlights the critical role of feature engineering in improving model performance, as well as the superiority of AutoGluon over other machine learning models for the task of predicting PM2.5 concentrations. AutoGluon’s automated approach to model selection and optimization resulted in a more accurate prediction, achieving the best RMSE score and outperforming traditional models like CatBoost and XGBoost.
