📚Chapter: — PyCaret
Introduction
Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable', or 'target') and one or more independent variables (often called 'features', 'predictors', or 'covariates'). The objective of regression in machine learning is to predict continuous values such as sales amount, quantity, temperature etc.
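To make the idea concrete, here is a minimal sketch of the simplest regression there is: fitting a straight line to one feature with ordinary least squares. The data and the names `ad_spend`, `sales`, and `fit_line` are made up for illustration; this is not how PyCaret fits its models internally.

```python
# Ordinary least squares for a single feature, written out by hand.
# The numbers below are made-up illustration data, not from a real dataset.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # feature (independent variable)
sales = [3.0, 5.0, 7.0, 9.0, 11.0]     # target (dependent variable), built as 2 * x + 1

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 6.0 + intercept    # predict a continuous value for a new input
print(slope, intercept, predicted)
```

Because the toy target was built as 2 * x + 1, the fitted line recovers that slope and intercept, and the prediction for a new input is a continuous number rather than a class label.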
Traditionally, setting up a regression model requires extensive preprocessing, model selection, and hyperparameter tuning. PyCaret, an open-source low-code machine learning library, simplifies this process by automating tasks such as feature engineering, model training, and evaluation.
In this blog, we will walk through how to perform regression using PyCaret with minimal code and effort.
Why Use PyCaret for Regression?
Low-code Implementation: Reduces the amount of code needed for model building.
Automated Preprocessing: Handles missing values, feature encoding, and transformations.
Multiple Model Training: Trains and compares multiple regression models automatically.
Hyperparameter Tuning: Provides an easy way to tune model parameters.
Deployment Ready: Allows exporting trained models for production use.
1-Import library
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
#from pycaret.utils import enable_colab
#enable_colab()
2- Installing PyCaret
Before we begin, install PyCaret using pip:
# Optional: put %%capture on the first line of the cell to suppress the install output
# install the full version
!pip install pycaret[full]
!pip install pyyaml==5.4.1
3- Import the necessary packages
!pip install markupsafe==2.0.1
After installing, restart the runtime (in Colab: Runtime > Restart runtime), then run the imports:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pycaret
import jinja2
from pycaret.regression import *
4-Dataset
For this tutorial, we will use the California Housing dataset from scikit-learn, which contains various housing-related features and a target variable (MedHouseVal, the median house value of a district, expressed in units of $100,000).
import pandas as pd
from pycaret.regression import *
# Load dataset
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing(as_frame=True)
df = data.frame
# Display first few rows
df.head()
5- Setting up Environment in PyCaret
PyCaret requires setting up the environment using the setup() function. This function automatically handles preprocessing tasks.
# Initialize PyCaret regression setup
regression_setup = setup(data=df, target='MedHouseVal', session_id=123)
data: The dataset to be used.
target: The column we want to predict.
session_id: Used for reproducibility.
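To see why a fixed session_id matters, here is a small sketch of a seeded train/test split in plain Python. It only illustrates the idea of a reproducible random split; `split_rows` is a hypothetical helper, not a PyCaret function, and PyCaret's own splitting logic is not reproduced here.

```python
import random

def split_rows(n_rows, train_fraction, seed):
    """Shuffle row indices with a fixed seed and split them into train/test index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # private RNG, so global random state is untouched
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]

# Two runs with the same seed produce exactly the same split.
train_a, test_a = split_rows(n_rows=10, train_fraction=0.7, seed=123)
train_b, test_b = split_rows(n_rows=10, train_fraction=0.7, seed=123)

print(train_a == train_b and test_a == test_b)
print(len(train_a), len(test_a))
```

With the seed fixed, anyone rerunning the notebook trains and evaluates on the same rows, so the reported scores can be compared across runs.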
6- Comparing All Models
One of PyCaret’s powerful features is the compare_models() function, which evaluates multiple models and ranks them based on performance.
It trains all the available models in the model library using default hyperparameters and evaluates them using cross-validation. The number of folds can be set with the fold parameter (default = 10 folds). The results table is sorted by the metric given in the sort parameter, best model first: R2 by default, or RMSE (lowest error first) as in the examples below. The n_select parameter of compare_models controls how many trained models are returned; setting it to 2, as in the next step, returns the top 2 models as a list. The pull function stores the output of compare_models as a pd.DataFrame.
compare_models()
compare_models(sort = 'RMSE')
compare_models(fold = 5)
This command automatically trains and evaluates different regression models and selects the best-performing one.
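The fold parameter refers to k-fold cross-validation. As a rough sketch of what that means, the snippet below splits row indices into folds so that each row is used for validation exactly once. `k_fold_indices` is a hypothetical helper written for this post, and PyCaret's own fold strategy (shuffling, for example) may differ.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) for each of n_folds contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_rows=10, n_folds=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model is trained once per fold on the train indices and scored on the validation indices; the comparison table then reports the average score across folds.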
7- Select Best Model
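# Return the top 2 models (ranked by RMSE) as a list
best = compare_models(n_select=2, sort='RMSE')
# Store the comparison table as a pandas DataFrame
compare_model_result = pull()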
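The comparison table reports error metrics such as MAE, MSE, RMSE, and R2. If these are unfamiliar, the sketch below computes them by hand on a few made-up numbers. The function name `regression_metrics` is ours, and the values are illustrative only, not PyCaret output.

```python
import math

def regression_metrics(actual, predicted):
    """Compute common regression metrics from paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    mse = sum(e * e for e in errors) / n            # mean squared error
    rmse = math.sqrt(mse)                           # root mean squared error
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

actual = [3.0, 5.0, 7.0, 9.0]       # made-up true values
predicted = [2.0, 5.0, 8.0, 9.0]    # made-up model predictions
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower is better for MAE, MSE, and RMSE, while higher is better for R2, which is why sorting by RMSE puts the smallest error at the top.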
8- Create a Model
Once the best model is identified, we can create and fine-tune it:
catboost = create_model('catboost')
print(catboost)
9- Tune a Model
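# tune_model expects a trained model object (not a model ID string)
tuned_catboost = tune_model(catboost)
# Optimize RMSE using Optuna as the search library
tuned_catboost_optuna = tune_model(catboost, optimize='RMSE', search_library='optuna')
# The tuned model object is stored in the variable 'tuned_catboost_optuna'.
print(tuned_catboost_optuna)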
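Hyperparameter tuning boils down to trying several candidate settings and keeping the one with the best validation score. The sketch below shows a bare-bones random search over a single hyperparameter (the L2 penalty of a one-feature ridge regression) on made-up data. It illustrates the general idea only; the search spaces and search algorithms PyCaret and Optuna use are not reproduced here.

```python
import math
import random

def fit_ridge_slope(xs, ys, penalty):
    """Closed-form slope for y ~ w * x (no intercept) with an L2 penalty on w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def rmse(xs, ys, slope):
    """Root mean squared error of the line y = slope * x on the given points."""
    return math.sqrt(sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Made-up training and validation data (roughly y = 2 * x with noise).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]

# Draw 10 random candidate penalties with a fixed seed for reproducibility.
rng = random.Random(123)
candidates = [rng.uniform(0.0, 10.0) for _ in range(10)]

# Fit on the training data, score on the validation data, keep every result.
results = []
for penalty in candidates:
    slope = fit_ridge_slope(train_x, train_y, penalty)
    results.append((rmse(valid_x, valid_y, slope), penalty))

best_rmse, best_penalty = min(results)   # lowest validation RMSE wins
print(best_penalty, best_rmse)
```

The optimize argument in tune_model plays the role of the score being minimized or maximized here.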
10-Plot a Model
10.1- Analyze the best model
evaluate_model(catboost)
This opens an interactive widget for browsing the available diagnostic plots for the model, such as residuals, prediction error, and feature importance.
10.2- Check the residuals of the trained model
plot_model(catboost, plot = 'residuals_interactive')
plot_model(catboost)
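A residual is simply the gap between an actual value and the model's prediction for it. The sketch below computes residuals by hand on made-up numbers, using the common actual-minus-predicted convention (some tools flip the sign). It is a plain-Python illustration, not what plot_model runs internally.

```python
# Made-up actual target values and model predictions for five rows.
actual = [2.5, 1.8, 3.2, 4.1, 0.9]
predicted = [2.3, 2.0, 3.0, 4.6, 1.0]

# Residual = actual - predicted; positive means the model under-predicted.
residuals = [a - p for a, p in zip(actual, predicted)]

# A mean residual far from zero suggests a systematic bias in the predictions.
mean_residual = sum(residuals) / len(residuals)

# Index of the row with the largest absolute error, worth inspecting by hand.
worst_row = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(residuals)
print(mean_residual, worst_row)
```

In a residual plot, a good model shows residuals scattered evenly around zero with no visible pattern.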
10.3-Check feature importance
plot_model(catboost, plot='feature')
10.4- Prediction Error Plot
plot_model(catboost, plot = 'error')
11-Predict on test / hold-out Sample
predict_model(catboost);
12- Make predictions on test data
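# data_unseen must be a DataFrame with the same feature columns as df.
# Illustrative stand-in only: ideally hold these rows out before calling setup().
data_unseen = df.sample(n=100, random_state=123)
unseen_predictions = predict_model(catboost, data=data_unseen)
unseen_predictions.head()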
13- Finalize Model for Deployment
final_catboost = finalize_model(catboost)
# Final CatBoost model parameters for deployment
print(final_catboost)
14-Saving the model
save_model(final_catboost, 'Final catboost Model 08Feb2020')
15-Loading the saved model
saved_final_catboost = load_model('Final catboost Model 08Feb2020')
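Saving and loading a model is a serialization round trip: write the trained object to disk, then read it back and use it for predictions. The sketch below shows that round trip with Python's built-in pickle module and a toy "model" stored as a dictionary. It is only an analogy: PyCaret's save_model stores the whole preprocessing pipeline along with the model, and the names `model_state`, `predict`, and `demo_model.pkl` are made up for this example.

```python
import os
import pickle
import tempfile

def predict(state, x):
    """Predict with a toy linear model stored as a plain dictionary."""
    return state["slope"] * x + state["intercept"]

# Toy stand-in for a trained model: just its learned parameters.
model_state = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_model.pkl")
    # Save: serialize the model to a file.
    with open(path, "wb") as handle:
        pickle.dump(model_state, handle)
    # Load: read it back into a new object.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

# The restored model makes the same prediction as the original.
print(predict(restored, 3.0), predict(model_state, 3.0))
```

Note that the name passed to load_model has to match the name used in save_model, just as the path has to match here.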
Conclusion
PyCaret significantly simplifies the regression modeling process by automating various steps such as preprocessing, model comparison, tuning, and evaluation. This makes it an excellent choice for beginners and professionals who want to quickly build and deploy machine learning models.
Would you like to explore PyCaret further? Try it out with different datasets and advanced configurations!