```python
# Define specific colors (same as CSS from the Quarto vapor theme)
background = '#1b133a'
pink = '#ea39b8'
purple = '#6f42c1'
blue = '#32fbe2'
```
## Introduction
Regression, a fundamental concept in machine learning, enables us to model relationships between variables. Whether predicting house prices based on square footage or analyzing the impact of advertising spend on sales, regression plays a pivotal role in understanding and predicting patterns in data.
## Data Visualization

I use the colors defined above in all of my blog's data visualizations, and I will use the following `plot_scatter` function to create the visualizations on this page:
```python
import matplotlib.pyplot as plt

def plot_scatter(X, X_Range, ax_plot, title):
    # Note: y is read from the enclosing (global) scope, not passed as a parameter
    fig, ax = plt.subplots()
    scatter = ax.scatter(X, y, color=blue, edgecolor='black', alpha=0.6)
    ax.plot(X_Range, ax_plot, color=purple, linewidth=2)
    ax.set_title(title, color=blue)
    ax.set_xlabel('Feature', color=blue)
    ax.set_ylabel('Target', color=blue)
    ax.grid(True, linestyle='--', alpha=0.5)
    ax.tick_params(axis='x', colors=blue)
    ax.tick_params(axis='y', colors=blue)
    ax.set_facecolor(pink)
    fig.set_facecolor(background)
    plt.show()
```
## Linear Regression
Linear regression is a straightforward method for modeling linear relationships between variables. In a simple linear regression model, the relationship is expressed as \(y=mx+b\), where \(y\) is the target variable, \(x\) is the feature, \(m\) is the slope, and \(b\) is the intercept. For multiple features, the equation becomes a linear combination: \(y = b + m_1 x_1 + m_2 x_2 + \dots + m_n x_n\).
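For the single-feature case, the least-squares slope and intercept even have a simple closed form. A minimal sketch with NumPy, using small illustrative data of my own (not the dataset used later on this page):

```python
import numpy as np

# Illustrative, noise-free data along y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Closed-form least-squares estimates for y = m*x + b
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()

print(m, b)  # recovers the slope (2) and intercept (1) exactly
```

scikit-learn performs this same least-squares fit under the hood, generalized to many features.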
Linear regression can be implemented effortlessly using Python’s scikit-learn library. Let’s generate a synthetic dataset and visualize the linear regression line:
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Generate synthetic linear data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Train linear regression model
linear_reg = LinearRegression()
linear_reg.fit(X, y)

# Visualize the linear regression line
ax_plot = linear_reg.predict(X)
plot_scatter(X, X, ax_plot, 'Linear Regression')
```
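Since the data was generated with a true slope of 3 and intercept of 4, it is worth inspecting the fitted model's `coef_` and `intercept_` attributes to see how close the estimates land. A quick sketch that regenerates the same synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as above: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

linear_reg = LinearRegression()
linear_reg.fit(X, y)

# The learned parameters should roughly recover the true slope and intercept
print(linear_reg.coef_[0][0], linear_reg.intercept_[0])
```

The estimates will not match the true values exactly because of the added Gaussian noise, but they should be in the right neighborhood.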
## Nonlinear Regression
While linear regression is powerful, not all relationships are linear. Nonlinear regression extends the concept by accommodating more complex patterns. Polynomial regression is a common technique, where the relationship between variables is expressed as a polynomial equation.
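Under the hood, polynomial regression is still linear regression on expanded features: `PolynomialFeatures` maps each input \(x\) to the columns \([1, x, x^2, \dots]\), and an ordinary linear model is then fit on those columns. A small sketch of the expansion step on its own:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=2, include_bias=True)
expanded = poly.fit_transform(x)

# Each row becomes [1, x, x**2], e.g. [1, 2, 4] for x = 2
print(expanded)
```

Because the model is linear *in these expanded features*, all of the machinery of linear regression still applies, even though the fitted curve is nonlinear in \(x\).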
Let’s use Python to implement nonlinear regression through polynomial regression. This time, we’ll generate a synthetic dataset with a nonlinear relationship:
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Generate synthetic nonlinear data
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

# Train nonlinear regression model (polynomial regression)
degree = 2
poly_reg = make_pipeline(PolynomialFeatures(degree), LinearRegression())
poly_reg.fit(X, y)

# Visualize the nonlinear regression curve
X_range = np.linspace(-3, 3, 100).reshape(-1, 1)
ax_plot = poly_reg.predict(X_range)
plot_scatter(X, X_range, ax_plot, 'Nonlinear Regression (Polynomial)')
```
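To quantify the fit beyond eyeballing the plot, the pipeline's `score` method returns the R² coefficient of determination. A sketch that regenerates the same synthetic data and scores the model on it (note that scoring on training data is optimistic; a held-out split would give a more honest estimate):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Same synthetic nonlinear data as above
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

poly_reg = make_pipeline(PolynomialFeatures(2), LinearRegression())
poly_reg.fit(X, y)

r2 = poly_reg.score(X, y)  # R² on the training data; 1.0 is a perfect fit
print(r2)
```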
The power of regression becomes evident when we visualize the results. Scatter plots with regression lines or curves help us understand how well the model captures the underlying patterns in the data.
In conclusion, linear and nonlinear regression are indispensable tools in the machine learning toolbox. While linear regression is effective for simple relationships, nonlinear regression allows us to model more complex patterns. The ability to implement these techniques with Python makes them accessible and applicable to a wide range of real-world scenarios.