Chapter 2: Regression Analysis Mastery

From linear relationships to complex polynomial modeling with mathematical foundations

Learning Objectives

  • Master linear regression theory and mathematical foundations
  • Understand polynomial regression and feature engineering
  • Learn multiple regression with feature importance analysis
  • Evaluate models using proper metrics (MSE, MAE, R²)
  • Recognize and prevent overfitting in regression models
  • Apply regularization techniques (Ridge, Lasso)

What is Regression?

Core Concept and Mathematical Foundation

Regression Analysis is a supervised learning technique used to predict continuous numerical values by modeling the relationship between input features and target variables.

🎯 The Fundamental Equation:

y = f(X) + ε
  • y: Target variable (what we want to predict)
  • f(X): The function we want to learn
  • X: Input features (independent variables)
  • ε: Error term (noise and unmeasured factors)
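To make the equation concrete, here is a minimal NumPy sketch (with entirely hypothetical numbers) that generates observations as a true function plus random noise:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical truth: f(x) = 3x + 10, with Gaussian noise playing the role of ε
X = rng.uniform(0, 10, size=100)        # input feature
epsilon = rng.normal(0, 2, size=100)    # error term (noise we never observe directly)
y = 3 * X + 10 + epsilon                # y = f(X) + ε

print(X[:3], y[:3])

Regression tries to recover f(X) from these noisy observations of y.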

Real-World Regression Examples:

Real Estate Pricing

Predict: House price

Features: Size, location, bedrooms, age

Why Linear: Generally, larger houses cost more

Stock Market Analysis

Predict: Stock price movement

Features: Trading volume, market indicators

Challenge: Non-linear, highly volatile

Weather Forecasting

Predict: Tomorrow's temperature

Features: Today's weather, pressure, humidity

Complexity: Seasonal patterns, non-linear trends

Linear Regression: The Foundation

Mathematical Deep Dive

Simple Linear Regression Formula:

y = β₀ + β₁x + ε
  • β₀ (Beta Zero): Y-intercept - value when x = 0
  • β₁ (Beta One): Slope - change in y per unit change in x
  • x: Independent variable (feature)
  • y: Dependent variable (target)
  • ε: Random error term

Key Assumptions of Linear Regression:

1️⃣ Linearity

The relationship between X and y is linear

Check: Scatter plots, residual plots

2️⃣ Independence

Observations are independent of each other

Important for time series and spatial data

3️⃣ Homoscedasticity

Constant variance of residuals

Check: Residuals vs fitted values plot

4️⃣ Normality

Residuals are normally distributed

Check: Q-Q plots, Shapiro-Wilk test (see the diagnostic sketch below)
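A quick diagnostic sketch for the last two assumptions using Matplotlib and SciPy. The names fitted and residuals below are placeholders for ŷ and y - ŷ from a model you have already trained:

import matplotlib.pyplot as plt
from scipy import stats

# 'fitted' and 'residuals' are assumed to come from an already-fitted model:
# fitted = model.predict(X), residuals = y - fitted

# Homoscedasticity / linearity check: residuals vs. fitted values (look for a shapeless cloud)
plt.scatter(fitted, residuals, alpha=0.6)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()

# Normality check: Q-Q plot and Shapiro-Wilk test
stats.probplot(residuals, dist='norm', plot=plt)
plt.show()
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")  # a small p-value suggests non-normal residuals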

How Linear Regression Works - The Math Behind the Magic:

Ordinary Least Squares (OLS) Method:

Linear regression finds the best line by minimizing the sum of squared residuals:

Objective Function:

Minimize: Σ(yᵢ - ŷᵢ)²

Where ŷᵢ = β₀ + β₁xᵢ (predicted value)

The Solution (for simple linear regression):

Slope (β₁):

β₁ = Σ((xᵢ - x̄)(yᵢ - ȳ)) / Σ((xᵢ - x̄)²)

Intercept (β₀):

β₀ = ȳ - β₁x̄
Intuition: The slope equals the correlation between x and y scaled by the ratio of their standard deviations (β₁ = r·s_y/s_x); the intercept forces the fitted line to pass through the mean point (x̄, ȳ).
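As a minimal sketch, the closed-form solution can be computed directly with NumPy (the x and y values below are hypothetical):

import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
beta_1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
beta_0 = y_bar - beta_1 * x_bar                                        # intercept

print(f"slope = {beta_1:.3f}, intercept = {beta_0:.3f}")
# Predictions are then y_hat = beta_0 + beta_1 * x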

Multiple Linear Regression

Extending to Multiple Features

Multiple Regression Formula:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε

Or in matrix form: y = Xβ + ε

  • p: Number of features
  • βⱼ: Coefficient for feature xⱼ
  • X: Design matrix (n × p matrix)
  • β: Parameter vector
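In this matrix form, the OLS estimate is β̂ = (XᵀX)⁻¹Xᵀy (the normal equations). A minimal NumPy sketch with a small hypothetical design matrix:

import numpy as np

# Hypothetical design matrix: a column of ones (intercept) plus two features
X = np.array([[1, 2.0, 3.0],
              [1, 1.0, 5.0],
              [1, 4.0, 2.0],
              [1, 3.0, 4.0]])
y = np.array([10.0, 12.0, 9.0, 11.0])

# Normal equations: solve (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # [intercept, beta_1, beta_2]

# In practice, prefer np.linalg.lstsq or sklearn's LinearRegression for numerical stability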

Feature Importance and Interpretation:

Coefficient Interpretation:
  • Magnitude: Larger |βⱼ| means more influence on prediction
  • Sign: Positive β increases y, negative β decreases y
  • Units: βⱼ represents change in y per unit change in xⱼ
⚠️ Important Caveat: Coefficients represent the effect of changing one feature while holding all others constant. In practice, features are often correlated!

Multicollinearity: When Features are Too Similar

❌ Problems with Highly Correlated Features:
  • Unstable coefficient estimates
  • Difficult to interpret individual feature importance
  • High variance in predictions
  • Numerical instability in matrix inversion
Detection Methods:
  • Correlation Matrix: Look for correlations > 0.8
  • Variance Inflation Factor (VIF): VIF > 10 indicates problems (a VIF sketch follows this list)
  • Condition Number: > 30 suggests multicollinearity
Solutions:
  • Remove highly correlated features
  • Use Principal Component Analysis (PCA)
  • Apply regularization (Ridge, Lasso)
  • Collect more data if possible
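A minimal VIF sketch using only scikit-learn: each feature is regressed on all the others, and VIFⱼ = 1 / (1 - R²ⱼ). The feature matrix X and the feature_names list are assumed to already exist:

import numpy as np
from sklearn.linear_model import LinearRegression

def vif_scores(X):
    """Variance Inflation Factor for each column of a 2-D feature matrix X."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)  # every feature except column j
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return vifs

# Example usage (X and feature_names are assumed to be your data):
# for name, v in zip(feature_names, vif_scores(X)):
#     print(f"{name}: VIF = {v:.2f}")  # VIF > 10 flags potential multicollinearity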

Polynomial Regression: Capturing Non-Linear Relationships

Beyond Straight Lines

🔄 Polynomial Transformation:

Polynomial regression extends linear regression by adding polynomial features:

Degree 2 (Quadratic):

y = β₀ + β₁x + β₂x² + ε

Degree 3 (Cubic):

y = β₀ + β₁x + β₂x² + β₃x³ + ε

General Form:

y = β₀ + β₁x + β₂x² + ... + β_d xᵈ + ε, where d is the polynomial degree
Key Insight: Polynomial regression is still linear in the parameters β! We just transform the features.
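A small scikit-learn sketch of the transformation itself, on a hypothetical single feature:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0]])   # one hypothetical feature

poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

print(poly.get_feature_names_out())   # ['x0' 'x0^2']
print(x_poly)                         # each row is [x, x²]
# Fitting LinearRegression on x_poly is still a linear model in the coefficients β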

The Bias-Variance Tradeoff

Underfitting (High Bias)
  • Model too simple
  • Cannot capture underlying pattern
  • Poor performance on both training and test data
  • Solution: Increase model complexity
Overfitting (High Variance)
  • Model too complex
  • Memorizes training data noise
  • Good training, poor test performance
  • Solution: Reduce complexity or add data
Sweet Spot: Find the optimal degree that minimizes total error = bias² + variance + noise, as the sketch below illustrates
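A minimal sketch on synthetic data (all names and numbers are hypothetical) that contrasts an underfit, a well-matched, and an overfit polynomial:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(120, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.5, size=120)   # true quadratic relationship + noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for degree in [1, 2, 12]:   # underfit, about right, likely overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_rmse = np.sqrt(mean_squared_error(y_tr, model.predict(X_tr)))
    test_rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    print(f"degree {degree:2d}: train RMSE = {train_rmse:.2f}, test RMSE = {test_rmse:.2f}")

Watch the gap between training and test error: an overfit model drives training error down while test error stays flat or grows.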

Choosing the Right Polynomial Degree

Practical Guidelines:
  • Degree 1: Linear relationship
  • Degree 2: One curve (parabola) - good for many real-world phenomena
  • Degree 3-4: More complex curves with multiple turns
  • Degree >5: Usually overfitting unless you have lots of data
Selection Methods:
  1. Cross-Validation: Test different degrees and pick the one with the best CV score (see the sketch after this list)
  2. Learning Curves: Plot training vs validation error
  3. Information Criteria: AIC, BIC balance fit and complexity
  4. Domain Knowledge: Physics/business understanding of relationship
Pro Tip: Start simple (degree 1-2) and increase complexity only if validation performance improves!
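A minimal cross-validation sketch for picking the degree with a scikit-learn pipeline (X and y are assumed to be your training features and target):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# X, y are assumed to already exist
best_degree, best_score = None, -np.inf
for degree in [1, 2, 3, 4]:
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          StandardScaler(),
                          LinearRegression())
    score = cross_val_score(model, X, y, cv=5, scoring='r2').mean()
    print(f"degree {degree}: mean CV R² = {score:.3f}")
    if score > best_score:
        best_degree, best_score = degree, score

print("Selected degree:", best_degree)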

Regression Evaluation Metrics

Measuring Model Performance

Essential Regression Metrics:

1️⃣ Mean Squared Error (MSE)
MSE = (1/n) Σ(yᵢ - ŷᵢ)²

Pros: Heavily penalizes large errors

Cons: Units are the square of y's units, hard to interpret

Use when: Large errors are especially bad

2️⃣ Root Mean Squared Error (RMSE)
RMSE = √MSE

Pros: Same units as y, interpretable

Cons: Still penalizes large errors heavily

Use when: You want MSE benefits with interpretability

3️⃣ Mean Absolute Error (MAE)
MAE = (1/n) Σ|yᵢ - ŷᵢ|

Pros: Robust to outliers, easy to interpret

Cons: Doesn't distinguish small vs large errors

Use when: You have outliers or all errors are equally bad

4️⃣ R-squared (R²)
R² = 1 - (SS_res / SS_tot)

Range: Typically 0 to 1 (higher is better); can be negative on held-out data when the model is worse than predicting the mean

Interpretation: % of variance explained

Caveat: Can be misleading with non-linear relationships
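The same metrics computed directly from their formulas, as a quick NumPy sketch with hypothetical actual and predicted values:

import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 11.0])   # hypothetical predictions

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_true - y_pred))

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R²={r2:.3f}")
# These match sklearn.metrics.mean_squared_error, mean_absolute_error, and r2_score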

Which Metric to Use?

  • RMSE: Most common, good for normally distributed errors
  • MAE: When you have outliers or skewed error distribution
  • R²: For understanding model explanatory power
  • Multiple metrics: Always use several metrics for complete picture!

Regularization: Preventing Overfitting

Ridge and Lasso Regression

Why Regularization?

When we have many features or polynomial terms, the model can become too complex and overfit. Regularization adds a penalty term to prevent this.

General Regularized Objective:

Minimize: MSE + λ × Penalty(β)

Where λ (lambda) controls the strength of regularization

Ridge Regression (L2)

Penalty = Σβⱼ²
Characteristics:
  • Shrinks coefficients toward zero
  • Keeps all features (no feature selection)
  • Good when all features are somewhat relevant
  • Handles multicollinearity well
Best for: Many relevant features

Lasso Regression (L1)

Penalty = Σ|βⱼ|
Characteristics:
  • Can set coefficients exactly to zero
  • Automatic feature selection
  • Produces sparse models
  • Good when only some features are relevant
Best for: Feature selection needed

Choosing λ (Regularization Strength):

  • λ = 0: No regularization (standard regression)
  • Small λ: Light penalty, close to unregularized
  • Large λ: Heavy penalty, coefficients shrink toward zero
  • λ → ∞: All coefficients approach zero (underfitting)
Selection Method: Use cross-validation to find the optimal λ, i.e. the value that minimizes validation error (see the sketch below).
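A minimal sketch using scikit-learn's built-in cross-validated estimators, RidgeCV and LassoCV (X_train_scaled and y_train are assumed to be your standardized features and target, as in the hands-on section below):

import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

alphas = np.logspace(-3, 3, 25)   # candidate λ values (scikit-learn calls them alpha)

ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_train_scaled, y_train)
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train_scaled, y_train)

print("Best Ridge alpha:", ridge_cv.alpha_)
print("Best Lasso alpha:", lasso_cv.alpha_)
print("Lasso features kept:", np.sum(lasso_cv.coef_ != 0))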

Key Takeaways and Best Practices

✅ Chapter 2 Mastery:

• Linear regression mathematical foundations and assumptions

• Multiple regression with feature importance interpretation

• Polynomial regression for non-linear relationships

• Comprehensive evaluation metrics (MSE, RMSE, MAE, R²)

• Overfitting detection and regularization techniques

• Practical model selection and validation strategies

🎓 Practical Guidelines for Regression Success:

  1. Always start simple: Begin with linear regression before trying polynomial
  2. Check assumptions: Plot residuals to verify linearity and homoscedasticity
  3. Handle multicollinearity: Use correlation matrices and VIF
  4. Use multiple metrics: Don't rely on R² alone
  5. Validate properly: Use cross-validation for model selection
  6. Consider regularization: Especially with many features or limited data
  7. Understand your domain: Let business knowledge guide feature engineering

Hands-On Python Implementation

Linear Regression with scikit-learn

Complete Boston Housing Example

# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston  # NOTE: removed in scikit-learn 1.2, so this requires scikit-learn < 1.2
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
# Load the dataset
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['PRICE'] = boston.target
print("Boston Housing Dataset Shape:", df.shape)
print("\nFeatures:", list(boston.feature_names))
print("\nFirst 5 rows:")
print(df.head())
# Exploratory Data Analysis
print("Dataset Info:")
print(df.info())
print("\nMissing values:")
print(df.isnull().sum())
# Statistical summary
print("\nStatistical Summary:")
print(df.describe())
# Correlation analysis
plt.figure(figsize=(12, 10))
correlation_matrix = df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, fmt='.2f')
plt.title('Boston Housing: Feature Correlation Matrix')
plt.tight_layout()
plt.show()
# Features most correlated with price
price_corr = correlation_matrix['PRICE'].abs().sort_values(ascending=False)
print("\nFeatures most correlated with PRICE:")
print(price_corr[1:6])
# Data Preparation
X = boston.data
y = boston.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Linear Regression Implementation

# 1. Simple Linear Regression (using one feature)
simple_model = LinearRegression()
X_simple = X_train[:, 5].reshape(-1, 1) # RM (average rooms)
X_simple_test = X_test[:, 5].reshape(-1, 1)
simple_model.fit(X_simple, y_train)
y_pred_simple = simple_model.predict(X_simple_test)
print("Simple Linear Regression (RM vs PRICE):")
print(f"Coefficient (slope): {simple_model.coef_[0]:.3f}")
print(f"Intercept: {simple_model.intercept_:.3f}")
print(f"R² Score: {r2_score(y_test, y_pred_simple):.3f}")
# Visualization
plt.figure(figsize=(10, 6))
plt.scatter(X_simple_test, y_test, alpha=0.7, label='Actual')
plt.plot(X_simple_test, y_pred_simple, color='red', linewidth=2, label='Predicted')
plt.xlabel('Average Rooms (RM)')
plt.ylabel('House Price ($1000s)')
plt.title('Simple Linear Regression: Rooms vs Price')
plt.legend()
plt.show()

Multiple Linear Regression

# Multiple Linear Regression
mlr_model = LinearRegression()
mlr_model.fit(X_train_scaled, y_train)
y_pred_mlr = mlr_model.predict(X_test_scaled)
# Calculate metrics
mse = mean_squared_error(y_test, y_pred_mlr)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred_mlr)
r2 = r2_score(y_test, y_pred_mlr)
print("Multiple Linear Regression Results:")
print(f"MSE: {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"MAE: {mae:.3f}")
print(f"R² Score: {r2:.3f}")
# Feature importance (coefficient analysis)
feature_importance = pd.DataFrame({
    'feature': boston.feature_names,
    'coefficient': mlr_model.coef_,
    'abs_coefficient': np.abs(mlr_model.coef_)
}).sort_values('abs_coefficient', ascending=False)
print("\nFeature Importance (by coefficient magnitude):")
print(feature_importance.head(10))

Polynomial Regression

# Polynomial Regression with different degrees
degrees = [1, 2, 3, 4]
poly_results = {}
for degree in degrees:
    # Create polynomial features
    poly_features = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly = poly_features.fit_transform(X_train_scaled)
    X_test_poly = poly_features.transform(X_test_scaled)

    # Train model
    poly_model = LinearRegression()
    poly_model.fit(X_train_poly, y_train)

    # Predictions
    y_train_pred = poly_model.predict(X_train_poly)
    y_test_pred = poly_model.predict(X_test_poly)

    # Calculate scores
    train_score = r2_score(y_train, y_train_pred)
    test_score = r2_score(y_test, y_test_pred)
    test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))

    poly_results[degree] = {
        'train_r2': train_score,
        'test_r2': test_score,
        'test_rmse': test_rmse,
        'features': X_train_poly.shape[1]
    }

# Display results
print("Polynomial Regression Results:")
print("Degree | Features | Train R² | Test R² | Test RMSE")
print("-" * 50)
for degree, results in poly_results.items():
    print(f" {degree} | {results['features']:3d} | {results['train_r2']:.3f} | {results['test_r2']:.3f} | {results['test_rmse']:.3f}")

Regularization: Ridge and Lasso

# Ridge Regression with different alpha values
alphas = [0.1, 1.0, 10.0, 100.0, 1000.0]
ridge_results = {}
for alpha in alphas:
    ridge_model = Ridge(alpha=alpha)
    ridge_model.fit(X_train_scaled, y_train)
    y_pred_ridge = ridge_model.predict(X_test_scaled)
    ridge_results[alpha] = {
        'r2': r2_score(y_test, y_pred_ridge),
        'rmse': np.sqrt(mean_squared_error(y_test, y_pred_ridge))
    }
# Lasso Regression
lasso_results = {}
for alpha in alphas:
    lasso_model = Lasso(alpha=alpha, max_iter=1000)
    lasso_model.fit(X_train_scaled, y_train)
    y_pred_lasso = lasso_model.predict(X_test_scaled)
    # Count non-zero coefficients (features kept by Lasso)
    non_zero_coefs = np.sum(lasso_model.coef_ != 0)
    lasso_results[alpha] = {
        'r2': r2_score(y_test, y_pred_lasso),
        'rmse': np.sqrt(mean_squared_error(y_test, y_pred_lasso)),
        'features_selected': non_zero_coefs
    }
# Display regularization results
print("Ridge Regression Results:")
print("Alpha | R² | RMSE")
print("-" * 20)
for alpha, results in ridge_results.items():
    print(f"{alpha:6.1f} | {results['r2']:.3f} | {results['rmse']:.3f}")
print("\nLasso Regression Results:")
print("Alpha | R² | RMSE | Features Selected")
print("-" * 35)
for alpha, results in lasso_results.items():
    print(f"{alpha:6.1f} | {results['r2']:.3f} | {results['rmse']:.3f} | {results['features_selected']:8d}")

Model Selection with GridSearchCV

# Grid search for optimal Ridge alpha
param_grid = {'alpha': [0.01, 0.1, 1, 10, 100, 1000]}
ridge_grid = GridSearchCV(
    Ridge(), param_grid, cv=5, scoring='r2', n_jobs=-1
)
ridge_grid.fit(X_train_scaled, y_train)
print("Best Ridge parameters:", ridge_grid.best_params_)
print("Best cross-validation score:", ridge_grid.best_score_.round(3))
# Final model evaluation
best_ridge = ridge_grid.best_estimator_
final_predictions = best_ridge.predict(X_test_scaled)
final_r2 = r2_score(y_test, final_predictions)
final_rmse = np.sqrt(mean_squared_error(y_test, final_predictions))
print(f"\nFinal Model Performance:")
print(f"Test R²: {final_r2:.3f}")
print(f"Test RMSE: {final_rmse:.3f}")
# Residual analysis
residuals = y_test - final_predictions
plt.figure(figsize=(12, 4))
# Residual plot
plt.subplot(1, 2, 1)
plt.scatter(final_predictions, residuals, alpha=0.7)
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
# Actual vs Predicted
plt.subplot(1, 2, 2)
plt.scatter(y_test, final_predictions, alpha=0.7)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'red', linewidth=2)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted')
plt.tight_layout()
plt.show()
Expected Results:
  • Linear Regression R² ≈ 0.67
  • Polynomial features improve performance but risk overfitting
  • Ridge/Lasso regularization prevents overfitting
  • GridSearchCV finds optimal hyperparameters
  • Final RMSE around 4-5 (thousands of dollars)

Congratulations!

You've completed Chapter 2 and built a solid foundation in Regression!
