Course: Machine Learning Fundamentals | Chapter: 2 | Difficulty: Beginner | Estimated Time: 90 min

Chapter 2: Regression Analysis

Learning Objectives

By the end of this chapter, you will be able to:

  • Explain the core machine-learning ideas behind Regression Analysis.
  • Connect Regression Analysis to practical model-building workflows.
  • Recognize common assumptions, pitfalls, and evaluation choices.

Chapter 2: Regression Analysis Mastery

From linear relationships to complex polynomial modeling with mathematical foundations

Learning Objectives

  • Master linear regression theory and mathematical foundations
  • Understand polynomial regression and feature engineering
  • Learn multiple regression with feature importance analysis
  • Evaluate models using proper metrics (MSE, MAE, R²)
  • Recognize and prevent overfitting in regression models
  • Apply regularization techniques (Ridge, Lasso)

What is Regression?

Core Concept and Mathematical Foundation

Regression Analysis is a supervised learning technique used to predict continuous numerical values by modeling the relationship between input features and target variables.

🎯 The Fundamental Equation:

y = f(X) + ε
  • y: Target variable (what we want to predict)
  • f(X): The function we want to learn
  • X: Input features (independent variables)
  • ε: Error term (noise and unmeasured factors)

Real-World Regression Examples:

Real Estate Pricing

Predict: House price

Features: Size, location, bedrooms, age

Why Linear: Generally, larger houses cost more

Stock Market Analysis

Predict: Stock price movement

Features: Trading volume, market indicators

Challenge: Non-linear, highly volatile

Weather Forecasting

Predict: Tomorrow's temperature

Features: Today's weather, pressure, humidity

Complexity: Seasonal patterns, non-linear trends

Linear Regression: The Foundation

Mathematical Deep Dive

Simple Linear Regression Formula:

y = β₀ + β₁x + ε
  • β₀ (Beta Zero): Y-intercept - value when x = 0
  • β₁ (Beta One): Slope - change in y per unit change in x
  • x: Independent variable (feature)
  • y: Dependent variable (target)
  • ε: Random error term

Key Assumptions of Linear Regression:

1️⃣ Linearity

The relationship between X and y is linear

Check: Scatter plots, residual plots

2️⃣ Independence

Observations are independent of each other

Often violated in time series and spatial data, where nearby observations tend to be correlated

3️⃣ Homoscedasticity

Constant variance of residuals

Check: Residuals vs fitted values plot

4️⃣ Normality

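Residuals are approximately normally distributed (this matters mainly for confidence intervals and hypothesis tests, not for the coefficient estimates themselves)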

Check: Q-Q plots, Shapiro-Wilk test
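
The residual checks above are normally done with plots. As a rough numeric stand-in, the sketch below fits a line with NumPy to synthetic data whose noise grows with x (a made-up setup for illustration) and measures whether the residual spread trends with the fitted values. The variable names and the correlation-based check are assumptions of this example, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=300)
# Synthetic data: the noise standard deviation grows with x (heteroscedastic).
y = 2.0 + 1.5 * x + rng.normal(scale=0.2 + 0.3 * x, size=300)

# Fit y = b0 + b1 * x by least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept in the model, OLS residuals average to zero by construction.
mean_resid = residuals.mean()

# Crude homoscedasticity check: under constant variance, |residual| should
# show no trend against the fitted values, so this correlation stays near 0.
spread_trend = np.corrcoef(fitted, np.abs(residuals))[0, 1]

print(f"mean residual: {mean_resid:.2e}")
print(f"corr(fitted, |residual|): {spread_trend:.2f}")
```

A clearly positive correlation here corresponds to the "funnel" shape you would see in a residuals vs fitted values plot.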

How Linear Regression Works - The Math Behind the Magic:

Ordinary Least Squares (OLS) Method:

Linear regression finds the best line by minimizing the sum of squared residuals:

Objective Function:

Minimize: Σ(yᵢ - ŷᵢ)²

Where ŷᵢ = β₀ + β₁xᵢ (predicted value)

The Solution (for simple linear regression):

Slope (β₁):

β₁ = Σ((xᵢ - x̄)(yᵢ - ȳ)) / Σ((xᵢ - x̄)²)

Intercept (β₀):

β₀ = ȳ - β₁x̄
Intuition: The slope equals the correlation between x and y scaled by the ratio of their standard deviations, β₁ = r × (s_y / s_x). The intercept ensures the line passes through the point (x̄, ȳ).
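
The closed-form solution above translates almost directly into code. The following is a minimal sketch using NumPy; the function name `fit_simple_ols` and the toy house-price numbers are invented for illustration.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (beta0, beta1) that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical toy data: house size (hundreds of sq ft) vs price (thousands).
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
y = np.array([200.0, 270.0, 330.0, 410.0, 470.0])

beta0, beta1 = fit_simple_ols(x, y)
print(beta0, beta1)  # approximately 64.0 and 13.6

# The same slope, computed as correlation times the ratio of standard deviations.
r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std(ddof=1) / x.std(ddof=1)
```

Both routes give the same slope, and plugging x̄ = 20 into the fitted line returns ȳ = 336, confirming that the line passes through (x̄, ȳ).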

Multiple Linear Regression

Extending to Multiple Features

Multiple Regression Formula:

y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε

Or in matrix form: y = Xβ + ε

  • p: Number of features
  • βⱼ: Coefficient for feature xⱼ
  • X: Design matrix (n × (p + 1), with a leading column of ones for the intercept β₀)
  • β: Parameter vector (β₀, β₁, ..., βₚ)
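
As a sketch of the matrix form, the example below builds a design matrix with a leading column of ones for the intercept (so it has p + 1 columns) and solves the least-squares problem with `np.linalg.lstsq`. The data are synthetic and the "true" coefficients are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
features = rng.normal(size=(n, p))

# Arbitrary illustrative coefficients: [beta0, beta1, beta2, beta3].
true_beta = np.array([4.0, 2.0, -1.0, 0.5])

# Design matrix: a column of ones (intercept) followed by the p features.
X = np.column_stack([np.ones(n), features])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Least-squares estimate of beta; lstsq avoids forming an explicit inverse.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

Using a least-squares solver rather than inverting XᵀX by hand is the numerically safer choice, especially when features are correlated.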

Feature Importance and Interpretation:

Coefficient Interpretation:
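  • Magnitude: Larger |βⱼ| means more influence on the prediction, but only when features are on comparable scales (for example, after standardization)
  • Sign: Positive βⱼ increases y as xⱼ increases; negative βⱼ decreases it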
  • Units: βⱼ represents change in y per unit change in xⱼ
⚠️ Important Caveat: Coefficients represent the effect of changing one feature while holding all others constant. In practice, features are often correlated!
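
Because each βⱼ is expressed per unit of xⱼ, its magnitude depends on the units of the feature. The sketch below, on synthetic data with made-up feature names, refits the same model after changing one feature's units and then after standardizing all features, which is one common way to make magnitudes comparable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
size_m2 = rng.uniform(50, 250, size=n)   # hypothetical feature: size in square metres
age_years = rng.uniform(0, 50, size=n)   # hypothetical feature: age in years
price = 50 + 2.0 * size_m2 - 1.5 * age_years + rng.normal(scale=5.0, size=n)

def ols(features, target):
    """Least-squares fit with an intercept; returns [beta0, beta1, ...]."""
    X = np.column_stack([np.ones(len(target)), features])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

# Same model, but size measured in square metres vs thousands of square metres.
beta_m2 = ols(np.column_stack([size_m2, age_years]), price)
beta_thousand = ols(np.column_stack([size_m2 / 1000.0, age_years]), price)

# Standardized features (zero mean, unit variance) give comparable magnitudes.
raw = np.column_stack([size_m2, age_years])
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)
beta_std = ols(standardized, price)

print("per m^2:       ", np.round(beta_m2[1:], 3))
print("per 1000 m^2:  ", np.round(beta_thousand[1:], 3))
print("standardized:  ", np.round(beta_std[1:], 3))
```

The size coefficient grows by a factor of 1000 when the unit changes, yet the model and its predictions are identical, so raw magnitudes alone say little about importance.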

Multicollinearity: When Features are Too Similar

❌ Problems with Highly Correlated Features:
  • Unstable coefficient estimates
  • Difficult to interpret individual feature importance
  • High variance in predictions
  • Numerical instability in matrix inversion
Detection Methods:
  • Correlation Matrix: Look for pairwise correlations with |r| > 0.8
  • Variance Inflation Factor (VIF): VIF > 10 is a common rule of thumb for serious multicollinearity
  • Condition Number: A value > 30 suggests multicollinearity
Solutions:
  • Remove highly correlated features
  • Use Principal Component Analysis (PCA)
  • Apply regularization (Ridge, Lasso)
  • Collect more data if possible
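
To make the VIF check concrete, here is a minimal NumPy sketch. It uses the standard definition VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² comes from regressing feature j on all the other features. The helper name `vif` and the synthetic features are assumptions of this example.

```python
import numpy as np

def vif(features):
    """Variance inflation factor for each column of a feature matrix."""
    features = np.asarray(features, dtype=float)
    n, p = features.shape
    out = np.empty(p)
    for j in range(p):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        # Regress feature j on the remaining features (with an intercept).
        X = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated feature

vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

The two near-duplicate features should show VIF values far above 10, while the unrelated feature stays close to 1.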

Polynomial Regression: Capturing Non-Linear Relationships

Beyond Straight Lines

🔄 Polynomial Transformation:

Polynomial regression extends linear regression by adding polynomial features:

Degree 2 (Quadratic):

y = β₀ + β₁x + β₂x² + ε

Degree 3 (Cubic):

y = β₀ + β₁x + β₂x² + β₃x³ + ε

General Form:

y = β₀ + β₁x + β₂x² + ... + β_d xᵈ + ε (d is the polynomial degree)
Key Insight: Polynomial regression is still linear in the parameters β! We just transform the features.
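
The "linear in the parameters" point can be seen in code: the sketch below expands x into the columns 1, x, x² with `np.vander` and then runs ordinary least squares on those columns. The data are noise-free and the coefficients are picked arbitrarily so that the recovered values are easy to check.

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Columns are x**0, x**1, ..., x**degree."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

x = np.linspace(-3, 3, 25)
y = 1.0 + 2.0 * x - 0.5 * x**2   # noise-free quadratic, for clarity

X_poly = polynomial_design_matrix(x, degree=2)

# Ordinary least squares on the transformed features.
beta_hat, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(beta_hat, 6))  # approximately [1.0, 2.0, -0.5]
```

Nothing about the solver changed; only the feature matrix did.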

The Bias-Variance Tradeoff

Underfitting (High Bias)
  • Model too simple
  • Cannot capture underlying pattern
  • Poor performance on both training and test data
  • Solution: Increase model complexity
Overfitting (High Variance)
  • Model too complex
  • Memorizes training data noise
  • Good training, poor test performance
  • Solution: Reduce complexity or add data
Sweet Spot: Choose the degree that minimizes expected test error, which decomposes as bias² + variance + irreducible noise
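
One way to see underfitting and overfitting is to compare training and test error across polynomial degrees. The sketch below does this on synthetic data; the ground-truth function sin(2x), the noise level, and the degrees 1, 3, and 12 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def true_f(x):
    return np.sin(2 * x)  # hypothetical ground-truth function

x_train = rng.uniform(-1, 1, size=30)
y_train = true_f(x_train) + rng.normal(scale=0.1, size=30)
x_test = rng.uniform(-1, 1, size=500)
y_test = true_f(x_test) + rng.normal(scale=0.1, size=500)

def mse_for_degree(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    X_train = np.vander(x_train, N=degree + 1, increasing=True)
    X_test = np.vander(x_test, N=degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = np.mean((y_train - X_train @ beta) ** 2)
    test_mse = np.mean((y_test - X_test @ beta) ** 2)
    return train_mse, test_mse

results = {d: mse_for_degree(d) for d in (1, 3, 12)}
for d, (tr, te) in results.items():
    print(f"degree {d:2d}: train MSE = {tr:.4f}, test MSE = {te:.4f}")
```

Training error can only go down as the degree increases, because each lower-degree model is a special case of the higher-degree one. Test error is the number to watch: it is high for the underfit line and typically rises again once the degree is large relative to the amount of training data.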

Choosing the Right Polynomial Degree

Practical Guidelines:
  • Degree 1: Linear relationship
  • Degree 2: One curve (parabola) - good for many real-world phenomena
  • Degree 3-4: More complex curves with multiple turns
  • Degree 5 and above: Prone to overfitting unless you have plenty of data
Selection Methods:
  1. Cross-Validation: Test different degrees, pick best CV score
  2. Learning Curves: Plot training vs validation error
  3. Information Criteria: AIC, BIC balance fit and complexity
  4. Domain Knowledge: Physics/business understanding of relationship
Pro Tip: Start simple (degree 1-2) and increase complexity only if validation performance improves!
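
Cross-validation for degree selection can be sketched with a hand-rolled k-fold loop. The example below uses NumPy's `polyfit` and `polyval` from `numpy.polynomial.polynomial`; the cubic ground truth, the noise level, and the helper name `kfold_cv_mse` are assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = P.polyfit(x[train], y[train], degree)   # fit on training folds
        pred = P.polyval(x[val], coefs)                 # predict held-out fold
        scores.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=80)
y = x**3 - 2 * x + rng.normal(scale=0.3, size=80)  # synthetic cubic relationship

cv_scores = {d: kfold_cv_mse(x, y, d) for d in range(1, 9)}
best_degree = min(cv_scores, key=cv_scores.get)

for d, score in cv_scores.items():
    print(f"degree {d}: CV MSE = {score:.3f}")
print("selected degree:", best_degree)
```

Degrees 1 and 2 cannot represent the cubic shape, so their CV error is much higher. Among the degrees that can, differences are often small, which is a good reason to prefer the simplest one with near-best validation error.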

Regression Evaluation Metrics

Measuring Model Performance

Essential Regression Metrics:

1️⃣ Mean Squared Error (MSE)
MSE = (1/n) Σ(yᵢ - ŷᵢ)²

Pros: Heavily penalizes large errors

Cons: Expressed in squared units of y, so hard to interpret

Use when: Large errors are especially bad

2️⃣ Root Mean Squared Error (RMSE)
RMSE = √MSE

Pros: Same units as y, interpretable

Cons: Still penalizes large errors heavily

Use when: You want MSE benefits with interpretability

3️⃣ Mean Absolute Error (MAE)
MAE = (1/n) Σ|yᵢ - ŷᵢ|

Pros: More robust to outliers than MSE, easy to interpret

Cons: Weights all errors linearly, so it does not single out large errors

Use when: You have outliers or all errors are equally bad

4️⃣ R-squared (R²)
R² = 1 - (SS_res / SS_tot)

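Range: At most 1 (higher is better); 0 matches always predicting the mean, and values can be negative on held-out data when the model does worse than that

Interpretation: Proportion of variance in y explained by the model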

Caveat: Can be misleading with non-linear relationships

Which Metric to Use?

  • RMSE: Most common, good for normally distributed errors
  • MAE: When you have outliers or skewed error distribution
  • R²: For understanding model explanatory power
  • Multiple metrics: Always use several metrics for complete picture!
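
The four metric formulas above fit in a few lines. This is a small worked example with NumPy; the helper name `regression_metrics` and the toy numbers are invented for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from the formulas in this section."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(residuals)),
        "R2": 1.0 - ss_res / ss_tot,
    }

y_true = [3, 5, 7, 9]

# Residuals are 1, 0, -1, -3: MSE = 11/4 = 2.75, MAE = 5/4 = 1.25, R^2 = 1 - 11/20 = 0.45.
good = regression_metrics(y_true, [2, 5, 8, 12])

# Predictions in reverse order: SS_res = 80 vs SS_tot = 20, so R^2 = -3.0.
bad = regression_metrics(y_true, [9, 7, 5, 3])

print(good)
print(bad)
```

In the first case the single error of 3 contributes 9 of the 11 squared-error units but only 3 of the 5 absolute-error units, which is the MSE vs MAE difference in miniature. The second case shows that R² drops below zero when predictions are worse than simply predicting the mean.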

Regularization: Preventing Overfitting

Ridge and Lasso Regression

Why Regularization?

When we have many features or polynomial terms, the model can become too complex and overfit. Regularization adds a penalty term to prevent this.

General Regularized Objective:

Minimize: MSE + λ × Penalty(β)

Where λ (lambda) controls the strength of regularization

Ridge Regression (L2)

Penalty = Σβⱼ²
Characteristics:
  • Shrinks coefficients toward zero
  • Keeps all features (no feature selection)
  • Good when all features are somewhat relevant
  • Handles multicollinearity well
Best for: Many relevant features
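
Ridge regression has a closed-form solution. The sketch below minimizes MSE + λ Σβⱼ² as written in this chapter, which leads to the linear system (XᵀX + nλI)β = Xᵀy on centered data. Leaving the intercept unpenalized by centering is a common convention assumed here; the function name `ridge_fit` and the synthetic, nearly collinear features are also illustrative.

```python
import numpy as np

def ridge_fit(features, y, lam):
    """Ridge regression minimizing MSE + lam * sum(beta_j^2); intercept unpenalized."""
    features = np.asarray(features, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = features.shape
    x_mean, y_mean = features.mean(axis=0), y.mean()
    Xc, yc = features - x_mean, y - y_mean   # centering removes the intercept
    # Setting the gradient to zero gives (Xc'Xc + n*lam*I) beta = Xc'yc.
    coef = np.linalg.solve(Xc.T @ Xc + n * lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
features = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

_, coef_ols = ridge_fit(features, y, lam=0.0)    # lam = 0 reduces to plain OLS
_, coef_ridge = ridge_fit(features, y, lam=0.1)

print("OLS coefficients:  ", np.round(coef_ols, 2))
print("Ridge coefficients:", np.round(coef_ridge, 2))
```

With collinear features the unregularized coefficients can be large and of opposite sign. The ridge coefficients have a smaller norm and tend to split the effect between the two features, but neither is set exactly to zero.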

Lasso Regression (L1)

Penalty = Σ|βⱼ|
Characteristics:
  • Can set coefficients exactly to zero
  • Automatic feature selection
  • Produces sparse models
  • Good when only some features are relevant
Best for: Feature selection needed
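
Lasso has no closed form, but coordinate descent with soft-thresholding is one standard way to solve it. The sketch below minimizes MSE + λ Σ|βⱼ| as written in this chapter (libraries often scale the loss differently, so their λ values are not directly comparable). The function names, the fixed iteration count, and the synthetic data are assumptions of this example.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_fit(features, y, lam, n_iters=200):
    """Coordinate descent for MSE + lam * sum(|beta_j|); intercept unpenalized."""
    X = features - features.mean(axis=0)
    yc = y - y.mean()
    n, p = X.shape
    coef = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iters):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial_resid = yc - X @ coef + X[:, j] * coef[j]
            rho = X[:, j] @ partial_resid
            coef[j] = soft_threshold(rho, n * lam / 2.0) / col_sq[j]
    intercept = y.mean() - features.mean(axis=0) @ coef
    return intercept, coef

rng = np.random.default_rng(6)
features = rng.normal(size=(200, 5))
# Only the first two features matter in this synthetic example.
y = 3.0 * features[:, 0] - 2.0 * features[:, 1] + rng.normal(scale=0.5, size=200)

intercept, coef = lasso_fit(features, y, lam=1.0)
print(np.round(coef, 3))
```

The three irrelevant features end up with coefficients of exactly zero, while the two relevant ones survive in shrunken form. That shrinkage of the surviving coefficients is the price paid for the automatic feature selection.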

Choosing λ (Regularization Strength):

  • λ = 0: No regularization (standard regression)
  • Small λ: Light penalty, close to unregularized
  • Large λ: Heavy penalty, coefficients shrink toward zero
  • λ → ∞: All coefficients approach zero (underfitting)
Selection Method: Use cross-validation to find optimal λ that minimizes validation error!
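
Here is a minimal sketch of picking λ by k-fold cross-validation, using closed-form ridge regression on degree-9 polynomial features. The λ grid, the synthetic data, and the helper names are arbitrary choices for illustration; in practice the grid is usually spaced logarithmically, as it is here.

```python
import numpy as np

def ridge_coefs(X, y, lam):
    """Closed-form ridge on centered data; objective is MSE + lam * sum(beta_j^2)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def cv_mse(features, y, lam, k=5, seed=0):
    """Average validation MSE over k folds for a given lam."""
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for val in np.array_split(idx, k):
        train = np.setdiff1d(idx, val)
        # Center using training statistics only, to avoid leaking validation data.
        mu_x, mu_y = features[train].mean(axis=0), y[train].mean()
        coef = ridge_coefs(features[train] - mu_x, y[train] - mu_y, lam)
        pred = mu_y + (features[val] - mu_x) @ coef
        scores.append(np.mean((y[val] - pred) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(7)
x = rng.uniform(-1, 1, size=60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)
features = np.vander(x, N=10, increasing=True)[:, 1:]   # columns x, x^2, ..., x^9

lambdas = [1e-6, 1e-4, 1e-2, 1.0, 100.0]
cv_scores = {lam: cv_mse(features, y, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)

for lam, score in cv_scores.items():
    print(f"lambda = {lam:g}: CV MSE = {score:.4f}")
print("selected lambda:", best_lam)
```

The largest λ shrinks every coefficient almost to zero, so the model predicts roughly the mean and scores poorly, which is the underfitting end of the list above.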

Key Takeaways and Best Practices

✅ Chapter 2 Mastery:

• Linear regression mathematical foundations and assumptions

• Multiple regression with feature importance interpretation

• Polynomial regression for non-linear relationships

• Comprehensive evaluation metrics (MSE, RMSE, MAE, R²)

• Overfitting detection and regularization techniques

• Practical model selection and validation strategies

🎓 Practical Guidelines for Regression Success:

  1. Always start simple: Begin with linear regression before trying polynomial
  2. Check assumptions: Plot residuals to verify linearity and homoscedasticity
  3. Handle multicollinearity: Use correlation matrices and VIF
  4. Use multiple metrics: Don't rely on R² alone
  5. Validate properly: Use cross-validation for model selection
  6. Consider regularization: Especially with many features or limited data
  7. Understand your domain: Let business knowledge guide feature engineering