Chapter 2: Regression Analysis Mastery
From linear relationships to complex polynomial modeling with mathematical foundations
Learning Objectives
- Master linear regression theory and mathematical foundations
- Understand polynomial regression and feature engineering
- Learn multiple regression with feature importance analysis
- Evaluate models using proper metrics (MSE, MAE, R²)
- Recognize and prevent overfitting in regression models
- Apply regularization techniques (Ridge, Lasso)
What is Regression?
Core Concept and Mathematical Foundation
Regression Analysis is a supervised learning technique used to predict continuous numerical values by modeling the relationship between input features and target variables.
🎯 The Fundamental Equation:
y = f(X) + ε
- y: Target variable (what we want to predict)
- f(X): The function we want to learn
- X: Input features (independent variables)
- ε: Error term (noise and unmeasured factors)
Real-World Regression Examples:
Real Estate Pricing
Predict: House price
Features: Size, location, bedrooms, age
Why Linear: Generally, larger houses cost more
Stock Market Analysis
Predict: Stock price movement
Features: Trading volume, market indicators
Challenge: Non-linear, highly volatile
Weather Forecasting
Predict: Tomorrow's temperature
Features: Today's weather, pressure, humidity
Complexity: Seasonal patterns, non-linear trends
Linear Regression: The Foundation
Mathematical Deep Dive
Simple Linear Regression Formula:
y = β₀ + β₁x + ε
- β₀ (Beta Zero): Y-intercept - the value of y when x = 0
- β₁ (Beta One): Slope - change in y per unit change in x
- x: Independent variable (feature)
- y: Dependent variable (target)
- ε: Random error term
Key Assumptions of Linear Regression:
1️⃣ Linearity
The relationship between X and y is linear
Check: Scatter plots, residual plots
2️⃣ Independence
Observations are independent of each other
Important for time series and spatial data
3️⃣ Homoscedasticity
Constant variance of residuals
Check: Residuals vs fitted values plot
4️⃣ Normality
Residuals are normally distributed
Check: Q-Q plots, Shapiro-Wilk test
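A quick way to run these checks in practice is to plot the residuals. The sketch below is a minimal example on synthetic data (assuming scikit-learn, SciPy, and Matplotlib are installed): a residuals-vs-fitted plot for linearity and homoscedasticity, a Q-Q plot for normality, and a Shapiro-Wilk test.

```python
# Minimal sketch (synthetic data): residual diagnostics for the assumption checks above.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 1.5, size=200)  # linear truth + noise

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs fitted: look for no pattern (linearity) and constant spread (homoscedasticity)
ax1.scatter(model.predict(X), residuals, alpha=0.5)
ax1.axhline(0, color="red", linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs fitted")

# Q-Q plot: points close to the line suggest roughly normal residuals
stats.probplot(residuals, dist="norm", plot=ax2)

# Shapiro-Wilk: a small p-value suggests non-normal residuals
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")

plt.tight_layout()
plt.show()
```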
How Linear Regression Works - The Math Behind the Magic:
Ordinary Least Squares (OLS) Method:
Linear regression finds the best line by minimizing the sum of squared residuals:
Objective Function:
minimize SSE = Σᵢ (yᵢ - ŷᵢ)²
Where ŷᵢ = β₀ + β₁xᵢ (predicted value)
The Solution (for simple linear regression):
Slope (β₁): β₁ = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ (xᵢ - x̄)²
Intercept (β₀): β₀ = ȳ - β₁x̄
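A minimal sketch of these closed-form estimates on synthetic data, checked against scikit-learn's LinearRegression (which minimizes the same squared-error objective); the numbers are made up for illustration.

```python
# Minimal sketch (synthetic data): OLS closed-form estimates vs scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 5.0 + 1.8 * x + rng.normal(0, 1.0, 100)   # true intercept 5.0, slope 1.8

# Closed-form solution
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

# scikit-learn minimizes the same sum of squared residuals
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"closed-form: intercept={beta_0:.3f}, slope={beta_1:.3f}")
print(f"sklearn:     intercept={model.intercept_:.3f}, slope={model.coef_[0]:.3f}")
```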
Multiple Linear Regression
Extending to Multiple Features
Multiple Regression Formula:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε
Or in matrix form: y = Xβ + ε
- p: Number of features
- βⱼ: Coefficient for feature xⱼ
- X: Design matrix (n × p, or n × (p+1) with an intercept column)
- β: Parameter vector
Feature Importance and Interpretation:
Coefficient Interpretation:
- Magnitude: Larger |βⱼ| means more influence on the prediction (magnitudes are only comparable across features on the same scale, e.g. after standardization)
- Sign: Positive β increases y, negative β decreases y
- Units: βⱼ represents change in y per unit change in xⱼ
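A minimal sketch of fitting a multiple regression and inspecting its coefficients. The data and feature names (size_sqm, bedrooms, age_years) are synthetic and purely illustrative; standardizing the features first makes the coefficient magnitudes comparable.

```python
# Minimal sketch (synthetic housing-style data): fit a multiple regression and
# inspect coefficients. Standardizing first makes coefficient magnitudes comparable.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame({
    "size_sqm": rng.uniform(40, 250, n),
    "bedrooms": rng.integers(1, 6, n),
    "age_years": rng.uniform(0, 60, n),
})
y = (50_000 + 2_000 * X["size_sqm"] + 10_000 * X["bedrooms"]
     - 800 * X["age_years"] + rng.normal(0, 20_000, n))

X_std = StandardScaler().fit_transform(X)          # zero mean, unit variance per feature
model = LinearRegression().fit(X_std, y)

for name, coef in zip(X.columns, model.coef_):
    print(f"{name:>10}: {coef:>12.1f}")            # sign = direction, |coef| = influence
```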
Multicollinearity: When Features are Too Similar
❌ Problems with Highly Correlated Features:
- Unstable coefficient estimates
- Difficult to interpret individual feature importance
- High variance in predictions
- Numerical instability in matrix inversion
Detection Methods:
- Correlation Matrix: Look for correlations > 0.8
- Variance Inflation Factor (VIF): VIF > 10 indicates problems (computed in the sketch after the solutions below)
- Condition Number: > 30 suggests multicollinearity
Solutions:
- Remove highly correlated features
- Use Principal Component Analysis (PCA)
- Apply regularization (Ridge, Lasso)
- Collect more data if possible
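A minimal sketch of the detection methods above, assuming pandas and statsmodels are available. The size_sqft feature is deliberately an almost exact copy of size_sqm so that the correlation matrix and VIF both flag it.

```python
# Minimal sketch (synthetic data): correlation matrix and VIF for multicollinearity.
# size_sqft is deliberately an almost exact copy of size_sqm.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 300
size_sqm = rng.uniform(40, 250, n)
X = pd.DataFrame({
    "size_sqm": size_sqm,
    "size_sqft": size_sqm * 10.764 + rng.normal(0, 5, n),
    "age_years": rng.uniform(0, 60, n),
})

print(X.corr().round(2))                             # look for |correlation| > 0.8

exog = np.column_stack([np.ones(n), X.values])       # VIF expects an intercept column
for i, name in enumerate(X.columns, start=1):
    print(f"VIF {name:>10}: {variance_inflation_factor(exog, i):.1f}")  # > 10 is a red flag
```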
Polynomial Regression: Capturing Non-Linear Relationships
Beyond Straight Lines
🔄 Polynomial Transformation:
Polynomial regression extends linear regression by adding polynomial features; the model is still linear in its coefficients, so it can be fit with ordinary least squares:
Degree 2 (Quadratic): y = β₀ + β₁x + β₂x² + ε
Degree 3 (Cubic): y = β₀ + β₁x + β₂x² + β₃x³ + ε
General Form (degree n): y = β₀ + β₁x + β₂x² + ... + βₙxⁿ + ε
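A minimal sketch of degree-2 polynomial regression using scikit-learn's PolynomialFeatures inside a pipeline, on synthetic data with a quadratic ground truth.

```python
# Minimal sketch (synthetic data): degree-2 polynomial regression as a pipeline.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 0] ** 2 + rng.normal(0, 1.0, 200)  # quadratic truth

# PolynomialFeatures expands x into [x, x²]; OLS then fits a linear model in that space
poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                           LinearRegression())
poly_model.fit(X, y)

lin = poly_model.named_steps["linearregression"]
print("intercept:", round(lin.intercept_, 3))
print("coefficients [x, x²]:", np.round(lin.coef_, 3))
```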
The Bias-Variance Tradeoff
Underfitting (High Bias)
- Model too simple
- Cannot capture underlying pattern
- Poor performance on both training and test data
- Solution: Increase model complexity
Overfitting (High Variance)
- Model too complex
- Memorizes training data noise
- Good training, poor test performance
- Solution: Reduce complexity or add data
Choosing the Right Polynomial Degree
Practical Guidelines:
- Degree 1: Linear relationship
- Degree 2: One curve (parabola) - good for many real-world phenomena
- Degree 3-4: More complex curves with multiple turns
- Degree >5: Usually overfitting unless you have lots of data
Selection Methods:
- Cross-Validation: Test different degrees, pick best CV score
- Learning Curves: Plot training vs validation error
- Information Criteria: AIC, BIC balance fit and complexity
- Domain Knowledge: Physics/business understanding of relationship
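A minimal sketch of the cross-validation approach on synthetic data: fit pipelines of increasing degree and compare their 5-fold cross-validated mean squared error.

```python
# Minimal sketch (synthetic data): pick the polynomial degree by 5-fold cross-validation.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 150)       # smooth non-linear truth

for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree {degree}: CV MSE = {-scores.mean():.3f} (+/- {scores.std():.3f})")
# The degree with the lowest CV MSE is the one to prefer.
```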
Regression Evaluation Metrics
Measuring Model Performance
Essential Regression Metrics:
1️⃣ Mean Squared Error (MSE)
MSE = (1/n) Σᵢ (yᵢ - ŷᵢ)²
Pros: Heavily penalizes large errors
Cons: Units are the square of y's units, hard to interpret directly
Use when: Large errors are especially bad
2️⃣ Root Mean Squared Error (RMSE)
RMSE = √MSE = √[(1/n) Σᵢ (yᵢ - ŷᵢ)²]
Pros: Same units as y, interpretable
Cons: Still penalizes large errors heavily
Use when: You want MSE benefits with interpretability
3️⃣ Mean Absolute Error (MAE)
MAE = (1/n) Σᵢ |yᵢ - ŷᵢ|
Pros: Robust to outliers, easy to interpret
Cons: Doesn't penalize large errors more heavily than small ones
Use when: You have outliers or all errors are equally bad
4️⃣ R-squared (R²)
R² = 1 - Σᵢ (yᵢ - ŷᵢ)² / Σᵢ (yᵢ - ȳ)²
Range: Typically 0 to 1 (higher is better); negative values are possible when the model fits worse than simply predicting the mean
Interpretation: % of variance explained
Caveat: Can be misleading with non-linear relationships, and it never decreases as features are added (use adjusted R² or cross-validation when comparing models)
Which Metric to Use?
- RMSE: Most common, good for normally distributed errors
- MAE: When you have outliers or skewed error distribution
- R²: For understanding model explanatory power
- Multiple metrics: Always use several metrics for complete picture!
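A minimal sketch of computing all four metrics on a held-out test set with scikit-learn (synthetic data; RMSE is taken as the square root of MSE).

```python
# Minimal sketch (synthetic data): MSE, RMSE, MAE, and R² on a held-out test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(300, 1))
y = 4.0 + 2.5 * X[:, 0] + rng.normal(0, 2.0, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"MSE : {mse:.3f}")
print(f"RMSE: {np.sqrt(mse):.3f}")                  # same units as y
print(f"MAE : {mean_absolute_error(y_test, y_pred):.3f}")
print(f"R²  : {r2_score(y_test, y_pred):.3f}")
```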
Regularization: Preventing Overfitting
Ridge and Lasso Regression
Why Regularization?
When we have many features or polynomial terms, the model can become too complex and overfit. Regularization adds a penalty term to prevent this.
General Regularized Objective:
minimize Σᵢ (yᵢ - ŷᵢ)² + λ · Penalty(β)
Where λ (lambda) controls the strength of regularization
Ridge Regression (L2)
Penalty term: λ Σⱼ βⱼ²
Characteristics:
- Shrinks coefficients toward zero
- Keeps all features (no feature selection)
- Good when all features are somewhat relevant
- Handles multicollinearity well
Lasso Regression (L1)
Penalty term: λ Σⱼ |βⱼ|
Characteristics:
- Can set coefficients exactly to zero
- Automatic feature selection
- Produces sparse models
- Good when only some features are relevant
Choosing λ (Regularization Strength):
- λ = 0: No regularization (standard regression)
- Small λ: Light penalty, close to unregularized
- Large λ: Heavy penalty, coefficients shrink toward zero
- λ → ∞: All coefficients approach zero (underfitting)
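A minimal sketch comparing ordinary least squares, Ridge, and Lasso on degree-8 polynomial features (synthetic data; scikit-learn calls the regularization strength alpha rather than λ, and the alpha values here are arbitrary, not tuned).

```python
# Minimal sketch (synthetic data): OLS vs Ridge vs Lasso on degree-8 polynomial features.
# scikit-learn calls the regularization strength `alpha` (λ above); values here are arbitrary.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 - 2.0 * X[:, 0] ** 2 + rng.normal(0, 1.0, 100)   # only x² really matters

for name, reg in [("OLS", LinearRegression()),
                  ("Ridge", Ridge(alpha=1.0)),
                  ("Lasso", Lasso(alpha=0.1, max_iter=10_000))]:
    model = make_pipeline(PolynomialFeatures(degree=8, include_bias=False),
                          StandardScaler(), reg)
    model.fit(X, y)
    coefs = model[-1].coef_
    print(f"{name:>5}: non-zero coefficients = {np.sum(np.abs(coefs) > 1e-6)}, "
          f"max |coef| = {np.abs(coefs).max():.2f}")
```

Lasso will typically drive many of the eight coefficients to exactly zero, while Ridge only shrinks them; in practice the strength is chosen by cross-validation (e.g. RidgeCV, LassoCV, or GridSearchCV).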
Key Takeaways and Best Practices
✅ Chapter 2 Mastery:
• Linear regression mathematical foundations and assumptions
• Multiple regression with feature importance interpretation
• Polynomial regression for non-linear relationships
• Comprehensive evaluation metrics (MSE, RMSE, MAE, R²)
• Overfitting detection and regularization techniques
• Practical model selection and validation strategies
🎓 Practical Guidelines for Regression Success:
- Always start simple: Begin with linear regression before trying polynomial
- Check assumptions: Plot residuals to verify linearity and homoscedasticity
- Handle multicollinearity: Use correlation matrices and VIF
- Use multiple metrics: Don't rely on R² alone
- Validate properly: Use cross-validation for model selection
- Consider regularization: Especially with many features or limited data
- Understand your domain: Let business knowledge guide feature engineering