Chapter 2: Foundation Models & Their Problems

Understanding the building blocks of ML and the fundamental problems that led to advanced techniques

Why Foundation Models Matter

Before diving into advanced techniques like XGBoost and Random Forests, we need to understand the foundation models they're built upon. These simple models reveal fundamental problems that drive innovation in machine learning.

The Two Pillars of ML

Most machine learning algorithms build upon two fundamental approaches:

📈 Linear Models

Core Idea: Fit a straight line (or hyperplane) through data

Examples: Linear Regression, Logistic Regression

Strength: Simple, fast, interpretable

Weakness: Can't capture complex patterns

🌳 Tree Models

Core Idea: Create decision rules by splitting data

Examples: Decision Trees, Rule-based systems

Strength: Captures complex patterns, interpretable

Weakness: Prone to overfitting

The Journey We'll Take

In this chapter, you'll discover:

  • How linear models work and their bias-variance tradeoff
  • Why decision trees overfit and how complexity affects performance
  • Interactive demos showing these problems in action
  • Why these problems led to regularization and ensemble methods

Foundation Model Selector

Choose your data characteristics to see which foundation model works best:

[Interactive widget: adjust three data characteristics to see a recommended foundation model.]

Linear Models: Simple but Powerful

Linear models are the foundation of machine learning. Despite their simplicity, they reveal fundamental concepts that apply to all ML algorithms.
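To make this concrete, here is a minimal sketch of fitting a linear regression, assuming scikit-learn and NumPy are available; the synthetic slope (2.5) and intercept (1.0) are illustrative choices, not values from the demo below.

```python
# Minimal sketch: fit a line to noisy synthetic data and recover its parameters.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # one feature
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1, 100)    # y = 2.5x + 1 + noise

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # should be roughly [2.5] and 1.0
```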

Linear Regression Demo

[Figure: linear regression fit over scattered data points, illustrating the bias-variance tradeoff.]

Interactive Parameters

[Interactive sliders; an example reading: bias 0.15, variance 0.08, total error 0.23.]

The Bias-Variance Tradeoff

Linear models demonstrate the fundamental bias-variance tradeoff in machine learning:

Low Variance
  • Consistent predictions across different datasets
  • Not sensitive to small data changes
  • Reliable and stable
High Bias (Potentially)
  • May be too simple for complex patterns
  • Underfitting on non-linear data
  • Limited model capacity
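For squared-error loss, the tradeoff can be written as: expected error = bias² + variance + irreducible noise. The sketch below (assuming scikit-learn; the sine-shaped ground truth, noise level, and bootstrap setup are illustrative choices) estimates the variance term empirically: a linear model's predictions barely move across resampled training sets, while an unpruned tree's swing widely.

```python
# Hedged sketch: estimate prediction variance across bootstrap resamples.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)      # non-linear ground truth
x_test = np.linspace(0, 6, 50).reshape(-1, 1)

def prediction_variance(make_model, n_runs=100):
    preds = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(X), len(X))       # bootstrap resample
        preds.append(make_model().fit(X[idx], y[idx]).predict(x_test))
    return np.mean(np.var(preds, axis=0))           # average variance over test points

print("linear:", prediction_variance(LinearRegression))       # low variance
print("tree  :", prediction_variance(DecisionTreeRegressor))  # much higher variance
```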

Linear Model Limitations

As you experiment with the parameters above, you'll notice:

  • Limited Flexibility: Can't capture non-linear relationships
  • Feature Engineering Required: You must manually create polynomial features (see the sketch below)
  • Assumption Heavy: Assumes a linear relationship between features and target
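For instance, a plain linear model cannot follow y = x²; the usual workaround is to build the non-linear features yourself. A sketch, assuming scikit-learn's PolynomialFeatures:

```python
# Sketch: manual feature engineering lets a linear model fit a curve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.2, 200)              # quadratic target

plain = LinearRegression().fit(X, y)
poly_X = PolynomialFeatures(degree=2).fit_transform(X)  # adds 1, x, x^2 columns
curved = LinearRegression().fit(poly_X, y)

print("plain R^2:", plain.score(X, y))        # poor: a line can't follow x^2
print("poly  R^2:", curved.score(poly_X, y))  # near 1.0 with the right features
```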

These limitations point us toward more flexible models: decision trees!

Decision Trees: Flexibility with a Cost

Decision trees solve the flexibility problem of linear models but introduce their own challenges. Let's explore what makes them powerful and problematic.

Decision Tree Overfitting

[Figure: an over-grown decision tree with many branches, fitting noise and generalizing poorly.]

Tree Complexity Control

[Interactive depth controls; an example reading: training accuracy 0.95, validation accuracy 0.72, overfitting gap 0.23.]

Interactive Overfitting Demonstration

See how decision tree complexity affects performance:

[Interactive slider from Simple to Complex, shown at Moderate Complexity (depth 3).]

Good balance between fitting the data and generalizing to new examples.
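You can reproduce this experiment in a few lines. The sketch below (assuming scikit-learn; the synthetic dataset and depth values are illustrative) sweeps the tree depth and prints the widening train/validation gap:

```python
# Sketch: watch the train/validation gap grow as tree depth increases.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)  # flip_y adds label noise
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 3, 5, 10, None]:  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth}: train={tree.score(X_tr, y_tr):.2f} "
          f"val={tree.score(X_val, y_val):.2f}")
```

Past a moderate depth, training accuracy keeps climbing while validation accuracy stalls or falls: the tree has started memorizing label noise.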

🎯 Decision Tree Strengths

  • Non-linear Patterns: Can capture complex relationships
  • Feature Interactions: Automatically finds feature combinations
  • Interpretability: Easy to visualize decision paths
  • Few Assumptions: No linearity or distributional assumptions about the data
  • Mixed Data Types: Handles numerical and categorical features

⚠️ Decision Tree Problems

  • Overfitting: Memorizes training data noise
  • Instability: Small data changes create different trees (see the sketch after this list)
  • Bias: Favors features with more split points
  • Limited Expressiveness: Axis-aligned splits only
  • High Variance: Very sensitive to training data
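Here is a sketch of that instability (assuming scikit-learn; the dataset and sample sizes are illustrative): two trees trained on mostly-overlapping samples can disagree on a visible fraction of predictions.

```python
# Sketch: trees trained on nearly identical data can disagree noticeably.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Two training sets that share most of their rows (260 of 300 overlap).
a = DecisionTreeClassifier(random_state=0).fit(X[:280], y[:280])
b = DecisionTreeClassifier(random_state=0).fit(X[20:], y[20:])

disagree = np.mean(a.predict(X) != b.predict(X))
print(f"predictions differ on {disagree:.0%} of points")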

The Fundamental Problems

Now you've seen both foundation models in action. Each has critical limitations that drive the need for more sophisticated approaches.

Problem #1: The Bias-Variance Tradeoff

Linear Models

High Bias, Low Variance

  • Consistent but potentially inaccurate
  • Underfits complex patterns
  • Limited model capacity
Decision Trees

Low Bias, High Variance

  • Flexible but inconsistent
  • Overfits to training data
  • Unstable predictions

Solution Preview: Regularization techniques (L1/L2) and Ensemble methods balance this tradeoff!
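As a quick preview of Chapter 3, here is a hedged sketch (assuming scikit-learn; alpha=10.0 is an arbitrary illustrative strength) of how L2 (Ridge) regularization shrinks a linear model's coefficients, trading a little bias for lower variance:

```python
# Preview sketch: L2 (Ridge) regularization shrinks coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                  # few samples, many features
y = X @ rng.normal(size=20) + rng.normal(0, 0.5, 50)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # alpha controls shrinkage strength

print("plain coef magnitude:", np.abs(plain.coef_).mean())
print("ridge coef magnitude:", np.abs(ridge.coef_).mean())  # noticeably smaller
```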

Problem #2: Single Model Limitations

Both foundation models suffer from being "single" models:

  • Limited Perspective: Each model has one way of looking at data
  • Sensitive to Data: Small changes can dramatically affect results
  • Error Propagation: When a single model is wrong, there is no second opinion to correct it

Solution Preview: Ensemble methods combine multiple models to overcome individual limitations!
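A preview sketch of the ensemble idea (assuming scikit-learn; the dataset and n_estimators=100 are illustrative choices): bagging trains many trees on bootstrap resamples and averages their votes, directly attacking the single-model variance problem.

```python
# Preview sketch: averaging many unstable trees yields a stabler model.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)

single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```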

Problem Demonstration

See how the same data affects different models:

[Interactive demo; example accuracy readings: Linear Model 0.85, Decision Tree 0.72, Ensemble (Preview) 0.91.]

Notice how the ensemble consistently outperforms individual models!
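You can reproduce this kind of comparison yourself. The sketch below is illustrative (assuming scikit-learn; your scores will differ from the demo's readings):

```python
# Sketch: compare a linear model, a single tree, and an ensemble on one dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           flip_y=0.1, random_state=0)

models = {
    "linear (logistic)": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "ensemble (random forest)": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.2f}")
```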

What's Coming Next

Now that you understand the fundamental problems, you're ready to see how the ML community solved them:

  • Chapter 3: Regularization techniques (L1/L2) that control overfitting
  • Chapter 4-6: Ensemble methods that combine multiple models
  • Chapter 7: XGBoost - the optimized implementation that wins competitions

Each solution directly addresses the problems you've discovered in this chapter!

Chapter 2 Quiz

Test your understanding of foundation models and their problems:

Question 1: What is the main tradeoff between linear models and decision trees?

  • Linear models are always better
  • Linear models have high bias/low variance, decision trees have low bias/high variance ✓
  • Decision trees are always more accurate
  • There is no tradeoff

Correct! This is the fundamental bias-variance tradeoff. Linear models are consistent but potentially too simple (high bias), while decision trees are flexible but unstable (high variance).

Question 2: Why do decision trees tend to overfit?

  • They are too simple
  • They can't capture complex patterns
  • They can memorize training data by creating very specific rules ✓
  • They require too much data

Exactly! Decision trees can keep splitting until they perfectly classify every training example, essentially memorizing the data rather than learning generalizable patterns.

Question 3: Which problem do ensemble methods primarily solve?

  • Making models run faster
  • Reducing the limitations of single models by combining multiple models ✓
  • Making models more interpretable
  • Reducing the amount of data needed

Perfect! Ensemble methods address the core limitation that single models have only one perspective on the data. By combining multiple models, we can leverage their different strengths and reduce individual weaknesses.