Chapter 2: Foundation Models & Their Problems

Understanding the building blocks of ML and the fundamental problems that led to advanced techniques

Why Foundation Models Matter

Before diving into advanced techniques like XGBoost and Random Forests, we need to understand the foundation models they're built upon. These simple models reveal fundamental problems that drive innovation in machine learning.

The Two Pillars of ML

Most machine learning algorithms build upon two fundamental approaches:

📈 Linear Models

Core Idea: Fit a straight line (or hyperplane) through data

Examples: Linear Regression, Logistic Regression

Strength: Simple, fast, interpretable

Weakness: Can't capture complex patterns

🌳 Tree Models

Core Idea: Create decision rules by splitting data

Examples: Decision Trees, Rule-based systems

Strength: Captures complex patterns, interpretable

Weakness: Prone to overfitting

The Journey We'll Take

In this chapter, you'll discover:

  • How linear models work and their bias-variance tradeoff
  • Why decision trees overfit and how complexity affects performance
  • Interactive demos showing these problems in action
  • Why these problems led to regularization and ensemble methods

Foundation Model Selector

Choose your data characteristics to see which foundation model works best:

[Interactive widget: adjust three data characteristics to see a recommended foundation model.]

Linear Models: Simple but Powerful

Linear models are the foundation of machine learning. Despite their simplicity, they reveal fundamental concepts that apply to all ML algorithms.
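To make this concrete, here is a minimal sketch of fitting a linear regression, assuming scikit-learn and NumPy are available; the synthetic slope (2.5) and intercept (1.0) are illustrative choices, not values from the demo below.

```python
# Minimal sketch: fit a line to noisy synthetic data and recover its parameters.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # one feature
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1, 100)    # y = 2.5x + 1 + noise

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # should be roughly [2.5] and 1.0
```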

Linear Regression Demo

[Figure: linear regression fit over scattered data points, illustrating the bias-variance tradeoff.]

Interactive Parameters

[Interactive sliders; an example reading: bias 0.15, variance 0.08, total error 0.23.]

The Bias-Variance Tradeoff

Linear models demonstrate the fundamental bias-variance tradeoff in machine learning:

Low Variance
  • Consistent predictions across different datasets
  • Not sensitive to small data changes
  • Reliable and stable
High Bias (Potentially)
  • May be too simple for complex patterns
  • Underfitting on non-linear data
  • Limited model capacity
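For squared-error loss, the tradeoff can be written as: expected error = bias² + variance + irreducible noise. The sketch below (assuming scikit-learn; the sine-shaped ground truth, noise level, and bootstrap setup are illustrative choices) estimates the variance term empirically: a linear model's predictions barely move across resampled training sets, while an unpruned tree's swing widely.

```python
# Hedged sketch: estimate prediction variance across bootstrap resamples.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)      # non-linear ground truth
x_test = np.linspace(0, 6, 50).reshape(-1, 1)

def prediction_variance(make_model, n_runs=100):
    preds = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(X), len(X))       # bootstrap resample
        preds.append(make_model().fit(X[idx], y[idx]).predict(x_test))
    return np.mean(np.var(preds, axis=0))           # average variance over test points

print("linear:", prediction_variance(LinearRegression))       # low variance
print("tree  :", prediction_variance(DecisionTreeRegressor))  # much higher variance
```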

Linear Model Limitations

As you experiment with the parameters above, you'll notice:

  • Limited Flexibility: Can't capture non-linear relationships
  • Feature Engineering Required: You must manually create polynomial features (see the sketch below)
  • Assumption Heavy: Assumes a linear relationship between features and target
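For instance, a plain linear model cannot follow y = x²; the usual workaround is to build the non-linear features yourself. A sketch, assuming scikit-learn's PolynomialFeatures:

```python
# Sketch: manual feature engineering lets a linear model fit a curve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.2, 200)              # quadratic target

plain = LinearRegression().fit(X, y)
poly_X = PolynomialFeatures(degree=2).fit_transform(X)  # adds 1, x, x^2 columns
curved = LinearRegression().fit(poly_X, y)

print("plain R^2:", plain.score(X, y))        # poor: a line can't follow x^2
print("poly  R^2:", curved.score(poly_X, y))  # near 1.0 with the right features
```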

These limitations point us toward more flexible models: decision trees!

Decision Trees: Flexibility with a Cost

Decision trees solve the flexibility problem of linear models but introduce their own challenges. Let's explore what makes them powerful and problematic.

Decision Tree Overfitting

[Figure: an over-grown decision tree with many branches, fitting noise and generalizing poorly.]

Tree Complexity Control

[Interactive depth controls; an example reading: training accuracy 0.95, validation accuracy 0.72, overfitting gap 0.23.]

Interactive Overfitting Demonstration

See how decision tree complexity affects performance:

[Interactive slider from Simple to Complex, shown at Moderate Complexity (depth 3).]

Good balance between fitting the data and generalizing to new examples.
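You can reproduce this experiment in a few lines. The sketch below (assuming scikit-learn; the synthetic dataset and depth values are illustrative) sweeps the tree depth and prints the widening train/validation gap:

```python
# Sketch: watch the train/validation gap grow as tree depth increases.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)  # flip_y adds label noise
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in [1, 3, 5, 10, None]:  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"depth={depth}: train={tree.score(X_tr, y_tr):.2f} "
          f"val={tree.score(X_val, y_val):.2f}")
```

Past a moderate depth, training accuracy keeps climbing while validation accuracy stalls or falls: the tree has started memorizing label noise.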

🎯 Decision Tree Strengths

  • Non-linear Patterns: Can capture complex relationships
  • Feature Interactions: Automatically finds feature combinations
  • Interpretability: Easy to visualize decision paths
  • Few Assumptions: No linearity or distributional assumptions about the data
  • Mixed Data Types: Handles numerical and categorical features

⚠️ Decision Tree Problems

  • Overfitting: Memorizes training data noise
  • Instability: Small data changes create different trees (see the sketch after this list)
  • Bias: Favors features with more split points
  • Limited Expressiveness: Axis-aligned splits only
  • High Variance: Very sensitive to training data
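Here is a sketch of that instability (assuming scikit-learn; the dataset and sample sizes are illustrative): two trees trained on mostly-overlapping samples can disagree on a visible fraction of predictions.

```python
# Sketch: trees trained on nearly identical data can disagree noticeably.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Two training sets that share most of their rows (260 of 300 overlap).
a = DecisionTreeClassifier(random_state=0).fit(X[:280], y[:280])
b = DecisionTreeClassifier(random_state=0).fit(X[20:], y[20:])

disagree = np.mean(a.predict(X) != b.predict(X))
print(f"predictions differ on {disagree:.0%} of points")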

The Fundamental Problems

Now you've seen both foundation models in action. Each has critical limitations that drive the need for more sophisticated approaches.

Problem #1: The Bias-Variance Tradeoff

Linear Models

High Bias, Low Variance

  • Consistent but potentially inaccurate
  • Underfits complex patterns
  • Limited model capacity
Decision Trees

Low Bias, High Variance

  • Flexible but inconsistent
  • Overfits to training data
  • Unstable predictions

Solution Preview: Regularization techniques (L1/L2) and Ensemble methods balance this tradeoff!
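As a quick preview of Chapter 3, here is a hedged sketch (assuming scikit-learn; alpha=10.0 is an arbitrary illustrative strength) of how L2 (Ridge) regularization shrinks a linear model's coefficients, trading a little bias for lower variance:

```python
# Preview sketch: L2 (Ridge) regularization shrinks coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                  # few samples, many features
y = X @ rng.normal(size=20) + rng.normal(0, 0.5, 50)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)            # alpha controls shrinkage strength

print("plain coef magnitude:", np.abs(plain.coef_).mean())
print("ridge coef magnitude:", np.abs(ridge.coef_).mean())  # noticeably smaller
```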

Problem #2: Single Model Limitations

Both foundation models suffer from being "single" models:

  • Limited Perspective: Each model has one way of looking at data
  • Sensitive to Data: Small changes can dramatically affect results
  • Error Propagation: When a single model is wrong, there is no second opinion to correct it

Solution Preview: Ensemble methods combine multiple models to overcome individual limitations!
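A preview sketch of the ensemble idea (assuming scikit-learn; the dataset and n_estimators=100 are illustrative choices): bagging trains many trees on bootstrap resamples and averages their votes, directly attacking the single-model variance problem.

```python
# Preview sketch: averaging many unstable trees yields a stabler model.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)

single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

print("single tree :", cross_val_score(single, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged, X, y, cv=5).mean())
```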

Problem Demonstration

See how the same data affects different models:

[Interactive demo; example accuracy readings: Linear Model 0.85, Decision Tree 0.72, Ensemble (Preview) 0.91.]

Notice how the ensemble consistently outperforms individual models!
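You can reproduce this kind of comparison yourself. The sketch below is illustrative (assuming scikit-learn; your scores will differ from the demo's readings):

```python
# Sketch: compare a linear model, a single tree, and an ensemble on one dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           flip_y=0.1, random_state=0)

models = {
    "linear (logistic)": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "ensemble (random forest)": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.2f}")
```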

What's Coming Next

Now that you understand the fundamental problems, you're ready to see how the ML community solved them:

  • Chapter 3: Regularization techniques (L1/L2) that control overfitting
  • Chapter 4-6: Ensemble methods that combine multiple models
  • Chapter 7: XGBoost - the optimized implementation that wins competitions

Each solution directly addresses the problems you've discovered in this chapter!

Chapter 2 Quiz

Test your understanding of foundation models and their problems:

Question 1: What is the main tradeoff between linear models and decision trees?

  • Linear models are always better
  • Linear models have high bias/low variance, decision trees have low bias/high variance ✓
  • Decision trees are always more accurate
  • There is no tradeoff

Correct! This is the fundamental bias-variance tradeoff. Linear models are consistent but potentially too simple (high bias), while decision trees are flexible but unstable (high variance).

Question 2: Why do decision trees tend to overfit?

  • They are too simple
  • They can't capture complex patterns
  • They can memorize training data by creating very specific rules ✓
  • They require too much data

Exactly! Decision trees can keep splitting until they perfectly classify every training example, essentially memorizing the data rather than learning generalizable patterns.

Question 3: Which problem do ensemble methods primarily solve?

  • Making models run faster
  • Reducing the limitations of single models by combining multiple models ✓
  • Making models more interpretable
  • Reducing the amount of data needed

Perfect! Ensemble methods address the core limitation that single models have only one perspective on the data. By combining multiple models, we can leverage their different strengths and reduce individual weaknesses.