Chapter 2: Decision Tree Mathematics

Dive deep into the mathematical foundations of decision trees, including entropy, information gain, and splitting criteria.

Learning Objectives

  • Understand entropy and its role in decision tree splitting
  • Learn how to calculate information gain
  • Master Gini impurity as an alternative splitting criterion
  • Understand different splitting criteria and when to use them
  • See mathematical concepts in action with interactive demos

Entropy & Information Theory

🎲 Think of Entropy Like a Coin Flip

Imagine you have a bag of coins. If every coin is fair (50% heads, 50% tails), you're very uncertain about the result of any flip: this is high entropy. If every coin is weighted to always land heads, you're certain about the outcome: this is low entropy.

Entropy measures the amount of uncertainty or randomness in a dataset. In decision trees, we use entropy to determine how "pure" or "impure" a group of data points is.

Mathematical Definition

Entropy is calculated using the formula:

H(S) = -Σ p(x) × log₂(p(x))

Where:

  • H(S) is the entropy of set S
  • p(x) is the proportion of examples in S that belong to class x, and the sum runs over all classes present in S
  • log₂ is the logarithm base 2
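
As a quick illustration, here is a minimal Python sketch of this formula (the entropy helper name and the example labels are just for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, measured in bits."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    # By convention the p * log2(p) term is 0 when p is 0; classes that never
    # appear simply aren't in the Counter, so they contribute nothing.
    return sum(-p * math.log2(p) for p in proportions)

print(entropy(["yes", "yes", "no", "no"]))    # 1.0 -> maximally uncertain (50/50)
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> perfectly pure
```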

Information Gain

🎁 Information Gain = How Much We Learn

Information gain measures how much we reduce uncertainty by splitting the data. If a split produces purer, better-organized groups, we gain a lot of information!

Information Gain Formula

Information Gain is calculated as:

IG(S, A) = H(S) - Σ (|Sᵥ|/|S|) × H(Sᵥ)

Where:

  • IG(S, A) is the information gain from splitting set S on attribute A
  • H(S) is the entropy of S before the split
  • Sᵥ is the subset of S where attribute A takes the value v
  • |Sᵥ|/|S| is the fraction of examples that fall into Sᵥ, and the sum runs over every value v of A
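
Reusing the entropy helper from the sketch above, a minimal Python version of this formula might look like the following (the function name and example split are illustrative):

```python
def information_gain(parent_labels, child_label_groups):
    """IG(S, A): entropy of the parent minus the size-weighted entropy
    of each child subset produced by the split."""
    total = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / total) * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Splitting four mixed labels into two pure halves removes all uncertainty:
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))   # 1.0 bit of information gained
```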

Gini Impurity

🎯 Gini: A Simpler Alternative to Entropy

Gini impurity is another way to measure how "mixed" a dataset is. It's computationally faster than entropy because it avoids computing logarithms, and it gives similar results in most cases.

Gini Impurity Formula

Gini impurity is calculated as:

Gini(S) = 1 - Σ p(x)²

Where:

  • Gini(S) is the Gini impurity of set S
  • p(x) is the proportion of class x in the set, and the sum runs over all classes
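
A minimal Python sketch of this formula (again, the gini name and example labels are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))      # 0.5 -> maximally mixed for two classes
print(gini(["yes", "yes", "yes", "yes"]))    # 0.0 -> perfectly pure
```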

Splitting Criteria

🔍 How Decision Trees Choose the Best Split

At each node, a decision tree evaluates every candidate split and picks the one that produces the most "pure" child groups. It's like trying different questions to see which one best separates the data.
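
To make this concrete, here is a brute-force sketch that tries every threshold on a single numeric feature and keeps the one with the highest information gain, reusing the information_gain helper from above (the feature values, labels, and function name are made up for illustration):

```python
def best_threshold_split(feature_values, labels):
    """Evaluate every candidate threshold on one numeric feature and
    return (best_threshold, best_gain) according to information gain."""
    best_threshold, best_gain = None, 0.0
    for threshold in sorted(set(feature_values)):
        left = [label for x, label in zip(feature_values, labels) if x <= threshold]
        right = [label for x, label in zip(feature_values, labels) if x > threshold]
        if not left or not right:   # skip "splits" that leave one side empty
            continue
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Hypothetical example: does splitting on age separate buyers from non-buyers?
ages   = [22, 25, 30, 35, 40, 45]
bought = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold_split(ages, bought))   # (30, 1.0): "age <= 30?" separates the classes perfectly
```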

Common Splitting Criteria

📊 Information Gain

Maximizes the reduction in entropy. Works well in most cases, but tends to favor attributes with many distinct values.

🎯 Gini Impurity

Minimizes the probability of misclassifying a randomly chosen example. Faster to compute than entropy.

📈 Information Gain Ratio

Information gain divided by split information. Reduces bias toward multi-valued attributes.
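
As a rough sketch of that idea, the gain ratio divides the information gain by the entropy of the split proportions themselves, reusing the earlier helpers (the gain_ratio name is illustrative):

```python
import math

def gain_ratio(parent_labels, child_label_groups):
    """Information gain divided by split information (the entropy of the
    split proportions), which penalizes splits into many small subsets."""
    total = len(parent_labels)
    split_info = sum(
        -(len(child) / total) * math.log2(len(child) / total)
        for child in child_label_groups if child
    )
    if split_info == 0:   # degenerate split: everything went to one child
        return 0.0
    return information_gain(parent_labels, child_label_groups) / split_info

parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(gain_ratio(parent, children))   # 1.0 (1 bit of gain / 1 bit of split information)
```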

Interactive Mathematics Demo

🧮 Calculate Entropy & Information Gain

Try different datasets and see how entropy and information gain change. This will help you understand the math behind decision tree splitting!
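
As a static stand-in for the interactive demo, the sketch below runs the helper functions defined earlier in this chapter on a small, made-up weather-style dataset (the data and the values in the comments are only illustrative):

```python
# Toy dataset: does the outlook predict whether we play?
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast", "sunny", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no",   "yes",      "yes",   "yes"]

# Group the play/don't-play labels by outlook value.
groups = {}
for o, label in zip(outlook, play):
    groups.setdefault(o, []).append(label)

print("parent entropy:  ", round(entropy(play), 3))                                  # ~0.954
print("parent Gini:     ", round(gini(play), 3))                                     # ~0.469
print("information gain:", round(information_gain(play, list(groups.values())), 3))  # ~0.266
print("gain ratio:      ", round(gain_ratio(play, list(groups.values())), 3))        # ~0.170
```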

Click "Generate Sample Data" to start

Mathematical calculations will appear here

Chapter 2 Quiz

🧠 Test Your Mathematical Understanding

Answer these questions to make sure you understand the math behind decision trees!

Question 1: What is the entropy of a perfectly pure dataset?

Question 2: Which splitting criterion is computationally fastest?