Chapter 2: Decision Tree Mathematics
Dive deep into the mathematical foundations of decision trees, including entropy, information gain, and splitting criteria.
Learning Objectives
- Understand entropy and its role in decision tree splitting
- Learn how to calculate information gain
- Master Gini impurity as an alternative splitting criterion
- Understand different splitting criteria and when to use them
- See mathematical concepts in action with interactive demos
Entropy & Information Theory
🎲 Think of Entropy Like a Coin Flip
Imagine you have a bag of coins. If all coins are fair (50% heads, 50% tails), you're very uncertain about what you'll get - this is high entropy. If all coins are weighted to always land heads, you're certain about the outcome - this is low entropy.
Entropy measures the amount of uncertainty or randomness in a dataset. In decision trees, we use entropy to determine how "pure" or "impure" a group of data points is.
Mathematical Definition
Entropy is calculated using the formula:
H(S) = -Σ p(x) × log₂(p(x))
Where:
- H(S) is the entropy of set S
- p(x) is the proportion of class x in the set
- log₂ is the logarithm base 2
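To make the formula concrete, here is a minimal Python sketch of the entropy calculation (NumPy-based; the helper name `entropy` is an illustrative choice, not part of any particular library):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S), in bits, of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # p(x): proportion of each class
    return -np.sum(p * np.log2(p))     # -Σ p(x) × log₂(p(x))

# A 50/50 "fair coin" bag is maximally uncertain:
print(entropy(["heads", "tails", "heads", "tails"]))   # 1.0
# An all-heads bag is perfectly pure (NumPy may print this as -0.0):
print(entropy(["heads", "heads", "heads", "heads"]))   # 0.0
```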
Information Gain
🎁 Information Gain = How Much We Learn
Information gain measures how much we reduce uncertainty by splitting the data. If a split produces much purer, better-organized groups, we gain a lot of information!
Information Gain Formula
Information Gain is calculated as:
IG(S, A) = H(S) - Σᵥ (|Sᵥ|/|S|) × H(Sᵥ)
Where:
- IG(S, A) is the information gain from splitting set S on attribute A
- Sᵥ is the subset of S for which attribute A takes value v
- |Sᵥ|/|S| is the fraction of samples that fall into Sᵥ
- H(Sᵥ) is the entropy of that subset
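Building on the entropy() helper above, a minimal sketch of this calculation might look like the following (again NumPy-based, with illustrative names):

```python
import numpy as np

def information_gain(values, labels):
    """IG(S, A): entropy of S minus the weighted entropy of the subsets Sᵥ."""
    values, labels = np.asarray(values), np.asarray(labels)
    weighted_child_entropy = 0.0
    for v in np.unique(values):
        subset = labels[values == v]        # Sᵥ: rows where attribute A equals v
        weight = len(subset) / len(labels)  # |Sᵥ| / |S|
        weighted_child_entropy += weight * entropy(subset)
    return entropy(labels) - weighted_child_entropy
```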
Gini Impurity
🎯 Gini: A Simpler Alternative to Entropy
Gini impurity is another way to measure how "mixed" a dataset is. It's computationally faster than entropy and gives similar results in most cases.
Gini Impurity Formula
Gini impurity is calculated as:
Gini(S) = 1 - Σ p(x)²
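Here p(x) is the same class proportion used for entropy. A matching Python sketch (purely illustrative, using NumPy):

```python
import numpy as np

def gini_impurity(labels):
    """Gini(S) = 1 - Σ p(x)²: chance of mislabeling a randomly drawn sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(["heads", "tails", "heads", "tails"]))  # 0.5: maximally mixed two-class set
print(gini_impurity(["heads", "heads", "heads", "heads"]))  # 0.0: perfectly pure set
```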
Splitting Criteria
🔍 How Decision Trees Choose the Best Split
Decision trees try every possible way to split the data and pick the one that gives the most "pure" groups. It's like trying different questions to see which one best separates the data; the sketch after the list below walks through this search.
Common Splitting Criteria
📊 Information Gain
Maximizes the reduction in entropy. A solid default, but it can favor attributes with many distinct values (see Information Gain Ratio below).
🎯 Gini Impurity
Measures the probability of misclassifying a randomly drawn sample; faster to compute than entropy because no logarithm is needed.
📈 Information Gain Ratio
Information gain divided by split information. Reduces bias toward multi-valued attributes.
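These criteria are only scoring functions; the tree still has to search over candidate attributes. The sketch below reuses the illustrative information_gain() helper from earlier to show that exhaustive search, picking the attribute with the highest gain (the dict-of-columns layout is an assumption for the example, not a library API):

```python
def best_split(dataset, attributes, labels):
    """Try every attribute and return the one whose split gives the highest information gain.

    `dataset` is assumed to be a dict mapping attribute name -> list of values,
    aligned row-by-row with `labels`.
    """
    best_attr, best_gain = None, -1.0
    for attr in attributes:
        gain = information_gain(dataset[attr], labels)
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr, best_gain
```

A full tree builder would apply this search recursively to each child subset until the groups are pure or a stopping rule (such as a maximum depth) is reached.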
Interactive Mathematics Demo
🧮 Calculate Entropy & Information Gain
Try different datasets and see how entropy and information gain change. This will help you understand the math behind decision tree splitting!
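As a static stand-in for the demo, here is a small worked example on a hypothetical eight-sample dataset, reusing the entropy() and information_gain() helpers from earlier:

```python
weather = ["sunny", "sunny", "sunny", "sunny", "rainy", "rainy", "rainy", "rainy"]
play    = ["yes",   "yes",   "yes",   "no",    "no",    "no",    "no",    "yes"]

print(entropy(play))                         # parent entropy: 1.0 (4 yes vs. 4 no)
print(entropy(play[:4]), entropy(play[4:]))  # each child: ≈ 0.811
print(information_gain(weather, play))       # gain: 1.0 - 0.811 ≈ 0.189
```

Changing the label mix and re-running these lines shows the same effect the demo illustrates: purer child groups mean lower child entropy and higher information gain.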
Click "Generate Sample Data" to start
Mathematical calculations will appear here
Chapter 2 Quiz
🧠 Test Your Mathematical Understanding
Answer these questions to make sure you understand the math behind decision trees!