Chapter 3: Python Implementation
Learn how to implement decision trees in Python using scikit-learn and build your own from scratch.
Learning Objectives
- Learn to use scikit-learn's DecisionTreeClassifier
- Understand key parameters and how to tune them
- Build a custom decision tree from scratch
- Visualize decision trees effectively
- Handle real-world datasets with decision trees
Using scikit-learn
🐍 scikit-learn: Your Decision Tree Toolkit
scikit-learn provides powerful, optimized implementations of decision trees that handle all the complex math for you. You just need to understand how to use them effectively!
Basic Implementation
Simple Decision Tree Example
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load sample data
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train decision tree
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"Accuracy: {accuracy:.2f}")
```
Key Parameters
🎯 criterion
Impurity measure used to evaluate split quality: "gini" (default) or "entropy"
📏 max_depth
Maximum depth of the tree (None lets it grow until all leaves are pure)
🍃 min_samples_split
Minimum number of samples a node must contain before it can be split
🍂 min_samples_leaf
Minimum number of samples required in each leaf node
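These parameters are passed directly to the constructor. For example, a more constrained tree than the default:

```python
from sklearn.tree import DecisionTreeClassifier

# A deliberately constrained tree: entropy criterion, capped depth,
# and minimum sample requirements to reduce overfitting
clf = DecisionTreeClassifier(
    criterion="entropy",   # split quality: "gini" (default) or "entropy"
    max_depth=4,           # tree can grow at most 4 levels deep
    min_samples_split=10,  # a node needs >= 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep >= 5 samples
    random_state=42,
)
```

The specific values here are just a starting point; good settings depend on your dataset, which is why tuning (covered below) matters.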
Custom Implementation
🔨 Building Your Own Decision Tree
While scikit-learn is powerful, understanding how to build a decision tree from scratch helps you truly understand the algorithm and customize it for specific needs.
Core Components
Node Class
```python
class TreeNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # Index of the feature to split on
        self.threshold = threshold  # Threshold value for the split
        self.left = left            # Left child (samples with feature <= threshold)
        self.right = right          # Right child (samples with feature > threshold)
        self.value = value          # Predicted class (leaf nodes only)

    def is_leaf(self):
        return self.value is not None
```
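To decide where each TreeNode splits, a from-scratch implementation needs an impurity measure and a search over candidate splits. Here is a minimal sketch using Gini impurity; `gini` and `best_split` are illustrative helper names, not part of any library:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum over classes of p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Exhaustively search every feature and threshold, returning the split
    that minimizes the weighted impurity of the two child nodes."""
    best_feature, best_threshold, best_impurity = None, None, float("inf")
    n = len(y)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            right = ~left
            if left.sum() == 0 or right.sum() == 0:
                continue  # skip splits that leave one side empty
            impurity = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if impurity < best_impurity:
                best_feature, best_threshold, best_impurity = feature, threshold, impurity
    return best_feature, best_threshold, best_impurity
```

A full tree builder would call `best_split` recursively, creating leaf TreeNodes when a stopping condition (pure node, depth limit, or too few samples) is reached. Note that scikit-learn's implementation is far more optimized; this brute-force search is for understanding, not speed.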
Tree Visualization
👁️ Seeing Your Decision Tree
Visualizing decision trees helps you understand how they make decisions and debug any issues. There are several ways to visualize trees in Python.
Text Visualization
Using sklearn.tree.export_text
```python
from sklearn.tree import export_text

# Export tree as text
tree_rules = export_text(clf, feature_names=iris.feature_names)
print(tree_rules)
```
Graph Visualization
Using sklearn.tree.plot_tree
```python
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.title("Decision Tree Visualization")
plt.show()
```
Parameters & Tuning
⚙️ Tuning Your Decision Tree
Decision trees have several parameters that control their behavior. Understanding these parameters is crucial for building effective models.
Preventing Overfitting
📏 Maximum Depth (max_depth)
Limit how deep the tree can grow
✂️ Minimum Samples Split (min_samples_split)
Require a minimum number of samples to create a split
🍂 Minimum Samples Leaf (min_samples_leaf)
Require a minimum number of samples in each leaf node
🔍 Maximum Features (max_features)
Limit how many features are considered at each split
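A common way to choose values for these parameters is a cross-validated grid search. A sketch using scikit-learn's GridSearchCV on the iris data (the candidate values below are illustrative, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try every combination of these overfitting controls with 5-fold CV
param_grid = {
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.2f}")
```

After fitting, `search.best_estimator_` is a tree retrained on the full data with the winning parameters, ready to use like any other classifier.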
Interactive Python Demo
🐍 Try Python Implementation
Experiment with different parameters and see how they affect the decision tree performance and structure!