Chapter 9: Linkage Criteria Methods

Master the mathematical foundations and practical implications of different linkage criteria: single, complete, average, Ward's method, and their impact on cluster formation

Learning Objectives

  • Understand mathematical formulations of all major linkage criteria
  • Analyze the computational complexity of each linkage method
  • Master single linkage and its connection to minimum spanning trees
  • Learn complete linkage and its compactness properties
  • Explore average linkage and UPGMA statistical foundations
  • Deep dive into Ward's method and variance minimization
  • Compare linkage methods and understand when to use each
  • Implement efficient algorithms for linkage computation

Linkage Criteria: The Heart of Hierarchical Clustering

Think of linkage criteria like different ways to measure how close two groups are:

  • Single linkage: Like measuring the distance between the closest pair of students from different groups
  • Complete linkage: Like measuring the distance between the farthest pair of students from different groups
  • Average linkage: Like measuring the average distance between all pairs of students from different groups
  • Ward's method: Like measuring how much the total variance increases when you merge two groups

Linkage criteria define how to measure the distance or dissimilarity between clusters, fundamentally determining the structure and properties of the resulting hierarchy. The choice of linkage criterion has profound effects on cluster shape, computational complexity, and the ability to recover different types of cluster structures. This chapter provides comprehensive mathematical analysis of the major linkage methods and practical guidance for selection.

Why Linkage Criteria Matter

Choosing the right linkage criterion helps you:

  • Get the right cluster shapes: Some methods create compact clusters, others create elongated ones
  • Handle different data types: Some methods work better with certain types of data
  • Control computational complexity: Some methods are faster than others
  • Recover specific structures: Different methods are better at finding different patterns

Mathematical Framework for Linkage Criteria

All linkage methods can be understood within a unified mathematical framework that defines inter-cluster distance functions.

General Linkage Framework

Basic Setup:

Given two clusters A and B, define the inter-cluster distance as:

D(A, B) = f({d(a, b) : a ∈ A, b ∈ B})

Where d(a, b) is the pairwise distance between points a and b, and f is an aggregation function.

Lance-Williams Formula:

Most linkage criteria can be expressed using the recursive Lance-Williams formula:

D(A∪B, C) = αₐD(A,C) + αᵦD(B,C) + βD(A,B) + γ|D(A,C) - D(B,C)|

Where αₐ, αᵦ, β, γ are parameters that define the specific linkage method.

Key Properties:
  • Monotonicity: merge heights never decrease for single, complete, average, and Ward linkage; centroid and median linkage can violate this
  • Symmetry: D(A,B) = D(B,A) always holds
  • Reducibility: distances can be updated incrementally as clusters merge, without recomputing from the raw points
  • Space complexity: only the current inter-cluster distances need to be stored
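
To make the recursive update concrete, here is a minimal Python sketch of the Lance-Williams update. The parameter table and function name are illustrative rather than taken from any particular library, and for Ward's method the inputs are assumed to be ΔWSS-style (squared-Euclidean-based) distances:

```python
# Minimal sketch of the Lance-Williams update; names are illustrative.
LW_PARAMS = {
    "single":   lambda nA, nB, nC: (0.5, 0.5, 0.0, -0.5),
    "complete": lambda nA, nB, nC: (0.5, 0.5, 0.0,  0.5),
    "average":  lambda nA, nB, nC: (nA / (nA + nB), nB / (nA + nB), 0.0, 0.0),
    "ward":     lambda nA, nB, nC: ((nA + nC) / (nA + nB + nC),
                                    (nB + nC) / (nA + nB + nC),
                                    -nC / (nA + nB + nC), 0.0),
}

def lance_williams_update(method, d_AC, d_BC, d_AB, nA, nB, nC):
    """Distance from the merged cluster A∪B to a third cluster C."""
    aA, aB, beta, gamma = LW_PARAMS[method](nA, nB, nC)
    return aA * d_AC + aB * d_BC + beta * d_AB + gamma * abs(d_AC - d_BC)

# Single linkage reduces to a minimum, complete linkage to a maximum:
print(lance_williams_update("single",   2.0, 3.0, 1.0, 4, 5, 6))  # 2.0
print(lance_williams_update("complete", 2.0, 3.0, 1.0, 4, 5, 6))  # 3.0
```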

Classification of Linkage Methods

Linkage criteria can be classified by their mathematical properties and the types of cluster structures they favor.

For each method: its mathematical definition, its Lance-Williams parameters, and its key characteristics.

  • Single (minimum): min{d(a,b) : a∈A, b∈B}; αₐ = αᵦ = 0.5, β = 0, γ = -0.5; chaining, elongated clusters
  • Complete (maximum): max{d(a,b) : a∈A, b∈B}; αₐ = αᵦ = 0.5, β = 0, γ = 0.5; compact, spherical clusters
  • Average (UPGMA): (1/(|A||B|)) Σ_{a∈A} Σ_{b∈B} d(a,b); αₐ = |A|/(|A|+|B|), αᵦ = |B|/(|A|+|B|), β = γ = 0; balanced, moderate compactness
  • Ward's method: increase in within-cluster sum of squares; αₐ = (|A|+|C|)/(|A|+|B|+|C|), αᵦ = (|B|+|C|)/(|A|+|B|+|C|), β = -|C|/(|A|+|B|+|C|), γ = 0; spherical, similar-sized clusters
  • Centroid: d(μ_A, μ_B), where μ denotes the centroid; αₐ = |A|/(|A|+|B|), αᵦ = |B|/(|A|+|B|), β = -αₐαᵦ, γ = 0; can violate monotonicity

Computational Complexity Overview

Different linkage criteria have varying computational requirements, affecting their scalability to large datasets.

Complexity Analysis Summary

Time Complexity (Basic Algorithms):
  • Single/Complete/Average: O(n³) using naive approach
  • Single (optimized): O(n² log n) using MST algorithms
  • Ward's method: O(n³) for basic implementation
  • SLINK/CLINK: O(n²) for single/complete linkage
Space Complexity:
  • Distance matrix: O(n²) for storing all pairwise distances
  • Optimized storage: O(n) for certain algorithms (SLINK)
  • Dendrogram storage: O(n) for final tree structure
Practical Scalability:
  • Small datasets (n < 1,000): All methods feasible
  • Medium datasets (1,000 < n < 10,000): Use optimized algorithms
  • Large datasets (n > 10,000): Consider approximation methods
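
As a practical starting point, all of the major methods are available in SciPy. A minimal sketch (the toy data below is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # illustrative toy data

for method in ["single", "complete", "average", "ward", "centroid"]:
    Z = linkage(X, method=method)      # (n-1) x 4 merge table
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(method, np.bincount(labels)[1:])           # cluster sizes
```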

Visualization: Linkage Criteria Comparison

[Interactive linkage comparison visualization; figure not reproduced here.]

Linkage Selection Guidelines

Choosing the appropriate linkage criterion depends on data characteristics, cluster shape expectations, and computational constraints.

Data Characteristics

  • Well-separated spherical clusters: Complete linkage or Ward's method
  • Elongated or irregular clusters: Single linkage
  • Mixed cluster shapes: Average linkage
  • Noisy data with outliers: Complete or Ward's method
  • Different cluster densities: Single linkage

Computational Considerations

  • Large datasets: Single linkage with MST optimization
  • Memory constraints: SLINK/CLINK algorithms
  • Real-time applications: Precomputed distance matrices
  • Approximate clustering: Sampling-based methods
  • Parallel processing: Divisible linkage criteria

Application Domains

  • Phylogenetics: UPGMA (average linkage)
  • Image segmentation: Ward's method
  • Social network analysis: Single linkage
  • Market segmentation: Ward's method
  • Gene expression analysis: Average or complete linkage

Single Linkage: Nearest Neighbor Clustering

Single linkage, also known as the nearest neighbor or minimum method, defines the distance between clusters as the minimum distance between any two points in different clusters. This criterion tends to create elongated clusters and is particularly effective at detecting arbitrarily shaped clusters and cluster chains.

Mathematical Definition and Properties

Single linkage has a simple yet powerful mathematical formulation with deep connections to graph theory.

Single Linkage Mathematical Framework

Basic Definition:

For clusters A and B, the single linkage distance is:

D_single(A, B) = min{d(a, b) : a ∈ A, b ∈ B}

Where d(a, b) is the distance between individual points a and b.

Lance-Williams Parameters:

For updating distances when merging clusters A and B:

  • αₐ = αᵦ = 0.5: Equal weighting of both clusters
  • β = 0: No dependency on distance between merged clusters
  • γ = -0.5: Takes minimum of the two distances

Resulting update formula:

D(A∪B, C) = 0.5·D(A,C) + 0.5·D(B,C) - 0.5·|D(A,C) - D(B,C)|

Equivalent Formulation:

The Lance-Williams formula simplifies to:

D(A∪B, C) = min{D(A,C), D(B,C)}

This shows that single linkage always takes the minimum distance.
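
A quick numeric check of this equivalence, using illustrative toy clusters:

```python
import numpy as np
from scipy.spatial.distance import cdist

def d_single(P, Q):
    """Single-linkage distance: minimum pairwise distance between two point sets."""
    return cdist(P, Q).min()

rng = np.random.default_rng(1)
A, B, C = (rng.normal(size=(5, 2)) + o for o in ([0, 0], [4, 0], [0, 4]))

# The Lance-Williams update with γ = -0.5 reduces to a plain minimum:
assert np.isclose(d_single(np.vstack([A, B]), C),
                  min(d_single(A, C), d_single(B, C)))
```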

Connection to Minimum Spanning Trees

Single linkage has a fundamental equivalence to minimum spanning tree algorithms, providing both theoretical insights and computational advantages.

MST-Single Linkage Equivalence Theorem

Main Result:

Theorem: Single linkage hierarchical clustering is equivalent to finding the minimum spanning tree of the complete graph and removing edges in decreasing order of weight.

Proof Sketch:
  1. MST construction: Build MST using Kruskal's or Prim's algorithm
  2. Edge removal: Remove edges in decreasing order of weight
  3. Connected components: At each step, connected components form clusters
  4. Equivalence: This produces identical hierarchy to single linkage
Algorithmic Implications:
  • Computational efficiency: Can use O(n² log n) MST algorithms
  • Space efficiency: Store only MST edges, not full distance matrix
  • Theoretical analysis: Leverage MST properties for clustering analysis
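
The equivalence can be checked directly with SciPy: the single-linkage merge heights are exactly the sorted MST edge weights. A sketch with illustrative toy data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))           # illustrative toy data

# Single-linkage merge heights equal the sorted MST edge weights.
D = squareform(pdist(X))
mst = minimum_spanning_tree(D).toarray()
mst_weights = np.sort(mst[mst > 0])    # the n-1 MST edge weights

Z = linkage(X, method="single")
assert np.allclose(np.sort(Z[:, 2]), mst_weights)
```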

Chaining Phenomenon

Single linkage's tendency to create chain-like clusters is both its strength and weakness, depending on the application.

Understanding the Chaining Effect

Mechanism:

Single linkage connects clusters through their closest points, which can create long chains of connected points even when the overall clusters are far apart.

When Chaining is Beneficial:
  • Elongated natural clusters: Rivers, roads, mountain ridges
  • Network structures: Social networks, protein interactions
  • Irregular cluster shapes: Non-convex or curved clusters
  • Variable density clusters: Clusters with different densities
When Chaining is Problematic:
  • Well-separated spherical clusters: Will be incorrectly merged
  • Noisy data: Noise points create unwanted connections
  • Outliers: Single outliers can bridge distinct clusters
  • Uniform cluster expectations: When expecting compact, similar-sized clusters

Visualization: Single Linkage Behavior

Image Description: A 2x2 grid demonstrating single linkage characteristics. Top-left: Two well-separated circular clusters with a few noise points creating a bridge between them. Top-right: Single linkage dendrogram showing these getting merged at a low height due to the bridge. Bottom-left: Elongated crescent-shaped clusters that naturally chain together. Bottom-right: Single linkage successfully identifying the crescent structure where other methods might fail. Each visualization includes merge order annotations and distance measurements.

This shows both the strength (handling irregular shapes) and weakness (chaining effect) of single linkage


Complete Linkage: Maximum Distance Clustering

Complete linkage, also known as the maximum or furthest neighbor method, defines the distance between clusters as the maximum distance between any two points in different clusters. This criterion creates compact, spherical clusters and is robust to outliers and noise.

Mathematical Definition and Properties

Complete linkage provides a conservative approach to cluster merging, ensuring that all points in merged clusters are relatively close to each other.

Complete Linkage Mathematical Framework

Basic Definition:

For clusters A and B, the complete linkage distance is:

D_complete(A, B) = max{d(a, b) : a ∈ A, b ∈ B}

Where d(a, b) is the distance between individual points a and b.

Lance-Williams Parameters:

For updating distances when merging clusters A and B:

  • αₐ = αᵦ = 0.5: Equal weighting of both clusters
  • β = 0: No dependency on distance between merged clusters
  • γ = 0.5: Takes maximum of the two distances

Resulting update formula:

D(A∪B, C) = 0.5·D(A,C) + 0.5·D(B,C) + 0.5·|D(A,C) - D(B,C)|

Equivalent Formulation:

The Lance-Williams formula simplifies to:

D(A∪B, C) = max{D(A,C), D(B,C)}

This shows that complete linkage always takes the maximum distance.
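
The mirror-image check of the single-linkage case, again with illustrative toy clusters:

```python
import numpy as np
from scipy.spatial.distance import cdist

def d_complete(P, Q):
    """Complete-linkage distance: maximum pairwise distance between two point sets."""
    return cdist(P, Q).max()

rng = np.random.default_rng(3)
A, B, C = (rng.normal(size=(5, 2)) + o for o in ([0, 0], [4, 0], [0, 4]))

# The Lance-Williams update with γ = +0.5 reduces to a plain maximum:
assert np.isclose(d_complete(np.vstack([A, B]), C),
                  max(d_complete(A, C), d_complete(B, C)))
```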

Compact Cluster Formation

Complete linkage's conservative merging strategy leads to the formation of compact, well-separated clusters.

Understanding Compact Cluster Formation

Mechanism:

Complete linkage only merges clusters when the maximum distance between any two points in the merged cluster remains small, ensuring tight cluster boundaries.

Advantages of Compact Clusters:
  • Well-defined boundaries: Clear separation between clusters
  • Robust to outliers: Outliers don't affect cluster merging decisions
  • Equal-sized clusters: Tends to create clusters of similar sizes
  • Interpretable results: Clusters have clear geometric meaning
When Complete Linkage is Ideal:
  • Spherical cluster expectations: When clusters are expected to be roughly spherical
  • Noisy data: When data contains outliers or noise points
  • Well-separated clusters: When clusters are clearly distinct
  • Equal importance: When all points within a cluster should be similar

Visualization: Complete Linkage Behavior

Image Description: A 2x2 grid demonstrating complete linkage characteristics. Top-left: Three well-separated circular clusters with some noise points. Top-right: Complete linkage dendrogram showing clear separation between clusters with high merge heights. Bottom-left: Comparison showing how complete linkage creates compact clusters while single linkage would chain them together. Bottom-right: Cluster boundaries showing tight, spherical formations with clear separation.

This demonstrates complete linkage's strength in creating compact, well-separated clusters


Average Linkage: Balanced Clustering

Average linkage, also known as UPGMA (Unweighted Pair Group Method with Arithmetic Mean), defines the distance between clusters as the average distance between all pairs of points in different clusters. This method provides a balanced approach between single and complete linkage.

Mathematical Definition and Properties

Average linkage offers a compromise between the extremes of single and complete linkage, providing moderate cluster compactness.

Average Linkage Mathematical Framework

Basic Definition:

For clusters A and B, the average linkage distance is:

D_average(A, B) = (1/|A||B|) Σ_{a∈A} Σ_{b∈B} d(a, b)

Where |A| and |B| are the sizes of clusters A and B, and d(a, b) is the distance between points a and b.

Lance-Williams Parameters:

For updating distances when merging clusters A and B:

  • αₐ = |A|/(|A|+|B|): Proportional to cluster A size
  • αᵦ = |B|/(|A|+|B|): Proportional to cluster B size
  • β = 0: No dependency on distance between merged clusters
  • γ = 0: No absolute difference term

Resulting update formula:

D(A∪B, C) = (|A|/(|A|+|B|))·D(A,C) + (|B|/(|A|+|B|))·D(B,C)

Weighted Average Interpretation:

The update formula shows that average linkage uses a weighted average where larger clusters have more influence on the resulting distance.
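
This size weighting is exact, not an approximation, as a short check shows (toy clusters are illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def d_average(P, Q):
    """UPGMA distance: mean of all pairwise distances between two point sets."""
    return cdist(P, Q).mean()

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 2))            # illustrative toy clusters
B = rng.normal(size=(7, 2))
C = rng.normal(size=(5, 2))

nA, nB = len(A), len(B)
direct  = d_average(np.vstack([A, B]), C)
updated = (nA / (nA + nB)) * d_average(A, C) + (nB / (nA + nB)) * d_average(B, C)
assert np.isclose(direct, updated)     # the size-weighted average is exact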

UPGMA and Phylogenetic Applications

Average linkage has deep roots in phylogenetic analysis and evolutionary biology, where it's known as UPGMA.

UPGMA in Phylogenetic Analysis

Historical Context:

UPGMA was originally developed for constructing phylogenetic trees from genetic distance data, making it one of the oldest hierarchical clustering methods.

Key Assumptions:
  • Molecular clock: assumes a constant rate of evolution across lineages
  • Ultrametric property: all leaves are equidistant from the root, so merge heights can be read as divergence times
  • Ultrametric distances: UPGMA recovers the true tree only when the input distances satisfy the ultrametric (three-point) condition, which is stronger than the triangle inequality
Modern Applications:
  • Gene expression analysis: Clustering genes with similar expression patterns
  • Taxonomic classification: Organizing species by genetic similarity
  • Protein family analysis: Grouping related protein sequences
  • Microbiome studies: Analyzing bacterial community structures

Visualization: Average Linkage Behavior

Image Description: A 2x2 grid demonstrating average linkage characteristics. Top-left: Mixed cluster shapes with varying densities. Top-right: Average linkage dendrogram showing moderate merge heights between single and complete linkage. Bottom-left: Comparison of cluster boundaries showing balanced compactness. Bottom-right: Phylogenetic tree example showing UPGMA application to genetic data with equal branch lengths from root.

This demonstrates average linkage's balanced approach and its phylogenetic applications

Computational Complexity and Efficiency

Average linkage has moderate computational requirements compared to other linkage methods.

Complexity Analysis

Time Complexity:

O(n² log n) overall for n data points:

  • Initial distance matrix: O(n²)
  • n-1 merge operations: O(n log n) each with heap-based nearest-neighbor tracking
  • Distance updates: O(n) per merge via the Lance-Williams formula
Space Complexity:

O(n²): For storing the distance matrix

Memory Optimization Strategies:
  • Lazy evaluation: Compute distances on-demand
  • Chunked processing: Process data in batches
  • Approximate methods: Use sampling for large datasets

Advantages and Limitations

Average linkage provides a balanced approach with specific strengths and weaknesses.

Advantages of Average Linkage

  • Balanced clustering: Compromise between single and complete linkage
  • Robust to outliers: Less sensitive than single linkage
  • Biological relevance: Natural for phylogenetic applications
  • Interpretable results: Clear cluster boundaries
  • Size-aware: Larger clusters have more influence

Limitations of Average Linkage

  • Computational cost: More expensive than single linkage
  • Memory requirements: Needs full distance matrix
  • Chaining tendency: Can still create elongated clusters
  • Parameter sensitivity: Results depend on distance metric choice


Ward's Method: Variance Minimization

Ward's method, also known as minimum variance clustering, merges clusters that result in the smallest increase in within-cluster variance. This method is particularly effective for creating compact, spherical clusters and is widely used in practice.

Mathematical Foundation

Ward's method minimizes the increase in within-cluster sum of squares (WSS) when merging clusters.

Ward's Method Mathematical Framework

Objective Function:

Ward's method minimizes the increase in within-cluster sum of squares:

ΔWSS(A,B) = WSS(A∪B) - WSS(A) - WSS(B)

Where WSS(C) = Σ_{x∈C} ||x - μ_C||² is the within-cluster sum of squares for cluster C.

Simplified Formula:

The increase in WSS can be simplified to:

ΔWSS(A,B) = (|A|·|B|)/(|A|+|B|) · ||μ_A - μ_B||²

Where |A| and |B| are cluster sizes, and μ_A, μ_B are cluster centroids.

Lance-Williams Parameters:

For updating distances when merging clusters A and B:

  • αₐ = (|A|+|C|)/(|A|+|B|+|C|): Size-weighted coefficient
  • αᵦ = (|B|+|C|)/(|A|+|B|+|C|): Size-weighted coefficient
  • β = -|C|/(|A|+|B|+|C|): Negative coefficient for cluster C
  • γ = 0: No absolute difference term
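
The closed-form ΔWSS expression can be verified directly against the definition (toy clusters below are illustrative):

```python
import numpy as np

def wss(C):
    """Within-cluster sum of squares around the cluster centroid."""
    return ((C - C.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 2))            # illustrative toy clusters
B = rng.normal(size=(9, 2)) + 3.0

# Direct increase in WSS versus the closed-form centroid expression:
delta_direct = wss(np.vstack([A, B])) - wss(A) - wss(B)
nA, nB = len(A), len(B)
delta_formula = (nA * nB) / (nA + nB) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)
assert np.isclose(delta_direct, delta_formula)
```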

Variance Minimization Properties

Ward's method has unique properties that make it particularly effective for certain types of data.

Key Properties of Ward's Method

Variance Minimization:
  • Optimal merging: Always merges clusters that minimize variance increase
  • Compact clusters: Tends to create spherical, well-separated clusters
  • Size sensitivity: Larger clusters have more influence on merging decisions
  • Monotonic increase: Merge heights always increase (ultrametric property)
Geometric Interpretation:
  • Centroid-based: Focuses on distances between cluster centroids
  • Size weighting: Accounts for cluster sizes in distance calculations
  • Variance preservation: Maintains low within-cluster variance

Advantages and Applications

Ward's method is particularly well-suited for specific types of clustering problems.

When to Use Ward's Method

Ideal Conditions:
  • Spherical clusters: When clusters are expected to be roughly spherical
  • Similar cluster sizes: When clusters should be of similar sizes
  • Well-separated data: When clusters are clearly distinct
  • Low noise: When data has minimal outliers
Common Applications:
  • Market segmentation: Customer clustering based on behavior
  • Image segmentation: Grouping similar pixels in images
  • Gene expression analysis: Clustering genes with similar expression patterns
  • Social network analysis: Identifying communities in networks

Visualization: Ward's Method Behavior

Image Description: A 2x2 grid demonstrating Ward's method characteristics. Top-left: Well-separated spherical clusters of similar sizes. Top-right: Ward's method dendrogram showing gradual increase in merge heights with compact cluster formation. Bottom-left: Comparison showing how Ward's method creates more balanced clusters than other linkage methods. Bottom-right: Variance plot showing how within-cluster variance increases minimally with each merge.

This demonstrates Ward's method's strength in creating compact, balanced clusters


Other Linkage Methods

Beyond the fundamental linkage methods, several specialized approaches have been developed for specific clustering scenarios and data types.

Centroid Linkage

Centroid linkage defines the distance between clusters as the distance between their centroids.

Centroid Linkage Characteristics

Mathematical Definition:

For clusters A and B with centroids μ_A and μ_B:

D_centroid(A, B) = ||μ_A - μ_B||

Where ||·|| is the Euclidean distance between centroids.

Lance-Williams Parameters:
  • αₐ = |A|/(|A|+|B|): Size-weighted coefficient
  • αᵦ = |B|/(|A|+|B|): Size-weighted coefficient
  • β = -|A||B|/(|A|+|B|)²: Negative size interaction term
  • γ = 0: No absolute difference term
Properties:
  • Non-monotonic: merge heights may decrease, producing dendrogram inversions (see the sketch below)
  • Size-sensitive: Larger clusters have more influence
  • Centroid-focused: Ignores cluster shape and spread
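
An inversion is easy to produce with three equidistant points; a small sketch using SciPy's centroid linkage:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Three points forming an equilateral triangle: after the first merge,
# the merged pair's centroid is *closer* to the remaining point than any
# two original points were to each other, so the second merge height drops.
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [1.0, np.sqrt(3.0)]])    # all pairwise distances equal 2

Z = linkage(X, method="centroid")
print(Z[:, 2])   # roughly [2.0, 1.73]: merge heights decrease (inversion)
```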

WPGMA and Median Linkage

WPGMA (Weighted Pair Group Method with Arithmetic Mean, also called McQuitty's method) is similar to average linkage but weights the two merging clusters equally regardless of their sizes. It is often confused with median linkage (WPGMC), which applies the same equal weighting to cluster centroids rather than to average distances.

WPGMA Characteristics

Mathematical Definition:

When clusters A and B merge, the WPGMA distance to any other cluster C is updated as:

D(A∪B, C) = (1/2) · [D(A,C) + D(B,C)]

Note that this is an update rule for the distance to a third cluster C: the new distance is the simple average of the two old distances, with no size weighting.

Lance-Williams Parameters:
  • αₐ = αᵦ = 0.5: Equal weighting for both clusters
  • β = 0: No dependency on distance between merged clusters
  • γ = 0: No absolute difference term
Properties:
  • Size-independent: Treats all clusters equally, so small clusters are not dominated by large ones
  • Simple computation: Easy to implement and understand
  • Related method: Median linkage (WPGMC) keeps αₐ = αᵦ = 0.5 but sets β = -1/4 and, like centroid linkage, can produce dendrogram inversions

Flexible Linkage

Flexible linkage is a parameterized method that allows tuning between different linkage behaviors.

Flexible Linkage Framework

General Formula:

Flexible linkage uses the Lance-Williams formula with:

αₐ = αᵦ = (1-β)/2, γ = 0

Where β is a parameter controlling the linkage behavior.

Parameter Effects:
  • β = 0: reduces to the WPGMA update above
  • β < 0: space-dilating behavior, producing compact, complete-linkage-like merges
  • β close to 1: space-contracting behavior, with single-linkage-like chaining
  • β = -0.25: the value recommended by Lance and Williams, and a common default (illustrated in the sketch after the list below)
Advantages:
  • Tunable behavior: Can be adjusted for specific data
  • Unified framework: Generalizes other linkage methods
  • Empirical optimization: Can be tuned based on validation
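
A minimal sketch of the flexible-beta update (the function name is illustrative):

```python
def flexible_update(d_AC, d_BC, d_AB, beta):
    """Flexible-beta Lance-Williams update: alpha_A = alpha_B = (1 - beta)/2, gamma = 0."""
    alpha = (1.0 - beta) / 2.0
    return alpha * d_AC + alpha * d_BC + beta * d_AB

# beta = 0 reproduces the WPGMA update; beta = -0.25 (the Lance-Williams
# recommendation) is mildly space-dilating.
print(flexible_update(2.0, 3.0, 1.0, beta=0.0))    # 2.5
print(flexible_update(2.0, 3.0, 1.0, beta=-0.25))  # 2.875
```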

Visualization: Other Linkage Methods

Image Description: A 2x2 grid comparing different linkage methods. Top-left: Centroid linkage showing non-monotonic dendrogram with inversions. Top-right: Median linkage showing balanced clustering behavior. Bottom-left: Flexible linkage with different β values showing tunable behavior. Bottom-right: Comparison table showing properties of all linkage methods.

This demonstrates the variety and flexibility of linkage methods

Method Comparison

Understanding the differences between linkage methods is crucial for selecting the most appropriate approach for your specific clustering problem.

Comparative Analysis

Each linkage method has distinct characteristics that make it suitable for different types of data and clustering objectives.

Linkage Method Properties Comparison

For each method: typical cluster shape, chaining tendency, outlier sensitivity, computational cost, and best use.

  • Single linkage: elongated, irregular shapes; high chaining; very sensitive to outliers; low cost; best for non-spherical clusters
  • Complete linkage: compact, spherical shapes; low chaining; robust to outliers; high cost; best for well-separated clusters
  • Average linkage: balanced shapes; moderate chaining; moderate outlier sensitivity; high cost; good general-purpose choice
  • Ward's method: compact, spherical shapes; low chaining; robust to outliers; high cost; best for similar-sized clusters
  • Centroid linkage: variable shapes; moderate chaining; moderate outlier sensitivity; medium cost; best for size-aware clustering
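
One practical way to compare methods on your own data is the cophenetic correlation, which measures how faithfully each dendrogram preserves the original pairwise distances. A sketch with illustrative toy data (two blobs plus a sparse noise bridge):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)
# Illustrative toy data: two tight blobs plus a sparse "bridge" of noise.
X = np.vstack([rng.normal([0, 0], 0.3, size=(40, 2)),
               rng.normal([4, 0], 0.3, size=(40, 2)),
               np.column_stack([np.linspace(0.5, 3.5, 5), np.zeros(5)])])

d = pdist(X)
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)
    c, _ = cophenet(Z, d)   # correlation between dendrogram and raw distances
    print(f"{method:>8}: cophenetic correlation = {c:.3f}")
```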

Selection Guidelines

Choosing the right linkage method depends on your data characteristics and clustering objectives.

Method Selection Criteria

Data Characteristics:
  • Cluster shape: Spherical vs. elongated vs. irregular
  • Cluster size: Similar vs. varying sizes
  • Noise level: Clean data vs. noisy data with outliers
  • Separation: Well-separated vs. overlapping clusters
Application Requirements:
  • Interpretability: Need for clear cluster boundaries
  • Computational resources: Time and memory constraints
  • Domain knowledge: Biological, social, or business context
  • Validation needs: Requirements for robust clustering

Performance Considerations

Different linkage methods have varying computational requirements and performance characteristics.

Computational Complexity Comparison

For each method: time complexity, extra space complexity, memory usage, and scalability.

  • Single linkage: O(n²) time (e.g. SLINK), O(n) extra space; low memory usage; scales well
  • Complete linkage: O(n² log n) time, O(n²) space; high memory usage; scales poorly
  • Average linkage: O(n² log n) time, O(n²) space; high memory usage; scales poorly
  • Ward's method: O(n² log n) time, O(n²) space; high memory usage; scales poorly
  • Centroid linkage: O(n² log n) time, O(n²) space; medium memory usage; moderate scalability

Visualization: Method Comparison

Image Description: A 2x3 grid showing the same dataset clustered with different linkage methods. Top row: Single linkage (elongated clusters), Complete linkage (compact clusters), Average linkage (balanced clusters). Bottom row: Ward's method (spherical clusters), Centroid linkage (size-aware clusters), and a comparison dendrogram showing different merge patterns.

This demonstrates how different linkage methods produce different clustering results on the same data


Test Your Linkage Methods Knowledge

Think of this quiz as a low-stakes practice test for linkage methods:

  • It's okay to get questions wrong: That's how you learn! Wrong answers help you identify what to review
  • Each question teaches you something: Even if you get it right, the explanation reinforces your understanding
  • It's not about the score: It's about making sure you understand the key concepts
  • You can take it multiple times: Practice makes perfect!

Evaluate your understanding of linkage criteria, their mathematical foundations, and practical applications.

What This Quiz Covers

This quiz tests your understanding of:

  • Single linkage: How to measure minimum distances between clusters
  • Complete linkage: How to measure maximum distances between clusters
  • Average linkage: How to measure average distances between clusters
  • Ward's method: How to minimize variance when merging clusters
  • Method comparison: When to use each linkage method

Don't worry if you don't get everything right the first time - that's normal! The goal is to learn.

Question 1: Which linkage method is most prone to chaining?





Question 2: What does Ward's method minimize?





Question 3: Which linkage method is most robust to outliers?





Question 4: What is the time complexity of most linkage methods?





Question 5: Which linkage method is also known as UPGMA?




