Chapter 2: Distance Metrics Fundamentals

Master the mathematical foundations of distance metrics, the building blocks of clustering algorithms

Learning Objectives

  • Understand the mathematical definitions and properties of Euclidean distance
  • Master Manhattan distance theory and geometric interpretations
  • Learn formal proofs of metric space properties
  • Analyze computational complexity and efficiency considerations
  • Apply distance metrics to real-world clustering problems
  • Compare and contrast different distance measures through interactive demos
  • Understand when to choose each metric for specific data types

Metric Space Theory: The Mathematical Foundation

Think of metric space theory like learning the rules of measurement:

  • Just like measuring distance with a ruler: There are certain rules that make sense - you can't have negative distances, the distance from A to B should be the same as from B to A
  • These rules apply everywhere: Whether you're measuring the distance between cities, comparing products, or analyzing data points
  • Understanding these rules helps you choose the right "ruler": Different situations need different ways of measuring
  • It's the foundation for everything else: Once you understand these basic rules, all distance metrics make sense

Before diving into specific distance metrics, we must understand the mathematical framework that underlies all distance measures in clustering. A metric space provides the formal foundation for measuring similarity and dissimilarity between data points.

Why Metric Space Theory Matters

Understanding metric space theory helps you:

  • Choose the right distance metric: Know which "ruler" to use for your specific problem
  • Understand why algorithms work: See the mathematical reasoning behind clustering methods
  • Design your own metrics: Create custom ways to measure similarity for your data
  • Troubleshoot problems: Understand when and why distance metrics might fail

The Four Rules of Distance Measurement

Think of these rules like the basic principles of any good measurement system:

  • Rule 1 - Non-negativity: You can't have a negative distance (like saying "New York is -50 miles from Boston")
  • Rule 2 - Identity of indiscernibles: If two points are in the exact same place, the distance between them is zero
  • Rule 3 - Symmetry: The distance from A to B is the same as from B to A (like driving to work and back)
  • Rule 4 - Triangle inequality: Going directly from A to C is never longer than going from A to B to C (the shortest distance between two points is a straight line)

Definition of a Metric Space

A metric space is an ordered pair (X, d) where X is a set and d is a metric on X. A metric d: X × X → ℝ is a function that satisfies four fundamental properties for all x, y, z ∈ X:

1. Non-negativity (Positivity)

d(x, y) ≥ 0

Mathematical definition: The distance between any two points is always non-negative.

In Plain English: You can't have a negative distance. It doesn't make sense to say "Point A is -5 units away from Point B."

Real-world analogy: Like saying "New York is -50 miles from Boston" - that's impossible!

2. Identity of Indiscernibles

d(x, y) = 0 ⟺ x = y

Mathematical definition: The distance is zero if and only if the two points are identical.

In Plain English: The only way two points can have zero distance between them is if they're actually the same point.

Real-world analogy: The distance from your house to your house is zero - because they're the same place!

3. Symmetry

d(x, y) = d(y, x)

Mathematical definition: The distance from x to y equals the distance from y to x.

In Plain English: Distance is the same whether you're going from A to B or from B to A.

Real-world analogy: Driving from New York to Boston is the same distance as driving from Boston to New York (assuming the same route).

4. Triangle Inequality

d(x, z) ≤ d(x, y) + d(y, z)

Mathematical definition: The direct distance between two points is always less than or equal to any indirect path through a third point.

In Plain English: Taking a direct route is never longer than taking a detour through a third point.

Real-world analogy: Flying directly from New York to Los Angeles is never longer than flying from New York to Chicago, then from Chicago to Los Angeles.
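
To make the four rules concrete, here is a minimal sketch (Python with NumPy) that spot-checks each axiom for Euclidean distance on random points. A numerical check is not a proof, but it shows exactly what each rule demands:

import numpy as np

def euclidean(x, y):
    """Straight-line (L2) distance between two points."""
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4))  # three random 4-dimensional points

assert euclidean(x, y) >= 0                                  # Rule 1: non-negativity
assert euclidean(x, x) == 0                                  # Rule 2: identity of indiscernibles
assert np.isclose(euclidean(x, y), euclidean(y, x))          # Rule 3: symmetry
assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)  # Rule 4: triangle inequality
print("All four metric axioms hold for this sample.")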

Metric Space Properties

[Figure: four separate diagrams illustrating each metric property with geometric examples]

Why These Properties Matter

Think of these properties like the safety rules for a measurement system:

  • Without these rules: Clustering algorithms could produce nonsensical results - like grouping points that are actually far apart
  • With these rules: We can trust that our distance measurements make sense and our clustering results are meaningful
  • They ensure consistency: No matter which algorithm you use, if it follows these rules, it will behave predictably
  • They match our intuition: These rules encode what we already know about distance from everyday experience

These four properties are not arbitrary mathematical abstractions—they encode our intuitive understanding of distance and ensure that clustering algorithms behave predictably and meaningfully.

Real-World Example: GPS Navigation

How these properties work in GPS systems:

  • Non-negativity: GPS never tells you a destination is "-2 miles away"
  • Identity: If you're already at your destination, GPS shows "0.0 miles"
  • Symmetry: The distance from Home to Work is the same as Work to Home (same route)
  • Triangle Inequality: The direct route GPS suggests is never longer than a detour through an extra waypoint

Without these properties, GPS would give you nonsensical directions!

Non-negativity Impact

What it means: All distances are positive numbers, making clustering results consistent and interpretable.

Real-world analogy: Like having a ruler that only shows positive measurements - you always know what "closer" means.

Why it matters: K-means centroids are always meaningful since every distance is non-negative, so the algorithm can reliably determine which points are closest to each center.

Without it: Algorithms might group points that are actually far apart, leading to nonsensical clusters.

Identity Importance

What it means: Identical points have zero distance between them, ensuring they're treated as the same entity.

Real-world analogy: Like two duplicate records for the same customer - clustering should treat them as a single entity.

Why it matters: Prevents artificial cluster fragmentation due to duplicate data points.

Without it: Identical points might be treated as separate, leading to artificial clusters.

Symmetry Significance

What it means: Distance from A to B is the same as from B to A, making clustering algorithms work consistently.

Real-world analogy: Like a two-way street - the distance is the same whether you're going north or south.

Why it matters: Hierarchical clustering linkage calculations require symmetric distances for consistent results.

Without it: Clustering might depend on the order you process the data, giving different results each time.

Triangle Inequality Utility

What it means: Direct paths are never longer than indirect ones, enabling efficient clustering algorithms.

Real-world analogy: Like GPS always finding the shortest route - no unnecessary detours.

Why it matters: DBSCAN uses triangle inequality to efficiently find neighbors, making it much faster.

Without it: Algorithms would have to check every possible path, making them extremely slow.

Mathematical Notation and Conventions

Throughout this course, we'll use consistent mathematical notation. Understanding this notation is crucial for following the theoretical developments.

Standard Notation

  • ℝⁿ: n-dimensional real vector space
  • x, y, z: Points/vectors in the space (typically column vectors)
  • xᵢ: The i-th component of vector x
  • ‖x‖: Norm of vector x
  • ⟨x, y⟩: Inner product (dot product) of vectors x and y
  • d(x, y): Distance between points x and y
  • ∀: "For all" (universal quantifier)
  • ∃: "There exists" (existential quantifier)
  • ⟺: "If and only if" (bidirectional implication)

Common Distance Families

Distance metrics can be classified into several major families, each with distinct mathematical properties and optimal use cases.

  • Lp Norms (Euclidean, Manhattan, Chebyshev): mathematical form (Σᵢ |xᵢ - yᵢ|ᵖ)^(1/p); best for continuous data and geometric problems
  • Angular Metrics (Cosine, Angular distance): based on vector angles; best for high-dimensional, sparse data
  • Edit Distances (Hamming, Levenshtein): count character/element operations; best for strings, sequences, and categorical data
  • Statistical Distances (Mahalanobis, Chi-squared): based on the data's distribution; best for correlated features and statistical data
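
Every Lp norm in the first family is an instance of the same Minkowski formula. A short sketch (plain Python with NumPy) shows how varying p recovers Manhattan (p = 1), Euclidean (p = 2), and Chebyshev (the limit as p → ∞):

import numpy as np

def minkowski(x, y, p):
    """Lp distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x, y = np.array([3.0, 4.0]), np.array([1.0, 2.0])
print(minkowski(x, y, 1))   # Manhattan: 2 + 2 = 4.0
print(minkowski(x, y, 2))   # Euclidean: sqrt(8) ≈ 2.83
print(np.abs(x - y).max())  # Chebyshev: max coordinate difference = 2.0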

Cache-Aware Computing

Memory Hierarchy Considerations:
  • Cache Lines: Modern CPUs load 64-byte cache lines
  • Spatial Locality: Adjacent memory accesses are faster
  • Temporal Locality: Recently accessed data is faster
  • Cache Misses: Can be 100x slower than cache hits
Optimization Strategies:
  • Row-major layout: Store points contiguously for better cache performance
  • Blocking: Process data in cache-sized chunks
  • Prefetching: Load next data while computing current
  • Memory alignment: Align data structures to cache line boundaries
Practical Impact:

Well-optimized distance calculations can be 5-10x faster than naive implementations, with Manhattan distance typically showing greater improvement due to simpler operations.
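
The exact speedup depends on hardware and language; in Python much of the gap below comes from interpreter overhead, with memory layout contributing on top. Still, this illustrative sketch shows the flavor of the comparison: a per-element loop versus one vectorized pass over a contiguous row-major array:

import timeit
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))  # row-major: each point is contiguous in memory
y = rng.normal(size=64)

def naive(X, y):
    # Python-level loops: poor locality and high per-element overhead.
    return [sum((a - b) ** 2 for a, b in zip(row, y)) ** 0.5 for row in X]

def vectorized(X, y):
    # Sequential reads through contiguous memory, amenable to SIMD.
    return np.sqrt(np.sum((X - y) ** 2, axis=1))

print("naive:     ", timeit.timeit(lambda: naive(X, y), number=3))
print("vectorized:", timeit.timeit(lambda: vectorized(X, y), number=3))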

Algorithm-Specific Optimizations

Different clustering algorithms can leverage specific properties of distance metrics for significant performance improvements.

K-means Optimizations

Euclidean Distance:
  • Squared distances: Avoid square root in comparison
  • Triangle inequality: Skip calculations when possible
  • Precompute centroids: Cache ‖centroid‖² values
  • BLAS libraries: Use optimized linear algebra
Manhattan Distance:
  • Early termination: Stop when distance exceeds threshold
  • Median updates: Use median instead of mean for centroids
  • Sparse optimization: Skip zero components
  • Integer arithmetic: Use when data allows
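
As a concrete illustration of the first Euclidean bullet, here is a minimal sketch of nearest-centroid assignment that compares squared distances, so the square root is never computed:

import numpy as np

def nearest_centroid(x, centroids):
    """Index of the centroid closest to x, compared via squared distances."""
    # sqrt is monotonically increasing, so comparing squared distances picks
    # the same winner as comparing true distances; we skip the square root.
    sq_dists = np.sum((centroids - x) ** 2, axis=1)
    return int(np.argmin(sq_dists))

centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
print(nearest_centroid(np.array([6.0, 4.0]), centroids))  # -> 1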

Hierarchical Clustering

Distance Matrix Optimization:
  • Symmetry: Compute only upper triangle
  • Sparse storage: Use compressed formats for large matrices
  • Incremental updates: Update only affected distances
  • Parallel computation: Distribute matrix calculations
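
Exploiting symmetry is built into standard tooling. SciPy's pdist, for example, returns only the upper triangle of the distance matrix in "condensed" form, storing n(n-1)/2 entries instead of n²; a small sketch:

import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
condensed = pdist(X, metric="euclidean")  # upper triangle only: [d(0,1), d(0,2), d(1,2)]
print(condensed)                          # [ 5. 10.  5.]
print(squareform(condensed))              # expand to the full symmetric matrix when needed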

Approximation Methods

For very large datasets, exact distance calculations may be too expensive. Various approximation methods can provide significant speedups with controlled accuracy loss.

Fast Approximation Techniques

  • Random Projections: Johnson-Lindenstrauss lemma for dimensionality reduction
  • Locality-Sensitive Hashing (LSH): Hash similar points to same buckets
  • Sampling: Use subset of features for distance estimation
  • Quantization: Reduce precision for faster computation
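
As a sketch of the first technique, project the data through a scaled Gaussian random matrix and compare one pairwise distance before and after. The dimensions are illustrative; the Johnson-Lindenstrauss lemma guarantees low distortion once the target dimension grows on the order of log n:

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1_000, 100                 # n points, original dim d, reduced dim k
X = rng.normal(size=(n, d))
R = rng.normal(size=(d, k)) / np.sqrt(k)  # scaled Gaussian projection matrix
X_proj = X @ R

orig = np.linalg.norm(X[0] - X[1])            # distance in the original space
proj = np.linalg.norm(X_proj[0] - X_proj[1])  # distance after projection
print(f"original: {orig:.2f}  projected: {proj:.2f}  ratio: {proj / orig:.3f}")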

Euclidean Distance: The Foundation of Geometric Clustering

Think of Euclidean distance like measuring with a ruler in any direction:

  • It's the "straight line" distance: Like measuring the shortest path between two points on a map
  • It works in any dimension: Whether you're measuring in 2D (like on a map) or 100D (like comparing products with 100 features)
  • It's what we naturally think of as distance: When someone asks "how far apart are these two cities?", this is what they mean
  • It's the foundation for most clustering: Many algorithms assume you're using this type of distance

Euclidean distance is the most intuitive and widely used distance metric in clustering. It represents the straight-line distance between two points in multidimensional space, making it the natural choice for many clustering algorithms.

Why Euclidean Distance is So Important

Euclidean distance is the "gold standard" because:

  • It matches our intuition: When you think "distance," you're thinking Euclidean distance
  • It works well with circular/spherical clusters: Like organizing people by height and weight
  • It's mathematically well-behaved: Follows all the metric space properties perfectly
  • It's computationally efficient: Fast to calculate, even with many dimensions

Understanding the Formula Step by Step

Let's break down the Euclidean distance formula like solving a puzzle:

  • Step 1 - Find the differences: For each feature, subtract the values (xᵢ - yᵢ)
  • Step 2 - Square the differences: This makes everything positive and emphasizes larger differences
  • Step 3 - Add them up: Sum all the squared differences (Σᵢ₌₁ᵈ)
  • Step 4 - Take the square root: This gives you the final distance

Real-world analogy: Like measuring the diagonal of a rectangle - you square the width and height, add them, then take the square root.
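
The four steps translate directly into code; a minimal sketch in plain Python:

import math

def euclidean_distance(x, y):
    """Euclidean distance, following the four steps above."""
    diffs = [xi - yi for xi, yi in zip(x, y)]  # Step 1: find the differences
    squared = [d ** 2 for d in diffs]          # Step 2: square them
    total = sum(squared)                       # Step 3: add them up
    return math.sqrt(total)                    # Step 4: take the square root

print(euclidean_distance((3, 4), (1, 2)))  # sqrt(8) ≈ 2.83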

Mathematical Definition

Euclidean Distance Formula

d_E(x, y) = √(Σᵢ₌₁ᵈ (xᵢ - yᵢ)²)
Formula Breakdown (In Plain English):
  • d_E(x, y): The Euclidean distance between points x and y
  • √: Square root (like finding the hypotenuse of a triangle)
  • Σᵢ₌₁ᵈ: "Add up for each feature" - go through each dimension
  • (xᵢ - yᵢ)²: "Square the difference" - like (3-1)² = 4

Example: If you have two points (3,4) and (1,2), the distance is √((3-1)² + (4-2)²) = √(4 + 4) = √8 ≈ 2.83

Where:

  • x, y: Two data points (like two customers, two products, etc.)
  • xᵢ, yᵢ: The value of feature i for each point (like height, weight, price)
  • d: The number of features/dimensions (like having 3 features: height, weight, age)

Vector Notation

d_E(x, y) = ||x - y||₂

This represents the L2 norm (Euclidean norm) of the vector difference between x and y.

Geometric Interpretation

In 2D space, Euclidean distance corresponds to the familiar Pythagorean theorem. For points (x₁, y₁) and (x₂, y₂):

d = √((x₂ - x₁)² + (y₂ - y₁)²)

This extends naturally to higher dimensions, where we sum the squared differences across all dimensions and take the square root.

Properties of Euclidean Distance

  • Scale Sensitivity: Euclidean distance is sensitive to the scale of features
  • Rotation Invariant: Distance remains unchanged under rotations
  • Translation Invariant: Distance is unaffected by translations
  • Computational Complexity: O(d) for d-dimensional vectors

Visualization: Euclidean Distance in Different Dimensions

[Figure: Euclidean distance calculations visualized in 2D, 3D, and higher dimensions]

Multi-dimensional Perspective: See how Euclidean distance scales with dimensionality and understand the geometric intuition behind the formula.

Manhattan Distance: The City Block Metric

Think of Manhattan distance like walking through a city with a grid layout:

  • You can't cut through buildings: Like a taxi in Manhattan, you have to follow the streets
  • You can only go up/down and left/right: No diagonal shortcuts allowed
  • It's the sum of horizontal and vertical distances: Add up all the blocks you walk
  • It's often longer than the straight-line distance: But more realistic for many situations

Manhattan distance, also known as L1 distance or taxicab distance, measures distance along axes at right angles. It's called "Manhattan distance" because it resembles the path a taxi would take through city streets that are laid out in a grid pattern.

Why Manhattan Distance Matters

Manhattan distance is perfect when:

  • You have high-dimensional data: Works better than Euclidean with many features
  • You want to be less sensitive to outliers: Large differences don't dominate as much
  • Your data has different scales: More robust to features measured in different units
  • You're dealing with sparse data: When most values are zero, Manhattan works better

Understanding the Manhattan Formula

Let's break down Manhattan distance like counting city blocks:

  • Step 1 - Find the differences: For each feature, subtract the values (xᵢ - yᵢ)
  • Step 2 - Take absolute values: Make everything positive (|xᵢ - yᵢ|)
  • Step 3 - Add them up: Sum all the absolute differences (Σᵢ₌₁ᵈ)
  • No square root needed: Unlike Euclidean, we don't take the square root

Real-world analogy: Like counting how many city blocks you need to walk - you add up horizontal blocks + vertical blocks.
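
The steps again map one-to-one onto code; a minimal sketch:

def manhattan_distance(x, y):
    """Manhattan (L1) distance, following the steps above."""
    diffs = [xi - yi for xi, yi in zip(x, y)]  # Step 1: find the differences
    absolute = [abs(d) for d in diffs]         # Step 2: take absolute values
    return sum(absolute)                       # Step 3: add them up (no square root)

print(manhattan_distance((3, 4), (1, 2)))  # |2| + |2| = 4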

Mathematical Definition

Manhattan Distance Formula

d_M(x, y) = Σᵢ₌₁ᵈ |xᵢ - yᵢ|
Formula Breakdown (In Plain English):
  • d_M(x, y): The Manhattan distance between points x and y
  • Σᵢ₌₁ᵈ: "Add up for each feature" - go through each dimension
  • |xᵢ - yᵢ|: "Absolute difference" - like |3-1| = 2 (always positive)

Example: If you have two points (3,4) and (1,2), the distance is |3-1| + |4-2| = 2 + 2 = 4

Where:

  • x, y: Two data points (like two addresses in a city)
  • |xᵢ - yᵢ|: The absolute difference in feature i (like "how many blocks apart in this direction")
  • d: The number of features/dimensions (like having 2 directions: north-south and east-west)

Vector Notation

d_M(x, y) = ||x - y||₁

This represents the L1 norm (Manhattan norm) of the vector difference between x and y.

Geometric Interpretation

In 2D space, Manhattan distance represents the sum of horizontal and vertical distances. For points (x₁, y₁) and (x₂, y₂):

d = |x₂ - x₁| + |y₂ - y₁|

Unlike Euclidean distance, Manhattan distance doesn't allow diagonal movement, making it more robust to outliers in individual dimensions.

Properties of Manhattan Distance

  • Outlier Robustness: Less sensitive to extreme values in individual dimensions
  • Feature Independence: Each dimension contributes independently to the total distance
  • Computational Efficiency: O(d) complexity, often faster than Euclidean distance
  • Discrete Optimization: Natural choice for integer-valued features

Visualization: Manhattan vs Euclidean Distance

[Figure: side-by-side comparison of Manhattan (L1) and Euclidean (L2) distance paths between the same two points]

Path Comparison: Visual demonstration of how Manhattan distance follows grid-like paths while Euclidean distance takes the direct route.

Optimization Techniques for Distance-Based Clustering

Think of optimization like finding the best way to organize a messy room:

  • You start with a goal: Make the room as organized as possible
  • You try different arrangements: Move items around to see what works better
  • You measure your progress: Keep track of how "good" each arrangement is
  • You stop when you can't improve anymore: You've found the best organization

Understanding how distance metrics are optimized in clustering algorithms is crucial for both theoretical understanding and practical implementation. Different optimization techniques are used depending on the clustering algorithm and the specific distance metric employed.

What is Optimization in Clustering?

Optimization in clustering means:

  • Finding the best way to group data: Like organizing books by topic instead of randomly
  • Minimizing the "cost" of clustering: Making sure similar items are together
  • Using mathematical techniques: Algorithms that automatically find good solutions
  • Iteratively improving: Starting with a guess and getting better over time

K-means Optimization with Euclidean Distance

Objective Function

J = Σᵢ₌₁ᵏ Σₓ∈Cᵢ ||x - μᵢ||²

Where:

  • k is the number of clusters
  • Cᵢ is the set of points in cluster i
  • μᵢ is the centroid of cluster i
  • ||x - μᵢ||² is the squared Euclidean distance

Optimal Centroid Update

μᵢ* = (1/|Cᵢ|) Σₓ∈Cᵢ x

The optimal centroid is the arithmetic mean of all points in the cluster, which minimizes the sum of squared Euclidean distances.
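
That the mean is the minimizer is easy to verify numerically: nudging the centroid away from the mean in any direction only increases the objective. A minimal sketch:

import numpy as np

def sse(points, centroid):
    """Sum of squared Euclidean distances to a candidate centroid."""
    return np.sum((points - centroid) ** 2)

rng = np.random.default_rng(0)
cluster = rng.normal(loc=[2.0, 3.0], size=(50, 2))
mean = cluster.mean(axis=0)

print(sse(cluster, mean))               # the minimum
print(sse(cluster, mean + [0.5, 0.0]))  # strictly larger
print(sse(cluster, mean - [0.0, 0.3]))  # strictly larger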

Gradient Descent for Distance Optimization

For more complex clustering algorithms, gradient-based optimization can be used to minimize distance-based objective functions:

θₜ₊₁ = θₜ - α∇J(θₜ)

Where:

  • θ represents the parameters being optimized
  • α is the learning rate
  • ∇J(θₜ) is the gradient of the objective function
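
Tying this back to K-means: for a single cluster, J(μ) = Σ ||x - μ||² has gradient ∇J(μ) = Σ 2(μ - x), so gradient descent walks the centroid toward the cluster mean, the closed-form optimum derived above. A minimal sketch (the learning rate is chosen small enough to be stable for this number of points):

import numpy as np

rng = np.random.default_rng(1)
points = rng.normal(loc=[5.0, -2.0], size=(200, 2))

mu = np.zeros(2)  # initial centroid guess
alpha = 0.001     # learning rate; needs alpha * 2n < 1 here for stability

for _ in range(100):
    grad = 2.0 * np.sum(mu - points, axis=0)  # gradient of sum ||x - mu||^2
    mu = mu - alpha * grad                    # update: mu_{t+1} = mu_t - alpha * grad

print(mu)                   # converges to...
print(points.mean(axis=0))  # ...the arithmetic mean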

Computational Complexity Analysis

  • Euclidean Distance: O(d) per pair, O(n²d) for all pairs
  • Manhattan Distance: O(d) per pair, O(n²d) for all pairs
  • Optimization with K-means: O(nktd) where t is iterations
  • Memory Requirements: O(n²) for distance matrix storage

Interactive Distance Optimization Demo


Applications and Real-World Examples

Think of choosing distance metrics like choosing the right tool for a job:

  • Euclidean distance: Like using a ruler - perfect for measuring straight-line distances
  • Manhattan distance: Like counting city blocks - perfect when you can't go diagonally
  • The right choice depends on your data: Different problems need different approaches
  • Real examples help you decide: Seeing how others solved similar problems

Understanding when and how to apply Euclidean versus Manhattan distance requires examining real-world scenarios where each metric's properties align with problem characteristics. This section explores diverse applications across multiple domains, providing practical guidance for metric selection.

How to Choose the Right Distance Metric

Use Euclidean distance when:

  • Your data is continuous: Like height, weight, temperature, prices
  • Features are on similar scales: All measured in similar units
  • You expect circular/spherical clusters: Like organizing people by height and weight
  • You want the most intuitive results: Straight-line distances make sense

Use Manhattan distance when:

  • You have high-dimensional data: Many features (like 50+ attributes)
  • Features have very different scales: Some in dollars, others in percentages
  • You want to be less sensitive to outliers: Extreme values shouldn't dominate
  • Your data is sparse: Most values are zero

E-commerce and Recommendation Systems

Distance metrics play a crucial role in recommendation systems, where the choice between Euclidean and Manhattan distance can significantly impact recommendation quality and user experience.

Product Similarity (Euclidean)

Use Case: Finding similar products based on numerical features

Features: Price, rating, dimensions, weight

Example: Camera similarity
Product A: [price: 500, rating: 4.2, megapixels: 24, weight: 600g]
Product B: [price: 520, rating: 4.1, megapixels: 26, weight: 580g]
Euclidean distance captures overall similarity well

Why Euclidean: The features are continuous and the overall magnitude of differences across all features matters

User Behavior (Manhattan)

Use Case: Finding similar users based on categorical preferences

Features: Category purchases, brand preferences, activity counts

Example: User similarity
User A: [books: 5, electronics: 2, clothing: 8, sports: 0]
User B: [books: 7, electronics: 1, clothing: 6, sports: 1]
Manhattan distance better handles discrete counts

Why Manhattan: Features are counts/frequencies, independent categories
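
A small sketch putting the two cards side by side (feature values copied from above; note that in practice the camera features would be normalized first, since price and weight dominate the raw Euclidean distance):

import numpy as np

# Product similarity (Euclidean): continuous features [price, rating, megapixels, weight]
product_a = np.array([500, 4.2, 24, 600])
product_b = np.array([520, 4.1, 26, 580])
print(np.linalg.norm(product_a - product_b))  # ≈ 28.4, dominated by price and weight

# User similarity (Manhattan): discrete counts [books, electronics, clothing, sports]
user_a = np.array([5, 2, 8, 0])
user_b = np.array([7, 1, 6, 1])
print(np.abs(user_a - user_b).sum())          # 2 + 1 + 2 + 1 = 6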

Healthcare and Medical Diagnosis

Medical applications require careful consideration of distance metrics, as the choice can impact diagnostic accuracy and patient outcomes.

Medical Data Types and Metric Selection

Continuous Medical Measurements (Euclidean):
  • Vital signs: Blood pressure, heart rate, temperature
  • Lab values: Blood glucose, cholesterol, protein levels
  • Physical measurements: Height, weight, BMI
  • Imaging features: Tumor dimensions, organ volumes
Discrete Medical Data (Manhattan):
  • Symptom counts: Number of symptoms present
  • Medication dosages: Discrete pill counts
  • Frequency data: Episodes per month, visits per year
  • Severity scales: Pain scales (1-10), functional scores
Case Study: Patient Similarity for Treatment Recommendation
Scenario: Finding similar patients for personalized treatment
Data: Mixed continuous (age, BMI, lab values) and discrete (symptom counts, severity scores)
Solution: Combine normalized Euclidean for continuous features with Manhattan for discrete features
Formula: d_total = w₁ × d_E(continuous) + w₂ × d_M(discrete)
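
A minimal sketch of that combined formula; the weights and patient vectors below are illustrative, not clinical guidance:

import numpy as np

def mixed_distance(cont_x, cont_y, disc_x, disc_y, w1=0.5, w2=0.5):
    """d_total = w1 * Euclidean(continuous) + w2 * Manhattan(discrete)."""
    d_e = np.linalg.norm(cont_x - cont_y)  # continuous: age, BMI, lab values (normalized)
    d_m = np.abs(disc_x - disc_y).sum()    # discrete: symptom counts, severity scores
    return w1 * d_e + w2 * d_m

p1_cont, p1_disc = np.array([0.4, 0.7, 0.2]), np.array([3, 2])
p2_cont, p2_disc = np.array([0.5, 0.6, 0.3]), np.array([4, 2])
print(mixed_distance(p1_cont, p2_cont, p1_disc, p2_disc))  # weighted blend of both metrics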

Geographic and Location-Based Services

Geographic applications provide clear intuitive examples of when each distance metric is appropriate.

Geographic Distance Comparison

[Figure: city map showing the straight-line (Euclidean) distance, the actual driving route, and the Manhattan grid distance]

Air Travel (Euclidean)

Application: Flight routing, airport clustering

Why Euclidean: Aircraft aren't constrained to a road grid, so straight-line (great-circle) distance models travel well

Example: Grouping airports by geographic proximity for hub-and-spoke networks

Ground Transportation (Manhattan)

Application: Urban delivery, taxi routing

Why Manhattan: Roads constrain movement to grid-like patterns

Example: Optimizing delivery routes in downtown areas with grid street layouts

Service Area Planning

Application: Emergency services, retail locations

Metric Choice: Depends on service type and terrain

Example: Helicopter emergency services (Euclidean) vs ambulance services (Manhattan/road network)

Financial Services and Risk Analysis

Financial applications require careful metric selection as the choice can significantly impact risk assessment and portfolio optimization.

Portfolio Optimization and Risk Management

Asset Correlation Analysis (Euclidean):
  • Use Case: Measuring similarity between asset returns
  • Features: Daily returns, volatility, correlation coefficients
  • Why Euclidean: Captures overall portfolio risk and return relationships
Transaction Pattern Analysis (Manhattan):
  • Use Case: Fraud detection, customer segmentation
  • Features: Transaction counts, frequency, amounts
  • Why Manhattan: Robust to outliers, handles discrete transaction patterns

Computer Vision and Image Processing

Image processing applications demonstrate how different distance metrics capture different aspects of visual similarity.

Pixel-Level Analysis (Euclidean)

Use Case: Image segmentation, color clustering

Features: RGB values, pixel coordinates

Why Euclidean: Natural for continuous color space and spatial relationships

Feature-Based Analysis (Manhattan)

Use Case: Robust feature matching, edge detection

Features: Gradient magnitudes, texture descriptors

Why Manhattan: Less sensitive to noise, better for discrete features

Interactive Distance Calculator

Think of this calculator like a distance measuring tool:

  • You can place two points anywhere: Like marking two spots on a map
  • You can see both distance measurements: Euclidean (straight line) and Manhattan (city blocks)
  • You can experiment with different positions: See how the distances change
  • You can understand the differences: When one is much larger than the other

Experiment with different distance metrics and see how they behave with various data points. Working through a few point pairs by hand makes the practical differences between Euclidean and Manhattan distances concrete.

How to Use This Calculator

Step-by-step guide:

  1. Pick coordinates for Point 1: for example (1, 1)
  2. Pick coordinates for Point 2: for example (4, 3)
  3. Calculate both distances: Euclidean √((4-1)² + (3-1)²) = √13 ≈ 3.61 and Manhattan |4-1| + |3-1| = 5
  4. Try different positions: see how the distances change as the points move
  5. Compare the results: notice when Manhattan is much larger than Euclidean

Tip: Try points that are far apart diagonally - you'll see the biggest difference between the two metrics!

Example Calculation

For the points (0, 0) and (4, 3):

Euclidean Distance: 5.00
Manhattan Distance: 7.00
Ratio (Manhattan/Euclidean): 1.40
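
Since the widget itself is not reproduced here, this short sketch performs the same calculation for any pair of points:

import math

def compare_distances(p1, p2):
    """Euclidean and Manhattan distance between two 2D points, plus their ratio."""
    euclid = math.hypot(p2[0] - p1[0], p2[1] - p1[1])    # straight-line distance
    manhattan = abs(p2[0] - p1[0]) + abs(p2[1] - p1[1])  # city-block distance
    return euclid, manhattan, manhattan / euclid         # ratio undefined for identical points

e, m, ratio = compare_distances((0, 0), (4, 3))
print(f"Euclidean: {e:.2f}  Manhattan: {m:.2f}  Ratio: {ratio:.2f}")
# Euclidean: 5.00  Manhattan: 7.00  Ratio: 1.40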

Understanding the Results

  • Euclidean Distance: Always represents the shortest path (straight line)
  • Manhattan Distance: Represents the sum of horizontal and vertical distances
  • Ratio Analysis: The ratio shows how much longer the Manhattan path is compared to the direct Euclidean path
  • Dimensional Scaling: Try different point coordinates to see how the relationship changes

Test Your Distance Metrics Knowledge

Think of this quiz like a practice test for driving:

  • It's okay to get questions wrong: That's how you learn! Wrong answers help you identify what to review
  • Each question teaches you something: Even if you get it right, the explanation reinforces your understanding
  • It's not about the score: It's about making sure you understand the key concepts
  • You can take it multiple times: Practice makes perfect!

Evaluate your understanding of distance metrics, mathematical properties, and their applications in clustering.

What This Quiz Covers

This quiz tests your understanding of:

  • Metric space properties: The four rules that make distance measurements work
  • Euclidean vs Manhattan distance: When to use each one and why
  • Mathematical formulas: Understanding what the symbols mean
  • Real-world applications: How distance metrics are used in practice
  • Optimization concepts: How algorithms use distance metrics

Don't worry if you don't get everything right the first time - that's normal! The goal is to learn.

Question 1: Metric Properties

Which property of metric spaces ensures that distance calculations are symmetric?

Question 2: Manhattan Distance

What is the main advantage of Manhattan distance over Euclidean distance?

Question 3: K-means Centroid

In K-means clustering, why is the arithmetic mean the optimal centroid when using Euclidean distance?