Chapter 11: Unsupervised Learning — Interview Deep Review - ML Software Engineering: Interview Concept Review

Learning Objectives

By the end of this chapter, you will be able to:

Relate unsupervised learning topics (PCA, nonlinear embeddings, clustering, factorization machines) to common ML software engineering interview questions and trade-offs.
Explain when this topic deserves a deeper pass through another tutorial on this site versus staying at recap depth.
Surface assumptions, pitfalls, and follow-up probes an interviewer is likely to use.

PCA as optimal linear compression (under variance lens)

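Given a centered data matrix X, PCA finds orthogonal directions that successively maximize variance. The leading eigenvectors of the covariance matrix Σ are the principal axes; each eigenvalue is the variance captured along its axis, so an eigenvalue divided by the sum of all eigenvalues gives that component's explained-variance ratio.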

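Eigenpair story: eigenvectors point along the directions in which the data are stretched, and eigenvalues say by how much. Saying this aloud usually lands better than reciting the SVD triple product, unless the interviewer steers into a deeper linear algebra round.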

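Relation to SVD: for centered X with n rows, the SVD X = U S Vᵀ yields PCA without forming the covariance matrix, which is numerically more stable. The columns of V are the principal axes, and each covariance eigenvalue is λ_i = s_i² / (n − 1), so the singular values are proportional to the square roots of the eigenvalues rather than equal to them.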
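The eigenvalue and SVD routes can be checked against each other in a few lines. The following is a minimal sketch, assuming NumPy and a synthetic data set invented for illustration (the `mixing` matrix and sample size are arbitrary choices, not from this chapter). It computes PCA both ways and compares the results using λ_i = s_i² / (n − 1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: 200 samples, 3 correlated features.
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
raw = rng.normal(size=(200, 3)) @ mixing
X = raw - raw.mean(axis=0)   # PCA assumes centered columns
n = X.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data, no covariance matrix formed.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigvals_from_svd = S**2 / (n - 1)

explained_ratio = eigvals / eigvals.sum()
scores = X @ eigvecs[:, :2]              # project onto the top 2 principal axes
print("explained variance ratio:", np.round(explained_ratio, 3))
```

The principal axes from the two routes agree only up to sign, which is worth mentioning if an interviewer asks why two libraries return flipped components.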

Nonlinear embeddings (interview caution)

t-SNE preserves local neighborhoods for visualization; distances between clusters and apparent cluster sizes are not globally meaningful, and the perplexity hyperparameter (roughly, the effective number of neighbors) changes how clusters appear.

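Isomap approximates geodesic distances with shortest paths on a kNN graph, then embeds those distances with classical MDS. It works better than linear methods when the data lie on a smooth manifold, but it is costlier because of the all-pairs shortest-path step and is sensitive to the neighborhood size.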
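To make the Isomap pipeline concrete, here is a small NumPy-only sketch under simplifying assumptions: dense distance matrices, Floyd-Warshall for shortest paths (cubic in the number of points, which is where the extra cost shows up), and classical MDS for the embedding. The function name `isomap` and the arc-shaped toy data are illustrative choices, not a library API.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=1):
    """Toy Isomap: kNN graph -> graph shortest paths -> classical MDS."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    # kNN graph: keep each point's nearest neighbors, then symmetrize.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # skip self at position 0
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)

    # Floyd-Warshall: all-pairs shortest paths approximate geodesics.
    for m in range(n):
        G = np.minimum(G, G[:, m, None] + G[None, m, :])
    if np.isinf(G).any():
        raise ValueError("kNN graph is disconnected; increase n_neighbors")

    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    embedding = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    return embedding, G

# Points on three quarters of a unit circle: a 1-D manifold bent in 2-D.
t = np.linspace(0.0, 1.5 * np.pi, 60)
X = np.column_stack([np.cos(t), np.sin(t)])
embedding, geo = isomap(X, n_neighbors=4, n_components=1)

chord = np.linalg.norm(X[0] - X[-1])   # straight-line distance between the ends
print("end-to-end chord:", round(chord, 2), "graph geodesic:", round(geo[0, -1], 2))
```

On this toy arc the one-dimensional embedding tracks arc length, while the straight-line distance between the two ends badly underestimates how far apart they are along the curve.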

K-means vs Gaussian mixture models

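k-means hard-assigns each point to its nearest centroid and minimizes the within-cluster sum of squared distances, which implicitly assumes roughly spherical clusters of similar variance. Use elbow and silhouette heuristics cautiously: they suggest a value of k, they do not prove it.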
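The assign-then-update loop is short enough to write on a whiteboard. Below is a minimal sketch of Lloyd's algorithm, assuming NumPy, random initialization from the data points, and two synthetic blobs made up for the demo. The function name `kmeans` and its parameters are illustrative, not a library API.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with random-point initialization (illustrative)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and within-cluster sum of squares (inertia).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, inertia

# Two well-separated synthetic blobs (demo data only).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])

centroids, labels, inertia = kmeans(X, k=2)
inertia_by_k = {k: kmeans(X, k)[2] for k in range(1, 5)}   # raw input for an elbow plot
print({k: round(float(v), 1) for k, v in inertia_by_k.items()})
```

Inertia keeps shrinking as k grows even after the true structure is found, which is why the elbow is a judgment call rather than a test.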

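GMM + EM soft-assigns points via posterior responsibilities and handles overlapping elliptical clusters well. EM alternates an E-step (compute posterior responsibilities) and an M-step (update means, covariances, and mixing weights). Be ready to articulate that EM converges only to a local optimum, so results depend on initialization, which is commonly taken from k-means.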
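The E-step and M-step can be sketched directly in NumPy. This is a minimal illustration under stated assumptions: full covariances, a small ridge term `reg` for numerical stability, and initial means passed in by hand as a stand-in for k-means centroids. The names `fit_gmm` and `log_gaussian` are hypothetical helpers, not a library API.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of a multivariate normal, via a Cholesky factor."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)          # whitened residuals, shape (d, n)
    maha = (z ** 2).sum(axis=0)                   # squared Mahalanobis distances
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def fit_gmm(X, means_init, n_iter=100, reg=1e-6):
    n, d = X.shape
    k = len(means_init)
    means = np.array(means_init, dtype=float)
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        log_p = np.stack([np.log(weights[j]) + log_gaussian(X, means[j], covs[j])
                          for j in range(k)], axis=1)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        log_liks.append(log_norm.sum())           # log-likelihood before this M-step
        # M-step: responsibility-weighted weights, means, and covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs, resp, log_liks

# Two elliptical synthetic clusters (demo data only).
rng = np.random.default_rng(2)
A = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.5], [1.5, 2.0]], size=200)
B = rng.multivariate_normal([5.0, 0.0], [[1.0, -0.6], [-0.6, 1.0]], size=200)
X = np.vstack([A, B])

init_means = [[-1.0, 0.0], [6.0, 0.0]]   # hand-picked; k-means centroids in practice
weights, means, covs, resp, log_liks = fit_gmm(X, init_means)
print("weights:", np.round(weights, 2))
```

Tracking `log_liks` is a useful debugging habit: the log-likelihood should never decrease between iterations, so a drop usually signals a bug in one of the two steps.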

Factorization machines (elevator)

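FM models pairwise feature interactions through low-rank embeddings: the weight on the interaction between features i and j is the dot product of their latent vectors. This suits CTR prediction with sparse categorical fields and avoids the parameter explosion of explicit cross-product features.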
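A second-order FM prediction is ŷ = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. The sketch below, assuming NumPy and randomly drawn parameters invented for the demo, compares the direct double loop with a well-known algebraic rewrite of the pairwise term that runs in time linear in the number of features. The function names are illustrative.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Direct O(n^2 * k) evaluation of the pairwise interaction sum."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """O(n * k) form: 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    xv = V.T @ x                     # per-factor weighted sums, shape (k,)
    x2v2 = (V ** 2).T @ (x ** 2)     # per-factor sums of squares, shape (k,)
    return w0 + w @ x + 0.5 * np.sum(xv ** 2 - x2v2)

# Illustrative parameters: 8 features, 3 latent factors.
rng = np.random.default_rng(3)
n_features, k = 8, 3
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(scale=0.1, size=(n_features, k))

x = np.zeros(n_features)
x[[1, 4, 6]] = 1.0                   # sparse, one-hot style input

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
print("interaction parameters:", n_features * k,
      "vs explicit crosses:", n_features * (n_features - 1) // 2)
```

The parameter count grows as n·k for the FM versus n(n−1)/2 for explicit crosses, and unseen feature pairs still get a nonzero interaction weight through their shared latent space.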

Interview prompts

Covariance matrix interpretation?
Orthogonal principal components rationale?
Choosing k pragmatically?
k-means vs GMM failure modes?

Go deeper on this site

Comprehensive Clustering Analysis · Linear geometry intuition → Matrix–Vector Multiplication

1. PCA components orthogonal because:

Eigenvectors of a symmetric covariance matrix are mutually orthogonal, so projections onto them are uncorrelated; each component captures the largest remaining variance in a direction orthogonal to the earlier ones.
CSV files require perpendicular columns.