Chapter 11: Unsupervised Learning — Interview Deep Review
Unsupervised Learning — Interview Deep Review in ML Software Engineering: Interview Concept Review.
Learning Objectives
By the end of this chapter, you will be able to:
- Relate Unsupervised Learning — Interview Deep Review to common ML software engineering interview questions and trade-offs.
- Explain when this topic deserves a deeper pass through another tutorial on this site versus staying at recap depth.
- Surface assumptions, pitfalls, and follow-up probes an interviewer is likely to use.
PCA as optimal linear compression (under variance lens)
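Given a centered data matrix X, PCA finds orthogonal directions that successively maximize variance. The leading eigenvectors of the covariance matrix Σ are the principal axes; each eigenvalue is the variance captured along its axis, so its share of the eigenvalue sum is the explained-variance ratio.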
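Eigenpair story: eigenvectors of Σ point along the directions in which the data are most stretched, and eigenvalues measure how much. Saying this aloud usually lands better than reciting the SVD triple product, unless the interviewer steers into a deeper linear algebra round.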
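Relation to SVD: for centered X with n rows, the SVD X = U S Vᵀ gives a numerically stable PCA without forming Σ explicitly. The right singular vectors (columns of V) are the principal axes, and with the 1/(n - 1) sample covariance each singular value s_i relates to an eigenvalue λ_i of Σ by λ_i = s_i² / (n - 1), i.e. s_i = sqrt((n - 1) λ_i).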
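The two routes to PCA can be checked against each other numerically. The sketch below is illustrative only: the toy data, the variable names, and the use of the 1/(n - 1) sample covariance are assumptions, not something the chapter prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 3 correlated features.
mixing = rng.normal(size=(3, 3))
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)              # center each feature
n = Xc.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix.
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data, never forming the covariance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_vals = S ** 2 / (n - 1)                  # squared singular values, rescaled

explained_ratio = eigvals / eigvals.sum()    # share of variance per component
```

Principal axes from the two routes agree only up to sign, which is why comparisons of eigenvectors usually go through absolute values of dot products rather than direct equality.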
Nonlinear embeddings (interview caution)
t-SNE preserves local neighborhoods and is meant for visualization: distances between clusters are not globally meaningful, and the perplexity hyperparameter (roughly the effective number of neighbors per point) changes how clusters appear.
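Isomap approximates geodesic distances by shortest paths on a kNN graph, then embeds them with classical MDS. It works better when the data lie on a manifold but is costlier, since it needs all-pairs shortest paths and an eigendecomposition of a dense n × n matrix.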
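To make the geodesic idea concrete, here is a minimal sketch of only the graph-distance step of Isomap, on an assumed toy manifold (points along a unit semicircle). It uses a hand-written Floyd-Warshall pass rather than any library's Isomap implementation, and it stops before the embedding step.

```python
import numpy as np

# Hypothetical toy manifold: 50 points along a unit semicircle.
theta = np.linspace(0.0, np.pi, 50)
P = np.column_stack([np.cos(theta), np.sin(theta)])
n = len(P)

# Pairwise Euclidean distances in the ambient 2-D space.
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

# kNN graph: keep edges to each point's k nearest neighbors, symmetrized.
k = 2
G = np.full((n, n), np.inf)
np.fill_diagonal(G, 0.0)
nbrs = np.argsort(D, axis=1)[:, 1:k + 1]     # column 0 is the point itself
for i in range(n):
    G[i, nbrs[i]] = D[i, nbrs[i]]
    G[nbrs[i], i] = D[i, nbrs[i]]

# Floyd-Warshall: shortest graph paths approximate geodesics, at O(n^3) cost.
for m in range(n):
    G = np.minimum(G, G[:, [m]] + G[[m], :])

euclid_ends = D[0, -1]       # straight chord between the two endpoints
geodesic_ends = G[0, -1]     # path that follows the arc
```

The chord between the endpoints has length 2, while the graph path hugs the arc and comes out just under π, which is the distance Isomap would try to preserve in the embedding.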
K-means vs Gaussian mixture models
k-means hard-assigns each point to its nearest centroid and minimizes the within-cluster sum of squared distances, which implicitly assumes roughly spherical clusters of similar variance. Use the elbow and silhouette heuristics cautiously when choosing k.
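A minimal sketch of Lloyd's algorithm, the usual k-means procedure, shows the hard assignment and the inertia that an elbow plot tracks. The function name `kmeans`, the random-point initialization, and the toy blobs are assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns centroids, hard labels, and inertia."""
    rng = np.random.default_rng(seed)
    # Initialize at k distinct data points (a simple choice, not k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:             # leave an empty cluster in place
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute assignments and inertia against the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    return centroids, labels, d2.min(axis=1).sum()

# Hypothetical toy data: two spherical blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(8.0, 1.0, size=(100, 2))])
inertias = {k: kmeans(X, k)[2] for k in (1, 2, 3, 4)}
```

Plotting `inertias` against k gives the elbow curve. Because inertia tends to keep shrinking as k grows, the curve suggests a k only where the drop flattens, and a single random initialization can land in a poor local optimum.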
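GMM + EM soft-assigns points via posterior responsibilities and excels with overlapping elliptical clusters. EM alternates the E-step (compute posterior responsibilities under the current parameters) with the M-step (update means, covariances, and mixing weights). Be ready to articulate that EM converges only to a local optimum of the likelihood, so results depend on initialization, which is commonly taken from a k-means run.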
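The E-step and M-step alternation can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions: full covariances, a small ridge term `reg` for numerical stability, and initial means drawn from random data points as a stand-in for a k-means initialization. The helper names `gaussian_pdf` and `em_gmm` are hypothetical.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T).T
    expo = -0.5 * np.sum(diff * sol, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm

def em_gmm(X, init_means, n_iter=50, reg=1e-6):
    n, d = X.shape
    means = np.array(init_means, dtype=float)
    k = len(means)
    covs = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.column_stack([weights[j] * gaussian_pdf(X, means[j], covs[j])
                                for j in range(k)])
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        log_liks.append(np.log(total).sum())
        # M-step: responsibility-weighted updates of weights, means, covariances.
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / Nk[j] + reg * np.eye(d)
    return weights, means, covs, resp, log_liks

# Hypothetical toy data: two overlapping elliptical clusters.
rng = np.random.default_rng(0)
A = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=200)
B = rng.multivariate_normal([3, 3], [[1.0, -0.6], [-0.6, 2.0]], size=200)
X = np.vstack([A, B])
init = X[rng.choice(len(X), size=2, replace=False)]   # stand-in for k-means init
weights, means, covs, resp, log_liks = em_gmm(X, init)
```

Each row of `resp` sums to 1, which is the soft assignment that k-means lacks, and `log_liks` should not decrease between iterations. Rerunning with a different `init` can converge to a different solution, which is the local-optimum point worth stating in an interview.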
Factorization machines (elevator)
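FM models pairwise feature interactions through low-rank embeddings: each feature gets a latent vector, and the weight of an interaction is the dot product of the two vectors involved. It is a common fit for CTR prediction with sparse categorical fields, in contrast to explicit cross-product features, whose count grows quadratically.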
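A short sketch of the second-order FM prediction makes the low-rank idea concrete. It compares the explicit double loop over feature pairs with the standard linear-time reformulation of the pairwise term. The function names, parameter shapes, and random values are assumptions for illustration, and no training is shown.

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Bias + linear term + explicit sum over all feature pairs i < j."""
    n = len(x)
    out = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            out += (V[i] @ V[j]) * x[i] * x[j]   # interaction weight = <v_i, v_j>
    return out

def fm_predict(x, w0, w, V):
    """Same prediction in O(n * k) using the sum-of-squares identity."""
    xv = V.T @ x                                  # shape (k,): sum_i v_i x_i
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + pairwise

# Hypothetical parameters: 6 features, embedding rank 3.
rng = np.random.default_rng(0)
n_features, rank = 6, 3
w0 = 0.1
w = rng.normal(size=n_features)
V = rng.normal(size=(n_features, rank))
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 2.5])      # sparse-ish input vector

y_fast = fm_predict(x, w0, w, V)
y_slow = fm_predict_naive(x, w0, w, V)
```

The two functions return the same value, but the fast form never enumerates pairs, and V holds n × k parameters instead of one weight per feature pair.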
Interview prompts
- Covariance matrix interpretation?
- Orthogonal principal components rationale?
- Choosing k pragmatically?
- k-means vs GMM failure modes?
Go deeper on this site
Comprehensive Clustering Analysis · Linear geometry intuition → Matrix–Vector Multiplication
1. PCA components orthogonal because: