Chapter 10: Supervised Learning — Interview Deep Review
Supervised Learning — Interview Deep Review in ML Software Engineering: Interview Concept Review.
Learning Objectives
By the end of this chapter, you will be able to:
- Relate supervised learning concepts to common ML software engineering interview questions and trade-offs.
- Explain when this topic deserves a deeper pass through another tutorial on this site versus staying at recap depth.
- Surface assumptions, pitfalls, and follow-up probes an interviewer is likely to use.
Supervised I recap: linear world + regularization
Linear regression models the target as an affine function of the features. It works well when the signal is approximately linear and the features are scaled. Polynomial features increase expressivity but inflate variance; pair them with a penalty or choose the degree by cross-validation.
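To make the expressivity-versus-variance point concrete, here is a minimal sketch using NumPy's `polyfit`. The noisy sine data, the sample sizes, and the degrees tried are all illustrative assumptions, not part of the original recap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: a small noisy training sample and a clean validation grid.
x_train = np.sort(rng.uniform(-1, 1, size=15))
y_train = np.sin(3 * x_train) + 0.2 * rng.normal(size=15)
x_val = np.linspace(-1, 1, 200)
y_val = np.sin(3 * x_val)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Map degree -> (training MSE, validation MSE).
results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_val, y_val))
```

Training error can only fall as the degree grows, because each model nests the previous one. The validation column is the one to pick the degree from, and a high degree on so few points will typically start tracking noise.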
L2 (ridge) shrinks coefficients jointly, which stabilizes ill-conditioned designs. L1 (lasso) promotes sparsity, which helps when many features are irrelevant and supports a feature-selection narrative. Watch correlated groups, though: lasso may arbitrarily keep one feature from the group and zero out the rest.
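The ridge half of that claim can be sketched with the closed-form solution. The helper `ridge_fit`, the synthetic near-duplicate features, and the penalty value are assumptions chosen for illustration; lasso is not shown because it has no closed form.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge without intercept: solve (X^T X + alpha * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)   # near-duplicate of x1: ill-conditioned design
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.1 * rng.normal(size=200)

w_ols = ridge_fit(X, y, alpha=0.0)      # ordinary least squares (no penalty)
w_ridge = ridge_fit(X, y, alpha=1.0)    # L2 penalty splits the weight across the twins
```

With the penalty, the two coefficients come out nearly equal and sum to roughly 3, whereas the unpenalized pair can swing far apart while still summing to about the same value.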
Logistic regression applies a sigmoid to a linear score, mapping ℝ→(0,1). Never treat raw linear scores as calibrated probabilities without justification: both the link function and the training loss matter.
import numpy as np

def sigmoid(z):
    # Map a real-valued score (scalar or array) into the open interval (0, 1).
    return 1 / (1 + np.exp(-z))
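One way to show why raw linear outputs are not probabilities is to fit both models to the same binary labels. This is a rough sketch: the toy data, learning rate, and iteration count are made-up assumptions, and the gradient-descent loop stands in for a proper solver.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy 1-D data (made up); the labels overlap slightly, so the logistic fit stays finite.
x = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0, 10.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

# Linear regression fitted directly to the 0/1 labels.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
linear_out = slope * x + intercept

# Logistic regression fitted by plain gradient descent on the mean log loss.
w, b = 0.0, 0.0
for _ in range(2000):
    p = sigmoid(w * x + b)
    w -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)
logistic_out = sigmoid(w * x + b)
```

On this data the linear fit predicts a value above 1 for the far-right point, which cannot be read as a probability. The logistic outputs stay within [0, 1] by construction.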
k-NN and distance thinking
k-NN is a lazy learner: it stores the training data and predicts by a vote over the nearest neighbors. The curse of dimensionality makes distance metrics less meaningful in high-dimensional sparse spaces; interviewers want you to tie this to feature scaling, PCA, or learned embeddings.
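Distance concentration is easy to demonstrate numerically. The sketch below assumes points drawn uniformly from a unit hypercube (dense rather than sparse, purely for simplicity) and uses a hypothetical helper, `relative_contrast`, to compare the nearest and farthest neighbor of one query.

```python
import numpy as np

def relative_contrast(n_points, n_dims, rng):
    """(farthest - nearest) / nearest Euclidean distance from one random query."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

rng = np.random.default_rng(0)
contrast_by_dim = {d: relative_contrast(500, d, rng) for d in (2, 10, 100, 1000)}
```

In two dimensions the farthest point is many times farther than the nearest. By a thousand dimensions the gap is a small fraction of the nearest distance, so "nearest" carries little information unless the dimensionality is reduced first.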
Metrics: articulate the precision/recall trade-off, ROC vs PR curves (per the prior chapter), and class-weighting strategies for imbalance.
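The precision/recall trade-off comes down to moving one threshold over the same scores. A small sketch in plain Python follows; the labels, scores, and thresholds are invented, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Imbalanced toy set: 3 positives among 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

loose = precision_recall(y_true, scores, threshold=0.4)
strict = precision_recall(y_true, scores, threshold=0.65)
```

The loose threshold catches every positive at the cost of two false alarms (precision 0.6, recall 1.0). The strict one removes the false alarms but misses a positive (precision 1.0, recall 2/3).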
Supervised II: margins, Bayes, ensembles
Naive Bayes: a cheap generative baseline built on a conditional independence assumption. Correlated features violate that assumption and distort its probability estimates, yet it remains a strong baseline for text tasks such as spam filtering.
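As a rough illustration of why the baseline is so cheap, here is a minimal multinomial naive Bayes with add-one smoothing in plain Python. The four-document corpus and the helper names `train_nb` and `predict_nb` are assumptions made up for this sketch.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Count tokens per class; docs are lists of tokens."""
    vocab = {tok for doc in docs for tok in doc}
    counts = {c: Counter() for c in set(labels)}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    priors = {c: labels.count(c) / len(labels) for c in counts}
    return vocab, counts, priors

def predict_nb(model, doc):
    """Return the class with the highest log-posterior (up to a constant)."""
    vocab, counts, priors = model
    best_class, best_score = None, -math.inf
    for c in counts:
        total = sum(counts[c].values())
        score = math.log(priors[c])
        # Independence assumption: per-token log-likelihoods simply add up.
        for tok in doc:
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((counts[c][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

docs = [["win", "money", "now"], ["free", "money"],
        ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
```

Training is a single counting pass and prediction is a sum of logs, which is why this makes a good first baseline. Calling `predict_nb(model, ["free", "money", "now"])` returns `"spam"` on this toy corpus.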
SVM: maximizes the margin; kernels lift the data into an implicit feature space. The cost C trades margin width against violations; γ in the RBF kernel controls locality. Contrast with logistic regression: the SVM solution depends only on its support vectors, while logistic regression supplies probabilistic semantics after calibration.
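The support-vector contrast shows up directly in the two loss functions. The sketch below evaluates both on a few signed margins, assuming the usual convention of labels in {-1, +1}; the margin values are arbitrary.

```python
import math

def hinge_loss(margin):
    """SVM loss for one example; margin = y * f(x) with y in {-1, +1}."""
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    """Logistic (log) loss for the same signed margin."""
    return math.log(1.0 + math.exp(-margin))

margins = [-1.0, 0.0, 0.5, 1.0, 2.0, 5.0]
hinge = [hinge_loss(m) for m in margins]
logistic = [logistic_loss(m) for m in margins]
```

The hinge loss is exactly zero once the margin reaches 1, so well-classified points stop influencing the SVM solution. The logistic loss never reaches zero, so every point keeps a small pull on the fit.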
Decision trees / RF / boosting: see the prior two chapters for depth. Here, emphasize when an interviewer expects you to pivot from linear models to tree ensembles: nonlinear structure, heterogeneous features, and partial missingness handled with surrogate splits (a high-level answer is fine).
Bias–variance & stacking sound bite
Bagging trims variance; boosting attacks bias; stacking meta-learns over base-model predictions but needs careful out-of-fold (OOF) generation to avoid leakage. Say that explicitly.
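Out-of-fold generation can be sketched in a few lines. This is an assumed, simplified version: `out_of_fold_predictions` and the `mean_baseline` learner are hypothetical names, and a real stack would plug in actual base models.

```python
import numpy as np

def out_of_fold_predictions(X, y, fit_predict, n_folds=5, seed=0):
    """One prediction per row, each from a model that never saw that row."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, n_folds)
    oof = np.empty(n, dtype=float)
    for held_out in folds:
        train = np.setdiff1d(order, held_out)   # every index not in this fold
        oof[held_out] = fit_predict(X[train], y[train], X[held_out])
    return oof

def mean_baseline(X_train, y_train, X_test):
    """Hypothetical base learner: predicts the training-fold mean of y."""
    return np.full(len(X_test), y_train.mean())

X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.arange(20, dtype=float)
oof = out_of_fold_predictions(X, y, mean_baseline)
```

The meta-learner is then trained on `oof` as a feature. Because each entry comes from a model fit without that row, the meta-learner cannot exploit base-model memorization of the training labels.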
Example FAANG-style follow-ups
- L1 vs L2 effect on correlated predictors?
- Why not use raw linear regression outputs as probabilities?
- ROC mechanics in words without memorizing every threshold.
- When does SVM outperform random forests and vice versa?
Go deeper on this site
1. Largest difference between logistic vs linear regression on binary labels?
2. SVM margin maximization mainly improves: