Course: ML Software Engineering: Interview Concept Review | Chapter 10 | Difficulty: intermediate | Estimated time: 900 min

Chapter 10: Supervised Learning — Interview Deep Review

Learning Objectives

By the end of this chapter, you will be able to:

  • Relate core supervised learning concepts to common ML software engineering interview questions and trade-offs.
  • Explain when this topic deserves a deeper pass through another tutorial on this site versus staying at recap depth.
  • Surface assumptions, pitfalls, and follow-up probes an interviewer is likely to use.

Supervised I recap: linear world + regularization

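Linear regression models the target as an affine function of the features. It works well when the signal is approximately linear. Scale the features whenever a penalty or a gradient-based solver is involved: ordinary least-squares predictions do not change when features are rescaled, but regularized fits do. Polynomial features increase expressivity but can blow up variance, so pair them with a penalty or use cross-validation to choose the degree.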
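The following is a minimal sketch of choosing the polynomial degree by K-fold cross-validation, using only NumPy. The sine-shaped target, the sample size, the five folds, and the helper name `cv_mse` are illustrative assumptions and do not come from the chapter.

```python
import numpy as np
from numpy.polynomial import Polynomial

def cv_mse(x, y, degree, n_folds=5, seed=0):
    # Held-out mean squared error of a degree-`degree` polynomial, averaged over K folds
    fold_id = np.random.default_rng(seed).permutation(len(x)) % n_folds
    errs = []
    for k in range(n_folds):
        val = fold_id == k
        p = Polynomial.fit(x[~val], y[~val], degree)   # least-squares fit on the other folds
        errs.append(np.mean((p(x[val]) - y[val]) ** 2))
    return np.mean(errs)

# Hypothetical nonlinear signal with Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=60)

scores = {d: cv_mse(x, y, d) for d in range(1, 11)}
best_degree = min(scores, key=scores.get)   # degree with the lowest held-out error
print(best_degree, scores[best_degree])
```

Training error never increases as the degree grows, because the models are nested. The held-out error is what exposes the extra variance of high degrees.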

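L2 (ridge) shrinks all coefficients toward zero together. This stabilizes ill-conditioned designs, such as those with nearly collinear features. L1 (lasso) can set coefficients exactly to zero. That helps when many features are irrelevant and supports a feature-selection story. Watch correlated groups, though: lasso tends to keep one member of the group somewhat arbitrarily, while ridge spreads the weight across the group.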
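The correlated-predictor behavior can be checked with a small NumPy sketch. It runs closed-form ridge alongside a cyclic coordinate-descent lasso on data with an exact duplicate column, which is the limiting case of correlation. The helper names, penalty values, and data are hypothetical. The two penalties are scaled differently here, so the `lam` values are not comparable.

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge: minimizes ||y - Xw||^2 + lam * ||w||^2 (no intercept)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(rho, lam):
    # Proximal step for the L1 penalty: shrink toward zero, clip at zero
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    # Cyclic coordinate descent on (1 / 2n) * ||y - Xw||^2 + lam * ||w||_1
    n, d = X.shape
    w = np.zeros(d)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            partial_resid = y - X @ w + X[:, j] * w[j]   # residual ignoring feature j
            rho = X[:, j] @ partial_resid / n
            w[j] = soft_threshold(rho, lam) / col_scale[j]
    return w

# Hypothetical data: column 1 is an exact copy of column 0 (extreme collinearity)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x])
y = 2.0 * x + rng.normal(scale=0.1, size=200)

w_ridge = ridge(X, y, lam=1.0)      # splits the weight evenly between the twins
w_lasso = lasso_cd(X, y, lam=0.1)   # first twin keeps the weight, second stays at ~0
print(w_ridge, w_lasso)
```

With exact duplicates, the lasso optimum is not unique, so which twin survives depends on the solver (here, the coordinate order). Ridge has a unique solution, and it is symmetric.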

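Logistic regression passes a linear score through the sigmoid, mapping ℝ → (0, 1), and is trained with log loss. Never treat raw linear scores as calibrated probabilities without justification; for example, least-squares outputs on 0/1 labels can fall outside [0, 1]. The link function and the training loss together are what give the outputs a probabilistic reading, and even then calibration should be checked.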

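```python
import numpy as np

def sigmoid(z):
    # Stable form of 1 / (1 + exp(-z)): avoids overflow warnings for large negative z
    return np.exp(-np.logaddexp(0.0, -z))
```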
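This small sketch shows why raw least-squares outputs on binary labels are not probabilities. On hypothetical 1-D data, the fitted line leaves [0, 1] at the edges, while logistic regression trained by plain gradient descent on log loss stays inside that range. The block defines its own numerically stable sigmoid via `np.logaddexp`, so it runs standalone. The learning rate, iteration count, and small L2 penalty are assumptions chosen for this toy set.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable 1 / (1 + exp(-z))
    return np.exp(-np.logaddexp(0.0, -z))

# Hypothetical 1-D data: label is 1 exactly when x > 0
x = np.linspace(-5, 5, 101)
y = (x > 0).astype(float)
A = np.column_stack([np.ones_like(x), x])   # intercept column + feature

# Least squares on 0/1 labels: nothing keeps the fitted line inside [0, 1]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
linear_scores = A @ beta

# Logistic regression: gradient descent on mean log loss plus a small L2 penalty
w = np.zeros(2)
lr, lam = 0.5, 1e-3
for _ in range(2000):
    p = sigmoid(A @ w)
    grad = A.T @ (p - y) / len(y) + lam * w
    w -= lr * grad
probs = sigmoid(A @ w)

print(linear_scores.min(), linear_scores.max())   # below 0 and above 1 at the edges
print(probs.min(), probs.max())                    # always within [0, 1]
```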

k-NN and distance thinking

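k-NN is a lazy learner. Training just stores the data, and prediction takes a vote (or an average, for regression) over the k nearest neighbors. Because everything depends on distances, features with large numeric ranges dominate unless you scale them. In high dimensions, the curse of dimensionality makes distances concentrate: the nearest and farthest points look similarly far away, so distance comparisons carry less information, especially in sparse spaces. Interviewers want you to tie this to feature scaling, dimensionality reduction such as PCA, or learned embeddings.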
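This NumPy sketch illustrates both points on hypothetical data. First, it evaluates a brute-force k-NN classifier with and without standardization when a noisy feature has a much larger scale. Second, it measures how the relative spread between nearest and farthest distances shrinks as dimension grows. The function names, k = 3, and all sizes are illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest training rows
    # Majority vote; assumes binary 0/1 labels and odd k so there are no ties
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

def make_data(n, rng):
    # Hypothetical data: column 0 carries the label on a small scale,
    # column 1 is pure noise on a much larger scale
    y = rng.integers(0, 2, size=n)
    X = np.column_stack([y + 0.1 * rng.normal(size=n), 1000.0 * rng.normal(size=n)])
    return X, y

rng = np.random.default_rng(0)
X_tr, y_tr = make_data(200, rng)
X_te, y_te = make_data(100, rng)

raw_acc = (knn_predict(X_tr, y_tr, X_te) == y_te).mean()

# Standardize with training statistics only, as a fitted scaler would
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
scaled_acc = (knn_predict((X_tr - mu) / sd, y_tr, (X_te - mu) / sd) == y_te).mean()

def distance_spread(d, n=500, seed=0):
    # (max - min) / min distance from one random query to n uniform points in [0, 1]^d
    r = np.random.default_rng(seed)
    pts, q = r.random((n, d)), r.random(d)
    dist = np.linalg.norm(pts - q, axis=1)
    return (dist.max() - dist.min()) / dist.min()

print(raw_acc, scaled_acc)                         # unscaled votes are driven by the noise column
print(distance_spread(2), distance_spread(1000))   # relative spread collapses as d grows
```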

Metrics: be ready to articulate the precision/recall trade-off, when to prefer ROC versus precision-recall curves (covered in the prior chapter), and class-weighting strategies for imbalanced data.
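This minimal sketch shows the metric mechanics in plain NumPy: precision and recall at a chosen threshold, ROC AUC computed as the probability that a random positive outranks a random negative, and inverse-frequency class weights. The toy scores and helper names are hypothetical. The weight formula matches the `n_samples / (n_classes * counts)` heuristic that scikit-learn documents for `class_weight="balanced"`, but this is not scikit-learn's code.

```python
import numpy as np

def precision_recall(y_true, scores, threshold):
    # Predict positive when score >= threshold; assumes at least one predicted positive
    # and at least one actual positive (otherwise a ratio is undefined)
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

def roc_auc(y_true, scores):
    # AUC = P(random positive scores higher than random negative); ties count 1/2
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def balanced_class_weights(y_true):
    # Inverse-frequency weights: n_samples / (n_classes * samples in each class)
    counts = np.bincount(y_true)
    return len(y_true) / (len(counts) * counts)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(precision_recall(y_true, scores, 0.4))   # precision 0.5, recall 0.5
print(precision_recall(y_true, scores, 0.3))   # precision 2/3, recall 1.0
print(roc_auc(y_true, scores))                 # 0.75
print(balanced_class_weights(np.array([0, 0, 0, 1])))   # minority class gets 3x the weight
```

Lowering the threshold from 0.4 to 0.3 raises recall at the cost of precision, which is the trade-off in miniature. Sweeping the threshold over every observed score traces out the ROC and PR curves.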

Supervised II: margins, Bayes, ensembles

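Naive Bayes: a cheap generative baseline built on the assumption that features are conditionally independent given the class. Strongly correlated features violate that assumption, so shared evidence is counted more than once and the probabilities become overconfident. It nonetheless remains a strong baseline for text tasks such as spam filtering.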
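This is a compact multinomial Naive Bayes sketch with Laplace smoothing, assuming word-count features. The four-word vocabulary, the six documents, and the helper names are hypothetical. The prediction step simply sums per-word log-likelihoods, which is exactly where the independence assumption enters.

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    # X: (n_docs, vocab_size) word counts; alpha: Laplace (add-one) smoothing
    classes = np.unique(y)
    log_prior = np.log(np.array([np.mean(y == c) for c in classes]))
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))   # log P(word | class)
    return classes, log_prior, log_lik

def predict_nb(model, X):
    classes, log_prior, log_lik = model
    # "Naive" step: per-word log-likelihoods add up as if words were independent given the class
    scores = X @ log_lik.T + log_prior
    return classes[np.argmax(scores, axis=1)]

# Hypothetical vocabulary: ["free", "win", "meeting", "report"]; label 1 = spam
X = np.array([[2, 1, 0, 0], [1, 2, 0, 0], [1, 1, 0, 1],
              [0, 0, 2, 1], [0, 0, 1, 2], [1, 0, 1, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

model = fit_multinomial_nb(X, y)
pred = predict_nb(model, np.array([[3, 1, 0, 0], [0, 0, 1, 3]]))
print(pred)  # [1 0]
```

Duplicating a column, for instance by repeating the "free" count, would add its log-likelihood twice. That is the double counting correlated features cause.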

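SVM: maximize the margin; kernels implicitly map inputs into a higher-dimensional feature space. The cost C trades margin width against violations. Large C penalizes violations heavily, giving a narrower margin and more risk of overfitting; small C tolerates them and gives a wider margin. In the RBF kernel, γ controls locality: larger γ makes each support vector's influence more local and the boundary more flexible. Contrast with logistic regression: the SVM decision function depends only on the support vectors, and its scores are not probabilities without a calibration step such as Platt scaling. Logistic regression outputs probabilities directly, although their calibration is still worth checking.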
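This sketch shows the two SVM knobs under simplifying assumptions. It trains a linear soft-margin SVM in the primal with full-batch subgradient descent on the hinge loss; libraries such as LIBSVM solve the dual problem instead. It also defines an RBF kernel function to show how γ changes the similarity between the same two points. Labels are in {-1, +1}, and the data, learning rate, and epoch count are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # K[i, j] = exp(-gamma * ||a_i - b_j||^2); larger gamma makes similarity more local
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    # Primal soft-margin objective, labels in {-1, +1}:
    #   0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))
    # minimized with full-batch subgradient descent
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1          # inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # matches y on this separable toy set

# Same pair of distant points under a local (large gamma) and a broad (small gamma) kernel
K_local = rbf_kernel(X[:1], X[3:4], gamma=1.0)    # nearly 0
K_broad = rbf_kernel(X[:1], X[3:4], gamma=0.01)   # still sizeable
print(K_local, K_broad)
```

Only points with margin below 1 contribute to the data term of the gradient. This is the support-vector focus described above, expressed in code.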

Decision trees, random forests, and boosting: see the prior two chapters for depth. Here, emphasize when an interviewer expects you to pivot from linear models to tree ensembles. Typical signals are nonlinear structure and feature interactions, heterogeneous feature types and scales, and missing values (for example, surrogate splits in CART; a high-level answer is fine).

Bias–variance & stacking sound bite

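Bagging mainly trims variance by averaging models fit on bootstrap samples. Boosting mainly attacks bias by fitting models sequentially to the remaining errors. Stacking trains a meta-learner on base-model predictions, and those predictions must be generated out-of-fold (OOF) so that no prediction for a row comes from a model trained on that same row. Say the leakage point explicitly.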
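This sketch shows why stacking needs out-of-fold predictions. It uses a deliberately overfit 1-nearest-neighbour regressor as the base model on hypothetical data. In-sample predictions are perfect because each row is its own nearest neighbour, so a meta-learner trained on them would trust the base model blindly; OOF predictions give an honest signal. Only the OOF step is shown, not the meta-learner, and the fold count and helper names are assumptions.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_query):
    # 1-nearest-neighbour regressor: a deliberately overfit base model
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d, axis=1)]

def oof_predictions(X, y, n_folds=5, seed=0):
    # Each row is predicted by a model fitted without that row
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(len(y)) % n_folds
    oof = np.empty(len(y))
    for k in range(n_folds):
        held_out = fold_id == k
        oof[held_out] = one_nn_predict(X[~held_out], y[~held_out], X[held_out])
    return oof

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.5, size=100)   # hypothetical noisy target

in_sample = one_nn_predict(X, y, X)   # leaks: each row is its own nearest neighbour
oof = oof_predictions(X, y)

mse_in = np.mean((in_sample - y) ** 2)
mse_oof = np.mean((oof - y) ** 2)
print(mse_in)    # 0.0: looks perfect, so a meta-learner would over-trust this model
print(mse_oof)   # the honest error level the meta-learner should see
```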

Example FAANG-style follow-ups

  • L1 vs L2 effect on correlated predictors?
  • Why not use raw linear regression outputs as probabilities?
  • ROC mechanics in words without memorizing every threshold.
  • When does SVM outperform random forests and vice versa?

1. What is the largest difference between logistic and linear regression on binary labels?



2. SVM margin maximization mainly improves: