Chapter 13: Deep Learning I — Vision & Architectures
Deep Learning I — Vision & Architectures in ML Software Engineering: Interview Concept Review.
Learning Objectives
By the end of this chapter, you will be able to:
- Relate Deep Learning I — Vision & Architectures to common ML software engineering interview questions and trade-offs.
- Explain when this topic deserves a deeper pass through another tutorial on this site versus staying at recap depth.
- Surface assumptions, pitfalls, and follow-up probes an interviewer is likely to use.
Fully-connected networks
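Stacked affine maps and nonlinearities approximate functions. The universal approximation story is fine for intuition: a single sufficiently wide hidden layer can approximate any continuous function on a compact domain. In practice, depth helps the network learn hierarchical features, at the cost of harder optimization.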
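Activations: ReLU is cheap to compute, but units can die (stuck at zero output with zero gradient); leaky ReLU and Swish partially mitigate this. An interviewer may bring up the saturation of sigmoid/tanh, which historically caused vanishing gradients.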
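The dying-unit and saturation points above can be made concrete by looking at the derivatives directly. The following is a minimal sketch using only the standard library; the function names and the leaky slope `alpha=0.01` are illustrative assumptions, not values taken from this chapter.

```python
import math

def relu_grad(x: float) -> float:
    # Derivative of max(0, x): exactly zero for negative inputs (the "dead" region).
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    # A small slope alpha (assumed value) keeps some gradient for negative inputs.
    return 1.0 if x > 0 else alpha

def sigmoid(x: float) -> float:
    # Numerically stable form for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x: float) -> float:
    # s * (1 - s): peaks at 0.25 and shrinks toward 0 as |x| grows (saturation).
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    # 1 - tanh(x)^2: peaks at 1 and also saturates for large |x|.
    return 1.0 - math.tanh(x) ** 2

for x in (-5.0, 0.0, 5.0):
    print(x, relu_grad(x), leaky_relu_grad(x), sigmoid_grad(x), tanh_grad(x))
```

A unit whose pre-activation is negative for every training example receives a zero ReLU gradient and cannot recover, whereas the leaky variant still passes a small signal.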
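Initialization: all-zero weights leave every unit in a layer computing the same output and receiving the same gradient, so symmetry is never broken. Xavier/He scaling keeps the forward activation variance roughly constant across layers early in training.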
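A small simulation can illustrate both claims. This is a sketch under stated assumptions: a plain ReLU stack with no biases, width 128, depth 8, and a fixed seed, all chosen for illustration. `he_std` uses the usual He scale sqrt(2 / fan_in); `tiny_std` is a hypothetical too-small baseline for contrast.

```python
import math
import random

def he_std(fan_in: int) -> float:
    # He scaling for ReLU layers: Var(w) = 2 / fan_in.
    return math.sqrt(2.0 / fan_in)

def tiny_std(fan_in: int) -> float:
    # Hypothetical "too small" baseline, used only for comparison.
    return 0.01

def forward_second_moment(weight_std_fn, width=128, depth=8, seed=0):
    """Track the mean squared activation through a random ReLU stack."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]
    moments = [sum(v * v for v in h) / width]
    for _ in range(depth):
        std = weight_std_fn(width)
        # Each output unit draws its own Gaussian weight row, then applies ReLU.
        h = [max(0.0, sum(rng.gauss(0.0, std) * v for v in h))
             for _ in range(width)]
        moments.append(sum(v * v for v in h) / width)
    return moments

def constant_init_layer(x, width, c=0.5):
    # Identical weights in every unit: all units compute the same value,
    # so they would also receive identical gradients (symmetry lock).
    return [max(0.0, sum(c * v for v in x)) for _ in range(width)]

he = forward_second_moment(he_std)
tiny = forward_second_moment(tiny_std)
print("He init, last/first second moment:", he[-1] / he[0])
print("std=0.01, last-layer second moment:", tiny[-1])
print("constant init outputs:", constant_init_layer([1.0, -2.0, 3.5], 4))
```

With He scaling the second moment stays within the same order of magnitude across layers, while the too-small scale drives activations toward zero. Xavier scaling follows the same idea with a variance based on fan-in and fan-out, and is typically paired with tanh or sigmoid units.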
Convolutional inductive biases
Local connectivity and parameter sharing encode translation equivariance. Valid vs same padding: valid shrinks the spatial size, while same maintains it (at stride 1). Be ready to articulate the interplay with stride and receptive field growth across layers.
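The padding, stride, and receptive-field interplay reduces to a little arithmetic. The sketch below assumes square inputs, no dilation, and the TensorFlow/Keras-style convention that "same" padding yields ceil(n / stride) outputs; the helper names are illustrative.

```python
import math

def conv_output_size(n: int, kernel: int, stride: int = 1,
                     padding: str = "valid") -> int:
    """Spatial output size of one conv layer along a single dimension."""
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Assumed convention: pad so that the output is ceil(n / stride).
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding mode: {padding}")

def receptive_field(layers) -> int:
    """Receptive field of the last layer for a list of (kernel, stride) pairs."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

print(conv_output_size(32, 3, padding="valid"))            # 30
print(conv_output_size(32, 3, padding="same"))             # 32
print(conv_output_size(32, 3, stride=2, padding="same"))   # 16
print(receptive_field([(3, 1), (3, 1), (3, 1)]))           # 7
print(receptive_field([(3, 2), (3, 1)]))                   # 7
```

Three stacked 3x3 stride-1 layers see a 7x7 window, and a single stride-2 layer early on reaches the same 7x7 window in two layers, which is why stride (or pooling) accelerates receptive field growth.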
Detection families (survey): two-stage (propose, then refine) vs one-stage (faster). Be ready to discuss the accuracy/latency trade-off, for example on video workloads.
Autoencoders, VAE sketch, GAN positioning
Autoencoder: bottleneck forces compression; reconstruction error flags anomalies.
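The anomaly-flagging idea can be sketched without training anything. The example below is a stand-in, not a trained autoencoder: it uses a hand-set linear encoder/decoder that projects 2-D points onto a fixed unit direction `u` (a 1-D bottleneck), and a threshold taken from a quantile of errors on known-normal data. The direction, data points, and the 0.95 quantile are all illustrative assumptions.

```python
import math

def reconstruct(x, u):
    # Encode: project onto unit direction u (1-D code). Decode: scale u by the code.
    z = sum(a * b for a, b in zip(x, u))
    return [z * b for b in u]

def reconstruction_error(x, u):
    # Squared distance between the input and its reconstruction.
    x_hat = reconstruct(x, u)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def fit_threshold(normal_points, u, quantile=0.95):
    # Choose the threshold from errors observed on normal data only.
    errors = sorted(reconstruction_error(x, u) for x in normal_points)
    idx = min(len(errors) - 1, int(quantile * len(errors)))
    return errors[idx]

def is_anomaly(x, u, threshold):
    return reconstruction_error(x, u) > threshold

# Assumed setup: normal data lies close to the line y = x.
u = [1 / math.sqrt(2), 1 / math.sqrt(2)]
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.05), (-1.0, -0.9)]
threshold = fit_threshold(normal, u)

print(is_anomaly((5.0, 5.0), u, threshold))   # on the line: reconstructs well
print(is_anomaly((2.0, -2.0), u, threshold))  # far off the line: large error
```

A real autoencoder learns the encoder and decoder from data, but the decision rule is the same: inputs the bottleneck cannot represent reconstruct poorly and exceed the threshold.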
VAE: the latent is regularized toward a prior via a KL term, which enables generative sampling; crisp ELBO language is optional at recap depth.
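For the common case of a diagonal Gaussian encoder and a standard normal prior, the KL term has a closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2). The sketch below assumes that setup and a log-variance parameterization; the function names and the optional `eps` argument are illustrative.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, eps=None, rng=None):
    # z = mu + sigma * eps with eps ~ N(0, I); keeps sampling differentiable
    # with respect to mu and log_var in an autodiff framework.
    if eps is None:
        rng = rng or random.Random(0)
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]

def negative_elbo(reconstruction_loss, mu, log_var):
    # Training objective to minimize: reconstruction term plus KL regularizer.
    return reconstruction_loss + kl_to_standard_normal(mu, log_var)

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # posterior equals prior
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # shifted mean is penalized
print(reparameterize([1.0, 2.0], [0.0, 0.0]))
```

The KL term is zero only when the encoder output matches the prior exactly, so it pulls latents toward a region from which sampling z ~ N(0, I) and decoding is meaningful.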
GAN: adversarial minimax game yields sharp samples but unstable training.
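The minimax game can be written as per-sample losses on the discriminator's output probabilities. This is a minimal sketch assuming the standard cross-entropy formulation, where `d_real` and `d_fake` are hypothetical names for D(x) and D(G(z)), each strictly between 0 and 1. The non-saturating generator loss is a widely used alternative, shown here for comparison.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # D maximizes log D(x) + log(1 - D(G(z))); as a loss we minimize the negative.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss_minimax(d_fake: float) -> float:
    # G minimizes log(1 - D(G(z))), the same term D is maximizing.
    return math.log(1.0 - d_fake)

def generator_loss_non_saturating(d_fake: float) -> float:
    # Alternative: G minimizes -log D(G(z)), which gives stronger gradients
    # early in training when D confidently rejects fakes.
    return -math.log(d_fake)

# When D cannot tell real from fake it outputs 0.5 everywhere, and its loss is log 4.
print(discriminator_loss(0.5, 0.5), math.log(4))
print(generator_loss_minimax(0.1), generator_loss_non_saturating(0.1))
```

Because each player's loss depends on the other's current parameters, the two objectives move against each other during training, which is one source of the instability noted above.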
DL limitations: sample complexity, brittle OOD behaviour, infra cost. Pair these with evaluation that goes beyond aggregate accuracy, for example per-slice metrics.
Go deeper on this site
1. Same padding primarily: