Fraud Shield AI
End-to-end fraud detection pipeline for credit card transactions. Covers EDA with timezone-aware feature engineering, scalable PySpark preprocessing, supervised ML, deep learning (MLP/ResNet/LSTM), transformer-based models (FT-Transformer), hybrid ensemble stacking, and a real-time Streamlit inference app with a two-stage risk-tiering system (Auto-Block / Review / Cleared).
Technology Stack
Python
PySpark
Spark MLlib
XGBoost
Optuna
PyTorch
Scikit-learn
Pandas
NumPy
Streamlit
Jupyter
Matplotlib
Seaborn
Key Results
Best Single Model: XGBoost (Optuna), Test F1 0.607, Test ROC-AUC 0.988
Best Ensemble: Soft Voting / Weighted Ensemble, Test F1 0.639, Test PR-AUC 0.651
Auto-Block Precision: ~0.82 at prob ≥ 0.90
Review Queue: ~0.5% of transactions flagged for human analyst review
Models Trained: Logistic Regression, Random Forest, XGBoost, MLP, ResNet, LSTM, FT-Transformer, SA-MLP, 3 ensemble variants
Dataset: 1M+ credit card transactions, 30 leak-free engineered features
Sampling Finding: All SMOTE variants degraded Test F1 relative to the no-resampling baseline (best SMOTE: 0.265 vs baseline: 0.290)
Deployment: Streamlit app with CSV upload → real-time two-stage risk prediction
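As a rough illustration of the scoring path behind these results, the sketch below averages per-model fraud probabilities (soft voting) and maps the combined score to a risk tier. It is a minimal sketch, not the project's actual code: the function names `soft_vote` and `risk_tier`, the example probabilities, and the equal default weights are all assumptions. The 0.90 Auto-Block cutoff comes from the results above, the 0.14 Review lower bound is the one quoted under Challenges & Solutions, and treating both bounds as inclusive is also an assumption.

```python
# Hypothetical sketch: weighted soft voting followed by two-stage risk tiering.
AUTO_BLOCK_THRESHOLD = 0.90  # cutoff quoted on this page
REVIEW_THRESHOLD = 0.14      # lower bound of the review band quoted on this page


def soft_vote(probs, weights=None):
    """Return the weighted mean of per-model fraud probabilities.

    probs   -- dict mapping model name to probability in [0, 1]
    weights -- optional dict mapping model name to a non-negative weight;
               equal weights are assumed when omitted
    """
    if weights is None:
        weights = {name: 1.0 for name in probs}
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * p for name, p in probs.items()) / total


def risk_tier(prob):
    """Map a fraud probability to Auto-Block, Review, or Cleared."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if prob >= AUTO_BLOCK_THRESHOLD:
        return "Auto-Block"
    if prob >= REVIEW_THRESHOLD:
        return "Review"
    return "Cleared"


# Made-up probabilities for a single transaction, one per model.
model_probs = {"xgboost": 0.96, "mlp": 0.88, "ft_transformer": 0.92}
score = soft_vote(model_probs)  # equal-weight mean of the three values
tier = risk_tier(score)
```

Keeping the two steps separate means the thresholds can be recalibrated without touching the ensemble, and the ensemble weights can be retuned without changing how tiers are assigned.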
Challenges & Solutions
- Handling extreme class imbalance (0.58% fraud rate): every SMOTE variant scored below the no-resampling baseline (best SMOTE Test F1 0.265 vs baseline 0.290)
- Preventing temporal data leakage: time-aware train/val split and point-in-time backward velocity features
- Timezone-aware feature engineering: resolving merchant local time from lat/lon coordinates across global locations
- Scalable preprocessing with PySpark for a 1M+ transaction dataset with card-level velocity windows
- Temporal distribution shift: the test fraud rate dropped to 0.39% from 0.58% in training, causing a val→test F1 gap (~0.81 → 0.64)
- Combining XGBoost, MLP, and FT-Transformer in a soft-voting ensemble for a ~5% relative lift in Test F1 over the best single model (0.607 → 0.639)
- Two-stage risk tiering: calibrating Auto-Block (≥0.90) and Review Queue (0.14–0.90) thresholds for operational precision targets
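The point-in-time backward velocity features mentioned in the list above can be illustrated with a small, dependency-free sketch. It counts, for each transaction, how many earlier transactions on the same card fall inside a trailing time window, using only strictly earlier timestamps so that no future information leaks into the feature. The function name `backward_velocity`, the `(card_id, unix_timestamp)` tuple layout, and the one-hour window are assumptions made for illustration; the project itself computes these windows in PySpark.

```python
from collections import defaultdict, deque


def backward_velocity(transactions, window_seconds):
    """Count prior same-card transactions in [ts - window_seconds, ts).

    transactions -- list of (card_id, unix_timestamp) tuples, in any order
    Returns a list of counts aligned with the input order.
    """
    # Visit transactions in time order while remembering original positions.
    order = sorted(range(len(transactions)), key=lambda i: transactions[i][1])
    history = defaultdict(deque)  # card_id -> timestamps still inside the window
    counts = [0] * len(transactions)

    for i in order:
        card, ts = transactions[i]
        past = history[card]
        # Drop timestamps that have fallen out of the trailing window.
        while past and past[0] < ts - window_seconds:
            past.popleft()
        # Exclude same-timestamp entries so only strictly earlier events count.
        n = len(past)
        for prev in reversed(past):
            if prev < ts:
                break
            n -= 1
        counts[i] = n
        past.append(ts)
    return counts


# Made-up example: card "A" has five transactions, card "B" has one.
txns = [("A", 0), ("A", 600), ("B", 700), ("A", 3000), ("A", 4000), ("A", 4000)]
counts = backward_velocity(txns, window_seconds=3600)
```

Because each feature value depends only on events before the transaction's own timestamp, the same logic can be applied unchanged on either side of a time-aware train/validation split.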
Project Stats
Team Members: 1
Duration: 2 months