Fraud Shield AI

End-to-end fraud detection pipeline for credit card transactions. Covers EDA with timezone-aware feature engineering, scalable PySpark preprocessing, supervised ML, deep learning (MLP/ResNet/LSTM), transformer-based models (FT-Transformer), hybrid ensemble stacking, and a real-time Streamlit inference app with a two-stage risk-tiering system (Auto-Block / Review / Cleared).

Machine Learning, Completed, 2 months

Technology Stack

Python, PySpark, Spark MLlib, XGBoost, Optuna, PyTorch, Scikit-learn, Pandas, NumPy, Streamlit, Jupyter, Matplotlib, Seaborn

Key Results

Best Single Model: XGBoost (Optuna), Test F1 0.607, Test ROC-AUC 0.988
Best Ensemble: Soft Voting / Weighted Ensemble, Test F1 0.639, Test PR-AUC 0.651
Auto-Block Precision: ~0.82 at prob ≥ 0.90
Review Queue: ~0.5% of transactions flagged for human analyst review
Models Trained: Logistic Regression, Random Forest, XGBoost, MLP, ResNet, LSTM, FT-Transformer, SA-MLP, 3× ensemble variants
Dataset: 1M+ credit card transactions, 30 leak-free engineered features
Sampling Finding: All SMOTE variants degraded Test F1 relative to the no-resampling baseline (best SMOTE: 0.265 vs. baseline 0.290)
Deployment: Streamlit app (CSV upload → real-time two-stage risk prediction)
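
The ensemble and risk-tiering results above can be illustrated with a minimal sketch. It assumes each model outputs a fraud probability per transaction, blends them with a weighted average (soft voting), and maps the blended score to the three tiers using the 0.90 and 0.14 thresholds stated under Challenges & Solutions. The function names, the equal default weights, and the inclusive lower bound at 0.14 are assumptions for illustration, not the project's actual code.

```python
import numpy as np

# Thresholds come from the project description; all names here are hypothetical.
AUTO_BLOCK_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.14


def soft_vote(probabilities, weights=None):
    """Blend per-model fraud probabilities with a weighted average.

    probabilities: array-like of shape (n_models, n_transactions).
    weights: optional array-like of shape (n_models,); equal weights if omitted.
    """
    probs = np.asarray(probabilities, dtype=float)
    return np.average(probs, axis=0, weights=weights)


def assign_tier(fraud_probability):
    """Map fraud probabilities to Auto-Block / Review / Cleared."""
    p = np.asarray(fraud_probability, dtype=float)
    return np.where(
        p >= AUTO_BLOCK_THRESHOLD,
        "Auto-Block",
        np.where(p >= REVIEW_THRESHOLD, "Review", "Cleared"),
    )


# Three hypothetical models scoring three transactions.
model_probs = [
    [0.95, 0.20, 0.01],
    [0.97, 0.10, 0.02],
    [0.90, 0.30, 0.03],
]
blended = soft_vote(model_probs)
print(assign_tier(blended).tolist())  # ['Auto-Block', 'Review', 'Cleared']
```

Keeping the thresholds as named constants makes it straightforward to re-tune them against a precision target without touching the scoring logic.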

Challenges & Solutions

  • Handling extreme class imbalance (0.58% fraud rate): every SMOTE variant degraded performance, and the no-resampling baseline outperformed all synthetic sampling methods
  • Preventing temporal data leakage: time-aware train/val split and point-in-time backward velocity features
  • Timezone-aware feature engineering: resolving merchant local time from lat/lon coordinates across global locations
  • Scalable preprocessing with PySpark for a 1M+ transaction dataset with card-level velocity windows
  • Temporal distribution shift: the test fraud rate dropped to 0.39% vs. 0.58% in training, causing a val→test F1 gap (~0.81 → 0.64)
  • Combining XGBoost, MLP, and FT-Transformer in a soft-voting ensemble for a ~5% relative Test F1 lift over the best single model (0.607 → 0.639)
  • Two-stage risk tiering: calibrating Auto-Block (≥0.90) and Review Queue (0.14–0.90) thresholds for operational precision targets
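
The leakage-prevention ideas in the list above (point-in-time backward velocity features and a time-aware split) can be sketched as follows. This is a small pandas illustration rather than the project's PySpark implementation, and the column names `card_id`, `ts`, and `amount`, the 24-hour window, and both function names are assumptions. The key detail is `closed="left"`, which restricts each window to transactions strictly before the current one, so a row never sees itself or anything later.

```python
import pandas as pd


def add_backward_velocity(df, window="24h"):
    """Add per-card count and spend over the window strictly before each row.

    Assumes hypothetical columns: card_id, ts (datetime64), amount.
    """
    out = df.sort_values(["card_id", "ts"], kind="stable").reset_index(drop=True)
    rolling = (
        out.set_index("ts")
        .groupby("card_id")["amount"]
        # closed="left" gives the interval [ts - window, ts): past rows only.
        .rolling(window, closed="left")
    )
    # Rows are already ordered by card_id then ts, matching the rolling output.
    # Empty windows (no prior activity) are filled with zero.
    out["prior_txn_count"] = rolling.count().fillna(0).to_numpy().astype(int)
    out["prior_amount_sum"] = rolling.sum().fillna(0.0).to_numpy()
    return out


def time_aware_split(df, cutoff):
    """Train on rows before the cutoff, validate on rows at or after it."""
    cutoff = pd.Timestamp(cutoff)
    return df[df["ts"] < cutoff], df[df["ts"] >= cutoff]
```

For example, calling `add_backward_velocity(transactions)` followed by `time_aware_split(features, "2024-01-02")` would compute the features first and then split chronologically, so validation rows are always later than every training row.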

Project Stats

1 Team Member
2 Months