Churn Risk Intelligence

Production-ready machine learning solution for predicting customer churn with 84.77% ROC-AUC. Features 16 optimized models, SHAP interpretability, Optuna hyperparameter tuning, and comprehensive business insights.

Category: Machine Learning. Status: Completed. Duration: 2 months.

Business Impact

Problem Solved

Acquiring a new customer typically costs 5-25x more than retaining an existing one. This solution ranks at-risk customers with an ROC-AUC of up to 84.77% (XGBoost, Optuna-tuned) and an accuracy of up to 80.62% (Logistic Regression), so retention campaigns can start before customers leave.

Value Delivered

Identifies high-risk customers for targeted retention campaigns, reduces revenue loss from customer attrition, and optimizes marketing budget allocation with precision targeting.

Key Results

84.77% ROC-AUC with XGBoost (Optuna-optimized), 80.62% accuracy with Logistic Regression. 16 models trained and evaluated with comprehensive business insights.

Model Performance

Best ROC-AUC: 84.77% (XGBoost, Optuna)
Best Accuracy: 80.62% (Logistic Regression)
Best Ensemble: 84.71% ROC-AUC (Stacking Classifier)
Total Models: 16 (trained and evaluated)

| Model                  | ROC-AUC | Accuracy | Precision | Recall | Optimization |
|------------------------|---------|----------|-----------|--------|--------------|
| XGBoost (Optuna)       | 84.77%  | 79.58%   | 65.12%    | 71.23% | Optuna       |
| Stacking Classifier    | 84.71%  | 79.45%   | 64.89%    | 72.15% | Ensemble     |
| Logistic Regression    | 83.12%  | 80.62%   | 66.45%    | 68.32% | Standard     |
| Random Forest (Optuna) | 83.89%  | 79.12%   | 66.78%    | 69.45% | Optuna       |
| KNN + SMOTE            | 81.23%  | 75.34%   | 58.92%    | 77.27% | SMOTE        |
| Gradient Boosting      | 83.45%  | 78.67%   | 63.21%    | 70.89% | Standard     |
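
The table reports precision and recall but not F1-Score. Since F1 is the harmonic mean of the two, it can be derived directly from the reported columns. A small sketch using only the figures listed above:

```python
# Precision and recall (as fractions) copied from the model table.
reported = {
    "XGBoost (Optuna)": (0.6512, 0.7123),
    "Stacking Classifier": (0.6489, 0.7215),
    "Logistic Regression": (0.6645, 0.6832),
    "Random Forest (Optuna)": (0.6678, 0.6945),
    "KNN + SMOTE": (0.5892, 0.7727),
    "Gradient Boosting": (0.6321, 0.7089),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_model = {name: f1_from(p, r) for name, (p, r) in reported.items()}

# Print models from highest to lowest derived F1.
for name, f1 in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} F1 = {f1:.4f}")
```

Because the inputs are rounded to two decimals, the derived F1 values are approximate.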

Model Portfolio: 16 machine learning models covering Logistic Regression, KNN, Random Forest, Gradient Boosting, XGBoost, and ensemble methods (Stacking, Voting). Models are trained in both regular and SMOTE variants, and XGBoost and Random Forest are additionally tuned with Optuna.

Technical Architecture

1. Data Preprocessing

Missing value imputation, categorical encoding, feature scaling, and data leakage detection. Modular design with separation of concerns.
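
As an illustration of how imputation, encoding, and scaling can be kept in one modular unit, here is a minimal scikit-learn sketch. The toy frame and the choice of median and most-frequent imputation are assumptions for the example, not the project's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame; column names mimic the Telco dataset but are assumptions.
df = pd.DataFrame({
    "tenure": [1, 34, np.nan, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, np.nan],
    "Contract": ["Month-to-month", "One year", np.nan, "Two year"],
})

numeric_cols = ["tenure", "MonthlyCharges"]
categorical_cols = ["Contract"]

# Numeric branch: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical branch: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Fitting the preprocessor on the training split only, then reusing it on the test split, is one way to keep test-set statistics out of training.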

2. Feature Engineering

One-hot encoding, data leakage checks, and a stratified train-test split that preserves the churn rate in both sets. The pipeline is automated from data loading through model deployment.
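
A stratified split and a simple leakage check could look like the sketch below. The data is synthetic, and the correlation-threshold heuristic (`suspicious_features`, a hypothetical helper) is only one possible leakage test, not necessarily the one used in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.265).astype(int)       # synthetic labels near a 26.5% positive rate
X = rng.normal(size=(n, 3))
X[:, 2] = y + rng.normal(scale=0.01, size=n)  # deliberately leaky column for the demo

# stratify=y keeps the positive rate nearly identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


def suspicious_features(X, y, threshold=0.95):
    """Flag columns whose absolute correlation with the target is implausibly high."""
    flagged = []
    for j in range(X.shape[1]):
        corr = np.corrcoef(X[:, j], y)[0, 1]
        if abs(corr) >= threshold:
            flagged.append(j)
    return flagged


leaky = suspicious_features(X_train, y_train)
print("train rate:", y_train.mean(), "test rate:", y_test.mean(), "leaky columns:", leaky)
```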

3. Model Development

16 models in total, built on Logistic Regression, KNN, Random Forest, Gradient Boosting, and XGBoost, in both regular and SMOTE variants.

4. Hyperparameter Optimization

Optuna integration for Bayesian optimization. Automated tuning for XGBoost and Random Forest with cross-validation for reliable performance estimation.

5. Ensemble Learning

Stacking Classifier with optimized base models and Voting Classifier for robust predictions. Meta-learner optimization for improved performance.
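
The two ensemble styles can be sketched with scikit-learn as follows. The base models, the logistic-regression meta-learner, and the synthetic data are assumptions chosen for brevity; the project's actual base models may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.735, 0.265], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Stacking: a meta-learner is fit on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression(), cv=3
)
# Soft voting: predicted probabilities of the base models are averaged.
vote = VotingClassifier(estimators=base_models, voting="soft")

auc = {}
for name, model in [("stacking", stack), ("voting", vote)]:
    model.fit(X_tr, y_tr)
    auc[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print(auc)
```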

6. Model Interpretability

SHAP values for feature importance analysis and individual prediction explanations. Business-friendly insights generation.
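
To show what a SHAP value represents, the sketch below uses the closed form that holds for linear models under a feature-independence assumption: each feature's contribution is its weight times its deviation from the background mean. This is a hand-computed illustration on synthetic data, not a call to the SHAP library. For tree models such as XGBoost, the library's tree explainer would be the usual choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

background_mean = X.mean(axis=0)
weights = model.coef_[0]

# shap_values[i, j] = w_j * (x_ij - E[x_j]), expressed in log-odds units.
shap_values = (X - background_mean) * weights

# Model output (log-odds) at the background mean: the explanation's baseline.
base_value = model.decision_function(background_mean.reshape(1, -1))[0]

# Global importance: mean absolute contribution per feature.
global_importance = np.abs(shap_values).mean(axis=0)

# Per-customer explanation: contributions for the first row, largest first.
order = np.argsort(-np.abs(shap_values[0]))
for j in order:
    print(f"feature {j}: {shap_values[0, j]:+.3f}")
```

The contributions for each row add up to the difference between that row's log-odds prediction and the baseline, which is the additivity property that makes individual explanations readable.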

7. Comprehensive Evaluation

Multiple metrics: ROC-AUC, PR-AUC, Accuracy, Precision, Recall, F1-Score. Advanced visualizations: ROC curves, Precision-Recall curves, heatmaps, interactive Plotly dashboards.
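
The listed metrics can all be computed with scikit-learn. In this sketch the data is synthetic, the 0.5 decision threshold is an assumption, and average precision is used as a common estimate of PR-AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly a 26.5% positive class.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.735, 0.265], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # scores for threshold-free metrics
pred = (proba >= 0.5).astype(int)        # hard labels at an assumed 0.5 cutoff

metrics = {
    "roc_auc": roc_auc_score(y_te, proba),
    "pr_auc": average_precision_score(y_te, proba),
    "accuracy": accuracy_score(y_te, pred),
    "precision": precision_score(y_te, pred, zero_division=0),
    "recall": recall_score(y_te, pred, zero_division=0),
    "f1": f1_score(y_te, pred, zero_division=0),
}

for name, value in metrics.items():
    print(f"{name:<10} {value:.4f}")
```

ROC-AUC and PR-AUC are computed from scores, while the other four depend on the chosen threshold, so they can shift when the cutoff changes.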

Advanced Features

Optuna Optimization

Bayesian hyperparameter optimization for XGBoost and Random Forest. Automated tuning with cross-validation for reliable performance estimation.

Ensemble Methods

Stacking Classifier with optimized base models achieving 84.71% ROC-AUC. Voting Classifier for robust predictions across multiple scenarios.

SHAP Interpretability

SHAP values provide feature importance analysis and individual prediction explanations. Generate actionable, business-friendly insights.

Interactive Dashboards

Plotly dashboards with ROC curves, Precision-Recall curves, heatmaps, and comprehensive business intelligence reports.

Class Imbalance Handling

SMOTE implementation for balanced training. Models trained on both balanced and imbalanced data for comprehensive comparison.
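
The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The from-scratch NumPy sketch below illustrates that idea on synthetic data. It is not the imbalanced-learn implementation, and `smote_oversample` is a hypothetical helper name.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise squared distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                      # starting points
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                               # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])


rng = np.random.default_rng(1)
X_majority = rng.normal(0.0, 1.0, size=(147, 2))  # 73.5% of 200 rows
X_minority = rng.normal(2.0, 1.0, size=(53, 2))   # 26.5% of 200 rows

X_synth = smote_oversample(X_minority, n_new=len(X_majority) - len(X_minority))
X_balanced = np.vstack([X_majority, X_minority, X_synth])
y_balanced = np.array([0] * len(X_majority) + [1] * (len(X_minority) + len(X_synth)))

print(X_balanced.shape, y_balanced.mean())
```

Oversampling is normally applied to the training split only, so that evaluation still reflects the real class distribution.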

Production-Ready

Automated pipeline with comprehensive error handling, logging, and validation. End-to-end automation from data loading to model deployment.

Key Code Components

Main Pipeline Orchestrator
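
A minimal sketch of how an orchestrator with logging, error handling, and validation might be structured. The function, logger, and step names are hypothetical placeholders and are not the project's actual code.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("churn_pipeline")  # hypothetical logger name

Step = Callable[[dict], dict]


def run_pipeline(steps: list[tuple[str, Step]]) -> dict[str, Any]:
    """Run named steps in order, passing a shared context dict from one to the next."""
    context: dict[str, Any] = {}
    for name, step in steps:
        log.info("starting step: %s", name)
        try:
            context = step(context)
        except Exception:
            # Record which step failed, then let the error propagate.
            log.exception("step failed: %s", name)
            raise
        if not isinstance(context, dict):
            raise TypeError(f"step {name!r} must return a dict")
    return context


# Placeholder steps; real ones would load data, preprocess, train, and evaluate.
def load_data(ctx: dict) -> dict:
    return {**ctx, "rows": 7043}


def preprocess(ctx: dict) -> dict:
    return {**ctx, "preprocessed": True}


def train(ctx: dict) -> dict:
    return {**ctx, "model": "placeholder"}


result = run_pipeline(
    [("load_data", load_data), ("preprocess", preprocess), ("train", train)]
)
print(result)
```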

Model Training Pipeline
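
A training loop over several candidate models might look like the sketch below. The data is synthetic, the hyperparameters are illustrative, and XGBoost is left out to keep the example dependent on scikit-learn only. None of this is the project's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.735, 0.265], random_state=0
)

# Scale-sensitive models get a scaler; tree ensembles do not need one.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} ROC-AUC = {score:.4f}")
```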

Dataset Information

Source: Telco Customer Churn Dataset

Size: 7,043 customers × 21 columns (including the customer ID and the churn target)

Target: Customer churn (Yes/No)

Class Distribution: 26.5% churn rate (imbalanced dataset)
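
The class imbalance matters when reading accuracy figures: a model that always predicts "No" is already right for every non-churner. A short worked calculation from the numbers reported in this document:

```python
n_customers = 7043
churn_rate = 0.265  # as reported above

# Approximate number of churners implied by the rounded churn rate.
approx_churners = round(n_customers * churn_rate)

# Accuracy of a trivial model that always predicts "No".
majority_baseline_accuracy = 1 - churn_rate

best_reported_accuracy = 0.8062  # Logistic Regression, from the results table
lift_over_baseline = best_reported_accuracy - majority_baseline_accuracy

print(f"churners (approx.):       {approx_churners}")
print(f"majority-class baseline:  {majority_baseline_accuracy:.1%}")
print(f"lift of best accuracy:    {lift_over_baseline:.1%}")
```

This is why ROC-AUC, precision, and recall are more informative than accuracy alone on this dataset.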

Feature Categories

Demographics: Gender, senior citizen flag, partner/dependent status

Account Information: Contract type, payment method, tenure, billing preferences

Services: Phone, internet, security, backup, streaming services

Financial: Monthly charges, total charges

Business Recommendations

For Maximum Churn Detection

  • Use KNN + SMOTE model for maximum churn detection (77.27% recall)
  • Implement comprehensive retention campaigns for all flagged customers
  • Focus on customers with fiber optic internet and month-to-month contracts
  • Best for high-value customer businesses where missing churners is costly

For Optimal Performance

  • Use XGBoost (Optuna-optimized) for best overall performance (84.77% ROC-AUC)
  • Leverage SHAP values for understanding key churn drivers
  • Use Stacking Classifier (84.71% ROC-AUC) for robust ensemble predictions
  • Ideal for balanced business needs requiring both precision and recall

For Cost-Conscious Operations

  • Use Logistic Regression model for efficient targeting (80.62% accuracy, 66.45% precision)
  • Use Random Forest (Optuna) for highest precision (66.78%)
  • Prioritize customers with electronic check payments and high monthly charges
  • Develop automated retention workflows for scalability
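
To compare the three recommendations in operational terms, precision and recall can be converted into campaign workload per 1,000 customers. The sketch assumes the scored population has the same 26.5% churn rate as the dataset and that each model runs at the operating point reported in the results table.

```python
churn_rate = 0.265
customers = 1000

# (precision, recall) as reported in the results table.
models = {
    "KNN + SMOTE": (0.5892, 0.7727),
    "XGBoost (Optuna)": (0.6512, 0.7123),
    "Logistic Regression": (0.6645, 0.6832),
}

churners = customers * churn_rate
summary = {}
for name, (precision, recall) in models.items():
    caught = churners * recall    # churners correctly flagged (true positives)
    flagged = caught / precision  # everyone the campaign would contact
    summary[name] = {
        "flagged": round(flagged),
        "caught": round(caught),
        "missed": round(churners - caught),
        "false_alarms": round(flagged - caught),
    }

for name, row in summary.items():
    print(name, row)
```

Higher recall catches more churners but at the price of contacting more customers who would have stayed, which is the trade-off behind the three scenarios.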

Key Risk Factors to Monitor

Contract Type: Month-to-month contracts show highest churn rates

Internet Service: Fiber optic users demonstrate elevated churn risk

Payment Method: Electronic check payments correlate with increased churn

Tenure: New customers (< 6 months) require attention
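
Risk factors like these can be monitored with simple grouped churn rates. The rows below are made up purely to show the mechanics and are not taken from the real dataset. The column names follow the public Telco dataset but should be treated as assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Contract": [
        "Month-to-month", "Month-to-month", "One year",
        "Two year", "Month-to-month", "Two year",
    ],
    "tenure": [2, 14, 30, 60, 4, 48],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

df["churned"] = (df["Churn"] == "Yes").astype(int)
df["new_customer"] = df["tenure"] < 6  # the "< 6 months" segment

# Share of churners within each segment, highest first.
churn_by_contract = (
    df.groupby("Contract")["churned"].mean().sort_values(ascending=False)
)
churn_by_tenure = df.groupby("new_customer")["churned"].mean()

print(churn_by_contract)
print(churn_by_tenure)
```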

Technology Stack

Python, Scikit-learn, XGBoost, Optuna, SHAP, Pandas, NumPy, Matplotlib, Seaborn, Plotly, imbalanced-learn, Jupyter, Joblib