Churn Risk Intelligence

Production-ready machine learning solution for predicting customer churn, reaching 84.77% ROC-AUC with an Optuna-tuned XGBoost model. Features 16 trained models, SHAP interpretability, Optuna hyperparameter tuning, and business-oriented insights.

Machine Learning | Completed | 2 months


Business Impact

Problem Solved

Customer acquisition costs are 5-25x higher than retention costs. This solution identifies at-risk customers with up to 84.77% ROC-AUC (XGBoost) and up to 80.62% accuracy (Logistic Regression), enabling proactive retention campaigns.

Value Delivered

Flags high-risk customers for targeted retention campaigns, helping reduce revenue lost to attrition and concentrating the marketing budget on the customers most likely to churn.

Key Results

Best ROC-AUC of 84.77% from XGBoost (Optuna-optimized) and best accuracy of 80.62% from Logistic Regression. 16 models were trained and evaluated, and the results were translated into business recommendations.

Model Performance

Best ROC-AUC: 84.77% (XGBoost, Optuna-optimized)
Best Accuracy: 80.62% (Logistic Regression)
Best Ensemble: 84.71% ROC-AUC (Stacking Classifier)
Total Models: 16 trained and evaluated

Selected results (6 of the 16 models):
Model	ROC-AUC	Accuracy	Precision	Recall	Variant
XGBoost (Optuna)	84.77%	79.58%	65.12%	71.23%	Optuna
Stacking Classifier	84.71%	79.45%	64.89%	72.15%	Ensemble
Logistic Regression	83.12%	80.62%	66.45%	68.32%	Standard
Random Forest (Optuna)	83.89%	79.12%	66.78%	69.45%	Optuna
KNN + SMOTE	81.23%	75.34%	58.92%	77.27%	SMOTE
Gradient Boosting	83.45%	78.67%	63.21%	70.89%	Standard

Model Portfolio: 16 machine learning models, including Logistic Regression, KNN, Random Forest, Gradient Boosting, XGBoost, and ensemble methods (Stacking, Voting). Base models were trained in both regular and SMOTE-balanced variants, and XGBoost and Random Forest were additionally tuned with Optuna.

Technical Architecture

Data Preprocessing

Missing value imputation, categorical encoding, feature scaling, and data leakage detection. Modular design with separation of concerns.
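A minimal sketch of these preprocessing steps with scikit-learn's `ColumnTransformer`. It assumes, as in the public Telco CSV, that `TotalCharges` arrives as text with occasional blank entries. The five rows are toy values, and the column choice, median imputation, and scaling strategy are illustrative assumptions rather than the project's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows using Telco-style column names (illustrative values, not real data).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 0],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", " "],  # blank entry
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year", "Two year"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Mailed check",
                      "Bank transfer (automatic)", "Electronic check"],
})

# Assumption: TotalCharges is stored as text, so blanks are coerced to NaN for imputation.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

numeric = ["tenure", "MonthlyCharges", "TotalCharges"]
categorical = ["Contract", "PaymentMethod"]

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill missing values with the median, then standardize.
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        # Categorical columns: one-hot encode, ignoring unseen categories at predict time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# In a real pipeline this is fit on the training split only, to avoid leakage.
X = preprocess.fit_transform(df)
print(X.shape, np.isnan(X).any())
```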

Feature Engineering

One-hot encoding, data leakage detection, and a stratified train-test split that keeps the churn rate consistent across both sets. Automated pipeline from data loading to model deployment.
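Stratification matters here because the target is imbalanced. With `stratify=y`, scikit-learn's `train_test_split` keeps the churn proportion nearly identical in the training and test sets. The sketch uses synthetic labels drawn at the dataset's 26.5% churn rate. The 80/20 split and random seeds are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 7043
# Synthetic labels at roughly the dataset's 26.5% churn rate (not the real labels).
y = (rng.random(n) < 0.265).astype(int)
X = rng.normal(size=(n, 5))  # placeholder features

# stratify=y preserves the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(f"overall {y.mean():.3f}  train {y_train.mean():.3f}  test {y_test.mean():.3f}")
```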

Model Development

16 machine learning models built on Logistic Regression, KNN, Random Forest, Gradient Boosting, and XGBoost, with both regular and SMOTE-balanced variants.
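A minimal sketch of training several base models side by side and ranking them by held-out ROC-AUC. It runs on synthetic data from `make_classification`, weighted to about 26.5% positives, and all hyperparameters are assumptions. XGBoost is left out to keep dependencies small. An `xgboost.XGBClassifier` could be added to the dictionary the same way, since it follows the scikit-learn estimator interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (~26.5% positives).
X, y = make_classification(n_samples=2000, n_features=15, n_informative=6,
                           weights=[0.735], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Distance- and gradient-based models get scaling; tree models do not need it.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from the predicted probability of the positive (churn) class.
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} ROC-AUC {auc:.3f}")
```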

Hyperparameter Optimization

Optuna integration for Bayesian optimization. Automated tuning for XGBoost and Random Forest with cross-validation for reliable performance estimation.

Ensemble Learning

Stacking Classifier that combines the optimized base models through a tuned meta-learner, plus a Voting Classifier that aggregates base-model predictions for more robust results.
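A sketch of both ensemble types using scikit-learn's `StackingClassifier` and `VotingClassifier`. In stacking, the meta-learner (here a logistic regression, which is an assumption) is trained on out-of-fold base-model probabilities. Soft voting averages the predicted probabilities. The base models, their settings, and the data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1500, n_features=15, n_informative=6,
                           weights=[0.735], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ("gb", GradientBoostingClassifier(random_state=1)),
]

# Stacking: meta-learner fit on 5-fold out-of-fold probabilities of the base models.
stacking = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000),
                              stack_method="predict_proba", cv=5)
# Soft voting: average of base-model predicted probabilities.
voting = VotingClassifier(estimators=base_models, voting="soft")

results = {}
for name, model in [("stacking", stacking), ("voting", voting)]:
    model.fit(X_train, y_train)  # both ensembles clone the base models internally
    results[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(results)
```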

Model Interpretability

SHAP values for global feature importance and explanations of individual predictions, translated into business-friendly insights.

Comprehensive Evaluation

Multiple metrics: ROC-AUC, PR-AUC, Accuracy, Precision, Recall, F1-Score. Advanced visualizations: ROC curves, Precision-Recall curves, heatmaps, interactive Plotly dashboards.
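A small helper that computes the listed metrics with `sklearn.metrics`. ROC-AUC and PR-AUC are threshold-free ranking metrics. Accuracy, precision, recall, and F1 depend on the probability cut-off, which defaults to 0.5 here as an assumption. PR-AUC is estimated with `average_precision_score`, a common choice. The function name `evaluate` is hypothetical.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_prob, threshold=0.5):
    """Ranking metrics from probabilities, plus label metrics at a chosen cut-off."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, y_prob),
        "pr_auc": average_precision_score(y_true, y_prob),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

metrics = evaluate([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.9])
print(metrics)
# roc_auc 0.75, pr_auc ~0.833, accuracy/precision/recall/f1 all 0.5
```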

Advanced Features

Optuna Optimization

Bayesian hyperparameter optimization for XGBoost and Random Forest. Automated tuning with cross-validation for reliable performance estimation.

Ensemble Methods

Stacking Classifier with optimized base models achieving 84.71% ROC-AUC. Voting Classifier for robust predictions across multiple scenarios.

SHAP Interpretability

SHAP values provide feature importance analysis and individual prediction explanations, which are used to generate actionable, business-friendly insights.

Interactive Dashboards

Plotly dashboards with ROC curves, Precision-Recall curves, heatmaps, and comprehensive business intelligence reports.

Class Imbalance Handling

SMOTE implementation for balanced training. Models trained on both balanced and imbalanced data for comprehensive comparison.

Production-Ready

Automated pipeline with comprehensive error handling, logging, and validation. End-to-end automation from data loading to model deployment.
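A minimal sketch of one production-style step with input validation, logging, and model persistence via `joblib` (listed in the stack). The function `train_and_save`, the logger name, and the file name are hypothetical. The point is that the whole fitted pipeline, including preprocessing, is saved as one artifact and reloaded for scoring.

```python
import logging
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("churn_pipeline")  # hypothetical logger name

def train_and_save(X, y, out_path: Path):
    """Hypothetical step: validate inputs, fit, and persist the fitted pipeline."""
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)}")
    logger.info("Training on %d rows", len(X))
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    joblib.dump(model, out_path)  # saves scaler and classifier together
    logger.info("Saved model to %s", out_path)
    return model

X, y = make_classification(n_samples=500, random_state=4)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "churn_model.joblib"
    model = train_and_save(X, y, path)
    reloaded = joblib.load(path)
    same = bool((reloaded.predict(X) == model.predict(X)).all())
print(same)
```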

Key Code Components: Main Pipeline Orchestrator, Model Training Pipeline

Dataset Information

Source

Telco Customer Churn Dataset

Size

7,043 customers × 21 columns (19 features plus the customer ID and the churn target)

Target

Customer churn (Yes/No)

Class Distribution

26.5% churn rate (imbalanced dataset)

Feature Categories

Demographics

Gender, senior-citizen status, partner/dependent status

Account Information

Contract type, payment method, tenure, billing preferences

Services

Phone, internet, security, backup, streaming services

Financial

Monthly charges, total charges

Business Recommendations

For Maximum Churn Detection

Use the KNN + SMOTE model for maximum churn detection (77.27% recall), accepting lower precision (58.92%)
Run retention campaigns for all flagged customers
Focus on customers with fiber optic internet and month-to-month contracts
Best suited to businesses with high-value customers, where missing a churner is costly

For Optimal Performance

Use XGBoost (Optuna-optimized) for best overall performance (84.77% ROC-AUC)
Leverage SHAP values for understanding key churn drivers
Use Stacking Classifier (84.71% ROC-AUC) for robust ensemble predictions
Ideal for balanced business needs requiring both precision and recall

For Cost-Conscious Operations

Use Logistic Regression model for efficient targeting (80.62% accuracy, 66.45% precision)
Use Random Forest (Optuna) for highest precision (66.78%)
Prioritize customers with electronic check payments and high monthly charges
Develop automated retention workflows for scalability

Key Risk Factors to Monitor

Contract Type

Month-to-month contracts show the highest churn rates

Internet Service

Fiber optic users demonstrate elevated churn risk

Payment Method

Electronic check payments correlate with increased churn

Tenure

New customers (< 6 months) require attention
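Risk factors like these are usually checked with simple segment churn rates. A minimal pandas sketch follows. The rows are toy values, not the real dataset, and the same `groupby` pattern applies to `InternetService` and `PaymentMethod`. The `< 6 months` cut-off comes from the text above.

```python
import pandas as pd

# Toy rows with Telco-style columns (illustrative only).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "tenure": [2, 5, 30, 60, 12, 8, 40, 1],
    "Churn": ["Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes"],
})
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Churn rate per contract type, highest first.
by_contract = df.groupby("Contract")["churned"].mean().sort_values(ascending=False)

# Churn rate for new (< 6 months) vs established customers.
df["new_customer"] = df["tenure"] < 6
by_tenure = df.groupby("new_customer")["churned"].mean()

print(by_contract, by_tenure, sep="\n\n")
# Toy result: Month-to-month 0.75, One year 0.5, Two year 0.0; new 1.0 vs established 0.2
```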

Technology Stack

Python, Scikit-learn, XGBoost, Optuna, SHAP, Pandas, NumPy, Matplotlib, Seaborn, Plotly, imbalanced-learn, Jupyter, Joblib