ChurnGuard ML
A predictive analytics engine that identifies at-risk clients with high precision, enabling proactive retention strategies that resulted in a 65% reduction in client churn.
Overview
Developed an end-to-end machine learning pipeline to address rising attrition rates. The system processes historical usage patterns, support ticket frequency, and contract telemetry to generate a daily 'Risk Score' for every client, allowing the success team to intervene before a cancellation occurs.
Problem
The company was reacting to churn only after a cancellation notice was filed. There was no quantitative method to identify 'silent' churn—users who were still paying but had stopped using the product—leading to a consistent month-over-month loss in MRR.
Constraints
- Model must achieve high recall to ensure no at-risk clients are missed
- Requires explainable AI (XAI) so sales reps understand 'why' a client is at risk
- Must integrate with existing CRM (Salesforce) via automated API triggers
- Predictions must be updated every 24 hours based on the latest telemetry
Approach
Engineered a Random Forest classification model using Scikit-learn, trained on two years of anonymized user behavior data. I implemented custom feature engineering to capture 'velocity' metrics (e.g., the rate of decline in login frequency) rather than just static totals. To solve the 'black box' problem, I used SHAP values to provide reps with the top three reasons for every high-risk score.
Key Decisions
SHAP (SHapley Additive exPlanations) Integration
A probability score alone isn't actionable for a CSM. By calculating SHAP values, the CRM dashboard displays specific triggers like 'Decreased API usage' or 'Unresolved high-priority tickets,' giving the team a specific script for their outreach.
Alternatives considered and rejected:
- Standard logistic regression (better interpretability, but lower predictive power)
- Deep learning / neural networks (high accuracy, but effectively impossible to explain to non-technical staff)
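Given a row of SHAP values for one client (assumed precomputed, e.g. by a tree explainer over the trained model), surfacing the "top three reasons" reduces to ranking contributions. The feature names and values below are hypothetical:

```python
import numpy as np

# Hypothetical feature names; in production these come from the training pipeline.
feature_names = [
    "login_velocity_30d", "api_usage_trend", "open_p1_tickets",
    "seats_active_pct", "days_to_renewal",
]

# One client's SHAP values (assumed precomputed). Positive values push the
# risk score up; negative values push it down.
shap_row = np.array([0.18, 0.11, 0.07, -0.05, 0.02])

# Rank by contribution toward churn risk and keep the top three drivers.
top_idx = np.argsort(shap_row)[::-1][:3]
reasons = [(feature_names[i], round(float(shap_row[i]), 2)) for i in top_idx]
print(reasons)
# → [('login_velocity_30d', 0.18), ('api_usage_trend', 0.11), ('open_p1_tickets', 0.07)]
```

These (feature, contribution) pairs are what get mapped to human-readable triggers like 'Decreased API usage' in the CRM dashboard.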
SMOTE for Class Imbalance
Since churned clients represented only 5% of the total dataset, the model was originally biased toward 'non-churn.' I used Synthetic Minority Over-sampling Technique (SMOTE) to balance the training set, significantly improving the model's sensitivity to at-risk behavior.
Alternatives considered and rejected:
- Random undersampling (discarded too much valuable majority-class data)
- Adjusting class weights (less effective than synthetic generation for this dataset's density)
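To illustrate what SMOTE actually does, here is a simplified NumPy sketch of its core idea: synthesize new minority-class points by interpolating between a real minority sample and one of its nearest minority-class neighbors. This is a pedagogical sketch, not the library implementation used in the project:

```python
import numpy as np

def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority-class neighbors (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # k nearest neighbors within the minority class (index 0 is x itself).
        dists = np.linalg.norm(X_min - x, axis=1)
        neighbors = np.argsort(dists)[1 : k + 1]
        x_nn = X_min[rng.choice(neighbors)]
        gap = rng.random()  # random point on the segment between x and x_nn
        synthetic.append(x + gap * (x_nn - x))
    return np.array(synthetic)

# Minority ('churned') class was ~5% of the data; oversample it for training only.
rng = np.random.default_rng(1)
X_churn = rng.normal(loc=2.0, size=(50, 4))
X_new = smote_sketch(X_churn, n_synthetic=200)
print(X_new.shape)  # → (200, 4)
```

Note that oversampling must happen inside the training split only; applying it before the train/test split leaks synthetic copies of test-set neighbors into training.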
Tech Stack
- Python (Scikit-learn / Pandas)
- SQL (BigQuery)
- SHAP
- Airflow
- FastAPI
- Docker
Result & Impact
- Churn Reduction: 65% decrease in annual attrition
- Model Accuracy: 92% precision / 88% recall
- Revenue Saved: estimated $1.2M in retained ARR
The project shifted the entire company culture from reactive to proactive. The 'Risk Score' became a primary KPI for the Customer Success team, and the automated alerts allowed them to save accounts that would have otherwise been lost to competitors.
Learnings
- Feature engineering (how data is prepared) is more impactful than model tuning for behavioral prediction.
- Explainability is the key to stakeholder adoption; if the sales team doesn't trust the 'why,' they won't use the tool.
- Data leakage is a significant risk in churn modeling; you must be extremely careful not to include features from 'the future' relative to the prediction point.
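The data-leakage point above is concrete in code: every feature must be built only from events strictly before the prediction date. A minimal pandas sketch with hypothetical data:

```python
import pandas as pd

events = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "event_date": pd.to_datetime(
        ["2024-01-05", "2024-02-20", "2024-03-10", "2024-01-15", "2024-03-01"]
    ),
    "logins": [40, 22, 3, 55, 60],
})

# Predicting as of March 1st: only events strictly before the cutoff may
# become features, otherwise the model 'sees the future'.
prediction_date = pd.Timestamp("2024-03-01")
history = events[events["event_date"] < prediction_date]

features = history.groupby("client_id")["logins"].sum()
print(features.to_dict())  # → {1: 62, 2: 55}
```

Client 1's March events (which may already reflect the churn being predicted) are excluded from the feature window.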
Additional Context
The most critical part of the project was the Feature Velocity logic. I realized that a client who has 100 logins a month might look healthy, but if they had 500 logins the previous month, they are actually a high churn risk. By calculating the percentage change in activity over 7, 30, and 90-day windows, the model was able to catch the “downward trend” long before the user actually churned.
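The velocity logic described above can be sketched with pandas rolling windows. The data and window arithmetic here are illustrative, assuming one row of login counts per client per day:

```python
import pandas as pd

# Daily login counts for one client: healthy, then declining (hypothetical data).
logins = pd.Series(
    [20] * 30 + [12] * 30 + [4] * 30,
    index=pd.date_range("2024-01-01", periods=90, freq="D"),
)

def window_velocity(series, days):
    """Percentage change of total activity in the trailing window
    versus the window immediately before it."""
    total = series.rolling(f"{days}D").sum()
    return total / total.shift(days) - 1

velocity_7d = window_velocity(logins, 7)
velocity_30d = window_velocity(logins, 30)

# Raw totals still look 'alive', but the 30-day velocity is sharply
# negative — the downward trend the model keys on.
print(round(float(velocity_30d.iloc[-1]), 2))  # → -0.67
```

The same computation over 7-, 30-, and 90-day windows yields the three velocity features per activity metric.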
The deployment was handled via a FastAPI wrapper inside a Docker container, orchestrated by Airflow. Every morning at 4:00 AM, the pipeline pulls the latest data from BigQuery, runs the inference, and pushes the updated scores and SHAP explanations directly into the CRM.
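The daily run reduces to three ordered steps: pull, score, push. The sketch below shows that control flow with hypothetical stand-in functions; the real integrations (BigQuery client, FastAPI inference endpoint, Salesforce API) and the 4:00 AM scheduling live in the Airflow DAG:

```python
from datetime import datetime

def pull_latest_telemetry():
    # Stand-in for the BigQuery pull.
    return [{"client_id": 1, "login_velocity_30d": -0.6}]

def score_clients(rows):
    # Stand-in for the FastAPI inference call; values are illustrative.
    return [{**r, "risk_score": 0.87, "top_reasons": ["login_velocity_30d"]} for r in rows]

def push_to_crm(scored):
    # Stand-in for the Salesforce API triggers; returns rows pushed.
    return len(scored)

def run_daily_pipeline(run_date):
    rows = pull_latest_telemetry()
    scored = score_clients(rows)
    pushed = push_to_crm(scored)
    print(f"{run_date:%Y-%m-%d}: pushed {pushed} risk score(s) to CRM")
    return pushed

run_daily_pipeline(datetime(2024, 3, 30))
```

Keeping each step a pure function makes the pipeline easy to wrap as individual Airflow tasks and to retry independently on failure.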