One of the most important challenges faced by modern enterprises is customer churn management. Preventing customer departures is crucial, especially with rising competition on one side and increasingly demanding customers who expect higher quality at a lower price on the other. Customer migration (churn) affects an ever larger group of industries. So, how can we face this challenge? The answer is: churn analysis with Machine Learning modeling.
Key points:
The cost of acquiring a new customer is usually much higher than the cost of keeping an existing one. That’s why it’s important to monitor and prevent customer migration. It is also worth noting that a long-term relationship with a customer increases their total value in several ways.
In addition to estimating the probability of migration, we can also determine its effects.
A customer churn project is usually divided into three milestones: data analysis, modeling, and data visualization.
The key element of churn analysis is the selection of relevant data. Therefore, the first phase involves identifying the data sources that will help us determine the behavior of our clients or broaden our knowledge about them. It is worth taking into account the client’s transaction history, verifying the data, and carrying out a preliminary statistical analysis to determine potential variables for modeling. During this phase, you must also define the business conditions that determine churn status. For example, the churn variable can be described using the following question: “Has the customer returned to us (used the service again) in the last 12 months?”
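As an illustration, here is a minimal sketch of how such a churn flag could be derived from raw transaction data. The file name and columns (`transactions.csv`, `customer_id`, `order_date`) are hypothetical, not from the original article:

```python
import pandas as pd

# Hypothetical transaction history: one row per customer purchase.
transactions = pd.read_csv("transactions.csv", parse_dates=["order_date"])

# Business rule: a customer has churned if they have not returned
# (made any purchase) within the last 12 months.
cutoff = transactions["order_date"].max() - pd.DateOffset(months=12)

last_purchase = transactions.groupby("customer_id")["order_date"].max()
churn = (last_purchase < cutoff).astype(int).rename("churned")

print(churn.value_counts())
```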
Depending on the number of data sources, this phase may last from three weeks up to three months. This is also when Predica specialists meet with the customer. Talking to people from the business units allows us to gain domain knowledge and factor it into the solution. This is a very important stage, because well-defined business conditions will later translate into the accuracy of the model’s metrics.
Finally, it is worthwhile to build and verify the quality of the simplest predictive model, logistic regression. This creates a benchmark for the modeling stage: the quality of the more advanced, tuned and parameterized methods will be measured against it.
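Here is a minimal benchmark sketch with scikit-learn; the article does not name a library, so that choice, and the synthetic data standing in for the prepared churn dataset, are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared churn dataset:
# 20 candidate variables and a 0/1 churn flag (~20% churners).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.8], random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The simplest predictive model - our quality benchmark.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = baseline.predict_proba(X_test)[:, 1]  # probability of churn
print(f"Logistic regression benchmark AUC: {roc_auc_score(y_test, proba):.3f}")
```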
After the data analysis phase, the team proceeds to pick five models that will be compared to determine the best fit for the business. The range of available possibilities is narrowed down to models whose explained (target) variable is binary, taking values of 0 or 1, because our resulting value will be the probability of churn.
A frequently used target algorithm is the so-called boosted decision tree, which is usually characterized by the best accuracy compared to other algorithms of this type.
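Continuing the benchmark sketch above, a boosted tree can be trained and compared on the same split. Using scikit-learn’s GradientBoostingClassifier here is an assumption; the article does not name a specific implementation:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Reuses X_train, X_test, y_train, y_test from the benchmark sketch above.
gbt = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=42
)
gbt.fit(X_train, y_train)

proba_gbt = gbt.predict_proba(X_test)[:, 1]
print(f"Boosted decision tree AUC: {roc_auc_score(y_test, proba_gbt):.3f}")
```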
When verifying the quality of the model, an obligatory step is building a confusion matrix to tune the parameters of the model. The confusion matrix shows how many times our model was wrong and how many times it was right, compared with the real values.
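For example, continuing the sketch above, the matrix can be computed from the boosted tree’s predictions. The 0.5 threshold below is just an arbitrary starting point to tune:

```python
from sklearn.metrics import confusion_matrix

# Turn churn probabilities into class labels with a tunable threshold.
threshold = 0.5
predicted = (proba_gbt >= threshold).astype(int)

# Rows: actual (stayed / churned), columns: predicted (stayed / churned).
print(confusion_matrix(y_test, predicted))
```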
The length of the modeling phase depends on the number of algorithms taken into account for the comparison. With an approach based on iterative training of the models, it is necessary to assume about one month of modeling for three to five models.
The last stage is the construction of the data visualization layer. The data, enriched in the previous phase with the churn probability value, should be visualized in a way that supports the work of business units such as marketing or sales. Reports like these can be used to build retention-focused marketing campaigns.
A common practice is to show the effectiveness of marketing activities on churn. This is also a way to calculate the ROI of targeted marketing campaigns.
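As a rough illustration of the ROI arithmetic (every figure below is invented for the example):

```python
# All numbers are hypothetical, for illustration only.
customers_targeted = 1000
cost_per_customer = 5.0          # cost of the retention offer
retained = 60                    # would-be churners who stayed
avg_annual_value = 300.0         # yearly revenue per customer

campaign_cost = customers_targeted * cost_per_customer  # 5,000
revenue_saved = retained * avg_annual_value             # 18,000
roi = (revenue_saved - campaign_cost) / campaign_cost
print(f"Campaign ROI: {roi:.0%}")                       # 260%
```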
The duration of the data visualization phase depends on the number of required reports: 1 month for every 10 pages of reporting.
You can use Azure Data Factory to integrate data from different systems. It has a set of ready-made connectors, and it helps you manage the whole pipeline of orchestrating individual transformation activities. An interesting new feature is Wrangling and Mapping Data Flows in Data Factory. These allow you to run data processing and transformations, and to move data from a source to a destination database.
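For instance, a Data Factory pipeline can be triggered programmatically with the azure-mgmt-datafactory Python SDK. All resource names below are hypothetical placeholders, not values from the article:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical subscription, resource group, factory, and pipeline names.
credential = DefaultAzureCredential()
adf = DataFactoryManagementClient(credential, "<subscription-id>")

# Trigger the pipeline that copies and transforms the customer data.
run = adf.pipelines.create_run(
    resource_group_name="churn-rg",
    factory_name="churn-adf",
    pipeline_name="CopyCustomerData",
)
print(f"Started pipeline run: {run.run_id}")
```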
During the data analysis phase, Azure Notebooks or Azure Databricks come in handy. These are interactive environments where you can collaborate with a team of data scientists, data engineers, or business analysts.
As for the modeling layer and model training, our experience shows that Azure Machine Learning Service or Azure Databricks work very well. These environments allow for easy implementation using R or Python.
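As an example, here is a minimal experiment-tracking sketch using the azureml-core (v1) Python SDK. The workspace configuration, experiment name, and logged value are assumptions for illustration:

```python
from azureml.core import Experiment, Workspace

# Assumes a config.json downloaded from the Azure ML workspace.
ws = Workspace.from_config()
experiment = Experiment(workspace=ws, name="customer-churn")

# Track a training run and log the metric used for model comparison.
run = experiment.start_logging()
run.log("auc", 0.85)  # placeholder value; log your real metric here
run.complete()
```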
At Predica we visualize data in Power BI. It works as a great self-service Business Intelligence environment for the recipients of the solution.
The above solution can also be built in an on-premises environment using slightly different components.
Want to learn more about customer churn analysis? Stay tuned: read our blog and follow us on social media. We will soon publish another article in the series on how to use data analysis to understand your customers better. We will also conduct a webinar to explain the topic in more detail and answer your questions.