Dissertation Proposal Template (1500 words)
Programme: MSc Computer Science
Student Name: CHANUKYA CHOWDARY BUNGATAVULA
Student ID: 2091199
Title of the dissertation: Build a Machine Learning Model to Predict Churning Customers
Table of Contents
3. Expected Practical Element Output
5. Prerequisite Knowledge/Skills Required
Telecom companies generate huge volumes of data at a very fast rate. Over the past few years, the telecommunications industry has undergone several changes, including new technologies, new services, and the liberalization of market competition. Customer churn causes large losses for telecommunication service providers and is considered a significant issue across the industry (Huang et al., 2012).
This thesis focuses on detecting whether or not a customer churns, using features such as total day charge and international plan. Various approaches will also be used to predict the circumstances in which churn is likely to occur in the future. Recent breakthroughs in big data have made many data mining and machine learning techniques available, and these can be used to analyze large amounts of data (Ullah et al., 2019).
Aim: The project aims to build a machine learning model that predicts whether or not a customer will churn from their existing service provider.
The studies reviewed here have used various methods, including data mining and machine learning, to predict churn. These methods help businesses to identify, predict, and retain customers who are at risk of leaving, and they also support customer relationship management (CRM) and decision-making.
In their study in South Africa, Esselaar et al. (2016) documented that a wide range of contract and prepaid options is available to customers on all mobile networks. From these prices alone, it is impossible for the average consumer to work out which plan is the most cost-effective. As a result, customers end up choosing plans that cause them financial losses.
Rajeswari and Ravilochanan (2014) state that the pre-paid market experiences a much higher churn rate than the post-paid market, and that the churn rate remains high despite massive retention campaigns and advertising by service providers. The primary goals of their research were to examine customer satisfaction and the causes of churn, and the factors they identified can be used to slow, or even reverse, customer churn. It was a descriptive study that followed a survey approach using a structured questionnaire.
Ahmad et al. (2019) state that one of the most important contributions of their work is a churn prediction model that can help telecom operators identify the customers who are most likely to churn. The model applies machine learning on a big data platform and introduces a novel approach to feature engineering and selection. Four algorithms were tested: Extreme Gradient Boosting (XGBoost), Gradient Boosted Machine Tree (GBM), Random Forest, and Decision Tree. The authors concluded that XGBoost produced the best results, so this method was used for classification in the final churn prediction model.
Using classification and clustering approaches, Ullah et al. (2019) propose a churn prediction model for the telecommunications industry that identifies churning customers as well as the factors that contribute to their churn. Feature selection is performed using correlation attribute ranking and an information gain filter. On the churn data, the Random Forest (RF) algorithm performed well, correctly classifying 88.63% of the instances. Metrics including accuracy, precision, recall, F-measure, and the area under the ROC (receiver operating characteristic) curve are used to evaluate the performance of the churn prediction model.
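The two filter methods named above can be illustrated with a minimal NumPy sketch. It is not Ullah et al.'s implementation: information gain is computed as H(Y) - H(Y | X) for a discrete feature, and correlation ranking as the absolute Pearson correlation between each feature and the 0/1 churn label. The function names are hypothetical, and numeric features such as total day minutes would need to be discretized before the information gain function is applied to them.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy, in bits, of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels):
    """Information gain H(Y) - H(Y | X) of a discrete feature X about labels Y."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    conditional = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each subset's entropy by the fraction of samples in it.
        conditional += mask.mean() * entropy(labels[mask])
    return float(entropy(labels) - conditional)


def correlation_rank(X, y):
    """Rank columns of X by absolute Pearson correlation with y, highest first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]
    return [(int(j), float(scores[j])) for j in order]


if __name__ == "__main__":
    churn = np.array([1, 1, 0, 0, 0, 0, 1, 0])  # toy labels
    intl = np.array([1, 1, 0, 0, 0, 0, 1, 1])   # made-up binary feature
    print(round(information_gain(intl, churn), 3))  # 0.549
    print(correlation_rank(np.column_stack([intl, churn]), churn))
```

With these toy values, the made-up feature yields about 0.549 bits of information gain, out of a maximum of about 0.954 bits (the entropy of the labels themselves). Higher-scoring features would be kept and lower-scoring ones dropped.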
Claude et al. (2019) developed an algorithm that uses historical and real-time data to forecast whether or not customers will churn and to track their churn and non-churn status. The proposed model can help any business identify clients who might leave the service provider, allowing it to make better decisions in the present environment and to take suitable action to retain those customers. The approach has the following stages: collection of call data records (CDRs), conversion of the call data records into text data through an API, loading of the data into a data warehouse via ETL (extract, transform, load), and real-time tracking and monitoring of customer churn.
Furthermore, telecom companies have developed various techniques for identifying and retaining their valuable customers, because retaining existing customers is less expensive than attracting new ones: the costs of concessions, workforce, and advertising involved in acquiring new customers are five to six times greater than the cost of retaining existing ones. Companies therefore need to identify the existing customers who are likely to churn, and doing so requires an accurate, high-performance churn detection model.
To complete this project, different ML models, namely Random Forest, Logistic Regression, and KNN, will be trained on 80% of the sample data; the remaining 20% will be used to test the trained models and evaluate how well they predict churn versus no churn. The results can help to identify customers' pain points, which companies can then resolve in order to retain those customers.
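The planned 80/20 evaluation can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions: `make_classification` generates a synthetic, imbalanced stand-in for the churn data, the function name `compare_models` is hypothetical, and the hyperparameters (200 trees, k = 5, standardization before Logistic Regression and KNN) are illustrative choices rather than settled parts of the project design.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=42):
    """Train each model on 80% of the data and score it on the held-out 20%."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Logistic Regression and KNN are sensitive to feature scale, so standardize first.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the churn data: 4250 samples, 19 features,
    # with an arbitrary imbalanced class ratio (about 85% "no churn").
    X, y = make_classification(
        n_samples=4250, n_features=19, weights=[0.85], random_state=0
    )
    for name, scores in compare_models(X, y).items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

The split is stratified so that the training and test sets keep the same churn ratio; without this, the 20% test set could contain too few churners for precision and recall to be meaningful.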
For this project, Python will be used. Python has several machine learning and data science libraries that can be used to make predictions from a dataset's attributes or features. The Python libraries that will be used in this project are:
scikit-learn: a Python library for performing machine learning tasks. It contains implementations of most common classification, regression, and clustering algorithms.
pandas: a Python library for data filtering, data pre-processing, and data conversion.
matplotlib: the basic plotting library of Python, used for basic visualization.
seaborn: a statistical data visualization library built on top of matplotlib. It makes it easier to plot more advanced and attractive charts and graphs than with matplotlib alone.
NumPy: a Python library for mathematical and numerical operations on data, such as array operations, standardization, and normalization.
Further, the algorithms used for churn prediction in this project are Decision Tree (DT), Random Forest (RF), K-Nearest Neighbors (KNN), and Logistic Regression (LR).
The "Customer Churn Prediction 2020" dataset used in this project is taken from kaggle.com. It can be used to predict whether or not a customer will churn from their existing telecommunications provider. The training set contains 4250 samples, and each sample has one target variable, "churn", and 19 input features: "state", "account length", "area code", "international plan", "voice mail plan", "number vmail messages", "total day minutes", "total day calls", "total day charge", "total eve minutes", "total eve calls", "total eve charge", "total night minutes", "total night calls", "total night charge", "total intl minutes", "total intl calls", "total intl charge", and "number customer service calls".
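Several of these features are text rather than numbers, so they must be encoded before the models can use them. The sketch below assumes that the Kaggle CSV stores the column names with underscores (for example `international_plan`) and stores the yes/no columns and the target as the strings "yes"/"no"; both are assumptions to verify against the actual file. The function name `prepare_features` and the three sample rows are hypothetical.

```python
import pandas as pd

# Assumed column names (underscore style); check them against the downloaded CSV.
YES_NO_COLUMNS = ["international_plan", "voice_mail_plan"]
NOMINAL_COLUMNS = ["state", "area_code"]


def prepare_features(df: pd.DataFrame, target: str = "churn"):
    """Return a numeric feature matrix X and a 0/1 target vector y."""
    df = df.copy()
    # Map yes/no strings (including the target) to 1/0.
    for col in YES_NO_COLUMNS + [target]:
        df[col] = df[col].str.strip().str.lower().map({"yes": 1, "no": 0})
    # One-hot encode nominal columns so models do not read an order into them.
    df = pd.get_dummies(df, columns=NOMINAL_COLUMNS, dtype=int)
    y = df.pop(target)
    return df, y


if __name__ == "__main__":
    # Three made-up rows using a subset of the dataset's columns.
    sample = pd.DataFrame({
        "state": ["OH", "NJ", "OH"],
        "account_length": [100, 120, 90],
        "area_code": ["415", "415", "408"],
        "international_plan": ["no", "no", "yes"],
        "voice_mail_plan": ["yes", "no", "no"],
        "total_day_charge": [30.0, 40.0, 50.0],
        "churn": ["no", "no", "yes"],
    })
    X, y = prepare_features(sample)
    print(y.tolist())  # [0, 0, 1]
    print(list(X.columns))
```

One-hot encoding is used for state and area code because they are nominal: without it, distance-based and linear models such as KNN and Logistic Regression would treat one area code as "larger" than another.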
Before starting the project, the following prerequisite knowledge and skills are required:
Data analysis: Knowledge of data analysis is required for data cleaning, data pre-processing, and data visualization. Data cleaning puts the data into the desired format, and data visualization helps to reveal the important features and aspects of the data.
Python: Knowledge of Python is required for building the machine learning model and for data cleaning, data pre-processing, and data visualization.
Machine learning: Knowledge of machine learning is required for building the churn prediction model, which is a classification model and therefore belongs to the supervised machine learning category.
Model evaluation: Knowledge of model evaluation is required for evaluating the classification results in terms of accuracy, precision, recall, selectivity, sensitivity, etc.
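For reference, the evaluation metrics listed above can be written in terms of the confusion-matrix counts, taking churn as the positive class: TP (churners predicted as churners), TN (non-churners predicted as non-churners), FP (non-churners predicted as churners), and FN (churners predicted as non-churners). Sensitivity is another name for recall, and selectivity is another name for specificity.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall (Sensitivity)} &= \frac{TP}{TP + FN} \\
\text{Selectivity (Specificity)} &= \frac{TN}{TN + FP} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```

Because churners are usually a minority of customers, accuracy alone can be misleading: a model that predicts "no churn" for every customer can still score a high accuracy while its recall for churners is zero.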
The project development has five phases: requirements gathering, project planning, project designing, project implementation, and project testing. The Gantt chart below gives initial time estimates for the project tasks over the 15-week project duration.
Table 1: Project Gantt Chart
Task | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Lit Review | | | | | | | | | | | | | | | |
Project planning | | | | | | | | | | | | | | | |
Project designing | | | | | | | | | | | | | | | |
Project implementation | | | | | | | | | | | | | | | |
Project testing | | | | | | | | | | | | | | | |
REFERENCES
[1] Huang, B., Kechadi, M.T. and Buckley, B., 2012. Customer churn prediction in telecommunications. Expert Systems with Applications, 39(1), pp.1414-1425.
[2] Ahmad, A.K., Jafar, A. and Aljoumaa, K., 2019. Customer churn prediction in telecom using machine learning in the big data platform. Journal of Big Data, 6(1), pp.1-24.
[3] Ullah, I., Raza, B., Malik, A.K., Imran, M., Islam, S.U. and Kim, S.W., 2019. A churn prediction model using random forest: analysis of machine learning techniques for churn prediction and factor identification in the telecom sector. IEEE Access, 7, pp.60134-60149.
[4] Claude, T.J., Cheriuyot, K.W. and Kibe, A.M., Designing a framework for real-time churn prediction in mobile telecommunication industry.
[5] Rajeswari, P.S. and Ravilochanan, P., 2014. Churn analytics on Indian prepaid mobile services. Asian Social Science, 10(13), p.169.
[6] Esselaar, S., Gillwald, A. and Stork, C., 2006. South African telecommunications sector performance review 2016. Johannesburg, Centre Public Policy Research Paper No. 8.