Project information

Primary Goal:

This project uses a dataset from a phone provider. The main objective is to identify the variables associated with churn. After finding which variables are most correlated with higher churn rates, we build a predictive model to corroborate those associations, and finally develop strategies to prevent churn based on the analysis.


Why is this useful:

By identifying the variables associated with churn, we can formulate a strategic plan for the changes the business needs to make to improve customer retention. Repeat customers spend 67% more than new customers (BIA Advisory Services).

Data Overview:

The dataset contains a variety of variables related to churn, including the types of services provided by the company, customer demographics, and reported churn statistics.

Methodology:

  • 1. Data Reading
  • We import the CSV file and display the variables.

  • 2. Data Cleaning
  • This particular dataset had already been cleaned, but we still checked for null values, missing columns, duplicate values, etc.
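The checks described in this step can be sketched as follows. This is a minimal illustration using pandas, not the project's actual notebook code: the toy rows and the column names `customerID`, `tenure`, and `Churn` are assumptions standing in for the real CSV.

```python
import pandas as pd

# Hypothetical toy data standing in for the project's CSV file.
df = pd.DataFrame({
    "customerID": ["A1", "B2", "C3", "C3"],
    "tenure": [1, None, 5, 5],
    "Churn": ["Yes", "No", "No", "No"],
})

# Null values per column.
null_counts = df.isnull().sum()

# Fully duplicated rows (every column identical to an earlier row).
duplicate_rows = int(df.duplicated().sum())

# Columns we expect but that are absent from the file (assumed list).
expected_columns = {"customerID", "tenure", "Churn"}
missing_columns = expected_columns - set(df.columns)

print(null_counts)
print("duplicate rows:", duplicate_rows)
print("missing columns:", missing_columns)
```

On an already clean dataset, all three checks would be expected to come back empty or zero.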

  • 3. EDA
  • Typically this section would be longer, but we already had an idea of which variables would contribute to churn. The main goal was to see which variables had the greatest correlation with churn rates for this company. We demonstrate this in a variety of plots.
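One simple way to compare churn across a categorical variable is to compute the churn rate per category. The sketch below assumes a `Churn` column holding "Yes"/"No" values and uses made-up rows; the category values for `Contract` are illustrative, not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows; the real data comes from the project's CSV file.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
})

# Convert the label to 0/1 so that the mean of each group is its churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Churn rate per contract type, highest first.
churn_rate = (
    df.groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)
print(churn_rate)
```

The resulting series can be passed directly to a bar plot (for example `churn_rate.plot(kind="bar")`) to produce the kind of chart this step refers to.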

  • 4. Feature Engineering
  • Most machine learning algorithms expect numeric input and cannot work with categorical text values directly. Before we can use this data to make predictions (such as whether a customer will churn), we need to convert the categories into a numeric format. This is where one-hot encoding comes in. One-hot encoding converts categorical data into a form that works with classification and regression algorithms by creating a new binary column for each category present in the original column. In this project, we used one-hot encoding to convert features like ‘TechSupport’, ‘DeviceProtection’, ‘InternetService’, ‘OnlineSecurity’, ‘PaymentMethod’, and ‘Contract’ into a format the predictive model can use.
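As a small illustration of one-hot encoding, the sketch below uses `pandas.get_dummies` on two of the features named above. The category values and the `MonthlyCharges` column are assumptions for the example, and the project itself may have used a different encoder.

```python
import pandas as pd

# Hypothetical sample; category values are illustrative only.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "InternetService": ["DSL", "Fiber optic", "No", "DSL"],
    "MonthlyCharges": [29.85, 56.95, 20.0, 42.3],
})

categorical_cols = ["Contract", "InternetService"]

# Each categorical column is replaced by one 0/1 column per category;
# numeric columns such as MonthlyCharges are left unchanged.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)

print(encoded.columns.tolist())
```

For a linear model such as logistic regression, passing `drop_first=True` to `get_dummies` is a common option to avoid perfectly collinear columns; whether to use it here is a design choice left to the analyst.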

  • 5. Predictive Model Building
  • In this project, we implemented a Logistic Regression model to predict customer churn for a phone service company. The main goal was to identify the key factors that influence customer churn and use this information to predict whether a customer is likely to churn. Logistic Regression was chosen for this task due to its simplicity and interpretability. It not only provides a prediction but also quantifies the influence of each feature on the likelihood of churn. This allows us to understand which aspects of the service are most strongly associated with customer churn. By predicting churn, the company can proactively address customer concerns, improve customer satisfaction, and ultimately reduce churn. This can lead to significant cost savings, as retaining existing customers is often more cost-effective than acquiring new ones.
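To show how Logistic Regression quantifies each feature's association with churn, here is a minimal sketch using scikit-learn. The data is synthetic and generated inside the example; the feature names, the coefficients used to simulate labels, and the train/test split settings are all assumptions, not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic one-hot style features (hypothetical names).
X = pd.DataFrame({
    "Contract_Month-to-month": rng.integers(0, 2, n),
    "TechSupport_Yes": rng.integers(0, 2, n),
})

# Simulated labels: month-to-month raises churn odds, tech support lowers them.
logit = -1.0 + 2.0 * X["Contract_Month-to-month"] - 1.5 * X["TechSupport_Yes"]
prob = 1.0 / (1.0 + np.exp(-logit))
y = pd.Series((rng.random(n) < prob).astype(int), name="Churn")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

# One coefficient per feature; exp(coefficient) is the odds ratio.
coefficients = pd.Series(model.coef_[0], index=X.columns)
odds_ratios = np.exp(coefficients)
accuracy = model.score(X_test, y_test)

print(coefficients)
print(odds_ratios)
print("test accuracy:", accuracy)
```

A positive coefficient (odds ratio above 1) means the feature is associated with higher odds of churn, and a negative one with lower odds. These are associations learned from the data and do not by themselves establish causation.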

  • 6. Results From Predictions & Suggestions
  • Lastly, we turn the insights from the predictive model into visual, actionable recommendations that the company can implement to reduce churn.