Two Rookie Mistakes to Avoid When Training a Predictive Model

When creating predictive models, it's important to measure accuracy so you can clearly articulate how good the model is. This article covers two mistakes that are commonly made when measuring accuracy.

1. Measuring Accuracy on the Same Data Used for Training

One common mistake is measuring accuracy on the same data the model was trained on. For example, say you have customer churn data from 2017 and 2018. You feed all of that data to train the model and then use the same data to predict, comparing the predictions with the actual results. That is like being given a question paper to study at home before an exam, and then getting the exact same question paper in the exam the next day. Obviously, you are going to do great on that exam, but it says nothing about how you would handle new questions.
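As a minimal sketch of why this inflates accuracy, the following contrasts a score measured on the training rows with a score on held-out rows; the synthetic dataset and the unconstrained decision tree are illustrative choices, not part of the original example:

```python
# Contrast accuracy on the training data (the "same question paper")
# with accuracy on data the model has never seen.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# An unconstrained tree can memorize its training data completely.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("Accuracy on training data:", accuracy_score(y_train, model.predict(X_train)))  # ~1.00
print("Accuracy on unseen data:  ", accuracy_score(y_test, model.predict(X_test)))    # noticeably lower
```

The near-perfect training score is the exam taken with the answers in hand; the held-out score is the one worth reporting.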

A Practical Way to Think About Prediction Accuracy

One of the common questions management asks before deployment is, "What is the accuracy?" The trap companies fall into is chasing the best possible accuracy as the condition for going live.

When talking about accuracy, it's important to compare the accuracy your model provides against what you do now without the model.
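One way to make that comparison concrete is scikit-learn's DummyClassifier, which stands in for the no-model status quo by always predicting the majority class. The synthetic data and the logistic regression below are illustrative assumptions:

```python
# Compare a model against the no-model baseline. DummyClassifier with
# strategy="most_frequent" always predicts the majority class, a stand-in
# for "what you do now without the model".
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# weights=[0.8] makes class 0 roughly 80% of the rows.
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
print("Model accuracy:   ", accuracy_score(y_test, model.predict(X_test)))
```

If the model cannot beat the roughly 80% the baseline gets for free, its headline accuracy is not a reason to go live.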


How Do You Measure If Your Customer Churn Predictive Model Is Good?

Accuracy is a key measure that management looks at before giving the green light to take a model to production. This section talks about the practical aspects of what to measure and how to measure it; see the previous section for the common mistakes made in measuring accuracy.

Two Important Points to Consider When Measuring Accuracy

  1. The data used to measure accuracy should not have been used in training. Split your data into 80% and 20%: use the 80% to train, then use the remaining 20% to predict and compare the predicted values with the actual outcomes to determine the accuracy.
  2. One outcome can eclipse the other. Say 95% of your transactions are not fraud. If the algorithm marks every transaction as not fraud, it's right 95% of the time, so the accuracy is 95%, but the 5% it gets wrong can break the bank. In those scenarios, we need to look at other metrics, such as sensitivity and specificity, which we will cover in a practical way; see the sketch after this list.
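To make point 2 concrete, here is a small sketch using synthetic transactions (950 legitimate, 50 fraudulent, matching the 95% figure above). A "model" that never flags fraud scores 95% accuracy yet catches no fraud at all, which sensitivity exposes immediately:

```python
# Why accuracy misleads on imbalanced data: compute sensitivity and
# specificity from the confusion matrix instead.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = np.array([0] * 950 + [1] * 50)  # 0 = not fraud, 1 = fraud
y_pred = np.zeros_like(y_true)           # marks every transaction as not fraud

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy:   ", accuracy_score(y_true, y_pred))  # 0.95
print("Sensitivity:", tp / (tp + fn))                  # 0.0, no fraud caught
print("Specificity:", tn / (tn + fp))                  # 1.0
```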

Problem Definition

The goal of this predictive problem is to identify which customers will churn. The dataset has 1,000 rows. Use an 80% sample (800 rows) for training and the remaining 20% (200 rows) to measure accuracy. Say we have trained the model on the 800 rows and are predicting on the 200 rows.
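A minimal sketch of that workflow follows; the file name churn.csv, the feature columns, and the logistic regression model are hypothetical placeholders, not details from the original dataset:

```python
# Train on 800 rows, measure accuracy on the 200 held-out rows.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("churn.csv")           # assumed: 1,000 rows of churn data
X = df[["tenure", "monthly_spend"]]     # hypothetical feature columns
y = df["churned"]                       # 1 = churned, 0 = stayed

# 80/20 split: 800 rows for training, 200 rows for measuring accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy on the 200 held-out rows:",
      accuracy_score(y_test, model.predict(X_test)))
```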

Regulating ML/AI-Powered Systems for Bias

Siri and Alexa are good examples of AI: they listen to human speech, recognize words, perform searches, and translate the text results back into speech. McDonald's recent purchase of an AI company called Dynamic Yield, whose technology analyzes customers' spending and eating habits and recommends other food for them to purchase, takes the use of AI to the next step. AI technologies raise important issues, such as personal privacy rights and whether machines can ever make fair decisions.

There are two main areas where regulation can be helpful.

Practical Strategies to Handle Missing Values

One of the major challenges in most BI projects is figuring out how to get clean data. Sixty to eighty percent of the total time is spent cleaning the data before you can make any meaningful sense of it. This is true for both BI and predictive analytics projects. To improve the effectiveness of the data cleaning process, the current trend is to migrate from manual data cleaning to more intelligent, machine learning-based processes.

Identify the Type of Missing Values We Are Dealing With

Before we dig into how to handle missing values, it's critical to determine the nature of the missing values. There are three possible types, depending on whether a relationship exists between the missing data and the other data in the dataset.
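As a first-pass sketch (the dataset customers.csv and its columns are hypothetical), one way to start probing for such a relationship in pandas is to count missing values per column and then compare another variable across rows where a value is and is not missing:

```python
# Inspect how much is missing, and whether missingness in one column
# is related to the values of another.
import pandas as pd

df = pd.read_csv("customers.csv")  # assumed dataset

# Missing values per column.
print(df.isna().sum())

# Does "age" differ between rows where "income" is present vs. missing?
# A marked difference suggests the values are not missing completely at random.
print(df.groupby(df["income"].isna())["age"].mean())
```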