On the Poor Performance of Classifiers

Each time we do a case study in my actuarial courses (with real data), students are surprised that it is so hard to get a “good” model, and they are always surprised by the low AUC they obtain when trying to model the probability of claiming a loss, of dying, of committing fraud, etc. And each time I keep saying, “yes, I know, and that’s what we should expect, because there’s a lot of ‘randomness’ in insurance.” To be more specific, I decided to run some simulations and compute AUCs to see what’s going on. And because I don’t want to waste time fitting models, we will assume each time that we have a perfect model. I want to show that the upper bound of the AUC is actually quite low! So it’s not a modeling issue, it is a fundamental issue in insurance!

By ‘perfect model’ I mean the following: Ω denotes the heterogeneity factor, because people are different. We would love to get P[Y=1∣Ω]. Unfortunately, Ω is unobservable! So we use covariates (such as the age of the driver in motor insurance, or of the policyholder in life insurance, etc.). Thus, we have data (yi, xi)’s, and we use them to train a model in order to approximate P[Y=1∣X]. And then we check whether our model is good (or not) using the ROC curve obtained from confusion matrices, comparing the yi’s and the ŷi’s, where ŷi = 1 when P[Yi = 1∣xi] exceeds a given threshold. Here, I will not try to construct models. I will predict ŷi = 1 each time the true underlying probability P[Yi = 1∣ωi] exceeds that threshold! The point is that it’s possible to claim a loss (y = 1) even if the probability is 3% (and most of the time ŷ = 0), and to not claim one (y = 0) even if the probability is 97% (and most of the time ŷ = 1). That’s the idea with randomness, right?
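To make this concrete, here is a minimal Python sketch of the experiment, under assumptions of my own for illustration: the individual true probabilities pi (the role of P[Yi = 1∣ωi]) are drawn from a Beta(2, 18) distribution (mean claim probability around 10%), outcomes yi are Bernoulli(pi), and the “perfect model” scores each individual with their true pi. Even so, the AUC stays well below 1.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical heterogeneity: the true claim probability p_i of each
# individual is drawn from a Beta(2, 18) distribution (mean ~10%).
# This distribution is an assumption for illustration, not from the post.
p = rng.beta(2, 18, size=n)

# Outcomes: even knowing the TRUE probability, the outcome is random.
y = rng.binomial(1, p)

# "Perfect model": score each individual with the true probability p_i,
# then compute the AUC via the rank (Mann-Whitney) statistic:
# AUC = P(score of a random claimant > score of a random non-claimant).
order = np.argsort(p)
ranks = np.empty(n)
ranks[order] = np.arange(1, n + 1)
n1 = y.sum()          # number of claims (y = 1)
n0 = n - n1           # number of non-claims (y = 0)
auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

print(f"AUC of the perfect model: {auc:.3f}")  # well below 1
```

Note that no model is fitted anywhere: the score is the true probability itself, so this AUC is the upper bound any fitted classifier could hope to reach on this population.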