Foundations of Machine Learning: Part 1

This post is the fifth in our series on the history and foundations of econometric and machine learning models. The first four covered econometric techniques. Part 4 is online here.

In parallel with these tools, developed by and for economists, a whole literature has grown up around similar issues, centered on the problems of prediction and forecasting. For Breiman (2001a), the first difference is that statistics developed around the principle of inference (that is, explaining the relationship linking y to the variables x), while another culture is primarily interested in prediction. In a discussion following the article, David Cox states very clearly that in statistics (and econometrics) "predictive success... is not the primary basis for model choice". We return here to the roots of machine learning techniques. The important point, as we will see, is that the main concern of machine learning is the generalization properties of a model, i.e. its performance, according to a criterion chosen a priori, on new data, and therefore on out-of-sample tests.
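To make the idea of generalization concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and comes from neither Breiman nor Cox: the data-generating process (y = 2x plus Gaussian noise), the two toy models (a least-squares line and a 1-nearest-neighbor rule that memorizes the sample), and the choice of mean squared error as the criterion fixed a priori. It simply compares each model's error on the sample used to fit it with its error on fresh data.

```python
import random


def make_data(n, rng):
    """Draw n points from y = 2x + noise (a hypothetical data-generating process)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with a single regressor; returns a predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbor(xs, ys):
    """1-nearest-neighbor rule: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict, xs, ys):
    """Mean squared error: the criterion chosen a priori in this sketch."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)               # fixed seed so the run is reproducible
x_train, y_train = make_data(50, rng)  # sample used to fit the models
x_new, y_new = make_data(50, rng)      # new data, never seen during fitting

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest_neighbor)]:
    model = fit(x_train, y_train)
    in_sample = mse(model, x_train, y_train)
    out_of_sample = mse(model, x_new, y_new)
    results[name] = (in_sample, out_of_sample)
    print(f"{name}: in-sample MSE = {in_sample:.3f}, "
          f"out-of-sample MSE = {out_of_sample:.3f}")
```

By construction, the 1-nearest-neighbor rule reproduces the training sample exactly, so its in-sample error is zero, yet its error on new data is not. Judged in-sample it looks perfect; judged on generalization, one would typically expect the simple line to do at least as well in a setting like this one.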