What’s the difference between statistics and machine learning?

Jonathan Bartlett:

This post is certainly not going to tell you what the difference between machine learning and statistics is. Rather, I hope that it spurs readers of the post to help me understand their differences.

Historically, I think it’s the case that machine learning algorithms were developed in computer science departments of universities, whereas statistics was developed within mathematics or statistics departments. But this is merely a matter of historical origins rather than any fundamental distinction.

Machine learning (about which I know a lot less) tends, I think, to focus on algorithms, and a subset of these has as its objective the prediction of some outcome based on a set of inputs (or predictors, as we might call them in statistics). In contrast to parametric statistical models, these algorithms typically do not make rigid assumptions about the relationship between the inputs and the outcome, and therefore can perform well when the dependence of the outcome on the predictors is complex or non-linear. The potential to capture such complex relationships is, however, not unique to machine learning: within statistics we have flexible parametric and semiparametric models, and even non-parametric methods such as non-parametric regression.
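To make that last point a little more concrete, here is a minimal sketch in plain Python contrasting a rigid parametric model (a straight line fitted by ordinary least squares) with a classical non-parametric regression method from statistics, the Nadaraya-Watson kernel estimator. The data (a noise-free sine curve), the Gaussian kernel and the bandwidth of 0.3 are all hypothetical choices made purely for illustration, and the errors are computed in-sample, so this shows flexibility rather than predictive performance on new data.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x: a rigid parametric model."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def kernel_regression(xs, ys, bandwidth):
    """Nadaraya-Watson estimator with a Gaussian kernel: no functional form assumed."""
    def predict(x0):
        # Each observation is weighted by how close its predictor value is to x0.
        weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict


def mse(model, xs, ys):
    """In-sample mean squared error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the outcome depends non-linearly on a single predictor.
xs = [i * 0.1 for i in range(63)]  # 0.0 to 6.2, roughly one period of sin
ys = [math.sin(x) for x in xs]

linear = linear_fit(xs, ys)
smooth = kernel_regression(xs, ys, bandwidth=0.3)

linear_mse = mse(linear, xs, ys)
smooth_mse = mse(smooth, xs, ys)
print(f"linear MSE: {linear_mse:.4f}")
print(f"kernel MSE: {smooth_mse:.4f}")
```

The straight line cannot follow the curvature and leaves a large error, whereas the kernel estimator tracks the curve closely without being told its form, yet it is a standard statistical technique rather than anything specific to machine learning.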