These Are the Best Books for Learning Modern Statistics—and They’re All Free

Dan Kopf:

The books are based on the concept of “statistical learning,” a mashup of stats and machine learning. The field of machine learning is all about feeding huge amounts of data into algorithms to make accurate predictions. Statistics is concerned with predictions as well, says Tibshirani, but also with determining how confident we can be about the importance of certain inputs.

This is important in areas like medicine, where a researcher doesn’t just want to know whether a medicine worked, but also why it worked. Statistical learning is meant to take the best ideas from machine learning and computer science, and explain how they can be used and interpreted through a statistician’s lens.

The beauty of these books is that they make seemingly impenetrable concepts, such as “cross-validation,” “logistic regression,” and “support vector machines,” easily understandable. This is because the authors focus on intuition rather than mathematics. Unlike many statisticians, Tibshirani and his coauthors don’t come from a math background. He believes this helps them think conceptually. “We try to explain [concepts] intuitively by explaining the underlying idea first,” he says. “Then we give examples of a situation [where] you would expect it [to] work. And also, a situation where it might not work. I think people really appreciate that.” I certainly did.