Probably Overthinking It
How to Use Data to Answer Questions, Avoid Statistical Traps, and Make Better Decisions
Statistics are everywhere: in news reports, at the doctor’s office, and in every sort of forecast, from the stock market to the weather. Blogger, teacher, and computer scientist Allen B. Downey knows well that people have an innate ability both to understand statistics and to be fooled by them. As he makes clear in this accessible introduction to statistical thinking, the stakes are high. Simple misunderstandings have led to incorrect medical prognoses and underestimates of the likelihood of large earthquakes, hindered social justice efforts, and resulted in dubious policy decisions. There are right and wrong ways to look at numbers, and Downey will help you see which are which.
Probably Overthinking It uses real data to delve into real examples with real consequences, drawing on cases from health campaigns, political movements, chess rankings, and more. Downey lays out common pitfalls (such as the base rate fallacy, length-biased sampling, and Simpson’s paradox) and shines a light on what we learn when we interpret data correctly, and what goes wrong when we don’t. Using data visualizations instead of equations, he builds understanding from the basics to help you recognize errors, whether in your own thinking or in media reports. Even if you have never studied statistics, or if you have and forgot everything you learned, this book will offer new insight into the methods and measurements that help us understand the world.
Table of Contents
1. Are You Normal? Hint: No
2. Relay Races and Revolving Doors
3. Defy Tradition, Save the World
4. Extremes, Outliers, and GOATs
5. Better Than New
6. Jumping to Conclusions
7. Causation, Collision, and Confusion
8. The Long Tail of Disaster
9. Fairness and Fallacy
10. Penguins, Pessimists, and Paradoxes
11. Changing Hearts and Minds
12. Chasing the Overton Window
Epilogue
Acknowledgments
Bibliography
Index
Excerpt
Let me start with a premise: we are better off when our decisions are guided by evidence and reason. By “evidence,” I mean data that is relevant to a question. By “reason,” I mean the thought processes we use to interpret evidence and make decisions. And by “better off,” I mean we are more likely to accomplish what we set out to do, and more likely to avoid undesired outcomes.
Sometimes interpreting data is easy. For example, one of the reasons we know that smoking causes lung cancer is that when only 20% of the population smoked, 80% of people with lung cancer were smokers. If you are a doctor who treats patients with lung cancer, it does not take long to notice numbers like that.
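To see how strong that signal is, the two percentages can be turned into a relative risk. The sketch below is only an illustration, not a calculation from the book: the function name and parameters are hypothetical, and it assumes the 20% smoking rate describes the same population the lung cancer patients come from, ignoring age and other differences between the groups.

```python
def relative_risk(share_of_cases_exposed, share_of_population_exposed):
    """Risk of the outcome among exposed people divided by the risk among unexposed people.

    Follows from Bayes's rule: P(case | exposed) is proportional to
    P(exposed | case) / P(exposed), and likewise for the unexposed group.
    The overall case rate cancels when the two are divided.
    """
    exposed = share_of_cases_exposed / share_of_population_exposed
    unexposed = (1 - share_of_cases_exposed) / (1 - share_of_population_exposed)
    return exposed / unexposed


# Figures from the text: 20% of the population smoked,
# 80% of people with lung cancer were smokers.
smoking_rr = relative_risk(share_of_cases_exposed=0.80,
                           share_of_population_exposed=0.20)
print(f"Relative risk of lung cancer for smokers: about {smoking_rr:.0f}x")
```

Under these simplifying assumptions, smokers come out roughly 16 times as likely to develop lung cancer as nonsmokers, which is why the pattern is hard to miss.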
But interpreting data is not always that easy. For example, in 1971 a researcher at the University of California, Berkeley, published a paper about the relationship between smoking during pregnancy, the weight of babies at birth, and mortality in the first month of life. He found that babies of mothers who smoke are lighter at birth and more likely to be classified as “low birthweight.” Also, low-birthweight babies are more likely to die within a month of birth, by a factor of 22. These results were not surprising.
However, when he looked specifically at the low-birthweight babies, he found that the mortality rate for children of smokers is lower, by a factor of 2. That was surprising. He also found that among low-birthweight babies, children of smokers are less likely to have birth defects, also by a factor of 2. These results make maternal smoking seem beneficial for low-birthweight babies, somehow protecting them from birth defects and mortality. The paper was influential. In a 2014 retrospective in the International Journal of Epidemiology, one commentator suggests it was responsible for “holding up anti-smoking measures among pregnant women for perhaps a decade” in the United States. Another suggests it “postponed by several years any campaign to change mothers’ smoking habits” in the United Kingdom.
But it was a mistake. In fact, maternal smoking is bad for babies, low birthweight or not. The reason for the apparent benefit is a statistical error I will explain in chapter 7. Among epidemiologists, this example is known as the low-birthweight paradox. A related phenomenon is called the obesity paradox. Other examples in this book include Berkson’s paradox and Simpson’s paradox. As you might infer from the prevalence of “paradoxes,” using data to answer questions can be tricky. But it is not hopeless. Once you have seen a few examples, you will start to recognize them, and you will be less likely to be fooled. And I have collected a lot of examples.
So we can use data to answer questions and resolve debates. We can also use it to make better decisions, but it is not always easy. One of the challenges is that our intuition for probability is sometimes dangerously misleading. For example, in October 2021, a guest on a well-known podcast reported with alarm that “in the [United Kingdom] 70-plus percent of the people who die now from COVID are fully vaccinated.” He was correct; that number was from a report published by Public Health England, based on reliable national statistics. But his implication, that the vaccine is useless or actually harmful, is wrong.
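One way to see why the 70% figure cannot carry that implication on its own is to ask what it means at different vaccination rates. The sketch below is an illustration only: the vaccination rates are hypothetical values chosen for the example, not numbers from the Public Health England report, the function name is made up, and the calculation ignores age and other differences between vaccinated and unvaccinated people.

```python
def relative_risk_of_death(share_of_deaths_vaccinated, share_of_population_vaccinated):
    """Risk of death for vaccinated people divided by the risk for unvaccinated people.

    Each group's risk is proportional to its share of deaths divided by
    its share of the population, so the overall death rate cancels out.
    """
    vaccinated = share_of_deaths_vaccinated / share_of_population_vaccinated
    unvaccinated = (1 - share_of_deaths_vaccinated) / (1 - share_of_population_vaccinated)
    return vaccinated / unvaccinated


SHARE_OF_DEATHS_VACCINATED = 0.70  # the figure quoted on the podcast

# Hypothetical vaccination rates, for illustration only.
for rate in (0.50, 0.70, 0.90, 0.95):
    rr = relative_risk_of_death(SHARE_OF_DEATHS_VACCINATED, rate)
    print(f"If {rate:.0%} of the population is vaccinated, relative risk = {rr:.2f}")
```

In this simplified model, the same 70% corresponds to a relative risk above 1 when only half the population is vaccinated, exactly 1 at 70%, and well below 1 (about 0.26) at 90%. The share of deaths among the vaccinated says nothing about the vaccine until it is compared with the share of people who are vaccinated.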