Review of Weapons of Math Destruction by Cathy O’Neil

I’ve been putting off writing this book review for a while now because I’m torn on how best to share my thoughts and impressions of it. Parts of it were, for me, illuminating new information and case studies. There are points at which I emphatically agree with Cathy’s perspective. There are times I agree with her point but not her rhetoric. And there are times when I think the narrative might stray into alarmism.

Weapons of Math Destruction starts by giving some examples of ways in which real people’s lives have been significantly affected by mathematical models. The text states that mathematical models “aided and abetted” the housing crisis, the financial crisis, and a rise in unemployment. I wish statements like that had more exposition in this book. Perhaps things were truncated for brevity. But without more support, I find many moments where a bit of handwaving takes place on assertions that I think require more discussion.

For example, the book presents an early case study of how teachers are evaluated using a closed-source mathematical model of their performance: “… attempting to score a teacher’s effectiveness by analyzing the test results of only twenty-five or thirty students is statistically unsound, even laughable.” But why? If teachers A and B each have 15 students of similar backgrounds, and all of A’s students score below the 10th percentile nationally while all of B’s score above the 90th, a very clear statistical conclusion will be reached despite the small $n$.
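
To make that concrete, here’s a minimal simulation of the exaggerated scenario, with entirely made-up percentile scores:

```python
# Minimal sketch of the exaggerated scenario above; all numbers are
# made up for illustration. Teacher A's 15 students score near the
# 10th percentile nationally, teacher B's near the 90th.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores_a = rng.normal(loc=10, scale=5, size=15)  # hypothetical percentile scores
scores_b = rng.normal(loc=90, scale=5, size=15)

# Even with only 15 students per teacher, a two-sample t-test is unequivocal.
t_stat, p_value = stats.ttest_ind(scores_a, scores_b)
print(f"t = {t_stat:.1f}, p = {p_value:.3g}")  # p is astronomically small
```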

Granted, this exaggerated effect size is unrealistic. Differences across teachers are probably of smaller magnitude, and thus require larger samples for any statistical signal to emerge above what random chance alone can explain. Yet I feel this is dismissed too quickly. Test results aren’t simply a collection of Binomial outcomes on true/false questions. I remember helping grade a calculus exam once and remarking, “oh, this student understands the chain rule just fine but doesn’t know the quotient rule.” Perhaps the percentage of students who all miss the same problem follows a known, high-kurtosis distribution that indicates whether or not the teacher presented certain ideas poorly. This concept might not play out in practice, but it’s not so implausible that I can throw out a model entirely on the grounds of statistical significance without a careful calculation of p-values and the sensitivity to effect size.
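
A rough power calculation makes the trade-off explicit. Assuming, purely for illustration, that a teacher effect shows up as a standardized mean difference $d$ in student scores, the sample size needed to detect it at 80% power grows quickly as $d$ shrinks:

```python
# Rough power analysis: students-per-teacher needed to detect a
# standardized effect of size d at alpha = 0.05 and 80% power.
# The effect sizes below are assumptions for illustration, not estimates.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for d in (2.0, 0.8, 0.5, 0.2):  # huge, large, medium, small effects
    n = power.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d:.1f}: about {n:.0f} students per group")
# An extreme effect needs only a handful of students; a small one
# needs hundreds, and twenty-five or thirty is nowhere near enough.
```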

Elsewhere the text surmises that “Ill-conceived mathematical models now micromanage the economy, from advertising to prisons.” I welcome criticism and scrutiny of mathematical models, and I appreciate Cathy’s popularizing of these efforts, but I wish it were done more in the spirit of inquiry and less quick to assume a position. I think Cathy and I would eventually agree on many things, but I prefer to withhold judgment a bit longer myself. The haste with which certain topics are damned leaves me struggling to agree, because I as a reader have not necessarily reached the same conclusion yet.

The process of creating potential “weapons of math destruction” (WMDs) is overviewed, including the observation that “the folks building WMDs routinely lack data for the behaviors they’re most interested in. So they substitute stand-in data”. I often see data scientists making this mistake. It’s like the naive analyst who wants to fill in a missing value, such as age, by pulling the average age of the person’s zipcode. One of many reasons this is a terrible idea is Simpson’s Paradox.
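
Here’s a toy illustration, with entirely fabricated data, of how that kind of imputation goes wrong:

```python
# Fabricated data: within every zipcode the outcome RISES with age,
# but zipcodes with older residents have lower baselines overall.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
frames = []
for zipcode, (mean_age, baseline) in enumerate([(30, 100), (50, 60), (70, 20)]):
    age = rng.normal(mean_age, 3, size=200)
    outcome = baseline + 1.5 * (age - mean_age) + rng.normal(0, 2, size=200)
    frames.append(pd.DataFrame({"zipcode": zipcode, "age": age, "outcome": outcome}))
df = pd.concat(frames, ignore_index=True)

print(df["age"].corr(df["outcome"]))  # negative: the pooled trend
print(df.groupby("zipcode")[["age", "outcome"]]
        .apply(lambda g: g["age"].corr(g["outcome"])))  # positive in every zip

# Imputing each person's age from their zipcode's average throws away
# the within-zip signal and keeps only the misleading pooled trend.
df["imputed_age"] = df.groupby("zipcode")["age"].transform("mean")
print(df["imputed_age"].corr(df["outcome"]))  # strongly negative
```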

I’m glad Cathy calls out this sort of error, but I’d like to think through the consequences. Suppose companies A and B are going to enter a new product space and compete with each other. Both want to be informed by some piece of information that is valuable but difficult to obtain. Company A is lazy and uses some weakly correlated proxy. Company B invests in observing the true value with higher precision. If the information in question is at all valuable, Company B should dominate the market and push A out of business, because its product will simply be better. The free market should sort this out, provided consumers make intelligent choices.
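
A back-of-the-envelope simulation of that story, under the strong assumptions that both companies fit the same simple model and differ only in what they measure:

```python
# Hypothetical setup: y is the outcome both companies care about,
# x is the costly-to-observe truth, and Company A settles for a
# weakly correlated proxy. All parameters are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)                      # the true signal
y = 2.0 * x + rng.normal(size=n)            # outcome to predict
proxy = x + rng.normal(scale=3.0, size=n)   # corr(proxy, x) ~ 0.32

def rmse_of_best_fit(feature):
    """Least-squares slope, then root-mean-squared prediction error."""
    slope = np.cov(feature, y, ddof=0)[0, 1] / np.var(feature)
    return np.sqrt(np.mean((y - slope * feature) ** 2))

print("Company A (proxy):", rmse_of_best_fit(proxy))  # ~2.1
print("Company B (truth):", rmse_of_best_fit(x))      # ~1.0
```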

Now what about choices that aren’t left up to the free market? Choices made in government, or in industries so centralized that no free market is available, are a different beast. I wish this distinction had been made in the book. I also wish time had been spent acknowledging the people already working on this problem, for example Christian Sandvig, whose work we discussed in our Auditing Algorithms episode. The final word on these matters is far from said, but I think it’s time we talked about solutions instead of just announcing the problems.

Chapter 7 presents the case of Tim Clifford, a teacher who was evaluated in consecutive years with scores of 6 and 96 out of 100. Uh oh. I agree, this does not look good. We should be asking questions. Lots of them. What input data was used, and what methodology derived these scores? I’m very glad this book raises questions about situations like this one.

But I can’t go along with the book when it then concludes that “the teacher scores derived from the test measured nothing”. The old red-line body thermometer I own is weak compared to a state-of-the-art medical lab’s metrology equipment, but we don’t throw it out for lack of precision.

Let’s assume Mr. Clifford’s scores are not readily explained by mundane possibilities like a sudden illness. Something is quite suspicious and should be investigated. Highlighting an extreme outlier is an excellent way to learn about a model. But we can’t disregard the model outright on the basis of one (possibly cherry-picked) example.

If Clifford’s job was affected by this score, that’s simply wrong. Some human decision maker should be willing to entertain appeals, and one probably ought to be quickly forthcoming in a case like this. But even human reviewers have a rate of error; I’ve heard the leniency of judges varies before and after lunch. We should be working on measuring the precision and accuracy of our mathematical models with the same diagnostics we’d apply to human decision makers.
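
One way to start is to apply the same diagnostic to both. Here’s a sketch, on synthetic scores, of a year-over-year consistency check for a hypothetical model and a hypothetical human panel; the noise levels are assumptions, not estimates:

```python
# Score the same 500 teachers in two consecutive years, under two
# hypothetical raters that differ only in noise. The noise levels
# are invented; real score histories would replace this.
import numpy as np

rng = np.random.default_rng(3)
true_quality = rng.normal(size=500)

def year_over_year_r(noise_sd):
    """Test-retest correlation of a noisy score applied twice."""
    year1 = true_quality + rng.normal(scale=noise_sd, size=500)
    year2 = true_quality + rng.normal(scale=noise_sd, size=500)
    return np.corrcoef(year1, year2)[0, 1]

for rater, noise_sd in [("model", 1.5), ("human panel", 0.7)]:
    print(f"{rater}: year-over-year correlation = {year_over_year_r(noise_sd):.2f}")
```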

The book is a tour through numerous other worthwhile discussions: the U.S. News & World Report college rankings, the hiring industry, insurance, and credit. At many points in the book, methodological blunders that produce WMDs are described. There are wonderful moments when Cathy does a great job pointing out specific errors made by the creators of mathematical models, such as Frederick Hoffman’s failure to stratify the results of his 1896 study, or the misread SAT statistics behind the 1983 “A Nation at Risk” report. Specific criticisms like these should be acknowledged or challenged, and dealt with if widely accepted to be true. But for every nice moment like this, there seems to be a balancing criticism that, for me, lacks the support I require to get behind it. The enthusiasm is, in fact, off-putting for me at times.

Upon skimming through again to write up this post, I can’t find a specific example, but I get the impression Cathy would favor legislation limiting the damage that can be done by so-called WMDs. For example, maybe we should legislate which input signals may not be used for certain decision-making processes. Cathy is also firm that she doesn’t believe the free market is the solution. I wouldn’t go that far, but I definitely agree that the free market isn’t always the solution.

I believe there are some applications of mathematical models where oversight needs to be increased. Yet I didn’t find a framework in this book for defining that subset. Further, in my opinion, our Congress has done an embarrassingly poor job of producing legislation about technology in the past. One need look no further than the Computer Fraud and Abuse Act to see that legislation is only the answer in a significantly better-informed world. I’m left pretty torn on this topic, when I wish this text had left me more convinced.

Although I have a pile of nit-picks about the book, I emphatically agree with this quote from near the end: “We must come together to police these WMDs, to tame and disarm them”. For me, there’s no doubt about this. Despite some of my criticisms, this book is worth a read.