On How To Lie With Statistics

I recently finished reading How To Lie With Statistics by Darrell Huff. This book was one of seven on Bill Gates’ most recent reading list.

Below are key insights from this book:

So it is with much that you read and hear. Averages and relationships and trends and graphs are not always what they seem. There may be more in them than meets the eye, and there may be a good deal less. The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, “opinion” polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.

The test of the random sample is this: Does every name or thing in the whole group have an equal chance to be in the sample? The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it. A more economical substitute, which is almost universally used in such fields as opinion polling and market research, is called stratified random sampling.
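To see the difference between the two kinds of sample, here is a minimal Python sketch. It is my own illustration, not something from the book: the population of 800 urban and 200 rural respondents, the function names, and the proportional-allocation rule are all assumptions made for the example.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every member has an equal chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(population, key, k, seed=0):
    """Split the population into strata, then sample each stratum
    in proportion to its share of the whole (proportional allocation).
    Rounding can make the total differ slightly from k in general."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        n = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: (respondent id, region) pairs.
population = [(i, "urban") for i in range(800)] + \
             [(i, "rural") for i in range(800, 1000)]

simple = simple_random_sample(population, 50)
stratified = stratified_sample(population, key=lambda p: p[1], k=50)

rural_in_simple = sum(1 for _, region in simple if region == "rural")
rural_in_stratified = sum(1 for _, region in stratified if region == "rural")

print("rural in simple random sample:", rural_in_simple)
print("rural in stratified sample:   ", rural_in_stratified)
```

The stratified sample always contains 10 rural respondents out of 50, matching their 20 percent share of the population, while the count in the simple random sample varies from draw to draw.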

It is a trick commonly used, sometimes in innocence but often in guilt, by fellows wishing to influence public opinion or sell advertising space. When you are told that something is an average you still don’t know very much about it unless you can find out which of the common kinds of average it is—mean, median, or mode.
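A quick Python sketch shows how far apart the three averages can be. The income figures are made up for illustration and do not come from the book.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes: six modest earners and one very high earner.
incomes = [20_000, 20_000, 20_000, 30_000, 35_000, 40_000, 255_000]

avg_mean = mean(incomes)      # arithmetic mean, pulled up by the outlier
avg_median = median(incomes)  # middle value when sorted
avg_mode = mode(incomes)      # most frequent value

print("mean:  ", avg_mean)    # 60000
print("median:", avg_median)  # 30000
print("mode:  ", avg_mode)    # 20000
```

All three numbers are a legitimate "average" of the same data, which is why an unqualified average tells you so little.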

Not all semi-attached figures are products of intentional deception. Many statistics, including medical ones that are pretty important to everybody, are distorted by inconsistent reporting at the source. There are startlingly contradictory figures on such delicate matters as abortions, illegitimate births, and syphilis.

The fallacy is an ancient one that, however, has a powerful tendency to crop up in statistical material: the one that says that if B follows A, then A has caused B.

Another thing to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated. It is easy to show that the more it rains in an area, the taller the corn grows or even the greater the crop. Rain, it seems, is a blessing. But a season of very heavy rainfall may damage or even ruin the crop. The positive correlation holds up to a point and then quickly becomes a negative one. Above so-many inches, the more it rains the less corn you get.
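The rain and corn example can be sketched in a few lines of Python. The yield curve below is entirely hypothetical (a made-up parabola peaking at 20 inches of rain), chosen only to show a correlation that flips sign outside the range where it was first observed.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

def hypothetical_yield(rain_inches):
    # Assumed curve, not real agronomy: yield rises until 20 inches, then falls.
    return 10 * rain_inches - 0.25 * rain_inches ** 2

moderate_rain = list(range(5, 21))   # 5 to 20 inches
heavy_rain = list(range(20, 36))     # 20 to 35 inches

r_moderate = pearson(moderate_rain, [hypothetical_yield(r) for r in moderate_rain])
r_heavy = pearson(heavy_rain, [hypothetical_yield(r) for r in heavy_rain])

print("correlation, moderate rain:", round(r_moderate, 3))
print("correlation, heavy rain:   ", round(r_heavy, 3))
```

Anyone who measured only the moderate range would see a strong positive correlation and, extrapolating, would predict bumper crops in a flood year.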

Another fertile field for being fooled lies in the confusion between percentage and percentage points. If your profits should climb from three percent on investment one year to six percent the next, you can make it sound quite modest by calling it a rise of three percentage points. With equal validity you can describe it as a one hundred percent increase. For loose handling of this confusing pair watch particularly the public-opinion pollers.
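The arithmetic behind the two descriptions is short enough to write out. This is a small Python sketch using the three percent and six percent figures from the passage; the function names are my own.

```python
def percentage_point_change(old_rate, new_rate):
    """Plain difference between two rates expressed in percent."""
    return new_rate - old_rate

def percent_change(old_rate, new_rate):
    """Relative change of the new rate compared with the old one, in percent."""
    return (new_rate - old_rate) / old_rate * 100

old_rate, new_rate = 3.0, 6.0  # return on investment, in percent

points = percentage_point_change(old_rate, new_rate)
relative = percent_change(old_rate, new_rate)

print(f"rise of {points:g} percentage points")  # rise of 3 percentage points
print(f"increase of {relative:g} percent")      # increase of 100 percent
```

Both statements are true of the same change; which one gets quoted depends on whether the speaker wants it to sound small or large.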

Not all the statistical information that you may come upon can be tested with the sureness of chemical analysis or of what goes on in an assayer’s laboratory. But you can prod the stuff with five simple questions, and by finding the answers avoid learning a remarkable lot that isn’t so:

1) About the first thing to look for is bias—the laboratory with something to prove for the sake of a theory, a reputation, or a fee; the newspaper whose aim is a good story; labor or management with a wage level at stake…

2) How does he know?

3) What’s missing? You won’t always be told how many cases. The absence of such a figure, particularly when the source is an interested one, is enough to throw suspicion on the whole thing. Similarly a correlation given without a measure of reliability (probable error, standard error) is not to be taken very seriously.

4) Did somebody change the subject? When assaying a statistic, watch out for a switch somewhere between the raw figure and the conclusion. One thing is all too often reported as another.

5) Does it make sense? “Does it make sense?” will often cut a statistic down to size when the whole rigmarole is based on an unproved assumption. You may be familiar with the Rudolf Flesch readability formula. It purports to measure how easy a piece of prose is to read, by such simple and objective items as length of words and sentences. Like all devices for reducing the imponderable to a number and substituting arithmetic for judgment, it is an appealing idea. At least it has appealed to people who employ writers, such as newspaper publishers, even if not to many writers themselves. The assumption in the formula is that such things as word length determine readability. This, to be ornery about it, remains to be proved.

An enjoyable and very educative quick read. I highly recommend it.