Article Revised: March 26, 2019
The quality and process improvement professions tend to rely heavily on statistical information. The very science of quality control can be said to have begun with Walter A. Shewhart’s development of the control chart and discovery of the concepts of special cause and common cause variation. And where would Six Sigma be without statistics? But few would argue with the statement that there is a downside, and a dark side, to statistics. I’ll present a few examples of good, bad, and ugly uses of statistics.
There’s no shortage of words describing the benefits to be derived from using statistics. Statistical texts describe the advantages of using statistical methods at great length; I’ve contributed a page or two myself. Statistics can force us to look at data and facts, rather than relying on opinions and letting strong personalities force their beliefs on others. Statistical thinking uses data to separate variation from special causes and variation from common causes, thereby aiding decision making and learning. Statistics help us test our beliefs and learn from experience. Statistics are the bridge between raw data on one side and knowledge and understanding on the other. They provide the means by which we can test our theoretical models of reality and learn from them. When statistical analysis fails to confirm our initial hypothesis, we are forced to re-evaluate the hypothesis, which leads to improved understanding. Even if statistical analysis confirms our beliefs, we gain insight through the rigor it provides.
Statistical analysis is especially useful when we are faced with complex situations that challenge human understanding. Even two-dimensional problems can become too complex to understand completely without statistical tools. Response surface analysis tells us about optimum process settings, and explains what will happen if the settings are not precisely controlled. Problems of higher dimension are often beyond us unless we break out our statistical tool kit to study the situation. For example, statistical analysis helps the call center manager understand how contact resolution, waiting time, agent professionalism, and the phone menu combine to create customers who return to buy more, or customers who abandon you and tell their friends to stay away.
Graphics are a useful and necessary tool, but statistics can often enhance our graphical analysis. For example, if a bar chart is created for some metric, there will almost certainly be bars of different heights. But it takes statistics to tell us if the differences are meaningful or merely misleading. Sometimes the graphics are actually graphs of statistics. These uses, and many, many others, are examples of good statistics.
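To make the bar-chart point concrete, here is a minimal sketch of one way to ask whether two bars of different heights really differ. It uses a permutation test, which is only one of several reasonable choices, and the team names and daily counts are hypothetical numbers invented for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle the labels: if the groups do not really differ,
        # any split of the pooled data is as plausible as the observed one.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical daily complaint counts for two call-center teams.
team_a = [12, 15, 11, 14, 13, 16, 12, 15]
team_b = [13, 14, 12, 16, 15, 13, 14, 17]

p = permutation_p_value(team_a, team_b)
print(f"mean A = {mean(team_a):.2f}, mean B = {mean(team_b):.2f}, p = {p:.3f}")
```

A bar chart of these two means would show team B's bar standing taller, yet a large p-value says that a gap of this size turns up often when the team labels are shuffled at random. A small p-value would suggest the opposite, though it never proves it.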
Statistics are numbers. Numbers cannot provide a complete picture of anything. They are an abstraction of reality and, because of this, they are not the reality itself. No amount of measurement and quantification can capture any real thing to any significant degree. People often forget this simple fact and begin to think of customers and employees as revenue or cost sources, complaints, or something other than the complete human beings that they are. In this sense, anything that uses numbers represents a potential barrier to understanding the reality being studied.
Statistics are worse than raw numbers. They reduce the numbers themselves into a smaller quantity of numbers. A mean (or any other statistic) is a single number that may represent thousands of individual measurements. Summarizing in this way inevitably loses some of the information contained in the original measurements.
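A small sketch shows how much a single number can hide. The two lists below are hypothetical waiting times, made up for illustration; they share the same mean but describe very different customer experiences.

```python
from statistics import mean, pstdev

# Hypothetical waiting times in minutes for two service desks.
steady = [9, 10, 10, 10, 11]
erratic = [1, 2, 10, 18, 19]

for name, data in (("steady", steady), ("erratic", erratic)):
    # The mean is identical for both desks; the spread is not.
    print(f"{name}: mean={mean(data)}, stdev={pstdev(data):.2f}, "
          f"min={min(data)}, max={max(data)}")
```

Reporting only the mean would make the two desks look interchangeable. Adding the standard deviation or the range recovers part of what was lost, but only the raw measurements carry all of it.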
Statistics trap us into analysis paralysis. No amount of statistical analysis can ever produce certainty. There is always a probability that our conclusions will be incorrect. This is unavoidable, but some people have real problems dealing with it. Instead of accepting the uncertainty and acting anyway, they gather more data or apply more analysis to the data they already have. I’ve seen Six Sigma Black Belts who can’t be pried out of the Analyze phase with a crowbar.
Statistics are confusing. The aim may be to use statistics to clarify, but many people find statistics to be very confusing. Even technically adept scientists and engineers sometimes have difficulty. For example, try explaining to a layperson what it means to have a range chart that shows statistical control. “The process variability is consistent” doesn’t immediately make sense to most people. More complicated methods require intensive study and a good deal of hands-on experience to understand.
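For readers who find the range chart easier to follow in code than in words, here is a rough sketch of the arithmetic behind it. The measurements are hypothetical, and the constants D3 = 0 and D4 ≈ 2.114 are the standard table values for subgroups of five; other subgroup sizes need different constants. The check shown, no ranges outside the limits, is only the simplest of the usual control-chart signals.

```python
# Control-chart constants for subgroups of size 5 (standard table values).
D3, D4 = 0.0, 2.114

def range_chart(subgroups):
    """Return (ranges, center line, LCL, UCL) for an R chart."""
    ranges = [max(s) - min(s) for s in subgroups]
    r_bar = sum(ranges) / len(ranges)  # average range is the center line
    return ranges, r_bar, D3 * r_bar, D4 * r_bar

# Hypothetical measurements: five readings per subgroup.
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.2, 10.0, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.1],
    [10.0, 10.1, 9.9, 10.2, 9.6],
]
ranges, r_bar, lcl, ucl = range_chart(subgroups)
in_control = all(lcl <= r <= ucl for r in ranges)
print(f"R-bar={r_bar:.3f}, LCL={lcl:.3f}, UCL={ucl:.3f}, in control: {in_control}")
```

Four subgroups keep the example short; in practice the limits are usually computed from twenty or more. When every range falls inside the limits, the spread within subgroups is behaving the same way from one subgroup to the next, which is all that “the process variability is consistent” is trying to say.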
Statistics are often used without graphics. The first three rules of data analysis may be:
- Plot the data
- Plot the data
- Plot the data
but take a look at nearly any scientific paper and you’re likely to find the results of statistical analysis presented in tables and words rather than in charts. Presenting the results of statistical analysis in numbers rather than in pictures is bad practice.
“There are three kinds of lies: lies, damned lies, and statistics.”
The statement, attributed to Benjamin Disraeli, refers to the persuasive power of numbers, the use of statistics to bolster weak arguments, and the tendency of people to disparage statistics that do not support their positions. The deliberate misuse of statistics is an ugly fact of life. One can pick up a newspaper on any given day and find ugly statistical abuse. But it goes beyond attempting to deceive others. People have a tendency to overlook or ignore statistics that contradict their own beliefs, even when there is no one but themselves involved. Statistics are often used by data bullies to pummel their opposition. And statistical experts often aren’t.
Do you have examples or stories of good, bad, and ugly statistical usage? I’d really like to hear about them. Please provide your comments.