Article Revised: May 3, 2019
I held a webinar for Pyzdek Institute students entitled Statistical Surprises and Absurdities. Topics discussed included sampling bias, misused and misleading averages, distorting results by selective data weighting, selective reporting, missing information, distorted graphics, Say What? and So What? statistics, and much more! Here’s the recording.
So what we see here is the title page, and an interesting quote: “There are three kinds of lies: lies, damned lies, and statistics.” That quote is attributed to Mark Twain. He attributed it to Disraeli, but most scholars agree that Disraeli didn’t say it, so let’s just give credit to Mark. I’m going to talk to you about statistical lies, and I also want to give you another quote from Mr. Twain that I’ll come back to at the end. It sums up how I think the issue of statistical lying should be addressed: “When in doubt, tell the truth.” So if you can’t avoid distorting the statistics, at least let the readers know that you’ve done it, and point out the ways in which you were not able to control the data.
First of all, what’s a surprise? Well, it’s a legitimate statistical result that contradicts our expectations. An absurdity, on the other hand, is an unexpected result that leads us to a conclusion that’s not true. So for example, you may have data on your call center that shows that people who have been on hold for an hour are just as satisfied with the service they received as people who were on hold for one minute or less. That’s an absurd result, and what is probably happening, if you give it some additional thought, is that people who don’t mind staying on hold are staying, and people who are upset are hanging up, so they never show up in the satisfaction data. So sometimes the statistical results are absurd, and if you’re in the business of Six Sigma or Lean Six Sigma, you don’t want to be called on to defend ideas that are really indefensible.
Lying with statistics is a special kind of lying, because it uses statistics and numbers to give a misleading impression. Often this is done deliberately. Sometimes it’s done accidentally, and it’s important to understand how this happens and learn what is in fact misleading, so that you can avoid it when you do your presentations. A statement that uses statistics to convey a misleading result is the worst kind of lie according to Mark Twain, and I tend to agree with him.
One kind of problem is sample bias. First of all, let’s talk about a sample. A random sample is a sample in which every observation, every person or thing in the population, has an equal chance of being represented in your sample. Let’s consider a statistical claim from a university that says, “The average salary of our graduates is a hundred thousand a year.” Right away your antenna should go up, and you should begin to ask questions such as these: Whose address or phone number is the school likely to have? Is it more likely to have alumni club members, or graduates who are homeless and don’t even have a phone? Whichever group the school is more likely to reach is going to be more heavily represented in the sample. Now let’s say they do have all of the names and they are making contact with all of their alumni. Who’s most likely to respond, successful alumni or unsuccessful alumni? There’s another source of bias. And consider that we’re asking people how much money they make. Which is more likely: respondents who overstate their income or respondents who understate it? To the extent that these questions, and many others, lead to a sample in which some people are more or less likely to be included than others in the general population, you have a sample bias situation. Accurate responses don’t affect the bias. So if we’re looking at people overstating, people understating, and people telling you their correct income, the person who’s telling you the correct number is not biased, and their response will not affect the bias. At this point I want you to think of other sources of sample bias, and I’m sure you can come up with several. When you look at any statistic, be it one like this or one that you get from your Lean Six Sigma or quality improvement projects, consider the possibility that it may be biased, and try to measure that bias so that you are dealing with accurate data.
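As a rough illustration of how uneven response rates can inflate a reported average, here is a minimal Python sketch. Every number in it is hypothetical (the salary groups, the head counts, and how many graduates in each group respond); nothing here comes from a real survey.

```python
# Hypothetical graduates grouped by salary. All numbers are made up.
# Each tuple: (salary, graduates in the group, graduates who respond)
groups = [
    (30_000, 400, 20),    # hard to reach, unlikely to respond
    (60_000, 400, 80),
    (150_000, 200, 100),  # alumni-club members, happy to respond
]

def true_mean(groups):
    """Average salary over every graduate in the population."""
    people = sum(n for _, n, _ in groups)
    return sum(salary * n for salary, n, _ in groups) / people

def surveyed_mean(groups):
    """Average salary over only the graduates who responded."""
    responders = sum(r for _, _, r in groups)
    return sum(salary * r for salary, _, r in groups) / responders

true_avg = true_mean(groups)
survey_avg = surveyed_mean(groups)
bias = survey_avg - true_avg

print(f"true average:     {true_avg:,.0f}")
print(f"surveyed average: {survey_avg:,.0f}")
print(f"bias:             {bias:,.0f}")
```

With these made-up inputs every response is perfectly accurate, yet the surveyed average lands well above the true one, purely because the high earners are over-represented among the responders.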
Another way to mislead is just to use a summary statistic, and really all statistics are summaries. You’re taking a large number of values and reporting a small number of statistics to represent them. So let’s look at averages. What might be wrong with an average? Well, let’s say we’ve got some salary data. We have salaries from our previous report of $50,000, $50,000, $70,000, and $500,000. Based on this data, would you say the average salary is A) $50,000, B) $167,500, or C) $60,000? Well, the correct answer is all of the above. A is the mode, the value that occurs most often, and that’s fifty thousand dollars. B is the arithmetic mean. If you sum those numbers and divide by the count, you get one hundred sixty-seven thousand five hundred. It’s very heavily influenced by that extreme value of five hundred thousand dollars. C is the median. If you have an even number of values, the median is midway between the two middle values, and midway between fifty and seventy you get sixty thousand dollars. So the question is which of these most accurately represents what you’re trying to convey to the audience for your data.
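The three "averages" can be checked with a few lines of Python. This is only a small sketch using the standard-library statistics module and the four salaries quoted above; the variable names are illustrative.

```python
from statistics import mean, median, mode

# The four salaries from the example above
salaries = [50_000, 50_000, 70_000, 500_000]

avg_mode = mode(salaries)      # value that occurs most often
avg_mean = mean(salaries)      # sum of the values divided by the count
avg_median = median(salaries)  # midway between the two middle values

print(f"mode:   {avg_mode:,}")
print(f"mean:   {avg_mean:,}")
print(f"median: {avg_median:,}")
```

All three are legitimate summaries of the same data, which is exactly why a bare "average" should prompt the question of which one was used.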
It’s also possible that your sample will have some data represented more heavily than other data. So for example, here’s some data that shows a U-shaped pattern. Off to the left we see a peak, then it flattens out, and off to the right we see another peak. Because of under-representation in the smaller group on the left and over-representation of other values in the large group on the right, you can get a distorted pattern that doesn’t look like the original data, just because you’ve got more of one particular data value than another. This actually occurred. In one of the IPCC reports on climate change there was a chart published that statisticians analyzed. The original chart showed what was called a hockey stick pattern, with a flat line for about a thousand years followed by a sharp upward spike. That was because there was an over-representation of a particular type of data. When the data were corrected and the statistical error was taken out, you had more of a U-shaped pattern. So even scientists who have millions of dollars in their budget can make these mistakes.
Another way to misrepresent the data, or lie with statistics, is selective reporting. If you watch commercials, oftentimes the company will represent its product with statistics. They’re going to tell you something about their product: it’s the most popular, it’s the most reliable, etc. Well, here’s one way in which they may give you a result that needs to be questioned. Let’s say that you have an ad and it says “8 out of 10 dentists prefer our toothpaste.” One way they may get that ad is by taking a number of small samples and then picking the one that gives the result that most favors their product. So here’s the sample that they picked, and indeed 8 out of 10 dentists in this sample preferred Smiley Toothpaste. They fail to report, however, that in another sample only 2 out of 10 dentists preferred their toothpaste. What they’re doing is taking a number of small samples and cherry-picking the result that favors the position they want to convey.
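To see how easy it is to cherry-pick an "8 out of 10" result, here is a minimal Python sketch based on the binomial distribution. The assumptions are mine, not the webinar's: dentists are truly split 50/50 on the toothpaste, each sample has 10 dentists, and the advertiser quietly runs 20 independent samples.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def prob_any_sample_hits(k, n, p, num_samples):
    """Probability that at least one of num_samples samples shows k or more."""
    miss = 1 - prob_at_least(k, n, p)
    return 1 - miss**num_samples

# Assumed: true preference is 50%, samples of 10 dentists, 20 samples taken
single = prob_at_least(8, 10, 0.5)
cherry = prob_any_sample_hits(8, 10, 0.5, 20)

print(f"one sample shows 8+ of 10:        {single:.3f}")
print(f"at least one of 20 samples does:  {cherry:.3f}")
```

Under these assumptions a single honest sample shows 8 or more out of 10 only about 5% of the time, but an advertiser willing to take 20 samples and report the best one gets the headline roughly two times out of three.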
I’m actually old enough to remember ads like this. Back in the day they ran ads just like this one. In fact, this is a real ad: “More doctors smoke Camels than any other cigarette.” Well, what they don’t tell you is that there’s missing information. First of all, most doctors didn’t smoke. Even back then cigarettes were called coffin nails and cancer sticks for a reason. Most doctors understood that there was a health risk, even if it hadn’t been measured and quantified like it has now, and most doctors didn’t smoke at all. The ad didn’t offer that choice, and had it offered that choice, it probably would have said, “More doctors don’t smoke than smoke any cigarette.” Another thing is they would often advertise things like “Chesterfields have less tar than any other brand.” Well, if you take a metric, a measurement, and you rank order the results, something is going to be the smallest and something is going to be the largest. So there’s a statistic that can be reported, and it sounds very powerful and scientific, but in fact it has no meaning at all.
I’ve actually seen charts like this one presented in meetings. This is a chart where you can pretty much say whatever you want to, because there’s no scale on the left-hand side. The actual values on the scale are missing. You’ll see a lot of these in newspapers where they’re making a point about a trend, but without those values you can’t really evaluate the trend. So you have to have a scale, and the scale has to be correct. What counts as correct depends on the message you’re trying to convey to the audience.
So for example, here’s a chart where you have an apparent strong upward trend. The summary headline on the chart says “Prices skyrocket,” but this time we’ve got the numbers on the scale. You can see that between the first quarter and the fourth quarter, prices only increased by 0.15 percent. So this is not as large a price increase as the chart makes it appear. We can make this even more dramatic by changing the chart scale. On this chart you see the same value of 0.15 percent, only it looks even bigger. The reason is that instead of starting at 0%, like the previous chart did, we start the axis at the smallest value, 0.10 percent, and we truncate the chart at the highest value. So now we’ve made 0.15 percent look even larger. This is a way to distort the impression that you get from the chart by distorting the scale. That’s why a chart like a Pareto diagram has very carefully defined axes. The Pareto diagram starts at zero on both the left and right axes, and it goes to a hundred percent on the right axis and to the cumulative total on the left. That’s because when we’re doing a Pareto analysis we want to compare the bars to the total problem. We want to see how much each one contributes to the entire problem that we’re trying to study.
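The effect of truncating the axis can be put in numbers. The sketch below is a hedged illustration only: the quarterly values are hypothetical (the slide's actual data are not in the transcript), they are stored in hundredths of a percent to keep the arithmetic exact, and `apparent_growth` is a helper name invented for this example.

```python
def apparent_growth(values, baseline):
    """Ratio of the last bar's drawn height to the first bar's drawn height
    when the vertical axis starts at `baseline`."""
    first, last = values[0], values[-1]
    if baseline >= first:
        raise ValueError("baseline must sit below the first value")
    return (last - baseline) / (first - baseline)

# Hypothetical quarterly values in hundredths of a percent (0.10% up to 0.15%)
quarterly = [10, 11, 13, 15]

from_zero = apparent_growth(quarterly, baseline=0)  # axis starts at 0%
truncated = apparent_growth(quarterly, baseline=9)  # axis starts at 0.09%

print(f"axis from zero: last bar looks {from_zero:.1f}x the first")
print(f"truncated axis: last bar looks {truncated:.1f}x the first")
```

The data are identical in both cases. Only the starting point of the axis changes, and with it the visual impression of how fast the values are rising.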
Modern computers will let you create charts that use images. You can assign an image to the bar, and the chart will rescale the image according to the values on the chart.
Anyway, you have to be careful when you’re using a picture to represent some metric, and I’ll show you why. Let’s say that we have a picture of an object such as a cow, a house, or a car. In the real world our minds are conditioned to look at these things in three dimensions, because that’s what they are. One common mistake is to pick one aspect of that image and use it to represent our metric. Let’s say we’re trying to show what happens when the price of beef doubles from two dollars a pound to four dollars a pound. It sounds very reasonable: we’re going to put in an image that’s two inches tall for the $2 a pound cow, and we’re going to expand that image to four inches for the $4 a pound cow. So let’s see what happens. Here’s a cow representing $2 with a picture that’s two inches high. When we increase the height from two inches to four inches, the image of the cow expands by more than that, because the width doubles along with the height. The area of the picture is four times as large, and if our mind reads the cow as a three-dimensional object, its apparent volume has increased by even more than that, by a factor of eight. So again, that’s something we may innocently do and then inadvertently give a misleading impression.
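The arithmetic behind the cow picture is simple enough to sketch in Python. This assumes the image is scaled proportionally, so the width (and the implied depth) grows by the same factor as the height; `perceived_growth` is a name made up for this illustration.

```python
def perceived_growth(old_height, new_height):
    """Growth factors when a picture is scaled proportionally."""
    scale = new_height / old_height
    return {
        "height": scale,       # what we meant to show
        "area": scale**2,      # what the flat picture actually shows
        "volume": scale**3,    # what the eye may infer for a 3-D object
    }

# The 2-inch cow ($2 a pound) stretched to 4 inches ($4 a pound)
growth = perceived_growth(2, 4)
print(growth)
```

One way to avoid the distortion is to keep the picture the same size and repeat it (two cows for $4, one cow for $2), or simply to use plain bars of equal width.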
There are a couple of other kinds of statistics I want to talk about. One of them I call the “say what” statistic. A say what statistic is missing something. Generally it’s missing the denominator, the base that we’re using as the standard for comparison. So for example, I might say lean organizations are 20% more efficient. If I don’t say what they’re more efficient than, then I’ve got a say what statistic. What you need to do when you hear a statistic is think about what it is actually saying, and if it doesn’t make sense, stop the person who said it and ask for clarification. If you don’t do that, it’s simply too easy to get the wrong impression or the wrong idea. Often the speaker won’t even know they’ve done it, and asking is going to help them convey a clear message to everybody.
Another kind of statistic would be a “so what” statistic. These are numbers that are provided by people who don’t have any special qualification. So for example, if I were to tell you that more Black Belts prefer Puffs facial tissues, the correct response from you would be “who cares?” Black Belts are no more qualified than anyone else to make this assessment. You’ll often see celebrities testifying before the House or in front of a Senate committee on topics of interest to them. While we all recognize the celebrity, and his or her opinion may be an interesting thing to know about, you still have to bear in mind that they probably don’t understand the subject matter any more than you do. Therefore, they’re no better qualified to make a judgement than you are.
Quite often we’re looking at data in tables, and it’s very important to analyze the data correctly. So here’s some data. The data is made up, but the problem is not. This was actually a lawsuit, and it involved gender bias in school admissions. What you can see from this table is that we have a school that had a certain number of males apply for admission, and the acceptance rate for the male students was 70%. We had a similar number of females apply for admission, and their acceptance rate was only 40%. It certainly appears that this indicates gender bias: the males are being accepted into the university at a much higher rate than the women are. But it happens that this particular school has another variable that needed to be considered. They had two programs. The top table represents a program with very tough admissions criteria, so a lot more people were rejected from that program than from the other one. The data for the easy-admission program is shown in the table on the bottom. So here what we have is 200 males who applied for admission into the tough program, and their admission rate was only 20%, so only 40 out of the 200 were accepted. 800 women applied, and out of the 800, 200 were accepted, so their admission rate was 25%. What you see is that women were admitted to the tough program at a higher rate than men. So now we no longer see the bias against women, and we may even see a bias in favor of women applicants. In the bottom table, we see that 800 males applied for admission into the easy program and 660, or about 83%, were accepted. 200 women applied and all of them were accepted. So again the women had a higher acceptance rate than the men. If you add these numbers together, you’ll see that they add up to the totals in the original table. We have 660 males plus 40 males, so that’s 700 males who were accepted. We have 200 females in the tough program and 200 in the easy program, so that’s 400 women.
The numbers in the original table are just split up according to some classification category to create the numbers in the two smaller tables. This actually has a name. It’s called “Simpson’s Paradox,” and it can occur any time you have a table of data that can be classified according to another variable that’s not shown in the table. You see it can actually reverse the direction of the relationship. So you have to be quite careful when you’re looking at data in tabular form to be sure that you’re not masking a relationship by excluding another variable.
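The reversal can be reproduced directly from the counts quoted above. The following Python sketch uses only those numbers; the dictionary layout and function names are illustrative choices, not part of the original slides.

```python
# (applicants, accepted) for each program and gender, as quoted in the text
admissions = {
    "tough": {"male": (200, 40), "female": (800, 200)},
    "easy": {"male": (800, 660), "female": (200, 200)},
}

def rate(applied, accepted):
    return accepted / applied

def combined_rate(admissions, gender):
    """Acceptance rate for one gender with both programs pooled together."""
    applied = sum(program[gender][0] for program in admissions.values())
    accepted = sum(program[gender][1] for program in admissions.values())
    return rate(applied, accepted)

# Rates within each program: women do better in both
by_program = {
    name: {gender: rate(*counts) for gender, counts in program.items()}
    for name, program in admissions.items()
}

# Pooled rates: men appear to do far better
pooled = {g: combined_rate(admissions, g) for g in ("male", "female")}

for name, rates in by_program.items():
    print(name, {g: f"{r:.1%}" for g, r in rates.items()})
print("pooled", {g: f"{r:.1%}" for g, r in pooled.items()})
```

The pooled table hides the fact that most women applied to the tough program and most men to the easy one, which is the missing variable that flips the comparison.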
Here are some statements that are true but misleading. Let’s assume the following: for every hundred thousand people who don’t drink our soda, Yippee, 99,999 will survive for a year. But our Yippee drinkers have a slightly lower life expectancy. Out of a hundred thousand Yippee drinkers, only 99,998 will survive for a year. Well, it’s quite possible that the Yippee drinkers enjoy it so much that they’re willing to take that slight additional risk. But let’s consider how this would be reported in the press. This is a true statement: drinking Yippee doubles your risk of dying. Well, how can that be? It certainly doesn’t appear that there’s a big difference between the two. It happens when you report the relative risk instead of the absolute risk. The risk for non-Yippee drinkers is one in a hundred thousand, while the risk for Yippee drinkers is two in a hundred thousand. The relative risk is the larger number divided by the smaller one, and that’s two, thus the headline. It’s not inaccurate. It’s absolutely correct, but it’s quite misleading.
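The gap between the two ways of reporting the same numbers can be sketched in a few lines of Python. The figures are the ones from the soda example above; the variable names are illustrative.

```python
GROUP_SIZE = 100_000

# Deaths per 100,000 people per year, from the example above
deaths_non_drinkers = GROUP_SIZE - 99_999  # 1
deaths_drinkers = GROUP_SIZE - 99_998      # 2

risk_non_drinkers = deaths_non_drinkers / GROUP_SIZE
risk_drinkers = deaths_drinkers / GROUP_SIZE

# Relative risk: the ratio of the two risks (the headline number)
relative_risk = risk_drinkers / risk_non_drinkers

# Absolute risk increase: the difference, here as extra deaths per 100,000
extra_deaths_per_100k = deaths_drinkers - deaths_non_drinkers

print(f"relative risk: {relative_risk:.1f}x")
print(f"absolute increase: {extra_deaths_per_100k} extra death per {GROUP_SIZE:,}")
```

Both outputs describe the same data. Reporting them side by side lets the reader judge whether "doubles your risk" is worth worrying about.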
Let me give you an example from today’s news. This is a headline from the London Daily Mail: “Taking painkillers long-term triples the risk of kidney cancer.” I mean, how scary is that if you take aspirin or ibuprofen for your arthritis or some other condition that you have? If you read this article, which is quite long, at the very bottom you’re going to find this: the overall risk of renal cancer is small, and only six per hundred thousand people are expected to develop it. That means that out of a hundred thousand people, 99,994 will not get renal cancer. If you break that up between those who are taking painkillers and those who are not, the additional absolute risk is very tiny. It’s quite possible that a person who needs painkillers because of their pain would accept this small additional risk. But that headline is likely to scare them and cause them to make the wrong decision.
Well, there actually is now a field, spawned about twenty years ago, that looks at not just medical science but all research. Research, you know, is an industry worth several hundred billion dollars a year, and it deserves our attention. This particular school of thought has looked at medical research. I want to show you some other examples of problems in medical research that have been discovered and are hopefully in the process of being corrected, although there is still a ways to go.
Here’s a typical example of a reputable journal reporting a result. They looked at a scientific study, summarized it in their article, and their conclusion was: “Painkiller taken by millions can increase the heart attack and stroke risk by 40 percent.” Now that we know they’re talking about relative risk, we know that we can’t really evaluate this number unless we see the raw data. The raw data for this particular report is not made available in the article. In fact, this report is a summary of other research, so it’s not original research, and that involves a number of other risks that we’re going to talk about in a minute. So here we go: a 40% increase. What does it mean in terms of us making a decision? We can’t say, because we don’t have the data to assess it properly. This study came from a peer-reviewed journal, which sounds impressive until you look at that journal and see that it has also published a paper called “Why Most Published Research Findings Are False.” There are a lot of different reasons for that, but the truth is we have to be very careful when we use the findings from reported medical research to make decisions about our lives and the health of ourselves and our loved ones.
As a result of all the work that’s been done, some journals have started requiring authors to point out the limitations of their study that they’re aware of. This is the study that was reported on the previous page about the painkillers and the risk of heart attacks. First of all, it relied on observational studies rather than randomized controlled trials. An observational study is the weakest kind of medical research, because you’re just following a large group of people over a long period of time, and the people included are really not controlled for in any meaningful way. The data in the studies came from a large administrative database and may not have been comprehensive. So we have missing variables: the data don’t include information such as nonprescription drug use, aspirin, and so on. We already know from the Simpson’s Paradox analysis that this could make a lot of difference. There were confounders, so some of the people may have had heart problems and other things going on that caused the data to be skewed or biased. The review also suffered from heterogeneity: it pooled a number of different studies that collected data on different groups of people, at different points in time, for different lengths of time, and so on. So they’re mixing apples and oranges, and that throws doubt on the findings of these systematic reviews. To give you a case in point, cholesterol studies often include a group with what’s known as hypercholesterolemia. These are people who have a genetic predisposition to be extremely sensitive to the cholesterol in their diet. If that group is included in the research, you’ll see a much more pronounced effect of cholesterol on heart problems than if that group is excluded from the data.
So the final word: don’t let this happen to you. Be very careful. Do your best to tell the truth with your data. Control for bias and take large samples. Try to randomize your sampling process. The morals of this story, as I see it, are twofold. One is that a thing is more than the numbers that represent it. You can’t reduce any real-world thing, especially a person, but also a process or a product or a service, to a set of numbers without giving something up. The second is that if you have a large set of numbers and you summarize those numbers with statistics, you’re going to lose something by creating that summary. You have to understand that there’s no way to avoid it. You can only be aware of it.
Here’s a link to the slides presented in the webinar.