What is “Small”?

Not enough data. We’ve all been there before. It may be that the production rate is too low. (I’ve worked with processes where one unit per month wasn’t unusual.) It may be that you have many covariates, as in customer research. If you slice-and-dice customers by industry type, geographic region, age, sex, product type, years of experience, and so on, even a sample of several thousand isn’t enough to populate the resulting table properly. Or maybe you’ve just started using a new process and the number of units produced is still small.

Philosophically the problem we’re trying to avoid is a logical fallacy known as “hasty generalization.” According to Wikipedia:

Hasty generalization, also known as fallacy of insufficient statistics, fallacy of insufficient sample… is the logical fallacy of reaching an inductive generalization based on too little evidence. It commonly involves basing a broad conclusion upon the statistics of a survey of a small group that fails to sufficiently represent the whole population.

Wikipedia contributors. (2022, April 20). Faulty generalization.

In other words, we look at too few and generalize to the many too quickly. Statistically there are two basic problems:

  1. The sample is not valid.
  2. The sample is not reliable.

Validity means that the sample measures what it is supposed to measure. A sample can be invalid in many different ways. If my sample is small and the population is large and varied, my sample is unlikely to represent all of the richness and variety of the population. But even a large sample might be invalid. For example, my sample might include only units that are conveniently available: I might talk only to customers who are easy to reach and interview by telephone. My sample would then consist of customers who spoke my language, were in my time zone, and were in their offices. If my business is international, this sample would be invalid no matter how many customers were surveyed.

Reliability, in a statistical sense, means consistency in the result. If repeated samples give very different results, the samples are unreliable. For example, if I survey n=2 customers, the variation between repeated samples is likely to be so large that my conclusions would change from one sample to the next. What’s happening here is that we weight each sample unit equally, but when the sample is small this equal weighting overstates the importance of any sample unit that is far from the population mean. In other words, small samples tend to be biased. The sample needs to be large enough to give every unit its proper weight.
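
To see the reliability problem concretely, the short Python sketch below (a hypothetical illustration, with an invented population of customer scores) draws repeated samples of n=2 and n=30 and compares how much the sample means bounce around from one sample to the next.

```python
import random
import statistics

# Hypothetical population: 10,000 customer satisfaction scores (illustrative only).
random.seed(1)
population = [random.gauss(70, 15) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=1000):
    """Draw many samples of the given size and report how much their means vary."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)

# The mean of an n=2 sample varies far more between repeated samples than the
# mean of an n=30 sample, so conclusions drawn from n=2 would keep changing.
print("std. dev. of sample means, n=2: ", round(spread_of_sample_means(2), 1))
print("std. dev. of sample means, n=30:", round(spread_of_sample_means(30), 1))
```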

What’s a person to do?

In a sense, we can never have “enough” data. Statistics by its nature involves data reduction. We look at a subset of a population, convert the richness of the sample to mere numbers, then we summarize our sample numbers with a few statistical estimates. Despite all of the difficulties, we are forced to deal with inadequate data. Here are a few statistical approaches to help you do this.

Randomize

The three rules of sampling are: randomize, randomize, randomize. Although a very small sample can never adequately represent a large, complex population, randomization reduces the likelihood that your sample will be biased. After you’ve selected your sample, you can compare it to the population for known sources of variation. If there are large differences, you can increase your sample size, resample from scratch, or simply keep the differences in mind when you perform your analysis and make your recommendations.
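
As a rough illustration (not part of the original method, and using made-up customer data), the Python sketch below draws a simple random sample and then compares it to the population on one known covariate:

```python
import random
from collections import Counter

# Hypothetical customer list; "region" stands in for any known source of variation.
random.seed(2)
regions = ["NA", "EU", "APAC", "LATAM"]
customers = [{"id": i, "region": random.choice(regions)} for i in range(5_000)]

# Rules 1-3: randomize. Simple random sampling gives every customer the same
# chance of selection, which guards against convenience bias.
sample = random.sample(customers, 50)

def region_mix(group):
    """Proportion of each region within a group of customers."""
    counts = Counter(c["region"] for c in group)
    return {r: round(counts[r] / len(group), 2) for r in regions}

# Compare the sample to the population on the known covariate; large gaps suggest
# increasing n, resampling, or noting the imbalance in the analysis.
print("population:", region_mix(customers))
print("sample:    ", region_mix(sample))
```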

Starting Up a New Process[1]

Most statistical process control (SPC) training assumes that you are applying SPC to a process that has been in operation for a while. When this isn’t the case, you should use a three-stage approach to SPC. The first stage involves characterizing the process by collecting data from a small number of subgroups; first-stage control limits are calculated using control chart factors that take the number of subgroups into account. The second stage begins when all of the first-stage data are within the control limits, and it applies a different set of control chart factors to the remainder of the test run. The third stage applies to future runs and uses standard control chart factors.
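
The Python sketch below shows only the general mechanics of calculating X-bar and R control limits from a short run of subgroups, using the standard factors for subgroups of size five and invented data; the stage-specific small-sample factors themselves are in the cited reference and are not reproduced here.

```python
# Trial control limits from a short run of subgroups (hypothetical data).
# Standard X-bar/R factors for subgroups of size 5 are used (A2=0.577, D3=0,
# D4=2.114); in the three-stage approach you would substitute the first- and
# second-stage factors, which also depend on the number of subgroups.

subgroups = [  # 6 subgroups of 5 measurements each (invented)
    [10.2, 9.8, 10.1, 10.0, 9.9],
    [10.3, 10.1, 9.7, 10.0, 10.2],
    [9.9, 10.0, 10.4, 9.8, 10.1],
    [10.1, 10.2, 9.9, 10.0, 9.8],
    [10.0, 9.7, 10.1, 10.3, 10.0],
    [9.8, 10.0, 10.2, 9.9, 10.1],
]

A2, D3, D4 = 0.577, 0.0, 2.114  # standard factors for subgroup size n=5

xbars = [sum(s) / len(s) for s in subgroups]   # subgroup averages
ranges = [max(s) - min(s) for s in subgroups]  # subgroup ranges
xbarbar = sum(xbars) / len(xbars)              # grand average
rbar = sum(ranges) / len(ranges)               # average range

print("X-bar chart: UCL=%.3f  CL=%.3f  LCL=%.3f"
      % (xbarbar + A2 * rbar, xbarbar, xbarbar - A2 * rbar))
print("R chart:     UCL=%.3f  CL=%.3f  LCL=%.3f"
      % (D4 * rbar, rbar, D3 * rbar))
```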

Sequential Statistical Tests

If you are testing a statistical hypothesis, the usual approach is to take a sample of size n, calculate the test statistic, compare the statistic to a critical value or critical region, then reject or fail to reject the hypothesis. However, this is not the approach to take if your goal is to minimize your sample size. A better way is to calculate the test statistic and make the comparison as each sample unit arrives. In other words, look at the first unit, compare to the critical value or region, and reject or fail to reject the hypothesis; if the result is inconclusive, look at a second unit, and repeat until you’ve made a decision. These sequential hypothesis tests provide the minimum sample sizes for any given hypothesis. The next-smallest sample sizes are obtained from truncated sequential tests, which are sequential tests with a predetermined stopping point. The traditional fixed-sample approach requires the largest samples. You will find information on sequential statistical tests in most textbooks on statistics.
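
As an illustration of the sequential idea, here is a minimal sketch of Wald’s sequential probability ratio test for a defect rate. The hypotheses (p0 = 5% versus p1 = 15%), the error rates, and the data are all invented for the example.

```python
import math
import random

def sprt_bernoulli(observations, p0=0.05, p1=0.15, alpha=0.05, beta=0.10):
    """Wald's sequential probability ratio test for a proportion.

    Tests H0: p = p0 against H1: p = p1, examining one unit at a time and
    stopping as soon as the accumulated evidence crosses a decision boundary.
    """
    upper = math.log((1 - beta) / alpha)   # cross this -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross this -> accept H0
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for defective in observations:
        n += 1
        llr += math.log(p1 / p0) if defective else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0 (rate looks closer to p1)", n
        if llr <= lower:
            return "fail to reject H0 (rate looks closer to p0)", n
    return "no decision reached", n

# Hypothetical inspection stream with a true 5% defect rate.
random.seed(3)
stream = (random.random() < 0.05 for _ in range(10_000))
decision, n_used = sprt_bernoulli(stream)
print(decision, "after", n_used, "units")
```

The test typically stops after a few dozen units, far fewer than a comparable fixed-sample plan would require.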

Analysis of Sparsely Populated Tables

When there are many categories of covariates you may end up with large tables containing hundreds or thousands of cells. If many of these cells contain no data, the table is said to be sparsely populated. One way to deal with this is to apply an approach called “smear and sweep.”

Smearing involves selecting a pair of classification variables and creating a two-way table from them. The values in the table are then swept into categories according to their ordering on the criterion variable. Table 1 shows an example of smearing death rates per 100,000 operations for a sample classified by age and sex, which are confounding variables. The sweeping is performed by creating new categories based on similar death rates, shown in Table 1 as Roman numerals.

| Sex    | 20-29 y/o | 30-44 y/o | 45-59 y/o | 60-69 y/o |
|--------|-----------|-----------|-----------|-----------|
| Male   | 90 (I)    | 109 (II)  | 118 (III) | 128 (III) |
| Female | 58 (I)    | 85 (I)    | 101 (II)  | 112 (II)  |

Table 1: Smear of Death Rates by Age by Sex

The next iteration uses the new sweep variable and another confounding variable. Table 2 shows this with operation type as the new confounding variable and Roman numerals with a prime (‘) as the new sweep variable.

| Operation | I         | II         | III        |
|-----------|-----------|------------|------------|
| A         | 61 (I')   | 78 (I')    | 97 (II')   |
| B         | 81 (I')   | 102 (II')  | 123 (III') |
| C         | 103 (II') | 120 (III') | 138 (III') |

Table 2: Smear of Death Rates by Operations by Sweep

At this point the new sweep variable reflects the effects of the three confounding variables: age, sex, and operation type. The process continues until all confounding variables are accounted for. If done properly (a lot of judgment is involved in selecting the cutoffs for the sweep variables), the smear-and-sweep method will produce less biased results than simply ignoring the confounding variables. Be aware, however, that some simulation studies show the smear-and-sweep procedure itself can introduce bias.
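
Here is a minimal Python sketch of a single smear-and-sweep pass, using the Table 1 rates. The cutoffs below happen to reproduce the Table 1 categories, but in practice selecting them is where the judgment comes in.

```python
# One smear-and-sweep pass over the Table 1 death rates.
# The cutoffs are illustrative; choosing them requires judgment.

smear = {  # death rates per 100,000 operations, by (sex, age group)
    ("Male", "20-29"): 90,    ("Male", "30-44"): 109,
    ("Male", "45-59"): 118,   ("Male", "60-69"): 128,
    ("Female", "20-29"): 58,  ("Female", "30-44"): 85,
    ("Female", "45-59"): 101, ("Female", "60-69"): 112,
}

def sweep(rate, cutoffs=(95, 115)):
    """Assign a cell to a sweep category based on its death rate."""
    if rate < cutoffs[0]:
        return "I"
    if rate < cutoffs[1]:
        return "II"
    return "III"

# Sweep the smeared cells into categories; this reproduces Table 1.
sweep_category = {cell: sweep(rate) for cell, rate in smear.items()}
for cell, rate in smear.items():
    print(cell, rate, "->", sweep_category[cell])

# The next pass would cross this sweep variable with another confounder
# (operation type, as in Table 2) and repeat the smear and sweep.
```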


[1] Pyzdek, Thomas (1992). Pyzdek’s Guide to SPC, Volume Two: Applications and Special Topics. Quality America, Tucson, Arizona, pp. 100-101.

