How Do I Compute Sigma? Let Me Count the Ways
With SPC work, we normally try to analyze a process distribution’s shape, central tendency and spread. We usually measure this last item by computing an estimate of the process standard deviation, or sigma, designated with the Greek letter σ. There are several ways to do this; I’ll discuss the pros and cons of some of the more common methods used to estimate sigma. We’ll refer to these sigma estimates with the symbol σ̂ (sigma-hat).
For the purpose of discussion, I invented a data set to analyze, as shown in Table 1. The data reflect a process that starts with an average of 10. The process is influenced by a special cause that makes the average increase by 1 every hour. The standard deviation remains constant at 1, and the data follow a normal distribution.
Table 1: Simulated data, 10 hourly subgroups of five observations
N_{1}  N_{2}  N_{3}  N_{4}  N_{5}  Xbar  R  s  MR
10.147  9.700  8.722  10.244  11.276  10.018  2.554  0.926  n/a
10.746  12.198  12.733  8.816  10.766  11.052  3.917  1.526  0.599 
12.326  13.095  10.913  11.310  10.310  11.591  2.785  1.116  1.580 
12.415  11.153  12.022  12.226  10.882  11.740  1.533  0.680  0.089 
12.291  13.432  13.596  14.135  13.635  13.418  1.844  0.682  0.124 
16.367  14.673  14.630  16.343  14.915  15.386  1.737  0.892  4.076 
15.608  15.814  15.487  17.972  16.866  16.349  2.485  1.058  0.759 
17.873  19.376  16.345  18.661  15.388  17.529  3.988  1.643  2.265 
19.677  18.539  18.902  19.919  17.915  18.990  2.004  0.822  1.804 
15.954  18.476  19.675  18.619  19.758  18.496  3.804  1.538  3.723 
Method No. 1: The Standard Approach to Sigma Computation
s = \sqrt{\frac{\sum_{i=1}^N (x_i - \bar{x})^2}{N-1}}
This is the standard approach, used by calculators and spreadsheets any time users require sample sigma. The numerator is the sum of the squared deviations from the sample average; i.e., subtract the sample average from the first observation and square it, then the second, etc. Then add the results. If the data cluster close to the average, this sum will be smaller than if they are scattered more widely. Thus, a bigger value of s indicates a process with greater scatter.
The denominator indicates the degrees of freedom. The N-1 term in the denominator is a bias correction. For a given sample size, the denominator will be a constant. Thus, estimates of s can be compared directly for different processes when sample sizes are the same. Any observed differences can be attributed to data scatter, which may (or may not) indicate different process scatter.
Shewhart showed that this traditional estimate of s is only valid when the process is stable. If a process is influenced by a special cause, then this estimate will overestimate the process scatter. For our example, the formula estimates s as 3.337, far greater than the actual value of 1. The difference is due to the trend created by the special cause. Because the estimate includes variation from the special cause, detecting the special cause is harder to do. The 3-sigma limits from this estimate are 4.446 and 24.468, which include all of the data.
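As a rough check on these figures, the sketch below pools all 50 observations from Table 1 and applies the standard N-1 formula. The variable names are my own, and the limits are simply the overall average plus or minus three times s, which is how I read the limits quoted above.

```python
import statistics

# Table 1 observations, one row per hourly subgroup (columns N_1..N_5).
subgroups = [
    [10.147, 9.700, 8.722, 10.244, 11.276],
    [10.746, 12.198, 12.733, 8.816, 10.766],
    [12.326, 13.095, 10.913, 11.310, 10.310],
    [12.415, 11.153, 12.022, 12.226, 10.882],
    [12.291, 13.432, 13.596, 14.135, 13.635],
    [16.367, 14.673, 14.630, 16.343, 14.915],
    [15.608, 15.814, 15.487, 17.972, 16.866],
    [17.873, 19.376, 16.345, 18.661, 15.388],
    [19.677, 18.539, 18.902, 19.919, 17.915],
    [15.954, 18.476, 19.675, 18.619, 19.758],
]

# Pool every observation, ignoring the subgroup structure entirely.
values = [x for row in subgroups for x in row]

mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (N-1 denominator)

# 3-sigma limits built from the pooled estimate.
lower, upper = mean - 3 * s, mean + 3 * s
print(f"mean = {mean:.3f}, s = {s:.3f}, limits = ({lower:.3f}, {upper:.3f})")
```

Because the hourly trend is folded into s, the limits end up wide enough to contain every point, which is exactly the problem described above.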
Method No. 2: Reducing Special Cause Variation with Rational Subgroups
\hat{\sigma} = \frac{\bar{R}} {d_2}
Using an estimator that doesn’t include the variation between time periods will alleviate the problem of σ̂ being inflated by special causes. Shewhart proposed using rational subgroups to do this. A rational subgroup is a sample selected in such a manner that the opportunity for a special cause to influence the results is minimized. This is often accomplished by selecting consecutive units from a process.
In Table 1, the data are arranged in 10 subgroups of five measurements per subgroup. The first group of five were sampled in hour No. 1, the next group in hour No. 2 and so forth. The table indicates no change in the process during the time the subgroup was collected, so it’s the ideal from the Shewhart perspective.
With these “clean” subgroups, we can estimate the process dispersion for each subgroup, then combine the results to find the overall estimate of σ. One way to estimate dispersion is to find the range, R, by subtracting the smallest observation in the subgroup from the largest. After doing this, we can average the R values and use a correction factor, d_{2}, to find σ̂. For subgroups of 5, the d_{2} factor is 2.326; for our data, the average range is 2.665. This gives σ̂ = 1.146, which is much closer to 1.0.
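The arithmetic is short enough to sketch in a few lines. This example takes the R column of Table 1 as given and uses the d_{2} value quoted above; the variable names are illustrative only.

```python
# Range of one subgroup: largest observation minus smallest (hour No. 1).
first_subgroup = [10.147, 9.700, 8.722, 10.244, 11.276]
first_range = max(first_subgroup) - min(first_subgroup)

# R column of Table 1, one range per hourly subgroup.
ranges = [2.554, 3.917, 2.785, 1.533, 1.844, 1.737, 2.485, 3.988, 2.004, 3.804]

D2_N5 = 2.326  # d2 correction factor for subgroups of five, as quoted in the text

r_bar = sum(ranges) / len(ranges)  # average range
sigma_hat = r_bar / D2_N5          # R-bar / d2 estimate of sigma

print(f"R-bar = {r_bar:.3f}, sigma-hat = {sigma_hat:.3f}")
```

A different subgroup size needs a different d_{2} value, so the constant here only applies to subgroups of five.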
Method No. 3: Addressing Inefficiency in Range-Based Sigma Estimation
\hat{\sigma} = \frac{\bar{s}}{c_4}
The range uses only two data values from each subgroup, which poses a problem. In statistical terms, it’s inefficient. That is, the estimates of σ based on the range will be more erratic than when subgroup s values are used. The range estimate inefficiency gets worse as the subgroup size increases.
Method 3 works by finding the subgroup s values, then averaging them to get s̄ (s-bar) and dividing this by the bias-correction factor c_{4}. Subgroup s values are computed using the formula shown in method 1 for each subgroup separately. Obviously, this is more tedious than finding the range for each subgroup. With method 3, we get an estimate of σ̂ = 1.158 for our data.
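Here is a minimal sketch of the s-bar calculation using the s column of Table 1. The article does not say how c_{4} is obtained; the `c4` helper below uses the standard gamma-function expression for normally distributed data, which is my addition rather than something taken from the text (published tables give 0.9400 for subgroups of five).

```python
import math


def c4(n: int) -> float:
    """Bias-correction factor for the sample standard deviation of n
    normally distributed observations (standard gamma-function form)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


# s column of Table 1: standard deviation of each hourly subgroup,
# each computed with the method 1 formula on five observations.
s_values = [0.926, 1.526, 1.116, 0.680, 0.682, 0.892, 1.058, 1.643, 0.822, 1.538]

s_bar = sum(s_values) / len(s_values)  # average subgroup standard deviation
sigma_hat = s_bar / c4(5)              # s-bar / c4 estimate of sigma

print(f"c4(5) = {c4(5):.4f}, s-bar = {s_bar:.3f}, sigma-hat = {sigma_hat:.3f}")
```

Computing c_{4} from the formula rather than hard-coding it makes it easy to reuse the sketch with other subgroup sizes.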
Method No. 4: The Median Moving Range Approach to Sigma Estimation
\hat{\sigma} = 1.047\times{{M}}\tilde{R}
If it isn’t possible or desirable to collect data in subgroups, we can correct for special causes by finding the range between consecutive hourly samples (the MR column in Table 1, which is the absolute difference between successive N_{1} values). To get σ̂, we multiply the median moving range by the correction factor 1.047. For our data, we get σ̂ = 1.654. The estimate is somewhat larger than the estimates we obtained from subgrouped data because the moving ranges don’t completely factor out the differences between the subgroups. However, the estimate is closer to the correct value than the s value found with method No. 1. Recent research suggests that this approach gives good results for a wide variety of out-of-control patterns.
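The sketch below reproduces this calculation. It assumes the individual readings are the first observation of each hour (the N_{1} column), since the MR values printed in Table 1 match the differences between those readings; the variable names are mine.

```python
import statistics

# First observation of each hour (N_1 column of Table 1), treated as
# a series of individual readings.
individuals = [10.147, 10.746, 12.326, 12.415, 12.291,
               16.367, 15.608, 17.873, 19.677, 15.954]

# Moving range: absolute difference between each reading and the previous one.
moving_ranges = [abs(b - a) for a, b in zip(individuals, individuals[1:])]

median_mr = statistics.median(moving_ranges)
sigma_hat = 1.047 * median_mr  # correction factor quoted in the text

print(f"median MR = {median_mr:.3f}, sigma-hat = {sigma_hat:.3f}")
```

Ten readings give nine moving ranges, so the median here is simply the fifth-smallest value.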
Method No. 5: Using the Average Moving Range for Sigma Estimation
\hat{\sigma} = \frac{{{M}\bar{R}}}{d_2}=\frac{{{M}\bar{R}}}{1.128}
This method for estimating σ is based on the average moving range, divided by the d_{2} factor for ranges of two observations, 1.128. Doing this for the sample data set gives σ̂ = 1.479. Because it’s also based on the moving range, this estimate suffers from the same shortcomings as method No. 4.
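For comparison, here is the same kind of sketch for the average moving range, using the MR column exactly as printed in Table 1 and the d_{2} value of 1.128 from the formula above. Variable names are illustrative.

```python
# MR column of Table 1 as printed (nine moving ranges from ten hourly readings).
moving_ranges = [0.599, 1.580, 0.089, 0.124, 4.076, 0.759, 2.265, 1.804, 3.723]

D2_N2 = 1.128  # d2 factor for ranges of two consecutive readings

mr_bar = sum(moving_ranges) / len(moving_ranges)  # average moving range
sigma_hat = mr_bar / D2_N2                        # MR-bar / d2 estimate of sigma

print(f"MR-bar = {mr_bar:.3f}, sigma-hat = {sigma_hat:.3f}")
```

The two largest moving ranges (4.076 and 3.723) pull the average up more than they move the median, which is one reason the two moving range methods can disagree.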
Comparing the Different Sigma Computation Methods
Table 2 summarizes the results of all of these methods. The least accurate result is found when the standard formula is used. For SPC work, this formula should only be used when a control chart shows good statistical control. Although the R-bar result was slightly more accurate than the s-bar estimate for our example, the best formula from a statistical perspective is method No. 3. For subgroups of five, the advantage isn’t all that great, but it becomes greater as the subgroup size increases. The R-bar method usually comes in second to the s-bar method, unless the statistical advantage is outweighed by some practical concern, such as ease of understanding.
Table 2: Sigma estimates by method
Actual sigma  1  
Method No. 1  3.337  Standard Formula 
Method No. 2  1.146  Rbar 
Method No. 3  1.158  sbar 
Method No. 4  1.654  Median Moving Range 
Method No. 5  1.479  Average Moving Range 
The moving range methods, while inferior to the subgroup methods, are far better than the standard sigma formula. Generally speaking, the median moving range estimate gets the nod over the average moving range estimate.
4 responses to “How Do I Compute Sigma? Let Me Count the Ways”

Hi,
This is a very useful article. Where would I look up the correction factors required for these computations? (I assume that these factors depend on the size of subgroups used to compute the moving range. In other words, if you had moving range based on groups of 4 or 6 observations instead of 5, the values of the factors would be different from the ones you used.) YZK

The old text Statistical Quality Control Methods by Irving W. Burr explains the derivation of most of the correction factors discussed here. You might need to search a bit to find it. Try https://www.abebooks.com/ if you can’t find it elsewhere.


This is a very useful article to understand SPC concerns when using a sigma calculated using the standard formula.
In fact, I went and recreated the same population and recalculated the sigma using all five methods to understand the calculations. For all methods but method five, I got the same results. Method five gave me 1.479 instead of 1.532.
I believe that the value 1.294 in the MR column should read 0.759 (16.367 - 15.608).
I hope you will correct me if I am wrong. Method five is the one I am looking to use, and I would like to be sure I fully understand it. Thanks.

Quite correct about the typo. The table has been updated. Thanks for letting me know!
