Article Revised: March 27, 2019
Question from a Black Belt student
When selecting a model, two of the criteria are that the Standard Error be small and R Square be large.
Since R Square is a proportion, one might think that the usual 95% would be the threshold — correct?
It may be surprising, but most of the cutoff values used in statistics are quite arbitrary. This includes the p-value cutoff of 5% for hypothesis tests and confidence intervals. This discussion of p-values is important because in the discussions of R2 and S below I assume that the p-value for the model is “statistically significant.” In other words, I discuss the cutoffs for R2 and S values based on the assumption that both are from statistical models that meet an arbitrary cutoff for the model’s p-value!
With R2, no arbitrary cutoff has ever become the accepted norm, hence the advice that R2 should be “large.” In this case, large is in the eye of the experimenter. Indeed, what is considered large varies a great deal according to the type of experiment being conducted. In social science experiments, researchers are delighted with statistically significant R2 values as low as 0.2 or even lower. In hard science and engineering experiments, R2 values greater than 0.9 are often expected. As a general rule, the more the researcher knows about the science, the better controlled the experiment can be and the higher the expectation for R2. Human behavior is poorly understood, to the point where some question the usage of the term “Social Science.” So R2 values that physicists and engineers would dismiss out of hand are acceptable in that field.
In Lean Six Sigma we usually find ourselves somewhere in the middle of these two extremes. If our projects involve customer responses, then statistically significant R2 values around 0.5 might be enough to give us the direction we need for improvement. But if we are improving, say, cycle time through a process, then our threshold would be higher, perhaps 0.7. Still, as you can tell, these are arbitrary. The point is that we want our data to point us in the right direction for making improvements. What R2 value will do this for us varies on a case-by-case and project-by-project basis.
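To see how a modest R2 can still be statistically significant, it helps to look at the overall F statistic for a regression with k predictors fit to n observations: F = (R2 / k) / ((1 - R2) / (n - k - 1)). The sketch below is only illustrative. The function name f_statistic and the sample sizes are assumptions chosen for this example, not values from a real project.

```python
def f_statistic(r2, n, k=1):
    """Overall regression F statistic implied by R2, n observations, k predictors."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more observations than model terms")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Same R2 of 0.2 at two hypothetical sample sizes
for n in (10, 100):
    print(f"n = {n:3d}: F = {f_statistic(0.2, n):.1f}")
```

With one predictor, the 5% critical value of F is roughly 5.3 for n = 10 and roughly 3.9 for n = 100. The same R2 of 0.2 is therefore not significant in the small sample (F = 2.0) but clearly significant in the large one (F = 24.5).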
The proper value for the standard error, S, is also subject-matter and experiment or project specific. In fact, S and R2 are just two different ways of describing the same thing: how well the statistical model fits the data. R2 is a proportion or percentage, while S is in the units of the response variable. S is the standard deviation of the residuals (model errors). Since residuals from a good model are approximately normally distributed, the S value can be used to model the distribution of modeling errors in the same units as the response variable. This often makes it easier for subject-matter experts to tell you, the Black Belt, what an acceptable value of S should be.
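As a minimal sketch of how R2 and S come out of the same fit, the example below fits a straight line by least squares and reports both. The data (cycle time in minutes versus batch size) and the helper name fit_summary are hypothetical, and S is computed with the usual n - 2 degrees of freedom for a one-predictor model.

```python
import math


def fit_summary(x, y):
    """Least-squares fit of y = b0 + b1*x, returning R2 and S for the fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)        # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation
    r2 = 1 - sse / sst             # proportion of variation explained
    s = math.sqrt(sse / (n - 2))   # residual standard deviation, response units
    return {"intercept": b0, "slope": b1, "r2": r2, "s": s, "sse": sse, "sst": sst}


# Hypothetical data: cycle time (minutes) versus batch size
batch_size = [1, 2, 3, 4, 5, 6, 7, 8]
cycle_time = [12.1, 13.8, 16.4, 17.9, 20.3, 21.6, 24.2, 25.7]

fit = fit_summary(batch_size, cycle_time)
print(f"R2 = {fit['r2']:.3f} (unitless proportion)")
print(f"S  = {fit['s']:.2f} minutes")
# Assuming roughly normal residuals, about 95% of observations should
# fall within 2*S of the fitted line.
print(f"Approximate 95% band: +/- {2 * fit['s']:.2f} minutes")
```

Both numbers are built from the same residual sum of squares. R2 compares it to the total variation, while S rescales it into the units of the response, which is why a subject-matter expert can judge "plus or minus so many minutes" more readily than a proportion.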
One last thing: there are published papers that treat various statistical cutoff values, such as p-values, much more rigorously. For example, some derive p-value cutoffs from economic or risk considerations. If you have a deeper interest in the subject, these papers are worth looking up. Look in journals such as The Journal of Quality Technology or Quality Engineering. Expect to see a bit of math in these papers.