# What the Heck is Multicollinearity?

Article Revised: March 26, 2019

A Lean Six Sigma Master Black Belt was perplexed by the software’s correlation and regression analysis output. The results were pure nonsense. Some regression coefficients were negative when common sense told him they should be positive (and vice versa). Some of the correlation coefficients were large, yet the corresponding regression coefficients were not statistically significant. What the heck was going on?

The problem with the Master Black Belt’s data was multicollinearity, a condition that’s encountered more often as Six Sigma practitioners move from engineering and manufacturing data to customer survey data.

Multicollinearity refers to linear intercorrelation among variables. Simply put, if nominally “different” measures actually quantify the same phenomenon to a significant degree (i.e., the variables are accorded different names and perhaps employ different numeric measurement scales but correlate highly with each other), they’re redundant.
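To make the idea of redundant measures concrete, here is a minimal Python sketch using simulated data. Everything in it is an assumption made for illustration: the variable names (`satisfaction`, `overall_rating`, `recommend_score`, `repurchase`), the scales, and the noise levels are invented, and the helper `coef_std_errors` is a hypothetical function written for this example, not part of any library. It builds two survey-style measures that track the same underlying attitude on different scales, checks how strongly they correlate, and then compares the standard error of one regression coefficient with and without its redundant partner in the model.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

# Hypothetical underlying attitude that both survey items end up measuring.
satisfaction = rng.normal(size=n)

# Two nominally different measures on different scales (illustrative names).
overall_rating = 5.0 + 2.0 * satisfaction + rng.normal(scale=0.3, size=n)
recommend_score = 50.0 + 15.0 * satisfaction + rng.normal(scale=3.0, size=n)

# Outcome driven by the same underlying attitude.
repurchase = 3.0 * satisfaction + rng.normal(size=n)

# Correlation between the two "different" measures.
r = np.corrcoef(overall_rating, recommend_score)[0, 1]


def coef_std_errors(predictors, y):
    """Ordinary least squares standard errors for the slope coefficients."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # coefficient covariance
    return np.sqrt(np.diag(cov))[1:]                # drop the intercept


se_alone = coef_std_errors([overall_rating], repurchase)[0]
se_together = coef_std_errors([overall_rating, recommend_score], repurchase)[0]

print(f"correlation between the two measures: {r:.3f}")
print(f"std. error of overall_rating coefficient, alone:    {se_alone:.3f}")
print(f"std. error of overall_rating coefficient, together: {se_together:.3f}")
```

Under these assumptions the two measures should correlate very highly even though their scales differ, and the standard error of the `overall_rating` coefficient should grow noticeably once `recommend_score` enters the model. An inflated standard error is what lets a coefficient drift to the "wrong" sign or lose statistical significance, even when each predictor on its own correlates strongly with the outcome.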

Multicollinearity isn’t well understood by the applied statistics community. Researchers at Arizona State University found widespread misunderstanding of multicollinearity and other regression analysis concepts among graduate students who had taken, on average, five graduate and undergraduate statistics courses.

I’ll try to explain multicollinearity by using a simple, everyday metaphor: a table. Consider the photograph of a table shown in figure 1. This is a finely built table I purchased at Wal-Mart for only \$99 (chairs were included). In my metaphor, the table is a model. The legs of the table represent x variables, or predictors. The top represents the y variable, the predicted variable. My intent is to build a table where the top is well supported by the legs, i.e., the predictions are well explained by the predictors.

The table legs are placed at 90° to each other. This angle provides maximum support for the top. In statistics, when two variables are uncorrelated with each other, they’re said to be orthogonal, or at 90° to one another. If you create a scatter diagram of two uncorrelated variables and draw both best-fit lines (y regressed on x, and x regressed on y), you get something that looks like figure 2: the lines are at 90° to one another.
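The 90° picture can be checked numerically. The sketch below is an illustration on simulated data, not the calculation behind figure 2: the function `angle_between_fit_lines` is a hypothetical helper written for this example, and standardizing both variables first is an assumption made so the angle doesn't depend on measurement units. It fits the two least-squares lines (y on x, and x on y) and measures the angle between them, first for a pair of independent variables and then, for contrast, for a nearly redundant pair.

```python
import numpy as np


def angle_between_fit_lines(x, y):
    """Acute angle in degrees between the y-on-x and x-on-y least-squares lines."""
    # Standardize so the angle does not depend on measurement units.
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    b_yx = np.polyfit(zx, zy, 1)[0]  # slope of y regressed on x
    b_xy = np.polyfit(zy, zx, 1)[0]  # slope of x regressed on y
    # Direction vectors of the two lines in the (x, y) plane.
    d1 = np.array([1.0, b_yx])
    d2 = np.array([b_xy, 1.0])
    cos_theta = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0)))


rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
n = 5000

# Two independent (hence essentially uncorrelated) variables.
x = rng.normal(size=n)
y = rng.normal(size=n)
angle_uncorrelated = angle_between_fit_lines(x, y)

# For contrast: a second variable that is mostly a copy of the first.
y_redundant = x + rng.normal(scale=0.3, size=n)
angle_redundant = angle_between_fit_lines(x, y_redundant)

print(f"uncorrelated pair: {angle_uncorrelated:.1f} degrees")
print(f"redundant pair:    {angle_redundant:.1f} degrees")
```

For the independent pair the angle should come out close to 90°, since one fitted line is nearly horizontal and the other nearly vertical. For the redundant pair the two lines nearly coincide and the angle collapses toward zero.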