Questions: What is collinearity? Why is it a problem? How do I know if I've got it? What can I do about it?
When IVs are correlated, there are problems in estimating regression coefficients. Collinearity means that within the set of IVs, some of the IVs are (nearly) totally predicted by the other IVs. The variables thus affected have b and beta weights that are not well estimated (the problem of the "bouncing betas"). Minor fluctuations in the sample (measurement errors, sampling error) will have a major impact on the weights.
Big values of VIF are trouble. Some say look for values of 10 or larger, but there is no certain number that spells death. The VIF for each IV is also equal to the corresponding diagonal element of R^-1, the inverse of the correlation matrix of the IVs. Recall that beta = R^-1 * r, so we need to find R^-1 to find the beta weights.
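The relation between R^-1, the beta weights, and the VIFs can be checked directly. A minimal sketch with NumPy, using a made-up correlation matrix R among three IVs and made-up validities r (the numbers are illustrative, not from any real data set):

```python
import numpy as np

# Hypothetical correlation matrix among three IVs; note the .8 between IV1 and IV2
R = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])
# Hypothetical correlations of each IV with the DV
r = np.array([0.5, 0.6, 0.4])

R_inv = np.linalg.inv(R)

beta = R_inv @ r       # standardized regression weights: beta = R^-1 r
vif = np.diag(R_inv)   # VIF for each IV = diagonal element of R^-1

print(beta)
print(vif)
```

Note that the two highly correlated IVs get the largest VIF values, while the third, relatively independent IV stays near 1.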
Tolerance
Tolerance_i = 1 - R²_i = 1/VIF_i, where R²_i is the squared multiple correlation from regressing the ith IV on all the other IVs.
Small values of tolerance (close to zero) are trouble. Some computer programs will complain to you about tolerance. Do not interpret such complaints as computerized comments on silicon diversity; rather look to problems in collinearity.
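To see tolerance in action, we can build synthetic data in which one IV is nearly a copy of another, then compute tolerance the long way, from the R² of the auxiliary regression. This is a sketch with simulated data, not output from any particular stats package:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
x3 = rng.normal(size=n)

# Tolerance for x1: regress x1 on the other IVs, take 1 - R^2
A = np.column_stack([np.ones(n), x2, x3])
coef, *_ = np.linalg.lstsq(A, x1, rcond=None)
resid = x1 - A @ coef
r2 = 1 - resid.var() / x1.var()

tolerance = 1 - r2
vif = 1 / tolerance
print(tolerance, vif)
```

With x2 built as a near-duplicate of x1, the tolerance comes out close to zero and the VIF is huge; this is exactly the situation the computer program is complaining about.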
How to Deal with Collinearity
As you may have noticed, there are rules of thumb in deciding whether collinearity is a problem. People like to conclude that collinearity is not a problem. However, you should at least check to see if it seems to be a problem with your data. If it is, then you have some choices:
- Lump it, but cautiously. Admit that there is ambiguity in the interpretation of the regression coefficients because they are not well estimated. Examine both the regression weights and zero order correlations together to see whether the results make sense. If the regression weights don't make sense, say so and refer to the correlation coefficients. Nonsignificant regression coefficients that correspond to "important" variables are very likely.
- Select or combine variables. If you have multiple indicators of the same variable (e.g., two omnibus cognitive ability tests, two tests of conscientiousness, etc.), add them together (for an alternative, see the factor-analysis option below). If you are in a prediction-only context, you may wish to use one of the variable selection methods (e.g., all possible regressions) to choose a useful subset of variables for your equation.
- Factor analyze your IVs to find sets of relatively homogeneous IVs that you can combine (add together).
- Use another type of analysis (path analysis, SEM).
- Use another type of regression (ridge regression).
- Try unit weights, that is, standardize each IV and then add them without estimating regression weights. Of course, this is no longer regression.
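The unit-weights option in the last bullet is simple enough to sketch directly: standardize each IV, add the z-scores with weights of 1, and correlate the composite with the DV. The data below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Three correlated IVs (a shared component induces the correlation)
X = rng.normal(size=(n, 3)) + rng.normal(size=(n, 1))
y = X.sum(axis=1) + rng.normal(size=n)

# Unit-weight composite: z-score each IV, then add with weights of 1
Z = (X - X.mean(axis=0)) / X.std(axis=0)
composite = Z.sum(axis=1)

# Correlate the composite with the DV -- no regression weights estimated
r = np.corrcoef(composite, y)[0, 1]
print(r)
```

Because no weights are estimated from the sample, there is nothing to "bounce": the composite is immune to collinearity among the IVs, at the cost of no longer being regression at all.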