PSYC 7804 - Regression with Lab
We want to see how happiness (happiness) relates to age (age) and number of friends (friends).
We can calculate the correlations among our variables.
Although useful, correlation by itself may not be the measure we are interested in. Rather, we may want to know the relationship between two variables after accounting for another set of variables.
Partial correlation and semi-partial correlation are what we are looking for in this case.
If \(r_{xy}\) represents the correlation between \(x\) and \(y\), then
Partial correlation:
\[r_{xy.z} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}\]
where \(z\) is controlled for in both \(x\) and \(y\)
Semi-partial correlations:
\[r_{y(x.z)} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{1 - r_{xz}^2}}\]
where \(z\) is controlled for only in \(x\)
\[r_{x(y.z)} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{1 - r_{yz}^2}}\]
where \(z\) is controlled for only in \(y\)
Spot differences and similarities in related formulas
Notice that the numerator (top part) of these formulas is always the same, and the denominator (bottom part) changes slightly. This means that if you want to understand the differences, you should not look at the numerator at all (pretend it does not exist!), but only at the denominator.
As always, I like calculating things by hand first to make sure I know what is going on 🤷 Let’s say that \(y =\) happiness, \(x =\) age, and \(z =\) friends.
Let’s first get our vanilla correlations
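Since the lab's data frame is not reproduced here, here is a minimal by-hand sketch with simulated stand-in data (the variable names mirror the text; the coefficients in the simulation are arbitrary, so the numbers will differ from the ones reported below):

```r
set.seed(123)
n <- 100
friends   <- rpois(n, lambda = 5)                   # z
age       <- rnorm(n, mean = 40, sd = 10)           # x
happiness <- 0.3 * friends - 0.02 * age + rnorm(n)  # y

# Vanilla (zero-order) correlations
r_xy <- cor(age, happiness)
r_xz <- cor(age, friends)
r_yz <- cor(happiness, friends)

# Partial correlation: friends removed from BOTH age and happiness
r_xy.z <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Semi-partial correlations: friends removed from only ONE variable
r_y_xz <- (r_xy - r_xz * r_yz) / sqrt(1 - r_xz^2)  # removed from age only
r_x_yz <- (r_xy - r_xz * r_yz) / sqrt(1 - r_yz^2)  # removed from happiness only
```

Note that the semi-partial correlations can never exceed the partial correlation in absolute value, since their denominators are larger.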
So, \(r_{xy.z} = -.16\) is the correlation between happiness and age after taking out the proportion of variance explained by friends in both variables.
The semi-partial correlations between \(x\) and \(y\) accounting for \(z\) can be either:
\(r_{y(x.z)} = -.15\), the relationship between \(y\) and \(x\) after removing from \(x\) the variance explained by \(z\).
\(r_{x(y.z)} = -.16\), the relationship between \(x\) and \(y\) after removing from \(y\) the variance explained by \(z\).
ppcor Functions
In practice, we use the ppcor package.
happiness age friends
happiness 1.0000000 -0.15415119 0.32093907
age -0.1629192 1.00000000 0.03298535
friends 0.3251053 0.03161529 1.00000000
The functions from ppcor work in the same way as the ones used in Lab 2, where significance and other information is saved to a separate element.
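Under the hood, these estimates are equivalent to correlating regression residuals. A base-R sketch of that equivalence (simulated stand-in data; this is not the ppcor API itself, just what it computes):

```r
set.seed(42)
n <- 100
friends   <- rpois(n, 5)
age       <- rnorm(n, 40, 10)
happiness <- 0.3 * friends - 0.02 * age + rnorm(n)

# Partial correlation: correlate the parts of age and happiness
# that friends does NOT explain
age_res <- resid(lm(age ~ friends))
hap_res <- resid(lm(happiness ~ friends))
partial_xy <- cor(age_res, hap_res)

# Semi-partial: correlate raw happiness with the friends-free part of age
semipartial_y_xz <- cor(happiness, age_res)
```

Both quantities match the formula-based versions above exactly, which is a useful sanity check when learning the notation.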
We are often (actually always) interested in comparing different models. One often relevant question is: “is it worth it to add more variables as predictors?”
In the case of regression, the most popular way of comparing models is by comparing \(R^2\) from nested models. Let’s go back to how happiness (happiness) relates to age (age); then, we want to know whether adding number of friends (friends) improves our prediction.
The answer seems fairly straightforward: the model with both age and friends does better. But there is a catch with \(R^2\) 🤨
The “problem” with most popular measures of model fit such as \(R^2\) is that they always increase as variables are added, even when the added variables are unrelated to \(Y\).
\(R^2\) increases when we add a random variable 😦
However, this is certainly a case where adding a randomly generated variable is not worth it.
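To see this in action, a quick sketch with simulated data, using a pure-noise predictor named random_var to mirror the text:

```r
set.seed(1)
n <- 100
df <- data.frame(age = rnorm(n, 40, 10), friends = rpois(n, 5))
df$happiness  <- 0.3 * df$friends - 0.02 * df$age + rnorm(n)
df$random_var <- rnorm(n)  # pure noise, unrelated to happiness

fit_small <- lm(happiness ~ age + friends, data = df)
fit_big   <- lm(happiness ~ age + friends + random_var, data = df)

summary(fit_small)$r.squared    # R^2 without the junk predictor
summary(fit_big)$r.squared      # always at least as large, despite the junk
summary(fit_big)$adj.r.squared  # adjusted R^2 penalizes the extra parameter
```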
We want to apply the principle of Occam’s razor, and choose the simpler model if the increase in \(R^2\) is not worth it.
In hierarchical regression we check whether adding variables to a model yields a significant \(R^2\) improvement (\(\Delta R^2\)). We can use the anova() function to compare nested regression models. The F-test tells us whether the \(R^2\) improvement is significant.
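A sketch of the calls that produce tables like the one below (simulated stand-in data, since the lab's data frame is not reproduced here):

```r
set.seed(7)
n <- 100
df <- data.frame(age = rnorm(n, 40, 10), friends = rpois(n, 5))
df$happiness <- 0.3 * df$friends - 0.02 * df$age + rnorm(n)

reg_age    <- lm(happiness ~ age, data = df)            # Model 1
reg_age_fr <- lm(happiness ~ age + friends, data = df)  # Model 2

# F-test on the R^2 improvement from adding friends
comparison <- anova(reg_age, reg_age_fr)
print(comparison)

# Delta R^2 itself, computed directly
delta_r2 <- summary(reg_age_fr)$r.squared - summary(reg_age)$r.squared
```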
Analysis of Variance Table
Model 1: happiness ~ age
Model 2: happiness ~ age + friends
Res.Df RSS Df Sum of Sq F Pr(>F)
1 98 234.6
2 97 209.8 1 24.807 11.47 0.001023 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
friends provides a significant improvement in variance explained, \(\Delta R^2 = .1, F(1, 97) = 11.47, p = .001\)
Analysis of Variance Table
Model 1: happiness ~ age + friends
Model 2: happiness ~ age + friends + random_var
Res.Df RSS Df Sum of Sq F Pr(>F)
1 97 209.80
2 96 208.04 1 1.7575 0.811 0.3701
This time the improvement is not significant: adding random_var is not worth it, \(F(1, 96) = 0.81, p = .370\).
There also exists a family of statistics for comparing models based on information theory. What you will most often see are the Akaike information criterion (AIC; Akaike, 1974) and the Bayesian information criterion (BIC; Schwarz, 1978).
Both AIC and BIC calculate model fit, and then penalize it based on model complexity (number of estimated parameters). The logic is that if you are going to add an extra variable to a regression, it should improve fit enough to offset the penalty term. For regression:
\[AIC = N \times \ln\left(\frac{SS_{\mathrm{residuals}}}{N}\right) + 2p\]
\(N =\) sample size
\(p =\) number of model parameters (i.e., intercept, slopes, residual variance)
\(SS_{\mathrm{residuals}} =\) sum of squares of the residuals
\(\ln() =\) the natural logarithm
\[BIC = N \times \ln\left(\frac{SS_{\mathrm{residuals}}}{N}\right) + p \times \ln(N)\]
The \(N \times \ln(\frac{SS_{\mathrm{residuals}}}{N})\) part, which is shared by both formulas, measures model fit. In contrast, \(2p\) is the AIC penalty, while \(p \times \ln(N)\) is the BIC penalty. The BIC penalty is stricter whenever \(N \geq 8\) (since \(\ln(8) > 2\)).
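These formulas can be checked by hand against a fitted model. A sketch with simulated data; note that R's built-in AIC()/BIC() add the constant \(N \times \ln(2\pi) + N\) from the full Gaussian log-likelihood, which does not affect model rankings:

```r
set.seed(99)
n <- 100
x <- rnorm(n)
y <- 2 * x + rnorm(n)
fit <- lm(y ~ x)

ss_res <- sum(resid(fit)^2)
p <- 3  # intercept, slope, residual variance

# The textbook formulas above
aic_hand <- n * log(ss_res / n) + 2 * p
bic_hand <- n * log(ss_res / n) + p * log(n)

# Adding the Gaussian constant recovers R's AIC() and BIC() exactly
aic_hand + n * (log(2 * pi) + 1)  # = AIC(fit)
bic_hand + n * (log(2 * pi) + 1)  # = BIC(fit)
```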
AIC() and BIC() Functions
We can calculate AIC and BIC for all our models at once with the AIC() and BIC() functions. For both AIC and BIC, the model with the smallest value fits best according to the criterion.
df AIC
reg_age 3 375.0604
reg_age_fr 4 365.8845
reg_age_fr_rand 5 367.0433
The second model fits best.
In both cases the second model fits best: the random variable does not improve fit enough to offset the penalty for the additional parameter (which is to be expected, since we added a random variable).
That said, AIC and BIC can disagree (often, in my experience!). Usually AIC favors more complex models, while BIC prefers simpler ones. If I have to choose, I tend to trust BIC more.
Finally, information criteria like AIC and BIC can be used to compare non-nested models (fitted to the same data), unlike \(\Delta R^2\), which can only compare nested models.
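For example, happiness ~ age and happiness ~ friends are non-nested (neither predictor set contains the other), so anova() would not give a valid test, but an AIC/BIC comparison still works. A sketch with simulated stand-in data:

```r
set.seed(5)
n <- 100
df <- data.frame(age = rnorm(n, 40, 10), friends = rpois(n, 5))
df$happiness <- 0.3 * df$friends - 0.02 * df$age + rnorm(n)

# Two NON-nested models: neither contains the other's predictors
m_age     <- lm(happiness ~ age, data = df)
m_friends <- lm(happiness ~ friends, data = df)

# AIC()/BIC() accept several models at once and return a comparison table
aic_tab <- AIC(m_age, m_friends)
bic_tab <- BIC(m_age, m_friends)
print(aic_tab)
```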
There are a couple of caveats to be aware of when using AIC and BIC.
Different functions/software will calculate AIC and BIC differently (I was confused for a bit myself before finding this and this). You should not compare AIC and BIC if they come from different functions/software (or, if you need to, be very careful).
The AIC() function adds \(N \times \ln(2\pi) + N\) to the AIC (it is derived from the full Gaussian log-likelihood), but other functions may not. So always use the same function/software to calculate AIC and BIC.
Lab 6: Semi-Partial, Partial-Correlations, and Model Comparison