MODERATED MULTIPLE REGRESSION:
WORK NOTES AND SYNTAX
Version 6
Winnifred R. Louis, School of Psychology, University of Queensland
w.louis@psy.uq.edu.au
© W. R. Louis, 2009.
You can distribute
the following freely for non-commercial use provided you retain the
credit to me and periodically send me appreciative e-mails.
READER BEWARE
- Undergraduates should read with caution: sometimes the advice on writing
and analysis here contradicts what is advised in your courses. Obviously
you must follow the advice given in your courses. The discrepancies
could arise because (1) undergraduate statistics is an idealized version of reality,
whereas postgraduates grapple with real data and publication pressure, and (2)
statistical decision-making requires personal choices, and different
lecturers and practitioners may differ.
A wise practice for any write-up
is to scan the intended publication outlet (the journal, other theses
in the same lab, etc.) and find 3 or 4 examples of how the analysis
has been written up there before, to serve as models.
What is a moderator?
It is a variable that changes
the relationship between an IV and a DV. A significant interaction
between the moderator and the IV means that the effect of the IV on
the DV changes depending on the level of the moderator. In multiple
regression, we say that the simple slope of the IV on the DV changes
depending on the level of the moderator. With continuous moderators
we generally compare “high” levels of the moderator (1 standard
deviation above the mean) to “low” levels (1 SD below the mean).
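To make the “simple slope” idea concrete, suppose the regression equation is Ŷ = b0 + b1·IV + b2·MOD + b3·IV·MOD and the moderator is centered. The slope of the IV at any moderator value m is then b1 + b3·m. The sketch below is in Python (these notes otherwise use SPSS syntax), and its coefficients are made up purely for illustration. It evaluates that slope at 1 SD below and 1 SD above the moderator mean.

```python
def simple_slope(b_iv: float, b_interaction: float, moderator_value: float) -> float:
    """Slope of the IV on the DV when the centered moderator equals moderator_value."""
    return b_iv + b_interaction * moderator_value


# Hypothetical unstandardized coefficients and moderator SD (illustration only).
b_iv, b_int, sd_moderator = 0.40, 0.25, 1.28

slope_low = simple_slope(b_iv, b_int, -sd_moderator)   # moderator 1 SD below its mean
slope_high = simple_slope(b_iv, b_int, sd_moderator)   # moderator 1 SD above its mean
print(f"low moderator: {slope_low:.3f}, high moderator: {slope_high:.3f}")
```

A positive interaction coefficient means the slope of the IV gets steeper as the moderator increases. A negative one means it flattens or reverses.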
Mediators vs Moderators
In mediation, the IV and the
mediator are associated (correlated), and the IV and the DV are correlated,
and there is an implied causal path (“because”) that links the three
variables. The IV causes the DV because the IV causes the
mediator which causes the DV.
In moderation (to get a significant
interaction), the IVs need not be correlated with each other or with
the DV. In moderation, the link between the IV and the DV is different
for high vs low levels of the moderator. There is no because.
It’s more like if-then contingencies: if the moderator is high,
then the IV relates to the DV in one way; if the moderator is low, the
IV relates to the DV in another way.
The IV (self-esteem) impacts on grades (the DV) but it’s moderated by motivation to study. [At high motivation, there’s a link between self-esteem and grades, but at low motivation, there’s no link – everyone does badly.]
Hard drugs lead to increased mortality but it’s moderated by car ownership.
[At low car ownership, drugs lead to mortality, but at high car ownership the link is stronger.]
Communication leads to relationship
satisfaction but it’s moderated by utterance valence. [If valence
is positive, communication increases relationship satisfaction.
If negative, communication reduces relationship satisfaction.]
Writing up moderated multiple regression
A write-up for a two-way moderated multiple regression generally has four parts: the text, two tables, and the graph.
How to do this in SPSS
FREQUENCIES
VARIABLES=iv1 iv2 iv3 iv4 dv1 dv2 gender group
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
1. With multiple DVs, one can use Analyze > Correlate > Bivariate.
2. Enter all IVs and DVs.
3. Click Options > “Exclude cases listwise” and, in the same window, “Means and standard deviations” > Continue.
4. Click Paste.
CORRELATIONS
/VARIABLES= iv1 iv2 iv3 iv4 dv1 dv2 gender group
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=LISTWISE .
Run this syntax. Use
the means, standard deviations, and inter-correlations to form
Table 1. Often Table 1 also contains the scale reliabilities on
the diagonal. You get these from the earlier reliability analyses you ran when
you created the scales.
NB Your IVs should not be too
highly intercorrelated. As a rule of thumb, for any correlation over .3 you should
ponder whether there’s mediation happening or whether the two IVs
are tapping the same thing and could be averaged. See Tabachnick
and Fidell (1996) on this point.
3. Calculate centered
scores for all IVs by subtracting the mean. I like to use c_ as
a prefix indicating it’s a centered score. Work in the syntax
window (going through the Compute dialog otherwise takes too much time).
Compute c_iv1 = iv1 - [numerical mean as seen in output for correlations or freq above].
Compute c_iv2 = iv2 - [numerical mean as seen in output].
Compute c_iv3 = iv3 - [numerical mean as seen in output].
Compute c_iv4 = iv4 - [numerical mean as seen in output].
execute.
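The COMPUTE lines above simply subtract each IV’s mean. To show what the FREQUENCIES output should look like afterwards, here is a minimal Python sketch with hypothetical scores, using only the standard library. The centered variable has a mean of zero but exactly the same SD as the raw variable.

```python
from statistics import fmean, stdev


def center(scores: list[float]) -> list[float]:
    """Return scores minus their mean (the c_ variables in the SPSS syntax)."""
    mean = fmean(scores)
    return [x - mean for x in scores]


iv1 = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 7.0, 4.0]  # hypothetical raw scores
c_iv1 = center(iv1)

print(f"mean of c_iv1 = {fmean(c_iv1):.6f}")
print(f"SD of iv1 = {stdev(iv1):.4f}, SD of c_iv1 = {stdev(c_iv1):.4f}")
```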
*Recode the categorical variables
so that they have meaningful zero points and only two levels.
I do not recommend using 1, 2: nobody then scores zero, so the constant
(and any graphs built from it) becomes hard to interpret. Do not use 0, 1 unless the zero group is a baseline
or reference group. I recommend 1, -1 unless you have thought
deeply about alternatives. But if you have extremely unequal n
in the two levels you probably should think deeply about alternatives
and go with weighted effect coding (e.g., for 75% women, women = +.25
and men = -.75, so that the codes average to zero). See Aiken and West (1991) on this point.
If (gender=2) women = 1 .
If (gender=1) women = -1 .
Execute.
*Assuming the original coding
was women = 2, men = 1, this creates a two-group categorical IV where
women are +1 and men are -1.
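The weighted effect coding mentioned above (for 75% women: women = +.25, men = -.75) follows a simple rule. Each group’s code is the other group’s proportion, with opposite signs, so the codes average to zero across the sample. Below is a minimal Python sketch of that rule with hypothetical group sizes; the function name is illustrative only.

```python
from statistics import fmean


def weighted_effect_codes(n_focal: int, n_other: int) -> tuple[float, float]:
    """Codes for (focal group, other group) that average to zero in the sample."""
    n_total = n_focal + n_other
    # The focal group gets the other group's proportion; the other group gets
    # the focal group's proportion, negated.
    return n_other / n_total, -n_focal / n_total


women_code, men_code = weighted_effect_codes(n_focal=75, n_other=25)
codes = [women_code] * 75 + [men_code] * 25  # one code per hypothetical participant

print(women_code, men_code, fmean(codes))
```

With equal n the rule gives +.5 / -.5, which is just the 1 / -1 coding rescaled.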
For our group variable, if there are 3 groups, we need to create two (k-1) variables for the regression.
*The first one:
If (group=1) grp1vs23 = 2 .
If (group>1) grp1vs23 = -1 .
Execute.
*creates a contrast code comparing
the first group (e.g., a control condition) to the last two. Another
way of doing the same thing is:
If (group=1) grp1vs23 = 2 .
If (group=2) grp1vs23 = -1 .
If (group=3) grp1vs23 = -1 .
Execute.
Syntax for the second contrast code:
If (group=1) grp2vs3 = 0 .
If (group=2) grp2vs3 = 1 .
If (group=3) grp2vs3 = -1 .
*creates a contrast code comparing the latter two groups to each other.
Execute .
*you pick contrasts that are
orthogonal to each other and based on theory.
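Two contrast codes are orthogonal when each sums to zero across the groups and the products of their codes also sum to zero. With equal n per group, the coded variables are then uncorrelated. Here is a small Python check of the two codes used above, written as group-to-code mappings (an illustrative representation, not SPSS syntax).

```python
# Codes from the IF statements above: group number -> code value.
grp1vs23 = {1: 2, 2: -1, 3: -1}
grp2vs3 = {1: 0, 2: 1, 3: -1}


def is_contrast(code: dict[int, int]) -> bool:
    """A contrast's codes sum to zero across groups."""
    return sum(code.values()) == 0


def are_orthogonal(a: dict[int, int], b: dict[int, int]) -> bool:
    """Two contrasts are orthogonal if the sum of their code products is zero."""
    return sum(a[g] * b[g] for g in a) == 0


print(is_contrast(grp1vs23), is_contrast(grp2vs3), are_orthogonal(grp1vs23, grp2vs3))
```

Swapping in {1: 1, 2: 0, 3: -1} as the second code would still give a contrast, but 2·1 + (-1)·0 + (-1)·(-1) = 3. So it would not be orthogonal to grp1vs23.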
If there is one meaningful
baseline or reference group such as a control condition, you can use
dummy coding (0,1) to compare each condition to the controls:
If (group=1) dum2v1 = 0 .
If (group=1) dum3v1 = 0 .
If (group=2) dum2v1 = 1 .
If (group=2) dum3v1 = 0 .
If (group=3) dum2v1 = 0 .
If (group=3) dum3v1 = 1 .
Execute.
*In my opinion, dummy codes are usually less
useful than contrast codes.
FREQUENCIES
VARIABLES=c_iv1 c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
* Whenever you create variables,
take the time to look at your formulae twice for errors, and then
look at the distributions and make sure that they make sense and you
didn’t make an error in the computation. Never skip this
step. Always check your newly created variables to see
if they have reasonable means (near zero for centered scores and balanced codes) and standard deviations (a centered score keeps the SD of the original variable).
You don’t center the DVs
as this serves no statistical purpose. (However, if you do center
the DVs, nothing bad happens: the slopes, standard errors, and R² are exactly the same; only the constant changes.)
4. Calculate the interaction term.
********************************************************************
*SYNTAX FOR two continuous variables interacting.
********************************************************************
*I like to use the prefix ci_
for variables to indicate it’s an interaction.
compute ci_v1xv2 = c_iv1 * c_iv2 .
execute.
FREQUENCIES
VARIABLES=ci_v1xv2
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
5. Analyze > Regression > Linear; enter the DV and IVs; click Statistics and tick “Descriptives”.
NB that for other purposes,
some of which are discussed in these notes, I also see value in ticking
R squared change, part and partial correlations, collinearity diagnostics, Durbin-Watson,
and casewise diagnostics, and, under Plots, the histogram of residuals, the normal
P-P plot, and the scatterplot of *ZRESID by *ZPRED. Hit Paste.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v1xv2
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*Table 2 is created based on
the output from this main analysis. You also report the R² change
for each block and note the significant coefficients in the text.
Note. For clarity, you put
the interaction term in a separate block. This is because even if
it is not significant, an interaction term changes the meaning of the coefficients
for its component variables: each becomes the effect of that IV when the other
IV is at zero (i.e., at its mean, once centered), not its overall effect. So you should not interpret or
report a ‘direct effect’ for one IV if its interaction is in the
same block / equation.
To avoid extra complexity and
word count, people sometimes put the components and the interactions
in the same block. If you do this and you want to report / interpret
the individual components’ effects (e.g., if the direct effect of iv1
is sig but there’s an interaction term in the same block, regardless
of whether the interaction coefficient is significant or not), you need
to add a footnote stating that if you drop the interaction
term there’s no significant decrease in R² and no change in the pattern of the
coefficients. This is because, to restate, controlling for (entering)
the interaction term often results in a main effect dropping out or
becoming significant. To test this, even if you plan to report
the main effects and interaction in a single block, you can add a
second block that removes the interaction term. If this changes the pattern
you would like to report in some way, you would either footnote it or
use the more standard regression model structure of entering the component
IVs in Block 1 and the interaction in Block 2.
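A small simulation shows why the interaction changes its components’ coefficients. This Python sketch assumes NumPy and made-up data with a real iv1 × iv2 interaction. It moves the zero point of iv2 by one unit. The coefficient for iv1 without the interaction in the model does not move. With the interaction in the model, the iv1 coefficient shifts by exactly the interaction coefficient, because it is the slope of iv1 where iv2 = 0.

```python
import numpy as np


def ols(predictors, y):
    """Unstandardized OLS coefficients: [constant, one b per predictor, in order]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(42)
n = 300
c_iv1 = rng.normal(0.0, 1.0, n)
c_iv2 = rng.normal(0.0, 1.0, n)
# Hypothetical population model with a genuine interaction.
dv = 0.5 * c_iv1 + 0.3 * c_iv2 + 0.4 * c_iv1 * c_iv2 + rng.normal(0.0, 1.0, n)

for shift in (0.0, 1.0):
    iv2 = c_iv2 + shift  # same variable, different zero point
    block1 = ols([c_iv1, iv2], dv)
    block2 = ols([c_iv1, iv2, c_iv1 * iv2], dv)
    print(f"zero of iv2 moved by {shift}: iv1 without interaction = {block1[1]:.3f}, "
          f"iv1 with interaction = {block2[1]:.3f}, interaction = {block2[3]:.3f}")
```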
*syntax for joint entry followed by dropping out the interaction term to see if it’s an issue.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 ci_v1xv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=REMOVE ci_v1xv2
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
If, e.g., c_iv1 is sig in Block 1 (with the interaction) but not
in Block 2 (without it), then as a general rule it should NOT be interpreted from Block 1.
*6. If the interaction coefficient
is significant, plot the data. I normally use the Excel file (Winnifred’s
interaction plotting). There is also a useful program on Preacher’s
website (search for “preacher moderation”).
You need to decide how to break
up the simple slope analysis. As a general rule you look at the
simple slopes of the variable you are most interested in, at each level
of the variable(s) (i.e., the moderator(s)) you are less interested
in. If you graph in Excel, note the unstandardized slopes in the
excel file that you’re interested in.
*7. calculate the simple slopes for +/- 1 SD of the moderator.
*To do this first look up the standard deviation for the moderator. Say IV1 is the moderator and its SD is 1.28.
*You then use the syntax below to calculate new centered scores for low and high.
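*Notice the counter-intuitive formulae: you center for low levels of the moderator by adding the SD and for high levels by subtracting it. After adding 1.28, a score of zero on c_iv1lo belongs to someone 1 SD below the mean, so the c_iv2 coefficient becomes the slope at low iv1.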
*nb you don’t need to capitalize
the L and H in the interaction term – I just do that to make it stand
out more when I’m checking my syntax.
compute c_iv1lo = c_iv1 + 1.28 .
compute c_iv1hi = c_iv1 - 1.28 .
compute ci_v2v1L = c_iv1lo * c_iv2 .
compute ci_v2v1H = c_iv1hi * c_iv2 .
execute .
FREQUENCIES
VARIABLES=c_iv1lo c_iv1hi ci_v2v1L ci_v2v1H
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
*run two regression analyses, substituting in iv1 low and high and interaction low and high and reading out the simple slope of v2 in the final block (ie, the block with the interaction in it).
*NB all the other control / IV variables you have in the original model have to be in the regression equation for the simple slopes too. Otherwise your simple slopes will not come out right.
*For the simple slope of v2
at low v1 :.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1lo c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v2v1L
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*Report the c_iv2 coefficient
in Block 2 as the simple slope of v2 at low v1 (after the interaction
term with the centered low moderator has been entered; the centered
low moderator itself was already entered in Block 1, & both must
be in the equation).
*for the simple slope of v2 at high v1 : .
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1hi c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v2v1H
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3) .
*Report c_iv2 coefficient in Block 2 as the simple slope of v2 at high v1.
*If you have done it right,
the unstandardized slopes in the Excel file will match the unstandardized
slopes in SPSS (allowing for rounding), which provides a double check on your syntax.
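The recentering trick and the textbook formula are two routes to the same simple slope. At moderator value m, the slope of iv2 is b2 + b3·m, with standard error √(var(b2) + 2m·cov(b2, b3) + m²·var(b3)) (Aiken and West, 1991). The Python sketch below assumes NumPy and simulated data; the 1.28 SD echoes the example above. It checks that the c_iv2 coefficient and SE from the regression using c_iv1lo match the formula at m = -1 SD.

```python
import numpy as np


def ols_with_cov(predictors, y):
    """OLS coefficients plus their estimated covariance matrix."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    mse = resid @ resid / (len(y) - X.shape[1])
    return coef, mse * np.linalg.inv(X.T @ X)


rng = np.random.default_rng(7)
n = 250
c_iv1 = rng.normal(0.0, 1.28, n)  # moderator
c_iv2 = rng.normal(0.0, 1.0, n)   # focal IV
dv = 0.2 * c_iv1 + 0.4 * c_iv2 + 0.3 * c_iv1 * c_iv2 + rng.normal(0.0, 1.0, n)

sd = c_iv1.std(ddof=1)  # sample SD, as SPSS reports it
coef, cov = ols_with_cov([c_iv1, c_iv2, c_iv1 * c_iv2], dv)

# Route 1: recenter the moderator (c_iv1lo = c_iv1 + SD) and refit.
c_iv1lo = c_iv1 + sd
coef_lo, cov_lo = ols_with_cov([c_iv1lo, c_iv2, c_iv1lo * c_iv2], dv)

# Route 2: formula at m = -1 SD, using the original model's estimates.
m = -sd
slope_lo = coef[2] + coef[3] * m
se_lo = np.sqrt(cov[2, 2] + 2 * m * cov[2, 3] + m**2 * cov[3, 3])

print(f"slope at low iv1: {coef_lo[2]:.4f} (refit) vs {slope_lo:.4f} (formula)")
print(f"SE: {np.sqrt(cov_lo[2, 2]):.4f} (refit) vs {se_lo:.4f} (formula)")
```

The two routes agree because recentering only reparameterizes the same model. The residuals, and therefore the tests, are unchanged.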
********************************************************************
*SYNTAX FOR one continuous variable interacting with a two-group categorical variable (e.g., gender).
********************************************************************
*I like to use ci to indicate it’s an interaction.
*Assume that your categorical
variable has been meaningfully coded, e.g. +1 women -1 men.
compute ci_v1xwo = c_iv1 * women .
execute.
FREQUENCIES
VARIABLES=ci_v1xwo
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v1xwo
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*If the interaction coefficient
is significant, plot the data in the Excel File (Winnifred’s interaction
plotting). NB usually you would use the categorical variable as
the moderator because it doesn’t make sense to have a line running
from low to high [category].
Note the unstandardized slopes
in the excel file that you’re interested in.
*Because we’re dealing with
groups, it is permissible and in some cases desirable to simply break
the data up by group and run a simple calculation of the slope by doing
separate regression analyses for each group. You can cite Aiken
and West (1991) to justify this analysis if you need to but it is common
practice and most reviewers will not quibble. Go to data view
in SPSS and click on Data > Split File > Compare groups.
Select the moderator and click the little arrow to put it in the box
marked “Groups Based On:”. Hit paste. If ga2 were your
moderator, you should get:
SORT CASES BY ga2 .
SPLIT FILE
LAYERED BY ga2 .
*Copy your previous regression equation below.
*delete the moderator variable and the
interaction term.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 grp1vs23 grp2vs3
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3) .
*Below the regression equation, type “SPLIT FILE OFF .” NB the period at the end.
SPLIT FILE OFF .
Highlight the whole section
from “Sort cases” to “Split file off” with your cursor and hit
the Run arrow in the SPSS syntax toolbar, so it all runs (or use Ctrl+R). You should get model output and coefficients for each group
of ga2 separately.
Note that if you have a problem
of small N, or heterogeneous variance, you may prefer to use simple
slope calculations on the pooled data set. But on the other hand,
heterogeneous variance may be meaningful and real in the data – in
which case if you use the pooled data (or just calculate what the slopes
“should be” according to the regression equation) your results may
be misleading. (You can tell if you have these concerns
if the slopes in excel differ from the SPSS results, even after you’ve
double checked your SPSS syntax to make sure you’re doing it right.)
To use pooled data with the
categorical variable, just center as before. A variable with 1 / -1
coding and equal n has an SD of (almost exactly) 1, but you should check this first.
FREQUENCIES
VARIABLES=women
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
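What SD should FREQUENCIES show for the 1 / -1 variable? SPSS reports the sample SD (N - 1 denominator), so with equal n it comes out slightly above 1. With unequal n, the mean drifts away from zero and the SD drops below 1. Here is a Python sketch with hypothetical group sizes.

```python
from statistics import fmean, stdev


def describe_code(n_women: int, n_men: int) -> tuple[float, float]:
    """Mean and sample SD (N - 1 denominator) of a +1 (women) / -1 (men) code."""
    women = [1] * n_women + [-1] * n_men
    return fmean(women), stdev(women)


for n_women, n_men in [(50, 50), (75, 25)]:
    mean, sd = describe_code(n_women, n_men)
    print(f"{n_women} women, {n_men} men: mean = {mean:.3f}, SD = {sd:.3f}")
```

Either way, subtracting or adding 1 still puts zero exactly on one group, which is what the compute statements that follow rely on.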
compute c_womhi = women - 1 .
compute c_womlo = women + 1 .
compute ci_v1woL = c_iv1 * c_womlo .
compute ci_v1woH = c_iv1 * c_womhi .
execute .
FREQUENCIES
VARIABLES=c_womhi c_womlo ci_v1woL ci_v1woH
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
*run two regression analyses, substituting in women low and high and interaction low and high and reading out the simple slope of v1 in the final block (ie, the block with the interaction in it).
*NB all the other variables have to be
in the regression equation too.
*For the simple slope of v1 at low women
(i.e., for men), read the coefficient for v1 in block 2:.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 c_womlo grp1vs23 grp2vs3 /METHOD=ENTER ci_v1woL
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*For the simple slope of v1 at high women
(i.e., for women in the sample), read the coefficient for v1 in block
2:.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 c_womhi grp1vs23 grp2vs3 /METHOD=ENTER ci_v1woH
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
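With women coded +1 and men -1, the slope of iv1 is b1 - b3 for men and b1 + b3 for women. Here b1 and b3 are the c_iv1 and ci_v1xwo coefficients. The Python sketch below assumes NumPy and simulated data with no covariates. It checks that the c_womlo regression reproduces the men’s slope. It also checks that split-file regressions give exactly the same slopes when no other covariates are in the model. Once covariates such as c_iv2 to c_iv4 are added, the pooled model forces their effects to be equal across groups. That is one reason split-file and pooled results can diverge.

```python
import numpy as np


def ols(predictors, y):
    """Unstandardized OLS coefficients: [constant, one b per predictor, in order]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(3)
n = 120
women = np.repeat([1.0, -1.0], n // 2)  # +1 women, -1 men (hypothetical equal n)
c_iv1 = rng.normal(0.0, 1.0, n)
dv = 0.3 * c_iv1 + 0.2 * women + 0.25 * c_iv1 * women + rng.normal(0.0, 1.0, n)

b = ols([c_iv1, women, c_iv1 * women], dv)
slope_men, slope_women = b[1] - b[3], b[1] + b[3]

# Recentering: c_womlo = women + 1 puts zero on the men.
c_womlo = women + 1
b_lo = ols([c_iv1, c_womlo, c_iv1 * c_womlo], dv)

# Split-file equivalent: separate regressions within each group.
is_man = women == -1
b_men = ols([c_iv1[is_man]], dv[is_man])
b_women = ols([c_iv1[~is_man]], dv[~is_man])

print(f"men:   {slope_men:.4f} (formula), {b_lo[1]:.4f} (c_womlo), {b_men[1]:.4f} (split)")
print(f"women: {slope_women:.4f} (formula), {b_women[1]:.4f} (split)")
```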
********************************************************************
*SYNTAX FOR one continuous variable interacting with a three-group categorical variable (e.g., a manipulation with 1 as control and 2 and 3 as two treatments).
********************************************************************
*Where there are three categorical groups, you need two variables to code this. (With k categories, you need k-1 variables to code group membership. Read Howell or Tabachnick and Fidell on this point if necessary.) The way we’ve done it, we are using unweighted effect codes where the first variable compares the control group (group 1) to the two treatments and the second compares the two treatments (groups 2 and 3) to each other.
*You need to look at the interaction
of the variable with each of the two effect codes.
*creating the 2 effect codes.
If (cond=1) grp1vs23 = 2 .
If (cond>1) grp1vs23= -1 .
If (cond=1) grp2vs3 = 0 .
If (cond=2) grp2vs3 = 1 .
If (cond=3) grp2vs3 = -1 .
Execute .
*creating an interaction term
for EACH categorical coding variable – so you have k-1 interaction
terms.
compute ci_v1g1v = c_iv1 * grp1vs23 .
compute ci_v1g23 = c_iv1 * grp2vs3 .
execute.
*checking the distributions of the new variables for weirdness.
FREQUENCIES
VARIABLES=ci_v1g1v ci_v1g23
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
*regressing the dv on the ivs and control variables in Block 1, and the interaction terms in Block 2.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v1g1v ci_v1g23
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
NB that you can *never* test
one effect code in a categorical variable with more than two groups
without the other code(s) also being in the equation. (Well you
can, but the results are misleading – the variable could be sig when
in reality the groups are not significantly different, or vice versa.
You always need k-1 interaction terms in the equation together.)
*If the R² change for the second block
is significant, the continuous variable interacts with the categorical
group. Each of the two coefficients serves as a planned contrast.
If the first interaction term is significant, the slope of iv1 is different
in the control condition compared to the other two conditions, and if
the second is significant, the slope is different in treatment 2 compared
to treatment 3. To break down any of these interactions, you can
again plot the data in Excel (WIP). Because we have a complex
design, you probably want to simply break the data up by group and run
a simple calculation of the slope for each group.
SORT CASES BY cond .
SPLIT FILE
LAYERED BY cond .
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 women
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3) .
SPLIT FILE OFF .
Read the simple slope coefficient and significance level for IV1 for each group as appropriate.
If you get results from the
split-file analysis that don’t match the pooled data, you have a problem
of power and/or unequal variance. You can then calculate the simple
slopes on the pooled data, using a similar syntax structure as above.
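On the pooled data, the slope of iv1 in each condition can be read off the Block 2 coefficients. It is b(c_iv1) + b(ci_v1g1v) × (that condition’s grp1vs23 code) + b(ci_v1g23) × (its grp2vs3 code). The Python sketch below uses hypothetical coefficients. Because both codes sum to zero, b(c_iv1) is the unweighted average of the three condition slopes.

```python
# Contrast codes for cond, as in the IF statements above.
GRP1VS23 = {1: 2, 2: -1, 3: -1}
GRP2VS3 = {1: 0, 2: 1, 3: -1}


def slope_in_condition(b_iv1: float, b_g1v: float, b_g23: float, cond: int) -> float:
    """Simple slope of c_iv1 in one condition, from the Block 2 coefficients."""
    return b_iv1 + b_g1v * GRP1VS23[cond] + b_g23 * GRP2VS3[cond]


# Hypothetical Block 2 coefficients for c_iv1, ci_v1g1v and ci_v1g23.
b_iv1, b_g1v, b_g23 = 0.30, 0.10, -0.05

slopes = {cond: slope_in_condition(b_iv1, b_g1v, b_g23, cond) for cond in (1, 2, 3)}
for cond, slope in slopes.items():
    print(f"condition {cond}: slope of iv1 = {slope:.3f}")
```

This gives point estimates only. Their significance tests still have to come from regression output.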
********************************************************************
*SYNTAX FOR THREE CONTINUOUS VARIABLES INTERACTING.
********************************************************************
*These have three two-way interactions
to consider, plus the three-way.
compute ci_v1xv2 = c_iv1 * c_iv2 .
compute ci_v1xv3 = c_iv1 * c_iv3 .
compute ci_v2xv3 = c_iv2 * c_iv3 .
compute ci_123 = c_iv1 * c_iv2 * c_iv3 .
execute.
FREQUENCIES
VARIABLES=ci_v1xv2 ci_v1xv3 ci_v2xv3 ci_123
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
*A good way of entering the
variables is to have the two-ways entered in their own block, followed
by the three-way in a third block. Again, you shouldn’t interpret
the component direct effects or any lower-order interactions if there
is a higher-order interaction in the same block. If for space
considerations you put everything in together and you want to interpret
a main effect, you need to footnote that it is still there if all the interactions
are removed. If you put everything in together and you want to
interpret a two-way, you need to footnote that it is still there if the
three-way is removed. Etc.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1 c_iv2 c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v1xv2 ci_v1xv3 ci_v2xv3 /METHOD=ENTER ci_123
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*If the three-way interaction
is significant, instead of looking at the simple two-way interactions
you can go directly to looking at the simple slopes [technically the
simple simple slopes, but no one calls them that] of the key IV at high
and low levels of the two moderators. You can cite Aiken and West (1991)
on this point if you need to, but it has become common practice.
*Plot the data in the Excel
File (Winnifred’s interaction plotting). Decide how to break
up the simple slope analysis. Note the unstandardized slopes in
the excel file that you’re interested in. NB it is a big PITA
doing the simple slopes in a 3-way so scrutinize the graph to pick the
best IV, if there isn’t a theoretical driver.
*Calculate the simple slopes for +/- 1 SD of each moderator.
*Say iv1 with its SD 1.28 and
IV2 with an SD of 2.41 are moderating the effect of IV3. We will
calculate the simple simple effects of IV3 at each combo of levels of
high and low Iv1 and Iv2.
*Notice the counter-intuitive formulae, where you center for low by adding the SD and for high by subtracting it.
*notice the PITA number of
interactions to recalculate.
compute c_iv1lo = c_iv1 + 1.28 .
compute c_iv1hi = c_iv1 - 1.28 .
compute c_iv2lo = c_iv2 + 2.41 .
compute c_iv2hi = c_iv2 - 2.41 .
execute.
compute ci_v3v1L = c_iv3 * c_iv1lo .
compute ci_v3v1H = c_iv3 * c_iv1hi .
compute ci_v3v2L = c_iv3 * c_iv2lo .
compute ci_v3v2H = c_iv3 * c_iv2hi .
execute.
compute ci_1Lx2L = c_iv1lo * c_iv2lo .
compute ci_1Lx2H = c_iv1lo * c_iv2hi .
compute ci_1Hx2L = c_iv1hi * c_iv2lo .
compute ci_1Hx2H = c_iv1hi * c_iv2hi .
execute.
compute ci_31L2L = c_iv3 * c_iv1lo * c_iv2lo .
compute ci_31L2H = c_iv3 * c_iv1lo * c_iv2hi .
compute ci_31H2L = c_iv3 * c_iv1hi * c_iv2lo .
compute ci_31H2H = c_iv3 * c_iv1hi * c_iv2hi .
execute.
*Stare carefully at your syntax.
Re-read the syntax looking for errors, which are incredibly easy to
miss and needless to say can result in meaningless results which when
discovered after thesis chapters are written or manuscripts submitted
result in heart-rending psychological agony. Go through each formula.
Does it make sense? Ogle it for correctness. Never skip
this step.
FREQUENCIES
VARIABLES=c_iv1lo c_iv1hi c_iv2lo c_iv2hi ci_v3v1L ci_v3v1H ci_v3v2L ci_v3v2H ci_1Lx2L ci_1Lx2H ci_1Hx2L ci_1Hx2H ci_31L2L ci_31L2H ci_31H2L ci_31H2H
/STATISTICS=STDDEV MINIMUM MAXIMUM SEMEAN MEAN MEDIAN SKEWNESS SESKEW
KURTOSIS SEKURT
/HISTOGRAM
/ORDER= ANALYSIS .
*Run FOUR regression analyses, substituting in the low and high versions of iv1 and iv2 and of the corresponding interaction terms, and reading out the simple slope of the key IV in the final block (i.e., the block with the three-way interaction in it).
*NB all the other control variables from the original interaction equation have to be in the regression model for the simple slopes too.
*For the simple slope of iv3
at low v1 low v2:.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1lo c_iv2lo c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v3v1L ci_v3v2L ci_1Lx2L /METHOD=ENTER ci_31L2L
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*take the coefficient for iv3
from the final block (with all the variables and interactions in it)
– this is the simple simple slope for IV3 for low IV1 and low IV2.
The unstandardized coefficient should match the excel file. You
usually report the beta and p-value.
*Simple slope of V3 at IV1
lo iv2 high.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1lo c_iv2hi c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v3v1L ci_v3v2H ci_1Lx2H /METHOD=ENTER ci_31L2H
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*Simple slope of V3 at IV1 high iv2 high.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1hi c_iv2hi c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v3v1H ci_v3v2H ci_1Hx2H /METHOD=ENTER ci_31H2H
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*Simple slope of V3 at IV1 high iv2 low
.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1hi c_iv2lo c_iv3 c_iv4 women grp1vs23 grp2vs3 /METHOD=ENTER ci_v3v1H ci_v3v2L ci_1Hx2L /METHOD=ENTER ci_31H2L
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
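As in the two-way case, the four simple simple slopes can be cross-checked against the original three-way model (the one with c_iv1, c_iv2 and c_iv3). The slope of iv3 is b3 + b13·m1 + b23·m2 + b123·m1·m2. Here m1 and m2 are the centered values of iv1 and iv2 (±1.28 and ±2.41 in the example). The Python sketch below uses hypothetical final-block coefficients. The unstandardized c_iv3 coefficient in each of the four regressions above should match the corresponding value.

```python
from itertools import product

SD_IV1, SD_IV2 = 1.28, 2.41  # moderator SDs from the example above


def simple_simple_slope(b3, b13, b23, b123, m1, m2):
    """Slope of c_iv3 when centered iv1 = m1 and centered iv2 = m2."""
    return b3 + b13 * m1 + b23 * m2 + b123 * m1 * m2


# Hypothetical final-block coefficients for c_iv3, ci_v1xv3, ci_v2xv3, ci_123.
b3, b13, b23, b123 = 0.20, 0.05, -0.04, 0.03

levels_iv1 = [("low", -SD_IV1), ("high", SD_IV1)]
levels_iv2 = [("low", -SD_IV2), ("high", SD_IV2)]
for (label1, m1), (label2, m2) in product(levels_iv1, levels_iv2):
    slope = simple_simple_slope(b3, b13, b23, b123, m1, m2)
    print(f"iv1 {label1}, iv2 {label2}: slope of iv3 = {slope:.4f}")
```

The iv1 × iv2 coefficient (ci_v1xv2) does not appear in the formula, because that term does not contain iv3.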
********************************************************************
* ONE CONTINUOUS VARIABLE INTERACTING WITH TWO TWO-LEVEL GROUP VARIABLES .
********************************************************************
*If the interaction is significant, the
groups will be the moderators and it’s then OK to split the data file
into the four groups and run the simple slope for the continuous variable
for each. But if you run into problems with N and variance,
and you’ve effect coded your groups (1 / -1), just repeat the pooled-data analyses
above: the group variables will have SDs of about 1, provided the
two levels of each have roughly equal n.
********************************************************************
* FOUR-WAY INTERACTIONS AND ABOVE.
********************************************************************
4-ways and above are similar
to 3-ways. They can be quite interesting, but people
will hassle you about having an overly complex design. Many successful
researchers have a rule of thumb to study only 3-way interactions or
lower.
If you do insist on looking
at a four-way, don’t forget:
1. Never interpret a main
effect or interaction coefficient if other higher-order terms are in
the equation. If something is sig on its own and drops out
if higher interactions are entered and the higher order interactions
are ns you can report & interpret it. But if it’s sig when
considered jointly with higher-order interactions and ns on its own
it’s likely artefactual and should not be interpreted unless you’re
desperate – remember, it probably won’t replicate. (This is
not true for simple slopes, which are always reported from the final
block with all the variables entered including the interactions of the
IV(s) with the centered hi or lo moderator(s).)
2. Higher-order interactions
involve more possible lower-order interactions – if you’re looking
for a 2-way, you have only one 2-way interaction to consider; if you
look for a 3-way, you have 3 two-ways to consider, and if you’re looking
for a 4-way, you have 6 2-ways (and 4 3-ways) to consider. The
same rule as above about not interpreting lower-order interactions’
coefficients if higher-order interactions are in the equation applies.
But in addition, if you enter all the 2-ways and get significant results,
but the interaction disappears if you drop out the other ns two-ways,
it’s possibly artefactual and should only be reported if you had theoretical
justification for the two-way (or can easily scrabble a justification
up and are desperate for results but again, remember it probably won’t
replicate.).
3. All interactions have lower
power than the main effects for a given sample size, and the higher-order
the interaction the less power. (If you have 80 people and 2 marginal
means in a one-way design, you have 40 per estimate, vs 20 each for
4 means in a two-way interaction with two two-level variables, or 10
each for 8 means in a 2 x 2 x 2 interaction. And don’t forget
with a possible interaction between 2 continuous variables with 7 points
each, in theory you’re estimating 49 cells (7x7).) All of this means
that if you get a significant interaction, esp. with continuous IVs,
you’re doing well. And if you’re trying to find significant
interactions you need large N because non-significant results with moderate
N will open you to a charge of inadequate power when you submit your
manuscript. 4-ways need enormous samples to yield the power for
reliable detection of effects.
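The cell arithmetic in point 3 can be sketched the same way. This assumes equal cell sizes, and treating two 7-point scales as a 7 x 7 grid is only the rough heuristic used above (regression does not literally estimate 49 cell means):

```python
from math import prod


def cases_per_cell(n, levels):
    """Number of cells and average cases per cell when n cases are spread
    evenly over factors with the given numbers of levels."""
    cells = prod(levels)
    return cells, n / cells


for levels in ([2], [2, 2], [2, 2, 2], [7, 7]):
    cells, per_cell = cases_per_cell(80, levels)
    print(" x ".join(map(str, levels)), "->", cells, "cells,", round(per_cell, 2), "per cell")
```

With 80 people this gives the 40, 20 and 10 per estimate mentioned above, and fewer than 2 per cell for the 7 x 7 case.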
4. Interactions are extremely
vulnerable to outlier effects, and the higher-order the interaction
the more distorting the outlier can be. If you have a wonky interaction
you may find it disappears if you control for univariate outliers (detectable
through frequency distributions) and multivariate outliers (detectable
through casewise diagnostics and/or by saving Mahalanobis and Cook’s distances).
If you have an expected interaction, it is still wise to check whether
it disappears if you take out the handful of outlying cases.
To check the effects of deleting
outliers, first save the file as a new file (e.g., noouts.sav) and then
delete outliers and rerun your regression syntax. You’ll need
to do multiple passes as new multivariate outliers will turn up in the
new analyses.
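For the multiple passes, here is a minimal Python sketch of iterative multivariate screening, restricted to two predictors so the 2 x 2 covariance matrix can be inverted by hand. The p < .001 chi-square cut-off is an assumption (a common convention, e.g., in Tabachnick and Fidell), and the data are made up; in SPSS you would instead save the distances from REGRESSION (e.g., /SAVE MAHAL COOK) and rerun on the new file.

```python
from math import log
from statistics import mean

# Chi-square cut-off for df = 2 (two predictors) at p < .001. For df = 2 the
# upper tail probability is exp(-x / 2), so the cut-off is -2 * ln(.001) = 13.82.
CRIT_DF2_P001 = -2 * log(0.001)


def mahalanobis_sq(cases):
    """Squared Mahalanobis distance of each (x, y) case from the centroid,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(cases)
    mx = mean(x for x, _ in cases)
    my = mean(y for _, y in cases)
    sxx = sum((x - mx) ** 2 for x, _ in cases) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in cases) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in cases) / (n - 1)
    det = sxx * syy - sxy ** 2  # zero if the two predictors are collinear
    # d' S^-1 d written out for a 2 x 2 covariance matrix S.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in cases
    ]


def screen_iteratively(cases, crit=CRIT_DF2_P001):
    """Delete cases beyond the cut-off, recompute, and repeat until a pass
    flags nothing (the 'multiple passes' described above)."""
    kept = list(cases)
    while True:
        d2 = mahalanobis_sq(kept)
        remaining = [case for case, d in zip(kept, d2) if d <= crit]
        if len(remaining) == len(kept):
            return kept
        kept = remaining


grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]  # 25 made-up cases
kept = screen_iteratively(grid + [(30, -30)])                  # plus one wild case
print(len(kept))  # 25: the wild case is dropped on the first pass
```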
5. In general, you should never
test for interactions in MR unless the interactions are theoretically
justified.
6. Because of the problems
of power and instability (i.e., interactions significant in one study
and not in the next because of power and outlier issues) you cannot
really take seriously four-way interactions in most psych data sets
– they are too small. If you expect one theoretically, it’s
probably best to conduct small scale studies that address the simple
simple interactions and simple simple simple slopes and then write up
the multiple studies as one package arguing the 4-way on theoretical
grounds, vs trying to do one huge study which has a high chance of failure
because of the instability of interactions in MR and low power.
However, if you are trolling
through your data and find a cool interaction you long to report, bear
in mind that it will likely not replicate; reviewers know this and
will be sceptical of it in manuscripts.
In theses, it’s not as anti-normative
to report four-way interactions, but it’s still unusual.
7. Presentation of a 4-way:
If the interaction is significant, instead of looking at the simple
three-way interactions, simple simple two-ways, and then the simple
simple simple slopes, you can go directly to looking at the simple
slopes [technically the simple simple simple slopes, but no one calls
them that] of the key IV at high and low levels of the three moderators.
You can cite Aiken and West on this point if you need to, but it has
become common practice.
Because of the number of simple
slopes, you’ll need an additional Table for them. Graphs are
less common in manuscripts (b/c they’re too expensive and you would
need too many graphs to describe all the slopes). However, they’re
good for theses. E.g., to decompose a four-way, you might show
the four simple simple two-way interactions, picking the moderators
on theoretical grounds. Or perhaps only the significant simple
simple two-ways.
In the text, it’s very common
to use summary language: “For high identifiers (+1 SD), the perceived
group norm predicted behaviour in all conditions, βs > .23, ps
< .05, whereas low identifiers (-1 SD) conformed to the norm when
the intergroup relationship was perceived as more illegitimate (+1 SD)
and more impermeable (+1 SD), β = .27, p = .037, but not if
either legitimacy or permeability were low (-1 SD), |β|s < .09,
ps > .176.” NB in this case the perceived group norm
has been highlighted as the key IV, identification as the key moderator,
and the other 2 moderators of less importance. But if your theoretical
model doesn’t allow some moderators to be seen as less important,
you may need to discuss all of the simple slopes individually in the
text. In addition, it would be common to have a Table of all the
simple slopes and their significance (some add the constant for each
simple slope equation as well).
8. Under some conditions, the
theory predicts simple interactions or simple simple interactions, in
which case you cannot go directly to the simple simple simple slopes.
Go to the lowest theoretically justified test. E.g., if you hypothesize
that group identification interacts with norms to predict behaviour,
such that only high identifiers conform, but this effect is found only
in the presence of illegitimacy and impermeability, then you would report
your sig 4-way and then go straight to the tests of the simple simple
interaction of ID and norms at each level of high and low legitimacy
and permeability.
NB your write-up should be
as short as possible to minimize strain on reviewers’ brains.
“As predicted, the 4-way interaction was significant when entered
in the final block, R2 ch = , F=, β=, p=. Tests of the simple
interactions of identity x norms revealed the predicted interaction
only under low (-1SD) legitimacy / low permeability, β=, p=; the other
simple interactions were not reliable, βs < , ps >. Follow-up
analyses for the simple slope of group norms for high (+1 SD) and low
(-1 SD) identifiers revealed …” NB you can drop the R2 ch and
F for the 4-way interaction to shorten the text, or if you entered
all the interactions in a single block.
NB also when you report the
simple simple interactions you report them from the final block with
all the higher-order interactions entered with the moderators centered
at high and low.
9. Again, if any of the variables
involved are categorical, Aiken and West can be cited to justify splitting
the file by groups and running separate regressions to test the simple
interactions or slopes. However, if you run into problems of
power and/or heterogeneity of variance, using the pooled sample and centering
can be better for you.
10. The WIP Excel file can’t
create graphs for a 4-way, but if you run the syntax for the simple
three-way interactions at high and low [moderator1] you can plot those
in the graph. Alternatively, work through the hand calculations from the
regression equation for the 4-way.
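The hand calculation is a sum over coefficients: the simple slope of IV4 at chosen moderator values is the coefficient of IV4 plus every interaction term containing IV4, each multiplied by the chosen values of the other variables in that term; the remaining terms give the simple intercept (the constant some people add to the slopes Table). Here is a Python sketch, assuming the covariates are centered (so they drop out of the constant) and the unstandardised coefficients come from the final block of the ordinary centered equation. The function name and coefficient values are made up, and only a few of the 16 terms are filled in for brevity. With all moderators at -1 SD, the slope should match the c_iv4 coefficient from the re-centered regression in the syntax that follows.

```python
from math import prod


def simple_slope_and_intercept(coefs, mod_values, focal="iv4"):
    """Simple slope and simple intercept of `focal` at chosen moderator values.

    coefs: maps a frozenset of predictor names to its unstandardised b
      (frozenset() is the constant). Covariates are assumed centered and
      are left out.
    mod_values: centered moderator values, e.g. -SD for low, +SD for high.
    """
    slope = intercept = 0.0
    for term, b in coefs.items():
        # Multiply b by the chosen values of every non-focal variable in the term.
        weight = b * prod(mod_values[v] for v in term if v != focal)
        if focal in term:
            slope += weight
        else:
            intercept += weight
    return slope, intercept


# Made-up coefficients; a real 4-way model has all 15 terms plus the constant.
example_coefs = {
    frozenset(): 2.0,
    frozenset({"iv4"}): 0.5,
    frozenset({"iv1"}): 0.3,
    frozenset({"iv1", "iv4"}): 0.2,
    frozenset({"iv1", "iv2", "iv3", "iv4"}): 0.1,
}
low_all = {"iv1": -1.28, "iv2": -2.41, "iv3": -1.61}  # -1 SD on each moderator
print(simple_slope_and_intercept(example_coefs, low_all))
```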
Example of syntax for 4 continuous
IVs, going directly to the simple slope of IV4.
The SDs below are made up; in practice you would take them
from the descriptives in the original regression output.
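Why add the SD for the low version? Shifting a variable moves where zero falls: c_iv1lo = c_iv1 + SD is zero for a person 1 SD below the mean, so the lower-order coefficients in a regression using it describe that person. A minimal Python sketch of the same re-centering, for anyone preparing variables outside SPSS (the function name is hypothetical; statistics.stdev uses the n - 1 denominator, like SPSS descriptives):

```python
from statistics import mean, stdev


def recenter_at_plus_minus_sd(raw):
    """Return (centered, lo, hi) lists for one moderator.

    lo = centered + SD, so lo is 0 for a case 1 SD below the mean;
    hi = centered - SD, so hi is 0 for a case 1 SD above the mean.
    """
    m, sd = mean(raw), stdev(raw)
    centered = [x - m for x in raw]
    lo = [c + sd for c in centered]
    hi = [c - sd for c in centered]
    return centered, lo, hi


centered, lo, hi = recenter_at_plus_minus_sd([1, 2, 3, 4, 5])
print(centered, [round(v, 2) for v in lo], [round(v, 2) for v in hi])
```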
*Six re-centered moderators: each of IV1 to IV3 at low (-1 SD) and high (+1 SD).
compute c_iv1lo = c_iv1 + 1.28 .
compute c_iv1hi = c_iv1 - 1.28 .
compute c_iv2lo = c_iv2 + 2.41 .
compute c_iv2hi = c_iv2 - 2.41 .
compute c_iv3lo = c_iv3 + 1.61 .
compute c_iv3hi = c_iv3 - 1.61 .
execute.
*2-way interactions: 18 [3 x 2 + 3 x 2 x 2].
compute ci_1L4= c_iv4 * c_iv1lo .
compute ci_1H4 = c_iv4 * c_iv1hi .
compute ci_2L4 = c_iv4 * c_iv2lo .
compute ci_2H4 = c_iv4 * c_iv2hi .
compute ci_3L4 = c_iv4 * c_iv3lo .
compute ci_3H4 = c_iv4 * c_iv3hi .
execute.
compute ci_1L2L = c_iv1lo * c_iv2lo .
compute ci_1L2H = c_iv1lo * c_iv2hi .
compute ci_1H2L = c_iv1hi * c_iv2lo .
compute ci_1H2H = c_iv1hi * c_iv2hi .
compute ci_1L3L = c_iv1lo * c_iv3lo .
compute ci_1L3H = c_iv1lo * c_iv3hi .
compute ci_1H3L = c_iv1hi * c_iv3lo .
compute ci_1H3H = c_iv1hi * c_iv3hi .
compute ci_2L3L = c_iv2lo * c_iv3lo .
compute ci_2L3H = c_iv2lo * c_iv3hi .
compute ci_2H3L = c_iv2hi * c_iv3lo .
compute ci_2H3H = c_iv2hi * c_iv3hi .
execute .
*Three-way interactions: 20 of them.
compute ci_1L2L4 = c_iv4 * c_iv1lo * c_iv2lo .
compute ci_1L2H4 = c_iv4 * c_iv1lo * c_iv2hi .
compute ci_1H2L4 = c_iv4 * c_iv1hi * c_iv2lo .
compute ci_1H2H4 = c_iv4 * c_iv1hi * c_iv2hi .
compute ci_3L2L4 = c_iv4 * c_iv3lo * c_iv2lo .
compute ci_3L2H4 = c_iv4 * c_iv3lo * c_iv2hi .
compute ci_3H2L4 = c_iv4 * c_iv3hi * c_iv2lo .
compute ci_3H2H4 = c_iv4 * c_iv3hi * c_iv2hi .
compute ci_1L3L4 = c_iv4 * c_iv1lo * c_iv3lo .
compute ci_1L3H4 = c_iv4 * c_iv1lo * c_iv3hi .
compute ci_1H3L4 = c_iv4 * c_iv1hi * c_iv3lo .
compute ci_1H3H4 = c_iv4 * c_iv1hi * c_iv3hi .
execute.
compute ci3L1L2L = c_iv3lo * c_iv1lo * c_iv2lo .
compute ci3L1L2H = c_iv3lo * c_iv1lo * c_iv2hi .
compute ci3L1H2L = c_iv3lo * c_iv1hi * c_iv2lo .
compute ci3L1H2H = c_iv3lo * c_iv1hi * c_iv2hi .
compute ci3H1L2L = c_iv3hi * c_iv1lo * c_iv2lo .
compute ci3H1L2H = c_iv3hi * c_iv1lo * c_iv2hi .
compute ci3H1H2L = c_iv3hi * c_iv1hi * c_iv2lo .
compute ci3H1H2H = c_iv3hi * c_iv1hi * c_iv2hi .
execute.
*Four-way interactions: 8 of them.
compute ci_LLL4 = c_iv1lo * c_iv2lo * c_iv3lo * c_iv4 .
compute ci_LLH4 = c_iv1lo * c_iv2lo * c_iv3hi * c_iv4 .
compute ci_LHL4 = c_iv1lo * c_iv2hi * c_iv3lo * c_iv4 .
compute ci_LHH4 = c_iv1lo * c_iv2hi * c_iv3hi * c_iv4 .
compute ci_HLL4 = c_iv1hi * c_iv2lo * c_iv3lo * c_iv4 .
compute ci_HLH4 = c_iv1hi * c_iv2lo * c_iv3hi * c_iv4 .
compute ci_HHL4 = c_iv1hi * c_iv2hi * c_iv3lo * c_iv4 .
compute ci_HHH4 = c_iv1hi * c_iv2hi * c_iv3hi * c_iv4 .
execute.
*Stare carefully at your syntax. Re-read it looking for errors, which are
incredibly easy to miss and, needless to say, can produce meaningless results
that, when discovered after thesis chapters are written or manuscripts are
submitted, cause heart-rending psychological agony. Go through each formula:
does it make sense? Is it low when it should be low and high when it should
be high? Ogle it for correctness. Never skip this step.
*Check the frequency distributions of all the interaction terms you have
created, looking for errors. Never skip this step.
NB you’re not looking for
univariate outliers in these distributions of the interaction terms
– higher order interaction terms are generally extremely spiky (kurtotic)
with long tails and alarming-looking outliers. In higher-order interaction
analyses you screen for univariate outliers only on the original IVs, and
otherwise exclude cases only on multivariate outlier statistics (e.g.,
casewise diagnostics, Cook’s or Mahalanobis distances; see Tabachnick
and Fidell for a review).
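A minimal Python sketch of the univariate screen, applied to the original IVs only and never to the product terms. The |z| > 3.29 cut-off (p < .001, two-tailed) is a common convention taken here as an assumption, and the data are made up:

```python
from statistics import mean, stdev


def univariate_outliers(values, cutoff=3.29):
    """Indices of cases whose |z| exceeds the cut-off (sample SD, n - 1)."""
    m, sd = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs((v - m) / sd) > cutoff]


# Screen the original IVs only; the interaction terms are expected to be spiky.
original_ivs = {
    "iv1": [0.0] * 19 + [10.0],               # one extreme case
    "iv2": [float(i % 5) for i in range(20)],  # well-behaved
}
for name, vals in original_ivs.items():
    print(name, univariate_outliers(vals))
```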
*SIMPLE SLOPE OF IV4 for low IV1, low IV2, low IV3. NB you read the slope
of IV4 as the coefficient for c_iv4 in the final block, with all the
interactions in the equation. NB all the other control variables from the
original equation should also be present in the equation. The names below
must match those used to create the variables above (e.g., the IV1 x IV2 x
IV3 term was created as ci3L1L2L and the IV2 x IV3 x IV4 term as ci_3L2L4).
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT dv1
/METHOD=ENTER c_iv1lo c_iv2lo c_iv3lo c_iv4 women grp1vs23 grp2vs3
/METHOD=ENTER ci_1L2L ci_1L3L ci_1L4 ci_2L3L ci_2L4 ci_3L4
/METHOD=ENTER ci3L1L2L ci_1L2L4 ci_1L3L4 ci_3L2L4
/METHOD=ENTER ci_LLL4
/SCATTERPLOT=(*ZRESID ,*ZPRED )
/RESIDUALS DURBIN HIST(ZRESID) NORM(ZRESID)
/CASEWISE PLOT(ZRESID) OUTLIERS(3)
.
*You’ll be running 8 of these tests, methodically looking at IV4 for LLL, LLH,
LHL, LHH, HLL, HLH, HHL, and HHH. In each of the 8 tests, you’ll need to
change every term involving each moderator: 1 each in blocks 1 and 4, and 3
each in blocks 2 and 3. If you make any mistakes your final # will be wrong
(and you will probably never know …). For this reason, it’s all the more
important to have the syntax saved so you can go over it later and double-check
for errors. However, if you’ve plotted the simple 3-ways in WIP, you’ll be able
to check the unstandardised coefficient against the Excel file.
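Because the eight regressions differ only in which lo/hi versions appear, one way to avoid hand-editing mistakes is to generate the syntax. A Python sketch, assuming the variable names from the compute statements above and the same covariates (women grp1vs23 grp2vs3); it trims the diagnostic subcommands for brevity, so paste them back in if you want them. Paste the printed syntax into a syntax window and still ogle it before running.

```python
from itertools import product

LEVEL = {"L": "lo", "H": "hi"}


def regression_syntax(m1, m2, m3, dv="dv1", covariates="women grp1vs23 grp2vs3"):
    """REGRESSION syntax for the simple slope of IV4 with IV1, IV2 and IV3 at
    the given levels ('L' = low, 'H' = high), using the names created above."""
    block1 = f"c_iv1{LEVEL[m1]} c_iv2{LEVEL[m2]} c_iv3{LEVEL[m3]} c_iv4 {covariates}"
    block2 = (f"ci_1{m1}2{m2} ci_1{m1}3{m3} ci_1{m1}4 "
              f"ci_2{m2}3{m3} ci_2{m2}4 ci_3{m3}4")
    block3 = f"ci3{m3}1{m1}2{m2} ci_1{m1}2{m2}4 ci_1{m1}3{m3}4 ci_3{m3}2{m2}4"
    block4 = f"ci_{m1}{m2}{m3}4"
    return "\n".join([
        f"*Simple slope of IV4 at IV1 {m1}, IV2 {m2}, IV3 {m3}.",
        "REGRESSION",
        "/MISSING LISTWISE",
        "/STATISTICS COEFF OUTS R ANOVA CHANGE",
        "/NOORIGIN",
        f"/DEPENDENT {dv}",
        f"/METHOD=ENTER {block1}",
        f"/METHOD=ENTER {block2}",
        f"/METHOD=ENTER {block3}",
        f"/METHOD=ENTER {block4}",
        ".",
    ])


# The 8 combinations in the order used above: LLL, LLH, LHL, ..., HHH.
all_eight = [regression_syntax(*combo) for combo in product("LH", repeat=3)]
print("\n\n".join(all_eight))
```

The three-way term among the moderators keeps the ci3L1L2L naming pattern used in the compute statements, which is why the generator builds it differently from the other terms.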
*********************************************************************