Science Forum Index  »  Space - Consult Forum  »  multiple linear regression...
Page 1 of 2    Goto page 1, 2  Next
...
Posted: Mon May 12, 2008 6:40 pm
Guest
Hi,

I'm having trouble interpreting the results of a multiple linear
regression, or at least matching it up to what I've obtained with
other statistical tests. I have one dependent variable, and two
independent variables. When I run the analysis through a 2-way ANOVA I
get a highly significant main effect for IV1 (p<10^-15) with no other
main effect or interaction. When I run the data through a multiple
linear regression (with IV1, IV2 and IV1*IV2), I get a significant F
value for the overall regression equation (again p<10^-15). However,
when I look at the significance of the individual regression coefficients
(using a t-test), IV1 is a lot less significant (p<0.005), while IV2
and the interaction are not significant. Why is there such a huge
difference between the significance of the overall equation and the
significance of the individual regression coefficients? What am I
missing here?

Any help appreciated!

Jon
Ray Koopman...
Posted: Mon May 12, 2008 9:10 pm
Guest
On May 12, 9:40 pm, wallis... at (no spam) gmail.com wrote:
Quote:
[...]

How did you code the IVs in the regression?
JW...
Posted: Tue May 13, 2008 5:26 am
Guest
On May 13, 12:10 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote:
On May 12, 9:40 pm, wallis... at (no spam) gmail.com wrote:
[...]

How did you code the IVs in the regression?

IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2. I
standardized the DV and IV before running the regression.
JW...
Posted: Tue May 13, 2008 5:34 am
Guest
On May 13, 5:25 am, Scott Seidman <namdiestt... at (no spam) mindspring.com> wrote:
Quote:
wallis... at (no spam) gmail.com wrote in news:e279a357-ff1d-4c77-91ec-
305300318... at (no spam) p25g2000pri.googlegroups.com:
[...]

Don't worry about "a lot less significant".  Pick your alpha, and stick
with it.

--
Scott
Reverse name to reply

Thanks, but that's not very helpful advice. Two similar statistical
tests of the same data set shouldn't differ so much in their estimates
of whether the effects of an IV are due to chance, so my question is
why? Am I doing something wrong, or have I misunderstood something
about the tests? (If it makes you more comfortable, assume that I'm
trying to estimate effect size, or that I have an alpha level of
0.0001.)
Ray Koopman...
Posted: Tue May 13, 2008 5:39 am
Guest
On May 13, 8:26 am, JW <wallis... at (no spam) hotmail.com> wrote:
Quote:
On May 13, 12:10 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 12, 9:40 pm, wallis... at (no spam) gmail.com wrote:
[...]

How did you code the IVs in the regression?

IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2.
I standardized the DV and IV before running the regression.

Were these the degrees of freedom for the two analyses?

         Anova   Regression
  IV1      4       1
  IV2      1       1
IV1*IV2    4       1
JW...
Posted: Tue May 13, 2008 6:14 am
Guest
On May 13, 8:39 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote:
On May 13, 8:26 am, JW <wallis... at (no spam) hotmail.com> wrote:
On May 13, 12:10 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 12, 9:40 pm, wallis... at (no spam) gmail.com wrote:
[...]

How did you code the IVs in the regression?

IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2.
I standardized the DV and IV before running the regression.

Were these the degrees of freedom for the two analyses?

         Anova   Regression
  IV1      4       1
  IV2      1       1
IV1*IV2    4       1

For the Anova, yes that's exactly right. For the regression, I'm not
sure. I assume those are the degrees of freedom, but I can't see it
reported anywhere (I'm using MATLAB).

However, I think I might have done something stupid - I standardized
the interaction term. In other words, I multiplied IV1 and IV2 and
then standardized the result. If I do it the other way round, i.e.
standardize IV1 and IV2 and then multiply to return the interaction
term, my regression results look a lot more like the ANOVA results.
Might this be the explanation for what I'm doing wrong?
Scott Seidman...
Posted: Tue May 13, 2008 7:25 am
Guest
wallisjon at (no spam) gmail.com wrote in news:e279a357-ff1d-4c77-91ec-
30530031854f at (no spam) p25g2000pri.googlegroups.com:

Quote:
[...]

Don't worry about "a lot less significant". Pick your alpha, and stick
with it.

--
Scott
Reverse name to reply
Ray Koopman...
Posted: Tue May 13, 2008 8:13 am
Guest
On May 13, 9:14 am, JW <wallis... at (no spam) hotmail.com> wrote:
Quote:
On May 13, 8:39 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 13, 8:26 am, JW <wallis... at (no spam) hotmail.com> wrote:
On May 13, 12:10 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 12, 9:40 pm, wallis... at (no spam) gmail.com wrote:
[...]

How did you code the IVs in the regression?

IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2.
I standardized the DV and IV before running the regression.

Were these the degrees of freedom for the two analyses?

         Anova   Regression
  IV1      4       1
  IV2      1       1
IV1*IV2    4       1

For the Anova, yes that's exactly right. For the regression, I'm not
sure. I assume those are the degrees of freedom, but I can't see it
reported anywhere (I'm using MATLAB).

However, I think I might have done something stupid - I standardized
the interaction term. In other words, I multiplied IV1 and IV2 and
then standardized the result. If I do it the other way round, i.e.
standardize IV1 and IV2 and then multiply to return the interaction
term, my regression results look a lot more like the ANOVA results.
Might this be the explanation for what I'm doing wrong?

Standardizing the product term, IV1*IV2, will change both its
coefficient and the constant coefficient, but it won't change the
significance of the product. But that's not your real problem,
which is that the anova is using IV1 as a nominal-scale variable
(i.e., 5 unordered categories), whereas the regression is using IV1
as an interval-scale variable. To get the regression to give results
equivalent to the anova, you would have to use the first 4 powers
of IV1 (i.e., IV1, IV1^2, IV1^3, IV1^4) converted to orthogonal
polynomials as in the following table:

Original    1   2   3   4   5
Linear     -2  -1   0   1   2
Quadratic   2  -1  -2  -1   2
Cubic      -1   2   0  -2   1
Quartic     1  -4   6  -4   1

recode IV2 as +1 and -1, and use all 4 product terms for the
interaction. Then the p-values from the regression should be the
same as those from the anova (unless the cell sizes are unequal,
in which case there are several ways of doing the anova, all of
which give different p-values).
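Ray's recoding can be checked numerically. The following is a minimal Python/NumPy sketch on simulated, balanced data; the 4 observations per cell, the effect size, and the random seed are arbitrary assumptions, not Jon's data. IV1 enters as the four orthogonal polynomial columns from Ray's table, IV2 as +1/-1, and the interaction as the four products. The 4-df regression F for IV1 (the increase in residual sum of squares when its four columns are dropped) is compared with the IV1 F computed from group means as in a two-way ANOVA.

```python
import numpy as np

# Orthogonal polynomial contrasts for a 5-level factor
# (rows: linear, quadratic, cubic, quartic), as in Ray's table.
POLY = np.array([
    [-2, -1,  0,  1,  2],
    [ 2, -1, -2, -1,  2],
    [-1,  2,  0, -2,  1],
    [ 1, -4,  6, -4,  1],
], dtype=float)

# Hypothetical balanced 5 x 2 design with 4 observations per cell.
reps = 4
g1, g2 = np.meshgrid(np.arange(1, 6), np.array([1, 2]), indexing="ij")
iv1 = np.repeat(g1.ravel(), reps)
iv2 = np.repeat(g2.ravel(), reps)
n = iv1.size

# Simulated response with a linear IV1 effect only (assumed for illustration).
rng = np.random.default_rng(0)
y = 0.8 * iv1 + rng.normal(size=n)

def sse(X):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones((n, 1))
P = POLY[:, iv1 - 1].T                        # n x 4 polynomial codes for IV1
e2 = np.where(iv2 == 1, 1.0, -1.0)[:, None]   # IV2 recoded as +1 / -1
inter = P * e2                                # the 4 product (interaction) columns

full = np.hstack([ones, P, e2, inter])        # 10 columns, one per cell mean
df_err = n - full.shape[1]
sse_full = sse(full)

# Regression test of IV1: drop its 4 columns and compare residual SS (4 df).
ss_iv1_reg = sse(np.hstack([ones, e2, inter])) - sse_full
F_iv1_reg = (ss_iv1_reg / 4) / (sse_full / df_err)

# Two-way ANOVA by hand: IV1 main-effect SS and within-cell SS.
grand = y.mean()
means1 = np.array([y[iv1 == a].mean() for a in range(1, 6)])
ss_iv1_anova = 2 * reps * np.sum((means1 - grand) ** 2)
ss_within = sum(
    np.sum((y[(iv1 == a) & (iv2 == b)] - y[(iv1 == a) & (iv2 == b)].mean()) ** 2)
    for a in range(1, 6) for b in (1, 2)
)
F_iv1_anova = (ss_iv1_anova / 4) / (ss_within / df_err)

print(F_iv1_reg, F_iv1_anova)  # the two F ratios should agree
```

This only covers the balanced case, where all ten columns are mutually orthogonal; with unequal cell sizes, as Ray notes, the different ANOVA sums of squares no longer coincide.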
JW...
Posted: Tue May 13, 2008 11:01 am
Guest
On May 13, 11:52 am, Richard Ulrich <Rich.Ulr... at (no spam) comcast.net> wrote:
Quote:
On Tue, 13 May 2008 09:14:37 -0700 (PDT), JW <wallis... at (no spam) hotmail.com>
wrote:

On May 13, 8:39 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 13, 8:26 am, JW <wallis... at (no spam) hotmail.com> wrote:

[snip, some]
JW
IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2.
I standardized the DV and IV before running the regression.
Ray
Were these the degrees of freedom for the two analyses?

         Anova   Regression
  IV1      4       1
  IV2      1       1
IV1*IV2    4       1
JW
For the Anova, yes that's exactly right. For the regression, I'm not
sure. I assume those are the degrees of freedom, but I can't see it
reported anywhere (I'm using MATLAB).

However, I think I might have done something stupid - I standardized
the interaction term. In other words, I multiplied IV1 and IV2 and
then standardized the result. If I do it the other way round, i.e.
standardize IV1 and IV2 and then multiply to return the interaction
term, my regression results look a lot more like the ANOVA results.
Might this be the explanation for what I'm doing wrong?

It sounds like there were two separate problems in
translating the ANOVA to the regression.

Ray points out that using 5 categories is inherently
a different test (with 4 d.f.)  from using one continuous
variable (with 1 d.f.).

Also, as above, "centering" the terms for the interaction
before multiplying them is a vital step that gives a
different result.  When positive-signed numbers are
multiplied, the result is correlated fairly well with each
of the multipliers.  Because of that, the result will be
"confounded with" the other predictors.  They account for
some of the same variance.  That's usually not a desirable
way to look at interactions.

--
Rich Ulrich

http://www.pitt.edu/~wpilib/index.html


Hi Ray and Rich,

Thank you very much for your help - I really appreciate it. Just to be
clear, I wasn't trying to replicate the ANOVA exactly; I was using it
more as a sanity check. I could see from plotting the data that there
was a (linear) effect of IV1, no effect of IV2 and no interaction (and
the ANOVA confirmed this). But I was getting strange results when
trying the same thing with multiple linear regression, specifically
that the IV1 effect wasn't as 'significant' as it looked from the graph.

It seems that "centering" was key. When I centered the variables, it
made little difference whether I standardized the product term or not,
but when they were uncentered it made a big difference. So I guess my
final question is: generally speaking, do people standardize their
product term or not?

Thanks again!
JW...
Posted: Tue May 13, 2008 11:32 am
Guest
On May 13, 2:01 pm, JW <wallis... at (no spam) hotmail.com> wrote:
Quote:
[...]

Actually, scrap that, I've just confused myself again.

Isn't "centering" the first step in standardizing anyway? My
understanding of standardizing is that we subtract the mean and then
divide by the standard deviation. So subtracting the mean gives me
-2 -1 0 1 2 for IV1 and -0.5 0.5 for IV2, and then I divide each by
its standard deviation.

If I then multiply these scores as the product term and put this into
my regression equation, I get something sensible from the regression
coefficients, but if I now standardize the product term (by
subtracting its mean and dividing by its standard deviation) the
values I get regarding the significance of the regression coefficients
look off. So I think I'm still doing something wrong that I'm not
seeing.
Ray Koopman...
Posted: Tue May 13, 2008 12:34 pm
Guest
On May 13, 2:32 pm, JW <wallis... at (no spam) hotmail.com> wrote:
Quote:
[...]

If your original model is y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e,
and you change it to y = a0 + a1*x1 + a2*x2 + a3*((x1*x2 - m)/s) + e,
where m and s are any arbitrary constants,
your new results should be
a0 = b0 + b3*m,
a1 = b1,
a2 = b2,
a3 = b3*s.
Are you getting something else?
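Ray's identities are easy to confirm numerically. The sketch below uses Python/NumPy; the simulated data, the seed, and the helper name `ols` are assumptions for illustration. It fits both parameterizations, takes m and s to be the mean and standard deviation of the product, and checks a0 = b0 + b3*m, a1 = b1, a2 = b2 and a3 = b3*s. It also checks that the t statistics of all three slopes are unchanged, which is why standardizing the product term cannot by itself change its significance.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their t statistics (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)              # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Simulated data on the scales described in the thread (IV1 = 1..5, IV2 = 1/2).
rng = np.random.default_rng(1)
n = 50
x1 = rng.integers(1, 6, size=n).astype(float)
x2 = rng.integers(1, 3, size=n).astype(float)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

prod = x1 * x2
m, s = prod.mean(), prod.std(ddof=1)              # any constants would do
ones = np.ones(n)

b, t_b = ols(np.column_stack([ones, x1, x2, prod]), y)
a, t_a = ols(np.column_stack([ones, x1, x2, (prod - m) / s]), y)

print("a0 - (b0 + b3*m):", a[0] - (b[0] + b[3] * m))
print("a3 - b3*s:       ", a[3] - b[3] * s)
print("slope t, original:    ", t_b[1:])
print("slope t, standardized:", t_a[1:])
```

Only the intercept and the scale of the product coefficient move; the fitted values, residuals and slope t statistics are the same in both fits because the two design matrices span the same column space.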
Richard Ulrich...
Posted: Tue May 13, 2008 1:52 pm
Guest
On Tue, 13 May 2008 09:14:37 -0700 (PDT), JW <wallisjon at (no spam) hotmail.com>
wrote:

Quote:
On May 13, 8:39 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
On May 13, 8:26 am, JW <wallis... at (no spam) hotmail.com> wrote:

[snip, some]

JW
Quote:
IV1 naturally ran from 1 to 5. IV2 was dummy coded as 1 and 2.
I standardized the DV and IV before running the regression.
Ray
Were these the degrees of freedom for the two analyses?

         Anova   Regression
  IV1      4       1
  IV2      1       1
IV1*IV2    4       1
JW
For the Anova, yes that's exactly right. For the regression, I'm not
sure. I assume those are the degrees of freedom, but I can't see it
reported anywhere (I'm using MATLAB).

However, I think I might have done something stupid - I standardized
the interaction term. In other words, I multiplied IV1 and IV2 and
then standardized the result. If I do it the other way round, i.e.
standardize IV1 and IV2 and then multiply to return the interaction
term, my regression results look a lot more like the ANOVA results.
Might this be the explanation for what I'm doing wrong?

It sounds like there were two separate problems in
translating the ANOVA to the regression.

Ray points out that using 5 categories is inherently
a different test (with 4 d.f.) from using one continuous
variable (with 1 d.f.).

Also, as above, "centering" the terms for the interaction
before multiplying them is a vital step that gives a
different result. When positive-signed numbers are
multiplied, the result is correlated fairly well with each
of the multipliers. Because of that, the result will be
"confounded with" the other predictors. They account for
some of the same variance. That's usually not a desirable
way to look at interactions.
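The correlation between a product of positive-valued predictors and its factors can be seen directly. This Python/NumPy sketch assumes a fully crossed 5 x 2 grid (IV1 = 1..5, IV2 = 1/2, one observation per cell) purely for illustration, and computes the correlation of each predictor with the product term, first on the raw codes and then after centering both predictors.

```python
import numpy as np

# Every combination of IV1 = 1..5 and IV2 = 1, 2, once each (hypothetical grid).
g1, g2 = np.meshgrid(np.arange(1, 6), np.array([1, 2]), indexing="ij")
x1 = g1.ravel().astype(float)
x2 = g2.ravel().astype(float)

def corr_with_product(u, v):
    """Correlations of u and of v with their elementwise product u*v."""
    p = u * v
    return np.corrcoef(u, p)[0, 1], np.corrcoef(v, p)[0, 1]

raw = corr_with_product(x1, x2)
centered = corr_with_product(x1 - x1.mean(), x2 - x2.mean())

print("raw codes:      r(x1, x1*x2) = %.3f, r(x2, x1*x2) = %.3f" % raw)
print("centered codes: r(x1, x1*x2) = %.3f, r(x2, x1*x2) = %.3f" % centered)
```

On this grid the raw product correlates about 0.79 with IV1 and about 0.56 with IV2; after centering, both correlations are zero (up to rounding) because the design is balanced. With unequal cell counts, centering reduces these correlations but need not remove them entirely.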

--
Rich Ulrich

http://www.pitt.edu/~wpilib/index.html
JW...
Posted: Tue May 13, 2008 3:22 pm
Guest
On May 13, 3:34 pm, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote:
On May 13, 2:32 pm, JW <wallis... at (no spam) hotmail.com> wrote:

[...]

If your original model is y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e,
and you change it to y = a0 + a1*x1 + a2*x2 + a3*((x1*x2 - m)/s) + e,
where m and s are any arbitrary constants,
your new results should be
a0 = b0 + b3*m,
a1 = b1,
a2 = b2,
a3 = b3*s.
Are you getting something else?

OK, so I rechecked all my code. I think I've been changing too many
things at once, and had screwed up exactly which variables I was or
was not standardizing. So yes, I can now confirm that is exactly what
I get.

I have another question though :(

I've looked round on the web and from what I can find it seems
everybody uses dummy variables of 0 and 1. Why is this? If centering
is important to interpreting the interaction term, why don't people
use dummy variables of -1 and 1?
Scott Seidman...
Posted: Tue May 13, 2008 4:43 pm
Guest
JW <wallisjon at (no spam) hotmail.com> wrote in news:9b470872-d4f1-4aa9-b2bf-
be52b918144d at (no spam) b9g2000prh.googlegroups.com:

Quote:
On May 13, 5:25 am, Scott Seidman <namdiestt... at (no spam) mindspring.com> wrote:
wallis... at (no spam) gmail.com wrote in news:e279a357-ff1d-4c77-91ec-
305300318... at (no spam) p25g2000pri.googlegroups.com:
[...]

Don't worry about "a lot less significant".  Pick your alpha, and stick
with it.

--
Scott
Reverse name to reply

Thanks, but that's not very helpful advice. Two similar statistical
tests of the same data set shouldn't differ so much in their estimates
of whether the effects of an IV are due to chance, so my question is
why? Am I doing something wrong, or have I misunderstood something
about the tests? (If it makes you more comfortable, assume that I'm
trying to estimate effect size, or that I have an alpha level of
0.0001.)


Think of the F value as a sort of omnibus, pre-hoc test, and the p values
on the regression coefficients as sort of post-hoc tests. They don't need
to have the same, or even similar, values. Someday you might run into a
case where you see a pre-hoc difference you can't account for with
post-hoc testing.
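The gap between the overall F and an individual t can be illustrated with a small simulation. In this Python/NumPy sketch, the data, seed and cell size are assumptions (not Jon's data): y depends linearly on IV1 only, and the same model y ~ IV1 + IV2 + IV1*IV2 is fitted once on raw codes and once on centered codes. The overall F comes out identical because both designs span the same columns, while the t for IV1 is much smaller on raw codes, where its coefficient is the IV1 slope at IV2 = 0, a point outside the data.

```python
import numpy as np

def fit(X, y):
    """OLS fit: return coefficient t statistics and the overall F (all slopes = 0)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    se = np.sqrt(np.diag(sse / (n - p) * np.linalg.inv(X.T @ X)))
    F = ((sst - sse) / (p - 1)) / (sse / (n - p))
    return beta / se, F

# Hypothetical balanced design: IV1 = 1..5, IV2 = 1/2, 20 observations per cell.
reps = 20
g1, g2 = np.meshgrid(np.arange(1, 6), np.array([1, 2]), indexing="ij")
x1 = np.repeat(g1.ravel(), reps).astype(float)
x2 = np.repeat(g2.ravel(), reps).astype(float)
rng = np.random.default_rng(2)
y = 0.8 * x1 + rng.normal(size=x1.size)   # linear IV1 effect, no IV2, no interaction

def design(u, v):
    """Columns [1, u, v, u*v] for the two-predictor interaction model."""
    return np.column_stack([np.ones_like(u), u, v, u * v])

t_raw, F_raw = fit(design(x1, x2), y)
t_cen, F_cen = fit(design(x1 - x1.mean(), x2 - x2.mean()), y)

print("overall F, raw vs centered:", F_raw, F_cen)
print("t for IV1, raw vs centered:", t_raw[1], t_cen[1])
```

In this balanced setup the standard error of the raw-code IV1 coefficient is about sqrt(10) times that of the centered one, which is one way an omnibus F can be far more significant than a single coefficient's t.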


--
Scott
Reverse name to reply
Ray Koopman...
Posted: Tue May 13, 2008 8:10 pm
Guest
On May 13, 6:22 pm, JW <wallis... at (no spam) hotmail.com> wrote:
Quote:
[...]
I have another question though :(

I've looked round on the web and from what I can find it seems
everybody uses dummy variables of 0 and 1. Why is this? If centering
is important to interpreting the interaction term, why don't people
use dummy variables of -1 and 1?

Centering is important to interpreting the main effects, not the
interaction. Consider a regression model with two predictors and
their product:

y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e.

Consider also a similar model for the same data,
but with the predictors centered at c1 and c2:

y = a0 + a1(x1-c1) + a2(x2-c2) + a3(x1-c1)(x2-c2) + e.

Expanding the second model, collecting terms, and equating the
coefficients to those in the first model gives

b0 = a0 - a1*c1 - a2*c2 + a3*c1*c2
b1 = a1 - a3*c2
b2 = a2 - a3*c1
b3 = a3,

which we can invert to get

a3 = b3
a2 = b2 + b3*c1
a1 = b1 + b3*c2
a0 = b0 + b1*c1 + b2*c2 + b3*c1*c2.

Centering will not change the interaction coefficient, because
a3 = b3, regardless of c1 and c2. But centering x1 will change
the main effect coefficient of x2, because a2 depends on c1;
and centering x2 will change the main effect coefficient of x1,
because a1 depends on c2.
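Ray's algebra can be checked numerically. Below is a minimal Python/NumPy sketch; the simulated data and the centering constants c1 (the mean of x1) and c2 = 1.5 (which turns a 1/2 dummy into -0.5/+0.5) are assumptions for illustration. It fits the raw and centered models and confirms the inverted relations a3 = b3, a2 = b2 + b3*c1, a1 = b1 + b3*c2 and a0 = b0 + b1*c1 + b2*c2 + b3*c1*c2.

```python
import numpy as np

def coefs(x1, x2, y):
    """OLS coefficients of y on [1, x1, x2, x1*x2]."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(3)
n = 60
x1 = rng.integers(1, 6, size=n).astype(float)   # IV1 on its 1..5 scale
x2 = rng.integers(1, 3, size=n).astype(float)   # IV2 dummy-coded 1 / 2
y = 2.0 + 0.7 * x1 - 0.4 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

c1, c2 = x1.mean(), 1.5
b0, b1, b2, b3 = coefs(x1, x2, y)
a = coefs(x1 - c1, x2 - c2, y)

expected = np.array([
    b0 + b1 * c1 + b2 * c2 + b3 * c1 * c2,   # a0
    b1 + b3 * c2,                            # a1
    b2 + b3 * c1,                            # a2
    b3,                                      # a3
])
print(np.allclose(a, expected))
```

Because the true interaction here is nonzero, the main-effect coefficients visibly shift between codings while the product coefficient stays put, matching the point that coding choices such as 0/1 versus -1/1 mainly affect how the main effects read.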

More generally, centering a variable does not change its own main
effect coefficient, or the coefficients of any of its interactions;
it changes the coefficients of the variables that it interacts with.
For instance, if x1 interacts with x2 and x3 then centering x1 will
change the main effect coefficients of both x2 and x3, and if there
is also an x1*x2*x3 three-way interaction then centering x1 will
change the coefficient of x2*x3 as well.
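The three-way case can be checked the same way. In this Python/NumPy sketch, the random data and the centering constant c (the mean of x1) are assumptions. Only x1 is centered in a model with all two- and three-way products. The coefficients of x1, x1*x2, x1*x3 and x1*x2*x3 should stay the same, while those of x2, x3 and x2*x3 shift by c times the coefficient of the corresponding term that adds x1.

```python
import numpy as np

TERMS = ["1", "x1", "x2", "x3", "x1x2", "x1x3", "x2x3", "x1x2x3"]

def coefs(x1, x2, x3, y):
    """OLS coefficients of y on the full three-way factorial terms, in TERMS order."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x3,
                         x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(TERMS, beta))

rng = np.random.default_rng(4)
n = 80
x1, x2, x3 = (rng.uniform(1, 5, size=n) for _ in range(3))
y = rng.normal(size=n) + x1 + 0.5 * x2 * x3 + 0.2 * x1 * x2 * x3

c = x1.mean()
b = coefs(x1, x2, x3, y)        # x1 uncentered
a = coefs(x1 - c, x2, x3, y)    # x1 centered at c

for term in ["x1", "x1x2", "x1x3", "x1x2x3"]:   # unchanged by centering x1
    print(term, np.isclose(a[term], b[term]))
print("x2  ", np.isclose(a["x2"], b["x2"] + c * b["x1x2"]))
print("x3  ", np.isclose(a["x3"], b["x3"] + c * b["x1x3"]))
print("x2x3", np.isclose(a["x2x3"], b["x2x3"] + c * b["x1x2x3"]))
```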
 
All times are GMT - 5 Hours