Science Forum Index » Space - Consult Forum » multiple linear regression...
Posted: Tue May 13, 2008 9:18 pm
Quote: snip
I've looked round on the web and from what I can find it seems
everybody uses dummy variables of 0 and 1. Why is this? If centering
is important to interpreting the interaction term, why don't people
use dummy variables of -1 and 1?
Multiplying by 1 turns on an effect. Multiplying by zero turns it off.
If D equals 1 or 0, then the equation
y = a + bX + cD + dDX + u
equals a+ bX when D=0 and
(a+c) + (b+d)X
when D=1
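A minimal numerical sketch of the point above, assuming NumPy and simulated data (the "true" coefficient values, the sample size, and the helper `ols` are made up for illustration and are not from the thread). It checks that the pooled fit with a 0/1 dummy and its product reproduces the two separate group fits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
D = rng.integers(0, 2, size=n)            # dummy coded 0/1
# Made-up "true" values for a, b, c, d (illustration only)
y = 1.0 + 2.0 * X + 0.5 * D - 1.5 * D * X + rng.normal(scale=0.1, size=n)

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

a, b, c, d = ols([X, D, D * X], y)        # pooled fit with the dummy and its product
a0, b0 = ols([X[D == 0]], y[D == 0])      # separate fit, D = 0 group only
a1, b1 = ols([X[D == 1]], y[D == 1])      # separate fit, D = 1 group only

print(np.allclose([a, b], [a0, b0]))            # D = 0: intercept a, slope b
print(np.allclose([a + c, b + d], [a1, b1]))    # D = 1: intercept a+c, slope b+d
```

With 0/1 coding, c and d read directly as the shift in intercept and slope for the D=1 group, which is one reason the coding is convenient.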
-Dick Startz

Posted: Wed May 14, 2008 6:07 am
On May 13, 11:10 pm, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
Quote: On May 13, 6:22 pm, JW <wallis... at (no spam) hotmail.com> wrote:
[...]
I have another question though :(
I've looked round on the web and from what I can find it seems
everybody uses dummy variables of 0 and 1. Why is this? If centering
is important to interpreting the interaction term, why don't people
use dummy variables of -1 and 1?
Centering is important to interpreting the main effects, not the
interaction. Consider a regression model with two predictors and
their product:
y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + e.
Consider also a similar model for the same data,
but with the predictors centered at c1 and c2:
y = a0 + a1(x1-c1) + a2(x2-c2) + a3(x1-c1)(x2-c2) + e.
Expanding the second model, collecting terms, and equating the
coefficients to those in the first model gives
b0 = a0 - a1*c1 - a2*c2 + a3*c1*c2
b1 = a1 - a3*c2
b2 = a2 - a3*c1
b3 = a3,
which we can invert to get
a3 = b3
a2 = b2 + b3*c1
a1 = b1 + b3*c2
a0 = b0 + b1*c1 + b2*c2 + b3*c1*c2.
Centering will not change the interaction coefficient, because
a3 = b3, regardless of c1 and c2. But centering x1 will change
the main effect coefficient of x2, because a2 depends on c1;
and centering x2 will change the main effect coefficient of x1,
because a1 depends on c2.
More generally, centering a variable does not change its own main
effect coefficient, or the coefficients of any of its interactions;
it changes the coefficients of the variables that it interacts with.
For instance, if x1 interacts with x2 and x3 then centering x1 will
change the main effect coefficients of both x2 and x3, and if there
is also an x1*x2*x3 three-way interaction then centering x1 will
change the coefficient of x2*x3 as well.
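The identities above can be checked numerically. This is only a sketch, assuming NumPy and simulated data; the sample size, the generating coefficients, and the helper `fit` are hypothetical, and the centering constants are taken to be the sample means (any constants would do):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(loc=5.0, size=n)
x2 = rng.normal(loc=-2.0, size=n)
# Made-up generating model (illustration only)
y = 1.0 + 0.5 * x1 - 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

def fit(u, v, y):
    """Least-squares fit of y on 1, u, v and the product u*v."""
    A = np.column_stack([np.ones(len(y)), u, v, u * v])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

c1, c2 = x1.mean(), x2.mean()                 # centering constants
b0, b1, b2, b3 = fit(x1, x2, y)               # uncentered model
a0, a1, a2, a3 = fit(x1 - c1, x2 - c2, y)     # centered model

print(np.isclose(a3, b3))                     # interaction coefficient unchanged
print(np.isclose(a2, b2 + b3 * c1))           # x2 main effect shifts with c1
print(np.isclose(a1, b1 + b3 * c2))           # x1 main effect shifts with c2
print(np.isclose(a0, b0 + b1 * c1 + b2 * c2 + b3 * c1 * c2))
```

Both fits span the same column space, so the fitted values are identical and only the parameterization differs.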
Hi Ray,
Thank you for that very clear explanation. I'm beginning to see how I
can work this out from first principles rather than relying on the
trial-and-error approach that I have been adopting.
I have read arguments that dummy variables should not be standardized.
I think the concern was that by dividing by their standard deviation,
it rendered the interpretation of the dummy variable difficult. But
that aside, everything I have read seems to advise coding dummy
variables as 0 and 1.
But, if I understand this correctly, if we center x1, but leave x2 as
0 and 1, aren't we in some sense decreasing the likelihood of seeing a
significant effect of x1 (unless b3 is zero I guess, which in practice
seems unlikely)? What's so special about dummy variables that they
should not be centered (if indeed that advice is correct)?
Thanks!

JW...
Posted: Wed May 14, 2008 10:17 am
Quote: There are two important things to know about centering:
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
Hmm, that seems to contradict what Ray just wrote doesn't it? If
centering x2 changes b1, then how can it not alter the test of
significance on b1 (since it presumably doesn't alter the standard
error associated with b1)?

JW...
Posted: Wed May 14, 2008 11:12 am
On May 14, 1:18 pm, richardsta... at (no spam) comcast.net wrote:
Quote: On Wed, 14 May 2008 13:17:30 -0700 (PDT), JW <wallis... at (no spam) hotmail.com>
wrote:
There are two important things to know about centering:
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
Hmm, that seems to contradict what Ray just wrote doesn't it? If
centering x2 changes b1, then how can it not alter the test of
significance on b1 (since it presumably doesn't alter the standard
error associated with b1)?
Ah, but it does change the standard error...by exactly enough to keep
the t-stat from changing.
-Dick
I think you'll have to explain that to me - how does adding or
subtracting a constant change the variance?

Ray Koopman...
Posted: Wed May 14, 2008 1:10 pm
On May 14, 12:30 pm, richardsta... at (no spam) comcast.net wrote:
Quote: [...]
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
[...]
Not so. Suppose I center x2 at c2 = -b1/b3. That will give a1 = 0,
but its standard error will certainly not be zero.
In general, if centering changes the numeric value of a coefficient
then both its statistical significance and conceptual meaning will
also change.

Posted: Wed May 14, 2008 2:30 pm
On Wed, 14 May 2008 09:07:34 -0700 (PDT), wallisjon at (no spam) gmail.com wrote:
Quote: [snip, previous]
There are two important things to know about centering:
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
(2) Different subject areas have very different conventions about
centering. In some areas it is routine, in others--economics for
example--it is almost unheard of. If you search the archives you can
find an exchange between OMU and myself on this point a couple of
months back.
-Dick Startz

Posted: Wed May 14, 2008 3:18 pm
On Wed, 14 May 2008 13:17:30 -0700 (PDT), JW <wallisjon at (no spam) hotmail.com>
wrote:
Quote:
There are two important things to know about centering:
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
Hmm, that seems to contradict what Ray just wrote doesn't it? If
centering x2 changes b1, then how can it not alter the test of
significance on b1 (since it presumably doesn't alter the standard
error associated with b1)?
Ah, but it does change the standard error...by exactly enough to keep
the t-stat from changing.
-Dick

Richard Ulrich...
Posted: Wed May 14, 2008 4:16 pm
On Wed, 14 May 2008 09:07:34 -0700 (PDT), wallisjon at (no spam) gmail.com wrote:
[snip, previous]
Quote:
Hi Ray,
Thank you for that very clear explanation. I'm beginning to see how I
can work this out from first principles rather than relying on the
trial-and-error approach that I have been adopting.
I have read arguments that dummy variables should not be standardized.
I think the concern was that by dividing by their standard deviation,
it rendered the interpretation of the dummy variable difficult. But
that aside, everything I have read seems to advise coding dummy
variables as 0 and 1.
Nobody divides dummy variables by their SDs, that I
am aware of. Most computer programs provide the "standardized
regression coefficients" as a by-product, and call that "beta" as
opposed to "b".
Centering a dummy makes the main effect of the dummy
harder to interpret. Centering the dummy values when computing
the interaction takes care of the interpretation problem, too.
Looking at the tests on main effect *before* entering the
interaction terms (and not, after) also avoids the distortion of
the main effect by the interaction coding.
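One possible reading of "centering the dummy values when computing the interaction" is to leave the dummy itself coded 0/1 but subtract its mean inside the product term only. The sketch below assumes that reading; it uses NumPy and simulated data, and the helper `ols` and all numeric values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
D = rng.integers(0, 2, size=n).astype(float)   # dummy coded 0/1
# Made-up generating model (illustration only)
y = 1.0 + 2.0 * x + 0.5 * D - 1.5 * x * D + rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

m = D.mean()
b0, b1, b2, b3 = ols([x, D, x * D], y)          # plain 0/1 product
a0, a1, a2, a3 = ols([x, D, x * (D - m)], y)    # dummy centered in the product only

print(np.isclose(a2, b2))            # the dummy's own coefficient is unchanged
print(np.isclose(a3, b3))            # the interaction coefficient is unchanged
print(np.isclose(a1, b1 + b3 * m))   # x's coefficient is now the slope at the mean of D
```

Under this coding the dummy keeps its 0/1 interpretation, while the coefficient of x becomes the slope evaluated at the average value of D rather than at D=0.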
Quote:
But, if I understand this correctly, if we center x1, but leave x2 as
0 and 1, aren't we in some sense decreasing the likelihood of seeing a
significant effect of x1 (unless b3 is zero I guess, which in practice
seems unlikely)? What's so special about dummy variables that they
should not be centered (if indeed that advice is correct)?
Especially for "polynomial regression", the program itself
will use dummy variables that are centered and scored to
achieve the same weight for each term.
Programs that provide computation of interactions generally
provide information (hard to read) on what is indicated by
each separate term. When you do it yourself, you have to
figure it out for yourself -- and centering usually makes that
easier. (Concerning another post - I don't know what they
do in economics; there are some geniuses working there, and
there also are some flat-out statistical idiots.)
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

Posted: Wed May 14, 2008 8:57 pm
On Wed, 14 May 2008 14:12:22 -0700 (PDT), JW <wallisjon at (no spam) hotmail.com>
wrote:
Quote: On May 14, 1:18 pm, richardsta... at (no spam) comcast.net wrote:
On Wed, 14 May 2008 13:17:30 -0700 (PDT), JW <wallis... at (no spam) hotmail.com>
wrote:
There are two important things to know about centering:
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
Hmm, that seems to contradict what Ray just wrote doesn't it? If
centering x2 changes b1, then how can it not alter the test of
significance on b1 (since it presumably doesn't alter the standard
error associated with b1)?
Ah, but it does change the standard error...by exactly enough to keep
the t-stat from changing.
-Dick
I think you'll have to explain that to me - how does adding or
subtracting a constant change the variance?
I'm sorry, I wasn't clear. If you subtract a constant from the
variable, it obviously doesn't change the variance. (I was thinking of
standardizing, where you also divide the variable by something.)
But so long as there is an intercept in the equation, if you add or
subtract a constant from the variable the coefficient doesn't change.
-Dick

Posted: Wed May 14, 2008 9:01 pm
On Wed, 14 May 2008 16:10:31 -0700 (PDT), Ray Koopman <koopman at (no spam) sfu.ca>
wrote:
Quote: On May 14, 12:30 pm, richardsta... at (no spam) comcast.net wrote:
[...]
(1) Centering has NO effect on tests of significance. You should get
numerically identical t-statistics on the centered and uncentered
data.
[...]
Not so. Suppose I center x2 at c2 = -b1/b3. That will give a1 = 0,
but its standard error will certainly not be zero.
In general, if centering changes the numeric value of a coefficient
then both its statistical significance and conceptual meaning will
also change.
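A sketch of the counterexample quoted above, assuming NumPy, simulated data, and the usual OLS standard errors; the helper `ols_t` and all numeric values are hypothetical. It centers x2 at c2 = -b1/b3 and compares the t-statistics before and after:

```python
import numpy as np

def ols_t(columns, y):
    """Return OLS coefficients, standard errors and t-statistics (intercept first)."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])   # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, se, coef / se

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(loc=3.0, size=n)
# Made-up generating model (illustration only)
y = 1.0 + 0.5 * x1 - 0.8 * x2 + 0.3 * x1 * x2 + rng.normal(size=n)

b, se_b, t_b = ols_t([x1, x2, x1 * x2], y)     # uncentered
c2 = -b[1] / b[3]                              # the centering point from the post
x2c = x2 - c2
a, se_a, t_a = ols_t([x1, x2c, x1 * x2c], y)   # x2 centered at c2

print("a1 =", a[1], " se(a1) =", se_a[1])                    # a1 is numerically zero, its SE is not
print("t for x1, before and after:", t_b[1], t_a[1])         # main-effect t changes
print("t for x1*x2, before and after:", t_b[3], t_a[3])      # interaction t does not
```

In this sketch the t-statistic of the interaction term is the same in both fits, while the t-statistic of the x1 main effect depends on where x2 is centered.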
Ray, I wasn't very clear.
If the original regression is
y = a0 +b0*x
with the standard error of b0 equal to s0
and you replace x with
x1 = c + d*x
then in the regression
y = a1 +b1*x1
we will find b1 = b0/d and the standard error s1=s0/d. The
t-statistic will be b1/s1=b0/s0
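The single-regressor claim above can be checked with a short sketch, assuming NumPy, simulated data, and the usual OLS standard errors; the helper `ols_t`, the constants c and d, and the generating values are made up for illustration, and d is taken to be positive:

```python
import numpy as np

def ols_t(x, y):
    """OLS of y on an intercept and x: coefficients, standard errors, t-statistics."""
    A = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])   # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return coef, se, coef / se

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, size=100)
y = 3.0 + 0.7 * x + rng.normal(size=100)       # made-up generating model

c, d = -10.0, 2.5                              # x1 = c + d*x, with d > 0
coef_old, se_old, t_old = ols_t(x, y)
coef_new, se_new, t_new = ols_t(c + d * x, y)

print(np.isclose(coef_new[1], coef_old[1] / d))   # b1 = b0/d
print(np.isclose(se_new[1], se_old[1] / d))       # s1 = s0/d
print(np.isclose(t_new[1], t_old[1]))             # slope t-statistic unchanged
```

This covers only the slope in a model with no product term; the intercept and its t-statistic generally do change, which is the same mechanism that moves the main-effect coefficients once an interaction is in the model.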
-Dick

Posted: Wed May 14, 2008 9:05 pm
On Wed, 14 May 2008 17:16:53 -0400, Richard Ulrich
<Rich.Ulrich at (no spam) comcast.net> wrote:
Quote: snip
(Concerning another post - I don't know what they
do in economics; there are some geniuses working there, and
there also are some flat-out statistical idiots.)
This is true. It is also true in all other areas where statistics is
applied.
:)

Richard Ulrich...
Posted: Thu May 15, 2008 12:10 am
On Wed, 14 May 2008 19:05:23 -0700, richardstartz at (no spam) comcast.net wrote:
Quote: On Wed, 14 May 2008 17:16:53 -0400, Richard Ulrich
<Rich.Ulrich at (no spam) comcast.net> wrote:
snip
(Concerning another post - I don't know what they
do in economics; there are some geniuses working there, and
there also are some flat-out statistical idiots.)
This is true. It is also true in all other areas where statistics is
applied.
:)
- In economics, they can seem to achieve some
success, and even become Presidential advisors....
Plenty of predictions, but no "controlled experiments".
The Republicans started treating economics as a
subset of "spin-doctoring" with Reagan, and haven't
applied "scientific" standards to it since the 1980s.
--
Rich Ulrich
http://www.pitt.edu/~wpilib/index.html

Posted: Thu May 15, 2008 4:50 am
On May 14, 2:16 pm, Richard Ulrich <Rich.Ulr... at (no spam) comcast.net> wrote:
Quote: [snip, previous]
Thank you to everybody for their advice in this thread - it's really
been invaluable for me.

Posted: Thu May 15, 2008 8:52 am
On Thu, 15 May 2008 01:10:48 -0400, Richard Ulrich
<Rich.Ulrich at (no spam) comcast.net> wrote:
Quote: On Wed, 14 May 2008 19:05:23 -0700, richardstartz at (no spam) comcast.net wrote:
On Wed, 14 May 2008 17:16:53 -0400, Richard Ulrich
<Rich.Ulrich at (no spam) comcast.net> wrote:
snip
(Concerning another post - I don't know what they
do in economics; there are some geniuses working there, and
there also are some flat-out statistical idiots.)
This is true. It is also true in all other areas where statistics is
applied.
:)
- In economics, they can seem to achieve some
success, and even become Presidential advisors....
Plenty of predictions, but no "controlled experiments".
The Republicans started treating economics as a
subset of "spin-doctoring" with Reagan, and haven't
applied "scientific" standards to it since the 1980s.
Rich:
In fact economists do controlled experiments. It's probably the most
"fashionable" area in economics right now.
-Dick

All times are GMT - 5 Hours