Science Forum Index  »  Statistics - Math Forum  »  Regression...
sagar...
Posted: Wed Jun 25, 2008 10:09 pm
Guest
Hi,
Suppose I've a regression model as follows:
I'm regressing per capita consumption on level of income, age & family
size in Case 1.
I'm regressing per capita consumption again on income, age & level of
education (say) in Case 2.
Will the beta coefficients of income & age remain the same in both
Case 1 & Case 2? If not, why?
Arijit.
lw...
Posted: Fri Jun 27, 2008 4:18 am
Guest
They may not stay the same if there is collinearity between your
predictor variables. Check their covariance (or correlation) matrix to
see whether that's what's happening. If you are using R there is also a
function, vif, in the DAAG package that estimates the variance inflation
of your coefficient estimates due to collinearity between the predictors.
Collinearity means that the magnitudes of your coefficients don't have a
ready physical interpretation, because the effects are intertwined. You
can get around the variance inflation of the coefficient estimates by
transforming onto principal component axes, which may give you better
predictive ability, but you still won't have physically interpretable
coefficients.
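
A minimal R sketch of those two checks, assuming simulated data (the variable values and the 0.8 income-education correlation are invented for illustration). The VIF is computed by hand from its definition, 1/(1 - R^2), rather than with DAAG's vif, so that the block runs in base R.

```r
# Hypothetical data: education is built to correlate with income; age is independent.
set.seed(1)
n <- 500
income    <- rnorm(n)
age       <- rnorm(n)
education <- 0.8 * income + 0.6 * rnorm(n)   # corr(income, education) is about 0.8

X <- data.frame(income, age, education)

# Step 1: look at the pairwise correlations of the predictors.
print(round(cor(X), 2))

# Step 2: variance inflation factor for each predictor,
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 2))
```

A VIF near 1 (age here) means the predictor is nearly unrelated to the others; larger values (income and education here) mean its coefficient estimate is less precisely determined.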
lw...
Posted: Fri Jun 27, 2008 5:56 am
Guest
On Jun 27, 10:47 am, Richard Startz <richardstar... at (no spam) comcast.net>
wrote:
Quote:
The usual interpretation of the regression coefficients is as the
partial derivative of the dependent variable with respect to the
particular independent variable. This is not affected by collinearity
between the predictors.
-Dick Startz

It would be nice to be able to always interpret the regression
coefficients this way, but it only makes sense if the predictors are
actually independent. What does a partial derivative with respect to
one variable mean if other variables necessarily change in correlation
with that variable?

I suppose I should have said multicollinearity, i.e. where two or more
predictor variables in a multiple regression model are highly
correlated, but it certainly can have a large effect on coefficient
estimates. Maindonald (2003) has a stark example of this in its
chapter on multiple linear regression (6.7.1), or you could also see
the Wikipedia page (http://en.wikipedia.org/wiki/Multicollinearity).
As a simple example, imagine predicting the height of people by using
measurements of the length of each of their feet. You might expect
the coefficient of one foot to simply double if you removed the other
foot from the model, but in reality both estimated coefficients would
be wildly inaccurate due to their high degree of collinearity. If you
removed one foot from the model it may well tell you that someone has
a negative height based on the measurement of one of their feet.

Ref. Maindonald and Braun (2003). Data Analysis and Graphics Using
R, An Example-based Approach. Cambridge University Press, New York NY.
Richard Startz...
Posted: Fri Jun 27, 2008 9:47 am
Guest
On Fri, 27 Jun 2008 07:18:26 -0700 (PDT), lw <leviwaldron at (no spam) gmail.com>
wrote:

Quote:
They may not stay the same if there is collinearity between your
predictor variables. Check their covariance (or correlation) matrix to
see whether that's what's happening. If you are using R there is also a
function, vif, in the DAAG package that estimates the variance inflation
of your coefficient estimates due to collinearity between the predictors.
Collinearity means that the magnitudes of your coefficients don't have a
ready physical interpretation, because the effects are intertwined. You
can get around the variance inflation of the coefficient estimates by
transforming onto principal component axes, which may give you better
predictive ability, but you still won't have physically interpretable
coefficients.

The usual interpretation of the regression coefficients is as the
partial derivative of the dependent variable with respect to the
particular independent variable. This is not affected by collinearity
between the predictors.
-Dick Startz
Paul Rubin...
Posted: Fri Jun 27, 2008 10:38 am
Guest
Richard Startz wrote:
Quote:
The usual interpretation of the regression coefficients is as the
partial derivative of the dependent variable with respect to the
particular independent variable. This is not affected by collinearity
between the predictors.
-Dick Startz

It's the partial derivative of the conditional mean of the d.v. given
the i.v.s. If you change the i.v.s, you're changing which conditional
mean you're talking about.

Suppose I have d.v. Y and possible i.v.s X, Z and W. For simplicity,
we'll assume they all have mean zero and std. dev. 1. Let's also assume
that X and Z are correlated but X and W are uncorrelated. If I look at Y
~ X + Z v. Y ~ X + W, I get equations

Y = b1*X + b2*Z + noise v. Y = c1*X + c2*W + noise

where as usual we'll assume the noise is uncorrelated to everybody in
sight (and the noises in the two equations are different). Let <,>
denote the covariance operator; then the first equation gives me

<Y,X> = b1 + b2*<Z,X>

and the second gives me

<Y,X> = c1

exploiting the lack of correlation between X, W and the noises. So b1 =
c1 iff Z and X are uncorrelated; I assumed they were correlated, so I
expect different coefficients for X in the two equations.
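
The same argument can be checked numerically. The sketch below simulates data under assumed values (true coefficients 1 and 0.5, corr(X, Z) = 0.6; none of these numbers come from the posts above) and fits both models. Under those assumptions the covariance identity predicts a coefficient on X of about 1 in the first model and about 1 + 0.5*0.6 = 1.3 in the second.

```r
# Hypothetical population: X and Z correlated (0.6), X and W uncorrelated.
set.seed(42)
n <- 100000
X <- rnorm(n)
Z <- 0.6 * X + 0.8 * rnorm(n)        # sd 1, corr(X, Z) = 0.6
W <- rnorm(n)                        # independent of X
Y <- 1.0 * X + 0.5 * Z + rnorm(n)    # assumed "true" relationship

b_xz <- coef(lm(Y ~ X + Z))          # Y ~ X + Z
c_xw <- coef(lm(Y ~ X + W))          # Y ~ X + W

print(round(b_xz, 3))
print(round(c_xw, 3))

# Check <Y,X> = b1 + b2*<Z,X> (approximate, since var(X) is only close to 1)
lhs <- cov(Y, X)
rhs <- unname(b_xz["X"] + b_xz["Z"] * cov(Z, X))
print(c(cov_YX = lhs, b1_plus_b2_covZX = rhs, c1 = unname(c_xw["X"])))
```

Z is left out of the second model, so its coefficient on X absorbs the part of Z's effect that travels with X.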

/Paul
Richard Startz...
Posted: Fri Jun 27, 2008 11:07 am
Guest
On Fri, 27 Jun 2008 11:38:38 -0400, Paul Rubin <rubin at (no spam) msu.edu> wrote:

Quote:
It's the partial derivative of the conditional mean of the d.v. given
the i.v.s. If you change the i.v.s, you're changing which conditional
mean you're talking about.

Suppose I have d.v. Y and possible i.v.s X, Z and W. For simplicity,
we'll assume they all have mean zero and std. dev. 1. Let's also assume
that X and Z are correlated but X and W are uncorrelated. If I look at Y
~ X + Z v. Y ~ X + W, I get equations

Y = b1*X + b2*Z + noise v. Y = c1*X + c2*W + noise

where as usual we'll assume the noise is uncorrelated to everybody in
sight (and the noises in the two equations are different). Let <,>
denote the covariance operator; then the first equation gives me

<Y,X> = b1 + b2*<Z,X>

and the second gives me

<Y,X> = c1

exploiting the lack of correlation between X, W and the noises. So b1 =
c1 iff Z and X are uncorrelated; I assumed they were correlated, so I
expect different coefficients for X in the two equations.

/Paul

I completely agree with your explanation of why one expects different
coefficients. Perhaps I misunderstood the point you made about
multicollinearity. What I should have said is that the regression
coefficients have the interpretation of the partial derivative of the
conditional mean, and this is true even though the independent
variables are correlated with one another.

-Dick
Richard Startz...
Posted: Fri Jun 27, 2008 11:13 am
Guest
On Fri, 27 Jun 2008 08:56:38 -0700 (PDT), lw <leviwaldron at (no spam) gmail.com>
wrote:

Quote:
On Jun 27, 10:47 am, Richard Startz <richardstar... at (no spam) comcast.net>
wrote:
The usual interpretation of the regression coefficients is as the
partial derivative of the dependent variable with respect to the
particular independent variable. This is not affected by collinearity
between the predictors.
-Dick Startz

It would be nice to be able to always interpret the regression
coefficients this way, but it only makes sense if the predictors are
actually independent. What does a partial derivative with respect to
one variable mean if other variables necessarily change in correlation
with that variable?

A partial derivative answers the question of what happens if you
change one variable while holding the others constant. This is true
whether or not there is multicollinearity. As you cogently point out,
sometimes one *can't* change one variable without changing another.

But there are many instances in which data has been gathered with
highly correlated independent variables and one is nonetheless
planning on a policy intervention that changes one variable without
changing the others.

Quote:
I suppose I should have said multicollinearity, ie where two or more
predictor variables in a multiple regression model are highly
correlated, but it certainly can have a large effect on coefficient
estimates. Maindonald (2003) has a stark example of this in its
chapter on multiple linear regression (6.7.1), or you could also see
the Wikipedia page (http://en.wikipedia.org/wiki/Multicollinearity).
As a simple example, imagine predicting the height of people by using
measurements of the length of each of their feet. You might expect
the coefficient of one foot to simply double if you removed the other
foot from the model, but in reality both estimated coefficients would
be wildly inaccurate due to their high degree of collinearity. If you
removed one foot from the model it may well tell you that someone has
a negative height based on the measurement of one of their feet.


Presumably, the standard errors would be very large in this case, and
the investigator would conclude that neither foot coefficient is
individually significant but that the two foot coefficients are jointly
significant.
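
That pattern is easy to reproduce. A minimal sketch in R, assuming made-up foot and height numbers (not taken from any real data set): the individual standard errors are large, while the joint F-test of both foot coefficients against an intercept-only model is overwhelmingly significant.

```r
# Hypothetical data: right foot is almost identical to left foot.
set.seed(7)
n      <- 50
left   <- rnorm(n, mean = 26, sd = 1.5)          # foot length, cm (invented)
right  <- left + rnorm(n, sd = 0.05)             # nearly the same measurement
height <- 40 + 2.5 * left + 2.5 * right + rnorm(n, sd = 3)

full <- lm(height ~ left + right)
null <- lm(height ~ 1)

# Individual t-tests: estimates, standard errors, p-values.
coef_table <- summary(full)$coefficients
print(round(coef_table, 3))

# Joint F-test that both foot coefficients are zero.
joint_p <- anova(null, full)[["Pr(>F)"]][2]
print(joint_p)
```

With this setup each foot's standard error is typically several times larger than the true coefficient of 2.5, so the individual t-tests say little even though the joint test is decisive.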

-Dick Startz
RichUlrich...
Posted: Fri Jun 27, 2008 3:14 pm
Guest
On Fri, 27 Jun 2008 08:56:38 -0700 (PDT), lw <leviwaldron at (no spam) gmail.com>
wrote:

Quote:
On Jun 27, 10:47 am, Richard Startz <richardstar... at (no spam) comcast.net>
wrote:
The usual interpretation of the regression coefficients is as the
partial derivative of the dependent variable with respect to the
particular independent variable. This is not affected by collinearity
between the predictors.
-Dick Startz

It would be nice to be able to always interpret the regression
coefficients this way, but it only makes sense if the predictors are
actually independent. What does a partial derivative with respect to
one variable mean if other variables necessarily change in correlation
with that variable?

- I think that Dick S. missed the specification in the original
problem: Case 2 swaps one predictor for a different (and
presumably correlated) one.

Quote:

I suppose I should have said multicollinearity, ie where two or more
predictor variables in a multiple regression model are highly
correlated, but it certainly can have a large effect on coefficient
estimates. Maindonald (2003) has a stark example of this in its
chapter on multiple linear regression (6.7.1), or you could also see
the Wikipedia page (http://en.wikipedia.org/wiki/Multicollinearity).
As a simple example, imagine predicting the height of people by using
measurements of the length of each of their feet. You might expect
the coefficient of one foot to simply double if you removed the other
foot from the model, but in reality both estimated coefficients would
be wildly inaccurate due to their high degree of collinearity.

That "but" is too strong. I *do* expect the coefficients of each
foot to be the same, and about half the single-foot value, because
I've seen that sort of thing. (Well, I'm extrapolating... with
Left-Right feet, I would never start with both, but start with
predictors that used the "average" and the "difference".)

Both estimated coefficients *might* be wildly inaccurate.
Because the 2-dimensional confidence ellipse is long and narrow,
there is a great range of pairs-of-coefficients which give
very-nearly the same predictions.
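
A small R sketch of that long, narrow ellipse, on simulated data (all numbers are invented for illustration). The correlation between the two coefficient estimates comes out close to -1, and two very different coefficient pairs with the same sum give almost the same predictions.

```r
# Hypothetical near-duplicate predictors.
set.seed(3)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.02)
y  <- x1 + x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Correlation between the estimated coefficients of x1 and x2:
# a value near -1 is the algebraic face of a long, narrow ellipse.
est_cor <- cov2cor(vcov(fit))["x1", "x2"]
print(est_cor)

# Two very different coefficient pairs whose sum is the same (2):
pred_a <- 1 * x1 + 1 * x2
pred_b <- 11 * x1 - 9 * x2
max_gap <- max(abs(pred_a - pred_b))   # equals 10 * max|x1 - x2|
print(max_gap)
```

The gap between the two sets of predictions stays well below the noise standard deviation of 1, which is why the data cannot tell the pairs apart.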


--
Rich Ulrich
lw...
Posted: Sat Jun 28, 2008 3:46 pm
Guest
On Jun 27, 4:14 pm, RichUlrich <rich.ulr... at (no spam) comcast.net> wrote:
Quote:
That "but" is too strong.  I *do*  expect the coefficients of each
foot to be the same, and about half the single-foot value,  because
I've seen that sort of thing.  (Well, I'm extrapolating... with
Left-Right feet, I would never start with both, but start with
predictors that used the "average" and the "difference".)

Both estimated coefficients *might*  be wildly inaccurate.
Because the 2-dimensional confidence ellipse is long and narrow,
there is a great range of pairs-of-coefficients which give
very-nearly the same predictions.  

I don't know, is it too strong? As Dick S. pointed out, the standard
errors of the coefficients will be very large, so if they turned out
to be about the same and about half of the single-foot value, it would
just be luck. Here's a contrived and rather extreme example I ran in
R. x1 and x2 are nearly the same, differing only by a small amount of
noise (Pearson correlation of 0.999993), and y is just the sum of
the two with more noise added. A linear model of the form y ~ x1+x2
tends to result in one large positive coefficient and one large
negative coefficient, adding up to about 2. In this case the
coefficients are 37 and -35, each with a standard error of about 29.


> x1 <- 1:10
> x2 <- x1+rnorm(10,sd=0.01)
> y <- x1+x2+rnorm(10,sd=1)
> x1
 [1]  1  2  3  4  5  6  7  8  9 10
> x2
 [1]  0.987971  1.985931  2.995624  3.991472  4.983735  5.987358  7.027451
 [8]  7.992228  9.011095 10.005329
> cor(x1,x2)
[1] 0.999993
> y
 [1]  2.873586  4.688333  4.711484  6.487986 10.227832 12.632284 13.158161
 [8] 15.659828 16.782769 20.887456
> junklm <- lm(y~x1+x2)
> summary(junklm)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4291 -0.6496  0.1609  0.6301  1.1325

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.5935     0.8421  -0.705    0.504
x1           36.8800    28.9650   1.273    0.244
x2          -34.8266    28.8900  -1.205    0.267

Residual standard error: 0.9885 on 7 degrees of freedom
Multiple R-squared: 0.979, Adjusted R-squared: 0.973
F-statistic: 163.4 on 2 and 7 DF, p-value: 1.334e-06

> betterlm <- lm(y~I(x1+x2)+I(x1-x2))
> summary(betterlm)

Call:
lm(formula = y ~ I(x1 + x2) + I(x1 - x2))

Residuals:
    Min      1Q  Median      3Q     Max
-1.4291 -0.6496  0.1609  0.6301  1.1325

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.59352    0.84206  -0.705    0.504
I(x1 + x2)   1.02670    0.06602  15.552 1.10e-06 ***
I(x1 - x2)  35.85327   28.92744   1.239    0.255
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9885 on 7 degrees of freedom
Multiple R-squared: 0.979, Adjusted R-squared: 0.973
F-statistic: 163.4 on 2 and 7 DF, p-value: 1.334e-06
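
To see how much of a single run like this is luck, the same setup can be repeated many times. A sketch, using the same x1, noise levels, and model formula as the session above, with an arbitrary seed and an arbitrary 1000 repetitions (the helper name sim_once is made up):

```r
# Repeat the contrived experiment and collect the two slope estimates.
set.seed(2008)
sim_once <- function() {
  x1 <- 1:10
  x2 <- x1 + rnorm(10, sd = 0.01)
  y  <- x1 + x2 + rnorm(10, sd = 1)
  coef(lm(y ~ x1 + x2))[c("x1", "x2")]
}
draws <- t(replicate(1000, sim_once()))   # 1000 x 2 matrix, columns x1 and x2

sd_x1    <- sd(draws[, "x1"])             # spread of the x1 coefficient alone
coef_sum <- rowSums(draws)                # x1 coefficient + x2 coefficient
sd_sum   <- sd(coef_sum)

print(c(sd_x1 = sd_x1, sd_sum = sd_sum, mean_sum = mean(coef_sum)))
```

Across repetitions the individual coefficients swing by tens of units in either direction, while their sum stays close to 2.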
RichUlrich...
Posted: Sun Jun 29, 2008 7:06 pm
Guest
On Sat, 28 Jun 2008 18:46:28 -0700 (PDT), lw <leviwaldron at (no spam) gmail.com>
wrote:

Quote:
I don't know, is it too strong? As Dick S. pointed out, the standard
errors of the coefficients will be very large, so if they turned out
to be about the same and about half of the single-foot value, it would
just be luck. Here's a contrived and rather extreme example I ran in
R. x1 and x2 are nearly the same, differing only by a small amount of
noise (Pearson correlation of 0.999993), and y is just the sum of
the two with more noise added. A linear model of the form y ~ x1+x2
tends to result in one large positive coefficient and one large
negative coefficient, adding up to about 2. In this case the
coefficients are 37 and -35, each with a standard error of about 29.

[snip, detail]


Okay, here is a more careful explanation of my point.

Given two measures that are "almost the same", the smart way
to model them is to replace them with their average and difference.
Is the difference important? In your model, it *is*.

If it shows up that way, it is something to take special note of.
You figure out what it means by... taking the difference, and
looking at what that derived score means.

That is not likely to happen in the sort of data that I typically
model, where each variable has its own bit of error, and the
reliabilities combine to make the sum almost ideal. (Data with
small Ns and outliers can behave otherwise.) These are the
data that split the loadings in half, though not very precisely.


--
Rich Ulrich
Richard Startz...
Posted: Sun Jun 29, 2008 10:59 pm
Guest
On Sun, 29 Jun 2008 20:06:56 -0400, RichUlrich
<rich.ulrich at (no spam) comcast.net> wrote:

Quote:
Okay, here is a more careful explanation of my point.

Given two measures that are "almost the same", the smart way
to model them is to replace them with their average and difference.
Is the difference important? In your model, it *is*.

If it shows up that way, it is something to take special note of.
You figure out what it means by... taking the difference, and
looking at what that derived score means.

That is not likely to happen in the sort of data that I typically
model, where each variable has its own bit of error, and the
reliabilities combine to make the sum almost ideal. (Data with
small Ns and outliers can behave otherwise.) These are the
data that split the loadings in half, though not very precisely.

Replacing two variables with their average and difference doesn't
change the regression at all, although it may make for a more
convenient interpretation.

If the estimated coefficients on the original variables (x1 and x2)
are b1 and b2, and the estimated coefficients on (x1-x2) and (x1+x2)/2
are c1 and c2, then

b1 = c1+c2/2 and b2 = -c1+c2/2

The R-squared and residuals will be identical.
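
A quick numerical check of these identities, as a sketch on simulated data (the data and the variable names d and a are made up for illustration):

```r
# Hypothetical data with two highly correlated predictors.
set.seed(11)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
y  <- x1 + x2 + rnorm(n)

d <- x1 - x2          # difference
a <- (x1 + x2) / 2    # average

fit_orig <- lm(y ~ x1 + x2)   # coefficients b1, b2
fit_new  <- lm(y ~ d + a)     # coefficients c1 (on d), c2 (on a)

b  <- coef(fit_orig)
cc <- coef(fit_new)

# b1 = c1 + c2/2 and b2 = -c1 + c2/2
ok_b1 <- isTRUE(all.equal(unname(b["x1"]),  unname(cc["d"] + cc["a"] / 2), tolerance = 1e-6))
ok_b2 <- isTRUE(all.equal(unname(b["x2"]), unname(-cc["d"] + cc["a"] / 2), tolerance = 1e-6))

# Same fit: identical residuals and R-squared.
ok_resid <- isTRUE(all.equal(unname(residuals(fit_orig)), unname(residuals(fit_new)), tolerance = 1e-6))
ok_r2    <- isTRUE(all.equal(summary(fit_orig)$r.squared, summary(fit_new)$r.squared, tolerance = 1e-6))

print(c(b1 = ok_b1, b2 = ok_b2, residuals = ok_resid, r_squared = ok_r2))
```

The two fits span the same column space, so only the labelling of the coefficients (and hence their standard errors) changes.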

-Dick Startz
RichUlrich...
Posted: Mon Jun 30, 2008 2:21 pm
Guest
On Sun, 29 Jun 2008 20:59:53 -0700, Richard Startz
<richardstartz1 at (no spam) comcast.net> wrote:

Quote:
Replacing two variables with their average and difference doesn't
change the regression at all, although it may make for a more
convenient interpretation.

If the estimated coefficients on the original variables (x1 and x2)
are b1 and b2, and the estimated coefficients on (x1-x2) and (x1+x2)/2
are c1 and c2, then

b1 = c1+c2/2 and b2 = -c1+c2/2

The R-squared and residuals will be identical.

-Dick Startz

Yes, thanks for adding that. "More convenient interpretation"
is the big gain. If you're not into interpretation, forget it.

In this case, the gain includes having standard errors and
tests that are usable.

--
cprice...
Posted: Tue Jul 01, 2008 2:32 am
Guest
I think the intuitive side of the answer has been neglected here, or
at least, could be stated a little more directly.

The slopes of your independent variables show the effect of that
variable on the dep variable while holding constant all of the other
indep variables you chose to include. Another way to phrase this is
that the slope controls for the effect of the other indep variables
included. Or phrased yet another way, the slope shows the effect of
that variable independent of the effect of any of the other variables.
(and yet another way is the partial derivative idea, which isn't
helpful to someone who doesn't know calculus)


Consider the two models: y=b1*x1 + b2*x2 + e and y=c1*x1 + c2*x2 +
c3*x3 + u


The slope b1 shows the effect of x1 on y, while controlling for x2. It
is not independent of x3, it does not control for x3, however you want
to say it.

The slope c1 shows the effect of x1 on y, while controlling for x2 and
x3.



Now, if the variable x1 just so happens to be independent of x3, then
not controlling for the effect of x3 is fine, and the effect of x1 on
y (ie, the slopes b1 and c1) will be the same, whether you include x3
or not.
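
One way to make this precise is the standard omitted-variable identity: within a sample, b1 = c1 + c3*d1, where d1 is the coefficient on x1 when x3 is regressed on x1 and x2. So b1 and c1 coincide when x3 has no effect (c3 = 0) or when x3 is unrelated to x1 once x2 is accounted for (d1 = 0). A sketch in R on simulated data (all numbers invented; every regression includes an intercept):

```r
# Hypothetical data in which x3 is correlated with x1.
set.seed(5)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.7 * x1 + rnorm(n)
y  <- 1 + 2 * x1 - x2 + 1.5 * x3 + rnorm(n)

short <- coef(lm(y ~ x1 + x2))        # b's: x3 left out
long  <- coef(lm(y ~ x1 + x2 + x3))   # c's: x3 included
aux   <- coef(lm(x3 ~ x1 + x2))       # d's: x3 explained by x1 and x2

b1      <- unname(short["x1"])
implied <- unname(long["x1"] + long["x3"] * aux["x1"])   # c1 + c3*d1

print(c(b1 = b1, c1 = unname(long["x1"]), c1_plus_c3_d1 = implied))
identity_holds <- isTRUE(all.equal(b1, implied, tolerance = 1e-8))
print(identity_holds)
```

Here b1 lands well above c1 because x3 both matters for y and moves with x1; setting the 0.7 to 0 would bring the two close together.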





Now, closer to the original question, consider the two models:


y=b1*x1 + b2*x2 + e and y=c1*x1 + c2*x3 + u


The slope b1 shows the effect of x1 on y, while controlling for x2. It
is not independent of x3, it does not control for x3, however you want
to say it.

The slope c1 shows the effect of x1 on y, while controlling just for
x3.

When will b1 and c1 be the same?

They will be the same when controlling for x2 and x3 is not needed to
see the effect of x1 on y.

That is to say, they will be the same when x1 is independent of the
variables x2 and x3. This is the same conclusion reached in Paul
Rubin's post, but I think this answer is more intuitive.


This answer also raises another question, which I will save for a new
post.



-CP









cprice...
Posted: Tue Jul 01, 2008 2:42 am
Guest
Consider the following model:

y = k + b1*x1 + b2*x2 + e


The slope b1 measures the effect of x1 on y, while holding x2
constant. We can get b1 by running the above regression, or in the
following way, using two single-variable regressions, now with all
variables mean-centered and variance=1.


We want the part of x1 that is independent of x2. We can get this by
doing

x1 = c1*x2 + u


The residuals here, u, are uncorrelated with x2, and represent the
component of x1 that is independent of x2.


Now regress y on these residuals, and this slope will represent the
effect of x1 on y, while holding x2 constant.

y = w1*u


And we should have w1 = b1.
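
This equality is the two-variable case of what is usually called the Frisch-Waugh-Lovell theorem. A numerical check in R, as a sketch on simulated data (the numbers are invented, and intercepts are used in place of the mean-centering described above):

```r
# Hypothetical data with x1 and x2 correlated.
set.seed(9)
n  <- 200
x2 <- rnorm(n)
x1 <- 0.5 * x2 + rnorm(n)
y  <- 3 + 1.2 * x1 - 0.7 * x2 + rnorm(n)

# Route 1: the multiple regression.
b1 <- unname(coef(lm(y ~ x1 + x2))["x1"])

# Route 2: two single-variable regressions.
u  <- residuals(lm(x1 ~ x2))          # part of x1 not explained by x2
w1 <- unname(coef(lm(y ~ u))["u"])    # slope of y on that residual

print(c(b1 = b1, w1 = w1))
same_slope <- isTRUE(all.equal(b1, w1, tolerance = 1e-8))
print(same_slope)
```

A matching pair of numbers is only a check, not a proof, but it shows that the agreement is exact in the sample and not just approximate.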


My question is whether someone can supply a proof of this. The result is
taken from Pindyck and Rubinfeld, "Econometric Models and Economic
Forecasts", 2nd ed., appendix 4.2. There they show a proof using ordinary
scalar algebra (very messy) for this specific case of a two-variable
regression, but I am sure there must be some way to prove it in general
using linear algebra...



-CP


 
All times are GMT - 5 Hours