
How does "after correcting for...." work?...

Author Message
root...
Posted: Fri Oct 23, 2009 10:23 pm
Guest
I just heard a news story about a new look at the
data collected in the Framingham heart study. The
researchers want to tease out some effects in the
data while correcting for the interaction of some
other factors. I know how an experiment can be
designed to minimize the interaction of unwanted
variables, but in this case the data are fixed
and have been so over the last 30 years of collection.

I often hear researchers make a statement like:
"every hour of daily exercise extends one's lifetime
by two hours, after correcting for such effects as
diet and genetic factors". I just made that statement
up as an example. Continuing with the example,
suppose the researcher has a wealth of data including
age at death, exercise history, dietary habits, age
at death of all family members, whatever might be
relevant. Just how does one process the data to
eliminate the effects of the confounding factors?

Thanks very much for any answers.
 
Bruce Weaver...
Posted: Sat Oct 24, 2009 7:43 am
Guest
root wrote:
[quote]I just heard a news story about a new look at the
data collected in the Framingham heart study. The
researchers want to tease out some effects in the
data while correcting for the interaction of some
other factors. I know how an experiment can be
designed to minimize the interaction of unwanted
variables, but in this case the data are fixed
and have been so over the last 30 years of collection.

I often hear researchers make a statement like:
"every hour of daily exercise extends one's lifetime
by two hours, after correcting for such effects as
diet and genetic factors". I just made that statement
up as an example. Continuing with the example,
suppose the researcher has a wealth of data including
age at death, exercise history, dietary habits, age
at death of all family members, whatever might be
relevant. Just how does one process the data to
eliminate the effects of the confounding factors?

Thanks very much for any answers.
[/quote]
I think it's easiest to see what this means in the context of
categorical variables. Here's an example of a case-control study.
E+ and E- refer to Yes and No for exposure to the variable of
interest. Cases are people who have the disease of interest, and
Controls are people who do not have the disease. Here is the 2x2 table:

       Case   Ctl
E+      400   410
E-      600   590

The odds ratio is 0.959 (95% CI, 0.802, 1.147).
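
For anyone who wants to check these figures, here is a minimal Python
sketch. It assumes the interval quoted above is the usual log-based
(Woolf) 95% interval; the helper name odds_ratio_ci is made up for this
illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    z = normal quantile (1.96 is assumed here for a 95% interval)
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

crude = odds_ratio_ci(400, 410, 600, 590)
print("Crude OR: %.3f (%.3f, %.3f)" % crude)
```

Calling the same function with the counts of any other 2x2 table gives
that table's odds ratio and interval in the same way.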

But now, let's look at the 2x2 tables for males and females
separately.

           Male                  Female
       Case   Ctl             Case   Ctl
E+      160    80      E+      240   330
E-      440   320      E-      160   270

OR for Males: 1.455 (1.073, 1.972)
OR for Females: 1.227 (0.949, 1.586)

If I wanted to "adjust for gender" in the way you describe above, I
could run a logistic regression model that includes both Exposure
and Gender as variables (but not the interaction between the two).
The odds ratio for Exposure from that model is:

OR for exposure: 1.318 (1.083, 1.603)
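
A rough sketch of such a model fit, for the curious. This is not the
software used for the number above; it is a hand-rolled Newton-Raphson
fit on the four grouped cells, written with the Python standard library
only. The variable names and the 0/1 coding (exposed, male) are
assumptions made for the illustration.

```python
import math

# Grouped data: (exposed, male, cases, controls)
cells = [
    (1, 1, 160, 80),
    (0, 1, 440, 320),
    (1, 0, 240, 330),
    (0, 0, 160, 270),
]

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Model: logit P(case) = b0 + b1*exposed + b2*male  (no interaction term)
beta = [0.0, 0.0, 0.0]
for _ in range(25):  # fixed number of Newton-Raphson steps, ample here
    grad = [0.0] * 3
    hess = [[0.0] * 3 for _ in range(3)]
    for exposed, male, cases, controls in cells:
        x = [1.0, exposed, male]
        n = cases + controls
        p = 1 / (1 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x))))
        for i in range(3):
            grad[i] += x[i] * (cases - n * p)              # score
            for j in range(3):
                hess[i][j] += x[i] * x[j] * n * p * (1 - p)  # information
    step = solve(hess, grad)
    beta = [bj + sj for bj, sj in zip(beta, step)]

adjusted_or = math.exp(beta[1])
print("Adjusted OR for exposure: %.3f" % adjusted_or)
```

The exponentiated coefficient on exposed is the gender-adjusted odds
ratio for exposure.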

This odds ratio is a pooled estimate (or weighted average, if you
like) of the odds ratios for males & females. The Mantel-Haenszel
method of pooling odds ratios yields a nearly identical result:

Pooled OR (MH Method): 1.318 (1.084, 1.604)
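
The Mantel-Haenszel point estimate is easy to compute by hand: sum
a*d/n over the strata and divide by the sum of b*c/n. A small Python
sketch follows (the function name is just for illustration, and the
confidence interval is left out):

```python
# Each stratum: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = {
    "Male":   (160, 80, 440, 320),
    "Female": (240, 330, 160, 270),
}

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio over a collection of 2x2 tables."""
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Stratum-specific odds ratios, then the pooled estimate
for name, (a, b, c, d) in strata.items():
    print("OR for %s: %.3f" % (name, (a * d) / (b * c)))

pooled = mantel_haenszel_or(strata.values())
print("Pooled OR (MH): %.3f" % pooled)
```

Note that the pooled value lies between the two stratum-specific odds
ratios, while the crude odds ratio from the collapsed table does not.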

So in general terms, when you adjust for a categorical variable,
you are in essence computing the measure of effect size (e.g.,
odds ratio) for each stratum of that variable, and then computing
a weighted average of all those effect sizes. Of course, pooling
across the strata like this only makes sense if there is no
interaction between the variables. (The presence of an
interaction between the variables tells you that the effect of one
variable varies across the levels of the other, and so it would
not make a lot of sense to compute a pooled estimate.)
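
One rough way to check that assumption is a test of homogeneity of the
stratum-specific odds ratios. The sketch below uses Woolf's test, which
is my choice for the illustration (the Breslow-Day test is another
common option); it compares each stratum's log odds ratio with an
inverse-variance weighted average.

```python
import math

# (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = [(160, 80, 440, 320), (240, 330, 160, 270)]

log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in strata]
# Inverse of the Woolf variance of each log odds ratio
weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in strata]

pooled_log_or = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
chi2 = sum(w * (l - pooled_log_or) ** 2 for w, l in zip(weights, log_ors))
df = len(strata) - 1  # 1 here, because there are two strata

# For 1 df only, the chi-square upper tail is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print("Woolf chi-square = %.3f on %d df, p = %.3f" % (chi2, df, p_value))
```

A large p-value gives no strong evidence against a common odds ratio,
though tests like this tend to have limited power, so it is not proof
that there is no interaction.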

HTH.

--
Bruce Weaver
bweaver at (no spam) lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
 
root...
Posted: Sat Oct 24, 2009 11:01 am
Guest
Bruce Weaver <bweaver at (no spam) lakeheadu.ca> wrote:
[quote]
So in general terms, when you adjust for a categorical variable,
you are in essence computing the measure of effect size (e.g.,
odds ratio) for each stratum of that variable, and then computing
a weighted average of all those effect sizes. Of course, pooling
across the strata like this only makes sense if there is no
interaction between the variables. (The presence of an
interaction between the variables tells you that the effect of one
variable varies across the levels of the other, and so it would
not make a lot of sense to compute a pooled estimate.)

HTH.

[/quote]
I don't see how to extend your yes/no to cases where the
control variables are continuous. In fact, suppose all
the extraneous factors are continuous so we can perform
any level of regression.

Thanks for responding.
 
Ray Koopman...
Posted: Sat Oct 24, 2009 1:19 pm
Guest
On Oct 24, 1:23 am, root <NoEM... at (no spam) home.org> wrote:
[quote]I just heard a news story about a new look at the
data collected in the Framingham heart study. The
researchers want to tease out some effects in the
data while correcting for the interaction of some
other factors. I know how an experiment can be
designed to minimize the interaction of unwanted
variables, but in this case the data are fixed and
have been so over the last 30 years of collection.

I often hear researchers make a statement like:
"every hour of daily exercise extends one's lifetime
by two hours, after correcting for such effects as
diet and genetic factors". I just made that statement
up as an example. Continuing with the example,
suppose the researcher has a wealth of data including
age at death, exercise history, dietary habits, age
at death of all family members, whatever might be
relevant. Just how does one process the data to
eliminate the effects of the confounding factors?

Thanks very much for any answers.
[/quote]
"Correcting" the regression of y on x1 for the effects
of x2,x3,... usually means getting the residuals from
the regression of x1 on x2,x3,... and then using those
residuals to predict y.
 
root...
Posted: Sat Oct 24, 2009 2:17 pm
Guest
Ray Koopman <koopman at (no spam) sfu.ca> wrote:
[quote]
"Correcting" the regression of y on x1 for the effects
of x2,x3,... usually means getting the residuals from
the regression of x1 on x2,x3,... and then using those
residuals to predict y.
[/quote]
When I first read your response I thought you meant
the *one* residual resulting from a regression of
x1 against all the other Xi. On second reading
I think you mean to generate a residual for each
regression:
x1 against x2, x1 against x3, ....
and then forming a regression of y against the
collection of the residuals.

Each of the residuals is orthogonal to x1, so
this regression wouldn't give any information
about y against x1, which is our original objective.

Do you mean to follow up with still another regression?
This ultimate regression would regress the residual of
the previous regression against x1. I have to ponder
this for a while, but it seems that this would work.

Thanks for the response.
 
root...
Posted: Sat Oct 24, 2009 2:43 pm
Guest
root <NoEMail at (no spam) home.org> wrote:
[quote]Ray Koopman <koopman at (no spam) sfu.ca> wrote:

"Correcting" the regression of y on x1 for the effects
of x2,x3,... usually means getting the residuals from
the regression of x1 on x2,x3,... and then using those
residuals to predict y.

When I first read your response I thought you meant
the *one* residual resulting from a regression of
x1 against all the other Xi. On second reading
I think you mean to generate a residual for each
regression:
x1 against x2, x1 against x3, ....
and then forming a regression of y against the
collection of the residuals.

Each of the residuals is orthogonal to x1, so
this regression wouldn't give any information
about y against x1, which is our original objective.

Do you mean to follow up with still another regression?
This ultimate regression would regress the residual of
the previous regression against x1. I have to ponder
this for a while, but it seems that this would work.

Thanks for the response.
[/quote]
After pondering I don't like it. For one reason I
can't see how to extend the method to the case
where we are interested in more than one independent
and want to eliminate the effects of the extraneous
independents.

Suppose we have the dependent Y, and two classes
of independents: Xi, factors of interest, and
Zi factors whose effect we want to eliminate.

First we regress Y against the Zi and develop
the residual. This residual is orthogonal to
the subspace formed by the Zi. Now we can
regress that residual against the Xi. This
second regression will try to explain only
that part of the variation in Y which cannot
be explained by the Zi.
 
Ray Koopman...
Posted: Sat Oct 24, 2009 3:47 pm
Guest
On Oct 24, 5:43 pm, root <NoEM... at (no spam) home.org> wrote:
[quote]root <NoEM... at (no spam) home.org> wrote:
Ray Koopman <koop... at (no spam) sfu.ca> wrote:
"Correcting" the regression of y on x1 for the effects
of x2,x3,... usually means getting the residuals from
the regression of x1 on x2,x3,... and then using those
residuals to predict y.

When I first read your response I thought you meant
the *one* residual resulting from a regression of
x1 against all the other Xi. On second reading
I think you mean to generate a residual for each
regression:
x1 against x2, x1 against x3, ....
and then forming a regression of y against the
collection of the residuals.

Each of the residuals is orthogonal to x1, so
this regression wouldn't give any information
about y against x1, which is our original objective.

Do you mean to follow up with still another regression?
This ultimate regression would regress the residual of
the previous regression against x1. I have to ponder
this for a while, but it seems that this would work.

Thanks for the response.

After pondering I don't like it. For one reason I
can't see how to extend the method to the case
where we are interested in more than one independent
and want to eliminate the effects of the extraneous
independents.

Suppose we have the dependent Y, and two classes
of independents: Xi, factors of interest, and
Zi factors whose effect we want to eliminate.

First we regress Y against the Zi and develop
the residual. This residual is orthogonal to
the subspace formed by the Zi. Now we can
regress that residual against the Xi. This
second regression will try to explain only
that part of the variation in Y which cannot
be explained by the Zi.
[/quote]
Close, but no cigar. To correct the regression of Y on X1
for the effects of X2,X3,... you need to regress Y on the
component of X1 that is orthogonal to the space spanned
by X2,X3,.... You get that by regressing X1 on X2,X3,...
simultaneously -- one multiple regression equation,
not several univariate regression equations -- and then
regressing Y on the residuals from that regression. Note
that all this is done automatically, with each predictor
corrected for the effects of all the others, when you
regress Y on X1,X2,X3,... simultaneously.
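
A small numerical check of that equivalence, as a sketch only. The data
are made up, and ols and residuals are tiny helper functions written
for this illustration (plain normal equations, standard library only),
not any particular package's routines.

```python
def ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept.

    Returns [intercept, slope_1, slope_2, ...]. Solves the normal
    equations directly, which is adequate for a tiny illustration.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residuals(columns, y):
    """Residuals of y after regressing it on the columns (with intercept)."""
    b = ols(columns, y)
    return [y[i] - b[0] - sum(bj * col[i] for bj, col in zip(b[1:], columns))
            for i in range(len(y))]

# Made-up data: X1 is correlated with X2 and X3.
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x3 = [2, 1, 1, 3, 2, 4, 3, 5]
x1 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [4.1, 2.2, 7.9, 5.3, 11.2, 8.8, 14.9, 12.1]

# (a) One multiple regression of Y on X1, X2, X3 simultaneously.
b_full = ols([x1, x2, x3], y)

# (b) Regress X1 on X2, X3 simultaneously, then regress Y on those residuals.
x1_resid = residuals([x2, x3], x1)
b_two_step = ols([x1_resid], y)

print("X1 coefficient, multiple regression: %.6f" % b_full[1])
print("X1 coefficient, residual method:     %.6f" % b_two_step[1])
```

The two printed coefficients should agree up to rounding error, which
is the point of the last sentence above.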
 
root...
Posted: Sat Oct 24, 2009 10:22 pm
Guest
Ray Koopman <koopman at (no spam) sfu.ca> wrote:
[quote]On Oct 24, 5:43 pm, I wrote:

First we regress Y against the Zi and develop
the residual. This residual is orthogonal to
the subspace formed by the Zi. Now we can
regress that residual against the Xi. This
second regression will try to explain only
that part of the variation in Y which cannot
be explained by the Zi.
[/quote]
Here I meant the i in Xi and Zi to stand for several indices.
In the paragraph above, if we are interested in only
one variable, x1, then Xi=x1, and the Zi are x2,x3,x4.

[quote]
Close, but no cigar. To correct the regression of Y on X1
for the effects of X2,X3,... you need to regress Y on the
component of X1 that is orthogonal to the space spanned
by X2,X3,.... You get that by regressing X1 on X2,X3,...
simultaneously -- one multiple regression equation,
not several univariate regression equations -- and then
regressing Y on the residuals from that regression. Note
that all this is done automatically, with each predictor
corrected for the effects of all the others, when you
regress Y on X1,X2,X3,... simultaneously.
[/quote]
Thanks again. I think you are saying what I was trying
to say. In my paragraph above I was trying to extend
the problem to the case where we have several variables
of interest, and several more we want to ignore. In
my paragraph above, when we have only x1 of interest
what I said is exactly what you are saying.
 
root...
Posted: Sun Oct 25, 2009 7:39 pm
Guest
I wrote:
[quote]Ray Koopman <koopman at (no spam) sfu.ca> wrote:
To correct the regression of Y on X1
for the effects of X2,X3,... you need to regress Y on the
component of X1 that is orthogonal to the space spanned
by X2,X3,.... You get that by regressing X1 on X2,X3,...
simultaneously -- one multiple regression equation,
not several univariate regression equations -- and then
regressing Y on the residuals from that regression. Note
that all this is done automatically, with each predictor
corrected for the effects of all the others, when you
regress Y on X1,X2,X3,... simultaneously.

Thanks again. I think you are saying what I was trying
to say. In my paragraph above I was trying to extend
the problem to the case where we have several variables
of interest, and several more we want to ignore. In
my paragraph above, when we have only x1 of interest
what I said is exactly what you are saying.

[/quote]
I think the proposed method only provides a lower bound for the desired
effect. I show this by example: suppose you want to determine the effect
of sunlight on the average vitamin C content of an apple. Your
dependent variable is the vitamin C content of apples over time. You use
as independent variables the variation of sunlight over time, and some
other "independents" that are affected by sunlight. Note that we know
sunlight isn't affected by whatever the other independents are, so the
real effect of sunlight is simply that determined by the regression
of apple vitamin C against sunlight. In the real world, though, the
relationship between independents isn't always clear.

Following the proposed method of isolating the effect of sunlight, you
first regress apple vitamin C against the other independents, and form
the residuals of that regression. You then determine the effect of
sunlight by regressing the residuals against sunlight. However, some of
the effect of sunlight has already been removed, so this second
regression provides a lower bound to the effect of sunlight. Similarly,
a simple regression of the original dependent against sunlight provides
an upper bound to the effect [in the case of sunlight the upper bound is
the answer].
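
To make the example concrete, here is a sketch with made-up, noise-free
numbers. The variable names, the single intermediate variable and all
coefficients are assumptions for the illustration, and ols and
residuals are small helpers written for it. It compares three slopes
for sunlight: the simple regression, the multiple regression, and the
two-step "residual of the dependent" method described above. The
ordering it shows holds for this constructed data; it is not offered as
a general result.

```python
def ols(columns, y):
    """Least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def residuals(columns, y):
    """Residuals of y after regressing it on the columns (with intercept)."""
    b = ols(columns, y)
    return [y[i] - b[0] - sum(bj * col[i] for bj, col in zip(b[1:], columns))
            for i in range(len(y))]

sunlight = [1, 2, 3, 4, 5, 6, 7, 8]
# Part of the intermediate variable that is not driven by sunlight
wiggle = [1, -1, -1, 1, 1, -1, -1, 1]
intermediate = [2 * s + w for s, w in zip(sunlight, wiggle)]
# Vitamin C depends on sunlight directly (0.5) and via the intermediate (0.3)
vitamin_c = [1 + 0.5 * s + 0.3 * m for s, m in zip(sunlight, intermediate)]

# Simple regression: picks up the direct and the indirect path.
simple = ols([sunlight], vitamin_c)[1]
# Multiple regression: sunlight slope with the intermediate held fixed.
adjusted = ols([sunlight, intermediate], vitamin_c)[1]
# Two-step method: residualize vitamin C only, then regress on raw sunlight.
two_step = ols([sunlight], residuals([intermediate], vitamin_c))[1]

print("simple:   %.4f" % simple)
print("adjusted: %.4f" % adjusted)
print("two-step: %.4f" % two_step)
```

In this data the two-step slope comes out far smaller than the multiple
regression slope, because sunlight itself was never residualized, and
the simple regression slope is the largest of the three.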
 
 
All times are GMT - 5 Hours