
Linear Regression With Dependent Weights?...

Eric M. Van...
Posted: Fri Nov 06, 2009 10:42 am
Guest
I have a dependent variable which can be nicely modeled with a linear
expression on 8 variables. (n = 578, r = .977, p values of the variables
nearly all in the 10^-5 range or better.)

The catch is this: both logic and an error analysis show that the linear
weights are not fixed, but are in fact themselves dependent on a master
controlling variable. That master variable has a relatively narrow range
_in this data set_, which is why we can get such a nice correlation, but
we would like to be able to use this model in instances where the master
variable is much lower or higher and hence where significant errors may
enter.

Currently I have nice _average values_ for the linear weights. What I'd
like to do is express each one as a different linear function of the
master variable (they may not be linear functions but it will get us a
lot closer to the truth!).

Is there any algorithm / software tool that can crack this?

I'm hopeful that there are indeed ways to deal with this because I can
think of real-world situations that seem like they might behave like
this. For instance, you might have an IQ predictor based on a regression
on 4 variables, but the weights of the variables might depend on socio-
economic status. It would seem to be a logical next step in constructing
models of reality.

(For those who are curious, what I'm working on is run scoring in
baseball! The relative importance of a 2B or HR versus, say a SB, is
very much a function of the overall OBP -- as OBP goes up the differences
between various offensive events become smaller, as OBP goes down events
such as the HR become much more important than a SB. You therefore can't
use a model with fixed weights to accurately estimate run scoring in a
single game.)
 
Ray Koopman...
Posted: Fri Nov 06, 2009 10:42 am
Guest
On Nov 6, 9:24 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[quote]On Nov 6, 7:42 am, "Eric M. Van" <em... at (no spam) post.harvard.edu> wrote:
[...]

In some circles your 'master' variable is called a 'moderator'
variable. To model its linear action, add it and its products with
the predictors to the model. For instance, if you have 4 predictors
(x1,x2,x3,x4) and 2 moderators (z1,z2), your model would be

y = b00 + b10*x1 + b20*x2 + b30*x3 + b40*x4 +
b01*z1 + b11*x1*z1 + b21*x2*z1 + b31*x3*z1 + b41*x4*z1 +
b02*z2 + b12*x1*z2 + b22*x2*z2 + b32*x3*z2 + b42*x4*z2 + error.

In general, if you have p predictors and q moderators,
there will be (p+1)(q+1) parameters to estimate.
[/quote]
Addendum: Note that the moderated model can be interpreted as

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + error,

with b0 = b00 + b01*z1 + b02*z2,
b1 = b10 + b11*z1 + b12*z2,
b2 = b20 + b21*z1 + b22*z2,
b3 = b30 + b31*z1 + b32*z2.
 
Ray Koopman...
Posted: Fri Nov 06, 2009 10:42 am
Guest
On Nov 6, 9:40 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[quote][...]

Addendum: Note that the moderated model can be interpreted as

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + error,

with b0 = b00 + b01*z1 + b02*z2,
b1 = b10 + b11*z1 + b12*z2,
b2 = b20 + b21*z1 + b22*z2,
b3 = b30 + b31*z1 + b32*z2.
[/quote]
Corrected Addendum: The moderated model can be interpreted as

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + error,

with b0 = b00 + b01*z1 + b02*z2,
b1 = b10 + b11*z1 + b12*z2,
b2 = b20 + b21*z1 + b22*z2,
b3 = b30 + b31*z1 + b32*z2,
b4 = b40 + b41*z1 + b42*z2.
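
As a rough illustration of this reading of the model, the sketch below collapses a table of moderated coefficients into ordinary weights b0..b4 at a chosen (z1, z2), then checks that the short form and the fully expanded form give the same prediction. The function name effective_weights, the coefficient values in B, and the sample x and z are all made up for the example; nothing here comes from the original data.

```python
import numpy as np

def effective_weights(B, z):
    """Collapse moderated coefficients into weights on (1, x1..xp) at a given z.

    B[j, k] holds b_jk, so row j is (b_j0, b_j1, ..., b_jq).
    """
    z1 = np.concatenate(([1.0], np.asarray(z, dtype=float)))  # (1, z1..zq)
    return B @ z1  # entry j is b_j0 + b_j1*z1 + ... + b_jq*zq

# Hypothetical coefficients for 4 predictors and 2 moderators (made-up numbers).
B = np.array([
    [1.0,  0.5, -0.2],   # b00, b01, b02
    [2.0, -1.0,  0.0],   # b10, b11, b12
    [0.5,  0.0,  0.3],   # b20, b21, b22
    [1.5,  0.2,  0.1],   # b30, b31, b32
    [3.0, -0.4,  0.6],   # b40, b41, b42
])

z = [1.0, 2.0]                          # hypothetical moderator values
x = np.array([0.3, -1.2, 0.8, 2.0])     # hypothetical predictor values

b = effective_weights(B, z)
y_short = b[0] + b[1:] @ x              # y = b0 + b1*x1 + ... + b4*x4

# Same prediction from the expanded model: sum over j, k of b_jk * x_j * z_k
x1 = np.concatenate(([1.0], x))
z1 = np.concatenate(([1.0], z))
y_long = x1 @ B @ z1
```

At z1 = z2 = 0 the effective weights reduce to the first column of B, i.e. b00, b10, ..., b40.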
 
Ray Koopman...
Posted: Fri Nov 06, 2009 10:42 am
Guest
On Nov 6, 7:42 am, "Eric M. Van" <em... at (no spam) post.harvard.edu> wrote:
[quote][...]
[/quote]
In some circles your 'master' variable is called a 'moderator'
variable. To model its linear action, add it and its products with
the predictors to the model. For instance, if you have 4 predictors
(x1,x2,x3,x4) and 2 moderators (z1,z2), your model would be

y = b00 + b10*x1 + b20*x2 + b30*x3 + b40*x4 +
b01*z1 + b11*x1*z1 + b21*x2*z1 + b31*x3*z1 + b41*x4*z1 +
b02*z2 + b12*x1*z2 + b22*x2*z2 + b32*x3*z2 + b42*x4*z2 + error.

In general, if you have p predictors and q moderators,
there will be (p+1)(q+1) parameters to estimate.
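
A minimal sketch of how such a model could be set up and fitted with ordinary least squares, assuming NumPy is available. The helper name moderated_design and the synthetic data (random X, Z, and true_b) are assumptions made for illustration; only the sample size 578 is borrowed from the question.

```python
import numpy as np

def moderated_design(X, Z):
    """Cross the columns (1, x1..xp) with (1, z1..zq).

    Returns an n x (p+1)(q+1) matrix; column k*(p+1) + j holds z_k * x_j,
    with x_0 = z_0 = 1, which matches the row-by-row layout of the model.
    """
    n = X.shape[0]
    X1 = np.column_stack([np.ones(n), X])   # n x (p+1)
    Z1 = np.column_stack([np.ones(n), Z])   # n x (q+1)
    # out[i, k, j] = X1[i, j] * Z1[i, k], then flatten the last two axes
    return np.einsum("ij,ik->ikj", X1, Z1).reshape(n, -1)

# Synthetic data, made up for the example.
rng = np.random.default_rng(0)
n, p, q = 578, 4, 2
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))

D = moderated_design(X, Z)                   # 578 x 15
true_b = rng.normal(size=(p + 1) * (q + 1))  # b00, b10, ..., b40, b01, ...
y = D @ true_b + 0.1 * rng.normal(size=n)

# Ordinary least squares on the expanded design matrix
b_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
print(D.shape[1], "parameters estimated")
```

Any regression package that accepts extra columns would do the same job; the only work is building the product columns before fitting.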
 
Paul...
Posted: Fri Nov 06, 2009 10:42 am
Guest
On Nov 6, 10:42 am, "Eric M. Van" <em... at (no spam) post.harvard.edu> wrote:
[quote][...]
[/quote]
I might be missing something here, but why not make it a quadratic
polynomial? Regress the response variable on the original
predictors, the master variable, and the product of the master
variable with each of the original predictors. Since it's a
polynomial, you can still use OLS regression.
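
For the single-moderator case, a sketch along these lines compares a fixed-weight fit with the fit that adds the master variable and its products. Everything in it is synthetic and assumed for the example: the 8 random predictors, the narrow range chosen for z, the intercepts a and slopes c of the weights, and the noise level.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 578, 8
X = rng.normal(size=(n, p))            # synthetic predictors
z = rng.uniform(0.30, 0.36, size=n)    # synthetic master variable, narrow range

a = rng.normal(size=p)                 # hypothetical weight intercepts
c = rng.normal(size=p)                 # hypothetical weight slopes in z
weights = a + np.outer(z, c)           # weights[i, j] = a_j + c_j * z_i
y = 2.0 + 3.0 * z + np.sum(weights * X, axis=1) + 0.05 * rng.normal(size=n)

ones = np.ones((n, 1))
D_fixed = np.hstack([ones, X])                            # 1 + 8 columns
D_mod = np.hstack([ones, X, z[:, None], X * z[:, None]])  # 2 * (8 + 1) columns

def ols_rss(D, y):
    """Fit by least squares; return (residual sum of squares, coefficients)."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    return float(resid @ resid), beta

rss_fixed, _ = ols_rss(D_fixed, y)
rss_mod, beta_mod = ols_rss(D_mod, y)
print("RSS, fixed weights:", rss_fixed)
print("RSS, with products:", rss_mod)
```

Because the fixed-weight model is nested inside the larger one, the second RSS cannot be larger than the first. Note that when z has a narrow range, each column x_j*z is nearly proportional to x_j, so the individual slope estimates can be noisy; subtracting the mean of z before forming the products is a common way to reduce that collinearity.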

/Paul
 
Eric M. Van...
Posted: Sat Nov 07, 2009 5:51 am
Guest
Ray, this is PRECISELY what I need, and you explained it at exactly the
level I understand. Thankfully I just have one moderator but I now know
how to deal with multiple ones if I run into that. Thanks muchly.

"Paul" also provided, I think, the same answer (thanks, Paul!), but (as
someone whose last formal stats class was in prep school in 1972) it went
over my head!


Ray Koopman <koopman at (no spam) sfu.ca> wrote in news:e9213085-aa6d-47cb-8e64-
287097a4c3bb at (no spam) m33g2000pri.googlegroups.com:

[quote]On Nov 6, 9:40 am, Ray Koopman <koop... at (no spam) sfu.ca> wrote:
[...]


Corrected Addendum: The moderated model can be interpreted as

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + error,

with b0 = b00 + b01*z1 + b02*z2,
b1 = b10 + b11*z1 + b12*z2,
b2 = b20 + b21*z1 + b22*z2,
b3 = b30 + b31*z1 + b32*z2,
b4 = b40 + b41*z1 + b42*z2.
[/quote]
 
 