Regression model under-predicts high values of Y? What can I
Guest
Posted: Wed Jan 17, 2007 10:45 pm
Hi all, I'm modeling a DV that ranges between 1 and 30, with a mean of
11.38 and a standard deviation of 5.042. My models are consistently
under-predicting the higher values of Y (the highest Y hat is 18), which
incidentally are the cases I am most interested in predicting correctly
(they are the most important according to my theory). I have tried taking
the natural log of the DV, and this does not help. Does anyone have
suggestions on what I can do to improve my explanatory power on the
higher values of Y?
Ray Koopman
Posted: Thu Jan 18, 2007 4:04 am
Guest
That's called "regression to the mean". It's not a bug but a feature,
intrinsic to all least-squares regression. As long as the fit is not
perfect, the residuals are always positively correlated with y:
r(y,y-yhat) = sqrt[1 - R(y,X)^2].
The only way to reduce r(y,y-yhat) is to get better predictors,
that is, to raise R(y,X)^2.
Sanni
Posted: Thu Jan 18, 2007 4:58 am
Guest
Hi,

I don't know whether you have tried a logarithmic model. Take the log
of the dependent variable, leave the independent variables as they
are, and fit the model. Then, I hope, you will get good results for
the higher values of Y too.


David Winsemius
Posted: Sat Jan 20, 2007 11:51 pm
Guest
Suggestions:


Include interaction terms.

Check your residuals with a normal Q-Q plot.

Consider other non-linear models (besides log-transforming Y). Some models
may have the same effect on fit as interactions but have simpler forms.
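
The first two suggestions can be sketched in plain Python. This is an illustration under assumptions, not a recommended workflow: two predictors named x1 and x2, an interaction column formed as their product, and a small normal-equations solver standing in for a statistics package. All function names are hypothetical.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_with_interaction(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2.
    Returns [b0, b1, b2, b3]."""
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    k = 4
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs.
    Points close to the line y = x suggest roughly normal residuals."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    ordered = sorted((r - mean) / sd for r in residuals)
    std_normal = statistics.NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))
```

The pairs returned by `normal_qq_points` can be passed to any plotting tool; systematic curvature at the upper end would point to the kind of misfit at high Y that the question describes.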

--
David Winsemius
 