pbrewster@hotmail.com wrote:
Hi all, I'm modeling a DV that ranges between 1 and 30, with a mean of
11.38 and standard deviation of 5.042. My models are consistently
under-predicting the higher values of Y (highest Y hat is 1

, which
incidentally are the cases I am most interested in predicting correctly
(most important according to my theory). I have tried taking the
natural log of the DV, and this does not help. Does anyone have
suggestions on what I can do to improve my explanatory power on the
higher values of Y?
That's called "regression to the mean". It's not a bug, but a feature
that is intrinsic to all least-squares regression. The residuals are
always positively correlated with y: r(y,y-yhat) = sqrt[1 - R(y,X)^2],
so high values of y tend to be under-predicted and low values
over-predicted.
The only way to reduce r(y,y-yhat) is to get better predictors,
to raise R(y,X)^2.
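
The identity follows because, in a least-squares fit with an intercept,
the residuals are uncorrelated with yhat, so cov(y, y-yhat) = var(y-yhat)
and r(y,y-yhat) = sd(y-yhat)/sd(y) = sqrt[1 - R^2]. Here is a small
sketch that checks this numerically. The data are simulated and purely
illustrative (not the poster's data), and the function name
residual_correlation is just a hypothetical helper for this example.

```python
import numpy as np

def residual_correlation(X, y):
    """Fit OLS with an intercept; return (corr(y, y - yhat), sqrt(1 - R^2))."""
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    resid = y - A @ beta                           # residuals y - yhat
    r_y_resid = np.corrcoef(y, resid)[0, 1]        # correlation of y with residuals
    r2 = 1.0 - resid.var() / y.var()               # R^2 of the fit
    return r_y_resid, np.sqrt(1.0 - r2)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 11.0 + 3.0 * x + rng.normal(scale=4.0, size=500)

r_y_resid, expected = residual_correlation(x, y)
print(r_y_resid, expected)   # the two numbers agree up to rounding error
```

Adding a predictor that actually explains y raises R^2 and shrinks both
numbers together, which is the point made above.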