
Overfitting, correction for optimism and shrinkage...

Author Message
Daniel...
Posted: Wed Sep 09, 2009 8:37 am
Guest
Good afternoon all,

I want to develop a prediction model for a specific outcome of
interest. Among the many issues of concern is that of overfitting. I
have been reading on the subject and have a few questions. Some of my
references, namely:

- Harrell, Lee and Mark, "Multivariable prognostic models: issues in
developing models, evaluating assumptions and adequacy, and measuring and
reducing errors" (Stat Med 1996, vol. 15, pp. 361-387)
- Clinical Prediction Models, EW Steyerberg (chapters 5, 13)

largely focus on developing a prediction regression model and then
obtaining performance measures that are corrected for optimism (for
example, R square, c statistic). These references also discuss
shrinkage of regression coefficients to improve predictions from a
regression model. While I understand the importance of correcting the
performance measures for optimism, so as to not overstate the merits
of the prediction model when using another dataset, I would have
thought of approaching the subject from a different angle. That is, I
would have focused on shrinkage first, in order to obtain the
shrinkage factor and then correct the regression parameter estimates
of my model using that shrinkage factor, and then obtain
performance measures using the model with shrinkage-corrected
regression parameter estimates. Because our goal is to publish this
prediction model so that clinicians may use it in their practice, we
would want to focus on shrinkage first to get "better" parameter
estimates and then interest ourselves in the optimism-corrected
performance measures. Are these two approaches similar, or am I
missing an important point here?

Thank you,

Daniel
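
[Editor's sketch] To make the two ideas in the question concrete, here is a minimal Python sketch (NumPy only), assuming an ordinary linear model and simulated data. The function names (`fit_ols`, `r_squared`, `bootstrap_validate`) and the dataset are illustrative assumptions, not taken from the cited references. It uses a single bootstrap loop to estimate both the optimism in R square and a uniform shrinkage factor, taken here as the average calibration slope of the bootstrap models when applied to the original data.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, slopes)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def r_squared(y, pred):
    """1 - SSE/SST; may be negative on data the model was not fit to."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def bootstrap_validate(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    b0, b = fit_ols(X, y)
    apparent = r_squared(y, b0 + X @ b)
    optimism, slopes = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        Xb, yb = X[idx], y[idx]
        c0, c = fit_ols(Xb, yb)                   # refit the model on the bootstrap sample
        # Optimism: performance on the bootstrap sample minus performance
        # of the same bootstrap model on the original data.
        optimism.append(r_squared(yb, c0 + Xb @ c) - r_squared(y, c0 + X @ c))
        # Calibration slope: regress the original y on the bootstrap model's
        # linear predictor; values below 1 indicate overfitting.
        _, s = fit_ols((X @ c).reshape(-1, 1), y)
        slopes.append(s[0])
    shrink = float(np.mean(slopes))
    b_shrunk = shrink * b                         # uniformly shrunken slopes
    # Re-estimate the intercept so the mean prediction equals the mean outcome.
    b0_shrunk = float(y.mean() - X.mean(axis=0) @ b_shrunk)
    return {
        "apparent_r2": float(apparent),
        "optimism_corrected_r2": float(apparent - np.mean(optimism)),
        "shrinkage_factor": shrink,
        "intercept_shrunk": b0_shrunk,
        "slopes_shrunk": b_shrunk,
    }

# Simulated example (an assumption for illustration): 40 cases, 8 candidate
# predictors, only the first two truly related to the outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 8))
y = X @ np.array([1.0, 0.5, 0, 0, 0, 0, 0, 0]) + rng.normal(size=40)

res = bootstrap_validate(X, y)
print(res)
```

In this sketch the shrunken coefficients are the ones that would go into a published equation, while the optimism-corrected R square is the estimate of how well the model might perform on new patients. The two quantities come out of the same resampling, so they need not be treated as competing approaches.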
 
Rich Ulrich...
Posted: Wed Sep 09, 2009 7:18 pm
Guest
On Wed, 9 Sep 2009 11:37:25 -0700 (PDT), Daniel
<daniel.biostatistics at (no spam) gmail.com> wrote:

[quote:306e4ddbd9]Good afternoon all,

I want to develop a prediction model for a specific outcome of
interest. Among the many issues of concern is that of overfitting. I
have been reading on the subject and have a few questions. Some of my
references, namely:

- Harrell, Lee and Mark, "Multivariable prognostic models: issues in
developing models, evaluating assumptions and adequacy, and measuring and
reducing errors" (Stat Med 1996, vol. 15, pp. 361-387)
- Clinical Prediction Models, EW Steyerberg (chapters 5, 13)

largely focus on developing a prediction regression model and then
obtaining performance measures that are corrected for optimism (for
example, R square, c statistic). These references also discuss
shrinkage of regression coefficients to improve predictions from a
regression model.
[/quote:306e4ddbd9]
I am curious about what those sources say about shrinkage,
since I haven't seen its use for anything clinical.

We had a poster a couple of years ago who was adamant
that there was never anything useful in it.

My own impression is that it can be useful when you have highly
collinear predictors such that the resulting equation has
"suppressor" variables, and you can't get rid of them by
re-scaling or re-framing the set of predictors. (That is --
take logs or whatever; or compute and use a difference
of two similar predictors instead of entering them both.)

If that's the case, then shrinkage could work. But don't forget
the cross-validation.
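
[Editor's sketch] The collinearity case described above can be illustrated with a small Python example (NumPy only). Ridge regression is used here as one well-known form of coefficient shrinkage; that choice, the helper name `ridge`, the penalty value 5.0 and the simulated data are all assumptions made for illustration, not something stated in the post.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)       # both predictors matter equally

def ridge(X, y, lam):
    """Ridge slopes on centered data; lam = 0 gives ordinary least squares."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

X = np.column_stack([x1, x2])
b_ols = ridge(X, y, 0.0)      # unstable when predictors are highly collinear
b_ridge = ridge(X, y, 5.0)    # penalty pulls the coefficients toward zero

print("OLS slopes:  ", b_ols)
print("ridge slopes:", b_ridge)
```

With predictors this collinear, the unpenalized slopes can come out large with opposite signs even though both true effects are positive, which is the "suppressor" pattern. The penalty should be chosen by cross-validation rather than fixed in advance as it is here.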

[quote:306e4ddbd9]While I understand the importance of correcting the
performance measures for optimism, so as to not overstate the merits
of the prediction model when using another dataset, I would have
thought of approaching the subject from a different angle. That is, I
would have focused on shrinkage first, in order to obtain the
shrinkage factor and then correct the regression parameter estimates
of my model using the said shrinkage factor, and then obtaining
performance measures using the model with shrinkage-corrected
regression parameter estimates. Because our goal is to publish this
prediction model so that clinicians may use it in their practice, we
would want to focus on shrinkage first to get "better" parameter
estimates and then interest ourselves in the optimism-corrected
performance measures. Are these two approaches similar, or am I
missing an important point here?
[/quote:306e4ddbd9]
Cross-validation is how you check the model. Yes, the shrunken
version should perform better under cross-validation than the original
model; that's the justification for it.
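
[Editor's sketch] One way to run that check is k-fold cross-validation of the same model with and without shrinkage. The Python sketch below (NumPy only) is a minimal illustration: the function name `kfold_mse`, the fixed shrinkage factor of 0.8 and the simulated data are assumptions. In an honest analysis the shrinkage factor would be re-estimated inside each training fold instead of being fixed up front.

```python
import numpy as np

def kfold_mse(X, y, shrink, k=5, seed=0):
    """Mean squared prediction error over k folds for OLS slopes times `shrink`."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        slopes = shrink * beta[1:]
        # Intercept re-estimated so training predictions average to the training mean.
        intercept = y[train].mean() - X[train].mean(axis=0) @ slopes
        pred = intercept + X[test] @ slopes
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

# Simulated data (an assumption): small N relative to the number of predictors.
rng = np.random.default_rng(7)
X = rng.normal(size=(40, 10))
y = 0.5 * X[:, 0] + rng.normal(size=40)

print("CV MSE, unshrunken:", kfold_mse(X, y, shrink=1.0))
print("CV MSE, shrunken:  ", kfold_mse(X, y, shrink=0.8))
```

Both calls use the same seed, so the two versions are compared on identical folds. Whether the shrunken version actually wins depends on the data; the comparison is the point of the exercise.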

I wonder - The people that I have seen try to construct clinical
predictors almost *always* start out with too many variables,
and they want to use stepwise selection to get a slimmer model.
You don't mention that, but that would be a far bigger hazard
to replication than the simple issue of shrinkage. The only cure
for "too many variables" -- if you won't trim them rationally --
is *extensive* cross-validation. Like, using 5 or 10 times as
many cases to validate as are used to derive predictions.

How big is your N?

--
Rich Ulrich
 
 