Science Forum Index » Statistics - Math Forum » generating outliers in a simulation study...
A.A.A... (Guest)
Posted: Thu Jul 03, 2008 12:50 am
Hi all,
I want to test the performance of a certain mathematical programming
model in the presence of outliers. This should be done through a
simulation study, so I want to generate datasets containing
outliers. How could this be done?
Thanks in advance
Russell... (Guest)
Posted: Thu Jul 03, 2008 3:03 am
Not to sound flip, but probably in an infinite
number of ways. The one you'll want depends on
what you're testing, how rigorous you want to be,
and maybe other things. First, you probably need
to decide what distribution you expect the values
to follow in the absence of outliers, so you can
determine which values will actually be outliers.
You may also care about the quality of the algorithm
that generates those values (some random number
generators have poor characteristics, others
better ones). Then you need to decide on the
distribution of the outliers and how to generate
them. Without more details it is difficult to be
more specific. You may need to sit down with a
consultant to really communicate your needs and get
adequate help. This medium isn't the best for such
collaborative work.
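To make those steps concrete, here is a minimal sketch in Python, not a
recommendation for any particular study. It assumes the "clean" values are
N(0, 1) and the outliers come from N(8, 1). Both distributions, the function
name contaminated_sample, and the 5% outlier fraction are illustrative
assumptions, not something taken from the original question.

```python
import random

def contaminated_sample(n, outlier_fraction, seed=0):
    """Draw n values: mostly N(0, 1), a fixed fraction from N(8, 1).

    Returns (values, is_outlier) so the true outlier labels are known.
    All distribution parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded generator, so runs are reproducible
    n_out = round(n * outlier_fraction)
    # Choose which positions receive an outlier.
    outlier_idx = set(rng.sample(range(n), n_out))
    values, is_outlier = [], []
    for i in range(n):
        if i in outlier_idx:
            values.append(rng.gauss(8.0, 1.0))  # assumed outlier distribution
            is_outlier.append(True)
        else:
            values.append(rng.gauss(0.0, 1.0))  # assumed "clean" distribution
            is_outlier.append(False)
    return values, is_outlier

values, is_outlier = contaminated_sample(n=200, outlier_fraction=0.05, seed=42)
print(sum(is_outlier), "outliers out of", len(values))
```

Returning the labels alongside the values means the simulation can later
check which points the model under test treated as outliers.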
Cheers,
Russell
Paul Rubin... (Guest)
Posted: Thu Jul 03, 2008 10:18 am
This question comes up frequently in the area of math programming models
for classification analysis (and hopefully I won't get flamed for
mentioning them in a stats group). As Russell says, there are all
sorts of ways outliers can crop up. Since "classical" discriminant
analysis is grounded in multivariate normal (MVN) populations, a common
approach is to generate MVN data and contaminate it with data from a
different population (usually also MVN). I think the intent there is to
simulate what you might call a defective sampling situation, in which
you think a sample comes entirely from one population but in fact it
does not.
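A rough sketch of that contamination scheme, in two dimensions and using only
the Python standard library. The means, standard deviations, correlations, and
the function names bivariate_normal and contaminated_mvn are all assumptions
chosen for illustration. A real study would take them from the application.

```python
import math
import random

def bivariate_normal(rng, mean, sd, rho):
    """One draw from a bivariate normal with the given means, standard
    deviations, and correlation rho, built from two independent N(0, 1)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mean[0] + sd[0] * z1
    y = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (x, y)

def contaminated_mvn(n, contamination, seed=0):
    """n points nominally from one population; each point is instead drawn
    from a second population with probability `contamination`."""
    rng = random.Random(seed)
    points, from_contaminant = [], []
    for _ in range(n):
        if rng.random() < contamination:
            # Assumed contaminating population: shifted mean, wider spread.
            points.append(bivariate_normal(rng, (4.0, 4.0), (2.0, 2.0), 0.0))
            from_contaminant.append(True)
        else:
            # Assumed nominal population.
            points.append(bivariate_normal(rng, (0.0, 0.0), (1.0, 1.0), 0.5))
            from_contaminant.append(False)
    return points, from_contaminant

points, from_contaminant = contaminated_mvn(n=500, contamination=0.1, seed=7)
print(sum(from_contaminant), "contaminating points out of", len(points))
```

Here the number of contaminating points is itself random, which mimics
sampling from a mixed population. Fixing the count instead is an equally
reasonable design.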
I suspect it's impossible to test whether a model/method is robust to
all types of outliers. Assuming there is a specific application
context, I think Russell has the right idea: ask context experts what
sorts of outlier problems they tend to encounter. Then test against the
most common type(s).
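As a sketch of what "testing against" one outlier type might look like, the
snippet below compares how far two stand-in estimators drift from the true
centre when part of each sample is replaced. The sample mean and median are
only placeholders for the model under study, and the N(10, 1) outlier
distribution, sample size, and replication count are assumptions.

```python
import random
import statistics

def simulate_error(estimator, outlier_fraction, n=100, reps=200, seed=0):
    """Average absolute error of `estimator` for the true centre 0.0 when
    each N(0, 1) draw is replaced by an N(10, 1) draw with probability
    `outlier_fraction`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [
            rng.gauss(10.0, 1.0) if rng.random() < outlier_fraction
            else rng.gauss(0.0, 1.0)
            for _ in range(n)
        ]
        total += abs(estimator(sample))
    return total / reps

for name, est in [("mean", statistics.mean), ("median", statistics.median)]:
    clean = simulate_error(est, 0.0)
    dirty = simulate_error(est, 0.1)
    print(f"{name}: clean error {clean:.3f}, contaminated error {dirty:.3f}")
```

Swapping in a different generator for the contaminated draws is how one
would repeat the comparison for each outlier type the context experts name.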
/Paul
All times are GMT - 5 Hours