Outlier detection
Guest
Posted: Thu Jan 04, 2007 8:14 pm
Hi all,
I'm writing a program and I need to find the best regression line in a
scatter plot. In the example below I need to identify the path made by
the * and identify the outliers, here represented by +.
It's easy to find by eye but I need to do it programmatically.
In my case, each axis represents a DNA sequence and the similarities
between them are plotted.
Here, 4 linear regression lines can be fitted to the plot made by the
stars *, and they all have the same slope. The background noise can be
worse, making the identification of outliers harder.
My question is: how can I identify the most probable path along the
scatter plot and discard the background noise? Is there some kind of
density-based algorithm that I could apply to this?
Can anyone help me?
Cheers
Mathieu
[ASCII scatter plot: * points forming the path, + points marking the outliers]
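Because the segments are said to share one slope, one possible density-based filter is to project every point onto its intercept y - slope * x and keep only the crowded intercepts. The following is a minimal sketch of that idea, not a tested method for real sequence plots. The function name, the `band` width, the `min_support` cutoff and the sample coordinates are all assumptions made up for illustration.

```python
from collections import defaultdict


def split_path_and_noise(points, slope=1.0, band=1.0, min_support=3):
    """Split points into (path, noise) by how crowded their diagonal is.

    Points lying on the same line of the given slope share the intercept
    y - slope * x. Intercepts are grouped into bins of width `band`; bins
    holding at least `min_support` points are treated as path segments and
    the remaining points as background noise.
    """
    bins = defaultdict(list)
    for x, y in points:
        key = round((y - slope * x) / band)
        bins[key].append((x, y))

    path, noise = [], []
    for members in bins.values():
        target = path if len(members) >= min_support else noise
        target.extend(members)
    return sorted(path), sorted(noise)


# Hypothetical data: two parallel segments plus two stray points.
points = [(0, 0), (1, 1), (2, 2), (3, 3),   # segment with intercept 0
          (4, 9), (5, 10), (6, 11),         # segment with intercept 5
          (1, 8), (6, 1)]                   # isolated points
path, noise = split_path_and_noise(points)
print(noise)  # [(1, 8), (6, 1)]
```

This only looks at the intercept, so a stray point that happens to fall on a crowded diagonal far from the segment itself would be kept. If the slope is not known in advance it would have to be estimated first, which this sketch does not attempt.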
Guest
Posted: Fri Jan 05, 2007 12:13 am
m.fourment@gmail.com wrote:
snip
If you want to identify multivariate outliers for any number of
variables, the Mardia test is one that has been popular. There's a SAS
macro to run it, and in R it's part of the dprep package. Whether your
examples even qualify as outliers is an empirical question that this
and other related tests allow you to quantify and answer with the usual
p-value. However, don't forget that outliers can be interesting in
themselves: an 18-year-old person is not extremely deviant, nor is a
person who earns $60K, but an 18-year-old earning $50K is exceptional
and worthy of study!
cheers,
LFB
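To make the age and income point concrete, here is a minimal sketch of the squared Mahalanobis distance for two variables. This is not the Mardia test itself, only the distance that Mardia-type statistics are built from, and the function name and the (age, income in $K) values are invented for illustration.

```python
def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean.

    Uses the sample covariance matrix (divisor n - 1) and its closed-form
    2x2 inverse, so no external libraries are needed.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Made-up sample: income rises with age, except for the last person.
people = [(18, 12), (22, 20), (25, 28), (30, 35), (35, 42), (40, 50),
          (45, 55), (50, 60), (55, 62), (60, 65), (18, 50)]
d2 = mahalanobis_sq(people)
most_unusual = max(range(len(people)), key=d2.__getitem__)
print(people[most_unusual])  # (18, 50)
```

Neither 18 nor 50 is extreme on its own axis in this sample, yet the pair has the largest distance because it breaks the relationship between the two variables. The mean and covariance here are not robust, so a strong outlier can inflate them and partly hide itself.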
Guest
Posted: Fri Jan 05, 2007 9:58 am
m.fourment@gmail.com wrote:
snip
Intervention Detection has been used in Time Series Analysis to detect
Pulses, Level Shifts, Seasonal Pulses and Local Time Trends. Since
cross-sectional (i.e. non-sequential) data is a proper subset of Time
Series, the tools of Intervention Detection can be used.
It is not clear that you are using time series data because you did not
disclose X, but for the moment I will assume that it is a time series plot.
A similar problem arises when you have a series like
1,9,1,9,1,9,5,9 ... where one needs to detect the anomaly at time period 7.
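As a rough illustration of that example, and a much simpler stand-in for model-based Intervention Detection, the sketch below compares each value with the median of the values at the same position in the cycle. The function name, the known period of 2 and the tolerance `tol` are assumptions made for this example.

```python
from statistics import median


def flag_pulses(series, period, tol):
    """Return 1-based time periods whose value departs from its phase median.

    The series is split by position within the cycle (phase). A value is
    flagged when it differs from the median of its phase by more than `tol`.
    """
    flagged = []
    for phase in range(period):
        positions = range(phase, len(series), period)
        typical = median(series[i] for i in positions)
        for i in positions:
            if abs(series[i] - typical) > tol:
                flagged.append(i + 1)
    return sorted(flagged)


print(flag_pulses([1, 9, 1, 9, 1, 9, 5, 9], period=2, tol=1.0))  # [7]
```

The value 5 would look unremarkable against the overall range of 1 to 9. It only stands out once the alternating pattern is taken into account, which is why the time ordering matters.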
You can use a piece of freeware called FREEFORE, available from AFS at
http://www.autobox.com/freef.exe . It will provide a detailed report on
the anomalies that it detects.
Hope this helps.
Dave Reilly
Automatic Forecasting Systems
215-675-0652
http://www.autobox.com
All times are GMT - 5 Hours