Outlier detection
Guest
Posted: Thu Jan 04, 2007 7:55 pm
Hi all,
I'm writing a program and I need to find the best regression line in a
scatter plot. In the example below I need to identify the path made by
the * and identify outliers, here represented by +.
It's easy to do by eye but I need to do it programmatically.
In my case, each axis represents a DNA sequence and similarities between
them are plotted.
Here, 4 linear regression lines can be fitted to the plot made by the
stars * and they have the same slope. The background noise can be worse,
making the identification of outliers harder.
My question is: how can I identify the most probable path along the
scatter plot and discard background noise? Is there some kind of
density-based algorithm which I could apply to this?
Can anyone help me?
Cheers
Mathieu
| *
| *
|
| *
| + *
| *
| + *
|
| *
| *
| *
| * +
| * +
| *
|__________________________________
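One density-style reading of the question can be sketched as follows. It assumes the segments all have slope 1 (as diagonals in a sequence dot plot would) and that hits sit at integer positions, so every point on one segment shares the same offset y - x. The function name split_by_diagonal and the min_hits threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def split_by_diagonal(points, min_hits=3):
    """Separate dot-plot hits into dense diagonals and background noise.

    points   -- list of (x, y) integer positions of matches
    min_hits -- hypothetical threshold: sparser diagonals count as noise
    """
    # Points on the same slope-1 line share the same offset y - x.
    counts = Counter(y - x for x, y in points)
    dense = {offset for offset, n in counts.items() if n >= min_hits}
    path = [(x, y) for x, y in points if y - x in dense]
    noise = [(x, y) for x, y in points if y - x not in dense]
    return path, noise


# Toy data: two diagonal runs (offsets 0 and 3) plus two stray hits.
hits = [(1, 1), (2, 2), (3, 3), (4, 7), (5, 8), (6, 9), (2, 9), (7, 1)]
path, noise = split_by_diagonal(hits)
```

With heavier background noise a fixed threshold may not be enough; counting hits over a window of neighbouring offsets would be one possible refinement.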
David Jones
Posted: Fri Jan 05, 2007 6:10 am
m.fourment@gmail.com wrote:
Quote: Hi all,
I'm writing a program and I need to find the best regression line in
a scatter plot. In the example below I need to identify the path made
by the * and identify outliers, here represented by +.
See the following book for some possible approaches:
A.C. Atkinson and M. Riani, Robust Diagnostic Regression
Analysis, Springer, 2000.
David Jones
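As a rough illustration of what a robust fit buys you, here is a minimal sketch. It is not the forward search method from the book above; it swaps in the simpler Theil-Sen estimator (median of pairwise slopes) and flags points whose residual is large relative to the median absolute deviation. The function names and the cutoff of 3 are assumptions made for this example, and it fits a single line only, so for several parallel segments it would have to be applied per segment.

```python
from statistics import median


def theil_sen(points):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for i, (x1, y1) in enumerate(points)
              for (x2, y2) in points[i + 1:]
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept


def flag_outliers(points, cutoff=3.0):
    """Return one boolean per point: True if it looks like an outlier."""
    slope, intercept = theil_sen(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    # Median absolute deviation, scaled to be comparable to a standard
    # deviation under normal errors; tiny floor avoids division by zero.
    mad = median(abs(r) for r in residuals)
    scale = 1.4826 * mad or 1e-12
    return [abs(r) / scale > cutoff for r in residuals]


# Toy data: eight points on y = 2x + 1 and one stray point.
pts = [(x, 2 * x + 1) for x in range(8)] + [(8, 40)]
slope, intercept = theil_sen(pts)
flags = flag_outliers(pts)
```

Because both steps use medians, a handful of stray points cannot pull the line towards themselves the way they would in ordinary least squares.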
Graham Jones
Posted: Fri Jan 05, 2007 7:12 am
<m.fourment@gmail.com> wrote in message
news:1167954952.866154.105900@51g2000cwl.googlegroups.com...
Quote: Hi all,
I'm writing a program and I need to find the best regression line in a
scatter plot. In the example below I need to identify the path made by
the * and identify outliers, here represented by +.
It's easy to do by eye but I need to do it programmatically.
In my case, each axis represents a DNA sequence and similarities between
them are plotted.
Here, 4 linear regression lines can be fitted to the plot made by the
stars * and they have the same slope. The background noise can be worse,
making the identification of outliers harder.
My question is: how can I identify the most probable path along the
scatter plot and discard background noise? Is there some kind of
density-based algorithm which I could apply to this?
Can anyone help me?
Cheers
Mathieu
| *
| *
|
| *
| + *
| *
| + *
|
| *
| *
| *
| * +
| * +
| *
|__________________________________
It sounds (and looks) like you are trying to align two sequences so that an
`edit cost' between them is minimised. This has very little to do with
linear regression or outlier detection. It is a well-known problem with a
well-known solution. See for example Introduction to Bioinformatics, Lesk,
2005. Or try searching with terms like
pairwise sequence alignment
dynamic programming
Graham
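To make the dynamic programming suggestion concrete, here is a minimal sketch of global pairwise alignment in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions for illustration. It maximises a similarity score, which plays the same role as minimising an edit cost with the signs flipped.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best path.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
```

The columns where neither row holds a gap are the positions that would lie on the path through the dot plot; gap columns correspond to the jumps between diagonal segments.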
David Jones
Posted: Fri Jan 05, 2007 8:21 am
David Jones wrote:
Quote: m.fourment@gmail.com wrote:
Hi all,
I'm writing a program and I need to find the best regression line in
a scatter plot. In the example below I need to identify the path made
by the * and identify outliers, here represented by +.
See the following book for some possible approaches:
A.C. Atkinson and M. Riani, Robust Diagnostic Regression
Analysis, Springer, 2000.
David Jones
And for an extended approach not restricted to regression (i.e. treating
variables more symmetrically) see:
A.C. Atkinson, M. Riani and A. Cerioli, Exploring Multivariate Data with
the Forward Search, Springer, 2004.
David Jones
All times are GMT - 5 Hours