
Testing distances non-parametrically and efficiently...

Author Message
fuscus...
Posted: Sun Oct 25, 2009 11:31 pm
Guest
Hello,

I'm a relative newcomer to statistics, and I am faced with the
following problem. I have two populations of points in a metric space
- in other words, for each pair of points A and B I can compute a
symmetric non-negative distance d(A,B) that satisfies the triangle
inequality d(A,B)<=d(A,C)+d(C,B). I would like an *efficient*
non-parametric test to establish whether two random points in the first
population are more likely than not to be closer together than two random
points in the second population.

One naive approach would just be to take a sample of n points from the
first population and compute the distances between all n(n-1)/2
pairs; do the same for the second population; and then use a
Wilcoxon-Mann-Whitney rank test over the distances. The problem with this
approach is that the distances are not independent, due to the
triangle inequality: if a point is very close to two other points,
those two points can't be too far apart.
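
A minimal sketch of the all-pairs computation described above, assuming hypothetical one-dimensional data with d(a, b) = |a - b| (the function names and the sample values are made up for illustration). It only computes the Mann-Whitney estimate of the probability that a within-sample distance from the first sample is smaller than one from the second; it deliberately reports no p-value, because the standard null distribution assumes independent observations, which is exactly what fails here.

```python
from itertools import combinations

def pairwise_distances(points, dist):
    """All n(n-1)/2 distances between distinct points of one sample."""
    return [dist(a, b) for a, b in combinations(points, 2)]

def prob_first_closer(d1, d2):
    """Mann-Whitney estimate of P(D1 < D2), counting ties as 1/2."""
    wins = sum((x < y) + 0.5 * (x == y) for x in d1 for y in d2)
    return wins / (len(d1) * len(d2))

# Hypothetical data: points on the real line, d(a, b) = |a - b|.
dist = lambda a, b: abs(a - b)
sample1 = [0.0, 0.1, 0.3, 0.4]   # tightly clustered
sample2 = [0.0, 1.0, 2.5, 4.0]   # spread out

d1 = pairwise_distances(sample1, dist)
d2 = pairwise_distances(sample2, dist)
p_hat = prob_first_closer(d1, d2)
print(len(d1), len(d2), p_hat)
```

Each sample of 4 points gives 6 distances, but those 6 values are built from only 4 independent draws.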

Another obvious approach would be to take random (disjoint) pairs of
points from each population, compute the distance for each pair, and
run Wilcoxon-Mann-Whitney on the distances; but this approach is
inefficient in terms of points required, since it yields only n/2
distances from n points (in my particular application, I can only
sample very very few points).
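
The disjoint-pairs scheme can be sketched as follows, again assuming hypothetical real-valued points and d(a, b) = |a - b|. The helper name and the seed are illustrative only. Because no point is used twice, the resulting distances are independent whenever the points were sampled independently, at the cost of getting only floor(n/2) of them.

```python
import random

def disjoint_pair_distances(points, dist, rng):
    """Shuffle the sample, pair off neighbours, return n // 2 distances."""
    shuffled = list(points)
    rng.shuffle(shuffled)
    # Step by 2 so that every point appears in at most one pair;
    # with odd n the last point is simply left out.
    return [dist(shuffled[i], shuffled[i + 1])
            for i in range(0, len(shuffled) - 1, 2)]

dist = lambda a, b: abs(a - b)
rng = random.Random(0)                          # fixed seed, for repeatability
sample = [0.0, 0.1, 0.3, 0.4, 0.9, 1.2, 1.3]    # hypothetical, n = 7
distances = disjoint_pair_distances(sample, dist, rng)
print(len(distances))
```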

As a compromise one could take one point from each population, and
compute the distances of the n-1 remaining sampled points from it. This
yields n-1 distances from n points that I guess are *more or less* independent.
It's a big "more or less", though, and I still have the impression
that I am wasting information.
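
A sketch of the anchor-point scheme, under the same hypothetical setup (real-valued points, d(a, b) = |a - b|, made-up names). One way to read the "more or less": if the points are sampled independently, the distances are independent once the anchor is fixed, but they all share that anchor, so an unusual anchor shifts every distance at once.

```python
def anchor_distances(points, dist, anchor_index=0):
    """Distances from one anchor point to every other point in the sample."""
    anchor = points[anchor_index]
    return [dist(anchor, p)
            for i, p in enumerate(points) if i != anchor_index]

dist = lambda a, b: abs(a - b)
sample = [0.2, 0.0, 0.1, 0.3, 5.0]   # hypothetical; the last point is an outlier

typical = anchor_distances(sample, dist, anchor_index=0)
outlier = anchor_distances(sample, dist, anchor_index=4)
print(len(typical), min(typical), min(outlier))
```

With the outlier as anchor, every one of the n-1 distances is large, which shows how strongly the result can depend on the single point chosen.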

I'm sure people must have looked into this problem before. Any
suggestions?
 
Rich Ulrich...
Posted: Mon Oct 26, 2009 1:26 pm
Guest
How does this differ from comparing the variances of two
samples? Assuming there is no important difference -

That could be the simple F-ratio of the two variances, larger
over smaller, indexed by their respective degrees of freedom.
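
For real-valued observations, the ratio can be sketched as below. The data are invented and the function name is illustrative. Turning the ratio into a p-value needs the F distribution with the two degrees of freedom, which the Python standard library does not provide, so the sketch stops at the statistic.

```python
from statistics import variance

def f_ratio(x, y):
    """Larger sample variance over smaller, with (numerator df, denominator df)."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1

# Hypothetical samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

ratio, df_num, df_den = f_ratio(x, y)
print(ratio, df_num, df_den)
```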

"Levene's test" is more robust for the purposes of checking
the "homogeneity" assumption of ANOVA. For that, you take each
group's mean (or median) and compute the ANOVA (or t-test, for
two samples) on the absolute distances of the observations from
their own group's mean (or median).
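
A rough two-sample sketch of that procedure, assuming real-valued data (the values and helper names are hypothetical). It centers each group on its own median, which is the variant usually called Brown-Forsythe, and then computes the pooled-variance t statistic on the absolute deviations; for two groups the one-way ANOVA F statistic is the square of this t.

```python
from math import sqrt
from statistics import mean, median, variance

def abs_deviations(x, center=median):
    """Absolute distance of each observation from its own group's center."""
    c = center(x)
    return [abs(v - c) for v in x]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; df = len(a) + len(b) - 2."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical samples; y is twice as spread out as x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

t = pooled_t(abs_deviations(x), abs_deviations(y))
print(t, len(x) + len(y) - 2)
```

Passing `center=mean` to `abs_deviations` gives the mean-centered form instead. A negative t here means the first sample has the smaller typical deviation.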

--
Rich Ulrich
 
 