Help with distance metric! Urgent! Any help welcome
stacey
Posted: Wed Dec 13, 2006 9:23 am
Guest
Hello everybody,

I need some help... ;(
I am running into a problem with the cosine and correlation distance
metrics while doing clustering.

I do the clustering in Matlab. There is a function, pdist, which
computes the pairwise distances between an array of vectors using the
distance metric you choose.

When I use cosine as the distance metric I get this error message:

??? Error using ==> pdist
Some points have small relative magnitudes, making them effectively
zero. Either remove those points, or choose a distance other than
cosine.

A similar message occurs when I use correlation:

??? Error using ==> pdist
Some points have small relative standard deviations, making them
effectively constant. Either remove those points, or choose a distance
other than correlation.

Does anybody have a clue why this occurs? I think there is a division
by a very small number somewhere, but why? With other distance metrics
(Euclidean, squared Euclidean, Manhattan) it works just fine.
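
To make the suspected division concrete, here is a minimal sketch. It
is plain Python, not Matlab, it uses the textbook definitions rather
than pdist's actual code, and the function names are only illustrative.
Cosine distance divides by the lengths of the two vectors, and
correlation distance does the same after subtracting each vector's
mean, so an all-zero vector breaks the first and a constant vector
breaks the second. Euclidean distance involves no division at all.

```python
import math

def cosine_distance(x, y):
    # 1 - (x . y) / (|x| * |y|); undefined when either vector has zero length.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ZeroDivisionError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_x * norm_y)

def centered(v):
    # Subtract the mean of v from every element.
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def correlation_distance(x, y):
    # Cosine distance of the mean-centered vectors.
    cx, cy = centered(x), centered(y)
    if not any(cx) or not any(cy):
        raise ZeroDivisionError("correlation distance is undefined for a constant vector")
    return cosine_distance(cx, cy)

def euclidean_distance(x, y):
    # No division anywhere, so zero or constant vectors are harmless.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Toy vectors (made up for illustration, not the real data).
sparse = [0, 0, 37, 5]
zero = [0, 0, 0, 0]
constant = [3, 3, 3, 3]

print(euclidean_distance(sparse, zero))  # works
for metric, other in ((cosine_distance, zero), (correlation_distance, constant)):
    try:
        metric(sparse, other)
    except ZeroDivisionError as err:
        print(metric.__name__, "failed:", err)
```

If this is what is going on, the first thing to check would be whether
the data contains rows that are entirely zero or entirely constant.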

Oh, I forgot to mention that my vectors are pretty big: 800 or more
dimensions, and most of the elements are 0. Each vector usually has
around 50 non-zero elements; all the rest are 0.
The non-zero elements are integers that go up to around 37,000.

I tried a little trick: I added a small number to all the elements of
my vectors and re-ran pdist. It seems to work.
Specifically, when I add 0.0001 the cosine metric works just fine, but
the correlation metric does not.
To make the correlation metric work, I found by experiment that I had
to add 15.0001. It needs the 0.0001 part; otherwise, whatever whole
amount I add (even 165), I get the same error as before. So the
important thing, I think, is the 0.0001.
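
One thing worth checking about this trick: mathematically, adding the
same constant to every element turns an all-zero vector into a non-zero
one, which is all the cosine metric needs, but it leaves each vector's
standard deviation unchanged, so a constant vector stays constant. The
sketch below shows this in plain Python (not Matlab) with made-up toy
vectors; the helper names are illustrative. Whatever makes 15.0001
behave differently from 165 inside pdist is not captured here, and is
presumably a floating-point rounding effect.

```python
import math
import statistics

def shift(v, c):
    # Add the same constant c to every element of v.
    return [a + c for a in v]

def norm(v):
    # Euclidean length; the cosine metric divides by this.
    return math.sqrt(sum(a * a for a in v))

zero_vec = [0, 0, 0, 0, 0]      # toy all-zero vector
sparse_vec = [0, 0, 37, 0, 5]   # toy sparse vector

shifted_zero = shift(zero_vec, 0.0001)
shifted_sparse = shift(sparse_vec, 0.0001)

# The length of the all-zero vector becomes non-zero after the shift.
print(norm(zero_vec), norm(shifted_zero))

# The standard deviation (what the correlation metric divides by)
# is unaffected by the shift, up to rounding.
print(statistics.pstdev(zero_vec), statistics.pstdev(shifted_zero))
print(statistics.pstdev(sparse_vec), statistics.pstdev(shifted_sparse))
```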

If I normalize my vectors (say, divide each vector by its max element)
I still get the same errors. My values are now between 0 and 1.
And again, when I add something small like 0.0001 the cosine metric
works but the correlation metric does not. Correlation works when I
add 0.01.
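
That normalizing changes nothing is at least consistent with the
definitions: dividing a vector by a positive number does not change its
direction, so its cosine similarity to any other vector stays the same,
and an all-zero vector stays all-zero. A minimal sketch in plain Python
(not Matlab), with illustrative helper names and toy vectors; leaving
an all-zero vector untouched in scale_by_max is an assumption made here
to avoid dividing 0 by 0.

```python
import math

def cosine_similarity(x, y):
    # (x . y) / (|x| * |y|); assumes neither vector is all-zero.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def scale_by_max(v):
    # Divide every element by the largest one, so values land in [0, 1]
    # for non-negative data. An all-zero vector is returned unchanged
    # (assumption: there is nothing sensible to divide by).
    m = max(v)
    if m == 0:
        return list(v)
    return [a / m for a in v]

x = [0, 0, 37, 0, 5]   # toy vectors, not the real data
y = [12, 0, 4, 0, 0]

before = cosine_similarity(x, y)
after = cosine_similarity(scale_by_max(x), scale_by_max(y))
print(before, after)   # the two values agree up to rounding

print(scale_by_max([0, 0, 0]))  # still all zeros
```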

I know the numbers may mean nothing to you, but I just wanted to give
an example.
I am at a loss.
Also, I don't know if it's correct to meddle with my original data, but
I don't think I am doing anything wrong, because I add the same amount
to all the elements of all my vectors.

I would appreciate any help you can give me..

Thank you,

Stacey
 
Page 1 of 1       All times are GMT - 5 Hours
The time now is Wed Dec 03, 2008 9:09 pm