
Clustering on Large Amount of Data...

Author Message
Fred...
Posted: Tue Jul 14, 2009 7:55 am
Guest
Dear all,

Is there any effective algorithm which could perform text
clustering over tens of millions of articles? Thanks very much!
 
Kent Paul Dolan...
Posted: Tue Jul 14, 2009 2:16 pm
Guest
Fred wrote:

Quote:
Is there any effective algorithm which could
perform text clustering over tens of millions of
articles? Thanks very much!

The answer is surely "yes", since Google's find
"similar" pages works, against a dataset of around
6,000,000,000 web pages, which means that a nearness
metric exists, which means that a clustering
algorithm can work, but I've no clue what that
algorithm is.

xanthian.
 
Clif Davis...
Posted: Wed Jul 15, 2009 1:23 pm
Guest
On Jul 14, 2:55 am, Fred <hn.ft.p... at (no spam) gmail.com> wrote:
Quote:
Dear all,

        Is there any effective algorithm which could perform text
clustering over tens of millions of articles? Thanks very much!

By an effective algorithm at that magnitude, I assume you mean one that
runs in order n log(n) time or better? The answer is that it depends on
the type of clustering.

Suppose you have an effective procedure for assigning each text a
position in a multi-dimensional space (with a fixed number of
dimensions), and you assume a fixed number of clusters. Assign a trial
central position to each of the clusters, then make a pass assigning
each text to the closest cluster (the one with the closest putative
center), averaging the positions in each cluster as you go so that you
have new central positions at the end of the pass. If you do this a
fixed number of times, you have an order n process, where n is the
number of texts.
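
The procedure described here is essentially k-means run for a fixed
number of passes. A minimal sketch of it follows, using only the Python
standard library. Everything named in it is an assumption for
illustration: the post does not say how texts get their positions, so
embed() uses a hashed bag-of-words, and cluster(), dims, k and passes
are hypothetical names and defaults. The trial centers are simply the
first k points.

```python
import zlib
from collections import Counter


def embed(text, dims=64):
    """Map a text to a fixed-length unit vector.

    Assumption: a hashed bag-of-words stands in for the unspecified
    "procedure for assigning a text a position".
    """
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        # crc32 is stable across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(word.encode("utf-8")) % dims] += count
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster(points, k, passes=10):
    """Fixed-pass k-means: cost is passes * n * k * dims, linear in n."""
    centers = [list(p) for p in points[:k]]  # trial centers (assumption)
    labels = [0] * len(points)
    for _ in range(passes):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        counts = [0] * k
        for i, p in enumerate(points):
            # Assign the point to the cluster with the closest center.
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            labels[i] = j
            counts[j] += 1
            # Accumulate the running sum for this cluster's new center.
            for d, x in enumerate(p):
                sums[j][d] += x
        # New center = mean of assigned points; empty clusters keep theirs.
        for j in range(k):
            if counts[j]:
                centers[j] = [s / counts[j] for s in sums[j]]
    return labels, centers
```

Usage would look like labels, centers = cluster([embed(t) for t in
texts], k=100), where texts is your list of articles. With k, dims and
passes held fixed, the work grows linearly with the number of texts;
the quality of the clusters depends heavily on the embedding and on the
starting centers, neither of which this sketch tries to choose well.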

If you want something more sophisticated it may be less "effective".
Clif Davis
 
 
All times are GMT