Clustering on Large Amount of Data...
| Fred... |
Posted: Tue Jul 14, 2009 7:55 am |
|
|
|
Guest
|
Dear all,
Is there any effective algorithm which could perform text
clustering over tens of millions of articles? Thanks very much! |
| Kent Paul Dolan... |
Posted: Tue Jul 14, 2009 2:16 pm |
|
|
|
Guest
|
Fred wrote:
Quote: Is there any effective algorithm which could
perform text clustering over tens of millions of
articles? Thanks very much!
The answer is surely "yes": Google's "similar pages"
feature works against a dataset of around
6,000,000,000 web pages, which means that a nearness
metric exists at that scale, which in turn suggests
that a clustering algorithm can work. I have no clue,
though, what that algorithm is.
xanthian. |
| Clif Davis... |
Posted: Wed Jul 15, 2009 1:23 pm |
|
|
|
Guest
|
On Jul 14, 2:55 am, Fred <hn.ft.p... at (no spam) gmail.com> wrote:
Quote: Dear all,
Is there any effective algorithm which could perform text
clustering over tens of millions of articles? Thanks very much!
By "effective" at that magnitude I assume you mean an algorithm that
runs in order n log(n) time or less? The answer is that it depends on
the type of clustering.
Suppose you have an effective procedure for assigning each text a
position in a multi-dimensional space (with a fixed number of
dimensions), and you assume a fixed number of clusters. Assign a trial
central position to each of the clusters. Then make a pass over the
texts, assigning each one to the closest cluster (the one with the
closest putative center) and averaging the positions within each
cluster as you go, so that at the end of the pass you have a new
central position for each cluster. If you repeat this a fixed number
of times, you have an order n process, where n is the number of texts.
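The pass-and-average procedure described above is essentially k-means with a fixed number of iterations. Here is a minimal sketch in Python, under several assumptions that are not in the post: `embed` is a hypothetical stand-in for the "effective procedure" (hashed word counts in 64 dimensions), distance is squared Euclidean, and all function names are made up for illustration.

```python
import zlib


def embed(text, dims=64):
    """Hypothetical embedding: hashed word counts, scaled to unit length."""
    vec = [0.0] * dims
    for word in text.lower().split():
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        vec[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster(points, centers, passes=10):
    """Run a fixed number of assign-and-average passes over the points.

    Each pass costs O(n * k * dims), so with k, dims and passes held
    fixed the total work grows linearly with n = len(points).
    """
    k, dims = len(centers), len(centers[0])
    labels = [0] * len(points)
    for _ in range(passes):
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        for i, p in enumerate(points):
            # Assign the point to the cluster with the closest current center.
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            labels[i] = j
            counts[j] += 1
            for d in range(dims):
                sums[j][d] += p[d]
        # New center = mean of assigned points; an empty cluster keeps its old center.
        centers = [
            [s / counts[j] for s in sums[j]] if counts[j] else centers[j]
            for j in range(k)
        ]
    return labels, centers


# Toy usage: four short "articles", two trial centers taken from the data.
texts = [
    "genetic algorithm mutation crossover",
    "genetic algorithm crossover selection",
    "stock market prices fall",
    "stock market prices rise",
]
points = [embed(t) for t in texts]
labels, _ = cluster(points, centers=[points[0], points[2]], passes=5)
print(labels)
```

This sketch keeps dense Python lists in memory, which would not be practical for tens of millions of articles. At that scale one would more likely stream the texts in batches and use sparse vectors, but the linear cost per pass is the same.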
If you want something more sophisticated it may be less "effective".
Clif Davis |
All times are GMT