Searching for documents and other items on the Web or on computers is
often tedious and time-consuming. Time is money. Highly paid
professionals spend hours, days, and even longer searching for
information on the Web or on computers. Most search today is done using
keyword and phrase matching, often combined with various ranking
schemes for the search results. Occasionally, more advanced methods are
used, such as logical (Boolean) queries, e.g. a search for “rocket
scientist” AND NOT “space”, and regular expressions. All of these methods have
significant limitations and often require lengthy human review and
further manual searching of the search results.
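To make these conventional methods concrete, here is a minimal Ruby sketch of a Boolean phrase query and a regular-expression search over a few in-memory strings. It is not the article's code: the sample documents and the helper names `contains_phrase?`, `boolean_search`, and `regex_search` are hypothetical, and matching is a simple case-insensitive substring test with no index or ranking.

```ruby
# Hypothetical sample documents (illustration only, not from the article).
DOCUMENTS = {
  "doc1" => "The rocket scientist designed a new engine for space flight.",
  "doc2" => "You don't have to be a rocket scientist to bake bread.",
  "doc3" => "A rocket scientist explained fluid dynamics to economists."
}.freeze

# Case-insensitive substring test: true if +text+ contains +phrase+.
def contains_phrase?(text, phrase)
  text.downcase.include?(phrase.downcase)
end

# Boolean query: keep documents that contain every phrase in +must+
# and none of the phrases in +must_not+. Returns the matching ids.
def boolean_search(docs, must:, must_not: [])
  docs.select do |_id, text|
    must.all? { |p| contains_phrase?(text, p) } &&
      must_not.none? { |p| contains_phrase?(text, p) }
  end.keys
end

# Regular-expression search: ids of documents whose text matches +pattern+.
def regex_search(docs, pattern)
  docs.select { |_id, text| text.match?(pattern) }.keys
end

p boolean_search(DOCUMENTS, must: ["rocket scientist"], must_not: ["space"])
# prints ["doc2", "doc3"]
p regex_search(DOCUMENTS, /\bengine\b/i)
# prints ["doc1"]
```

The query “rocket scientist” AND NOT “space” still returns doc2, where the phrase is only a figure of speech, and a substring test for “space” would also exclude unrelated words such as “workspace”. This is the kind of result that needs human review.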
The dream search engine would search by topic, by the detailed content
of the items searched, ideally finding the desired information
immediately. Actual understanding of text remains an unfulfilled
promise of artificial intelligence. Statistical language processing
can, however, support a limited form of search by topic. This article introduces
the basic concepts and mathematics of statistical language processing
and its applications to search. It gives a brief overview of more
advanced techniques in statistical language
processing as applied to search. It also includes sample Ruby code
illustrating some simple statistical language processing methods.
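As a rough idea of what “searching by topic” with statistics can mean, the following sketch uses the bag-of-words vector space model: each text becomes a vector of word counts, and documents are ranked by cosine similarity to the query. This is one of the simplest statistical techniques and is offered only as an assumption about the general approach, not as the method or code the article develops. The sample documents and the helper names (`tokenize`, `term_frequencies`, `cosine_similarity`, `rank_by_similarity`) are hypothetical.

```ruby
# Lowercase the text and split it into alphabetic word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Term-frequency vector stored as a Hash: word => number of occurrences.
def term_frequencies(text)
  tokenize(text).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# Cosine of the angle between two term-frequency vectors (0.0 to 1.0).
def cosine_similarity(a, b)
  dot = a.sum { |word, count| count * b.fetch(word, 0) }
  norm_a = Math.sqrt(a.values.sum { |c| c * c })
  norm_b = Math.sqrt(b.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end

# Return [id, score] pairs, highest score first (ties broken by id).
def rank_by_similarity(docs, query)
  q = term_frequencies(query)
  docs.map { |id, text| [id, cosine_similarity(q, term_frequencies(text))] }
      .sort_by { |id, score| [-score, id] }
end

# Hypothetical sample documents (illustration only).
DOCS = {
  "orbits"  => "Satellites follow elliptical orbits around the planet.",
  "baking"  => "Knead the bread dough and let the dough rise.",
  "rockets" => "Rocket engines burn fuel to launch satellites into orbits."
}.freeze

p rank_by_similarity(DOCS, "satellite orbits").map(&:first)
# prints ["orbits", "rockets", "baking"]
```

Unlike the Boolean query, this returns a graded ranking instead of a yes/no filter. In this crude form it still misses that “satellite” and “satellites” are the same word; refinements such as stemming, stop-word removal, and term weighting address problems of this kind.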
http://math-blog.com/2009/10/25/faster-better-cheaper-search-engines/