| Computers Forum Index » Computer Artificial Intelligence - Language » Wkipaedia and n-grams... |
|
|
| Author |
Message |
| Ian Parker... |
Posted: Tue Sep 29, 2009 8:46 pm |
|
|
|
Guest
|
I have been taking a look at the Hutter files with a view to stripping
out the HTML and trying out LSA. I am well on my way to doing this. In
the meantime I have been taking a look at the headings. This is very
clearly an early version of Wikipedia, as most of the headings seem to
be empty.
http://sites.google.com/site/aitranslationproject/wikipaedia
One other observation. Wiki attempts to be multilingual. You will
notice Arabic translations of all the filled-in titles. This is quite
important. It implies that Wiki represents a multilingual dictionary
of what might be termed the most important scholarly phrases.
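A minimal sketch of how such a dictionary might be pulled out of the article text, assuming the translations are stored as interlanguage links of the form `[[ar:Title]]` in the wiki markup. The pattern, the function name `interlanguage_links` and the sample text are illustrative assumptions, not something taken from the Hutter files themselves.

```python
import re

# Assumed markup: an interlanguage link looks like [[ar:Some title]], where the
# prefix is a short lowercase language code. Other short lowercase prefixes
# would also match, so real data may need a whitelist of language codes.
INTERLANG = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")

def interlanguage_links(wikitext):
    """Return {language_code: translated_title} for one article's markup."""
    links = {}
    for code, title in INTERLANG.findall(wikitext):
        # Keep the first link seen for each language.
        links.setdefault(code, title.strip())
    return links

# Hypothetical article fragment, for illustration only.
sample = (
    "Text about the law.\n"
    "[[ar:قانون ستيفان-بولتزمان]]\n"
    "[[de:Stefan-Boltzmann-Gesetz]]\n"
    "[[Category:Physics]]"
)
links = interlanguage_links(sample)
```

Run over every article, the title plus its `links` dictionary would give one row of the multilingual phrase list.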
You know I have been bellyaching about the fact that Google translates the
Stefan Boltzmann law as four times the temperature and that the
surface area of a sphere is 8πR. This gives an opportunity to get a
number of "truth" headings.
Does anyone know how I could get an up-to-date copy of the Wikipedia
files?
Unfortunately Brainchild does not give descriptions of scholarly
bigrams and n-grams. I do not know how religion came into it.
Wikipedia both then and now was/is a website of scientific consensus.
- Ian Parker |
|
|
|
|
|
| Mok-Kong Shen... |
Posted: Wed Sep 30, 2009 12:34 pm |
|
|
|
Guest
|
Ian Parker wrote:
Quote: Does anyone know how I could get an up-to-date copy of the Wikipedia
files?
Presumably an unsatisfactory answer of mine, due to a misunderstanding:
if you access a web page, you can save its content as a file with
a mouse click.
M. K. Shen |
|
|
|
|
|
| Ian Parker... |
Posted: Wed Sep 30, 2009 3:23 pm |
|
|
|
Guest
|
On 30 Sep, 09:34, Mok-Kong Shen <mok-kong.s... at (no spam) t-online.de> wrote:
Quote: Ian Parker wrote:
Does anyone know how I could get an up-to-date copy of the Wikipedia
files?
Presumably an unsatisfactory answer of mine, due to a misunderstanding:
if you access a web page, you can save its content as a file with
a mouse click.
You cannot do that for the whole of Wiki. Anyway, I am interested in
the n-gram translations, which do not appear in the text.
To reiterate, my idea is that if you have a number of principal n-grams,
these can be accessed whenever they occur in text. What we need is
this:
n-gram | Translations | LSA Vector | OWL
In a text in any language we can spot a principal n-gram by this
method. The translation is the Wiki translation. We can also encode
some OWL. This to me would represent the start of "understanding".
That this has not been done is obvious when we look at some
translations.
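The table and the spotting step could be sketched roughly as follows. This assumes a simple greedy longest-match scan over lowercased word tokens; the `Entry` class, the `find_ngrams` function and the two sample rows are hypothetical, and the LSA vector and OWL fields are left as empty placeholders rather than computed.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One row of the table: n-gram | Translations | LSA Vector | OWL."""
    translations: dict                               # language code -> phrase
    lsa_vector: list = field(default_factory=list)   # placeholder, not computed
    owl: str = ""                                    # placeholder for an OWL term

def find_ngrams(text, table, max_n=4):
    """Greedy longest-match scan; returns (token_index, ngram) pairs."""
    tokens = re.findall(r"\w+", text.lower())
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so "stefan boltzmann law"
        # wins over any shorter n-gram starting at the same token.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in table:
                hits.append((i, candidate))
                i += n
                break
        else:
            i += 1
    return hits

# Illustrative rows only; a real table would come from the Wiki titles.
table = {
    "stefan boltzmann law": Entry({"de": "Stefan-Boltzmann-Gesetz"}),
    "black body": Entry({"de": "Schwarzer Körper"}),
}
hits = find_ngrams("The Stefan-Boltzmann law describes a black body.", table)
```

Each hit can then be replaced by `table[ngram].translations[lang]` before the rest of the sentence is translated, so the phrase is treated as a unit.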
- Ian Parker
BTW - The text I have is the Hutter text which I got from Matt Mahoney. |
|
|
|
|
|
| Lluc Potrony... |
Posted: Fri Oct 16, 2009 9:59 am |
|
|
|
Guest
|
Ian Parker wrote:
Quote: Does anyone know how I could get an up-to-date copy of the Wikipedia
files?
Do you know the page <http://en.wikipedia.org/wiki/Wikipedia_database>?
It gives several approaches to getting the Wikipedia info. |
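Once a dump file has been obtained, it can be read one page at a time rather than loaded whole. The sketch below assumes the dump is a MediaWiki XML export with `<page>`, `<title>` and `<text>` elements, optionally bz2-compressed; the function names and the tiny sample document are illustrative assumptions.

```python
import bz2
import io
import xml.etree.ElementTree as ET

def iter_pages(source):
    """Yield (title, wikitext) pairs from an XML export, one page at a time."""
    title, text = None, ""
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]    # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            elem.clear()                      # release the finished page's subtree

def open_dump(path):
    """Open a dump file in binary mode, handling .bz2 compression."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")

# Tiny stand-in for a real dump, for illustration only.
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Sphere</title><revision><text>A sphere. [[de:Kugel]]</text></revision></page>
  <page><title>Empty</title><revision><text></text></revision></page>
</mediawiki>"""
pages = list(iter_pages(io.BytesIO(sample)))
```

For a real file, `iter_pages(open_dump(path))` would stream the same pairs without holding the whole dump in memory.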
|
|
|
|
|
|