
First Release of the Babel machine translation system...

Graham Shaw...
Posted: Sat Jun 20, 2009 1:06 pm
Guest
Version 0.0.0 of the Babel machine translation system
is now available for download at:

http://www.babel.org.uk/download/

The translation system is ultimately intended to provide the means for
programs to generate output in the user's native language without having
to manually translate each program into each language (as is currently
the practice when using GNU gettext or an equivalent).

This initial release has much more limited, but nevertheless useful,
functionality: the translation of numbers into words (for example,
42 into 'forty-two'). It can do this in 54 languages for cardinal
numbers and 24 languages for ordinal numbers.
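
To make the number-to-words task concrete, here is a minimal sketch in Python for English cardinals below 1000. It is not Babel's implementation, and the function name `cardinal_en` and the word tables are illustrative assumptions. Other languages need different rules, which is part of why checking by fluent speakers matters.

```python
# Illustrative sketch only: not taken from Babel's source code.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def cardinal_en(n: int) -> str:
    """Spell out an integer 0 <= n < 1000 in British-style English."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        # Compound numbers such as 42 are hyphenated: 'forty-two'.
        return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} and {cardinal_en(rest)}"


print(cardinal_en(42))  # forty-two
```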

One major caveat is that the output for most languages has not been
checked by anyone fluent in those languages. Anyone willing and able
to assist with this process would be most welcome.

For further information please refer to the project website and
development log:

http://www.babel.org.uk/
http://www.babel.org.uk/blog/

or join the project mailing list:

http://lists.riscpkg.org/cgi-bin/mailman/listinfo/babel

The translation system is free software.
Redistribution and modification are permitted within the terms of the
GNU General Public License (version 3 or any later version).
--
Graham Shaw (http://www.riscpkg.org/~gdshaw/)
The RISC OS Packaging Project (http://www.riscpkg.org/)
Project Babel (http://www.babel.org.uk/)
 
Graham Shaw...
Posted: Sun Jun 21, 2009 6:01 pm
Guest
Ian Parker wrote:
Quote:

There is just one fly in the ointment as far as I am concerned. It is
simply this. What Babel is attempting to do is a lot less than what
formal mathematical methods are doing.

In what way is that a problem? That I'm not going off on some
strange tangent actually sounds quite reassuring to me.

Quote:
(snip)

Isn't this exactly what Mizar is doing? Alcor is a front end of Mizar
and quite a user friendly interface is provided.

I'd be very surprised if Mizar were the only other system using
a similar notation, given its relationship to predicate logic.

Quote:
http://www.google.co.uk/search?hl=en&q=mizar+and+alcor+formal+proof&btnG=Google+Search&meta=&aq=f&oq=

This is the result of a search on formal proof.

http://groups.google.com/group/creatingAI/browse_thread/thread/772ab668e92d79f8

Here is an interesting thread where I have discussed a number of the
principal concepts. The fact of the matter is that everything in
README is provided for by Mizar. Alcor is attempting a verbal output.
Here is some correspondence I have had with Arnold Neumeier at Vienna
University. We were thinking principally in terms of English and
German although other languages will follow.

OK, I've taken a look at the Mizar home page, and also at the
Wikipedia article. According to the latter it is closed-source,
which is a pretty fundamental deal-breaker for the application
I'm targeting.

Also, even if the underlying engine could do what's needed, I
get the impression that its field of application is currently
very narrow (i.e. mathematics). If that's correct then I can't
see it bringing much to the table - most of the problems that
I need to solve at this point lie outside that domain.

Quote:
I feel that Babel should look at proof engines and what they are
attempting before they go any further.

Always happy to look at alternative approaches, but I'm not
sure what specifically would be useful in this case.
--
Graham Shaw (http://www.riscpkg.org/~gdshaw/)
The RISC OS Packaging Project (http://www.riscpkg.org/)
Project Babel (http://www.babel.org.uk/)
 
Mok-Kong Shen...
Posted: Sun Jun 21, 2009 8:08 pm
Guest
Graham Shaw wrote:

Quote:
The only predicates that have been set in stone are
those for representing cardinal and ordinal numbers.

Could you give a tiny practical example of what (beyond the numbers)
you intend to translate from, say, French or German, even though
this is currently not yet implemented?

Thanks,

M. K. Shen
 
Ian Parker...
Posted: Sun Jun 21, 2009 8:18 pm
Guest
On 21 June, 15:01, Graham Shaw <gds... at (no spam) riscpkg.org> wrote:
Quote:
Ian Parker wrote:

There is just one fly in the ointment as far as I am concerned. It is
simply this. What Babel is attempting to do is a lot less than what
formal mathematical methods are doing.

In what way is that a problem?  That I'm not going off on some
strange tangent actually sounds quite reassuring to me.

(snip)

Isn't this exactly what Mizar is doing? Alcor is a front end of Mizar
and quite a user friendly interface is provided.

I'd be very surprised if Mizar were the only other system using
a similar notation, given its relationship to predicate logic.

http://www.google.co.uk/search?hl=en&q=mizar+and+alcor+formal+proof&b...

This is the result of a search on formal proof.

http://groups.google.com/group/creatingAI/browse_thread/thread/772ab6...

Here is an interesting thread where I have discussed a number of the
principal concepts. The fact of the matter is that everything in
README is provided for by Mizar. Alcor is attempting a verbal output.
Here is some correspondence I have had with Arnold Neumeier at Vienna
University. We were thinking principally in terms of English and
German although other languages will follow.

OK, I've taken a look at the Mizar home page, and also at the
Wikipedia article.  According to the latter it is closed-source,
which is a pretty fundamental deal-breaker for the application
I'm targeting.

http://www.uclic.ucl.ac.uk/people/j.gow/papers/alcor-jar.pdf

I do not think it is a closed system. It is very much a system in
development. The reference I have given you gives you a description of
the system. It ALSO gives you the people who are working on it. I am
sure they would be pleased to hear from you. Take the verbalisation of
(say) Radiation/Unit area = emissivity * T^4: the presentation of
equations in words is an important part of what the Alcor team is
working on.
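
For reference, the relation mentioned above is the Stefan-Boltzmann law. The form given in the post leaves out the constant of proportionality; the usual textbook statement, using symbols that do not appear in the thread, is:

```latex
j^{\star} = \varepsilon \, \sigma \, T^{4},
\qquad
\sigma \approx 5.670 \times 10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}
```

Here $j^{\star}$ is the radiated power per unit area, $\varepsilon$ the emissivity, $\sigma$ the Stefan-Boltzmann constant and $T$ the absolute temperature. Reading it aloud needs an ordinal ("the fourth power"), which is the point about verbalisation.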

I am replying to a number of threads. I must repeat that you are NOT
translating from French into German; you are vocalising what is in
essence Mathematics into French, German, English, Arabic etc. I have
shown a (gedanken) translation of Arabic into Alcor/Mizar.
Quote:

Also, even if the underlying engine could do what's needed, I
get the impression that its field of application is currently
very narrow (i.e. mathematics).  If that's correct then I can't
see it bringing much to the table - most of the problems that
I need to solve at this point lie outside that domain.

This is really what I am debating. I think you have far too limited a
view of what Mathematics is. If I speak of the Stefan Boltzmann law I
am presenting an entry into a highly formalised system of logic. I
have to say proportional to the fourTH (four is absolute nonsense)
power of the absolute temperature.

Can I present less formal information? I feel I should make a number
of points.

1) C is A(B). I can present this to Mizar. In fact the entire Babel
structure is supported by Mizar. Mizar does, as I have pointed out
already, a great deal more.

2) As I have said, whatever Babel can do, Mizar can. How would I present
non-mathematical data to Mizar?

Well, as I have already said, any Boolean expression can be fed in.
Mizar does not have to know what the symbols mean, merely how to
manipulate them. Looking at the Arabs again, could I teach Mizar about
settlements? Yes, I could. At the simplest level I could draw a map. I
could also present the West Bank as a graph indicating what check
points you have to pass through to get from A to B and the time
involved. Could it verbalize this? Yes, it could, in principle anyway.
Algorithms could be similar to those of Sat Nav systems.

Alcor verbalisation is still being worked on. The methodology is open.

Quote:
I feel that Babel should look at proof engines and what they are
attempting before they go any further.

Always happy to look at alternative approaches, but I'm not
sure what specifically would be useful in this case.

Mizar is 30 years old. It seems to me that Babel is going down a path
that is already well trodden. Mizar/Alcor is generally regarded as the
most mature although there are indeed others. Alcor presents the most
natural looking mathematics.

http://www.cl.cam.ac.uk/~jrh13/hol-light/

Mathematicians would like HOL Light and Alcor to be able to talk to one
another, which they can't do. They would also, I would expect, like
Babel to be able to talk to Alcor/Mizar.


- Ian Parker
 
Mok-Kong Shen...
Posted: Sun Jun 21, 2009 10:50 pm
Guest
Graham Shaw wrote:

Quote:
To be clear, it doesn't translate from any natural language,
only to natural languages from BabelScript.

The chemical elements have been implemented (but are not yet
supported) in English, French, Italian, Spanish, Portuguese,
Danish, Welsh and Irish Gaelic. You can generate them thus:

babel-translate -l fr chem:element:iron[noun]

(The required part of speech needs to be specified because this
is not a complete sentence.)

With the help of the supplied specification, each word is thus
translated correctly. But I don't yet grasp what kind of
practical applications you envisage for your system, if translation
of natural language sentences isn't the goal. (Sorry for my
poor comprehension.)

Thanks,

M. K. Shen
 
Graham Shaw...
Posted: Mon Jun 22, 2009 12:43 am
Guest
Mok-Kong Shen wrote:
Quote:
Graham Shaw wrote:

To be clear, it doesn't translate from any natural language,
only to natural languages from BabelScript.

The chemical elements have been implemented (but are not yet
supported) in English, French, Italian, Spanish, Portuguese,
Danish, Welsh and Irish Gaelic. You can generate them thus:

babel-translate -l fr chem:element:iron[noun]

(The required part of speech needs to be specified because this
is not a complete sentence.)

With the help of the supplied specification, each word is thus
translated correctly. But I don't yet grasp what kind of
practical applications you envisage for your system, if translation
of natural language sentences isn't the goal. (Sorry for my
poor comprehension.)

One-to-many applications, where the same text potentially needs
to be translated into many different languages.

I'm particularly interested in the localisation of computer
programs. At present this is usually done using a system called
GNU gettext, however that has limitations:

- each program must be manually translated into each required
language, so to support n programs in m languages you need to
perform n*m translations;

- its ability to deviate from stock phrases is very limited;

- its model for pluralisation is inadequate for some languages,
and it does not support other forms of grammatical agreement.
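
As background to the last point, gettext-style pluralisation works by mapping a count to the index of a plural form using a per-language rule. The Python sketch below mimics that idea; the English and Polish rules follow the ones commonly published for gettext catalogues, while the function names and the `FORMS` table are hypothetical. The mechanism only selects a form by number. It has no way to express agreement in gender or case, which is the second half of the limitation described above.

```python
# Sketch of count-to-plural-form selection; names here are illustrative.

def plural_index_en(n: int) -> int:
    """English: one form for exactly 1, another for everything else."""
    return 0 if n == 1 else 1


def plural_index_pl(n: int) -> int:
    """Polish: three forms, chosen by the last digits of the count."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


FORMS = {
    "en": (plural_index_en, ["%d file", "%d files"]),
    "pl": (plural_index_pl, ["%d plik", "%d pliki", "%d plików"]),
}


def render(lang: str, n: int) -> str:
    """Pick the plural form for n in the given language and fill in n."""
    rule, forms = FORMS[lang]
    return forms[rule(n)] % n


for count in (1, 2, 5, 12, 22):
    print(render("en", count), "|", render("pl", count))
```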

Another system I've used, the RISC OS MessageTrans module, is
conceptually very similar but has a global dictionary which any
program can call upon. Unfortunately the one provided was too
small to be useful for most programs, but it led me to ask what
could be done with a much larger dictionary working at the level
of words rather than stock phrases.
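
A rough picture of what a shared, word-level dictionary could look like is sketched below in Python. This is not Babel's data format or API: the `LEXICON` table, the language codes and the `translate_word` function are assumptions made for illustration, with the concept key borrowed from the command quoted earlier in the thread. The point is that one entry serves every program, instead of each program carrying its own catalogue of stock phrases.

```python
# Hypothetical shared lexicon keyed by concept, then by language code.
LEXICON = {
    "chem:element:iron": {
        "en": "iron",
        "fr": "fer",
        "it": "ferro",
        "es": "hierro",
        "pt": "ferro",
        "da": "jern",
        "cy": "haearn",
        "ga": "iarann",
    },
}


def translate_word(concept: str, lang: str) -> str:
    """Look up a single word; raise LookupError if no entry exists."""
    try:
        return LEXICON[concept][lang]
    except KeyError:
        raise LookupError(f"no entry for {concept!r} in {lang!r}") from None


print(translate_word("chem:element:iron", "fr"))  # fer
```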
--
Graham Shaw (http://www.riscpkg.org/~gdshaw/)
The RISC OS Packaging Project (http://www.riscpkg.org/)
Project Babel (http://www.babel.org.uk/)
 
Graham Shaw...
Posted: Mon Jun 22, 2009 9:42 am
Guest
Ian Parker wrote:
Quote:
On 21 June, 15:01, Graham Shaw <gds... at (no spam) riscpkg.org> wrote:
Ian Parker wrote:

OK, I've taken a look at the Mizar home page, and also at the
Wikipedia article. According to the latter it is closed-source,
which is a pretty fundamental deal-breaker for the application
I'm targeting.

http://www.uclic.ucl.ac.uk/people/j.gow/papers/alcor-jar.pdf

I do not think it is a closed system. It is very much a system in
development.

What I said (or rather, what Wikipedia claimed) is that it is
closed-source, as in, not supplied with source code.

That would make it incompatible with the Open Source Definition.
Since I'm not in the business of developing non-free software, it
isn't something I would want to use directly.

Quote:
Also, even if the underlying engine could do what's needed, I
get the impression that its field of application is currently
very narrow (i.e. mathematics). If that's correct then I can't
see it bringing much to the table - most of the problems that
I need to solve at this point lie outside that domain.

This is really what I am debating. I think you have far too limited a
view of what Mathematics is. If I speak of the Stefan Boltzmann law I
am presenting an entry into a highly formalised system of logic. I
have to say proportional to the fourTH (four is absolute nonsense)
power of the absolute temperature.

Yes, but are you talking about what it has the potential to do,
or what it can do now?

In other words, does it actually have the dictionaries needed to
translate into a large number of languages across a wide range of
topics?

Not that I have them either, of course, but that's 99% or more
of the problem. The effort involved in writing a translation
engine pales into insignificance by comparison.

(Even for the little I've done so far, I've spent much more time
reading grammar books than writing code.)

Quote:
I feel that Babel should look at proof engines and what they are
attempting before they go any further.
Always happy to look at alternative approaches, but I'm not
sure what specifically would be useful in this case.

Mizar is 30 years old. It seems to me that Babel is going down a path
that is already well trodden. Mizar/Alcor is generally regarded as the
most mature although there are indeed others. Alcor presents the most
natural looking mathematics.

http://www.cl.cam.ac.uk/~jrh13/hol-light/

I'll take a look at both of them to see what can be learned.
Also I note that HOL Light is free software, which helps
significantly.
--
Graham Shaw (http://www.riscpkg.org/~gdshaw/)
The RISC OS Packaging Project (http://www.riscpkg.org/)
Project Babel (http://www.babel.org.uk/)
 
Ian Parker...
Posted: Mon Jun 22, 2009 10:50 am
Guest
On 22 June, 06:42, Graham Shaw <gds... at (no spam) riscpkg.org> wrote:
Quote:

What I said (or rather, what Wikipedia claimed) is that it is
closed-source, as in, not supplied with source code.

Wikipedia is not 100% reliable. It is fairly good, though. In this case
what it is claiming is true but slightly misleading. It is indeed NOT
open source, but there are quite a lot of developers who have access
to the code. What you have to do is convince them that you have got
something to contribute, and negotiate a deal.

There is another significant piece of software which is also not open
source, but has a lot of open source contributions and that is
Mathematica. Arnold Neumeier is basing his "Mathematical Assistant" on
Mizar, Alcor and Mathematica. Mathematica and Wolfram Alpha do
things as opposed to dealing in mere symbols. It will calculate
radiative loss for you. We would not expect Mathematica to be relevant
in terms of translation, but that little bit of Arabic proves the
contrary.

Quote:
That would make it incompatible with the Open Source Definition.
Since I'm not in the business of developing non-free software, it
isn't something I would want to use directly.

 >> Also, even if the underlying engine could do what's needed, I
 >> get the impression that its field of application is currently
 >> very narrow (i.e. mathematics).  If that's correct then I can't
 >> see it bringing much to the table - most of the problems that
 >> I need to solve at this point lie outside that domain.
 
 > This is really what I am debating. I think you have far too limited a
 > view of what Mathematics is. If I speak of the Stefan Boltzmann law I
 > am presenting an entry into a highly formalised system of logic. I
 > have to say proportional to the fourTH (four is absolute nonsense)
 > power of the absolute temperature.

Yes, but are you talking about what it has the potential to do,
or what it can do now?

A little bit of both. When I say Babel is a subset of Mizar,
"Babel < Mizar" refers to the PRESENT. The discussion of Wolfram Alpha
and the Stefan Boltzmann law is, to some extent, speculation on the
future course of AI. Wolfram is interesting in so far as it can
trigger calculations from questions.

Only Wolfram/Mathematica can do things like route planning.
Quote:

In other words, does it actually have the dictionaries needed to
translate into a large number of languages across a wide range of
topics?

No, it really only has dictionaries for English. No doubt Arnold will
put German in. A dictionary in a language would simply be one file.
Don't get me wrong, what you are doing is valuable. I just want it to
have the flexibility to interface with knowledge engines.

What in fact struck me was that in Arabic->English Google made a
correct identification. The maths put into the original Arabic was
correct too. The English translation was obviously a verbalisation. If
it had simply translated the maths and ignored the Arabic it would
have done a better job.

I feel that a knowledge engine of some description would be of great
help when translating.

Quote:
Not that I have them either, of course, but that's 99% or more
of the problem.  The effort involved in writing a translation
engine pales into insignificance by comparison.

(Even for the little I've done so far, I've spent much more time
reading grammar books than writing code.)

 >>> I feel that Babel should look at proof engines and what they are
 >>> attempting before they go any further.
 >> Always happy to look at alternative approaches, but I'm not
 >> sure what specifically would be useful in this case.
 
 > Mizar is 30 years old. It seems to me that Babel is going down a path
 > that is already well trodden. Mizar/Alcor is generally regarded as the
 > most mature although there are indeed others. Alcor presents the most
 > natural looking mathematics.
 
 >http://www.cl.cam.ac.uk/~jrh13/hol-light/

I'll take a look at both of them to see what can be learned.
Also I note that HOL Light is free software, which helps
significantly.

This probably won't matter to you but HOL Light is less popular with
mathematicians because the proofs given are less like natural
mathematics than Mizar.

http://groups.google.co.uk/group/sci.math.research/browse_frm/thread/38b4981f397713e9/9d5171d1e6b3dc8c?hl=en&q=Alcor+author:Ian+author:Parker#9d5171d1e6b3dc8c

Here is a discussion on the whole topic.


- Ian Parker
 
Ian Parker...
Posted: Mon Jun 22, 2009 8:51 pm
Guest
On 22 June, 14:36, Brian Martin
<brianNOS... at (no spam) futuresoftware.com.auNOSPAM> wrote:
Quote:

It would be far simpler for a multilingual application to simply have an
array of message format strings, with interpolated parameter values, as
is already common practice. "blah blah %d blah blah %s"

This is in fact (probably) the way Alcor/Mizar or Wolfram would
envisage it. Once Alcor provides English the others will quickly
follow. As you say they are in the same format.
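
The format-string approach described in the quoted post can be sketched as a per-language table of messages with interpolated parameters. The Python below is only an illustration; the message key, the translations and the `message` helper are hypothetical.

```python
# Hypothetical per-language table of format strings with %-style slots.
MESSAGES = {
    "en": {"files_copied": "Copied %d files to %s"},
    "fr": {"files_copied": "%d fichiers copiés vers %s"},
    "de": {"files_copied": "%d Dateien nach %s kopiert"},
}


def message(lang: str, key: str, *args) -> str:
    """Look up the format string for a language and fill in its slots."""
    return MESSAGES[lang][key] % args


print(message("en", "files_copied", 3, "/tmp"))
print(message("fr", "files_copied", 3, "/tmp"))
print(message("de", "files_copied", 3, "/tmp"))
```

Note that `message("en", "files_copied", 1, "/tmp")` yields "Copied 1 files to /tmp": a fixed phrase per language is simple, but it cannot adapt to its parameters without extra machinery.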

Quote:
I worked back in 2005 on a commercial system which supported 22
languages in almost as many character sets, including traditional
Chinese, simplified Chinese, Japanese, Korean, most European languages,
including for example Czech, Swedish, Danish, etc., and both Portuguese
and Brazilian Portuguese, as well as the obvious French & German. I'm
afraid the systems which might benefit from Babel have already solved
these issues albeit in a dumb pragmatic fashion, long ago. This system
was tailored to the specific phrases required, rather than attempting
generic translations.

Practical commercial systems must render each phrase in the appropriate
idiom, often tailored to specific variants, e.g. German vs Swiss German,
Portuguese vs Brazilian Portuguese, to be of any practical use.

Any system based on a neutral representational language is adding a
significant additional hurdle to the development effort.

But there is a neutral representation language. It's called a proof
engine. All the things mentioned are in fact in the Wolfram database.
You simply translate Wolfram statements and put them in a file. You
are quite right.

What might be more important is understanding. That is to say using
natural language to trigger programs. Wolfram triggers Mathematica
programs in this way.


- Ian Parker

What happens when I throw something into a spinning black hole? W
should tell me.
 
Graham Shaw...
Posted: Tue Jun 23, 2009 10:11 am
Guest
Ian Parker wrote:
Quote:
On 22 June, 06:42, Graham Shaw <gds... at (no spam) riscpkg.org> wrote:
What I said (or rather, what Wikipedia claimed) is that it is
closed-source, as in, not supplied with source code.

Wikipedia is not 100% reliable. It is fairly good, though. In this case
what it is claiming is true but slightly misleading. It is indeed NOT
open source, but there are quite a lot of developers who have access
to the code.

I don't see that as misleading, just telling people the key facts.
You have to understand that of the very small number of people who
are interested in software licensing at all, a substantial proportion
of them will first and foremost want to know whether the licence
conforms to the OSD. (That or the DFSG, which are very similar.)

Quote:
What you have to do is convince them that you have got
something to contribute, and negotiate a deal.

Why would I want to do that when I have the option to create a
solution that is 100% free?

(Plus it would prevent any program that uses the GPL from linking
against my library.)

The question above is a rhetorical one, BTW - this isn't the right
forum to be debating the merits of free versus non-free software.
All that needs to be said is that for my target application,
conformance to the OSD is non-negotiable and would override any
technical considerations.

Quote:
This probably won't matter to you but HOL Light is less popular with
mathematicians because the proofs given are less like natural
mathematics than Mizar.

I've looked briefly at HOL Light, and I can imagine how you could
do much of the symbol manipulation with it, but not the morphology.

Also I didn't see anything equivalent to my concept of a 'tag'
(metadata which can be attached to a sub-expression). These aren't
essential, and I considered doing without them at one point,
but found that putting secondary information into the tree was
unwieldy and made pattern-matching difficult.
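
One way to picture the 'tag' idea is an expression tree whose nodes carry a separate metadata dictionary, so that pattern matching can look at structure alone. The Python sketch below is only one reading of the description above; how Babel actually represents tags is not stated in this thread, and `Node`, `matches` and the example expression are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    head: str                                  # symbol at this node
    args: tuple = ()                           # child nodes
    tags: dict = field(default_factory=dict)   # metadata, kept out of the tree


def matches(node: Node, pattern) -> bool:
    """Structural match that ignores tags.

    A pattern is either the wildcard "_" or a tuple whose first item is
    the head symbol and whose remaining items are sub-patterns.
    """
    if pattern == "_":
        return True
    head, *arg_patterns = pattern
    if node.head != head or len(node.args) != len(arg_patterns):
        return False
    return all(matches(a, p) for a, p in zip(node.args, arg_patterns))


plain = Node("quantity", (Node("42"), Node("iron")))
tagged = Node("quantity", (Node("42"), Node("iron", tags={"pos": "noun"})))
pattern = ("quantity", "_", ("iron",))

# The same pattern matches with or without the tag on the sub-expression.
print(matches(plain, pattern), matches(tagged, pattern))
```

Had the part-of-speech information been stored as an extra child node instead, every pattern involving "iron" would have needed to account for it, which seems to be the kind of unwieldiness described above.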

The first point applies to Mizar too, not sure about the second
point yet, and I haven't yet looked at what Alcor adds to the mix.

Also, I note that Mizar uses ASCII. Isn't that a bit of a problem
even for German, let alone more exotic languages?
--
Graham Shaw (http://www.riscpkg.org/~gdshaw/)
The RISC OS Packaging Project (http://www.riscpkg.org/)
Project Babel (http://www.babel.org.uk/)
 
 