Google Translator treatment of Bulgarian and...

Harlan Messinger
Posted: Tue Sep 29, 2009 8:33 pm

Someone created a now-deleted article on English Wikipedia with the
following text:
Пред многу векови, кога човекот започнувал да учи како да живее во
заедница со другите луѓе, постоеле поединци кои со својата мудрост и
знаење ги воделе другите. Најчесто тоа биле најстарите членови на
заедницата или оние кои во дадена област веќе се докажале, го стекнале
потребното искуство за да можат да дадат објективно мислење на
конкретната тема. Оваа појава продолжила и во наредните векови. Зависно
од времето некогаш овие поединци кои ја имале довербата на народот биле
црковни старешини, некогаш кралеви, војсководци, мудреци, учители.
Заедничко им било тоа што имале доволно кредибилитет за своето мислење
да го наметнат на дел од народот. Во денешно време личностите кои се
истакнати во областа во која работат или кои успеале на друг начин да се
стекнат со доверба кај широката јавност, често пати пишуваат колумни. Во
литературата колумната претставува публицистички стил на искажување на
мислење на конкретна тема. Притоа, битно е да се спомене дека текстот не
е новинарски, нема за цел да презентира релевантни факти, ниту пак има
за цел да го повика читателот да направи нешто. Според најтесното
гледање на колумната, таа претставува искажување на мислење презентирано
низ печатените медиуми.
I went to the Google Translator to see if I could find out which
Cyrillic-using language this was. If I test Bulgarian, the tool returns
Многу centuries before, when chovekot zapochnuval Kako to learn to live
st zaednitsa pm луѓе other, which postoele poedintsi pm својата mudrost
знаење them vodele and others. SG & Најчесто bile најстарите zaednitsata
or members of st onie which is an area веќе dokazhale him steknale
needful artificial able to give објективно мислење specific theme. [snip]
If I test Macedonian, it becomes almost perfectly clear:
Many centuries ago, when man was beginning to learn how to live in
community with others, there were individuals with their wisdom and
knowledge to lead others. Often they were the oldest members of the
community or those in an area already proven, he gained the necessary
experience to be able to give an objective opinion on the specific
topic. [snip]
If Bulgarian and Macedonian were virtually the same language, wouldn't a
Bulgarian-English translator and a Macedonian-English translator come
out with more or less the same thing?

Panu
Posted: Tue Sep 29, 2009 9:44 pm

On Sep 30, 5:33 am, Harlan Messinger
<hmessinger.removet... at (no spam) comcast.net> wrote:
> If Bulgarian and Macedonian were virtually the same language, wouldn't a
> Bulgarian-English translator and a Macedonian-English translator come
> out with more or less the same thing?

Well, for starters, Macedonian Cyrillic is much more akin to Serbian
Cyrillic. For instance, they use J as a Cyrillic letter.

Trond Engen
Posted: Wed Sep 30, 2009 12:58 am

Harlan Messinger wrote:
> Someone created a now-deleted article on English Wikipedia with the
> following text:
> Пред многу векови, кога човекот започнувал да учи како да живее во
> заедница со другите луѓе, постоеле поединци кои со својата мудрост и
> знаење ги воделе другите. Најчесто тоа биле најстарите членови на
> заедницата или оние кои во дадена област веќе се докажале, го
> стекнале потребното искуство за да можат да дадат објективно мислење
> на конкретната тема. [...]
> I went to the Google Translator to see if I could find out which
> Cyrillic-using language this was. If I test Bulgarian, the tool
> returns
> Многу centuries before, when chovekot zapochnuval Kako to learn to
> live st zaednitsa pm луѓе other, which postoele poedintsi pm својата
> mudrost знаење them vodele and others. SG & Најчесто bile најстарите
> zaednitsata or members of st onie which is an area веќе dokazhale him
> steknale needful artificial able to give објективно мислење specific
> theme. [snip]
> If I test Macedonian, it becomes almost perfectly clear:
> Many centuries ago, when man was beginning to learn how to live in
> community with others, there were individuals with their wisdom and
> knowledge to lead others. Often they were the oldest members of the
> community or those in an area already proven, he gained the necessary
> experience to be able to give an objective opinion on the specific
> topic. [snip]
> If Bulgarian and Macedonian were virtually the same language,
> wouldn't a Bulgarian-English translator and a Macedonian-English
> translator come out with more or less the same thing?

Two features of language, both of which are true of Norwegian and Swedish:
- There may be important orthographical differences even if the spoken
languages are close. That wouldn't trick a human translator, but the
machine is helpless.
- Different specialized lexicon due to different history.
Two possible features of the test:
- These translators learn from user input, and the volume of
translations and qualified feedback might be much higher for Macedonian.
ISTM that the Macedonian exile communities in English-speaking countries
are far larger than the Bulgarian ones.
- The test may have been designed to highlight differences, e.g. by using
specialized lexicon or dialectal forms or usages in one language or the
other.
--
Trond Engen

Athel Cornish-Bowden
Posted: Wed Sep 30, 2009 1:29 am

On 2009-09-30 04:33:42 +0200, Harlan Messinger
<hmessinger.removethis at (no spam) comcast.net> said:
> Someone created a now-deleted article on English Wikipedia with the
> following text:
> Пред многу векови, кога човекот започнувал ...
> I went to the Google Translator to see if I could find out which
> Cyrillic-using language this was. If I test Bulgarian, the tool returns
> Многу centuries before, when chovekot ...
> If I test Macedonian, it becomes almost perfectly clear:
> Many centuries ago, when man was beginning to learn...
> If Bulgarian and Macedonian were virtually the same language, wouldn't
> a Bulgarian-English translator and a Macedonian-English translator come
> out with more or less the same thing?

I think this tells us more about the limitations of machine translation
and the skills of Google's programmers than it does about Bulgarian and
Macedonian. I did a similar experiment with languages that are more
familiar (to me). Taking the following text (also from Wikipedia):
José Sócrates estudou nas escolas básicas e na Escola Secundária Frei
Heitor Pinto, situadas na Covilhã, cidade onde viveu na sua juventude.
Ingressou em 1975 no recém-criado Instituto Superior de Engenharia de
Coimbra (ISEC), em Coimbra, tendo obtido, em 1979, um diploma de
bacharelato como engenheiro técnico civil.
Any human who knows Spanish can tell immediately that this is not
Spanish, and can understand it virtually without fault. However, when
asked to translate it from Spanish to English, Google Translator gives
José Sócrates estudou basic nas escolas e na Escola Secundária Frei
Heitor Pinto, located Covilhã na, na sua cidade onde viveu juventude.
Ingressou recém-em 1975 no Instituto Superior de Engenharia servant of
Coimbra (ISEC), em Coimbra, tendo Obtido, em 1979, um bacharelato
diploma as civil technical engenheiro.
To be honest, I didn't expect the result to be quite as bad as it is; I
expected it to be more akin to your Bulgarian-Macedonian example. I
think the point is that Google Translator doesn't try to guess the
language, and gives correct translations only for words (básicas,
técnico) that are absolutely identical in Portuguese and Spanish, and
not necessarily even then: I was surprised it couldn't cope with
"Instituto".
--
athel

Ekkehard Dengler
Posted: Wed Sep 30, 2009 1:40 am

Athel Cornish-Bowden wrote:
> [...] I think the point is that Google Translator doesn't try to guess
> the language, and gives correct translations only for words (básicas,
> técnico) that are absolutely identical in Portuguese and Spanish, and
> not necessarily even then: I was surprised it couldn't cope with
> "Instituto".

A human translator might have left "Instituto Superior de Engenharia"
untranslated as well, because it's the actual name of the institute.
Regards,
Ekkehard

Athel Cornish-Bowden
Posted: Wed Sep 30, 2009 2:37 am

On 2009-09-30 09:40:04 +0200, "Ekkehard Dengler" <ED-RS at (no spam) t-online.de> said:
> Athel Cornish-Bowden wrote:
>> [...] I was surprised it couldn't cope with "Instituto".
>
> A human translator might have left "Instituto Superior de Engenharia"
> untranslated as well, because it's the actual name of the institute.

True, and the fact that it didn't translate the "de" in the name would
seem to confirm your point.
--
athel

Christian Weisgerber
Posted: Wed Sep 30, 2009 4:35 am

Harlan Messinger <hmessinger.removethis at (no spam) comcast.net> wrote:
> Someone created a now-deleted article on English Wikipedia with the
> following text:
> Пред многу векови, кога човекот започнувал да учи како да живее во
> заедница со другите луѓе, постоеле поединци кои со својата мудрост и
> знаење ги воделе другите. Најчесто тоа биле најстарите членови на
> заедницата или оние кои во дадена област веќе се докажале, го стекнале
> [...]

> I went to the Google Translator to see if I could find out which
> Cyrillic-using language this was.

Macedonian. Ѓ and Ќ are a dead giveaway.

> If Bulgarian and Macedonian were virtually the same language, wouldn't a
> Bulgarian-English translator and a Macedonian-English translator come
> out with more or less the same thing?

Translation machines have no comprehension of the text. As a result,
they are thrown off even by minor typos and misspellings that many
human readers barely notice. If a hypothetical machine translator
only knew American spelling, it might try to connect a "tyre change"
to the city in Lebanon.
Human readers have no trouble with this sentence that once appeared
in the French Wikipedia:
L'ancien président polonais Lech Wałęsa à pour sa part déclarer
que la démocratie et les transformations qu'elle occasionne ne
peuvent se faire "d'un seul coup" et que "l'important est que les
ukrainiens ne commettent pas trop d'erreurs".
Machine translators struggle with an ungrammatical infinitive
construction without ever realizing that it's simply a misspelled
passé composé.
Even if they are "virtually the same language", two separate language
standards are likely to have a lot of small differences on this
scale. As for Bulgarian and Macedonian, their respective alphabets
are already quite different.
--
Christian "naddy" Weisgerber naddy at (no spam) mips.inka.de

Harlan Messinger
Posted: Wed Sep 30, 2009 4:54 am

Panu wrote:
> On Sep 30, 5:33 am, Harlan Messinger
> <hmessinger.removet... at (no spam) comcast.net> wrote:
>> If Bulgarian and Macedonian were virtually the same language, wouldn't a
>> Bulgarian-English translator and a Macedonian-English translator come
>> out with more or less the same thing?
>
> Well, for starters, Macedonian Cyrillic is much more akin to Serbian
> Cyrillic. For instance, they use J as a Cyrillic letter.

D'oh! I guess it didn't occur to me, perhaps because of how much I'd
been told of how alike they are, that their writing systems aren't quite
the same.
Dušan Vukotić
Posted: Wed Sep 30, 2009 6:35 am

On Sep 30, 9:44 am, Panu <craoibhi... at (no spam) gmail.com> wrote:
> On Sep 30, 5:33 am, Harlan Messinger
> <hmessinger.removet... at (no spam) comcast.net> wrote:
>> If Bulgarian and Macedonian were virtually the same language, wouldn't a
>> Bulgarian-English translator and a Macedonian-English translator come
>> out with more or less the same thing?
>
> Well, for starters, Macedonian Cyrillic is much more akin to Serbian
> Cyrillic. For instance, they use J as a Cyrillic letter.

No, in this case the letter J is in fact an obstacle for MT. Almost
wherever Serbian has the sound j, that sound is missing in "Macedonian"
(jedinka vs. edinka 'entity', jedar vs. edar 'firm', jesen vs. esen
'autumn', etc.); on the other hand, the "Macedonian" sound j is mainly
a substitute for Serbian "lj" (zemlja vs. zemja, divljak vs. divjak
'savage', etc.).
DV

Athel Cornish-Bowden
Posted: Wed Sep 30, 2009 11:14 am

On 2009-09-30 16:35:13 +0200, naddy at (no spam) mips.inka.de (Christian Weisgerber) said:
> [ ... ]
> Human readers have no trouble with this sentence that once appeared
> in the French Wikipedia:
> L'ancien président polonais Lech Wałęsa à pour sa part déclarer
> que la démocratie et les transformations qu'elle occasionne ne
> peuvent se faire "d'un seul coup" et que "l'important est que les
> ukrainiens ne commettent pas trop d'erreurs".

Nonetheless, there was endless discussion 15 years or so ago about
whether a similar grammatical error, in a message allegedly written by
a murder victim in her own blood saying "Omar m'a tuer" (for "Omar m'a
tuée", 'Omar killed me'), constituted evidence as to who had written it.
--
athel

Christian Weisgerber
Posted: Fri Oct 02, 2009 8:57 am

Panu <craoibhin66 at (no spam) gmail.com> wrote:
> Well, for starters, Macedonian Cyrillic is much more akin to Serbian
> Cyrillic. For instance, they use J as a Cyrillic letter.

Here's a side-by-side comparison:
bg А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ь Ю Я
mk А Б В Г Д Ѓ Е Ж З Ѕ И Ј К Л Љ М Н Њ О П Р С Т Ќ У Ф Х Ц Ч Џ Ш
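
The "dead giveaway" remark and the two rows above can be made concrete. What follows is a minimal Python sketch, assuming only the uppercase letters exactly as listed in those two rows. It ignores Latin look-alikes and every other Cyrillic alphabet, so it would, for example, label Serbian text containing Ј, Љ or Њ as Macedonian. It derives the letters unique to each alphabet by set difference and uses them for a crude guess. The names `distinctive_letters` and `guess_language` are hypothetical illustrations, not how Google Translator identifies languages.

```python
# Uppercase alphabets exactly as in the side-by-side comparison (assumption:
# nothing else is considered). The Macedonian Ј is CYRILLIC CAPITAL LETTER JE
# (U+0408), not the Latin J.
BG = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ"
MK = "АБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШ"


def distinctive_letters(a: str, b: str) -> frozenset:
    """Hypothetical helper: letters of alphabet a that never occur in b."""
    return frozenset(a) - frozenset(b)


BG_ONLY = distinctive_letters(BG, MK)  # Й Щ Ъ Ь Ю Я
MK_ONLY = distinctive_letters(MK, BG)  # Ѓ Ѕ Ј Љ Њ Ќ Џ


def guess_language(text: str) -> str:
    """Hypothetical crude guess from letters unique to one alphabet.

    Returns "mk" or "bg" when only one side's unique letters appear,
    otherwise "unknown" (none present, or a mix of both).
    """
    letters = set(text.upper())  # fold lowercase: ѓ -> Ѓ, ќ -> Ќ, й -> Й
    has_mk = bool(letters & MK_ONLY)
    has_bg = bool(letters & BG_ONLY)
    if has_mk and not has_bg:
        return "mk"
    if has_bg and not has_mk:
        return "bg"
    return "unknown"


if __name__ == "__main__":
    print(guess_language("Пред многу векови ... луѓе ... веќе"))  # mk
    print(guess_language("Здравей, свят"))  # bg
    print(guess_language("Добро утро"))  # unknown
```

On the article text that started the thread, "луѓе" and "веќе" alone are enough to return "mk". A short text that uses only shared letters returns "unknown", which is one reason practical language identifiers typically rely on statistics over character sequences rather than on single letters.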
--
Christian "naddy" Weisgerber naddy at (no spam) mips.inka.de

António Marques
Posted: Fri Oct 02, 2009 10:58 am

Harlan Messinger wrote:
> D'oh! I guess it didn't occur to me, perhaps because of how much I'd
> been told of how alike they are, that their writing systems aren't
> quite the same.

The first thing someone does when trying to split languages is to split
spellings.

António Marques
Posted: Fri Oct 02, 2009 11:06 am

Athel Cornish-Bowden wrote:
> On 2009-09-30 16:35:13 +0200, naddy at (no spam) mips.inka.de (Christian Weisgerber)
> said:
>> [ ... ]
>> Human readers have no trouble with this sentence that once appeared
>> in the French Wikipedia:
>> L'ancien président polonais Lech Wałęsa à pour sa part déclarer
>> que la démocratie et les transformations qu'elle occasionne ne
>> peuvent se faire "d'un seul coup" et que "l'important est que les
>> ukrainiens ne commettent pas trop d'erreurs".
>
> Nonetheless, there was endless discussion 15 years or so ago about
> whether a similar grammatical error, in a message allegedly written by
> a murder victim in her own blood saying "Omar m'a tuer" (for "Omar m'a
> tuée", 'Omar killed me'), constituted evidence as to who had written it.

That was because the inscription was quite firmly written and the victim
was an educated person. The fact of the matter is that it could quite
possibly have been written by the killer (someone else) in order to
incriminate Omar. What's to stop a killer from doing that? It seems to
me that a victim able to write so clearly
(http://www.affaires-criminelles.com/lexique_18.php) should have been
able to seek help.

António Marques
Posted: Fri Oct 02, 2009 1:16 pm

On Oct 2, 7:57 pm, na... at (no spam) mips.inka.de (Christian Weisgerber) wrote:
> Panu <craoibhi... at (no spam) gmail.com> wrote:
>> Well, for starters, Macedonian Cyrillic is much more akin to Serbian
>> Cyrillic. For instance, they use J as a Cyrillic letter.
>
> Here's a side-by-side comparison:
> bg А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ь Ю Я
> mk А Б В Г Д Ѓ Е Ж З Ѕ И Ј К Л Љ М Н Њ О П Р С Т Ќ У Ф Х Ц Ч Џ Ш

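This really reminds me of something someone on the intarwebs once
wrote:
Ja nenavižu kirilliçu. Ja dumaju čto vse dolžny pisat' russkij jazyk
na latiniçe kak bol'šinstvo drugih slavjanskih jazykov. Kiriliça - êto
sovsem ustarelaja pis'mennaja sistema, vdabavok russkie bukvy očen'
urodlivye. Tol'ko smotret' na nih vredit zreniju, a s drugoj storony,
net ničego na svete krasivee latinskih bukv.
[Translation: "I hate Cyrillic. I think everyone should write Russian in
the Latin alphabet, like most other Slavic languages. Cyrillic is a
completely outdated writing system; besides, Russian letters are very
ugly. Just looking at them harms your eyesight, and on the other hand,
there is nothing in the world more beautiful than Latin letters."]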

...
Posted: Fri Oct 02, 2009 9:56 pm

António Marques <entonio at (no spam) gmail.com> wrote:
> This really reminds me of something someone on the intarwebs once
> wrote:
> Ja nenavižu kirilliçu. Ja dumaju čto vse dolžny pisat' russkij jazyk
> na latiniçe kak bol'šinstvo drugih slavjanskih jazykov. Kiriliça - êto
> sovsem ustarelaja pis'mennaja sistema, vdabavok russkie bukvy očen'
> urodlivye. Tol'ko smotret' na nih vredit zreniju, a s drugoj storony,
> net ničego na svete krasivee latinskih bukv.

Nice.
I'd like to know what led him to use ç instead of c, though.
And personally, I'd use a real character (j or even ь) for the soft sign
instead of ', so that it does not clash with Ukrainian, and ' for the
hard sign. And 'ia' instead of 'ja' after consonants (and then you do
not need the hard sign...)
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik at (no spam) kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

All times are GMT - 5 Hours