Discussion:
Google Translate
Michael Bauer
2018-10-20 14:51:55 UTC
Permalink
Can someone please switch off Google Translate for gd as soon as
possible? It produces gibberish and we don't want anyone using it and
it's costing me extra time to check if the Machinery tab is giving me GT
gibberish or a partial match.

Michael
Matjaz Horvat
2018-10-20 16:26:18 UTC
Permalink
Hi Michael,

It should be disabled now.

And with that, I'm hijacking this thread to announce that we replaced
Microsoft Translator with Google Translate as our machine translation
provider. :)

Background: we used to be on a free plan of Microsoft Translator, but
it only worked with the old version of the API, which is now being
deprecated. It has been behaving unreliably lately, often not
returning any results.

We chose to switch to Google Translate, mostly because it doubles the
number of Pontoon locales with MT support (103 vs. 48).

-Matjaž
_______________________________________________
dev-l10n mailing list
https://lists.mozilla.org/listinfo/dev-l10n
Michael Bauer
2018-10-20 17:09:16 UTC
Permalink
Post by Matjaz Horvat
Hi Michael,
It should be disabled now.
Thank you
Post by Matjaz Horvat
We chose to switch to Google Translate, mostly because it doubles the
number of Pontoon locales with MT support (103 vs. 48).
I'm not sure 'number of locales' is a good litmus test for going with
either, as an aside. It ought to be quality of translations, if you ask
me... Many of the locales for which Google has less data produce
translations similar to those it does for Gaelic: 'all this base are
belong to us' kind of translations.

Perhaps it should be an optional feature for smaller locales which is
only turned on if a locale lead asks for it. Imagine if someone powers
through a whole load of machine translations without doing due
diligence; someone then has to hunt down all this machine translation
junk. Having had to do this, I can tell you it's pretty soul-destroying.
Not to mention the embarrassment of having it in a release.

Michael
Matjaz Horvat
2018-10-20 18:05:35 UTC
Permalink
Post by Michael Bauer
Post by Matjaz Horvat
We chose to switch to Google Translate, mostly because it doubles the
number of Pontoon locales with MT support (103 vs. 48).
I'm not sure 'number of locales' is a good litmus test for going with
either, as an aside. It ought to be quality of translations, if you ask
me... Many of the locales for which Google has less data produce
translations similar to those it does for Gaelic: 'all this base are
belong to us' kind of translations.
To clarify, by mostly I mean language support is the most significant
_difference_ between the two, and it favours Google Translate.

The most important _factor_ in making the decision was actually
quality, but we were unable to find any serious research favouring one
service over the other.

Ideally, we'll have an opportunity to conduct such research on our own
(because of the specifics of the content we localize), and see which
service (if any) works better for which locale (and maybe even for
which project). To make that happen, we'd need to enable both services
(or even more) and measure which provides more useful (used) results.
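
A rough sketch of how such a measurement could be tallied, in Python. Everything here is hypothetical: the event tuples, the service names and the function names are made up for illustration and are not Pontoon's actual data model. It only assumes we can log, per locale and per service, whether a shown suggestion ended up being used.

```python
from collections import defaultdict


def usage_rate(events):
    """Return {(locale, service): fraction of shown suggestions that were used}.

    `events` is an iterable of (locale, service, was_used) tuples.
    This shape is an assumption made for the sketch.
    """
    shown = defaultdict(int)
    used = defaultdict(int)
    for locale, service, was_used in events:
        shown[(locale, service)] += 1
        if was_used:
            used[(locale, service)] += 1
    return {key: used[key] / shown[key] for key in shown}


def best_service_per_locale(events):
    """Pick, for each locale, the service with the highest usage rate."""
    best = {}
    for (locale, service), rate in usage_rate(events).items():
        if locale not in best or rate > best[locale][1]:
            best[locale] = (service, rate)
    return best
```

With enough logged events per locale, the same tally could be grouped by project as well, simply by adding the project to the key.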

-Matjaž
A. C.
2018-10-21 15:31:09 UTC
Permalink
I think it's better to exclude Google Translate from the count.
Julen Ruiz Aizpuru
2018-10-20 18:05:41 UTC
Permalink
Post by Michael Bauer
Can someone please switch off Google Translate for gd as soon as
possible?
Can this please be disabled for 'eu' as well? Thanks!

Julen.
Matjaz Horvat
2018-10-20 18:08:03 UTC
Permalink
Done.
Emin Mastizada
2018-10-20 20:20:53 UTC
Permalink
Hi Matjaz,
Yes, Google has many languages, but it doesn't make sense for some of them
(I can't get good results for Azerbaijani, Turkish and Russian). The best
part of Microsoft Translation was that it wasn't machine translation; as I
remember, it was Microsoft's own translations for projects like Windows
etc.

I was going to make a PR to add a selection in user settings. It would be
awesome if you could turn it off for "az" too.
Francesco Lodolo
2018-10-21 06:26:34 UTC
Permalink
We were using the Microsoft Translation API [1]; as far as I know, there are
no results coming from their terminology database [2].

Francesco

[1] equivalent to https://www.bing.com/translator
[2] https://www.microsoft.com/language/
Matjaz Horvat
2018-10-21 06:38:28 UTC
Permalink
Post by Emin Mastizada
The best
part of Microsoft Translation was that it wasn't machine translation; as I
remember, it was Microsoft's own translations for projects like Windows
etc.
No, Microsoft translations actually come from a different service -
Microsoft Terminology - and we still use that:
https://www.microsoft.com/language/

You can find those translations marked as "Microsoft" in Machinery.

-Matjaž
Michal Stanke
2018-10-21 09:27:25 UTC
Permalink
I started translating new strings in Pontoon this morning, and I also have
my two cents about Google Translate.

At first the results seemed rubbish, but for longer texts they are
surprisingly good. However, Google does not respect our terminology and
style, which makes me a little nervous about these machine suggestions
being submitted without checks, all the more so because they are shown
as a 100% match in Machinery.

My personal workflow is to quickly walk through the new strings, as a
stretch submit those that have a very close match from the translation
memory (e.g. were moved or slightly reworded), and then roll up my
sleeves for the actual work on the rest. The Google Translate entries in
Machinery disturb this flow by being shown as a 100% match, so I am
actually worried I will make the mistake myself.

Matjaž, can we please remove the percentage from MT entries completely,
plus sort them to the end of the list in machinery, below any more or
less accurate matches from TM or MS terminology? Then one can
distinguish MT from TM just by the presence of the green number, without
actually reading the name of the origin. Moving them to the bottom of
the list can reduce the risk from the paragraph above, that someone
would blindly suggest or even approve MT results without proper check.
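
To make the request concrete, here is a minimal sketch in Python of the ordering I have in mind. The entry dictionaries, the "google-translate" source label and the function name are hypothetical, not Pontoon's real structures: MT entries lose their percentage and are pushed below everything that has a real match score.

```python
# Sources treated as machine translation (label is an assumption for this sketch).
MT_SOURCES = {"google-translate"}


def order_machinery(entries):
    """Drop the match percentage from MT entries and sort them to the end.

    Each entry is a dict with 'source', 'text' and 'quality' (int or None).
    Entries with a quality are sorted best match first; entries without one
    keep their relative order at the bottom of the list.
    """
    prepared = [
        {**entry, "quality": None} if entry["source"] in MT_SOURCES else entry
        for entry in entries
    ]
    return sorted(
        prepared,
        key=lambda e: (e["quality"] is None, -(e["quality"] or 0)),
    )
```

Because the sort is stable, several MT entries would stay in the order the services returned them.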

Wow, this e-mail got a lot longer than I expected. I like the quality of
the new MT for quite a portion of the strings. It's not the quality we want
to have in the product, but in places it's so good that it makes me
worry we would stop properly checking it and half-blindly accept
whatever it produces.
--
Michal Stanke
Matjaz Horvat
2018-10-21 15:26:41 UTC
Permalink
Post by Michal Stanke
Matjaž, can we please remove the percentage from MT entries completely,
plus sort them to the end of the list in machinery, below any more or
less accurate matches from TM or MS terminology? Then one can
distinguish MT from TM just by the presence of the green number, without
actually reading the name of the origin. Moving them to the bottom of
the list can reduce the risk from the paragraph above, that someone
would blindly suggest or even approve MT results without proper check.
That was indeed misleading (and different from the old
implementation), and several other users complained about it.

It's gone now.

Sorry for the inconvenience.

-Matjaž
Michal Stanke
2018-10-21 19:08:30 UTC
Permalink
Thank you, much better now.

--
Michal Stanke
Emin Mastizada
2018-10-23 21:22:53 UTC
Permalink
Sorry about that; there were no new strings matching the MS Terminology
on that day, and all suggestions were from translation history and
Google. I think MS Translations never worked for "az".

There is no good MT for "az", and having a suggestion is misleading as we are
using Machinery and locales at the same time.
--
Emin Mastizada
Python/Django Developer
https://www.mastizada.com
Eduardo Trápani
2018-10-24 13:58:24 UTC
Permalink
Hi, could you also turn it off for 'eo'?

Thanks.
Jeff Beatty
2018-10-24 14:07:30 UTC
Permalink
Hey everyone,

Hijacking this thread to share a bit of information.

No MT solution will be perfect; but we can do more to improve the quality
of the output coming from Google Translate, as well as flag that output as
coming from machine translation in Pontoon (btw, thank you all for those
great ideas). Because it's the end of the year, we've elected to implement
the API that gives us access to the generic neural + phrase-based MT engine
for the remainder of 2018. In 2019, we'll invest in training Google's MT
engine on Mozilla data and benchmarking the output quality. We're hoping
that the improved output plus Pontoon-based quality features will make this
a useful tool for localizers rather than a distraction, disruption, or a
catalyst for creating more work for reviewers.

Sorry for any temporary disruption this has caused.

Jeff
Rhoslyn Prys
2018-10-24 14:33:23 UTC
Permalink
Hi Jeff and everyone,

Google Translate works pretty well in Welsh, I'd reckon 80%ish, I used
it on Firefox Monitor this morning and I'm happy. :-)

Some issues:

1. It produces a code (&#39;) when there should be an apostrophe

Mae eich cyfrineiriau&#39;n gwarchod mwy na&#39;ch cyfrifon. / Mae eich
cyfrineiriau'n gwarchod mwy na'ch cyfrifon.

Google Translate does this in Poedit as well, though I've not noticed
this on the Google Translate website. Does this, or something
similar, occur in other languages? Any chance of stopping this?

2. Terminology - it would be useful to be able to stipulate certain terms.

Otherwise, pretty good.

Rhos
Michael Bauer
2018-10-24 14:46:37 UTC
Permalink
For Gaelic it always puts the ' in the wrong place i.e. instead of a'
dèanamh, it does a 'dèanamh. We have yet to find a way to get Google to
fix *any*thing, including this. I think they do languages like Welsh and
Gaelic, lock them in a room and bury the key next to the witch king of
Angmar...

Michael
Rhoslyn Prys
2018-10-24 15:28:35 UTC
Permalink
Michael, there is still hope: Linguists, update your resumes because
Baidu thinks it has cracked fast AI translation

https://www.theregister.co.uk/2018/10/24/ai_translation_baidu/

"Translating between Japanese and German to English and Chinese,
therefore, more difficult. “There is a well-known joke in the UN that a
German-to-English interpreter often has to pause and “wait for the
German verb”. Standard Arabic and Welsh are verb-subject-object , which
is even more different from SVO,” he said."

They must be working on the Celtic languages... :-0
Jeff Beatty
2018-10-24 14:48:34 UTC
Permalink
Good to know, Rhoslyn, thanks for sharing.

I remember that we had set up Celtic locales to also use a Celtic
language-specific MT engine. I hope this change didn't override that. If
so, we should get a bug on file.

Jeff
Matjaz Horvat
2018-10-24 15:16:10 UTC
Permalink
Post by Rhoslyn Prys
1. It produces a code (&#39;) when there should be an apostrophe
We're just testing a fix for this; you can check it out on the Pontoon stage server:
https://mozilla-pontoon-staging.herokuapp.com/cy/

The difference is that the new behaviour tells the API that the
content we're sending is "plain text" and not "html". For details,
please see https://bugzilla.mozilla.org/show_bug.cgi?id=1501088
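
For illustration only, a minimal Python sketch of that difference. It assumes the request goes to the Google Cloud Translation v2 REST API, whose `format` parameter accepts "html" or "text"; the function names and the hard-coded "en" source are made up for the example and this is not the actual Pontoon code. The second helper shows a fallback that decodes entities such as &#39; if a response still contains them.

```python
import html


def build_translate_params(text, target, api_key, text_format="text"):
    """Build query parameters for a v2 translate request.

    Passing format="text" asks the service to treat the input as plain text,
    so apostrophes should not come back as HTML entities.
    """
    return {
        "q": text,
        "source": "en",  # assumption: source strings are English
        "target": target,
        "format": text_format,
        "key": api_key,
    }


def decode_entities(translation):
    """Fallback: turn HTML entities in a translation back into characters."""
    return html.unescape(translation)
```

The parameters would then be sent with whatever HTTP client is already in use; no network call is made in this sketch.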
Post by Jeff Beatty
I remember that we had set up Celtic locales to also use a Celtic
language-specific MT engine. I hope this change didn't override that. If
so, we should get a bug on file.
We use https://github.com/kscanne/caighdean, but it only supports ga-IE.

-Matjaž
Rhoslyn Prys
2018-10-24 15:56:49 UTC
Permalink
Matjaž, that appears to be working on the stage server. I tried a few
strings on AMO and they were OK, no &#39;.

Thanks.
Gabriela Montagu
2018-10-24 16:34:09 UTC
Permalink
It works quite well for es-AR; that is, no more gibberish like before.
It doesn't use the informal form of address "vos" that we use for websites,
though, but the LATAM "tu".
Matjaz Horvat
2018-10-24 14:30:45 UTC
Permalink
Post by Eduardo Trápani
Hi, could you also turn it off for 'eo'?
Disabled.

To add to what Jeff said: next year we're also planning to make
several changes to the translation interface and one of the proposals
is to make the contents of the Machinery always visible (instead of
hiding them behind a tab). That would eliminate the problem the
"count" is causing in the Machinery tab right now.

-Matjaž