JoelNothman.com

28 September, 2008

Hebrew-English online translation

Filed under: Hebrew, Technology by Joel @ 12:22 pm, 28 September 2008.

It seems Google Translate has finally added Hebrew to its canon of translated languages (along with another 35). They don’t appear to have translation from web search enabled yet, but you can play with it (translate Dutch to Hebrew, for instance) at Google Translate.

I borrow the example text used in one reporting blog:

משטרת גרמניה עצרה שני צעירים בחשד שהתכוונו לבצע פיגוע במטוס של חברת התעופה ההולנדית קיי-אל-אם. כוחות משטרת גרמניה פשטו על המטוס שחנה בשדה התעופה בקלן, זמן קצר לפני שהמריא בחזרה להולנד והוציאו ממנו את שני הצעירים, אזרח גרמני יליד סומליה בן 24 ואזרח סומליה בן 23.

Google Translate says:

German police arrested two youths suspected Shaatcwano an attack on the plane of Dutch airline Kay - to - if. German police forces raided the plane parked at the airport Cologne, shortly before Smria Leclnde back and took him to the two young men, a German citizen born in Somalia 24 Uezarh Somalia age 23.

There are a number of interesting things here:

Assuming something is a proper name if it can’t otherwise be understood is quite a normal approach. But it’s unusual that Google has particular trouble with “שהתכוונו”, “שהמריא” and “ואזרח”, which I don’t consider particularly uncommon words. These, and the messed up “להולנד” all have the common feature of attached prefixes (proclitics), and Google gets it right for all but “המריא” when these are removed. Obviously their word segmentation systems could be improved, or could be adjusted so that if the end system resorts to considering it a proper noun, it might go back and check whether there were some proclitics it failed to lop off. In practice, implementing such a feedback loop may not be worthwhile if the system wants to be fast.
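The fallback suggested above can be sketched roughly as follows. This is a toy illustration only, not a description of how Google's system works: the lexicon, the list of proclitic letters and the function names are all assumptions made for the example.

```python
# Hypothetical toy lexicon; a real system would use a full dictionary or model.
LEXICON = {"התכוונו": "intended", "אזרח": "citizen", "הולנד": "Holland"}

# Common single-letter Hebrew proclitics (assumed list for this sketch).
PROCLITICS = "ושהבכלמ"


def strip_proclitics(token, lexicon, max_prefix=3):
    """Return (prefix, stem) for the shortest run of leading proclitic
    letters whose removal leaves a known word, or None if there is none."""
    for cut in range(0, max_prefix + 1):
        prefix, stem = token[:cut], token[cut:]
        # Stop as soon as the candidate prefix contains a non-proclitic letter.
        if any(ch not in PROCLITICS for ch in prefix):
            break
        if stem in lexicon:
            return prefix, stem
    return None


def translate_token(token, lexicon):
    """Only fall back to 'proper noun' after trying to lop off proclitics."""
    hit = strip_proclitics(token, lexicon)
    if hit is None:
        return ("PROPER_NOUN", token)
    prefix, stem = hit
    return (prefix, lexicon[stem])
```

With this toy lexicon, `translate_token("ואזרח", LEXICON)` finds the stem after removing the ו, while a word like קלן, which is not in the lexicon, is still treated as a proper noun. The extra lookups are the speed cost mentioned above.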

Go take a look at the proper names it forms. It puts some funny letters in there, transliterating:

  • ה ([h]) as nothing (which a lot of Israelis do, but I’m guessing that the system is being hugely biased by the silent הs at the ends of many female names);
  • ו ([v]) as “w”, maybe because “w” always translates to Hebrew in names as ו, but it makes Google look very academic (or Iraqi/Yemenite) to transliterate the vavs in words as waws.
  • כ ([k]) becomes “c”, but so does some non-existent letter in להולנד! What’s going on there?
  • ח (usu. [x]) becomes “h” (rather than “ch” or “kh”), but I guess it is only ever found when transliterating Arabic names, and Ahmed is more common than Achmed.
  • The vowels are also interesting. Especially the spurious “e” on the end of להולנד, but it’s already clear that it’s done a strange job on that one.

Kay - to - if (KLM) is obviously entertaining, but there’s not really much to say about it (except that apparently they split tokens on hyphens).

The most interesting phrase translation is “and took him to the two young men” from “והוציאו ממנו את שני הצעירים”. It would appear as if they took the ו on the end of והוציאו as referring to the object (והוציאוֹ) rather than the subject (והוציאוּ), but seeing as the former is quite rare in contemporary written Hebrew, this may mean they have a wide variety of texts from various ages. And then ממנו seems to disappear altogether. So maybe I’ve just misinterpreted how the system makes a mistake. At the end of the day, the system is all numbers, so no one can really be certain how it made the mistake…

One of the few other online Hebrew-English translation services is Reverso:

A police of Germany stopped two young on suspicion that meant to execute an attack in the airplane of the Dutch airline KAY but them. Forces a police of Germany spreaded on the airplane that parked in the airfield Bkln, a short time before took Off back/in return to Holland and withdrew from him you two the young, German born citizen Somalia ben 24 and citizen Somalia ben23.

Comparing the two translations, we see that Reverso generally does a better job of splitting off proclitics and so makes fewer apparent mistakes. But its grasp of grammar is certainly much poorer, in both English and Hebrew: it thinks, for instance, that “צעירים” should be understood as an adjective rather than a noun; that one makes an attack in a plane rather than on it; that the singular משטרת should be translated “a police”; and that “את” is better translated “you” than treated as a direct-object marker. Compare also Google’s handling of the compound noun phrase “כוחות משטרת גרמניה” as “German police forces” rather than “Forces a police of Germany”. Also interesting is Reverso’s offering of a choice for בחזרה as “back/in return”.

Overall, while Reverso handles word segmentation somewhat better, Google has a much more fluid grammar and chooses more appropriate words in translation.

I haven’t tried translating the other direction (English to Hebrew) yet, or any other combination of languages where I would be under-qualified. I leave that as an exercise to the reader.

And no, they don’t do Yiddish yet. Real Soon Now.

Yes, it’s been a long time. Yes, I won’t be talking much here till November. Shana tova anyway! Enjoy translating your New Year cards from strange Israeli rellies…

29 May, 2008

No q in Nakba

Filed under: Language, Society and culture by Joel @ 10:00 am, 29 May 2008.

After a few articles about “Al-Naqba” in the AJN, I wrote to suggest that they should be using a k and not a q:

There is no q in “Al-Naqba”. The Arabic spelling includes the equivalent of a Hebrew kaf, not their quf.

It seems ‘q’ is used, often by Jewish sources, to Arabise the word and make it seem more foreign and distasteful.

Even the spellings of words can express one’s biases, just as “Moslem”, once an accepted variant, is now considered more derogatory than “Muslim”.

The AJN should utilise the more neutral and accurate spellings, and write articles on “Nakba” rather than “Naqba”.

The printed letter stops after the second paragraph, which I should perhaps have made clearer: I do not accuse the Jewish press of a conspiracy to use a stigmatised spelling variant. Language is more subtle and subconscious than that.

I try not to dictate others’ language use. In the case of a newspaper, though, there are always editorial style guides, and I wanted to point out two factors in the spelling of this word:

  1. Phonology: there is a letter q in Arabic, but it’s not used in the word “nakba”.
  2. Sociolinguistics: people have a choice to use “nakba” or “naqba” as both are found in the English press (according to Google, in a ratio of about 10:1). They may actually use the latter because they perceive it as a more “authentic” transliteration. Of course, it is not. On the other hand, it does make the word look more foreign, and so its use carries some pre-conceived “Arab” feeling that makes the word no longer neutral.

Of course, the word is naturally not a neutral word, whichever way it is spelt. People will often react to it either with distaste or with pride. Nonetheless, it shouldn’t be spelt in the “unbiased press” in a way that shows one’s side and one’s ignorance more than necessary.

29 January, 2008

On swearing and swearing: sociolinguistics and the third commandment

Filed under: Halakha, Hebrew, Language, Tanakh by Joel @ 12:30 am, 29 January 2008.

The Third Commandment treats the matter of mistreating God’s name quite bluntly:

Do not take the name of the Lord your God in vain; for the Lord will not acquit one who takes His name in vain.

Rashi follows the translation of Onkelos in suggesting that the repeated “taking in vain” is once an injunction against those who swear by the Name falsely, and once against those who swear needlessly.

Judaism abounds in traditions of protecting the sanctity of Divine Names in writing, and avoiding them in speech except when necessary. In fact, (להבדיל) the Rabbinic manner of protecting the divine name has taken on characteristics commonly found in linguistic taboo associated with swearing (the other type), euphemism, or political correctness.

23 December, 2007

Evening’s roses: erev shel shoshanim

Filed under: Hebrew, Music, Poetry by Joel @ 5:41 pm, 23 December 2007.

Another upcoming wedding, another song. Erev shel shoshanim is a classic. Unfortunately, the first few results for translations of its lyrics are far too literal and can hardly be sung to its beautiful tune.

The original song also approximately rhymes the 2nd and 4th lines of each of its three stanzas, which none of those translations do. So here is my go at a singable translation of Erev Shel Shoshanim:

Evening of roses
Let’s go out among the trees
Spices, perfumes, sweetest myrrh
Furnish beneath your knees

Slowly the nighttime falls
A rose-scented wind above
I whisper to you, my love, a song
Softly a song of love

At dawn, a cooing dove
Your hair’s filled with moisture’s beads
Your lips to the morning are a rose
The rose that I pick for me

Erev shel shoshanim
Netze na el habustan
Mor besamim ulevona
Leraglech miftan

Layla yored le’at
Veruach shoshan noshva
Hava elchash lakh shir balat
Zemer shel ahava

Shachar homa yona
Roshech malei telalim
Pikh el haboker shoshana
Ektefeinu li

ערב של שושנים
נצא נא אל הבוסתן
מור בשמים ולבונה
לרגלך מפתן

לילה יורד לאט
ורוח שושן נושבה
הבה אלחש לך שיר בלאט
זמר של אהבה

שחר הומה יונה
ראשך מלא טללים
פיך אל הבוקר שושנה
אקטפנו לי

26 November, 2007

Strength and yearning: translating Hebrew poetry

Filed under: Hebrew, Music, Poetry by Joel @ 12:23 am, 26 November 2007.

I just came back from the first in a series of close friends’ weddings. All in all it was beautiful and a lot of fun. As the bride entered, I and another three (including her grandmother) sang (two verses of) a setting of a 17th century poem, based on the Song of Songs, which I also had the opportunity to translate.

Having never tried to translate poetry before, it was an exciting challenge. Some poems require a literal translation; others need to have the right sense but also the rhythm and rhyme. In this case, I chose the latter.

With the help of others, especially Simon Holloway, this is what we came up with:

Chishki Chizki (חשקי חזקי) by Isaac Aboab da Fonseca (1605-1693)

My strength, my yearning day by day:
O king, dispel my dark away!
My source, my sun, though still so bright:
Your sun, my king, shall give me light.

Awake; Awake! O ten-stringed lyre:
Sing all your songs in voiced desire.
Your moon, your glow, need not return:
Here comes your light; my light is born.

חִשְׁקִי חִזְקִי מִדֵּי יוֹם יוֹם
מַהֵר הָאֵר מַלכִּי חָשׁכִּי
רִמְשִׁי שִׁמְשִׁי עוֹד לֹא יִכְבֶּה
יָאִיר לִי אוֹר שִׁמְשֵׁךְ מַלְכִּי

עוּרִי עוּרִי נֵבֶל עָשׂוֹר
בְּקוֹל זִמְרָה שִׁירִים שִׁירִי
יַרְחֵךְ זַרְחֵךְ לֹא יָבוֹא עוֹד
כִּי בָא אוֹרֵךְ קוּמִי אוֹרִי

8 November, 2007

Abraham in discourse

Filed under: Hebrew, Tanakh by Joel @ 11:24 pm, 8 November 2007.

Genesis reports Abraham being involved in a few very intense dialogues, and it is interesting to notice some of the phrases he introduces his speech with. In chapter 15, his address to God is “My lord, Hashem”. When bargaining with God over the lives of the people of Sodom (chapter 18), he is more elaborate:

  • Here I venture to speak to my Lord, I who am but dust and ashes… (הנה-נא הואלתי לדבר אל אדני ואנכי עפר ואפר)
  • Let not my Lord be angry if I go on… (אל-נא יחר לאדני ואדברה)
  • And again: Here I venture to speak to my Lord… (הנה-נא הואלתי לדבר אל אדני)
  • Let not my Lord be angry if I speak even this last time… (אל-נא יחר לאדני ואדברה אף-הפעם)

Appropriate language to speak with God? Maybe, but when it comes to negotiations with men, the relationship is more equal. Abraham discusses the purchase of a burial site for his late Sarah in chapter 23, and from both parties involved, the speech introduction is usually “my lord, hear me” (אדני שמעני) or “hear me, my lord” (שמעני אדני) or “no, my lord, hear me” (לא אדני שמעני) or “but if you will hear me” (אך אם אתה לו שמעני). Listening skills are in high demand, but…

29 October, 2007

Regular expressions for Mishnaic tractates

Filed under: Hebrew, Judaism, Technology by Joel @ 4:37 pm, 29 October 2007.

Various transliteration conventions (or a lack thereof) and dialectal differences make it very difficult at times to gather all possible variations for transcribing Hebrew words into English characters. This can make using search engines to find Hebrew terms in English sources very difficult, or could make it hard for a piece of software to identify what someone is referring to when they enter a string of text. For example, biblical book names each have a number of ways of being written, and my BibRef solves this by simply storing a list of alternative names and abbreviations.

Another way of identifying an entered string with one of many options is with regular expressions. As such, I have attempted below to devise regular expressions to match all expected spellings for each tractate (masechet, masekhet, maseches, meseches, etc.) of the Mishnah. Please note that this is only a draft: I expect to improve the regular expressions, and feedback is much appreciated.
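To give a flavour of the approach, here is a minimal sketch in Python for the word “masechet” itself. The pattern is an assumption made for illustration and is not one of the draft expressions: it covers the four spellings listed above and, by allowing a doubled s and a final “th”, a few more that may or may not be attested.

```python
import re

# Illustrative pattern only: m + a/e + s or ss + e + ch/kh + e + t/th/s.
MASECHET = re.compile(r"m[ae]ss?e(?:ch|kh)e(?:th|t|s)", re.IGNORECASE)


def is_masechet(word):
    """True if the whole word is one of the spellings the pattern allows."""
    return MASECHET.fullmatch(word.strip()) is not None


for candidate in ["masechet", "masekhet", "maseches", "meseches", "mishnah"]:
    print(candidate, is_masechet(candidate))
```

Using `fullmatch` rather than `search` keeps the pattern from firing on longer words that merely contain one of these spellings.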

Using this as a background study, it may be possible to automate the building of regular expressions for Hebrew words (with vowels given), although many of the expressions below also cover a number of irregularities that would be hard to incorporate into such a builder. Consequently, one could also build a list of all possible alternative spellings for a word, which could then be used with a search engine to make searches of these Hebrew words comprehensive. (Edit: the current expressions below overgenerate way too much and would probably be inappropriate for that task.)
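The spelling-list idea can be sketched as follows, again only as a rough illustration. The slot table is hypothetical: each position in the word gets a tuple of assumed alternatives, and every combination is generated.

```python
from itertools import product

# Hypothetical slot table for "masechet": one tuple of alternatives per position.
SLOTS = [("m",), ("a", "e"), ("s",), ("e",), ("ch", "kh"), ("e",), ("t", "s")]


def spellings(slots):
    """Expand every combination of per-slot alternatives into a spelling."""
    return ["".join(parts) for parts in product(*slots)]


variants = spellings(SLOTS)
print(len(variants), variants)
```

Even this small table yields eight spellings, some of which (such as “mesekhes”) mix conventions that one writer would be unlikely to combine, which hints at how quickly such a generator overgenerates.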

13 October, 2007

October

Filed under: Language by Joel @ 7:35 pm, 13 October 2007.

I pointed out this time last year that the Hebrew month of (Mar)cheshvan actually comes from the Akkadian for “eighth month”. So it means the same as October.

Nonetheless, October is the tenth month, and (Mar)cheshvan is the second.

3 October, 2007

Pleasing petitions - a change of vowels

Filed under: Hebrew, Siddur by Joel @ 4:27 pm, 3 October 2007.

On festivals, before Kohanim bless the congregation, Ashkenazim insert an alternative nusach for the “avodah” beracha of the amida prayer:

ותערב לפניך עתירתנו כעולה וכקרבן. אנא, רחום, ברחמיך הרבים השב שכינתך לציון עירך, וסדר העבודה לירושלים. ותחזינה עינינו בשובך לציון ברחמים, ושם נעבדך ביראה כימי עולם וכשנים קדמוניות. ברוך אתה ה’ שאותך לבדך ביראה נעבוד.

May our petition be pleasing before you as a sacrificial offering. Please, the Merciful, in your great mercy, return your presence to Zion your city, and the temple service to Jerusalem. And may our eyes see your return to Zion with mercy, and there we shall serve you in awe as in ancient times and earlier years. Blessed be you, Lord, for you alone will we serve in awe.

It is a beautiful prayer and, it seems, has an interesting history. A few days ago I was also alerted to a variation in the vowels of its first word. We find:

וְתֵעָרֵב – vetēʿārēv
in Artscroll
וְתֶעֱרַב – veteʿĕrav
in “Adler”, “Birnbaum”, Hebrew Publishing Co. 1928, Koren, Meforash, Routledge, Shilo, “Singer”

The meaning is apparently unaffected by the change of vowels. I have become used to the Artscroll version, and yet I prefer the alternative, and not just because it is much more popular. Rather, here’s why…

26 August, 2007

Scribal law and children as judges

Filed under: Halakha, Hebrew, Paleography by Joel @ 12:01 pm, 26 August 2007.

I’ve been reading the Laws of Tefillin in the Mishna Berura, particularly its descriptions of the laws for scribes. It gives incessant detail for what makes a particular letter valid and what doesn’t. And then it prescribes that in cases of doubt, one should ask a “תינוק שאינו לא חכם ולא טיפש” (Shulchan Aruch Orach Chayim 32:16), a child that is neither clever nor stupid, to attempt to identify the letter.

In modern Hebrew, a תינוק is a baby (as the etymology implies: the word derives from ינק, to suckle). But in order to read, the child obviously needs to be older than that. The Kitzur Shulchan Aruch (24:5) writes that such a child is one “שאין מבין את העניין, אבל יודע ומבין את האותיות”: he doesn’t understand the issue, but knows and understands the letters. Similarly, the Mishna Berura (32:49) explains that such a child-decisor is too clever if he understands the issues of when letters need to be fixed, but not too clever if he knows the letters well and can’t read the words. On the other hand, one too stupid cannot read the letters; all in all, one who can read the letters, even if not proficient or expert in them, may judge in such a case (32:50).

[Figure: final nun and kaf in modern and scribal print]

So here comes the issue. Most children nowadays are not taught scribal letters first-off. Most would be taught the alphabets of modern printed Hebrew: either what we find in our siddurim, or in Modern Israeli printed texts. And these are all significantly different from the prescribed scribal art. Even I, for instance, might initially read a valid but thin scribal ך (final kaf) as a ן (final nun), because although the nun of the scribe is very different to their kaf, I am more familiar with a printed nun.

How is a child raised on one script meant to identify letters in another?

Is there a halakhic solution to this problem?
