Buckwalter transliteration

The Buckwalter Arabic transliteration was developed in 1988 as part of the ALPNET Arabic Project, which was run by Dr. Ken Beesley.

Start

The first Arabic language analyst for the project was Derek Foxley, a BYU undergraduate hired part-time, who was then taking fourth-year Arabic courses at BYU. (The footnotes on the first page of one of Dr. Beesley's first presentations, given at the University of Utah in 1989, list the contributors up to that point in order.) Tim Buckwalter was hired several months later as a full-time employee of ALPNET; he was also a PhD candidate in Arabic at the time. One of his tasks on the project was to collaborate with Derek and assign him Arabic language tasks.

Dr. Beesley mentored Tim and Derek in some of the finer details of linguistics (all three shared the same 20x20 office in Provo, Utah, working on DEC and Sun SPARC workstations). One day at the whiteboard, Dr. Beesley prodded Derek and Tim to come up with a transliteration schema on the spot. Derek, who had entered most of the data up to that point in the project, was ready to address this, and in close collaboration with Tim he came up with nearly all the characters used in the transliteration table. Tim oversaw Derek's Arabic tasks and made the final adjustments and refinements to the table. The schema had no name at the time; however, in the years following the project Tim entered thousands of textual items using it, and presented and championed it many times. It was therefore appropriately named after him.

At the time, no such one-for-one letter transliteration was in use, or at least none that the team was aware of.

Dr. Beesley later moved to Xerox, which bought the rights to the ALPNET data in the 1990s. This is documented in several other articles that Dr. Beesley has presented over the years.

Commentary on the System

The Buckwalter Transliteration is an ASCII-only transliteration scheme that represents Arabic orthography strictly one-to-one, unlike the more common romanization schemes, which add morphological information not expressed in Arabic script. Thus, for example, a wāw is transliterated as w regardless of whether it is realized as a vowel /uː/ or a consonant /w/. Only when the wāw is modified by a hamzah (ؤ) does the transliteration change to &. This allows the user to type or convert text exactly as it is seen.
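The one-to-one, character-for-character behavior can be illustrated with a minimal Python sketch. It maps only a handful of letters; the names `BW_SUBSET` and `to_buckwalter` are hypothetical and not taken from any published Buckwalter tool. Unmapped characters are passed through unchanged.

```python
# Illustrative subset of the Buckwalter table; the full table maps many more characters.
BW_SUBSET = {
    "\u064A": "y",  # ي yāʾ
    "\u0645": "m",  # م mīm
    "\u0646": "n",  # ن nūn
    "\u0631": "r",  # ر rāʾ
    "\u0648": "w",  # و wāw, whether read as /w/ or /uː/
    "\u0624": "&",  # ؤ wāw carrying a hamzah
}
_TABLE = str.maketrans(BW_SUBSET)


def to_buckwalter(text: str) -> str:
    """Replace each mapped Arabic character with exactly one ASCII character.

    Characters not in BW_SUBSET are passed through unchanged.
    """
    return text.translate(_TABLE)


print(to_buckwalter("\u064A\u0648\u0645"))        # يوم "day" (consonantal wāw): ywm
print(to_buckwalter("\u0646\u0648\u0631"))        # نور "light" (long vowel ū): nwr
print(to_buckwalter("\u0645\u0624\u0645\u0646"))  # مؤمن "believer": m&mn
```

Because no vowel information is added, both يوم and نور receive the same w; only the hamzah-bearing ؤ changes the output.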

However, the transliteration schema has drawn some criticism. Some users find the unmodified letters straightforward to read (except for * for dhāl, E for ʿayn, and v for thāʾ), but the transliterations of letters with diacritics and of the harakat take some time to get used to: for example, the nunated endings -un, -an, and -in appear as N, F, and K, and the sukūn ("no vowel") as o. Taʾ marbūṭah ة is p. This difficulty probably arises because the Buckwalter Transliteration is usually presented without the rationale behind the letters. Though these letters seem arbitrary, they are actually linked mnemonically to the original letters.

Furthermore, since the original Buckwalter scheme was developed, several variants have emerged, not all of them standardized. Buckwalter transliteration uses characters that are special in XML, so "XML-safe" versions often replace the characters < > & (إ, أ, and ؤ respectively); Buckwalter suggests transliterating them as I, O, and W respectively. Completely "safe" transliteration schemes replace all non-alphanumeric characters (such as ; and *) with alphanumeric characters. For a complete description of the different Buckwalter schemes, and a more detailed discussion of the trade-offs between them, see Habash (2010).[1]
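The XML-safe substitution can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the input is taken to be plain Buckwalter text with no embedded Latin, and the function names `to_xml_safe` and `from_xml_safe` are hypothetical. Standard Buckwalter does not otherwise use I, O, or W, which is what makes the reverse step unambiguous here.

```python
import xml.etree.ElementTree as ET

# XML-safe substitution as suggested above: < -> I, > -> O, & -> W.
# Assumes plain Buckwalter input; standard Buckwalter does not use I, O or W,
# so the substitution can be reversed unambiguously.
_TO_XML_SAFE = str.maketrans({"<": "I", ">": "O", "&": "W"})
_FROM_XML_SAFE = str.maketrans({"I": "<", "O": ">", "W": "&"})


def to_xml_safe(bw: str) -> str:
    return bw.translate(_TO_XML_SAFE)


def from_xml_safe(bw: str) -> str:
    return bw.translate(_FROM_XML_SAFE)


word = "{lo<ixaA'i"  # ٱلْإِخَاءِ, from the sample sentence in this article
safe = to_xml_safe(word)
print(safe)  # {loIixaA'i

# The safe form can sit inside an XML element; the raw form would not parse.
elem = ET.fromstring("<w>" + safe + "</w>")
print(from_xml_safe(elem.text) == word)  # True
```

If Latin text containing I, O, or W is mixed into the transliteration, the reverse step would corrupt it, which is one of the trade-offs between schemes mentioned above.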

When transliterating Arabic text, several other issues may arise. First, some Arabic characters are not specified in the transliteration table, including non-alphabetic characters such as ۞ and ۝, punctuation such as ؛ and ؟, and "Hindi" or "Eastern Arabic" numerals. Similarly, Arabic sentences sometimes borrow non-Arabic letters from Persian, some of which are defined in the full Buckwalter table.[2] Symbols that are not defined in the transliteration table may be deleted, kept as non-Latin symbols embedded in the transliterated text, or transliterated into different (non-conflicting) Latin symbols. (For instance, it is straightforward to convert Hindi numerals to Western Arabic numerals.)

Another issue is how to handle Arabic text with embedded ASCII text, for instance an Arabic sentence that refers to "IBM" or that includes a quotation in English. If the Latin text is not explicitly marked, it is difficult to distinguish transliterated Arabic from Latin, and if transliterated text with embedded Latin is later transliterated back to Arabic, the Latin text will be converted into meaningless Arabic. Finally, an important decision is how much normalization of the Arabic text should be done during transliteration. This may include removing ـ kashida, removing short vowels and/or other diacritics, and/or normalizing spelling.[1]
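One possible normalization policy can be sketched in Python: convert Eastern Arabic ("Hindi") digits to ASCII digits, remove the kashida, and optionally strip diacritics. The function name `normalize` and the choice of which marks count as diacritics are assumptions made for illustration. Spelling normalization is not attempted, and the extended (Persian) digits U+06F0 to U+06F9 are not handled.

```python
import re

# Eastern Arabic ("Hindi") digits U+0660..U+0669 -> ASCII 0..9.
_HINDI_DIGITS = str.maketrans({chr(0x0660 + i): str(i) for i in range(10)})
_KASHIDA = "\u0640"  # ـ tatwīl / kashida
# Tanwīn, short vowels, shadda and sukūn (U+064B..U+0652) plus dagger alif (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def normalize(text: str, strip_diacritics: bool = False) -> str:
    """Normalize Arabic text before transliteration (one possible policy)."""
    text = text.translate(_HINDI_DIGITS)
    text = text.replace(_KASHIDA, "")
    if strip_diacritics:
        text = _DIACRITICS.sub("", text)
    return text


print(normalize("\u0661\u0669\u0668\u0668"))  # ١٩٨٨ -> 1988
# يُولَدُ with its short vowels removed becomes يولد
print(normalize("\u064A\u064F\u0648\u0644\u064E\u062F\u064F", strip_diacritics=True) == "\u064A\u0648\u0644\u062F")  # True
```

Keeping diacritic stripping optional reflects the point above that how much to normalize is a decision, not a fixed rule.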

On the other hand, the typical punctuation marks one would expect to use, such as ! @ # % ? . , ; : ( ) [ ] + and =, were not used because they also occur in Arabic text. Thus, in the original concept, if an English word such as IBM appeared in Arabic text, it was supposed to be marked by putting double quotes around it ("IBM"). This mechanism allows automatic language processing to leave the quoted non-Arabic text as is, unprocessed. Originally, even < > & were not used, especially < and >, because they resemble the French-style quotation marks « » that are occasionally used in Arabic text. These were added later out of necessity. Their XML-safe versions keep to the mnemonic device (discussed below), in that I, O, and W correspond (if imprecisely) to the sounds of the letters they represent.

Key Concepts in Development of the Table

There were three key concepts used in the transliteration schema:

The first was that each Arabic letter (sound) could correspond to only one English-language character. Some Arabic letters have sounds that are conventionally written with two English letters (for example "th" or "sh"), so a single letter or common symbol had to be used for them instead.

The second concept was to use the familiar if possible. If an Arabic letter had always been associated with the letter “s” in English, for example, then it would be easier to remember if it could be kept that way. (Don't reinvent the wheel!)

The third key concept was that the table had to be fully and easily mnemonic. Therefore, every single item correlates, in the following order of preference, a) to the sound of the Arabic letter, b) to a physical aspect of the original Arabic letter, or c) to the name it is called.
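The first concept, one letter to one character, is what makes the transliteration reversible. A small Python sketch can check that property before building the reverse table; the subset `BW` and the function `invert_one_to_one` are hypothetical names used only for illustration. A digraph such as "th", or two letters sharing one symbol, is rejected.

```python
# Illustrative subset of the table; the full table is larger.
BW = {
    "\u0627": "A", "\u0628": "b", "\u062A": "t", "\u062B": "v",
    "\u062F": "d", "\u0630": "*", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0639": "E",
}


def invert_one_to_one(mapping: dict) -> dict:
    """Return the reverse mapping, refusing tables that are not one-to-one."""
    for arabic, latin in mapping.items():
        if len(latin) != 1 or not latin.isascii():
            raise ValueError(f"{arabic!r} maps to {latin!r}, not one ASCII character")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two Arabic letters share the same symbol")
    return {latin: arabic for arabic, latin in mapping.items()}


TO_BW = str.maketrans(BW)
FROM_BW = str.maketrans(invert_one_to_one(BW))

word = "\u0634\u0627\u0628"  # شاب
print(word.translate(TO_BW))  # $Ab
print(word.translate(TO_BW).translate(FROM_BW) == word)  # True
```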

Mechanics

Lower case letters were preferred. However, when multiple Arabic letters have similar sounds, the lower case letter was used for the more open sound and an Upper Case letter for the more closed/restricted sound. For example, Arabic has two letters with a “d” sound: the more open sound was assigned a lower case “d” and the heavier, more closed sound an Upper Case “D”.

In other words, an Upper Case letter indicates that the Arabic letter is similar to the one represented by the lower case letter but differs qualitatively in some way.
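Because case carries meaning in this scheme, Buckwalter text must not be lowercased or matched case-insensitively. The Python sketch below shows what goes wrong. The reverse table `FROM_BW` is a hypothetical subset built from the case pairs in the table in this article, plus r and b for the example word: ضرب ("to strike") becomes درب ("path") after lowercasing.

```python
# Illustrative subset: the case pairs d/D, s/S, t/T, z/Z, h/H from the table,
# plus r and b for the example word.
FROM_BW = str.maketrans({
    "d": "\u062F", "D": "\u0636",  # د / ض
    "s": "\u0633", "S": "\u0635",  # س / ص
    "t": "\u062A", "T": "\u0637",  # ت / ط
    "z": "\u0632", "Z": "\u0638",  # ز / ظ
    "h": "\u0647", "H": "\u062D",  # ه / ح
    "r": "\u0631", "b": "\u0628",  # ر, ب
})

bw = "Drb"                            # ضرب, unvowelled
print(bw.translate(FROM_BW))          # ضرب
print(bw.lower().translate(FROM_BW))  # درب, a different word
```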

Buckwalter transliteration table

Arabic letter	DIN 31635	Buckwalter	Qalam	BATR	IPA (MSA)
ا	ʾ / ā	A	' / aa	A / aa	ʔ, aː
ب	b	b	b	b	b
ت	t	t	t	t	t
ث	ṯ	v	th	c	θ
ج	ǧ	j	j	j	dʒ ɡ ʒ
ح	ḥ	H	H	H	ħ
خ	ḫ	x	kh	K	x
د	d	d	d	d	d
ذ	ḏ	*	dh	z'	ð
ر	r	r	r	r	r
ز	z	z	z	z	z
س	s	s	s	s	s
ش	š	$	sh	x	ʃ
ص	ṣ	S	S	S	sˤ
ض	ḍ	D	D	D	dˤ
ط	ṭ	T	T	T	tˤ
ظ	ẓ	Z	Z	Z	ðˤ zˤ
ع	ʿ	E	`	E	ʕ
غ	ġ	g	gh	g	ɣ
ف	f	f	f	f	f
ق	q	q	q	q	q
ك	k	k	k	k	k
ل	l	l	l	l	l
م	m	m	m	m	m
ن	n	n	n	n	n
ه	h	h	h	h	h
و	w / ū	w	w	w / uu	w, uː
ي	y	y	y	y	j, iː
ی[3]	ī	Y	Y	ii

hamza

lone hamza: '
hamza on alif: >
hamza below alif: <
hamza on waw: &
hamza on ya: }

alif

madda on alif: |
alif al-wasla: {
dagger alif: `
alif maqsura: Y

harakat

fatha: a
damma: u
kasra: i
fathatayn: F
dammatayn: N
kasratayn: K
shadda: ~
sukun: o

ta marbouta: p

tatwil: _
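The letter table and the lists above can be collected into a single mapping. The Python sketch below is one way to do it, under stated assumptions. It chooses specific Unicode code points: U+0649 ى for alif maqsura (the letter table in this article shows the dotless ی), U+0671 ٱ for alif al-wasla, and U+0670 for the dagger alif. It omits the extended Persian letters of the full table, and the names `BUCKWALTER`, `to_buckwalter`, and `from_buckwalter` are hypothetical, not Buckwalter's own tooling. On the word ٱلنَّاسِ its output matches the sample transliteration given later in this article.

```python
# Buckwalter table as listed above, keyed by Arabic character.
# Assumption: alif maqsura is U+0649 (ى) and alif al-wasla is U+0671 (ٱ).
BUCKWALTER = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",  # ء آ أ ؤ
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",  # إ ئ ا ب
    "\u0629": "p", "\u062A": "t", "\u062B": "v", "\u062C": "j",  # ة ت ث ج
    "\u062D": "H", "\u062E": "x", "\u062F": "d", "\u0630": "*",  # ح خ د ذ
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",  # ر ز س ش
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",  # ص ض ط ظ
    "\u0639": "E", "\u063A": "g", "\u0640": "_", "\u0641": "f",  # ع غ tatwil ف
    "\u0642": "q", "\u0643": "k", "\u0644": "l", "\u0645": "m",  # ق ك ل م
    "\u0646": "n", "\u0647": "h", "\u0648": "w", "\u0649": "Y",  # ن ه و ى
    "\u064A": "y", "\u0671": "{", "\u0670": "`",                 # ي ٱ dagger alif
    "\u064B": "F", "\u064C": "N", "\u064D": "K",                 # fathatayn, dammatayn, kasratayn
    "\u064E": "a", "\u064F": "u", "\u0650": "i",                 # fatha, damma, kasra
    "\u0651": "~", "\u0652": "o",                                # shadda, sukun
}

_TO_BW = str.maketrans(BUCKWALTER)
_FROM_BW = str.maketrans({latin: arabic for arabic, latin in BUCKWALTER.items()})


def to_buckwalter(text: str) -> str:
    """Transliterate Arabic to Buckwalter; unmapped characters pass through."""
    return text.translate(_TO_BW)


def from_buckwalter(text: str) -> str:
    """Reverse transliteration; unmarked Latin text would be converted too."""
    return text.translate(_FROM_BW)


word = "\u0671\u0644\u0646\u0651\u064E\u0627\u0633\u0650"  # ٱلنَّاسِ
print(to_buckwalter(word))  # {ln~aAsi
print(from_buckwalter("{ln~aAsi") == word)  # True
```

Since every value is a distinct single character, the reverse table is built simply by swapping keys and values.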

Explanation of the Mnemonics Used in the Buckwalter Transliteration

ا	A	This letter produces an “A” sound. It is not a lower case “a” because that would conflict with the “fetha” diacritical mark, which has a softer “a” sound.
ب	b	This letter produces a “b” sound and is nearly always written as “b” in English.
ة	p	This is the “tah marbutah” and a “p” looks very similar to the way it is written when connected to a preceding letter.
ت	t	This letter produces an open “t” sound and is nearly always written as “t” in English.
ث	v	This letter produces the “th” sound found in “theater”. The three dots above it, when handwritten, look like an upside-down “v”, so “v” was used.
ج	j	This letter in MSA is pronounced “j” and is nearly always written as “j” in English.
ح	H	This letter produces a heavy “h” sound made at the back of the mouth/throat. It conflicts with the soft “h” sound of another letter, so an Upper Case “H” is used.
خ	x	This letter produces a “kh” sound not dissimilar to the way an English speaker says the name of the letter “x”.
د	d	This letter is a soft “d” sound and conflicts with another “d” sounding letter found later so the lower case d was used.
ذ	*	This letter produces a “th” sound found in the word “this”. Often written as “zh”. It has a dot above it so the single asterisk that looks similar to a dot above the line was used.
ر	r	This letter sounds like “r” and is nearly always written as “r” in English. Lower case "r".
ز	z	This letter sounds like “z” and is nearly always written as “z” in English. Lower case "z".
س	s	This letter sounds like “s” and is nearly always written as “s” in English. Lower case "s".
ش	$	This letter looks like the “s” letter (س) but sounds like “sh”, so the dollar sign was used: it looks like an “s” but has an extra property (a line through it). Upper Case “S” could not be used because it is needed for the next letter below, which is a heavy-sounding “s”.
ص	S	This letter sounds like an “s” but is deeper, further back in the mouth/throat so it was given an Upper Case “S” so it does not conflict with the soft “s” shown previously.
ض	D	This letter sounds like a “d” but is deeper, further back in the mouth/throat, so it was given an Upper Case “D” so that it does not conflict with the soft “d” shown previously.
ط	T	This letter sounds like a “t” but is deeper, further back in the mouth/throat, so it was given an Upper Case “T” so that it does not conflict with the soft “t” shown previously.
ظ	Z	This letter sounds like “th” or “zh” but is deeper, further back in the mouth/throat so it was given an Upper Case “Z” so it does not conflict with the soft “z” shown previously.
ع	E	This letter has no English equivalent. So a pure physical mnemonic was used. When you look at the full Arabic letter and the English Upper Case “E” (especially a hand written "E"), they are remarkably similar.
غ	g	This letter has no English equivalent. It has often been written as “gh”. So we kept the "g" and used a physical mnemonic as well. It has a similar appearance to the lower case letter “g”.
ف	f	This letter sounds like “f” and is nearly always written as “f” in English. Lower case "f"
ق	q	This letter sounds similar to the letter “q” and is often written as “q” in English. Lower Case "q"
ك	k	This letter sounds like “k” and is nearly always written as “k” in English. Lower Case "k"
ل	l	This letter sounds like “l” and is nearly always written as “l” in English. Lower Case "l"
م	m	This letter sounds like “m” and is nearly always written as “m” in English. Lower Case "m"
ن	n	This letter sounds like “n” and is nearly always written as “n” in English. Lower Case "n"
ه	h	This letter sounds like “h” and is soft. It conflicts with another, heavier-sounding “h” shown above. Since this one produces a softer “h” sound, the lower case was used.
و	w	This letter sounds like “w” and is often written as “w” in English.
ی	Y	We used a physical mnemonic here: this letter looks like the next letter but has no dots underneath, so its transliteration looks the same too, but in Upper Case.
ي	y	This letter sounds like “Y” and is often written as “y” in English. It is Lower Case while the letter above is Upper Case.
ً	F	In Arabic this is called the “fethatain”, the dual fetha. Upper Case “F” because the lower case is already used, and F is a reminder of “Fethatain”
ٌ	N	The double “demma” is often referred to in English-language classes as “Nunation”; it is pronounced as a soft “oon” sound. Lower case “n” is already used, and consistency with “F” for doubled markings meant we used Upper Case “N”.
ٍ	K	This is the “Kesratain”, the dual “kesra”, it makes an “in” sound. Lower case “k” is already used, plus consistency with “F” for doubled markings, means we used Upper Case “K”
َ	a	This is the single “fetha”, which makes the short “a” sound. Many have used “a” to represent it, and both “F” and “f” were already taken, so a lower case “a”, as is traditional, was used.
ُ	u	This is the single “demma”, which makes the short “oo” vowel sound. Many have used “u” to represent it in English text, so a lower case “u”, as is traditional, was used.
ِ	i	This is the single “kesra”, which makes the short “i” vowel sound. Many have used lower case “i” to represent it in English text, so a lower case “i” was used.
ّ	~	This is the “shedda” which represents a doubling of the sound/letter it is above. The tilde is also a marking that sits above a letter and is found on most English keyboards. It is a physical mnemonic.
ْ	o	This is the “sukkun” and represents that there is no vowel sound on that letter. We used a close physical mnemonic of lower case “o”

The original ALPNET team quickly adopted this schema. Even though Dr. Beesley had no background in Arabic, he was quickly able to understand and use it. The strength of the Buckwalter Transliteration is that every Arabic letter is represented distinctly, yet its reliance on traditional transliterations, or on mnemonic devices for anything non-traditional, makes it easy to learn.

Sample

Article 1 of the Universal Declaration of Human Rights:

Arabic Text

يُولَدُ جَمِيعُ ٱلنَّاسِ أَحْرَارًا مُتَسَاوِينَ فِي ٱلْكَرَامَةِ وَٱلْحُقُوقِ. وَقَدْ وُهِبُوا عَقْلًا وَضَمِيرًا وَعَلَيْهِمْ أَنْ يُعَامِلَ بَعْضُهُمْ بَعْضًا بِرُوحِ ٱلْإِخَاءِ.[4]

Buckwalter Transliteration

yuwladu jamiyEu {ln~aAsi >aHoraArFA mutasaAwiyna fiy {lokaraAmapi wa{loHuquwqi. waqado wuhibuwA EaqolFA waDamiyrFA waEalayohimo >ano yuEaAmila baEoDuhumo baEoDFA biruwHi {lo<ixaA'i.

English Text

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.[5]

Notes

[1] Habash, Nizar. Introduction to Arabic Natural Language Processing. Morgan & Claypool, 2010.
[2] Buckwalter, Tim. Buckwalter Arabic Transliteration Table.
[3] In Egypt, Sudan and sometimes other regions, the final form is sometimes ی (without dots).
[4] "Universal Declaration of Human Rights - Arabic (Alarabia)". ohchr.org. OHCHR. 2016. Retrieved October 22, 2016.
[5] "Universal Declaration of Human Rights - English". ohchr.org. OHCHR. 2016. Retrieved October 22, 2016.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
