ARABIX: CONTEXT-SENSITIVE MODELS FOR SPOKEN MODERN STANDARD ARABIC

Authors: Khaled Elghamry, Attia Youseif

Keywords: modern standard Arabic, conversational agents, corpus, language variation, language learning

Abstract: One recommendation for the conversational agents currently being developed for Arabic is to use a language variant that sounds educated yet friendly and natural. Modern Standard Arabic (MSA) has so far been adopted as the candidate for this task. One open issue is how to handle the word-final vowels in spoken MSA (sMSA) that mark syntactic case on nouns and adjectives and mood on verbs. There are two extremes in this respect: the first is to drop these vowels across the board, and the second is to pronounce them according to the exact rules of an Arabic grammar textbook, regardless of context. Previous studies on the subject were mainly qualitative. Using a quantitative corpus-based approach, this paper shows that real-world corpora support neither of these two scenarios and that there is no one-size-fits-all model of sMSA. We show that the space between the two extremes is rich, and that almost every area in it corresponds to a model or flavor of sMSA that is optimal for a given country, gender, and topic or domain. The corpus was collected from [a] official news channels from all Arab countries, [b] channels covering pan-Arab issues such as Aljazeera, Al-Arabiya and Skynews, and [c] the Arabic-language versions of France 24, CNN, BBC, and Euronews. The total sample contains 314 minutes and 49 seconds, divided almost equally between male and female news anchors. The sample was manually transcribed. Each word was labeled for the presence or absence of a word-final vowel marking its mood or case, given its context. Each news piece was labeled with the anchor’s gender and nationality. The preliminary results show that the full use of case markers seems to be an idealized scenario that exists only in Arabic grammar textbooks. Likewise, the suggestion of dropping case markers across the board is hardly attested in real-world corpora.
For example, our analysis shows that Iraqi anchors pronounce word-final vowels the least, at 42%, and Tunisian anchors the most, at 61%. Our analysis also shows no significant difference between male and female anchors, at 55% and 53%, respectively. The effect of topic is currently under investigation. As for where these vowels are pronounced, our analysis shows that this can be explained by the immediate context of the vowel. For example, such a vowel on a word-final /b/ followed by a word with an initial ‘alif wasl’ is pronounced 70% of the time, whereas such a vowel on a ‘taa marbouta’ followed by a /f/ is pronounced only 18% of the time. The paper discusses two main explanations for why case and mood vowels are pronounced at these rates. The first is that the pronunciation of these vowels is subject to what we call “Optimization of Rhythmical Effect” (ORE): the speaker opts for the choice that optimizes the rhythmical, euphonic effect of the message on the listener, and the same speaker’s choices may differ in the same context depending on the topic and the desired effect. The other possible explanation is what we call “Efficient Necessary Minimum” (ENM): the speaker expends the least effort that gets the message across without ambiguity while still sounding educated and knowledgeable. In addition to conversational agents, the paper also discusses possible applications in language learning, parsing, literary stylistics, and religious discourse studies.
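
The per-group and per-context percentages described above can be computed from a labeled transcript with a simple tally. The Python sketch below is illustrative only: the `Token` record, its field names, the context labels, and the toy rows are all hypothetical and are not taken from the paper's corpus or tooling. It assumes each word has already been annotated with whether its word-final case or mood vowel was pronounced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word (hypothetical schema, not the authors' format)."""
    nationality: str        # anchor's nationality label
    gender: str             # anchor's gender label
    final_sound: str        # last consonant/letter of the word
    next_initial: str       # first sound of the following word
    vowel_pronounced: bool  # was the case/mood vowel pronounced?


def pronunciation_rate(tokens, key):
    """Return {group: share of tokens whose final vowel was pronounced}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [pronounced, total]
    for tok in tokens:
        group = key(tok)
        counts[group][0] += int(tok.vowel_pronounced)
        counts[group][1] += 1
    return {g: pronounced / total for g, (pronounced, total) in counts.items()}


# Toy rows for illustration only; these are NOT data from the study.
TOY_TOKENS = [
    Token("A", "f", "b", "alif_wasl", True),
    Token("A", "f", "taa_marbouta", "f", False),
    Token("A", "m", "b", "alif_wasl", True),
    Token("B", "m", "b", "alif_wasl", False),
    Token("B", "f", "taa_marbouta", "f", False),
    Token("B", "m", "taa_marbouta", "f", True),
]

by_nationality = pronunciation_rate(TOY_TOKENS, key=lambda t: t.nationality)
by_gender = pronunciation_rate(TOY_TOKENS, key=lambda t: t.gender)
by_context = pronunciation_rate(
    TOY_TOKENS, key=lambda t: (t.final_sound, t.next_initial)
)
```

Passing a different `key` function regroups the same annotations by nationality, gender, or immediate phonological context, which mirrors the three kinds of breakdown reported in the abstract. A real analysis would also need significance testing, which this sketch omits.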