Lahajat: A rule-based converter of standard Arabic lexical databases into spoken Arabic forms / Lancioni, Giuliano; Gugliotta, Elisa; Pettinari, Valeria. - (2016), pp. 395-399.

Lahajat: A rule-based converter of standard Arabic lexical databases into spoken Arabic forms

Gugliotta, Elisa
2016-01-01

Abstract

Lexical resources for Arabic tend to focus on the standard variety of the language (Modern Standard Arabic, MSA), which is mostly used in written and formal sources. However, the spread of informal genres has increasingly made it necessary to produce broader resources that encompass the features of spoken varieties commonly found in written texts. The Lahajat project addresses this need by providing a series of rule-based transformations that extend existing lexical resources for MSA to cover typical morphonological features of spoken varieties. In particular, two case studies are presented, which apply to two widely diverging varieties: Egyptian Arabic and Tunisian Arabish.
2016
English
Lancioni, Giuliano; Gugliotta, Elisa; Pettinari, Valeria
4th IEEE International Colloquium on Information Science and Technology (CiSt)
395
399
5
Anonymous experts
Lexical Database, Negation, Morphological Analysis, French-speaking, Postage, Standard Language, Code-switching, Rule-based Approach, Personal Pronouns, Transliteration, Loanwords, Parallel Corpus, Social Media Text
International
No
info:eu-repo/semantics/bookPart
2 Contribution in volume::2.1 Contribution in volume (chapter or essay)
3
268
none

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11388/362150