AEPC: Designing an Arabic/English parallel corpus

Hind M. Alotaibi

Abstract


Parallel corpora, collections of aligned translated texts in two or more languages, play a significant
role in translation and contrastive studies. Although such resources are important for the education and
training of translators, Arabic suffers from a lack of them. A limited number of free Arabic/English parallel
corpora exist, but a major drawback is that they are domain-restricted, which limits their benefits for Arabic
translation education. This paper describes an ongoing project to design and construct a balanced,
representative, and free-to-use Arabic/English parallel corpus (AEPC). In addition, the project involves the
design and implementation of an Arabic/English concordance tool. The proposed parallel corpus and its tool can
be integrated into translator training institutions as an educational resource for translation studies and
teaching. The corpus can also be used in training and testing Arabic/English machine translation systems. The
first phase of this project involved compiling high-quality translated text samples; all translations were done
by human translators. The corpus covers a wide range of text types and includes rich metadata. The target size
of the corpus is at least 10 million words, with the intention to increase that figure in the future. After
the texts were compiled, manual (i.e. human-aided) alignment was performed, which offers better accuracy than
automated alignment. The second phase of this project involved designing a web interface with a bilingual
concordancer, where users can explore the content of the AEPC in both English and Arabic.
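
To illustrate what a bilingual concordancer over aligned segments does, the following is a minimal sketch and not the AEPC tool itself. The record layout (`AlignedPair`), the `text_type` metadata field, the `concordance` function, and the two sample sentence pairs are all assumptions made for illustration; the abstract does not specify the corpus schema or the interface. The sketch does plain case-insensitive substring matching and returns each hit with its left and right context plus the aligned translation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned segment pair (hypothetical layout, not the AEPC schema)."""
    english: str
    arabic: str
    text_type: str  # illustrative metadata field only


def concordance(pairs, query, language="english", width=30):
    """Return (left, match, right, translation) tuples for each hit of `query`.

    Matching is a simple case-insensitive substring search; a real tool
    would likely tokenize and normalize Arabic text first.
    """
    if not query:
        return []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for pair in pairs:
        if language == "english":
            source, target = pair.english, pair.arabic
        else:
            source, target = pair.arabic, pair.english
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            hits.append((left, m.group(0), right, target))
    return hits


# Invented sample data for demonstration; these are not AEPC texts.
SAMPLE = [
    AlignedPair("The contract was signed yesterday.", "تم توقيع العقد أمس.", "legal"),
    AlignedPair("The court reviewed the contract.", "راجعت المحكمة العقد.", "legal"),
]

if __name__ == "__main__":
    for left, match, right, translation in concordance(SAMPLE, "contract"):
        print(f"{left}[{match}]{right}  |  {translation}")
```

Searching from the Arabic side works the same way, for example `concordance(SAMPLE, "العقد", language="arabic")`, which returns the English segment as the translation. Keeping each aligned pair as a single record means a hit in either language can always be shown next to its counterpart.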

Keywords


Parallel corpus, translation, concordancer, computational linguistics, ESL



Research in Corpus Linguistics (RiCL, ISSN 2243-4712)