Improving the Accuracy of English-Arabic Statistical Sentence Alignment

Bibliographic Details
Main Author:	Mansour, Nashat (author)
Other Authors:	Salameh, Mohammad (author), Zantout, Rached (author)
Format:	article
Published:	2011
Online Access:	http://hdl.handle.net/10725/2963 http://iajit.org/PDF/vol.8,no.2/9-999.pdf

Description
Abstract:	Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora are the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step that is applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted on the preprocessed sentences.
