Improving the Accuracy of English-Arabic Statistical Sentence Alignment
Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and creating translation and language models. Several systems have been d...
| Main Author: | Mansour, Nashat |
|---|---|
| Other Authors: | Salameh, Mohammad; Zantout, Rached |
| Format: | article |
| Published: | 2011 |
| Online Access: | http://hdl.handle.net/10725/2963 http://iajit.org/PDF/vol.8,no.2/9-999.pdf |
| _version_ | 1864513459733921792 |
|---|---|
| author | Mansour, Nashat |
| author2 | Salameh, Mohammad; Zantout, Rached |
| author2_role | author; author |
| author_facet | Mansour, Nashat; Salameh, Mohammad; Zantout, Rached |
| author_role | author |
| dc.creator.none.fl_str_mv | Mansour, Nashat; Salameh, Mohammad; Zantout, Rached |
| dc.date.none.fl_str_mv | 2011; 2016-01-26T13:56:28Z; 2016-01-26T13:56:28Z; 2016-01-26 |
| dc.identifier.none.fl_str_mv | 1683-3198; http://hdl.handle.net/10725/2963; Salameh, M., Zantout, R., & Mansour, N. (2011). Improving the accuracy of English-Arabic statistical sentence alignment. Int. Arab J. Inf. Technol., 8(2), 171-177.; http://iajit.org/PDF/vol.8,no.2/9-999.pdf |
| dc.language.none.fl_str_mv | en |
| dc.relation.none.fl_str_mv | The International Arab Journal of Information Technology |
| dc.rights.*.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.title.none.fl_str_mv | Improving the Accuracy of English-Arabic Statistical Sentence Alignment |
| dc.type.none.fl_str_mv | Article; info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/article |
| description | Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system works poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed unaligned sentences. |
| eu_rights_str_mv | openAccess |
| format | article |
| id | LAURepo_2d2e4072802345bc6d0e9bf76dc23572 |
| identifier_str_mv | 1683-3198; Salameh, M., Zantout, R., & Mansour, N. (2011). Improving the accuracy of English-Arabic statistical sentence alignment. Int. Arab J. Inf. Technol., 8(2), 171-177. |
| language_invalid_str_mv | en |
| network_acronym_str | LAURepo |
| network_name_str | Lebanese American University repository |
| oai_identifier_str | oai:laur.lau.edu.lb:10725/2963 |
| publishDate | 2011 |
| status_str | publishedVersion |
| title | Improving the Accuracy of English-Arabic Statistical Sentence Alignment |
| url | http://hdl.handle.net/10725/2963 http://iajit.org/PDF/vol.8,no.2/9-999.pdf |