Improving the Accuracy of English-Arabic Statistical Sentence Alignment


Bibliographic Details
Main Author: Mansour, Nashat (author)
Other Authors: Salameh, Mohammad (author), Zantout, Rached (author)
Format: article
Published: 2011
Online Access: http://hdl.handle.net/10725/2963
http://iajit.org/PDF/vol.8,no.2/9-999.pdf
dc.date.none.fl_str_mv 2011
2016-01-26T13:56:28Z
2016-01-26T13:56:28Z
2016-01-26
dc.identifier.none.fl_str_mv 1683-3198
http://hdl.handle.net/10725/2963
Salameh, M., Zantout, R., & Mansour, N. (2011). Improving the accuracy of English-Arabic statistical sentence alignment. Int. Arab J. Inf. Technol., 8(2), 171-177.
http://iajit.org/PDF/vol.8,no.2/9-999.pdf
dc.language.none.fl_str_mv en
dc.relation.none.fl_str_mv The International Arab Journal of Information Technology
dc.rights.*.fl_str_mv info:eu-repo/semantics/openAccess
dc.title.none.fl_str_mv Improving the Accuracy of English-Arabic Statistical Sentence Alignment
dc.type.none.fl_str_mv Article
info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/article
description Multilingual natural language processing systems increasingly rely on parallel corpora to improve their output. Parallel corpora constitute the basic building block for training a statistical natural language processing system and for creating translation and language models. Several systems have been devised that automatically align the words of a pair of sentences, each in a different language. Such systems have been used successfully with European languages. In this paper, one such system is used to align sentences in an English-Arabic corpus. The system performs poorly when given raw, unaligned English-Arabic sentence pairs. This prompted the development of a preprocessing step to be applied to the Arabic sentences. The same corpus was then preprocessed, and a significant improvement is reported when alignment is attempted using the preprocessed, unaligned sentences.
eu_rights_str_mv openAccess
format article
id LAURepo_2d2e4072802345bc6d0e9bf76dc23572
language_invalid_str_mv en
network_acronym_str LAURepo
network_name_str Lebanese American University repository
oai_identifier_str oai:laur.lau.edu.lb:10725/2963
publishDate 2011
status_str publishedVersion