Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap



Bibliographic Details
Main Author: George Mikros (19197997) (author)
Other Authors: Dimitris Boumparis (19198000) (author)
Published: 2024
Description
Summary:<p dir="ltr">This study explores the feasibility of cross-linguistic authorship attribution and author gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English with Google’s Neural MT. A Random Forest algorithm was employed for authorship attribution and gender profiling, using different feature groups [Author’s Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] on both the original and the translated texts. The results indicate that MT is a viable method for converting a multilingual corpus into a single language for authorship attribution and gender profiling research, achieving considerable accuracy when the training and test datasets are in the same language. In the purely cross-linguistic scenario, where training and test data are in different languages, the CLWE and QL features achieved accuracies above the baselines.</p><h2>Other Information</h2><p dir="ltr">Published in: Digital Scholarship in the Humanities<br>License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1093/llc/fqae028" target="_blank">https://dx.doi.org/10.1093/llc/fqae028</a></p>
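To make the idea of an n-gram profile feature group more concrete, the following is a minimal sketch of how character- and word-level n-gram relative frequencies could be combined into one fixed-length vector per text, ready for a classifier such as a Random Forest. It is an illustration under stated assumptions, not the authors' pipeline: the n-gram orders (character 2 and 3, word 1 and 2), the lowercasing and whitespace tokenization, the top-k vocabulary selection, and the helper names (`multilevel_profile`, `build_vocabulary`, `vectorize`) are all hypothetical choices made for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Overlapping word n-grams, joined with a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def _add_level(profile, grams, prefix):
    # Relative frequency of each n-gram within its own level; the prefix
    # keeps character and word features from colliding.
    total = len(grams)
    if total == 0:
        return
    for gram, count in Counter(grams).items():
        profile[f"{prefix}:{gram}"] = count / total


def multilevel_profile(text, char_orders=(2, 3), word_orders=(1, 2)):
    """Hypothetical multilevel n-gram profile: {feature: relative frequency}."""
    lowered = text.lower()
    tokens = lowered.split()  # assumption: simple whitespace tokenization
    profile = {}
    for n in char_orders:
        _add_level(profile, char_ngrams(lowered, n), f"c{n}")
    for n in word_orders:
        _add_level(profile, word_ngrams(tokens, n), f"w{n}")
    return profile


def build_vocabulary(profiles, top_k):
    """Keep the top_k features by summed relative frequency over training texts."""
    totals = Counter()
    for profile in profiles:
        totals.update(profile)
    return [feature for feature, _ in totals.most_common(top_k)]


def vectorize(profiles, vocabulary):
    """Fixed-length vectors; features absent from a text get 0.0."""
    return [[profile.get(f, 0.0) for f in vocabulary] for profile in profiles]


if __name__ == "__main__":
    train_texts = ["the cat sat on the mat", "a dog ran in the park"]
    profiles = [multilevel_profile(t) for t in train_texts]
    vocab = build_vocabulary(profiles, top_k=20)
    X = vectorize(profiles, vocab)
    print(len(X), len(X[0]))
```

The resulting rows of `X` could then be passed, together with author or gender labels, to an off-the-shelf Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Building the vocabulary from the training texts only, and reusing it for the test texts, avoids leaking test information into feature selection.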