Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap

This study explores the feasibility of cross-linguistic authorship attribution and the author’s gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English using Google’s Neural...

Bibliographic Details
Main Author: George Mikros (19197997) (author)
Other Authors: Dimitris Boumparis (19198000) (author)
Published: 2024
_version_ 1864513510347636736
author George Mikros (19197997)
author2 Dimitris Boumparis (19198000)
author2_role author
author_facet George Mikros (19197997)
Dimitris Boumparis (19198000)
author_role author
dc.creator.none.fl_str_mv George Mikros (19197997)
Dimitris Boumparis (19198000)
dc.date.none.fl_str_mv 2024-06-05T03:00:00Z
dc.identifier.none.fl_str_mv 10.1093/llc/fqae028
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Cross-linguistic_authorship_attribution_and_gender_profiling_Machine_translation_as_a_method_for_bridging_the_language_gap/26355028
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Human society
Gender studies
Information and computing sciences
Machine learning
Language, communication and culture
Linguistics
authorship attribution
author profiling
Machine Translation
multilingual word embeddings
Authors’ Multilevel N-gram Profiles
lexical diversity
dc.title.none.fl_str_mv Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description This study explores the feasibility of cross-linguistic authorship attribution and the author’s gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English using Google’s Neural MT. A Random Forest algorithm was employed for authorship and gender profiling, using different feature groups [Author’s Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] in both original and translated texts. Results indicate that MT is a viable method for converting a multilingual corpus into one language for authorship attribution and gender profiling research, with considerable accuracy when training and testing datasets use identical language. In the pure cross-linguistic scenario, higher accuracies than the baselines were obtained using CLWE and QL features.
Other Information
Published in: Digital Scholarship in the Humanities
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1093/llc/fqae028
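The feature groups named in the description can be illustrated with a minimal sketch. This record does not state how the study computed its features, so everything below is an assumption for illustration: the n-gram sizes (2 and 3), whitespace tokenization, and the helper names `char_ngrams`, `word_ngrams`, `multilevel_profile`, and `type_token_ratio` are hypothetical. The sketch counts character and word n-grams into a single profile and computes a type-token ratio as one simple lexical-diversity measure. It does not include the Random Forest classifier or the cross-lingual word embeddings.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multilevel_profile(text, sizes=(2, 3)):
    """Count character and word n-grams in one profile.

    Keys are prefixed with the level ("c" or "w") and the n-gram size,
    so that features from different levels never collide.
    The sizes (2, 3) are an assumption, not taken from the record.
    """
    profile = Counter()
    lowered = text.lower()
    for n in sizes:
        for gram in char_ngrams(lowered, n):
            profile[f"c{n}:{gram}"] += 1
        for gram in word_ngrams(text, n):
            profile[f"w{n}:{gram}"] += 1
    return profile


def type_token_ratio(text):
    """Distinct tokens divided by total tokens; 0.0 for empty text."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "the cat saw the dog"
    profile = multilevel_profile(sample)
    print(profile.most_common(3))
    print(type_token_ratio(sample))
```

In a setup like the one described, per-document profiles of this kind would typically be turned into fixed-length vectors and passed to a classifier. The same functions can be applied to an original text and to its machine-translated version to compare the two profiles.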
eu_rights_str_mv openAccess
id Manara2_ac20481d90f9747de694645200d8da53
identifier_str_mv 10.1093/llc/fqae028
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/26355028
publishDate 2024
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap