Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap
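| Main Author: | George Mikros (19197997) |
|---|---|
| Other Authors: | Dimitris Boumparis (19198000) |
| Published: | 2024 |
| Subjects: | Human society; Gender studies; Information and computing sciences; Machine learning; Language, communication and culture; Linguistics; authorship attribution; author profiling; Machine Translation; multilingual word embeddings; Authors’ Multilevel N-gram Profiles; lexical diversity |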
| _version_ | 1864513510347636736 |
|---|---|
| author | George Mikros (19197997) |
| author2 | Dimitris Boumparis (19198000) |
| author2_role | author |
| author_facet | George Mikros (19197997); Dimitris Boumparis (19198000) |
| author_role | author |
| dc.creator.none.fl_str_mv | George Mikros (19197997); Dimitris Boumparis (19198000) |
| dc.date.none.fl_str_mv | 2024-06-05T03:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1093/llc/fqae028 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Cross-linguistic_authorship_attribution_and_gender_profiling_Machine_translation_as_a_method_for_bridging_the_language_gap/26355028 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Human society; Gender studies; Information and computing sciences; Machine learning; Language, communication and culture; Linguistics; authorship attribution; author profiling; Machine Translation; multilingual word embeddings; Authors’ Multilevel N-gram Profiles; lexical diversity |
| dc.title.none.fl_str_mv | Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | This study explores the feasibility of cross-linguistic authorship attribution and identification of the author’s gender using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English using Google’s Neural MT. A Random Forest algorithm was employed for authorship attribution and gender profiling, using different feature groups [Author’s Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] on both the original and the translated texts. Results indicate that MT is a viable method for converting a multilingual corpus into a single language for authorship attribution and gender profiling research, achieving considerable accuracy when the training and test datasets are in the same language. In the pure cross-linguistic scenario, accuracies above the baselines were obtained using CLWE and QL features. Other Information: Published in: Digital Scholarship in the Humanities; License: https://creativecommons.org/licenses/by/4.0/; see the article on the publisher's website: https://dx.doi.org/10.1093/llc/fqae028 |
| eu_rights_str_mv | openAccess |
| id | Manara2_ac20481d90f9747de694645200d8da53 |
| identifier_str_mv | 10.1093/llc/fqae028 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/26355028 |
| publishDate | 2024 |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap |
| topic | Human society Gender studies Information and computing sciences Machine learning Language, communication and culture Linguistics authorship attribution author profiling Machine Translation multilingual word embeddings Authors’ Multilevel N-gram Profiles lexical diversity |
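The description lists Author’s Multilevel N-gram Profiles and quantitative-linguistics features such as lexical diversity as inputs to a Random Forest. The following Python sketch shows one possible way to build such a feature vector using only the standard library. It is not the authors' implementation: the n-gram levels in `LEVELS`, the `top_k` cutoff, the naive whitespace tokenizer, and the use of the type-token ratio as the lexical-diversity measure are all assumptions, and every function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical n-gram levels: character bigrams/trigrams and word unigrams/bigrams.
# The exact levels behind Author's Multilevel N-gram Profiles are an assumption here.
LEVELS: List[Tuple[str, int]] = [("c", 2), ("c", 3), ("w", 1), ("w", 2)]


def level_ngrams(text: str, kind: str, n: int) -> List[str]:
    """Return the character ('c') or word ('w') n-grams of a lowercased text."""
    if kind == "c":
        s = text.lower()
        return [s[i:i + n] for i in range(len(s) - n + 1)]
    # Naive whitespace tokenization; a real pipeline would use a proper tokenizer.
    toks = text.lower().split()
    return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def build_vocabulary(train_texts: List[str], top_k: int = 50) -> List[Tuple[str, int, str]]:
    """Keep the top_k most frequent n-grams per level, counted on training texts only."""
    vocab: List[Tuple[str, int, str]] = []
    for kind, n in LEVELS:
        counts: Counter = Counter()
        for text in train_texts:
            counts.update(level_ngrams(text, kind, n))
        vocab.extend((kind, n, gram) for gram, _ in counts.most_common(top_k))
    return vocab


def ngram_profile(text: str, vocab: List[Tuple[str, int, str]]) -> List[float]:
    """Relative frequency of each vocabulary n-gram within its own level."""
    per_level = {}
    for kind, n in LEVELS:
        grams = level_ngrams(text, kind, n)
        # max(..., 1) avoids division by zero for texts shorter than n.
        per_level[(kind, n)] = (Counter(grams), max(len(grams), 1))
    features = []
    for kind, n, gram in vocab:
        counts, total = per_level[(kind, n)]
        features.append(counts[gram] / total)
    return features


def type_token_ratio(text: str) -> float:
    """A simple lexical-diversity measure: distinct tokens divided by tokens."""
    toks = text.lower().split()
    return len(set(toks)) / len(toks) if toks else 0.0


def feature_vector(text: str, vocab: List[Tuple[str, int, str]]) -> List[float]:
    """Concatenate the n-gram profile with the lexical-diversity feature."""
    return ngram_profile(text, vocab) + [type_token_ratio(text)]


if __name__ == "__main__":
    train = ["ab ab ab", "ab cd"]
    vocab = build_vocabulary(train, top_k=1)
    print(vocab)
    print(feature_vector("ab cd", vocab))
```

The vocabulary is fitted on training texts only, so test texts do not leak into feature selection. The resulting vectors could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. Because n-gram vocabularies are language-specific, a vocabulary fitted on Greek originals would match almost nothing in English translations, whereas a measure like the type-token ratio can be computed in either language; this is consistent with the abstract's report that QL and CLWE features were the ones that beat the baselines in the pure cross-linguistic scenario.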