Developing an online hate classifier for multiple social media platforms


Bibliographic Details
Main Author: Joni Salminen (7434770) (author)
Other Authors: Maximilian Hopf (14153376) (author), Shammur A. Chowdhury (14153379) (author), Soon-gyo Jung (7434773) (author), Hind Almerekhi (7434776) (author), Bernard J. Jansen (7434779) (author)
Published: 2020
_version_ 1864513566558650368
author Joni Salminen (7434770)
author2 Maximilian Hopf (14153376)
Shammur A. Chowdhury (14153379)
Soon-gyo Jung (7434773)
Hind Almerekhi (7434776)
Bernard J. Jansen (7434779)
author2_role author
author
author
author
author
author_role author
dc.date.none.fl_str_mv 2020-01-02T18:00:00Z
dc.identifier.none.fl_str_mv 10.1186/s13673-019-0205-6
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Developing_an_online_hate_classifier_for_multiple_social_media_platforms/21598500
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Artificial intelligence
Human-centred computing
Machine learning
Online hate
Toxicity
Social media
dc.title.none.fl_str_mv Developing an online hate classifier for multiple social media platforms
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection using multi-platform data. To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs the best (F1 = 0.92). Feature importance analysis indicates that BERT features are the most impactful for the predictions. Findings support the generalizability of the best model, as the platform-specific results from Twitter and Wikipedia are comparable to their respective source papers. We make our code publicly available for application in real software systems as well as for further development by online hate researchers.
Other Information
Published in: Human-centric Computing and Information Sciences
License: https://creativecommons.org/licenses/by/4.0
See article on publisher's website: http://dx.doi.org/10.1186/s13673-019-0205-6
eu_rights_str_mv openAccess
id Manara2_9ba2b85d9d7c5f916ef917bef2c79020
identifier_str_mv 10.1186/s13673-019-0205-6
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/21598500
publishDate 2020
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title Developing an online hate classifier for multiple social media platforms
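The abstract in this record contrasts a keyword-based baseline classifier with learned models and reports performance as an F1 score. The following is a minimal sketch of what such a baseline and its evaluation might look like, not the authors' implementation. The keyword lexicon, the example comments, and the function names `keyword_baseline` and `f1_score` are all hypothetical, and the sketch assumes F1 is computed for the hateful (positive) class, which the record does not specify.

```python
# Toy illustration only: the lexicon and comments are hypothetical and are
# not taken from the paper's data or published code.
import re

HATE_KEYWORDS = {"idiot", "stupid", "scum"}  # assumed keyword lexicon


def keyword_baseline(comment: str) -> int:
    """Return 1 (hateful) if any lexicon word occurs in the comment, else 0."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(token in HATE_KEYWORDS for token in tokens))


def f1_score(y_true, y_pred) -> float:
    """F1 = 2 * precision * recall / (precision + recall), positive class only."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


comments = [
    "You are an idiot",                   # hateful, contains a keyword
    "Great video, thanks!",               # non-hateful
    "People like you should disappear",   # hateful, but no keyword present
    "That was a stupid mistake of mine",  # non-hateful, but contains a keyword
    "Nice explanation",                   # non-hateful
]
labels = [1, 0, 1, 0, 0]

predictions = [keyword_baseline(c) for c in comments]
baseline_f1 = f1_score(labels, predictions)
print(predictions, round(baseline_f1, 2))
```

On this toy data the baseline misses the hateful comment that contains no lexicon word and flags the self-directed, non-hateful one. These are the kinds of errors that learned feature representations are meant to reduce.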