Developing an online hate classifier for multiple social media platforms
The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multip...
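| Main author: | Joni Salminen |
|---|---|
| Other authors: | Maximilian Hopf, Shammur A. Chowdhury, Soon-gyo Jung, Hind Almerekhi, Bernard J. Jansen |
| Published: | 2020 |
| Subjects: | Information and computing sciences, Artificial intelligence, Human-centred computing, Machine learning, Online hate, Toxicity, Social media |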
| _version_ | 1864513566558650368 |
|---|---|
| author | Joni Salminen (7434770) |
| author2 | Maximilian Hopf (14153376) Shammur A. Chowdhury (14153379) Soon-gyo Jung (7434773) Hind Almerekhi (7434776) Bernard J. Jansen (7434779) |
| author2_role | author author author author author |
| author_facet | Joni Salminen (7434770) Maximilian Hopf (14153376) Shammur A. Chowdhury (14153379) Soon-gyo Jung (7434773) Hind Almerekhi (7434776) Bernard J. Jansen (7434779) |
| author_role | author |
| dc.creator.none.fl_str_mv | Joni Salminen (7434770) Maximilian Hopf (14153376) Shammur A. Chowdhury (14153379) Soon-gyo Jung (7434773) Hind Almerekhi (7434776) Bernard J. Jansen (7434779) |
| dc.date.none.fl_str_mv | 2020-01-02T18:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1186/s13673-019-0205-6 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Developing_an_online_hate_classifier_for_multiple_social_media_platforms/21598500 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences Artificial intelligence Human-centred computing Machine learning Online hate Toxicity Social media Machine learning |
| dc.title.none.fl_str_mv | Developing an online hate classifier for multiple social media platforms |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p>The proliferation of social media enables people to express their opinions widely online. However, at the same time, this has resulted in the emergence of conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of models for online hate detection using multi-platform data. To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs the best (F1 = 0.92). Feature importance analysis indicates that BERT features are the most impactful for the predictions. Findings support the generalizability of the best model, as the platform-specific results from Twitter and Wikipedia are comparable to their respective source papers. We make our code publicly available for application in real software systems as well as for further development by online hate researchers.</p><h2>Other Information</h2> <p> Published in: Human-centric Computing and Information Sciences<br> License: <a href="https://creativecommons.org/licenses/by/4.0" target="_blank">https://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="http://dx.doi.org/10.1186/s13673-019-0205-6" target="_blank">http://dx.doi.org/10.1186/s13673-019-0205-6</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_9ba2b85d9d7c5f916ef917bef2c79020 |
| identifier_str_mv | 10.1186/s13673-019-0205-6 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/21598500 |
| publishDate | 2020 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | Developing an online hate classifier for multiple social media platforms Joni Salminen (7434770) Information and computing sciences Artificial intelligence Human-centred computing Machine learning Online hate Toxicity Social media Machine learning |
| status_str | publishedVersion |
| title | Developing an online hate classifier for multiple social media platforms |
| title_full | Developing an online hate classifier for multiple social media platforms |
| title_fullStr | Developing an online hate classifier for multiple social media platforms |
| title_full_unstemmed | Developing an online hate classifier for multiple social media platforms |
| title_short | Developing an online hate classifier for multiple social media platforms |
| title_sort | Developing an online hate classifier for multiple social media platforms |
| topic | Information and computing sciences Artificial intelligence Human-centred computing Machine learning Online hate Toxicity Social media Machine learning |
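
The description above compares learned models against a keyword-based baseline and reports results as F1. As a minimal sketch of what such a baseline and its evaluation could look like, the Python below flags a comment as hateful when it contains any word from a small lexicon and then computes F1 for the hateful class. Everything in it is an assumption made for illustration: the lexicon `HATE_KEYWORDS`, the toy comments and their labels, and the choice of positive-class F1 (the record does not say which F1 variant or lexicon the authors used). It is not the authors' published code.

```python
# Illustrative sketch only: toy data and a made-up keyword list, not the authors' code.
import re

# Hypothetical lexicon; the actual baseline lexicon is not given in this record.
HATE_KEYWORDS = {"idiot", "scum", "trash"}


def keyword_baseline(comment: str) -> int:
    """Return 1 (hateful) if any lexicon word appears in the comment, else 0."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return int(any(tok in HATE_KEYWORDS for tok in tokens))


def f1_score(y_true, y_pred) -> float:
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy comments with hand-assigned labels (1 = hateful, 0 = non-hateful).
comments = [
    ("You are an idiot", 1),
    ("Great video, thanks for sharing", 0),
    ("People like you are scum", 1),
    ("This take is trash, but I respect the effort", 0),  # benign use of a keyword
    ("Go back to where you came from", 1),                # hateful, but no keyword
]
y_true = [label for _, label in comments]
y_pred = [keyword_baseline(text) for text, _ in comments]
score = f1_score(y_true, y_pred)
print(f"keyword baseline F1 = {score:.2f}")
```

On this toy set the baseline produces one false positive (a benign use of a lexicon word) and one false negative (a hateful comment with no lexicon word). These are the kinds of errors that feature-based classifiers such as those listed in the description are meant to reduce.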