PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits

Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) making conversations toxic. Before finding toxicity tr...

Bibliographic details
Main author:	Hind Almerekhi (7434776) (author)
Other authors:	Haewoon Kwak (5747558) (author), Joni Salminen (7434770) (author), Bernard J. Jansen (7434779) (author)
Published:	2022
Subjects:	Information and computing sciences; Library and information studies; Online toxicity; Conversation threads; Reddit; Toxicity triggers; Neural networks; Social media

_version_	1864513518708981760
author	Hind Almerekhi (7434776)
author2	Haewoon Kwak (5747558) Joni Salminen (7434770) Bernard J. Jansen (7434779)
author2_role	author author author
author_facet	Hind Almerekhi (7434776) Haewoon Kwak (5747558) Joni Salminen (7434770) Bernard J. Jansen (7434779)
author_role	author
dc.creator.none.fl_str_mv	Hind Almerekhi (7434776) Haewoon Kwak (5747558) Joni Salminen (7434770) Bernard J. Jansen (7434779)
dc.date.none.fl_str_mv	2022-10-01T00:00:00Z
dc.identifier.none.fl_str_mv	10.1016/j.dim.2022.100019
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/PROVOKE_Toxicity_trigger_detection_in_conversations_from_the_top_100_subreddits/25662672
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences Library and information studies Online toxicity Conversation threads Reddit Toxicity triggers Neural networks Social media
dc.title.none.fl_str_mv	PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	Promoting healthy discourse on community-based online platforms like Reddit can be challenging, especially when conversations show ominous signs of toxicity. Therefore, in this study, we find the turning points (i.e., toxicity triggers) that make conversations toxic. Before finding toxicity triggers, we built and evaluated various machine learning models to detect toxicity from Reddit comments. Subsequently, we used our best-performing model, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model that achieved an area under the receiver operating characteristic curve (AUC) score of 0.983, to detect toxicity. Next, we constructed conversation threads and used the toxicity prediction results to build a training set for detecting toxicity triggers. This procedure entailed using our large-scale dataset to refine the definition of toxicity triggers and build a trigger detection dataset using 991,806 conversation threads from the top 100 communities on Reddit. Then, we extracted a set of sentiment shift, topical shift, and context-based features from the trigger detection dataset, using them to build a dual embedding biLSTM neural network that achieved an AUC score of 0.789. Our trigger detection dataset analysis showed that specific triggering keywords are common across all communities, like ‘racist’ and ‘women’. In contrast, other triggering keywords are specific to certain communities, like ‘overwatch’ in r/Games. Implications are that toxicity trigger detection algorithms can leverage generic approaches but must also tailor detections to specific communities. Other Information: Published in: Data and Information Management; License: http://creativecommons.org/licenses/by/4.0/; See article on publisher's website: https://dx.doi.org/10.1016/j.dim.2022.100019
eu_rights_str_mv	openAccess
id	Manara2_a3596ecaafaf328d64bdd3d0e2895c4d
identifier_str_mv	10.1016/j.dim.2022.100019
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/25662672
publishDate	2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	PROVOKE: Toxicity trigger detection in conversations from the top 100 subreddits
topic	Information and computing sciences Library and information studies Online toxicity Conversation threads Reddit Toxicity triggers Neural networks Social media
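
The description reports two AUC scores (0.983 for toxicity detection and 0.789 for trigger detection). As a reading aid, the sketch below shows one standard way to compute an AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. This is a minimal illustration, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made up for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (rank-based) AUC for binary labels.

    Returns the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one; tied
    scores contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = toxic, 0 = non-toxic (not from the paper).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.2f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; for datasets on the scale described here, a sort-based implementation from an established library would be the more practical choice.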
