A Multi-Faceted Approach to Trending Topic Attack Detection Using Semantic Similarity and Large-Scale Datasets

Twitter’s widespread popularity has made it a prime target for malicious actors exploiting trending hashtags to disseminate harmful content. This study marks the first systematic exploration of semantic consistency in tweets to detect trending topic attacks. Unlike previ...

Bibliographic Details
Main Author:	Insaf Kraidia (19198012) (author)
Other Authors:	Afifa Ghenai (19198015) (author), Samir Brahim Belhaouari (9427347) (author)
Published:	2025
Subjects:	Information and computing sciences; Artificial intelligence; Cybersecurity and privacy; Data management and data science; Machine learning; Trending topic attacks; semantic similarity; detection; twitter; hashtag; Semantics; Social networking (online); Feature extraction; Blogs; Data augmentation; Labeling; Training; Visualization; Unsolicited e-mail; Accuracy

_version_	1864513538064646144
author	Insaf Kraidia (19198012)
author2	Afifa Ghenai (19198015) Samir Brahim Belhaouari (9427347)
author2_role	author author
author_facet	Insaf Kraidia (19198012) Afifa Ghenai (19198015) Samir Brahim Belhaouari (9427347)
author_role	author
dc.creator.none.fl_str_mv	Insaf Kraidia (19198012) Afifa Ghenai (19198015) Samir Brahim Belhaouari (9427347)
dc.date.none.fl_str_mv	2025-02-03T06:00:00Z
dc.identifier.none.fl_str_mv	10.1109/access.2025.3535996
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/A_Multi-Faceted_Approach_to_Trending_Topic_Attack_Detection_Using_Semantic_Similarity_and_Large-Scale_Datasets/30234103
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Information and computing sciences Artificial intelligence Cybersecurity and privacy Data management and data science Machine learning Trending topic attacks semantic similarity detection twitter hashtag Semantics Social networking (online) Feature extraction Blogs Data augmentation Labeling Training Visualization Unsolicited e-mail Accuracy
dc.title.none.fl_str_mv	A Multi-Faceted Approach to Trending Topic Attack Detection Using Semantic Similarity and Large-Scale Datasets
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	Twitter’s widespread popularity has made it a prime target for malicious actors exploiting trending hashtags to disseminate harmful content. This study marks the first systematic exploration of semantic consistency in tweets to detect trending topic attacks. Unlike previous approaches, we emphasize the semantic aspect of tweets, leveraging advanced techniques such as semantic similarity estimation using WordNet and contextual understanding through Sentence-Transformers. To support this methodology, we curated large-scale, high-quality datasets comprising 7,000 Arabic and 28,000 English tweets, applying tailored preprocessing steps to ensure efficiency and accuracy. A novel data augmentation technique further enriched the quality and diversity of these datasets. We evaluated our approach using a comprehensive framework that assessed textual, image, and overall similarity. Five machine learning models (Random Forest, Decision Tree, K-Neighbors, Gradient Boosting, and XGBoost) were tested, with results benchmarked against nine baseline methods across different linguistic datasets and learning scenarios. Our approach demonstrated superior performance, achieving F1-scores of 96% for English and 97% for Arabic, with accuracy improvements ranging from 2% to 14% for English and 5% to 28% for Arabic. These results establish a new benchmark for detecting trending topic attacks across languages, highlighting the robustness and effectiveness of our method in combating malicious activities on social platforms. Other Information: Published in: IEEE Access; License: https://creativecommons.org/licenses/by/4.0/; See article on publisher's website: https://dx.doi.org/10.1109/access.2025.3535996
eu_rights_str_mv	openAccess
id	Manara2_d76c203d9d390a987477d6588702cc39
identifier_str_mv	10.1109/access.2025.3535996
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/30234103
publishDate	2025
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	A Multi-Faceted Approach to Trending Topic Attack Detection Using Semantic Similarity and Large-Scale Datasets

Similar Items