Text-based framework for spam detection in Twitter. (c2017)
| Main Author: | |
|---|---|
| Format: | masterThesis |
| Published: | 2017 |
| Subjects: | |
| Online Access: | http://hdl.handle.net/10725/6553 https://doi.org/10.26756/th.2017.21 http://libraries.lau.edu.lb/research/laur/terms-of-use/thesis.php |
| Summary: | Owing to Twitter's widespread popularity and its ability to carry messages into sparse communities, spammers exploit the platform to spread their commercial messages. Moreover, different spammers behave in different ways: some adopt behavioral approaches, others make use of content entropy, and many others rely on bait behaviors. Previous related work approaches this problem by studying a tweet along with its metadata, performing various statistical and profiling activities in order to infer whether it is spam. However, these approaches overlook the limitations placed on Twitter's streaming API, which restrict a user's ability to extract follower and followee data. In addition, many of them violate user privacy by investigating users' personal data without their prior consent. This thesis studies the relationship between tweets shared by different users, in particular content considered spam versus legitimate content. We overcome the above-mentioned limitations by developing a set of message-to-message analysis approaches. First, we deploy cosine vector similarity, and later the Natural Language Toolkit and a co-occurrence model, to improve detection accuracy. However, because spammers are creative in building organic messages that bear little resemblance to earlier messages, these models have limitations. We therefore elaborate on the use of ontologies for detecting spam on Twitter during events. Our experimental results demonstrate the efficiency of analyzing spam content and semantic relationships on Twitter through ontologies. |
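
The summary names cosine vector similarity as the first message-to-message technique. The record does not describe how the thesis tokenizes or weights tweets, so the following is only a minimal sketch under stated assumptions: raw term-frequency vectors, a simple regular-expression tokenizer, and made-up example messages. The function names `tokenize`, `cosine_similarity`, and `max_similarity_to_known_spam` are hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into word-like tokens.

    Assumption: a simple regex tokenizer; the thesis may preprocess differently.
    """
    return re.findall(r"[a-z0-9#@']+", text.lower())


def cosine_similarity(message_a, message_b):
    """Cosine similarity between the raw term-frequency vectors of two messages."""
    vec_a = Counter(tokenize(message_a))
    vec_b = Counter(tokenize(message_b))

    # Dot product over the terms that both messages share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty message is treated as similar to nothing.
    return dot / (norm_a * norm_b)


def max_similarity_to_known_spam(message, known_spam):
    """Highest similarity between a new message and any previously seen spam message."""
    return max((cosine_similarity(message, s) for s in known_spam), default=0.0)


if __name__ == "__main__":
    # Illustrative, made-up messages; not data from the thesis.
    known_spam = ["Win a free phone now, click here", "Cheap followers, click here now"]
    print(max_similarity_to_known_spam("Click here to win a free phone", known_spam))
    print(max_similarity_to_known_spam("Traffic is heavy on the highway today", known_spam))
```

A message whose score exceeds some chosen threshold would be flagged for review. Such a threshold is an assumption here, and, as the summary notes, spam worded unlike earlier messages scores low under this kind of measure, which is the limitation that motivates the ontology-based approach.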