TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation but is difficult to obtain. T...

Bibliographic details
Main author: Muhammad Imran (282621) (author)
Other authors: Umair Qazi (8983514) (author), Ferda Ofli (8983517) (author)
Published: 2022
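Subjects: Information and computing sciences; Information systems; social sensing; COVID-19; sentiment analysis; trends analysis; geo-mapping; natural cities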
_version_ 1864513518594686976
author Muhammad Imran (282621)
author2 Umair Qazi (8983514)
Ferda Ofli (8983517)
author2_role author
author
author_facet Muhammad Imran (282621)
Umair Qazi (8983514)
Ferda Ofli (8983517)
author_role author
dc.creator.none.fl_str_mv Muhammad Imran (282621)
Umair Qazi (8983514)
Ferda Ofli (8983517)
dc.date.none.fl_str_mv 2022-01-10T03:00:00Z
dc.identifier.none.fl_str_mv 10.3390/data7010008
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/TBCOV_Two_Billion_Multilingual_COVID-19_Tweets_with_Sentiment_Entity_Geo_and_Gender_Labels/25671924
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Information systems
social sensing
COVID-19
sentiment analysis
trends analysis
geo-mapping
natural cities
dc.title.none.fl_str_mv TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation but is difficult to obtain. The widespread use of social networking sites, especially during mass convergence events such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates that authorities can use to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named entities. Additionally, a gender identification approach is proposed to identify user gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study the impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being.
Other Information
Published in: Data
License: https://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.3390/data7010008
eu_rights_str_mv openAccess
id Manara2_be8a1c040d84490d23e63f8b7a8df2d7
identifier_str_mv 10.3390/data7010008
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25671924
publishDate 2022
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
topic Information and computing sciences
Information systems
social sensing
COVID-19
sentiment analysis
trends analysis
geo-mapping
natural cities