TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation but is difficult to obtain. T...
| Main author: | Muhammad Imran |
|---|---|
| Other authors: | Umair Qazi, Ferda Ofli |
| Published: | 2022 |
| Subjects: | Information and computing sciences, Information systems, social sensing, COVID-19, sentiment analysis, trends analysis, geo-mapping, natural cities |

| _version_ | 1864513518594686976 |
|---|---|
| author | Muhammad Imran (282621) |
| author2 | Umair Qazi (8983514) Ferda Ofli (8983517) |
| author2_role | author author |
| author_facet | Muhammad Imran (282621) Umair Qazi (8983514) Ferda Ofli (8983517) |
| author_role | author |
| dc.creator.none.fl_str_mv | Muhammad Imran (282621) Umair Qazi (8983514) Ferda Ofli (8983517) |
| dc.date.none.fl_str_mv | 2022-01-10T03:00:00Z |
| dc.identifier.none.fl_str_mv | 10.3390/data7010008 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/TBCOV_Two_Billion_Multilingual_COVID-19_Tweets_with_Sentiment_Entity_Geo_and_Gender_Labels/25671924 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences Information systems social sensing COVID-19 sentiment analysis trends analysis geo-mapping natural cities |
| dc.title.none.fl_str_mv | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | As the world struggles with several compounded challenges caused by the COVID-19 pandemic in the health, economic, and social domains, timely access to disaggregated national and sub-national data is important for understanding the emergent situation but is difficult to obtain. The widespread usage of social networking sites, especially during mass convergence events such as health emergencies, provides instant access to citizen-generated data offering rich information about public opinions, sentiments, and situational updates useful for authorities to gain insights. We offer a large-scale social sensing dataset comprising two billion multilingual tweets posted from 218 countries by 87 million users in 67 languages. We used state-of-the-art machine learning models to enrich the data with sentiment labels and named entities. Additionally, a gender identification approach is proposed to segregate user gender. Furthermore, a geolocalization approach is devised to geotag tweets at country, state, county, and city granularities, enabling a myriad of data analysis tasks to understand real-world issues at national and sub-national levels. We believe this multilingual data with broader geographical and longer temporal coverage will be a cornerstone for researchers to study impacts of the ongoing global health catastrophe and to manage adverse consequences related to people’s health, livelihood, and social well-being. Other Information: Published in: Data. License: https://creativecommons.org/licenses/by/4.0/. See article on publisher's website: https://dx.doi.org/10.3390/data7010008 |
| eu_rights_str_mv | openAccess |
| id | Manara2_be8a1c040d84490d23e63f8b7a8df2d7 |
| identifier_str_mv | 10.3390/data7010008 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/25671924 |
| publishDate | 2022 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels Muhammad Imran (282621) Information and computing sciences Information systems social sensing COVID-19 sentiment analysis trends analysis geo-mapping natural cities |
| status_str | publishedVersion |
| title | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| title_full | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| title_fullStr | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| title_full_unstemmed | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| title_short | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| title_sort | TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels |
| topic | Information and computing sciences Information systems social sensing COVID-19 sentiment analysis trends analysis geo-mapping natural cities |