IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets

<p>While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract geolocation information. The limited focus on Location Mention Recognition (LMR...

Bibliographic details
Main author: Reem Suwaileh (17863055) (author)
Other authors: Tamer Elsayed (14777071) (author), Muhammad Imran (282621) (author)
Published: 2023
Subjects:
_version_ 1864513528979783680
author Reem Suwaileh (17863055)
author2 Tamer Elsayed (14777071)
Muhammad Imran (282621)
author2_role author
author
author_facet Reem Suwaileh (17863055)
Tamer Elsayed (14777071)
Muhammad Imran (282621)
author_role author
dc.creator.none.fl_str_mv Reem Suwaileh (17863055)
Tamer Elsayed (14777071)
Muhammad Imran (282621)
dc.date.none.fl_str_mv 2023-05-01T00:00:00Z
dc.identifier.none.fl_str_mv 10.1016/j.ipm.2023.103340
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/IDRISI-RE_A_generalizable_dataset_with_benchmarks_for_location_mention_recognition_on_disaster_tweets/25101293
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Information and computing sciences
Information systems
Library and information studies
Location mention recognition
Twitter
Geolocation
Disaster management
Dataset
Domain generalizability
Geographical generalizability
dc.title.none.fl_str_mv IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract geolocation information. The limited focus on Location Mention Recognition (LMR) in tweets, specifically, is attributed to the lack of a standard dataset that enables research in LMR. To bridge this gap, we present IDRISI-RE, a large-scale human-labeled LMR dataset comprising around 20.5k tweets. The annotated location mentions within the tweets are also assigned location types (e.g., country, city, street). IDRISI-RE contains tweets from 19 disaster events of diverse types (e.g., flood and earthquake) covering a wide geographical area of 22 English-speaking countries. Additionally, IDRISI-RE contains about 56.6k automatically labeled tweets that we offer as a silver dataset. To highlight the superiority of IDRISI-RE over past efforts, we present rigorous analyses on reliability, consistency, coverage, diversity, and generalizability. Furthermore, we benchmark IDRISI-RE using a representative set of LMR models to provide the community with baselines for future work. Our extensive empirical analysis shows the promising generalizability of IDRISI-RE compared to existing datasets. We show that models trained on IDRISI-RE better tackle domain shifts and are less susceptible to changes in geographical area.</p><h2>Other Information</h2> <p> Published in: Information Processing & Management<br> License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.ipm.2023.103340" target="_blank">https://dx.doi.org/10.1016/j.ipm.2023.103340</a></p>
eu_rights_str_mv openAccess
id Manara2_f59c5ea47fc0a8d06f5af1b24e7397a5
identifier_str_mv 10.1016/j.ipm.2023.103340
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25101293
publishDate 2023
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
spellingShingle IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
Reem Suwaileh (17863055)
Information and computing sciences
Information systems
Library and information studies
Location mention recognition
Twitter
Geolocation
Disaster management
Dataset
Domain generalizability
Geographical generalizability
status_str publishedVersion
title IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
title_full IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
title_fullStr IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
title_full_unstemmed IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
title_short IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
title_sort IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
topic Information and computing sciences
Information systems
Library and information studies
Location mention recognition
Twitter
Geolocation
Disaster management
Dataset
Domain generalizability
Geographical generalizability