IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets
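| Main author: | Reem Suwaileh |
|---|---|
| Other authors: | Tamer Elsayed, Muhammad Imran |
| Published in: | 2023 |
| Subjects: | Information and computing sciences, Information systems, Library and information studies, Location mention recognition, Geolocation, Disaster management, Dataset, Domain generalizability, Geographical generalizability |
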
| _version_ | 1864513528979783680 |
|---|---|
| author | Reem Suwaileh (17863055) |
| author2 | Tamer Elsayed (14777071); Muhammad Imran (282621) |
| author2_role | author; author |
| author_facet | Reem Suwaileh (17863055); Tamer Elsayed (14777071); Muhammad Imran (282621) |
| author_role | author |
| dc.creator.none.fl_str_mv | Reem Suwaileh (17863055); Tamer Elsayed (14777071); Muhammad Imran (282621) |
| dc.date.none.fl_str_mv | 2023-05-01T00:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1016/j.ipm.2023.103340 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/IDRISI-RE_A_generalizable_dataset_with_benchmarks_for_location_mention_recognition_on_disaster_tweets/25101293 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Information and computing sciences; Information systems; Library and information studies; Location mention recognition; Geolocation; Disaster management; Dataset; Domain generalizability; Geographical generalizability |
| dc.title.none.fl_str_mv | IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | While utilizing Twitter data for crisis management is of interest to different response authorities, a critical challenge that hinders the utilization of such data is the scarcity of automated tools that extract geolocation information. The limited focus on Location Mention Recognition (LMR) in tweets, specifically, is attributed to the lack of a standard dataset that enables research in LMR. To bridge this gap, we present IDRISI-RE, a large-scale human-labeled LMR dataset comprising around 20.5k tweets. The annotated location mentions within the tweets are also assigned location types (e.g., country, city, street, etc.). IDRISI-RE contains tweets from 19 disaster events of diverse types (e.g., flood and earthquake) covering a wide geographical area of 22 English-speaking countries. Additionally, IDRISI-RE contains about 56.6k automatically-labeled tweets that we offer as a silver dataset. To highlight the superiority of IDRISI-RE over past efforts, we present rigorous analyses on reliability, consistency, coverage, diversity, and generalizability. Furthermore, we benchmark IDRISI-RE using a representative set of LMR models to provide the community with baselines for future work. Our extensive empirical analysis shows the promising generalizability of IDRISI-RE compared to existing datasets. We show that models trained on IDRISI-RE better tackle domain shifts and are less susceptible to change in geographical areas. Other Information: Published in: Information Processing & Management; License: http://creativecommons.org/licenses/by/4.0/; See article on publisher's website: https://dx.doi.org/10.1016/j.ipm.2023.103340 |
| eu_rights_str_mv | openAccess |
| id | Manara2_f59c5ea47fc0a8d06f5af1b24e7397a5 |
| identifier_str_mv | 10.1016/j.ipm.2023.103340 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/25101293 |
| publishDate | 2023 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | IDRISI-RE: A generalizable dataset with benchmarks for location mention recognition on disaster tweets |
| topic | Information and computing sciences; Information systems; Library and information studies; Location mention recognition; Geolocation; Disaster management; Dataset; Domain generalizability; Geographical generalizability |