LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

<p>Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an i...

Bibliographic Details
Main Author: Mingjie Tang (227920) (author)
Other Authors: Yongyang Yu (400611) (author), Ahmed R. Mahmood (18623587) (author), Qutaibah M. Malluhi (14151912) (author), Mourad Ouzzani (3618794) (author), Walid G. Aref (18623590) (author)
Published: 2020
Subjects:
_version_ 1864513512985853952
author Mingjie Tang (227920)
author2 Yongyang Yu (400611)
Ahmed R. Mahmood (18623587)
Qutaibah M. Malluhi (14151912)
Mourad Ouzzani (3618794)
Walid G. Aref (18623590)
author2_role author
author
author
author
author
author_facet Mingjie Tang (227920)
Yongyang Yu (400611)
Ahmed R. Mahmood (18623587)
Qutaibah M. Malluhi (14151912)
Mourad Ouzzani (3618794)
Walid G. Aref (18623590)
author_role author
dc.creator.none.fl_str_mv Mingjie Tang (227920)
Yongyang Yu (400611)
Ahmed R. Mahmood (18623587)
Qutaibah M. Malluhi (14151912)
Mourad Ouzzani (3618794)
Walid G. Aref (18623590)
dc.date.none.fl_str_mv 2020-10-16T09:00:00Z
dc.identifier.none.fl_str_mv 10.3389/fdata.2020.00030
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/LocationSpark_In-memory_Distributed_Spatial_Query_Processing_and_Optimization/25912255
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Geomatic engineering
Information and computing sciences
Data management and data science
spatial data
query processing
in-memory computation
parallel computing
query optimization
dc.title.none.fl_str_mv LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p>Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling query skew, which commonly happens in practice, and minimize communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler utilizes new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems.</p><h2>Other Information</h2> <p> Published in: Frontiers in Big Data<br> License: <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.3389/fdata.2020.00030" target="_blank">https://dx.doi.org/10.3389/fdata.2020.00030</a></p>
eu_rights_str_mv openAccess
id Manara2_ccb2e7d9f377728cf586ef7c0ef17c0d
identifier_str_mv 10.3389/fdata.2020.00030
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25912255
publishDate 2020
repository.mail.fl_str_mv
repository.name.fl_str_mv
repository_id_str
rights_invalid_str_mv CC BY 4.0
spellingShingle LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
Mingjie Tang (227920)
Engineering
Geomatic engineering
Information and computing sciences
Data management and data science
spatial data
query processing
in-memory computation
parallel computing
query optimization
status_str publishedVersion
title LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
title_full LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
title_fullStr LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
title_full_unstemmed LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
title_short LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
title_sort LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
topic Engineering
Geomatic engineering
Information and computing sciences
Data management and data science
spatial data
query processing
in-memory computation
parallel computing
query optimization