LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an i...
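| Main Author: | Mingjie Tang |
|---|---|
| Other Authors: | Yongyang Yu, Ahmed R. Mahmood, Qutaibah M. Malluhi, Mourad Ouzzani, Walid G. Aref |
| Published in: | 2020 |
| Subjects: | Engineering; Geomatic engineering; Information and computing sciences; Data management and data science; spatial data; query processing; in-memory computation; parallel computing; query optimization |
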
| _version_ | 1864513512985853952 |
|---|---|
| author | Mingjie Tang (227920) |
| author2 | Yongyang Yu (400611); Ahmed R. Mahmood (18623587); Qutaibah M. Malluhi (14151912); Mourad Ouzzani (3618794); Walid G. Aref (18623590) |
| author2_role | author author author author author |
| author_facet | Mingjie Tang (227920); Yongyang Yu (400611); Ahmed R. Mahmood (18623587); Qutaibah M. Malluhi (14151912); Mourad Ouzzani (3618794); Walid G. Aref (18623590) |
| author_role | author |
| dc.creator.none.fl_str_mv | Mingjie Tang (227920); Yongyang Yu (400611); Ahmed R. Mahmood (18623587); Qutaibah M. Malluhi (14151912); Mourad Ouzzani (3618794); Walid G. Aref (18623590) |
| dc.date.none.fl_str_mv | 2020-10-16T09:00:00Z |
| dc.identifier.none.fl_str_mv | 10.3389/fdata.2020.00030 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/LocationSpark_In-memory_Distributed_Spatial_Query_Processing_and_Optimization/25912255 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Engineering; Geomatic engineering; Information and computing sciences; Data management and data science; spatial data; query processing; in-memory computation; parallel computing; query optimization |
| dc.title.none.fl_str_mv | LocationSpark: In-memory Distributed Spatial Query Processing and Optimization |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | Due to the ubiquity of spatial data applications and the large amounts of spatial data that these applications generate and process, there is a pressing need for scalable spatial query processing. In this paper, we present new techniques for spatial query processing and optimization in an in-memory and distributed setup to address scalability. More specifically, we introduce new techniques for handling query skew, which commonly occurs in practice, and for minimizing communication costs accordingly. We propose a distributed query scheduler that uses a new cost model to minimize the cost of spatial query processing. The scheduler generates query execution plans that minimize the effect of query skew. The query scheduler uses new spatial indexing techniques based on bitmap filters to forward queries to the appropriate local nodes. Each local computation node is responsible for optimizing and selecting its best local query execution plan based on the indexes and the nature of the spatial queries in that node. All the proposed spatial query processing and optimization techniques are prototyped inside Spark, a distributed memory-based computation system. Our prototype system is termed LocationSpark. The experimental study is based on real datasets and demonstrates that LocationSpark can enhance distributed spatial query processing by up to an order of magnitude over existing in-memory and distributed spatial systems. Other Information: Published in: Frontiers in Big Data. License: https://creativecommons.org/licenses/by/4.0/. See article on publisher's website: https://dx.doi.org/10.3389/fdata.2020.00030 |
| eu_rights_str_mv | openAccess |
| id | Manara2_ccb2e7d9f377728cf586ef7c0ef17c0d |
| identifier_str_mv | 10.3389/fdata.2020.00030 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/25912255 |
| publishDate | 2020 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| status_str | publishedVersion |
| title | LocationSpark: In-memory Distributed Spatial Query Processing and Optimization |