DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture
Heterogeneity is a key aspect of complex networks and smart devices and is inherent in the live streams they produce. The heterogeneous stream-disk join is a significant research topic in real-time processing applications because it can directly affect the data analytics. Multip...
Published: 2023
| _version_ | 1864513527505485824 |
|---|---|
| author | Erum Mehmood (17541768) |
| author2 | Tayyaba Anees (15373043) Ahmad Sami Al-Shamayleh (17541495) Abdullah Hussein Al-Ghushami (17541771) Wajeeha Khalil (17541429) Adnan Akhunzada (3134064) |
| author2_role | author author author author author |
| author_facet | Erum Mehmood (17541768) Tayyaba Anees (15373043) Ahmad Sami Al-Shamayleh (17541495) Abdullah Hussein Al-Ghushami (17541771) Wajeeha Khalil (17541429) Adnan Akhunzada (3134064) |
| author_role | author |
| dc.creator.none.fl_str_mv | Erum Mehmood (17541768) Tayyaba Anees (15373043) Ahmad Sami Al-Shamayleh (17541495) Abdullah Hussein Al-Ghushami (17541771) Wajeeha Khalil (17541429) Adnan Akhunzada (3134064) |
| dc.date.none.fl_str_mv | 2023-06-21T06:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2023.3288284 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/DHSDJArch_An_Efficient_Design_of_Distributed_Heterogeneous_Stream-Disk_Join_Architecture/25205168 |
| dc.rights.none.fl_str_mv | CC BY 4.0 info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Engineering Electrical engineering Electronics, sensors and digital hardware Materials engineering Distributed databases Real-time systems Sparks Scalability Big Data Media streaming Optimization Apache Kafka distributed big data ETL heterogeneous stream MongoDB spark structured streaming un-structured data |
| dc.title.none.fl_str_mv | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| dc.type.none.fl_str_mv | Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal |
| description | <p dir="ltr">Heterogeneity is a key aspect of complex networks and smart devices and is inherent in the live streams they produce. The heterogeneous stream-disk join is a significant research topic in real-time processing applications because it can directly affect the data analytics. Multiple issues, including stream loss, scalability, disk access cost, and data accuracy, should be considered during heterogeneous stream-disk join transformation. In this work we overcome these issues by introducing a distributed heterogeneous stream-disk join architecture (DHSDJArch), which prevents stream data loss while maintaining a balance between heterogeneous distributed data sources and the accuracy of the stream-disk join. A four-phase distributed architecture is proposed for multi-objective optimization when transforming heterogeneous, incomplete streams. To prevent stream loss, a log retention configuration is proposed based on the characteristics of the distributed event streaming platform (DESP). Specifically, two transformations are proposed: one pre-processes heterogeneous streams, and the other joins the pre-processed stream with distributed disk data by performing real-time disk access while compensating for the differences between data sources and the streaming application. We conduct a comprehensive experimental study on real datasets to verify the performance of the proposed architecture in terms of accuracy, log retention policy, scaling, stability, and cloud data storage.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="http://creativecommons.org/licenses/by/4.0" target="_blank">http://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2023.3288284" target="_blank">https://dx.doi.org/10.1109/access.2023.3288284</a></p> |
| eu_rights_str_mv | openAccess |
| id | Manara2_c3df29e9f4a66bb19d36c0e59ac43150 |
| identifier_str_mv | 10.1109/access.2023.3288284 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/25205168 |
| publishDate | 2023 |
| repository.mail.fl_str_mv | |
| repository.name.fl_str_mv | |
| repository_id_str | |
| rights_invalid_str_mv | CC BY 4.0 |
| spellingShingle | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture Erum Mehmood (17541768) Engineering Electrical engineering Electronics, sensors and digital hardware Materials engineering Distributed databases Real-time systems Sparks Scalability Big Data Media streaming Optimization Apache Kafka distributed big data ETL heterogeneous stream MongoDB spark structured streaming un-structured data |
| status_str | publishedVersion |
| title | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| title_full | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| title_fullStr | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| title_full_unstemmed | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| title_short | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| title_sort | DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture |
| topic | Engineering Electrical engineering Electronics, sensors and digital hardware Materials engineering Distributed databases Real-time systems Sparks Scalability Big Data Media streaming Optimization Apache Kafka distributed big data ETL heterogeneous stream MongoDB spark structured streaming un-structured data |
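
The abstract in this record describes two transformations: pre-processing a heterogeneous stream, then joining it with disk data through a per-record lookup. The general idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field aliases, the sample records, and the in-memory dictionary standing in for the disk store are all hypothetical, and a real deployment would presumably use the platforms named in the subject keywords rather than plain Python.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical aliases: different sources may name the join key differently.
KEY_ALIASES = ("id", "device_id", "deviceId")


def preprocess(record: dict) -> Optional[dict]:
    """Normalize one stream record; return None when no join key is present."""
    for alias in KEY_ALIASES:
        if record.get(alias) is not None:
            # Drop every alias field and keep a single string-typed "key".
            cleaned = {k: v for k, v in record.items() if k not in KEY_ALIASES}
            cleaned["key"] = str(record[alias])
            return cleaned
    return None


def stream_disk_join(
    stream: Iterable[dict],
    lookup: Callable[[str], Optional[dict]],
) -> Iterator[dict]:
    """Join each pre-processed stream record with its disk row, if one exists."""
    for raw in stream:
        rec = preprocess(raw)
        if rec is None:
            continue  # incomplete record: nothing to join on
        disk_row = lookup(rec["key"])  # one disk access per stream record
        if disk_row is not None:
            yield {**disk_row, **rec}


# Hypothetical data: a dict stands in for the disk-resident table.
DISK = {"7": {"site": "lab"}, "9": {"site": "roof"}}
STREAM = [
    {"id": 7, "temp": 21.5},
    {"deviceId": "9", "temp": 19.0},
    {"temp": 3.0},                   # no key, dropped in pre-processing
    {"device_id": 4, "temp": 1.0},   # key has no match on disk
]

joined = list(stream_disk_join(STREAM, DISK.get))
```

Passing the lookup as a function keeps the join logic independent of the storage backend, so the dictionary could be swapped for a database query without changing `stream_disk_join`.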