DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture

Bibliographic Details
Main Author: Erum Mehmood (17541768) (author)
Other Authors: Tayyaba Anees (15373043) (author), Ahmad Sami Al-Shamayleh (17541495) (author), Abdullah Hussein Al-Ghushami (17541771) (author), Wajeeha Khalil (17541429) (author), Adnan Akhunzada (3134064) (author)
Published: 2023
dc.creator.none.fl_str_mv Erum Mehmood (17541768)
Tayyaba Anees (15373043)
Ahmad Sami Al-Shamayleh (17541495)
Abdullah Hussein Al-Ghushami (17541771)
Wajeeha Khalil (17541429)
Adnan Akhunzada (3134064)
dc.date.none.fl_str_mv 2023-06-21T06:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2023.3288284
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/DHSDJArch_An_Efficient_Design_of_Distributed_Heterogeneous_Stream-Disk_Join_Architecture/25205168
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Electrical engineering
Electronics, sensors and digital hardware
Materials engineering
Distributed databases
Real-time systems
Sparks
Scalability
Big Data
Media streaming
Optimization
Apache Kafka
distributed big data
ETL
heterogeneous stream
MongoDB
spark structured streaming
un-structured data
dc.title.none.fl_str_mv DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
text
contribution to journal
description <p dir="ltr">Heterogeneity is a key characteristic of the live streams produced by complex networks and smart devices. The heterogeneous stream-disk join is a significant research topic in real-time processing applications because it directly affects data analytics. Multiple issues, including stream loss, scalability, disk access cost, and data accuracy, must be considered during the heterogeneous stream-disk join transformation. In this work, we address these issues by introducing a distributed heterogeneous stream-disk join architecture (DHSDJArch) that prevents stream data loss while maintaining a balance between heterogeneous distributed data sources and the accuracy of the stream-disk join. A four-phase distributed architecture is proposed for the multi-objective optimization of transforming heterogeneous, incomplete streams. To prevent stream loss, a log retention configuration is proposed based on the characteristics of the distributed event streaming platform (DESP). Specifically, two transformations are proposed: the first pre-processes heterogeneous streams, and the second joins the pre-processed stream with distributed disk data by performing real-time disk access while compensating for the differences between the data sources and the streaming application. We conduct a comprehensive experimental study on real datasets to verify the performance of the proposed architecture in terms of accuracy, log retention policy, scaling, stability, and cloud data storage.</p><h2>Other Information</h2><p dir="ltr">Published in: IEEE Access<br>License: <a href="http://creativecommons.org/licenses/by/4.0" target="_blank">http://creativecommons.org/licenses/by/4.0</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2023.3288284" target="_blank">https://dx.doi.org/10.1109/access.2023.3288284</a></p>
eu_rights_str_mv openAccess
id Manara2_c3df29e9f4a66bb19d36c0e59ac43150
identifier_str_mv 10.1109/access.2023.3288284
network_acronym_str Manara2
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/25205168
publishDate 2023
rights_invalid_str_mv CC BY 4.0
status_str publishedVersion
title DHSDJArch: An Efficient Design of Distributed Heterogeneous Stream-Disk Join Architecture