Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark

Bibliographic details
Main author: Ameema Zainab (16864263) (author)
Other authors: Ali Ghrayeb (16864266) (author), Haitham Abu-Rub (16855500) (author), Shady S. Refaat (16864269) (author), Othmane Bouhali (8252544) (author)
Published: 2021
dc.creator.none.fl_str_mv Ameema Zainab (16864263)
Ali Ghrayeb (16864266)
Haitham Abu-Rub (16855500)
Shady S. Refaat (16864269)
Othmane Bouhali (8252544)
dc.date.none.fl_str_mv 2021-04-12T00:00:00Z
dc.identifier.none.fl_str_mv 10.1109/access.2021.3072609
dc.relation.none.fl_str_mv https://figshare.com/articles/journal_contribution/Distributed_Tree-Based_Machine_Learning_for_Short-Term_Load_Forecasting_With_Apache_Spark/24042492
dc.rights.none.fl_str_mv CC BY 4.0
info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv Engineering
Electrical engineering
Information and computing sciences
Data management and data science
Machine learning
Sparks
Big Data
Load forecasting
Data models
Load modeling
Parallel processing
Computational modeling
Apache Spark
Concurrent computing
Resource management
dc.title.none.fl_str_mv Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark
dc.type.none.fl_str_mv Text
Journal contribution
info:eu-repo/semantics/publishedVersion
description <p>Machine learning algorithms have been applied extensively to load forecasting to obtain better accuracy than traditional statistical methods. However, with the large increase in data size, more sophisticated models are needed, and these require big data platforms. The available computational resources are used most effectively when the utilization of the cluster nodes is maximized, so parallel computing is required to handle smart grid big data efficiently. In this paper, a master-slave parallel computing paradigm is applied to, and evaluated for, load forecasting in a multi-AMI (advanced metering infrastructure) environment. The paper proposes a concurrent job scheduling algorithm in a multi-energy data source environment using Apache Spark. An efficient resource utilization strategy is proposed for submitting multiple Spark jobs to reduce job completion time. The data are also clustered into groups using the optimal number of clusters to further reduce computational time. Several tree-based machine learning algorithms are tested with parallel computation, and their performance with tunable parameters is evaluated on a real-world dataset. Three years of real data from one thousand distribution transformers in Spain are used to demonstrate the performance of the proposed methodology and the trade-off between accuracy and processing time.</p><h2>Other Information</h2><p>Published in: IEEE Access<br>License: <a href="https://creativecommons.org/licenses/by/4.0/legalcode" target="_blank">https://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1109/access.2021.3072609" target="_blank">https://dx.doi.org/10.1109/access.2021.3072609</a></p>
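The abstract does not spell out the scheduling algorithm itself. As a rough illustration of why submitting several independent jobs concurrently shortens total completion time, the sketch below uses longest-processing-time-first (LPT) greedy scheduling, a standard makespan heuristic, as a stand-in; it is not the authors' algorithm. It assumes each job (for example, one forecasting job per data cluster) has a known estimated run time and that a fixed number of jobs (`slots`) can run at once. The function names, job names, and durations are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(job_times: Dict[str, float], slots: int) -> Tuple[float, List[List[str]]]:
    """Greedy longest-processing-time-first assignment of jobs to parallel slots.

    Hypothetical stand-in for concurrent job submission: each job has an
    estimated run time and at most `slots` jobs run at the same time.
    Returns the makespan (time until every job has finished) and the
    list of jobs assigned to each slot.
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    # Min-heap of (current finish time, slot index); the least-loaded slot is on top.
    heap = [(0.0, i) for i in range(slots)]
    assignment: List[List[str]] = [[] for _ in range(slots)]
    # Place the longest jobs first; ties are broken by name for determinism.
    for name, t in sorted(job_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, slot = heapq.heappop(heap)
        assignment[slot].append(name)
        heapq.heappush(heap, (load + t, slot))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


def sequential_time(job_times: Dict[str, float]) -> float:
    """Completion time if the same jobs are submitted one after another."""
    return sum(job_times.values())


if __name__ == "__main__":
    # Hypothetical per-cluster training times in minutes (not data from the paper).
    jobs = {"cluster_a": 7.0, "cluster_b": 5.0, "cluster_c": 4.0,
            "cluster_d": 3.0, "cluster_e": 3.0}
    span, plan = lpt_schedule(jobs, slots=2)
    print(sequential_time(jobs), span, plan)
```

With two slots, these five hypothetical jobs finish after 12 minutes instead of the 22 minutes needed when they run sequentially. LPT is only a heuristic: the best split here is 11 minutes ({cluster_a, cluster_c} against the other three), and LPT is guaranteed to stay within a factor of 4/3 of the optimum. In an actual Spark application, jobs submitted from separate threads of one SparkContext can run concurrently, and setting `spark.scheduler.mode` to `FAIR` shares the executors between them.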
network_name_str Manara2
oai_identifier_str oai:figshare.com:article/24042492