Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark
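| Main author: | Ameema Zainab |
|---|---|
| Other authors: | Ali Ghrayeb, Haitham Abu-Rub, Shady S. Refaat, Othmane Bouhali |
| Published in: | 2021 |
| Subjects: | Engineering; Electrical engineering; Information and computing sciences; Data management and data science; Machine learning; Sparks; Big Data; Load forecasting; Data models; Load modeling; Parallel processing; Computational modeling; Apache spark; Concurrent computing; Resource management |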
| author | Ameema Zainab (16864263) |
|---|---|
| author2 | Ali Ghrayeb (16864266) Haitham Abu-Rub (16855500) Shady S. Refaat (16864269) Othmane Bouhali (8252544) |
| author2_role | author author author author |
| author_role | author |
| dc.date.none.fl_str_mv | 2021-04-12T00:00:00Z |
| dc.identifier.none.fl_str_mv | 10.1109/access.2021.3072609 |
| dc.relation.none.fl_str_mv | https://figshare.com/articles/journal_contribution/Distributed_Tree-Based_Machine_Learning_for_Short-Term_Load_Forecasting_With_Apache_Spark/24042492 |
| dc.rights.none.fl_str_mv | CC BY 4.0; info:eu-repo/semantics/openAccess |
| dc.subject.none.fl_str_mv | Engineering; Electrical engineering; Information and computing sciences; Data management and data science; Machine learning; Sparks; Big Data; Load forecasting; Data models; Load modeling; Parallel processing; Computational modeling; Apache spark; Concurrent computing; Resource management |
| dc.title.none.fl_str_mv | Distributed Tree-Based Machine Learning for Short-Term Load Forecasting With Apache Spark |
| dc.type.none.fl_str_mv | Text; Journal contribution; info:eu-repo/semantics/publishedVersion; text; contribution to journal |
| description | Machine learning algorithms have been applied extensively to load forecasting because they achieve better accuracy than traditional statistical methods. However, as data volumes grow, more sophisticated models are needed, and these require big data platforms. Making optimal and effective use of the available computational resources means maximizing the effective utilization of the cluster nodes, so handling smart grid big data calls for parallel computing. In this paper, a master-slave parallel computing paradigm is applied and evaluated for load forecasting in a multi-AMI environment. The paper proposes a concurrent job scheduling algorithm for a multi-energy data source environment using Apache Spark, together with an efficient resource utilization strategy for submitting multiple Spark jobs that reduces job completion time. To reduce computation time further, the data are clustered into groups using the optimal number of clusters. Multiple tree-based machine learning algorithms are tested with parallel computation on a real-world dataset to evaluate their performance under tunable parameters. Three years of real data from one thousand distribution transformers in Spain are used to demonstrate the performance of the proposed methodology, including the trade-off between accuracy and processing time. Other information: Published in IEEE Access. License: [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/legalcode). See the article on the publisher's website: [https://dx.doi.org/10.1109/access.2021.3072609](https://dx.doi.org/10.1109/access.2021.3072609) |
| eu_rights_str_mv | openAccess |
| id | Manara2_402258a1b4d869590fcdd2b86021fe46 |
| network_acronym_str | Manara2 |
| network_name_str | Manara2 |
| oai_identifier_str | oai:figshare.com:article/24042492 |
| publishDate | 2021 |
| status_str | publishedVersion |
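The abstract outlines a pipeline that clusters the transformers' load data into groups and then trains tree-based models through concurrently submitted jobs. The paper's code is not part of this record, so the following is only a minimal, stdlib-only Python sketch of that general pattern under stated assumptions: k-means with a deterministic farthest-first start stands in for the clustering step, a depth-1 regression tree (a stump) on a hypothetical hour-of-day feature stands in for the tree-based learners, and a `ThreadPoolExecutor` stands in for concurrent job submission. All function names and the toy profiles are hypothetical.

```python
# Hypothetical sketch, NOT the paper's implementation: cluster load profiles,
# then train one small tree-based model per cluster via concurrently submitted tasks.
from concurrent.futures import ThreadPoolExecutor


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length load profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the profile farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the split on x that minimises squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


def train_cluster(profiles):
    """Hypothetical features: hour index -> load, pooled over a cluster's profiles."""
    xs = [h for p in profiles for h in range(len(p))]
    ys = [v for p in profiles for v in p]
    return fit_stump(xs, ys)


def train_all(profiles, k):
    """Cluster the profiles, then submit one training task per cluster concurrently."""
    labels = kmeans(profiles, k)
    groups = {c: [p for p, lab in zip(profiles, labels) if lab == c] for c in set(labels)}
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = {c: pool.submit(train_cluster, g) for c, g in groups.items()}
        models = {c: f.result() for c, f in futures.items()}
    return labels, models


if __name__ == "__main__":
    # Toy daily profiles (4 "hours" each): three low-then-high, two high-then-low.
    profiles = [[1, 1, 5, 5], [1.2, 1, 5, 4.8], [0.9, 1.1, 5.1, 5],
                [10, 10, 2, 2], [9.8, 10.1, 2, 2.2]]
    labels, models = train_all(profiles, k=2)
    print(labels)                    # [0, 0, 0, 1, 1]
    print(round(models[1](0), 3))    # 9.975
```

In CPython, pure-Python training functions running on threads do not execute in parallel because of the GIL, so the pool here only illustrates the submit-one-task-per-cluster pattern. In PySpark, the analogous pattern would be triggering actions from separate driver threads, which Spark can run as concurrent jobs within one application (optionally with `spark.scheduler.mode` set to `FAIR`).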