Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPUGPU Nodes Using the Divisible Load Paradigm

A Master of Science thesis in Computer Engineering by Lamees Elhiny entitled, "Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPUGPU Nodes Using the Divisible Load Paradigm," submitted in November 2016. Thesis advisor is Dr. Gerassimos Barlas. Soft and hard copy available.

Bibliographic Details
Main Author:	Elhiny, Lamees (author)
Format:	doctoralThesis
Published:	2016
Subjects:	hybrid processing; parallel processing; load partitioning; matrix-matrix multiplication; divisible load theory; Matrices; Data processing; Multiplication; Computer engineering
Online Access:	http://hdl.handle.net/11073/8695

author	Elhiny, Lamees
author_role	author
dc.contributor.none.fl_str_mv	Barlas, Gerassimos
dc.date.none.fl_str_mv	2016-11 2017-01-17T08:10:45Z 2017-01-17T08:10:45Z
dc.format.none.fl_str_mv	application/pdf
dc.identifier.none.fl_str_mv	35.232-2016.43 http://hdl.handle.net/11073/8695
dc.language.none.fl_str_mv	en_US
dc.type.none.fl_str_mv	info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/doctoralThesis
description	A Master of Science thesis in Computer Engineering by Lamees Elhiny entitled, "Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPUGPU Nodes Using the Divisible Load Paradigm," submitted in November 2016. Thesis advisor is Dr. Gerassimos Barlas. Soft and hard copy available.
format	doctoralThesis
id	aus_e1528a4df8310b7d3c047f8409eebb9e
identifier_str_mv	35.232-2016.43
language_invalid_str_mv	en_US
network_acronym_str	aus
network_name_str	aus
oai_identifier_str	oai:repository.aus.edu:11073/8695
publishDate	2016
Abstract
Matrix-matrix multiplication is a component of many numerical algorithms, but it is a time-consuming operation. When the matrices are very large, performing the multiplication on a single processor is not sufficiently fast, so an efficient approach to matrix-matrix multiplication can improve the performance of the many applications that depend on it. The aim of this study is to improve the efficiency of matrix-matrix multiplication on a distributed network composed of heterogeneous nodes. Since balancing the load between heterogeneous nodes is the biggest challenge, the performance model is derived using Divisible Load Theory (DLT). The proposed solution improves performance through (a) reduced communication overhead, as DLT-derived load partitioning requires no synchronization between nodes during processing, and (b) high resource utilization, as both the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) take part in the computation. The experiments are conducted on a single node as well as on a cluster of nodes.

The results show that the DLT equations balance the load between CPUs and GPUs. On a single node, the proposed hybrid approach outperforms the CBLAS (C interface to the Basic Linear Algebra Subprograms) and OpenBLAS approaches. In contrast, the performance difference between the hybrid approach and the GPU-only approach (cuBLAS, the CUDA Basic Linear Algebra Subroutines) is small, because the hybrid approach allocates most of the load to the GPU. On a cluster of nodes, the computation time is reduced to almost half of the GPU-only processing time; however, communication overhead limits the overall improvement. Faster communication media are expected to reduce the overall time and further improve the speedup.

College of Engineering, Department of Computer Science and Engineering, Master of Science in Computer Engineering (MSCoE). OAI datestamp: 2025-06-26T12:24:07Z.
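The abstract says that DLT-derived partitioning lets every node process its share without synchronizing with the others. The Python sketch below illustrates that idea under stated assumptions only: it uses the textbook divisible-load model of a master that sends shares to its workers one after another and requires all workers to finish at the same instant. It is not the thesis's own CPU-GPU performance model, whose equations this record does not reproduce. The cost parameters `w[i]` (time for worker `i` to process the whole load) and `z[i]` (time to transfer the whole load to it), and all function names, are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dlt_fractions(w: Sequence[float], z: Sequence[float]) -> List[float]:
    """Load fractions (summing to 1) that make all workers finish together.

    Hypothetical model: worker i needs z[i] time units to receive the whole
    load and w[i] time units to process it (both assumed positive); the
    master sends the shares one after another, in list order. Worker i thus
    finishes at T_i = alpha_0*z_0 + ... + alpha_i*z_i + alpha_i*w_i.
    Setting T_i = T_{i+1} gives alpha_{i+1} = alpha_i*w_i / (z_{i+1} + w_{i+1}).
    """
    if not w or len(w) != len(z):
        raise ValueError("w and z must be non-empty and of equal length")
    raw = [1.0]  # unnormalised fraction for worker 0
    for i in range(len(w) - 1):
        raw.append(raw[i] * w[i] / (z[i + 1] + w[i + 1]))
    total = sum(raw)
    return [r / total for r in raw]


def finish_times(alpha: Sequence[float], w: Sequence[float],
                 z: Sequence[float]) -> List[float]:
    """Finish time of each worker under the same sequential-distribution model."""
    times: List[float] = []
    sent = 0.0
    for a, wi, zi in zip(alpha, w, z):
        sent += a * zi               # master finishes sending this worker's share
        times.append(sent + a * wi)  # worker computes once its share has arrived
    return times


def rows_per_worker(n_rows: int, alpha: Sequence[float]) -> List[int]:
    """Round fractions to whole matrix rows using largest-remainder rounding."""
    exact = [a * n_rows for a in alpha]
    rows = [int(x) for x in exact]
    leftover = n_rows - sum(rows)
    by_remainder = sorted(range(len(alpha)),
                          key=lambda i: exact[i] - rows[i], reverse=True)
    for i in by_remainder[:leftover]:
        rows[i] += 1
    return rows


def matmul_rows(A: Matrix, B: Matrix, start: int, stop: int) -> Matrix:
    """Rows start..stop-1 of A @ B.

    Only those rows of A plus all of B are needed, so each worker can compute
    its block of C independently, with no synchronization during processing.
    """
    inner, cols = len(B), len(B[0])
    return [[sum(A[r][k] * B[k][c] for k in range(inner)) for c in range(cols)]
            for r in range(start, stop)]
```

For two identical workers with `w = z = [1.0, 1.0]`, `dlt_fractions` returns fractions of 2/3 and 1/3, both workers finish at 4/3 time units, and `rows_per_worker(1000, ...)` assigns 667 and 333 rows. With identical workers, the one served first gets the larger share because it starts computing while the other is still receiving its data; largest-remainder rounding keeps the total row count exact.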
status_str	publishedVersion

Similar Items