Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPU-GPU Nodes Using the Divisible Load Paradigm

A Master of Science thesis in Computer Engineering by Lamees Elhiny entitled, "Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPU-GPU Nodes Using the Divisible Load Paradigm," submitted in November 2016. Thesis advisor is Dr. Gerassimos Barlas. Soft and hard copy availab...

Bibliographic Details
Main Author: Elhiny, Lamees (author)
Format: doctoralThesis
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/11073/8695
dc.contributor.none.fl_str_mv Barlas, Gerassimos
dc.creator.none.fl_str_mv Elhiny, Lamees
dc.date.none.fl_str_mv 2016-11
2017-01-17T08:10:45Z
2017-01-17T08:10:45Z
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv 35.232-2016.43
http://hdl.handle.net/11073/8695
dc.language.none.fl_str_mv en_US
dc.subject.none.fl_str_mv hybrid processing
parallel processing
load partitioning
matrix-matrix multiplication
divisible load theory
Matrices
Data processing
Multiplication
Computer engineering
dc.title.none.fl_str_mv Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPU-GPU Nodes Using the Divisible Load Paradigm
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/doctoralThesis
description A Master of Science thesis in Computer Engineering by Lamees Elhiny entitled, "Load Partitioning for Matrix-Matrix Multiplication on a Cluster of CPU-GPU Nodes Using the Divisible Load Paradigm," submitted in November 2016. Thesis advisor is Dr. Gerassimos Barlas. Soft and hard copy available.
format doctoralThesis
id aus_e1528a4df8310b7d3c047f8409eebb9e
identifier_str_mv 35.232-2016.43
language_invalid_str_mv en_US
network_acronym_str aus
network_name_str aus
oai_identifier_str oai:repository.aus.edu:11073/8695
publishDate 2016
Abstract
Matrix-matrix multiplication is a component of many numerical algorithms; however, it is a time-consuming operation. When the matrices are very large, multiplying them on a single processor is not sufficiently fast. An efficient approach to matrix-matrix multiplication can improve the performance of the many applications that depend on it. The aim of this study is to improve the efficiency of matrix-matrix multiplication on a distributed network composed of heterogeneous nodes. Since load balancing between heterogeneous nodes is the biggest challenge, the performance model is derived using Divisible Load Theory (DLT). The proposed solution improves performance by: (a) reducing communication overhead, as DLT-derived load partitioning does not require synchronization between nodes during processing, and (b) making high use of resources, as both the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) take part in the computation. The experiments are conducted on a single node as well as on a cluster of nodes. The results show that the DLT equations balance the load between CPUs and GPUs. On a single node, the suggested hybrid approach outperforms the C Basic Linear Algebra Subroutines (cBLAS) and OpenMP Basic Linear Algebra Subroutines (openBLAS) approaches. The performance difference between the hybrid and GPU-only (CUDA Basic Linear Algebra Subroutines) approaches, on the other hand, is small, because the majority of the load in the hybrid approach is allocated to the GPU. On a cluster of nodes, the computation time is reduced to almost half of the GPU-only processing time; however, the overall improvement is impeded by communication overhead. Faster communication media are expected to reduce the overall time and further improve speedup.
College of Engineering
Department of Computer Science and Engineering
Master of Science in Computer Engineering (MSCoE)
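The abstract's central idea, that DLT-derived partitioning balances the load between a CPU and a GPU, rests on the usual divisible-load principle that all processors should finish at the same time. The sketch below illustrates only that principle and is not the performance model derived in the thesis: it ignores communication cost entirely, and the function names and per-row timings are hypothetical values chosen for illustration.

```python
from fractions import Fraction


def partition_rows(total_rows, cost_per_row):
    """Split result rows so every device finishes at about the same time.

    cost_per_row maps a device name to the time it needs for one row of
    the product (hypothetical integer microseconds, not thesis data).
    Communication time is deliberately ignored in this sketch.
    """
    # Equal finish time T means rows_i * cost_i == T for every device,
    # so each share is proportional to the device speed 1 / cost_i.
    speeds = {name: Fraction(1, cost) for name, cost in cost_per_row.items()}
    total_speed = sum(speeds.values())
    shares = {
        name: int(total_rows * speed / total_speed)
        for name, speed in speeds.items()
    }
    # Rows lost to rounding down go to the fastest device.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += total_rows - sum(shares.values())
    return shares


def finish_times(shares, cost_per_row):
    """Compute each device's processing time for its assigned rows."""
    return {name: rows * cost_per_row[name] for name, rows in shares.items()}


# Hypothetical timings: the GPU is assumed to be 8x faster per row.
costs = {"cpu": 8000, "gpu": 1000}
shares = partition_rows(9000, costs)
times = finish_times(shares, costs)
print(shares, times)
```

With these assumed timings the GPU receives eight of every nine rows, which is in line with the abstract's remark that most of the hybrid load ends up on the GPU. A model for a cluster would also have to account for the time needed to ship matrix blocks to each node.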