Lookup NU author(s): Professor Raj Ranjan
Full text for this publication is not currently held within this repository. Alternative links are provided below where available.
A simple way to accelerate matrix factorization of big data is to parallelize it. A commonly used method is to divide the matrix into multiple non-intersecting blocks and process them concurrently. This partitioning raises a load balance problem, which significantly impacts parallel performance. A general belief is that load balance across blocks cannot be achieved by balancing rows and columns separately. We challenge this belief by proposing an approach called “Balanced Partitioning (BaPa)”. We demonstrate under what circumstances independently balancing rows and columns can lead to balanced intersections of rows and columns, why, and how. We formally prove the feasibility of BaPa by observing the variance of rating numbers across blocks, and empirically validate its soundness by applying it to two standard parallel matrix factorization algorithms, DSGD and CCD++. In addition, we establish a mathematical model of “Imbalance Degree” to further explain why BaPa works well. BaPa is applied here to synchronous parallel matrix factorization, but as a general load balance solution, it has significant application potential.
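The load balance problem described in the abstract can be illustrated with a small sketch. It is not the BaPa algorithm from the paper, whose details are not given in this record. It assumes synthetic ratings in which the row and column of each rating are drawn independently from a skewed distribution, and it uses a generic greedy assignment to balance rows and columns separately. All names (`greedy_balance`, `block_counts`, `max_over_mean`, `demo`) are hypothetical, and the max-to-mean ratio is a stand-in metric, not the paper's “Imbalance Degree” model.

```python
import random
from collections import Counter


def greedy_balance(counts, p):
    """Assign each index to one of p groups, heaviest first, always
    picking the currently lightest group (a generic greedy heuristic)."""
    loads = [0] * p
    assignment = {}
    for idx, c in sorted(counts.items(), key=lambda kv: -kv[1]):
        g = min(range(p), key=loads.__getitem__)
        assignment[idx] = g
        loads[g] += c
    return assignment


def block_counts(ratings, row_group, col_group, p):
    """Count the ratings that fall into each of the p x p blocks."""
    blocks = [[0] * p for _ in range(p)]
    for r, c in ratings:
        blocks[row_group[r]][col_group[c]] += 1
    return blocks


def max_over_mean(blocks):
    """Ratio of the heaviest block to the average block (1.0 = perfect)."""
    flat = [v for row in blocks for v in row]
    return max(flat) / (sum(flat) / len(flat))


def demo(seed=0, n=200, p=4, n_ratings=20000):
    rng = random.Random(seed)
    # Assumption: skewed popularity, rows and columns sampled independently.
    weights = [1.0 / (i + 1) for i in range(n)]
    rows = rng.choices(range(n), weights=weights, k=n_ratings)
    cols = rng.choices(range(n), weights=weights, k=n_ratings)
    ratings = list(zip(rows, cols))

    # Naive partition: contiguous, equal-sized index ranges.
    contiguous = {i: i * p // n for i in range(n)}
    naive = max_over_mean(block_counts(ratings, contiguous, contiguous, p))

    # Balance rating counts over rows and over columns separately.
    row_counter, col_counter = Counter(rows), Counter(cols)
    row_group = greedy_balance({i: row_counter[i] for i in range(n)}, p)
    col_group = greedy_balance({i: col_counter[i] for i in range(n)}, p)
    balanced = max_over_mean(block_counts(ratings, row_group, col_group, p))
    return naive, balanced


if __name__ == "__main__":
    naive, balanced = demo()
    print(f"max/mean block load, contiguous partition: {naive:.2f}")
    print(f"max/mean block load, rows and columns balanced separately: {balanced:.2f}")
```

Under the independence assumption, balancing rows and columns separately brings the heaviest block much closer to the average than the contiguous partition does. When rows and columns are correlated, this need not hold, which is the kind of condition the abstract says the paper examines.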
Author(s): Guo R, Zhang F, Wang L, Zhang W, Lei X, Ranjan R, Zomaya A
Publication type: Article
Publication status: Published
Journal: IEEE Transactions on Computers
Year: 2021
Volume: 70
Issue: 5
Pages: 789-802
Print publication date: 01/05/2021
Online publication date: 25/05/2020
Acceptance date: 10/05/2020
ISSN (print): 0018-9340
ISSN (electronic): 1557-9956
Publisher: IEEE
URL: https://doi.org/10.1109/TC.2020.2997051
DOI: 10.1109/TC.2020.2997051