OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning
Parallel Computing, ISSN 0167-8191, Vol. 118, Article 103053, 2023
- 3 Citations
- 5 Captures
Article Description
The communication bottleneck has severely restricted the scalability of distributed deep learning. Tensor fusion improves the scalability of data parallelism by overlapping computation and communication tasks. However, existing tensor fusion schemes yield only suboptimal training performance. In this paper, we propose an efficient communication mechanism (OF-WFBP) to find the optimal tensor fusion scheme for synchronous data parallelism. We formulate the mathematical model of OF-WFBP and prove that finding the optimal fusion scheme is NP-hard. We solve the model analytically in two special cases and propose an improved sparrow search algorithm (GradSSA) to find a near-optimal tensor fusion scheme efficiently in the remaining cases. Experimental results on two different GPU clusters show that OF-WFBP achieves up to 1.43x speedup compared to state-of-the-art tensor fusion mechanisms.
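The trade-off the paper targets can be illustrated with a small cost-model sketch. The code below is a minimal illustration, not the authors' implementation: it assumes hypothetical per-layer backward-pass times and gradient sizes, an alpha-beta (latency + bandwidth) all-reduce cost model, and a WFBP-style schedule in which each fused bucket's all-reduce starts once its last layer's gradient is ready and all-reduces are serialized on the link. A plain random search over partition points stands in for GradSSA, which is not reproduced here.

```python
import random

# --- Illustrative cost model (hypothetical numbers, not from the paper) ---
# Per-layer backward compute times (ms), listed in backward order (last layer first).
backprop_ms = [4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5]
# Per-layer gradient sizes (MB), same order.
grad_mb = [20.0, 16.0, 12.0, 8.0, 6.0, 4.0, 2.0, 1.0]

ALPHA_MS = 0.8          # assumed per-all-reduce startup latency (ms)
BETA_MS_PER_MB = 0.05   # assumed transfer cost per MB (ms)

def allreduce_ms(size_mb):
    """Alpha-beta model: fixed startup latency plus a bandwidth term."""
    return ALPHA_MS + BETA_MS_PER_MB * size_mb

def iteration_ms(partition):
    """Estimate one iteration's time for a given fusion scheme.

    `partition` lists bucket sizes (numbers of consecutive layers, in
    backward order) that sum to the layer count.  A bucket's all-reduce
    starts only after its last layer's gradient is produced, and
    all-reduces are serialized on the communication link, so backward
    compute and communication overlap as in WFBP.
    """
    t_compute = 0.0   # time at which the current layer's gradient is ready
    t_comm = 0.0      # time at which the communication link becomes free
    layer = 0
    for bucket in partition:
        for _ in range(bucket):
            t_compute += backprop_ms[layer]
            layer += 1
        size = sum(grad_mb[layer - bucket:layer])
        start = max(t_compute, t_comm)
        t_comm = start + allreduce_ms(size)
    return t_comm  # the iteration ends when the last all-reduce finishes

def random_partition(n_layers):
    """Pick random bucket boundaries over n_layers consecutive layers."""
    cuts = sorted(random.sample(range(1, n_layers),
                                random.randint(0, n_layers - 1)))
    edges = [0] + cuts + [n_layers]
    return [b - a for a, b in zip(edges, edges[1:])]

def search_fusion_scheme(trials=2000, seed=0):
    """Random search over fusion schemes (a simple stand-in for GradSSA)."""
    random.seed(seed)
    n = len(backprop_ms)
    best = ([n], iteration_ms([n]))  # start from "fuse everything"
    for _ in range(trials):
        p = random_partition(n)
        t = iteration_ms(p)
        if t < best[1]:
            best = (p, t)
    return best

if __name__ == "__main__":
    no_fusion = iteration_ms([1] * len(backprop_ms))
    full_fusion = iteration_ms([len(backprop_ms)])
    scheme, t = search_fusion_scheme()
    print(f"no fusion      : {no_fusion:.2f} ms/iter")
    print(f"full fusion    : {full_fusion:.2f} ms/iter")
    print(f"searched scheme: {scheme} -> {t:.2f} ms/iter")
```

In this toy model, fusing nothing pays the startup latency once per layer, while fusing everything delays all communication until backpropagation finishes; an intermediate partition usually lands in between. Choosing that partition well is exactly the optimization the abstract describes, with OF-WFBP's model characterizing it and GradSSA searching it efficiently.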
Bibliographic Details
- DOI: https://dx.doi.org/10.1016/j.parco.2023.103053
- ScienceDirect: http://www.sciencedirect.com/science/article/pii/S0167819123000595
- Scopus: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85177976455&origin=inward
- Publisher: Elsevier BV