Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

Paweł Rościszewski; Jakub Kaliski

doi:10.1109/hpcs.2017.89

Abstract

In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the overheads of both distributing training jobs and loading and preprocessing training data by using message passing and CPU/GPU computation overlapping. The impact of the proposed optimizations is greater for more frequent neural network model averaging. To justify our efforts, we examine the influence of averaging frequency on the trained model efficiency. We plot learning curves based on the average log-probability per frame of correct paths for utterances in the validation set, as well as word error rates of test set decodings. Based on experiments with training on 2 workstations with 4 GPUs each, we show that for the given network architecture, dataset and computing environment there is a certain range of averaging frequencies that are optimal for the model efficiency. For the selected averaging frequency of 600k frames per iteration, the proposed optimizations reduce the training time by 54.9%.
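
The parameter-averaging scheme named in the abstract can be illustrated with a minimal sketch. This is not the authors' Kaldi/MPI implementation: it simulates the workers sequentially in plain Python, replaces the LSTM acoustic model with a toy two-parameter linear model, and all names (sgd_worker, average, train, frames_per_iteration) are hypothetical. The only details taken from the abstract are the 8 workers (2 workstations with 4 GPUs each) and the idea that the number of frames per iteration controls how often models are averaged.

```python
import random

def sgd_worker(params, frames, lr=0.1):
    """Plain SGD on one worker's shard for the toy model y = w*x + b."""
    w = list(params)
    for x, y in frames:
        err = w[0] * x + w[1] - y
        w[0] -= lr * err * x
        w[1] -= lr * err
    return w

def average(models):
    """Element-wise mean of the workers' parameter vectors."""
    n = len(models)
    return [sum(m[i] for m in models) / n for i in range(len(models[0]))]

def train(data, n_workers, frames_per_iteration, params):
    """Per iteration: shard a block of frames, train each worker
    independently from the same starting parameters, then average."""
    for start in range(0, len(data), frames_per_iteration):
        block = data[start:start + frames_per_iteration]
        shards = [block[k::n_workers] for k in range(n_workers)]
        models = [sgd_worker(params, shard) for shard in shards if shard]
        params = average(models)
    return params

# Synthetic, noise-free "frames" drawn from y = 2x + 1 (an assumption for the demo).
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(4000))]

# 8 simulated workers; a smaller frames_per_iteration means more frequent averaging.
final = train(data, n_workers=8, frames_per_iteration=400, params=[0.0, 0.0])
print(final)
```

Lowering frames_per_iteration increases the number of averaging rounds for the same amount of data, which is why per-iteration overheads such as job distribution and data loading weigh more heavily when averaging is frequent.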

Citations

CrossRef: 1
Web of Science: 0
Scopus: 1

Details

Category:: Conference activity
Type:: conference proceedings indexed in Web of Science
Title of issue:: 2017 International Conference on High Performance Computing & Simulation (HPCS), pages 560-565
Language:: English
Publication year:: 2017
Bibliographic description:: Rościszewski P., Kaliski J.: Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging, In: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017, pp. 560-565.
DOI:: 10.1109/hpcs.2017.89
Verified by:: Gdańsk University of Technology