Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Abstract
In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the overheads of both distributing training jobs and loading and preprocessing training data by using message passing and CPU/GPU computation overlapping. The impact of the proposed optimizations is greater for more frequent neural network model averaging. To justify our efforts, we examine the influence of averaging frequency on the efficiency of the trained model. We plot learning curves based on the average log-probability per frame of correct paths for utterances in the validation set, as well as word error rates of test set decodings. Based on experiments with training on 2 workstations with 4 GPUs each, we point out that for the given network architecture, dataset and computing environment there is a certain range of averaging frequencies that are optimal for the model efficiency. For the selected averaging frequency of 600k frames per iteration, the proposed optimizations reduce the training time by 54.9%.
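The parameter averaging scheme mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MPI code and not Kaldi's implementation: the names `average_parameters`, `run_iterations`, `local_update` and `shards` are hypothetical, models are reduced to flat lists of floats, and the workers run sequentially here, whereas in a real setup each would train in parallel on its own GPU before the models are averaged.

```python
from typing import Callable, List, Sequence

Params = List[float]


def average_parameters(worker_params: Sequence[Params]) -> Params:
    """Return the element-wise mean of the workers' parameter vectors."""
    if not worker_params:
        raise ValueError("need at least one worker")
    n = len(worker_params)
    return [sum(values) / n for values in zip(*worker_params)]


def run_iterations(
    params: Params,
    shards: Sequence[float],
    local_update: Callable[[Params, float], Params],
    num_iterations: int,
) -> Params:
    """Repeat: train a copy of the shared model per shard, then average."""
    for _ in range(num_iterations):
        # Every worker starts the iteration from the same shared model.
        local_models = [local_update(list(params), shard) for shard in shards]
        # Averaging step: the mean becomes the next shared model.
        params = average_parameters(local_models)
    return params


def toy_update(p: Params, shard: float) -> Params:
    """Toy stand-in for local training: shift every parameter by the shard value."""
    return [x + shard for x in p]


final = run_iterations([0.0], shards=[1.0, 3.0], local_update=toy_update, num_iterations=2)
```

In this sketch, the amount of data each worker processes inside `local_update` plays the role of the frames-per-iteration setting: the less data per iteration, the more often the models are averaged, and the larger the share of time spent on job distribution and data loading rather than training.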
Citations
- CrossRef: 1
- Web of Science: 0
- Scopus: 1
Details
- Category: Conference activity
- Type: conference proceedings indexed in Web of Science
- Title of issue: 2017 International Conference on High Performance Computing & Simulation (HPCS), pages 560-565
- Language: English
- Publication year: 2017
- Bibliographic description: Rościszewski P., Kaliski J.: Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging, In: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017, pp. 560-565.
- DOI: 10.1109/hpcs.2017.89
- Verified by: Gdańsk University of Technology