Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging

Paweł Rościszewski; Jakub Kaliski

doi:10.1109/hpcs.2017.89

Abstract

In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the overheads of both distributing training jobs and loading and preprocessing training data by using message passing and CPU/GPU computation overlapping. The impact of the proposed optimizations grows as the neural network models are averaged more frequently. To justify our efforts, we examine the influence of averaging frequency on the efficiency of the trained model. We plot learning curves based on the average log-probability per frame of correct paths for utterances in the validation set, as well as word error rates of test set decodings. Based on experiments with training on 2 workstations with 4 GPUs each, we show that for the given network architecture, dataset and computing environment there is a certain range of averaging frequencies that is optimal for model efficiency. For the selected averaging frequency of 600k frames per iteration, the proposed optimizations reduce the training time by 54.9%.
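As a rough illustration of the parameter averaging scheme the abstract refers to, the following minimal sketch simulates several workers that each train a copy of a shared model on their own data and then average the resulting parameters once per iteration. It is not Kaldi code and does not reproduce the authors' MPI implementation: the one-parameter linear model, the function names (`local_sgd`, `train_with_averaging`, `make_data`) and all constants are assumptions chosen only to keep the example self-contained. The `samples_per_iter` argument stands in for the averaging frequency; a smaller value means the models are averaged more often.

```python
# Illustrative sketch only: periodic parameter averaging across simulated workers.
# Not Kaldi code; the toy model (y = w * x), names and constants are assumptions.
import random


def local_sgd(w, shard, lr):
    """Run plain SGD over one worker's shard for the model y = w * x."""
    for x, y in shard:
        grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
        w -= lr * grad
    return w


def train_with_averaging(data, n_workers, samples_per_iter, n_iters, lr=0.01, seed=0):
    """Each iteration: every worker starts from the shared parameter, trains on
    its own samples_per_iter samples, and the local results are then averaged."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(n_iters):
        local_models = []
        for _ in range(n_workers):
            shard = rng.sample(data, samples_per_iter)
            local_models.append(local_sgd(w, shard, lr))
        w = sum(local_models) / n_workers  # parameter averaging step
    return w


def make_data(n=2000, true_w=3.0, seed=1):
    """Generate noisy samples of y = true_w * x (hypothetical toy dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0.0, 0.1)) for x in xs]


data = make_data()
# 8 simulated workers, loosely mirroring 2 workstations with 4 GPUs each.
w_hat = train_with_averaging(data, n_workers=8, samples_per_iter=50, n_iters=40)
```

In a real distributed run the workers execute concurrently, so the fixed cost of launching jobs and loading data is paid at every averaging step; this is why such overheads matter more when averaging is frequent.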

Citations

CrossRef: 1
Web of Science: 0
Scopus: 1

Details

Category: Conference activity
Type: conference proceedings indexed in Web of Science
Published in: 2017 International Conference on High Performance Computing & Simulation (HPCS), pages 560-565
Language: English
Publication year: 2017
Bibliographic description: Rościszewski P., Kaliski J.: Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging. In: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017.
DOI: 10.1109/hpcs.2017.89
Verified by: Politechnika Gdańska