Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Abstract
In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the overheads of both distributing training jobs and loading and preprocessing training data by using message passing and CPU/GPU computation overlapping. The impact of the proposed optimizations is greater when the neural network model is averaged more frequently. To justify our efforts, we examine the influence of averaging frequency on the efficiency of the trained model. We plot learning curves based on the average log-probability per frame of correct paths for utterances in the validation set, as well as word error rates of test set decodings. Based on experiments with training on 2 workstations with 4 GPUs each, we show that for the given network architecture, dataset and computing environment there is a certain range of averaging frequencies that is optimal for model efficiency. For the selected averaging frequency of 600k frames per iteration, the proposed optimizations reduce the training time by 54.9%.
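The parameter averaging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' MPI-based implementation and does not use Kaldi; the function name `average_parameters` and the flat list-of-floats model representation are assumptions made only for illustration. Each worker is assumed to have trained its own copy of the model on a separate chunk of frames, after which the copies are replaced by their element-wise mean.

```python
from typing import List


def average_parameters(worker_params: List[List[float]]) -> List[float]:
    """Return the element-wise mean of the workers' parameter vectors.

    Hypothetical helper: each inner list stands for one worker's
    flattened model parameters after a training iteration.
    """
    if not worker_params:
        raise ValueError("at least one worker is required")
    size = len(worker_params[0])
    if any(len(params) != size for params in worker_params):
        raise ValueError("all workers must hold the same number of parameters")
    n_workers = len(worker_params)
    # Average position by position across all workers.
    return [
        sum(params[i] for params in worker_params) / n_workers
        for i in range(size)
    ]


if __name__ == "__main__":
    # 8 workers, matching the 2 workstations x 4 GPUs setup in the abstract.
    # The parameter values themselves are made up for the example.
    workers = [[float(w), float(w) * 2.0] for w in range(8)]
    print(average_parameters(workers))
```

In a distributed setting the same reduction would typically be done with a collective operation over the network rather than in a single process; the more often it runs, the more the surrounding job distribution and data loading costs matter, which is the overhead the paper targets.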
Citations
- CrossRef: 1
- Web of Science: 0
- Scopus: 1
Details
- Category:
- Conference activity
- Type:
- conference proceedings indexed in Web of Science
- Title of issue:
- 2017 International Conference on High Performance Computing & Simulation (HPCS), pages 560 - 565
- Language:
- English
- Publication year:
- 2017
- Bibliographic description:
- Rościszewski P., Kaliski J.: Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging, in: 2017 International Conference on High Performance Computing & Simulation (HPCS), 2017.
- DOI:
- 10.1109/hpcs.2017.89
- Verified by:
- Politechnika Gdańska (Gdańsk University of Technology)