Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition - Publication - Bridge of Knowledge

Abstract

The study uses a convolutional neural network (CNN), a class of deep, feed-forward artificial neural networks. We analyzed audio signal feature maps, namely spectrograms, linear and mel-scale cepstrograms, and chromagrams. This choice was based on the fact that CNNs perform well on 2D data. The feature maps were employed in a Lithuanian word recognition task. Spectral analysis led to the highest word recognition rate, and the spectral and mel-scale cepstral feature spaces outperformed linear cepstra and chroma. In the 111-word classification experiment, the F1 score on the test data set was 0.99 for the spectrum, 0.91 for the mel-scale cepstrum, 0.76 for the chromagram, and 0.64 for the cepstrum feature space.
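As a rough illustration of two of the feature maps named in the abstract, the sketch below computes a log-magnitude spectrogram and a linear-frequency cepstrogram with NumPy only. It is not the authors' pipeline: the frame length, hop size, Hann window, number of cepstral coefficients, and the synthetic 440 Hz tone standing in for a recorded word are all assumptions made for this example. Mel-scale cepstrograms and chromagrams additionally need a mel or pitch-class filter bank, which is omitted here.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=128):
    """Slice a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectrogram(x, frame_len=512, hop=128):
    """Log-magnitude spectrogram, shape (n_frames, frame_len // 2 + 1)."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)  # small offset avoids log(0)

def cepstrogram(x, frame_len=512, hop=128, n_coeffs=40):
    """Real cepstrum per frame: inverse FFT of the log-magnitude spectrum."""
    log_mag = spectrogram(x, frame_len, hop)
    ceps = np.fft.irfft(log_mag, n=frame_len, axis=1)
    return ceps[:, :n_coeffs]  # keep only the low-quefrency coefficients

# Assumed input: a synthetic 1-second tone at 16 kHz, not real speech.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(signal)  # 2D map: time frames x frequency bins
ceps = cepstrogram(signal)  # 2D map: time frames x cepstral coefficients
print(spec.shape, ceps.shape)
```

Each result is a 2D array (time frames by coefficients), which is the kind of image-like input a CNN classifier can consume directly.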

Citations

  • CrossRef: 26
  • Web of Science: 0
  • Scopus: 36

Details

Category:
Articles
Type:
article in a journal listed in JCR
Published in:
JOURNAL OF THE AUDIO ENGINEERING SOCIETY vol. 66, no. 12, pages 1072-1081
ISSN: 1549-4950
Language:
English
Publication year:
2018
Bibliographic description:
Korvel G., Treigys P., Tamulevicus G., Bernataviciene J., Kostek B.: Analysis of 2D Feature Spaces for Deep Learning-based Speech Recognition // JOURNAL OF THE AUDIO ENGINEERING SOCIETY. Vol. 66, no. 12 (2018), pp. 1072-1081
DOI:
10.17743/jaes.2018.0066
Verified by:
Gdańsk University of Technology
