Playback detection using machine learning with spectrogram features approach - Publication - Bridge of Knowledge


Abstract

This paper presents a 2D image-processing approach to playback detection in automatic speaker verification (ASV) systems, using spectrograms as the speech signal representation. Three feature extraction and classification methods were compared: histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier, and deep convolutional neural networks (CNN). They were evaluated on different data partitions with respect to speakers or playback devices, for instance with different speakers in the training and test subsets. The playback detection systems were trained and tested on two speech datasets, S1 and S2, prepared independently by two different institutions. The test error on both datasets is around 1% for HOG+SVM, and below that level for the CNN on the larger S1 dataset. In the cross-validation scenario, in which one dataset was used for training and the other for testing, the results were very poor, which suggests that the information relevant for playback detection is expressed differently in each dataset.
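The speaker-disjoint partition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `Recording` fields, the `split_by_group` helper, the file names, and the 30% test fraction are all assumptions made for illustration. The idea is only that whole groups (speakers here, playback devices in the same way) are assigned to one subset, so nothing the classifier saw during training reappears in the test data.

```python
import random
from collections import namedtuple

# Hypothetical record type; the field names are illustrative, not from the paper.
# `device` is the playback device id, or "none" for genuine (live) speech.
Recording = namedtuple("Recording", ["path", "speaker", "device", "is_playback"])


def split_by_group(recordings, key, test_fraction=0.3, seed=0):
    """Split recordings so that no group (e.g. speaker) appears in both subsets."""
    groups = sorted({getattr(r, key) for r in recordings})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one whole group for testing.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in recordings if getattr(r, key) not in test_groups]
    test = [r for r in recordings if getattr(r, key) in test_groups]
    return train, test


# Toy data: 4 speakers, each with one genuine recording and two playback recordings.
recordings = []
for s in range(4):
    recordings.append(Recording(f"spk{s}_live.wav", f"spk{s}", "none", False))
    for d in range(2):
        recordings.append(Recording(f"spk{s}_dev{d}.wav", f"spk{s}", f"dev{d}", True))

train, test = split_by_group(recordings, key="speaker")
train_speakers = {r.speaker for r in train}
test_speakers = {r.speaker for r in test}
```

In this toy setup one of the four speakers is held out, and `train_speakers` and `test_speakers` share no element. The cross-dataset scenario described in the abstract corresponds to the extreme case of this idea, where the grouping key is the dataset itself (train on S1, test on S2, or the reverse).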

Citations

  • CrossRef: 1
  • Web of Science: 0
  • Scopus: 1

Full text

Publication version
Accepted or Published Version
License
Copyright (2017 IEEE)

Details

Category:
Conference activity
Type:
conference proceedings indexed in Web of Science
Title of issue:
2017 10th International Conference on Human System Interactions (HSI), pages 31-35
Language:
English
Publication year:
2017
Bibliographic description:
Dembski J., Rumiński J.: Playback detection using machine learning with spectrogram features approach, In: 2017 10th International Conference on Human System Interactions (HSI), 2017, pp. 31-35.
DOI:
10.1109/hsi.2017.8004991
Verified by:
Gdańsk University of Technology
