Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing - Publication - MOST Wiedzy



Abstract

Developing signal processing methods that extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, building context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. The last-mentioned application, however, still requires a significant amount of research before real musical instruments can be recognized reliably in recordings. In this paper we focus primarily on how to obtain data for efficiently training, validating, and testing a deep-learning model by means of a data augmentation technique. These data are transformed into 2D feature spaces, i.e., mel-scale spectrograms. The neural network used in the experiments consists of a single-block DenseNet architecture and a multi-head softmax classifier, enabling efficient learning with mixup augmentation. For automatic labeling of noisy data, batch-wise loss masking, which is robust to corrupting outliers in the data, was applied. Various audio sample rates and different audio representations were used to train the models. The method yields promising recognition scores even on real-world recordings that contain noisy data.
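The two noise-handling ingredients named in the abstract can be illustrated briefly. The following is a minimal Python sketch of the general techniques (mixup blending of examples and batch-wise loss masking), not the authors' implementation; the function names, the Beta parameter, and the drop fraction are illustrative assumptions:

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup augmentation: blend two training examples and their one-hot
    labels with a coefficient lambda drawn from Beta(alpha, alpha)."""
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

def masked_mean_loss(losses, drop_frac=0.25):
    """Batch-wise loss masking: discard the largest per-example losses in a
    batch, on the assumption that they stem from noisy (mislabeled)
    examples, and average only the remaining ones."""
    n_drop = int(len(losses) * drop_frac)
    kept = sorted(losses)[:len(losses) - n_drop]
    return sum(kept) / len(kept)

# Two spectrogram-like feature vectors with one-hot instrument labels;
# the mixed label stays a valid probability distribution (sums to 1):
x, y, lam = mixup([1.0, 0.0, 0.5], [1.0, 0.0], [0.2, 0.9, 0.1], [0.0, 1.0])

# A batch with one outlier loss; masking removes it before averaging:
robust_loss = masked_mean_loss([0.9, 1.1, 1.0, 12.0], drop_frac=0.25)
```

In a full training loop the mixed examples would feed the DenseNet-based classifier, and the masked mean would replace the plain batch-average loss so that a few grossly mislabeled examples cannot dominate the gradient.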

Citations

  • CrossRef: 3

  • Web of Science: 2

  • Scopus: 4


Full text

downloaded 71 times
Publication version
Accepted or Published Version
License
Copyright (2020 Audio Eng. Society)

Keywords

Detailed information

Category:
Journal publication
Type:
Journal article
Published in:
JOURNAL OF THE AUDIO ENGINEERING SOCIETY vol. 68, pages 57-65,
ISSN: 1549-4950
Language:
English
Year of publication:
2020
Bibliographic description:
Koszewski D., Kostek B.: Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing // Journal of the Audio Engineering Society, vol. 68, iss. 1/2 (2020), pp. 57-65
DOI:
10.17743/jaes.2019.0050
Funding sources:
  • Statutory activity
Verification:
Gdańsk University of Technology

viewed 24 times

