SYNAT_PCA_48

Description

This dataset belongs to a series of datasets containing feature vectors derived from music tracks. It contains 51,582 music tracks representing 22 music genres, each described by a 48-element feature vector obtained after Principal Component Analysis (PCA). The original 173-element feature vector was developed in earlier research studies carried out by the team of authors [1-6]. A collection of more than 50,000 music excerpts, described with descriptors obtained by analyzing 30-second MP3 recordings, was gathered in a database called SYNAT, created at the Gdansk University of Technology (GUT) [1],[2]. Because of the format of the music excerpts, the analysis band of the recordings in the database is limited to 8 kHz, so the frequency range used for parameterization extends from 63 Hz to 8000 Hz. The feature vector describes each signal frame parametrically. Most of the 173 features stored in the original database are MPEG-7 standard parameters [7]; they are supplemented with so-called 'dedicated' features described in several publications.

The 173-feature dataset is also available in the Most Wiedzy repository.

A 173-element vector carries a very large amount of information describing a given track, which leads to an extensive amount of data to be classified. Therefore, Principal Component Analysis (PCA) was applied to reduce data redundancy: it transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The new components are linear combinations of the original parameters that carry most of the information (variance) in the analyzed data; thus, they no longer correspond to individual descriptors of the original feature vector. PCA can shorten the 173-element feature vector to 19 components, which significantly reduces computation time. Furthermore, this analysis can increase classification effectiveness, as shown in an earlier paper [9].
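The reduction described above can be sketched as follows, assuming the 173-element vectors are stacked row-wise in a NumPy array `X` (one row per track). The sketch standardizes each feature, obtains the principal axes from a singular value decomposition, and projects the data onto the first `k` components; `k = 48` matches the vector length of this dataset. The function name `pca_reduce` and the standardization step are assumptions made for illustration, not the authors' published pipeline.

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int = 48):
    """Project rows of X (n_tracks x n_features) onto the first k principal components.

    Returns the reduced data (n_tracks x k) and the fraction of total variance kept.
    """
    # Standardize each feature so that parameters with large numeric ranges
    # (e.g. sample counts vs. spectral flatness) do not dominate (assumption).
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std

    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)

    # Each new component is a linear combination of all original features,
    # so it no longer corresponds to a single named descriptor.
    scores = Z @ Vt[:k].T

    explained = S**2 / np.sum(S**2)
    return scores, float(explained[:k].sum())


# Usage on synthetic data standing in for the 173-element SYNAT vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 173))
reduced, kept = pca_reduce(X, k=48)
print(reduced.shape, round(kept, 3))
```

Because the components come from an orthogonal decomposition of centered data, the reduced columns are mutually uncorrelated, which is what removes the redundancy of the original parameters.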

Besides the MPEG-7 parameters, the original 173-element vector includes 20 Mel-Frequency Cepstral Coefficients (MFCC), 20 MFCC variances and 24 time-related 'dedicated' parameters that refer to temporal characteristics of the analyzed music excerpt; the names of all parameters are listed in Table 1. The parameters and their definitions are given in an earlier study [10]. It is worth noting that this feature vector was used in the ISMIS 2011 contest, which had over 120 participants [4]. The best contest result reached almost 88% accuracy [4], and the authors later obtained even higher effectiveness in their own study [8].
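The composition of the 173-element vector in Table 1 can be encoded as index ranges, which makes it easy to select, for example, only the MFCC block or the 24 dedicated time-domain parameters from a raw vector. The group labels and the helpers `feature_slice` and `group_size` are hypothetical names chosen for illustration; the 1-based ranges are copied from Table 1.

```python
# 1-based, inclusive index ranges taken from Table 1 (SYNAT 173-element vector).
# Group names are illustrative labels, not identifiers used by the dataset.
FEATURE_GROUPS = {
    "temporal_centroid": (1, 1),
    "spectral_centroid": (2, 3),          # mean and variance
    "ase_bands": (4, 32),
    "ase_mean": (33, 33),
    "ase_var_bands": (34, 62),
    "ase_var_mean": (63, 63),
    "audio_spectrum_centroid": (64, 65),  # mean and variance
    "audio_spectrum_spread": (66, 67),    # mean and variance
    "sfm_bands": (68, 87),
    "sfm_mean": (88, 88),
    "sfm_var_bands": (89, 108),
    "sfm_var_mean": (109, 109),
    "mfcc": (110, 129),
    "mfcc_var": (130, 149),
    "dedicated": (150, 173),              # 24 time-related parameters
}

def feature_slice(group: str) -> slice:
    """Return a 0-based Python slice for a Table 1 group."""
    first, last = FEATURE_GROUPS[group]
    return slice(first - 1, last)

def group_size(group: str) -> int:
    first, last = FEATURE_GROUPS[group]
    return last - first + 1

# Sanity checks against the counts stated in the text.
assert sum(group_size(g) for g in FEATURE_GROUPS) == 173
assert group_size("mfcc") == 20 and group_size("mfcc_var") == 20
assert group_size("dedicated") == 24

vector = list(range(1, 174))  # stand-in for one 173-element vector
print(vector[feature_slice("mfcc")][:3])  # [110, 111, 112]
```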

Table 1. List of parameters in the SYNAT music database [10].

No.	Parameter
1	Temporal Centroid
2	Spectral Centroid
3	Spectral Centroid variance
4-32	Audio Spectrum Envelope for particular bands
33	ASE average for all bands
34-62	ASE variance values for particular bands
63	averaged ASE variance
64	average Audio Spectrum Centroid
65	variance of Audio Spectrum Centroid
66	average Audio Spectrum Spread
67	variance of Audio Spectrum Spread
68-87	Spectral Flatness Measure for particular bands
88	SFM average value
89-108	Spectral Flatness Measure variance for particular bands
109	averaged SFM variance
110-129	Mel-Frequency Cepstral Coefficients for particular bands
130-149	MFCC variance for particular bands
150	number of samples exceeding RMS
151	number of samples exceeding 2×RMS
152	number of samples exceeding 3×RMS
153	mean value of samples exceeding RMS, averaged for 10 frames
154	variance value of samples exceeding RMS, averaged for 10 frames
155	mean value of samples exceeding 2×RMS, averaged for 10 frames
156	variance value of samples exceeding 2×RMS, averaged for 10 frames
157	mean value of samples exceeding 3×RMS, averaged for 10 frames
158	variance value of samples exceeding 3×RMS, averaged for 10 frames
159	peak to RMS ratio
160	mean value of the peak to RMS ratio calculated in 10 subframes
161	variance of the peak to RMS ratio calculated in 10 subframes
162	Zero Crossing Rate
163	RMS Threshold Crossing Rate
164	2×RMS Threshold Crossing Rate
165	3×RMS Threshold Crossing Rate
166	Zero Crossing Rate averaged for 10 frames
167	Zero Crossing Rate variance for 10 frames
168	RMS Threshold Crossing Rate averaged for 10 frames
169	RMS Threshold Crossing Rate variance for 10 frames
170	2×RMS Threshold Crossing Rate averaged for 10 frames
171	2×RMS Threshold Crossing Rate variance for 10 frames
172	3×RMS Threshold Crossing Rate averaged for 10 frames
173	3×RMS Threshold Crossing Rate variance for 10 frames
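The following sketch computes a few of the dedicated time-domain parameters from Table 1 (Nos. 150-152, 159 and 162-167) for a mono signal held in a NumPy array. The exact definitions are given in [10] and are not reproduced here, so the sketch rests on stated assumptions: "exceeding k×RMS" compares sample magnitudes with k times the RMS of the whole excerpt, a threshold crossing rate is the fraction of adjacent sample pairs lying on opposite sides of the level +k×RMS, and the 10-frame statistics use 10 equal-length frames. All function and key names are hypothetical.

```python
import numpy as np

def crossing_rate(x: np.ndarray, level: float) -> float:
    """Fraction of adjacent sample pairs lying on opposite sides of `level`."""
    above = x > level
    return float(np.count_nonzero(above[1:] != above[:-1]) / (len(x) - 1))

def framewise_mean_var(x: np.ndarray, func, n_frames: int = 10) -> tuple:
    """Mean and variance of func(frame) over n_frames equal-length frames."""
    values = np.array([func(frame) for frame in np.array_split(x, n_frames)])
    return float(values.mean()), float(values.var())

def dedicated_params(x: np.ndarray) -> dict:
    """A subset of the Table 1 'dedicated' parameters for one non-silent excerpt."""
    rms = float(np.sqrt(np.mean(x ** 2)))
    mag = np.abs(x)
    p = {}
    for k in (1, 2, 3):
        # Nos. 150-152: number of samples whose magnitude exceeds k x RMS
        p[f"n_exceed_{k}rms"] = int(np.count_nonzero(mag > k * rms))
        # Nos. 163-165: rate of crossing the +k x RMS level (assumed definition)
        p[f"tcr_{k}rms"] = crossing_rate(x, k * rms)
    # No. 159: peak to RMS ratio
    p["peak_to_rms"] = float(mag.max() / rms)
    # No. 162: zero crossing rate
    p["zcr"] = crossing_rate(x, 0.0)
    # Nos. 166-167: ZCR averaged over 10 frames, and its variance across them
    p["zcr_mean_10"], p["zcr_var_10"] = framewise_mean_var(
        x, lambda f: crossing_rate(f, 0.0))
    return p

# Usage on a synthetic 440 Hz sine; the 22050 Hz sampling rate is an
# assumption for this demo only.
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
params = dedicated_params(x)
print({k: round(v, 4) for k, v in params.items()})
```

For a pure sine the RMS equals the peak divided by √2, so no sample exceeds 2×RMS and the peak-to-RMS ratio is close to √2, which is a quick way to check such an implementation.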

[1] Kostek B., Music Information Retrieval in Music Repositories, Rough Sets and Intelligent Systems (A. Skowron, Z. Suraj, eds.), 464-489, Springer Verlag, Berlin, Heidelberg 2013. https://doi.org/10.1007/978-3-642-30344-9_17

[2] Kostek B., Hoffmann P., Kaczmarek A., Spaleniak P., Creating a Reliable Music Discovery and Recommendation System, Springer Verlag, 107-130, XIII, 2013. https://doi.org/10.1007/978-3-319-04714-0_7

[3] Hoffmann P., Kostek B., Music Data Processing and Mining in Large Databases for Active Media, The 2014 16 International Conference on Active Media Technology, Warsaw, 85-85, Springer 2014. https://doi.org/10.1007/978-3-319-09912-5_8

[4] Kostek B., Kupryjanow A., Zwan P., Jiang W., Ras Z., Wojnarski M., Swietlicka J., Report of the ISMIS 2011 Contest: Music Information Retrieval, Foundations of Intelligent Systems, ISMIS 2011, Springer Verlag, 715-724, Berlin, Heidelberg 2011. https://doi.org/10.1007/978-3-642-21916-0_75

[5] Rosner A., Schuller B., Kostek B., Classification of Music Genres Based on Music Separation into Harmonic and Drum Components, Archives of Acoustics, 629-638, 2014. https://doi.org/10.2478/aoa-2014-0068

[6] Kostek B., Kaczmarek A., Music Recommendation Based on Multidimensional Description and Similarity Measures, Fundamenta Informaticae, 127(1-4), 325-340, 2013. https://doi.org/10.3233/FI-2013-912

[7] MPEG-7 standard, http://mpeg.chiariglione.org/standards/mpeg-7

[8] Hoffmann P., Kostek B., Kaczmarek A., Spaleniak P., Music Recommendation System, Journal of Telecommunication and Information Technology, 59-69, Warsaw 2013.

[9] Hoffmann P., Kostek B., Bass Enhancement Settings in Portable Devices Based on Music Genre Recognition, Journal of the Audio Engineering Society, 63(12), 980-989, December 2015. https://doi.org/10.17743/jaes.2015.0087

[10] Rosner A., Kostek B. Automatic music genre classification based on musical instrument track separation. J Intell Inf Syst 50, 363–384 (2018). https://doi.org/10.1007/s10844-017-0464-5

[11] Plewa M., Kostek B., Music Mood Visualization Using Self-Organizing Maps, Archives of Acoustics, 40(4), 513-525, 2015. https://doi.org/10.1515/aoa-2015-0051

Research data file

SYNAT_PCA_48.zip

15.2 MB, S3 ETag 05d30c1ef80874f7f0608b2b7437e2f8-1

The file hash is computed with the formula
hexmd5(md5(part1)+md5(part2)+...)-{parts_count}, where a single part of the file is 512 MB in size.

Example script for the calculation:
https://github.com/antespi/s3md5
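A minimal Python sketch of this formula is given below as an alternative to the linked script (it is not that repository's implementation). It assumes that md5(partN) denotes the raw 16-byte MD5 digest of each part, as in Amazon S3 multipart ETags, and that 512 MB means 512 × 1024 × 1024 bytes; the function names `multipart_etag` and `multipart_etag_file` are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

PART_SIZE = 512 * 1024 * 1024  # 512 MB per part (assumed to mean 512 MiB)

def multipart_etag(stream: BinaryIO, part_size: int = PART_SIZE,
                   block_size: int = 8 * 1024 * 1024) -> str:
    """Return hexmd5(md5(part1) + md5(part2) + ...)-{parts_count} for a binary stream."""
    part_digests = []
    while True:
        part_md5 = hashlib.md5()
        remaining = part_size
        got_data = False
        # Hash one part incrementally so a 512 MB part is never held in memory at once.
        while remaining > 0:
            block = stream.read(min(block_size, remaining))
            if not block:
                break
            part_md5.update(block)
            remaining -= len(block)
            got_data = True
        if not got_data:
            break
        part_digests.append(part_md5.digest())  # raw 16-byte digest, not hex
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"

def multipart_etag_file(path: str) -> str:
    with open(path, "rb") as f:
        return multipart_etag(f)

# Small in-memory demo with 10-byte parts: 25 bytes are split into 3 parts.
demo = io.BytesIO(b"x" * 25)
print(multipart_etag(demo, part_size=10, block_size=4))
```

Since the 15.2 MB archive fits in a single 512 MB part, its hash takes the form md5(md5(file))-1, consistent with the "-1" suffix of the ETag listed for the file; calling `multipart_etag_file("SYNAT_PCA_48.zip")` on an intact download should reproduce that value.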

File details

License:

CC BY-NC

Non-commercial use

Details

Publication year:

2021

Approval date:

2021-06-22

Research data language:

English

Disciplines:

Information and communication technology (Domain of engineering and technical sciences)

DOI:

10.34808/9fhc-eq62

Verification:

Politechnika Gdańska (Gdańsk University of Technology)

Authors

Bożena Kostek prof. dr hab. inż.
Audio Acoustics Laboratory
ORCID: 0000-0001-6288-2908
Creator
Piotr Hoffmann dr inż.

Team member
Piotr Odya dr inż.
Department of Multimedia Systems
ORCID: 0000-0003-0288-6178
Related person
