Search results for: AUDIO PROCESSING

total: 225

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING

Journals

ISSN: 1063-6676
Wow detection and compensation employing spectral processing of audio.
Publication
- Year 2004
The paper describes algorithms developed for detecting and compensating parasitic frequency modulations caused by uneven movement of the sound carrier. The proposed methods were developed with particular attention to the random wow distortions present in archival film soundtracks. In addition, the algorithms examine the influence of the distortions on the formant structure of the signals. Analysis of changes in formant positions...
IEEE Transactions on Audio Speech and Language Processing

Journals

ISSN: 1558-7916
Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing
Publication
- M. Niedźwiecki
- M. Ciołek
- IEEE Transactions on Audio Speech and Language Processing - Year 2013
In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing—noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...

Full text available to download
RENOVATION OF ARCHIVE AUDIO RECORDINGS USING SPARSE AUTOREGRESSIVE MODELING AND BIDIRECTIONAL PROCESSING
Publication
- M. Niedźwiecki
- M. Ciołek
- Year 2013
The paper presents a new approach to elimination of broadband noise and impulsive disturbances from archive audio recordings. The proposed adaptive Kalman-like algorithm, based on a sparse autoregressive model of the audio signal, simultaneously detects noise pulses, interpolates the irrevocably distorted samples and performs signal smoothing. It is shown that bidirectional (forward-backward) processing of the archive signal improves...

Full text to download in external service
Intelligent Audio Signal Processing − Do We Still Need Annotated Datasets?
Publication
- B. Kostek
- Year 2022
In this paper, examples of intelligent audio signal processing are briefly described. The focus is, however, on the machine learning approach and the datasets needed, especially for deep learning models. Years of intense research produced many important results in this area; however, the goal of fully intelligent signal processing, characterized by its autonomous acting, is not yet achieved. Therefore, a review of state-of-the-art concerning...

Full text available to download
IEEE-ACM Transactions on Audio Speech and Language Processing

Journals

ISSN: 2329-9290
Adaptive system for recognition of sounds indicating threats to security of people and property employing parallel processing of audio data streams
Publication
- K. Łopatka
- Year 2015
A system for recognition of threatening acoustic events employing parallel processing on a supercomputing cluster is featured. The methods for detection, parameterization and classification of acoustic events are introduced. The recognition engine is based on threshold-based detection with adaptive threshold and Support Vector Machine classification. Spectral, temporal and mel-frequency descriptors are used as signal features. The...
EURASIP Journal on Audio Speech and Music Processing

Journals

ISSN: 1687-4714 , eISSN: 1687-4722
International Symposium on Audio, Video, Image Processing and Intelligent Applications

Conferences
Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering
Publication
- IEEE Transactions on Audio Speech and Language Processing - Year 2015
This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponentially weighted least squares algorithm. Detection of noise pulses and model-based interpolation of the irrevocably distorted samples is realized using an adaptive, variable-order...

Full text available to download
Automatic music signal mixing system based on one-dimensional Wave-U-Net autoencoders
Publication
- D. Koszewski
- T. Görne
- G. Korvel
- B. Kostek
- EURASIP Journal on Audio Speech and Music Processing - Year 2023
The purpose of this paper is to show a music mixing system that is capable of automatically mixing separate raw recordings with good quality regardless of the music genre. This work recalls selected methods for automatic audio mixing first. Then, a novel deep model based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. The model is trained on a custom-prepared database. Mixes created using the...

Full text available to download
Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling
Publication
- S. Raczyński
- E. Vincent
- S. Sagayama
- IEEE Transactions on Audio Speech and Language Processing - Year 2013
Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of analyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is...

Full text to download in external service
Estimation of the short-term predictor parameters of speech under noisy conditions
Publication
- W. Kleijn
- M. Kuropatwiński
- IEEE Transactions on Audio Speech and Language Processing - Year 2006
Full text to download in external service
Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation
Publication
- S. Raczyński
- E. Vincent
- IEEE Transactions on Audio Speech and Language Processing - Year 2014
In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor process priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bigram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of n-grams with a topic model,...

Full text to download in external service
New approach for determining the QoS of MP3-coded voice signals in IP networks
Publication
- T. Uhl
- S. Paulsen
- K. Nowicki
- EURASIP Journal on Audio Speech and Music Processing - Year 2017
Present-day IP transport platforms being what they are, it will never be possible to rule out conflicts between the available services. The logical consequence of this assertion is the inevitable conclusion that the quality of service (QoS) must always be quantifiable no matter what. This paper focuses on one method to determine QoS. It defines an innovative, simple model that can evaluate the QoS of MP3-coded voice data transported...

Full text available to download
Piotr Szczuko dr hab. inż.

People

Department of Multimedia Systems

Piotr Szczuko received his M.Sc. degree in 2002. His thesis examined correlation phenomena between the perception of sound and vision for surround sound and digital images. He finished his Ph.D. studies in 2007 and one year later completed a dissertation, "Application of Fuzzy Rules in Computer Character Animation", which received the award of the Prime Minister of Poland. His interests include: processing of audio and video, computer...
Marek Blok dr hab. inż.

People

Marek Blok graduated in 1994 from the Faculty of Electronics at Gdansk University of Technology, receiving his MSc in telecommunications. He received his Ph.D. in 2003 and his D.Sc. in 2017, both in telecommunications, from the Faculty of Electronics, Telecommunications and Informatics of Gdańsk University of Technology. His research interests are focused on application of digital signal processing in telecommunications. He provides lectures, laboratory...
Michał Lech dr inż.

People

Michał Lech was born in Gdynia in 1983. In 2007 he graduated from the faculty of Electronics, Telecommunications and Informatics of Gdansk University of Technology. In June 2013, he received his Ph.D. degree. The subject of the dissertation was: “A Method and Algorithms for Controlling the Sound Mixing Processes by Hand Gestures Recognized Using Computer Vision”. The main focus of the thesis was the bias of audio perception caused...
Personal adaptive tuning of mobile computer audio
Publication
- Year 2015
An integrated methodology for enhancing audio quality in mobile computers is presented. The key features are adaptation of the characteristics of the acoustic track to the changing conditions and to the user's individual preferences. Original signal processing algorithms are introduced, which concern: linearization of frequency response, dialogue intelligibility enhancement and dynamics processing tuned up to the user's preferences....
Measurement of Latency in the Android Audio Path
Publication
- Year 2018
This paper describes experimental investigations comparing the audio path characteristics of various Android versions. First, information about the changes in each system version and the latency they cause is presented. Then, a measurement procedure employing available applications to measure latency is described, and its results are compared with those published on the Internet. Finally, a comparison...

Full text to download in external service
Musical Instrument Tagging Using Data Augmentation and Effective Noisy Data Processing
Publication
- D. Koszewski
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2020
Developing signal processing methods to extract information automatically has potential in several applications, for example searching for multimedia based on its audio content, making context-aware mobile applications (e.g., tuning apps), or pre-processing for an automatic mixing system. However, the last-mentioned application needs a significant amount of research to reliably recognize real musical instruments in recordings....

Full text available to download
Data obtained via parametrization of differently mixed audio signals
Open Research Data
open access
- J. Stefański
- K. Marciniuk
Dataset consists of audio samples and the results of their parametrization. The extraction of music parameters was performed using MIRToolbox. Information extracted from the samples was used as a database for master's thesis titled 'The influence of audio signal processing chain in mixing on the emotional state of a music piece'.
Music Data Processing and Mining in Large Databases for Active Media
Publication
- B. Kostek
- P. Hoffmann
- Year 2014
The aim of this paper was to investigate the problem of music data processing and mining in large databases. Tests were performed on a large database that included approximately 30000 audio files divided into 11 classes corresponding to music genres with different cardinalities. Every audio file was described by a 173-element feature vector. To reduce the dimensionality of data the Principal Component Analysis (PCA) with variable...

Full text to download in external service
A Study on Audio Signal Processed by "Instant Mastering"
Publication
- M. Piotrowska
- S. Piotrowski
- B. Kostek
- Year 2018
An increasing amount of music produced in home- and project-studios results in development and growth of "automatic mastering services". The presented investigation explores changes introduced to audio signal by various online mastering platforms. A music set consisting of 10 songs produced in small facilities was processed by eight on-line automatic mastering services. Additionally, some laboratory-constructed signals were tested....
Fitting the mobile device characteristics to the user's hearing preferences
Publication
- Year 2014
A method for fitting the mobile computer audio characteristics to the user's hearing preferences is proposed. The process consists of two stages: calibration and dynamics processing. During the calibration phase the user performs a loudness scaling test, giving their response regarding the perceived loudness. The dynamics processing performed on this basis sets the loudness to the most comfortable level. The processing accounts both...

Full text to download in external service
Adaptive Personal Tuning of Sound in Mobile Computers
Publication
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2016
An integrated methodology for enhancing audio quality in mobile computers is presented. The key features are adaptation of the characteristics of their acoustic track to changing acoustic conditions of the environment and to users’ individual preferences. Signal processing algorithms are introduced that concern: linearization of frequency response, dialogue intelligibility enhancement, and dynamics processing tuned up to the users’...

Full text available to download
An audio-visual corpus for multimodal automatic speech recognition
Publication
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017
A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings are provided. The new corpus, containing 31 hours of recordings, was created specifically to assist the development of audio-visual speech recognition (AVSR) systems. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download
Grzegorz Szwoch dr hab. inż.

People

Department of Multimedia Systems

Grzegorz Szwoch was born in 1972 in Gdansk. In 1991-1996 he studied at the Technical University of Gdansk. In 1996 he graduated as a student from the Sound Engineering Department. His thesis was related to physical modeling of musical instruments. Since that time he has been a member of the research staff at the Multimedia Systems Department as a PhD student (1996-2001), Assistant (2001-2004), Assistant professor (2004-2020) and...
Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders
Publication
- D. Koszewski
- Year 2023
The purpose of this dissertation is to develop an automatic song mixing system that is capable of automatically mixing a song with good quality in any music genre. This work recalls first the audio signal processing methods used in audio mixing, and it describes selected methods for automatic audio mixing. Then, a novel architecture built based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...

Full text available to download
Data, Information, Knowledge, Wisdom Pyramid Concept Revisited in the Context of Deep Learning
Publication
- B. Kostek
- Year 2023
In this paper, the data, information, knowledge, and wisdom (DIKW) pyramid is revisited in the context of deep learning applied to machine learning-based audio signal processing. A discussion on the DIKW schema is carried out, resulting in a proposal that may supplement the original concept. Parallels between DIKW pertaining to audio processing are presented based on examples of the case studies performed by the author and her collaborators....

Full text to download in external service
Audio content analysis in the urban area telemonitoring system
Publication
- Year 2010
The article presents the possibilities of extending urban monitoring with automatic sound analysis. Sound parameterization methods applicable in such a system are presented, and the technical aspects of the implementation are discussed. The next part presents the tree-based decision system used in the system. This system recognizes dangerous sounds (gunshot, broken glass, scream) among the sounds recorded...

Full text to download in external service
Elimination of impulsive disturbances from archive audio files – comparison of three noise pulse detection schemes
Publication
- M. Niedźwiecki
- M. Ciołek
- Year 2014
The problem of elimination of impulsive disturbances (such as clicks, pops, ticks, crackles, and record scratches) from archive audio recordings is considered and solved using autoregressive modeling. Three classical noise pulse detection schemes are examined and compared: the approach based on open-loop multi-step-ahead signal prediction, the approach based on decision-feedback signal prediction, and the double threshold approach,...

Full text to download in external service
Quality Aspects in Digital Broadcasting and Webcasting Systems: Bitrate versus Loudness
Publication
- Journal of Telecommunications and Information Technology - Year 2017
In this paper the quality aspects of bitrate and loudness in digital broadcasting and webcasting systems are examined. The authors discuss a survey concerning user preferences related to processing and managing audio content. The coding efficiency of a popular audio format is analyzed in the context of storing media. An objective study on a representative group of signal samples, as well as a subjective study of the perceived...

Full text available to download
Localization of impulsive disturbances in audio signals using template matching
Publication
- M. Niedźwiecki
- M. Ciołek
- DIGITAL SIGNAL PROCESSING - Year 2015
In this paper, a new solution to the problem of elimination of impulsive disturbances from audio signals, based on the matched filtering technique, is proposed. The new approach stems from the observation that a large proportion of noise pulses corrupting audio recordings have highly repetitive shapes that match several typical “patterns”. In many cases a representative set of exemplary pulse waveforms can be extracted from the...

Full text available to download
Recognition of hazardous acoustic events employing parallel processing on a supercomputing cluster
Publication
- K. Łopatka
- A. Czyżewski
- Year 2015
A method for automatic recognition of hazardous acoustic events operating on a supercomputing cluster is introduced. The methods employed for detecting and classifying the acoustic events are outlined. The evaluation of the recognition engine is provided: both on the training set and using real-life signals. The algorithms yield sufficient performance in practical conditions to be employed in security surveillance systems. The...
Piotr Odya dr inż.

People

Department of Multimedia Systems

Piotr Odya was born in Gdansk in 1974. He received his M.Sc. in 1999 from the Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Poland. His thesis was related to the problem of sound quality improvement in the contemporary broadcasting studio. He is interested in video editing and multichannel sound systems. Mr. Odya's Ph.D. thesis concerned methods and algorithms for correcting...
Reduction of parasitic pitch variations in archival musical recordings
Publication
- SIGNAL PROCESSING - Year 2010
A new method for reducing parasitic pitch variations in archival audio recordings is presented. The method is intended for analyzing movie soundtracks recorded in optical films. It utilizes image processing for calculating and reducing effects of tape shrinkage being one of the main reasons for parasitic pitch variations in audio accompanying moving images. As long as the film tape characteristics are known the new method can be...

Full text available to download
Acceleration of decision making in sound event recognition employing supercomputing cluster
Publication
- K. Łopatka
- A. Czyżewski
- INFORMATION SCIENCES - Year 2014
Parallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...

Full text to download in external service
Audio Content and Crowdsourcing: A Subjective Quality Evaluation of Radio Programs Streamed Online
Publication
- P. Falkowski-Gilski
- Year 2023
Radio broadcasting has been present in our lives for over 100 years. The transmission of speech and music signals accompanies us from an early age. Broadcasts provide the latest information from home and abroad. They also shape musical tastes and allow many artists to share their creativity. Modern distribution involves transmission over a number of terrestrial systems. The most popular are analog FM (Frequency Modulation) and...

Full text to download in external service
Online sound restoration system for digital library applications
Publication
- Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...

Full text to download in external service
Online Sound Restoration for Digital Library Applications
Publication
- Year 2012
A system for sound restoration was conceived and engineered having the following features: no special sound restoration software is needed to perform audio restoration by the user, the process of restoration employs automatic reduction of noise, wow and impulse distortions performed in the online mode, no skills in digital signal processing from the user are needed. The principles of the created system and its features as well...

Full text to download in external service
Creating a Reliable Music Discovery and Recommendation System
Publication
- Year 2014
The aim of this paper is to show problems related to creating a reliable music discovery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate genres and optimum parameterization...

Full text to download in external service
Processing of musical data employing rough sets and artificial neural networks
Publication
- Year 2005
The article describes the assumptions of a system for automatic identification of music and musical sounds. The MPEG-7 standard is reviewed, with particular emphasis on audio descriptors. Problems of audio data analysis related to applications that use MPEG-7 are discussed. Based on experiments, the effectiveness of low-level descriptors in the automatic recognition of musical instrument sounds is presented. Also discussed...
Processing of musical data employing rough sets and artificial neural networks
Publication
- Year 2004
The article describes the assumptions of a system for automatic identification of music and musical sounds. The MPEG-7 standard is reviewed, with particular emphasis on audio descriptors. Problems of audio data analysis related to applications that use MPEG-7 are discussed. Based on experiments, the effectiveness of low-level descriptors in the automatic recognition of musical instrument sounds is presented. Also discussed...
Online sound restoration system for digital library applications.
Publication
- Journal of the Acoustical Society of America - Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
Variable Ratio Sample Rate Conversion Based on Fractional Delay Filter
Publication
- M. Blok
- P. Drózda
- Archives of Acoustics - Year 2014
In this paper, a sample rate conversion algorithm that allows for a continuously changing resampling ratio is presented. The proposed implementation is based on a variable fractional delay filter which is implemented by means of a Farrow structure. Coefficients of this structure are computed on the basis of fractional delay filters which are designed using the offset window method. The proposed approach allows us to freely...

Full text available to download
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication
- Year 2014
New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...

Full text to download in external service
New semi-causal and noncausal techniques for detection of impulsive disturbances in multivariate signals with audio applications
Publication
- M. Niedźwiecki
- M. Ciołek
- IEEE TRANSACTIONS ON SIGNAL PROCESSING - Year 2017
This paper deals with the problem of localization of impulsive disturbances in nonstationary multivariate signals. Both unidirectional and bidirectional (noncausal) detection schemes are proposed. It is shown that the strengthened pulse detection rule, which combines analysis of one-step-ahead signal prediction errors with critical evaluation of leave-one-out signal interpolation errors, allows one to noticeably improve detection results...

Full text available to download
Multimodal English corpus for automatic speech recognition
Publication
- Year 2013
A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously; therefore, the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...
