Search results for: MODALITY CORPUS · ENGLISH LANGUAGE CORPUS · SPEECH RECOGNITION · AVSR
-
In search of the new: American volunteers’ opinions about their participation in the Teaching English in Poland (TEIP) Program
Publication: The Teaching English in Poland (TEIP) program relies on summer camps during which native English speakers, American volunteers, teach Polish children and adolescents using the language immersion method – during everyday activities, sports and art classes, and similar occasions. A vital aspect of the evaluation of the program is researching its impact on the young people; however, the opinions of the volunteers regarding their...
-
The Innovative Faculty for Innovative Technologies
Publication: A leaflet describing the Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology. The Multimedia Systems Department described laboratories and prototypes of: Auditory-visual attention stimulator, Automatic video event detection, Object re-identification application for multi-camera surveillance systems, Object Tracking and Automatic Master-Slave PTZ Camera Positioning System, Passive Acoustic Radar,...
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication: In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Speech Analytics Based on Machine Learning
Publication: In this chapter, the process of speech data preparation for machine learning is discussed in detail. Examples of speech analytics methods applied to phonemes and allophones are shown. Further, an approach to automatic phoneme recognition involving optimized parametrization and a classifier belonging to machine learning algorithms is discussed. Feature vectors are built on the basis of descriptors coming from the music information...
-
Examining Feature Vector for Phoneme Recognition
Publication: The aim of this paper is to analyze the usability of descriptors coming from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Determining Pronunciation Differences in English Allophones Utilizing Audio Signal Parameterization
Publication: An allophonic description of English plosive consonants, based on audio-visual recordings of 600 specially selected words, was developed. First, several speakers were recorded while reading words from a teleprompter. Then, every word was played back from the previously recorded sample read by a phonology expert and each examined speaker repeated a particular word trying to imitate correct pronunciation. The next step consisted...
-
Estimation of the short-term predictor parameters of speech under noisy conditions
Publication
-
Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation
Publication: In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor process priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bigram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for integration of n-grams with a topic model,...
-
Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering
Publication: This paper presents a new approach to elimination of impulsive disturbances from stereo audio recordings. The proposed solution is based on vector autoregressive modeling of audio signals. Online tracking of signal model parameters is performed using the exponentially weighted least squares algorithm. Detection of noise pulses and model-based interpolation of the irrevocably distorted samples is realized using an adaptive, variable-order...
-
Elimination of Impulsive Disturbances From Archive Audio Signals Using Bidirectional Processing
Publication: In this application-oriented paper we consider the problem of elimination of impulsive disturbances, such as clicks, pops and record scratches, from archive audio recordings. The proposed approach is based on bidirectional processing: noise pulses are localized by combining the results of forward-time and backward-time signal analysis. Based on the results of specially designed empirical tests (rather than on the results of theoretical analysis),...
-
Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling
Publication: Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of analyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the “horizontal” and the “vertical” pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is...
-
Reactive system for interaction with the environment based on an intelligent decision system [Reaktywny system oddziaływania ze środowiskiem oparty na inteligentnym systemie decyzyjnym]
Publication: Cognitive processes occurring in the human mind, once mathematically modeled and expressed as algorithms, can be used to construct intelligent decision systems. Such systems have many applications. They can be found, among others, in various autonomous systems in computer science, automation and robotics: from an 'intelligent' guard, butler, etc., to a caretaker, a virtual companion...
-
Objectivization of phonological evaluation of speech elements by means of audio parametrization
Publication: This study addresses two issues related to both machine- and subjective-based speech evaluation by investigating five phonological phenomena related to allophone production. Its aim is to use objective parametrization and phonological classification of the recorded allophones. These allophones were selected as specifically difficult for Polish speakers of English: aspiration, final obstruent devoicing, dark lateral /l/, velar nasal...
-
Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students
Publication: User-perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in the case of bilingual...
-
Quality Analysis of Audio-Video Transmission in an OFDM-Based Communication System
Publication: Application of a reliable audio-video communication system brings many advantages. With the spoken word we can exchange ideas, provide descriptive information, as well as aid another person. With the availability of visual information one can monitor the surroundings, the working environment, etc. As the amount of available bandwidth continues to shrink, researchers focus on novel types of transmission. Currently, orthogonal frequency...
-
MACHINE LEARNING–BASED ANALYSIS OF ENGLISH LATERAL ALLOPHONES
Publication: Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’...
-
Examining Feature Vector for Phoneme Recognition / Analiza parametrów w kontekście automatycznej klasyfikacji fonemów
Publication: The aim of this paper is to analyze the usability of descriptors coming from music information retrieval for phoneme analysis. The case study presented consists of several steps. First, a short overview of parameters utilized in speech analysis is given. Then, a set of time and frequency domain-based parameters is selected and discussed in the context of stop consonant acoustical characteristics. A toolbox created for this purpose...
-
Modeling and Simulation for Exploring Power/Time Trade-off of Parallel Deep Neural Network Training
Publication: In the paper we tackle the bi-objective execution time and power consumption optimization problem concerning execution of parallel applications. We propose using a discrete-event simulation environment for exploring this power/time trade-off in the form of a Pareto front. The solution is verified by a case study based on a real deep neural network training application for automatic speech recognition. A simulation lasting over 2 hours...
-
Usability study of various biometric techniques in bank branches
Publication: The purpose of the presented research was to evaluate the performance of the prepared biometric algorithms and obtain information on the opinions and preferences of their users in bank branches. The study aimed to determine users' attitudes towards particular modalities and preferences on how to use biometrics after the bank customers had practical experience with the operation of the prototype solutions. The research results...
-
Automatic Emotion Recognition in Children with Autism: A Systematic Literature Review
Publication: The automatic emotion recognition domain brings new methods and technologies that might be used to enhance therapy of children with autism. The paper aims at the exploration of methods and tools used to recognize emotions in children. It presents a literature review study that was performed using a systematic approach and PRISMA methodology for reporting quantitative and qualitative results. Diverse observation channels and modalities...
-
Analysis of allophones based on audio signal recordings and parameterization
Publication: The aim of this study is to develop an allophonic description of English plosive consonants based on recordings of 600 specially selected words. Allophonic variations addressed in the study may have two sources: positional and contextual. The former one depends on the syllabic or prosodic position in which a particular phoneme occurs. Contextual allophony is conditioned by the local phonetic environment. Co-articulation overlapping...
-
Minimizing Distribution and Data Loading Overheads in Parallel Training of DNN Acoustic Models with Frequent Parameter Averaging
Publication: In the paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training program which minimizes the...
-
Automatic Watercraft Recognition and Identification on Water Areas Covered by Video Monitoring as Extension for Sea and River Traffic Supervision Systems
Publication: The article presents the watercraft recognition and identification system as an extension for the presently used visual water area monitoring systems, such as VTS (Vessel Traffic Service) or RIS (River Information Service). The watercraft identification systems (AIS - Automatic Identification Systems) which are presently used in both sea and inland navigation require purchase and installation of relatively expensive transceivers...
-
Noise profiling for speech enhancement employing machine learning models
Publication: This paper aims to propose a noise profiling method that can be performed in near real-time based on machine learning (ML). To address challenges related to noise profiling effectively, we start with a critical review of the literature background. Then, we outline the experiment performed consisting of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features...
-
Integrating heterogeneous systems with high-dependability requirements by means of web services
Publication: Web services are commonly used on boundaries of heterogeneous components in Service Oriented Architecture (SOA) as they provide a universal communication channel not bound to any particular programming language or run-time platform. This paper describes how web services can be used to integrate heterogeneous systems which serve purposes requiring high dependability, reliability and availability. Examples of such systems include...
-
The role of EMG module in hybrid interface of prosthetic arm
Publication: Nearly 10% of all upper limb amputations concern the whole arm. It affects the mobility and reduces the productivity of such a person. These two factors can be restored by using prosthetics. However, the complexity of the human arm makes restoring its basic functions quite difficult. When the osseointegration and/or targeted muscle reinnervation (TMR) are not possible, different modalities can be used to control the prosthesis. In...
-
Modeling the Customer’s Contextual Expectations Based on Latent Semantic Analysis Algorithms
Publication: Nowadays, in the age of the Internet, access to open data reveals huge possibilities for information retrieval. More and more often we hear about the concept of open data, which means unrestricted access, in addition to reuse and analysis by external institutions, organizations and people. It is information that can be freely processed, combined with other data (a so-called remix) and then published. More and more data are available in text...
-
BPL-PLC Voice Communication System for the Oil and Mining Industry
Publication: Application of high-efficiency voice communication systems based on broadband over power line-power line communication (BPL-PLC) technology in medium voltage networks, including hazardous areas (like the oil and mining industry), as a redundant means of wired communication (apart from traditional fiber optics and electrical wires) can be beneficial. Due to the possibility of utilizing existing electrical infrastructure, it can...
-
Glossary [Intellectual Output 1] Glossary as a method for reflection on complex research questions
Publication: Globalization and digitization are strongly influencing the process of shaping the built environment. The latter is causing new design tools to emerge faster than ever before in history, while the former is speeding up not only the development, but also the broad roll-out of more agile and interdisciplinary methodologies and work approaches. The design process is also becoming more and more inter- and trans-disciplinary. This...
-
Contactless hearing aid designed for infants
Publication: It is a well-known fact that language development through home intervention for a hearing-impaired infant should start in the early months of a newborn baby's life. The aim of this paper is to present a concept of a contactless digital hearing aid designed especially for infants. In contrast to all typical wearable hearing aid solutions (ITC, ITE, BTE), the proposed device is mounted in the infant's bed with any parts of its set-up...
-
An Analysis of Neural Word Representations for Wikipedia Articles Classification
Publication: One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...
-
Linear revitalization - problems and challenges. Discursive article
Publication: The aim of the article, defined by the author as discursive, is to give the answer as to whether within ‘revitalization’ we should distinguish the notion of ‘linear revitalization’ – not yet defined in Polish and English-language literature. The author presents the thesis that we should do so by presenting the idea, its specific character and its role. This kind of action seems to have, in the author’s opinion, a positive influence...
-
Separability Assessment of Selected Types of Vehicle-Associated Noise
Publication: The Music Information Retrieval (MIR) area, as well as the development of speech and environmental information recognition techniques, has brought various tools intended for recognizing low-level features of acoustic signals based on a set of calculated parameters. In this study, the MIRtoolbox MATLAB tool, designed for music parameter extraction, is used to obtain a vector of parameters to check whether they are suitable for separation of...
-
MODERNIST, 1920S AND 1930S INDUSTRIAL ARCHITECTURE OF THE PORT OF GDYNIA - IN SEARCH OF AN AESTHETIC LANGUAGE FOR UTILITARIAN BUILDINGS OF THE POLISH GATEWAY TO THE WORLD
Publication: The purpose of the article is to present the results of the research on the aspects of the Port of Gdynia modernist architecture aesthetics. Its construction was one of the two major projects carried out in the interwar period in Poland. In the course of analyses it has been attempted to answer the question whether an individual aesthetic language has been created in the 1920s and 1930s for the industrial architecture of the Polish...
-
Circumlocutions with the noun peopo ‘people’ in Hawai’i Creole English
Publication
-
Learning design of a blended course in technical writing
Publication: Blending face-to-face classes with e-learning components can lead to a very successful outcome if the blend of approaches, methods, content, space, time, media and activities is carefully structured and approached from both the student’s and the tutor’s perspective. In order to blend synchronous and asynchronous e-learning activities with traditional ones, educators should make them inter-dependent and develop them according to...
-
Towards More Realistic Probabilistic Models for Data Structures: The External Path Length in Tries under the Markov Model
Publication: Tries are among the most versatile and widely used data structures on words. They are pertinent to the (internal) structure of (stored) words and several splitting procedures used in diverse contexts ranging from document taxonomy to IP addresses lookup, from data compression (i.e., Lempel-Ziv'77 scheme) to dynamic hashing, from partial-match queries to speech recognition, from leader election algorithms to distributed hashing...
-
Information retrieval with semantic memory model
Publication: Psycholinguistic theories of semantic memory form the basis of understanding of natural language concepts. These theories are used here as an inspiration for implementing a computational model of semantic memory in the form of semantic network. Combining this network with a vector-based object-relation-feature value representation of concepts that includes also weights for confidence and support, allows for recognition of concepts...
-
The Bridge to Knowledge – Open Access to Scientific Research Results on Multidisciplinary Open System Transferring Knowledge Platform
Publication: The European policy of Open Access to scientific research is now one of the key issues discussed in public debates on the future development of scientific communication. The implementation of Open Access tools has significant impact on scientific and economic growth. On the one hand, Open Access accelerates disseminating new research findings and facilitates recognition of authors on a more global scale. On the other hand, Open...
-
The Russian Federation in European Union Programmes
Publication: Since the early 1990s, the European Union has been supporting socio-economic transformations in the former Soviet Union states, including the Russian Federation. Initially, this assistance was provided in the framework of the TACIS Programme, offering long-term, non-repayable aid. In 1991–2006 Russia received EUR 2.7bn for the restructuring of the state enterprise sector, establishment of private companies, state administration...