Search results for: Query by Sketch
-
Bimodal classification of English allophones employing acoustic speech signal and facial motion capture
Publication: A method for automatic transcription of English speech into the International Phonetic Alphabet (IPA) system is developed and studied. The principal objective of the study is to evaluate to what extent the visual data related to lip reading can enhance recognition accuracy of the transcription of English consonantal and vocalic allophones. To this end, motion capture markers were placed on the faces of seven speakers to obtain lip...
-
Stochastic Integration and Long Term Predictor Estimation under Noisy Conditions for Speech Enhancement
Publication -
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
Publication: The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...
-
Design of Intelligent Low-Voltage Load Switch for Remote Control System in Smart Grid
Publication: Current low-voltage load switches do not support remote disconnect/connect and real-time monitoring of a disconnect/connect state. To address these issues, this paper presents a low-voltage load switch for a smart remote control system, which uses a one-chip microcontroller board and a DC step motor drive mechanism and also provides feedback on the switch status. Arrears disconnect and full-pay connect control is implemented...
-
Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students
Publication: User-perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in the case of bilingual...
-
Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech
Publication: We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
Publication: The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
-
Automated detection of pronunciation errors in non-native English speech employing deep learning
Publication: Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep learning methods for detecting pronunciation errors in non-native (L2) English speech, outperforming the state-of-the-art method in the AUC (area under the curve) metric by 41%, i.e., from...
-
EXAMINING INFLUENCE OF VIDEO FRAMERATE AND AUDIO/VIDEO SYNCHRONIZATION ON AUDIO-VISUAL SPEECH RECOGNITION ACCURACY
Publication: The problem of video framerate and audio/video synchronization in audio-visual speech recognition is considered. The visual features are added to the acoustic parameters in order to improve the accuracy of speech recognition in noisy conditions. The Mel-Frequency Cepstral Coefficients are used on the acoustic side whereas Active Appearance Model features are extracted from the image. The feature fusion approach is employed. The...
-
CLICK 'n' Sleep: Light-Switch Behavior of Triazole-Containing Tris(bipyridyl)ruthenium Complexes
Publication: A set of Ru(II) complexes incorporating triazole subunits is presented. They show a solvent-dependent light-switch effect. Theoretical calculations revealed the excited states involved in the emission process. The findings are highly important for future design of light-switch sensors and suggest a severe restriction for functional photomolecular devices synthesized by CLICK chemistry.
-
Akustyczny obraz słowa na tle mowy etnicznej [The acoustic image of ethnic speech words]
Publication -
Mowa nienawiści (hate speech) a odpowiedzialność dostawców usług internetowych w orzecznictwie sądów europejskich [Hate speech and the liability of Internet service providers in the case law of European courts]
Publication: The article analyses the phenomenon of hate speech on the Internet in relation to the problem of the responsibility of Internet Service Providers for such abuses of freedom of expression. The text provides an analysis of the jurisprudence of two European Courts. On the one hand it presents the position of the European Court of Human Rights on the problem of hate speech: its definition and the liability for it as an exception...
-
A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times
Publication: Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by the human hearing sense or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method used for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
Publication: Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
DBpedia and YAGO Based System for Answering Questions in Natural Language
Publication: In this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for the TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referred to below as fibers) are candidates for representation of actual user intention. The...
-
Canadian Journal of Speech-Language Pathology and Audiology
Journals -
Estimation of time-frequency complex phase-based speech attributes using narrow band filter banks
Publication: In this paper, we present nonlinear estimators of nonstationary and multicomponent signal attributes (parameters, properties), namely instantaneous frequency, spectral (or group) delay, and chirp-rate (also known as instantaneous frequency slope). We estimate all of these distributions in the time-frequency domain using both finite and infinite impulse response (FIR and IIR) narrow band filters for speech analysis. Then, we present...
-
The development of speech in early childhood in children from twin pregnancies with twin-twin transfusion syndrome (TTTS)
Publication -
Minimum mean square error estimation of speech short-term predictor parameters under noisy conditions
Publication -
Immune escape of B-cell lymphoblastic leukemic cells through a lineage switch to acute myeloid leukemia
Publication: Acute leukemia (AL) with a lineage switch (LS) is associated with poor prognosis. The predisposing factors of LS are unknown, apart from KMT2A rearrangements that have been reported to be associated with LS. Herein, we present two cases and review all 104 published cases to identify risk factors for LS. Most of the patients (75.5%) experienced a switch from the lymphoid phenotype to the myeloid phenotype. Eighteen patients (17.0%)...
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication: In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Novel Family of Single-Phase Modified Impedance-Source Buck-Boost Multilevel Inverters With Reduced Switch Count
Publication: This paper describes novel single-phase solutions with increased inverter voltage levels derived by means of a nonstandard inverter configuration and impedance source networks. Operation principles based on special modulation techniques are presented. Detailed component design guidelines along with simulation and experimental verification are also provided. Possible application fields are discussed, as well as advantages and disadvantages....
-
Cyfrowa analiza mowy etnicznej – ekstrakcja kodu informacji [A digital analysis of ethnic speech – deciphering the information code]
Publication -
Pulse-Width Modulation Template for Five-Level Switch-Clamped H-Bridge-Based Cascaded Multilevel Inverter
Publication: This article presents a carrier-based pulse-width modulation (PWM) template for a 5-level, H-bridge-based cascaded multilevel inverter (MLI). The developed control concept generates an adequate modulation template for this inverter topology wherein a sinusoidal modulating waveform is modified to fit in a single triangular carrier signal range. With this modulation approach, classical multiplicity and synchronization of the triangular...
-
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publication: In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as available bandwidth, bit error rate (BER), delay, and latency, as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...
-
SPEECH COMMUNICATION
Journals -
Michał Lech dr inż.
People: Michał Lech was born in Gdynia in 1983. In 2007 he graduated from the Faculty of Electronics, Telecommunications and Informatics of Gdansk University of Technology. In June 2013, he received his Ph.D. degree. The subject of the dissertation was: “A Method and Algorithms for Controlling the Sound Mixing Processes with Hand Gestures Recognized Using Computer Vision”. The main focus of the thesis was the bias of audio perception caused...
-
Advances in Speech-Language Pathology (correct title: IJSLP)
Journals -
IEEE-ACM Transactions on Audio Speech and Language Processing
Journals -
International Journal of Speech, Language and the Law: Forensic Linguistics
Journals -
Andrzej Czyżewski prof. dr hab. inż.
People: Prof. Andrzej Czyżewski (prof. zw. dr hab. inż.) is a graduate of the Faculty of Electronics of Gdańsk University of Technology (PG), where he completed his master's studies in 1982. He defended his doctoral thesis, on a topic related to digital sound, with distinction at the Faculty of Electronics of PG in 1987. In 1992 he presented his habilitation dissertation, titled „Cyfrowe operacje na sygnałach fonicznych” [Digital operations on audio signals]. His habilitation colloquium was accepted unanimously in June 1992 at the AGH University of Science and Technology (Akademia Górniczo-Hutnicza)...
-
Цифровой анализ сигналов речи как инструмент сравнительного языкознания [A digital analysis of speech signals as an instrument in comparative linguistics]
Publication -
Automated Text Annotation Using Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection
Publication -
Automated Text Annotation Using a Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection
Publication -
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych = System of speech signal processing and visualisation of the results
Publication: The article presents a method of processing and visualising the speech signal in the form of an easy-to-use and relatively inexpensive device for recording the acoustic signal, digitally processing selected fragments, and visualising the results of the transformations. A computer with a sound card was used for this purpose. The digital processing and visualisation were carried out using MATLAB, directly...
-
SMAQ - A Semantic Model for Analytical Queries
Publication: As Self-Service Business Intelligence (BI) becomes an important part of organizational BI solutions, there is a great need for new tools that allow users with various responsibilities and skills to construct ad hoc queries. The paper presents a Semantic Model for Analytical Queries (SMAQ) that allows queries to be constructed by users who are familiar with business events and terms but unaware of database or data warehouse concepts...
-
System przetwarzania i wizualizacji sygnału mowy dla potrzeb lingwistycznych [A system of speech signal processing and visualisation for linguistic purposes]
Publication -
Semi-supervised Text Annotation for Hate Speech Detection using K-Nearest Neighbors and Term Frequency-Inverse Document Frequency
Publication -
Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
Publication: The Lombard effect is a phenomenon that results in speech intelligibility improvement when applied to noise. There are many distinctive features of Lombard speech that are reviewed in this dissertation. This work proposes the creation of a system capable of improving speech quality and intelligibility in real time, as measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...
-
Krzysztof Goczyła prof. dr hab. inż.
People: Krzysztof Goczyła is a full professor at Gdańsk University of Technology, a computer scientist specialising in software engineering, knowledge engineering, and databases. He graduated from the Faculty of Electronics of Gdańsk University of Technology in 1976 with an M.Sc. Eng. degree in electronics, specialising in automatic control. He has worked at Gdańsk University of Technology since 1976. At the Faculty of Electronics he obtained a doctorate in computer science in 1982 and a habilitation in 1999. In 2012...
-
Artur Gańcza dr inż.
People: I received the M.Sc. degree from the Gdańsk University of Technology (GUT), Gdańsk, Poland, in 2019. I am currently a Ph.D. student at GUT, with the Department of Automatic Control, Faculty of Electronics, Telecommunications and Informatics. My professional interests include speech recognition, system identification, adaptive signal processing and linear algebra.
-
A literature survey of the influence of preform reheating and stretch blow moulding with hot mould process parameters on the properties of PET containers – part 2.
Publication: The hot fill process is an inexpensive conventional filling technology for high-acidity products (pH < 4.5). It allows certain drinks (sensitive beverages such as fruit and vegetable juices, nectars, soft drinks, vitaminised water) to be stored at ambient temperature without the need for chemical preservatives. The primary feature of the bottles used in the hot fill process is their temperature stability, i.e. the ability to retain...
-
A literature survey of the influence of preform reheating and stretch blow molding with hot mold process parameters on the properties of PET containers. Part I.
Publication: The hot fill process is an inexpensive conventional filling technology for high-acidity products (pH < 4.5). It allows certain drinks (sensitive beverages such as fruit and vegetable juices, nectars, soft drinks, vitaminized water) to be stored at ambient temperature without the need for chemical preservatives. The primary feature of the bottles used in the hot fill process is their temperature stability, i.e. the ability to retain...
-
Music Information Retrieval – Soft Computing versus Statistics. Wyszukiwanie informacji muzycznej - algorytmy uczące versus metody statystyczne
Publication: Music Information Retrieval (MIR) is an interdisciplinary research area that covers automated extraction of information from audio signals, music databases and services enabling searches of the indexed information. In the early stages the primary focus of MIR was on music information through Query-by-Humming (QBH) applications, i.e. on identifying a piece of music by singing (singing/whistling), while more advanced implementations...
-
LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags
Publication: The paper presents an approach to using tags from Stack Overflow questions as a data source in the process of building domain-specific unsupervised term embeddings. Using a huge dataset of Stack Overflow posts, our solution employs the LSA algorithm to learn latent representations of information technology terms. The paper also presents the Teamy.ai system, currently developed by the Scalac company, which serves as a platform that...
-
Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej
Publication: The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training database of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...
-
Suspended-sediment transport related to ice-cover conditions during cold and warm winters, Toudaoguai stretch of the Yellow River, Inner Mongolia, China
Publication: The presence of winter ice in cold regions changes the water level, flow rate, velocity distribution, and other parameters of the river, which in turn affects the sediment concentration and channel evolution. Based on data obtained from Toudaoguai Hydrological Station from 1959 to 2021, this study examines the characteristics of the ice regime during cold and warm winters and the water and sediment transport processes along the...
-
A Framework for Searching in Graphs in the Presence of Errors
Publication: We consider a problem of searching for an unknown target vertex t in a (possibly edge-weighted) graph. Each vertex-query points to a vertex v and the response either admits that v is the target or provides any neighbor s of v that lies on a shortest path from v to t. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide...
-
Edge and Pair Queries-Random Graphs and Complexity
Publication: We investigate two types of query games played on a graph: pair queries and edge queries. We concentrate on investigating the two associated graph parameters for binomial random graphs, and on showing that determining either of the two parameters is NP-hard for bounded degree graphs.