Search results for: audiovisual speech recognition

total: 1052

displaying 1000 best results

Improvement of speech intelligibility in the presence of noise interference using the Lombard effect and an automatic noise interference profiling based on deep learning
Publication
- K. Kąkol
- Year 2023
The Lombard effect is a phenomenon in which speakers modify their speech in the presence of noise, which improves speech intelligibility. The many distinctive features of Lombard speech are reviewed in this dissertation. This work proposes a system capable of improving speech quality and intelligibility in real time, as measured by objective metrics and subjective tests. This system consists of three main components: speech type detection,...

Full text available to download
Topology recognition and leader election in colored networks
Publication
- D. Dereniowski
- A. Pelc
- THEORETICAL COMPUTER SCIENCE - Year 2016
Topology recognition and leader election are fundamental tasks in distributed computing in networks. The first of them requires each node to find a labeled isomorphic copy of the network, while the result of the second one consists in a single node adopting the label 1 (leader), with all other nodes adopting the label 0 and learning a path to the leader. We consider both these problems in networks whose nodes are equipped with...

Full text available to download
Gesture recognition framework for multimedia content viewer controlling
Publication
- Year 2009
In the paper a system for controlling a multimedia content viewer by hand gestures is presented. First, selected methods used for gesture recognition are described. Two different application cases of the system, i.e. for multimedia presentation purposes and for multimedia content viewing are outlined. Moreover, a proposal of improvement of the system combining these approaches is also given. The system work cycle is reviewed. The...
Subjective Quality Evaluation of Speech Signals Transmitted via BPL-PLC Wired System
Publication
- P. Falkowski-Gilski
- G. Debita
- M. Habrych
- B. Miedziński
- P. Jedlikowski
- B. Polnik
- J. Wandzio
- X. Wang
- Year 2020
The broadband over power line – power line communication (BPL-PLC) cable is resistant to electricity stoppage and partial damage of phase conductors. It maintains continuity of transmission in case of an emergency. These features make it an ideal solution for delivering data, e.g. in an underground mine environment, especially clear and easily understandable voice messages. This paper describes a subjective quality evaluation of...

Full text to download in external service
Comparison of edge detection algorithms for electric wire recognition
Publication
- Year 2018
Edge detection is the preliminary step in image processing for object detection and recognition. It removes useless information and reduces the amount of data before further analysis. The paper compares edge detection algorithms optimized for detecting horizontal edges. For comparison purposes the algorithms were implemented in the developed application dedicated to detection of electric line...

Full text to download in external service
Optical recognition elements: macrocyclic imidazole chromoionophores entrapped in silica xerogel
Publication
- M. Jamrógiewicz
- K. Kledzik
- M. Gwiazda
- E. Wagner-Wysiecka
- J. Jezierska
- J. Biernat
- A. Kłonkowski
- MATERIALS SCIENCE-POLAND - Year 2007
Materials containing new chromoionophores consisting of crown residue and azole moiety as parts of macrocycles were encapsulated by the sol-gel procedure in silica xerogel matrices and proposed as chemical recognition elements especially for such metal ions as Li+, Cs+ and Cu2+. Action of these recognition elements is in principle based on changes of reflectance. The recognition elements containing 21-membered chromogenic...
Acceleration of decision making in sound event recognition employing supercomputing cluster
Publication
- K. Łopatka
- A. Czyżewski
- INFORMATION SCIENCES - Year 2014
Parallel processing of audio data streams is introduced to shorten the decision making time in hazardous sound event recognition. A supercomputing cluster environment with a framework dedicated to processing multimedia data streams in real time is used. The sound event recognition algorithms employed are based on detecting foreground events, calculating their features in short time frames, and classifying the events with Support...

Full text to download in external service
Gesture Recognition With the Linear Optical Sensor and Recurrent Neural Networks
Publication
- IEEE SENSORS JOURNAL - Year 2018
In this paper, the optical linear sensor, a representative of low-resolution sensors, was investigated in the multiclass recognition of near-field hand gestures. The recurrent neural network (RNN) with a gated recurrent unit (GRU) memory cell was utilized as a gestures classifier. A set of 27 gestures was collected from a group of volunteers. The 27 000 sequences obtained were divided into training, validation, and test subsets....

Full text available to download
Digits Recognition with Quadrant Photodiode and Convolutional Neural Network
Publication
- J. Kamil
- K. Czuszyński
- J. Rumiński
- Year 2018
In this paper we have investigated the capabilities of a quadrant photodiode based gesture sensor in the recognition of digits drawn in the air. The sensor consisting of 4 active elements, 4 LEDs and a pinhole was considered as input interface for both discrete and continuous gestures. Index finger and a round pointer were used as navigating mediums for the sensor. Experiments performed with 5 volunteers...

Full text to download in external service
Emotion Recognition
Open Research Data
open access
- M. Przyborski
- K. Bobkowska
- series: Person A
The films presented here were recorded using a high-speed Phantom Miro camera. Playing a film requires special software, available at https://www.phantomhighspeed.com/resourcesandsupport/phantomresources/pccsoftware, and the details of the film are available after starting it in the viewer in the description...
Camera angle invariant shape recognition in surveillance systems
Publication
- D. Ellwart
- A. Czyżewski
- Year 2010
A method for human action recognition in surveillance systems is described. Problems within this task are discussed and a solution based on 3D object models is proposed. The idea is presented and some of its limitations are discussed. Shape description methods are introduced along with their main features. The parameterization algorithm used is presented. The classification problem, restricted to binary cases, is discussed. Support vector...
Intelligent processing of stuttered speech.
Publication
- A. Czyżewski
- A. Kaczmarek
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2003
The article presents several methods for analyzing and automatically counting articulation disfluencies associated with stuttering, based on learning algorithms employing artificial neural networks and rough sets.
Scoreboard Architectural Pattern and Integration of Emotion Recognition Results
Publication
- A. Landowska
- G. Brodny
- IEEE Access - Year 2019
This paper proposes a new design pattern, named Scoreboard, dedicated to applications solving complex, multi-stage, non-deterministic problems. The pattern provides a computational framework for the design and implementation of systems that integrate a large number of diverse specialized modules that may vary in accuracy, solution level, and modality. The Scoreboard is an extension of the Blackboard design pattern and comes under...

Full text available to download
Quality Evaluation of Speech Transmission via Two-way BPL-PLC Voice Communication System in an Underground Mine
Publication
- P. Falkowski-Gilski
- G. Debita
- Archives of Acoustics - Year 2023
In order to design a stable and reliable voice communication system, it is essential to know how many resources are necessary for conveying quality content. These parameters may include objective quality of service (QoS) metrics, such as: available bandwidth, bit error rate (BER), delay, latency as well as subjective quality of experience (QoE) related to user expectations. QoE is expressed as clarity of speech and the ability...

Full text available to download
System for automatic singing voice recognition
Publication
- P. Żwan
- B. Kostek
- JOURNAL OF THE AUDIO ENGINEERING SOCIETY - Year 2008
The article presents a system for automatic recognition of the quality and type of a singing voice. The database and the implemented parameters are described. The decision algorithm is an artificial neural network. The trained decision system achieves an accuracy of about 90% in both recognition categories. Additionally, statistical methods were used to show that the results of the automatic system for assessing technical quality...
Pose classification in the gesture recognition using the linear optical sensor
Publication
- Year 2017
Gesture sensors for mobile devices, which have a capability of distinguishing hand poses, require efficient and accurate classifiers in order to recognize gestures based on the sequences of primitives. Two methods of poses recognition for the optical linear sensor were proposed and validated. The Gaussian distribution fitting and Artificial Neural Network based methods represent two kinds of classification approaches. Three types...

Full text to download in external service
Influence of accelerometer signal pre-processing and classification method on human activity recognition
Publication
- Elektronika : konstrukcje, technologie, zastosowania - Year 2010
A study of data pre-processing influence on accelerometer-based human activity recognition algorithms is presented. The frequency band used to filter-out the accelerometer signals and the number of accelerometers involved were considered in terms of their influence on the recognition accuracy. In the test four methods of classification were used: support vector machine, decision trees, neural network, k-nearest neighbor.

Full text to download in external service
On practical application of Shannon theory to character recognition and more
Publication
- M. Jurkiewicz
- Year 2014
Let us consider an optical character recognition system, which in particular can be used for identifying objects that were assigned strings of some length. The system is not perfect, for example, it sometimes recognizes wrongly the characters "Y" and "V". What is the largest set of strings of given length for the system under consideration, which can be mutually correctly recognized, and the corresponding objects correctly identified?...
Acquisition and indexing of RGB-D recordings for facial expressions and emotion recognition
Publication
- M. Szwoch
- Studia Informatica Pomerania - Year 2015
This paper describes KinectRecorder, a comprehensive tool that provides convenient and fast acquisition, indexing and storing of RGB-D video streams from the Microsoft Kinect sensor. The application is especially useful as a supporting tool for creating fully indexed databases of facial expressions and emotions that can be further used for learning and testing of emotion recognition algorithms for affect-aware applications....

Full text to download in external service
Molecular Recognition in Complexes of TRF Proteins with Telomeric DNA
Publication
- M. Wieczór
- A. Tobiszewski
- P. Wityk
- B. Tomiczek
- J. Czub
- PLOS ONE - Year 2014
Telomeres are specialized nucleoprotein assemblies that protect the ends of linear chromosomes. In humans and many other species, telomeres consist of tandem TTAGGG repeats bound by a protein complex known as shelterin that remodels telomeric DNA into a protective loop structure and regulates telomere homeostasis. Shelterin recognizes telomeric repeats through its two major components known as Telomere Repeat-Binding Factors, TRF1...

Full text available to download
Parameters optimization in medicine supporting image recognition algorithms
Publication
- A. Brzeski
- Year 2011
In this paper, a procedure of automatic set up of image recognition algorithms' parameters is proposed, for the purpose of reducing the time needed for algorithms' development. The procedure is presented on two medicine supporting algorithms, performing bleeding detection in endoscopic images. Since the algorithms contain multiple parameters which must be specified, empirical testing is usually required to optimise the algorithm's...
Accelerometer-based Human Activity Recognition and the Impact of the Sample Size
Publication
- Year 2014
The presented study focused on the recognition of eight user activities (e.g. walking, lying, climbing stairs) based on measurements from an accelerometer embedded in a mobile device. It is assumed that the device is carried in a specific location of the user’s clothing. Three types of classifiers were tested on different sizes of the samples. The influence of the time window (the duration of a single trial) on selected activities...

Full text to download in external service
Comparison of selected off-the-shelf solutions for emotion recognition based on facial expressions
Publication
- Year 2016
The paper concerns accuracy of emotion recognition from facial expressions. As there are a couple of ready off-the-shelf solutions available in the market today, this study aims at practical evaluation of selected solutions in order to provide some insight into what potential buyers might expect. Two solutions were compared: FaceReader by Noldus and Xpress Engine by QuantumLab. The performed evaluation revealed that the recognition...

Full text to download in external service
Automatic Singing Voice Recognition Employing Neural Networks and Rough Sets
Publication
- Year 2008
The aim of the research is the automatic recognition of singing voices in terms of voice type and technical quality of singing. The article describes the voice database that was created, which contains voice samples of professional and amateur singers. It then describes the parameters defined on the basis of biomechanical phenomena in the vocal organ during singing. Based on the resulting parameter matrices, automatic classifiers were trained and compared...
Human-Computer Interface Based on Visual Lip Movement and Gesture Recognition
Publication
- P. Dalka
- A. Czyżewski
- International Journal of Computing Science and Mathematics - Year 2010
The multimodal human-computer interface (HCI) called LipMouse is presented, allowing a user to work on a computer using movements and gestures made with his/her mouth only. Algorithms for lip movement tracking and lip gesture recognition are presented in detail. User face images are captured with a standard webcam. Face detection is based on a cascade of boosted classifiers using Haar-like features. A mouth region is located in...

Full text to download in external service
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
Publication
- D. Piotrowski
- R. Korzeniowski
- A. Falai
- S. Cygert
- K. Pokora
- G. Tinchev
- Z. Zhang
- K. Yanagisawa
- Year 2023
In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...

Full text to download in external service
Systematic Literature Review for Emotion Recognition from EEG Signals
Publication
- P. A. Leszczełowska
- N. Dawidowska
- Year 2022
Researchers have recently become increasingly interested in recognizing emotions from electroencephalogram (EEG) signals and many studies utilizing different approaches have been conducted in this field. For the purposes of this work, we performed a systematic literature review including over 40 articles in order to identify the best set of methods for the emotion recognition problem. Our work collects information about the most...

Full text to download in external service
Automatic recognition of therapy progress among children with autism
Publication
- A. Kołakowska
- A. Landowska
- A. Anzulewicz
- K. Sobota
- Scientific Reports - Year 2017
The article presents a research study on recognizing therapy progress among children with autism spectrum disorder. The progress is recognized on the basis of behavioural data gathered via five specially designed tablet games. Over 180 distinct parameters are calculated on the basis of raw data delivered via the game flow and tablet sensors - i.e. touch screen, accelerometer and gyroscope. The results obtained confirm the possibility...

Full text available to download
Local Texture Pattern Selection for Efficient Face Recognition and Tracking
Publication
- M. Smiatacz
- J. Rumiński
- Advances in Intelligent Systems and Computing - Year 2015
This paper describes the research aimed at finding the optimal configuration of the face recognition algorithm based on local texture descriptors (binary and ternary patterns). Since the identification module was supposed to be a part of the face tracking system developed for interactive wearable computer, proper feature selection, allowing for real-time operation, became particularly important. Our experiments showed that it is...

Full text to download in external service
Feasibility Study for Food Intake Tasks Recognition Based on Smart Glasses
Publication
- M. Biallas
- A. Andrushevich
- R. Kistler
- A. Klapproth
- K. Czuszyński
- A. Bujnowski
- Journal of Medical Imaging and Health Informatics - Year 2015
In this exploratory study 13 adult test subjects have performed different food intake tasks while wearing a three axis accelerometer mounted at a temple of glasses. Two different algorithms for task recognition have been applied and compared. The retrospective data processing leads to better task recognition results when the frequency range of 50 Hz to 100 Hz is analysed within accelerometer signal recordings. A straightforward...

Full text to download in external service
Fuzzy rule-based dynamic gesture recognition employing camera & multimedia projector
Publication
- M. Lech
- B. Kostek
- Year 2010
The paper presents a system based on a camera and a multimedia projector that enables a user to control computer applications with dynamic hand gestures. The main objective is to present the gesture recognition methodology, which is based on representing the hand movement trajectory by motion vectors analyzed using fuzzy rule-based inference. The approach was engineered in the system developed with J2SE and C++ / OpenCV technology. OpenCV...

Full text to download in external service
Sign Language Recognition Using Convolution Neural Networks
Publication
- Year 2024
The objective of this work was to provide an app that can automatically recognize hand gestures from the American Sign Language (ASL) on mobile devices. The app employs a model based on Convolutional Neural Network (CNN) for gesture classification. Various CNN architectures and optimization strategies suitable for devices with limited resources were examined. InceptionV3 and VGG-19 models exhibited negligibly higher accuracy than...

Full text available to download
Integration of speech enhancement and coding techniques
Publication
- M. Kuropatwinski
- D. Leckschat
- K. Kroschel
- A. Czyzewski
- M. Kuropatwiński
- Year 1999
Full text to download in external service
Novel approaches to wideband speech coding
Publication
- M. Kulesza
- A. Czyżewski
- Year 2008
Two methods of wideband speech coding are presented. The first method uses an algorithm for time compression and expansion of the speech signal, which allows wideband coding of the speech signal with standardized codecs. This method is intended for use in adaptive speech coding algorithms. The second proposed solution concerns a new method for estimating the spectral envelope of the speech signal...

Full text to download in external service
Broadband interference in speech reinforcement systems
Publication
- H. Lasota
- R. Mazurek
- Year 2008
The article addresses the underestimated problem of how the number and distribution of loudspeakers in sound reinforcement systems affect the quality of voice transmission, that is, speech intelligibility in auditoria. The superposition of time-shifted broadband signals of the same shape and slightly different magnitudes, which reach the listener from numerous coherent sources, is accompanied by an interference phenomenon that leads to a deep modification of the received signals...
Multitask Noisy Speech Enhancement System
Publication
- A. Czyżewski
- J. Kotus
- G. Szwoch
- M. Dziubiński
- A. Rypulak
- A. Pawlik
- Year 2005
The paper describes the Multitask Noisy Speech Enhancement System. It is a specialized software package designed for recording speech signals and for improving their quality and speech intelligibility by means of advanced digital signal processing procedures. The package consists of the programs Rejestrator (Recorder), Przeglądarka (Browser) and Rekonstruktor (Reconstructor). The software can be used in cases where the intelligibility...
A system for multitask noisy speech enhancement.
Publication
- A. Czyżewski
- A. Kaczmarek
- J. Kotus
- A. Pawlik
- A. Rypulak
- P. Żwan
- Year 2004
The article presents a general description of the developed system for speech recording and reconstruction. It describes the components of the system, which is software containing advanced tools for improving speech intelligibility. The implemented tools make it possible to search sound recordings and process them using the implemented plug-ins. The article presents the algorithms used in the system...
Contextual Knowledge to Enhance Workplace Hazard Recognition and Interpretation in a Cognitive Vision Platform
Publication
- C. De
- C. Sanin
- E. Szczerbicki
- Year 2018
The combination of vision and sensor data together with the resulting necessity for formal representations builds a central component of an autonomous Cyber Physical System for detection and tracking of laborers in workplace environments. This system must be adaptable and perceive the environment as automatically as possible, performing in a variety of plants and scenes without the necessity of recoding the application for each...

Full text available to download
Difference in Perceived Speech Signal Quality Assessment Among Monolingual and Bilingual Teenage Students
Publication
- P. Falkowski-Gilski
- Year 2021
The user perceived quality is a mixture of factors, including the background of an individual. The process of auditory perception is discussed in a wide variety of fields, ranging from engineering to medicine. Many studies examine the difference between musicians and non-musicians. Since musical training develops musical hearing and other various auditory capabilities, similar enhancements should be observable in case of bilingual...

Full text to download in external service
JOURNAL OF MOLECULAR RECOGNITION

Journals

ISSN: 0952-3499 , eISSN: 1099-1352
COMPUTER SPEECH AND LANGUAGE

Journals

ISSN: 0885-2308 , eISSN: 1095-8363
SEMINARS IN SPEECH AND LANGUAGE

Journals

ISSN: 0734-0478 , eISSN: 1098-9056
Speech and Language Technology

Journals

ISSN: 1895-0434
Speech Language and Hearing

Journals

ISSN: 1361-3286 , eISSN: 2050-5728
Quarterly Journal of Speech

Journals

ISSN: 0033-5630 , eISSN: 1479-5779
SpringerBriefs in Speech Technology

Journals

ISSN: 2191-737X , eISSN: 2191-7388
Audiology and Speech Research

Journals

ISSN: 2635-5019 , eISSN: 2635-5027
Voice and Speech Review

Journals

ISSN: 2326-8263 , eISSN: 2326-8271
