displaying 1000 best results Help
Search results for: AUTOMATIC SPEECH RECOGNITION, WHISPER, MEDICAL LANGUAGE RECOGNITION, SPEECH PROCESSING
-
MODALITY corpus - SPEAKER 17 - COMMANDS C4
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - SEQUENCE S5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 32 - COMMANDS C2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 35 - COMMANDS C2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - COMMANDS C5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - COMMANDS C2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - SEQUENCE S6
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 35 - COMMANDS C6
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 32 - COMMANDS C5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 35 - COMMANDS C5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - COMMANDS C4
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - SEQUENCE S3
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - COMMANDS C3
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 33 - SEQUENCE S5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 27 - SEQUENCE S2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
The design of an intelligent medical space supporting automated patient interviewing
PublicationThe article presents the architecture and results of implementing an application for the intelligent medical space UbiDoDo (Ubiquitous Domestic Doctor's Office). The main purpose of the application is real-time monitoring of the biomedical parameters of a patient in his domestic environment. It allows an immediate reaction to appearing symptoms and provides means to automatically interview the patient and deliver his results to...
-
The data exchange between smart glasses and healthcare information systems using the HL7 FHIR standard
PublicationIn this study we evaluated system architecture for the use of smart glasses as a viewer of information, as a source of medical data (vital sign measurements: temperature, pulse rate, and respiration rate), and as a filter of healthcare information. All activities were based on patient/device identification procedures using graphical markers or features based on visual appearance. The architecture and particular use cases were implemented...
-
Music Mood Visualization Using Self-Organizing Maps
PublicationDue to an increasing amount of music being made available in digital form in the Internet, an automatic organization of music is sought. The paper presents an approach to graphical representation of mood of songs based on Self-Organizing Maps. Parameters describing mood of music are proposed and calculated and then analyzed employing correlation with mood dimensions based on the Multidimensional Scaling. A map is created in which...
-
International Joint Conference on Natural Language Processing
Conferences -
Towards facts extraction from text in Polish language
PublicationNatural Language Processing (NLP) finds many usages in different fields of endeavor. Many tools exists allowing analysis of English language. For Polish language the situation is different as the language itself is more complicated. In this paper we show differences between NLP of Polish and English language. Existing solutions are presented and TEAMS software for facts extraction is described. The paper shows also evaluation of...
-
Towards New Mappings between Emotion Representation Models
PublicationThere are several models for representing emotions in affect-aware applications, and available emotion recognition solutions provide results using diverse emotion models. As multimodal fusion is beneficial in terms of both accuracy and reliability of emotion recognition, one of the challenges is mapping between the models of affect representation. This paper addresses this issue by: proposing a procedure to elaborate new mappings,...
-
SEMANTIC ANALYSIS ALGORITHMS FOR KNOWLEDGE WORKERS SUPPORT
PublicationThe paper examines various aspects of text analysis application for knowledge worker’s activity realization. Conclusions are drawn about the relevance and importance of processing the non-structured textual information in order to increase knowledge worker’s efficiency, as well as their awareness in different branches of science. The paper considers the existing algorithms of texts semantic analysis as the sphere of documents topical...
-
Joint Conference on New Methods in Language Processing and Computational Natural Language Learning
Conferences -
Using Eye-tracking to get information on the skills acquisition by the radiology residents
PublicationThis paper describes the possibility of monitoring the progress of knowledge and skills acquisition by the students of radiology. It is achieved by an analysis of a visual attention distribution patterns during image-based tasks solving. The concept is to use the eye-tracking data to recognize the way how the radiographic images are read by recognized experts, radiography residents involved in the training program, and untrained...
-
An electronic nose for quantitative determination of gas concentrations
PublicationThe practical application of human nose for fragrance recognition is severely limited by the fact that our sense of smell is subjective and gets tired easily. Consequen tly, there is considerable need for an instrument that can be a substitution of the human sense of smell. Electronic nose devices from the mid 1980s are used in growing number of applications. They comprise an array of several electrochemical gas sensors...
-
Semi complex navigation with an active optical gesture sensor
PublicationThis paper presents the methods of diversified touchless interactions between a user and a mobile platform utilizing the optical gesture sensor. The sensor uses 8 photodiodes to measure the reflected light in the active mode (using embedded LEDs) or it measures shadows caused by fingers in the passive mode. Several algorithms were implemented: automatic mode switching, adaptive illumination level compensation, resolution improvements...
-
Development and tuning of irregular divide-and-conquer applications in DAMPVM/DAC
PublicationThis work presents implementations and tuning experiences with parallel irregular applications developed using the object oriented framework DAM-PVM/DAC. It is implemented on top of DAMPVM and provides automatic partitioning of irregular divide-and-conquer (DAC) applications at runtime and dynamic mapping to processors taking into account their speeds and even loads by other user processes. New implementations of parallel applications...
-
Audio content analysis in the urban area telemonitoring system
PublicationArtykuł przedstawia możliwości rozwinięcie monitoringu miejskiego o automatyczną analizę dźwięku. Przedstawiono metody parametryzacji dźwięku, które możliwe są do zastosowania w takim systemie oraz omówiono aspekty techniczne implementacji. W kolejnej części przedstawiono system decyzyjny oparty na drzewach zastosowany w systemie. System ten rozpoznaje dźwięki niebezpieczne (strzał, rozbita szyba, krzyk) wśród dźwięków zarejestrowanych...
-
Focus on Misinformation: Improving Medical Experts’ Efficiency of Misinformation Detection
PublicationFighting medical disinformation in the era of the global pandemic is an increasingly important problem. As of today, automatic systems for assessing the credibility of medical information do not offer sufficient precision to be used without human supervision, and the involvement of medical expert annotators is required. Thus, our work aims to optimize the utilization of medical experts’ time. We use the dataset of sentences taken...
-
Video Semantic Analysis Framework based on Run-time Production Rules - Towards Cognitive Vision
PublicationThis paper proposes a service-oriented architecture for video analysis which separates object detection from event recognition. Our aim is to introduce new tools to be considered in the pathway towards Cognitive Vision as a support for classical Computer Vision techniques that have been broadly used by the scientific community. In the article, we particularly focus in solving some of the reported scalability issues found in current...
-
Intelligent video and audio applications for learning enhancement
PublicationThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Intelligent multimedia solutions supporting special education needs.
PublicationThe role of computers in school education is briefly discussed. Multimodal interfaces development history is shortly reviewed. Examples of applications of multimodal interfaces for learners with special educational needs are presented, including interactive electronic whiteboard based on video image analysis, application for controlling computers with facial expression and speech stretching audio interface representing audio modality....
-
Evaluation Criteria for Affect-Annotated Databases
PublicationIn this paper a set of comprehensive evaluation criteria for affect-annotated databases is proposed. These criteria can be used for evaluation of the quality of a database on the stage of its creation as well as for evaluation and comparison of existing databases. The usefulness of these criteria is demonstrated on several databases selected from affect computing domain. The databases contain different kind of data: video or still...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S4
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S2
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S5
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S3
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
MODALITY corpus - SPEAKER 17 - SEQUENCE S6
Open Research DataThe MODALITY corpus is one of the multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
-
The project IDENT: Multimodal biometric system for bank client identity verification
PublicationBiometric identity verification methods are implemented inside a real banking environment comprising: dynamic handwritten signature verification, face recognition, bank cli-ent voice recognition and hand vein distribution verification. A secure communication system based on an intra-bank client-server architecture was designed for this purpose. Hitherto achieved progress within the project is reported in this paper with a focus...
-
DevEmo—Software Developers’ Facial Expression Dataset
PublicationThe COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...
-
International Conference on Recent Advances in Natural Language Processing
Conferences -
Contactless hearing aid designed for infants
PublicationIt is a well known fact that language development through home intervention for a hearing-impaired infant should start in the early months of a newborn baby's life. The aim of this paper is to present a concept of a contactless digital hearing aid designed especially for infants. In contrast to all typical wearable hearing aid solutions (ITC, ITE, BTE), the proposed device is mounted in the infant's bed with any parts of its set-up...
-
Affect-awareness framework for intelligent tutoring systems
PublicationThe paper proposes a framework for construction of Intelligent Tutoring Systems (ITS), that take into consideration student emotional states and make affective interventions. The paper provides definitions of `affect-aware systems' and `affective interventions' and describes the concept of the affect-awareness framework. The proposed framework separates emotion recognition from its definition, processing and making decisions on...
-
Affective computing and affective learning – methods, tools and prospects
PublicationEvery teacher knows that interest, active participation and motivation are important factors in the learning process. At the same time e-learning environments almost always address only the cognitive aspects of education. This paper provides a brief review of methods used for affect recognition, representation and processing as well as investigates how these methods may be used to address affective aspect of e-education. The paper...
-
In uence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classi cation
PublicationWe present a comprehensive evaluation of the infuence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classi cation. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...
-
Sensors integration in the smart home environment - a proposal to solve the problem with user identification
PublicationIn this preliminary study we, investigate the possibility of user recognition techniques suitable on smart home devices like chairs, beds, aiming for low–power, high accuracy and quick response time. We propose the two well know technique: voice speaker recognition and accelerometer signal from device mounted on the chair, and the third one optical system basing on IR LED transmitter/receiver circuit. The preliminary results proved...
-
Engineering Challenges in the Design of Cochlear Implants
PublicationHearing aids such as cochlear implants have been used by both adults and children for a long time. In addition, cochlear implants are used by patients who have severe hearing loss either by birth or after an accident. This paper aims to investigate the engineering challenges bounding the design of cochlear implants and present its possible solution...
-
Szymon Andrzejewski dr
PeopleMaster’s degree at the University of Gdańsk in 2008 Major in political system and self-government. Overgraduate studies at the Gdańsk University of Technology „Management and evaluation of projects financed from EU funds” and at AGH University of Science and Technology Noise protection against noise and vibration. Student of sociology PhD studies at the University of Gdańsk from 2016. The research scope is democracy and institutions...
-
Analysis-by-synthesis paradigm evolved into a new concept
PublicationThis work aims at showing how the well-known analysis-by-synthesis paradigm has recently been evolved into a new concept. However, in contrast to the original idea stating that the created sound should not fail to pass the foolproof synthesis test, the recent development is a consequence of the need to create new data. Deep learning models are greedy algorithms requiring a vast amount of data that, in addition, should be correctly...