Search results for: ARCHIWIZACJA AUDIO-WIDEO (audio-video archiving)

Total: 499

Bimodal deep learning model for subjectively enhanced emotion classification in films
Publication
- D. Weber
- B. Kostek
- INFORMATION SCIENCES - Year 2024
This research delves into the concept of color grading in film, focusing on how color influences the emotional response of the audience. The study commenced by recalling state-of-the-art works that process audio-video signals and associated emotions by machine learning. Then, assumptions of subjective tests for refining and validating an emotion model for assigning specific emotional labels to selected film excerpts were presented....

Full text available for download from an external service
Online sound restoration system for digital library applications
Publication
- Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...

Full text available for download from an external service
Wow defect reduction based on interpolation techniques
Publication
- P. Maziewski
- Year 2005
The paper presents the results of a study of various interpolation techniques used to reduce wow in audio. The study used linear interpolation, two polynomial interpolation techniques (Hermite and spline), and a technique based on summing windowed sinc functions. The reconstruction was carried out on an artificially prepared audio signal, reconstructed with the listed interpolation methods. The reconstruction quality was evaluated using...
Creating a Reliable Music Discovery and Recommendation System
Publication
- Year 2014
The aim of this paper is to show problems related to creating a reliable music discovery system. The SYNAT database that contains audio files is used for the purpose of experiments. The files are divided into 22 classes corresponding to music genres with different cardinality. Of utmost importance for a reliable music recommendation system are the assignment of audio files to their appropriate genres and optimum parameterization...

Full text available for download from an external service
Transmitting Alarm Information in DAB+ Broadcasting System
Publication
- P. Falkowski-Gilski
- Year 2018
The main goal of digital broadcasting is to deliver high-quality content with the lowest possible bitrate. This paper is focused on transmitting alarm information, such as emergency warning and alerting, in the DAB+ (Digital Audio Broadcasting plus) broadcasting system. These additional services should be available at the lowest possible bitrate, in order to provide a clear and understandable voice message to people. Furthermore, additional...
Influence of Low-Level Features Extracted from Rhythmic and Harmonic Sections on Music Genre Classification
Publication
- A. Rosner
- F. Weninger
- B. Schuller
- M. Michalak
- B. Kostek
- Year 2013
We present a comprehensive evaluation of the influence of 'harmonic' and rhythmic sections contained in an audio file on automatic music genre classification. The study is performed using the ISMIS database composed of music files, which are represented by vectors of acoustic parameters describing low-level music features. Non-negative Matrix Factorization serves for blind separation of instrument components. Rhythmic components...
Marek Olesz dr hab. inż.

People

Katedra Elektrotechniki i Inżynierii Wysokich Napięć (Department of Electrical Engineering and High Voltage Engineering), Wydział Elektrotechniki i Automatyki (Faculty of Electrical and Control Engineering)

Faculty of Electrical and Control Engineering, Vice-Dean for Development: dr hab. inż. Marek Olesz, prof. PG. Date of birth: 1966. Education: Politechnika Gdańska (Gdańsk University of Technology), Faculty of Electrical Engineering (1990). Academic degrees: doktor habilitowany (habilitation), Gdańsk University of Technology, Faculty of Electrical and Control Engineering (2017); doctorate, Gdańsk University of Technology, Faculty of Electrical and Control Engineering (1998). Employment: Gdańsk University of Technology: trainee assistant (1989 –...
Network and Operating System Support for Digital Audio and Video (Network and OS Support for Digital A/V)

Conferences
Online sound restoration system for digital library applications.
Publication
- Journal of the Acoustical Society of America - Year 2013
Audio signal processing algorithms were introduced to the new online non-commercial service for audio restoration intended to enhance the content of digitized audio repositories. Missing or distorted audio samples are predicted using neural networks and a specific implementation of the Janssen interpolation method based on the autoregressive model (AR) combined with the iterative restoring of missing signal samples. Since the distortion...
Face recognition by humans with gaze-tracking system Cyber-Eye
Publication
- Year 2010
To better understand how humans recognize and memorize faces, an experiment was conducted on a group of 20 people using the previously developed Cyber-Oko (Cyber-Eye) gaze fixation tracking system [3]. The system uses infrared LEDs and an infrared camera together with dedicated Cyber-Oko software, which makes it possible to track the point of gaze on the screen. Each participant in the experiment was shown...
Badanie rozpoznawania twarzy przez człowieka z wykorzystaniem systemu śledzenia fiksacji wzroku Cyber-Oko [A study of human face recognition using the Cyber-Oko gaze fixation tracking system]
Publication
- Elektronika : konstrukcje, technologie, zastosowania - Year 2011
To better understand how humans recognize and memorize faces, an experiment was conducted on a group of 20 people using the previously developed Cyber-Oko (Cyber-Eye) gaze fixation tracking system. The system uses infrared LEDs and an infrared camera together with dedicated Cyber-Oko software, which makes it possible to track the point of gaze on the screen. Each participant in the experiment was shown...

Full text available for download from an external service
Superkomputerowy system identyfikacji pojazdów na podstawie numerów rejestracyjnych [A supercomputer system for identifying vehicles by their license plate numbers]
Publication
- A. Sobecki
- M. Downar
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2013
A method of identifying vehicles by their license plate numbers is described. The identification stages are characterized and the algorithms used in the implementation of the ESIP solution (Elektroniczny System Identyfikacji Pojazdów, Electronic Vehicle Identification System) are listed. The effectiveness of the algorithms used was compared with other solutions available on the market. The deployment of the solution on the GALERA supercomputer is described. The ESIP system enables effective identification of vehicles...
Reduction of parasitic pitch variations in archival musical recordings
Publication
- SIGNAL PROCESSING - Year 2010
A new method for reducing parasitic pitch variations in archival audio recordings is presented. The method is intended for analyzing movie soundtracks recorded in optical films. It utilizes image processing for calculating and reducing effects of tape shrinkage being one of the main reasons for parasitic pitch variations in audio accompanying moving images. As long as the film tape characteristics are known the new method can be...

Full text available for download in the portal
Fitting the mobile device characteristics to the user's hearing preferences
Publication
- Year 2014
A method for fitting the mobile computer audio characteristics to the user's hearing preferences is proposed. The process consists of two stages: calibration and dynamics processing. During the calibration phase the user performs a loudness scaling test, giving their response regarding the perceived loudness. The dynamics processing performed on this basis sets the loudness to the most comfortable level. The processing accounts both...

Full text available for download from an external service
Building Knowledge for the Purpose of Lip Speech Identification
Publication
- Advances in Intelligent Systems and Computing - Year 2017
Consecutive stages of building knowledge for automatic lip speech identification are shown in this study. The main objective is to prepare audio-visual material for phonetic analysis and transcription. First, approximately 260 sentences of natural English were prepared taking into account the frequencies of occurrence of all English phonemes. Five native speakers from different countries read the selected sentences in front of...

Full text available for download from an external service
Data, Information, Knowledge, Wisdom Pyramid Concept Revisited in the Context of Deep Learning
Publication
- B. Kostek
- Year 2023
In this paper, the data, information, knowledge, and wisdom (DIKW) pyramid is revisited in the context of deep learning applied to machine learning-based audio signal processing. A discussion on the DIKW schema is carried out, resulting in a proposal that may supplement the original concept. Parallels between DIKW pertaining to audio processing are presented based on examples of the case studies performed by the author and her collaborators....

Full text available for download from an external service
1D convolutional context-aware architectures for acoustic sensing and recognition of passing vehicle type
Publication
- Year 2020
A network architecture that may be employed for sensing and recognizing the type of a vehicle on the basis of audio recordings made in the proximity of a road is proposed in the paper. The analyzed road traffic consists of both passenger cars and heavier vehicles. Excerpts from recordings that do not contain sounds of passing vehicles are also taken into account and marked as containing silence....
ERAAVG Rozpoznawanie emocji do sterowania w grach wideo [Emotion recognition for control in video games]

Projects

Project leader: dr inż. Wioleta Szwoch. Funding program: Norwegian Financial Mechanism

Project carried out at Katedra Inteligentnych Systemów Interaktywnych (Department of Intelligent Interactive Systems) under agreement Pol-Norw/210629/51/2013 dated 2013-09-25
Architektura i mechanizmy Równoległego Internetu IPv6 QoS [Architecture and mechanisms of the Parallel Internet IPv6 QoS]
Publication
- H. Tarasiuk
- W. Góralski
- J. Granat
- M. B. Jordi
- W. Szymak
- S. Hanczewski
- R. Szuman
- M. Giertych
- K. Gierłowski
- M. Natkaniec
- J. Gozdecki
- Year 2012
The article presents a proposed architecture and mechanisms of the Parallel Internet IPv6 QoS, which is considered one of three Parallel Internets in the IIP System being developed within the Inżynieria Internetu Przyszłości (Future Internet Engineering, IIP) project. The article covers functions and mechanisms that enable the operation of virtual networks for selected types of applications, such as e-health, monitoring, public safety, remote learning,...
Zapytania muzyczne do bibliotek cyfrowych [Music queries to digital libraries]
Publication
- M. Szwoch
- Year 2007
Digital libraries of music documents make it possible to store diverse multimedia music information. Besides the bibliographic description, this information may also include audio and video recordings, images of scores, and scores in digital form. To search music data effectively, music queries should be used. The chapter presents the specifics of digital music libraries and methods...
Architektura i mechanizmy Równoległego Internetu IPv6 QoS [Architecture and mechanisms of the Parallel Internet IPv6 QoS]
Publication
- K. Gierłowski
- H. Tarasiuk
- W. Góralski
- J. Granat
- J. M. Batalla
- W. Szymak
- P. Świątek
- S. Hanczewski
- R. Szuman
- M. Giertych... and 2 others
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2011
The paper presents a proposed architecture and mechanisms of the Parallel Internet IPv6 QoS, which is considered one of three Parallel Internets in the IIP System being developed within the Inżynieria Internetu Przyszłości (Future Internet Engineering, IIP) project. The paper covers functions and mechanisms that will enable the operation of virtual networks for selected types of applications, such as e-health, monitoring and public safety, remote learning,...
Image Classification Based on Video Segments
Publication
- A. Blokus
- Year 2018
In the dissertation a new method for improving the quality of classifications of images in video streams has been proposed and analyzed. In multiple fields concerning such a classification, the proposed algorithms focus on the analysis of single frames. This class of algorithms has been named OFA (One Frame Analyzed). In the dissertation, small segments of the video are considered and each image is analyzed in the context of its...

Full text available for download in the portal
Evaluation of a Novel Approach to Virtual Bass Synthesis Strategy
Publication
- P. Hoffmann
- B. Kostek
- Year 2015
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) strategy applied to portable computers. The developed algorithms involve intelligent, rule-based settings of bass synthesis parameters with regard to music genre of an audio excerpt and the type of a portable device in use. The Smart VBS algorithm performs the synthesis based on a nonlinear device (NLD) with artificial controlling synthesis...

Full text available for download from an external service
Classification of Music Genres by Means of Listening Tests and Decision Algorithms
Publication
- Year 2018
The paper compares the results of audio excerpt assignment to a music genre obtained in listening tests and classification by means of decision algorithms. A short review on music description employing music styles and genres is given. Then, assumptions of listening tests to be carried out along with an online survey for assigning audio samples to selected music genres are presented. A framework for music parametrization is created...

Full text available for download from an external service
Poradnik Webinarowy PG [PG Webinar Guide]
Online Courses
Poradnik Webinarowy PG (PWPG): material to help you organize video meetings, prepared by specialists from Katedra Inżynierii Mikrofalowej i Antenowej (Department of Microwave and Antenna Engineering), Faculty of ETI, PG.
Audiovisual speech recognition for training hearing impaired patients
Publication
- Year 2006
The paper presents a system for recognizing isolated speech sounds that uses visual and acoustic data. Active Shape Models were used to determine visual parameters based on analysis of the shape and movement of the lips in video recordings. The acoustic parameters are based on mel-cepstral coefficients. A neural network was used to recognize the uttered speech sounds based on a feature vector containing both types...
Zaawansowane Przetwarzanie Sygnału [Advanced Signal Processing]
Online Courses
- A. Szewczyk
- J. Smulko
The course presents selected signal processing methods across a very wide range of applications. It illustrates the latest achievements in this field, supported by selected publications. Classes are divided into a lecture (15 h) and a seminar (15 h). Basic concepts of digital signal processing, recommended literature. Spectral analysis: power spectral density, wavelet spectrum, polyspectra, and cross power spectral density. Effects...
Music genre classification applied to bass enhancement for mobile technology
Publication
- P. Hoffmann
- B. Kostek
- Elektronika : konstrukcje, technologie, zastosowania - Year 2015
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The proposed algorithm is related to intelligent, rule-based setting of synthesis parameters according to music genre of an audio excerpt. The classification of music genres is automatically executed employing MPEG 7 parameters and the Principal Component Analysis method applied to reduce information...

Full text available for download from an external service
Machine learning applied to acoustic-based road traffic monitoring
Publication
- K. Marciniuk
- B. Kostek
- Procedia Computer Science - Year 2022
The motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...

Full text available for download in the portal
Machine learning applied to acoustic-based road traffic monitoring
Publication
- K. Marciniuk
- B. Kostek
- Year 2022
The motivation behind this study lies in adapting acoustic noise monitoring systems for road traffic monitoring for driver’s safety. Such a system should recognize a vehicle type and weather-related pavement conditions based on the audio level measurement. The study presents the effectiveness of the selected machine learning algorithms in acoustic-based road traffic monitoring. Bases of the operation of the acoustic road traffic...

Full text available for download in the portal
FPGA-Based Real-Time Implementation of Detection Algorithm for Automatic Traffic Surveillance Sensor Network
Publication
- Journal of Signal Processing Systems for Signal Image and Video Technology - Year 2012
The article describes an FPGA hardware implementation of a vehicle detection algorithm intended for use in an autonomous sensor network. The algorithm's task is to detect moving vehicles in the image from a camera operating in real time. The algorithm aims to estimate road traffic parameters, such as the number of vehicles, their direction of movement, and their approximate speed, using the hardware of the network...

Full text available for download in the portal
Music Data Processing and Mining in Large Databases for Active Media
Publication
- B. Kostek
- P. Hoffmann
- Year 2014
The aim of this paper was to investigate the problem of music data processing and mining in large databases. Tests were performed on a large database that included approximately 30000 audio files divided into 11 classes corresponding to music genres with different cardinalities. Every audio file was described by a 173-element feature vector. To reduce the dimensionality of data the Principal Component Analysis (PCA) with variable...

Full text available for download from an external service
Nauka w świecie cyfrowym okiem młodego inżyniera - początki techniki wirtualnej rzeczywistości [Science in the digital world through the eyes of a young engineer: the beginnings of virtual reality technology]
Publication
- K. Fidurski
- P. Falkowski-Gilski
- Pismo PG - Year 2022
There are many definitions of virtual reality (VR), which overlap to a greater or lesser extent across different scientific fields. Today, when we use the term 'VR', it refers specifically to computer-generated images that have been specially designed to deliver the most immersive experience possible. Many studies also state that VR must be interactive. This would distinguish it from...

Full text available for download from an external service
Further Developments of the Online Sound Restoration System for Digital Library Applications
Publication
- Year 2014
New signal processing algorithms were introduced to the online service for audio restoration available at the web address: www.youarchive.net. Missing or distorted audio samples are estimated using a specific implementation of the Janssen interpolation method. The algorithm is based on the autoregressive model (AR) combined with the iterative complementation of signal samples. Since the interpolation algorithm is computationally...

Full text available for download from an external service
Oprogramowanie mobilnego komunikatora multimedialnego [Software for a mobile multimedia communicator]
Publication
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
The article presents the results of work on developing software for a mobile multimedia communicator. The device being designed is intended to let users communicate freely (by text, voice, and video) and to locate other users thanks to position information exchanged in the background. The paper presents the architecture of the system and of the software, developed in the Qt environment, that implements the intended functionality....
Sparse autoregressive modeling
Publication
- M. Ciołek
- Year 2012
In the paper the comparison of the popular pitch determination (PD) algorithms for the purpose of elimination of clicks from archive audio signals using sparse autoregressive (SAR) modeling is presented. The SAR signal representation has been widely used in code-excited linear prediction (CELP) systems. The appropriate construction of the SAR model is required to guarantee model stability. For this reason the signal representation...
An Approach to Bass Enhancement in Portable Computers Employing Smart Virtual Bass Synthesis Algorithms
Publication
- Year 2014
The aim of this paper is to present a novel approach to the Virtual Bass Synthesis (VBS) algorithms applied to portable computers. The developed algorithms are related to intelligent, rule-based setting of synthesis parameters according to music genre of an audio excerpt and to the type of a portable device in use. To find optimum synthesis parameters of the VBS algorithms, subjective listening tests based on a parametric procedure...

Full text available for download from an external service
Innovative method of localization airplanes in VCS (VCS-MLAT) distributed system
Publication
- S. Wiszniewski
- Year 2019
The article presents the concept and the structure of the localization module. The prototype module is part of the VCS (VCS-MLAT) distributed localization system. The device receives the audio signal transmitted in the airplane band (118 MHz – 136 MHz). Received data with timestamps are sent to the main server. Data from multiple devices are used to estimate the location of the airplane. The main aim of the project is the analysis...
Speech recognition system for hearing impaired people.
Publication
- P. Dalka
- A. Czyżewski
- Year 2005
The paper presents the results of research on speech recognition. The system under development, which uses visual and acoustic data, will facilitate training in correct speech for people after cochlear implant surgery and for others with severe hearing impairment. Active Shape Models were used to determine visual parameters based on analysis of the shape and movement of the lips in video recordings. The acoustic parameters are based on...
Pomiar obrotów i przemieszczenia cząstek T-S w zlokalizowanej strefie deformacji. [Measurement of rotations and displacements of T-S particles in a localized deformation zone.]
Publication
- Year 2003
A new test stand for experimental research under biaxial compression conditions with Taylor-Schneebeli (T-S) material is presented. The stand, unique among studies in the mechanics of granular media in general, uses an original digital measurement technique to determine the rotations and displacements of T-S particles at the micro scale. The digital measurement technique uses conventional video camera or still camera equipment...
Cross-domain applications of multimodal human-computer interfaces
Publication
- A. Czyżewski
- Year 2015
Multimodal interfaces developed for education applications and for disabled people are presented, including an interactive electronic whiteboard based on video image analysis, an application for controlling computers with mouth gestures, an audio interface for speech stretching for hearing impaired and stuttering people, and an intelligent pen allowing for diagnosing and ameliorating developmental dyslexia. The eye-gaze tracking system named...
Subjective and Objective Comparative Study of DAB+ Broadcast System
Publication
- P. Falkowski-Gilski
- J. Stefański
- Archives of Acoustics - Year 2017
Broadcasting services seek to optimize their use of bandwidth in order to maximize user’s quality of experience. They aim to transmit high-quality digital speech and music signals at the lowest bitrate. They intend to offer the best quality under available conditions. Due to bandwidth limitations, audio quality is in conflict with the number of transmitted radio programs. This paper analyzes whether the quality of real-time digital...

Full text available for download in the portal
Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention
Publication
- D. Korzekwa
- R. Barra-Chicote
- S. Zaporowski
- G. Beringer
- J. Lorenzo-Trueba
- A. Serafinowicz
- J. Droppo
- T. Drugman
- B. Kostek
- Year 2021
This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically de...

Full text available for download in the portal
AAM toolkit: a system for visual object appearance modeling
Publication
- M. Smiatacz
- D. Sikora
- Year 2010
Active appearance models (AAM) can be regarded as an advanced method of multimedia information analysis that makes it possible to locate and recognize objects in static images and video sequences. Although many publications on AAM have appeared, moving from theoretical concepts to a working implementation remains a major challenge. The paper presents a software package prepared by the authors...
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic and visual speech representations is developed. It combines audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e. the changes in the articulatory setting of speech organs for...

Full text available for download from an external service
Methodology and technology for the polymodal allophonic speech transcription
Publication
- Journal of the Acoustical Society of America - Year 2016
A method for automatic audiovisual transcription of speech employing acoustic, electromagnetic articulography, and visual speech representations is developed. It combines audio and visual modalities, which provide a synergy effect in terms of speech recognition accuracy. To establish a robust solution, basic research concerning the relation between the allophonic variation of speech, i.e., the changes in the articulatory...

Full text available for download from an external service
Sound engineering as our commitment to its creators in Poland
Publication
- B. Kostek
- A. Czyżewski
- Archives of Acoustics - Year 2019
Sound engineering is an interdisciplinary and rapidly expanding domain. It covers many aspects, such as sound perception, studio and sound mastering technology, music information retrieval including content-based search systems and automatic music transcription frameworks, sound synthesis, sound restoration, electroacoustics, and other ones constituting multimedia technology. Moreover, machine learning methods applied to the topics...

Full text available for download from an external service
MODALITY corpus - SPEAKER 17 - SEQUENCE S1
Research Data
- series: MODALITY corpus
The MODALITY corpus is a multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S4
Research Data
- series: MODALITY corpus
The MODALITY corpus is a multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
MODALITY corpus - SPEAKER 17 - SEQUENCE S2
Research Data
- series: MODALITY corpus
The MODALITY corpus is a multimodal database of word recordings in English. It consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. The corpus can be employed to develop an AVSR system,...
