Search results for: CORPUS - Bridge of Knowledge

Multimodal English corpus for automatic speech recognition

Publication

- Year 2013

A multimodal corpus developed for research on speech recognition based on audio-visual data is presented. Besides the usual video and sound excerpts, the prepared database also contains thermovision images and depth maps. All streams were recorded simultaneously, so the corpus makes it possible to examine the importance of the information provided by different modalities. Based on the recordings, it is also possible to develop a speech...

COLOUR TERMS IN INORGANIC CHEMISTRY: A CORPUS STUDY

Publication

D. Stanulewicz
K. Radomyski

- Scientific Journal of National Pedagogical Dragomanov University. Series 9. Current Trends in Language Development - Year 2022

Full text to download in external service

An audio-visual corpus for multimodal automatic speech recognition

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

A review of available audio-visual speech corpora and a description of a new multimodal corpus of English speech recordings is provided. The new corpus containing 31 hours of recordings was created specifically to assist audio-visual speech recognition systems (AVSR) development. The database related to the corpus includes high-resolution, high-framerate stereoscopic video streams from RGB cameras, depth imaging stream utilizing Time-of-Flight...

Full text available to download

THE ADJECTIVES LIGHT AND DARK IN ASTROPHYSICAL TEXTS: A CORPUS STUDY

Publication

D. Stanulewicz
K. Radomyski

- Scientific Journal of National Pedagogical Dragomanov University. Series 9. Current Trends in Language Development - Year 2023

Full text to download in external service

A Parallel Corpus-Based Approach to the Crime Event Extraction for Low-Resource Languages

Publication

N. Khairova
O. Mamyrbayev
N. Rizun
M. Razno
G. Ybytayeva

- IEEE Access - Year 2023

These days, a lot of crime-related events take place all over the world. Most of them are reported in news portals and social media. Crime-related event extraction from the published texts can allow monitoring, analysis, and comparison of police or criminal activities in different countries or regions. Existing approaches to event extraction mainly suggest processing texts in English, French, Chinese, and some other resource-rich...

Full text available to download

The Presidential Campaign of Małgorzata Kidawa-Błońska in Media Discourse: Analysis Based on Statistical Corpus Analysis and Topic Modelling

Publication

W. Świerczyńska-Głownia
J. Wieczorek
T. Walkowiak

- Year 2024

Full text to download in external service

KORPUS MOWY ANGIELSKIEJ DO CELÓW MULTIMODALNEGO AUTOMATYCZNEGO ROZPOZNAWANIA MOWY

Publication

- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016

The paper presents an audio-visual speech corpus containing 31 hours of English speech recordings. The corpus is intended for automatic audio-visual speech recognition. It contains video recordings from a high-frame-rate stereo camera and audio recorded by a microphone array and a laptop microphone. Thanks to the inclusion of recordings made in noisy conditions, the corpus...

Once in a season – the pragmatic function of fuck in “BoJack Horseman” TV Show

Publication

B. Grobelna

- Galactica Media-Journal of Media Studies - Galaktika Media-Zhurnal Media Issledovanij - Year 2023

This article investigates the use and pragmatic functions of the swear word fuck in “BoJack Horseman”, produced by Netflix, and bridges the gap in the linguistic research on this particular TV show. Incorporating corpus linguistics tools, the BoJack Horseman Corpus was compiled and the lemma fuck has been investigated and analysed from the multimodal perspective....

Methodology of Constructing and Analyzing the Hierarchical Contextually-Oriented Corpora

Publication

- Year 2018

Methodology of Constructing and Analyzing the Hierarchical structure of the Contextually-Oriented Corpora was developed. The methodology contains the following steps: Contextual Component of the Corpora’s Structure Building; Text Analysis of the Contextually-Oriented Hierarchical Corpus. Main contribution of this study is the following: hierarchical structure of the Corpus provides advanced possibilities for identification of the...

Full text available to download

Phraseological Units in Audiovisual Translation. A Case Study of Polish Dubbing of Disney’s 'The Little Mermaid'

Publication

P. Golda
J. Mężyk

- Kwartalnik Neofilologiczny - Year 2021

The paper aims to discuss phraseological units as the object of audiovisual translation in the Polish dubbing of Disney’s 'The Little Mermaid', to discuss the role of phraseological translation techniques, and to present possible translation inconsistencies. A theoretical introduction presents definitions for crucial terms. It is followed by the analysis of the corpus of phraseological units in Disney’s The Little Mermaid and...

Full text available to download

Agile Commerce in the light of Text Mining

Publication

A. Baj-Rogowska

- Przedsiębiorczość i Zarządzanie - Year 2017

The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...

Full text available to download

Weakly-Supervised Word-Level Pronunciation Error Detection in Non-Native English Speech

Publication

D. Korzekwa
J. Lorenzo-Trueba
T. Drugman
S. Calamaro
B. Kostek

- Year 2021

We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonetic transcriptions for L2 speech means that the model has to learn only from a weak signal of word-level mispronunciations. Because of that and due to the limited amount of mispronounced...

Full text available to download

Language material for English audiovisual speech recognition system development. Materiał językowy do wykorzystania w systemie audiowizualnego rozpoznawania mowy angielskiej

Publication

A. Czyżewski
B. Kostek
T. Ciszewski
D. Majewicz

- Year 2013

The bi-modal speech recognition system requires a 2-sample language input for training and for testing algorithms which precisely depicts natural English speech. For the purposes of the audio-visual recordings, a training data base of 264 sentences (1730 words without repetitions; 5685 sounds) has been created. The language sample reflects vowel and consonant frequencies in natural speech. The recording material reflects both the...

Unités phraséologiques au pays de la traduction: transfert des collocations nomino-adjectivales avec le lexème «femme» dans la traduction de la littérature houellebecquienne du français vers l’italien et le polonais

Publication

P. Golda

- Linguistica silesiana - Year 2022

The present paper examines the transfer of nomino-adjectival collocations based on the word ‘femme’ (‘woman’) in the literary translation from French into Italian and Polish. The lexical connection analysed in the article can be defined as the habitual juxtaposition of a word with another word (or words) that has a significant frequency in a given language. The research corpus comprises seven novels by Michel Houellebecq written...

Full text available to download

Contextual ontology for tonality assessment

Publication

- Procedia Computer Science - Year 2020

classification tasks. The discussion focuses on two important research hypotheses: (1) whether it is possible to construct such an ontology from a corpus of textual documents, and (2) whether it is possible and beneficial to use inferencing from this ontology to support the process of sentiment classification. To support the first hypothesis we present a method of extraction of hierarchy of contexts from a set of textual documents...

Full text available to download

S’attaquer à la suprématie du masculin sur le féminin : le français inclusif dans les publications des universités françaises dans les réseaux sociaux

Publication

P. Golda
N. Żywicka
V. Ferreira Vieira

- Neophilologica. Etudes semantico-syntaxiques des langues romanes. Prace Naukowe Uniwersytetu Slaskiego w Katowicach - Year 2021

This paper aims to examine the use of inclusive French in the Internet publications of Paris universities on their social media. Three higher education institutions were selected: Paris Dauphine-PSL University, Gustave Eiffel University, and Sorbonne Paris North University. The publications were obtained from Facebook, Instagram, and LinkedIn. Firstly, the groups of people to whom the use of inclusive French referred...

Full text available to download

Electrochemical Evaluation of Sustainable Corrosion Inhibitors via Dynamic Electrochemical Impedance Spectroscopy

Publication

P. Ślepski
H. Gerengi
G. Gece
E. Kaya
M. Rizvi
M. Szociński

- Year 2021

Finding suitable measurement methods for the effective management of electrochemical problems is of paramount importance, particularly for improving efficiency in corrosion protection. The need for accurate measurement techniques specific to nonstationary conditions has long been recognized, and promising approaches have emerged. This chapter introduces dynamic electrochemical impedance spectroscopy as a novel advancement in electrochemistry...

Full text to download in external service

Constructing a Dataset of Speech Recordings with Lombard Effect

Publication

D. Weber
S. Zaporowski
D. Korzekwa

- Year 2020

The purpose of the recordings was to create a speech corpus based on the ISLE dataset, extended with video and Lombard speech. Selected from a set of 165 sentences, 10, evaluated as having the highest possibility to occur in the context of the Lombard effect, were repeated in the presence of the so-called babble speech to obtain Lombard speech features. Altogether, 15 speakers were recorded, and speech parameters were...

Enriching the Context: Methods of Improving the Non-contextual Assessment of Sentence Credibility

Publication

A. Nabożny
B. Balcerzak
D. Korzinek

- Year 2019

This paper presents several methods of automatic context enrichment of sentences that need to be evaluated, tagged or fact-checked by human judges. We have created a corpus of medical Web articles. Sentences from this corpus have been fact-checked by medical experts in two modes: contextually (reading the entire article and evaluating sentence by sentence) and without context (evaluating sentences from all articles in random order)....

Full text to download in external service

English, French, and Polish Aliases of Criminals: Diversity of Inspirations in their Creation and Typical Nicknaming Schemes

Publication

P. Golda
J. Mężyk

- Academic Journal of Modern Philology - Year 2021

The present paper examines the topic of aliases of criminals, which seems to be understudied in linguistic research. Therefore, this article’s primary goal is to describe how criminals’ aliases are created and what are the differences and similarities in that process in English, French, and Polish. Firstly, the theoretical background concerning the topic of pseudonyms is presented. Then, the corpus gathered for this paper (available...

Full text available to download

A comparative study of English viseme recognition methods and algorithm

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018

The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector...

Full text available to download

A comparative study of English viseme recognition methods and algorithms

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2018

The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is a review of various approaches to the problem, the implementation of algorithms proposed in the literature and a comparative study of their effectiveness. In the course of the study an optimal feature vector construction...

Full text available to download

Sésame, ouvre-toi: internationalisme phraséologique à contenu universel

Publication

P. Golda
O. Karabag
J. Ryszka

- Studia Linguistica - Year 2023

Phraseological units, characterised by their opaque meaning, are the subject of multiple theoretical works. The following article adds to this discussion by providing another interesting example. It analyses the case of the Arabic phraseological unit ‘open sesame’ from the “Ali Baba and the Forty Thieves” folk tale, permeating into French, Italian, Polish, Turkish and Japanese – languages distant both linguistically and culturally....

Full text available to download

Investigating Feature Spaces for Isolated Word Recognition

Publication

P. Treigys
G. Korvel
G. Tamulevicius
J. Bernataviciene
B. Kostek

- Year 2020

The study addresses the issues related to the appropriateness of a two-dimensional representation of speech signal for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation converted to the investigated feature spaces. In particular, waveforms and fractal dimension features of the signal were chosen for the time domain, and...

Full text to download in external service

Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary

Publication

- Year 2018

This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...

Full text available to download

Reaktywny system oddziaływania ze środowiskiem oparty na inteligentnym systemie decyzyjnym

Publication

Z. Kowalczuk

- Year 2009

Cognitive processes occurring in the human mind, once mathematically modelled and expressed as algorithms, can be used to construct intelligent decision systems. Such systems have many applications. They can be found, among other places, in various autonomous systems in computer science, automation, and robotics: from an 'intelligent' guard, butler, etc., to a caregiver, a virtual companion...

Investigating Feature Spaces for Isolated Word Recognition

Publication

G. Korvel
G. Tamulevicius
P. Treigys
J. Bernataviciene
B. Kostek

- Year 2018

Researchers have given much attention to the speech processing task in automatic speech recognition (ASR) over the past decades. The study addresses the appropriateness of a two-dimensional representation of speech feature spaces for speech recognition tasks based on deep learning techniques. The approach combines Convolutional Neural Networks (CNNs) and time-frequency signal representation...

Audio Feature Analysis for Precise Vocalic Segments Classification in English

Publication

- Year 2020

An approach to identifying the most meaningful Mel-Frequency Cepstral Coefficients representing selected allophones and vocalic segments for their classification is presented in the paper. For this purpose, experiments were carried out using algorithms such as Principal Component Analysis, Feature Importance, and Recursive Parameter Elimination. The data used were recordings made within the ALOFON corpus containing audio signal...

Full text to download in external service

Glossary [Intellectual Output 1] Glossary as a method for reflection on complex research questions

Publication

- Year 2022

Globalization and digitization are strongly influencing the process of shaping the built environment. The latter is causing the new design tools to emerge faster than ever before in history, while the former is speeding up not only the development, but also the broad roll-out of more agile and interdisciplinary methodologies and work approaches. The design process is also becoming more and more inter- and trans-disciplinary. This...

Full text to download in external service

The Algorithm of Modelling and Analysis of Latent Semantic Relations: Linear Algebra vs. Probabilistic Topic Models

Publication

N. Rizun
W. Waloszek
Y. Taranenko

- Year 2017

This paper presents the algorithm of modelling and analysis of Latent Semantic Relations inside the argumentative type of documents collection. The novelty of the algorithm consists in using a systematic approach: in the combination of the probabilistic Latent Dirichlet Allocation (LDA) and Linear Algebra based Latent Semantic Analysis (LSA) methods; in considering each document as a complex of topics, defined on the basis of separate...

Full text available to download

