Publications
total: 911
Catalog Publications
Year 2024
-
3-D Printable Metal-Dielectric Metasurface for Risley Prism-Based Beam-Steering Antennas
A 3-D printable, planar, metal-dielectric metasurface-based, 2-D beam-steering system for aperture-type antennas is presented in this paper. This beam steering system, also known as the near-field meta-steering system, comprises two fully passive phase-gradient metasurfaces placed in the antenna’s near-field region to steer the radiation beam. To address the non-uniform electric field phase of the aperture antenna, phase correction...
-
A Comparison of Directional Beamforming Capabilities: High-Order Ambisonic Microphone vs. Shotgun Microphones
This article presents the practical implications of the directional beamforming capability of a higher-order ambisonic microphone compared with popular shotgun microphones. Five different microphones were used in the study: Sennheiser MKH 416, Rode NTG2, Panasonic AG-MC200, Zoom SGH-6, and Zylia ZM-1 (ambisonic microphone). The results highlight the versatility of higher-order ambisonics for non-immersive use, which allows for...
-
A Mammography Data Management Application for Federated Learning
This study aimed to develop and assess an application designed to enhance the management of a local client database consisting of mammographic images with a focus on ensuring that images are suitably and uniformly prepared for federated learning applications. The application supports a comprehensive approach, starting with a versatile image-loading function that supports DICOM files from various medical imaging devices and settings....
-
Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning
In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher...
-
Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
Generalized Continual Category Discovery (GCCD) tackles learning from sequentially arriving, partially labeled datasets while uncovering new categories. Traditional methods depend on feature distillation to prevent forgetting the old knowledge. However, this strategy restricts the model’s ability to adapt and effectively distinguish new categories. To address this, we introduce a novel technique integrating a learnable projector...
-
Decoding imagined speech for EEG-based BCI
Brain–computer interfaces (BCIs) are systems that transform the brain's electrical activity into commands to control a device. To create a BCI, it is necessary to establish the relationship between a certain stimulus, internal or external, and the brain activity it provokes. A common approach in BCIs is motor imagery, which involves imagining limb movement. Unfortunately, this approach allows few commands. As an alternative, this...
-
Deep learning techniques for biometric security: A systematic review of presentation attack detection systems
Biometric technology, including finger vein, fingerprint, iris, and face recognition, is widely used to enhance security in various devices. In the past decade, significant progress has been made in improving biometric systems, thanks to advancements in deep convolutional neural networks (DCNN) and computer vision (CV), along with large-scale training datasets. However, these systems have become targets of various attacks, with...
-
Developing a Low SNR Resistant, Text Independent Speaker Recognition System for Intercom Solutions - A Case Study
This article presents a case study on the development of a biometric voice verification system for an intercom solution, utilizing the DeepSpeaker neural network architecture. Despite the variety of solutions available in the literature, there is a noted lack of evaluations for "text-independent" systems under real conditions and with varying distances between the speaker and the microphone. This article aims to bridge this gap....
-
Divide and not forget: Ensemble of selectively trained experts in Continual Learning
Class-incremental learning is becoming more popular as it helps models widen their applicability while not forgetting what they already know. A trend in this area is to use a mixture-of-expert technique, where different models work together to solve the task. However, the experts are usually trained all at once using whole task data, which makes them all prone to forgetting and increasing computational burden. To address this limitation,...
-
English Language Learning Employing Developments in Multimedia IS
In the realm of the development of information systems related to education, integrating multimedia technologies offers novel ways to enhance foreign language learning. This study investigates audio-video processing methods that leverage real-time speech rate adjustment and dynamic captioning to support English language acquisition. Through a mixed-methods analysis involving participants from a language school, we explore the impact...
-
Exploring music listening patterns: an online survey
An online survey was carried out to explore how respondents listen to music recordings. It was anticipated that the listener’s preferences would be influenced by various factors, such as age, music genre, the contexts in which they listen, and their favored methods of music consumption. Consequently, the data were collected to analyze these relationships. The survey, structured as a web application, encompassed 23 questions,...
-
Finger Vein Presentation Attack Detection Method Using a Hybridized Gray-Level Co-Occurrence Matrix Feature with Light-Gradient Boosting Machine Model
Presentation Attack Detection (PAD) is crucial in biometric finger vein recognition. The susceptibility of these systems to forged finger vein images is a significant challenge. Existing approaches to mitigate presentation attacks have computational complexity limitations and limited data availability. This study proposed a novel method for identifying presentation attacks in finger vein biometric systems. We have used optimal...
-
High frequency oscillations in human memory and cognition: a neurophysiological substrate of engrams?
Despite advances in understanding the cellular and molecular processes underlying memory and cognition, and recent successful modulation of cognitive performance in brain disorders, the neurophysiological mechanisms remain underexplored. High frequency oscillations beyond the classic electroencephalogram spectrum have emerged as a potential neural correlate of fundamental cognitive processes. High frequency oscillations are detected...
-
Identyfikacja instrumentu muzycznego z nagrania fonicznego za pomocą sztucznych sieci neuronowych [Identification of a musical instrument from an audio recording using artificial neural networks]
The aim of this dissertation is to investigate algorithms for identifying instruments present in a polyphonic signal using artificial neural networks. The theoretical part reviews the fundamentals of audio signal processing in the context of extracting the signal parameters used to train neural networks. In addition, the development of machine learning methods is analyzed, taking into account the division into neural networks of the first,...
-
Improving platelet‐RNA‐based diagnostics: a comparative analysis of machine learning models for cancer detection and multiclass classification
Liquid biopsy demonstrates excellent potential in patient management by providing a minimally invasive and cost-effective approach to detecting and monitoring cancer, even at its early stages. Due to the complexity of liquid biopsy data, machine-learning techniques are increasingly gaining attention in sample analysis, especially for multidimensional data such as RNA expression profiles. Yet, there is no agreement in the community...
-
Learning sperm cells part segmentation with class-specific data augmentation
Infertility affects around 15% of couples worldwide. Male fertility problems include poor sperm quality and low sperm count. The advanced fertility treatment methods like ICSI are nowadays supported by vision systems to assist embryologists in selecting good quality sperm. Computer-Assisted Semen Analysis (CASA) provides quantitative and qualitative sperm analysis concerning concentration, motility, morphology, vitality, and fragmentation....
-
Leveraging Activation Maps for Improved Acoustic Events Detection and Classification
This paper presents a novel approach to enhance the accuracy of deep learning models for acoustic event detection and classification in real-world environments. We introduce a method that leverages activation maps to identify and address model overfitting, combined with an expert-knowledge-based event detection algorithm for data pre-processing. Our approach significantly improved classification performance, increasing the F1 score...
-
Looking through the past: better knowledge retention for generative replay in continual learning
In this work, we improve the generative replay in a continual learning setting to perform well on challenging scenarios. Because of the growing complexity of continual learning tasks, it is becoming more popular to apply the generative replay technique in the feature space instead of the image space. Nevertheless, such an approach does not come without limitations. In particular, we notice the degradation of the continually trained...
-
MagMax: Leveraging Model Merging for Seamless Continual Learning
This paper introduces a continual learning approach named MagMax, which utilizes model merging to enable large pre-trained models to continuously learn from new data without forgetting previously acquired knowledge. Distinct from traditional continual learning methods that aim to reduce forgetting during task training, MagMax combines sequential fine-tuning with a maximum magnitude weight selection for effective knowledge integration...
-
Missing Puzzle Pieces in Dementia Research: HCN Channels and Theta Oscillations
Increasing evidence indicates a role of hyperpolarization activated cation (HCN) channels in controlling the resting membrane potential, pacemaker activity, memory formation, sleep, and arousal. Their dysfunction may be associated with the development of epilepsy and age-related memory decline. Neuronal hyperexcitability involved in epileptogenesis and EEG desynchronization occur in the course of dementia in human Alzheimer’s Disease...
-
Mobilenet-V2 Enhanced Parkinson's Disease Prediction with Hybrid Data Integration
This study investigates the role of deep learning models, particularly MobileNet-v2, in Parkinson's Disease (PD) detection through handwriting spiral analysis. Handwriting difficulties often signal early signs of PD, necessitating early detection tools due to potential impacts on patients' work capacities. The study utilizes a three-fold approach, including data augmentation, algorithm development for simulated PD image datasets,...
-
Opracowanie metodologii rozpoznawania i klasyfikowania emocji w filmach przy użyciu sztucznych sieci neuronowych [Development of a methodology for recognizing and classifying emotions in films using artificial neural networks]
The aim of this doctoral dissertation is to develop a methodology for recognizing and classifying emotions in films using artificial neural networks. The work presents topics related to the coloring of film scenes in the context of the effect of color on the viewer's emotions. To analyze the influence of films on the viewer's emotions, a selection of film titles was made, followed by a series of preliminary subjective tests that make it possible to...
-
Reverberation divergence in VR applications
This project aimed to investigate the correlation between virtual reality (VR) imagery and ambisonic sound. With the increasing popularity of VR applications, understanding how sound is perceived in virtual environments is crucial for enhancing the immersiveness of the experience. In the experiment, participants were immersed in a virtual environment that replicated a concert hall. Their task was to assess the correspondence between...
-
Revisiting Supervision for Continual Representation Learning
In the field of continual learning, models are designed to learn tasks one after the other. While most research has centered on supervised continual learning, there is a growing interest in unsupervised continual learning, which makes use of the vast amounts of unlabeled data. Recent studies have highlighted the strengths of unsupervised methods, particularly self-supervised learning, in providing robust representations. The improved...
-
Sounding Mechanism of a Flue Organ Pipe—A Multi-Sensor Measurement Approach
This work presents an approach that integrates the results of measuring, analyzing, and modeling air flow phenomena driven by pressurized air in a flue organ pipe. The investigation concerns a Bourdon organ pipe. Measurements are performed in an anechoic chamber using the Cartesian robot equipped with a 3D acoustic vector sensor (AVS) that acquires both acoustic pressure and air particle velocity. Also, a high-speed camera is employed...
-
Task-recency bias strikes back: Adapting covariances in Exemplar-Free Class Incremental Learning
Exemplar-Free Class Incremental Learning (EFCIL) tackles the problem of training a model on a sequence of tasks without access to past data. Existing state-of-the-art methods represent classes as Gaussian distributions in the feature extractor's latent space, enabling Bayes classification or training the classifier by replaying pseudo features. However, we identify two critical issues that compromise their efficacy when the feature...
-
The Impact of Foreign Accents on the Performance of Whisper Family Models Using Medical Speech in Polish
The article presents preliminary experiments investigating the impact of accent on the performance of the Whisper automatic speech recognition (ASR) system, specifically for the Polish language and medical data. The literature review revealed a scarcity of studies on the influence of accents on speech recognition systems in Polish, especially concerning medical terminology. The experiments involved voice cloning of selected individuals...
Year 2023
-
A commonly-accessible toolchain for live streaming music events with higher-order ambisonic audio and 4k 360 vision
An immersive live stream is especially interesting in the ongoing development of telepresence tools, especially in the virtual reality (VR) or mixed reality (MR) domain. This paper explores the remote and immersive way of enabling telepresence for the audience to high-fidelity music performance using freely-available and easily-accessible tools. A functional VR live-streaming toolchain, comprising 360 vision and higher-order ambisonic...
-
A survey of automatic speech recognition deep models performance for Polish medical terms
Among the numerous applications of speech-to-text technology is the support of documentation created by medical personnel. There are many available speech recognition systems for doctors. Their effectiveness in languages such as Polish should be verified. In connection with our project in this field, we decided to check how well the popular speech recognition systems work, employing models trained for the general Polish language....
-
An automated, low-latency environment for studying the neural basis of behavior in freely moving rats
Background: Behavior consists of the interaction between an organism and its environment, and is controlled by the brain. Brain activity varies at sub-second time scales, but behavioral measures are usually coarse (often consisting of only binary trial outcomes). Results: To overcome this mismatch, we developed the Rat Interactive Foraging Facility (RIFF): a programmable interactive arena for freely moving rats with multiple feeding...
-
Applying the Lombard Effect to Speech-in-Noise Communication
This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...
-
Autoencoder application for anomaly detection in power consumption of lighting systems
Detecting energy consumption anomalies is a popular topic of industrial research, but there is a noticeable lack of research reported in the literature on energy consumption anomalies for road lighting systems. However, there is a need for such research because the lighting system, a key element of the Smart City concept, creates new monitoring opportunities and challenges. This paper examines algorithms based on the deep learning...
-
Automatic audio signal mixing system based on one-dimensional Wave-U-Net autoencoders
The purpose of this dissertation is to develop an automatic song mixing system that is capable of automatically mixing a song with good quality in any music genre. This work recalls first the audio signal processing methods used in audio mixing, and it describes selected methods for automatic audio mixing. Then, a novel architecture built based on one-dimensional Wave-U-Net autoencoders is proposed for automatic music mixing. Models...
-
Bimodal Emotion Recognition Based on Vocal and Facial Features
Emotion recognition is a crucial aspect of human communication, with applications in fields such as psychology, education, and healthcare. Identifying emotions accurately is challenging, as people use a variety of signals to express and perceive emotions. In this study, we address the problem of multimodal emotion recognition using both audio and video signals, to develop a robust and reliable system that can recognize emotions...
-
Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice
The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron,...
-
Computer-Aided Diagnosis of COVID-19 from Chest X-ray Images Using Hybrid-Features and Random Forest Classifier
In recent years, a lot of attention has been paid to using radiology imaging to automatically find COVID-19. (1) Background: There are now a number of computer-aided diagnostic schemes that help radiologists and doctors perform diagnostic COVID-19 tests quickly, accurately, and consistently. (2) Methods: Using chest X-ray images, this study proposed a cutting-edge scheme for the automatic recognition of COVID-19 and pneumonia....
-
Cross-Lingual Knowledge Distillation via Flow-Based Voice Conversion for Robust Polyglot Text-to-Speech
In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations...
-
Detection of Water on Road Surface with Acoustic Vector Sensor
This paper presents a new approach to detecting the presence of water on a road surface, employing an acoustic vector sensor. The proposed method is based on sound intensity analysis in the frequency domain. Acoustic events, representing road vehicles, are detected in the sound intensity signals. The direction of the incoming sound is calculated for the individual spectral components of the intensity signal, and the components...
-
Digital Transformation and Its Influence on Sustainable Manufacturing and Business Practices
The paper focuses on the relationship between businesses and digital transformation, and how digital transformation has changed manufacturing in several ways. Aspects like Cloud Computing, vertical and horizontal integration, data communication, and the internet have contributed to sustainable manufacturing by decentralizing supply chains. In addition, digital transformation inventions such as predictive analysis and big data analytics...
-
Direct electrical brain stimulation of human memory: lessons learnt and future perspectives
Modulation of cognitive functions supporting human declarative memory is one of the grand challenges of neuroscience, and of vast importance for a variety of neuropsychiatric, neurodegenerative and neurodevelopmental diseases. Despite a recent surge of successful attempts at improving performance in a range of memory tasks, the optimal approaches and parameters for memory enhancement have yet to be determined. On a more fundamental...
-
Distinct hippocampal-prefrontal neural assemblies coordinate memory encoding, maintenance, and recall
Short-term memory enables incorporation of recent experience into subsequent decision-making. This processing recruits both the prefrontal cortex and hippocampus, where neurons encode task cues, rules, and outcomes. However, precisely which information is carried when, and by which neurons, remains unclear. Using population decoding of activity in rat medial prefrontal cortex (mPFC) and dorsal hippocampal CA1, we confirm that mPFC...
-
Driver’s Condition Detection System Using Multimodal Imaging and Machine Learning Algorithms
To this day, driver fatigue remains one of the most significant causes of road accidents. In this paper, a novel way of detecting and monitoring a driver’s physical state has been proposed. The goal of the system was to make use of multimodal imaging from RGB and thermal cameras working simultaneously to monitor the driver’s current condition. A custom dataset was created consisting of thermal and RGB video samples. Acquired data...
-
Energy consumption optimization in wastewater treatment plants: Machine learning for monitoring incineration of sewage sludge
Biomass management in terms of energy consumption optimization has become a recent challenge for developed countries. Nevertheless, the multiplicity of materials and operating parameters controlling energy consumption in wastewater treatment plants necessitates the need for sophisticated well-organized disciplines in order to minimize energy consumption and dissipation. Sewage sludge (SS) disposal management is the key stage of...
-
Ensembling noisy segmentation masks of blurred sperm images
Background: Sperm tail morphology and motility have been demonstrated to be important factors in determining sperm quality for in vitro fertilization. However, many existing computer-aided sperm analysis systems leave the sperm tail out of the analysis, as detecting a few tail pixels is challenging. Moreover, some publicly available datasets for classifying morphological defects contain images limited only to the sperm head. This...
-
Facilitating free travel in the Schengen area—A position paper by the European Association for Biometrics
Due to migration, terror-threats and the viral pandemic, various EU member states have re-established internal border control or even closed their borders. European Association for Biometrics (EAB), a non-profit organisation, solicited the views of its members on ways which biometric technologies and services may be used to help with re-establishing open borders within the Schengen area while at the same time mitigating any adverse...
-
How Can We Identify Electrophysiological iEEG Activities Associated with Cognitive Functions?
Electrophysiological activities of the brain are engaged in its various functions and give rise to a wide spectrum of low and high frequency oscillations in the intracranial EEG (iEEG) signals, commonly known as the brain waves. The iEEG spectral activities are distributed across networks of cortical and subcortical areas arranged into hierarchical processing streams. It remains a major challenge to identify these activities in...
-
Intra-subject class-incremental deep learning approach for EEG-based imagined speech recognition
Brain–computer interfaces (BCIs) aim to decode brain signals and transform them into commands for device operation. The present study aimed to decode the brain activity during imagined speech. The BCI must identify imagined words within a given vocabulary and thus perform the requested action. A possible scenario when using this approach is the gradual addition of new words to the vocabulary using incremental learning methods....
-
Multimedia industrial and medical applications supported by machine learning
This article outlines a keynote paper presented at the Intelligent Decision Technologies conference, part of the KES Multi-theme Conference “Smart Digital Futures” organized in Rome on June 14–16, 2023. It briefly discusses projects related to traffic control using developed intelligent traffic signs and diagnosing the health of wind turbine mechanisms and multimodal biometric authentication for banking branches to provide...
-
Neural Graph Collaborative Filtering: Analysis of Possibilities on Diverse Datasets
This paper continues the work by Wang et al. [17]. Its goal is to verify the robustness of the NGCF (Neural Graph Collaborative Filtering) technique by assessing its ability to generalize across different datasets. To achieve this, we first replicated the experiments conducted by Wang et al. [17] to ensure that their replication package is functional. We received slightly better results for ndcg@20 and somewhat poorer results for...
-
Optimizing Medical Personnel Speech Recognition Models Using Speech Synthesis and Reinforcement Learning
Text-to-Speech synthesis (TTS) can be used to generate training data for building Automatic Speech Recognition models (ASR). Access to medical speech data is limited because it is sensitive data that is difficult to obtain for privacy reasons; TTS can help expand the data set. Speech can be synthesized by mimicking different accents, dialects, and speaking styles that may occur in a medical language. Reinforcement Learning (RL), in the...