Search results for: DATASET QUALITY
-
Segmentation Quality Refinement in Large-Scale Medical Image Dataset with Crowd-Sourced Annotations
Publication: Deployment of different techniques of deep learning including Convolutional Neural Networks (CNN) in image classification systems has accomplished outstanding results. However, the advantages and potential impact of such a system can be completely negated if it does not reach a target accuracy. To achieve high classification accuracy with low variance in a medical image classification system, there is needed the large size of the...
-
Dataset Characteristics and Their Impact on Offline Policy Learning of Contextual Multi-Armed Bandits
Publication: The Contextual Multi-Armed Bandits (CMAB) framework is pivotal for learning to make decisions. However, due to challenges in deploying online algorithms, there is a shift towards offline policy learning, which relies on pre-existing datasets. This study examines the relationship between the quality of these datasets and the performance of offline policy learning algorithms, specifically, Neural Greedy and NeuraLCB. Our results...
-
RDF dataset profiling - a survey of features, methods, vocabularies and applications
Publication: The Web of Data, and in particular Linked Data, has seen tremendous growth over the past years. However, reuse and take-up of these rich data sources is often limited and focused on a few well-known and established RDF datasets. This can be partially attributed to the lack of reliable and up-to-date information about the characteristics of available datasets. While RDF datasets vary heavily with respect to the features related...
-
Effective Air Quality Prediction Using Reinforced Swarm Optimization and Bi-Directional Gated Recurrent Unit
Publication: In the present scenario, air quality prediction (AQP) is a complex task due to high variability, volatility, and dynamic nature in space and time of particulates and pollutants. Recently, several nations have had poor air quality due to the high emission of particulate matter (PM2.5) that affects human health conditions, especially in urban areas. In this research, a new optimization-based regression model was implemented for effective...
-
Balanced Spider Monkey Optimization with Bi-LSTM for Sustainable Air Quality Prediction
Publication: A reliable air quality prediction model is required for pollution control, human health monitoring, and sustainability. The existing air quality prediction models lack efficiency due to overfitting in prediction model and local optima trap in feature selection. This study proposes the Balanced Spider Monkey Optimization (BSMO) technique for effective feature selection to overcome the local optima trap and overfitting problems....
-
Using Synchronously Registered Biosignals Dataset for Teaching Basics of Medical Data Analysis – Case Study
Publication: Medical data analysis and processing rely strongly on the data quality itself. The correct data registration allows many unnecessary steps in data processing to be avoided. Moreover, it takes a certain amount of experience to acquire data that can produce replicable results. Because consistency is crucial in the teaching process, students have access to pre-recorded real data without the necessity of using additional equipment...
-
Methods for quality improvement of multibeam and LiDAR point cloud data in the context of 3D surface reconstruction
Publication: Point cloud dataset is the transitional data model used in several marine and land remote-sensing applications. During further steps of processing, the transformation of point cloud spatial data to more complex models containing higher order geometric structures like edges and facets may be possible, if an appropriate quality level of input data is provided. Point cloud datasets usually contain a considerable amount of undesirable...
-
Influence of Soft Soil Samples Quality on the Compressibility and Undrained Shear Strength – Seven Lessons Learned From the Vistula Marshlands
Publication: This technical article presents the influence of sample quality on the compressibility parameters and undrained shear strength (c_u) of soft soils from the Vistula Marshlands. The analysis covers: (1) quality of soft soil according to three criteria: void ratio (Δe/e_0 index), volumetric strain (Δε_v) and C_r/C_c ratio; (2) influence of storage time on quality; (3) influence of sample quality on undrained shear strength...
-
Detection of the Oocyte Orientation for the ICSI Method Automation
Publication: Automation or even computer assistance of the popular infertility treatment method: ICSI (Intracytoplasmic Sperm Injection) would speed up the whole process and improve the control of the results. This paper introduces preliminary research on automatic spermatozoon injection into the oocyte cytoplasm. Here, a method for detecting the correct orientation of the polar body of the oocyte is presented. The proposed method uses deep...
-
Using Isolation Forest and Alternative Data Products to Overcome Ground Truth Data Scarcity for Improved Deep Learning-based Agricultural Land Use Classification Models
Publication: High-quality labelled datasets represent a cornerstone in the development of deep learning models for land use classification. The high cost of data collection, the inherent errors introduced during data mapping efforts, the lack of local knowledge, and the spatial variability of the data hinder the development of accurate and spatially-transferable deep learning models in the context of agriculture. In this paper, we investigate...
-
Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia
Publication: The paper presents an approach to build references (also called mappings) between WordNet and Wikipedia. We propose four algorithms used for automatic construction of the references. Then, based on an aggregation algorithm, we produce an initial set of mappings that has been evaluated in a cooperative way. For that purpose, we implement a system for the distribution of evaluation tasks, that have been solved by the user community....
-
Simplified AutoDock force field for hydrated binding sites
Publication: ... has been extracted from the Protein Data Bank and used to test and recalibrate the AutoDock force field. Since for some binding sites water molecules are crucial for bridging the receptor-ligand interactions, they have to be included in the analysis. To simplify the process of incorporating water molecules into the binding sites and make it less ambiguous, a new simple water model was created. After recalibration of the force field on...
-
Cyanobacterial and Algal Strains in the Culture Collection of Baltic Algae (CCBA)
Publication: The dataset titled Microalgal strains from “Culture Collection of Baltic Algae (CCBA)” is a representation of cyanobacterial and microalgal cultures isolated from the Baltic Sea. It is a unique catalogue of strains of the dominant and rare species found in the Baltic phytoplankton and microphytobenthos assemblages. The main purpose of the collection is to extend the knowledge on the Baltic microbial communities by providing...
-
The Verification of the Usefulness of Electronic Nose Based on Ultra-Fast Gas Chromatography and Four Different Chemometric Methods for Rapid Analysis of Spirit Beverages
Publication: Spirit beverages are a diverse group of foodstuffs. They are very often counterfeited, which causes the appearance of low quality products or wrongly labelled products on the market. It is important to find a proper quality control and botanical origin method enabling at the same time a preliminary check of the composition of investigated samples, which was the main goal of this work. For this purpose, the usefulness of electronic nose...
-
Investigating Noise Interference on Speech Towards Applying the Lombard Effect Automatically
Publication: The aim of this study is two-fold. First, we perform a series of experiments to examine the interference of different noises on speech processing. For that purpose, we concentrate on the Lombard effect, an involuntary tendency to raise speech level in the presence of background noise. Then, we apply this knowledge to detecting speech with the Lombard effect. This is for preparing a dataset for training a machine learning-based...
-
Application of Multivariate Adaptive Regression Splines (MARSplines) Methodology for Screening of Dicarboxylic Acids Cocrystal Using 1D and 2D Molecular Descriptors
Publication: Dicarboxylic acids (DiAs) are probably one of the most popular cocrystal formers. Due to the high hydrophilicity and non-toxicity, they are promising solubilizers of active pharmaceutical ingredients (APIs). Although DiAs appear to be highly capable of forming multicomponent crystals with various compounds, some systems reported in the literature are physical mixtures in the solid state without forming a stable intermolecular complex....
-
Two Stage SVM and kNN Text Documents Classifier
Publication: The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
-
Learning sperm cells part segmentation with class-specific data augmentation
Publication: Infertility affects around 15% of couples worldwide. Male fertility problems include poor sperm quality and low sperm count. The advanced fertility treatment methods like ICSI are nowadays supported by vision systems to assist embryologists in selecting good quality sperm. Computer-Assisted Semen Analysis (CASA) provides quantitative and qualitative sperm analysis concerning concentration, motility, morphology, vitality, and fragmentation....
-
Local variability in snow concentrations of chlorinated persistent organic pollutants as a source of large uncertainty in interpreting spatial patterns at all scales
Publication: Single point sampling, a widespread practice in snow studies in remote areas, due to logistical constraints, can present an unquantified error to the final study results. The low concentrations of studied chemicals, such as chlorinated persistent organic pollutants, contribute to the uncertainty. We conducted a field experiment in the Arctic to estimate the error stemming from differences in the composition of snow at short distances...
-
Exploring the Usability and User Experience of Social Media Apps through a Text Mining Approach
Publication: This study aims to evaluate the applicability of a text mining approach for extracting UUX-related issues from a dataset of user comments and not to evaluate the Instagram (IG) app. This study analyses textual data mined from reviews in English written by IG mobile application users. The article’s authors used text mining (based on the LDA algorithm) to identify the main UUX-related topics. Next, they mapped the identified topics...
-
Photoplethysmographic Time-Domain Heart Rate Measurement Algorithm for Resource-Constrained Wearable Devices and its Implementation
Publication: This paper presents an algorithm for the measurement of the human heart rate, using photoplethysmography (PPG), i.e., the detection of the light at the skin surface. The signal from the PPG sensor is processed in the time domain; the peaks in the preprocessed and conditioned PPG waveform are detected by using a peak detection algorithm to find the heart rate in real time. Apart from the PPG sensor, the accelerometer is also used to...
-
Exploring Relationships Between Data in Enterprise Information Systems by Analysis of Log Contents
Publication: Enterprise systems are inherently complex and maintaining their full, up-to-date overview poses a serious challenge to the enterprise architects’ teams. This problem encourages the search for automated means of discovering knowledge about such systems. An important aspect of this knowledge is understanding the data that are processed by applications and their relationships. In our previous work, we used application logs of an enterprise...
-
Process of Medical Dataset Construction for Machine Learning-Multifield Study and Guidelines
Publication: The acquisition of high-quality data and annotations is essential for the training of efficient machine learning algorithms, while being an expensive and time-consuming process. Although the process of data processing and training and testing of machine learning models is well studied and considered in the literature, the actual procedures of obtaining data and their annotations in collaboration with physicians are in most cases...
-
Musical Instrument Identification Using Deep Learning Approach
Publication: The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata...
-
Evaluating the risk of endometriosis based on patients’ self-assessment questionnaires
Publication: Background: Endometriosis is a condition that significantly affects the quality of life of about 10% of reproductive-aged women. It is characterized by the presence of tissue similar to the uterine lining (endometrium) outside the uterus, which can lead to scarring, adhesions, pain, and fertility issues. While numerous factors associated with endometriosis are documented, a wide range of symptoms may still be undiscovered. Methods: In...
-
Ensembling noisy segmentation masks of blurred sperm images
Publication: Background: Sperm tail morphology and motility have been demonstrated to be important factors in determining sperm quality for in vitro fertilization. However, many existing computer-aided sperm analysis systems leave the sperm tail out of the analysis, as detecting a few tail pixels is challenging. Moreover, some publicly available datasets for classifying morphological defects contain images limited only to the sperm head. This...
-
Applying the Lombard Effect to Speech-in-Noise Communication
Publication: This study explored how the Lombard effect, a natural or artificial increase in speech loudness in noisy environments, can improve speech-in-noise communication. This study consisted of several experiments that measured the impact of different types of noise on synthesizing the Lombard effect. The main steps were as follows: first, a dataset of speech samples with and without the Lombard effect was collected in a controlled setting;...
-
Selection of an artificial pre-training neural network for the classification of inland vessels based on their images
Publication: Artificial neural networks (ANN) are the most commonly used algorithms for image classification problems. An image classifier takes an image or video as input and classifies it into one of the possible categories that it was trained to identify. They are applied in various areas such as security, defense, healthcare, biology, forensics, communication, etc. There is no need to create one’s own ANN because there are several pre-trained...
-
Cost-Efficient Multi-Objective Design of Miniaturized Microwave Circuits Using Machine Learning and Artificial Neural Network
Publication: Designing microwave components involves managing multiple objectives such as center frequencies, impedance matching, and size reduction for miniaturized structures. Traditional multi-objective optimization (MO) approaches heavily rely on computationally expensive population-based methods, especially when executed with full-wave electromagnetic (EM) analysis to guarantee reliability. This paper introduces a novel and cost-effective...
-
Fusion-based Representation Learning Model for Multimode User-generated Social Network Content
Publication: As mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. Due to the multi-source heterogeneous nature of the data, big data fusion offers both promise and drawbacks. With the rise of mobile networks and applications, UGC, which includes...
-
Rediscovering Automatic Detection of Stuttering and Its Subclasses through Machine Learning—The Impact of Changing Deep Model Architecture and Amount of Data in the Training Set
Publication: This work deals with automatically detecting stuttering and its subclasses. An effective classification of stuttering along with its subclasses could find wide application in determining the severity of stuttering by speech therapists, preliminary patient diagnosis, and enabling communication with the previously mentioned voice assistants. The first part of this work provides an overview of examples of classical and deep learning...
-
Detecting type of hearing loss with different AI classification methods: a performance review
Publication: Hearing is one of the most crucial senses for all humans. It allows people to hear and connect with the environment, the people they can meet and the knowledge they need to live their lives to the fullest. Hearing loss can have a detrimental impact on a person's quality of life in a variety of ways, ranging from fewer educational and job opportunities due to impaired communication to social withdrawal in severe situations. Early...
-
Reliable computationally-efficient behavioral modeling of microwave passives using deep learning surrogates in confined domains
Publication: The importance of surrogate modeling techniques has been steadily growing over the recent years in high-frequency electronics, including microwave engineering. Fast metamodels are employed to speed up design processes, especially those conducted at the level of full-wave electromagnetic (EM) simulations. The surrogates enable massive system evaluations at nearly EM accuracy and negligible costs, which is invaluable in parameter...
-
Tool Wear Monitoring Using Improved Dragonfly Optimization Algorithm and Deep Belief Network
Publication: In recent decades, tool wear monitoring has played a crucial role in the improvement of industrial production quality and efficiency. In the machining process, it is important to predict both tool cost and life, and to reduce the equipment downtime. The conventional methods need enormous quantities of human resources and expert skills to achieve precise tool wear information. To automatically identify the tool wear types, deep...
-
Identification of High-Value Dataset determinants: is there a silver bullet for efficient sustainability-oriented data-driven development?
Publication: Open Government Data (OGD) are seen as one of the trends that has the potential to benefit the economy, improve the quality, efficiency, and transparency of public administration, and change the lives of citizens, and the society as a whole facilitating efficient sustainability-oriented data-driven services. However, the quick achievement of these benefits is closely related to the “value” of the OGD, i.e., how useful, and reusable...
-
Long-Term Measurement of Physiological Parameters – Child Dataset
Publication: The dataset titled “Long-term measurement of physiological parameters – child” is one dataset of the bigger series named Long-term measurement of physiological parameters. The dataset contains physiological parameter measurements such as skin temperature and resistance, blood pulse, as well as the stress detection marker, which can have a value of 0 when there is no stress detected or 1 when stress appeared. Additionally, the dataset...
-
Model-Based Adaptive Machine Learning Approach in Concrete Mix Design
Publication: Concrete mix design is one of the most critical issues in concrete technology. This process aims to create a concrete mix which helps deliver concrete with desired features and quality. Contemporary requirements for concrete concern not only its structural properties, but also increasingly its production process and environmental friendliness, forcing concrete producers to use both chemically and technologically complex concrete...
-
Reducing Monitoring Costs in Industrially Contaminated Rivers: Cluster and Regression Analysis Approach
Publication: Monitoring contamination in river water is an expensive procedure, particularly for developing countries where pollution is a significant problem. This study was conducted to provide a pollution monitoring strategy that reduces the cost of laboratory analysis. The new monitoring strategy was designed as a result of cluster and regression analysis on field data collected from an industrially influenced river. Pollution sources in...
-
AC Motor Voltage and Audible Noise Dataset
Publication: The dataset titled AC motor voltage and audible noise waveforms in ship’s electrical drive systems with frequency converters contains the voltage and sound measurement results recorded in a marine frequency controlled AC drive system. The dataset is part of research focussing on the impact of the ship’s electrical drive systems with frequency converters on vibrations and the level of audible noise on ships. The dataset allows the...
-
G2DC-PL+: a gridded 2 km daily climate dataset for the union of the Polish territory and the Vistula and Odra basins
Publication: G2DC-PL+, a gridded 2 km daily climate dataset for the union of the Polish territory and the Vistula and Odra basins, is an update and extension of the CHASE-PL Forcing Data – Gridded Daily Precipitation and Temperature Dataset – 5 km (CPLFD-GDPT5). The latter was the first publicly available, high-resolution climate forcing dataset in Poland, used for a range of purposes including hydrological modelling and bias correction of...
-
Evaluating the Use of Edge Device Towards Fall Detection in Smart City Environment
Publication: This paper presents the development and preliminary testing of a fall detection algorithm that leverages OpenPose for real-time human pose estimation from video feeds. The system is designed to function optimally within a range of up to 7 meters from ground-level cameras, focusing exclusively on detected human silhouettes to enhance processing efficiency. The performance of the proposed approach was evaluated using accuracy values...
-
AITP - AI Thermal Pedestrians Dataset
Publication: Efficient pedestrian detection is a very important task in ensuring safety within road conditions, especially after sunset. One way to achieve this goal is to use thermal imaging in conjunction with deep learning methods and an annotated dataset for model training. In this work, such a dataset has been created by capturing thermal images of pedestrians in different weather and traffic conditions. All images were manually annotated...
-
The Optimum Dataset method – examples of the application
Publication: Data reduction is a procedure to decrease the dataset in order to make its analysis more effective and easier. Reduction of the dataset is an issue that requires proper planning, so that after reduction it meets all the user’s expectations. Evidently, it is better if the result is an optimal solution in terms of adopted criteria. Among the reduction methods that provide the optimal solution is the Optimum Dataset method (OptD)...
-
Petrophysical analyses of rock construction materials from the Roman rural settlement in Podšilo bay on Rab island (NE Adriatic, Croatia)
Publication: This article presents the results of petrophysical analyses of limestones and sandstones used for the construction of the wall structures of a Roman rural settlement located in Podšilo Bay on Rab Island (Croatia). An on-site analysis of the walls indicated the use of different lithotypes, which is an uncommon case in the area. So far, no petrophysical properties of the applied materials have been tested, and their provenance...
-
Impedance Spectra of RC Model as a Result of Testing Pulse Excitation Measurement Method Dataset
Publication: The dataset titled Impedance spectra of RC model as a result of testing pulse excitation measurement method contains the impedance spectrum of an exemplary test RC model obtained using pulse excitation. The dataset allows presentation of the accuracy of the impedance spectroscopy measuring instrument, which uses the pulse excitation method to shorten the time of the whole spectrum acquisition.
-
Video of LEGO Bricks on Conveyor Belt Dataset Series
Publication: The dataset series titled Video of LEGO bricks on conveyor belt is composed of 14 datasets containing video recordings of a moving white conveyor belt. The recordings were created using a smartphone camera in Full HD resolution. The dataset allows for the preparation of data for neural network training, and building of a LEGO sorting machine that can help builders to organise their collections.
-
DevEmo—Software Developers’ Facial Expression Dataset
Publication: The COVID-19 pandemic has increased the relevance of remote activities and digital tools for education, work, and other aspects of daily life. This reality has highlighted the need for emotion recognition technology to better understand the emotions of computer users and provide support in remote environments. Emotion recognition can play a critical role in improving the remote experience and ensuring that individuals are able...
-
Application Of Generative Adversarial Network for Data Augmentation and Multiplication to Automated Cell Segmentation of the Corneal Endothelium
Publication: Considering the automatic segmentation of the endothelial layer, the available data of the corneal endothelium is still limited to a few datasets, typically containing an average of only about 30 images. To fill this gap, this paper introduces the use of Generative Adversarial Networks (GANs) to augment and multiply data. By using the “Alizarine” dataset, we train a model to generate a new synthetic dataset with over 513k images....
-
High-Resolution Wind Wave Parameters in the Area of the Gulf of Gdańsk During 21 Extreme Storms
Publication: This dataset contains the results of wind-wave parameter modelling in the area of the Gulf of Gdańsk (Southern Baltic). For the simulations, a high resolution SWAN model was used. The dataset consists of the significant wave height, the direction of the wave approaching the shore and the wave period during 21 historical, extreme storms. The storms were selected by an automatic search over the 44-year-long significant wave height...
-
Measurement of the Temporal and Spatial Temperature Distribution on the Surface of PVCP Tissue Phantom Illuminated by Laser Dataset
Publication: The dataset entitled Measurement of the temporal and spatial temperature distribution on the surface of PVCP tissue phantom illuminated by laser was obtained with a laboratory set-up for characterisation of the thermal properties of optical tissue phantoms during laser irradiation. The dataset contains a single image file representing the spatial temperature distribution on the surface of a PVCP tissue phantom. This thermal image...