Julian Szymański - Publications - Bridge of Knowledge

An IoT-Based Computational Framework for Healthcare Monitoring in Mobile Environments

Publication

H. Mora
D. Gil
R. Munoz Terol
J. Azorin-Lopez
J. Szymański

- SENSORS - Year 2017

The new Internet of Things paradigm enables the design of small devices with sensing, processing and communication capabilities, which in turn enable the development of sensors, embedded devices and other ‘things’ ready to understand the environment. In this paper, a distributed framework based on the Internet of Things paradigm is proposed for monitoring human biomedical signals in activities involving physical exertion. The main...

Full text available to download

Blockchain technologies to address smart city and society challenges

Publication

J. Szymański
H. Mora
J. Mendoza-Tello
E. Varela-guzmán

- COMPUTERS IN HUMAN BEHAVIOR - Year 2021

New Information and Communications Technologies (ICT) are changing the way in which the world works. These technologies provide new tools to face the issues of contemporary society (poverty, migrations, sustainable development challenges, governance, etc.). Among them, blockchain has emerged as a disruptive technology that can do things in a completely different and innovative way. Blockchain technologies can provide solutions where before there were...

Full text to download in external service

Detection of the Bee Queen Presence Using Sound Analysis

Publication

T. Cejrowski
J. Szymański
H. Mora
D. Gil

- Year 2018

This work describes the system and methods of data analysis we use for beehive monitoring. We present an overview of the hardware infrastructures used in hive monitoring systems and describe algorithms used for analysis of this kind of data. Based on the acquired signals, we construct an application that is capable of detecting the absence of a honey bee queen. We describe our method of signal analysis and present results that allow us...

Full text available to download

Review of the Complexity of Managing Big Data of the Internet of Things

Publication

D. Gil
M. Johnsson
H. Mora
J. Szymański

- COMPLEXITY - Year 2019

There is a growing awareness that the complexity of managing Big Data is one of the main challenges in the developing field of the Internet of Things (IoT). Complexity arises from several aspects of the Big Data life cycle, such as gathering data, storing them onto cloud servers, and cleaning and integrating the data, a process involving the latest advances in ontologies, such as Extensible Markup Language (XML) and Resource Description...

Full text available to download

Comparative Analysis of Text Representation Methods Using Classification

Publication

J. Szymański

- CYBERNETICS AND SYSTEMS - Year 2014

In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text to download in external service

MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems

Publication

- SIMULATION MODELLING PRACTICE AND THEORY - Year 2017

In this paper we present a new environment called MERPSYS that allows simulation of parallel application execution time on cluster-based systems. The environment offers a modeling application using the Java language extended with methods representing message passing type communication routines. It also offers a graphical interface for building a system model that incorporates various hardware components such as CPUs, GPUs, interconnects...

Full text available to download

Modelling the malware propagation in mobile computer devices

Publication

M. Signes-Pont
A. Cortés-Castillo
H. Mora-Mora
J. Szymański

- COMPUTERS & SECURITY - Year 2018

Nowadays malware is a major threat to the security of cyber activities. The rapid development of the Internet and the progressive implementation of the Internet of Things (IoT) increase the security needs of networks. This research presents a theoretical model of malware propagation for mobile computer devices. It is based on the susceptible-exposed-infected-recovered-susceptible (SEIRS) epidemic model. The scheme is based on...

Full text to download in external service
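The SEIRS scheme named in the abstract can be illustrated with a minimal discrete-time sketch of the generic textbook model. The rate names (`beta` for S to E, `sigma` for E to I, `gamma` for I to R, `xi` for R back to S), the forward-Euler update and the example values are assumptions for illustration only; the paper's own scheme and parameters are not given in this summary.

```python
"""Minimal discrete-time SEIRS sketch (generic textbook form, not the paper's model)."""
from dataclasses import dataclass


@dataclass
class SEIRSState:
    s: float  # susceptible fraction of devices
    e: float  # exposed: infected but not yet spreading malware
    i: float  # infectious
    r: float  # recovered, e.g. cleaned or patched


def seirs_step(x: SEIRSState, beta: float, sigma: float, gamma: float, xi: float) -> SEIRSState:
    """Advance one time step with forward-Euler updates of the SEIRS flows."""
    new_exposed = beta * x.s * x.i   # contacts between susceptible and infectious devices
    new_infectious = sigma * x.e     # incubation period ends
    new_recovered = gamma * x.i      # malware removed
    lost_immunity = xi * x.r         # protection expires, device becomes susceptible again
    return SEIRSState(
        s=x.s - new_exposed + lost_immunity,
        e=x.e + new_exposed - new_infectious,
        i=x.i + new_infectious - new_recovered,
        r=x.r + new_recovered - lost_immunity,
    )


def simulate(x0: SEIRSState, steps: int, **rates: float) -> list:
    """Return the trajectory [x0, x1, ..., x_steps]."""
    history = [x0]
    for _ in range(steps):
        history.append(seirs_step(history[-1], **rates))
    return history


if __name__ == "__main__":
    traj = simulate(SEIRSState(0.99, 0.0, 0.01, 0.0), 200,
                    beta=0.5, sigma=0.2, gamma=0.1, xi=0.05)
    last = traj[-1]
    print(f"S={last.s:.3f} E={last.e:.3f} I={last.i:.3f} R={last.r:.3f}")
```

Because every flow leaves one compartment and enters another, S + E + I + R stays constant; the `xi` term is what separates SEIRS from SEIR, letting cleaned devices become vulnerable again.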

Distributed Architectures for Intensive Urban Computing: A Case Study on Smart Lighting for Sustainable Cities

Publication

H. Mora
J. Peral
A. Ferrandez
D. Gil
J. Szymański

- IEEE Access - Year 2019

New information and communication technologies have contributed to the development of the smart city concept. On a physical level, this paradigm is characterised by the deployment of a substantial number of different devices that can sense their surroundings and generate a large amount of data. The most typical examples are image and video acquisition sensors. These types of sensors are now found in abundance in urban spaces and are responsible...

Full text available to download

RDF dataset profiling - a survey of features, methods, vocabularies and applications

Publication

M. B. Ellefi
B. Zohra
J. G. Breslin
E. Demidova
S. Dietze
K. Todorov
J. Szymański

- Semantic Web - Year 2018

The Web of Data, and in particular Linked Data, has seen tremendous growth over the past years. However, reuse and take-up of these rich data sources is often limited and focused on a few well-known and established RDF datasets. This can be partially attributed to the lack of reliable and up-to-date information about the characteristics of available datasets. While RDF datasets vary heavily with respect to the features related...

Deep learning in the fog

Publication

A. Sobecki
J. Szymański
D. Gil
H. Mora

- International Journal of Distributed Sensor Networks - Year 2019

In the era of a ubiquitous Internet of Things and rapid advances in artificial intelligence, especially thanks to deep learning networks and hardware acceleration, we face rapid growth of highly decentralized and intelligent solutions that offer data processing functionality closer to the end user. The Internet of Things usually produces a huge amount of data which, to be analyzed effectively, especially with neural networks, demands high...

Full text available to download

Buzz-based recognition of the honeybee colony circadian rhythm

Publication

- COMPUTERS AND ELECTRONICS IN AGRICULTURE - Year 2020

Honeybees are among the most highly valued pollinators. Their work as individuals is appreciated for crop pollination and honey production. It is believed that the work of an entire bee colony is intense and almost continuous. The goal of the work presented in this paper is to identify the bee circadian rhythm using sound-based analysis. In our research, as a source of information on the bee colony, we use its buzz, which has been...

Full text available to download

External Validation Measures for Nested Clustering of Text Documents

Publication

- Year 2011

This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like the Rand statistic, the Hubert Γ statistic or the F-measure, are not applicable in nested clustering cases. In addition to earlier work, where the F-measure was adapted to hierarchical classification as the hF-measure, here some methods to...
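As a reference point for why flat indices break down, here is a minimal sketch of the standard pair-counting Rand statistic for flat partitions (not one of the nested-clustering measures the paper proposes). It asks, for every pair of objects, whether both labelings put them in the same single cluster; when clusters are nested and an object belongs to a cluster and all its ancestors, that question no longer has a single answer. Function and variable names are illustrative.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand statistic: fraction of object pairs on which two flat partitions agree.

    A pair agrees if both labelings put it in the same cluster, or both put it
    in different clusters. Each object must carry exactly one label.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    agree = total = 0
    for a, b in combinations(range(len(labels_true)), 2):
        same_true = labels_true[a] == labels_true[b]
        same_pred = labels_pred[a] == labels_pred[b]
        agree += int(same_true == same_pred)
        total += 1
    return agree / total if total else 1.0


if __name__ == "__main__":
    # Splitting one reference cluster in two breaks agreement on exactly one pair.
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

The index is invariant to renaming cluster labels, which is why it is popular for flat clusterings; the pairwise "same cluster" test is the assumption that nested structures violate.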

Mining relations between Wikipedia categories

Publication

J. Szymański

- Communications in Computer and Information Science - Year 2010

Methods for inducing relations between the categories that organize a collection of documents are described. Results of applying the proposed approach to improve the Wikipedia category system are presented.

Words context analysis for improvement of information retrieval

Publication

J. Szymański

- Year 2012

In the article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing English Wikipedia with the Hyperspace Analogue to Language (HAL) model of word similarity. For test phrases, we evaluate retrieval with direct user queries as well as retrieval with the context vectors of these queries. The results indicate that the proposed method can not...
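The Hyperspace Analogue to Language model mentioned above can be sketched in a few lines. This follows the commonly described HAL weighting (a context word `d` positions away inside a sliding window of size `window` contributes `window - d + 1`, and a word's vector joins its preceding and following contexts); the window size, tokenization and how query context vectors feed into retrieval in the paper are not stated here, so the toy corpus and parameters below are assumptions.

```python
import math
from collections import defaultdict


def hal_vectors(tokens, window=5):
    """Build HAL-style context vectors from a token sequence.

    A context word `dist` positions before a target adds window - dist + 1
    to their co-occurrence weight, so closer words count more. Each word's
    vector joins its preceding ("L") and following ("R") contexts.
    """
    before = defaultdict(lambda: defaultdict(float))  # target -> preceding word -> weight
    after = defaultdict(lambda: defaultdict(float))   # target -> following word -> weight
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            context = tokens[pos - dist]
            weight = window - dist + 1
            before[word][context] += weight
            after[context][word] += weight
    vectors = {}
    for word in set(tokens):
        vec = {("L", c): w for c, w in before[word].items()}
        vec.update({("R", c): w for c, w in after[word].items()})
        vectors[word] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    corpus = "the bee visits the flower the wasp visits the flower".split()
    vecs = hal_vectors(corpus, window=3)
    print(round(cosine(vecs["bee"], vecs["wasp"]), 3))
```

Words that appear in similar surroundings ("bee" and "wasp" above) end up with similar vectors, which is what makes such vectors usable for expanding or comparing queries.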

Thresholding Strategies for Large Scale Multi-Label Text Classifier

Publication

- Year 2013

This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are many classes organized in a hierarchy. The presented techniques are evaluated using a state-of-the-art dedicated classifier on medium-scale text corpora extracted from Wikipedia. Obtained results show that the...

Full text to download in external service
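To make "labeling objects given a list of candidate classes' scores" concrete, here is a minimal sketch of three thresholding strategies that are common in the multi-label text categorization literature: a rank cut (often called RCut), a per-class score cut (SCut) and a cut relative to the best score. These are illustrative and not necessarily the exact variants evaluated in the paper; the fallback to the single best class in `score_cut` is a design choice of this sketch.

```python
def rank_cut(scores, k):
    """RCut: assign the k highest-scoring classes."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {label for label, _ in ranked[:k]}


def score_cut(scores, thresholds, default=0.5):
    """SCut: assign every class whose score reaches its own threshold.

    Classes without a tuned threshold use `default`. Falls back to the best
    class so that no object is left without a label (an assumption here).
    """
    chosen = {c for c, s in scores.items() if s >= thresholds.get(c, default)}
    if not chosen and scores:
        chosen = {max(scores, key=scores.get)}
    return chosen


def relative_cut(scores, alpha=0.5):
    """Keep classes scoring at least alpha times the best score.

    Assumes non-negative scores, e.g. probabilities.
    """
    if not scores:
        return set()
    best = max(scores.values())
    return {c for c, s in scores.items() if s >= alpha * best}


if __name__ == "__main__":
    scores = {"Biology": 0.91, "Zoology": 0.62, "History": 0.08}
    print(rank_cut(scores, 1), score_cut(scores, {}), relative_cut(scores, 0.5))
```

A rank cut forces the same number of labels on every object, while score-based cuts let the label count vary, which matters when some documents genuinely belong to many categories of a hierarchy and others to only one.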

Self-Organizing Map representation for clustering Wikipedia search results

Publication

J. Szymański

- Year 2011

The article presents an approach to automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of raw data using Principal...

Full text to download in external service

Practical I-Voting on Stellar Blockchain

Publication

- Applied Sciences-Basel - Year 2020

In this paper, we propose a privacy-preserving i-voting system based on the public Stellar Blockchain network. We argue that the proposed system satisfies all requirements stated for a robust i-voting system, including transparency, verifiability, and voter anonymity. The practical architecture of the system hides the underlying blockchain technology from the voter. To keep user privacy, we propose a privacy-first protocol that...

Full text available to download

Simulation of parallel similarity measure computations for large data sets

Publication

- Year 2015

The paper presents our approach to implementing similarity measures for big data analysis in a parallel environment. We describe the algorithm for parallelising the computations. We provide results from a real MPI application computing similarity measures, as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...

Full text to download in external service

Semantic memory knowledge acquisition through active dialogues

Publication

J. Szymański
W. Duch

- Year 2007

A number of linguistic problems cannot be solved without a semantic memory containing descriptions of object features. Automatically creating this kind of memory is a major challenge even for simple domains. An implementation of semantic memory based on representing knowledge as links between an object and its features shows interesting applications that have not so far been demonstrated by more sophisticated...

Full text to download in external service

Big Data Paradigm Developed in Volunteer Grid System with Genetic Programming Scheduler

Publication

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2014

Artificial intelligence techniques are capable of handling large amounts of information collected over the web. In this paper, the big data paradigm has been studied in a volunteer grid system called Comcute that is optimized by a genetic programming scheduler. This scheduler can optimize load balancing and resource cost. The genetic programming optimizer has been applied to find the Pareto solutions. Finally, some results from numerical...

Full text to download in external service

Depth Images Filtering In Distributed Streaming

Publication

- Polish Maritime Research - Year 2016

In this paper we compare point cloud filters, focusing on their applicability to streaming optimization. For the filtering stage within stream pipeline processing, we evaluate three filters: Voxel Grid, Pass Through and Statistical Outlier Removal. For these filters we perform a series of tests aimed at evaluating changes in point cloud size and transmission frequency (various fps rates). We propose a distributed...

Full text available to download
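The idea behind two of the evaluated filters can be shown with a pure-Python sketch: Pass Through keeps points whose coordinate on one axis lies in a range, and Voxel Grid replaces all points falling into the same cubic cell by their centroid, shrinking the cloud before transmission. This is an illustration of the concept, not the PCL implementation used in such pipelines; function names, the tuple-based point type and the example values are assumptions.

```python
import math
from collections import defaultdict

Point = tuple  # (x, y, z) coordinates as floats


def pass_through(points, axis, lo, hi):
    """Keep points whose coordinate on `axis` (0=x, 1=y, 2=z) lies in [lo, hi]."""
    return [p for p in points if lo <= p[axis] <= hi]


def voxel_grid(points, leaf):
    """Replace all points in the same cubic voxel of edge `leaf` by their centroid."""
    if leaf <= 0:
        raise ValueError("leaf size must be positive")
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / leaf) for c in p)  # integer voxel index
        buckets[key].append(p)
    return [tuple(sum(coord) / len(pts) for coord in zip(*pts)) for pts in buckets.values()]


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.5, 0.0, 0.0), (0.2, 0.2, 3.0)]
    near = pass_through(cloud, axis=2, lo=0.0, hi=1.0)  # drop far-away points
    print(len(cloud), "->", len(voxel_grid(near, leaf=1.0)))
```

In a streaming setting the leaf size trades geometric detail for bandwidth: larger voxels mean fewer points per frame and therefore a higher achievable frame rate.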

Depth Images Filtering In Distributed Streaming

Publication

- Polish Maritime Research - Year 2016

In this paper, we propose a distributed system for processing point clouds and transferring them via a computer network with regard to effectiveness-related requirements. We compare point cloud filters, focusing on their usage for streaming optimization. For the filtering step of the stream processing pipeline, we evaluate four filters: Voxel Grid, Radial Outlier Removal, Statistical Outlier Removal and Pass Through....

Full text available to download

Buzz-based honeybee colony fingerprint

Publication

- COMPUTERS AND ELECTRONICS IN AGRICULTURE - Year 2021

Non-intrusive remote monitoring has applications in a variety of areas. In industrial surveillance, devices are capable of detecting anomalies that may threaten machine operation. Similarly, agricultural monitoring devices are used to supervise livestock or provide higher yields. Modern IoT devices are often coupled with Machine Learning models, which provide valuable insights into device operation. However, the data...

Full text available to download

Review on Wikification methods

Publication

J. Szymański
M. Naruszewicz

- AI COMMUNICATIONS - Year 2019

The paper reviews methods for the automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of documents. It can also be considered as automatic tagging of the text. We describe typical...

Full text to download in external service

Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives

Publication

T. Souza
E. Demidova
T. Risse
H. Holzmann
G. Gossen
J. Szymański

- Year 2015

Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are...

Full text to download in external service

Towards bees detection on images: study of different color models for neural networks

Publication

- Year 2019

This paper presents an approach to bee detection in video streams using a neural network classifier. We describe the motivation for our research and the methodology of data acquisition. The main contribution of this work is a comparison of different color models used as an input format for a feedforward convolutional architecture applied to bee detection. The detection process is based on a neural...

Bees Detection on Images: Study of Different Color Models for Neural Networks

Publication

- Year 2019

This paper presents an approach to bee detection in video streams using a neural network classifier. We describe the motivation for our research and the methodology of data acquisition. The main contribution of this work is a comparison of different color models used as an input format for a feedforward convolutional architecture applied to bee detection. The detection process is based on a neural binary classifier that classifies...

Full text available to download

Framework for Integration Decentralized and Untrusted Multi-vendor IoMT Environments

Publication

A. Sobecki
J. Szymański
D. Gil
H. Mora

- IEEE Access - Year 2020

Lack of standardization is highly visible when we use historical data sets or compare our models with others that use IoMT devices from different vendors. The problem also concerns trust in highly decentralized and anonymous environments, where sensitive data are transferred through the Internet and then analyzed by third-party companies. In our research we propose a standard that has been implemented in the form of a framework...

Full text available to download

Two Stage SVM and kNN Text Documents Classifier

Publication

- Year 2015

The paper presents an approach to the large-scale text document classification problem in parallel environments. A two-stage classifier is proposed, based on a combination of the k-nearest neighbors and support vector machine classification methods. The details of the classifier and the parallelisation of the classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

Identification of category associations using a multilabel classifier

Publication

- EXPERT SYSTEMS WITH APPLICATIONS - Year 2016

Describing data with categories allows one to describe it at a higher level of abstraction. In this way, we can operate on aggregated groups of information, which reveals relationships that are not explicit when individual objects are analyzed separately. In this paper we present automatic identification of the associations between categories used for organizing textual data. As experimental data...

Full text to download in external service

Categorization of Wikipedia articles with spectral clustering

Publication

J. Szymański

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

The article reports the application of clustering algorithms for creating hierarchical groups within Wikipedia articles. We evaluate three spectral clustering algorithms on datasets constructed using Wikipedia categories. The selected algorithm has been implemented in a system that categorizes Wikipedia search results on the fly.

An Analysis of Neural Word Representations for Wikipedia Articles Classification

Publication

J. Szymański
N. Kawalec

- CYBERNETICS AND SYSTEMS - Year 2019

One of the currently popular methods of generating word representations is based on the analysis of large document collections with neural networks. It creates so-called word embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

Full text to download in external service

Improving Effectiveness of SVM Classifier for Large Scale Data

Publication

- Year 2015

The paper presents our approach to SVM implementation in a parallel environment. We describe how the classification, learning and prediction phases were parallelised. We also propose a method for limiting the number of necessary computations during classifier construction. Our method, named one-vs-near, is an extension of the typical one-vs-all approach used to make binary classifiers work with multiclass problems. We perform experiments...

Full text to download in external service

Concept description vectors and the 20 question game

Publication

- Year 2005

Knowledge of the properties that are applicable to a given object is a necessary prerequisite for formulating intelligent questions. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured...

Full text to download in external service

Annotating Words Using WordNet Semantic Glosses

Publication

J. Szymański
W. Duch

- Year 2012

An approach to word sense disambiguation (WSD) relying on WordNet synsets is proposed. The method uses semantically tagged glosses to perform a process similar to spreading activation in a semantic network, creating a ranking of the most probable meanings for word annotation. Preliminary evaluation shows quite promising results. Comparison with state-of-the-art WSD methods indicates that the use of WordNet relations...

Categorization of Cloud Workload Types with Clustering

Publication

- Year 2017

The paper presents a new classification scheme for IaaS cloud workload types, based on their functional characteristics. We show the results of an experiment on automatic categorization performed with different benchmarks that represent particular workload types. Monitoring of resource utilization allowed us to construct workload models that can be processed with machine learning algorithms. The direct connection between the functional...

Full text to download in external service

Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia

Publication

- INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING - Year 2019

The paper presents an approach to building references (also called mappings) between WordNet and Wikipedia. We propose four algorithms used for automatic construction of the references. Then, based on an aggregation algorithm, we produce an initial set of mappings that has been evaluated in a cooperative way. For that purpose, we implement a system for distributing evaluation tasks, which have been solved by the user community....

Full text available to download

Improvement of Imperfect String Matching Based on Asymmetric n-Grams

Publication

- Year 2013

Typical approaches to string comparison treat strings as either different or identical, without taking into account the possibility that a word is misspelled. In this article we present an approach we used to improve imperfect string matching that allows one to reconstruct potential string distortions. The proposed method increases the quality of imperfect string matching, allowing the lookup of misspelled words without significant...

Full text available to download

Interactive Information Search in Text Data Collections

Publication

- Year 2013

This article presents a new approach to retrieval in text repositories and describes the general infrastructure of a system created to implement and test these ideas. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users, together with data clustering. We present the basic algorithms behind our system and the measures we used for evaluating results. The achieved results...

Full text to download in external service

Improving css-KNN Classification Performance by Shifts in Training Data

Publication

- Year 2015

This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects, that would be optimized during off-line learning. In this paper we propose...

Advances in Architectures, Big Data, and Machine Learning Techniques for Complex Internet of Things Systems

Publication

D. Gil
M. Johnsson
H. Mora
J. Szymański

- COMPLEXITY - Year 2019

The field of Big Data is rapidly developing with a lot of ongoing research, which will likely continue to expand in the future. A crucial part of this is Knowledge Discovery from Data (KDD), also known as the Knowledge Discovery Process (KDP). This process is a very complex procedure, and for that reason it is essential to divide it into several steps (Figure 1). Some authors use five steps to describe this procedure, whereas others...

Full text available to download

Selection of Relevant Features for Text Classification with K-NN

Publication

- Year 2013

In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the classification quality obtained on the test datasets using a K-NN based approach. A main advantage of evaluated...

Full text to download in external service
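One of the compared techniques, the chi-squared test, can be sketched with the usual 2x2 term/class contingency table: A documents contain the term and belong to the class, B contain the term but not the class, C belong to the class without the term, D neither, and the score is N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)). Taking the maximum over classes to get one score per term is one common aggregation and an assumption here; the paper's exact variant and its K-NN pipeline are not described in this summary. The toy documents are hypothetical.

```python
def chi_squared(docs, labels, term, cls):
    """Chi-squared statistic for the presence of `term` versus membership in `cls`.

    `docs` is a list of token sets, `labels` the class of each document.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-squared score (max over classes)."""
    vocab = set().union(*docs)
    classes = set(labels)
    score = {t: max(chi_squared(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]  # ties broken alphabetically


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "law"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_features(docs, labels, 2))
```

Terms that occur in every document of one class and in none of another get the highest scores, so they survive the cut while rare, class-neutral terms are discarded before the K-NN stage.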

Representation of hypertext documents based on terms, Links and text compressibility

Publication

J. Szymański
W. Duch

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2010

Methods of representing text documents based on words, mutual links and compression are described. They are evaluated using an SVM classifier.

Analysis of Denoising Autoencoder Properties Through Misspelling Correction Task

Publication

- Year 2017

The paper analyzes some properties of denoising autoencoders using the problem of misspelling correction as an exemplary task. We evaluate the capacity of the network in its classical feed-forward form. We also propose a modification to the output layer of the network, which we call multi-softmax. Experiments show that the model trained with this output layer outperforms the traditional network in both learning time and accuracy. We...

Full text available to download

Privacy-Preserving, Scalable Blockchain-Based Solution for Monitoring Industrial Infrastructure in the Near Real-Time

Publication

- Applied Sciences-Basel - Year 2022

This paper proposes an improved monitoring and measuring system dedicated to industrial infrastructure. Our model achieves data security by incorporating cryptographic methods, and near real-time access through a virtual tree structure over records. The currently available blockchain networks are not well adapted to tasks related to continuous monitoring of the parameters of industrial installations. In the database...

Full text available to download

Selecting Features with SVM

Publication

- Year 2013

A common problem with feature selection is establishing the minimum number of features that should be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. Amount...

Full text to download in external service

Spectral Clustering Wikipedia Keyword-Based Search Results

Publication

- FRONTIERS IN ROBOTICS AND AI - Year 2017

The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of a spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles based on a combination of words and links, used for categorization of search results in this...

Full text available to download

Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

Publication

- Applied Sciences-Basel - Year 2021

To effectively process textual data, many approaches have been proposed to create text representations. Transforming text into a numerical form that computers can process is crucial for downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

Full text available to download

Bringing Common Sense to WordNet with a Word Game

Publication

- Year 2013

We present a tool for common sense knowledge acquisition in the form of a twenty questions game. The described approach uses the WordNet dictionary, whose rich taxonomy allows one to maintain cognitive economy and accelerate knowledge propagation, although inferences made on hierarchical relations sometimes result in noise. We extend the dictionary with common sense assertions acquired during games played with humans. The facts added to the...

Full text to download in external service

0-step K-means for clustering Wikipedia search results

Publication

J. Szymański

- Year 2011

This article describes an improvement for K-means algorithm and its application in the form of a system that clusters search results retrieved from Wikipedia. The proposed algorithm eliminates K-means disadvantages and allows one to create a cluster hierarchy. The main contributions of this paper include the following: (1) The concept of an improved K-means algorithm and its application for hierarchical clustering....

DBpedia and YAGO Based System for Answering Questions in Natural Language

Publication

- Year 2018

In this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for the TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referred to below as fibers) are candidates for representing the actual user intention. The...

Full text available to download

Weighted Clustering for Bees Detection on Video Images

Publication

- Year 2020

This work describes a bee detection system to monitor bee colony conditions. The detection process on video images has been divided into 3 stages: determining the regions of interest (ROI) for a given frame; scanning the frame in the ROI areas using the DNN-CNN classifier to obtain the confidence of bee occurrence for windows at any position and scale; and forming one detection window from the cloud of windows provided by...

Full text available to download
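The last stage described above, forming one detection window from a cloud of overlapping windows, can be illustrated with a hypothetical confidence-weighted merge. This sketch greedily groups windows that overlap the most confident remaining window (intersection over union above a threshold) and averages each group's coordinates weighted by classifier confidence. It is an assumption for illustration, not the weighted clustering procedure from the paper; the box format `(x1, y1, x2, y2)` and the 0.3 threshold are also assumed.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_windows(windows, iou_threshold=0.3):
    """Collapse overlapping (box, confidence) windows into single detections.

    Hypothetical scheme: seed a group with the most confident window, add all
    windows overlapping it, and return the confidence-weighted mean box.
    """
    remaining = sorted(windows, key=lambda w: w[1], reverse=True)
    merged = []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        group = [seed] + [w for w in rest if iou(seed[0], w[0]) >= iou_threshold]
        remaining = [w for w in rest if iou(seed[0], w[0]) < iou_threshold]
        total = sum(conf for _, conf in group)
        weights = [conf / total if total > 0 else 1 / len(group) for _, conf in group]
        box = tuple(sum(wt * b[k] for (b, _), wt in zip(group, weights)) for k in range(4))
        merged.append((box, group[0][1]))  # keep the seed's (highest) confidence
    return merged


if __name__ == "__main__":
    cloud = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.3), ((50, 50, 60, 60), 0.8)]
    for box, conf in merge_windows(cloud):
        print([round(v, 2) for v in box], conf)
```

Weighting by confidence pulls the merged window toward the classifier's most certain responses instead of treating every scanned window equally, which plain averaging or keeping only the top window would not do.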

Network-assisted processing of advanced IoT applications: challenges and proof-of-concept application

Publication

H. Mora
F. A. Pujol
T. Ramírez
A. Jimeno-Morenilla
J. Szymański

- Cluster Computing-The Journal of Networks Software Tools and Applications - Year 2024

Recent advances in the area of the Internet of Things show that devices are usually resource-constrained. To enable advanced applications on these devices, it is necessary to enhance their performance by leveraging external computing resources available in the network. This work presents a study of computational platforms to increase the performance of these devices based on the Mobile Cloud Computing (MCC) paradigm. The main...

Full text available to download

How to Sort Them? A Network for LEGO Bricks Classification

Publication

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2022

LEGO bricks are highly popular due to the ability to build almost any type of creation. This is possible thanks to the availability of multiple shapes and colors of the bricks. For a smooth build process, the bricks need to be properly sorted and arranged. In our work we aim at creating an automated LEGO brick sorter. With over 3700 different LEGO parts, brick classification has to be done with deep neural networks. The question arises...

Full text available to download

Semantic Memory for Avatars in Cyberspace

Publication

- Year 2005

Avatars that show intelligent behavior should have access to general knowledge about the world, knowledge that humans store in their semantic memories. The simplest knowledge representation for semantic memory is based on Concept Description Vectors (CDVs) that store, for each concept, information on whether a given property can be applied to this concept or not. Unfortunately, large-scale semantic memories are not available....

Evaluation of Path Based Methods for Conceptual Representation of the Text

Publication

- Year 2014

Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as a lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...

Full text to download in external service

Path-based methods on categorical structures for conceptual representation of wikipedia articles

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is to use Wikipedia as the lexical knowledge base, an approach that has already shown promising results in many research studies....

Full text available to download

Collaborative Data Acquisition and Learning Support

Publication

- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020

With the constant development of neural networks, traditional algorithms relying on data structures are losing significance as more and more solutions use AI rather than traditional algorithms. This in turn requires a lot of correctly annotated and informative data samples. In this paper, we propose a crowdsourcing-based approach for data acquisition and tagging with support for Active Learning, where the system acts as an...

Full text available to download

Detection of anomalies in bee colony using transitioning state and contrastive autoencoders

Publication

- COMPUTERS AND ELECTRONICS IN AGRICULTURE - Year 2022

Honeybees play a vital role in environmental sustainability and the overall agricultural economy. Supporting the proper functioning of bee colonies attracts the attention of researchers around the world. Electronic systems and machine learning algorithms are being developed for classifying specific undesirable bee behaviors in order to alert about upcoming substantial losses. However, classifiers could be impaired when used...

Full text available to download

Active Learning Based on Crowdsourced Data

Publication

- Applied Sciences-Basel - Year 2022

The paper proposes a crowdsourcing-based approach for annotated data acquisition and means to support the Active Learning training approach. In the proposed solution, aimed at data engineers, the knowledge of the crowd serves as an oracle that is able to judge whether a given sample is informative or not. The proposed solution reduces the amount of work needed to annotate large sets of data. Furthermore, it allows a perpetual increase...

Full text available to download
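The active-learning loop described above can be sketched with two common ingredients that the abstract does not name and that are assumed here: least-confidence sampling to choose which samples to send to the crowd, and a majority vote with a hypothetical `min_agreement` threshold to turn crowd answers into a label.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def least_confident(probs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Pick the k unlabeled samples whose top predicted class probability is lowest,
    i.e. the ones the current model is least sure about."""
    return sorted(probs, key=lambda sample: max(probs[sample]))[:k]


def crowd_label(votes: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Majority vote over (non-empty) crowd answers; None if agreement is too low."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


model_probs = {"img_1": [0.9, 0.1], "img_2": [0.55, 0.45], "img_3": [0.7, 0.3]}
to_annotate = least_confident(model_probs, k=2)
print(to_annotate, crowd_label(["cat", "cat", "dog"]))
```

Samples whose crowd votes do not reach the agreement threshold can simply be re-queued for more annotators instead of being added to the training set.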

Exact-match Based Wikipedia-WordNet Integration

Publication

- Year 2019

The ability to link WordNet synsets and Wikipedia articles allows computers to use both resources during natural language processing. A lot of work has been done in this field; however, most approaches focus on similarity between Wikipedia articles and WordNet synsets rather than on the creation of perfect matches. In this paper we propose a set of methods for automatic perfect matching generation. The proposed methods were...

Full text available to download

Web search results clusterization with background knowledge

Publication

J. Szymański

- Year 2009

Clusterization of web pages is an attractive way of presenting web resources. Arranging pages into groups of similar topics simplifies and shortens the search process. This paper concerns the problem of clustering web pages and presents our approach to this issue. Our solution focuses on finding similarities between documents delivered by different web search engines. This was accomplished by applying the WordNet dictionary.

Induction of the common-sense hierarchies in lexical data

Publication

J. Szymański
W. Duch

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

Unsupervised organization of a set of lexical concepts that captures common-sense knowledge by inducing a meaningful partitioning of data is described. Projection of data on principal components allows for identification of clusters with wide margins, and the procedure is recursively repeated within each cluster. Application of this idea to a simple dataset describing animals created a hierarchical partitioning with each cluster related...
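The recursive procedure above can be sketched with NumPy. The split rule (cut at the widest gap along the first principal component) and the stopping threshold `min_gap` are assumptions standing in for the paper's notion of "wide margins"; the toy animal features in the usage example are hypothetical.

```python
import numpy as np


def split_on_first_pc(X: np.ndarray):
    """Project the rows of X on the first principal component and split
    at the widest gap between consecutive projected values."""
    centered = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    order = np.argsort(proj)
    gaps = np.diff(proj[order])
    cut = int(np.argmax(gaps)) + 1
    return order[:cut], order[cut:], float(gaps[cut - 1])


def induce_hierarchy(X: np.ndarray, names, min_gap: float = 1.0, min_size: int = 2):
    """Recursively partition concepts while a wide-margin split exists.
    Leaves are lists of concept names; inner nodes are [left, right]."""
    if len(names) < min_size:
        return list(names)
    left, right, gap = split_on_first_pc(X)
    if gap < min_gap:
        return list(names)
    return [
        induce_hierarchy(X[left], [names[i] for i in left], min_gap, min_size),
        induce_hierarchy(X[right], [names[i] for i in right], min_gap, min_size),
    ]


# Hypothetical binary features: can_fly, has_feathers, lives_in_water, has_fur
X = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
print(induce_hierarchy(X, ["sparrow", "eagle", "salmon", "dog"]))
```

The sign of a principal component is arbitrary, so the left/right order of sibling clusters may differ between runs or libraries; only the grouping is meaningful.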

Towards Increasing Density of Relations in Category Graphs

Publication

- Year 2014

In the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles. The similarity of articles belonging to different categories makes it possible to compute the similarity between those categories. The second method is based on average scores given to categories while categorizing documents by our dedicated score-based...

Full text to download in external service

NLP Questions Answering Using DBpedia and YAGO

Publication

- Vietnam Journal of Computer Science - Year 2020

In this paper, we present results of employing DBpedia and YAGO as lexical databases for answering questions formulated in natural language. The proposed solution has been evaluated for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for the TREC conference). Our method uses dependency trees generated from the user query. The trees are browsed for paths leading from the root of the tree to the question...

Full text available to download
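The tree-browsing step can be illustrated with a small sketch that, given a dependency parse, returns the path from the root to each question word. The parse is hard-coded in a hypothetical head-index format (`heads[i]` is the index of token `i`'s head, `-1` marks the root) instead of coming from a real parser, the tree is assumed to be well formed (no cycles), and the question-word list is an assumption.

```python
from typing import List

# Assumed set of English question words to look for.
QUESTION_WORDS = {"who", "what", "when", "where", "which", "how"}


def root_to_question_paths(tokens: List[str], heads: List[int]) -> List[List[str]]:
    """For every question word, walk up the head links to the root and
    return the path in root-to-question-word order."""
    paths = []
    for i, token in enumerate(tokens):
        if token.lower() in QUESTION_WORDS:
            path = [i]
            while heads[path[-1]] != -1:  # climb until the root is reached
                path.append(heads[path[-1]])
            paths.append([tokens[j] for j in reversed(path)])
    return paths


# Hypothetical parse of "Who wrote Hamlet ?": "wrote" is the root.
print(root_to_question_paths(["Who", "wrote", "Hamlet", "?"], [1, -1, 1, 1]))
```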

Application of a stochastic compartmental model to approach the spread of environmental events with climatic bias

Publication

J. Boters Pitarch
M. Signes-Pont
J. Szymański
H. Mora-Mora

- Ecological Informatics - Year 2023

Wildfires have significant impacts on both the environment and the economy, so understanding their behaviour is crucial for planning and allocating firefighting resources. Since forest fire management is of great concern, there has been an increasing demand for computationally efficient and accurate prediction models. In order to address this challenge, this work proposes applying a parameterised stochastic model to study the propagation...

Full text available to download
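A minimal sketch of a stochastic compartmental spread model on a grid, assuming three compartments (unburned, burning, burned), ignition of 4-neighbours with a base probability `p_base`, and a single wind direction whose extra probability `wind_boost` stands in for the climatic bias. All parameter names and values are hypothetical; the excerpt does not give the paper's parameterisation.

```python
import random
from typing import Dict, Tuple

# Compartments of a single grid cell (hypothetical encoding).
UNBURNED, BURNING, BURNED = 0, 1, 2
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def simulate(n: int = 20, p_base: float = 0.35, wind: Tuple[int, int] = (0, 1),
             wind_boost: float = 0.4, steps: int = 50, seed: int = 0) -> Dict[str, int]:
    """Run one stochastic spread from a central ignition and count cells per compartment.

    Each step, every burning cell ignites each unburned 4-neighbour with
    probability p_base (plus wind_boost in the downwind direction `wind`)
    and then moves to the burned compartment.
    """
    rng = random.Random(seed)
    grid = [[UNBURNED] * n for _ in range(n)]
    grid[n // 2][n // 2] = BURNING
    for _ in range(steps):
        burning = [(r, c) for r in range(n) for c in range(n) if grid[r][c] == BURNING]
        if not burning:
            break
        ignited = set()
        for r, c in burning:
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] == UNBURNED:
                    p = p_base + (wind_boost if (dr, dc) == wind else 0.0)
                    if rng.random() < min(p, 1.0):
                        ignited.add((rr, cc))
            grid[r][c] = BURNED
        for rr, cc in ignited:
            grid[rr][cc] = BURNING
    names = {UNBURNED: "unburned", BURNING: "burning", BURNED: "burned"}
    counts = {name: 0 for name in names.values()}
    for row in grid:
        for state in row:
            counts[names[state]] += 1
    return counts


print(simulate(seed=42))
```

Because each run is random, statistics such as the expected burned area would be estimated by averaging many seeds rather than from a single simulation.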

Fast Approximate String Search for Wikification

Publication

- Year 2021

The paper presents a novel method for fast approximate string search based on neural distance metric embeddings. Our research is focused primarily on applying the proposed method to entity retrieval in the Wikification process, which is similar to edit distance-based similarity search on a typical dictionary. The proposed method has been compared with the symmetric delete spelling correction algorithm and proved to be more efficient...

Full text available to download
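For readers unfamiliar with the baseline, the symmetric delete idea (used, for example, by the SymSpell spelling corrector) can be sketched as follows: dictionary terms and the query are both expanded into their deletion variants, shared variants yield candidates, and a Levenshtein check filters them. This illustrates the comparison baseline only, not the paper's neural embedding method; the dictionary is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def deletes(word: str, max_dist: int) -> Set[str]:
    """All strings obtained by deleting up to max_dist characters (including word itself)."""
    out = {word}
    for k in range(1, min(max_dist, len(word)) + 1):
        for idx in combinations(range(len(word)), k):
            out.add("".join(ch for i, ch in enumerate(word) if i not in idx))
    return out


def build_index(dictionary: Iterable[str], max_dist: int = 1) -> Dict[str, Set[str]]:
    """Precompute deletion variants of every dictionary term (done once, offline)."""
    index: Dict[str, Set[str]] = {}
    for term in dictionary:
        for variant in deletes(term, max_dist):
            index.setdefault(variant, set()).add(term)
    return index


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(query: str, index: Dict[str, Set[str]], max_dist: int = 1) -> List[str]:
    """Collect candidates sharing a deletion variant, then verify the true distance."""
    candidates: Set[str] = set()
    for variant in deletes(query, max_dist):
        candidates |= index.get(variant, set())
    return sorted(t for t in candidates if levenshtein(query, t) <= max_dist)


index = build_index(["bridge", "knowledge", "fridge", "ridge"], max_dist=1)
print(lookup("bridgr", index, max_dist=1))
```

The attraction of the scheme is that only deletions are generated, so the index grows far more slowly than one built from all insertions and substitutions.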

Towards Extending Wikipedia with Bidirectional Links

Publication

- Year 2020

In this paper, we present the results of our WikiLinks project, which aims at extending current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. However, there were several attempts to introduce bidirectional...

Full text available to download

Cooperative editing approach for building Wordnet database

Publication

J. Szymański
K. Dusza
Ł. Byczkowski

- Year 2007

The article presents an approach to cooperative work on the database of the WordNet system. The system architecture is described, together with the visualization of the network of conceptual relations using the TouchGraph component.

Semantic memory architecture for knowledge acquisition and management

Publication

J. Szymański
W. Duch

- Year 2007

For a computer to understand the information contained in a text, the information system needs background knowledge. This knowledge is not written down in the analyzed text itself. It can be recorded in the form of an ontology of the domain under study. The key issue is the construction of such an ontology. The article presents an approach based on the 20 questions game for building a semantic space for a selected domain.

Full text to download in external service

Wontougo - a cooperative WordNet editor

Publication

J. Szymański
B. Kamiński
O. Tomczak

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2007

The article describes a system that enables cooperative editing of a dictionary based on WordNet[1]. Within the project, the dictionary was moved from its file-based organization into a relational database. A user interface was also built as an application based on the TouchGraph library[2]. This article presents how the structure of the WordNet files is mapped onto the database, and the possibilities that...

WordNet - a database system as a dictionary of the English language

Publication

J. Szymański

- Year 2006

WordNet[1] is an alternative approach to organizing dictionary data, compared with the classic list of words with their definitions. The dictionary concept is based on creating a network of concepts (senses) connected by relations of specified types. The article describes the basic assumptions behind the construction of the WordNet system and how linguistic data are organized as a semantic network.

Wordventure - cooperative wordnet editor. Architecture for lexical semantic acquisition

Publication

J. Szymański

- Year 2009

This article presents an architecture for acquiring lexical semantics in a collaborative paradigm. The system provides functionality for editing semantic networks in a Wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It is used for semantic network presentation and simultaneously provides modification functionality.

Wikipedia and WordNet integration based on words co-occurrences

Publication

J. Kilanowski
J. Szymański

- Year 2009

The article presents a method for automatic integration of two lexical resources: the semantic dictionary WordNet and the electronic encyclopaedia Wikipedia. Our goal is to automatically add a semantic tag (a WordNet synset identifier) to the title of a Wikipedia article. We analyzed several different approaches to this problem and implemented our own solution, based on word occurrences in synset descriptions and the article body....

Understanding natural language concepts in the cognitive process

Publication

J. Szymański

- Year 2009

Text categorization with semantic commonsense knowledge: First results

Publication

P. Majewski
J. Szymański

- Year 2008

The BOW representation is typically used for text processing. However, this approach does not give good results when similar documents do not share words. The article presents an approach to constructing kernel functions for SVM classifiers based on an external knowledge base about linguistic concepts.
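One standard way to build such a knowledge-based kernel, shown here as an assumption rather than the paper's exact construction, is to map bag-of-words vectors through a term-to-concept matrix `P` and take the dot product in concept space, K(x, y) = (P^T x) · (P^T y). Because it is an inner product of mapped vectors, the kernel is positive semi-definite, as SVMs require. The vocabulary and matrix below are hypothetical.

```python
import numpy as np

# Hypothetical vocabulary and term-to-concept weights P (terms x concepts),
# e.g. derived from an external knowledge base; values are illustrative.
vocab = ["car", "automobile", "banana"]
P = np.array([
    [1.0, 0.0],  # car        -> concept "vehicle"
    [1.0, 0.0],  # automobile -> concept "vehicle"
    [0.0, 1.0],  # banana     -> concept "fruit"
])


def semantic_kernel(x: np.ndarray, y: np.ndarray, P: np.ndarray) -> float:
    """K(x, y) = (P^T x) . (P^T y); positive semi-definite by construction."""
    return float((P.T @ x) @ (P.T @ y))


def gram(A: np.ndarray, B: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Kernel matrix between BoW rows of A and B."""
    return (A @ P) @ (B @ P).T


car, automobile, banana = np.eye(3)  # one-word "documents"
print(float(car @ automobile), semantic_kernel(car, automobile, P))
```

Two documents that share no words ("car" vs. "automobile") get a zero BoW dot product but a positive kernel value. A Gram matrix like the one returned by `gram` can be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.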

Word disambiguation using the WORDNET dictionary

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2008

The article presents the problem of finding the sense of words in a sentence (disambiguation) on the basis of their context. The proposed word disambiguation algorithm is analyzed in terms of complexity and application. The platform presented in the article lets the user graphically browse the disambiguation process that takes place between the words given in the sentence and the meanings from the WordNet dictionary. In the final...
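The excerpt does not describe the proposed algorithm itself. As a point of reference, the classic simplified Lesk method, a different and well-known baseline swapped in here for illustration, picks the sense whose dictionary gloss shares the most words with the sentence. The sense inventory and stop-word list below are hypothetical stand-ins for WordNet glosses and proper preprocessing.

```python
from typing import Dict, Optional, Set

# Hypothetical sense inventory: word -> {sense id: gloss}. A real system
# would read these glosses from WordNet instead.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "financial institution that accepts deposits and lends money",
    }
}

# Assumed minimal stop-word list.
STOP = {"a", "an", "the", "of", "such", "as", "and", "that", "on", "to", "beside"}


def content_words(text: str) -> Set[str]:
    return {w for w in text.lower().split() if w not in STOP}


def simplified_lesk(word: str, sentence: str) -> Optional[str]:
    """Choose the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best


print(simplified_lesk("bank", "he deposits money at the bank"))
```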

Knowledge representation and acquisition for large-scale semantic memory

Publication

J. Szymański
W. Duch

- Year 2008

Acquiring and representing concepts is a necessary condition for implementing understanding in cognitive systems. Word games offer interesting possibilities for acquiring knowledge for a computational model of semantic memory. The article presents the foundations of the semantic memory architecture and the results of a contextual search algorithm running on it, which was used to implement the 20 questions game.

Full text to download in external service

PROCEEDING OF THE SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES

Publication

W. Duch
J. Szymański

- Year 2008

Finding information on the Internet or in large text databases requires knowledge about the words that index a document. One approach that improves the quality and speed of searching is the use of data clustering and visualization. The article presents an approach to information retrieval on the Internet based on a knowledge base about language. Such a knowledge container was implemented on the basis of cognitive theories of the organization...

Ontology portal: a portal for cooperative work on domain ontologies

Publication

J. Szymański

- Year 2008

A knowledge representation method used to store ontologies in a relational database is presented. The system developed on its basis enables cooperative work on domain ontologies in a distributed environment. The data structures used make it possible to change the knowledge representation depending on data processing needs and to track the dynamics of the process of agreeing on a common conceptual layer among specialists. Also included...

Retrieving medical articles in MEDLINE using UMLS

Publication

J. Szymański

- Year 2009

Parallel Computations of Text Similarities for Categorization Task

Publication

J. Szymański

- Year 2013

In this chapter we describe an approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations have been used for textual data categorization. The test datasets were created from Wikipedia articles, which together with their hyperlinks form the graph used in our experiments. Similarities based on Euclidean distance and the cosine measure have been used to process the data with the k-means algorithm....
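The two similarity measures, and one simple way of parallelising their computation, can be sketched with NumPy: the rows of the document-term matrix are split into chunks, and each chunk's similarities to all documents are computed concurrently (NumPy's matrix products typically release the GIL, so threads can overlap). The chunking scheme and `n_chunks` parameter are assumptions for illustration, not the chapter's implementation, and rows are assumed to be non-zero for the cosine measure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

import numpy as np


def cosine_rows(block: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `block` and each row of X."""
    bn = block / np.linalg.norm(block, axis=1, keepdims=True)
    xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return bn @ xn.T


def euclidean_rows(block: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Euclidean distance via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against rounding."""
    sq = (block ** 2).sum(axis=1)[:, None] + (X ** 2).sum(axis=1)[None, :] - 2 * block @ X.T
    return np.sqrt(np.clip(sq, 0, None))


def parallel_similarity(X: np.ndarray,
                        fn: Callable[[np.ndarray, np.ndarray], np.ndarray],
                        n_chunks: int = 4) -> np.ndarray:
    """Compute the full n x n matrix by processing row chunks in parallel threads."""
    chunks = np.array_split(X, n_chunks)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda block: fn(block, X), chunks))
    return np.vstack(parts)


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy document-term matrix
print(parallel_similarity(X, cosine_rows, n_chunks=2))
```

The resulting matrices could then feed a clustering step such as k-means; for very large collections, process-based or distributed execution would replace the thread pool.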

DBpedia As a Formal Knowledge Base – An Evaluation

Publication

- WSEAS Transactions on Information Science and Applications - Year 2015

DBpedia is widely used by researchers as a means of accessing Wikipedia in a standardized way. In this paper it is characterized from the point of view of a question answering system. A simple implementation of such a system is also presented. The paper also characterizes alternatives to DBpedia in the form of the OpenCyc and YAGO knowledge bases. A comparison between DBpedia and those knowledge bases is presented.

Full text available to download

Wikipedia Articles Representation with Matrix'u

Publication

J. Szymański

- Year 2013

In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

Full text to download in external service

Retrieval with Semantic Sieve

Publication

- Year 2013

The article presents an algorithm we call Semantic Sieve, applied to refining search results in a text document repository. The algorithm calculates so-called conceptual directions that enable interaction with the user and allow the set of results to be narrowed to the most relevant ones. We present the system in which the algorithm has been implemented. In its presentation layer, the system also offers clustering of the results into...

Full text to download in external service

IDENTIFICATION OF RELATIONS BETWEEN WIKIPEDIA CATEGORIES USING ARTICLE SIMILARITY MEASURES

Publication

- Studia Informatica Pomerania - Year 2013

The article describes an approach to identifying relations between categories in a repository of textual data, based on Wikipedia. By analyzing the similarity between articles, measures were defined that make it possible to identify relations between categories that had not previously been captured and to assign them weights indicating their degree of importance. An automatic evaluation of the results was performed against the already existing...

Full text to download in external service

Wordventure - Developing WordNet in Wikipedia-like Style

Publication

J. Szymański

- Year 2010

The article describes an approach for building WordNet semantic dictionary in a collaborative way. The idea of gathering lexical data has been proposed, as well as the system for linguistic data acquisition and management.

Application of the Comcute system to breaking the DES algorithm

Publication

J. Szymański
A. Polak

- Year 2012

The application of the Comcute system to breaking the DES cipher is presented. The basic architecture used to distribute the computations is described, and results on the scalability of the solution as a function of the number of computing units used are reported.

Full text to download in external service
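Brute-forcing DES parallelises naturally because its 2^56 key space can be cut into independent ranges. The sketch below shows such a partitioning into work packages and the idealised expected search time under perfectly linear scaling (on average half of the keys must be tried). The `package_size`, throughput and worker figures are hypothetical parameters, not measurements from the Comcute experiments.

```python
from typing import Iterator, Tuple

KEYSPACE = 2 ** 56  # DES uses 56 effective key bits


def key_packages(package_size: int, keyspace: int = KEYSPACE) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) key ranges, one per work package."""
    for start in range(0, keyspace, package_size):
        yield start, min(start + package_size, keyspace)


def expected_seconds(keys_per_second: float, workers: int,
                     keyspace: float = KEYSPACE) -> float:
    """Idealised expected time to find the key: on average half the key space
    is searched, assuming perfectly linear scaling across workers."""
    return keyspace / 2 / (keys_per_second * workers)


print(next(key_packages(2 ** 32)))
print(expected_seconds(keys_per_second=1e7, workers=1000) / 3600, "hours")
```

Real volunteer-computing deployments fall short of this bound because of package distribution overhead and unreliable workers, which is exactly what a scalability study measures.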

Distributing computations using distribution servers

Publication

- Year 2012

The principles of operation of distribution servers in a grid-class computing system working in volunteer computing mode are discussed. Ways of increasing the performance of this layer of the system by managing the stream of data packages are described. The Map-Reduce concept in the implementation of parallel processing is also addressed.

Full text to download in external service

Text classifiers for automatic articles categorization

Publication

- Year 2012

The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles, automatically classified into the categories assigned by Wikipedia editors.

Matching Exception Class Hierarchies between .NET, Java Environments

Publication

- Year 2012

The paper presents a methodology for classifying exceptions and matching exception messages between the .NET and Java environments. The methodology operates on existing exception class hierarchies and proposes two complementary approaches: automated and manual matching. The automated matching uses a similarity measure to find associations between exception messages from the two sets of classes for the considered programming languages....

Security ontology construction and integration

Publication

- Year 2011

There are many different levels on which security can be examined. Each differs from the others, and all of them depend on context. Hence the need to provide additional knowledge that enables computers to use this information efficiently. Such information can be provided by ontologies. The paper presents the gathered requirements that need to be taken into account when creating an ontology. The method of ontology creation...

Information retrieval with semantic memory model

Publication

J. Szymański

- Cognitive Systems Research - Year 2011

Psycholinguistic theories of semantic memory form the basis for understanding natural language concepts. These theories are used here as an inspiration for implementing a computational model of semantic memory in the form of a semantic network. Combining this network with a vector-based object-relation-feature value representation of concepts that also includes weights for confidence and support allows for recognition of concepts...

Full text to download in external service

Self-Organizing Map representation for clustering Wikipedia search results

Publication

J. Szymański

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

The article presents an approach to automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of raw data using Principal...

Management of Textual Data at Conceptual Level

Publication

J. Szymański

- Year 2011

The article presents an approach to managing a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that makes it possible to organize articles returned as search results for a given query. Constructed clusters allow...

A word game for acquiring linguistic knowledge

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2011

The article describes the implementation of a word questions game that serves both as a model of a contextual search engine and as a tool for acquiring knowledge about natural language concepts. The notion of contextual search is defined, and an algorithm that finds objects on the basis of their features is described. The adopted knowledge representation and learning method are characterized in the context of other known projects addressing the problem of acquisition...
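A minimal sketch of a 20-questions style search over concept descriptions, assuming properties stored as +1/-1 values and a greedy rule that asks the question guaranteeing the most eliminations in the worst case. The memory contents, the selection heuristic and the simulated answering user are hypothetical; the excerpt does not specify the paper's algorithm.

```python
from typing import Dict, List, Optional, Set

CDV = Dict[str, int]  # property -> +1 (yes) / -1 (no); missing means unknown

# Hypothetical toy memory.
memory: Dict[str, CDV] = {
    "sparrow": {"can_fly": 1, "has_feathers": 1, "is_mammal": -1},
    "bat": {"can_fly": 1, "has_feathers": -1, "is_mammal": 1},
    "dog": {"can_fly": -1, "has_feathers": -1, "is_mammal": 1},
    "penguin": {"can_fly": -1, "has_feathers": 1, "is_mammal": -1},
}


def best_question(candidates: List[str], memory: Dict[str, CDV],
                  asked: Set[str]) -> Optional[str]:
    """Pick the unasked property that eliminates the most candidates
    whichever way it is answered (ties broken alphabetically)."""
    props = {p for c in candidates for p in memory[c]} - asked

    def worst_case_eliminated(p: str) -> int:
        yes = sum(1 for c in candidates if memory[c].get(p) == 1)
        no = sum(1 for c in candidates if memory[c].get(p) == -1)
        return min(yes, no)

    return max(sorted(props), key=worst_case_eliminated) if props else None


def play(secret: str, memory: Dict[str, CDV], max_questions: int = 20) -> List[str]:
    """Simulate a game in which the user answers from the secret concept's CDV."""
    candidates, asked = list(memory), set()
    while len(candidates) > 1 and len(asked) < max_questions:
        question = best_question(candidates, memory, asked)
        if question is None:
            break
        asked.add(question)
        answer = memory[secret].get(question, 0)
        if answer != 0:  # an "I don't know" answer keeps every candidate
            candidates = [c for c in candidates
                          if memory[c].get(question, 0) in (0, answer)]
    return candidates


print(play("bat", memory))
```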

Visualization of the Wikipedia structure to support information retrieval

Publication

J. Szymański
W. Duch

- Year 2011

Graphical presentation is an effective way of improving user interaction with a knowledge repository. It allows complex structures to be presented clearly and reveals dependencies that are not directly visible. Applying this approach to information retrieval makes it possible to present data at a high level of abstraction while also establishing their context, which directly affects the quality of access...

Towards automatic classification of Wikipedia content

Publication

J. Szymański

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2010

The article describes an approach to automatic classification of Wikipedia articles. Text representations based on document content and on mutual links were analyzed. The results of applying an SVM classifier are presented.

Automatic classification of Wikipedia articles

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

Wikipedia, the Internet encyclopedia, uses a system of categories to organize its articles. At present, assigning an article to the appropriate topical categories is done manually by its editors. This task is time-consuming and requires knowledge of the structure of Wikipedia. Manual categorization is also prone to errors resulting from the fact that an article is assigned to a category on the basis of an arbitrary...

Collaborative ontology building using the OCS system and the Protégé editor

Publication

T. M. Boiński
A. Jaworska
R. Kleczkowski
P. Kunowski
J. Szymański

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010

Constructing ontologies requires the cooperation of many people. Ideally, a large, distributed community would work on a single ontology, thereby creating a shared representation of knowledge in a given domain. The publication proposes a model of group work on an ontology. A model for managing ontology versions is defined. The Ontology Creation System (OCS) is presented, together with the architecture and implementation of an extension...

Full text available to download

dr hab. inż. Julian Szymański
