Julian Szymański - Publications - Bridge of Knowledge

Bringing Common Sense to WordNet with a Word Game

Publication

- Year 2013

We present a tool for common sense knowledge acquisition in the form of a twenty questions game. The described approach uses the WordNet dictionary, whose rich taxonomy helps maintain cognitive economy and accelerates knowledge propagation, although inferences made on hierarchical relations sometimes result in noise. We extend the dictionary with common sense assertions acquired during the games played with humans. The facts added to the...

Full text to download in external service

DBpedia and YAGO Based System for Answering Questions in Natural Language

Publication

- Year 2018

In this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referenced further as fibers) are candidates for representation of actual user intention. The...

Full text available to download

Weighted Clustering for Bees Detection on Video Images

Publication

- Year 2020

This work describes a bee detection system to monitor bee colony conditions. The detection process on video images has been divided into 3 stages: determining the regions of interest (ROI) for a given frame, scanning the frame in ROI areas using the DNN-CNN classifier to obtain a confidence of bee occurrence in each window at any position and any scale, and forming one detection window from a cloud of windows provided by...

Full text available to download

0-step K-means for clustering Wikipedia search results

Publication

J. Szymański

- Year 2011

This article describes an improvement to the K-means algorithm and its application in the form of a system that clusters search results retrieved from Wikipedia. The proposed algorithm eliminates K-means disadvantages and allows one to create a cluster hierarchy. The main contributions of this paper include the following: (1) The concept of an improved K-means algorithm and its application for hierarchical clustering....

Detection of anomalies in bee colony using transitioning state and contrastive autoencoders

Publication

- COMPUTERS AND ELECTRONICS IN AGRICULTURE - Year 2022

Honeybees play a vital role in environmental sustainability and the overall agricultural economy. Assisting bee colonies in their proper functioning attracts the attention of researchers around the world. Electronic systems and machine learning algorithms are being developed for classifying specific undesirable bee behaviors in order to alert about upcoming substantial losses. However, classifiers could be impaired when used...

Full text available to download

Active Learning Based on Crowdsourced Data

Publication

- Applied Sciences-Basel - Year 2022

The paper proposes a crowdsourcing-based approach for annotated data acquisition and a means to support the Active Learning training approach. In the proposed solution, aimed at data engineers, the knowledge of the crowd serves as an oracle that is able to judge whether the given sample is informative or not. The proposed solution reduces the amount of work needed to annotate large sets of data. Furthermore, it allows a perpetual increase...

Full text available to download

Exact-match Based Wikipedia-WordNet Integration

Publication

- Year 2019

The ability to link WordNet synsets and Wikipedia articles allows computers to use those resources during natural language processing. A lot of work has been done in this field; however, most approaches focus on similarity between Wikipedia articles and WordNet synsets rather than on the creation of perfect matches. In this paper we propose a set of methods for automatic generation of perfect matches. The proposed methods were...

Full text available to download

Evaluation of Path Based Methods for Conceptual Representation of the Text

Publication

- Year 2014

Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

Full text to download in external service

Path-based methods on categorical structures for conceptual representation of wikipedia articles

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is usage of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

Full text available to download

Collaborative Data Acquisition and Learning Support

Publication

- International Journal of Computer Information Systems and Industrial Management Applications - Year 2020

With the constant development of neural networks, traditional algorithms relying on data structures lose their significance as more and more solutions are using AI rather than traditional algorithms. This in turn requires a lot of correctly annotated and informative data samples. In this paper, we propose a crowdsourcing based approach for data acquisition and tagging with support for Active Learning where the system acts as an...

Full text available to download

Semantic Memory for Avatars in Cyberspace

Publication

- Year 2005

Avatars that show intelligent behavior should have access to general knowledge about the world, knowledge that humans store in their semantic memories. The simplest knowledge representation for semantic memory is based on Concept Description Vectors (CDVs) that store, for each concept, information on whether a given property can be applied to this concept or not. Unfortunately, large-scale semantic memories are not available....

Fast Approximate String Search for Wikification

Publication

- Year 2021

The paper presents a novel method for fast approximate string search based on neural distance metrics embeddings. Our research is focused primarily on applying the proposed method for entity retrieval in the Wikification process, which is similar to edit distance-based similarity search on the typical dictionary. The proposed method has been compared with symmetric delete spelling correction algorithm and proven to be more efficient...

Full text available to download

Towards Extending Wikipedia with Bidirectional Links

Publication

- Year 2020

In this paper, we present the results of our WikiLinks project, which aims at extending current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. However, there were several attempts to introduce bidirectional...

Full text available to download

Web search results clusterization with background knowledge

Publication

J. Szymański

- Year 2009

Clusterization of web pages is an attractive way of presenting web resources. Arranging pages into groups of similar topics simplifies and shortens the search process. This paper concerns the problem of clustering web pages and presents our approach to this issue. Our solution is focused on finding similarities between documents delivered by different web search engines. This process was accomplished by applying the WordNet dictionary.

Application of a stochastic compartmental model to approach the spread of environmental events with climatic bias

Publication

J. Boters Pitarch
M. Signes-Pont
J. Szymański
H. Mora-Mora

- Ecological Informatics - Year 2023

Wildfires have significant impacts on both environment and economy, so understanding their behaviour is crucial for the planning and allocation of firefighting resources. Since forest fire management is of great concern, there has been an increasing demand for computationally efficient and accurate prediction models. In order to address this challenge, this work proposes applying a parameterised stochastic model to study the propagation...

Full text available to download

Towards Increasing Density of Relations in Category Graphs

Publication

- Year 2014

In the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles. Using the similarity of articles belonging to different categories makes it possible to calculate category similarity. The second method is based on average scores given to categories while categorizing documents by our dedicated score-based...

Full text to download in external service

NLP Questions Answering Using DBpedia and YAGO

Publication

- Vietnam Journal of Computer Science - Year 2020

In this paper, we present results of employing DBpedia and YAGO as lexical databases for answering questions formulated in the natural language. The proposed solution has been evaluated for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference). Our method uses dependency trees generated from the user query. The trees are browsed for paths leading from the root of the tree to the question...

Full text available to download

Induction of the common-sense hierarchies in lexical data

Publication

J. Szymański
W. Duch

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011

Unsupervised organization of a set of lexical concepts that captures common-sense knowledge by inducing a meaningful partitioning of data is described. Projection of data on principal components allows for identification of clusters with wide margins, and the procedure is recursively repeated within each cluster. Application of this idea to a simple dataset describing animals created a hierarchical partitioning with each cluster related...

Generowanie tekstu z użyciem sieci typu Transformer

Publication

- Year 2021

The paper describes the operation of selected machine learning models used in natural language processing, in particular those used for text generation. The BERT model and its variants are also presented, along with practical uses of Transformer models. Their operation is demonstrated in an application that changes the mood of a text in a sequential manner.

Full text to download in external service

Embedded Representations of Wikipedia Categories

Publication

- Year 2021

In this paper, we present an approach to building neural representations of the Wikipedia category graph. We test four different methods and examine the neural embeddings in terms of preservation of graphs edges, neighborhood coverage in representation space, and their influence on the results of a task predicting parent of two categories. The main contribution of this paper is application of neural representations for improving the...

Full text to download in external service

Bidirectional Fragment to Fragment Links in Wikipedia

Publication

- Year 2020

The paper presents a WikiLinks system that extends the Wikipedia linkage model with bidirectional links between fragments of the articles and overlapping links’ anchors. The proposed model adopts some ideas from the research conducted in a field of nonlinear, computer-aided writing, often called a hypertext. WikiLinks may be considered as a web augmentation tool but it presents a new approach to the problem that addresses the specific...

Full text available to download

Cooperative editing approach for building Wordnet database

Publication

J. Szymański
K. Dusza
Ł. Byczkowski

- Year 2007

The article presents an approach to cooperative work on the Wordnet system database. The system architecture is described, along with a visualization of the network of conceptual links using the touchgraph component.

Semantic memory architecture for knowledge acquisition and management

Publication

J. Szymański
W. Duch

- Year 2007

For a computer to understand the information contained in a text, the underlying IT system needs background knowledge. This knowledge is not contained in the analyzed text itself. It can be recorded in the form of an ontology of the domain under study. The fundamental issue is the construction of such an ontology. The article presents an approach based on the 20 questions game for building a semantic space for a selected domain.

Full text to download in external service

Wontougo - kooperacyjny edytor Wordnetu

Publication

J. Szymański
B. Kamiński
O. Tomczak

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2007

The article describes a system that enables cooperative editing of a dictionary based on WordNet[1]. As part of the project, the dictionary was migrated from its file-based version to a relational database. A user interface was also built as an application based on the touchgraph library[2]. This article presents how the structure of the WordNet files is mapped onto the database and the possibilities that...

Text categorization with semantic commonsense knowledge: First results

Publication

P. Majewski
J. Szymański

- Year 2008

Text processing typically uses BOW representations. This approach, however, does not give good results when similar documents do not share words. The article presents an approach to constructing a kernel function for SVM classifiers based on an external knowledge base of linguistic concepts.

Ujednoznacznienie słów przy uzyciu słownika WORDNET

Publication

- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2008

The article presents the problem of finding the sense of words (disambiguation) in a sentence based on their context. The proposed word disambiguation algorithm is analyzed in terms of complexity and applicability. The platform presented in the article lets the user graphically browse the disambiguation process taking place between the words given in a sentence and the senses from the WordNet dictionary. In the final...

Knowledge representation and acquisition for large-scale semantic memory

Publication

J. Szymański
W. Duch

- Year 2008

Acquiring and representing concepts is a necessary condition for implementing understanding in cognitive systems. Word games offer interesting possibilities for acquiring knowledge for a computational model of semantic memory. The article presents the foundations of a semantic memory architecture and the results of a context search algorithm running on it, which was used to implement the 20 questions game.

Full text to download in external service

PROCEEDING OF THE SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION AND MANAGEMENT SCIENCES

Publication

W. Duch
J. Szymański

- Year 2008

Finding information on the Internet or in large text databases requires knowledge of the words that index a document. One approach that improves the quality and speed of search is to apply clustering and data visualization. The article presents an approach to Internet information retrieval based on a knowledge base about language. The implementation of this knowledge container is based on cognitive theories of organization...

WordNet -bazodanowy system jako słownik języka angielskiego

Publication

J. Szymański

- Year 2006

WordNet[1] is an alternative approach to organizing dictionary data, compared with the classical list of words with their definitions. The dictionary concept is based on creating a network of concepts (senses) linked to one another by relations of a specified type. The basic assumptions behind the construction of the WordNet system are described, along with the way linguistic data is organized as a semantic network.

Portal ontologii: Portal do kooperacyjnej pracy nad ontologiami dziedzinowymi

Publication

J. Szymański

- Year 2008

A knowledge representation method used to store ontologies in a relational database is presented. The system built on it enables cooperative work on domain ontologies in a distributed environment. The data structures used allow the knowledge representation to be changed depending on data processing needs and make it possible to track the dynamics of the process by which specialists agree on a common conceptual layer. The paper includes...

Wordventure - cooperative wordnet editor. Architecture for lexical semantic aquisition

Publication

J. Szymański

- Year 2009

This article presents an architecture for acquiring lexical semantics in a collaborative approach paradigm. The system provides functionality for editing semantic networks in a Wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It has been used for semantic network presentation and simultaneously provides modification functionality.

Wikipedia and WordNet integration based on words co-occurrences

Publication

J. Kilanowski
J. Szymański

- Year 2009

The article presents a method for automatic integration of two lexical resources: the semantic dictionary WordNet and the electronic encyclopaedia Wikipedia. Our goal is to automatically add a semantic tag, a WordNet synset identifier, to the title of the Wikipedia article. We analyzed several different approaches to this problem and implemented our own solution, based on word occurrences in synset descriptions and the article body....

Rozumienie pojęć języka naturalnego w procesie kognitywnym

Publication

J. Szymański

- Year 2009

Wyszukiwanie artykułów medycznych w MEDLINE z wykorzystaniem UMLS

Publication

J. Szymański

- Year 2009

A Formal Approach to Model the Expansion of Natural Events: The Case of Infectious Diseases

Publication

M. Teresa Signes-Pont
J. Boters Pitarch
J. Szymański
H. Mora-Mora

- Parallel Processing Letters - Year 2023

A formal approach to modeling the expansion of natural events is presented in this paper. Since the mathematical, statistical or computational methods used are not relevant for development, a modular framework is carried out that guides from the external observation down to the innermost level of the variables that have to appear in the future mathematical-computational formalization. As an example we analyze the expansion of Covid-19....

Full text to download in external service

An intelligent cellular automaton scheme for modelling forest fires

Publication

J. Boters Pitarch
M. Signes-Pont
J. Szymański
H. Mora-Mora

- Ecological Informatics - Year 2024

Forest fires have devastating consequences for the environment, the economy and human lives. Understanding their dynamics is therefore crucial for planning the resources allocated to combat them effectively. In a world where the incidence of such phenomena is increasing every year, the demand for efficient and accurate computational models is becoming increasingly necessary. In this study, we perform a revision of an initial proposal...

Full text available to download

LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags

Publication

S. Olewniczak
J. Szymański
P. Malak
R. Komar
A. Letowska

- Year 2024

The paper presents the approach to using tags from Stack Overflow questions as a data source in the process of building domain-specific unsupervised term embeddings. Using a huge dataset of Stack Overflow posts, our solution employs the LSA algorithm to learn latent representations of information technology terms. The paper also presents the Teamy.ai system, currently developed by Scalac company, which serves as a platform that...

Full text available to download

From Scores to Predictions in Multi-Label Classification: Neural Thresholding Strategies

Publication

- Applied Sciences-Basel - Year 2023

In this paper, we propose a novel approach for obtaining predictions from per-class scores to improve the accuracy of multi-label classification systems. In a multi-label classification task, the expected output is a set of predicted labels for each test sample. Typically, these predictions are calculated by implicit or explicit thresholding of per-class real-valued scores: classes with scores exceeding a given threshold value...

Full text available to download

Previous Opinions is All You Need - Legal Information Retrieval System

Publication

M. Osowski
K. Lorenc
P. Drozda
R. Scherer
K. Szałapak
K. Komar-Komarowski
J. Szymański
A. Sobecki

- Year 2023

We present a system for retrieving the legal opinions most relevant to a given legal case or question. To this end, we checked several state-of-the-art neural language models. As training and testing data, we use tens of thousands of legal cases as question-opinion pairs. Text data has been subjected to advanced pre-processing adapted to the specifics of the legal domain. We empirically chose the BERT-based HerBERT model to perform...

Full text to download in external service

Optimization of Bread Production Using Neuro-Fuzzy Modelling

Publication

- Year 2023

Automation of food production is an actively researched domain. One of the areas where automation is still not progressing significantly is bread making. The process still relies on expert knowledge regarding how to react to procedure changes depending on environmental conditions, quality of the ingredients, etc. In this paper, we propose an ANFIS-based model for changing the mixer speed during the kneading process. Although the...

Full text to download in external service

Towards semantic-rich word embeddings

Publication

G. Beringer
M. Jabłoński
P. Januszewski
A. Sobecki
J. Szymański

- Annals of Computer Science and Information Systems - Year 2019

In recent years, word embeddings have been shown to improve the performance in NLP tasks such as syntactic parsing or sentiment analysis. While useful, they are problematic in representing ambiguous words with multiple meanings, since they keep a single representation for each word in the vocabulary. Constructing separate embeddings for meanings of ambiguous words could be useful for solving the Word Sense Disambiguation (WSD)...

Full text available to download

Wikipedia Articles Representation with Matrix'u

Publication

J. Szymański

- Year 2013

In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

Full text to download in external service

Retrieval with Semantic Sieve

Publication

- Year 2013

The article presents an algorithm we call Semantic Sieve, applied to refining search results in a repository of text documents. The algorithm calculates so-called conceptual directions that enable interaction with the user and allow the set of results to be narrowed to the most relevant ones. We present the system where the algorithm has been implemented. The system also offers in the presentation layer clustering of the results into...

Full text to download in external service

IDENTYFIKACJA POWIĄZAŃ POMIĘDZY KATEGORIAMI WIKIPEDII Z UŻYCIEM MIAR PODOBIEŃSTWA ARTYKUŁÓW

Publication

- Studia Informatica Pomerania - Year 2013

The article describes an approach to identifying links between categories in a repository of text data, based on Wikipedia. By analyzing the similarity between articles, measures were defined that make it possible to identify links between categories that had not previously been taken into account and to assign them weights expressing their degree of importance. An automatic evaluation of the results was carried out with respect to the already existing...

Full text to download in external service

Knowledge Base Suitable for Answering Questions in Natural Language

Publication

- Year 2014

This paper presents three knowledge bases widely used by researchers working on natural language processing: OpenCyc, DBpedia and YAGO. They are characterized from the point of view of a question answering system. A short description of the implementation of such a system is also presented.

Full text to download in external service

How Specific Can We Be with k-NN Classifier?

Publication

- Year 2014

This paper discusses the possibility of designing a two-stage classifier for the large-scale hierarchical and multilabel text classification task that will be a compromise between two common approaches to this task. The first is called big-bang, where there is only one classifier that aims to do the whole job at once. The top-down approach is the second popular option, in which at each node of the category hierarchy there is a flat classifier...

Full text to download in external service

Automatic Classification of Polish Sign Language Words

Publication

- Przegląd Elektrotechniczny - Year 2014

In the article we present an approach to automatic recognition of hand gestures using the eGlove device. We present the research results of a system for detection and classification of static and dynamic words of the Polish language. The results indicate that using eGlove yields good recognition quality, which can be further improved with additional data sources such as RGB cameras.

Full text available to download

Text Categorization Improvement via User Interaction

Publication

J. Atroszko
J. Szymański
D. Gil
H. Mora

- Year 2018

In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text to download in external service

KEYSTONE WG2: Activities and Results Overview on Keyword Search

Publication

J. Szymański
E. Demidova

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2018

In this chapter we summarize activities and results achieved by the Keyword Search Working Group (WG2) of the KEYSTONE Cost Action IC1302. We present the goals of the WG2 and its main activities in the course of the action, and provide a summary of the selected publications related to the WG2 goals and co-authored by WG2 members. We conclude with a summary of open research directions in the area of keyword search for structured data.

Full text to download in external service

Towards automatic classification of Wikipedia content

Publication

J. Szymański

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2010

The article describes an approach to automatic classification of Wikipedia articles. Text representations based on document content and on mutual links are analyzed. Results of applying an SVM classifier are presented.

dr hab. inż. Julian Szymański
