Julian Szymański - Publications - MOST Wiedzy

Total: 132

Publication Catalog

Year 2024
Year 2023
  • Previous Opinions is All You Need - Legal Information Retrieval System
    Publication

    - Year 2023

    We present a system for retrieving the legal opinions most relevant to a given legal case or question. To this end, we checked several state-of-the-art neural language models. As training and testing data, we use tens of thousands of legal cases as question-opinion pairs. Text data has been subjected to advanced pre-processing adapted to the specifics of the legal domain. We empirically chose the BERT-based HerBERT model to perform...

    Full text available for download from an external service

  • Optimization of Bread Production Using Neuro-Fuzzy Modelling

    Automation of food production is an actively researched domain. One of the areas where automation is still not progressing significantly is bread making. The process still relies on expert knowledge regarding how to react to procedure changes depending on environmental conditions, quality of the ingredients, etc. In this paper, we propose an ANFIS-based model for changing the mixer speed during the kneading process. Although the...

    Full text available for download from an external service

  • From Scores to Predictions in Multi-Label Classification: Neural Thresholding Strategies

    In this paper, we propose a novel approach for obtaining predictions from per-class scores to improve the accuracy of multi-label classification systems. In a multi-label classification task, the expected output is a set of predicted labels per each testing sample. Typically, these predictions are calculated by implicit or explicit thresholding of per-class real-valued scores: classes with scores exceeding a given threshold value...

    Full text available for download in the portal

  • Application of a stochastic compartmental model to approach the spread of environmental events with climatic bias
    Publication

    - Ecological Informatics - Year 2023

    Wildfires have significant impacts on both environment and economy, so understanding their behaviour is crucial for the planning and allocation of firefighting resources. Since forest fire management is of great concern, there has been an increasing demand for computationally efficient and accurate prediction models. In order to address this challenge, this work proposes applying a parameterised stochastic model to study the propagation...

    Full text available for download in the portal

  • A Formal Approach to Model the Expansion of Natural Events: The Case of Infectious Diseases
    Publication
    • M. Teresa Signes-Pont
    • J. Boters Pitarch
    • J. Szymański
    • H. Mora-Mora

    - Parallel Processing Letters - Year 2023

    A formal approach to modeling the expansion of natural events is presented in this paper. Since the mathematical, statistical or computational methods used are not relevant for development, a modular framework is carried out that guides from the external observation down to the innermost level of the variables that have to appear in the future mathematical-computational formalization. As an example we analyze the expansion of Covid-19....

    Full text available for download from an external service

Year 2022
Year 2021
  • Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

    To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

    Full text available for download in the portal

  • Text Generation Using Transformer Networks (Generowanie tekstu z użyciem sieci typu Transformer)
    Publication

    The paper describes the operation of selected machine learning models used in natural language processing, in particular those used for text generation. It also presents the BERT model and its variants, as well as practical uses of Transformer models. Their operation is demonstrated in an application that changes the mood of a text sequentially.

    Full text available for download from an external service

  • Fast Approximate String Search for Wikification
    Publication

    The paper presents a novel method for fast approximate string search based on neural distance metrics embeddings. Our research is focused primarily on applying the proposed method for entity retrieval in the Wikification process, which is similar to edit distance-based similarity search on the typical dictionary. The proposed method has been compared with symmetric delete spelling correction algorithm and proven to be more efficient...

    Full text available for download in the portal

  • Embedded Representations of Wikipedia Categories
    Publication

    - Year 2021

    In this paper, we present an approach to building neural representations of the Wikipedia category graph. We test four different methods and examine the neural embeddings in terms of preservation of graph edges, neighborhood coverage in representation space, and their influence on the results of a task of predicting the parent of two categories. The main contribution of this paper is the application of neural representations for improving the...

    Full text available for download from an external service

  • Buzz-based honeybee colony fingerprint

    Non-intrusive remote monitoring has its applications in a variety of areas. In industrial surveillance, devices are capable of detecting anomalies that may threaten machine operation. Similarly, agricultural monitoring devices are used to supervise livestock or provide higher yields. Modern IoT devices are often coupled with Machine Learning models, which provide valuable insights into device operation. However, the data...

    Full text available for download in the portal

  • Blockchain technologies to address smart city and society challenges
    Publication

    - COMPUTERS IN HUMAN BEHAVIOR - Year 2021

    New Information and Communications Technologies (ICT) are changing the way in which the world works. These technologies provide new tools to face the issues of contemporary society (poverty, migrations, sustainable development challenges, governance, etc.). Among them, blockchain emerges as a disruptive technology able to do things in a completely different and innovative way. It can provide solutions where before there were...

    Full text available for download from an external service

Year 2020
  • Weighted Clustering for Bees Detection on Video Images
    Publication

    This work describes a bee detection system to monitor bee colony conditions. The detection process on video images has been divided into 3 stages: determining the regions of interest (ROI) for a given frame, scanning the frame in ROI areas using the DNN-CNN classifier, in order to obtain a confidence of bee occurrence in each window in any position and any scale, and forming one detection window from a cloud of windows provided by...

    Full text available for download in the portal

  • Towards Extending Wikipedia with Bidirectional Links

    In this paper, we present the results of our WikiLinks project, which aims at extending current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. However, there were several attempts to introduce bidirectional...

    Full text available for download in the portal

  • Smart Services for Improving eCommerce
    Publication

    - Year 2020

    The level of customer support provided by the existing eCommerce solutions assumes that the person using the functionality of the shop has sufficient knowledge to decide on the purchase transaction. A low conversion rate indicates that customers are more likely to seek knowledge about the particular product than finalize the transaction. This is facilitated by the continuous development of customers’ digital...

  • Practical I-Voting on Stellar Blockchain
    Publication

    In this paper, we propose a privacy-preserving i-voting system based on the public Stellar Blockchain network. We argue that the proposed system satisfies all requirements stated for a robust i-voting system including transparency, verifiability, and voter anonymity. The practical architecture of the system abstracts a voter from blockchain technology used underneath. To keep user privacy, we propose a privacy-first protocol that...

    Full text available for download in the portal

  • NLP Questions Answering Using DBpedia and YAGO

    In this paper, we present the results of employing DBpedia and YAGO as lexical databases for answering questions formulated in natural language. The proposed solution has been evaluated for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for the TREC conference). Our method uses dependency trees generated from the user query. The trees are browsed for paths leading from the root of the tree to the question...

    Full text available for download in the portal

  • Framework for Integration Decentralized and Untrusted Multi-vendor IoMT Environments
    Publication

    - IEEE Access - Year 2020

    Lack of standardization is highly visible while we use historical data sets or compare our model with others that use IoMT devices from different vendors. The problem also concerns the trust in highly decentralized and anonymous environments where sensitive data are transferred through the Internet and then are analyzed by third-party companies. In our research we propose a standard that has been implemented in the form of framework...

    Full text available for download in the portal

  • Collaborative Data Acquisition and Learning Support

    With the constant development of neural networks, traditional algorithms relying on data structures lose their significance as more and more solutions are using AI rather than traditional algorithms. This in turn requires a lot of correctly annotated and informative data samples. In this paper, we propose a crowdsourcing based approach for data acquisition and tagging with support for Active Learning where the system acts as an...

    Full text available for download in the portal

  • Buzz-based recognition of the honeybee colony circadian rhythm

    Honeybees are among the most highly valued pollinators. Their work as individuals is appreciated for crop pollination and honey production. It is believed that the work of an entire bee colony is intense and almost continuous. The goal of the work presented in this paper is the identification of the bees’ circadian rhythm using sound-based analysis. In our research, as a source of information on the bee colony, we use their buzz that has been...

    Full text available for download in the portal

  • Bidirectional Fragment to Fragment Links in Wikipedia

    The paper presents a WikiLinks system that extends the Wikipedia linkage model with bidirectional links between fragments of the articles and overlapping links’ anchors. The proposed model adopts some ideas from the research conducted in a field of nonlinear, computer-aided writing, often called a hypertext. WikiLinks may be considered as a web augmentation tool but it presents a new approach to the problem that addresses the specific...

    Full text available for download in the portal

Year 2019
  • Towards semantic-rich word embeddings
    Publication

    - Annals of Computer Science and Information Systems - Year 2019

    In recent years, word embeddings have been shown to improve the performance in NLP tasks such as syntactic parsing or sentiment analysis. While useful, they are problematic in representing ambiguous words with multiple meanings, since they keep a single representation for each word in the vocabulary. Constructing separate embeddings for meanings of ambiguous words could be useful for solving the Word Sense Disambiguation (WSD)...

    Full text available for download in the portal

  • Towards bees detection on images: study of different color models for neural networks
    Publication

    This paper presents an approach to bee detection in video streams using a neural network classifier. We describe the motivation for our research and the methodology of data acquisition. The main contribution of this work is a comparison of different color models used as an input format for a feedforward convolutional architecture applied to bee detection. The detection process is based on a neural...

  • Review on Wikification methods
    Publication

    - AI COMMUNICATIONS - Year 2019

    The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of the documents. Also, it can be considered as automatic tagging of the text. We describe typical...

    Full text available for download from an external service

  • Review of the Complexity of Managing Big Data of the Internet of Things
    Publication

    - COMPLEXITY - Year 2019

    There is a growing awareness that the complexity of managing Big Data is one of the main challenges in the developing field of the Internet of Things (IoT). Complexity arises from several aspects of the Big Data life cycle, such as gathering data, storing them onto cloud servers, cleaning and integrating the data, a process involving the latest advances in ontologies, such as Extensible Markup Language (XML) and Resource Description...

    Full text available for download in the portal

  • Exact-match Based Wikipedia-WordNet Integration

    Ability to link between WordNet synsets and Wikipedia articles allows usage of those resources by computers during natural language processing. A lot of work was done in this field, however most of the approaches focus on similarity between Wikipedia articles and WordNet synsets rather than creation of perfect matches. In this paper we proposed a set of methods for automatic perfect matching generation. The proposed methods were...

    Full text available for download in the portal

  • Distributed Architectures for Intensive Urban Computing: A Case Study on Smart Lighting for Sustainable Cities
    Publication

    - IEEE Access - Year 2019

    New information and communication technologies have contributed to the development of the smart city concept. On a physical level, this paradigm is characterised by deploying a substantial number of different devices that can sense their surroundings and generate a large amount of data. The most typical case is image and video acquisition sensors. Recently, these types of sensors are found in abundance in urban spaces and are responsible...

    Full text available for download in the portal

  • Deep learning in the fog

    In the era of a ubiquitous Internet of Things and fast artificial intelligence advance, especially thanks to deep learning networks and hardware acceleration, we face rapid growth of highly decentralized and intelligent solutions that offer functionality of data processing closer to the end user. Internet of Things usually produces a huge amount of data that to be effectively analyzed, especially with neural networks, demands high...

    Full text available for download in the portal

  • Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia

    The paper presents an approach to build references (also called mappings) between WordNet and Wikipedia. We propose four algorithms used for automatic construction of the references. Then, based on an aggregation algorithm, we produce an initial set of mappings that has been evaluated in a cooperative way. For that purpose, we implement a system for the distribution of evaluation tasks, that have been solved by the user community....

    Full text available for download in the portal

  • Bees Detection on Images: Study of Different Color Models for Neural Networks
    Publication

    This paper presents an approach to bee detection in video streams using a neural network classifier. We describe the motivation for our research and the methodology of data acquisition. The main contribution of this work is a comparison of different color models used as an input format for a feedforward convolutional architecture applied to bee detection. The detection process is based on a neural binary classifier that classifies...

    Full text available for download in the portal

  • An Analysis of Neural Word Representations for Wikipedia Articles Classification
    Publication

    - CYBERNETICS AND SYSTEMS - Year 2019

    One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

    Full text available for download from an external service

  • Advances in Architectures, Big Data, and Machine Learning Techniques for Complex Internet of Things Systems
    Publication

    - COMPLEXITY - Year 2019

    The field of Big Data is rapidly developing with a lot of ongoing research, which will likely continue to expand in the future. A crucial part of this is Knowledge Discovery from Data (KDD), also known as the Knowledge Discovery Process (KDP). This process is a very complex procedure, and for that reason it is essential to divide it into several steps (Figure 1). Some authors use five steps to describe this procedure, whereas others...

    Full text available for download in the portal

Year 2018
  • Text Categorization Improvement via User Interaction
    Publication

    - Year 2018

    In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

    Full text available for download from an external service

  • RDF dataset profiling - a survey of features, methods, vocabularies and applications
    Publication
    • M. B. Ellefi
    • B. Zohra
    • J. G. Breslin
    • E. Demidova
    • S. Dietze
    • K. Todorov
    • J. Szymański

    - Semantic Web - Year 2018

    The Web of Data, and in particular Linked Data, has seen tremendous growth over the past years. However, reuse and take-up of these rich data sources is often limited and focused on a few well-known and established RDF datasets. This can be partially attributed to the lack of reliable and up-to-date information about the characteristics of available datasets. While RDF datasets vary heavily with respect to the features related...

  • Modelling the malware propagation in mobile computer devices
    Publication

    - COMPUTERS & SECURITY - Year 2018

    Nowadays malware is a major threat to the security of cyber activities. The rapid development of the Internet and the progressive implementation of the Internet of Things (IoT) increase the security needs of networks. This research presents a theoretical model of malware propagation for mobile computer devices. It is based on the susceptible-exposed-infected-recovered-susceptible (SEIRS) epidemic model. The scheme is based on...

    Full text available for download from an external service

  • KEYSTONE WG2: Activities and Results Overview on Keyword Search
    Publication

    In this chapter we summarize activities and results achieved by the Keyword Search Working Group (WG2) of the KEYSTONE Cost Action IC1302. We present the goals of the WG2, its main activities in the course of the action and provide a summary of the selected publications related to the WG2 goals and co-authored by WG2 members. We conclude with a summary of open research directions in the area of keyword search for structured data.

    Full text available for download from an external service

  • Detection of the Bee Queen Presence Using Sound Analysis
    Publication

    - Year 2018

    This work describes the system and methods of data analysis we use for beehive monitoring. We present an overview of the hardware infrastructures used in hive monitoring systems and we describe algorithms used for analysis of this kind of data. Based on the acquired signals we construct an application that is capable of detecting the absence of a honey bee queen. We describe our method of signal analysis and present results that allow us...

    Full text available for download in the portal

  • DBpedia and YAGO Based System for Answering Questions in Natural Language

    In this paper we propose a method for answering class 1 and class 2 questions (out of 5 classes defined by Moldovan for TREC conference) based on DBpedia and YAGO. Our method is based on generating dependency trees for the query. In the dependency tree we look for paths leading from the root to the named entity of interest. These paths (referenced further as fibers) are candidates for representation of actual user intention. The...

    Full text available for download in the portal

Year 2017
Year 2016
Year 2015
  • Two Stage SVM and kNN Text Documents Classifier
    Publication

    - Year 2015

    The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

  • Simulation of parallel similarity measure computations for large data sets

    The paper presents our approach to implementation of similarity measure for big data analysis in a parallel environment. We describe the algorithm for parallelisation of the computations. We provide results from a real MPI application for computations of similarity measures as well as results achieved with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components...

    Full text available for download from an external service

  • Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives
    Publication
    • T. Souza
    • E. Demidova
    • T. Risse
    • H. Holzmann
    • G. Gossen
    • J. Szymański

    - Year 2015

    Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are...

    Full text available for download from an external service

  • Retrieval of Heterogeneous Services in C2NIWA Repository
    Publication

    The paper reviews the methods used for retrieval of information and services. The selected approaches presented in the review inspired us to build retrieval mechanisms in a system for searching the resources stored in the C2NIWA repository. We describe the architecture of the system, its functions and the surrounding subsystems to which it is related. For retrieval of C2NIWA services we propose three approaches based on: keyword...

    Full text available for download in the portal

  • Information Retrieval in Wikipedia with Conceptual Directions
    Publication

    - Year 2015

    The paper describes our algorithm used for retrieval of textual information from Wikipedia. The experiments show that the algorithm improves typical evaluation measures of retrieval quality. The improvement of the retrieval results was achieved by a two-phase approach. In the first phase, the algorithm extends the set of content that has been indexed by the specified keywords and thus increases the Recall value. Then, using the...

    Full text available for download from an external service

  • Improving Effectiveness of SVM Classifier for Large Scale Data

    The paper presents our approach to SVM implementation in a parallel environment. We describe how the classification learning and prediction phases were parallelised. We also propose a method for limiting the number of necessary computations during classifier construction. Our method, named one-vs-near, is an extension of the typical one-vs-all approach that is used for binary classifiers to work with multiclass problems. We perform experiments...

    Full text available for download from an external service

  • Improving css-KNN Classification Performance by Shifts in Training Data
    Publication

    - Year 2015

    This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects, that would be optimized during off-line learning. In this paper we propose...

  • DBpedia As a Formal Knowledge Base – An Evaluation

    DBpedia is widely used by researchers as a means of accessing Wikipedia in a standardized way. In this paper it is characterized from the point of view of a question answering system. A simple implementation of such a system is also presented. The paper also characterizes alternatives to DBpedia in the form of the OpenCyc and YAGO knowledge bases. A comparison between DBpedia and those knowledge bases is presented.

    Full text available for download in the portal

Year 2014
  • Towards Increasing Density of Relations in Category Graphs
    Publication

    In the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on Bag-of-Words (BOW) representation of Wikipedia articles. Using the similarity of articles belonging to different categories allows us to calculate the similarity of the categories. The second method is based on average scores given to categories while categorizing documents by our dedicated score-based...

    Full text available for download from an external service

  • Knowledge Base Suitable for Answering Questions in Natural Language

    This paper presents three knowledge bases widely used by researchers working on natural language processing: OpenCyc, DBpedia and YAGO. They are characterized from the point of view of a question answering system. A short description of the implementation of such a system is also presented.

    Full text available for download from an external service

  • How Specific Can We Be with k-NN Classifier?
    Publication

    This paper discusses the possibility of designing a two-stage classifier for the large-scale hierarchical and multilabel text classification task that will be a compromise between two common approaches to this task. The first is called big-bang, where there is only one classifier that aims to do the whole job at once. The top-down approach is the second popular option, in which at each node of the categories’ hierarchy, there is a flat classifier...

    Full text available for download from an external service

  • Evaluation of Path Based Methods for Conceptual Representation of the Text
    Publication

    Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

    Full text available for download from an external service

  • Comparative Analysis of Text Representation Methods Using Classification
    Publication

    In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

    Full text available for download from an external service

  • Big Data Paradigm Developed in Volunteer Grid System with Genetic Programming Scheduler

    Artificial intelligence techniques are capable of handling a large amount of information collected over the web. In this paper, the big data paradigm has been studied in a volunteer and grid system called Comcute that is optimized by a genetic programming scheduler. This scheduler can optimize load balancing and resource cost. A genetic programming optimizer has been applied for finding the Pareto solutions. Finally, some results from numerical...

    Full text available for download from an external service

  • Automatic Classification of Polish Sign Language Words

    In the article we present an approach to automatic recognition of hand gestures using the eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate that using eGlove yields good recognition quality, which can be further improved using additional data sources such as RGB cameras.

    Full text available for download in the portal

Year 2013
  • Wikipedia Articles Representation with Matrix'u
    Publication

    - Year 2013

    In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

    Full text available for download from an external service

  • Thresholding Strategies for Large Scale Multi-Label Text Classifier
    Publication

    This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

    Full text available for download from an external service

  • Selection of Relevant Features for Text Classification with K-NN

    In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared according to the text benchmarks based on Wikipedia. For each method we present the results of classification quality obtained on the test datasets using a K-NN based approach. A main advantage of evaluated...

    Full text available for download from an external service

  • Selecting Features with SVM
    Publication

    A common problem with feature selection is to establish the minimum number of features that should be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. Amount...

    Full text available for download from an external service

  • Retrieval with Semantic Sieve
    Publication

    The article presents an algorithm we call Semantic Sieve, applied to refining search results in a repository of text documents. The algorithm calculates so-called conceptual directions that enable interaction with the user and allow the set of results to be narrowed to the most relevant ones. We present the system in which the algorithm has been implemented. The system also offers in the presentation layer clustering of the results into...

    Full text available for download from an external service

  • Parallel Computations of Text Similarities for Categorization Task
    Publication

    - Year 2013

    In this chapter we describe an approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles which, together with their hyperlinks, form a graph used in our experiments. The similarities based on Euclidean distance and the cosine measure have been used to process the data using the k-means algorithm....

  • Interactive Information Search in Text Data Collections
    Publication

    This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test it. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used to evaluate the results. The achieved results...

    Full text available for download from an external service

  • Improvement of Imperfect String Matching Based on Asymmetric n-Grams
    Publication

    Typical approaches to string comparison treat strings as either different or identical, without taking into account the possibility that a word is misspelled. In this article we present an approach to improving imperfect string matching that allows one to reconstruct potential string distortions. The proposed method increases the quality of imperfect string matching, allowing the lookup of misspelled words without significant...

    Full text available for download in the portal

  • IDENTYFIKACJA POWIĄZAŃ POMIĘDZY KATEGORIAMI WIKIPEDII Z UŻYCIEM MIAR PODOBIEŃSTWA ARTYKUŁÓW [Identifying Links Between Wikipedia Categories Using Article Similarity Measures]

    The article describes an approach to identifying links between categories in a text data repository, based on Wikipedia. By analyzing the similarity between articles, measures were defined that make it possible to identify links between categories that had not previously been taken into account and to assign them weights reflecting their degree of significance. The obtained results were evaluated automatically against the already existing...

    Full text available for download from an external service

  • Bringing Common Sense to WordNet with a Word Game
    Publication

    We present a tool for common sense knowledge acquisition in the form of a twenty questions game. The described approach uses the WordNet dictionary, whose rich taxonomy helps maintain cognitive economy and accelerates knowledge propagation, although inferences made on hierarchical relations sometimes result in noise. We extend the dictionary with common sense assertions acquired during games played with humans. The facts added to the...

    Full text available for download from an external service

Year 2012
  • Zastosowanie systemu Comcute do łamania algorytmu DES [Applying the Comcute System to Breaking the DES Algorithm]
    Publication

    - Year 2012

    The application of the Comcute system to breaking the DES cipher is presented. The basic architecture used to distribute the computations is described, and the scalability of the solution is reported as a function of the number of computing units used.

    Full text available for download from an external service

  • Words context analysis for improvement of information retrieval
    Publication

    - Year 2012

    In this article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing the English Wikipedia with the Hyperspace Analogue to Language model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with the context vectors of these queries. The results indicate that the proposed method can not...

  • Towards Effective Processing of Large Text Collections
    Publication

    In this article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and the k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles connected by links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...

  • Text classifiers for automatic articles categorization
    Publication

    The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles automatically classified into the categories assigned by Wikipedia editors.

  • Self Organizing Maps for Visualization of Categories
    Publication

    - Year 2012

    Visualization of Wikipedia categories using Self Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures.

  • Rozpraszanie obliczeń za pomocą serwerów dystrybucyjnych [Distributing Computations Using Distribution Servers]

    The principles of operation of distribution servers in a grid-class computing system working in volunteer computing mode are discussed. Ways of increasing the performance of this system layer by managing the stream of data packages are described. The Map-Reduce concept in the implementation of parallel processing is also addressed.

    Full text available for download from an external service

  • Matching Exception Class Hierarchies between .NET, Java Environments
    Publication

    The paper presents a methodology for exception classification and for matching exception messages between .NET and Java environments. The methodology operates on existing exception class hierarchies and proposes two complementary approaches: automated and manual matching. The automated matching uses a similarity measure to find associations between exception messages from the two sets of classes for the considered programming languages....

  • Interactive Information Retrieval Algorithm for Wikipedia Articles
    Publication

    - Year 2012

    The article presents an algorithm for retrieving textual information in document collections. The algorithm employs a category system that organizes the repository and improves search precision through interaction with the user. The algorithm was implemented for the Simple English Wikipedia, and the first evaluation results indicate that the proposed method can help to retrieve information from large document repositories.

  • Context Search Algorithm for Lexical Knowledge Acquisition
    Publication

    - CONTROL AND CYBERNETICS - Year 2012

    A Context Search algorithm used for lexical knowledge acquisition is presented. Knowledge representation based on psycholinguistic theories of cognitive processes allows for the implementation of a computational model of semantic memory in the form of a semantic network. Knowledge acquisition using supervised dialog templates has been performed in a word game designed to guess the concept a human user is thinking about. The game,...

  • Collaborative approach to WordNet and Wikipedia integration
    Publication

    In this article we present a collaborative approach to creating mappings between WordNet and Wikipedia. Wikipedia articles have first been matched with WordNet synsets automatically. These associations have then been evaluated and complemented collaboratively using a web application. We describe the algorithms used for creating the automatic mappings as well as a system for their collaborative development. The outcome enables further...

  • Annotating Words Using WordNet Semantic Glosses
    Publication

    - Year 2012

    An approach to word sense disambiguation (WSD) relying on WordNet synsets is proposed. The method uses semantically tagged glosses to perform a process similar to spreading activation in a semantic network, creating a ranking of the most probable meanings for word annotation. Preliminary evaluation shows quite promising results. Comparison with state-of-the-art WSD methods indicates that the use of WordNet relations...

  • Adaptive Algorithm for Interactive Question-based Search
    Publication

    - Year 2012

    Popular web search engines tend to improve the relevance of their result pages, but the search is still keyword-oriented and far from "understanding" the meaning of queries. In this article we propose an interactive question-based search algorithm that may prove helpful for identifying users' intents. We describe the algorithm implemented in the form of a question game. The stress is put mainly on the most critical aspect of this...

Year 2011
  • Wizualizacja struktury Wikipedii do wspomagania wyszukiwania informacji [Visualization of the Wikipedia Structure to Support Information Retrieval]
    Publication

    - Year 2011

    Graphical presentation is an effective way of improving user interaction with a knowledge repository. It allows complex structures to be presented clearly and captures dependencies that are not directly visible. Applying this approach to information retrieval allows data to be presented at a high level of abstraction while also defining their context, which directly affects the quality of access...

  • Self-Organizing Map representation for clustering Wikipedia search results

    The article presents an approach to the automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of the raw data using Principal...

  • Self–Organizing Map representation for clustering Wikipedia search results
    Publication

    - Year 2011

    The article presents an approach to the automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To ensure efficient data processing, we performed linear dimensionality reduction of the raw data using Principal...

    Full text available for download from an external service

  • Security ontology construction and integration
    Publication

    - Year 2011

    Security can be examined on many different levels. Each differs from the others, and all of them depend on the context. Hence the need for additional knowledge that enables computers to use this knowledge efficiently. Such information can be provided by ontologies. The paper presents the gathered requirements that need to be taken into account when creating an ontology. The method of ontology creation...

  • Management of Textual Data at Conceptual Level
    Publication

    - Year 2011

    The article presents an approach to managing a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that organizes the articles returned as search results for a given query. The constructed clusters allow...

  • Interaktywne wyszukiwanie informacji w repozytoriach danych tekstowych [Interactive Information Retrieval in Text Data Repositories]

    The article presents the architecture and design of a system whose purpose is to enable building a platform for indexing large text collections and searching them using original algorithms based on information gain and interactive communication with the user. The effectiveness of the applied algorithms was evaluated in terms of both clustering and the convergence of the algorithm...
