Search results for: WIKIPEDIA - Bridge of Knowledge

  • Embedded Representations of Wikipedia Categories

    Publication

    - Year 2021

    In this paper, we present an approach to building neural representations of the Wikipedia category graph. We test four different methods and examine the neural embeddings in terms of preservation of graph edges, neighborhood coverage in representation space, and their influence on the results of a task of predicting the parent of two categories. The main contribution of this paper is the application of neural representations for improving the...

    Full text to download in external service
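The edge-preservation evaluation mentioned above can be illustrated with a minimal sketch: count how many category-graph edges connect nodes that are also near neighbours in the embedding space. The toy embeddings, category names, and the choices of cosine similarity and of k are illustrative assumptions, not the paper's setup.

```python
import math

def cosine(u, v):
    # Plain cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def edge_preservation(embeddings, edges, k=2):
    """Fraction of graph edges whose target endpoint is among the
    k nearest embedding-space neighbours of the source endpoint."""
    preserved = 0
    for a, b in edges:
        neighbours = sorted(
            (c for c in embeddings if c != a),
            key=lambda c: cosine(embeddings[a], embeddings[c]),
            reverse=True,
        )
        if b in neighbours[:k]:
            preserved += 1
    return preserved / len(edges)

# Toy category embeddings (hypothetical 2-d vectors).
emb = {"Science": [1.0, 0.1], "Physics": [0.9, 0.2],
       "Art": [0.0, 1.0], "Painting": [0.1, 0.9]}
edges = [("Science", "Physics"), ("Art", "Painting")]
print(edge_preservation(emb, edges))  # both edges preserved -> 1.0
```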

  • Wikipedia Articles Representation with Matrix'u

    Publication

    - Year 2013

    In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstructing human-made categories.

    Full text to download in external service
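As background, a bag-of-words representation of the kind evaluated here is typically TF-IDF weighted before being fed to an SVM. A minimal stdlib sketch on toy documents (not the Matrix'u pipeline):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Build TF-IDF vectors (as dicts) for a list of tokenised documents."""
    n = len(docs)
    df = Counter()              # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = [["wikipedia", "article", "category"],
        ["wikipedia", "link", "graph"],
        ["category", "graph", "structure"]]
vecs = tf_idf(docs)
# "wikipedia" appears in 2 of 3 docs -> low weight; "article" in 1 -> high
print(vecs[0]["article"] > vecs[0]["wikipedia"])  # True
```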

  • Towards Extending Wikipedia with Bidirectional Links

    Publication

    In this paper, we present the results of our WikiLinks project, which aims at extending current Wikipedia linkage mechanisms. Wikipedia has recently become one of the most important information sources on the Internet, yet it is still based on relatively simple linkage facilities. The WikiLinks system extends Wikipedia with bidirectional links between fragments of articles. However, there have been several attempts to introduce bidirectional...

    Full text available to download

  • Bidirectional Fragment to Fragment Links in Wikipedia

    Publication

    The paper presents the WikiLinks system, which extends the Wikipedia linkage model with bidirectional links between fragments of articles and overlapping link anchors. The proposed model adopts some ideas from research conducted in the field of nonlinear, computer-aided writing, often called hypertext. WikiLinks may be considered a web augmentation tool, but it presents a new approach to the problem that addresses the specific...

    Full text available to download

  • Information Retrieval in Wikipedia with Conceptual Directions

    Publication

    - Year 2015

    The paper describes our algorithm for retrieval of textual information from Wikipedia. The experiments show that the algorithm improves typical evaluation measures of retrieval quality. The improvement was achieved with a two-phase approach. In the first phase, the algorithm extends the set of content indexed by the specified keywords and thus increases the Recall value. Then, using the...

    Full text to download in external service
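The two-phase idea, widening the candidate set first (recall) and then ranking against the original query (precision), can be sketched as follows. The toy index and the term-expansion table are hypothetical, not the paper's data.

```python
def two_phase_search(query, docs, related, top_k=2):
    """docs: {doc_id: set_of_terms}; related: term -> related terms.
    Phase 1 widens the candidate set with related terms; phase 2
    ranks candidates by overlap with the ORIGINAL query."""
    expanded = set(query)
    for t in query:
        expanded |= set(related.get(t, []))
    # recall phase: keep any document containing any expanded term
    candidates = [d for d, terms in docs.items() if terms & expanded]
    # precision phase: rank by overlap with the original query terms
    candidates.sort(key=lambda d: len(docs[d] & set(query)), reverse=True)
    return candidates[:top_k]

docs = {"d1": {"wikipedia", "search"},
        "d2": {"encyclopedia", "retrieval"},
        "d3": {"cooking"}}
related = {"wikipedia": ["encyclopedia"], "search": ["retrieval"]}
print(two_phase_search(["wikipedia", "search"], docs, related))
# d2 is only reachable through expansion, so it ranks below d1
```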

  • Collaborative approach to WordNet and Wikipedia integration

    Publication

    In this article we present a collaborative approach to creating mappings between WordNet and Wikipedia. Wikipedia articles have first been matched with WordNet synsets in an automatic way. Then such associations have been evaluated and complemented in a collaborative way using a web application. We describe the algorithms used for creating automatic mappings as well as a system for their collaborative development. The outcome enables further...

  • Categorization of Wikipedia articles with spectral clustering

    Abstract. The article reports the application of clustering algorithms for creating hierarchical groups within Wikipedia articles. We evaluate three spectral clustering algorithms on datasets constructed using Wikipedia categories. The selected algorithm has been implemented in a system that categorizes Wikipedia search results on the fly.
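As a sketch of the spectral approach in general (not the three specific algorithms evaluated in the article), a graph can be bipartitioned by the sign of the Fiedler vector of its Laplacian. The toy adjacency matrix below, two article cliques joined by one link, is an illustrative assumption.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split a graph in two using the Fiedler vector of the
    unnormalised Laplacian L = D - A."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))
    L = D - A
    _, eigvecs = np.linalg.eigh(L)      # eigenvectors, ascending eigenvalues
    fiedler = eigvecs[:, 1]             # second-smallest eigenvalue
    return (fiedler > 0).astype(int)    # sign gives the two clusters

# Two article triangles (0-1-2 and 3-4-5) bridged by the edge 2-3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
labels = spectral_bipartition(A)
print(labels[:3], labels[3:])  # first triangle vs second triangle
```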

  • Towards automatic classification of Wikipedia content

    The article describes an approach to automatic classification of Wikipedia articles. Text representations based on document content and mutual links were analyzed. The results of applying an SVM classifier are presented.

  • Mining relations between wikipedia categories

    Methods for inducing relations between categories organizing a document collection are described. The results of applying the proposed approach to improving the Wikipedia category system are presented.

  • Exact-match Based Wikipedia-WordNet Integration

    Publication

    The ability to link WordNet synsets with Wikipedia articles allows computers to use both resources during natural language processing. Much work has been done in this field; however, most approaches focus on similarity between Wikipedia articles and WordNet synsets rather than on the creation of perfect matches. In this paper we propose a set of methods for automatic generation of perfect matches. The proposed methods were...

    Full text available to download

  • Interactive Information Retrieval Algorithm for Wikipedia Articles

    Publication

    - Year 2012

    The article presents an algorithm for retrieving textual information from a document collection. The algorithm employs a category system that organizes the repository and improves search precision through interaction with the user. The algorithm was implemented for Simple English Wikipedia, and the first evaluation results indicate that the proposed method can help to retrieve information from large document repositories.

  • Exact-match Based Wikipedia-WordNet Integration

    Publication

    - Year 2019

    Full text to download in external service

  • Wordventure - Developing WordNet in Wikipedia-like Style

    Publication

    - Year 2010

    The article describes an approach for building WordNet semantic dictionary in a collaborative way. The idea of gathering lexical data has been proposed, as well as the system for linguistic data acquisition and management.

  • Spectral Clustering Wikipedia Keyword-Based Search Results

    The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of a spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles based on a combination of words and links, used for categorization of search results in this...

    Full text available to download

  • Wikipedia and WordNet integration based on words co-occurrences

    Publication

    - Year 2009

    The article presents a method for automatic integration of two lexical resources: the semantic dictionary WordNet and the electronic encyclopaedia Wikipedia. Our goal is to automatically add a semantic tag - a WordNet synset identifier - to the title of a Wikipedia article. We analyzed several different approaches to this problem and implemented our own solution, based on word occurrences in synset descriptions and the article body...
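The co-occurrence idea can be sketched as picking, for an article, the synset whose description shares the most words with the article body. The glosses and the plain overlap score below are toy assumptions, not the paper's exact method.

```python
def gloss_overlap(article_words, synsets):
    """Pick the WordNet synset whose gloss shares the most words
    with the article body (simple word-overlap scoring)."""
    def score(gloss):
        return len(set(article_words) & set(gloss.split()))
    return max(synsets, key=lambda s: score(synsets[s]))

article = "a tree is a woody plant with a trunk and branches".split()
synsets = {
    "tree.n.01": "a tall perennial woody plant having a main trunk",
    "tree.n.02": "a figure that branches from a single root node",
}
print(gloss_overlap(article, synsets))  # -> tree.n.01
```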

  • Game with a Purpose for Verification of Mappings Between Wikipedia and WordNet

    Publication

    - Year 2017

    The paper presents a Game with a Purpose for verification of automatically generated mappings, focusing on mappings between WordNet synsets and Wikipedia articles. A general description of the idea behind games with a purpose is given. A description of the TGame system, a 2D platform mobile game with the verification process included in the game-play, is provided. Additional mechanisms for anti-cheating, increasing player’s motivation...

    Full text to download in external service

  • An Analysis of Neural Word Representations for Wikipedia Articles Classification

    Publication

    - CYBERNETICS AND SYSTEMS - Year 2019

    One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

    Full text to download in external service

  • Self-Organizing Map representation for clustering Wikipedia search results

    Publication

    - Year 2011

    The article presents an approach to automated organization of textual data. The experiments have been performed on a selected subset of Wikipedia. The term-based Vector Space Model representation has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To warrant efficiency of the data processing, we performed linear dimensionality reduction of the raw data using Principal...

    Full text to download in external service

  • Crowdsourcing-Based Evaluation of Automatic References Between WordNet and Wikipedia

    The paper presents an approach to building references (also called mappings) between WordNet and Wikipedia. We propose four algorithms used for automatic construction of the references. Then, based on an aggregation algorithm, we produce an initial set of mappings that has been evaluated in a cooperative way. For that purpose, we implemented a system for the distribution of evaluation tasks, which have been solved by the user community...

    Full text available to download

  • 0-step K-means for clustering Wikipedia search results

    Publication

    - Year 2011

    This article describes an improvement of the K-means algorithm and its application in the form of a system that clusters search results retrieved from Wikipedia. The proposed algorithm eliminates K-means disadvantages and allows one to create a cluster hierarchy. The main contributions of this paper include the following: (1) the concept of an improved K-means algorithm and its application for hierarchical clustering...
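For reference, the baseline the article improves on, plain k-means, can be sketched as follows. This uses deterministic first-k seeding on toy 2-D points; the paper's 0-step initialisation and hierarchy construction are not reproduced here.

```python
def kmeans(points, k=2, iters=10):
    """Plain k-means with deterministic first-k seeding (a sketch of
    the baseline, not the paper's improved variant)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to the nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # recompute centroid as the coordinate-wise mean
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
groups = kmeans(pts, k=2)
print(sorted(len(g) for g in groups))  # two balanced clusters
```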

  • Relation-based Wikipedia Search System for Factoid Questions Answering

    In this paper we propose an alternative keyword search mechanism for Wikipedia, designed as a prototype solution towards factoid question answering. The method considers relations between articles to find the best matching article. Unlike the standard Wikipedia search engine and the Google engine, which search article content independently, requiring the entire query to be satisfied by a single article, the proposed...

  • Path-based methods on categorical structures for conceptual representation of wikipedia articles

    Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is usage of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

    Full text available to download

  • Automatically created and partially verified Wikipedia - WordNet mappings

    Open Research Data

    Mappings between Wikipedia articles and WordNet synsets. The mappings were obtained automatically using 4 data-processing algorithms. The automatically generated mappings were then subject to verification by a group of volunteers using a crowdsourcing approach through so-called Games with a Purpose. The...

  • TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia

    Open Research Data

    The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for an enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...

  • Przegląd badań na temat Wikipedii oraz z wykorzystaniem Wikipedii jako instrument badawczego

    Publication

    - Year 2020

    In research conducted so far in Poland, Wikipedia has been both a subject of study and a research instrument. Research on Wikipedia itself, and on the social consequences of its use, has been carried out by representatives of the humanities and of the social, economic, and legal sciences. For many researchers (especially in computer science), Wikipedia has been an instrument aiding a variety of analyses and scientific investigations. This article...

    Full text to download in external service

  • Comparative Analysis of Text Representation Methods Using Classification

    Publication

    In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

    Full text to download in external service

  • Wydobywanie wiedzy z Wikipedii

    Publication

    - Year 2022

    Wikipedia is a huge source of encyclopedic knowledge gathered by people and intended for people. In information systems, the counterpart of such a knowledge source is an ontology. This chapter shows how Wikipedia is transformed into an ontology and how to extract from it concepts, their properties, and the relations between them.

  • Elgold: gold standard, multi-genre dataset for named entity recognition and linking

    Open Research Data

    The dataset contains 276 multi-genre texts with marked named entities, which are linked to corresponding Wikipedia articles if available. Each entity was manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.

  • Elgold partial: News

    Open Research Data

    The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...

  • Elgold partial: Job offers

    Open Research Data

    The dataset contains 34 English texts scraped from web portals publishing job offers. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article where possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...

  • Text classifiers for automatic articles categorization

    Publication

    The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.

  • Wordventure - cooperative wordnet editor. Architecture for lexical semantic aquisition

    Publication

    - Year 2009

    This article presents an architecture for acquiring lexical semantics in a collaborative approach paradigm. The system enables functionality for editing semantic networks in a wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It has been used for semantic network presentation and simultaneously provides modification functionality.

  • WordVenture - COOPERATIVE WordNet EDITOR Architecture for Lexical Semantic Acquisition

    Publication

    - Year 2017

    This article presents an architecture for acquiring lexical semantics in a collaborative approach paradigm. The system enables functionality for editing semantic networks in a wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It has been used for semantic network presentation and simultaneously provides modification functionality.

    Full text to download in external service

  • Self Organizing Maps for Visualization of Categories

    Publication

    - Year 2012

    Visualization of Wikipedia categories using Self-Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures.

  • Towards Increasing Density of Relations in Category Graphs

    Publication

    In the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles: the similarity of articles belonging to different categories allows us to calculate information about the similarity of the categories. The second method is based on the average scores given to categories while categorizing documents with our dedicated score-based...

    Full text to download in external service

  • Metody ekstrakcji ustrukturalizowanej treści z Wikipedii

    Publication

    - Year 2022

    Wikipedia has long been a subject of interest for researchers. One area of interest is acquiring knowledge from Wikipedia content, which requires parsing the text of its articles. This chapter presents a comparative analysis of the different options for parsing Wikipedia content, pointing out the problems that parser authors have to face. This helps to explain why the process of extracting knowledge from Wikipedia is difficult.

  • Management of Textual Data at Conceptual Level

    Publication

    - Year 2011

    The article presents an approach to the management of a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that organizes articles returned as search results for a given query. The constructed clusters allow one to...

  • Review on Wikification methods

    Publication

    - AI COMMUNICATIONS - Year 2019

    The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of documents. It can also be considered automatic tagging of the text. We describe typical...

    Full text to download in external service
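The first step of a typical wikifier, spotting mentions via a dictionary of known anchor texts, can be sketched as follows. The anchor dictionary is hypothetical, and no disambiguation or overlap resolution is performed.

```python
def wikify(text, anchor_dict):
    """Annotate text spans that match known Wikipedia anchor texts
    (dictionary lookup only; checks bigrams, then unigrams)."""
    tokens = text.split()
    annotations = []
    for n in (2, 1):  # check bigrams first, then unigrams
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in anchor_dict:
                annotations.append((span, anchor_dict[span]))
    return annotations

anchors = {"neural network": "Artificial_neural_network",
           "wikipedia": "Wikipedia"}
print(wikify("A neural network trained on Wikipedia text", anchors))
```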

  • Evaluation of Path Based Methods for Conceptual Representation of the Text

    Publication

    Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

    Full text to download in external service

  • Towards Facts Extraction From Texts in Polish Language

    The Polish language differs from English in many ways; it has more complicated conjugation and declension, which makes automatic fact extraction from texts difficult. In this paper we present the basic differences between these languages. The paper presents an algorithm for the extraction of facts from articles of the Polish Wikipedia. The algorithm is based on seven proposed fact schemas that are searched for in the analyzed text...

    Full text to download in external service

  • Dynamic Semantic Visual Information Management

    Publication

    - Year 2010

    Dominant Internet search engines use keywords and are therefore not suited for the exploration of new domains of knowledge, where the user does not know the specific vocabulary. When browsing through articles in a large encyclopedia, each presenting a small fragment of knowledge, it is hard to map the whole domain and see the relevant concepts and their relations. In Wikipedia, for example, some highly relevant articles are not linked with each other...

    Full text to download in external service

  • Selecting Features with SVM

    Publication

    A common problem with feature selection is to establish how many features must at least be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. The amount...

    Full text to download in external service
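The tilt criterion can be sketched as the angle between the SVM weight vector trained on all features and one retrained on the kept subset, compared over the shared coordinates. The weight values below are invented for illustration, not taken from the paper.

```python
import math

def tilt_angle(w_full, w_reduced, kept):
    """Angle (degrees) between the original hyperplane normal and the
    normal retrained on a feature subset, restricted to kept coords."""
    u = [w_full[i] for i in kept]
    v = list(w_reduced)
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # clamp for floating-point safety before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))

w_full = [0.9, 0.8, 0.01, 0.02]   # two strong, two weak features
w_reduced = [0.9, 0.8]            # retrained on the two strong ones
print(round(tilt_angle(w_full, w_reduced, kept=[0, 1]), 2))  # small tilt
```

Dropping only weak features keeps the hyperplane nearly untilted, which is the criterion's signal that enough features were retained.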

  • DBpedia As a Formal Knowledge Base – An Evaluation

    DBpedia is widely used by researchers as a means of accessing Wikipedia in a standardized way. In this paper it is characterized from the point of view of a question answering system, and a simple implementation of such a system is presented. The paper also characterizes alternatives to DBpedia in the form of the OpenCyc and YAGO knowledge bases, and a comparison between DBpedia and those knowledge bases is presented.

    Full text to download in external service

  • Improving css-KNN Classification Performance by Shifts in Training Data

    Publication

    - Year 2015

    This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects, that would be optimized during off-line learning. In this paper we propose...

  • Automatyczna budowa taksonomii usług w oparciu o ich głosy w języku naturalnym oraz przy uzyciu zewnętrznych źródeł wiedzy

    Publication

    - Year 2009

    A method for automatically building a taxonomy of services from their natural-language descriptions is proposed, based on Formal Concept Analysis (FCA). Additionally, the presented solution allows the use of external knowledge sources such as Wikipedia, WordNet, ConceptNet, or the global WWW in order to eliminate the problem of incomplete input data (data sparseness).

  • Cooperative Word Net Editor for Lexical Semantic Acquisition

    Publication

    - Year 2011

    The article describes an approach to building the WordNet semantic dictionary in a collaborative approach paradigm. The presented system enables functionality for gathering lexical data in a Wikipedia-like style. The core of the system is a user-friendly interface based on a component for interactive graph navigation. The component has been used for WordNet semantic network presentation on a web page, and it brings functionalities...

    Full text to download in external service

  • Words context analysis for improvement of information retrieval

    Publication

    - Year 2012

    In the article we present an approach to improving the retrieval of information from large text collections using word context vectors. The vectors have been created by analyzing the English Wikipedia with the Hyperspace Analogue to Language model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with the context vectors of these queries. The results indicate that the proposed method can not...

  • Thresholding Strategies for Large Scale Multi-Label Text Classifier

    Publication

    This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

    Full text to download in external service
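Two classic thresholding strategies in this setting, a global score cut (scut) and a per-document rank cut (rcut), can be sketched as follows; the paper evaluates further variants. The label names and scores are illustrative.

```python
def label_by_threshold(scores, strategy="rcut", t=0.5, k=2):
    """'scut' keeps labels scoring at least t (global threshold);
    'rcut' keeps the top-k labels per document."""
    if strategy == "scut":
        return [lbl for lbl, s in scores.items() if s >= t]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

scores = {"Science": 0.9, "History": 0.6, "Sport": 0.1}
print(label_by_threshold(scores, "scut", t=0.5))  # -> ['Science', 'History']
print(label_by_threshold(scores, "rcut", k=1))    # -> ['Science']
```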

  • Towards Effective Processing of Large Text Collections

    Publication

    In the article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computation of similarity matrices and the k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related by links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...

  • Game with a Purpose for Mappings Verification

    Mapping verification is a laborious task. The paper presents a Game with a Purpose based system for the verification of automatically generated mappings. A general description of the idea behind games with a purpose is given. A description of the TGame system, a 2D platform mobile game with the verification process included in the gameplay, is provided. Additional mechanisms for anti-cheating, increasing player’s motivation and gathering...

    Full text available to download

  • External Validation Measures for Nested Clustering of Text Documents

    Publication

    Abstract. This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like the Rand statistic, Hubert's Γ statistic or the F-measure, are not applicable in nested clustering cases. In addition to the work where the F-measure was adapted to hierarchical classification as the hF-measure, some methods to...

  • Parallel Computations of Text Similarities for Categorization Task

    Publication

    - Year 2013

    In this chapter we describe an approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations have been used for textual data categorization. The test datasets were created from Wikipedia articles, which together with their hyper-references formed the graph used in our experiments. Similarities based on the Euclidean distance and the cosine measure have been used to process the data with the k-means algorithm...
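The row-parallel scheme can be sketched with one task per matrix row. Threads below stand in for the parallel environment used in the chapter, and the vectors are toy data.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def similarity_matrix(vectors, workers=4):
    """Pairwise cosine similarities, computed one row per task."""
    def row(i):
        return [cosine(vectors[i], v) for v in vectors]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, range(len(vectors))))

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = similarity_matrix(vecs)
print(round(M[0][2], 3))  # cosine between [1, 0] and [1, 1]
```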

  • Passing from requirements specification to class model using application domain ontology

    The quality of a classic software engineering process depends on the completeness of project documents and on the inter-phase consistency. In this paper, a method for passing from the requirement specification to the class model is proposed. First, a developer browses the text of the requirements, extracts the word sequences, and places them as terms into the glossary. Next, the internal ontology logic for the glossary needs to...

  • How Specific Can We Be with k-NN Classifier?

    Publication

    This paper discusses the possibility of designing a two-stage classifier for the large-scale hierarchical and multilabel text classification task that is a compromise between two common approaches to this task. The first is called big-bang, where a single classifier aims to do the whole job at once. The top-down approach is the second popular option, in which at each node of the categories’ hierarchy there is a flat classifier...

    Full text to download in external service

  • Selection of Relevant Features for Text Classification with K-NN

    Publication

    In this paper, we describe five feature selection techniques used for text classification. Information gain, the independent significance feature test, the chi-squared test, the odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the classification quality obtained on the test datasets using a K-NN based approach. The main advantage of the evaluated...

    Full text to download in external service
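One of the compared criteria, the chi-squared test, scores a term against a class from a 2x2 contingency table of document counts. A self-contained sketch (the counts are invented for illustration):

```python
def chi2_score(n11, n10, n01, n00):
    """Chi-squared statistic for one (term, class) pair:
    n11 = docs in the class containing the term,
    n10 = docs outside the class containing the term,
    n01 = docs in the class without the term,
    n00 = docs outside the class without the term."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0

# A term strongly associated with the class scores higher than
# a term distributed independently of it.
print(chi2_score(40, 10, 10, 40) > chi2_score(25, 25, 25, 25))  # True
```

Ranking all terms by this score and keeping the top ones is the frequency-independent selection step the abstract refers to.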

  • Text Categorization Improvement via User Interaction

    Publication

    - Year 2018

    In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

    Full text to download in external service

  • Automatyczna klasyfikacja artykułów Wikipedii

    Wikipedia, the Internet encyclopedia, uses a category system to organize its articles. At present, the process of assigning an article to the appropriate thematic categories is carried out manually by its editors. This task is time-consuming and requires knowledge of the structure of Wikipedia. Manual categorization is also prone to errors resulting from the fact that the assignment of an article to a category is based on an arbitrary...

  • Follow the Light. Where to search for useful research information

    Architectural Lighting Design (ALD) has never been a standalone professional discipline. Rather, it has existed as the combination of art and the science of light. Today, third generation lighting professionals are already creatively intertwining these fields, and the acceleration in scientific, technological and societal studies has only increased the need for reliable multidisciplinary information. Therefore, a thorough re-examination...

    Full text available to download

  • Identification of category associations using a multilabel classifier

    Describing data using categories allows one to describe it at a higher level of abstraction. In this way, we can operate on aggregated groups of information, revealing relationships that are not explicit when we analyze the individual objects separately. In this paper we present automatic identification of the associations between categories used for the organization of textual data. As experimental data...

    Full text to download in external service

  • Two Stage SVM and kNN Text Documents Classifier

    Publication

    - Year 2015

    The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

  • Improving Effectiveness of SVM Classifier for Large Scale Data

    The paper presents our approach to an SVM implementation in a parallel environment. We describe how the classification learning and prediction phases were parallelised. We also propose a method for limiting the number of computations necessary during classifier construction. Our method, named one-vs-near, is an extension of the typical one-vs-all approach used to make binary classifiers work with multiclass problems. We perform experiments...

    Full text to download in external service

  • Interactive Information Search in Text Data Collections

    Publication

    This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test this idea. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used for results evaluation. The achieved results...

    Full text to download in external service

  • IDENTYFIKACJA POWIĄZAŃ POMIĘDZY KATEGORIAMI WIKIPEDII Z UŻYCIEM MIAR PODOBIEŃSTWA ARTYKUŁÓW

    The article describes an approach to identifying relations between categories in a textual data repository, based on Wikipedia. By analyzing the similarity between articles, measures were defined that identify relations between categories that had not previously been accounted for, and that assign them weights expressing their degree of significance. An automatic evaluation of the obtained results was performed against the already existing...

    Full text to download in external service

  • Commonly Accessible Web Service Platform - Wiki-WS

    Publication

    - Year 2012

    Web Service technology was originally expected to supply complete and reliable system components. Nowadays this technology is commonly used by companies providing the results of their work to end users while hiding implementation details. This paper presents an SOA-enabled platform - Wiki-WS - that empowers users to deploy, modify, discover and invoke web services. Moreover, it discusses the concepts and functionalities of this open-source management...

    Full text to download in external service

  • Annotating Words Using WordNet Semantic Glosses

    Publication

    - Year 2012

    An approach to word sense disambiguation (WSD) relying on WordNet synsets is proposed. The method uses semantically tagged glosses to perform a process similar to spreading activation in a semantic network, creating a ranking of the most probable meanings for word annotation. Preliminary evaluation shows quite promising results. Comparison with state-of-the-art WSD methods indicates that the use of WordNet relations...
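A simplified, Lesk-style stand-in for the gloss-based ranking can illustrate the core idea; the paper's spreading-activation process is more involved, and the glosses below are toy data.

```python
def disambiguate(context, senses):
    """Choose the sense whose gloss overlaps most with the context
    words (a simplified overlap ranking, not the paper's method)."""
    ctx = set(context)
    return max(senses, key=lambda s: len(ctx & set(senses[s].split())))

context = "he sat on the bank of the river".split()
senses = {
    "bank#1": "sloping land beside a body of water such as a river",
    "bank#2": "a financial institution that accepts deposits",
}
print(disambiguate(context, senses))  # -> bank#1
```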

  • Wizualizacja struktury Wikipedii do wspomagania wyszukiwania informacji

    Publication

    - Year 2011

    Graphical presentation is an effective way of improving user interaction with a knowledge repository. It allows complex structures to be presented clearly and captures relationships that are not directly visible. Applying this approach in information retrieval allows data to be presented at a high level of abstraction while also establishing its context, which directly translates into the quality of access...