Search results for: KATEGORYZACJA WIKIPEDII

Displayed results came from alternative search method.

total: 92

Towards Increasing Density of Relations in Category Graphs
Publication
- Year 2014
In the chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on the Bag-of-Words (BOW) representation of Wikipedia articles. The similarity of articles belonging to different categories is used to calculate the similarity of the categories themselves. The second method is based on average scores given to categories while categorizing documents by our dedicated score-based...

Full text to download in external service
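
The first method in the abstract above (category similarity derived from the BOW similarity of member articles) can be illustrated with a minimal sketch. The aggregation used here, the mean pairwise cosine similarity between articles of two categories, is an assumption chosen for illustration and is not necessarily the procedure used in the chapter; the function names `bow`, `cosine`, and `category_similarity` are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-Words vector: lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_similarity(articles_a, articles_b):
    """Assumed aggregation: mean cosine similarity over all article pairs."""
    pairs = [(x, y) for x in articles_a for y in articles_b]
    if not pairs:
        return 0.0
    return sum(cosine(bow(x), bow(y)) for x, y in pairs) / len(pairs)
```

Category pairs whose score is high but that are not yet linked in the category graph would be candidates for new associations under this reading.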
Elgold partial: News
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts scraped from news websites. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking...
Elgold partial: Automotive blogs
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from automotive blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and...
Elgold partial: Movie reviews
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 37 English texts with movie reviews. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: Job offers
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 English texts scraped from web portals with job offers. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity...
Elgold partial: Scientific papers' abstracts
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 87 abstracts of scientific papers in English, randomly chosen from the following scientific disciplines: Biomedicine, Life Sciences, Mathematics, Medicine, Science, Humanities, Social Science.
Elgold partial: Amazon product reviews
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 34 Amazon product reviews in English. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
Elgold partial: History blogs
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold - partial
The dataset contains 13 texts from English history blogs. In each text, the named entities are marked. Each named entity is linked to the corresponding Wikipedia article if possible. All entities were manually verified by at least three people, which makes the dataset a high-quality gold standard for the evaluation of named entity recognition and linking algorithms.
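Usługa oraz model dekompozycji - teoretyczne podstawy usługowego zarządzania organizacją wsparcia IT [Service and decomposition model: theoretical foundations of service-based management of an IT support organization]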
Publication
- J. Pastuszak
- M. Stolarek
- C. Orłowski
- Year 2009
The chapter discusses the concept of a service, which is fundamental to the service-based model of managing an IT organization. It describes the basic types of service attributes and introduces a distinguishing function used to categorize services. It then presents a general model of service decomposition and a version of it based on the constraints of implementing the model in a CMDB. The publication summarizes the results obtained and indicates directions for further research, concerning in particular...
Elgold intermediate: verified by the authors
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold intermediate
The dataset contains the texts from "Elgold intermediate: verified by verification team", additionally verified by the dataset authors, but before the final validation step with the elgold toolset.
Elgold intermediate: verified by verification team
Open Research Data
open access
- S. Olewniczak
- J. Szymański
- series: Elgold intermediate
The dataset contains the texts from "Elgold intermediate: annotated raw", additionally verified by the five-person verification team. Nearly 25% of the mentions were corrected in some aspect.
Review on Wikification methods
Publication
- J. Szymański
- M. Naruszewicz
- AI COMMUNICATIONS - Year 2019
The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of the documents. Also, it can be considered as automatic tagging of the text. We describe typical...

Full text to download in external service
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication
- Ł. Kucharczyk
- J. Szymański
- Year 2014
Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

Full text to download in external service
Towards Facts Extraction From Texts in Polish Language
Publication
- T. M. Boiński
- A. Brzeski
- International Journal of Innovative Research in Computer and Communication Engineering - Year 2014
The Polish language differs from English in many ways. It has more complicated conjugation and declension. Because of that, automatic fact extraction from texts is difficult. In this paper we present the basic differences between those languages. The paper presents an algorithm for extraction of facts from articles from Polish Wikipedia. The algorithm is based on 7 proposed fact schemes that are searched for in the analyzed text....

Full text available to download
Dynamic Semantic Visual Information Management
Publication
- J. Szymański
- W. Duch
- Year 2010
Dominant Internet search engines use keywords and therefore are not suited for exploration of new domains of knowledge, when the user does not know specific vocabulary. Browsing through articles in a large encyclopedia, each presenting a small fragment of knowledge, it is hard to map the whole domain, see relevant concepts and their relations. In Wikipedia for example some highly relevant articles are not linked with each other....

Full text to download in external service
Selecting Features with SVM
Publication
- J. Rzeniewicz
- J. Szymański
- Year 2013
A common problem with feature selection is to establish the minimum number of features that should be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. Amount...

Full text to download in external service
Wordventure - cooperative wordnet editor. Architecture for lexical semantic acquisition
Publication
- J. Szymański
- Year 2009
This article presents architecture for acquiring lexical semantics in a collaborative approach paradigm. The system enables functionality for editing semantic networks in a wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It has been used for semantic network presentation, and brings simultaneously modification functionality.
WordVenture - COOPERATIVE WordNet EDITOR Architecture for Lexical Semantic Acquisition
Publication
- J. Szymański
- Year 2017
This article presents architecture for acquiring lexical semantics in a collaborative approach paradigm. The system enables functionality for editing semantic networks in a wikipedia-like style. The core of the system is a user-friendly interface based on interactive graph navigation. It has been used for semantic network presentation, and brings simultaneously modification functionality.

Full text to download in external service
DBpedia As a Formal Knowledge Base – An Evaluation
Publication
- WSEAS Transactions on Information Science and Applications - Year 2015
DBpedia is widely used by researchers as a means of accessing Wikipedia in a standardized way. In this paper it is characterized from the point of view of a question answering system. A simple implementation of such a system is also presented. The paper also characterizes alternatives to DBpedia in the form of the OpenCyc and YAGO knowledge bases. A comparison between DBpedia and those knowledge bases is presented.

Full text available to download
Self Organizing Maps for Visualization of Categories
Publication
- J. Szymański
- W. Duch
- Year 2012
Visualization of Wikipedia categories using Self Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures.
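Automatyczna budowa taksonomii usług w oparciu o ich głosy w języku naturalnym oraz przy uzyciu zewnętrznych źródeł wiedzy [Automatic construction of a service taxonomy based on their natural-language glosses and using external knowledge sources]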
Publication
- M. Michalski
- Year 2009
A method is proposed for automatically building a taxonomy of services from their natural-language descriptions, based on formal concept analysis (FCA). In addition, the presented solution allows the use of external knowledge sources such as Wikipedia, WordNet, ConceptNet, or the global WWW network in order to eliminate the problem of incomplete input data (data sparseness).
Cooperative Word Net Editor for Lexical Semantic Acquisition
Publication
- J. Szymański
- Year 2011
The article describes an approach for building the Word Net semantic dictionary in a collaborative approach paradigm. The presented system enables functionality for gathering lexical data in a Wikipedia-like style. The core of the system is a user-friendly interface based on a component for interactive graph navigation. The component has been used for Word Net semantic network presentation on a web page, and it brings functionalities...

Full text to download in external service
Management of Textual Data at Conceptual Level
Publication
- J. Szymański
- Year 2011
The article presents an approach to the management of a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that organizes the articles returned as search results for a given query. Constructed clusters allow to...
Words context analysis for improvement of information retrieval
Publication
- J. Szymański
- Year 2012
In the article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing English Wikipedia with the Hyperspace Analogue to Language model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with context vectors of these queries. The results indicate that the proposed method can not...
Thresholding Strategies for Large Scale Multi-Label Text Classifier
Publication
- K. Draszawka
- J. Szymański
- Year 2013
This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

Full text to download in external service
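
To make the notion of thresholding in the entry above concrete, here is a minimal sketch of two generic strategies for turning candidate class scores into a label set: a score cut (keep every class at or above a threshold) and a rank cut (keep the top k classes). These are common textbook strategies shown under the assumption that they are representative; the snippet does not say which methods the article evaluates. The function names and the example scores are hypothetical.

```python
def score_cut(scores, threshold):
    """Keep every label whose score is at least `threshold`."""
    return [label for label, s in scores.items() if s >= threshold]


def rank_cut(scores, k):
    """Keep the k highest-scoring labels, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical classifier output for one document.
scores = {"Physics": 0.9, "Biology": 0.2, "Chemistry": 0.6}
print(score_cut(scores, 0.5))
print(rank_cut(scores, 1))
```

A score cut lets the number of labels vary per document, while a rank cut fixes it, which is the main trade-off between the two.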
Towards Effective Processing of Large Text Collections
Publication
- J. Szymański
- H. Krawczyk
- Year 2012
In the article we describe the approach to parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related with links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...
Game with a Purpose for Mappings Verification
Publication
- T. M. Boiński
- Annals of Computer Science and Information Systems - Year 2016
Mappings verification is a laborious task. The paper presents a Game with a Purpose based system for verification of automatically generated mappings. A general description of the idea behind games with a purpose is given. A description of the TGame system, a 2D platform mobile game with the verification process included in the gameplay, is provided. Additional mechanisms for anti-cheating, increasing player’s motivation and gathering...

Full text available to download
Overview of Scalability and Reliability Problem in SDN Networks
Publication
- S. Kaczmarek
- J. A. Litka
- Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne - Year 2016
In the paper an overview of scalability and reliability in the SDN (Software Defined Networks) networks has been presented. Problems and limitations for guaranteeing scalability and reliability in SDN networks have been indicated. Known methods for assuring scalability and reliability in SDN networks have been described. Projects from research communities for resolving issues with scalability and reliability in SDN networks have...
External Validation Measures for Nested Clustering of Text Documents
Publication
- K. Draszawka
- J. Szymański
- Year 2011
This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like Rand statistics, Hubert Γ statistic or F-measure, are not applicable in nested clustering cases. Additionally to the work, where F-measure was adopted to hierarchical classification as hF-measure, here some methods to...
Passing from requirements specification to class model using application domain ontology
Publication
- J. Kuchta
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2010
The quality of a classic software engineering process depends on the completeness of project documents and on the inter-phase consistency. In this paper, a method for passing from the requirement specification to the class model is proposed. First, a developer browses the text of the requirements, extracts the word sequences, and places them as terms into the glossary. Next, the internal ontology logic for the glossary needs to...
Parallel Computations of Text Similarities for Categorization Task
Publication
- J. Szymański
- Year 2013
In this chapter we describe an approach to the parallel computation of similarities in high-dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles which, together with their hyperlinks, form a graph used in our experiments. The similarities based on Euclidean distance and Cosine measure have been used to process the data using k-means algorithm....
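
The two measures named in the abstract above, and the reason a similarity matrix parallelizes well, can be shown in a short sketch. Each row of the matrix depends only on the input vectors, so rows can be computed independently. The row-wise split and the use of `ThreadPoolExecutor` are assumptions made for illustration, not the chapter's implementation; in CPython, threads show the decomposition but give little speedup for pure-Python arithmetic, so a real system would use processes or native code.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(u, v):
    """Euclidean distance between two equal-length dense vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors, measure, workers=4):
    """Compute the full pairwise matrix, one independent task per row."""
    def row(i):
        return [measure(vectors[i], v) for v in vectors]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so row i stays at index i.
        return list(pool.map(row, range(len(vectors))))
```

Passing `euclidean` yields a distance matrix and passing `cosine` yields a similarity matrix; either can then feed a clustering step such as k-means.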
Jakość i efekty kształcenia, a kolejne etapy procesu Bolońskiego [Quality and learning outcomes versus the successive stages of the Bologna Process]
Publication
- A. M. Dąbrowicz-Tlałka
- Pismo PG - Year 2010
Countries participating in the Bologna Process have for years been searching for fundamental values and good practices related to the quality of education. Quality assurance in higher education is not an exclusively European problem. Growing interest in this subject can be observed all over the world, reflecting both the rapid development of higher education and the associated costs borne, for this...

Full text to download in external service
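Modelowanie rozwoju regionalnej sieci połączeń kolejowych z wykorzystaniem metody analitycznego procesu sieciowego [Modelling the development of a regional network of rail connections using the analytic network process method]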
Publication
- D. Kaszubowski
- Logistyka - Year 2014
The article presents a multi-criteria decision model for categorizing railway lines in the Pomeranian Voivodeship with respect to public service requirements. The reference point for the analysis was the Plan for the Sustainable Development of Public Transport in the Pomeranian Voivodeship, in which, because of the necessary decision-making flexibility, the criteria determining the assignment of a line to a utility segment were not parameterized...

Full text available to download
Selection of Relevant Features for Text Classification with K-NN
Publication
- Year 2013
In this paper, we describe five feature selection techniques used for text classification. Information gain, independent significance feature test, chi-squared test, odds ratio test, and frequency filtering have been compared on text benchmarks based on Wikipedia. For each method we present the results of classification quality obtained on the test datasets using K-NN based approach. A main advantage of evaluated...

Full text to download in external service
How Specific Can We Be with k-NN Classifier?
Publication
- K. Draszawka
- J. Szymański
- Year 2014
This paper discusses the possibility of designing a two-stage classifier for the large-scale hierarchical and multilabel text classification task that will be a compromise between two common approaches to this task. The first is called big-bang, where there is only one classifier that aims to do the whole job at once. The top-down approach is the second popular option, in which at each node of categories’ hierarchy, there is a flat classifier...

Full text to download in external service
Text Categorization Improvement via User Interaction
Publication
- J. Atroszko
- J. Szymański
- D. Gil
- H. Mora
- Year 2018
In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text to download in external service
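Transponowanie tradycji w architekturze synagog XIX i XX wieku na Podlasiu. - Tom 1, 2 [Transposing tradition in the architecture of 19th- and 20th-century synagogues in Podlasie. Vol. 1, 2]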
Publication
- P. Trojniel
- Year 2013
Podlasie belongs to a cultural borderland. For centuries many different cultures, nations, and religions have met here. Since the 16th century the Jewish community has helped shape its multicultural landscape. This work concerns research on the architecture of synagogues built in Podlasie in the 19th and 20th centuries. Its main aim is to record, analyze, typologically categorize, and assess the value of this architectural heritage with reference to the prevailing canons...
Wielkoskalowa hierarchiczna klasyfikacja dokumentów tekstowych [Large-scale hierarchical classification of text documents]
Publication
- K. Draszawka
- Year 2012
This chapter presents the problem of large-scale, hierarchical, and multi-label classification of text documents, using the example of automatically assigning an encyclopedia article to one or several (multi-label) categories out of hundreds of thousands (large-scale) of hierarchically organized Wikipedia thematic categories. The work describes various variants of solving the problem, analyzing...
Improving css-KNN Classification Performance by Shifts in Training Data
Publication
- K. Draszawka
- J. Szymański
- F. Guerra
- Year 2015
This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects, that would be optimized during off-line learning. In this paper we propose...
Follow the Light. Where to search for useful research information
Publication
- K. Zielińska-Dąbkowska
- ARC Lighting In Architecture - Year 2019
Architectural Lighting Design (ALD) has never been a standalone professional discipline. Rather, it has existed as the combination of art and the science of light. Today, third generation lighting professionals are already creatively intertwining these fields, and the acceleration in scientific, technological and societal studies has only increased the need for reliable multidisciplinary information. Therefore, a thorough re-examination...

Full text available to download
Strategie testowania i diagnostyki analogowych układów elektronicznych [Strategies for testing and diagnostics of analog electronic circuits]
Publication
- W. Toczek
- Year 2009
The work concerns fault-oriented methods of testing and diagnosing analog electronic circuits. It discusses the sources and classification of faults, testing strategies that exploit analytical and hardware redundancy, in-circuit testing of electronic assemblies, and the use of image classification algorithms for fault localization. The result of the work on analytical methods is the development of an accelerated...
Identification of category associations using a multilabel classifier
Publication
- J. Szymański
- J. Rzeniewicz
- EXPERT SYSTEMS WITH APPLICATIONS - Year 2016
Description of the data using categories allows one to describe it on a higher abstraction level. In this way, we can operate on aggregated groups of the information, allowing one to see relationships that do not appear explicit when we analyze the individual objects separately. In this paper we present automatic identification of the associations between categories used for organization of the textual data. As experimental data...

Full text to download in external service