Search results for: documents categorization

Text Categorization Improvement via User Interaction

Publication

J. Atroszko
J. Szymański
D. Gil
H. Mora

- Year 2018

In this paper, we propose an approach to improving text categorization through interaction with the user. The quality of categorization is defined in terms of the distribution of objects related to the classes and projected onto self-organizing maps. For the experiments, we use articles and categories from a subset of Simple Wikipedia. We test three different approaches to text representation. As a baseline we use...

Full text available for download from an external service

Evaluation of Path Based Methods for Conceptual Representation of the Text

Publication

- Year 2014

Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has been shown to improve text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...

Full text available for download from an external service
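
As a rough illustration of the bag of words (BoW) representation mentioned in the abstract above, the sketch below turns documents into term-count vectors over a shared vocabulary. The tokenizer, the function names, and the sample sentences are assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors): a sorted vocabulary and one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms that do not occur in a document.
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


# Hypothetical sample documents, for illustration only.
docs = ["Wikipedia articles describe concepts", "Articles link to other articles"]
vocabulary, vectors = bag_of_words(docs)
```

Because only term counts are kept, word order is lost in such vectors.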

Path-based methods on categorical structures for conceptual representation of wikipedia articles

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is the use of Wikipedia as the lexical knowledge base, an approach that has already shown promising results in many research studies....

Full text available for download in the portal

Improving css-KNN Classification Performance by Shifts in Training Data

Publication

- Year 2015

This paper presents a new approach to improving the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning, instance-based classifier. It does not have parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose...

Text classifiers for automatic articles categorization

Publication

- Year 2012

The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically into the categories created by Wikipedia editors.

Two Stage SVM and kNN Text Documents Classifier

Publication

- Year 2015

The paper presents an approach to the large-scale text document classification problem in parallel environments. A two-stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machine classification methods. The details of the classifier and the parallelisation of the classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

Retrieval of Heterogeneous Services in C2NIWA Repository

Publication

J. Szymański

- TASK Quarterly - Year 2015

The paper reviews the methods used for retrieval of information and services. The selected approaches presented in the review inspired us to build retrieval mechanisms in a system for searching the resources stored in the C2NIWA repository. We describe the architecture of the system, its functions and the surrounding subsystems to which it is related. For retrieval of C2NIWA services we propose three approaches based on: keyword...

Full text available for download in the portal

Towards Effective Processing of Large Text Collections

Publication

- Year 2012

In the article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and the k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related by links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...

Self Organizing Maps for Visualization of Categories

Publication

J. Szymański
W. Duch

- Year 2012

Visualization of Wikipedia categories using Self Organizing Maps shows an overview of categories and their relations, helping to narrow down search domains. By selecting particular neurons, this approach enables retrieval of conceptually similar categories. Evaluation of neural activations indicates that they form coherent patterns that may be useful for building user interfaces for navigation over category structures.

Selecting Features with SVM

Publication

- Year 2013

A common problem with feature selection is to establish the minimum number of features that must be retained so that important information is not lost. We describe a method for choosing this number that makes use of Support Vector Machines. The method is based on controlling the angle by which the decision hyperplane is tilted due to feature selection. Experiments were performed on three text datasets generated from a Wikipedia dump. Amount...

Full text available for download from an external service
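
The abstract above mentions an angle by which the decision hyperplane tilts after feature selection. One way to picture such an angle, as a minimal sketch and not the authors' actual procedure, is to compare the normal vector of a linear decision hyperplane with the same vector after the weights of discarded features are set to zero (no retraining is performed here). The weight values, the kept-feature indices, and the function name below are hypothetical.

```python
import math


def tilt_angle_degrees(weights, kept):
    """Angle between the normal vector `weights` and its copy with non-kept features zeroed."""
    reduced = [w if i in kept else 0.0 for i, w in enumerate(weights)]
    dot = sum(a * b for a, b in zip(weights, reduced))
    norm_full = math.sqrt(sum(a * a for a in weights))
    norm_reduced = math.sqrt(sum(b * b for b in reduced))
    if norm_full == 0.0 or norm_reduced == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_full * norm_reduced)))
    return math.degrees(math.acos(cosine))


# Hypothetical weights of a linear classifier over four features.
weights = [3.0, 4.0, 0.0, 0.0]
angle_all = tilt_angle_degrees(weights, kept={0, 1, 2, 3})  # nothing discarded
angle_first = tilt_angle_degrees(weights, kept={0})         # only feature 0 kept
```

In this toy setting, discarding features with large weights produces a larger angle, while discarding zero-weight features leaves the hyperplane unchanged.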

Comparative Analysis of Text Representation Methods Using Classification

Publication

J. Szymański

- CYBERNETICS AND SYSTEMS - Year 2014

In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article, an evaluation of approaches to text representation for machine learning tasks, indicates that text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text available for download from an external service

Improving Effectiveness of SVM Classifier for Large Scale Data

Publication

- Year 2015

The paper presents our approach to SVM implementation in a parallel environment. We describe how the classification learning and prediction phases were parallelised. We also propose a method for limiting the number of necessary computations during classifier construction. Our method, named one-vs-near, is an extension of the typical one-vs-all approach that is used for binary classifiers to work with multiclass problems. We perform experiments...

Full text available for download from an external service

Spectral Clustering Wikipedia Keyword-Based Search Results

Publication

- FRONTIERS IN ROBOTICS AND AI - Year 2017

The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of the spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that is based on a combination of words and links and is used for categorization of search results in this...

Full text available for download in the portal
