Search results for: DOCUMENTS CLASSIFICATION

total: 14

Text Documents Classification with Support Vector Machines
Publication
- P. Majewski
- Year 2008
Two Stage SVM and kNN Text Documents Classifier
Publication
- M. Kępa
- J. Szymański
- Year 2015
The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
Improving css-KNN Classification Performance by Shifts in Training Data
Publication
- K. Draszawka
- J. Szymański
- F. Guerra
- Year 2015
This paper presents a new approach to improving the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose...
Contextual ontology for tonality assessment
Publication
- W. Waloszek
- N. Rizun
- Procedia Computer Science - Year 2020
classification tasks. The discussion focuses on two important research hypotheses: (1) whether it is possible to construct such an ontology from a corpus of textual documents, and (2) whether it is possible and beneficial to use inferencing from this ontology to support the process of sentiment classification. To support the first hypothesis we present a method for extracting a hierarchy of contexts from a set of textual documents...

Full text available to download
Text classifiers for automatic articles categorization
Publication
- Year 2012
The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments were performed on Wikipedia articles, which were automatically classified into the categories assigned to them by Wikipedia editors.
Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary
Publication
- N. Rizun
- W. Waloszek
- Year 2018
This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in a particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; recognition of text tonality using HSD to analyse the documents as a collection of topically completed fragments...

Full text available to download
Improving the Accuracy in Sentiment Classification in the Light of Modelling the Latent Semantic Relations
Publication
- N. Rizun
- W. Waloszek
- Y. Taranenko
- Information - Year 2018
The research presents a methodology for improving the accuracy of sentiment classification in the light of modelling the latent semantic relations (LSR). The objective of this methodology is to find ways of eliminating the limitations of the discriminant and probabilistic methods for revealing LSR and of customizing the sentiment classification process (SCP) for more accurate recognition of text tonality. This objective was achieved...

Full text available to download
TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia
Open Research Data
open access
The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for an enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
Searching for innovation knowledge: insight into KIBS companies
Publication
- M. Zięba
- E. Bolisani
- M. Paiola
- E. Scarso
- Knowledge Management Research & Practice - Year 2017
The paper analyses the activity of research for “innovation knowledge”—here defined as knowledge that can lead to the introduction of service innovations—by Knowledge-Intensive Business Services (KIBS) companies. It proposes a classification of the possible search approaches adopted by those companies based on two dimensions: the pro-activity of search efforts and the source primarily used. Such classification is then discussed...

Full text to download in external service
Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection
Publication
- P. Hrkút
- M. Ďuračík
- M. Mikušová
- M. Callejas-Cuervo
- J. Żukowska
- Year 2019
The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in textual documents, but the situation is more complicated in the field of source code plagiarism, where the problem is equally serious. At present, there are no complex tools available to detect plagiarism...
Preliminary safety assessment of Polish interchanges
Publication
- M. Budzyński
- A. Tubis
- M. Rydlewski
- Archives of Transport - Year 2021
Interchanges are a key element of road infrastructure, and the most complex one. The safety and functionality of interchanges determine the traffic conditions and safety of the entire road network. This applies particularly to motorways and expressways, for which they are the only way to access and exchange traffic. A big problem in Poland is the lack of comprehensive tools for designers at individual stages of the design process...

Full text available to download
External Validation Measures for Nested Clustering of Text Documents
Publication
- K. Draszawka
- J. Szymański
- Year 2011
This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like Rand statistics, Hubert Γ statistic or F-measure, are not applicable in nested clustering cases. Additionally to the work, where F-measure was adopted to hierarchical classification as hF-measure, here some methods to...
Functional safety and reliability analysis methodology for hazardous industrial plants
Publication
- K. Kosmowski
- Year 2013
This monograph is devoted to current problems and methods of the functional safety and reliability analyses of the programmable control and protection systems for industrial hazardous plants. The results of these analyses are useful in the process of safety management in the life cycle, for effectively reducing relevant risks at the design stage, and then controlling these risks during the operation of a given installation. The methodology...
Wikipedia Articles Representation with Matrix'u
Publication
- J. Szymański
- Year 2013
In the article we evaluate different text representation methods used for the task of Wikipedia articles categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstruction of human-made categories.

Full text to download in external service
