Search results for: DOCUMENTS CLASSIFICATION - Bridge of Knowledge

Search results for: DOCUMENTS CLASSIFICATION

  • Text Documents Classification with Support Vector Machines

    Publication
    • P. Majewski

    - Year 2008

  • Two Stage SVM and kNN Text Documents Classifier

    Publication

    - Year 2015

    The paper presents an approach to the large-scale text document classification problem in parallel environments. A two-stage classifier is proposed, based on a combination of the k-nearest neighbours and support vector machine classification methods. The details of the classifier and the parallelisation of the classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
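
    A minimal sketch of such a two-stage decision rule, assuming TF-IDF features and scikit-learn (the paper's one-vs-near method and the parallelisation of the learning and prediction phases are not reproduced here):

```python
# Two-stage text classification sketch: a kNN stage narrows the candidate
# classes for a document, then an SVM trained only on those classes decides.
# Illustrative only -- not the paper's one-vs-near method.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import LinearSVC

class TwoStageKnnSvm:
    def __init__(self, k=25):
        self.k = k
        self.vectorizer = TfidfVectorizer(sublinear_tf=True)

    def fit(self, texts, labels):
        self.X = self.vectorizer.fit_transform(texts)
        self.y = np.asarray(labels)
        self.nn = NearestNeighbors(n_neighbors=self.k, metric="cosine").fit(self.X)
        return self

    def predict(self, texts):
        Xq = self.vectorizer.transform(texts)
        _, idx = self.nn.kneighbors(Xq)
        predictions = []
        for i, neighbours in enumerate(idx):
            candidates = np.unique(self.y[neighbours])
            if len(candidates) == 1:            # stage 1 is unanimous, no SVM needed
                predictions.append(candidates[0])
                continue
            mask = np.isin(self.y, candidates)  # stage 2: SVM over candidate classes only
            svm = LinearSVC().fit(self.X[mask], self.y[mask])
            predictions.append(svm.predict(Xq[i])[0])
        return predictions
```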

  • Improving css-KNN Classification Performance by Shifts in Training Data

    Publication

    - Year 2015

    This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning, instance-based classifier. It does not have parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose...
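
    A threshold-based k-NN decision rule of this general kind can be sketched as follows (multi-label scoring with per-class thresholds; the exact css-k-NN scoring and the training-data shifts proposed in the paper are not reproduced):

```python
# Sketch of a threshold-based, multi-label k-NN decision rule: a document gets
# every label whose accumulated neighbour similarity exceeds a per-label cut-off.
from collections import defaultdict
from sklearn.neighbors import NearestNeighbors

def threshold_knn_predict(X_train, y_train_labels, X_query, k=10, thresholds=None):
    """y_train_labels: list of label sets; thresholds: dict label -> score cut-off."""
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(X_train)
    dist, idx = nn.kneighbors(X_query)
    sims = 1.0 - dist                      # cosine similarity of each neighbour
    cut = thresholds or {}
    predictions = []
    for neigh, sim in zip(idx, sims):
        scores = defaultdict(float)
        for j, s in zip(neigh, sim):       # accumulate similarity mass per label
            for label in y_train_labels[j]:
                scores[label] += s
        # default cut-off (half the neighbourhood's similarity mass) is an assumption
        predictions.append({lbl for lbl, sc in scores.items()
                            if sc >= cut.get(lbl, 0.5 * sim.sum())})
    return predictions
```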

  • Contextual ontology for tonality assessment

    Publication

    classification tasks. The discussion focuses on two important research hypotheses: (1) whether it is possible to construct such an ontology from a corpus of textual documents, and (2) whether it is possible and beneficial to use inferencing from this ontology to support the process of sentiment classification. To support the first hypothesis we present a method of extraction of a hierarchy of contexts from a set of textual documents...

    Full text available to download

  • Text classifiers for automatic articles categorization

    Publication

    The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments were performed on Wikipedia articles classified automatically into the categories created by Wikipedia editors.
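
    A small sketch of how two document representations can be compared in such a classification task with a linear SVM (the corpus and categories below are placeholders, not the paper's Wikipedia data):

```python
# Compare two document representations (binary bag-of-words vs TF-IDF) on the
# same classification task with a linear SVM and cross-validated macro-F1.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def compare_representations(texts, categories):
    representations = {
        "binary-bow": CountVectorizer(binary=True),
        "tf-idf": TfidfVectorizer(sublinear_tf=True),
    }
    for name, vectorizer in representations.items():
        pipeline = make_pipeline(vectorizer, LinearSVC())
        scores = cross_val_score(pipeline, texts, categories, cv=5, scoring="f1_macro")
        print(f"{name}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```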

  • Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary

    Publication

    - Year 2018

    This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in a particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically oriented corpus; and recognition of text tonality using the HSD to analyse documents as a collection of topically complete fragments...

    Full text available to download
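
    As an illustration only, dictionary-based tonality scoring over document fragments might look like the sketch below; the HSD itself, its hierarchy and the contextual framework from the paper are not reproduced, and the scoring scheme is an assumption:

```python
# Sketch of dictionary-based tonality scoring over document fragments:
# average the polarity of dictionary hits per fragment, then over fragments.
def fragment_tonality(fragments, sentiment_dict):
    """fragments: list of token lists; sentiment_dict: term -> polarity in [-1, 1]."""
    scores = []
    for tokens in fragments:
        hits = [sentiment_dict[t] for t in tokens if t in sentiment_dict]
        scores.append(sum(hits) / len(hits) if hits else 0.0)
    # document tonality = mean over topically complete fragments
    return sum(scores) / len(scores) if scores else 0.0
```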

  • Improving the Accuracy in Sentiment Classification in the Light of Modelling the Latent Semantic Relations

    Publication

    - Information - Year 2018

    The research presents the methodology of improving the accuracy of sentiment classification in the light of modelling the latent semantic relations (LSR). The objective of this methodology is to eliminate the limitations of the discriminant and probabilistic methods for revealing LSR and to tailor the sentiment classification process (SCP) to more accurate recognition of text tonality. This objective was achieved...

    Full text available to download
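
    One common way to bring latent semantic structure into a sentiment pipeline is LSA (truncated SVD over TF-IDF); the sketch below illustrates that general idea only, not the specific LSR modelling proposed in the paper, and the dimensionality is a guess:

```python
# Sentiment pipeline with a latent semantic layer: TF-IDF -> truncated SVD (LSA)
# -> linear classifier. Illustrative; component count and classifier are assumptions.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentiment_model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True),
    TruncatedSVD(n_components=300),      # latent semantic space
    LogisticRegression(max_iter=1000),
)
# sentiment_model.fit(train_texts, train_polarity)
# predicted = sentiment_model.predict(test_texts)
```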

  • TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia

    Open Research Data

    The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label matrix]. Label scores for each document are also provided for the enhanced multi-label kNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
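
    For illustration, per-document label scores of the kind described above can be produced from a TF-IDF feature matrix and a 0/1 label matrix with a simple k-NN vote; the dataset's actual ML-kNN [1] and LEML [2] outputs are not recomputed here:

```python
# Produce per-document label scores from a sparse TF-IDF feature matrix X and a
# 0/1 label matrix Y using a plain k-NN vote (fraction of neighbours per label).
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def knn_label_scores(X, Y, k=10):
    """X: (n_docs, n_features) sparse TF-IDF matrix; Y: (n_docs, n_labels) 0/1 matrix."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]                               # drop each document's own row
    Y = sparse.csr_matrix(Y)
    scores = np.vstack([Y[neigh].mean(axis=0).A1 for neigh in idx])
    return scores                                  # shape (n_docs, n_labels)
```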

  • Searching for innovation knowledge: insight into KIBS companies

    Publication

    - Knowledge Management Research & Practice - Year 2017

    The paper analyses the activity of searching for “innovation knowledge”—here defined as knowledge that can lead to the introduction of service innovations—by Knowledge-Intensive Business Services (KIBS) companies. It proposes a classification of the possible search approaches adopted by those companies based on two dimensions: the pro-activity of search efforts and the source primarily used. This classification is then discussed...

    Full text to download in external service

  • Increasing K-Means Clustering Algorithm Effectivity for Using in Source Code Plagiarism Detection

    Publication
    • P. Hrkút
    • M. Ďuračík
    • M. Mikušová
    • M. Callejas-Cuervo
    • J. Żukowska

    - Year 2019

    The problem of plagiarism is becoming increasingly significant with the growth of Internet technologies and the availability of information resources. Many tools have been successfully developed to detect plagiarism in textual documents, but the situation is more complicated in the field of plagiarism of source code, where the problem is equally serious. At present, there are no complex tools available to detect plagiarism...
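
    A minimal sketch of the general idea of using k-means to narrow pairwise comparison of source files; the character n-gram features, cluster count and similarity threshold are illustrative assumptions, not the effectivity improvements proposed in the paper:

```python
# Cluster source files with k-means over character n-gram TF-IDF vectors, then
# compare only files that fall in the same cluster and flag near-duplicates.
from itertools import combinations
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def suspicious_pairs(source_files, n_clusters=20, threshold=0.9):
    vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
    X = vec.fit_transform(source_files)
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    pairs = []
    for c in set(clusters):
        members = [i for i, lbl in enumerate(clusters) if lbl == c]
        for i, j in combinations(members, 2):      # compare only within a cluster
            if cosine_similarity(X[i], X[j])[0, 0] >= threshold:
                pairs.append((i, j))
    return pairs
```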

  • Preliminary safety assessment of Polish interchanges

    Publication

    - Archives of Transport - Year 2021

    Interchanges are a key and the most complex element of road infrastructure. The safety and functionality of interchanges determine the traffic conditions and safety of the entire road network. This applies particularly to motorways and expressways, for which they are the only way to access and exchange traffic. A big problem in Poland is the lack of comprehensive tools for designers at individual stages of the design process....

    Full text available to download

  • External Validation Measures for Nested Clustering of Text Documents

    Publication

    This article addresses the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like the Rand statistic, Hubert's Γ statistic or the F-measure, are not applicable in nested clustering cases. In addition to the work where the F-measure was adapted to hierarchical classification as the hF-measure, here some methods to...
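
    For reference, the flat (non-nested) versions of two of the indices mentioned above, the Rand statistic and the pairwise F-measure, can be computed from two labelings of the same documents as follows; the hierarchical adaptations discussed in the article are not reproduced:

```python
# Flat external validation indices over document pairs: count pairs that the two
# labelings agree or disagree on, then derive the Rand index and pairwise F-measure.
from itertools import combinations

def pair_counts(labels_true, labels_pred):
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        tp += same_true and same_pred
        fp += (not same_true) and same_pred
        fn += same_true and (not same_pred)
        tn += (not same_true) and (not same_pred)
    return tp, fp, fn, tn

def rand_index(labels_true, labels_pred):
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def pairwise_f_measure(labels_true, labels_pred):
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```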

  • Functional safety and reliability analysis methodology for hazardous industrial plants

    Publication

    - Year 2013

    This monograph is devoted to current problems and methods of the functional safety and reliability analyses of the programmable control and protection systems for hazardous industrial plants. The results of these analyses are useful in the safety management process over the life cycle, for effectively reducing relevant risks at the design stage and then controlling these risks during the operation of a given installation. The methodology...

  • Wikipedia Articles Representation with Matrix'u

    Publication

    - Year 2013

    In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstructing the human-made categories.

    Full text to download in external service