Search results for: BAG-OF-WORDS, DOCUMENT CATEGORIZATION, NEURAL NETWORKS, TEXT CLASSIFICATION, TEXT REPRESENTATION, WIKIPEDIA, WORD EMBEDDINGS

Didn't find any results in this catalog!

But we have some results in other catalogs.

Sample results found in other catalogs

total: 3990

displaying 1000 best results

An Analysis of Neural Word Representations for Wikipedia Articles Classification
Publication
- J. Szymański
- N. Kawalec
- CYBERNETICS AND SYSTEMS - Year 2019
One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

Full text to download in external service
Text Categorization Improvement via User Interaction
Publication
- J. Atroszko
- J. Szymański
- D. Gil
- H. Mora
- Year 2018
In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text to download in external service
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication
- Ł. Kucharczyk
- J. Szymański
- Year 2014
Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

Full text to download in external service
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
Publication
- A. Wawrzyński
- J. Szymański
- Applied Sciences-Basel - Year 2021
To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

Full text available to download
Path-based methods on categorical structures for conceptual representation of wikipedia articles
Publication
- Ł. Kucharczyk
- J. Szymański
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017
Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is usage of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

Full text available to download
Comparative Analysis of Text Representation Methods Using Classification
Publication
- J. Szymański
- CYBERNETICS AND SYSTEMS - Year 2014
In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text to download in external service
Spectral Clustering Wikipedia Keyword-Based search Results
Publication
- J. Szymański
- T. Dziubich
- FRONTIERS IN ROBOTICS AND AI - Year 2017
The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that has been based on combination of words and links and used for categorization of search results in this...

Full text available to download
TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia
Open Research Data
open access
The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for an enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
An automated learning model for twitter sentiment analysis using Ranger AdaBelief optimizer based Bidirectional Long Short Term Memory
Publication
- S. Natarajan
- S. Kurian
- P. Bidare Divakarachari
- P. Falkowski-Gilski
- EXPERT SYSTEMS - Year 2024
Sentiment analysis is an automated approach which is utilized in process of analysing textual data to describe public opinion. The sentiment analysis has major role in creating impact in the day-to-day life of individuals. However, a precise interpretation of text still relies as a major concern in classifying sentiment. So, this research introduced Bidirectional Long Short Term Memory with Ranger AdaBelief Optimizer (Bi-LSTM RAO)...

Full text to download in external service
Text classifiers for automatic articles categorization
Publication
- Year 2012
The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.
