Search results for: BAG-OF-WORDS, DOCUMENT CATEGORIZATION, NEURAL NETWORKS, TEXT CLASSIFICATION, TEXT REPRESENTATION, WIKIPEDIA, WORD EMBEDDINGS

Total results: 3990

Showing the top 1000 results

An Analysis of Neural Word Representations for Wikipedia Articles Classification
Publication
- J. Szymański
- N. Kawalec
- CYBERNETICS AND SYSTEMS - Year 2019
One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

Full text available for download from an external service
Text Categorization Improvement via User Interaction
Publication
- J. Atroszko
- J. Szymański
- D. Gil
- H. Mora
- Year 2018
In this paper, we propose an approach to improvement of text categorization using interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected on the self-organizing maps. For the experiments, we use the articles and categories from the subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text available for download from an external service
Evaluation of Path Based Methods for Conceptual Representation of the Text
Publication
- Ł. Kucharczyk
- J. Szymański
- Year 2014
Typical text clustering methods use the bag of words (BoW) representation to describe content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ hierarchical organization of Wikipedia category system. In this paper we propose three path-based...

Full text available for download from an external service
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
Publication
- A. Wawrzyński
- J. Szymański
- Applied Sciences-Basel - Year 2021
To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

Full text available for download in the portal
Path-based methods on categorical structures for conceptual representation of wikipedia articles
Publication
- Ł. Kucharczyk
- J. Szymański
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017
Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is usage of Wikipedia as the lexical knowledge base – an approach that has already shown promising results in many research studies....

Full text available for download in the portal
Comparative Analysis of Text Representation Methods Using Classification
Publication
- J. Szymański
- CYBERNETICS AND SYSTEMS - Year 2014
In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text available for download from an external service
Spectral Clustering Wikipedia Keyword-Based search Results
Publication
- J. Szymański
- T. Dziubich
- FRONTIERS IN ROBOTICS AND AI - Year 2017
The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that has been based on combination of words and links and used for categorization of search results in this...

Full text available for download in the portal
TF-IDF weighted bag-of-words preprocessed text documents from Simple English Wikipedia
Research Data
open access
The SimpleWiki2K-scores dataset contains TF-IDF weighted bag-of-words preprocessed text documents (raw strings are not available) [feature matrix] and their multi-label assignments [label-matrix]. Label scores for each document are also provided for an enhanced multi-label KNN [1] and LEML [2] classifiers. The aim of the dataset is to establish a benchmark...
An automated learning model for twitter sentiment analysis using Ranger AdaBelief optimizer based Bidirectional Long Short Term Memory
Publication
- S. Natarajan
- S. Kurian
- P. Bidare Divakarachari
- P. Falkowski-Gilski
- EXPERT SYSTEMS - Year 2024
Sentiment analysis is an automated approach which is utilized in process of analysing textual data to describe public opinion. The sentiment analysis has major role in creating impact in the day-to-day life of individuals. However, a precise interpretation of text still relies as a major concern in classifying sentiment. So, this research introduced Bidirectional Long Short Term Memory with Ranger AdaBelief Optimizer (Bi-LSTM RAO)...

Full text available for download from an external service
Text classifiers for automatic articles categorization
Publication
- Year 2012
The article concerns the problem of automatic classification of textual content. We present selected methods for generation of documents representation and we evaluate them in classification tasks. The experiments have been performed on Wikipedia articles classified automatically to their categories made by Wikipedia editors.
Towards semantic-rich word embeddings
Publication
- G. Beringer
- M. Jabłoński
- P. Januszewski
- A. Sobecki
- J. Szymański
- Annals of Computer Science and Information Systems - Year 2019
In recent years, word embeddings have been shown to improve the performance in NLP tasks such as syntactic parsing or sentiment analysis. While useful, they are problematic in representing ambiguous words with multiple meanings, since they keep a single representation for each word in the vocabulary. Constructing separate embeddings for meanings of ambiguous words could be useful for solving the Word Sense Disambiguation (WSD)...

Full text available for download in the portal
Wikipedia Articles Representation with Matrix'u
Publication
- J. Szymański
- Year 2013
In the article we evaluate different text representation methods used for a task of Wikipedia articles categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstruction of human-made categories.

Full text available for download from an external service
Parallel Computations of Text Similarities for Categorization Task
Publication
- J. Szymański
- Year 2013
In this chapter we describe an approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles, which together with their hyper references form a graph used in our experiments. Similarities based on Euclidean distance and the cosine measure have been used to process the data with the k-means algorithm....
Selection of Relevant Features for Text Classification with K-NN
Publication
- Year 2013
In this paper, we describe five features selection techniques used for a text classification. An information gain, independent significance feature test, chi-squared test, odds ratio test, and frequency filtering have been compared according to the text benchmarks based on Wikipedia. For each method we present the results of classification quality obtained on the test datasets using K-NN based approach. A main advantage of evaluated...

Full text available for download from an external service
Methodology for Text Classification using Manually Created Corpora-based Sentiment Dictionary
Publication
- N. Rizun
- W. Waloszek
- Year 2018
This paper presents the methodology of Textual Content Classification, which is based on a combination of algorithms: preliminary formation of a contextual framework for the texts in particular problem area; manual creation of the Hierarchical Sentiment Dictionary (HSD) on the basis of a topically-oriented Corpus; tonality texts recognition via using HSD for analysing the documents as a collection of topically completed fragments...

Full text available for download in the portal
Word and Text

Journals

ISSN: 2069-9271
Text categorization with semantic commonsense knowledge: First results
Publication
- P. Majewski
- J. Szymański
- Year 2008
Text processing typically relies on the BOW representation. However, this approach does not give good results when similar documents do not share words. The article presents an approach to constructing a kernel function for SVM classifiers based on an external knowledge base of linguistic concepts.
Agile Commerce in the light of Text Mining
Publication
- A. Baj-Rogowska
- Przedsiębiorczość i Zarządzanie - Year 2017
The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...

Full text available for download in the portal
Wikipedia and WordNet integration based on words co-occurrences
Publication
- J. Kilanowski
- J. Szymański
- Year 2009
The article presents a method for automatic integration of two lexical resources: semantic dictionary WordNet and electronic encyclopaedia Wikipedia. Our goal is to automatically add a semantic tag - a WordNet synset identifier - to the title of the Wikipedia article. We have analyzed several different approaches to this problem and implemented our own solution, based on word occurrences in synsets descriptions and the article body....
The Method of a Two-Level Text-Meaning Similarity Approximation of the Customers’ Opinions
Publication
- N. Rizun
- P. Kapłański
- Y. Taranenko
- Studia Ekonomiczne. Zeszyty Naukowe Uniwersytetu Ekonomicznego w Katowicach - Year 2016
The method of two-level text-meaning similarity approximation, consisting in the implementation of the classification of the stages of text opinions of customers and identifying their rank quality level was developed. Proposed and proved the significance of major hypotheses, put as the basis of the developed methodology, notably about the significance of suggestions about the existence of analogies between mathematical bases of...

Full text available for download in the portal
Text Documents Classification with Support Vector Machines
Publication
- P. Majewski
- Year 2008
Categorization of Wikipedia articles with spectral clustering
Publication
- J. Szymański
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011
The article reports application of clustering algorithms for creating hierarchical groups within Wikipedia articles. We evaluate three spectral clustering algorithms based on datasets constructed with usage of Wikipedia categories. Selected algorithm has been implemented in the system that categorizes Wikipedia search results on the fly.
Self-Organizing Map representation for clustering Wikipedia search results
Publication
- J. Szymański
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2011
The article presents an approach to automated organization of textual data. The experiments have been performed on selected sub-set of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To warrant efficiency of the data processing, we performed linear dimensionality reduction of raw data using Principal...
Self–Organizing Map representation for clustering Wikipedia search results
Publication
- J. Szymański
- Year 2011
The article presents an approach to automated organization of textual data. The experiments have been performed on selected sub-set of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To warrant efficiency of the data processing, we performed linear dimensionality reduction of raw data using Principal...

Full text available for download from an external service
Clothes Detection and Classification Using Convolutional Neural Networks
Publication
- J. Cychnerski
- A. Brzeski
- A. Boguszewski
- M. Marmołowski
- M. Trojanowicz
- Year 2017
In this paper we describe development of a computer vision system for accurate detection and classification of clothes for e-commerce images. We present a set of experiments on well established architectures of convolutional neural networks, including Residual networks, SqueezeNet and Single Shot MultiBox Detector (SSD). The clothes detection network was trained and tested on DeepFashion dataset, which contains box annotations...

Full text available for download from an external service
Prioritising national healthcare service issues from free text feedback – A computational text analysis & predictive modelling approach
Publication
- A. Ojo
- N. Rizun
- G. Walsh
- M. I. Mashinchi
- M. Venosa
- M. N. Rao
- DECISION SUPPORT SYSTEMS - Year 2024
Patient experience surveys have become a key source of evidence for supporting decision-making and continuous quality improvement within healthcare services. To harness free-text feedback collected as part of these surveys for additional insights, text analytics methods are increasingly employed when the data collected is not amenable to traditional qualitative analysis due to volume. However, while text analytics techniques offer...

Full text available for download in the portal
Representation of hypertext documents based on terms, Links and text compressibility
Publication
- J. Szymański
- W. Duch
- LECTURE NOTES IN COMPUTER SCIENCE - Year 2010
Methods for representing text documents based on words, mutual links, and compression methods are described. They are evaluated using an SVM classifier.
Ontology-based text convolution neural network (TextCNN) for prediction of construction accidents
Publication
- S. Donghui
- L. Zhigang
- J. Zurada
- A. Manikas
- J. Guan
- P. Weichbroth
- KNOWLEDGE AND INFORMATION SYSTEMS - Year 2024
The construction industry suffers from workplace accidents, including injuries and fatalities, which represent a significant economic and social burden for employers, workers, and society as a whole. The existing research on construction accidents heavily relies on expert evaluations, which often suffer from issues such as low efficiency, insufficient intelligence, and subjectivity. However, expert opinions provided in construction...

Full text available for download from an external service
Classification of objects in the LIDAR point clouds using Deep Neural Networks based on the PointNet model
Publication
- Z. Kowalczuk
- K. Szymański
- IFAC-PapersOnLine - Year 2019
This work attempts to meet the challenges associated with the classification of LIDAR point clouds by means of deep learning. In addition to achieving high accuracy, the designed system should allow the classification of point clouds covering an area of several dozen square kilometers within a reasonable time interval. Therefore, it must be characterized by fast processing and efficient use of memory. Thus, the most popular approaches...

Full text available for download in the portal
Two Stage SVM and kNN Text Documents Classifier
Publication
- M. Kępa
- J. Szymański
- Year 2015
The paper presents an approach to the large scale text documents classification problem in parallel environments. A two stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...
Development and Research of the Text Messages Semantic Clustering Methodology
Publication
- N. Rizun
- P. Kapłański
- Y. Taranenko
- Year 2016
The methodology of semantic clustering analysis of customer’s text-opinions collection is developed. The author's version of the mathematical models of formalization and practical realization of short textual messages semantic clustering procedure is proposed, based on the customer’s text-opinions collection Latent Semantic Analysis knowledge extracting method. An algorithm for semantic clustering of the text-opinions is developed,...

Full text available for download in the portal
Thresholding Strategies for Large Scale Multi-Label Text Classifier
Publication
- K. Draszawka
- J. Szymański
- Year 2013
This article presents an overview of thresholding methods for labeling objects given a list of candidate classes’ scores. These methods are essential to multi-label classification tasks, especially when there are a lot of classes which are organized in a hierarchy. Presented techniques are evaluated using the state-of-the-art dedicated classifier on medium scale text corpora extracted from Wikipedia. Obtained results show that the...

Full text available for download from an external service
Time-domain prosodic modifications for text-to-speech synthesizer
Publication
- J. Łopatka
- P. Suchomski
- A. Czyżewski
- Year 2010
An application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. Prosodic modifications that improve the naturalness of the synthesized signal are discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.
Embedded Representations of Wikipedia Categories
Publication
- J. Majkutewicz
- J. Szymański
- A. Sobecki
- H. Mora
- D. Gil
- Year 2021
In this paper, we present an approach to building neural representations of the Wikipedia category graph. We test four different methods and examine the neural embeddings in terms of preservation of graphs edges, neighborhood coverage in representation space, and their influence on the results of a task predicting parent of two categories. The main contribution of this paper is application of neural representations for improving the...

Full text available for download from an external service
Deep neural networks approach to skin lesions classification — A comparative analysis
Publication
- Year 2017
The paper presents the results of research on the use of Deep Neural Networks (DNN) for automatic classification of the skin lesions. The authors have focused on the most effective kind of DNNs for image processing, namely Convolutional Neural Networks (CNN). In particular, three kinds of CNN were analyzed: VGG19, Residual Networks (ResNet) and the hybrid of VGG19 CNN with the Support Vector Machine (SVM). The research was carried...

Full text available for download from an external service
Generating actionable evidence from free-text feedback to improve maternity and acute hospital experiences: A computational text analytics & predictive modelling approach
Publication
- A. Ojo
- N. Rizun
- M. Isazad Mashinchi
- G. Walsh
- J. Gruda
- M. N. Narayana
- M. Venosa
- C. Foley
- D. Rohde
- R. Flynn
- EUROPEAN JOURNAL OF PUBLIC HEALTH - Year 2023
Background: Patient experience surveys are a key source of evidence for supporting decision-making and quality improvement in healthcare services. These surveys contain two main types of questions: closed and open-ended, asking about patients’ care experiences. Apart from the knowledge obtained from analysing closed-ended questions, invaluable insights can be gleaned from free-text data. Advanced analytics techniques are increasingly...

Full text available for download from an external service
Selected Technical Issues of Deep Neural Networks for Image Classification Purposes
Publication
- Bulletin of the Polish Academy of Sciences-Technical Sciences - Year 2019
In recent years, deep learning and especially Deep Neural Networks (DNN) have obtained amazing performance on a variety of problems, in particular in classification or pattern recognition. Among many kinds of DNNs, the Convolutional Neural Networks (CNN) are most commonly used. However, due to their complexity, there are many problems related but not limited to optimizing network parameters, avoiding overfitting and ensuring good...

Full text available for download in the portal
Semantic Analysis and Text Summarization in Socio-Technical Systems
Publication
- N. Rizun
- Year 2018
In this chapter the authors present the results of the development the methodology for increasing the reliability of the functioning of the Socio-Technical System. The existed methods and algorithms for processing unstructured (textual) information were studied. Taking into account noted above strengths and weaknesses of Discriminant and Probabilistic approaches of Latent Semantic Relations analysis in of the summarization projection...

Full text available for download from an external service
Interactive Information Search in Text Data Collections
Publication
- Year 2013
This article presents a new idea for retrieving in text repositories, as well as it describes general infrastructure of a system created to implement and test those ideas. The implemented system differs from today’s standard search engine by introducing process of interactive search with users and data clustering. We present the basic algorithms behind our system and measures we used for results evaluation. The achieved results...

Full text available for download from an external service
Text

Journals

eISSN: 1327-9556
Automatic Classification of Polish Sign Language Words
Publication
- T. Dziubich
- J. Szymański
- Przegląd Elektrotechniczny - Year 2014
In the article we present the approach to automatic recognition of hand gestures using eGlove device. We present the research results of the system for detection and classification of static and dynamic words of Polish language. The results indicate the usage of eGlove allows to gain good recognition quality that additionally can be improved using additional data sources such as RGB cameras.

Full text available for download in the portal
INFLUENCE OF DATA NORMALIZATION ON THE EFFECTIVENESS OF NEURAL NETWORKS APPLIED TO CLASSIFICATION OF PAVEMENT CONDITIONS – CASE STUDY
Publication
- K. Marciniuk
- B. Kostek
- Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne - Year 2018
In recent years automatic classification employing machine learning seems to be in high demand for tele-informatic-based solutions. An example of such solutions are intelligent transportation systems (ITS), in which various factors are taken into account. The subject of the study presented is the impact of data pre-processing and normalization on the accuracy and training effectiveness of artificial neural networks in the case...
When Neural Networks Meet Decisional DNA: A Promising New Perspective for Knowledge Representation and Sharing
Publication
- H. Zhang
- C. Sanin
- E. Szczerbicki
- CYBERNETICS AND SYSTEMS - Year 2016
In this article, we introduce a novel concept combining neural network technology and Decisional DNA for knowledge representation and sharing. Instead of using traditional machine learning and knowledge discovery methods, this approach explores the way of knowledge extraction through deep learning processes based on a domain’s past decisional events captured by Decisional DNA. We compare our approach with kNN (k-nearest...

Full text available for download in the portal
External Validation Measures for Nested Clustering of Text Documents
Publication
- K. Draszawka
- J. Szymański
- Year 2011
This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like Rand statistics, Hubert Γ statistic or F-measure are not applicable in nested clustering cases. Additionally to the work, where F-measure was adopted to hierarchical classification as hF-measure, here some methods to...
Deep neural networks for data analysis
Online Courses
- K. Draszawka
The aim of the course is to familiarize students with the methods of deep learning for advanced data analysis. Typical areas of application of these types of methods include: image classification, speech recognition and natural language understanding.
A novel approach exploiting properties of convolutional neural networks for vessel movement anomaly detection and classification
Publication
- B. Czaplewski
- M. Dzwonkowski
- ISA TRANSACTIONS - Year 2022
The article concerns the automation of vessel movement anomaly detection for maritime and coastal traffic safety services. Deep Learning techniques, specifically Convolutional Neural Networks (CNNs), were used to solve this problem. Three variants of the datasets, containing samples of vessel traffic routes in relation to the prohibited area in the form of a grayscale image, were generated. 1458 convolutional neural networks with...

Full text available for download in the portal
Words context analysis for improvement of information retrieval
Publication
- J. Szymański
- Year 2012
In the article we present an approach to improvement of information retrieval from large text collections using words context vectors. The vectors have been created analyzing English Wikipedia with Hyperspace Analogue to Language model of words similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with context vectors of these queries. The results indicate that the proposed method can not...
Towards Effective Processing of Large Text Collections
Publication
- J. Szymański
- H. Krawczyk
- Year 2012
In the article we describe the approach to parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related with links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of...
Bożena Kostek prof. dr hab. inż.

People

Laboratorium Akustyki Fonicznej
Text-mining Similarity Approximation Operators for Opinion Mining in BI tools
Publication
- N. Rizun
- P. Kapłański
- Y. Taranenko
- S. Alessandro
- Year 2016
The concept of the Text-mining Similarity Approximation Operators for Opinion Mining as extensions to Natural Language Interface Database is defined. The new operators: “keywords of” dimension; subsetting operator “about C is q”; aggregation operator “by similar C” are proposed. These operators are based on the Latent Semantic Analysis and Social Network Analysis

Full text available for download in the portal
