Search results for: text representation documents categorization information retrieval

Text Categorization Improvement via User Interaction

Publication

J. Atroszko
J. Szymański
D. Gil
H. Mora

- Year 2018

In this paper, we propose an approach to improving text categorization through interaction with the user. The quality of categorization has been defined in terms of a distribution of objects related to the classes and projected onto self-organizing maps. For the experiments, we use the articles and categories from a subset of Simple Wikipedia. We test three different approaches for text representation. As a baseline we use...

Full text available for download from an external service

Evaluation of Path Based Methods for Conceptual Representation of the Text

Publication

- Year 2014

Typical text clustering methods use the bag of words (BoW) representation to describe the content of documents. However, this method is known to have several limitations. Employing Wikipedia as the lexical knowledge base has shown an improvement of the text representation for data-mining purposes. Promising extensions of that trend employ the hierarchical organization of the Wikipedia category system. In this paper we propose three path-based...

Full text available for download from an external service
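
As an illustration of the bag of words (BoW) representation named in the abstract above, here is a minimal sketch in Python. It is not taken from the paper: the tokenizer, the helper names `bag_of_words` and `to_vector`, and the two sample documents are assumptions chosen only to show how word order is discarded and each document becomes a vector of term counts.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return term counts for a text; word order is discarded."""
    # Assumption: a very simple tokenizer (lowercase alphanumeric runs).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def to_vector(bow, vocabulary):
    """Project term counts onto a fixed, ordered vocabulary."""
    return [bow.get(term, 0) for term in vocabulary]


# Hypothetical documents, used only for illustration.
docs = ["The cat sat on the mat", "The dog sat"]
bows = [bag_of_words(d) for d in docs]

# The shared vocabulary defines the dimensions of the vector space.
vocabulary = sorted(set().union(*bows))
vectors = [to_vector(b, vocabulary) for b in bows]

print(vocabulary)
print(vectors)
```

The two documents share only the terms "sat" and "the", which is the kind of limitation the abstract alludes to: documents with related meaning but different vocabulary end up far apart in this space.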

Path-based methods on categorical structures for conceptual representation of Wikipedia articles

Publication

- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS - Year 2017

Machine learning algorithms applied to text categorization mostly employ the Bag of Words (BoW) representation to describe the content of the documents. This method has been successfully used in many applications, but it is known to have several limitations. One way of improving text representation is to use Wikipedia as the lexical knowledge base, an approach that has already shown promising results in many research studies....

Full text available for download in the portal

Spectral Clustering Wikipedia Keyword-Based Search Results

Publication

- FRONTIERS IN ROBOTICS AND AI - Year 2017

The paper summarizes our research in the area of unsupervised categorization of Wikipedia articles. As a practical result of our research, we present an application of the spectral clustering algorithm used for grouping Wikipedia search results. The main contribution of the paper is a representation method for Wikipedia articles that is based on a combination of words and links and is used for categorization of search results in this...

Full text available for download in the portal

Text classifiers for automatic articles categorization

Publication

- Year 2012

The article concerns the problem of automatic classification of textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments have been performed on Wikipedia articles, which were classified automatically into the categories created by Wikipedia editors.

Information Retrieval with the Use of Music Clustering by Directions Algorithm

Publication

A. Kaczmarek

- Year 2013

This paper introduces the Music Clustering by Directions (MCBD) algorithm. The algorithm is designed to support users of query by humming systems in formulating queries. Systems of this kind make it possible to retrieve songs and tunes on the basis of a melody recorded by the user. The Music Clustering by Directions algorithm is a kind of interactive query expansion method. On the basis of the query, the algorithm provides suggestions...

Full text available for download from an external service

Representation of hypertext documents based on terms, links and text compressibility

Publication

J. Szymański
W. Duch

- LECTURE NOTES IN COMPUTER SCIENCE - Year 2010

Methods for representing text documents based on words, mutual links, and compression methods are described. They were evaluated using an SVM classifier.

Comparative Analysis of Text Representation Methods Using Classification

Publication

J. Szymański

- CYBERNETICS AND SYSTEMS - Year 2014

In our work, we review and empirically evaluate five different raw methods of text representation that allow automatic processing of Wikipedia articles. The main contribution of the article—evaluation of approaches to text representation for machine learning tasks—indicates that the text representation is fundamental for achieving good categorization results. The analysis of the representation methods creates a baseline that cannot...

Full text available for download from an external service

Information Retrieval in Wikipedia with Conceptual Directions

Publication

J. Szymański

- Year 2015

The paper describes our algorithm used for retrieval of textual information from Wikipedia. The experiments show that the algorithm improves typical evaluation measures of retrieval quality. The improvement of the retrieval results was achieved by a two-phase approach. In the first phase, the algorithm extends the set of content that has been indexed by the specified keywords and thus increases the Recall value. Then, using the...

Full text available for download from an external service
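
The Recall measure mentioned in the abstract above, together with its usual companion Precision, can be sketched as follows. This is a minimal illustration of the standard definitions only, not the paper's algorithm; the function name `precision_recall` and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth and two hypothetical result sets.
relevant = {"d1", "d2", "d3", "d4"}
narrow = ["d1", "d2"]                            # small, exact result set
expanded = ["d1", "d2", "d3", "d5", "d6", "d7"]  # extended result set

print(precision_recall(narrow, relevant))    # (1.0, 0.5)
print(precision_recall(expanded, relevant))  # (0.5, 0.75)
```

In this toy case, extending the result set raises recall from 0.5 to 0.75 while precision falls from 1.0 to 0.5, which shows why extending the result set alone trades precision for recall.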

Music Information Retrieval in Music Repositories

Publication

B. Kostek

- Year 2013

This chapter reviews the key concepts associated with automated Music Information Retrieval (MIR). First, current research trends and system solutions in terms of music retrieval and music recommendation are discussed. Next, experiments performed on a constructed music database are presented. A proposal for music retrieval and annotation aided by gaze tracking is also discussed.

Full text available for download from an external service

Words context analysis for improvement of information retrieval

Publication

J. Szymański

- Year 2012

In the article we present an approach to improving information retrieval from large text collections using word context vectors. The vectors have been created by analyzing English Wikipedia with the Hyperspace Analogue to Language model of word similarity. For test phrases we evaluate retrieval with direct user queries as well as retrieval with context vectors of these queries. The results indicate that the proposed method can not...

Two Stage SVM and kNN Text Documents Classifier

Publication

- Year 2015

The paper presents an approach to the large-scale text document classification problem in parallel environments. A two-stage classifier is proposed, based on a combination of k-nearest neighbors and support vector machines classification methods. The details of the classifier and the parallelisation of classification, learning and prediction phases are described. The classifier makes use of our method named one-vs-near. It is...

Information retrieval with semantic memory model

Publication

J. Szymański

- Cognitive Systems Research - Year 2011

Psycholinguistic theories of semantic memory form the basis of understanding of natural language concepts. These theories are used here as an inspiration for implementing a computational model of semantic memory in the form of a semantic network. Combining this network with a vector-based object-relation-feature value representation of concepts that also includes weights for confidence and support, allows for recognition of concepts...

Full text available for download from an external service

Music information analysis and retrieval techniques

Publication

B. Kostek
Ł. Kania

- Archives of Acoustics - Year 2008

The aim of the article is to present the key issues of a rapidly developing branch of multimedia, represented by automatic music information retrieval (MIR) systems, which is growing into an independent field of application within music informatics. The article presents selected music information retrieval systems and an example of such a system, implemented...

Full text available for download in the portal

Interactive Information Retrieval Algorithm for Wikipedia Articles

Publication

J. Szymański

- Year 2012

The article presents an algorithm for retrieving textual information in document collections. The algorithm employs a category system that organizes the repository and uses interaction with the user to improve search precision. The algorithm was implemented for Simple English Wikipedia, and the first evaluation results indicate that the proposed method can help to retrieve information from large document repositories.

Quality evaluation of computer aided information retrieval from machine typed paper documents

Publication

- Year 2003

The aim of the international project memorial is computer-aided recognition of typescripts. The paper presents the problem of measuring the quality of such a process. It identifies the potential places where errors may occur, and presents and classifies the appropriate measures.

Extraction of information from born-digital PDF documents for reproducible research

Publication

- Journal of Advanced Management - Year 2016

Born-digital PDF electronic documents might reasonably be expected to preserve useful data units of their source originals that suffice to produce executable papers for reproducible research. Unfortunately, developers of authoring tools may adopt arbitrary PDF generation strategies, producing a plethora of internal data representations. Such common information units as text paragraphs, tables, function graphs and flow diagrams,...

Full text available for download in the portal

Distributed representation of information on cyclic events

Publication

A. Opaliński

- STUDIA INFORMATICA. SYSTEMS AND INFORMATION TECHNOLOGY. SYSTEMY I TECHNOLOGIE INFORMACYJNE - Year 2011

A representation of information on cyclic events has been proposed which is advantageous for computing environments where a distributed set of Receivers reacts to cyclic events generated by distributed sources. In such a scenario, no immanent central information repository exists on event timing or volume. Receivers are able to learn the event cycles without communicating with each other, merely on the basis of the fact that an event...

Full text available for download in the portal

Parallel Computations of Text Similarities for Categorization Task

Publication

J. Szymański

- Year 2013

In this chapter we describe our approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations have been used for textual data categorization. We create the test datasets from Wikipedia articles, which together with their hyperlinks form a graph used in our experiments. Similarities based on Euclidean distance and the Cosine measure have been used to process the data using the k-means algorithm....
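
The two similarity measures named in the abstract above can be sketched in a few lines of Python. This is a sequential illustration of the standard definitions under simple assumptions (dense vectors of equal length, hypothetical function names and sample vectors); it does not reproduce the chapter's parallel implementation.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical term-count vectors: doc_b is doc_a with every count doubled.
doc_a = [1, 2, 0]
doc_b = [2, 4, 0]

print(euclidean_distance(doc_a, doc_b))  # about 2.236
print(cosine_similarity(doc_a, doc_b))   # about 1.0
```

The example shows the practical difference between the two: the cosine measure ignores vector length, so a document and a twice-as-long copy of it are treated as nearly identical, while the Euclidean distance between them is clearly non-zero.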

Visual content representation and retrieval for Cognitive Cyber Physical Systems

Publication

C. S. d. Oliveira
C. Sanin
E. Szczerbicki

- Procedia Computer Science - Year 2019

Cognitive Cyber Physical Systems have gained significant attention from academia and industry during the past few decades. One of the main reasons behind this interest is the potential of such technologies to revolutionize human life, since they are intended to work robustly under complex visual scenes in which environmental conditions may vary, adapting to a comprehensive range of unforeseen changes and exhibiting prospective behavior...

Full text available for download in the portal

Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network

Publication

- Applied Sciences-Basel - Year 2021

To effectively process textual data, many approaches have been proposed to create text representations. The transformation of a text into a form of numbers that can be computed using computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations using statistical methods and compare them to approaches...

Full text available for download in the portal

Text categorization with semantic commonsense knowledge: First results

Publication

P. Majewski
J. Szymański

- Year 2008

Text processing typically relies on BoW representations. This approach, however, does not give good results when similar documents do not share words. The article presents an approach to constructing a kernel function for SVM classifiers based on an external knowledge base of linguistic concepts.

Text Documents Classification with Support Vector Machines

Publication

P. Majewski

- Year 2008

External Validation Measures for Nested Clustering of Text Documents

Publication

- Year 2011

This article handles the problem of validating the results of nested (as opposed to "flat") clusterings. It shows that standard external validation indices used for partitioning clustering validation, like the Rand statistic, the Hubert Γ statistic or the F-measure, are not applicable in nested clustering cases. In addition to the work in which the F-measure was adapted to hierarchical classification as the hF-measure, here some methods to...

Music information retrieval—The impact of technology, crowdsourcing, big data, and the cloud in art.

Publication

B. Kostek

- Journal of the Acoustical Society of America - Year 2019

The exponential growth of computer processing power, cloud data storage, and the crowdsourcing model of gathering data bring new possibilities to the music information retrieval (MIR) field. MIR is no longer music content retrieval only; the area also comprises the discovery of expressing feelings and emotions contained in music, incorporating modalities other than hearing to help with this issue, users’ profiling, merging music with social...

Full text available for download in the portal

LSA Is not Dead: Improving Results of Domain-Specific Information Retrieval System Using Stack Overflow Questions Tags

Publication

S. Olewniczak
J. Szymański
P. Malak
R. Komar
A. Letowska

- Year 2024

The paper presents the approach to using tags from Stack Overflow questions as a data source in the process of building domain-specific unsupervised term embeddings. Using a huge dataset of Stack Overflow posts, our solution employs the LSA algorithm to learn latent representations of information technology terms. The paper also presents the Teamy.ai system, currently developed by Scalac company, which serves as a platform that...

Full text available for download in the portal

Music information analysis and retrieval - a review

Publication

B. Kostek
Ł. Kania

- Year 2008

The paper presents selected issues related to the analysis and retrieval of music information. The review is based on the literature in the field of music informatics and focuses on the problem of parameterizing musical sounds and audio signals, and on analyzing the usefulness of selected methods of so-called artificial intelligence (computational intelligence) for the acquisition and recognition of musical objects...

Music Information Retrieval – Soft Computing versus Statistics. Wyszukiwanie informacji muzycznej - algorytmy uczące versus metody statystyczne

Publication

B. Kostek

- Year 2015

Music Information Retrieval (MIR) is an interdisciplinary research area that covers automated extraction of information from audio signals, music databases and services enabling the searching of indexed information. In the early stages the primary focus of MIR was on music information through Query-by-Humming (QBH) applications, i.e. on identifying a piece of music by singing (singing/whistling), while more advanced implementations...

Full text available for download from an external service

Interactive Information Search in Text Data Collections

Publication

- Year 2013

This article presents a new idea for retrieval in text repositories and describes the general infrastructure of a system created to implement and test this idea. The implemented system differs from today’s standard search engines by introducing a process of interactive search with users and data clustering. We present the basic algorithms behind our system and the measures we used to evaluate the results. The achieved results...

Full text available for download from an external service

Report of the ISMIS 2011 Contest : Music Information Retrieval

Publication

B. Kostek
A. Kupryjanow
P. Żwan
W. Jiang
Z. W. Raś
M. Wojnarski
J. Świetlicka

- Year 2011

This report presents an overview of the data mining contest organized in conjunction with the 19th International Symposium on Methodologies for Intelligent Systems (ISMIS 2011), held between Jan 10 and Mar 21, 2011, on the TunedIT competition platform. The contest consisted of two independent tasks, both related to music information retrieval: recognition of music genres and recognition of instruments, for a given music sample represented...

"Computing with words" concept applied to musical information retrieval

Publication

B. Kostek

- Electronic Notes in Theoretical Computer Science - Year 2003

The article proposes using the concept of "processing natural-language words" to find the relationship between selected parameters of musical sounds and subjectively perceived timbre. First, classical methods of mapping measurable parameters onto their subjective counterparts are presented; then a knowledge base is built from the results of subjective tests. The processing employed the method...

"Computing with word" concept applied to musical information retrieval

Publication

B. Kostek

- Year 2003

The article proposes using the concept of "processing natural-language words" to find the relationship between selected parameters of musical sounds and subjectively perceived timbre. First, classical methods of mapping measurable parameters onto their subjective counterparts are presented; then a knowledge base is built from the results of subjective tests. The processing employed the method...

SUBJECTIVE PERCEPTION OF MUSIC GENRES IN THE FIELD OF MUSIC INFORMATION RETRIEVAL SYSTEMS

Publication

- Year 2014

The aim of this paper is to evaluate the relationship between perception of music genres and subjective features of music that can be assigned to them. For this purpose a group of subjective features such as loudness, melody, rhythm, volume, instrumentation was chosen to describe music genres. A group of 30 listeners with normal hearing, ranging in age from 20 to 40, was created. Each subject participating in listening tests was asked...

Previous Opinions is All You Need - Legal Information Retrieval System

Publication

M. Osowski
K. Lorenc
P. Drozda
R. Scherer
K. Szałapak
K. Komar-Komarowski
J. Szymański
A. Sobecki

- Year 2023

We present a system for retrieving the most relevant legal opinions to a given legal case or question. To this end, we checked several state-of-the-art neural language models. As training and testing data, we use tens of thousands of legal cases as question-opinion pairs. Text data has been subjected to advanced pre-processing adapted to the specifics of the legal domain. We empirically chose the BERT-based HerBERT model to perform...

Full text available for download from an external service

Musical Instrument Classification and Duet Analysis Employing Music Information Retrieval Techniques.

Publication

B. Kostek

- Year 2004

The article gives an overview of the work of the Multimedia Systems Department of Gdańsk University of Technology related to music information retrieval, in particular the classification of musical instrument sounds. The described experiments used artificial neural networks.

Perception-based data processing in acoustics. Applications to music information retrieval and psychophysiology of hearing.

Publication

B. Kostek

- Year 2005

The book first describes the cognitive mechanisms underlying music perception. It also presents the automatic recognition of musical instrument sounds and music, the application of new artificial intelligence methods in broadly understood sound engineering, and computer-based methods of hearing examination.

Wikipedia Articles Representation with Matrix'u

Publication

J. Szymański

- Year 2013

In the article we evaluate different text representation methods used for the task of Wikipedia article categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used to reconstruct human-made categories.

Full text available for download from an external service

Management of Textual Data at Conceptual Level

Publication

J. Szymański

- Year 2011

The article presents an approach to the management of a large repository of documents at the conceptual level. We describe our approach to representing Wikipedia articles using their categories. The representation has been used to construct groups of similar articles. The proposed approach has been implemented in a prototype system that organizes the articles returned as search results for a given query. Constructed clusters allow to...

Retrieval with Semantic Sieve

Publication

- Year 2013

The article presents an algorithm we call Semantic Sieve, applied to refining search results in a text document repository. The algorithm calculates so-called conceptual directions that enable interaction with the user and allow the set of results to be narrowed to the most relevant ones. We present the system where the algorithm has been implemented. The system also offers in the presentation layer clustering of the results into...

Full text available for download from an external service

Review on Wikification methods

Publication

J. Szymański
M. Naruszewicz

- AI COMMUNICATIONS - Year 2019

The paper reviews methods for automatic annotation of texts with Wikipedia entries. The process, called Wikification, aims at building references between concepts identified in the text and Wikipedia articles. Wikification finds many applications, especially in text representation, where it enables one to capture the semantic similarity of the documents. Also, it can be considered as automatic tagging of the text. We describe typical...

Full text available for download from an external service

Improving css-KNN Classification Performance by Shifts in Training Data

Publication

- Year 2015

This paper presents a new approach to improve the performance of a css-k-NN classifier for categorization of text documents. The css-k-NN classifier (i.e., a threshold-based variation of a standard k-NN classifier we proposed in [1]) is a lazy-learning instance-based classifier. It does not have parameters associated with features and/or classes of objects that would be optimized during off-line learning. In this paper we propose...

An Analysis of Neural Word Representations for Wikipedia Articles Classification

Publication

J. Szymański
N. Kawalec

- CYBERNETICS AND SYSTEMS - Year 2019

One of the current popular methods of generating word representations is an approach based on the analysis of large document collections with neural networks. It creates so-called word-embeddings that attempt to learn relationships between words and encode this information in the form of a low-dimensional vector. The goal of this paper is to examine the differences between the most popular embedding models and the typical bag-of-words...

Full text available for download from an external service

Improving the Accuracy in Sentiment Classification in the Light of Modelling the Latent Semantic Relations

Publication

N. Rizun
W. Waloszek
Y. Taranenko

- Information - Year 2018

The research presents a methodology for improving the accuracy of sentiment classification in the light of modelling the latent semantic relations (LSR). The objective of this methodology is to find ways of eliminating the limitations of the discriminant and probabilistic methods for revealing LSR and of customizing the sentiment classification process (SCP) for more accurate recognition of text tonality. This objective was achieved...

Full text available for download in the portal

Concept description vectors and the 20 question game

Publication

- Year 2005

Knowledge of the properties that are applicable to a given object is a necessary prerequisite for formulating intelligent questions. Concept description vectors provide the simplest representation of this knowledge, storing for each object information about the values of its properties. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured...

Full text available for download from an external service

Fusion-based Representation Learning Model for Multimode User-generated Social Network Content

Publication

A. M. Soomar

- ACM Journal of Data and Information Quality - Year 2023

As mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. Due to the multi-source heterogeneous nature of the data, big data fusion offers both promise and drawbacks. With the rise of mobile networks and applications, UGC, which includes...

Full text available for download from an external service

Development and Research of the Text Messages Semantic Clustering Methodology

Publication

N. Rizun
P. Kapłański
Y. Taranenko

- Year 2016

A methodology for semantic clustering analysis of collections of customers' text opinions is developed. The authors' version of the mathematical models for formalizing and practically implementing the semantic clustering of short text messages is proposed, based on Latent Semantic Analysis as the method for extracting knowledge from the collection of customers' text opinions. An algorithm for semantic clustering of the text opinions is developed,...

Full text available for download in the portal

Just look at to open it up: A biometric verification facility for password autofill to protect electronic documents

Publication

- MULTIMEDIA TOOLS AND APPLICATIONS - Year 2021

Electronic documents constitute specific units of information, and protecting them against unauthorized access is a challenging task. This is because a password protected document may be stolen from its host computer or intercepted while on transfer and exposed to unlimited offline attacks. The key issue is, therefore, making document passwords hard to crack. We propose to augment a common text password authentication interface...

Full text available for download in the portal

Agile Commerce in the light of Text Mining

Publication

A. Baj-Rogowska

- Przedsiębiorczość i Zarządzanie - Year 2017

The survey conducted for this study reveals that more than 84% of respondents have never encountered the term “agile commerce” and do not understand its meaning. At the same time, they are active participants of this strategy. Using digital channels as customers more often than ever before, they have already been included in the agile philosophy. Based on the above, the purpose of the study is to analyse major text sets containing...

Full text available for download in the portal

Ontologies vs. Rules — Comparison of Methods of Knowledge Representation Based on the Example of IT Services Management

Publication

- Year 2013

This text provides a brief overview of selected structures aimed at knowledge representation in the form of ontologies based on description logic and aims at comparing them with their counterparts based on the rule-based approach. Due to the limitations on the length of the article, only elements associated with the representation of concepts could be shown, without including roles. The formalisms of the OWL language were used...

Full text available for download from an external service
