Towards Effective Processing of Large Text Collections

Julian Szymański; Henryk Krawczyk

doi:10.1109/intech.2012.6457784

Abstract

In this article we describe an approach to the parallel implementation of elementary operations for textual data categorization. In the experiments we evaluate parallel computations of similarity matrices and of the k-means algorithm. The test datasets have been prepared as graphs created from Wikipedia articles related by links. When we create the clustering data packages, we compute pairs of eigenvectors and eigenvalues for visualizations of the datasets. We describe the method used for evaluating the clustering quality. Finally, we discuss the achieved results and point out some improvements and perspectives for future development.
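The abstract does not give implementation details, so the following is only a minimal sketch of how a similarity matrix computation might be split into independently computed parts. It assumes cosine similarity over dense term-weight vectors and a row-block decomposition; both choices, and the names `cosine`, `similarity_rows`, and `similarity_matrix`, are illustrative assumptions and are not taken from the paper.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_rows(vectors, start, stop):
    """Compute rows [start, stop) of the full similarity matrix."""
    return [[cosine(vectors[i], v) for v in vectors] for i in range(start, stop)]


def similarity_matrix(vectors, workers=2, block=2):
    """Split the matrix into row blocks and compute the blocks concurrently."""
    starts = range(0, len(vectors), block)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in submission order, so rows stay aligned.
        blocks = pool.map(
            lambda s: similarity_rows(vectors, s, min(s + block, len(vectors))),
            starts,
        )
        return [row for rows in blocks for row in rows]


# Hypothetical toy data: three documents as term-weight vectors.
docs = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]
sim = similarity_matrix(docs)
```

Each row block depends only on the input vectors, so the blocks can be computed independently. A thread pool is used here only to keep the sketch short; for CPU-bound work in Python, a process pool or a distributed framework would be the more realistic choice.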

Details

Category:: Conference activity
Type:: conference proceedings indexed in Web of Science
Title of issue:: 2nd International Conference on Innovative Computing Technology (INTECH), pages 293-298
Language:: English
Publication year:: 2012
Bibliographic description:: Szymański J., Krawczyk H.: Towards Effective Processing of Large Text Collections. In: 2nd International Conference on Innovative Computing Technology (INTECH), 2012.
DOI:: 10.1109/intech.2012.6457784
Verified by:: Gdańsk University of Technology

Recommended for you

Evaluation of Path Based Methods for Conceptual Representation of the Text

2014

Development and Research of the Text Messages Semantic Clustering Methodology

N. Rizun,
P. Kapłański,
Y. Taranenko

2016

Spectral Clustering Wikipedia Keyword-Based search Results

2017

Parallel Computations of Text Similarities for Categorization Task

J. Szymański

2013
