Parallel Computations of Text Similarities for Categorization Task

Julian Szymański

Abstract

In this chapter we describe an approach to the parallel implementation of similarity computations in high-dimensional spaces. The similarity computations are used for the categorization of textual data. We create test datasets from Wikipedia articles, which together with their hyperlinks form a graph used in our experiments. Similarities based on Euclidean distance and the cosine measure are used to process the data with the k-means algorithm. We describe the method used to evaluate clustering quality, as well as its parallel implementation. Finally, we discuss the results achieved and point out some improvements and perspectives for future development. The proposed implementation can be used as an evaluation task for measuring the relevancy of the simulator described in Chapter.
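
As an illustration of the two measures named in the abstract, the following is a minimal sketch, not the chapter's implementation. It assumes documents are already represented as dense numeric vectors. The function names (`euclidean_distance`, `cosine_similarity`, `assign_to_nearest`) and the toy vectors are hypothetical. The assignment step of k-means is independent for each document, which is what makes it easy to distribute. A thread pool is used here only to show that structure; in CPython it does not speed up CPU-bound work, so a real implementation would more likely use processes or a message-passing library.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_nearest(docs, centroids, workers=4):
    """k-means assignment step: index of the closest centroid for each document.

    Each document is handled independently, so the work can be split
    across workers without any communication between them.
    """
    def nearest(doc):
        return min(
            range(len(centroids)),
            key=lambda j: euclidean_distance(doc, centroids[j]),
        )

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(nearest, docs))


# Toy data (hypothetical): three documents, two centroids.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 1.0]]
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
labels = assign_to_nearest(docs, centroids)
```

To cluster by the cosine measure instead, `nearest` would pick the centroid with the largest `cosine_similarity` rather than the smallest distance.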

Author (1)

Julian Szymański dr hab. inż.

Full text

The full text of the publication is not available in the portal.

Details

Category: Monographic publication
Type: chapter, article in a book (collective work or textbook) in a language of international reach
Published in: Modeling large-scale computing systems; concepts and models, pages 149-160
Language: English
Publication year: 2013
Bibliographic description: Szymański J.: Parallel Computations of Text Similarities for Categorization Task // In: Modeling large-scale computing systems; concepts and models / Gdańsk: Gdańsk University of Technology, 2013, pp. 149-160
Verified by: Gdańsk University of Technology
