Abstract
The paper presents a workflow application for efficient parallel processing of data downloaded from an Internet portal. The workflow partitions input files into subdirectories, which are further split for parallel processing by services installed on distinct computer nodes. This way, analysis of the first ready subdirectories can start quickly and is handled by services implemented as parallel multithreaded applications using multiple cores of modern CPUs. The goal is to assess achievable speed-ups and determine which factors influence scalability and to what degree. Data processing services were implemented to assess the context (positive or negative) in which a given keyword appears in a document. The testbed application used these services to determine how a particular brand was perceived by authors of articles or by readers in comments on a specific Internet portal focused on new technologies. Execution times and speed-ups are presented for data sets of various sizes, along with a discussion of how factors such as load imbalance and memory/disk bottlenecks limit performance.
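The partition-then-process idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `partition`, `assess_context` and `run_workflow`, the tiny word lists, and the window-based scoring. A thread pool on one machine stands in for the services running on distinct nodes; for CPU-bound work in CPython, a process pool would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical word lists; the abstract does not say how context is assessed.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}


def partition(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def assess_context(text, keyword, window=3):
    """Return +1, -1 or 0 based on sentiment words within `window` words of `keyword`."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = 0
    for i, word in enumerate(words):
        if word == keyword:
            nearby = words[max(0, i - window): i + window + 1]
            score += sum(n in POSITIVE for n in nearby)
            score -= sum(n in NEGATIVE for n in nearby)
    return (score > 0) - (score < 0)


def process_chunk(chunk, keyword):
    """Stand-in for one processing service: assess every document in a chunk."""
    return [assess_context(doc, keyword) for doc in chunk]


def run_workflow(documents, keyword, chunk_size=2, workers=4):
    """Partition the documents and process the chunks concurrently, preserving order."""
    chunks = partition(documents, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: process_chunk(c, keyword), chunks)
        return [label for chunk_result in results for label in chunk_result]


if __name__ == "__main__":
    docs = [
        "Acme makes a great phone",
        "the Acme tablet is terrible",
        "nothing here",
        "Acme released a device",
    ]
    print(run_workflow(docs, "acme"))
```

Smaller chunks let the first results arrive sooner but add scheduling overhead, while unevenly sized chunks are one way the load imbalance mentioned in the abstract can arise.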
Citations
- 3 CrossRef
- 0 Web of Science
- 4 Scopus
Author (1)
- Czarnul P.
Full text
the full text of the publication is not available in the portal
Details
- Category: Conference activity
- Type: conference proceedings indexed in Web of Science
- Published in: Procedia Computer Science, vol. 29, pages 499-508
- ISSN: 1877-0509
- Language: English
- Publication year: 2014
- Bibliographic description: Czarnul P.: A Workflow Application for Parallel Processing of Big Data from an Internet Portal, In: Procedia Computer Science, vol. 29, 2014, Elsevier.
- DOI: 10.1016/j.procs.2014.05.045
- Verified by: Politechnika Gdańska