Parallelization of large vector similarity computations in a hybrid CPU+GPU environment - Publication - MOST Wiedzy




The paper presents the design, implementation, and tuning of a hybrid parallel OpenMP+CUDA code that computes similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, so its optimization is of high importance, especially on the currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for the computation of all vector pairs: tuning of a GPU kernel with consideration of memory coalescing and use of shared memory; minimization of GPU memory allocation costs; optimization of CPU–GPU communication in terms of the size of data sent; overlapping of CPU–GPU communication with kernel execution; concurrent kernel execution; and determination of the best sizes for data batches processed on CPUs and GPUs, along with the best GPU grid sizes. It is shown that all codes scale in hybrid environments with various relative performances of compute devices, even when comparisons of different vector pairs take different amounts of time. Tests were performed on two high-performance hybrid systems: 2 x Intel Xeon E5-2640 CPUs + 2 x NVIDIA Tesla K20m, and the latest-generation 2 x Intel Xeon E5-2620 v4 CPUs + NVIDIA Pascal-generation GTX 1070 cards. The results demonstrate the expected improvements and identify beneficial optimizations that are important for users who incorporate such computations into parallel codes run on similar systems.
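The abstract gives no code and does not name the similarity measure, so the following is only a minimal CPU-only sketch of the general idea: all vector pairs are split into batches, and workers pull batches from a shared queue so that faster devices end up processing more of them. Cosine similarity, the function names, `batch_size`, `n_workers`, and the use of Python threads as stand-ins for CPU and GPU devices are all assumptions made for illustration; this is not the paper's OpenMP+CUDA implementation.

```python
import math
import queue
import threading
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_batches(n_vectors, batch_size):
    """Split all index pairs (i, j), i < j, into lists of at most batch_size pairs."""
    pairs = list(combinations(range(n_vectors), 2))
    return [pairs[k:k + batch_size] for k in range(0, len(pairs), batch_size)]


def all_pairs_similarity(vectors, batch_size=4, n_workers=3):
    """Compute the similarity of every vector pair with queue-fed workers.

    Each worker thread is a hypothetical stand-in for one compute device.
    Pulling batches from a shared queue balances load dynamically.
    """
    work = queue.Queue()
    for batch in make_batches(len(vectors), batch_size):
        work.put(batch)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return  # no batches left
            partial = {
                (i, j): cosine_similarity(vectors[i], vectors[j])
                for i, j in batch
            }
            with lock:
                results.update(partial)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
similarities = all_pairs_similarity(vectors)
```

In this sketch the batch size is the main tuning knob: smaller batches balance load better when pair comparisons take uneven time, while larger batches reduce per-batch overhead. On a real GPU that overhead would include data transfer and kernel launch costs.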




Publication version
Accepted or Published Version
Creative Commons: CC-BY


Details

Journal publication
Article in a journal listed in JCR
Published in:
JOURNAL OF SUPERCOMPUTING, vol. 74, no. 2, pages 768-786,
ISSN: 0920-8542
Year of publication:
2018
Bibliographic description:
Czarnul P.: Parallelization of large vector similarity computations in a hybrid CPU+GPU environment // JOURNAL OF SUPERCOMPUTING. - Vol. 74, no. 2 (2018), pp. 768-786
Digital Object Identifier (DOI): 10.1007/s11227-017-2159-7
Politechnika Gdańska
