Parallelization of large vector similarity computations in a hybrid CPU+GPU environment - Publication - Bridge of Knowledge

Abstract

The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code that computes similarity between all pairs of a large number of multidimensional vectors. The problem has a wide range of applications, so its optimization matters, especially on the widespread hybrid CPU+GPU systems targeted in the paper. The following techniques are presented and tested for the computation of all vector pairs: tuning of a GPU kernel with attention to memory coalescing and the use of shared memory, minimization of GPU memory allocation costs, reduction of the amount of data sent between CPU and GPU, overlapping of CPU–GPU communication with kernel execution, concurrent kernel execution, and determination of the best sizes of data batches processed on CPUs and GPUs along with the best GPU grid sizes. It is shown that all codes scale in hybrid environments whose compute devices differ in relative performance, even when comparisons of different vector pairs take different amounts of time. Tests were performed on two high-performance hybrid systems: one with 2 x Intel Xeon E5-2640 CPUs + 2 x NVIDIA Tesla K20m, and a newer-generation one with 2 x Intel Xeon E5-2620 v4 CPUs + NVIDIA Pascal-generation GTX 1070 cards. The results demonstrate the expected improvements and identify optimizations useful to users who incorporate such computations into parallel codes run on similar systems.
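The batching idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's OpenMP+CUDA code: it uses Python threads as stand-ins for compute devices, assumes a plain dot product as the similarity measure (the abstract does not name one), and all function names and parameters (`make_batches`, `process_batch`, `all_pairs_similarity`, `batch_size`, `workers`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def similarity(u, v):
    # Assumed measure: plain dot product (the abstract does not name one).
    return sum(a * b for a, b in zip(u, v))


def make_batches(n_vectors, batch_size):
    # Enumerate all unordered pairs (i, j) with i < j and cut them into batches.
    pairs = list(combinations(range(n_vectors), 2))
    return [pairs[k:k + batch_size] for k in range(0, len(pairs), batch_size)]


def process_batch(vectors, batch):
    # One unit of work; a worker thread stands in for a CPU core or a GPU.
    return [(i, j, similarity(vectors[i], vectors[j])) for i, j in batch]


def all_pairs_similarity(vectors, batch_size=4, workers=2):
    batches = make_batches(len(vectors), batch_size)
    results = {}
    # Batches wait in a shared queue; an idle worker takes the next one,
    # so a faster worker ends up processing more batches.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(lambda b: process_batch(vectors, b), batches):
            for i, j, s in batch_result:
                results[(i, j)] = s
    return results


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]]
sims = all_pairs_similarity(vectors, batch_size=2, workers=2)
```

Here `batch_size` plays the role of the tunable batch size from the abstract: small batches balance load better across unequal workers, while large batches reduce per-batch overhead. In a real hybrid code that overhead would include CPU–GPU transfers and kernel launches.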

Citations

  • CrossRef: 10

  • Web of Science: 0

  • Scopus: 14

Publication version
Accepted or Published Version
License
Creative Commons: CC-BY

Details

Category:
Articles
Type:
article in a journal listed in the JCR
Published in:
Journal of Supercomputing, vol. 74, no. 2, pages 768-786
ISSN: 0920-8542
Language:
English
Publication year:
2018
Bibliographic description:
Czarnul P.: Parallelization of large vector similarity computations in a hybrid CPU+GPU environment // Journal of Supercomputing. Vol. 74, no. 2 (2018), pp. 768-786
DOI:
10.1007/s11227-017-2159-7
Verified by:
Gdańsk University of Technology
