Abstract
The paper presents the design, implementation, and tuning of a hybrid parallel OpenMP+CUDA code for computing similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, so its optimization is important, especially on the hybrid CPU+GPU systems that are now widespread and that the paper targets. For computation over all vector pairs, the following are presented and tested: tuning of a GPU kernel with respect to memory coalescing and the use of shared memory; minimization of GPU memory allocation costs; optimization of CPU–GPU communication in terms of the size of data sent; overlapping of CPU–GPU communication with kernel execution; concurrent kernel execution; and determination of the best sizes of data batches processed on CPUs and GPUs, along with the best GPU grid sizes. It is shown that all codes scale in hybrid environments with various relative performances of compute devices, even when comparisons of different vector pairs take different amounts of time. Tests were performed on two high-performance hybrid systems: one with 2 x Intel Xeon E5-2640 CPUs and 2 x NVIDIA Tesla K20m GPUs, and one with latest-generation 2 x Intel Xeon E5-2620 v4 CPUs and NVIDIA Pascal-generation GTX 1070 cards. The results demonstrate the expected improvements and show which optimizations are beneficial for users incorporating such computations into parallel codes run on similar systems.
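As a rough illustration of the batching idea in the abstract, the sketch below computes a similarity value for every pair (i, j) with i < j and lets several workers pull row batches from a shared queue, so that a faster device simply ends up taking more batches. It is a minimal sketch under stated assumptions, not the paper's OpenMP+CUDA code: cosine similarity stands in for the measure, Python threads stand in for CPU and GPU workers, and the names `BatchQueue`, `all_pairs_similarity`, `batch_size`, and `n_workers` are hypothetical.

```python
import math
import threading
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity measure (cosine); returns 0.0 for zero-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class BatchQueue:
    """Hypothetical work queue handing out row batches [start, end) on demand."""

    def __init__(self, n_rows: int, batch_size: int) -> None:
        self._next = 0
        self._n = n_rows
        self._bs = batch_size
        self._lock = threading.Lock()

    def take(self) -> Optional[Tuple[int, int]]:
        # Whichever worker calls first gets the next batch; None means no work left.
        with self._lock:
            if self._next >= self._n:
                return None
            start = self._next
            self._next = min(start + self._bs, self._n)
            return start, self._next


def all_pairs_similarity(
    vectors: List[Sequence[float]], batch_size: int = 4, n_workers: int = 3
) -> Tuple[Dict[Tuple[int, int], float], List[int]]:
    n = len(vectors)
    queue = BatchQueue(n, batch_size)
    results: Dict[Tuple[int, int], float] = {}
    processed = [0] * n_workers  # number of batches handled by each worker
    results_lock = threading.Lock()

    def worker(wid: int) -> None:
        while True:
            batch = queue.take()
            if batch is None:
                return
            start, end = batch
            local = {}
            # Row i covers the upper-triangle pairs (i, j) with j > i.
            for i in range(start, end):
                for j in range(i + 1, n):
                    local[(i, j)] = cosine_similarity(vectors[i], vectors[j])
            with results_lock:
                results.update(local)
                processed[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, processed
```

Because row i contributes n - 1 - i pairs, equally sized row batches have unequal cost; pulling batches on demand instead of assigning them statically is one simple way to keep workers of different speeds busy, which matches the situation the abstract describes. Under the Python GIL these threads do not run the arithmetic in parallel, so the sketch shows only the scheduling pattern, not the performance of a real hybrid code.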
Full text
- Publication version: Accepted or Published Version
Details
- Category: journal publication
- Type: article in a JCR-listed journal
- Published in: JOURNAL OF SUPERCOMPUTING, vol. 74, pp. 768-786, ISSN: 0920-8542
- Language: English
- Year of publication: 2018
- Bibliographic description: Czarnul P.: Parallelization of large vector similarity computations in a hybrid CPU+GPU environment // JOURNAL OF SUPERCOMPUTING, Vol. 74, no. 2 (2018), pp. 768-786
- DOI: 10.1007/s11227-017-2159-7
- Verified by: Politechnika Gdańska (Gdańsk University of Technology)