
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems

Abstract

This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected according to target system type (shared memory, distributed, and hybrid), communication pattern (one-sided and two-sided), and programming abstraction level. We analyze these representatives across many aspects, including programming model, languages, supported platforms, license, optimization goals, ease of programming, debugging, deployment, portability, level of parallelism, constructs enabling parallelism and synchronization, features introduced in recent versions indicating trends, support for hybrid parallel execution, and disadvantages. This detailed analysis has led us to identify trends in high-performance computing and challenges to be addressed in the near future. It can help shape future versions of programming standards, select the technologies best matching programmers' needs, and avoid potential difficulties when using high-performance computing systems.
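
As a concrete illustration of the hybrid (distributed plus shared memory) model covered by the survey, the sketch below combines MPI processes across nodes with OpenMP threads within each node. This is a minimal example written for this summary, not code taken from the paper; MPI_Init_thread with MPI_THREAD_FUNNELED is the standard way to declare that only each process's main thread issues MPI calls.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        /* Distributed-memory level: one MPI process per node (or per socket). */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Shared-memory level: a team of OpenMP threads inside each process. */
        #pragma omp parallel
        printf("MPI process %d of %d, OpenMP thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

Built and launched in the usual way for an MPI program (the file name hybrid.c is arbitrary): mpicc -fopenmp hybrid.c -o hybrid && mpirun -np 4 ./hybrid. On the communication-pattern axis the abstract mentions, two-sided exchange means matched MPI_Send/MPI_Recv pairs, whereas one-sided access uses MPI_Put/MPI_Get on a memory window exposed with MPI_Win_create, so the target process takes no explicit part in the transfer.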

Citations

  • CrossRef: 15
  • Web of Science: 0
  • Scopus: 18

Full text

Publication version: Accepted or Published Version
License: Creative Commons CC-BY

Details

Category: Articles
Type: journal article
Published in: Scientific Programming, pp. 1-19, ISSN: 1058-9244
Language: English
Publication year: 2020
Bibliographic description: Czarnul P., Proficz J., Drypczewski K.: Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems // Scientific Programming, (2020), pp. 1-19
DOI: 10.1155/2020/4176794
Verified by: Gdańsk University of Technology
