Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems - Publikacja - MOST Wiedzy


Abstract

This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals, ease of programming, debugging, deployment, portability, level of parallelism, constructs enabling parallelism and synchronization, features introduced in recent versions indicating trends, support for hybridity in parallel execution, and disadvantages. Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future. It can help to shape future versions of programming standards, select technologies best matching programmers’ needs, and avoid potential difficulties while using high-performance computing systems.

Citations

  • CrossRef: 3
  • Web of Science: 6
  • Scopus: 7


Full text

Publication version:
Accepted or Published Version
License:
Creative Commons CC-BY


Details

Category:
Journal publication
Type:
journal article
Published in:
Scientific Programming, pages 1-19,
ISSN: 1058-9244
Language:
English
Year of publication:
2020
Bibliographic description:
Czarnul P., Proficz J., Drypczewski K.: Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems // Scientific Programming, (2020), pp. 1-19
DOI:
10.1155/2020/4176794
Verification:
Politechnika Gdańska


