Abstract
In this work, we evaluate an analytical GPU performance model based on Little's law, which expresses kernel execution time in terms of the latency bound, the throughput bound, and achieved occupancy. We then combine it with results from several research papers, introduce equations for estimating data transfer time, and finally incorporate it into the MERPSYS framework, a general-purpose simulator for parallel and distributed systems. The resulting solution enables the user to express a CUDA application in the MERPSYS editor using an extended Java language and then conveniently evaluate its performance for various launch configurations on different hardware units. We also provide a systematic methodology for extracting kernel characteristics, which are used as input parameters of the model. The model was evaluated using kernels with different traits and for a large variety of launch configurations. We found it to be very accurate for computation-bound kernels and realistic workloads, whilst for memory-throughput-bound kernels and uncommon scenarios the results were still within acceptable limits. We have also shown its portability between two devices of the same hardware architecture but different processing power. Consequently, MERPSYS with the theoretical models embedded in it can be used to evaluate and predict application performance on various GPUs and, for example, to support purchase decisions.
Details
- Category:
- Articles
- Type:
- publication in another foreign scientific journal (foreign language only)
- Published in:
- Scalable Computing: Practice and Experience, vol. 19, iss. 4, pages 401-422, ISSN: 1895-1767
- Language:
- English
- Publication year:
- 2018
- Bibliographic description:
- Gajger T., Czarnul P. Modelling and simulation of GPU processing in the MERPSYS environment. Scalable Computing: Practice and Experience, 2018, Vol. 19, iss. 4, pp. 401-422
- DOI:
- 10.12694/scpe.v19i4.1439
- Verified by:
- Gdańsk University of Technology