
Optimizing a GPU algorithm through hardware profiling analysis

By:

Tinetti, Fernando Gustavo

Contributor(s):

Martín, Sergio M

Material type: Article

Description: 1 file (606.2 KB)

Subject(s):

HARDWARE


Summary: Usage of GPU-based architectures for scientific computing has been steadily increasing in recent years. This new paradigm for both programming and execution has been applied to solve several classic problems much faster than the conventional multiprocessor and/or multicomputer approach. Compared to conventional CPUs, these architectures deliver higher performance for specific types of algorithms that are particularly well suited to their greater number of simpler cores, which execute a single instruction at a time, each on a different set of data. Since this is still a relatively new technology, GPU device manufacturers as well as independent researchers have published several experiences (success stories), best practices, and optimization guides to help developers obtain maximum program performance. However, there is still little information about the optimizations that can only be harnessed by analyzing a specific device's hardware performance counters. In this paper, we discuss several optimizations based on hardware profiling and share the lessons we learned about how such data can be used to optimize a scientific algorithm on a GPU using CUDA.



Holdings
Item type	Home library	Collection	Call number	URL	Status	Date due	Barcode
Book chapter	Biblioteca de la Facultad de Informática	Digital library	A0659	Link to resource	Online resource		

File format: PDF. -- This document is an intellectual work of the Facultad de Informática - UNLP (BIPA/Biblioteca collection).


International Conference on Computational Science and Computational Intelligence (Las Vegas, March 10-13, 2014). Proceedings, vol. 1, pp. 45-51