Performance, optimization, and fitness: Connecting applications to architectures

Department

Electrical and Computer Engineering

Document Type

Article

Publication Title

Concurrency and Computation: Practice and Experience

ISSN

1532-0626

Volume

23

Issue

10

DOI

10.1002/cpe.1688

First Page

1066

Last Page

1100

Publication Date

1-1-2011

Abstract

Recent trends involving multicore processors and graphical processing units (GPUs) focus on exploiting task- and thread-level parallelism. In this paper, we analyze various aspects of the performance of these architectures, including NVIDIA GPUs and multicore processors such as the Intel Xeon, AMD Opteron, and IBM Cell Broadband Engine. The case study used in this paper is a biological spiking neural network (SNN), implemented with the Izhikevich, Wilson, Morris-Lecar, and Hodgkin-Huxley neuron models. The four SNN models have varying requirements for communication and computation, making them useful for performance analysis of the hardware platforms. We report and analyze the variation of performance with network (problem size) scaling, available optimization techniques, and execution configuration. A Fitness performance model, which predicts the suitability of an architecture for accelerating an application, is proposed and verified against the SNN implementation results. The Roofline model, an existing performance model, is also used to determine the hardware bottlenecks and attainable peak performance of each architecture. Significant speedups are reported for the four SNN neuron models on these architectures; the maximum speedup of 574x was observed in our GPU implementation. Our results and analysis show that a proper match of architecture with algorithm complexity provides the best performance. Copyright © 2010 John Wiley & Sons, Ltd.
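
For reference, the Roofline model cited in the abstract bounds attainable throughput by the lesser of a platform's peak compute rate and its peak memory bandwidth scaled by the kernel's arithmetic intensity. The short C sketch below illustrates that calculation; the peak figures used are illustrative placeholders, not measurements or results from the paper.

#include <stdio.h>

/* Roofline bound: attainable GFLOP/s is limited either by peak compute
   or by peak memory bandwidth times arithmetic intensity (FLOPs/byte).
   The peak figures passed in below are placeholders for illustration. */
static double roofline_gflops(double peak_gflops, double peak_gbps,
                              double flops_per_byte) {
    double bandwidth_bound = peak_gbps * flops_per_byte;
    return bandwidth_bound < peak_gflops ? bandwidth_bound : peak_gflops;
}

int main(void) {
    /* A kernel with low arithmetic intensity hits the bandwidth ceiling;
       one with high intensity hits the compute ceiling. */
    printf("low intensity:  %.1f GFLOP/s\n", roofline_gflops(900.0, 140.0, 0.5));
    printf("high intensity: %.1f GFLOP/s\n", roofline_gflops(900.0, 140.0, 16.0));
    return 0;
}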
