A multi-node GPGPU implementation of non-linear anisotropic diffusion filter

Document Type

Conference Presentation


Electrical and Computer Engineering

Conference Title

Symposium on Application Accelerators in High-Performance Computing

Image quality is critical for applications such as robotic vision, surveillance, and medical imaging. Images captured in real time are seldom noise-free and therefore require noise removal before further processing. Among the many noise removal schemes that have been proposed, anisotropic diffusion filtering is known to achieve highly precise results. However, this accuracy comes at the expense of high computational cost, especially for large data sets. The highly parallel nature of the filtering algorithm makes it a good candidate for General-Purpose Graphics Processing Unit (GPGPU) clusters. In this research, we present a GPGPU cluster-based implementation of the non-linear anisotropic diffusion filter. Our implementation maps the computationally intensive parts of the algorithm to the GPGPU devices, while communication and serial processing are performed by the CPU hosts. Our multi-node GPGPU implementation can process images as large as 156 megapixels and achieves a speed-up of 29x over an equivalent MPI-only implementation. In addition, it exhibits reasonable scaling behavior that improves with image size. © 2012 IEEE.
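As an illustration of why the filter parallelizes well across nodes, the following is a minimal CPU-only sketch, not the authors' implementation. It assumes a Perona-Malik style explicit update with an exponential conductance function, four-neighbor differences, and replicated borders; the abstract does not state which discretization was used. The names `diffuse_step`, `diffuse_step_strips`, `kappa`, and `lam` are hypothetical. The second function mimics a multi-node decomposition by cutting the image into row strips, giving each strip one halo row from its neighbors (the data that would be exchanged between hosts), and stitching the results back together.

```python
import math


def diffuse_step(img, kappa=10.0, lam=0.2):
    """One explicit diffusion update on a 2-D list of floats.

    Assumed scheme: Perona-Malik with conductance exp(-(grad/kappa)^2),
    4-neighbor differences, and replicated (clamped) borders.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny = min(max(y + dy, 0), h - 1)  # clamp at image border
                nx = min(max(x + dx, 0), w - 1)
                grad = img[ny][nx] - c
                # Small gradients diffuse freely; large ones (edges) are preserved.
                total += math.exp(-(grad / kappa) ** 2) * grad
            out[y][x] = c + lam * total
    return out


def diffuse_step_strips(img, parts, kappa=10.0, lam=0.2):
    """Same update, computed strip by strip as separate nodes might do it.

    Each strip is extended by one halo row above and below (where a
    neighbor exists); the halo rows are discarded after the update.
    """
    h = len(img)
    result = []
    for i in range(parts):
        lo, hi = i * h // parts, (i + 1) * h // parts
        if lo == hi:
            continue  # more strips than rows: nothing to do for this one
        top, bot = max(lo - 1, 0), min(hi + 1, h)
        strip = diffuse_step(img[top:bot], kappa, lam)
        result.extend(strip[lo - top: lo - top + (hi - lo)])
    return result
```

Because each pixel depends only on its immediate neighbors from the previous iteration, the per-strip work is independent once the halo rows are available. In this sketch, only those boundary rows would need to be communicated between hosts per iteration, while the per-pixel arithmetic is the part that could be offloaded to a GPU kernel.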


