Campus Access Only

All rights reserved. This publication is intended for use solely by faculty, students, and staff of the University of the Pacific. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, now known or later developed, including but not limited to photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the author or the publisher.

Date of Award


Document Type

Thesis - Pacific Access Restricted

Degree Name

Master of Science (M.S.)


Department

Engineering Science

First Advisor

Vivek Krishnamani Pallipuram

First Committee Member

Jeffrey Shafer

Second Committee Member

Jinzhu Gao


Abstract

Image amplification is an important image enhancement technique for applications such as medicine, satellite imaging, forensic science, and remote sensing. Existing techniques are highly computationally intensive and require long execution times on conventional processors. This computational intensity makes them a good fit for massively parallel architectures such as general-purpose graphics processing unit (GPGPU) devices. In this research, we accelerate a state-of-the-art image amplification technique on Nvidia's Kepler GK110 GPGPU device using the Compute Unified Device Architecture (CUDA) programming model. The technique comprises four computationally intensive stages: Canny edge detection, vertical edge preservation, horizontal edge preservation, and mean-preserving interpolation. Using effective CUDA optimization techniques, we successively map the four stages of the algorithm to the GPGPU device, creating a hierarchy of five implementations. The final implementation in the hierarchy maps all of the algorithm stages to the GPGPU device, eliminating costly intermediate host-device data transfers and devoting more time to useful computation.
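The mean-preserving constraint on the interpolation stage can be illustrated with a toy one-dimensional sketch. This is not the thesis's actual 2D formulation; the function name and the linear-interpolation-plus-shift scheme are illustrative assumptions only, showing how each upsampled block can be corrected so that its mean matches the source pixel:

```python
def mean_preserving_upsample(row, factor=2):
    """Toy 1D mean-preserving interpolation (illustrative assumption).

    Each input pixel expands into `factor` output samples: we linearly
    interpolate toward the next pixel, then shift the whole block so its
    mean equals the source pixel, mimicking the mean-preservation idea.
    """
    out = []
    n = len(row)
    for i, p in enumerate(row):
        nxt = row[i + 1] if i + 1 < n else p
        # Linear interpolation between p and its right neighbor.
        block = [p + (nxt - p) * k / factor for k in range(factor)]
        # Correct the block so its mean is exactly the source pixel.
        shift = p - sum(block) / factor
        out.extend(b + shift for b in block)
    return out
```

With this scheme, the mean of every output block (and hence of the whole amplified signal) equals the mean of the input, which is the property the stage's name suggests.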

We provide a detailed analysis of the kernel time and end-to-end application time obtained for each implementation in the hierarchy, and compare the GPGPU execution time of each algorithm stage with an equivalent optimized serial implementation. We also discuss an empirical method for identifying the optimal GPGPU execution configuration that maximizes device utilization.
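An empirical search over execution configurations typically enumerates candidate block shapes and times the kernel under each. The sketch below shows the standard grid-sizing arithmetic and one plausible enumeration; the warp size of 32 and the 1024-thread-per-block limit are standard CUDA values, but the enumeration itself is an illustrative assumption, not the thesis's exact procedure:

```python
def launch_config(width, height, block_x, block_y):
    """Grid dimensions for a width x height image under a given block shape.

    Ceiling division ensures the grid covers every pixel, including the
    partially filled blocks at the image edges.
    """
    grid_x = (width + block_x - 1) // block_x
    grid_y = (height + block_y - 1) // block_y
    return (grid_x, grid_y)

def candidate_configs(max_threads=1024, warp=32):
    """Enumerate power-of-two block shapes for an empirical sweep.

    Each candidate's width is a multiple of the warp size and the total
    threads per block stay within the hardware limit; a sweep would time
    the kernel under each shape and keep the fastest.
    """
    shapes = []
    bx = warp
    while bx <= max_threads:
        by = 1
        while bx * by <= max_threads:
            shapes.append((bx, by))
            by *= 2
        bx *= 2
    return shapes
```

For example, a 10240x10240 image with 32x8 blocks yields a 320x1280 grid, and the sweep would compare that timing against other shapes such as 64x4 or 128x8.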

All of the GPGPU kernels executed on the Kepler GPGPU device achieve substantial speedups, as high as 90x, over the optimized serial implementation. In addition, for the largest image size, 10240x10240, the best-performing GPGPU implementation achieves an end-to-end application speedup of 11.75x over its serial counterpart. The research also analyzes deployment of the application on Amazon Web Services instances, providing an opportunity to study the application's scalability.



To access this thesis/dissertation you must have a valid email address and log in to Scholarly Commons.




If you are the author and would like to grant permission to make your work openly accessible, please email


Rights Statement

In Copyright. URI:
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).