Date of Award


Document Type


Degree Name

Master of Science in Engineering (M.S.Eng.)



First Advisor

Vivek Krishnamani Pallipuram

First Committee Member

Jinzhu Gao

Second Committee Member

David Mueller


The execution of the scientific applications on the Cloud comes with great flexibility, scalability, cost-effectiveness, and substantial computing power. Market-leading Cloud service providers such as Amazon Web service (AWS), Azure, Google Cloud Platform (GCP) offer various general purposes, memory-intensive, and compute-intensive Cloud instances for the execution of scientific applications. The scientific community, especially small research institutions and undergraduate universities, face many hurdles while conducting high-performance computing research in the absence of large dedicated clusters. The Cloud provides a lucrative alternative to dedicated clusters, however a wide range of Cloud computing choices makes the instance selection for the end-users. This thesis aims to simplify Cloud instance selection for end-users by proposing a probabilistic machine learning framework to allow to users select a suitable Cloud instance for their scientific applications.

This research builds on the previously proposed A2Cloud-RF framework that recommends high-performing Cloud instances by profiling the application and the selected Cloud instances. The framework produces a set of objective scores called the A2Cloud scores, which denote the compatibility level between the application and the selected Cloud instances. When used alone, the A2Cloud scores become increasingly unwieldy with an increasing number of tested Cloud instances. Additionally, the framework only examines the raw application performance and does not consider the execution cost to guide resource selection. To improve the usability of the framework and assist with economical instance selection, this research adds two Naïve Bayes (NB) classifiers that consider both the application’s performance and execution cost. These NB classifiers include: 1) NB with a Random Forest Classifier (RFC) and 2) a standalone NB module.

Naïve Bayes with a Random Forest Classifier (RFC) augments the A2Cloud-RF framework's final instance ratings with the execution cost metric. In the training phase, the classifier builds the frequency and probability tables. The classifier recommends a Cloud instance based on the highest posterior probability for the selected application.

The standalone NB classifier uses the generated A2Cloud score (an intermediate result from the A2Cloud-RF framework) and execution cost metric to construct an NB classifier. The NB classifier forms a frequency table and probability (prior and likelihood) tables. For recommending a Cloud instance for a test application, the classifier calculates the highest posterior probability for all of the Cloud instances. The classifier recommends a Cloud instance with the highest posterior probability. This study performs the execution of eight real-world applications on 20 Cloud instances from AWS, Azure, GCP, and Linode. We train the NB classifiers using 80% of this dataset and employ the remaining 20% for testing. The testing yields more than 90% recommendation accuracy for the chosen applications and Cloud instances. Because of the imbalanced nature of the dataset and multi-class nature of classification, we consider the confusion matrix (true positive, false positive, true negative, and false negative) and F1 score with above 0.9 scores to describe the model performance. The final goal of this research is to make Cloud computing an accessible resource for conducting high-performance scientific executions by enabling users to select an effective Cloud instance from across multiple providers.