Abstract
The popularity of distributed computing platforms (e.g., Hadoop) is largely due to their ability to address the scalability issues that arise from the data storage and processing limitations of standard computing systems. However, the decision to dedicate organizational resources and capital to such systems requires careful consideration of several factors, including an evaluation of cloud-based distributed computing options. We propose a framework of metrics, which we used to conduct an in-depth performance and cost-benefit analysis of two standard Hadoop infrastructural choices: a Platform as a Service (PaaS) on-demand cloud setup and a local organizational setup. We evaluated the framework by means of an exploratory data analysis use case for a large-scale graph processing research problem. Our analysis considered highly granular aspects of distributed computing performance and studied how utilization rates and infrastructure amortization times affect break-even times. We found that virtual memory management adversely affects the performance of a cloud cluster during the reduce phase, with the magnitude of the degradation depending on the type of MapReduce operation. Our study is intended not only as an evaluation of infrastructural choices but also as the development of a metric framework that can serve as a baseline for researchers examining distributed infrastructures.
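To make the relationship between utilization, amortization time, and break-even time concrete, the following is a minimal sketch of one possible break-even calculation. It is not the model used in the paper: the linear cost model, the function names, and all parameters (`capex`, `local_opex_per_month`, `cloud_rate_per_hour`, `utilization`, `amortization_months`) are assumptions introduced here for illustration only.

```python
from typing import Optional

# Average number of hours in a month (8760 hours per year / 12).
HOURS_PER_MONTH = 730.0


def break_even_months(
    capex: float,
    local_opex_per_month: float,
    cloud_rate_per_hour: float,
    utilization: float,
) -> Optional[float]:
    """Months until a local cluster becomes cheaper than an on-demand cloud cluster.

    Assumed (hypothetical) cost model:
      local cost(t) = capex + local_opex_per_month * t
      cloud cost(t) = cloud_rate_per_hour * HOURS_PER_MONTH * utilization * t
    Returns None when the local setup never breaks even.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be a fraction between 0 and 1")

    # The cloud is billed only for the hours the cluster is actually in use.
    cloud_cost_per_month = cloud_rate_per_hour * HOURS_PER_MONTH * utilization
    monthly_saving = cloud_cost_per_month - local_opex_per_month
    if monthly_saving <= 0:
        return None  # local running costs alone match or exceed the cloud bill
    return capex / monthly_saving


def local_pays_off(
    capex: float,
    local_opex_per_month: float,
    cloud_rate_per_hour: float,
    utilization: float,
    amortization_months: float,
) -> bool:
    """True if break-even is reached within the assumed amortization period."""
    months = break_even_months(
        capex, local_opex_per_month, cloud_rate_per_hour, utilization
    )
    return months is not None and months <= amortization_months
```

Under these assumptions, lower utilization shrinks the monthly saving and pushes the break-even point further out, so a lightly used local cluster may not recover its capital cost before the hardware is amortized.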
Original language | English (US) |
---|---|
Article number | 8656524 |
Pages (from-to) | 1075-1088 |
Number of pages | 14 |
Journal | IEEE Transactions on Cloud Computing |
Volume | 9 |
Issue number | 3 |
State | Published - Jul 1 2021 |
Keywords
- Cloud computing
- computers and information processing
- cost benefit analysis
- data processing
- distributed computing
- performance evaluation
- platform-as-a-service
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture
- Computer Science Applications
- Computer Networks and Communications