Evaluating Distributed Computing Infrastructures: An Empirical Study Comparing Hadoop Deployments on Cloud and Local Systems

Devipsita Bhattacharya, Faiz Currim, Sudha Ram

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


The popularity of distributed computing platforms (e.g., Hadoop) is largely due to their ability to address scalability issues that arise from the data storage and processing limitations of standard computing systems. However, the decision to dedicate organizational resources and capital to such systems requires careful consideration of several factors, including an evaluation of cloud-based distributed computing options. We propose a framework of metrics, which we used to conduct an in-depth performance and cost-benefit analysis of two standard Hadoop infrastructural choices: a Platform as a Service (PaaS) on-demand cloud setup and a local organizational setup. We evaluated the framework by means of an exploratory data analysis use case for a large-scale graph processing research problem. Our analysis considered highly granular aspects of distributed computing performance and studied how utilization rates and infrastructure amortization times affect break-even times. We identified that virtual memory management adversely affects the performance of a cloud cluster during the reduce phase, with the magnitude of degradation dependent on the type of MapReduce operation. Our study is intended not only as an evaluation of infrastructural choices but also as the development of a metric framework that can serve as a baseline for researchers examining distributed infrastructures.

Original language: English (US)
Article number: 8656524
Pages (from-to): 1075-1088
Number of pages: 14
Journal: IEEE Transactions on Cloud Computing
Issue number: 3
State: Published - Jul 1 2021


Keywords

  • Cloud computing
  • computers and information processing
  • cost benefit analysis
  • data processing
  • distributed computing
  • performance evaluation
  • platform-as-a-service

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Science Applications
  • Computer Networks and Communications

