All Faculty Presentations - School of Engineering and Computer Science

The Hadoop distributed filesystem: Balancing portability and performance

Jeffrey Shafer, Rice University
Scott Rixner, Rice University
Alan L. Cox, Rice University

Document Type

Conference Presentation

Department

Electrical and Computer Engineering

Conference Title

ISPASS 2010 - IEEE International Symposium on Performance Analysis of Systems and Software

Date of Presentation

5-27-2010

Abstract

Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features of the native platform. Third, HDFS implicitly makes portability assumptions about how the native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. This paper investigates the root causes of these performance bottlenecks in order to evaluate tradeoffs between portability and performance in the Hadoop distributed filesystem. ©2010 IEEE.

First Page

122

Last Page

133

DOI

10.1109/ISPASS.2010.5452045

Recommended Citation

Shafer, J., Rixner, S., & Cox, A. L. (2010). The Hadoop distributed filesystem: Balancing portability and performance. Paper presented at ISPASS 2010 - IEEE International Symposium on Performance Analysis of Systems and Software.
https://scholarlycommons.pacific.edu/soecs-facpres/406
