SLIDE 1

Will They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage

Ellis H. Wilson III (1,2), Mahmut Kandemir (1), Garth Gibson (2,3)

(1) Department of Computer Science and Engineering, The Pennsylvania State University
(2) Panasas, Inc.
(3) Department of Computer Science, Carnegie Mellon University

July 3rd, 2014

SLIDE 2

Introduction/Background Converged Architectures Evaluation

Before We Begin: Get the Slides and Paper

Slides and Paper are Available At:

www.ellisv3.com

www.ellisv3.com Hadoop on NAS

SLIDE 3

1. Introduction and Background
   • From 10,000 Feet: Considering Hadoop’s Fit in HPC
   • Goals of this Research: MapReduce in HPC?

2. Converged Architectures for Hadoop on NAS
   • Overview of Architectures
   • Reliability and Performance Implications
   • RainFS

3. Performance Evaluation of Converged Architectures
   • Setup and Benchmarks
   • Performance Results


SLIDE 4

Motivation

The divide between HPC and Big Data is increasingly foggy:

• The Big Data processing framework MapReduce (MR) promises faster time-to-solution for data-intensive science
• But MR often comes tightly coupled with the Hadoop Distributed File System (HDFS)
• Standard HDFS requires disks local to the compute nodes for distributed storage

HPC typically already has its own Parallel File System (PFS) solutions in place:

• Adopting Hadoop threatens to require large capital and maintenance investments
• Totally dropping MPI and similar solutions for MR is impossible
• Copying massive amounts of data from Network-Attached Storage (NAS) to HDFS and back is a common problem
• Dividing storage into two pools, NAS and HDFS, will exacerbate the compute-storage gap


SLIDE 5

Hurdles to Adoption of Hadoop in HPC

• Loss of Infrastructure Consolidation
• Forced Import/Export
• I/O Performance Degradation
• Loss of High-Availability
• No Modification to Files
• Inefficient Compute-Storage Coupling


SLIDE 6

Goals of this Research

Three Main Goals/Contributions:

1. Explore if/how one can enable MR to run on traditional NAS
   • Enables reuse of existing storage: infrastructure consolidation

2. Explore whether one can use MR alongside MPI and others without copying
   • Improves utility of capacity, reduces network contention, fights the I/O gap

3. Identify the relative efficiencies and reliabilities of potential solutions
   • Examine four different architectural approaches


SLIDE 7

First: Consider Traditional Hadoop

Typical Hadoop Architecture: Example of Write Path


SLIDE 8

Exploration of Four Possible Architectures

Possible Architectures:

1. Traditional HDFS Pointed at a PFS
   • Configure HDFS with PFS paths rather than local disks

2. HDFS as a Wire Protocol in the PFS NAS Heads
   • Run DataNodes (DNs) on the NAS heads instead of on all clients

3. No HDFS, MR Directly to the PFS
   • Run MR configured to send data directly to the PFS

4. RainFS: Replicating Array of Independent NAS File Systems
   • A new Hadoop filesystem designed specifically to intermediate between MR and the PFS
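The first approach above amounts to an ordinary HDFS deployment whose DataNode storage directories sit on a PFS mount instead of local disks. A minimal sketch in the Hadoop 1.x configuration style of the era; the mount point /mnt/pfs and directory layout are assumptions, not the paper's exact setup:

```xml
<!-- hdfs-site.xml: point each DataNode's storage at a per-node
     directory on the shared PFS mount rather than a local disk -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mnt/pfs/hdfs/data</value>  <!-- hypothetical PFS path -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>  <!-- HDFS replication atop already-RAIDed NAS -->
  </property>
</configuration>
```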


SLIDE 9

Architecture Details: Traditional HDFS

Pros:
• Simplicity

Cons:
• Performance Degradation: one full replica in network contention
• Reliability Limits: duplication is the ceiling
• Copy Required: distinct namespace


SLIDE 10

Architecture Details: HDFS as a Wire Protocol

Pros:
• HDFS becomes Yet Another Protocol
• Reliability limits go away

Cons:
• Performance Bottleneck: NAS head limits throughput
• NAS Invasion: may not be possible (or easy) with many NAS solutions
• Copy Required: distinct namespace


SLIDE 11

Architecture Details: No HDFS

Pros:
• High-Performance: alleviates overheads and bottlenecks
• No Copies: operates in the typical POSIX namespace

Cons:
• Requires Single Namespace: no HDFS to intermediate between distinct NAS
• No Replication: must rely solely on RAID
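The No-HDFS approach can be sketched as simply making Hadoop's default filesystem the POSIX-mounted PFS, so job input/output paths resolve directly on the shared mount. A minimal, hypothetical snippet (Hadoop 1.x style; not the paper's exact configuration):

```xml
<!-- core-site.xml: bypass HDFS entirely by using the local (POSIX)
     filesystem interface over the PFS mount as the default filesystem -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```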


SLIDE 12

Hadoop vs. HPC Storage: A Reliability Divergence

HPC Storage:
• Enterprise storage solutions
• RAID 5/6
• ECC-enabled hardware (sometimes end-to-end)
• Redundant hardware (PSU/NIC/etc.)

Hadoop Storage (HDFS):
• Commodity hard drives in compute nodes
• Replication performed across nodes/racks
• No ECC
• No redundant hardware


SLIDE 13

Converged Reliability Guarantees

Failures tolerated per architecture, shown as "disk failures / rack failures":

                  Repl. 1            Repl. 2            Repl. 3
                  RAID 5   RAID 6    RAID 5   RAID 6    RAID 5   RAID 6
DN-on-Client      1 / 0    2 / 0     3 / 1    5 / 1     – / –    – / –
DN-on-NAS Node    1 / 0    2 / 0     3 / 1    5 / 1     5 / 2    8 / 2
No HDFS           1 / 0    2 / 0     – / –    – / –     – / –    – / –
RainFS            1 / 0    2 / 0     3 / 1    5 / 1     5 / 2    8 / 2

Two main failure modes for converged HDFS/HPC storage: failure of a disk, and failure of a rack.
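The table's entries follow a simple pattern. As a plausibility check (an illustration, not the paper's derivation): with r replicas, each stored on a RAID group that tolerates k disk losses, data is lost only when every replica's group loses k+1 disks, so at least r*(k+1) - 1 arbitrary disk failures and r - 1 rack failures are always survivable.

```python
def tolerated_failures(replicas, raid_parity):
    """Worst-case failures guaranteed survivable when each of `replicas`
    copies lives on a RAID group tolerating `raid_parity` disk losses.
    A copy dies only after raid_parity + 1 disks in its group fail."""
    disks = replicas * (raid_parity + 1) - 1
    racks = replicas - 1
    return disks, racks

# RAID 5 tolerates 1 disk per group, RAID 6 tolerates 2.
print(tolerated_failures(2, 1))  # Repl. 2 + RAID 5 -> (3, 1)
print(tolerated_failures(3, 2))  # Repl. 3 + RAID 6 -> (8, 2)
```

These reproduce the replicating rows of the table above.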


SLIDE 14

Locality Confusion: Write Transport

Errant Pass-Through Behavior on Write

[Figure: network throughput (MB/s), received vs. sent, over time since start of the write]


SLIDE 15

Read Transport

Errant Pass-Through Behavior on Read

[Figure: network throughput (MB/s), received vs. sent, over time since start of the read]


SLIDE 16

Design Desiderata

Four main goals for RainFS:

1. Client-Level Federation of NAS Systems: enable performance of all available NAS systems concurrently while maintaining discrete failure domains

2. Full Replication: restore replication ability in MapReduce

3. No Data Pass-Throughs: writes/reads should never go through another client node

4. A Fair Namespace: create a framework-agnostic namespace where no imports or exports are required


SLIDE 17

Main Implementation Mechanisms

Symbolic Links (symlinks):
• Symlinks on the master failure domain point at replica zero on one of the NAS systems
• Placement of replica zero is randomly chosen; following replicas are round-robined
• MR can read from MPI output; MPI can read from MR output
• Key algorithms and their synchronization issues are covered in the paper

Hidden Metadata File:
• Stored beside, and named similarly to, the symlink
• Manages where replicas exist, up/down state, etc.
• Avoids a dedicated, centralized metadata manager daemon
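The two mechanisms above can be sketched in a few lines. This is a minimal illustration of the placement scheme, not RainFS itself (which is implemented as a Hadoop filesystem); function and field names here are invented for the sketch:

```python
import json
import os
import random

def rainfs_create(name, data, master_dir, nas_mounts, replicas=2):
    """Write `replicas` copies of `data` across independent NAS mounts,
    expose the file via a symlink on the master failure domain, and track
    replica state in a hidden metadata file beside the symlink."""
    start = random.randrange(len(nas_mounts))            # replica zero: random NAS
    chosen = [nas_mounts[(start + i) % len(nas_mounts)]  # the rest: round-robin
              for i in range(replicas)]
    paths = []
    for mount in chosen:
        path = os.path.join(mount, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    # Symlink on the master failure domain points at replica zero, so any
    # POSIX reader (e.g. an MPI job) sees the file under one fair namespace.
    os.symlink(paths[0], os.path.join(master_dir, name))
    # Hidden metadata file beside the symlink: where replicas live, up/down state.
    meta_path = os.path.join(master_dir, "." + name + ".meta")
    with open(meta_path, "w") as f:
        json.dump({"replicas": paths, "up": [True] * replicas}, f)
    return paths
```

Because the metadata lives beside each symlink, no centralized metadata daemon is needed.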


SLIDE 18

Setup and Benchmarks in Use

Hardware Environment:
• Cluster of 50 multi-core machines at Carnegie Mellon
• CentOS 5.5 running as VMs on KVM
• DirectFlow(tm) network-attached protocol to 5 shelves of Panasas ActiveStor 12

Benchmarks in Use: the ubiquitous Yahoo! TeraSort Benchmark Suite
• TeraGen: write-intensive
• TeraSort: mixed, CPU-intensive
• TeraValidate: read-intensive


SLIDE 19

Impact of Architecture on Throughput Performance

Yahoo! TeraSort Benchmark (50 clients, 500 GB of data)

[Figure: throughput (MB/s) for DN-on-Client, DN-on-NAS, No-DN, and RainFS, in four panels:
(a) Rep. Level 1: Write- and Read-Intensive (TeraGen, TeraValidate)
(b) Rep. Level 1: Mixed (TeraSort)
(c) Rep. Level 2: Write- and Read-Intensive (TeraGen, TeraValidate)
(d) Rep. Level 2: Mixed (TeraSort)]


SLIDE 20

Impact of Replication on Performance

Relative Throughput Slow-Downs for Replication per Architecture (TeraGen):

DN-on-Client: 2.02x
DN-on-NAS:    2.38x
RainFS:       1.52x


SLIDE 21

Conclusion

Conclusions:
• Convergence of Big Data and HPC is happening: compute is easy, storage is hard
• There are numerous pitfalls/caveats, especially relating to reliability
• Four different architectures using Hadoop MapReduce on NAS were explored; each has its own pros/cons
• RainFS demonstrates optimal performance with the highest reliability


SLIDE 22

Questions?


SLIDE 23

– Begin Backup Slides –


SLIDE 24

Super-Moore’s Law CPU Scaling in Supercomputers

Top 500 Supercomputer HPLinpack Results over Time

[Figure: HPLinpack Rmax vs. year (1993-2014) for the #1, mean, and #500 systems]
Doubling Performance Every 13.5 Months

[Figure: mean Rmax per CPU and normalized mean CPU count vs. year (1993-2014)]
Doubling CPU Count Every 22 Months
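Since exponential growth rates add in log space, the two doubling times above imply a doubling time for per-CPU performance. This quick check is an illustration derived from the slide's numbers, not a figure from the slides:

```python
def combined_doubling_time(t_total, t_cpus):
    """If total performance doubles every t_total months and CPU count
    doubles every t_cpus months, then performance *per CPU* doubles every
    1 / (1/t_total - 1/t_cpus) months (growth rates subtract in log space)."""
    return 1.0 / (1.0 / t_total - 1.0 / t_cpus)

# 13.5-month performance doubling with 22-month CPU-count doubling
# implies per-CPU performance doubles only every ~35 months.
print(round(combined_doubling_time(13.5, 22), 1))
```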


SLIDE 25

HDD Capacity and Bandwidth Scaling

Historical Data from over 1500 Hard Drives

[Figure: HDD capacity (GB) vs. year (1980-2014); fit line: exp(0.452*x - 901.432)]
Doubling Capacity every 18 Months

[Figure: maximum read bandwidth (MB/s) vs. capacity (GB); fit line: 20.5088 * ln(x) - 34.0089]
Doubling Bandwidth only once per decade!
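The 18-month figure falls directly out of the capacity fit line above: an exponential exp(a*x + b) doubles whenever a*x grows by ln 2. A quick check:

```python
import math

# Capacity fit from the slide: capacity_GB(year) = exp(0.452*year - 901.432),
# so capacity doubles every ln(2) / 0.452 years (~18.4 months).
doubling_years = math.log(2) / 0.452
print(round(doubling_years * 12, 1))  # in months
```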


SLIDE 26

Taken In Perspective: A Grim Reality

Time taken to access all on-disk data:
• 1996: 30 seconds
• 2006: 30 minutes
• 2016: 1 day
