Sunrise or Sunset: Exploring the Design Space of Big Data Software Stacks

Panel Presentation at HPBDC '17
by Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda

Q1: Are Big Data Software Stacks Mature or Not?
• Big Data software stacks like Hadoop, Spark and Memcached have been around for multiple years
  – Hadoop – 11 years (Apache Hadoop 0.1.0 released in April 2006)
  – Spark – 5 years (Apache Spark 0.5.1 released in June 2012)
  – Memcached – 14 years (initial release of Memcached on May 22, 2003)
• Increasingly being used in production environments
• Optimized for commodity clusters with Ethernet and TCP/IP interface
• Not yet able to take full advantage of modern cluster and/or HPC technologies
Network Based Computing Laboratory HPBDC '17 Panel

Data Management and Processing on Modern Clusters
• Substantial impact on designing and utilizing data management and processing systems in multiple tiers
  – Front-end data accessing and serving (Online)
    • Memcached + DB (e.g. MySQL), HBase
  – Back-end data analytics (Offline)
    • HDFS, MapReduce, Spark
[Figure: front-end tier (Internet, web servers, Memcached + DB (MySQL), NoSQL DB (HBase)) for data accessing and serving; back-end tier (data analytics apps/jobs, MapReduce, Spark, HDFS)]
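The slide does not say how Memcached and the DB are combined in the front-end tier. One common arrangement is cache-aside, sketched below under that assumption. The class `FrontEndTier` and its methods are hypothetical names, and plain dictionaries stand in for the Memcached server and the MySQL database, so no real client library is involved.

```python
# Minimal cache-aside sketch. Plain dicts stand in for Memcached and MySQL
# (hypothetical stand-ins; no real client library is used).

class FrontEndTier:
    def __init__(self, database):
        self.cache = {}           # stand-in for a Memcached server
        self.database = database  # stand-in for the backing DB
        self.db_reads = 0         # counts trips to the slower DB

    def get(self, key):
        # Serve from the cache when the key is present.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the DB, then populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the DB and invalidate any cached copy.
        self.database[key] = value
        self.cache.pop(key, None)


tier = FrontEndTier({"user:1": "alice"})
first = tier.get("user:1")   # miss, goes to the DB
second = tier.get("user:1")  # hit, served from the cache
tier.put("user:1", "bob")    # invalidates the cached entry
third = tier.get("user:1")   # miss again, reads the new value
```

In this sketch only cache misses reach the database, which is why the latency of the caching layer matters so much for online serving.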

Who Is Using Hadoop?
• Focuses on large data and data analysis
• Hadoop (e.g. HDFS, MapReduce, RPC, HBase) environment is gaining a lot of momentum
• http://wiki.apache.org/hadoop/PoweredBy

Spark Ecosystem
• Generalize MapReduce to support new apps in same engine
• Two Key Observations
  – General task support with DAG
  – Multi-stage and interactive apps require faster data sharing across parallel jobs
[Figure: Spark stack. Libraries on top of Spark: Spark SQL, Spark Streaming (real-time), GraphX (graph), MLlib (Machine Learning), BlinkDB, …, and Caffe, TensorFlow, BigDL, etc. (Deep Learning). Cluster managers underneath: Standalone, Apache Mesos, YARN]

Who Is Using Spark?
• Focuses on large data and data analysis with in-memory techniques
• Apache Spark is gaining a lot of momentum
• http://spark.apache.org/powered-by.html

Q2: What Are the Main Driving Forces for New-generation Big Data Software Stacks?

Increasing Usage of HPC, Big Data and Deep Learning
• HPC (MPI, RDMA, Lustre, etc.)
• Big Data (Hadoop, Spark, HBase, Memcached, etc.)
• Deep Learning (Caffe, TensorFlow, BigDL, etc.)
Convergence of HPC, Big Data, and Deep Learning!!!

How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data and Deep Learning Applications?
• What are the major bottlenecks in current Big Data processing and Deep Learning middleware (e.g. Hadoop, Spark)?
• Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?
• Can RDMA-enabled high-performance interconnects benefit Big Data processing and Deep Learning?
• Can HPC clusters with high-performance storage systems (e.g. SSD, parallel file systems) benefit Big Data and Deep Learning applications?
• How much performance benefit can be achieved through enhanced designs?
• How to design benchmarks for evaluating the performance of Big Data and Deep Learning middleware on HPC clusters?
Bring HPC, Big Data processing, and Deep Learning into a "convergent trajectory"!

Can We Run Big Data and Deep Learning Jobs on Existing HPC Infrastructure?

Q3: What Opportunities Are Provided for the Academic Communities in Exploring the Design Spaces of Big Data Software Stacks?

Designing Communication and I/O Libraries for Big Data Systems: Challenges
[Figure: layered design stack]
• Applications
• Benchmarks
• Big Data Middleware (HDFS, MapReduce, HBase, Spark, gRPC/TensorFlow, and Memcached)
• Programming Models (Sockets); RDMA Protocols
• Communication and I/O Library
  – Point-to-Point Communication
  – Threaded Models and Synchronization
  – Virtualization (SR-IOV)
  – I/O and File Systems
  – QoS & Fault Tolerance
  – Performance Tuning
• Commodity Computing System Architectures (Multi- and Many-core architectures and accelerators)
• Storage Technologies (HDD, SSD, NVM, and NVMe-SSD)
• Networking Technologies (InfiniBand, 1/10/40/100 GigE and Intelligent NICs)

The High-Performance Big Data (HiBD) Project
• RDMA for Apache Spark
• RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
  – Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions
• RDMA for Apache HBase
• RDMA for Memcached (RDMA-Memcached)
• RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
• OSU HiBD-Benchmarks (OHB)
  – HDFS, Memcached, HBase, and Spark Micro-benchmarks
• Available for InfiniBand and RoCE; also run on Ethernet
• http://hibd.cse.ohio-state.edu
• User Base: 230 organizations from 30 countries
• More than 21,800 downloads from the project site
