SLIDE 39 Data analytics and computing ecosystems compared
Courtesy of “Exascale Computing and Big Data,” Daniel A. Reed and Jack Dongarra, CACM, July 2015
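Data Analytics:
Mahout: machine learning library
Hive: data warehouse software
Pig: high-level language (Pig Latin) for analyzing large data sets
Sqoop: transfers bulk data between Hadoop and traditional relational databases
Flume: log collection and aggregation
ZooKeeper: coordination service (configuration, naming, synchronization) that keeps distributed nodes consistent
Storm: real-time computation system
HBase: distributed, scalable big data store
Avro: data serialization system

Computational Science:
FORTRAN, C, C++: programming languages
PAPI: performance analysis tool (access to hardware performance counters)
MPI/OpenMP: parallel programming models (MPI for distributed-memory message passing, OpenMP for shared-memory multi-core)
SLURM: batch scheduler
Lustre: parallel file system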
NOTE: The big data and HPC ecosystems have diverged!
Data Analytics Ecosystem
Application level: Mahout, R, and applications
Middleware and management: Hive, Pig, Sqoop, Flume, Storm, MapReduce, Avro, HBase/BigTable (key-value store), HDFS (Hadoop File System), ZooKeeper (coordination), cloud services (e.g., AWS)
System software: virtual machines and cloud services (optional), Linux OS variant
Cluster hardware: Ethernet switches, local node storage, commodity x86 racks

Computational Science Ecosystem
Application level: applications and community codes
Middleware and management: FORTRAN, C, C++, and IDEs; domain-specific libraries; MPI/OpenMP + accelerator tools; numerical libraries; performance and debugging tools; Lustre (parallel file system); batch scheduler (such as SLURM); system monitoring tools
System software: Linux OS variant
Cluster hardware: InfiniBand + Ethernet switches, SAN + local node storage, x86 racks + GPUs or accelerators
A&M 05-16-2016 39