SLIDE 1

www.bsc.es

From Performance Profiling to Predictive Analytics while evaluating Hadoop performance using ALOJA

June 2015

Nicolas Poggi, Senior Researcher

SLIDE 2

ALOJA talks in WBDB.ca 2015

  • 0. About ALOJA

– DEMO

1. From Performance Profiling to Predictive Analytics

– Project evolution
– PA uses and lines of research

2. A Case Study of Hadoop Benchmark Behavior Modeling Using ALOJA-ML

– Description of the Machine Learning process and current results

3. A characterization of cost-effectiveness of PaaS Hadoop in the Azure cloud

– Performance evaluation and scalability of VMs in PaaS

SLIDE 3

ABOUT BSC AND THE ALOJA BIG DATA BENCHMARKING PROJECT

SLIDE 4

Barcelona Supercomputing Center (BSC)

22-year history in Computer Architecture research

– Based at the Technical University of Catalonia (UPC)
– Long track record in chip architecture & parallelism
– Active research staff with 1000+ publications
– Large ongoing life-science computational projects
– MareNostrum supercomputer

Prominent body of research activity around Hadoop since 2008

– SLA-driven scheduling (Adaptive Scheduler), in-memory caching, etc.

Long-term relationship between BSC and Microsoft Research and Microsoft product teams

Open model:

– No patents; public IP, publications, and open source are the main focus
– 90+ publications, 4 best-paper awards

ALOJA is the latest phase of the engagement

(Photo: MareNostrum supercomputer)

SLIDE 5

Initial motivation

Hadoop implements a complex distributed execution model

– 100+ interrelated config parameters
– Requires manual, iterative benchmarking and tuning

Hadoop's price/performance is affected by simple configuration choices

– Performance gains: >3x from SW and >3x from HW

Commodity HW is no longer low-end as in the early 2000s

– Hadoop performs poorly on scale-up or low-power systems

New Cloud services for Hadoop

– IaaS and PaaS
– Direct- vs. remote-attached volumes

Sprawling Hadoop ecosystem

– Dominated by vendors
– Lack of verifiable benchmarks

SLIDE 6

Current scenario and problem: What is the most cost-effective configuration for my needs?

– A multidimensional problem spanning (diagram):

  • Cost vs. performance
  • On-premise vs. cloud
  • Remote volumes vs. JBODs vs. RAID
  • Rotational HDDs vs. SSDs
  • Small vs. large VMs
  • Gb Ethernet vs. InfiniBand
  • High availability and replication

And where is my system configuration positioned on each of these axes?

SLIDE 7

Project ALOJA

Open initiative to explore and produce a systematic study of Hadoop efficiency on different SW and HW configurations

– Both cost and performance
– Including commodity, high-end, low-power, and cloud

Results from a growing need in the community to understand job execution details

Explores different configuration and deployment options and their tradeoffs

– Both software and hardware
– Cloud services and on-premise

Seeks to provide knowledge, tools, and an online service

– with which users can make better-informed decisions
– reduce the TCO of their Big Data infrastructures
– guide the future development and deployment of Big Data clusters and applications

SLIDE 8

ALOJA Platform components and status

Benchmarking, repository, and analytics tools for Big Data

Composed of open-source:

– Benchmarking, provisioning, and orchestration tools
– High-level system performance metric collection
– Low-level Hadoop instrumentation based on BSC Tools
– Web-based data analytics tools

  • And recommendations

Online Big Data benchmark repository of:

– 42,000+ runs (from HiBench), some BigBench and TPC-H
– Sharable, comparable, repeatable, verifiable executions

Abstracting and leveraging tools for BD benchmarking

– Not reinventing the wheel, but most current BD tools are designed for production, not for benchmarking
– Leverages current compatible tools and projects

Dev VM toolset and sandbox

– via Vagrant


SLIDE 9

Big Data Benchmarking components

ALOJA-DEPLOY

Composed of scripts to:

– Automatically create, stop, delete clusters in the cloud

  • From simple, abstracted node and cluster definition files
  • Both for Linux and Windows
  • IaaS and PaaS (HDInsight)
  • Abstracted to support multiple providers

– Provision and configuration of base software to servers

  • Both for cloud-based and on-premise servers
  • Composed of portable configuration management scripts
  • Designed for benchmarking needs

– Orchestrate benchmark executions

  • Prioritized job queues
  • Results gathering and packaging
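
To make the "simple and abstracted" definition files above concrete, here is a hypothetical sketch of what a cluster definition could contain, written as a Python dict; every field name is invented for illustration, and the real ALOJA definition files live in the project repository.

```python
# Hypothetical cluster definition; field names are illustrative only,
# not ALOJA's actual file format.
cluster = {
    "name": "azure-large-01",
    "provider": "azure",        # abstracted so other providers can plug in
    "deploy_type": "IaaS",      # IaaS or PaaS (e.g. HDInsight)
    "os": "ubuntu-14.04",       # Linux and Windows are both supported
    "headnodes": 1,
    "datanodes": 8,
    "vm_size": "Large",
    "disks": {"type": "remote", "count": 3, "tmp_on_local": True},
}
```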

ALOJA-BENCH

– Multi-benchmark support
– Flexible performance counter options
– Dynamic SW and HW configurations

SLIDE 10

Workflow in ALOJA

Cluster(s) definition

  • VM sizes
  • # nodes
  • OS, disks
  • Capabilities

Execution plan

  • Start cluster
  • Exec benchmarks
  • Gather results
  • Cleanup

Import data

  • Convert perf metrics
  • Parse logs
  • Import into DB

Evaluate data

  • Data views in the Vagrant VM
  • Or at http://hadoop.bsc.es

PA and KD

  • Predictive Analytics
  • Knowledge Discovery
  • Backed by the historic repository
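
Read end to end, the workflow condenses into one orchestration loop. The sketch below is schematic Python with placeholder stubs; ALOJA itself implements these steps with its deployment scripts and the Web application, not this code.

```python
# Schematic version of the ALOJA workflow; every function is a placeholder
# stub standing in for the project's real deployment and benchmarking scripts.

def start_cluster(definition):
    print("provisioning", definition["name"])
    return definition

def run_benchmark(cluster, bench):
    print("running", bench, "on", cluster["name"])
    return {"bench": bench, "perf_counters": [], "logs": []}

def import_results(results):
    # convert perf metrics, parse logs, import into the historic repository
    print("importing", results["bench"])

def cleanup(cluster):
    print("stopping", cluster["name"])

def execute_plan(definition, benchmarks):
    cluster = start_cluster(definition)
    try:
        for bench in benchmarks:
            import_results(run_benchmark(cluster, bench))
    finally:
        cleanup(cluster)  # always release cloud resources

execute_plan({"name": "azure-large-01"}, ["terasort", "dfsioe_read"])
```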

SLIDE 11

ALOJA-WEB Online Repository

Entry point to explore the results collected from the executions

– Index of executions

  • Quick glance of executions
  • Searchable, Sortable

– Execution details

  • Performance charts and histograms
  • Hadoop counters
  • Jobs and task details

Data management of benchmark executions

– Data importing from different clusters
– Execution validation
– Data management and backup

Cluster definitions

– Cluster capabilities (resources)
– Cluster costs

Sharing results

– Download executions
– Add external executions

Documentation and References

– Papers, links, and feature documentation

Available at: http://hadoop.bsc.es

SLIDE 12

Features and Benchmark evaluations in ALOJA-WEB

Benchmark Repository

  • Browse executions
  • Hadoop job counters
  • PaaS exec details

Config Evaluations

  • Best execution
  • Config improvement
  • Parameter evaluation

Cost/Perf Evaluation

  • Scalability of VMs
  • Evaluation of execs
  • Evaluation of clusters
  • Evaluation of HW configs

Performance Details

  • Performance charts
  • Performance metric details
  • DBSCAN

Prediction Tools

  • Modeling data
  • Predict configurations
  • Config tree
  • Anomaly detection
  • …
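
Most of the tools above are data views, but DBSCAN deserves a note: density-based clustering is one way to separate well-behaved executions from outliers. A minimal sketch, assuming scikit-learn and an invented miniature dataset of (execution time, cost) pairs:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Invented sample: (execution time in s, cost in $) for runs of one benchmark.
runs = np.array([[610, 1.9], [596, 1.8], [605, 1.9], [2400, 7.5], [615, 2.0]])

labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(
    StandardScaler().fit_transform(runs))
print(labels)  # label -1 marks noise points, i.e. candidate anomalous runs
```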

SLIDE 13

ALOJA-WEB

Entry point to explore the results collected from the executions

– Provides insights on the obtained results through continuously evolving data views.

Online DEMO at: http://hadoop.bsc.es

SLIDE 14

PROJECT EVOLUTION AND LESSONS LEARNED ALONG THE WAY

SLIDE 15

Reasons for change in ALOJA

Part of the change/evolution in the project is due to a focus shift:

  • To available resources (Cloud)
  • Market changes: On-prem vs. Cloud

– IaaS vs. PaaS
  » Pay-as-you-go, pay-what-you-process
– Challenges
  » From local to remote (network) disks
  » Over 32 types of VM in Microsoft Azure

– Increasing number of benchmarks

  • Needed to compare (and group together) benchmarks of different jobs and systems

  • Deal with noise (outliers) and failed executions
  • Need automation

– Predictive Analytics and KD

– Expanding the scope / search space

  • From apps and framework
  • Including clusters/systems
  • To comparing providers (datacenters)
SLIDE 16

ALOJA Evolution summary

Techniques for obtaining Cost/Performance Insights

Profiling

  • Low-level
  • High Accuracy
  • Manual Analysis

Benchmarking

  • Iterate configs
  • HW and SW
  • Real executions
  • Log parsing and data sanitization

Aggregation

  • Summarize large numbers of results

  • By criteria
  • Filter noise
  • Fast processing

Predictive Analytics

  • Automated modeling
  • Estimations
  • Virtual executions
  • Automated KD

Evaluation of: Big Data Apps → Frameworks → Systems / Clusters → Cloud Providers

SLIDE 17

Initial approach: Low-level profiling

Profiling Hadoop with BSC’s HPC tools

– Preliminary work, relying on over 20 years of HPC experience and tools
– Developed the Hadoop Instrumentation Toolkit

  • with custom hooks to capture events
  • Added a network sniffer

(Screenshots: CPU, memory, page faults, Hadoop processes and communication)

SLIDE 18

Overview of HAT and HPC tools

Hadoop Analysis Toolkit and BSC tools

(Diagram: Hadoop + performance monitoring tools)

– Extrae (libextrae.so) instruments Hadoop through a JNI wrapper: Wrapper.Event (Java) calls Event (C) via extrae_wrapper.so, with a GenerateEvent Java helper in the Hadoop tools
– A libpcap.so-based sniffer captures networking events
– Extrae traces (*.mpit) record Hadoop, networking, and system events
– Traces are merged into Paraver traces (*.prv), driven by Paraver configs (*.cfg)
– Paraver (visualization and analysis) and DIMEMAS (simulation) consume the merged traces

SLIDE 19

Hadoop in PARAVER

Different Hadoop Phases

– Map
– Reduce

(Trace view: Map phase | Reduce phase)

SLIDE 20

Sort + combine

Detailed work done by Hadoop

– Sort / Combine

(Captured events: Flush, SortAndSpill, Sort, Combine, CreateSpillIndexFile)

SLIDE 21

Network communications

(Trace views: communications between processes, or between nodes)

SLIDE 22

Network: low-level

Low-level details

– TCP 3-way handshake

(Trace: TCP 3-way handshake SYN, SYN/ACK, ACK, followed by repeated DATA/ACK exchanges, shown in the data analysis tool)
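
This level of detail comes from the libpcap-based sniffer mentioned earlier. As a rough illustration of reproducing such a view, a minimal sketch that classifies handshake packets in a capture, assuming the scapy library and a hypothetical trace file name:

```python
from scapy.all import rdpcap, TCP  # pip install scapy

# "trace.pcap" is a hypothetical capture taken on a Hadoop node.
for pkt in rdpcap("trace.pcap"):
    if TCP in pkt:
        if pkt[TCP].flags == "S":     # SYN: connection request
            print("SYN    ", pkt.summary())
        elif pkt[TCP].flags == "SA":  # SYN/ACK: handshake reply
            print("SYN/ACK", pkt.summary())
```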

SLIDE 23

Low-level profiling

Pros

  • Understanding of Hadoop internals
  • Useful to improve and debug Hadoop framework
  • Detailed and accurate view of executions
  • Improve low-level system components, drivers, accelerators

Cons

  • Non-deterministic nature of Hadoop
  • Not suitable for finding best configurations
  • Not suitable to test different systems

– And other Big Data platforms (would require re-implementation)

  • Virtualized environments introduce challenges for low-level tools
  • On PaaS you might not have an admin (root) user
SLIDE 24

Benchmarking

Extensive benchmarking effort iterating SW and HW configs

– Different cluster architectures

  • On-prem and Low-power

Objectives:

– Understand Hadoop executions
– Capture results for analysis/research

Led to the online repository

– You can compare, side by side, all execution parameters:

  • CPU, memory, network, disk, Hadoop parameters…
SLIDE 25

Benchmarking use case: IB vs ETH

InfiniBand (IPoIB) performance increase in

– Terasort (100G)

Cluster minerva-100

– 12 real cores, 64 GB RAM, Ubuntu Server 14.04
– 8 datanodes + 1 headnode
– 5 SATA drives
– 2 SSD drives (provided by SanDisk)

Evaluations (speedups and cost-effectiveness):

– Disk combinations

  • 1-5 SATA drives as JBOD (HDD, HD2, HD3, HD4, HD5)
  • 1-2 SSD drives as JBOD (SSD, SS2)
  • 5 SATA drives JBOD and Hadoop /tmp to 1 SSD (HS5)

– Network bandwidth to disk configs

  • InfiniBand (IPoIB) vs Ethernet (1-GbE)

– Hadoop max slots (mappers) speedup by Network and Disk combination

All data online and accessible at http://hadoop.bsc.es/

SLIDE 26

Hadoop Execution phases: IB vs ETH for Terasort and DFSIOE

URL Terasort: http://hadoop.bsc.es/perfcharts?execs%5B%5D=84766&execs%5B%5D=84746&metric=Memory&hosts=Slaves&aggr=AVG&detail=1
URL DFSIOE Read: http://hadoop.bsc.es/perfcharts?benchmarks_length=-1&execs%5B%5D=85088&execs%5B%5D=85776

(Charts: Terasort and DFSIOE Read)

– IB slightly faster than ETH for Terasort
– IB significantly faster than ETH for DFSIOE

SLIDE 27

Network MB/s: IB vs ETH for Terasort and DFSIOE

URL Terasort: http://hadoop.bsc.es/perfcharts?execs%5B%5D=84766&execs%5B%5D=84746&metric=Memory&hosts=Slaves&aggr=AVG&detail=1
URL DFSIOE Read: http://hadoop.bsc.es/perfcharts?benchmarks_length=-1&execs%5B%5D=85088&execs%5B%5D=85776

(Charts: Terasort and DFSIOE Read)

– IB reaches 100 MB/s for DFSIOE
– IB not fully utilized in Terasort (22 MB/s max)

SLIDE 28

Disk IOPS: IB vs ETH for Terasort and DFSIOE

URL Terasort: http://hadoop.bsc.es/perfcharts?execs%5B%5D=84766&execs%5B%5D=84746&metric=Memory&hosts=Slaves&aggr=AVG&detail=1
URL DFSIOE Read: http://hadoop.bsc.es/perfcharts?benchmarks_length=-1&execs%5B%5D=85088&execs%5B%5D=85776

(Charts: Terasort and DFSIOE Read)

– With IB, almost 10,000 IOPS for DFSIOE
– Slightly higher IOPS for Terasort

SLIDE 29

Benchmarking problems

As the number of results grew, manually analyzing low-level results was no longer feasible

– Either with the HPC tools
– Or by revising them manually

Cons

– It became a Big Data problem in itself
– Cloud introduces more uncertainty
– Manual sampling was required
– The search space kept growing

For this we relied on aggregation and summarization of the data

– Grouping results from different executions
– Lost accuracy, but gained processing time and abstraction

SLIDE 30

Aggregation and summaries

Once data is imported into a DB, aggregation and summarization become simple. As the data is immutable, aggregation only has to be done once. This produces metadata

– that is small in size
– and can be queried online

Slightly different executions can be grouped together. This works well with public cloud executions, and noise gets filtered out. Examples in the Web app under the

– Config Evaluations
– Cost/Performance menus
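
A minimal sketch of the idea, using SQLite with invented column names rather than ALOJA's actual schema; one GROUP BY turns many raw executions into small, queryable metadata:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE execs (bench TEXT, net TEXT, disk TEXT, exec_time REAL)")
conn.executemany(
    "INSERT INTO execs VALUES (?, ?, ?, ?)",
    [("terasort", "ETH", "HDD", 1290.0), ("terasort", "ETH", "HDD", 1315.0),
     ("terasort", "IB", "SSD", 640.0), ("terasort", "IB", "SSD", 655.0)],
)

# The aggregation itself: immutable raw data means this runs once,
# and the small summary is what gets queried online.
for row in conn.execute(
    "SELECT net, disk, COUNT(*), AVG(exec_time) "
    "FROM execs GROUP BY net, disk ORDER BY AVG(exec_time)"
):
    print(row)
```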

SLIDE 31

Overall best config

URL: http://hadoop.bsc.es/bestconfig

SLIDE 32

Impact of SW configurations in Speedup

(Charts: speedup by number of mappers: 4m, 6m, 8m, 10m; and by compression algorithm: no comp., ZLIB, BZIP2, Snappy. Higher is better.)
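
Speedup here is presumably the usual ratio of execution times, speedup(conf) = T_baseline / T_conf, so a configuration that halves the execution time scores 2x, and values above 1 are improvements.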

SLIDE 33

Impact of HW configurations in Speedup

(Charts: speedup by disk and network combination: HDD-ETH, HDD-IB, SSD-ETH, SSD-IB; and by cloud remote volumes: local only, 1, 2, or 3 remotes, with /tmp-on-local variants. Higher is better.)

SLIDE 34

Parameter evaluation example (lower is better)

(Chart: improvement from increasing RAM)

SLIDE 35

Data sizes for 42K executions

Profile traces: ~57 TB *
Perf counters: 1.2 TB
Hadoop logs: 11 GB
Metadata: 15 MB **
PA model: ~0.4 MB ***

* Estimated size; profiles only ran on selected execs
** Only includes exec config and exec time
*** Model for predicting exec times, compressed on disk

SLIDE 36

Predictive Analytics

Encompasses statistical and Machine Learning (ML) techniques

– To make predictions of unknown events

  • Forecast and foresight

– From historical data

Implemented as an extension to the platform

– Mainly R code that can be called from the Web frontend

SLIDE 37

Benchmarks and Predictive Analytics II

SLIDE 38

The ALOJA Predictive Analytics tool-set

1. Modeling and Prediction

– From the ALOJA dataset, find a model for ‹Workload, Conf ~ Exe.Time›

2. Configuration recommendation

– Rank (un)seen confs. for a benchmark from their expected Exe.Time

3. Anomaly detection (outliers)

– Statistical + model-based detection of anomalous executions

4. Behavior observation and statistical information

– Aggregate variables around the ones we want to observe
– Show frequency, percentiles, and other useful information from the ALOJA datasets
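
A minimal sketch of items 1 and 3, assuming scikit-learn and an invented toy dataset; ALOJA-ML itself is mainly R code, and a regression tree is only one of the model families that could fill this role:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Invented toy dataset: (mappers, compression on/off, disks) -> exec time (s).
X = np.array([[4, 0, 1], [6, 0, 1], [8, 1, 2], [10, 1, 3], [6, 1, 2]])
y = np.array([980.0, 810.0, 560.0, 505.0, 590.0])

# 1. Modeling and prediction: learn <Workload, Conf> ~ Exe.Time.
model = DecisionTreeRegressor().fit(X, y)

# 3. Anomaly detection: flag a run whose observed time deviates strongly
# from the model's estimate (the 30% threshold here is arbitrary).
observed_conf, observed_time = [6, 0, 1], 1900.0  # an invented suspicious run
expected = model.predict([observed_conf])[0]
if abs(observed_time - expected) / expected > 0.30:
    print(f"anomalous: observed {observed_time}s vs expected {expected:.0f}s")
```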

SLIDE 39

Rank and Recommend Configurations

Predict a range of configurations, previously seen or unseen. Order them by predicted execution time to rank configurations. Also compare predicted vs. observed execution times, where observations exist.
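
Continuing the toy sketch from the previous slide, ranking reduces to predicting a time for every candidate configuration and sorting; the grid and numbers are invented:

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeRegressor

# Same invented toy data as in the modeling sketch.
X = np.array([[4, 0, 1], [6, 0, 1], [8, 1, 2], [10, 1, 3], [6, 1, 2]])
y = np.array([980.0, 810.0, 560.0, 505.0, 590.0])
model = DecisionTreeRegressor().fit(X, y)

# Rank every candidate configuration (seen or unseen) by predicted time.
candidates = [list(c) for c in product([4, 6, 8, 10], [0, 1], [1, 2, 3])]
preds = model.predict(candidates)
for t, conf in sorted(zip(preds, candidates))[:3]:
    print(conf, f"predicted {t:.0f}s")
```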

SLIDE 40

Perf profiling vs. PA benchmarking in ALOJA

From perf profiling:

Detailed

– Low-level (HPC tools)
– Debug info
– Specific

Application centric (Hadoop)

– Improve application
– Hadoop configuration

– Constrained approach
– Big Data
– Exposes internal components
– High accuracy
– Susceptible to noise and variations
– Manual analysis

To Predictive Analytics:

Summaries (metadata)

– High-level
– Insights
– General / tendencies

System centric (HW for BD)

– Improve systems
– Cluster topology

– Unbounded search space
– Metadata
– Black-box approach
– Estimations and tendencies
– Some noise and failures are acceptable
– Automated KD

SLIDE 41

Summary of techniques

Profiling

– Data sizes: very large
– Processing: medium (large data; match timestamps)
– Main focus: app (Hadoop phases)

Benchmarking / Importing

– Data sizes: large
– Processing: medium (medium data; uncompress, convert, and import formats)
– Main focus: framework (Hadoop parameters)

Aggregation

– Data sizes: small
– Processing: fast (SQL based, GROUP BY; data does not change)
– Main focus: comparing systems and HW confs

Predictive Analytics

– Data sizes: very small
– Processing: slow (large RAM, CPU time, parallelization problems)
– Main focus: cloud providers, datacenters (+ previous)
SLIDE 42

Summary and conclusions

Described the evolution of the project

– Highlighting the technical and market motivations

Data reduction

– 99% of the data is in low-level details
– Faster information and insights from metadata and summaries
– Simplified management
– Shared results
– Dev VM and toolbox

PA is our current frontier

– To save on execution times and costs
– Relies on metadata and summaries
– Knowledge Discovery

Profiling

  • Low-level
  • High Accuracy
  • Manual Analysis

Benchmarking

  • Iterate configs
  • HW and SW
  • Real executions
  • Log parsing and data sanitization

Aggregation

  • Summarize large numbers of results
  • By criteria
  • Filter noise
  • Fast processing

Predictive Analytics

  • Automated modeling
  • Estimations
  • Virtual executions
  • Automated KD

Evaluation of: Big Data Apps → Frameworks → Systems / Clusters → Cloud Providers

Next steps

– Predictions everywhere
– Guided executions
– Low-level dynamic instrumentation for Hadoop v2

SLIDE 43

Additional reference and publications

Online repository and tools available at:

– http://hadoop.bsc.es

Publications: http://hadoop.bsc.es/publications

– Project description in:

  • "ALOJA: A Systematic Study of Hadoop Deployment Variables to Enable Automated Characterization of Cost-Effectiveness"

– Upcoming:

  • SIGKDD15: ALOJA-ML, predictive analytics tools for benchmarking on Hadoop deployments

SLIDE 44

Extending and collaborating in ALOJA

1. Install prerequisites: Vagrant
2. git clone https://github.com/Aloja/aloja.git
3. cd aloja
4. vagrant up
5. Open your browser at http://localhost:8080

SLIDE 45

www.bsc.es

Q&A

Thanks!

Contact: hadoop@bsc.es