A Platform for Accelerating Machine Learning Applications
Ben Chandler Hewlett Packard Labs
April 6th, 2016
HPE Big Data and HPC portfolio strategy
Optimized HW/SW Platforms
Design and deliver comprehensive solutions with purpose-built platforms
1. Innovate, design and deliver best-in-class hardware and software to support the foundational infrastructure needs of Big Data customers
2. Provide vertical solutions by building a software stack and partner ecosystem
3. Enable Advisory Services to help manage the customer’s technology journey
Deliver automated intelligence, real-time insights and optimized performance
– Automated intelligence
– Real-time insights
– Optimized performance
Extreme performance capabilities to process, manage and analyze data-, I/O- and storage-intensive application workloads with high speed, scale and efficiency, enabling flexibility for open infrastructure innovation.
Navigate the data-driven transformation journey across all enterprises with new HPC and Big Data capabilities that accelerate time-to-value for increased competitive differentiation.
Solutions:
– Deep Learning Innovation
– HPC Compute & Storage Solution
– HPE Vertica for SQL on Hadoop
– Integrity MC990 X for Database Processing
– Risk Compliant Archive Solution
– Trade & Match Server Solution
– HPC for Trader Workstation
Platforms: Apollo 6500, Apollo 4520, Apollo 2000, Apollo 4510, HPE Moonshot, Apollo 4000 Series
Customer benefits
HPE Apollo 6500 is an ideal HPC and deep learning platform, providing unprecedented performance with 8 GPUs, a high-bandwidth fabric and a configurable GPU topology to match deep learning workloads:
– Up to 8 high-powered GPUs per tray (node); 2P Intel E5-2600 v4 support
– Choice of high-speed, low-latency fabrics with 2x I/O expansion
– Workload-optimized using flexible configuration capabilities
Use cases: video, image, text, audio and time-series pattern recognition; large, highly complex, unstructured simulation and modeling; real-time and near-real-time analytics
Faster model training time, better fusion of data*
– Transform to a hybrid infrastructure
– Enable workplace productivity
– Protect your digital enterprise
– Empower a data-driven organization
Automated Intelligence, delivered by HPE Apollo 6500 and Deep Learning software solutions
* Benchmarking results provided at or shortly after announcement
System Design Innovation to maximize GPU capacity and performance with lower TCO
HPE Apollo 6500
– Dense GPU server optimized for Deep Learning and HPC workloads
– Density optimization
– High-performance fabrics
Cluster management enhancements (massive scaling, open APIs, tight integration, multiple user interfaces)
Unique solution differentiators:
– GPU density
– Configurable GPU topologies
– More network bandwidth
– Power and cooling optimization
– Manageability
– Better productivity
New technologies and products
Deep Learning and HPC software platform enablement (HPE CCTK, Caffe, CUDA, Google TensorFlow, HPE IDOL)
– Motivating evidence
– The CogX project and vision
– Open-source availability
[Diagram: averaging two movies, (movie1 + movie2) / 2, on heterogeneous hardware: a CPU with CPU memory, a GPU with GPU memory, and combined CPU + GPU systems with separate memories]
– Motivating evidence
– The CogX project and vision
– Open-source availability
– Optimal deployment on parallel hardware
– Fast design iterations
– Enforced scalability
– Broad COTS hardware support
– Compatibility with shared infrastructure
– High productivity for analysts and algorithm engineers

What is CogX? A compute-graph programming model built on four core abstractions:
– Fields
– Operators
– Sensors/Actuators
– Feedback/Time
Compute graph, built incrementally:

  val movie = ColorMovie("courtyard.mp4")
  val background = VectorField(movie.fieldShape, Shape(3))
  val nextBackground = 0.999f * background + 0.001f * movie
  background <== nextBackground
  val suspicious = reduceSum(abs(movie - background))

[Diagram: the resulting compute graph. A ColorMovie sensor feeds two scaling kernels (* 0.999f and * 0.001f) whose sum forms nextBackground_t; the feedback edge background <== nextBackground installs it as background_{t+1}; abs and reduceSum over (movie - background) produce suspicious_t]
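The lines above do not compute anything immediately; each expression declares a node in a compute graph that the CogX compiler later optimizes. A toy illustration of that idea in plain Scala (the `Field`, `Input` and `Op` types here are hypothetical stand-ins, not the CogX API):

```scala
// Toy sketch of declarative graph building: arithmetic on Field handles
// records graph nodes instead of touching any data (not the real CogX API).
sealed trait Field {
  def +(that: Field): Field = Op("add", Seq(this, that))
  def -(that: Field): Field = Op("sub", Seq(this, that))
  def *(k: Float): Field    = Op(s"scale($k)", Seq(this))
}
case class Input(name: String) extends Field
case class Op(op: String, inputs: Seq[Field]) extends Field

// Count operator nodes in the recorded graph.
def opCount(f: Field): Int = f match {
  case Op(_, ins) => 1 + ins.map(opCount).sum
  case _          => 0
}

val movie      = Input("movie")
val background = Input("background")
// Records three nodes: two scale kernels and one add kernel.
val nextBackground = background * 0.999f + movie * 0.001f
```

Because the whole expression is recorded before anything runs, the compiler gets a global view of the dataflow, which is what enables the kernel-fusion passes shown later.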
[Diagram: the same graph unrolled over time, starting from background_0 = 0:
  background_{t+1} = 0.999f * background_t + 0.001f * movie_t
  suspicious_t = reduceSum(abs(movie_t - background_t))
for frames movie_0, movie_1, movie_2, ...]
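The per-step semantics of this feedback graph can be sketched in plain Scala, under two stated simplifications: a single scalar pixel stands in for each field, and background_0 = 0 as in the unrolled diagram:

```scala
// One tick of the compute graph for a single pixel value:
// suspicious_t is computed from the current background; the feedback
// edge then installs nextBackground as background_{t+1}.
def step(movie: Float, background: Float): (Float, Float) = {
  val suspicious     = math.abs(movie - background)         // reduceSum(abs(...)) over one pixel
  val nextBackground = 0.999f * background + 0.001f * movie
  (suspicious, nextBackground)
}

// Run the unrolled graph over a frame sequence, from background_0 = 0.
// Returns the per-frame suspicious values and the final background.
def run(frames: Seq[Float]): (Seq[Float], Float) =
  frames.foldLeft((Vector.empty[Float], 0f)) { case ((out, bg), frame) =>
    val (s, next) = step(frame, bg)
    (out :+ s, next)
  }
```

A constant scene drives `suspicious` toward zero as the background estimate converges, which is exactly the behavior the slowly adapting 0.999/0.001 blend is designed for.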
Compute graph kernel fusion:
– Initially: 6 separate device kernels.
– After a “single-output” kernel fuser pass: 2 device kernels remain.
– After a “multi-output” kernel fuser pass: only a single device kernel remains.
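A minimal sketch of what a “single-output” fuser pass does, under the simplifying assumption that any kernel (including the reduction) can absorb a producer whose output it alone consumes. The `Kernel` type and graph below are hypothetical, but the counts match the slide: the six initial kernels collapse to two:

```scala
// A kernel reads named inputs; "movie" and "background" are graph
// leaves, not kernels, so they are never fused.
case class Kernel(name: String, inputs: Set[String])

// Single-output fusion: while some kernel's output has exactly one
// consumer, merge that producer into its consumer.
def fuseSingleOutput(kernels: Map[String, Kernel]): Map[String, Kernel] = {
  val consumers = kernels.values.toSeq
    .flatMap(k => k.inputs.filter(kernels.contains).map(_ -> k.name))
    .groupMap(_._1)(_._2)
  consumers.collectFirst { case (p, cs) if cs.size == 1 => (p, cs.head) } match {
    case Some((producer, consumer)) =>
      val merged = Kernel(consumer,
        (kernels(consumer).inputs - producer) ++ kernels(producer).inputs)
      fuseSingleOutput(kernels - producer + (consumer -> merged))
    case None => kernels
  }
}

// The background-model graph: 6 device kernels before fusion.
val graph = Map(
  "sub"       -> Kernel("sub",       Set("movie", "background")),
  "abs"       -> Kernel("abs",       Set("sub")),
  "reduceSum" -> Kernel("reduceSum", Set("abs")),
  "mulA"      -> Kernel("mulA",      Set("background")),  // * 0.999f
  "mulB"      -> Kernel("mulB",      Set("movie")),       // * 0.001f
  "add"       -> Kernel("add",       Set("mulA", "mulB"))
)
```

After this pass the `sub → abs → reduceSum` chain and the `mulA/mulB → add` tree each become one kernel; a multi-output pass could then merge those two, since they share the same inputs.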
CogX compilation pipeline:
User CogX model (Scala) → syntax tree (ops, fields) → kernel circuit (kernels, field buffers) → optimized kernel circuit (merged kernels)
The compiler performs parsing and OpenCL code generation, including kernel fusion.
CogX code snippet:

  val A = ScalarField(10, 10)
  val B = ScalarField(10, 10)
  val C = A * B
  val D = ScalarField(10, 10)
  val E = C + D

[Diagram: the multiply kernel (C = A * B) and the add kernel (E = C + D) are merged into a single fused multiply/add kernel that reads A, B and D and writes E]
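What the fused kernel buys can be sketched with plain arrays: the unfused version materializes the intermediate field C = A * B in memory, while the fused version computes E = A * B + D in a single pass with no intermediate buffer. This is a sketch of the idea, not CogX's generated OpenCL:

```scala
// Two elementwise kernels: the multiply kernel writes buffer C,
// then the add kernel reads C back from memory.
def unfused(a: Array[Float], b: Array[Float], d: Array[Float]): Array[Float] = {
  val c = a.indices.map(i => a(i) * b(i))       // intermediate field buffer C
  a.indices.map(i => c(i) + d(i)).toArray
}

// One fused multiply/add kernel: no intermediate buffer, and the
// elements of A and B are read only once.
def fused(a: Array[Float], b: Array[Float], d: Array[Float]): Array[Float] =
  a.indices.map(i => a(i) * b(i) + d(i)).toArray
```

On a GPU the saving is mostly memory bandwidth: elementwise kernels are bandwidth-bound, so eliminating the round trip through the C buffer roughly halves the traffic for this pattern.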
crossCorrelateSeparable
CogX software stack:
– Application (applications are written by users)
– CogX libraries/toolkits: neural network toolkit, sandbox toolkit, I/O toolkit (HDF5 loader), cluster package
– CogX core: CogX debugger; CogX compiler and standard library; Scala CogX runtime; C++ CogX runtime
– External libraries: JOCL, OpenCL, HDF5, Apache Mesos
Also included:
– Introductory and training examples for single-GPU and distributed computation
– Performance benchmarks covering the core and the neural network package
– Several larger-scale demo applications integrating multiple CogX functions
[Slide: CogX standard-library features, including diffusion, multiple operator variants, filters (from a simple box filter up to local polynomial expansion and steerable Gabor filters), transforms, congruency, inverses, distributions, and generator sensors]
Goal: “direct” readout of “in front of”, “behind”, “emerging”, or “disappearing” in video streams
– Scene segmentation based on motion signals only (not contrast edges, stereo, ...)
– Uses CogX, software from HPE Labs
– Maximizes use of GPUs; some processing in CPU kernels
– Near real-time processing, ~2 fps on an HP Z820 workstation
[Diagram: functional control flow of the CogMO algorithm. Preprocessing: video stream → optic flow → discretized motion → motion onset/offset. Region processing: motion regions, region traces, region properties, motion field. Then boundary ownership, occlusion status, occluders, region completion, and ordinal depth]
Visualizing ordinal depth and occlusions: unoccluded moving parts of an object are highlighted; the occluder is marked in red.
[Diagram: enumerating motion surfaces from optic flow; assigning boundary ownership to the motion surfaces yields ordinal depth]
– Motivating evidence
– The CogX project and vision
– Open-source availability
CogX software stack (open-source release):
– Application (applications are written by users)
– CogX libraries/toolkits: neural network toolkit, sandbox toolkit, I/O toolkit (HDF5 loader)
– CogX core: CogX debugger; CogX compiler and standard library; Scala CogX runtime
– External libraries: JOCL, OpenCL, HDF5
Also included:
– Introductory and training examples for single-GPU and distributed computation
– Performance benchmarks covering the core and the neural network package
– Several larger-scale demo applications integrating multiple CogX functions
CogX vs. TensorFlow:

Feature | CogX | TensorFlow
Core data abstraction | Tensor fields: single precision, restrictions on dimensions | Tensors: typed multi-dimensional arrays
Core compute abstraction | OpenCL functions emitted and compiled at runtime; user kernels | C++/CUDA functions compiled into the TensorFlow project
Graph optimizations | Kernel fusion | Not available
Distribution across GPUs | Simulated-annealing placer | Unreleased: graph partitioning, greedy placer
Debugging | Single-step runtime debugging; text-based profiler | Non-interactive log-file parser; better graph visualization; unreleased profiler
Automatic differentiation | Supported as a library for neural-network-specific operations | Supported by most of the core API
Fault tolerance | Not yet implemented | Automatic check-pointing and restart of the graph
Control flow | Not yet implemented | Predicated execution
Runtime optimization | Not yet implemented | Interleaved processing of iterations; placer
Simple Python API → Protobuf intermediate representation → optimizer → CUDA generator / C generator → TensorFlow custom op
A Python plugin for TensorFlow.
Example: element-wise L2 norm of three 2 x 2 tensors
[Diagram: input tensors, workgroup shape, output tensor]
High productivity:

  def op(in0, in1, in2):
      pos = position_in(in0.shape)
      a = in0[pos]
      b = in1[pos]
      c = in2[pos]
      # the slide elides the output setup; element-wise, out holds
      # sqrt(a*a + b*b + c*c) at each position
      return out

High performance:
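For reference, the element-wise computation this operator expresses, written out in plain Scala with one output element per position, as in the workgroup diagram (the function name and array representation are illustrative, not part of the DSL):

```scala
// Element-wise L2 norm over three equally shaped inputs:
// out(i) = sqrt(in0(i)^2 + in1(i)^2 + in2(i)^2)
def l2Norm(in0: Array[Float], in1: Array[Float], in2: Array[Float]): Array[Float] = {
  require(in0.length == in1.length && in1.length == in2.length)
  in0.indices.map { i =>
    math.sqrt(in0(i) * in0(i) + in1(i) * in1(i) + in2(i) * in2(i)).toFloat
  }.toArray
}
```

Each position is independent of the others, which is what lets the generator map one output element to one GPU work item.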