Declarative Systems for Large Scale Machine Learning

Declarative Systems for Large Scale Machine Learning
Markus Weimer, Tyson Condie, Raghu Ramakrishnan
Cloud and Information Services Laboratory, Microsoft

Joint work with …
Yingyi Bu, Vinayak Borkar, Michael J. Carey (University of California, Irvine)
Joshua Rosen, Neoklis Polyzotis (University of California, Santa Cruz)
6/5/12

Example: Spam Filter
[Diagram labels: Inbox, Spam, Spam Filter, User Interface, Logged Event]

Machine Learning Workflow
• Step I: Example Formation
  – Feature Extraction
  – Label Extraction
• Step II: Modeling
• Step III: Deployment (or just Evaluation)
[Diagram: Example Formation → Modeling → Evaluation]

Example Formation
[Diagram labels: EMail → Feature Extraction → (ID, Bag of Words); Click Log → Label Extraction → (ID, Label); Large Scale Join → (ID, Bag of Words, Label); Data Parallel Functions]
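The example formation step above can be sketched in plain Scala collections. This is only an illustration: the record types `Features`, `Click`, and `Example`, the tokenizer, and the in-memory hash join are assumptions standing in for the data-parallel functions and the large scale join named on the slide.

```scala
object ExampleFormation {
  // Hypothetical record types; the slide only names the columns.
  final case class Features(id: Int, bagOfWords: Map[String, Int])
  final case class Click(id: Int, label: Int)
  final case class Example(id: Int, bagOfWords: Map[String, Int], label: Int)

  // Feature extraction: a data-parallel function applied to each e-mail.
  def bagOfWords(id: Int, text: String): Features = {
    val words = text.toLowerCase.split("\\W+").filter(_.nonEmpty)
    Features(id, words.groupBy(identity).map { case (w, ws) => (w, ws.length) })
  }

  // Join features and labels on ID. An in-memory hash join stands in for
  // the large scale join; e-mails without a logged label are dropped.
  def join(features: Seq[Features], clicks: Seq[Click]): Seq[Example] = {
    val labelById = clicks.map(c => c.id -> c.label).toMap
    features.flatMap(f => labelById.get(f.id).map(l => Example(f.id, f.bagOfWords, l)))
  }
}
```

Only e-mails that appear in the click log become training examples here, which mirrors an inner join on ID.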

Modeling
• Many algorithms are inherently sequential
  – Apply model to data → Look at errors → Update model
• Common solutions
  – Subsampling
  – Train on partitions, merge results
  – Rephrasing of algorithms in MapReduce

MapReduce for Modeling
• Learning algorithms access the data only through statistical queries
• A statistical query returns an estimate of the expectation of a function f(x,y) applied to the data.

MapReduce for Modeling
• Rephrase query in summation form.
• Map: Calculate function estimates over data partitions
• Reduce: Aggregate the function estimates.
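A minimal sketch of a statistical query in summation form, using plain Scala collections in place of a MapReduce runtime. The names `mapPartition`, `combine`, and `query` are hypothetical, and the example type is simplified to a pair of doubles; the sketch also assumes at least one partition and at least one example overall.

```scala
object StatisticalQuery {
  type Example = (Double, Double) // (x, y), simplified to scalars

  // Map: partial sum of f and the example count for one partition.
  def mapPartition(f: Example => Double)(part: Seq[Example]): (Double, Long) =
    (part.map(f).sum, part.size.toLong)

  // Reduce: add partial sums and counts.
  def combine(a: (Double, Long), b: (Double, Long)): (Double, Long) =
    (a._1 + b._1, a._2 + b._2)

  // Estimate of the expectation of f(x, y) over all partitions.
  def query(f: Example => Double, partitions: Seq[Seq[Example]]): Double = {
    val (sum, n) = partitions.map(mapPartition(f)).reduce(combine)
    sum / n
  }
}
```

Because the per-partition results are plain sums and counts, they can be aggregated in any order, which is what makes the summation form fit the map and reduce phases.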

Example Methods
• Convex Optimization
  – (Logistic) Regression
  – Support Vector Machines
  – …
• K-Means Clustering
• Naïve Bayes
• Neural Networks
• …

Example: Batch Gradient Descent (BGD)
Until convergence:

\[ w_{t+1} = \underbrace{(1.0 - \eta\lambda)\, w_t}_{\text{Regularization}} - \eta \underbrace{\sum_{(x,y)} \partial_w\, l(y, \langle w_t, x \rangle)}_{\text{Data Parallel Sum}} \]

w_t: Current model
l: Loss function (e.g. squared error)
x: Data
y: Label
∂: Gradient operator

Example: Gradient Computation
[Diagram: Map computes Gradient I, II, III from Partition I, II, III; Reduce sums them into the Gradient.]
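The partitioned gradient computation and the BGD update can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the loss is taken to be the squared error 0.5 (⟨w, x⟩ − y)², a fixed iteration count replaces the convergence test, and all names (`Vec`, `gradient`, `step`, `train`) are hypothetical.

```scala
object BatchGradientDescent {
  type Vec = Vector[Double]
  final case class Example(x: Vec, y: Double)

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (p, q) => p * q }.sum
  def add(a: Vec, b: Vec): Vec = a.zip(b).map { case (p, q) => p + q }
  def scale(a: Vec, c: Double): Vec = a.map(_ * c)

  // Gradient of the assumed squared error 0.5 * (<w, x> - y)^2 for one example.
  def gradient(e: Example, w: Vec): Vec = scale(e.x, dot(w, e.x) - e.y)

  // One update: w <- (1 - eta * lambda) * w - eta * (sum of gradients).
  def step(partitions: Seq[Seq[Example]], w: Vec, eta: Double, lambda: Double): Vec = {
    val zero: Vec = Vector.fill(w.length)(0.0)
    // Map: one partial gradient per partition.
    val partial = partitions.map(p => p.map(gradient(_, w)).foldLeft(zero)(add))
    // Reduce: sum the partial gradients.
    val total = partial.foldLeft(zero)(add)
    add(scale(w, 1.0 - eta * lambda), scale(total, -eta))
  }

  // Fixed number of iterations, starting from the zero vector.
  def train(partitions: Seq[Seq[Example]], dim: Int, eta: Double,
            lambda: Double, iterations: Int): Vec =
    (1 to iterations).foldLeft(Vector.fill(dim)(0.0))((w, _) =>
      step(partitions, w, eta, lambda))
}
```

Each call to `step` corresponds to one map phase over the partitions followed by one reduce, so every iteration costs a full pass over the data.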

Modeling on Hadoop MapReduce?
• Machine learning algorithms are iterative
  – Each iteration contains multiple statistical queries
• Overhead per MapReduce job
  – Each statistical query is a job
  – A job entails scheduling, data reading, state transfer, …
  – Especially bad on shared clusters

More than Map Reduce
• Complete Job DAGs
  – Beyond the fixed map-groupby-reduce
  – Arbitrary length and complexity
• More Operators
  – Join, Filter, Project, …
• Examples
  – Dryad (Microsoft Research)
  – Hyracks (UC Irvine)
  – Stratosphere (TU Berlin)

More than Map Reduce
[Same bullets as the previous slide, overlaid with the callout "Machine Learning is Cyclic!"]

Applied Large Scale ML requires …
• A Relational Algebra
  – Join, Filter, Map, …
  – For feature and label extraction
• Iterative computation
  – Loops over data
  – Incremental model updates
• Scalability / High Performance
  – Jobs must execute successfully irrespective of the data set size / runtime cluster configuration
  – More favorable cluster setups must be used for speed-ups (e.g. cache data in memory)
[Overlay labels: Giraph (Pregel), Spark, One-Offs, ?]

Take-away
• Usability is bad
  – Developing a single model takes months
  – Requires many tools and technologies
• Pick your poison on the way to a subpar solution
  – Subsampling hurts model fidelity
  – Training on MapReduce is often too slow

Goals
• Integrate modeling and ETL workflows
  – All Pig operators
  – Iteration is a first class citizen
  – Unify MPI, Pregel, MapReduce, … on a single runtime
• Improve productivity
  – Free the programmer from runtime details (like MapReduce)
  – Facilitate easier job composition
  – IDE support
  – UDFs as first class citizens (unlike Pig)

Vision
[Diagram: User Program → Logical Plan → Physical Plan → Execution Engine]

Vision
[Diagram: User Program (ScalOps) → Logical Plan (Algebricks) → Physical Plan → Execution Engine (Hyracks); Loop Aware on all Levels]

ScalOps – The Language
[Diagram: stack of ScalOps, Algebricks, Physical Plan, Hyracks]

ScalOps – Overview
• Embedded Domain Specific Language in Scala
• All Pig Operators (Filter, Join, GroupBy, …)
• Iteration support
• Rich UDF support
  – Inline Scala function calls / literals
  – Everything callable from a JVM can be a UDF
• Support in major IDEs

Example: Batch Gradient Descent (BGD)
Until convergence:

\[ w_{t+1} = \underbrace{(1.0 - \eta\lambda)\, w_t}_{\text{Regularization}} - \eta \underbrace{\sum_{(x,y)} \partial_w\, l(y, \langle w_t, x \rangle)}_{\text{Data Parallel Sum}} \]

w_t: Current model
l: Loss function (e.g. squared error)
x: Data
y: Label
∂: Gradient operator

BGD in ScalOps

// Training data; Table is our main collection type
def train(xy: Table[Example],
          compute_grad: (Example, Vector) => Vector,
          compute_loss: (Example, Vector) => Double) = {
  // Fields are declared as var so the loop body can update them
  class Env(var w: VectorType, var lastLoss: DoubleType, var delta: DoubleType) extends Environment
  // Initializer
  val initialValue = new Env(VectorType.zeros(1000), Double.MaxValue, Double.MaxValue)
  // Loop condition, followed by the loop body
  loop(initialValue, (env: Env) => env.delta < eps) { env => {
    // Computes a gradient (compute_grad and compute_loss are native UDFs)
    val gradient = xy.map(x => compute_grad(x, env.w)).reduce(_+_)
    // Computes the loss
    val loss     = xy.map(x => compute_loss(x, env.w)).reduce(_+_)
    env.w       -= gradient
    env.delta    = env.lastLoss - loss
    env.lastLoss = loss
    env
  }}
}
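To make the shape of the loop construct concrete, here is a plain Scala stand-in that runs on a single machine. It is not the ScalOps API: the `loop` function below, its `keepGoing` predicate, and the toy `Env` are all hypothetical. In particular, the slide does not say whether the ScalOps predicate is a continue or a stop condition; this sketch uses a continue condition.

```scala
import scala.annotation.tailrec

object LoopSketch {
  // Hypothetical stand-in for loop(initial, condition)(body):
  // the body is applied for as long as keepGoing holds for the state.
  @tailrec
  def loop[S](state: S, keepGoing: S => Boolean)(body: S => S): S =
    if (keepGoing(state)) loop(body(state), keepGoing)(body) else state

  // Toy environment: a scalar model, the previous loss, and the last improvement.
  final case class Env(w: Double, lastLoss: Double, delta: Double)

  // Minimise the toy loss (w - 3)^2 until the loss improves by less than eps.
  def minimise(eta: Double, eps: Double): Env =
    loop[Env](Env(0.0, Double.MaxValue, Double.MaxValue), _.delta >= eps) { env =>
      val w = env.w - eta * 2.0 * (env.w - 3.0) // gradient step
      val loss = (w - 3.0) * (w - 3.0)
      Env(w, loss, env.lastLoss - loss)
    }
}
```

The state is threaded through the loop as an immutable value here, whereas the slide mutates the environment in place; both express the same initializer, condition, and body triple.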

Spark!?
• Scala DSL and runtime for data analytics
  – Optimized for in-memory computations
  – Targets machine learning algorithms
• Logistic Regression in Spark

val points = spark.textFile(...).
                   map(parsePoint).
                   partitionBy(HashPartitioner(NODES)).  // Physical Layer
                   cache()                               // Physical Layer
var w = Vector.random(D) // current separating plane
for (i <- 1 to ITERATIONS) {
  val gradient = points.map(p =>
    (1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}
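The per-point term inside `map` matches the gradient of the logistic loss, assuming labels y ∈ {−1, +1} and the loss written below. The slide does not state the loss explicitly, so that form is an assumption. Under it:

```latex
\begin{aligned}
l(w) &= \log\bigl(1 + e^{-y \langle w, x \rangle}\bigr) \\
\partial_w\, l(w) &= \frac{e^{-y \langle w, x \rangle}}{1 + e^{-y \langle w, x \rangle}} \cdot (-y\, x) \\
&= \Bigl(\frac{1}{1 + e^{-y \langle w, x \rangle}} - 1\Bigr)\, y\, x
\end{aligned}
```

Summing this term over all points gives the `gradient` value, and the loop then applies `w -= gradient`, which corresponds to a step size of 1 with no regularization term.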
