DATA ANALYTICS USING DEEP LEARNING
GT 8803 // FALL 2018 // JOY ARULRAJ
LECTURE #02: ACCELERATING MACHINE LEARNING INFERENCE WITH PROBABILISTIC PREDICATES
ANNOUNCEMENTS
• Course webpage:
  – https://jarulraj.github.io/data-analytics-course/
• Start thinking about project topics
  – Read assigned papers for inspiration
• No classes next week
• Submit reviews in PDF format
  – GT username as filename
TODAY'S PAPER
• Accelerating Machine Learning Inference with Probabilistic Predicates
  – Query optimization
  – ML inference queries
• Slides based on a presentation by Yao Lu @ SIGMOD 2018
TODAY'S PAPER
[Figure: layers of a data analytics system: Machine Translation, Query Processing, Storage Management, Hardware Acceleration]
TODAY'S AGENDA
• Problem Overview
• Key Idea
• Technical Details
• Experiments
• Discussion
ML INFERENCE ON BIG-DATA PLATFORMS
• SQL + user-defined functions (UDFs)
  – On unstructured data blobs
  – Videos, images, and unstructured text
ML INFERENCE
[Figure: untrained neural network → training → inference on new data]
Source: "What's the Difference Between Deep Learning Training and Inference?", Michael Copeland, August 2016, NVIDIA Blog
ML INFERENCE QUERY EXAMPLE
• Find images of oranges
[Pipeline: Images → UDF_YOLOv2 → σ_orange → Result]
[The UDF emits one boolean column per label: has person? has bear? has orange? ...]
ML INFERENCE QUERY EXAMPLE
• Inference takes time
  – Even when the predicate has low selectivity
  – Perhaps only 1-in-100 images have oranges
• Reason
  – Every image has to be processed by all the UDFs
[Pipeline: Images → UDF_YOLOv2 → σ_orange → Result]
PROBLEM OVERVIEW
• How can we accelerate such inference queries?
[Pipeline: Images → UDF_YOLOv2 → σ_orange → Result]
SOLUTION #1: PREDICATE PUSHDOWN
• Traditional query optimization technique
  – Move filtering as close to the data source as possible, so unnecessary data is never loaded into higher-level operators
[Example: predicates on A and B are pushed below Join(A, B)]
• Cannot push predicates below the UDF
  – No "contains orange" column exists in the raw data
  – It has to be constructed by running the UDF
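A minimal sketch of why pushdown fails here. The row layout, column names, and `expensive_udf` are hypothetical stand-ins, not from the paper; the point is only that the filtered column exists before the UDF in one case and only after it in the other.

```python
# Toy rows: structured metadata plus a raw blob.
rows = [
    {"camera": "cam1", "blob": b"img-a"},
    {"camera": "cam2", "blob": b"img-b"},
]

def expensive_udf(blob):
    # Stand-in for an ML model that labels an image.
    return {"has_orange": blob == b"img-a"}

# Pushdown works on structured columns: filter before any heavy work.
from_cam1 = [r for r in rows if r["camera"] == "cam1"]

# "has_orange" only exists *after* the UDF runs, so the predicate cannot
# be pushed below it: every row pays the full UDF cost.
oranges = [r for r in rows if expensive_udf(r["blob"])["has_orange"]]
```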
SOLUTION #2: PRE-COMPUTING
• Pre-computing all possible columns
  – High cost, since there are too many UDFs & query predicates
• Not a good fit for ad-hoc queries
  – Only certain columns for certain images will ever be required
• Not a good fit for online queries
  – Need to do inference on live data
KEY IDEA
• Accelerate queries by early filtering
[Pipeline: Images → Filter → UDF_YOLOv2 → σ_orange → Result]
KEY IDEA
• Early filter constraints
• Performance
  – Utility of data reduction >> execution cost of the early filter
• Accuracy
  – Early filtering should not increase false negatives
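The performance constraint above can be written as a one-line cost model. This is my simplification (function names and the decision rule are mine, and accuracy loss is ignored): with a filter of per-blob cost c_pp and data reduction rate r in front of a UDF of per-blob cost c_udf, only a 1 − r fraction of blobs reaches the UDF, so the filter pays off exactly when c_pp < r · c_udf.

```python
def expected_cost_per_blob(c_pp, c_udf, r):
    """Expected per-blob cost with an early filter of cost c_pp and
    reduction rate r in front of a UDF of cost c_udf: only the kept
    (1 - r) fraction of blobs reaches the UDF."""
    return c_pp + (1.0 - r) * c_udf

def filter_pays_off(c_pp, c_udf, r):
    # Equivalent to c_pp < r * c_udf.
    return expected_cost_per_blob(c_pp, c_udf, r) < c_udf
```

For example, a filter that costs 1 unit and discards 90% of blobs in front of a 100-unit UDF cuts the expected per-blob cost from 100 to about 11.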
EARLY FILTERING
[Pipeline: Images → Filter → UDF_YOLOv2 → σ_orange → Result]
[Diagram: blobs passed by the filter are either true positives or false positives]

EARLY FILTERING
[Diagram: blobs discarded by the filter form the data reduction; they are either true negatives or false negatives]

EARLY FILTERING
[Diagram: shifting the filter's decision boundary trades false positives against false negatives]
PROBABILISTIC EARLY FILTERING
• Unlike queries on relational data
  – ML applications have a built-in tolerance for errors
  – ML UDFs themselves generate false positives & false negatives
• So, filters can also be probabilistic!
  – Reducing accuracy can increase the data reduction rate
PROBABILISTIC PREDICATES
• Goal: query speedup + desired accuracy
  – Train binary classifiers
  – Group input blobs into two categories
    • Blobs that disagree with the query predicate
    • Blobs that may agree with the query predicate
• These classifiers are called probabilistic predicates
  – Each is characterized by <data reduction rate, execution cost, accuracy>
PROBABILISTIC PREDICATES (PPs)
[Pipeline: Images → PP_orange (50-1K fps) → UDF_YOLOv2 (10-30 fps) → σ_orange → Result]
• Low filter execution cost
  – High data reduction
  – Minimal impact on accuracy
PROBABILISTIC PREDICATES (PPs)
[Pipeline: Images → PP_orange (50-1K fps) → UDF_YOLOv2 (10-30 fps) → σ_orange → Result]
• Apply the PP directly on the raw blob
  – 5-1000x faster than the UDF
  – Each PP exposes an accuracy vs. data-reduction curve
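The PP-augmented pipeline above can be sketched in a few lines. This is a schematic, not the paper's implementation; all names (`run_query`, `pp_score`, `th`) are illustrative.

```python
def run_query(blobs, pp_score, th, udf, predicate):
    """PP-accelerated pipeline: the cheap pp_score runs on every raw
    blob, and only blobs scoring at least th reach the expensive UDF."""
    results = []
    for blob in blobs:
        if pp_score(blob) < th:   # PP discards blobs unlikely to match
            continue
        if predicate(udf(blob)):  # full UDF + original query predicate
            results.append(blob)
    return results
```

Note that the PP can only introduce false negatives (matching blobs it wrongly discards); its false positives are still caught downstream by the real predicate.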
SYSTEM WORKFLOW: BASELINE SYSTEM W/O PPs
[Diagram a: Query → QO → Plan; running the Plan over the Input produces Results]

SYSTEM WORKFLOW: CONSTRUCTING PPs
[Diagram b: past Queries, Query Inputs, and Query Results feed a PP Trainer, which produces PPs]

SYSTEM WORKFLOW: FULL SYSTEM W/ PPs
[Diagram c: Query → QO* (PP-aware) → Plan* with injected PPs; running Plan* over the Input produces Results]
CHALLENGES
• How to build useful PPs?
  – Good trade-off between data reduction rate, cost, and accuracy
• How to support complex query predicates?
  – By combining simple PPs for ad-hoc queries
PART-1: BUILDING USEFUL PPs
• A probabilistic predicate
  – Can be thought of as a decision boundary separating two classes
  – Any classifier that can identify inputs far away from the decision boundary is a useful PP
• Different techniques can be used to build PPs
  – Support vector machines (SVMs)
  – Deep neural networks, etc.
SIMPLE PP USING A LINEAR CLASSIFIER
• PP_orange: f(x) = w · x + b
• The PP discards blobs with f(x) ≤ th
• The threshold th sets the accuracy/reduction trade-off
  – One setting: accuracy = 3/3, reduction = 5/10
  – A stricter setting: accuracy = 2/3, reduction = 7/10
[Diagram: "has orange" (+) vs. "has no orange" (-) points on either side of the boundary f(x) = th]
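Sweeping th over the scores yields the slide's accuracy/reduction trade-off curve. A minimal sketch, assuming "accuracy" means the fraction of truly-matching blobs the PP keeps (the function name and data layout are mine):

```python
def pp_curve(scores, labels, thresholds):
    """(threshold, accuracy, reduction) triples for a PP score f(x),
    where the PP discards blobs with f(x) < th. Accuracy is the
    fraction of truly-matching blobs kept; reduction is the fraction
    of all blobs filtered out before the UDF."""
    curve = []
    n = len(scores)
    for th in thresholds:
        kept = [s >= th for s in scores]
        pos_kept = [k for k, l in zip(kept, labels) if l]
        accuracy = sum(pos_kept) / len(pos_kept)
        reduction = 1.0 - sum(kept) / n
        curve.append((th, accuracy, reduction))
    return curve
```

With 10 blobs of which 3 match, a loose threshold can keep all 3 matches while discarding 5 blobs (accuracy 3/3, reduction 5/10); tightening it may lose a match but discard 7 (accuracy 2/3, reduction 7/10), mirroring the slide's example.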
PPs FOR ARBITRARY DATA BLOBS
• Input blob characteristics
  – Linearly separable or not
  – High-dimensional data
  – Sparse or dense
• Example blob types: documents, images, videos, audio
PPs FOR ARBITRARY DATA BLOBS
• Linearly-separable data: linear SVM
  – f(x) = w · x + b < th
  – Low training & inference cost
• Nonlinearly-separable data: kernel density estimator
  – f(x) = kde+(x) / kde-(x) < th
• Nonlinearly-separable data: shallow DNN
  – f_i(x) = σ_i(W_i · f_{i-1}(x) + b_i) < th
  – Higher training cost; inference cost low w/ GPU
• More: random forest, etc.
  – Any function that fits: f(x) < th
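The kernel-density PP on the slide scores a blob by the ratio of its density under the positive class to its density under the negative class. A self-contained 1-D sketch (bandwidth, helper names, and the epsilon guard are my choices, not the paper's):

```python
import math

def kde(points, x, bw=0.5):
    """Simple 1-D Gaussian kernel density estimate at x."""
    norm = bw * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bw) ** 2)
               for p in points) / (len(points) * norm)

def pp_kde_score(pos_points, neg_points, x):
    """f(x) = kde+(x) / kde-(x); the PP discards blobs with f(x) < th.
    The epsilon avoids division by zero far from the negative class."""
    return kde(pos_points, x) / (kde(neg_points, x) + 1e-12)
```

A blob near the positive training examples gets a ratio well above 1 and is kept; one near the negative examples gets a ratio below 1 and is discarded.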
PPs FOR ARBITRARY DATA BLOBS
• The same base classifiers can be combined with:
  – Dimension reduction (e.g., feature hashing, principal component analysis)
  – Model selection (select the best model)
MODEL SELECTION
• Given different PP methods, select the PP that maximizes the data reduction rate
  – Test each PP on a small sample of data
• Model selection insights
  – The input dataset determines PP selection
  – Given a blob type, the same PP applies across different predicates & accuracy thresholds
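The selection step above can be sketched as a small search over candidate PPs and thresholds on a labelled sample. This is my schematic of the idea, not the paper's algorithm; candidate names and the data layout are illustrative.

```python
def select_pp(candidates, labels, target_accuracy):
    """Pick the (PP name, threshold, reduction) with the highest data
    reduction whose accuracy (fraction of matching blobs kept) stays
    at or above target_accuracy, evaluated on a labelled sample.
    candidates: list of (name, per-blob scores on the sample)."""
    best = None
    for name, scores in candidates:
        for th in sorted(set(scores)):
            kept = [s >= th for s in scores]
            pos_kept = [k for k, l in zip(kept, labels) if l]
            accuracy = sum(pos_kept) / len(pos_kept)
            reduction = 1.0 - sum(kept) / len(kept)
            if accuracy >= target_accuracy and (best is None or reduction > best[2]):
                best = (name, th, reduction)
    return best
```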
PART-2: SUPPORTING COMPLEX PREDICATES
• Queries with complex or new predicates
  – Large space of possible predicates
  – Costly to train/store a PP for each predicate
  – PPs for complex predicates do not generalize
• Pick the best PP combination
  – A query optimization problem
  – Inputs: available PPs, the query predicate, target accuracy
  – Goal: find the PP combination that maximizes reduction / cost
COMBINING PPs USING QUERY OPTIMIZATION (QO)
• Solution
  – Build PPs only for simple predicates (e.g., PP_red, PP_SUV)
  – Use QO to assemble PP combinations (e.g., PP_red ∧ PP_SUV for the predicate Red ∧ SUV)
• # PPs trained << # predicates
QUERY OPTIMIZATION OVER PPs
• Example predicate: (σ_orange ∨ σ_banana) ∧ σ_red ∧ σ_SUV
• Conventional query optimization techniques
  – Order predicates by data reduction / cost
  – Do not consider combining predicates
STEP #1: SELECT CANDIDATE PP EXPRESSIONS
• Explore necessary conditions of the predicate to improve speedup
• Available PPs: PP_red, PP_SUV, PP_orange, PP_banana
• Predicate: (σ_orange ∨ σ_banana) ∧ σ_red ∧ σ_SUV
• Necessary conditions (exponentially many choices):
  – PP_SUV ∧ PP_red ∧ (PP_orange ∨ PP_banana)
  – PP_SUV ∧ PP_red
  – PP_orange ∨ PP_banana
  – PP_red
  – PP_SUV
• Greedily find the PP combination with the best reduction / cost
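The "necessary condition" idea can be sketched as follows: compile the predicate tree into a composite filter, treating any conjunct without a trained PP as always-true. The resulting filter never rejects a blob the full predicate would accept, so it is safe to run early. This is an illustrative sketch (expression encoding and names are mine), not the paper's optimizer.

```python
def pp_filter(expr, pps):
    """Compile a predicate tree into an early filter.
    expr: an atom string, or ("and"|"or", left, right).
    pps: atom name -> per-blob boolean PP function. Atoms without a
    trained PP become always-True, which keeps the compiled filter a
    necessary condition of the original predicate."""
    if isinstance(expr, str):
        f = pps.get(expr)
        return f if f is not None else (lambda blob: True)
    op, left, right = expr
    lf, rf = pp_filter(left, pps), pp_filter(right, pps)
    if op == "and":
        return lambda blob: lf(blob) and rf(blob)
    return lambda blob: lf(blob) or rf(blob)
```

For the slide's predicate, if only PP_red and PP_SUV exist, the compiled filter reduces to PP_red ∧ PP_SUV: any blob failing it cannot satisfy the full predicate.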