Scalable Machine Learning, Lecture 3: Data Streams. Alex Smola, Yahoo! Research and ANU.

  1. Scalable Machine Learning 3. Data Streams Alex Smola Yahoo! Research and ANU http://alex.smola.org/teaching/berkeley2012 Stat 260 SP 12

  2. 3. Data Streams Building realtime *Analytics at home

  3. Data Streams Data & Applications • Moments • Flajolet-Martin counter • Alon-Matias-Szegedy sketch • Heavy hitter detection • Lossy counting • Space saving • Semiring statistics • Bloom filter • CountMin sketch • Realtime analytics • Fault tolerance and scalability • Interpolating sketches

  4. 3.1 Streams

  5. Data Streams • Cannot replay data • Limited memory / computation / realtime analytics • Time series: observe instances (x_t, t); stock symbols, acceleration data, video, server logs, surveillance • Cash register: observe weighted instances x_i, always positive increments; query stream, user activity, network traffic, revenue, clicks • Turnstile: increments and decrements (possibly requiring nonnegativity); caching, windowed statistics
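To make the cash-register and turnstile models concrete, here is a minimal Python sketch of the two update types against an exact dictionary of counts (the function names and the strictness flag are illustrative, not from the slides):

```python
from collections import defaultdict

# Illustrative sketch of the two counter-style update models on an exact
# dictionary of counts (only viable while the number of distinct items is small).
counts = defaultdict(float)

def cash_register_update(x, w=1.0):
    """Cash-register model: only nonnegative increments (clicks, revenue, traffic)."""
    assert w >= 0, "cash register updates must be nonnegative"
    counts[x] += w

def turnstile_update(x, w, strict=True):
    """Turnstile model: increments and decrements; the strict variant forbids negative totals."""
    counts[x] += w
    if strict and counts[x] < 0:
        raise ValueError("strict turnstile model violated")
```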

  6. Website Analytics (NIPS) • Continuous stream of users (tracked with a cookie) • Many sites signed up for the analytics service • Find hot links / frequent users / click probability, right now

  7. Query Stream • Item stream • Find heavy hitters • Detect trends early (e.g. Osama bin Laden killed) • Frequent combinations (cf. frequent items) • Source distribution • In real time

  8. Network traffic analysis • TCP/IP packets • On switch with limited memory footprint • Realtime analytics • Busiest connections • Trends • Protocol-level data • Distributed information gathering

  9. Financial Time Series • real time prediction • missing data • metadata (news, quarterly reports, financial background) • time-stamped data stream • multiple sources • different time resolution

  10. News • Realtime news stream • Multiple sources (Reuters, AP, CNN, ...) • Same story from multiple sources • Stories are related

  11. 3.2 Moments

  12. Warmup • Stream of m items x_i • Want to compute statistics of what we’ve seen • Small cardinality n • Trivial to compute aggregate counts (dictionary lookup) • Memory is O(n) • Computation is O(log n) for storage & lookup • Large cardinality n • Exact storage of counts impossible • Exact test for previous occurrence impossible • Need approximate (dynamic) data structure
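For reference, the trivial exact approach from the small-cardinality case, as a short Python sketch (this is the dictionary-lookup baseline whose O(n) memory breaks down for large item spaces):

```python
from collections import Counter

# Exact aggregate counts via a dictionary: fine while the cardinality n is
# small, but memory grows with the number of distinct keys, which is exactly
# what becomes impossible for large item spaces.
def exact_counts(stream):
    counts = Counter()
    for x in stream:
        counts[x] += 1
    return counts

print(exact_counts(["a", "b", "a", "c", "a"]))  # Counter({'a': 3, 'b': 1, 'c': 1})
```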

  14. Finding the missing item • Sequence of instances [1..N] • One of them is missing • Identify it • Algorithm: compute the sum s := Σ_{i=1}^{N} i; for each observed item decrement s via s ← s − x_i; at the end s is the missing item • We only need the least significant log N bits
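A minimal Python sketch of the sum trick above (plain Python integers are used for clarity; the slide's point is that only the low-order ~log N bits of s actually need to be stored):

```python
# One-pass sum trick: start from sum(1..N), subtract every item seen, and the
# remainder is the missing element. Only a single number of ~log N bits is kept.
def find_missing(stream, N):
    s = N * (N + 1) // 2          # sum of 1..N
    for x in stream:
        s -= x
    return s

print(find_missing([1, 2, 4, 5], N=5))  # -> 3
```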

  17. Finding the missing items • Sequence of instances [1..N] • Up to k of them are missing • Identify them • Algorithm: for p up to k compute the power sums s_p := Σ_{i=1}^{N} i^p; for each observed item decrement all s_p via s_p ← s_p − x_i^p; identify the missing items by solving the resulting polynomial system • We only need the least significant log N bits
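A sketch of the special case k = 2, assuming the items are 1..N: the two residual power sums s_1 = a + b and s_2 = a^2 + b^2 determine the missing items a and b via a quadratic, which is the polynomial system referred to above (helper names are mine):

```python
import math

# k = 2 case: maintain residual power sums s1 = a + b and s2 = a^2 + b^2 for
# the two missing items a and b, then solve t^2 - s1*t + (s1^2 - s2)/2 = 0,
# whose roots are exactly a and b.
def find_two_missing(stream, N):
    s1 = N * (N + 1) // 2                   # sum of 1..N
    s2 = N * (N + 1) * (2 * N + 1) // 6     # sum of squares of 1..N
    for x in stream:
        s1 -= x
        s2 -= x * x
    prod = (s1 * s1 - s2) // 2              # a * b
    disc = math.isqrt(s1 * s1 - 4 * prod)   # |a - b|
    return (s1 - disc) // 2, (s1 + disc) // 2

print(find_two_missing([2, 4, 5], N=5))  # -> (1, 3)
```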

  19. Estimating F_k

  20. Moments • Characterize the skewness of the distribution • Sequence of instances • Instantaneous estimates F_p := Σ_{x ∈ X} n_x^p • Special cases • F_0 is the number of distinct items • F_1 is the number of items (trivial to estimate) • F_2 describes the ‘variance’ (used e.g. for database query plans)
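A small worked example on an illustrative stream (not from the slides):

```latex
% Illustrative stream: a, b, a, c, a, b  with counts  n_a = 3, n_b = 2, n_c = 1
\begin{align*}
F_0 &= \sum_x n_x^0 = 3               && \text{(number of distinct items)} \\
F_1 &= \sum_x n_x^1 = 3 + 2 + 1 = 6   && \text{(number of items)} \\
F_2 &= \sum_x n_x^2 = 9 + 4 + 1 = 14  && \text{(`variance'-like moment)}
\end{align*}
```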

  21. Flajolet-Martin counter • Assume perfect hash functions (simplifies the proof) • Design a hash with Pr(h(x) = j) = 2^{-j} [figure: three example log n-bit strings with hash values 0, 2, 4] • Position of the rightmost 0 (LSB is position 1) • CDF for the maximum over n items: F(j) = (1 − 2^{-j})^n (the CDF of the maximum of n random variables is F^n)

  22. Flajolet-Martin counter [figure: same bit-string example as above] • Intuitively expect that max_{x ∈ X} h(x) ≈ log |X| • Repetitions of the same element do not matter • Need O(log log |X|) bits to store the counter • High-probability bound: Pr(|max_{x ∈ X} h(x) − log |X|| > log c) ≤ 2/c
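A minimal Python sketch of a single Flajolet-Martin counter (the hash choice, the function names, and the omission of the usual bias-correction constant and of averaging over several hash functions are simplifications of mine):

```python
import hashlib

# FM-style distinct-count sketch: hash each item to a uniform bit string and
# record j = position of the rightmost 0 bit (LSB = position 1), which has
# Pr(j = k) = 2^{-k}. The maximum j over the stream is roughly log2 of the
# number of distinct items and needs only O(log log |X|) bits of state.
def rightmost_zero_position(x):
    h = int.from_bytes(hashlib.sha1(repr(x).encode()).digest()[:8], "big")
    j = 1
    while h & 1:          # skip over trailing 1 bits
        h >>= 1
        j += 1
    return j

def fm_estimate(stream):
    max_j = 0
    for x in stream:
        max_j = max(max_j, rightmost_zero_position(x))
    return 2 ** max_j     # crude estimate; a single counter has high variance

print(fm_estimate(range(10000)))  # typically within a small factor of 10000
```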

  23. Proof (for a version with 2-wise independent hash functions see Alon, Matias and Szegedy) • Upper bound is trivial: |X| · 2^{-j} ≤ 1/c ⇔ 2^j ≥ c |X|, so with probability at most 1/c the upper bound is exceeded (using the union bound) • Lower bound: the probability of not exceeding j is bounded by (1 − 2^{-j})^{|X|} ≤ exp(−|X| · 2^{-j}) ≤ e^{-c} whenever 2^j ≤ |X|/c, i.e. with probability at least 1 − e^{-c} the maximum j satisfies 2^j ≥ |X|/c

  24. Variations on the FM counter • Lossy counting • Increment counter j to c with probability p^{-c} for p < 0.5 • Yields an estimate of the log-count (normalization!) • FM instead of bits inside a Bloom filter ... more later • log n rather than log log n array • Set a bit according to the hash [figure: bit array with used and wasted positions] • Count consecutive 1s instead of the largest bit and fill gaps • The log log bounds are tight (see the AMS lower bound)
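The probabilistic-increment bullet appears to describe Morris-style approximate counting; here is a hedged sketch with increment probability 2^{-c} (my choice of base, not necessarily the p used on the slide):

```python
import random

# Morris-style approximate counter: store only c, which behaves like log2 of
# the true count, by incrementing with probability 2^{-c}. The value 2^c - 1
# is an unbiased estimate of the number of events seen.
def morris_count(n_events):
    c = 0
    for _ in range(n_events):
        if random.random() < 2.0 ** (-c):
            c += 1
    return 2 ** c - 1

print(morris_count(100000))  # typically within a small factor of 100000
```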

  25. Computing F_2 • Strategy • Design a random variable with E[X_ij] = F_2 • Take the average over subsets: X̄_i := (1/a) Σ_{j=1}^{a} X_ij • The estimate is the median: X := med[X̄_1, ..., X̄_b] • Random variable: X_ij := (Σ_{x ∈ stream} σ(x, i, j))^2 • σ is a Rademacher hash with equiprobable values {±1} • In expectation all cross terms cancel out, yielding F_2
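A minimal Python sketch of this F_2 estimator, assuming a simple hash-based ±1 sign function in place of a 4-wise independent Rademacher hash (the names and the parameters a, b are illustrative):

```python
import hashlib
import statistics

# AMS-style F_2 estimator: for each (i, j) keep a running signed sum
# Z_ij = sum_x sigma(x, i, j); then X_ij = Z_ij^2 has expectation F_2.
# Combine by averaging over j and taking the median over i, as in the
# average-median theorem.
def sign(x, i, j):
    h = hashlib.sha1(f"{i}|{j}|{x}".encode()).digest()
    return 1 if h[0] & 1 else -1

def ams_f2(stream, a=16, b=7):
    Z = [[0] * a for _ in range(b)]
    for x in stream:
        for i in range(b):
            for j in range(a):
                Z[i][j] += sign(x, i, j)
    means = [sum(z * z for z in row) / a for row in Z]
    return statistics.median(means)

stream = ["a"] * 30 + ["b"] * 20 + ["c"] * 10   # true F_2 = 900 + 400 + 100 = 1400
print(ams_f2(stream))
```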

  26. Average-Median Theorem • Random variables X_ij with mean μ and variance σ^2 • Mean estimate X̄_i := (1/a) Σ_{j=1}^{a} X_ij and overall estimate X := med[X̄_1, ..., X̄_b] • The probability of deviation is bounded by Pr(|X − μ| ≥ ε) ≤ δ for a = 8σ^2 ε^{-2} and b = −(8/3) log δ • Note: Alon, Matias & Szegedy claim b = −2 log δ but the Chernoff bounds don’t work out AFAIK

  27. Proof • Bounding the mean: pick a = 8σ^2 ε^{-2} and apply the Chebyshev bound to see that Pr(|X̄_i − μ| > ε) ≤ 1/8 • Bounding the median • Ensure that for at least half of the X̄_i the deviation is small • The failure probability for each X̄_i is at most 1/8 • Chernoff bound (Mitzenmacher & Upfal, Theorem 4.4): Pr{x ≥ (1 + δ)μ} ≤ e^{−μδ^2/3} • Plug in deviation parameter 3 and μ = b/8; with b = −(8/3) log δ the probability that half of the X̄_i deviate is at most exp(−3b/8) = δ

  28. Computing F_2 • Mean: E[X_ij] = E[(Σ_{x ∈ stream} σ(x,i,j))^2] = E[(Σ_{x ∈ X} n_x σ(x,i,j))^2] = Σ_{x ∈ X} n_x^2 = F_2 • Variance: E[X_ij^2] = E[(Σ_{x ∈ stream} σ(x,i,j))^4] = 3 Σ_{x,x' ∈ X} n_x^2 n_{x'}^2 − 2 Σ_{x ∈ X} n_x^4, hence Var[X_ij] = E[X_ij^2] − (E[X_ij])^2 = 2 Σ_{x,x' ∈ X} n_x^2 n_{x'}^2 − 2 Σ_{x ∈ X} n_x^4 ≤ 2 F_2^2 • Plugging into the Average-Median theorem shows that the algorithm uses O(ε^{-2} log(1/δ) log(|X| n)) bits
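Spelling out the cross-term cancellation used in the mean computation (pairwise independence of the signs already suffices for this step):

```latex
% Cross terms vanish because E[sigma(x,i,j) sigma(x',i,j)] = [x = x'] for a
% Rademacher sign hash, so
\begin{align*}
E\!\left[\Big(\sum_{x \in X} n_x\, \sigma(x,i,j)\Big)^{2}\right]
  = \sum_{x, x' \in X} n_x n_{x'}\, E[\sigma(x,i,j)\, \sigma(x',i,j)]
  = \sum_{x \in X} n_x^{2} = F_2 .
\end{align*}
```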

  29. Computing F_k in general • Random variable with expectation F_k • Pick a uniformly random element in the sequence • Start counting instances of it until the end [figure: example stream “a s r a n d o m a s c a n b e”] • Use the count r_ij in X_ij = m (r_ij^k − (r_ij − 1)^k) • Apply the Average-Median theorem
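A minimal Python sketch of this F_k estimator; the stream is materialized as a list for simplicity, whereas a genuine one-pass version would pick the random position by reservoir sampling (the names and the repetition counts a, b are illustrative):

```python
import random
import statistics
from collections import Counter

# AMS-style F_k estimator: pick a uniformly random position in the stream,
# count how often that element reappears from there to the end (r), and use
# X = m * (r^k - (r-1)^k), which has expectation F_k. Independent repetitions
# are combined by a median of means, as in the average-median theorem.
def fk_single_estimate(stream, k):
    m = len(stream)
    pos = random.randrange(m)
    x = stream[pos]
    r = sum(1 for y in stream[pos:] if y == x)
    return m * (r ** k - (r - 1) ** k)

def fk_estimate(stream, k, a=50, b=7):
    means = [sum(fk_single_estimate(stream, k) for _ in range(a)) / a
             for _ in range(b)]
    return statistics.median(means)

stream = list("asrandomascanbe")                 # the slide's example stream
exact = sum(c ** k for c, k in ((n, 3) for n in Counter(stream).values()))
print(fk_estimate(stream, k=3), "vs exact F_3 =", exact)
```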

  30. More F_k • Mean via telescoping sum: E[X_ij] = [1^k + (2^k − 1^k) + ... + (n_1^k − (n_1 − 1)^k)] + ... + [1^k + ... + (n_{|X|}^k − (n_{|X|} − 1)^k)] = Σ_{x ∈ X} n_x^k = F_k (no better than brute force for large k) • Variance by brute-force algebra: Var[X_ij] ≤ E[X_ij^2] ≤ k |X|^{1 − 1/k} F_k^2 • We need at most O(k |X|^{1 − 1/k} ε^{-2} log(1/δ) (log m + log |X|)) bits to estimate F_k; the rate is tight

  32. Uniform sampling
