Algorithms for Big Data (V)

Chihao Zhang

Shanghai Jiao Tong University

Oct. 18, 2019
Review of the Last Lecture

Last time, we learnt the Misra-Gries algorithm and the Count Sketch for frequency estimation. The latter has the advantage of being a linear sketch. It also generalizes to the turnstile model.

Count Sketch

Algorithm Count Sketch
Init: an array C[j] for j ∈ [k], where k = 3/ε²;
  a random hash function h : [n] → [k] from a 2-universal family;
  a random hash function g : [n] → {−1, 1} from a 2-universal family.
On input (y, ∆): C[h(y)] ← C[h(y)] + ∆ · g(y).
Output: on query a, output f̂a = g(a) · C[h(a)].
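As a concrete illustration, here is a minimal Python sketch of the single-row Count Sketch above. The 2-universal families are only sketched via random affine maps modulo a fixed prime p (assumed larger than the universe size n); the class name and structure are ours, not part of the lecture.

```python
import random

class CountSketch:
    """One row of Count Sketch: k = 3/eps^2 counters,
    h : [n] -> [k] and g : [n] -> {-1, +1} chosen at random."""

    P = 2_147_483_647  # a prime, assumed larger than the universe size n

    def __init__(self, eps):
        self.k = max(1, int(3 / eps ** 2))
        self.C = [0] * self.k
        # random affine functions standing in for 2-universal hashes h and g
        self.ha, self.hb = random.randrange(1, self.P), random.randrange(self.P)
        self.ga, self.gb = random.randrange(1, self.P), random.randrange(self.P)

    def _h(self, y):                 # bucket index in [k]
        return ((self.ha * y + self.hb) % self.P) % self.k

    def _g(self, y):                 # random sign in {-1, +1}
        return 1 if ((self.ga * y + self.gb) % self.P) % 2 == 0 else -1

    def update(self, y, delta=1):    # on input (y, Delta)
        self.C[self._h(y)] += delta * self._g(y)

    def query(self, a):              # estimate of f_a
        return self._g(a) * self.C[self._h(a)]
```

For a stream with a single distinct element the estimate is exact, since g(a)² = 1; in general it is only accurate to within ±ε∥f∥2 with constant probability, which is why the median trick below is needed.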

The Performance

We can apply the median trick to obtain:

▶ Pr[|f̂a − fa| ⩾ ε∥f∥2] ⩽ δ;
▶ it costs O((1/ε²) · log(1/δ) · (log m + log n)) bits of memory.

Today we will see another simple sketch algorithm.

Count-Min

We assume that for each entry (y, ∆), it holds that ∆ ⩾ 0.

Algorithm Count-Min
Init: an array C[i][j] for i ∈ [t] and j ∈ [k], where t = log(1/δ) and k = 2/ε;
  t independent random hash functions h1, …, ht : [n] → [k] from a 2-universal family.
On input (y, ∆): for each i ∈ [t], C[i][hi(y)] ← C[i][hi(y)] + ∆.
Output: on query a, output f̂a = min1⩽i⩽t C[i][hi(a)].
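A minimal Python version of Count-Min, under the same assumption as before that the 2-universal hash functions are sketched by random affine maps modulo a prime p > n; the names are ours.

```python
import math
import random

class CountMin:
    """t = log2(1/delta) rows of k = 2/eps counters; a query takes the row minimum."""

    P = 2_147_483_647  # a prime, assumed larger than the universe size n

    def __init__(self, eps, delta):
        self.t = max(1, math.ceil(math.log2(1 / delta)))
        self.k = max(1, math.ceil(2 / eps))
        self.C = [[0] * self.k for _ in range(self.t)]
        # one independent random hash function per row
        self.coef = [(random.randrange(1, self.P), random.randrange(self.P))
                     for _ in range(self.t)]

    def _h(self, i, y):
        a, b = self.coef[i]
        return ((a * y + b) % self.P) % self.k

    def update(self, y, delta=1):    # requires delta >= 0
        for i in range(self.t):
            self.C[i][self._h(i, y)] += delta

    def query(self, a):              # never underestimates f_a
        return min(self.C[i][self._h(i, a)] for i in range(self.t))
```

Note the one-sided guarantee: since all updates are non-negative, every row counter dominates fa, so the minimum can only overestimate.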

Analysis

Obviously we have fa ⩽ f̂a. Our algorithm overestimates only if for some b ≠ a, hi(b) = hi(a). Let Yi,b be the indicator of this event.

Let Xi be C[i][hi(a)]. Then

E[Xi] = ∑b∈[n] fb·E[Yi,b] = fa + ∑b∈[n]: b≠a fb·E[Yi,b] = fa + (∑b∈[n]: b≠a fb)/k ⩽ fa + ∥f∥1/k.

Thus, by Markov's inequality,

Pr[|Xi − fa| ⩾ ε∥f∥1] ⩽ (∥f∥1/k)/(ε∥f∥1) = 1/(kε) = 1/2.

Since our output is the minimum of t independent Xi's,

Pr[f̂a − fa ⩾ ε∥f∥1] = Pr[min{X1, …, Xt} − fa ⩾ ε∥f∥1] = Pr[⋀i=1..t (|Xi − fa| ⩾ ε∥f∥1)] = ∏i=1..t Pr[|Xi − fa| ⩾ ε∥f∥1] ⩽ 2^(−t) = δ.

The algorithm computes a linear sketch using O((1/ε) · log(1/δ) · (log m + log n)) bits of memory. It can be generalized to the turnstile model (Exercise).

Frequency Moments

The k-th frequency moment of a stream is

Fk ≜ ∑j∈[n] fj^k = ∥f∥k^k.

For example, F2 is the size of the self-join of a relation r. Many problems we met before can be viewed as estimating Fk for some special k.
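For concreteness, the following few lines compute Fk exactly by storing all frequencies in linear space, which is precisely what the streaming algorithms below try to avoid; the function name is ours.

```python
from collections import Counter

def frequency_moment(stream, k):
    """Exact F_k = sum over j of f_j ** k, computed in linear space."""
    freq = Counter(stream)
    return sum(f ** k for f in freq.values())

# F_1 is the stream length m, F_0 the number of distinct elements,
# and F_2 the self-join size of the corresponding relation.
```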

AMS Estimator for Fk

Given ⟨a1, …, am⟩, the algorithm first samples a uniform index J ∈ [m]. It then counts the number r of entries aj with aj = aJ and j ⩾ J.

Algorithm AMS Estimator for Fk
Init: (m, r, a) ← (0, 0, 0).
On input (y, ∆): m ← m + 1; β ∼ Ber(1/m);
  if β = 1 then a ← y, r ← 0;
  if y = a then r ← r + 1.
Output: m · (r^k − (r − 1)^k).
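A direct Python transcription of one copy of the AMS estimator (insertion-only stream, so ∆ = 1 per input); drawing β ∼ Ber(1/m) at step m is exactly reservoir sampling of the index J. Names are ours.

```python
import random

class AMSEstimator:
    """One copy of the AMS estimator for F_k on an insertion-only stream."""

    def __init__(self, k):
        self.k = k
        self.m = 0      # number of stream entries seen so far
        self.a = None   # the sampled element a_J
        self.r = 0      # occurrences of a_J at positions >= J

    def update(self, y):
        self.m += 1
        if random.random() < 1.0 / self.m:   # beta ~ Ber(1/m): resample J
            self.a, self.r = y, 0
        if y == self.a:
            self.r += 1

    def estimate(self):
        return self.m * (self.r ** self.k - (self.r - 1) ** self.k)
```

A single copy is unbiased but has large variance, as the next slides compute; many independent copies are then combined by averaging and taking medians.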

Analysis

We first compute the expectation of the output X. Given a = j at the end of the algorithm, r is uniform over {1, …, fj}, so

E[X | a = j] = E[m(r^k − (r − 1)^k) | a = j] = ∑i=1..fj (1/fj)·m(i^k − (i − 1)^k) = (m/fj)·fj^k,

where the sum telescopes. Therefore,

E[X] = ∑j=1..n Pr[a = j]·E[X | a = j] = ∑j=1..n (fj/m)·(m/fj)·fj^k = Fk.

The Variance

Var[X] ⩽ E[X²] = ∑j=1..n (fj/m) ∑i=1..fj (1/fj)·m²(i^k − (i − 1)^k)²
⩽ m ∑j=1..n ∑i=1..fj k·i^(k−1)·(i^k − (i − 1)^k)
⩽ m ∑j=1..n k·fj^(k−1) ∑i=1..fj (i^k − (i − 1)^k)
= m ∑j=1..n k·fj^(k−1)·fj^k
= k (∑j=1..n fj)(∑j=1..n fj^(2k−1)),

using i^k − (i − 1)^k ⩽ k·i^(k−1) in the second step and m = ∑j fj in the last.

Assume k ⩾ 1 and let f∗ ≜ maxj∈[n] fj. Since fj^(2k−1) ⩽ f∗^(k−1)·fj^k,

Var[X] ⩽ k (∑j=1..n fj)·f∗^(k−1)(∑j=1..n fj^k)
= k (∑j=1..n fj)·(f∗^k)^((k−1)/k)(∑j=1..n fj^k)
⩽ k (∑j=1..n fj)·(∑j=1..n fj^k)^((k−1)/k)(∑j=1..n fj^k).

Applying Jensen's inequality to g(z) = z^(1/k) gives ∑j=1..n fj = ∑j=1..n (fj^k)^(1/k) ⩽ n^(1−1/k)(∑j=1..n fj^k)^(1/k), so we can bound the above by

k·n^(1−1/k)(∑j=1..n fj^k)^(1/k)·(∑j=1..n fj^k)^((k−1)/k)·(∑j=1..n fj^k) = k·n^(1−1/k)·Fk².

slide-39
SLIDE 39

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Therefore, by Chebyshev's inequality,

Pr[|X − Fk| ⩾ εFk] ⩽ k·n^(1−1/k)/ε².

Now we can apply the standard averaging trick and median trick. To kill the n^(1−1/k) factor in the variance, we need to average Ω(n^(1−1/k)) estimates. An (ε, δ)-estimator requires O((1/ε²) · log(1/δ) · k·n^(1−1/k) · (log m + log n)) bits of memory.
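The averaging and median tricks combine as the usual "median of means" estimator; a small helper sketch, with names and grouping convention ours:

```python
import statistics

def median_of_means(estimates, groups):
    """Split independent estimates into `groups` equal blocks, average each
    block (averaging trick), then return the median of the block averages
    (median trick)."""
    size = len(estimates) // groups
    means = [sum(estimates[g * size:(g + 1) * size]) / size
             for g in range(groups)]
    return statistics.median(means)
```

Averaging within a block shrinks the variance; taking the median across blocks boosts the constant success probability to 1 − δ with groups = O(log(1/δ)).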

slide-43
SLIDE 43

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The Tug-of-War Sketch

slide-44
SLIDE 44

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The Tug-of-War Sketch

The following simple algorithm for F2 outperforms AMS by using only O(log n + log m) bits.

Algorithm Tug-of-War Sketch
Init: a random hash function h : [n] → {−1, 1} from a 4-universal family; x ← 0.
On input (y, ∆): x ← x + ∆ · h(y).
Output: x².
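A minimal Python version; a 4-universal hash family can be built from random degree-3 polynomials over a prime field, which is what we sketch here (the parity-to-sign map and all names are our own simplification).

```python
import random

class TugOfWar:
    """One tug-of-war counter: x = sum_j f_j * h(j) with h(j) in {-1, +1};
    x^2 is an unbiased estimate of F_2."""

    P = 2_147_483_647  # a prime, assumed larger than the universe size n

    def __init__(self):
        # a random cubic polynomial sketches a 4-universal hash over [P]
        self.coef = [random.randrange(self.P) for _ in range(4)]
        self.x = 0

    def _h(self, y):
        v = 0
        for c in self.coef:              # Horner evaluation mod P
            v = (v * y + c) % self.P
        return 1 if v % 2 == 0 else -1   # map the value to a sign

    def update(self, y, delta=1):
        self.x += delta * self._h(y)

    def estimate(self):
        return self.x ** 2
```

The sketch is linear in the frequency vector, so it also works in the turnstile model.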

slide-45
SLIDE 45

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Analysis

slide-46
SLIDE 46

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Analysis

Let X be the value of x at the end of our algorithm.

slide-47
SLIDE 47

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Analysis

Let X be the value of x at the end of our algorithm. E [ X2] = E     ∑

j∈[n]

fjh(j)  

2

 = E   ∑

j∈[n]

f2

jh(j)2 +

i,j∈[n]:i̸=j

fifjh(i)h(j)   = F2.

slide-48
SLIDE 48

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Analysis

Let X be the value of x at the end of our algorithm. Then

E[X²] = E[(∑j∈[n] fj·h(j))²] = E[∑j∈[n] fj²·h(j)² + ∑i,j∈[n]: i≠j fi·fj·h(i)·h(j)] = F2.

Using the property of the 4-universal hash family, we have

E[X⁴] = ∑i,j,k,ℓ∈[n] fi·fj·fk·fℓ·E[h(i)h(j)h(k)h(ℓ)]
= ∑j∈[n] fj⁴·E[h(j)⁴] + 6 ∑i,j∈[n]: j>i fi²·fj²·E[h(i)²·h(j)²]
= F4 + 6 ∑i,j∈[n]: j>i fi²·fj².

slide-49
SLIDE 49

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Therefore,

Var[X²] = E[X⁴] − (E[X²])² = F4 − F2² + 6 ∑i,j∈[n]: j>i fi²·fj² = F4 − F2² + 3(F2² − F4) = 2(F2² − F4) ⩽ 2F2².

Finally, we apply the standard averaging and median tricks, and the algorithm costs O((1/ε²) · log(1/δ) · (log n + log m)) bits of memory.