
Load balancing: m balls in n bins



  1. Today. Load balancing. Balls in bins. Power of two choices. Cuckoo hashing.

     Simplest.. Load balance: m balls in n bins. For simplicity: n balls in n bins.
     Round robin: load 1! Centralized! Not so good.
     Uniformly at random? Average load 1. Max load? n. Uh oh!
     Max load with probability $\ge 1 - \delta$? $\delta = 1/n^c$ for today; c is 1 or 2.

     Binomial bounds.
     $\left(\frac{n}{k}\right)^k \le \binom{n}{k} \le \frac{n^k}{k!} \le \left(\frac{ne}{k}\right)^k$
     Lower bound: $\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{n}{k}\cdot\frac{n-1}{k-1}\cdots\frac{n-k+1}{1} \ge \frac{n}{k}\cdot\frac{n}{k}\cdots\frac{n}{k} = \left(\frac{n}{k}\right)^k$
     Upper bounds: $n(n-1)\cdots(n-k+1) \le n^k$ and $k! \ge \left(\frac{k}{e}\right)^k$.

     Balls in bins. For each of n balls, choose a random bin: $X_i$ balls in bin i.
     $\Pr[X_i \ge k] \le \sum_{S \subseteq [n],\, |S| = k} \Pr[\text{balls in } S \text{ choose bin } i]$
     From Union Bound: $\Pr[\cup_i A_i] \le \sum_i \Pr[A_i]$.
     $\Pr[\text{balls in } S \text{ choose bin } i] = \left(\frac{1}{n}\right)^k$ and $\binom{n}{k}$ subsets S.
     $\Pr[X_i \ge k] \le \binom{n}{k}\left(\frac{1}{n}\right)^k \le \frac{n^k}{k!}\left(\frac{1}{n}\right)^k = \frac{1}{k!}$
     Choose k so that $\Pr[X_i \ge k] \le \frac{1}{n^2}$.
     $\Pr[\text{any } X_i \ge k] \le n \times \frac{1}{n^2} = \frac{1}{n}$ → max load $\le k$ w.p. $\ge 1 - \frac{1}{n}$.
     $k! \ge n^2$ for $k = 2e \log n$. (Recall $k! \ge (k/e)^k$.)
     Lemma: Max load is $O(\log n)$ with probability $\ge 1 - \frac{1}{n}$. Much better than n.
     Actually max load is $\Theta(\log n / \log\log n)$ w.h.p.
     (W.h.p.: with probability at least $1 - O(1/n^c)$ for today.)

     Power of two.. n balls in n bins. Choose two bins, pick least loaded.
     Still distributed, but a bit less than not looking.
     Is max load lower? Yes? No? Yes.
     How much lower? $\log n / 2$? $\log n$? $O(\log\log n)$? $O(\log\log n)$!
     Exponentially better! Old bound is exponential of new bound.

     Analysis. n/8 balls in n bins. Each ball chooses two bins at random, picks least loaded.
     View as graph. Bin is vertex. Each ball is edge.
     Analysis intuition: Add edge, add one to lower endpoint's "count." Max load is max vertex count.
     If max count is k → neighbors with counts $\ge k-1, k-2, k-3, \ldots$, and so on!
     No cycles and max load k → $\ge 2^{k/2}$ nodes in tree.
     No connected component of size X and no cycles ⟹ max load $O(\log X)$.
     Will show: Max conn. comp. is $O(\log n)$ w.h.p. Average induced degree is small. (E.g.: cycle has degree 2.) Extend tree intuition.
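The two placement rules from the slides (one uniformly random bin versus the less loaded of two random bins) can be compared empirically. The following is a minimal simulation sketch, not part of the original slides; the function names, the parameter `m` (number of balls), and the tie-breaking rule for equally loaded bins are assumptions made for illustration.

```python
import random

def max_load_one_choice(m, n, rng):
    """Throw m balls into n bins, each bin chosen uniformly at random.

    Returns the maximum load over all bins.
    """
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return max(loads)

def max_load_two_choices(m, n, rng):
    """Each ball samples two bins at random and goes to the less loaded one.

    Ties go to the first sampled bin (an arbitrary choice for this sketch).
    """
    loads = [0] * n
    for _ in range(m):
        a, b = rng.randrange(n), rng.randrange(n)
        target = a if loads[a] <= loads[b] else b
        loads[target] += 1
    return max(loads)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (1_000, 100_000):
        # n balls in n bins, as on the "Power of two" slide;
        # pass m = n // 8 to match the setting of the analysis slide.
        one = max_load_one_choice(n, n, rng)
        two = max_load_two_choices(n, n, rng)
        print(f"n={n}: one choice max load {one}, two choices max load {two}")
```

On typical runs the two-choice maximum grows much more slowly with n than the one-choice maximum, which is the qualitative gap between $\Theta(\log n / \log\log n)$ and $O(\log\log n)$ that the slides claim; a simulation illustrates the bounds but does not prove them.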

  2. Connected Component. Random graph.
     Claim: Component size in an n-vertex, n/8-edge random graph is $O(\log n)$ w/ prob. $\ge 1 - \frac{1}{n^c}$.
     Proof: A size-k component, C, contains $\ge k-1$ edges.
     $\Pr[|C| \ge k] \le \binom{n}{k}\binom{n/8}{k-1}\left(\frac{k}{n}\right)^{2(k-1)}$   (1)
     The three factors: possible C; which edges; prob. both endpoints inside C.
     Simplifying (k edges in place of k − 1) and using $\binom{n}{k} \le \left(\frac{ne}{k}\right)^k$:
     $\Pr[|C| \ge k] \le \binom{n}{k}\binom{n/8}{k}\left(\frac{k}{n}\right)^{2k} \le \left(\frac{ne}{k}\right)^k\left(\frac{ne}{8k}\right)^k\left(\frac{k}{n}\right)^{2k} = \left(\frac{e^2}{8}\right)^k \le (0.93)^k$   (2)
     Choose $k = -(c+1)\log_{0.93} n$ to make the probability $\le 1/n^c$.

     Not dense. Component size is $c \log n$ and max-induced degree is 8 w.h.p.
     Induced degree of a node on a subset, S, is its degree counting only internal edges.
     (Figure: induced degree of nodes in blue subset is 2, not 5!)
     Claim: Average induced degree on any subset of nodes is $\le 8$ with probability $\ge 1 - O(\frac{1}{n^2})$.
     Proof: Average induced degree $\ge 8$ → 4k internal edges for a subset of size k.
     $\Pr[\text{dense } S] \le \binom{n}{k}\binom{n/8}{4k}\left(\frac{k}{n}\right)^{8k} \le \left(\frac{ne}{k}\right)^k\left(\frac{ne}{32k}\right)^{4k}\left(\frac{k}{n}\right)^{8k} = \left(\frac{k}{n}\right)^{3k}\left(\frac{e^{1.25}}{32}\right)^{4k} \le \left(\frac{k}{n}\right)^{3k}$
     Starts at $1/n^3$, decreasing till $k \le n/8$ (at least). → Total $O(1/n^2)$.

     Removal Process! Process: Remove nodes of degree $\le 16$ and their incident edges. Repeat.
     Claim: $O(\log X)$ iterations, where X is max component size.
     For any connected component: average induced degree 8 → half the nodes have degree $\le 16$
     → half the nodes removed in each iteration → $\log X$ iterations to remove all nodes.
     Claim: Max load is $O(\log\log n)$ w.h.p.
     Recall edge corresponds to ball. Height of ball i, $h_i$, is load of bin when it is placed in bin.
     Corresponding edge removed in iteration $r_i$.
     Property: $h_i \le 16 r_i$.
     Case $r_i = 1$: only 16 balls incident to bin → $h_i \le 16$.
     Induction: Previously removed edges (balls) induce load $\le 16(r_i - 1)$, + 16 edges/balls this iteration → $h_i \le 16 r_i$.

     Power of two choices. Max load: $\log X$ where X is max component size.
     X is $O(\log n)$ with high probability. Max load is $O(\log\log n)$.

     Cuckoo hashing. Hashing with two choices: max load $O(\log\log n)$.
     Cuckoo hashing: Array. Two hash functions $h_1, h_2$.
     Insert x: place in $h_1(x)$ or $h_2(x)$ if space. Else bump elt y in $h_i(x)$, i chosen u.a.r.
     Bump y, x: place y in $h_i(y) \ne h_i(x)$ (y's other slot) if space. Else bump y′ in $h_i(y)$.
     If go too long: fail. Rehash entire hash table.
     Fails if cycle. $C_l$: event of cycle of length l (m items in table, $m \le n/8$).
     $\Pr[C_l] \le \binom{m}{l+1}\binom{n}{l}\left(\frac{l}{n}\right)^{2(l+1)} \le \left(\frac{e^2}{8}\right)^l$   (3)
     Probability that an insert makes a cycle of length l $\le \frac{l}{n}\left(\frac{e^2}{8}\right)^l$.
     Rehash every $\Omega(n)$ inserts (if $\le n/8$ items in table). $O(1)$ time on average.

     Sum up. Balls in bins: $\Theta(\log n / \log\log n)$ load. Power of two: $\Theta(\log\log n)$. Cuckoo hashing.
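The cuckoo insertion rule described above (place x in one of its two slots if free, otherwise bump an occupant to its other slot, and rehash the whole table if the chain of bumps runs too long) can be sketched as follows. This is a toy illustration under stated assumptions, not the lecture's implementation: the class name `CuckooTable`, the salted use of Python's built-in `hash` as $h_1, h_2$, and the `max_bumps` cutoff of 32 are all hypothetical choices.

```python
import random

class CuckooTable:
    """Toy cuckoo hash table: one array, two hash functions, one item per slot.

    Items must be hashable and must not be None (None marks an empty slot).
    """

    def __init__(self, size=64, max_bumps=32, seed=0):
        self.size = size
        self.max_bumps = max_bumps   # assumed cutoff for "if go too long, fail"
        self.rng = random.Random(seed)
        self.slots = [None] * size
        self._new_salts()

    def _new_salts(self):
        # Two random salts stand in for drawing fresh hash functions h1, h2.
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _h(self, i, x):
        return hash((self.salts[i], x)) % self.size

    def contains(self, x):
        return self.slots[self._h(0, x)] == x or self.slots[self._h(1, x)] == x

    def insert(self, x):
        if self.contains(x):
            return
        # Place x in h1(x) or h2(x) if either slot has space.
        for i in (0, 1):
            p = self._h(i, x)
            if self.slots[p] is None:
                self.slots[p] = x
                return
        # Else bump the occupant of one of x's two slots, chosen u.a.r.
        pos = self._h(self.rng.randrange(2), x)
        cur = x
        for _ in range(self.max_bumps):
            # Put cur into pos; cur becomes whatever was evicted (maybe None).
            cur, self.slots[pos] = self.slots[pos], cur
            if cur is None:
                return
            # The evicted item moves to its other slot.
            a, b = self._h(0, cur), self._h(1, cur)
            pos = b if pos == a else a
        # Went too long (possibly a cycle): rehash the entire table.
        self._rehash(cur)

    def _rehash(self, pending):
        items = [y for y in self.slots if y is not None] + [pending]
        self.slots = [None] * self.size
        self._new_salts()
        for y in items:
            self.insert(y)
```

A lookup inspects at most two slots, which is the appeal of the scheme. Keeping the table at most one-eighth full, as in the slides' n/8 setting, is what makes long bump chains and rehashes rare; this sketch does not enforce that load limit itself.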

  3. See you on Thursday...
