
CS270: Lecture 1. Algorithms. Example Problem: clustering. - PowerPoint PPT Presentation



  1. CS270: Lecture 1.
     1. Overview
     2. Administration
     3. Dueling Subroutines: Congestion/Tolls.

     Algorithms.
     Undergraduate:
       1. Classical.
       2. Cleanly Stated Problems. Shortest Paths, max-flow, MST.
       3. Solutions: effective precise bounds!
       4. Techniques: Greedy, Dyn. Programming, Linear Programming.
       5. Techniques tend to be Combinatorial.
     This class:
       Modern. Flavor of the week?
       Address problems; messy or not. Vaguely stated problems!
       Analysis sometimes based on modelling world. Ineffective ..imprecise!
       Heuristic, in practice.
       Probabilistic, linear algebra methods, continuous.

     Example Problem: clustering.
       ◮ Points: documents, dna, preferences.
       ◮ Graphs: applications to VLSI, parallel processing, image segmentation.

     Image Segmentation
     [Figure: image example.] Which region?
     Normalized Cut: Find $S$, which minimizes $\frac{w(S, \bar{S})}{w(S) \times w(\bar{S})}$.
     Ratio Cut: minimize $\frac{w(S, \bar{S})}{w(S)}$, with $w(S)$ no more than half the weight. (Minimize cost per unit weight that is removed.)
     Either is generally useful!

     Example: recommendations.
     Sarah Palin likes True Grit (the old one.)
     Sarah Palin doesn’t like The Social Network.
     Sarah Palin doesn’t like Black Swan.
     Sarah Palin likes Sarah Palin on Discovery channel.
     Hillary Clinton doesn’t like True Grit (the old one.)
     Hillary Clinton likes The Social Network.
     Hillary Clinton likes Black Swan.
     Should you recommend the discovery channel to Hillary?
     What about you? Are you Hillary? Are you Sarah? A bit of both?
     High dimensional data: dimension for each movie. More than three dimensions!
     Nearest neighbors. Principal Components methods. Topic Models.
     Reasoning about these methods.
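The two cut objectives above can be tried out on a tiny graph. The following is only a brute-force sketch, not the method used in the course: it assumes every node has weight 1, so that $w(S) = |S|$, and the helper names (`cut_weight`, `ratio_cut`, `normalized_cut`, `best_cut`) and the two-triangle graph are made up for illustration.

```python
from itertools import combinations

def cut_weight(edges, S):
    """w(S, S-bar): total weight of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def ratio_cut(edges, nodes, S):
    # Assumption: unit node weights, so w(S) = |S|.
    return cut_weight(edges, S) / len(S)

def normalized_cut(edges, nodes, S):
    # Assumption: unit node weights, so w(S) * w(S-bar) = |S| * |S-bar|.
    return cut_weight(edges, S) / (len(S) * (len(nodes) - len(S)))

def best_cut(edges, nodes, objective):
    """Try every side S with at most half the nodes (exponential time)."""
    best_value, best_side = None, None
    for size in range(1, len(nodes) // 2 + 1):
        for S in combinations(sorted(nodes), size):
            value = objective(edges, nodes, set(S))
            if best_value is None or value < best_value:
                best_value, best_side = value, set(S)
    return best_value, best_side

# Hypothetical example: two triangles joined by the single edge (2, 3).
nodes = {0, 1, 2, 3, 4, 5}
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
         (3, 4): 1, (3, 5): 1, (4, 5): 1,
         (2, 3): 1}

print(best_cut(edges, nodes, ratio_cut))
print(best_cut(edges, nodes, normalized_cut))
```

On this graph both objectives pick one of the triangles, cutting only the joining edge.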

  2. Linear Systems.
     Revolution!
     Physical Simulation. Airflow.
     Solve $Ax = b$. How long? $n \times n$ matrix $A$.
     Middle School: substitution, adding equations ...
     Time: $O(n^3)$.
     Now: $\tilde{O}(m)$. Hmmm. What’s that tilde?
     Techniques:
       Relate graph theory to matrix properties.
       Dense matrix (graph) to sparse matrix (graph).
       Approximating distances by trees.
       Electrical networks analysis.
     Combinatorial Applications: Better Max Flow!

     Other Algorithmic Techniques
     Sketching: Large stream of data: $a_1, a_2, \ldots$ Find digest.
       Data: average, statistics.
       Points: center point, $k$-medians.
       Graphs: Sparse graph.
     High Dimensional optimization. Gradient Descent. Convexity.
     Linear Algebra. Eigenvalues. Semidefinite Programming.
     Dueling Subroutines. Duality.
       Lower bounder, upper bounder.
       Upper uses lower’s evidence to find better solutions.
       Lower uses upper’s evidence to prove better lower bounds.

     CS270: Administration.
     1. Staff: Satish Rao, Di Wang.
     2. Piazza. Log in! Pay attention to “spam everyone” especially.
     3. Assessment.
       3.1 Homeworks (40%). Homework 1 out tonight/tomorrow.
       3.2 1 Homework/Midterm (25%)
       3.3 Project (35%). Groups of 2 or 3. Connect research to class. Or explore/digest a topic from class.
       3.4 No Discussion this week.

     Path Routing.
     Given $G = (V, E)$, $(s_1, t_1), \ldots, (s_k, t_k)$, find a set of $k$ paths connecting $s_i$ and $t_i$ and minimize max load on any edge.
     [Figure: example with pairs $(s_1, t_1)$, $(s_2, t_2)$, $(s_3, t_3)$; labels “Value: 3” and “Value: 2”.]

     Terminology
     Routing: Paths $p_1, p_2, \ldots, p_k$, where $p_i$ connects $s_i$ and $t_i$.
     Congestion of edge $e$: $c(e)$, the number of paths in routing that contain $e$.
     Congestion of routing: maximum congestion of any edge.
     Find routing that minimizes congestion (or maximum congestion.)

     Algorithms?
     Route along any path. Feasible...but good?
     How far from optimal could it be?
       (A) It is optimal!
       (B) A factor of two.
       (C) A factor of $k$, in general.
     Answer: (C) and (A).
     [Figure: pairs $(s_1, t_1), \ldots, (s_k, t_k)$; labels “Value: $k$.” and “Opt: 1.”]
     Stupid..also depth first search lexicographically!
     Route along shortest path! Duh. Optimal use of “resources” ..or edges.
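The terminology above (congestion of an edge, congestion of a routing) translates directly into a few lines of code. This is a minimal sketch: the function names and the two example routings are hypothetical, paths are given as node lists, edges are treated as undirected, and each path is assumed to use an edge at most once.

```python
from collections import Counter

def edge_congestion(paths):
    """c(e): number of paths in the routing that contain edge e."""
    c = Counter()
    for path in paths:
        for u, v in zip(path, path[1:]):
            c[frozenset((u, v))] += 1
    return c

def routing_congestion(paths):
    """Congestion of the routing: maximum congestion of any edge."""
    return max(edge_congestion(paths).values())

# Hypothetical instance with k = 3 pairs: every pair can either go through
# the shared edge (a, b) or use its own direct edge (s_i, t_i).
shared = [["s1", "a", "b", "t1"], ["s2", "a", "b", "t2"], ["s3", "a", "b", "t3"]]
direct = [["s1", "t1"], ["s2", "t2"], ["s3", "t3"]]

print(routing_congestion(shared))  # 3: all three paths use edge (a, b)
print(routing_congestion(direct))  # 1: the paths are edge-disjoint
```

Both routings are feasible, but the first is a factor of $k = 3$ worse than the second.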

  3. Shortest Path Routing and Congestion.
     Let $\ell(p_i)$ be the length of path $p_i$.
       (A) $\sum_i \ell(p_i) = \sum_e c(e)$?
       (B) $\sum_i \ell(p_i) > \sum_e c(e)$?
       (C) $\sum_i \ell(p_i) < \sum_e c(e)$?
     (A). Proof?
     Path $i$ uses $\ell(p_i)$ edges. Edge $e$ is used by $c(e)$ paths. The totals must be the same:
     $\sum_i \ell(p_i) = \sum_i \sum_{e \in p_i} 1 = \sum_e \sum_{p_i \ni e} 1 = \sum_e c(e)$
     Shortest path routing minimizes total congestion.

     One problem...
     How far from optimal? Optimal? Factor 2? Factor $k$?
     [Figure: pairs $(s_1, t_1), \ldots, (s_k, t_k)$; labels “$> 2k$”, “Value: $k$.” and “Opt: 1.”]

     Shortest Path Routing.
     Shortest path routing minimizes $\sum_i \ell(p_i)$.
     Total congestion: $\sum_e c(e)$, where $c(e)$ is the congestion of edge $e$.
     Why? Minimizing each path length minimizes total congestion.
     Also minimizes average: $\frac{1}{m} \sum_e c(e)$. Just a scaling!
     Average load is lower bound on the lowest max congestion!
     Shortest path routing minimizes average load.
     Does it minimize maximum load?

     Another problem.
     Given $G = (V, E)$, $(s_1, t_1), \ldots, (s_k, t_k)$: instead of finding a set of $k$ paths, assign one unit of “toll” to edges to maximize total toll for connecting pairs.
     [Figure: example graph with pairs $(s_1, t_1)$, $(s_2, t_2)$, $(s_3, t_3)$.]
     Assign $\frac{1}{11}$ on each of 11 edges.
     Total toll: $\frac{3}{11} + \frac{3}{11} + \frac{3}{11} = \frac{9}{11}$
     Can we do better?
     Assign $1/2$ on these two edges.
     Total toll: $\frac{1}{2} + \frac{1}{2} + \frac{1}{2} = \frac{3}{2}$

     Toll problem and Routing problem.
     Given $G = (V, E)$, $(s_1, t_1), \ldots, (s_k, t_k)$, assign one unit of “toll” to edges to maximize total toll for connecting pairs.
     Find $d : E \to \mathbb{R}$ with $\sum_e d(e) = 1$ which maximizes $\sum_i d(s_i, t_i)$.
     Possible solution: $\frac{1}{m}$ on each edge.
     Toll collected: $\ge \frac{1}{m} \sum_i \ell(p_i)$. Familiar?
     Digression? $d(e)$ suggests a weighted average.
     Remember uniform average congestion is lower bound on congestion of routing!
     Optimal toll solution (weighted average congestion) is lower bound on congestion.

     Proving lower bound: notation.
     $d(e)$ - toll assigned to edge $e$.
     $d(p)$ - total toll assigned to path $p$.
     $d(u, v)$ - total assigned to shortest path between $u$ and $v$.
     $d(\cdot)$ - polymorphic: edges, paths, pairs.
     $d(s_i, t_i)$ - shortest path between $s_i$ and $t_i$ under $d(\cdot)$.
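To make the toll problem concrete, here is a small sketch that evaluates a toll assignment by summing shortest-path tolls over the pairs. The graph is hypothetical (it is not the 11-edge example from the slide), and the names `shortest_toll` and `total_toll` are illustrative. Exact fractions are used so the totals print cleanly.

```python
import heapq
from fractions import Fraction

def shortest_toll(adj, d, s, t):
    """d(s, t): least total toll of any s-t path, via Dijkstra."""
    dist = {s: Fraction(0)}
    heap = [(Fraction(0), s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == t:
            return du
        if du > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = du + d[frozenset((u, v))]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    raise ValueError("t is not reachable from s")

def total_toll(adj, d, pairs):
    """Value of a toll solution: sum over pairs of d(s_i, t_i)."""
    return sum(shortest_toll(adj, d, s, t) for s, t in pairs)

# Hypothetical graph: three sources feed into a, which reaches b through
# either u or v; b fans out to the three sinks. m = 10 edges.
edges = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("a", "u"), ("a", "v"),
         ("u", "b"), ("v", "b"), ("b", "t1"), ("b", "t2"), ("b", "t3")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)
pairs = [("s1", "t1"), ("s2", "t2"), ("s3", "t3")]

# Uniform tolls: 1/m on each edge.
uniform = {frozenset(e): Fraction(1, len(edges)) for e in edges}
# Better: put the whole unit on the two edges every path must choose between.
two_edges = {frozenset(e): Fraction(0) for e in edges}
two_edges[frozenset(("a", "u"))] = Fraction(1, 2)
two_edges[frozenset(("a", "v"))] = Fraction(1, 2)

print(total_toll(adj, uniform, pairs))    # 6/5
print(total_toll(adj, two_edges, pairs))  # 3/2
```

In this instance the second assignment certifies that every routing has congestion at least $3/2$, hence at least 2, and routing two pairs through `u` and one through `v` achieves 2.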

  4. Proving lower bound.
     Routing solution: $p_i$ connects $(s_i, t_i)$ and has length $d(p_i)$.
     $c(e)$ - congestion on edge $e$ under routing.
     Max $c(e)$?
     Max bigger than minimum weighted average: $\max_e c(e) \ge \sum_e c(e) d(e)$
     Total length is total congestion: $\sum_e c(e) d(e) = \sum_i d(p_i)$
     Each path, $p_i$, in routing has length $d(p_i) \ge d(s_i, t_i)$.
     $\max_e c(e) \ge \sum_e c(e) d(e) = \sum_i d(p_i) \ge \sum_i d(s_i, t_i)$.
     A toll solution is lower bound on any routing solution.
     Any routing solution is an upper bound on a toll solution.

     Toll is lower bound.
     From before: $\max_e c(e) \ge \sum_e c(e) d(e)$ since $\sum_e d(e) = 1$.
     $\sum_i d(p_i) = \sum_i \sum_{e \in p_i} d(e) = \sum_e \sum_{i : p_i \ni e} d(e) = \sum_e d(e) \sum_{i : p_i \ni e} 1 = \sum_e d(e) c(e)$
     A path uses “volume” $d(p_i)$. Volume on edge is $d(e) c(e)$.
     $\sum_i d(p_i) = \sum_e d(e) c(e)$.
     $\max_e c(e) \ge \sum_e d(e) c(e) = \sum_i d(p_i) \ge \sum_i d(s_i, t_i)$.
     Routing solution cost $\ge$ Any toll solution cost.

     Algorithm.
     Assign tolls. How to route? Shortest paths!
     Assign routing. How to assign tolls? Higher tolls on congested edges.
     Toll: $d(e) \propto 2^{c(e)}$.
     Equilibrium: The shortest path routing has $d(e) \propto 2^{c(e)}$.
     The routing does not change, the tolls do not change.

     How good is equilibrium?
     Path is routed along shortest path and $d(e) \propto 2^{c(e)}$.
     For $e$ with $c(e) \le c_{\max} - 2 \log m$: $2^{c(e)} \le 2^{c_{\max} - 2 \log m} = \frac{2^{c_{\max}}}{m^2}$.
     $c_{opt} \ge \sum_i d(s_i, t_i) = \sum_e d(e) c(e) = \sum_e \frac{2^{c(e)}}{\sum_{e'} 2^{c(e')}} c(e) = \frac{\sum_e 2^{c(e)} c(e)}{\sum_e 2^{c(e)}}$
     Let $c_t = c_{\max} - 2 \log m$. The at most $m$ edges with $c(e) \le c_t$ weigh at most $\frac{2^{c_{\max}}}{m}$ in total, which is at most a $\frac{1}{m}$ fraction of the weight of the edges with $c(e) > c_t$.
     $\frac{\sum_e 2^{c(e)} c(e)}{\sum_e 2^{c(e)}} \ge \frac{\sum_{e : c(e) > c_t} 2^{c(e)} c(e)}{\sum_{e : c(e) > c_t} 2^{c(e)} + \sum_{e : c(e) \le c_t} 2^{c(e)}} \ge \frac{c_t \sum_{e : c(e) > c_t} 2^{c(e)}}{(1 + \frac{1}{m}) \sum_{e : c(e) > c_t} 2^{c(e)}} = \frac{c_t}{1 + \frac{1}{m}} = \frac{c_{\max} - 2 \log m}{1 + \frac{1}{m}}$
     Or $c_{\max} \le (1 + \frac{1}{m}) c_{opt} + 2 \log m$.
     (Almost) within $2 \log m$ of optimal!

     The end: sort of.
     Got to here in class. Feel free to continue reading.

     Getting to equilibrium.
     Maybe no equilibrium!
     Approximate equilibrium: Each path is routed along a path with length within a factor of 3 of the shortest path and $d(e) \propto 2^{c(e)}$.
     Lose a factor of three at the beginning: $c_{opt} \ge \sum_i d(s_i, t_i) \ge \frac{1}{3} \sum_i d(p_i)$.
     We obtain $c_{\max} \le 3 (1 + \frac{1}{m}) c_{opt} + 2 \log m$.
     This is worse! What do we gain?
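The chain $\max_e c(e) \ge \sum_e c(e) d(e) = \sum_i d(p_i)$ can be checked numerically for exponential tolls. The sketch below does this on a hypothetical 10-edge graph and routing; the function names are illustrative, tolls are normalized so that $\sum_e d(e) = 1$, and exact fractions avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction

def congestion(edges, paths):
    """c(e) for every edge of the graph, including unused edges (c = 0)."""
    c = Counter({frozenset(e): 0 for e in edges})
    for p in paths:
        for u, v in zip(p, p[1:]):
            c[frozenset((u, v))] += 1
    return c

def exponential_tolls(c):
    """d(e) proportional to 2^c(e), scaled so the tolls sum to one."""
    total = sum(2 ** ce for ce in c.values())
    return {e: Fraction(2 ** ce, total) for e, ce in c.items()}

def path_toll(d, p):
    """d(p): total toll along path p."""
    return sum(d[frozenset((u, v))] for u, v in zip(p, p[1:]))

# Hypothetical graph and routing: two pairs go through u, one through v.
edges = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("a", "u"), ("a", "v"),
         ("u", "b"), ("v", "b"), ("b", "t1"), ("b", "t2"), ("b", "t3")]
paths = [["s1", "a", "u", "b", "t1"],
         ["s2", "a", "u", "b", "t2"],
         ["s3", "a", "v", "b", "t3"]]

c = congestion(edges, paths)
d = exponential_tolls(c)

weighted_average = sum(c[e] * d[e] for e in c)   # sum_e c(e) d(e)
total_length = sum(path_toll(d, p) for p in paths)  # sum_i d(p_i)

print(max(c.values()), weighted_average, total_length)  # 2 4/3 4/3
```

The weighted average and the total path length agree, and both sit below the maximum congestion, as the derivation says they must.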

  5. An algorithm!
     Algorithm: reroute paths that are off by a factor of three. (Note: $d(e)$ recomputed every rerouting.)
     [Figure: pair $(s_i, t_i)$ with current path $p$ and shorter path $p'$.]
     $p$: $w(p) = X$, $-1$ for $c(e)$ $\implies$ $w'(p) = X/2$
     $p'$: $w(p') \le X/3$, $+1$ for $c(e)$ $\implies$ $w'(p') \le 2X/3$
     Potential function: $\sum_e w(e)$, $w(e) = 2^{c(e)}$
     Moving path:
       Divides $w(e)$ along long path (with $w(p)$ of $X$) by two.
       Multiplies $w(e)$ along shorter ($w(p') \le X/3$) path by two.
     $-\frac{X}{2} + \frac{X}{3} = -\frac{X}{6}$.
     Potential function decreases. $\implies$ termination and existence.

     Tuning...
     Replace $d(e) = (1 + \varepsilon)^{c(e)}$.
     Replace factor of 3 by $(1 + 2\varepsilon)$.
     $c_{\max} \le (1 + 2\varepsilon) c_{opt} + 2 \log m / \varepsilon$. (Roughly)
     Fractional paths?

     Wrap up.
     Dueling players:
       Toll player raises tolls on congested edges.
       Congestion player avoids tolls.
     Converges to near optimal solution!
     A lower bound is “necessary” (natural), and helpful (mysterious?)!
     Done for the day..... ...see you on Thursday.
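The rerouting rule can be sketched as a short loop: recompute $w(e) = 2^{c(e)}$, and move any path whose weight is more than three times that of the shortest path for its pair. This is a rough sketch under assumptions, not the course's reference implementation: the graph is hypothetical and assumed connected, edges are undirected, and the function names are made up for illustration.

```python
import heapq
from collections import Counter

def weights(edges, paths):
    """w(e) = 2^c(e), where c(e) counts the paths currently using e."""
    c = Counter()
    for p in paths:
        for u, v in zip(p, p[1:]):
            c[frozenset((u, v))] += 1
    return {frozenset(e): 2 ** c[frozenset(e)] for e in edges}

def shortest_path(adj, w, s, t):
    """Dijkstra under weights w; returns (path, length). Assumes t reachable."""
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == t:
            break
        if du > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = du + w[frozenset((u, v))]
            if v not in dist or nd < dist[v]:
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], dist[t]

def reroute_until_stable(edges, paths):
    """Reroute any path that is off by more than a factor of three."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    paths = [list(p) for p in paths]
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(paths):
            w = weights(edges, paths)  # recomputed after every rerouting
            current = sum(w[frozenset((u, v))] for u, v in zip(p, p[1:]))
            best, best_len = shortest_path(adj, w, p[0], p[-1])
            if current > 3 * best_len:
                paths[i] = best
                changed = True
    return paths

# Hypothetical start: all three pairs crowd through u, leaving v unused.
edges = [("s1", "a"), ("s2", "a"), ("s3", "a"), ("a", "u"), ("a", "v"),
         ("u", "b"), ("v", "b"), ("b", "t1"), ("b", "t2"), ("b", "t3")]
start = [["s1", "a", "u", "b", "t1"],
         ["s2", "a", "u", "b", "t2"],
         ["s3", "a", "u", "b", "t3"]]

for p in reroute_until_stable(edges, start):
    print(p)
```

On this instance the first path has weight 20 against a shortest path of weight 6, so it moves to the route through `v`; after that no path is off by a factor of three, and the potential $\sum_e w(e)$ has dropped from 30 to 24.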
