  1. Tutorial: Theory of RaSH Made Easy
     Benjamin Doerr, Max-Planck-Institut für Informatik, Saarbrücken

  2. Outline: Some Theory of RaSH
     • Part I: Drift Analysis
       – Motivation: Explains daily life
       – A simple and powerful drift theorem
       – 4 Applications
         ◦ Coupon collector
         ◦ RLS and (1+1) EA optimize OneMax
         ◦ RLS and (1+1) EA optimize linear functions
         ◦ Finding minimum spanning trees
       – Summary & Outlook
     • Part II: Random walk arguments [if time permits]

  3. Drift Analysis: Motivation
     • Life in the Saarland is easy...
       – Get salary on day 0: M_0 = 1000 € (6 559.57 ₣)
       – Day 1: Spend half of it in the pub: M_1 = ½ M_0 = 500
       – Day 2: Spend half of your money: M_2 = ½ M_1 = 250
       – …
       – Day t: Spend half of your money: M_t = ½ M_{t-1}
       – Question: When are you broke (M_T < 1)?
       – Answer: T = ⌊log₂(M_0) + 1⌋ = 10
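As a quick check of the arithmetic, here is a minimal Python sketch (not part of the slides) that finds the broke-day by direct iteration and compares it with the closed-form answer:

    # Deterministic halving: find the first day T with M_T < 1.
    from math import floor, log2

    M0 = 1000.0
    M, T = M0, 0
    while M >= 1:
        M /= 2
        T += 1
    print(T)                      # 10
    print(floor(log2(M0) + 1))    # 10, the closed-form answer from the slide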

  4. Drift Analysis: Motivation + Randomness
     • Life in the Saarland is easy... and chaotic
       – Get salary on day 0: M_0 = 1000 € (6 559.57 ₣)
       – Day 1: Expect to spend half of it: E(M_1) = ½ M_0 = 500
       – Day 2: Expect to spend half of your money: E(M_2) = ½ E(M_1) = 250
       – …
       – Day t: Expect to spend half of your money: E(M_t | M_{t-1}) = ½ M_{t-1}, hence E(M_t) = (1/2)^t M_0
       – Question: When do you expect to be broke?
       – Ideal answer: E(T) = ⌊log₂(M_0) + 1⌋ = 10
       – Warning: You hope that E(min{t | M_t < 1}) = min{t | E(M_t) < 1} = 10, but expectation and minimum need not commute; in truth, E(T) = 10.95 is possible.
       – Solution: drift theorem (next slide)
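The slide fixes only the conditional mean, not the spending distribution. As an illustration, the following Monte Carlo sketch (Python; the uniform-fraction model is our assumption, not the slide's) uses one model satisfying E(M_t | M_{t-1}) = ½ M_{t-1} and shows that the empirical E(T) need not equal the naive answer 10:

    # Assumed model: each day a Uniform(0,1) fraction of the money is kept,
    # so E(M_t | M_{t-1}) = M_{t-1} / 2 as on the slide.
    import random

    def days_until_broke(m0=1000.0):
        m, t = m0, 0
        while m >= 1:
            m *= random.random()   # keep a uniform random fraction
            t += 1
        return t

    runs = [days_until_broke() for _ in range(100_000)]
    print(sum(runs) / len(runs))   # empirical E(T); in general NOT the naive
                                   # min{t | E(M_t) < 1} = 10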

  5. Drift Analysis: The Theorem
     • A 'new' drift theorem (BD, Leslie Goldberg, Daniel Johannsen):
       Let X_0, X_1, ... be random variables taking values in {0} ∪ [1, ∞), and let δ > 0 be such that for all t ∈ ℕ, E(X_t | X_{t-1} = x) ≤ (1 − δ)x. Let X_0 equal x_0 with probability one, and let T := min{t | X_t = 0}. Then
         (a) E(T) ≤ (1/δ)(ln(x_0) + 1), and
         (b) for all c > 0 and n ∈ ℕ, Pr(T > (1/δ)(ln(x_0) + c·ln n)) ≤ n^(−c).
     • Some history:
       – Doob (1953), Tweedie (1976), Hajek (1982): fundamental, mathematical work.
       – Early EA works ('Dortmund', 1995–): use direct methods, coupon collector, Chernoff bounds, ... [could have been done with drift]
       – Expected weight decrease method: 'drift thinking', but technical effort is necessary to cope with not using drift analysis [should have been done with drift]
       – He & Yao (2001–04): first explicit use of drift analysis in EA theory.
       – Now: many drift theorems and applications [BD: the above is the coolest ☺]
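The two bounds are easy to wrap as reusable helpers; a minimal Python sketch (the function names are ours, not the tutorial's):

    from math import log

    def drift_expected_bound(delta, x0):
        """Upper bound on E(T) from the multiplicative drift theorem."""
        return (1 / delta) * (log(x0) + 1)

    def drift_tail_time(delta, x0, c, n):
        """Time bound such that Pr(T > this) <= n**(-c)."""
        return (1 / delta) * (log(x0) + c * log(n))

    # Coupon collector preview: delta = 1/n, x0 = n gives n(ln n + 1).
    n = 100
    print(drift_expected_bound(1 / n, n))        # ~560.5
    print(drift_tail_time(1 / n, n, c=2, n=n))   # (2+1) * n * ln n, ~1381.6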

  6. Drift Analysis: 4 Applications
     • The drift theorem (BD, Leslie Goldberg, Daniel Johannsen; restated from the previous slide): if E(X_t | X_{t-1} = x) ≤ (1 − δ)x for all t, then E(T) ≤ (1/δ)(ln(x_0) + 1), and Pr(T > (1/δ)(ln(x_0) + c·ln n)) ≤ n^(−c) for all c > 0 and n ∈ ℕ.
     • 4 Applications:
       – Coupon collector
       – OneMax
       – Linear functions
       – Minimum spanning trees
     • Making the Expected Weight Decrease Method obsolete

  7. Application 1: Coupon Collector
     • Coupon Collector Problem:
       – There are n different types of coupons: T_1, ..., T_n
       – Round 0: You start with no coupon
       – Each round t, you obtain a random coupon C_t, with Pr(C_t = T_k) = 1/n for all t and k
       – After how many rounds do you have all [types of] coupons?
     • Analysis:
       – X_t := number of missing coupon types after round t
       – X_0 = n. Question: smallest T such that X_T = 0.
       – If X_{t-1} = x, then the chance to get a new coupon type in round t is x/n. Hence E(X_t | X_{t-1} = x) = x − x/n = (1 − 1/n)x. [δ = 1/n]
       – The drift theorem gives:
         ◦ E(T) ≤ (1/δ)(ln x_0 + 1) = n(ln(n) + 1) [best possible]
         ◦ For all c > 0, Pr(T > (c+1)·n·ln(n)) < n^(−c)
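A sketch (Python; collect_all is our name) that checks the drift bound n(ln n + 1) empirically against simulated collection times:

    import random
    from math import log

    def collect_all(n):
        """Rounds until every coupon type 0..n-1 has been seen at least once."""
        missing, t = n, 0
        seen = [False] * n
        while missing > 0:
            c = random.randrange(n)          # uniform random coupon
            if not seen[c]:
                seen[c] = True
                missing -= 1
            t += 1
        return t

    n, runs = 100, 10_000
    mean_T = sum(collect_all(n) for _ in range(runs)) / runs
    print(mean_T, "<=", n * (log(n) + 1))    # empirical mean vs. drift bound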

  8. Application 2: RLS optimizes OneMax
     • One of the simplest randomized search heuristics (RaSH): Randomized Local Search (RLS), here used to maximize f: {0,1}^n → R
       RLS:
       1. Pick x ∈ {0,1}^n uniformly at random      % random start point
       2. Pick k ∈ {1, ..., n} uniformly at random
       3. y := x; y_k := 1 − x_k                    % mutation: flip a random bit
       4. If f(y) ≥ f(x), then x := y               % selection: keep the fitter
       5. If not happy, go to 2.                    % repeat or terminate
     • Question: How long does it take to find the maximum of a simple function like OneMax: f: {0,1}^n → R; x ↦ x_1 + x_2 + ... + x_n (the number of ones in x)?
     • Remark: Of course, x = (1, 1, ..., 1) is the maximum, and no one needs an algorithm to find this out. Aim: start understanding RaSH via simple examples.
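A runnable version of RLS on OneMax, as a Python sketch (rls_onemax is our name; 'happy' is taken to mean the optimum has been found):

    import random

    def rls_onemax(n):
        """RLS maximizing OneMax; returns the number of iterations used."""
        x = [random.randint(0, 1) for _ in range(n)]
        fitness = sum(x)                    # current OneMax value
        t = 0
        while fitness < n:                  # 'happy' = optimum reached
            k = random.randrange(n)         # pick one random bit
            delta = 1 - 2 * x[k]            # +1 if flipping 0->1, -1 if 1->0
            if delta >= 0:                  # selection: accept if f(y) >= f(x)
                x[k] = 1 - x[k]
                fitness += delta
            t += 1
        return t

    print(rls_onemax(100))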

  9. Application 2: RLS optimizes OneMax (continued)
     • RLS and the OneMax question as on the previous slide.
     • Analysis (same as for the coupon collector):
       – X_t: number of zeros after iteration t (= "f_opt − f(x)"). Trivially, X_0 ≤ n.
       – If X_{t-1} = k, then with probability k/n we flip a zero into a one (X_t = k − 1). Otherwise, y is worse than x and thus X_t = k.
       – Hence E(X_t | X_{t-1} = k) = k − k/n = (1 − 1/n)k. [δ = 1/n]
       – The drift theorem gives: maximum found after n(ln n + 1) iterations (in expectation).
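Using the rls_onemax sketch from the previous slide, the n(ln n + 1) bound can be checked empirically:

    from math import log
    # assumes rls_onemax from the sketch after slide 8
    n, runs = 100, 1_000
    mean_T = sum(rls_onemax(n) for _ in range(runs)) / runs
    print(mean_T, "<=", n * (log(n) + 1))   # empirical mean vs. drift bound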

  10. Application 2a: (1+1)-EA optimizes OneMax
      • One of the simplest evolutionary algorithms (EAs): the (1+1)-EA, again used to maximize f: {0,1}^n → R
        (1+1)-EA:
        1. Pick x ∈ {0,1}^n uniformly at random      % random start point
        2. y := x
        3. For each i ∈ {1, ..., n}:                 % mutation: flip each bit w.p. 1/n
           with probability 1/n set y_i := 1 − x_i
        4. If f(y) ≥ f(x), then x := y               % selection: keep the fitter
        5. If not happy, go to 2.                    % repeat or terminate
      • '(1+1)': population size 1, generate 1 offspring, perform 'plus' selection: choose the new population from parents and offspring.
      • Cannot get stuck in local optima ("always converges"). Question: time to maximize OneMax: f: {0,1}^n → R; x ↦ x_1 + ... + x_n?
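A runnable Python sketch of the (1+1)-EA on OneMax (one_plus_one_ea_onemax is our name; again 'happy' means the optimum has been found):

    import random

    def one_plus_one_ea_onemax(n):
        """(1+1)-EA maximizing OneMax; returns the number of generations."""
        x = [random.randint(0, 1) for _ in range(n)]
        t = 0
        while sum(x) < n:                    # 'happy' = optimum reached
            # mutation: flip each bit independently with probability 1/n
            y = [1 - b if random.random() < 1 / n else b for b in x]
            if sum(y) >= sum(x):             # plus-selection: keep the fitter
                x = y
            t += 1
        return t

    print(one_plus_one_ea_onemax(100))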

  11. Application 2a: (1+1)-EA optimizes OneMax (continued)
      • The (1+1)-EA and the OneMax question as on the previous slide.
      • Analysis:
        – X_t: number of zeros after iteration t.
        – If X_{t-1} = k, then the probability that exactly one of the k missing bits is flipped (and no other bit) is k(1/n)(1 − 1/n)^(n−1) ≥ (1/e)(k/n).
        – Hence E(X_t | X_{t-1} = k) ≤ (k − 1)(k/en) + k(1 − k/en) = k(1 − 1/en). [δ = 1/en]
        – Drift theorem: expected optimization time at most e·n(ln n + 1).
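A tiny numeric sanity check (a Python sketch) of the inequality (1 − 1/n)^(n−1) ≥ 1/e used above, plus the resulting bound:

    from math import e, log

    for n in (10, 100, 1000):
        lhs = (1 - 1 / n) ** (n - 1)
        print(n, lhs, lhs >= 1 / e)          # True: (1 - 1/n)^(n-1) >= 1/e

    n = 100
    print(e * n * (log(n) + 1))              # drift bound e*n(ln n + 1), ~1523.7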

  12. Application 3: RLS optimizes Linear Functions
      • RLS as on slide 8.
      • Question: How long does it take to find the maximum of an arbitrary linear function f: {0,1}^n → R; x ↦ a_1 x_1 + a_2 x_2 + ... + a_n x_n (wlog 0 < a_1 ≤ a_2 ≤ ... ≤ a_n)?
      • Analysis (same as for OneMax):
        – X_t: number of zeros after iteration t. Trivially, X_0 ≤ n.
        – If X_{t-1} = k, then with probability k/n we flip a zero into a one (X_t = k − 1). Otherwise, y is worse than x and thus X_t = k.
        – Message: You can use an X_t different from "f_opt − f(x_t)"!
        – Why not X_t = "f_opt − f(x_t)"? The drift theorem gives E(T) ≤ (1/δ)(ln X_0 + 1), and that X_0 can be large!
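A Python sketch (names and the weight distribution are ours) illustrating that the number of zeros also works as a potential for arbitrary positive weights; since every a_k > 0, RLS accepts exactly the 0→1 flips, so the OneMax bound carries over:

    import random
    from math import log

    def rls_linear(weights):
        """RLS maximizing x -> sum(a_i x_i), all a_i > 0; returns iterations."""
        n = len(weights)
        x = [random.randint(0, 1) for _ in range(n)]
        t = 0
        while sum(x) < n:                        # all-ones string is the optimum
            k = random.randrange(n)
            change = weights[k] * (1 - 2 * x[k])  # f(y) - f(x) when flipping bit k
            if change >= 0:                       # selection: keep the fitter
                x[k] = 1 - x[k]
            t += 1
        return t

    n = 100
    a = [random.uniform(0.1, 10.0) for _ in range(n)]   # assumed positive weights
    print(rls_linear(a), "vs. bound", n * (log(n) + 1))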

  13. Application 3a: (1+1)-EA optimizes Linear Functions
      • The (1+1)-EA as on slide 10.
      • Maximize f: {0,1}^n → R; x ↦ a_1 x_1 + a_2 x_2 + ... + a_n x_n (wlog 0 < a_1 ≤ a_2 ≤ ... ≤ a_n)
      • Classical difficult problem:
        – Droste, Jansen, Wegener (2002): expected optimization time E(T) = O(n log n)
        – He, Yao (2001–04): E(T) = O(n log n) via drift analysis
        – Jägersküpper (2008): E(T) ≲ 2.02 e n ln(n) via average drift analysis
        – D., Johannsen, Winzen (2010): e n ln(n) ≲ E(T) ≲ 1.39 e n ln(n)
        – D., Goldberg (2010+): O(n log n) whp for any c/n mutation probability
