

SLIDE 1

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS

Jialin LIU advised by:

  • Dr. Olivier Teytaud, Researcher 1st class of Inria Saclay
  • Dr. Marc Schoenauer, VP for research of Inria Saclay

TAO, Inria Saclay, Univ. Paris-Saclay, UMR CNRS 8623, France

March 2013 - December 2015

1 / 77

SLIDE 2

Profile

2010: Bachelor's degree in Opto-electronic Information Engineering, Huazhong University of Science and Technology (HUST), China.
2012: Engineer's degree in Computer Science (Networks, Artificial Intelligence), Polytech'Paris-Sud, France.
2013: Master's degree in Bioinformatics and Biostatistics, École Polytechnique, France.
2015: Ph.D. degree in Computer Science, Université Paris-Saclay, France.

2 / 77

SLIDE 3

Motivation

Why noisy optimization (i.e. optimization against a stochastic model)?
  • Not that many works on noisy optimization.
  • Faults in networks: you cannot use an average over 50 years (many lines would be 100% guaranteed) ⇒ you need a (stochastic) model of faults.

Why adversarial (i.e. worst-case) problems?
  • Critical problems with uncertainties (technological breakthroughs, CO2 penalization, ...).

Why portfolio (i.e. combining/selecting solvers)?
  • Great in combinatorial optimization → let us generalize :)

Why MCTS?
  • Great recent tool; still many things to do.

All related?
  • All applicable to games.
  • All applicable to power systems.
  • Nash ⇒ mixed strategy ≃ portfolio.

3 / 77

SLIDE 4

Noisy Optimization > Optimization criteria for black-box noisy optimization

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

4 / 77

SLIDE 5

Noisy Optimization > Optimization criteria for black-box noisy optimization

Black-box Noisy Optimization Framework

f : (x, ω) → f(x, ω) ∈ R, with x in a domain D ⊂ R^d (continuous optimization) and ω a random variable.

Goal: x* = argmin_{x ∈ R^d} E_ω f(x, ω), given access to independent evaluations of f.

Black-box case:
  • do not use any internal property of f
  • access to f(x, ω) only, not ∇f(x)
  • for a given x, the oracle randomly samples ω and returns f(x, ω)
  • for its n-th request, it returns f(x, ω_n)

x → [black box] → f(x, ω)
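The framework above can be sketched in a few lines of Python; the sphere objective ||x||² and the Gaussian noise model below are illustrative assumptions, not the thesis' benchmark:

```python
import random

def noisy_oracle(x, noise_std=1.0):
    """Black-box access: returns f(x, omega), here ||x||^2 plus Gaussian noise.
    Each call draws a fresh omega; the caller never sees gradients."""
    value = sum(xi * xi for xi in x)
    return value + random.gauss(0.0, noise_std)

def estimate_mean(x, n):
    """Monte Carlo estimate of E_omega f(x, omega) from n independent calls."""
    return sum(noisy_oracle(x) for _ in range(n)) / n
```

With many calls the empirical mean approaches E_ω f(x, ω), which is exactly what resampling methods later in the talk exploit.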

5 / 77

SLIDE 6

Noisy Optimization > Optimization criteria for black-box noisy optimization

Optimization criteria: State of the art

Noise-free case: log-linear convergence [Auger, 2005; Rechenberg, 1973]
    log ||x_n − x*|| / n ∼ A′ < 0    (1)

Noisy case: log-log convergence [Fabian, 1967]
    log ||x_n − x*|| / log(n) ∼ A″ < 0    (2)

Figure: y-axis: log ||x_n − x*||; x-axis: #eval for log-linear convergence in the noise-free case, or log #eval for log-log convergence in the noisy case.

6 / 77

SLIDE 7

Noisy Optimization > Optimization criteria for black-box noisy optimization

Optimization criteria: Convergence rates

Slopes for Uniform Rate, Simple Regret¹ [Bubeck et al., 2011] and Cumulative Regret.
x*: the optimum of f; x_n: the n-th evaluated search point; x̃_n: the optimum estimated after the n-th evaluation.

Uniform Rate: UR_n = ||x_n − x*||   (all search points matter)
Simple Regret: SR_n = E_ω f(x̃_n, ω) − E_ω f(x*, ω)   (final recommendation matters)
Cumulative Regret: CR_n = Σ_{j≤n} (E_ω f(x_j, ω) − E_ω f(x*, ω))   (all recommendations matter)

Convergence rates:
    Slope(UR) = lim sup_{n→∞} log(UR_n) / log(n)    (3)
    Slope(SR) = lim sup_{n→∞} log(SR_n) / log(n)    (4)
    Slope(CR) = lim sup_{n→∞} log(CR_n) / log(n)    (5)

¹ Simple Regret = difference between the expected payoff of the recommendation and that of the optimum.

7 / 77

SLIDE 8

Noisy Optimization > Optimization methods

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

8 / 77

SLIDE 9

Noisy Optimization > Optimization methods

Tricks for handling noise:
  • Resampling: average multiple evaluations
  • Large population
  • Surrogate models
  • Specific methods (stochastic gradient descent with finite differences)

Here: focus on resampling.
Resampling number: how many times do we resample the noise?
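As a minimal illustration of the resampling trick, each candidate's fitness can be averaged over r evaluations before comparing; the noisy sphere `noisy_f` and the candidate set are made up for the example:

```python
import random

def noisy_f(x):
    # illustrative noisy objective: E f(x) = x^2, additive unit-variance noise
    return x * x + random.gauss(0.0, 1.0)

def select_best(candidates, resamplings):
    """Average `resamplings` evaluations per candidate and keep the best mean.
    More resamplings -> less chance that noise flips the ranking."""
    means = {x: sum(noisy_f(x) for _ in range(resamplings)) / resamplings
             for x in candidates}
    return min(means, key=means.get)
```

With a single evaluation per point the ranking is often wrong; with enough resamplings the selection becomes reliable.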

9 / 77

SLIDE 10

Noisy Optimization > Optimization methods

Resampling methods: Non-adaptive resampling methods

[Recall] log-log convergence: log ||x_n − x*|| / log(n) ∼ A″ < 0, where n is the evaluation number.

Non-adaptive rules:
  • Exponential rules with ad hoc parameters ⇒ log-log convergence (mathematically proved by us)
  • Other rules as a function of #iter: square-root, linear, polynomial rules
  • Other rules as a function of #iter and dimension
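These non-adaptive rules simply map the iteration index to a resampling count; a sketch with illustrative constants (the thesis' tuned parameters differ, but the `scale` rule follows the ⌈d⁻² exp(4n/(5d))⌉ formula quoted later in the talk):

```python
import math

def resamplings(rule, n, d=2):
    """Non-adaptive resampling number as a function of iteration n
    (and, for the 'scale' rule, of the dimension d)."""
    if rule == "sqrt":
        return math.ceil(math.sqrt(n))
    if rule == "linear":
        return n
    if rule == "exp":                     # exponential rule, base is illustrative
        return math.ceil(1.1 ** n)
    if rule == "scale":                   # depends on #iter and dimension
        return math.ceil(d ** -2 * math.exp(4 * n / (5 * d)))
    raise ValueError(rule)
```

The point of the non-adaptive family is that the schedule is fixed in advance, with no statistics gathered during the run.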

10 / 77

SLIDE 11

Noisy Optimization > Optimization methods

Resampling methods: Adaptive resampling methods

Adaptive rules: Bernstein races [Mnih et al., 2008; Heidrich-Meisner and Igel, 2009]. Here:

FOR each pair of search points x, x′ to be compared DO
  WHILE computation time is not elapsed DO
    1000 resamplings for x and x′
    IF mean(difference) >> std THEN break ENDIF
  ENDWHILE
ENDFOR
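The loop above can be sketched as follows; the batch size and the z-score threshold are illustrative, and this simplified Gaussian test stands in for the Bernstein bounds used in the cited races:

```python
import random
import statistics

def compare_adaptively(f, x, xp, batch=100, max_evals=10**5, z=3.0):
    """Resample f at x and x' until the mean difference clearly exceeds
    its standard error, then return the (apparently) better point."""
    diffs = []
    while len(diffs) < max_evals:
        diffs.extend(f(x) - f(xp) for _ in range(batch))
        mean = statistics.fmean(diffs)
        sem = statistics.stdev(diffs) / len(diffs) ** 0.5
        if abs(mean) > z * sem:            # confident decision: stop early
            return x if mean < 0 else xp   # minimization
    return x if statistics.fmean(diffs) < 0 else xp
```

Easy pairs are decided after one batch; only near-ties consume the full budget, which is the whole benefit of adaptivity.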

11 / 77

SLIDE 12

Noisy Optimization > Optimization methods

Resampling methods: Comparison
  • with Continuous Noisy Optimization (CNO)
  • with Evolution Strategies (ES)
  • with Differential Evolution (DE)

12 / 77

SLIDE 13

Noisy Optimization > Optimization methods

Comparison with CNO

Continuous Noisy Optimization: we propose the Iterative Noisy Optimization Algorithm (INOA) as a general framework for noisy optimization. Key points:
  • a Sampler, which chooses a sampling around the current approximation,
  • an Opt module, which updates the approximation of the optimum,
  • resampling number r_n = B⌈n^β⌉ and sampling step-size σ_n = A/n^α.

Main application: finite-differences sampling + quadratic model.
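A one-dimensional executable sketch of the INOA recipe (shrinking sampling step σ_n = A/n^α, growing resampling number r_n = B⌈n^β⌉); the objective, the gain schedule 1/(2n), and the constants are illustrative assumptions, and the finite-difference gradient stands in for the full quadratic-model Opt module:

```python
import math
import random

def inoa_1d(noisy_f, x0=5.0, iters=60, A=1.0, alpha=0.5, B=10, beta=1.0):
    """INOA-style loop in 1-D: at iteration n, sample at x +/- sigma_n with
    r_n resamplings each, estimate the slope, and update the approximation."""
    x = x0
    for n in range(1, iters + 1):
        sigma = A / n ** alpha                 # sampling step-size
        r = B * math.ceil(n ** beta)           # resampling number
        g_plus = sum(noisy_f(x + sigma) for _ in range(r)) / r
        g_minus = sum(noisy_f(x - sigma) for _ in range(r)) / r
        grad = (g_plus - g_minus) / (2 * sigma)  # finite-difference slope
        x -= grad / (2 * n)                      # illustrative gain schedule
    return x
```

Shrinking σ_n reduces bias while the growing r_n tames the noise amplified by dividing by 2σ_n.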

13 / 77

SLIDE 14

Noisy Optimization > Optimization methods

Comparison with CNO: State of the art and our results

3 types of noise: variance constant, linear or quadratic as a function of the SR:
    Var(f(x, ω)) = O([E_ω f(x, ω) − E_ω f(x*, ω)]^z),  with z ∈ {0, 1, 2}.    (6)

z                   optimized for CR          optimized for SR
                    slope(SR)   slope(CR)     slope(SR)   slope(CR)
0 (constant var)    −1/2        1/2           −2/3        2/3
1 (linear var)      −1          −1            −1          −1
2 (quadratic var)   −∞          −∞            −∞          −∞

References: z = 0: [Fabian, 1967], [Dupač, 1957], [Shamir, 2013]; in special cases, slope(SR) = −1 for ∞-differentiable objectives [Fabian, 1967] and for "quadratic" objectives [Dupač, 1957]. z = 1: [Rolet and Teytaud, 2010]. z = 2: [Jebalia and Auger, 2008].

Table: State of the art: convergence rates. Blue: existing results, which we also achieved. Red: new results by us.

Main application: finite-differences sampling + quadratic model.
Various (new, proved) rates depending on assumptions.
Recovers existing rates (with a same algorithm) and beyond.

14 / 77

SLIDE 15

Noisy Optimization > Optimization methods

Comparison with CNO: Results & Discussion

Our proposed algorithm (provably) reaches the same rate:
  • as the Kiefer-Wolfowitz algorithm, when the noise has constant variance
  • as Bernstein-races optimization algorithms, when the noise variance decreases linearly as a function of the simple regret
  • as Evolution Strategies, when the noise variance decreases quadratically as a function of the simple regret

⇒ no details here; focus on ES and DE.

15 / 77

SLIDE 16

Noisy Optimization > Optimization methods

What about evolutionary algorithms? Experiments with constant noise variance (hard case).

Algorithms:
  • ES + resampling
  • DE + resampling

Results: slope(SR) = −1/2 in both cases (with e.g. rules depending on #iter and dimension).

Figure: Modified function F4 of CEC 2005, dimension 2. x-axis: log(#eval); y-axis: log(SR). Curves: N1.01exp, N1.1exp, N2exp, Nscale.

16 / 77

SLIDE 17

Noisy Optimization > Optimization methods

Resampling methods: Partial conclusion

Conclusion:
  • Adaptation of Newton's algorithm for noisy fitness (∇f and the Hessian Hf approximated by finite differences + resamplings) → leads to fast convergence rates; recovers many rates in one algorithm; generic framework (but no proved application besides the quadratic surrogate model).
  • Non-adaptive methods lead to log-log convergence (math + experiments) in ES.
  • Nscale = ⌈d⁻² exp(4n/(5d))⌉ works (slope(SR) = −1/2) for both ES and DE (nb: −1 possible with large mutation + small inheritance).

In progress: adaptive resampling methods might be merged with bounds on resampling numbers ⇒ unclear benefit for the moment.

17 / 77

SLIDE 18

Noisy Optimization > Optimization methods

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

18 / 77

SLIDE 19

Noisy Optimization > Optimization methods

Variance reduction techniques

Monte Carlo [Hammersley and Handscomb, 1964; Billingsley, 1986]:
    Ê f(x, ω) = (1/n) Σ_{i=1}^n f(x, ω_i) → E_ω f(x, ω).    (7)

Quasi-Monte Carlo [Cranley and Patterson, 1976; Niederreiter, 1992; Wang and Hickernell, 2000; Mascagni and Chi, 2004]: use samples aimed at being as uniform as possible over the domain.

19 / 77

SLIDE 20

Noisy Optimization > Optimization methods

Variance reduction techniques: white-box

Antithetic variates: ensure some regularity of the sampling by using symmetries:
    Ê_ω f(x, ω) = (1/n) Σ_{i=1}^{n/2} (f(x, ω_i) + f(x, −ω_i)).

Importance sampling: instead of sampling ω with density dP, sample ω′ with density dP′:
    Ê_ω f(x, ω) = (1/n) Σ_{i=1}^n [dP(ω_i)/dP′(ω_i)] f(x, ω_i).

Control variates: instead of estimating E_ω f(x, ω), estimate E_ω (f(x, ω) − g(x, ω)), using
    E_ω f(x, ω) = E_ω g(x, ω)  [term A, known]  +  E_ω (f(x, ω) − g(x, ω))  [term B, estimated].
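Antithetic variates, for instance, can be sketched as follows; the standard-Gaussian ω and the test integrand are assumptions for the example:

```python
import random

def plain_mc(g, n):
    """Plain Monte Carlo estimate of E g(omega), omega ~ N(0, 1)."""
    return sum(g(random.gauss(0.0, 1.0)) for _ in range(n)) / n

def antithetic_mc(g, n):
    """Pair each draw omega with -omega; the estimator stays unbiased,
    and the variance drops when g is monotone in omega."""
    total = 0.0
    for _ in range(n // 2):
        w = random.gauss(0.0, 1.0)
        total += g(w) + g(-w)
    return total / (2 * (n // 2))
```

For a linear integrand the antithetic pairs cancel the noise exactly, which is the extreme case of the variance reduction.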

20 / 77

SLIDE 21

Noisy Optimization > Optimization methods

Variance reduction techniques: grey-box

Common random numbers (CRN), or pairing: use the same samples ω_1, ..., ω_n for the whole population x_{n,1}, ..., x_{n,λ}. With Seed_n = {seed_{n,1}, ..., seed_{n,m_n}}, E_ω f(x_{n,k}, ω) is then approximated as
    (1/m_n) Σ_{i=1}^{m_n} f(x_{n,k}, seed_{n,i}).

Different forms of pairing:
  • Seed_n is the same for all n
  • m_n increases, with nested sets Seed_n, i.e. ∀n, i ≤ m_n: m_{n+1} ≥ m_n and seed_{n,i} = seed_{n+1,i}
  • all individuals in an offspring use the same seeds, and seeds are 100% changed between offspring
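A grey-box sketch of CRN/pairing; the seeded objective `f_seeded` is a stand-in, and the point is that shared seeds make the noise cancel exactly when individuals are compared:

```python
import random

def f_seeded(x, seed):
    """Grey-box evaluation: omega is drawn from an explicit seed,
    so the same seed reproduces the same noise realization."""
    rng = random.Random(seed)
    return x * x + rng.gauss(0.0, 1.0)

def paired_means(population, seeds):
    """All individuals share the same seed set, so their noise terms
    are identical and cancel in pairwise comparisons."""
    return {x: sum(f_seeded(x, s) for s in seeds) / len(seeds)
            for x in population}
```

With independent seeds the estimated difference between two individuals would carry the noise of both; with paired seeds it is (here) exact even with few resamplings.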

21 / 77

SLIDE 22

Noisy Optimization > Optimization methods

Pairing: Partial conclusion

No details, just our conclusion:
  • "almost" black-box
  • easy to implement
  • applicable to most applications

On the realistic problem, pairing provided a great improvement. But there are counterexamples in which it is detrimental.

22 / 77

SLIDE 23

Portfolio and noisy optimization > Portfolio: state of the art

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

23 / 77

SLIDE 24

Portfolio and noisy optimization > Portfolio: state of the art

Portfolio of optimization algorithms

Usually: portfolio → combinatorial optimization (SAT competition).
Recently: portfolio → continuous optimization [Baudiš and Pošík, 2014].
This work: portfolio → noisy optimization.
Portfolio = choosing, online, between several algorithms.

24 / 77

SLIDE 25

Portfolio and noisy optimization > Relationship between portfolio and noisy optimization

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

25 / 77

SLIDE 26

Portfolio and noisy optimization > Relationship between portfolio and noisy optimization

Why portfolio in Noisy Optimization?
  • stochastic problem
  • limited budget (time or total number of evaluations)
  • target: anytime convergence to the optimum
  • black-box

How to choose a suitable solver?²

² Image from http://ethanclements.blogspot.fr/2010/12/postmodernism-essay-question.html

26 / 77

SLIDE 27

Portfolio and noisy optimization > Relationship between portfolio and noisy optimization

Why portfolio in Noisy Optimization?
  • stochastic problem
  • limited budget (time or total number of evaluations)
  • target: anytime convergence to the optimum
  • black-box

How to choose a suitable solver?²

Algorithm Portfolios: select automatically the best in a finite set of solvers.

² Image from http://ethanclements.blogspot.fr/2010/12/postmodernism-essay-question.html

26 / 77

SLIDE 28

Portfolio and noisy optimization > Portfolio of noisy optimization methods

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

27 / 77

SLIDE 29

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Portfolio of noisy optimization methods: proposal

  • A finite number of given noisy optimization solvers, "orthogonal"
  • Unfair distribution of budget
  • Information sharing (not very helpful here...)

→ Performs almost as well as the best solver.

28 / 77

SLIDE 30

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Portfolio of noisy optimization methods: NOPA

Algorithm 1 Noisy Optimization Portfolio Algorithm (NOPA).

 1: Input: noisy optimization solvers Solver_1, Solver_2, ..., Solver_M
 2: Input: a lag function LAG : N+ → N+
 3: Input: a non-decreasing integer sequence r_1, r_2, ...        ▷ periodic comparisons
 4: Input: a non-decreasing integer sequence s_1, s_2, ...        ▷ number of resamplings
 5: n ← 1                                                          ▷ number of selections
 6: m ← 1                                                          ▷ NOPA's iteration number
 7: i* ← null                                                      ▷ index of recommended solver
 8: x* ← null                                                      ▷ recommendation
 9: while budget is not exhausted do
10:   if m ≥ r_n then
11:     i* = argmin_{i ∈ {1,...,M}} Ê_{s_n}[f(x̃_{i,LAG(r_n)})]     ▷ algorithm selection
12:     n ← n + 1
13:   else
14:     for i ∈ {1, ..., M} do
15:       apply one evaluation for Solver_i
16:     end for
17:     m ← m + 1
18:   end if
19:   x* = x̃_{i*,m}                                               ▷ update recommendation
20: end while
29 / 77

SLIDE 31

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Portfolio of noisy optimization methods: compare solvers early

Lag function: LAG(n) ≤ n.
∀i ∈ {1, ..., M}: x_{i,LAG(n)} = or ≠ x_{i,n}.

30 / 77

SLIDE 32

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Portfolio of noisy optimization methods: compare solvers early

Lag function: LAG(n) ≤ n.
∀i ∈ {1, ..., M}: x_{i,LAG(n)} = or ≠ x_{i,n}.

Why this lag?
  • algorithms' ranking is usually stable → no use comparing the very last points
  • it is much cheaper to compare old points:
    comparing good (i.e. recent) points → comparing points with similar fitness, and comparing points with similar fitness is very expensive

30 / 77

SLIDE 33

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Theorem with fair budget distribution

Assume that each solver i ∈ {1, ..., M} has simple regret
    SR_{i,n} = (1 + o(1)) C_i / n^{α_i}   (as usual)
and constant noise variance. Then for some universal r_n, s_n, LAG_n, almost surely there exists n_0 such that, for n ≥ n_0:
  • the portfolio always chooses an optimal solver (optimal α_i and C_i);
  • the portfolio uses ≤ M · r_n (1 + o(1)) evaluations ⇒ M times more than the best solver.

Interpretation:
  • negligible comparison budget (thanks to the lag)
  • on classical log-log graphs, the portfolio should perform similarly to the best solver, within the log(M) shift (proved)

31 / 77

SLIDE 34

Portfolio and noisy optimization > Portfolio of noisy optimization methods

INOPA: introducing an unfair budget

NOPA: same budget for all solvers. Remark:
  • we compare old recommendations (LAG(n) << n)
  • they were known long ago, before spending all this budget
  • therefore, except for the selected solver, most of the budget is wasted :(

⇒ Lazy evaluation paradigm: evaluate f(·) only when you need it for your output.
⇒ Improved NOPA (INOPA): unfair budget distribution.
  • Use only LAG(r_n) evaluations (negligible) on the sub-optimal solvers (INOPA).
  • log(M′) shift, with M′ the number of optimal solvers (proved).

32 / 77

SLIDE 35

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Experiments: Unimodal case

Noisy Optimization Algorithms (NOAs):
  • SA-ES: Self-Adaptive Evolution Strategy
  • Fabian's algorithm: a first-order method using gradients estimated by finite differences [Dvoretzky et al., 1956; Fabian, 1967]
  • Noisy Newton's algorithm: a second-order method using a Hessian matrix also approximated by finite differences (our contribution in CNO)

Solvers       z = 0 (constant var)   z = 1 (linear var)   z = 2 (quadratic var)
RSAES         .114 ± .002            .118 ± .003          .113 ± .003
Fabian1       −.838 ± .003           −1.011 ± .003        −1.016 ± .003
Fabian2       .108 ± .003            −1.339 ± .003        −2.481 ± .003
Newton        −.070 ± .003           −.959 ± .092         −2.503 ± .285
NOPA no lag   −.377 ± .048           −.978 ± .013         −2.106 ± .003
NOPA          −.747 ± .003           −.937 ± .005         −2.515 ± .095
INOPA         −.822 ± .003           −1.359 ± .027        −3.528 ± .144

Table: Slope(SR) for f(x) = ||x||² + ||x||^z N in dimension 15. Computation time = 40s.

33 / 77

SLIDE 36

Portfolio and noisy optimization > Portfolio of noisy optimization methods

Experiments: Stochastic unit commitment problem

Solver    d = 45         d = 63         d = 105        d = 125
RSAES     .485 ± .071    .870 ± .078    .550 ± .097    .274 ± .097
Fabian1   1.339 ± .043   1.895 ± .040   1.075 ± .047   .769 ± .047
Fabian2   .394 ± .058    .521 ± .083    .436 ± .097    .307 ± .097
Newton    .749 ± .101    1.138 ± .128   .590 ± .147    .312 ± .147
INOPA     .394 ± .059    .547 ± .080    .242 ± .101    .242 ± .101

Table: Stochastic unit commitment problem (minimization). Computation time = 320s.

What's more: given the same budget, an INOPA of identical solvers can outperform its mono-solvers.

34 / 77

SLIDE 37

Portfolio and noisy optimization > Conclusion

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

35 / 77

SLIDE 38

Portfolio and noisy optimization > Conclusion

Portfolio and noisy optimization: Conclusion

Main conclusion: portfolios are also great in noisy optimization (because in noisy optimization, with lag, the comparison cost is small).
  • We show mathematically and empirically a log(M) shift when using M solvers, on a classical log-log scale.
  • Bound improved to a log(M′) shift, with M′ = number of optimal solvers, with unfair distribution of budget (INOPA).

36 / 77

SLIDE 39

Portfolio and noisy optimization > Conclusion

Portfolio and noisy optimization: Conclusion

Main conclusion: portfolios are also great in noisy optimization (because in noisy optimization, with lag, the comparison cost is small).
  • We show mathematically and empirically a log(M) shift when using M solvers, on a classical log-log scale.
  • Bound improved to a log(M′) shift, with M′ = number of optimal solvers, with unfair distribution of budget (INOPA).

Take-home messages:
  • portfolio = little overhead
  • unfair budget = no overhead if "orthogonal" portfolio (orthogonal → M′ = 1)
  • we mathematically confirmed the idea of orthogonality found in [Samulowitz and Memisevic, 2007]

36 / 77

SLIDE 40

Adversarial portfolio > Adversarial bandit

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

37 / 77

SLIDE 41

Adversarial portfolio > Adversarial bandit

Framework: Zero-sum matrix games

Game defined by matrix M:
  • I choose (privately) i
  • Simultaneously, you choose (privately) j
  • I earn M_{i,j}; you earn −M_{i,j}
So this is zero-sum.

Figure: 0-sum matrix game.

           rock   paper   scissors
rock       0.5    0       1
paper      1      0.5     0
scissors   0      1       0.5

Table: Example of a 1-sum matrix game: Rock-paper-scissors (row player's payoff).

38 / 77

SLIDE 42

Adversarial portfolio > Adversarial bandit

Framework: Nash Equilibrium (NE)

Definition (Nash Equilibrium). Zero-sum matrix game M. My strategy = probability distribution x on the rows; your strategy = probability distribution y on the columns; expected reward = x^T M y. There exist x*, y* such that
    ∀x, y:  x^T M y* ≤ x*^T M y* ≤ x*^T M y.    (8)
(x*, y*) is a Nash Equilibrium (no uniqueness).

Definition (Approximate ε-Nash Equilibria). (x*, y*) such that
    x^T M y* − ε ≤ x*^T M y* ≤ x*^T M y + ε.    (9)

Example: the NE of Rock-paper-scissors is unique: (1/3, 1/3, 1/3).
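The defining inequalities can be checked numerically through the exploitability of a strategy pair, which is zero exactly at a Nash equilibrium; the rock-paper-scissors matrix below uses the zero-sum ±1 convention rather than the slide's 1-sum table:

```python
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]  # row player's payoff, zero-sum convention

def exploitability(M, x, y):
    """Worst-case gain either player obtains by deviating from (x, y)
    in the zero-sum matrix game M; it is 0 iff (x, y) is a Nash equilibrium."""
    K = len(M)
    value = sum(x[i] * M[i][j] * y[j] for i in range(K) for j in range(K))
    best_row = max(sum(M[i][j] * y[j] for j in range(K)) for i in range(K))
    best_col = min(sum(x[i] * M[i][j] for i in range(K)) for j in range(K))
    return max(best_row - value, value - best_col)
```

For the uniform pair (1/3, 1/3, 1/3) the exploitability vanishes, matching the uniqueness claim on the slide; any pure strategy is exploitable.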

39 / 77

SLIDE 43

Adversarial portfolio > Adversarial bandit

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

40 / 77

SLIDE 44

Adversarial portfolio > Adversarial bandit

Methods for computing Nash Equilibrium

Algorithm                            Complexity           Exact?  Confidence  Time
LP [von Stengel, 2002]               O(K^α), α > 6        yes     1           constant
[Grigoriadis and Khachiyan, 1995]    O(K log(K)/ε²)       no      1           random
[Grigoriadis and Khachiyan, 1995]    O(log²(K)/ε²)        no      1           random, with K/log(K) processors
EXP3 [Auer et al., 1995]             O(K log(K)/ε²)       no      1 − δ       constant
INF [Audibert and Bubeck, 2009]      O(K log(K)/ε²)       no      1 − δ       constant
Our algorithm                        O(k^{3k} K log K)    yes     1 − δ       constant (if NE is k-sparse)

Table: State of the art for computing a Nash Equilibrium of a zero-sum matrix game M_{K×K}.

41 / 77

SLIDE 45

Adversarial portfolio > Adversarial bandit

Adversarial bandit algorithm Exp3.P

Algorithm 2 Exp3.P: variant of Exp3. η and γ are two parameters.

 1: Input: η ∈ R                                  ▷ how much the distribution becomes peaked
 2: Input: γ ∈ (0, 1]                             ▷ exploration rate
 3: Input: a time horizon (computational budget) T ∈ N+ and the number of arms K ∈ N+
 4: Output: a Nash-optimal policy p
 5: y ← 0
 6: for i ← 1 to K do                             ▷ initialization
 7:   ω_i ← exp((ηγ/3) · sqrt(T/K))
 8: end for
 9: for t ← 1 to T do
10:   for i ← 1 to K do
11:     p_i ← (1 − γ) · ω_i / Σ_{j=1}^K ω_j + γ/K
12:   end for
13:   generate i_t according to (p_1, p_2, ..., p_K)
14:   compute reward R_{i_t,t}
15:   for i ← 1 to K do
16:     if i == i_t then
17:       R̂_i ← R_{i_t,t} / p_i
18:     else
19:       R̂_i ← 0
20:     end if
21:     ω_i ← ω_i · exp((γ/(3K)) · (R̂_i + η/(p_i · sqrt(TK))))
22:   end for
23: end for
24: Return the probability distribution (p_1, p_2, ..., p_K)

42 / 77

SLIDE 46

Adversarial portfolio > Contribution for computing Nash Equilibrium

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

43 / 77

SLIDE 47

Adversarial portfolio > Contribution for computing Nash Equilibrium

Sparse Nash Equilibria (1/2)

Considering x* a Nash-optimal policy for a zero-sum matrix game M_{K×K}: let us assume that x* is unique and has at most k non-zero components (sparsity). Let us show that x* is "discrete" (remark: a Nash equilibrium is a solution of a linear programming problem):

⇒ x* is also the NE of a k × k submatrix M′_{k×k}
⇒ x* = solution of an LP in dimension k
⇒ x* = solution of k linear equations with coefficients in {−1, 0, 1}
⇒ x* = inverse matrix × vector
⇒ x* = obtained by "cofactors / determinant of the matrix"
⇒ x* has denominator at most k^{k/2}, by the Hadamard determinant bound [Hadamard, 1893], [Brenner and Cummings, 1972]

44 / 77

SLIDE 48

Adversarial portfolio > Contribution for computing Nash Equilibrium

Sparse Nash Equilibria (2/2)

Computation of sparse Nash Equilibria, under the assumption that the Nash equilibrium is sparse:
  • x* is rational with a "small" denominator (previous slide!)
  • so let us compute an ε-Nash equilibrium (with ε small enough!) in sublinear time
  • and let us compute its closest approximation with a "small denominator" (Hadamard)

Two new algorithms for exact Nash:
  • Rounding-EXP3: switch to the closest approximation
  • Truncation-EXP3: remove small components and work on the remaining submatrix (exact solving)

(Requested precision ≃ k^{−3k/2} only ⇒ complexity k^{3k} K log K.)
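The sparsification step of Truncation-EXP3 can be sketched as follows; only the thresholding and renormalization are shown, and the choice of threshold plus the exact re-solve of the induced submatrix follow the thesis:

```python
def truncate_strategy(x, threshold):
    """Truncation step: zero out components of the approximate NE that fall
    below `threshold`, then renormalize the surviving support.
    (Truncation-EXP3 then solves the induced submatrix exactly.)"""
    kept = [xi if xi >= threshold else 0.0 for xi in x]
    total = sum(kept)
    return [xi / total for xi in kept]
```

After truncation the support has at most k components, so the exact solve runs on a k × k submatrix instead of the full K × K game.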

45 / 77

SLIDE 49

Adversarial portfolio > Contribution for computing Nash Equilibrium

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

46 / 77

SLIDE 50

Adversarial portfolio > Contribution for computing Nash Equilibrium

Our proposal: Parameter-free adversarial bandit

No details here; in short:
  • we compare various existing parametrizations of EXP3
  • we select the best
  • we add sparsity as follows: for a budget of T rounds of EXP3,
        threshold = max_{i ∈ {1,...,m}} (T x_i)^α / T

⇒ we get a parameter-free bandit for adversarial problems.

47 / 77

SLIDE 51

Adversarial portfolio > Application to robust optimization (power systems)

1. Motivation
2. Noisy Optimization
  • Optimization criteria for black-box noisy optimization
  • Optimization methods: resampling methods, pairing
3. Portfolio and noisy optimization
  • Portfolio: state of the art
  • Relationship between portfolio and noisy optimization
  • Portfolio of noisy optimization methods
  • Conclusion
4. Adversarial portfolio
  • Adversarial bandit: adversarial framework, state of the art
  • Contribution for computing Nash Equilibrium: sparsity (sparse NE can be computed faster); parameter-free adversarial bandit for large-scale problems
  • Application to robust optimization (power systems)
  • Application to games
  • Conclusion
5. Conclusion

48 / 77

slide-52
SLIDE 52

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Diagram: scenarios and policies feed a simulator, which outputs average performance, robustness, and average cost.

Scenarios s, policies k, reward R(k, s). Examples of scenario: CO2 penalization, gas curtailment in Eastern Europe, technological breakthrough. Examples of policy: massive nuclear power plant building, massive renewable energies, maintaining a connection, creating a new connection, ...

49 / 77

slide-53
SLIDE 53

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Nash-planning for scenario-based decision making

Decision tools

METHOD              | EXTRACTION OF POLICIES | EXTRACTION OF CRITICAL SCENARIOS | COMPUTATIONAL COST       | INTERPRETATION
Wald                | One                    | One per policy                   | K × S                    | Nature decides later, minimizing our reward
Savage              | One                    | One per policy                   | K × S                    | Nature decides later, maximizing our regret
Scenarios           | Handcrafted            | Handcrafted                      | K′ × S′                  | Human expertise
Nash (our proposal) | Nash-optimal           | Nash-optimal                     | (K + S) × log(K + S) (*) | Nature decides privately, before us

Table: Comparison between several tools for decision under uncertainty. K = |K| and S = |S|. (*) improved if sparse, by our previous result! ⇒ in this case sparsity performs very well.

Nash ⇒ fast selection of scenarios and options: sparsity both speeds up the NE computation and makes the output more readable (smaller matrix).

50 / 77


slide-55
SLIDE 55

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Application to power investment problem: Testcase and parameterization

We consider (big toy problem):
- 3^10 = 59049 investment policies (k)
- 3^9 = 19683 scenarios (s)
- reward: (k, s) → R(k, s)
We use Nash equilibria, for their principled nature (Nature decides first and privately! that's reasonable, right?) and their low computational cost in large-scale settings. We compute the equilibria thanks to EXP3 (tuned)... with sparsity, for improving the precision and reducing the number of pure strategies in our recommendation (unreadable matrix otherwise!).

51 / 77

slide-56
SLIDE 56

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Application to power investment problem: Sparse-Nash algorithm

Algorithm 3 The Sparse-Nash algorithm for solving decision-under-uncertainty problems.
  Input: a family K of possible decisions k (investment policies).
  Input: a family S of scenarios s.
  Input: a mapping (k, s) → R_{k,s}, providing the rewards.
  Run truncated Exp3.P on R; get a probability distribution on K (support = key options) and a probability distribution on S (support = critical scenarios).
  Emphasize the policy with the highest probability.
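The pipeline above can be sketched as follows, with fictitious play standing in for truncated Exp3.P (a deliberate simplification: fictitious play also converges on zero-sum games, but it is not the thesis algorithm); the supports and the emphasized policy are read off the empirical mixtures, and `eps` is an illustrative support cutoff.

```python
def fictitious_play(R, T):
    """Approximate a Nash equilibrium of the zero-sum game R (row player
    maximizes R[k][s], column player minimizes) by fictitious play."""
    K, S = len(R), len(R[0])
    row_counts, col_counts = [0] * K, [0] * S
    i, j = 0, 0
    for _ in range(T):
        row_counts[i] += 1
        col_counts[j] += 1
        # each player best-responds to the opponent's empirical mixture
        i = max(range(K), key=lambda a: sum(col_counts[b] * R[a][b] for b in range(S)))
        j = min(range(S), key=lambda b: sum(row_counts[a] * R[a][b] for a in range(K)))
    p = [c / T for c in row_counts]
    q = [c / T for c in col_counts]
    return p, q

def sparse_nash_report(R, T=2000, eps=0.05):
    """Sparse-Nash output: key options, critical scenarios, emphasized policy."""
    p, q = fictitious_play(R, T)
    key_options = [k for k, pk in enumerate(p) if pk > eps]
    critical_scenarios = [s for s, qs in enumerate(q) if qs > eps]
    best_policy = max(range(len(p)), key=lambda k: p[k])
    return key_options, critical_scenarios, best_policy
```

On a matrix with a dominated policy, that policy drops out of the key options automatically, which is the readability benefit the slide claims.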

52 / 77

slide-57
SLIDE 57

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Application to power investment problem: Results

Average sparsity level over 3^10 = 59049 arms:

α    | T=K        | T=10K       | T=50K        | T=100K      | T=500K      | T=1000K
0.1  | 13804 ± 52 | non-sparse  | non-sparse   | non-sparse  | non-sparse  | non-sparse
0.3  | 2810 ± 59  | non-sparse  | non-sparse   | non-sparse  | non-sparse  | non-sparse
0.5  | 396 ± 16   | non-sparse  | non-sparse   | 59049 ± 197 | 49819 ± 195 | non-sparse
0.7  | 43 ± 3     | 58925 ± 27  | 55383 ± 1507 | 46000 ± 278 | 9065 ± 160  | non-sparse
0.9  | 4 ± 0      | 993 ± 64    | 797 ± 42     | 504 ± 25    | 98 ± 5      | 52633 ± 523
0.99 | 1 ± 0      | 2 ± 0       | 3 ± 0        | 2 ± 0       | 2 ± 0       | 7 ± 1

Robust score (worst reward against pure strategies):

α    | T=K       | T=10K     | T=50K     | T=100K    | T=500K    | T=1000K
NT   | 4.922e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.1  | 4.948e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.3  | 5.004e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.221e-01 | 4.938e-01
0.5  | 5.059e-01 | 4.928e-01 | 4.956e-01 | 4.991e-01 | 5.242e-01 | 4.938e-01
0.7  | 5.054e-01 | 4.928e-01 | 4.965e-01 | 5.031e-01 | 5.317e-01 | 4.938e-01
0.9  | 4.281e-01 | 5.137e-01 | 5.151e-01 | 5.140e-01 | 5.487e-01 | 4.960e-01
0.99 | 3.634e-01 | 4.357e-01 | 4.612e-01 | 4.683e-01 | 5.242e-01 | 5.390e-01
Pure | 3.505e-01 | 3.946e-01 | 4.287e-01 | 4.489e-01 | 5.143e-01 | 4.837e-01

Table: Average sparsity level and robust score. α is the truncation parameter. T is the budget.

53 / 77


slide-59
SLIDE 59

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to robust optimization (power systems)

Application to power investment problem: summary

Define long-term scenarios (plenty!). Build a simulator R(k, s). Classical solution (Savage):

  min_{k ∈ K} max_{s ∈ S} regret(k, s)

Our proposal (Nash): automatically select a submatrix. Our proposed tool has the following advantages:
- Natural extraction of interesting policies and critical scenarios: α = 0.7 provides stable (and proved) results, and the extracted submatrix becomes easily readable (small enough) with larger values of α.
- Faster than the Wald or Savage methodologies.
Take-home messages: we get a fast criterion, faster than Wald's or Savage's criteria, with a natural interpretation, and more readable ⇒ but a stochastic recommendation!
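For contrast, the classical Savage criterion can be computed directly; a minimal sketch (R is the reward matrix, rows = decisions k, columns = scenarios s; names are illustrative).

```python
def savage_choice(R):
    """Savage criterion: regret(k, s) = max_k' R[k'][s] - R[k][s];
    pick the decision k minimizing the worst-case regret over scenarios s."""
    n_scenarios = len(R[0])
    # best achievable reward in each scenario
    best = [max(row[s] for row in R) for s in range(n_scenarios)]
    def worst_regret(row):
        return max(best[s] - row[s] for s in range(n_scenarios))
    return min(range(len(R)), key=lambda k: worst_regret(R[k]))
```

Note the K × S cost of the table scan, versus the (K + S) × log(K + S) cost claimed for the Nash approach.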

54 / 77


slide-61
SLIDE 61

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Two parts: Seeds matter: **choose** your seeds ! More tricky but worth the effort: position-specific seeds ! (towards a better asymptotic behavior of MCTS ?)

56 / 77

slide-62
SLIDE 62

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Optimizing random seeds: Correlations

Figure: Success rate per seed (ranked) in 5x5 Domineering, with standard deviations on y-axis: the seed has a significant impact.

Fact: the random seed matters !

57 / 77

slide-63
SLIDE 63

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Optimizing random seeds: State-of-the-art

Stochastic algorithms randomly select their pseudo-random seed. We propose to choose the seed(s), and to combine them. State of the art for combining random seeds:
- [Nagarajan et al., 2015] combines several AIs
- [Gaudel et al., 2010] uses Nash methods for combining several opening books
- [Saint-Pierre and Teytaud, 2014] constructs several AIs from a single stochastic one and combines them by the BestSeed and Nash approaches

58 / 77

slide-64
SLIDE 64

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Trick: present results with one white seed per column and one black seed per row: a K × K matrix M, where entry M_{i,j} is the outcome when Black uses seed i and White uses seed j; the row player gets M_{i,j} and the column player gets 1 − M_{i,j}.

Figure: One black seed per row, one white seed per column.

59 / 77

slide-65
SLIDE 65

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Propositions: Nash & BestSeed

Nash: combine the rows with a Nash mixture (more robust; we will see later). BestSeed: just pick the best row / best column.
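A minimal sketch of the BestSeed rule on the outcome matrix M (rows = Black seeds, columns = White seeds); "best" is taken here as best average outcome against the sampled opponent seeds, which is one natural reading of "best row / best column", not necessarily the thesis's exact criterion.

```python
def best_seed(M):
    """BestSeed: for Black (rows), keep the seed with the highest total outcome;
    for White (columns), the seed with the lowest total outcome (White wants
    to minimize Black's result, since White's score is 1 - M[i][j])."""
    K = len(M)
    best_row = max(range(K), key=lambda i: sum(M[i]))
    best_col = min(range(len(M[0])), key=lambda j: sum(M[i][j] for i in range(K)))
    return best_row, best_col
```

This is the "overfittable" option of the talk: a single deterministic seed per side, which an exploiter can learn to beat, whereas the Nash mixture randomizes over rows.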

60 / 77

slide-66
SLIDE 66

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Better than square matrices: rectangle methods

Remark: for choosing a row, if #rows = #cols, then #rows is more critical than #cols; so, for a given budget, increase #rows and decrease #cols (same budget!).

Figure: Left: square Kt × Kt matrix of a game; right: K × Kt rectangles of a game (K ≫ Kt).

61 / 77

slide-67
SLIDE 67

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Does it work? Experiments on Domineering

The opponent uses seeds which have never been used during the learning of the portfolio (cross-validation).

Figure: Results for Domineering, with the BestSeed (left) and the Nash (right) approach, against the baseline (K′ = 1) and the exploiter (K′ > 1; an opponent who "learns" very well). Kt = 900 in all experiments.

BestSeed performs well against the original algorithm (K′ = 1), but poorly against the exploiter (K′ > 1). Nash outperforms the original algorithm both for K′ = 1 (all cases) and K′ > 1 (most cases).

62 / 77

slide-68
SLIDE 68

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Beyond cross-validation: experiments with transfer in the game of Go

Learning: BestSeed is applied to GnuGo, with MCTS and a budget of 400 simulations. Test: against "classical" GnuGo, i.e. the non-MCTS version of GnuGo.

Opponent                 | Performance of BestSeed | Performance with randomized seed
GnuGo-classical level 1  | 1. (± 0)                | .995 (± 0)
GnuGo-classical level 2  | 1. (± 0)                | .995 (± 0)
GnuGo-classical level 3  | 1. (± 0)                | .99 (± 0)
GnuGo-classical level 4  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 5  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 6  | 1. (± 0)                | 1. (± 0)
GnuGo-classical level 7  | .73 (± .013)            | .061 (± .004)
GnuGo-classical level 8  | .73 (± .013)            | .106 (± .006)
GnuGo-classical level 9  | .73 (± .013)            | .095 (± .006)
GnuGo-classical level 10 | .73 (± .013)            | .07 (± .004)

Table: Performance of "BestSeed" and "randomized seed" against "classical" GnuGo.

Previous slide: we win against the AI against which we trained (but with different seeds!). This slide: we improve the winning rate against another AI.

63 / 77

slide-69
SLIDE 69

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Optimizing random seeds: Partial conclusion

Conclusion:
- Seed optimization (NOT position-specific) can be seen as a simple and effective tool for building an opening book with no development effort, no human expertise, no storage of databases.
- "Rectangle" provides significant improvements.
- The online computational overhead of the methods is negligible.
- The boosted AIs significantly outperform the baselines.
- BestSeed performs well, but can be overfitted ⇒ strength of Nash.
Further work: the use of online bandit algorithms for dynamically choosing K/Kt.
Note: the BestSeed and Nash algorithms are not new. The algorithm and analysis of rectangles is new. The analysis of the impact of seeds is new. The applications to Domineering, Atari-Go and Breakthrough are new.

64 / 77

slide-70
SLIDE 70

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Two parts: Seeds matter: **choose** your seeds ! More tricky but worth the effort: position-specific seeds ! (towards a better asymptotic behavior of MCTS ?)

65 / 77


slide-72
SLIDE 72

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Optimizing position-based random seeds: Tsumego

Tsumego (by Yoji Ojima, Zen's author). Input: a Go position. Question: is this situation a win for White? Output: yes or no. Why so important?
- At the heart of many game algorithms.
- In Go, EXPTIME-complete [Robson, 1983].

66 / 77


slide-74
SLIDE 74

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Classical algorithms

Monte Carlo (MC) [Bruegmann, 1993, Cazenave, 2006, Cazenave and Borsboom, 2007] Monte Carlo Tree Search (MCTS) [Bouzy, 2004, Coulom, 2006] Nested MC [Cazenave, 2009] Voting scheme among MCTS [Gavin et al., ] ⇒ here weighted voting scheme among MCTS

67 / 77

slide-75
SLIDE 75

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Evaluation of the game value

Algorithm 4 Evaluation of the game value.
 1: Input: current state s
 2: Input: a policy πB for Black, depending on a seed in N+
 3: Input: a policy πW for White, depending on a seed in N+
 4: for i ∈ {1, …, K} do
 5:   for j ∈ {1, …, K} do
 6:     M_{i,j} ← outcome of the game starting in s, with πB playing as Black with seed b(i) and πW playing as White with seed w(j)
 7:   end for
 8: end for
 9: Compute weights p for Black and q for White for the matrix M (either BestSeed, Nash, or other)
10: Return p^T M q   ◮ approximate value of the game M
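The returned estimate p^T M q is a plain bilinear form; a minimal sketch (uniform p and q recover the paired MC average of the matrix, one-hot p and q recover a single BestSeed entry).

```python
def weighted_game_value(M, p, q):
    """p^T M q: estimate of the game value from the K x K seed-outcome
    matrix M, with weights p over Black seeds (rows) and q over White
    seeds (columns)."""
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(M))
               for j in range(len(M[0])))
```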

68 / 77

slide-76
SLIDE 76

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Classical case (MC/MCTS): unpaired Monte Carlo averaging

Figure: Left: unpaired case (classical estimate by averaging), with K·K random seeds b(1), …, b(K·K) for Black and K·K random seeds w(1), …, w(K·K) for White; right: paired case, K seeds vs K seeds, one black seed per row and one white seed per column (the row player gets M_{i,j}, the column player gets 1 − M_{i,j}).

69 / 77


slide-79
SLIDE 79

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Experiments: Applied methods and setting

Compared methods for approximating v(s). Three methods use K² independent batches of M MCTS simulations, using the matrix of seeds:
- Nash reweighting = Nash value
- BestSeed reweighting = intersection of best row / best column
- Paired MC estimate = average of the matrix
One unpaired method: classical MC estimate (the average of K² random MCTS runs). Baseline: a single long MCTS (= state of the art!) → the only one which is not K²-parallel.
Parameter setting: GnuGo-MCTS [Bayer et al., 2008]; setting A: 1 000 simulations per move; setting B: 80 000 simulations per move.

70 / 77

slide-80
SLIDE 80

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Experiments: Average results over 50 Tsumego problems

Figure: Average over 50 Tsumego problems, for (a) setting A (1 000 simulations per move) and (b) setting B (80 000 simulations per move). x-axis: submatrix size / #simulations; y-axis: % correct answers (0.5 to 0.9). Curves: Nash, Paired, Best, Unpaired, MCTS(1), where MCTS(1) is one single MCTS run using all the budget.

Setting A (small budget): MCTS(1) outperforms the weighted average of 81 MCTS runs (but we are more parallel!). Setting B (large budget): we outperform MCTS and all the others by far ⇒ consistent with the limited scalability of MCTS for a huge number of simulations.

71 / 77


slide-83
SLIDE 83

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Application to games

Optimizing position-based random seeds: Partial conclusion

Main conclusion: a novel way of evaluating game values using Nash equilibria (theoretical validation & experiments on 50 Tsumego problems). The Nash or BestSeed predictor requires far fewer simulations for finding accurate results, and is sometimes consistent whereas the original MC is not! We outperformed both the average of MCTS runs sharing the budget and a single MCTS using all the budget → for M large enough, our weighted averaging of 81 single MCTS runs with M simulations is better than an MCTS run with 81M simulations :)
Take-home messages: we classify positions ("black wins" vs "white wins") using a WEIGHTED average of K² MCTS runs of M simulations. Our approach outperforms all tested voting schemes among K² MCTS estimates of M simulations, and a pure MCTS of K² × M simulations, when M is large and K² = 81.

72 / 77



slide-86
SLIDE 86

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Adversarial portfolio Conclusion

A work on sparsity, at the core of ZSMG (zero-sum matrix games). A parameter-free adversarial bandit, obtained by tuning (no details provided in this talk) + sparsity. Applications of ZSMG:
- Nash + sparsity → faster and more readable robust decision making
- Random seeds = new MCTS variants?
  - validated as opening-book learning (Go, Atari-Go, Domineering, Breakthrough, Draughts, Phantom-Go, ...)
  - position-specific seeds validated on Tsumego

74 / 77



slide-92
SLIDE 92

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Conclusion

Conclusion & Further work

Noisy optimization:
- An algorithm recovering most (but not all: Fabian's rate!) existing results, extended to other surrogate models.
- ES/DE with resamplings have good rates for linear/quadratic variance and/or robust criteria (UR); for other cases, resamplings are not sufficient for optimal rates ("mutate large, inherit small" + huge population and/or surrogate models...).
Portfolio:
- Application to noisy optimization; great benefits with several solvers of a given model.
- Towards wider applications: portfolio of models?
Adversarial portfolio: successful use of sparsity; parameter-free bandits?
MCTS and seeds: room for 5 Ph.D.s ... if there is funding for it :-)
Most works here → ROBUSTNESS by COMBINATION (robust to solvers, to models, to parameters, to seeds ...)

76 / 77

slide-93
SLIDE 93

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Conclusion

Thanks for your attention! Thanks to all the collaborators from Artelys, INRIA, CNRS, Univ. Paris-Saclay, Univ. Paris-Dauphine, Univ. du Littoral, NDHU, ...

77 / 77

slide-94
SLIDE 94

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references I

Audibert, J.-Y. and Bubeck, S. (2009). Minimax policies for adversarial and stochastic bandits. In Proceedings of the Annual Conference on Learning Theory (COLT).
Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (1995). Gambling in a rigged casino: the adversarial multi-armed bandit problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pages 322–331. IEEE Computer Society Press, Los Alamitos, CA.
Auger, A. (2005). Convergence results for the (1, λ)-SA-ES using the theory of φ-irreducible Markov chains. Theoretical Computer Science, 334(1):35–69.
Baudiš, P. and Pošík, P. (2014). Online black-box algorithm portfolios for continuous optimization. In Parallel Problem Solving from Nature – PPSN XIII, pages 40–49. Springer.

78 / 77

slide-95
SLIDE 95

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references II

Bayer, A., Bump, D., Daniel, E. B., Denholm, D., Dumonteil, J., Farnebäck, G., Pogonyshev, P., Traber, T., Urvoy, T., and Wallin, I. (2008). GNU Go 3.8 documentation. Technical report, Free Software Foundation.
Billingsley, P. (1986). Probability and Measure. John Wiley and Sons.
Bouzy, B. (2004). Associating shallow and selective global tree search with Monte Carlo for 9x9 Go. In 4th Computer and Games Conference, Ramat-Gan.
Brenner, J. and Cummings, L. (1972). The Hadamard maximum determinant problem. Amer. Math. Monthly, 79:626–630.
Bruegmann, B. (1993). Monte-Carlo Go. Unpublished draft, http://www.althofer.de/bruegmann-montecarlogo.pdf.

79 / 77

slide-96
SLIDE 96

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references III

Bubeck, S., Munos, R., and Stoltz, G. (2011). Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science, 412(19):1832–1852.
Cazenave, T. (2006). A Phantom-Go program. In van den Herik, H. J., Hsu, S.-C., Hsu, T.-S., and Donkers, H. H. L. M., editors, Proceedings of Advances in Computer Games, volume 4250 of Lecture Notes in Computer Science, pages 120–125. Springer.
Cazenave, T. (2009). Nested Monte-Carlo search. In Boutilier, C., editor, IJCAI, pages 456–461.
Cazenave, T. and Borsboom, J. (2007). Golois wins Phantom Go tournament. ICGA Journal, 30(3):165–166.

80 / 77

slide-97
SLIDE 97

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references IV

Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In Ciancarini, P. and van den Herik, H. J., editors, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy, pages 72–83.
Cranley, R. and Patterson, T. (1976). Randomization of number theoretic methods for multiple integration. SIAM J. Numer. Anal., 13(6):904–914.
Dupač, V. (1957). O Kiefer-Wolfowitzově aproximační methodě. Časopis pro pěstování matematiky, 082(1):47–75.
Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Annals of Mathematical Statistics, 27:642–669.
Fabian, V. (1967). Stochastic approximation of minima with improved asymptotic speed. Annals of Mathematical Statistics, 38:191–200.

81 / 77

slide-98
SLIDE 98

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references V

Gaudel, R., Hoock, J.-B., Pérez, J., Sokolovska, N., and Teytaud, O. (2010). A principled method for exploiting opening books. In International Conference on Computers and Games, pages 136–144, Kanazawa, Japan.
Gavin, C., Stewart, S., and Drake, P. Result aggregation in root-parallelized computer Go.
Grigoriadis, M. D. and Khachiyan, L. G. (1995). A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53–58.
Hadamard, J. (1893). Résolution d'une question relative aux déterminants. Bull. Sci. Math., 17:240–246.
Hammersley, J. and Handscomb, D. (1964). Monte Carlo Methods. Methuen & Co. Ltd., London, page 40.

82 / 77

slide-99
SLIDE 99

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references VI

Heidrich-Meisner, V. and Igel, C. (2009). Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 401–408, New York, NY, USA. ACM.
Jebalia, M. and Auger, A. (2008). On multiplicative noise models for stochastic search. In Rudolph, G. et al., editors, Conference on Parallel Problem Solving from Nature (PPSN X), volume 5199, pages 52–61, Berlin, Heidelberg. Springer Verlag.
Liu, J., Saint-Pierre, D. L., Teytaud, O., et al. (2014). A mathematically derived number of resamplings for noisy optimization. In Genetic and Evolutionary Computation Conference (GECCO 2014).
Mascagni, M. and Chi, H. (2004). On the scrambled Halton sequence. Monte Carlo Methods Appl., 10(3):435–442.

83 / 77

slide-100
SLIDE 100

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references VII

Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008). Empirical Bernstein stopping. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 672–679, New York, NY, USA. ACM.
Nagarajan, V., Marcolino, L. S., and Tambe, M. (2015). Every team deserves a second chance: Identifying when things go wrong (student abstract version). In 29th Conference on Artificial Intelligence (AAAI 2015), Texas, USA.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods.
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart.
Robson, J. M. (1983). The complexity of Go. In IFIP Congress, pages 413–417.

84 / 77

slide-101
SLIDE 101

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references VIII

Rolet, P. and Teytaud, O. (2010). Adaptive noisy optimization. In Di Chio, C., Cagnoni, S., Cotta, C., Ebner, M., Ekárt, A., Esparcia-Alcázar, A., Goh, C.-K., Merelo, J., Neri, F., Preuß, M., Togelius, J., and Yannakakis, G., editors, Applications of Evolutionary Computation, volume 6024 of Lecture Notes in Computer Science, pages 592–601. Springer Berlin Heidelberg.
Saint-Pierre, D. L. and Teytaud, O. (2014). Nash and the bandit approach for adversarial portfolios. In CIG 2014 - Computational Intelligence in Games, page 7, Dortmund, Germany. IEEE.
Samulowitz, H. and Memisevic, R. (2007). Learning to solve QBF. In Proceedings of the 22nd National Conference on Artificial Intelligence, pages 255–260. AAAI.
Shamir, O. (2013). On the complexity of bandit and derivative-free stochastic convex optimization. In COLT 2013 - The 26th Annual Conference on Learning Theory, June 12-14, 2013, Princeton University, NJ, USA, pages 3–24.

85 / 77

slide-102
SLIDE 102

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS References

Some references IX

Storn, R. (1996). On the usage of differential evolution for function optimization. In Fuzzy Information Processing Society, 1996. NAFIPS. 1996 Biennial Conference of the North American, pages 519–523. IEEE.
von Stengel, B. (2002). Computing equilibria for two-person games. Handbook of Game Theory, 3:1723–1759.
Wang, X. and Hickernell, F. (2000). Randomized Halton sequences. Math. Comput. Modelling, 32:887–899.

86 / 77

slide-103
SLIDE 103

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison in CNO: Iterative Noisy Optimization Algorithm

Algorithm 5 Iterative Noisy Optimization Algorithm (INOA).

1: Input: step-size parameters α > 0, A > 0
2: Input: revaluation parameters β ≥ 0, B > 0
3: Input: initial point x^opt_1 = x̃_1
4: Input: fitness function (= noisy objective function)
5: Input: sampling tool Sampler(·) and optimizer Opt(·)
6: n ← 1
7: while the computation time is not elapsed do
8:     Step-size σ_n = A/n^α and number of revaluations r_n = B⌈n^β⌉
9:     for i = 1 to r_n do
10:        x_{n,i} = Sampler(x^opt_n, σ_n, i); y_{n,i} = fitness evaluation at x_{n,i}
11:    end for
12:    Next approximation x^opt_{n+1} = Opt(x^opt_n, (x_{n,i}, y_{n,i})_{i∈{1,...,r_n}})
13:    n ← n + 1
14: end while
15: Return approximations (x^opt_n)_{n≥1}, recommendations (x̃_m)_{m≥1}, evaluation points (x_{n,i})_{n≥1,i∈{1,...,r_n}}, fitness evaluations (y_{n,i})_{n≥1,i∈{1,...,r_n}}

Key points: the Sampler, which chooses a sampling around the current approximation; Opt, which updates the approximation of the optimum; the resampling number r_n; and the sampling step-size σ_n.

87 / 77
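The INOA loop above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: `sampler` and `opt` stand in for the Sampler(·) and Opt(·) tools, and the toy instantiation (Gaussian sampling around the incumbent, noisy-best selection, noisy sphere fitness) is an assumption chosen only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def inoa(fitness, x0, sampler, opt, alpha=2.0, A=1.0, beta=2.0, B=1, iters=20):
    """Minimal sketch of Algorithm 5 (INOA): shrink the sampling step-size
    sigma_n = A/n^alpha and grow the resampling count r_n = B*ceil(n^beta)."""
    x_opt = np.asarray(x0, dtype=float)
    for n in range(1, iters + 1):
        sigma_n = A / n ** alpha
        r_n = B * int(np.ceil(n ** beta))
        xs = [sampler(x_opt, sigma_n, i) for i in range(1, r_n + 1)]
        ys = [fitness(x) for x in xs]
        x_opt = opt(x_opt, xs, ys)  # update the approximation of the optimum
    return x_opt

# Toy instantiation (assumptions, not the thesis tools):
def sampler(x, sigma, i):
    return x + sigma * rng.standard_normal(x.shape)

def noisy_sphere(x):
    return float(np.sum(x ** 2)) + 0.1 * rng.standard_normal()

def opt(x, xs, ys):
    # keep the incumbent unless a sampled point looks better (noisy comparison)
    cand = xs + [x]
    vals = ys + [noisy_sphere(x)]
    return cand[int(np.argmin(vals))]

x_star = inoa(noisy_sphere, np.ones(3), sampler, opt)
```

The decreasing σ_n and increasing r_n reproduce the trade-off discussed in the slides: finer sampling needs more revaluations to keep comparisons reliable under noise.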

slide-104
SLIDE 104

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison in CNO: Adaptation of Newton’s algorithm

A possible implementation: using quadratic approximations (≃ Newton).

Algorithm 6 Adaptation of Newton's algorithm for black-box noisy functions; gradient and Hessian approximated by finite differences with resamplings. Recommendations = the x_n's. e_i = i-th basis vector of R^d.

1: Input: dimension d ∈ N+, A > 0, B > 0, α > 0, β > 0, an initial x_1 ∈ R^d
2: n ← 1, x̃ ← x_1, ĥ ← identity matrix
3: while (true) do
4:    Compute σ_n = A/n^α  ◮ step-size
5:    for i = 1 to d do
6:       Evaluate g_i by finite differences at x_n + σ_n e_i and x_n − σ_n e_i (averaged over ⌈Bn^β⌉ resamplings)
7:    end for
8:    for i = 1 to d do
9:       Evaluate ĥ_{i,i} by finite differences at x_n ± σ_n e_i and x_n (averaged over ⌈Bn^β⌉ resamplings)
10:      for j = 1 to d, j ≠ i do
11:         Evaluate ĥ_{i,j} by finite differences with evaluations at x_n ± σ_n e_i ± σ_n e_j (averaged over ⌈Bn^β/10⌉ resamplings)
12:      end for
13:   end for
14:   δ ← solution of ĥδ = −g  ◮ possible next search point
15:   if ‖δ‖ > σ_n/2 then
16:      δ ← (σ_n/2) δ/‖δ‖  ◮ trust-region style
17:   end if
18:   x_{n+1} = x_n + δ, x̃ ← x_{n+1}, n ← n + 1  ◮ update recommendation
19: end while

88 / 77
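One iteration of this noisy Newton scheme can be sketched as follows. The finite-difference stencils and the σ/2 trust-region cap follow the pseudocode; `reps` (a fixed resampling count) is an illustrative stand-in for the ⌈Bn^β⌉ schedule, and the helper names are ours.

```python
import numpy as np

def newton_step(f, x, sigma, reps=10):
    """One iteration of the noisy Newton sketch: gradient and Hessian by
    finite differences, each stencil point averaged over `reps`
    re-evaluations, and the step capped at sigma/2 (trust-region style)."""
    d = len(x)
    E = np.eye(d)  # E[i] is the i-th basis vector
    avg = lambda y: np.mean([f(y) for _ in range(reps)])
    g = np.array([(avg(x + sigma * E[i]) - avg(x - sigma * E[i])) / (2 * sigma)
                  for i in range(d)])
    H = np.empty((d, d))
    for i in range(d):
        H[i, i] = (avg(x + sigma * E[i]) - 2 * avg(x)
                   + avg(x - sigma * E[i])) / sigma ** 2
        for j in range(d):
            if j != i:
                # cross second derivative from the four corner points
                H[i, j] = (avg(x + sigma * (E[i] + E[j]))
                           - avg(x + sigma * (E[i] - E[j]))
                           - avg(x - sigma * (E[i] - E[j]))
                           + avg(x - sigma * (E[i] + E[j]))) / (4 * sigma ** 2)
    delta = np.linalg.solve(H, -g)
    norm = np.linalg.norm(delta)
    if norm > sigma / 2:
        delta *= (sigma / 2) / norm  # trust-region cap
    return x + delta
```

On a noiseless quadratic the averaged stencils recover the exact gradient and Hessian, so each capped step moves toward the optimum.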

slide-105
SLIDE 105

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Annexe: Resampling methods: Non-adaptive resampling methods

[Recall] log-log convergence: log‖x_n − x*‖ / log(n) ∼ A'' < 0, where n is the evaluation number.
Non-adaptive rules: exponential rules with ad hoc parameters ⇒ log-log convergence (proved by us).

Rule | Formula
Constant | 1
Square root | ⌈√n⌉
Linear | n
Quadratic | n^2
Exponential | 2^n
Exponential | ⌈1.1^n⌉
Exponential | ⌈1.01^n⌉
Scale | ⌈exp(4n/(5d))/d^2⌉
Scale2 | ⌈exp(n/10)/d^2⌉

Table: Notation of resampling rules used in the presented experiments. d is the dimension of the search space. From now on n is the iteration number.

89 / 77
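The table above can be written as a small lookup of resampling counts. The rule keys are our own labels; the formulas are taken directly from the table.

```python
import math

def resamplings(rule, n, d):
    """Revaluation count r_n at iteration n for the non-adaptive rules of
    the table above; d is the search-space dimension."""
    rules = {
        "constant":  lambda: 1,
        "sqrt":      lambda: math.ceil(math.sqrt(n)),
        "linear":    lambda: n,
        "quadratic": lambda: n * n,
        "2exp":      lambda: 2 ** n,
        "1.1exp":    lambda: math.ceil(1.1 ** n),
        "1.01exp":   lambda: math.ceil(1.01 ** n),
        "scale":     lambda: math.ceil(math.exp(4 * n / (5 * d)) / d ** 2),
        "scale2":    lambda: math.ceil(math.exp(n / 10) / d ** 2),
    }
    return rules[rule]()
```

For example, `resamplings("sqrt", 10, 3)` gives 4 and `resamplings("quadratic", 5, 3)` gives 25; the Scale rules divide by d^2, so they start very cheap in higher dimension.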

slide-106
SLIDE 106

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison with ES: Settings

Self-Adaptive (µ/µ,λ)-ES Sphere function with additive noise f(x, ω) = ||x − x∗||2 + N, x ∈ Rd, x∗ = 0 Local noisy optimization (no local minima involved) Goal of this work: a conclusive answer for parameter-free resampling rules in simple settings

90 / 77

slide-107
SLIDE 107

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison with ES: Results & Discussion

The polynomial rule (r_n = n^2) performs badly in small dimension.
2exp (r_n = 2^n) performs badly in large dimension.
Scale2 (r_n = ⌈exp(n/10)/d^2⌉) has the best of both worlds and performs nearly optimally among our non-adaptive rules in the considered setting, i.e. σ_Noise ∼ O(1).

Figure: quad (left); 2exp (middle); Scale2 (right) for d = 10, for several (λ, µ) settings. x-axis: log(#eval); y-axis: log(SR).

91 / 77

slide-108
SLIDE 108

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison in DE: Settings

DE/rand/2 [Storn, 1996]: p'_i = p_a + F(p_b − p_c) + F(p_d − p_e)
CEC 2005 testbed (5 unimodal and 20 multimodal functions) + strong noise
Dimension 2, 10 and 30

92 / 77
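The DE/rand/2 mutation above can be sketched directly. F = 0.5 and the index-sampling details are illustrative choices; the noisy-optimization wrapper (resampling, selection) is omitted.

```python
import numpy as np

def de_rand_2(pop, i, F=0.5, rng=None):
    """DE/rand/2 mutation [Storn, 1996]:
    p'_i = p_a + F(p_b - p_c) + F(p_d - p_e),
    with a, b, c, d, e distinct indices, all different from i."""
    rng = rng or np.random.default_rng()
    others = [j for j in range(len(pop)) if j != i]
    a, b, c, d, e = rng.choice(others, size=5, replace=False)
    return pop[a] + F * (pop[b] - pop[c]) + F * (pop[d] - pop[e])

pop = np.arange(12, dtype=float).reshape(6, 2)  # 6 individuals in dimension 2
mutant = de_rand_2(pop, 0, rng=np.random.default_rng(1))
```

The two difference vectors give DE/rand/2 more exploratory variance than DE/rand/1, which matters under the strong noise used in these experiments.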

slide-109
SLIDE 109

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison in DE: Results & Discussion (1/2)

Strongly noisy model.
Our non-adaptive resampling rules were able to match slope −1/2.

Figure: F4 (unimodal), dimension 2, for rules N1.01exp, N1.1exp, N2exp and Nscale. x-axis: log(#eval); y-axis: log(SR).

93 / 77

slide-110
SLIDE 110

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Comparison in DE: Results & Discussion (2/2)

Adaptive rules

have difficulties with equal fitness values (e.g. plateaus), leading to possibly infinite loops (unless some limits are used)
might be merged with bounds on resampling numbers

Non-adaptive methods improve adaptive methods, but adaptive methods do not clearly improve non-adaptive ones.
Nscale = ⌈d^{-2} exp(4n/(5d))⌉, heuristically derived in [Liu et al., 2014], performs reasonably.

94 / 77

slide-111
SLIDE 111

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Portfolio of noisy optimization methods: INOPA


Figure: Schema of Improved Noisy Optimization Portfolio Algorithm (INOPA), using only LAG(rn) evaluations (negligible) on the sub-optimal solvers.

95 / 77

slide-112
SLIDE 112

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Portfolio of noisy optimization methods: Unfair budget distribution

Theoretically:
Use only LAG(r_n) evaluations (negligible) on the sub-optimal solvers (INOPA).
log(M') shift, with M' the number of optimal solvers.

In practice:
n: portfolio iteration number
r_n = ⌈n^4.2⌉: comparison period
s_n = ⌈n^2.2⌉: revaluation number for comparing at iteration n
LAG(r_n) = n ≪ r_n: index of recommendation to be compared

(This parametrization matches our conditions.)

96 / 77
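The practical schedule above is easy to compute; this sketch just evaluates the three quantities for a given portfolio iteration (function name is ours).

```python
import math

def inopa_schedule(n):
    """Comparison schedule used in practice (n = portfolio iteration):
    r_n = ceil(n^4.2), s_n = ceil(n^2.2), LAG(r_n) = n << r_n."""
    r_n = math.ceil(n ** 4.2)  # comparison period
    s_n = math.ceil(n ** 2.2)  # revaluations used for the comparison
    lag = n                    # index of the (lagged) recommendation compared
    return r_n, s_n, lag
```

Already at n = 2 the lag (2) is an order of magnitude below the comparison period (19), which is what makes the evaluations spent on sub-optimal solvers negligible.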

slide-113
SLIDE 113

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Noisy Optimization Algorithms (NOAs)

SA-ES: Self-Adaptive Evolution Strategy.
Fabian's algorithm: a first-order method using gradients estimated by finite differences [Dvoretzky et al., 1956, Fabian, 1967].
Noisy Newton's algorithm: a second-order method using a Hessian matrix also approximated by finite differences (Algorithm 6).

Solvers | Algorithm and parametrization
RSAES | RSAES with λ = 10d, µ = 5d, resampling_n = 10n^2
Fabian1 | Fabian's solver with step-size σ_n = 100/n^0.1, a = 1
Fabian2 | Fabian's solver with step-size σ_n = 2/n^0.49, a = 1
Newton | Newton's solver with step-size σ_n = 100/n^4, resampling_n = n^2
NOPA NL | NOPA of 4 mono-solvers with r_n = ⌈n^4.2⌉, s_n = ⌈n^2.2⌉, no lag
NOPA | NOPA of 4 mono-solvers with LAG(r_n) = n, r_n = ⌈n^4.2⌉, s_n = ⌈n^2.2⌉
INOPA | INOPA of 4 mono-solvers with LAG(r_n) = n, r_n = ⌈n^4.2⌉, s_n = ⌈n^2.2⌉

Table: Mono-solvers and portfolios used in the experiments. NL refers to "no lag", i.e. LAG(r_n) = r_n.

97 / 77

slide-114
SLIDE 114

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Annexe: INOPA (1/2)

Algorithm 7 Improved Noisy Optimization Portfolio Algorithm (INOPA).

1: Input: noisy optimization solvers Solver_1, Solver_2, . . . , Solver_M
2: Input: a lag function LAG : N+ → N+
3: Input: a non-decreasing positive integer sequence r_1, r_2, . . .  ◮ periodic comparisons
4: Input: a non-decreasing integer sequence s_1, s_2, . . .  ◮ number of resamplings
5: n ← 1  ◮ number of selections
6: m ← 1  ◮ NOPA's iteration number
7: i* ← null  ◮ index of recommended solver
8: x* ← null  ◮ recommendation
98 / 77

slide-115
SLIDE 115

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe

Annexe: INOPA (2/2)

9: while budget is not exhausted do
10:    if m ≥ LAG(r_n) or i* = null then
11:       i* = arg min_{i∈{1,...,M}} Ê_{s_n}[f(x̃_{i,LAG(r_n)})]  ◮ algorithm selection
12:       m' ← r_n
13:       while m' < r_{n+1} do
14:          Apply one evaluation to solver i*
15:          m' ← m' + 1
16:          x* = x̃_{i*,m'}  ◮ update recommendation
17:       end while
18:       n ← n + 1
19:    else
20:       for i ∈ {1, . . . , M}\{i*} do
21:          Apply LAG(r_n) − LAG(r_{n−1}) evaluations to Solver_i
22:       end for
23:       m ← m + 1
24:    end if
25: end while

99 / 77

slide-116
SLIDE 116

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

NOA 1: SA-ES with revaluations

Algorithm 2 Self-Adaptive Evolution Strategy with revaluations.

1: Parameters: K > 0, ζ ≥ 0, λ ≥ µ ∈ N*, a dimension d ∈ N*
2: Input: an initial parent x_{1,i} ∈ R^d and an initial σ_{1,i} = 1, i ∈ {1, . . . , µ}
3: n ← 1
4: while (true) do
5:    Generate λ individuals i_j, j ∈ {1, . . . , λ}, independently, using  ◮ generation
         σ_j = σ_{n,mod(j−1,µ)+1} × exp(N/(2d)) and i_j = x_{n,mod(j−1,µ)+1} + σ_j N
6:    Evaluate each of them ⌈Kn^ζ⌉ times and average their fitness values  ◮ evaluation
7:    Define j_1, . . . , j_λ so that  ◮ ranking
         Ê_{⌈Kn^ζ⌉}[f(i_{j_1})] ≤ Ê_{⌈Kn^ζ⌉}[f(i_{j_2})] ≤ · · · ≤ Ê_{⌈Kn^ζ⌉}[f(i_{j_λ})]
8:    σ_{n+1,k} = σ_{j_k} and x_{n+1,k} = i_{j_k}, k ∈ {1, . . . , µ}  ◮ updating
9:    n ← n + 1
10: end while

100 / 77
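Algorithm 2 can be sketched as follows. Parameter names (K, ζ, λ, µ) follow the slide; the Gaussian mutation details and the toy parametrization are illustrative, not the thesis code.

```python
import numpy as np

def saes_revaluations(f, x0, mu=2, lam=8, K=1.0, zeta=2.0, iters=20, rng=None):
    """Sketch of Algorithm 2: self-adaptive (mu, lambda)-ES where each
    offspring is re-evaluated ceil(K * n^zeta) times and its fitness values
    are averaged before ranking."""
    rng = rng or np.random.default_rng(0)
    d = len(x0)
    xs = np.tile(np.asarray(x0, dtype=float), (mu, 1))  # mu parents
    sigmas = np.ones(mu)
    for n in range(1, iters + 1):
        r = int(np.ceil(K * n ** zeta))  # revaluations per offspring
        offspring, steps, means = [], [], []
        for j in range(lam):
            k = j % mu  # parent index: mod(j-1, mu)+1 in the slide's notation
            s = sigmas[k] * np.exp(rng.standard_normal() / (2 * d))
            x = xs[k] + s * rng.standard_normal(d)
            means.append(np.mean([f(x) for _ in range(r)]))
            offspring.append(x)
            steps.append(s)
        order = np.argsort(means)[:mu]  # comma selection: keep the mu best
        xs = np.array([offspring[j] for j in order])
        sigmas = np.array([steps[j] for j in order])
    return xs[0]
```

Averaging ⌈Kn^ζ⌉ revaluations makes the ranking step increasingly reliable as the population closes in on the optimum and fitness differences shrink below the noise level.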

slide-117
SLIDE 117

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

NOA 2: Fabian’s Algorithm (1/2)

Algorithm 3 Fabian's stochastic gradient algorithm with finite differences [Fabian, 1967, Shamir, 2013].

1: Parameters: a dimension d ∈ N*, 1/2 > γ > 0, a > 0, c > 0, m ∈ N*, weights w_1 > · · · > w_m summing to 1, scales 1 ≥ u_1 > · · · > u_m > 0
2: Input: an initial x_1 ∈ R^d
3: n ← 1
4: while (true) do
5:    Compute σ_n = c/n^γ
6:    Evaluate the gradient g at x_n by finite differences, averaging over 2m samples per axis:
         ∀i ∈ {1, . . . , d}, ∀j ∈ {1, . . . , m},
         x^{(i,j)+}_n = x_n + u_j σ_n e_i and x^{(i,j)−}_n = x_n − u_j σ_n e_i,
         g_i = (1/(2σ_n)) Σ_{j=1}^m w_j (f(x^{(i,j)+}_n) − f(x^{(i,j)−}_n))
7:    Gradient step: apply x_{n+1} = x_n − (a/n) g
8:    n ← n + 1
9: end while

101 / 77
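The gradient estimate of step 6 can be sketched on its own. This is an illustrative single-step helper, assuming the perturbations are u_j σ_n e_i as in the reconstruction above; the outer loop, the step x_{n+1} = x_n − (a/n) g, and the noise averaging are omitted.

```python
import numpy as np

def fabian_gradient(f, x, sigma, weights, scales):
    """Finite-difference gradient estimate of Algorithm 3: per axis i, sum
    the weighted differences f(x + sigma*u_j*e_i) - f(x - sigma*u_j*e_i)
    over the m scales u_j, then divide by 2*sigma."""
    d = len(x)
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = sum(w * (f(x + sigma * u * e) - f(x - sigma * u * e))
                   for w, u in zip(weights, scales)) / (2 * sigma)
    return g

# Noiseless sanity check on f(x) = ||x||^2: the estimate is the true gradient.
g = fabian_gradient(lambda v: float(np.sum(v * v)),
                    np.array([1.0, 2.0]), sigma=0.01,
                    weights=[1.0], scales=[1.0])
```

With m > 1 scales and suitable weights, the higher-order terms of the Taylor expansion cancel, which is what lets Fabian's scheme exploit extra smoothness.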

slide-118
SLIDE 118

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

NOA 2: Fabian’s Algorithm (2/2)

For Regret = SR or CR:
Slope(Regret) = lim_{n→∞} log(Regret(n)) / log(n)

Algorithm | Parameter | Slope(SR) | Slope(CR)
Fabian | γ → 0 | −1 | 1
Fabian | γ → 1/2 | — | 1
Fabian | γ → 1/4 | −1/2 | 1/2

102 / 77

slide-119
SLIDE 119

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Portfolio: Experiments: Multimodal case

Solver | CT = 20s | CT = 40s | CT = 80s | CT = 160s
RSAES | −.503 ± .008 | −.483 ± .007 | −.469 ± .006 | −.465 ± .003
Fabian1 | .002 ± 0 | .002 ± 0 | .002 ± 0 | .002 ± 0
Fabian2 | .002 ± 0 | .002 ± 0 | .002 ± 0 | .002 ± 0
Newton | .002 ± 0 | .002 ± 0 | .002 ± 0 | .002 ± 0
NOPA NL | −.469 ± .010 | −.465 ± .006 | −.452 ± .005 | −.433 ± .006
NOPA | −.465 ± .009 | −.466 ± .008 | −.430 ± .013 | −.431 ± .009
INOPA | −.468 ± .009 | −.458 ± .007 | −.445 ± .006 | −.424 ± .006

Table: Slope(SR) for cartpole using a Neural Network policy with 2 neurons. CT: Computation time.

103 / 77

slide-120
SLIDE 120

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Application to power investment problem (1/2)

We consider:
an investment policy k = (C, F, X, S, W, P, T, U, N, A) ∈ {0, 1/2, 1}^10
a scenario s = (Z, WB, PB, TB, XB, UB, SB, CC, NT) ∈ {0, 1/2, 1}^9

k ∈ {0, 1/2, 1} | Corresponding investment
C | Coal
F | Nuclear fission
X | Nuclear fusion
S | Supergrids
W | Wind power
P | PV units
T | Solar thermal
U | Unconventional renewable
N | Nanogrids
A | Massive storage in Scandinavia

Table: Parameters and descriptions of k.

s ∈ {0, 1/2, 1} | Nature's action
Z | Massive geopolitical issues
WB | Wind power technological breakthrough
PB | PV units breakthrough
TB | Solar thermal breakthrough
XB | Fusion breakthrough
UB | Unconventional renewable breakthrough
SB | Local storage breakthrough
CC | Climate change disaster
NT | Nuclear terrorism

Table: Parameters and descriptions of s.

104 / 77

slide-121
SLIDE 121

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Application to power investment problem (2/2)

∀k ∈ K, ∀s ∈ S:
R(k, s) = (2/3)(1 + rand) · ( N(1 − Z)/5 − cost · (N + U + T + P + W + S + X + F + C) + 7 XB·X + W(1 + WB)(SB + √S)/2 + 3 P(PB + SB) − 4 C·CC − F·NT + S(1 − Z) + P·Z + U·UB + T·S·(1 + TB − SB/2) − F·NT + A·(1 + W + P − 2 SB) ),
where cost is a meta-parameter.

105 / 77
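The reward can be transcribed term by term from the formula above. `cost = 0.5` is an illustrative value of the meta-parameter, and `rand` is assumed to be uniform on [0, 1]; both F·NT terms are kept as written on the slide.

```python
import random

def reward(k, s, cost=0.5, rng=random.Random(0)):
    """Sketch of R(k, s): k and s are dicts keyed by the letters of the two
    parameter tables (investment policy and Nature's scenario)."""
    C, F, X, S, W, P, T, U, N, A = (k[c] for c in "CFXSWPTUNA")
    Z, WB, PB, TB, XB, UB, SB, CC, NT = (s[c] for c in
        ["Z", "WB", "PB", "TB", "XB", "UB", "SB", "CC", "NT"])
    core = (N * (1 - Z) / 5
            - cost * (N + U + T + P + W + S + X + F + C)
            + 7 * XB * X
            + W * (1 + WB) * (SB + S ** 0.5) / 2
            + 3 * P * (PB + SB)
            - 4 * C * CC - F * NT
            + S * (1 - Z) + P * Z + U * UB
            + T * S * (1 + TB - SB / 2)
            - F * NT
            + A * (1 + W + P - 2 * SB))
    return (2 / 3) * (1 + rng.random()) * core
```

The multiplicative factor (2/3)(1 + rand) keeps the sign of the deterministic core while randomizing its magnitude between 2/3 and 4/3 of its value.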

slide-122
SLIDE 122

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Application to power investment problem: results

AVERAGE SPARSITY LEVEL OVER 3^10 = 59049 ARMS
α | 8⌈K/10⌉ simul. | K simul. | 32⌈K/10⌉ simul. | 128⌈K/10⌉ simul. | 512⌈K/10⌉ simul. | 2048⌈K/10⌉ simul.
0.1 | 1873.70 ± 10.87 | 4621.87 ± 28.34 | non-truncated | non-truncated | non-truncated | non-truncated
0.3 | 491.53 ± 7.74 | 955.32 ± 13.78 | 12140.13 ± 234.46 | 7577.95 ± 154.37 | 710.45 ± 11.28 | 320.43 ± 2.91
0.5 | 126.18 ± 3.71 | 216.63 ± 5.53 | 1502.24 ± 33.85 | 687.42 ± 19.37 | 33.01 ± 1.14 | 10.16 ± 0.27
0.7 | 24.80 ± 1.23 | 36.69 ± 1.69 | 168.02 ± 6.94 | 63.04 ± 2.49 | 6.57 ± 0.27 | 2.59 ± 0.11
0.9 | 3.54 ± 0.23 | 3.73 ± 0.26 | 7.35 ± 0.49 | 5.12 ± 0.29 | 1.93 ± 0.09 | 1.17 ± 0.04

EMPIRICAL MEAN REWARD AGAINST PURE STRATEGIES
α | 8⌈K/10⌉ simul. | K simul. | 32⌈K/10⌉ simul. | 128⌈K/10⌉ simul. | 512⌈K/10⌉ simul. | 2048⌈K/10⌉ simul.
0.1 | 2.595 ± .006 | 2.174 ± .006 | .029 ± .004 | 1.050 ± .004 | 2.184 ± .005 | 4.105 ± .006
0.3 | 3.299 ± .010 | 3.090 ± .009 | 2.195 ± .017 | 3.892 ± .018 | 6.555 ± .008 | 6.822 ± .004
0.5 | 3.896 ± .016 | 3.779 ± .015 | 3.592 ± .016 | 5.275 ± .020 | 6.741 ± .007 | 6.853 ± .004
0.7 | 4.501 ± .030 | 4.454 ± .027 | 4.674 ± .022 | 6.101 ± .016 | 6.777 ± .007 | 6.858 ± .005
0.9 | 5.021 ± .058 | 5.149 ± .062 | 5.703 ± .040 | 6.536 ± .017 | 6.813 ± .007 | 6.873 ± .004
Pure | 4.853 ± .158 | 5.027 ± .143 | 5.709 ± .101 | 6.137 ± .163 | 6.413 ± .136 | 6.844 ± .028

ROBUST SCORE: WORST SCORE AGAINST PURE STRATEGIES
α | 8⌈K/10⌉ simul. | K simul. | 32⌈K/10⌉ simul. | 128⌈K/10⌉ simul. | 512⌈K/10⌉ simul. | 2048⌈K/10⌉ simul.
0.5 | −5.560 ± 0.070 | −5.693 ± 0.058 | −5.725 ± 0.060 | −3.479 ± 0.061 | −0.576 ± 0.041 | 0.056 ± 0.024
0.7 | −4.028 ± 0.094 | −4.132 ± 0.094 | −4.038 ± 0.074 | −1.243 ± 0.032 | 0.010 ± 0.018 | 0.268 ± 0.011
0.9 | −2.012 ± 0.107 | −1.859 ± 0.115 | −1.369 ± 0.081 | −0.195 ± 0.028 | 0.272 ± 0.011 | 0.330 ± 0.003
Pure | −0.938 ± 0.078 | −0.971 ± 0.092 | −0.455 ± 0.060 | 0.182 ± 0.021 | 0.323 ± 0.005 | 0.333 ± 0.000

Table: In these tables, the result is averaged over 100 independent learnings.

106 / 77

slide-123
SLIDE 123

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

A modified power investment problem

R′(k, s) = R(k, s) + c · ((X == XB) + (C == CC) + (NT == F) + (P == PB)), where c is a meta-parameter.

107 / 77

slide-124
SLIDE 124

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

A modified power investment problem: results (1/2)

AVERAGE SPARSITY LEVEL OVER 3^10 = 59049 ARMS
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
0.1 | 13804.380 ± 52.015 | non-sparse | non-sparse | non-sparse | non-sparse | non-sparse
0.3 | 2810.120 ± 59.083 | non-sparse | non-sparse | non-sparse | non-sparse | non-sparse
0.5 | 395.920 ± 15.835 | non-sparse | non-sparse | 59048.960 ± 196.946 | 49819.430 ± 195.016 | non-sparse
0.7 | 43.230 ± 2.624 | 58925.340 ± 26.821 | 55383.140 ± 150.057 | 46000.020 ± 277.653 | 9065.180 ± 159.610 | non-sparse
0.9 | 3.600 ± 0.260 | 992.940 ± 64.474 | 796.500 ± 41.724 | 503.600 ± 24.927 | 97.670 ± 5.445 | 52632.820 ± 522.505
0.99 | 1.110 ± 0.031 | 2.250 ± 0.171 | 2.500 ± 0.180 | 2.310 ± 0.156 | 1.790 ± 0.121 | 6.700 ± 0.612

PROXY EXPLOITABILITY
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
NT | 4.922e-01 ± 5.649e-07 | 4.928e-01 ± 1.787e-06 | 4.956e-01 ± 4.016e-06 | 4.991e-01 ± 5.892e-06 | 5.221e-01 ± 1.404e-05 | 4.938e-01 ± 1.687e-06
0.1 | 4.948e-01 ± 5.739e-05 | 4.928e-01 ± 1.787e-06 | 4.956e-01 ± 4.016e-06 | 4.991e-01 ± 5.892e-06 | 5.221e-01 ± 1.404e-05 | 4.938e-01 ± 1.687e-06
0.3 | 5.004e-01 ± 1.397e-04 | 4.928e-01 ± 1.787e-06 | 4.956e-01 ± 4.016e-06 | 4.991e-01 ± 5.892e-06 | 5.221e-01 ± 1.404e-05 | 4.938e-01 ± 1.687e-06
0.5 | 5.059e-01 ± 2.272e-04 | 4.928e-01 ± 1.787e-06 | 4.956e-01 ± 4.016e-06 | 4.991e-01 ± 5.891e-06 | 5.242e-01 ± 5.491e-05 | 4.938e-01 ± 1.687e-06
0.7 | 5.054e-01 ± 1.327e-03 | 4.928e-01 ± 3.835e-06 | 4.965e-01 ± 3.896e-05 | 5.031e-01 ± 1.016e-04 | 5.317e-01 ± 9.573e-05 | 4.938e-01 ± 1.687e-06
0.9 | 4.281e-01 ± 6.926e-03 | 5.137e-01 ± 4.199e-04 | 5.151e-01 ± 5.007e-04 | 5.140e-01 ± 4.965e-04 | 5.487e-01 ± 9.413e-04 | 4.960e-01 ± 1.828e-04
0.99 | 3.634e-01 ± 8.191e-03 | 4.357e-01 ± 6.873e-03 | 4.612e-01 ± 5.380e-03 | 4.683e-01 ± 4.834e-03 | 5.242e-01 ± 3.302e-03 | 5.390e-01 ± 3.167e-03
Pure | 3.505e-01 ± 7.842e-03 | 3.946e-01 ± 7.181e-03 | 4.287e-01 ± 6.203e-03 | 4.489e-01 ± 5.410e-03 | 5.143e-01 ± 3.597e-03 | 4.837e-01 ± 5.558e-03

ROBUST SCORE: WORST SCORE AGAINST PURE STRATEGIES
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
NT | 1.369e-02 | 2.092e-02 | 1.946e-02 | 1.492e-02 | 2.669e-02 | 4.525e-02
0.1 | 1.109e-02 | 2.092e-02 | 1.946e-02 | 1.492e-02 | 2.669e-02 | 4.525e-02
0.3 | 5.485e-03 | 2.092e-02 | 1.946e-02 | 1.492e-02 | 2.669e-02 | 4.525e-02
0.5 | 0.000e+00 | 2.092e-02 | 1.946e-02 | 1.492e-02 | 2.454e-02 | 4.525e-02
0.7 | 4.328e-04 | 2.091e-02 | 1.854e-02 | 1.083e-02 | 1.705e-02 | 4.525e-02
0.9 | 7.778e-02 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 0.000e+00 | 4.304e-02
0.99 | 1.425e-01 | 7.806e-02 | 5.385e-02 | 4.564e-02 | 2.456e-02 | 0.000e+00
Pure | 1.554e-01 | 1.191e-01 | 8.638e-02 | 6.503e-02 | 3.443e-02 | 5.537e-02

Table: Results for reward matrix R′ computed with c = 1. In these tables, the result is the average over 100 learnings. The proxy exploitability is the difference between the best robust score in the table and the robust score.

108 / 77

slide-125
SLIDE 125

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

A modified power investment problem: results (2/2)

AVERAGE SPARSITY LEVEL OVER 3^10 = 59049 ARMS
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
0.1 | 6394.625 ± 84.308 | non-sparse | non-sparse | non-sparse | non-sparse | non-sparse
0.3 | 1337.896 ± 40.491 | non-sparse | non-sparse | non-sparse | non-sparse | non-sparse
0.5 | 206.146 ± 12.647 | non-sparse | non-sparse | non-sparse | non-sparse | non-sparse
0.7 | 25.563 ± 2.045 | non-sparse | non-sparse | non-sparse | 59048.750 ± 0.250 | non-sparse
0.9 | 3.729 ± 0.353 | 42616.313 ± 1476.644 | 47581 ± 1015.506 | 38361.182 ± 1091.373 | 4510.125 ± 726.595 | 58323.125 ± 157.971
0.99 | 1.208 ± 0.072 | 4.479 ± 0.575 | 5.333 ± 0.565 | 6.000 ± 0.969 | 2.875 ± 1.076 | 8.500 ± 2.204

PROXY EXPLOITABILITY
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
NT | 1.151e-01 ± 6.772e-08 | 1.151e-01 ± 2.175e-07 | 1.153e-01 ± 3.707e-07 | 1.154e-01 ± 5.797e-07 | 1.167e-01 ± 2.046e-06 | 1.152e-01 ± 1.297e-06
0.1 | 1.158e-01 ± 1.843e-05 | 1.151e-01 ± 2.175e-07 | 1.153e-01 ± 3.707e-07 | 1.154e-01 ± 6.019e-07 | 1.167e-01 ± 2.046e-06 | 1.152e-01 ± 1.297e-06
0.3 | 1.160e-01 ± 3.441e-05 | 1.151e-01 ± 2.175e-07 | 1.153e-01 ± 3.707e-07 | 1.154e-01 ± 6.019e-07 | 1.167e-01 ± 2.046e-06 | 1.152e-01 ± 1.297e-06
0.5 | 1.166e-01 ± 9.751e-05 | 1.151e-01 ± 2.175e-07 | 1.153e-01 ± 3.707e-07 | 1.154e-01 ± 5.797e-07 | 1.167e-01 ± 2.046e-06 | 1.152e-01 ± 1.297e-06
0.7 | 1.165e-01 ± 6.176e-04 | 1.151e-01 ± 2.175e-07 | 1.153e-01 ± 3.707e-07 | 1.154e-01 ± 5.797e-07 | 1.167e-01 ± 2.051e-06 | 1.152e-01 ± 1.297e-06
0.9 | 1.068e-01 ± 3.176e-03 | 1.156e-01 ± 4.586e-05 | 1.160e-01 ± 6.348e-05 | 1.172e-01 ± 9.722e-05 | 1.266e-01 ± 3.829e-04 | 1.152e-01 ± 4.288e-06
0.99 | 8.423e-02 ± 3.118e-03 | 1.119e-01 ± 2.316e-03 | 1.189e-01 ± 1.888e-03 | 1.202e-01 ± 2.101e-03 | 1.145e-01 ± 4.519e-03 | 1.186e-01 ± 7.684e-04
Pure | 7.810e-02 ± 2.570e-03 | 8.354e-02 ± 2.710e-03 | 9.327e-02 ± 2.202e-03 | 9.658e-02 ± 2.097e-03 | 1.120e-01 ± 3.625e-03 | 8.755e-02 ± 6.497e-03

ROBUST SCORE: WORST SCORE AGAINST PURE STRATEGIES
α | T = K | T = 10K | T = 50K | T = 100K | T = 500K | T = 1000K
NT | 1.494e-03 | 4.594e-04 | 3.592e-03 | 4.772e-03 | 9.903e-03 | 3.388e-03
0.1 | 7.727e-04 | 4.594e-04 | 3.592e-03 | 4.772e-03 | 9.903e-03 | 3.388e-03
0.3 | 5.838e-04 | 4.594e-04 | 3.592e-03 | 4.772e-03 | 9.903e-03 | 3.388e-03
0.5 | 0.000e+00 | 4.594e-04 | 3.592e-03 | 4.772e-03 | 9.903e-03 | 3.388e-03
0.7 | 9.391e-05 | 4.594e-04 | 3.592e-03 | 4.772e-03 | 9.903e-03 | 3.388e-03
0.9 | 9.758e-03 | 0.000e+00 | 2.860e-03 | 2.992e-03 | 0.000e+00 | 3.371e-03
0.99 | 3.236e-02 | 3.647e-03 | 0.000e+00 | 0.000e+00 | 1.211e-02 | 0.000e+00
Pure | 3.848e-02 | 3.204e-02 | 2.559e-02 | 2.362e-02 | 1.463e-02 | 3.103e-02

Table: Results for reward matrix R′ computed with c = 10. In these tables, the result is the average over 100 learnings. The proxy exploitability is the difference between the best robust score in the table and the robust score.

109 / 77

slide-126
SLIDE 126

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Rectangle method

Algorithm 4 Rectangular algorithms: K = Kt.

Input: a stochastic AI playing as Black, a stochastic AI′ playing as White.
Build M_{i,j} = 1 if AI_i (Black) wins against AI′_j (White), for i ∈ {1, . . . , K} and j ∈ {1, . . . , Kt}.
Build M′_{i,j} = 1 if AI′_i (White) wins against AI_j (Black), for i ∈ {1, . . . , K} and j ∈ {1, . . . , Kt}.
if BestSeed then
   BAI is AI_i where i maximizes Σ_{j=1}^{Kt} M_{i,j}.
   WAI′ is AI′_i where i maximizes Σ_{j=1}^{Kt} M′_{i,j}.
else if Nash then
   Compute (p, q) a Nash equilibrium of M. BAI is AI_i with probability p_i.
   Compute (p′, q′) a Nash equilibrium of M′. WAI′ is AI′_i with probability p′_i.
else if Uniform then
   BAI is AI_i with probability 1/K. WAI′ is AI′_j with probability 1/K.
end if
Return a boosted AI termed BAI playing as Black, a boosted AI WAI′ playing as White.

110 / 77
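The BestSeed branch reduces to an argmax over row sums of the win matrix; a minimal sketch (function name ours):

```python
import numpy as np

def best_seed(M):
    """BestSeed branch of Algorithm 4: given the K x Kt win matrix M
    (M[i, j] = 1 iff seed i beats opponent seed j), return the index of the
    seed with the highest row sum, i.e. the best empirical win rate."""
    return int(np.argmax(M.sum(axis=1)))

M = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1]])
chosen = best_seed(M)  # row sums are 2, 3, 1, so seed 1 is selected
```

The Nash variant instead mixes seeds according to an equilibrium of M, which trades some average strength for robustness against an adaptive opponent.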

slide-127
SLIDE 127

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Comparison in CNO: Results & Discussion (1/2)

3 types of noise, with variance constant, linear or quadratic as a function of the simple regret:
Var(f(x, ω)) = O([E_ω f(x, ω) − E_ω f(x*, ω)]^z), with z ∈ {0, 1, 2}.

z = 0 (constant var), optimized for CR (α ≃ ∞, β ≃ 4α + 1+): s(SR) = −1/2, s(CR) = 1/2 [Fabian, 1967, Dupač, 1957]
z = 0 (constant var), optimized for SR (β = 6α, α = ∞): s(SR) = −2/3, s(CR) = 2/3 [Shamir, 2013]
z = 0 and ∞-differentiable (α = 0, β ≃ ∞): s(SR) = −1 [Fabian, 1967]
z = 0 and "quadratic": s(SR) = −1 [Dupač, 1957]
z = 1 (linear var), α ≃ ∞, β ≃ 2α + 1+: s(SR) = −1, s(CR) = −1 [Rolet and Teytaud, 2010]
z = 2 (quadratic var), α ≃ ∞, β > 1: s(SR) = −∞, s(CR) = −∞

Table: slope(SR) and slope(CR) for INOA for various values of α and β, in the case of twice-differentiable functions. z is related to the intensity of the noise. A reference means that our algorithm gets the same rate as in the cited paper; no reference means that the result is new.

111 / 77

slide-128
SLIDE 128

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Optimizing random seeds: Correlations

Fact: the random seed matters!

Figure: Success rate per seed (ranked) in 5x5 Domineering (left) and 9x9 Domineering (right), with standard deviations on y-axis: the seed has a significant impact.

112 / 77

slide-129
SLIDE 129

PORTFOLIO METHODS IN UNCERTAIN CONTEXTS Annexe: Noisy Optimization Algorithms (NOAs)

Optimizing random seeds

Board size | 5x5 | 7x7 | 9x9
Atari-Go | 73.5% | 67.5% | 59%
Domineering | 86% | 71.5% | 65.5%

Board size | 5x5 | 6x6 | 7x7 | 8x8
Breakthrough | 65.5% | 57.5% | 55.5% | 57%

113 / 77