

1. Analysis of Evaluation-Function Learning by Comparison of Sibling Nodes
   Tomoyuki Kaneko (1) and Kunihito Hoki (2)
   (1) University of Tokyo, Japan, kaneko@acm.org
   (2) University of Electro-Communications
   Advances in Computer Games 13

2. Outline
   - Background: machine learning of evaluation functions; recent success in shogi
   - Analysis of the (partial) gradient of the Minmax value
     - When is it differentiable?
     - Is it equal to the gradient of the leaf evaluation? (implicitly assumed in previous work)
   - Experiments in shogi
     - How frequently is the Minmax value non-differentiable?
     - Upper bounds via multiple PVs
     - Different gradients in multiple PVs

3. Minmax search (Tilburg photo)

4. Minmax search
   - Minmax value: the result of Minmax search
     - the minimum or maximum of the children's values (for an internal node)
     - the value of the evaluation function (for a leaf node)
   - PV (principal variation): the left-most such branch (see the sketch below)
     - a path from the root to a leaf such that Minmax(child) = Minmax(parent) at every step
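As a concrete reference for these definitions, here is a minimal Minmax sketch in Python that also returns the PV leaf; the Node representation and the tie-breaking toward the left-most child are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)
    leaf_value: float = 0.0               # used only when the node is a leaf

def minmax(node: Node, maximizing: bool) -> Tuple[float, Node]:
    """Return (Minmax value, PV leaf). Ties break toward the left-most child."""
    if not node.children:                  # leaf node: value comes from evaluation
        return node.leaf_value, node
    best_value: Optional[float] = None
    best_leaf: Optional[Node] = None
    for child in node.children:            # children examined left to right
        value, leaf = minmax(child, not maximizing)
        if best_value is None or (value > best_value if maximizing else value < best_value):
            best_value, best_leaf = value, leaf   # strict comparison keeps the left-most PV on ties
    return best_value, best_leaf

# Example: a MAX root over leaf values -5 and 3 -> Minmax value 3, PV ends at the second leaf
root = Node(children=[Node(leaf_value=-5.0), Node(leaf_value=3.0)])
print(minmax(root, maximizing=True))
```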

5. Evaluation function
   - Definition: eval(p, θ)
     - p: a game position
     - θ ∈ R^N: a parameter vector
   - Assumption: eval(p, θ) is differentiable with respect to θ
   - Example (sketched in code below): θ = (a, b)
     - eval(p, θ) = a · #pawns(p) + b · #pieces(p)
     - ∂/∂a eval(p, θ) = #pawns(p)
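The slide's two-parameter example, sketched in Python; the dictionary-based position and the feature counters are placeholders for whatever representation an engine actually uses.

```python
import numpy as np

def features(position):
    """Feature vector (#pawns(p), #pieces(p)) of a position, as in the slide's example."""
    return np.array([position["pawns"], position["pieces"]], dtype=float)

def evaluate(position, theta):
    """eval(p, theta) = a * #pawns(p) + b * #pieces(p) for theta = (a, b)."""
    return float(features(position) @ theta)

def eval_gradient(position, theta):
    """For this linear evaluation the gradient w.r.t. theta is the feature vector itself,
    e.g. d eval / d a = #pawns(p)."""
    return features(position)

p = {"pawns": 9, "pieces": 20}
theta = np.array([1.0, 3.0])               # a = 1, b = 3
print(evaluate(p, theta))                   # 9*1 + 20*3 = 69
print(eval_gradient(p, theta))              # [ 9. 20.]
```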

6. Motivation: machine learning
   - Goal of learning evaluation functions: adjust the Minmax value via θ
   - Comparison training: make the Minmax value of a grandmaster's move better than that of the other legal moves (Nowatzyk 2000, Tesauro 2001, Hoki 2006); a schematic objective is given below
     - Success in shogi: outperformed all hand-tuned evaluation functions
     - How it works: first talk of Session 10 (tomorrow)
   - TDLeaf: make the Minmax value similar to that of future positions (Baxter et al. 2000)
   - Common problem: how to obtain the gradient of the Minmax value?
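One common way to write such a comparison objective down (a schematic form for orientation only; the exact loss and search values used in the cited papers may differ):

$$
J(\theta) \;=\; \sum_{p \in P} \; \sum_{m \neq m^{*}_{p}} T\!\left( s(p \cdot m, \theta) \;-\; s(p \cdot m^{*}_{p}, \theta) \right)
$$

Here P is the set of training positions, m*_p is the grandmaster's move in p, p·m is the position reached by playing m, s(·, θ) is the Minmax value returned by a (shallow) search, and T is a monotone penalty such as a sigmoid, so siblings that search better than the grandmaster's move are penalized. Minimizing J by gradient descent requires ∂s/∂θ_i, i.e. the gradient of a Minmax value discussed next.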

7. Partial derivative of the Minmax value
   [Figure: plot of x², illustrating adjustment by gradient descent]
   - Goal: adjust the Minmax value R of the root
   - Ideal method: update θ along ∂R/∂θ_i
   - Known problem: R is not always partially differentiable
   - Workaround: update θ along ∂L/∂θ_i, the gradient at the leaf L of the PV, instead of ∂R/∂θ_i (see the sketch below)
   - Observation: the Minmax values are equal, R = L (by definition of the PV)
   - Expectation: the gradients are similar, ∂R/∂θ_i = ∂L/∂θ_i
   - Question: how different are ∂R/∂θ_i (root) and ∂L/∂θ_i (PV leaf)?
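The workaround, combined with the earlier linear-evaluation example, can be sketched as follows; the tree encoding and the use of a plain feature vector as the leaf gradient are simplifying assumptions.

```python
import numpy as np

def minmax_with_pv(node, theta, maximizing):
    """Minmax over a tree whose leaves carry feature vectors (leaf value = features . theta).
    Returns (Minmax value, features of the PV leaf); ties break toward the left-most child."""
    if not node["children"]:
        return float(node["features"] @ theta), node["features"]
    best = None
    for child in node["children"]:
        value, feats = minmax_with_pv(child, theta, not maximizing)
        if best is None or (value > best[0] if maximizing else value < best[0]):
            best = (value, feats)
    return best

def surrogate_root_gradient(root, theta):
    """Workaround: use the gradient of eval at the PV leaf L as a stand-in for dR/d theta_i.
    For a linear evaluation that gradient is simply the PV leaf's feature vector."""
    _, pv_features = minmax_with_pv(root, theta, maximizing=True)
    return pv_features

leaf = lambda f: {"children": [], "features": np.array(f, dtype=float)}
root = {"children": [leaf([1.0, 0.0]), leaf([0.0, 1.0])], "features": None}
theta = np.array([2.0, 5.0])
print(minmax_with_pv(root, theta, True))     # (5.0, array([0., 1.])): second leaf is the PV
print(surrogate_root_gradient(root, theta))  # [0. 1.]
```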

8. Example and informal discussion: one child
   - OK: ∂R/∂θ_i = ∂L/∂θ_i
   - With a single child, L is always the PV, whatever its Minmax value
   - The Minmax value of R therefore always equals that of L: if L changes by δ, so does R (L + δ ⇒ R + δ)

9. Example: two children (different leaf values)
   - OK: ∂R/∂θ_i = ∂L/∂θ_i
   - Suppose the sibling has value n = −5: L remains better than n for any perturbation δ with L + δ > n, i.e. δ > n − L
   - L stays the PV over that range, so the Minmax value follows L: R becomes L + δ whenever L + δ > n

10. Example: two children (tie)
    - NG: ∂R/∂θ_i is not defined
    - Suppose n = 0, a tie with L; then for a perturbation δ of L:
      - L is better than n while L + δ > n (δ > 0), so L is the PV and R follows L: R = L + δ
      - n is better than L while L + δ < n (δ < 0), so n is the PV and R no longer follows L: R = n ≠ L + δ
    - Either L or n becomes the PV for δ ≈ 0, and R has a kink at δ = 0
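Restated as a formula (assuming, as the word "better" suggests, that the root is a MAX node and L = n = 0): the root value is R(δ) = max(L + δ, n) = max(δ, 0), and its two one-sided derivatives at δ = 0 disagree:

$$
\lim_{\delta \to 0^{+}} \frac{R(\delta) - R(0)}{\delta} = 1,
\qquad
\lim_{\delta \to 0^{-}} \frac{R(\delta) - R(0)}{\delta} = 0 .
$$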

11. Unique PV ↔ differentiable?
    - True, as expected: a unique PV implies ∂R/∂θ_i = ∂L/∂θ_i (in particular, ∂R/∂θ_i is defined)
    - False: "∂R/∂θ_i is defined" does not imply "the PV is unique and ∂R/∂θ_i = ∂L/∂θ_i"
    - A counterexample exists in which ∂R/∂θ_i is defined but ∂R/∂θ_i ≠ ∂L/∂θ_i (see slide 15)

12. Example: two children (different leaf values)
    - OK: ∂R/∂θ_i = ∂L/∂θ_i
    - If θ_i is changed by Δ (θ_i ← θ_i + Δ), all leaf values (both L and n) change
    - Whatever the gradients of L and n, L remains better than n for sufficiently small |Δ|: there exists a > 0 such that, for |Δ| < a,
      L + (∂L/∂θ_i) · Δ > n + (∂n/∂θ_i) · Δ,
      and hence R(θ_i ← θ_i + Δ) ≈ L + (∂L/∂θ_i) · Δ.

13. Example: two children (tie, same leaf gradient)
    - OK: ∂R/∂θ_i = ∂L/∂θ_i
    - Even if L and n have the same value, R is still differentiable provided L and n also have the same gradient:
      R(θ_i ← θ_i + Δ) ≈ L + (∂L/∂θ_i) · Δ

14. Example: two children (tie, different leaf gradients)
    - NG: ∂R/∂θ_i is not defined
    - When L and n have the same value but different gradients, the change of R depends on whether Δ → +0 or Δ → −0 (numerical check below):
      R(θ_i ← θ_i + Δ) ≈ L + (∂L/∂θ_i) · Δ   (Δ > 0; L is the PV)
      R(θ_i ← θ_i + Δ) ≈ n + (∂n/∂θ_i) · Δ   (Δ < 0; n is the PV)
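A small numerical check of this case, with made-up linear leaves L(θ_i) = θ_i and n(θ_i) = −θ_i that tie at θ_i = 0 with gradients +1 and −1 (illustrative values, not taken from the slide):

```python
def leaf_L(theta_i):          # dL/d theta_i = +1
    return theta_i

def leaf_n(theta_i):          # dn/d theta_i = -1
    return -theta_i

def root(theta_i):            # MAX node over the two tied leaves; equals |theta_i|
    return max(leaf_L(theta_i), leaf_n(theta_i))

delta = 1e-6
right = (root(0.0 + delta) - root(0.0)) / delta      # quotient from the right: +1
left = (root(0.0 - delta) - root(0.0)) / (-delta)    # quotient from the left:  -1
print(right, left)   # the two one-sided limits disagree, so dR/d theta_i is undefined at 0
```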

15. Example: ∂L/∂θ_i hidden by others
    - NG: ∂R/∂θ_i ≠ ∂L/∂θ_i (both are defined, but they differ)
    - In the slide's tree, ∂L/∂θ_i = 1 while ∂R/∂θ_i = 0
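One tree that produces exactly these numbers (an illustrative construction, not necessarily the tree drawn on the slide): a MAX root over two MIN nodes, where the left MIN node contains the PV leaf L (gradient 1) together with a tied constant sibling, and the right MIN child is a constant leaf. Whichever way θ_i moves, the root value stays at 0, so ∂R/∂θ_i = 0 even though ∂L/∂θ_i = 1.

```python
def leaf_L(d):      # PV leaf: value d, so dL/d theta_i = 1 (value 0 at d = 0)
    return d

def leaf_a(d):      # tied sibling of L under the same MIN node: constant 0, gradient 0
    return 0.0

def leaf_b(d):      # leaf under the root's other MIN child: constant 0, gradient 0
    return 0.0

def root(d):        # R = MAX( MIN(L, a), b ) = max(min(d, 0), 0) = 0 for every d
    return max(min(leaf_L(d), leaf_a(d)), leaf_b(d))

delta = 1e-6
print((root(+delta) - root(0.0)) / +delta)   # 0.0: right derivative of R
print((root(-delta) - root(0.0)) / -delta)   # 0.0: left derivative of R
# Both one-sided derivatives are 0, so dR/d theta_i = 0, while the left-most PV
# (root -> MIN(L, a) -> L) ends in L with dL/d theta_i = 1.
```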

16. Practical issues and experiments
    - (1) How frequently does a non-differentiable R occur?
      - Upper bounds are estimated over training positions by counting multiple PVs, and different gradients within multiple PVs
    - (2) Is 1 small enough as the update step Δ?
      - Δ ≥ 1 is forced for integer parameters, whereas the "∀ε > 0, ∃δ > 0" argument behind gradient descent assumes real-valued parameters and arbitrarily small steps
      - Measured: how frequently the objective function J improves when updated along ∂J/∂θ_i with Δ = 1, 2, 4, and 8 (see the proceedings)

17. Experiments in shogi: evaluation functions
    - Practical evaluation functions
      - Learnt: the main evaluation function of GPSShogi revision 2590; near optimal, obtained by learning ≈ 1.4 (8) million parameters
      - Hand-tuned: the old evaluation function used until 2008; reasonable, but far from optimal
    - Poor evaluation functions
      - Piece: the initial values used for learning; the same piece values as Learnt, 0 for all other parameters
      - Piece128: extreme initial values; 128 for piece values, 0 for all other parameters
    - GPSShogi: open source, winner of CO 2011
      http://gps.tanaka.ecc.u-tokyo.ac.jp/gpsshogi/index.php?GPSShogiEn

18. Statistics: number of legal moves and number of moves with similar evaluation
    [Figure: average number of moves vs. move number; series: all legal moves, and sibling moves within an αβ window for Learnt, Hand-tuned, Piece, and Piece128]
    - Legal moves: ≈ 20 in the opening, ≈ 130 in the endgame
    - Practical evaluation functions (Learnt, Hand-tuned): ≈ 20 moves fall within an αβ window of 2 pawns, in both the opening and the endgame
    - Poor evaluation functions (Piece, Piece128): ≈ 40 moves or more fall within an αβ window of 2 pawns

19. Frequency: number of PVs
    [Figure: cumulative frequency (%) of the number of PVs (1 to 4) for Learnt, Hand-tuned, Piece, and Piece128]
    - Practical evaluation functions: the PV is almost always unique
      - Learnt: a unique PV in almost all positions
      - Hand-tuned: a unique PV in more than 80% of positions; more than 2 PVs in fewer than 4% of positions
    - Poor evaluation functions: the PV is rarely unique
      - Piece: multiple PVs in more than 86% of positions
      - Piece128: multiple PVs in more than 99% of positions
