Root-finding and Optimization, Ryan Martin, UIC (PowerPoint presentation)



SLIDE 1

Stat 451 Lecture Notes 02
Root-finding and Optimization

Ryan Martin, UIC
www.math.uic.edu/~rgmartin

Based on parts of: Dalgaard's ISwR book, Chapter 1 in Givens & Hoeting, and Chapter 7 of Lange

Updated: February 8, 2016 (1 / 49)

SLIDE 2

Outline

1. Introduction
2. Univariate problems
3. Multivariate problems
4. Other miscellaneous things

SLIDE 3

Motivation

In statistical applications, point estimation problems often boil down to maximizing a function:

- maximum likelihood
- least squares
- maximum a posteriori

When the function to be optimized is "smooth," we can reformulate the optimization as a root-finding problem. Trouble: these problems often have no analytical solution, so we need numerical tools to solve them.

SLIDE 4

General setup

Two kinds of problems:

- Root-finding: solve f(x) = 0 for x ∈ R^d, d ≥ 1.
- Optimization: maximize g(x) for x ∈ R^d, d ≥ 1.

The two are equivalent when f is the derivative of g. We will address the univariate and multivariate cases separately. Methods construct a sequence {x_t : t ≥ 0} designed to converge (as t → ∞) to the solution, denoted by x⋆.

SLIDE 5

General setup, cont.

Theoretical considerations:

- Under what conditions on f (or g) and the initial guess x0 can we prove that x_t → x⋆?
- If x_t → x⋆, then how fast? That is, what is the convergence order?

Practical considerations:

- How do we write and implement the algorithm?
- We can't run the algorithm till t = ∞, so how do we stop?

SLIDE 6

Outline

1. Introduction
2. Univariate problems
   - Bisection
   - Newton's method
   - Fisher scoring
   - Secant method
   - Fixed-point iteration
   - Available functions in R
3. Multivariate problems
4. Other miscellaneous things

SLIDE 8

Bisection – basic idea

Find the unique root x⋆ of f in the interval [a, b]. Claim: f(a)f(b) ≤ 0. Why?

Pick an initial guess x0 = (a + b)/2. Then x⋆ must be in either [a, x0] or [x0, b]. Evaluate f(x) at the endpoints to determine which one. The selected interval, call it [a1, b1], is now just like the initial interval; i.e., we know it must contain x⋆. Take x1 = (a1 + b1)/2. Continue this process to construct a sequence {x_t : t ≥ 0}.

SLIDE 9

Bisection algorithm

Assume f (x) and the interval [a, b] are given.

1. Set x = (a + b)/2.
2. If f(a)f(x) ≤ 0, then set b = x; else set a = x.
3. If "converged," then stop; otherwise return to Step 1.

The convergence criterion is usually something like |x_new − x_old| < ε, where ε is a specified small number, e.g., ε = 10^(−7). A relative convergence criterion might be better: |x_new − x_old| / |x_old| < ε.
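The course's companion code is in R; as a language-neutral illustration, here is a minimal Python sketch of the three steps above with the absolute stopping rule (the function name and test function are hypothetical):

```python
def bisection(f, a, b, eps=1e-7, max_iter=200):
    """Find a root of f in [a, b], assuming f(a) * f(b) <= 0."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x_old = a
    for _ in range(max_iter):
        x = (a + b) / 2            # Step 1: midpoint
        if f(a) * f(x) <= 0:       # Step 2: root is in [a, x]
            b = x
        else:                      # otherwise root is in [x, b]
            a = x
        if abs(x - x_old) < eps:   # Step 3: absolute stopping rule
            return x
        x_old = x
    return x

# Example: the root of f(x) = x**2 - 2 on [0, 2] is sqrt(2)
root = bisection(lambda x: x**2 - 2, 0.0, 2.0)
```

Note that each iteration halves the bounding interval, so roughly log2((b − a)/ε) iterations are needed.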

SLIDE 10

Bisection theory

Claim: if f is continuous, then x_t → x⋆. Proof:

If [a_t, b_t] is the bounding interval at step t, then f(a_t)f(b_t) ≤ 0 and lim_{t→∞} a_t = lim_{t→∞} b_t. So x_t converges to some x_∞, and by continuity f(x_∞)² ≤ 0. Then f(x_∞) = 0 and, since x⋆ is the unique root, x_∞ = x⋆.

Convergence holds under very mild conditions on f, but this robustness comes at the price of the order of convergence.

SLIDE 11

Examples

Find x⋆ to maximize the function g(x) = log(x)/(1 + x), x ∈ [1, 5]. Note that g′(x) = (1 + x^(−1) − log x)/(1 + x)².

Find the 100p-th percentile of a Student-t distribution, i.e., find x⋆ such that F(x⋆) = p, where F is the t_ν distribution function, with df = ν fixed.
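For the first example, the maximizer satisfies g′(x⋆) = 0, and since (1 + x)² > 0, this reduces to finding the root of the numerator h(x) = 1 + 1/x − log x on [1, 5]. A hedged Python sketch (the course code itself is in R), with bisection written inline:

```python
import math

def h(x):
    # numerator of g'(x); g is maximized where this vanishes
    return 1 + 1/x - math.log(x)

a, b = 1.0, 5.0        # h(1) > 0 and h(5) < 0, so a root lies inside
while b - a > 1e-10:
    x = (a + b) / 2
    if h(a) * h(x) <= 0:
        b = x
    else:
        a = x
x_star = (a + b) / 2   # maximizer of g, roughly 3.59
```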

SLIDE 13

Basic idea

Newton's method is usually presented in a calculus class. The idea is to approximate a nonlinear function near its root by a linear function, which can be solved by hand. Recall that Taylor's theorem gives the linear approximation of a function f(x) in a neighborhood of a point x0 as f(x) ≈ f(x0) + f′(x0)(x − x0). Setting this approximation equal to 0 and solving gives x = x0 − f(x0)/f′(x0).

SLIDE 14

Newton method – algorithm

Assume the function f(x), its derivative f′(x), and an initial guess x0 are given. Set t = 0.

1. Calculate x_{t+1} = x_t − f(x_t)/f′(x_t).
2. If the convergence criterion is met, then stop; otherwise, set t ← t + 1 and return to Step 1.

Warnings:

- Convergence depends on the choice of x0.
- Unlike bisection, Newton might not converge!
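The update rule above can be sketched in a few lines of Python (a translation of the slides' R-based approach; names are illustrative):

```python
def newton(f, fprime, x0, eps=1e-10, max_iter=100):
    """Newton's method for a root of f, starting from x0."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)   # Step 1: Newton update
        if abs(x_new - x) < eps:       # Step 2: stopping rule
            return x_new
        x = x_new
    raise RuntimeError("Newton's method did not converge")

# Example: f(x) = x**2 - 2 with f'(x) = 2*x, starting at x0 = 1;
# this is the classical iteration for computing sqrt(2)
root = newton(lambda x: x**2 - 2, lambda x: 2*x, 1.0)
```

Raising an error on non-convergence reflects the warning above: unlike bisection, there is no guarantee without a good x0.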

SLIDE 15

Newton method – theory

Claim: If f′′ is continuous and x⋆ is a root of f with f′(x⋆) ≠ 0, then there exists a neighborhood N of x⋆ such that Newton's method converges to x⋆ for any x0 ∈ N. The proof uses a Taylor approximation (given in class). The proof also shows that the convergence order is quadratic. Other results about Newton's method are available; see HW. If Newton converges, then it's faster than bisection, but the added speed has a cost:

- it requires differentiability and the derivative f′;
- convergence is sensitive to the choice of x0.

SLIDE 17

Fisher scoring

In maximum likelihood applications, the goal is to find roots of the derivative of the log-likelihood function, i.e., to solve ℓ′(θ̂) = 0. In this context, Newton's method looks like θ_{t+1} = θ_t − ℓ′(θ_t)/ℓ′′(θ_t), t ≥ 0. But recall that −ℓ′′(θ) is an approximation of the Fisher information I_n(θ). So we can rewrite Newton's method as θ_{t+1} = θ_t + ℓ′(θ_t)/I_n(θ_t), t ≥ 0. This modification is called Fisher scoring.
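As a concrete (assumed) example, take X1, ..., Xn iid Exponential with rate θ, so ℓ′(θ) = n/θ − Σ Xi, I_n(θ) = n/θ², and the MLE is 1/x̄. A hedged Python sketch of the scoring iteration (the course materials use R; the data are made up):

```python
def fisher_scoring(data, theta0, eps=1e-10, max_iter=100):
    """Fisher scoring for the rate theta of an Exponential(theta) model."""
    n, s = len(data), sum(data)
    theta = theta0
    for _ in range(max_iter):
        score = n / theta - s        # l'(theta)
        info = n / theta**2          # Fisher information I_n(theta)
        theta_new = theta + score / info
        if abs(theta_new - theta) < eps:
            return theta_new
        theta = theta_new
    return theta

data = [1.2, 0.7, 2.5, 3.1, 0.5]     # sample mean 1.6, so MLE = 1/1.6
mle = fisher_scoring(data, theta0=0.3)
```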

SLIDE 19

Secant method – basic idea

Newton's method requires a formula for f′(x). To avoid this, approximate f′(x) by a difference ratio. That is, recall from calculus that f′(x) ≈ (f(x + h) − f(x))/h for h small and positive. The secant method then follows Newton's method exactly, except that it substitutes a difference ratio for f′(x). The name comes from the fact that the linear approximation uses a secant line rather than a tangent line.

SLIDE 20

Secant method – algorithm

Suppose f(x) and two initial guesses x0, x1 are given. Set t = 1.

1. Calculate
   x_{t+1} = x_t − f(x_t) · (x_t − x_{t−1}) / (f(x_t) − f(x_{t−1})).
2. If the convergence criteria are satisfied, then stop; else, set t ← t + 1 and return to Step 1.

Two initial guesses are needed because the difference ratio in the first iteration requires two values. The method can be unstable in early iterations because the difference ratio may be a poor approximation of f′; this is a reasonable sacrifice when f′ is not available. If the secant method converges, its order is almost quadratic (approximately 1.6).
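A minimal Python sketch of the iteration above (an illustrative translation; the course code is in R):

```python
def secant(f, x0, x1, eps=1e-10, max_iter=100):
    """Secant method: Newton's method with a difference-ratio slope."""
    for _ in range(max_iter):
        slope = (f(x1) - f(x0)) / (x1 - x0)   # approximates f'(x1)
        x2 = x1 - f(x1) / slope
        if abs(x2 - x1) < eps:
            return x2
        x0, x1 = x1, x2                       # keep the two newest points
    raise RuntimeError("secant method did not converge")

# Example: root of f(x) = x**2 - 2, with initial guesses 1 and 2
root = secant(lambda x: x**2 - 2, 1.0, 2.0)
```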

SLIDE 22

Fixed-point iteration – basic idea

Some problems require finding a fixed point, i.e., a point x⋆ such that F(x⋆) = x⋆. A root-finding problem can be written as a fixed-point problem with F(x) = f(x) + x. If the function F(x) is a contraction, i.e., |F(x) − F(y)| ≤ λ|x − y| for some λ ∈ (0, 1), then the point F(x) will be closer to x⋆ = F(x⋆) than x is. Banach's fixed-point theorem says:

- a contraction mapping has a unique fixed point x⋆, and
- from any starting point x0, the iterates x_{t+1} = F(x_t) converge to x⋆.

SLIDE 23

Fixed-point iteration – algorithm

Suppose F(x) and an initial guess x0 are given. Set t = 0.

1. Calculate x_{t+1} = F(x_t).
2. If the convergence criteria are met, then stop; else, set t ← t + 1 and return to Step 1.

It can be shown that |F(x_t) − x⋆| ≤ λ^t/(1 − λ) · |x0 − x⋆|, so fixed-point iteration converges at a geometric rate. If using fixed-point methods for root-finding, F(x) = f(x) + x may not be the best choice; e.g., maybe a scaled version, F(x) = αf(x) + x for some α, would be better.

SLIDE 24

Example – Kepler’s equation

Kepler's equation in orbital mechanics says x = M + ε sin x, where M and ε ∈ (0, 1) are fixed quantities. Our goal is to solve for x, given M and ε. Write F(x) = M + ε sin x. Then F′(x) = ε cos x, and |F′(x)| is uniformly bounded by ε. So F is a contraction, and fixed-point iteration will converge to the solution of Kepler's equation.

See the Wikipedia page on Kepler's equation for more info.
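The contraction argument translates directly into code; here is a Python sketch of the fixed-point iteration for Kepler's equation (the values of M and ε are made up for illustration):

```python
import math

def solve_kepler(M, ecc, eps=1e-12, max_iter=1000):
    """Solve x = M + ecc*sin(x) by fixed-point iteration (0 < ecc < 1)."""
    x = M                                # starting guess
    for _ in range(max_iter):
        x_new = M + ecc * math.sin(x)    # x_{t+1} = F(x_t)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return x

x = solve_kepler(M=1.0, ecc=0.5)
```

Since λ = ε here, smaller eccentricities converge faster, consistent with the geometric-rate bound on the previous slide.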

SLIDE 26

Root-finding and optimization in R

Univariate problems:

- uniroot does root-finding.
- optimize does optimization.

See the documentation files (and my code) for details.

Multivariate problems:

- nlm does non-linear minimization with Newton-like methods.
- optim is maybe a better choice.

More on these later.

SLIDE 27

A word about constraints

The methods built in to R are not particularly good at handling optimization problems where the parameter x is constrained, e.g., if x must be non-negative. The built-in routines assume x has no constraints, so to be safe you may want to write your functions this way. For example, if x is required to be non-negative, perform the optimization on the function g(y) = f(e^y) over unconstrained y, and then convert back to x via the formula x = e^y. Don't forget: re-parametrization will affect derivatives!
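Here is a hedged sketch of this trick in Python, for the assumed objective f(x) = x − log x (defined only for x > 0, minimized at x = 1): substitute x = e^y, so h(y) = e^y − y is unconstrained in y, with h′(y) = e^y − 1 and h′′(y) = e^y, and run Newton on h:

```python
import math

def f(x):
    # objective defined only for x > 0; minimized at x = 1
    return x - math.log(x)

# Re-parametrize x = exp(y): h(y) = f(exp(y)) = exp(y) - y is
# unconstrained in y.  Newton's method on h'(y) = exp(y) - 1:
y = 1.0
for _ in range(50):
    grad = math.exp(y) - 1.0       # h'(y)
    hess = math.exp(y)             # h''(y)
    y_new = y - grad / hess
    if abs(y_new - y) < 1e-12:
        y = y_new
        break
    y = y_new

x_star = math.exp(y)               # convert back: x = exp(y) is always > 0
```

Note how the chain rule shows up: the gradient of h is e^y · f′(e^y), which is what "re-parametrization will affect derivatives" warns about.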

SLIDE 28

Outline

1. Introduction
2. Univariate problems
3. Multivariate problems
   - Newton's method
   - Newton-like methods
   - Gauss–Newton method
   - Optimization in R
4. Other miscellaneous things

SLIDE 30

Newton’s method – more than one variable

Suppose now that g(x) is a function of several variables, say x = (x1, x2, . . . , xp) ∈ R^p. Newton's method works exactly the same as before; only the derivatives are more complicated:

- ġ(x) is the gradient, the vector of first partial derivatives;
- g̈(x) is the Hessian, the matrix of second partial derivatives.

Based on Taylor's formula again, Newton's method is x^(t+1) = x^(t) − g̈(x^(t))^(−1) ġ(x^(t)). If we are maximizing a log-likelihood ℓ(θ), then the Fisher scoring adjustment is just like before.
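A minimal two-dimensional sketch in Python, without matrix libraries, for the assumed function g(x, y) = x⁴ + y⁴ + xy: its gradient (4x³ + y, 4y³ + x) vanishes exactly at (1/2, −1/2), and the 2×2 Newton system is solved by hand via Cramer's rule (the function and starting point are illustrative choices, not from the slides):

```python
def newton2d(x, y, eps=1e-12, max_iter=50):
    """Newton's method on the gradient of g(x, y) = x**4 + y**4 + x*y."""
    for _ in range(max_iter):
        gx = 4 * x**3 + y                  # dg/dx
        gy = 4 * y**3 + x                  # dg/dy
        hxx, hxy, hyy = 12 * x**2, 1.0, 12 * y**2   # Hessian entries
        det = hxx * hyy - hxy * hxy
        dx = (gx * hyy - gy * hxy) / det   # solve H d = grad (Cramer's rule)
        dy = (hxx * gy - hxy * gx) / det
        x, y = x - dx, y - dy              # Newton update
        if abs(dx) < eps and abs(dy) < eps:
            break
    return x, y

# g has a stationary point at exactly (1/2, -1/2); start nearby
x, y = newton2d(0.6, -0.4)
```

In higher dimensions one would solve the linear system g̈(x^(t)) d = ġ(x^(t)) with a proper linear-algebra routine rather than inverting the Hessian explicitly.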

SLIDE 31

Example: gamma distribution MLE

X1, . . . , Xn iid ∼ Gamma(α, β), where both α and β are unknown parameters to be estimated. The density function is p_{α,β}(x) = [β^α / Γ(α)] x^(α−1) e^(−βx), x ≥ 0. The log-likelihood function is (effectively)

ℓ(α, β) = nα log β − n log Γ(α) + α Σ_{i=1}^n log Xi − β Σ_{i=1}^n Xi.

Find the first and second derivatives of ℓ(α, β). Code online implements Newton's method to find the MLE.

SLIDE 33

Motivation for alternatives

Newton's method is a very good technique for both univariate and multivariate optimization. The difficulty in the multivariate case is the derivation and/or computation of the Hessian matrix and its inverse. Is it possible to use some other matrix, say M^(t), in place of the Hessian g̈(x^(t))? Yes, and we will discuss a few such methods:

- ascent methods
- discrete Newton and fixed-point methods
- quasi-Newton methods

SLIDE 34

Ascent methods

Fix matrices M^(t) and numbers α^(t), t ≥ 0. Ascent methods look like x^(t+1) = x^(t) − α^(t) [M^(t)]^(−1) ġ(x^(t)). The goal is to choose M^(t) and α^(t) such that the function increases when x^(t) is updated to x^(t+1). It follows from Taylor's formula that, if −M^(t) is positive definite and α^(t) is sufficiently small, then ascent holds.

SLIDE 35

Ascent methods (cont.)

The method of steepest ascent takes M^(t) ≡ −I. The motivation is the basic fact from multivariable calculus that the gradient points in the direction of steepest ascent. Then the algorithm looks like x^(t+1) = x^(t) + α^(t) ġ(x^(t)), t ≥ 0. How to pick a good α^(t)? A backtracking approach determines α^(t) iteratively:

1. Start with α^(t) = 1.
2. Compute the update x^(t+1) with this α^(t).
3. If ascent holds, then increment t; otherwise, set α^(t) ← α^(t)/2 and go back to Step 2.
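The backtracking loop above can be sketched in Python for an assumed concave objective g(x, y) = −(x − 2)² − 2(y + 1)², maximized at (2, −1) (objective and starting point are illustrative):

```python
def g(x, y):
    return -(x - 2)**2 - 2*(y + 1)**2     # maximized at (2, -1)

def grad(x, y):
    return (-2*(x - 2), -4*(y + 1))

x, y = 0.0, 0.0
for _ in range(100):
    gx, gy = grad(x, y)
    if gx*gx + gy*gy < 1e-16:             # gradient ~ 0: at the maximizer
        break
    alpha = 1.0                           # Step 1: start with alpha = 1
    while g(x + alpha*gx, y + alpha*gy) <= g(x, y):
        alpha /= 2                        # Step 3: halve until ascent holds
    x, y = x + alpha*gx, y + alpha*gy     # Step 2: accept the update
```

The gradient check is needed to stop: at the maximizer no step size can produce strict ascent, so the halving loop would otherwise never exit.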

SLIDE 36

Ascent methods (cont.)

Claim: If α^(t) is sufficiently small, then ascent holds. Sketch of a proof:

A two-term Taylor approximation of g(x^(t+1)) near x^(t) gives

g(x^(t+1)) = g(x^(t)) + ġ(x^(t))ᵀ(x^(t+1) − x^(t)) + (1/2)(x^(t+1) − x^(t))ᵀ g̈(x̃)(x^(t+1) − x^(t)),

where x̃ is between x^(t+1) and x^(t). Plugging in the definition x^(t+1) = x^(t) + α^(t) ġ(x^(t)), the difference g(x^(t+1)) − g(x^(t)) equals

α^(t) ‖ġ(x^(t))‖² + (1/2)(α^(t))² ġ(x^(t))ᵀ g̈(x̃) ġ(x^(t)).

The second term is ≥ c (α^(t))² ‖ġ(x^(t))‖², where c ∈ (−∞, ∞) (possibly negative). Make α^(t) small enough that the combined bound α^(t)(1 + c α^(t)) ‖ġ(x^(t))‖² is positive.

SLIDE 37

Discrete Newton and fixed point methods

If we approximate g by a quadratic, then we can get a modified Newton method as a fixed-point method. For a fixed matrix M, write x^(t+1) = x^(t) − M^(−1) ġ(x^(t)). A reasonable choice is M = g̈(x^(0)). Replacing the Hessian g̈(x) in Newton's method with a discrete approximation (using difference ratios) gives a discrete Newton method. This can be expensive: each step requires lots of difference ratios.

SLIDE 38

Quasi-Newton methods

Recall that the general idea is to replace the Hessian with some reasonable approximation. The methods so far have not made a serious attempt to capture any real information about g in the matrix M^(t). How to ensure that M^(t) somehow approximates the Hessian? A secant condition can do the job:

ġ(x^(t+1)) − ġ(x^(t)) = M^(t+1)(x^(t+1) − x^(t)).

How do we construct a matrix sequence M^(t) that satisfies this?

SLIDE 39

Quasi-Newton methods (cont.)

There are classes of matrices that satisfy the secant condition. There is a unique symmetric rank-one update:

M^(t+1) = M^(t) + c^(t) v^(t) (v^(t))ᵀ,

where

z^(t) = x^(t+1) − x^(t)
y^(t) = ġ(x^(t+1)) − ġ(x^(t))
v^(t) = y^(t) − M^(t) z^(t)
c^(t) = 1 / (v^(t))ᵀ z^(t).

The go-to approach is a rank-two update, called BFGS. The formula is messy; see Equation (2.50) in the textbook. My code online implements BFGS; more on R below.

SLIDE 40

Example: Problem 2.3 in G&H

A survival analysis problem, with censored data. The data are (Yi, Xi, Wi), where

- Yi is the recorded survival time,
- Xi is a treatment-versus-control indicator,
- Wi is a real-versus-censored survival time indicator.

The proportional hazards model gives log-likelihood

ℓ(θ) = Σ_{i=1}^n { Wi log(µi) − µi + Wi log(α Yi^(−α)) },

where µi = Yi^α e^(β0 + β1 Xi).

Goal: find the MLE of θ = (α, β0, β1).

SLIDE 42

Least squares

Suppose that the function to maximize is quadratic, e.g., g(x) = −‖y − Ax‖². We can solve this one analytically: x = (AᵀA)^(−1) Aᵀ y. This is the least squares solution you may have seen in a linear algebra or numerical analysis course.

SLIDE 43

Gauss–Newton for least squares

In the previous slide, the goal basically was to approximate y by a linear function of x. What if the function is non-linear? Gauss–Newton method:

- Consider g(θ) = −Σ_{i=1}^n {yi − f_θ(xi)}².
- Fix θ0 and approximate θ ↦ f_θ(xi) by a linear function, i.e., f_θ(xi) ≈ f_{θ0}(xi) + ḟ_{θ0}(xi)(θ − θ0).
- Plug this in for f_θ(xi) in g(θ) and note the similarity to the least squares problem.
- Solve analytically for θ; call the solution θ1 and repeat.
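For a one-parameter illustration, the linearize-solve-repeat recipe above reduces to θ1 = θ0 + Σᵢ ḟ_{θ0}(xi) rᵢ / Σᵢ ḟ_{θ0}(xi)², where rᵢ = yᵢ − f_{θ0}(xi). Here is a hedged Python sketch for an assumed model f_θ(x) = e^(θx), with data generated exactly from θ = 0.7 (model, data, and starting value are all made up):

```python
import math

# Data generated exactly from y = exp(0.7 * x), so Gauss-Newton
# should recover theta = 0.7
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]

theta = 0.2                                     # initial guess theta0
for _ in range(100):
    resid = [y - math.exp(theta * x) for x, y in zip(xs, ys)]
    deriv = [x * math.exp(theta * x) for x in xs]   # d f_theta / d theta
    # linearized least squares step: theta1 = theta0 + (J'r) / (J'J)
    step = sum(d * r for d, r in zip(deriv, resid)) / sum(d * d for d in deriv)
    theta += step
    if abs(step) < 1e-12:
        break
```

With p parameters, the step becomes a p-dimensional least squares solve (JᵀJ)^(−1) Jᵀ r, mirroring the analytic solution on the previous slide.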

SLIDE 45

Built-in functions in R

R has two built-in functions for optimization:

- nlm for non-linear minimization;
- optim for general-purpose optimization.

Both functions are designed to do minimization, so to maximize g, minimize −g. I don't use nlm much; mostly optim with method='BFGS'. See my R code online, and also the documentation on optim.

SLIDE 47

Root-finding with noise

The tools described above all require that the function can be evaluated exactly. However, there are some problems where there is error in evaluating the function; e.g., maybe we can only get a Monte Carlo approximation of the function. In such cases, Newton-like methods cannot be used directly. A neat generalization of Newton methods to handle noisy functions is called stochastic approximation. We may discuss this briefly in the Monte Carlo section.

SLIDE 48

Non-differentiable functions

The methods described above are all based on the assumption that the function g(x) to be optimized has at least one derivative. But there are problems where this assumption does not hold:

- quantile regression;
- regularized regression with, say, the lasso.

For these problems, different tools are needed, e.g.,

- linear programming
- iteratively re-weighted least squares

SLIDE 49

Functions on discrete spaces

Non-differentiability is one thing, but what if the function is only defined on a discrete space? In this case, the derivative doesn't even make sense. If there are only a few possible x values, then of course it's easy to find the maximum. But what if there are billions of points? It's not unreasonable to have problems with 2^50 ≈ 10^15 points. In such cases it's impossible to search them all! These are called combinatorial optimization problems, and one interesting algorithm is called simulated annealing. Chapter 3 in the textbook discusses these things.
