

  1. 15-780 – Graduate Artificial Intelligence: Optimization
     J. Zico Kolter (this lecture) and Ariel Procaccia
     Carnegie Mellon University, Spring 2017

  2. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  3. Logistics
     - HW0: some unintentional ambiguity about the "no late days" criteria
     - To be clear, in all future assignments, the policy is:
       - You have 5 late days, no more than 2 on any assignment
       - If you use up your five late days, you will receive 20% off per day for those two days
       - If you submit any homework more than 2 days late, you will receive zero credit
     - All homework, both programming and written portions, must be written up independently
     - All students who submitted HW0 have been taken off the waitlist

  4. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  5. Continuous optimization
     - The problems we have seen so far in class (i.e., search) involve making decisions over a discrete space of choices
     - An amazing property:

                               Discrete search    (Convex) optimization
         Variables             Discrete           Continuous
         # Solutions           Finite             Infinite
         Solution complexity   Exponential        Polynomial

     - One of the most significant trends in AI in the past 15 years has been the integration of optimization methods throughout the field

  6. Optimization definitions
     - We'll write optimization problems like this:

         minimize_x  f(x)
         subject to  x ∈ C

       which should be interpreted to mean: we want to find the value of x that achieves the smallest possible value of f(x), out of all points in C
     - Important terms:
       - x ∈ ℝ^n – optimization variable (vector with n real-valued entries)
       - f : ℝ^n → ℝ – optimization objective
       - C ⊆ ℝ^n – constraint set
       - x⋆ ≡ argmin_{x ∈ C} f(x) – optimal solution
       - f⋆ ≡ f(x⋆) ≡ min_{x ∈ C} f(x) – optimal objective
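
To make the notation concrete, here is a minimal sketch (assuming NumPy and SciPy are available) of handing a tiny problem of this form to an off-the-shelf solver; the particular objective f(x) = ‖x − c‖₂² and the nonnegative-orthant constraint set are illustrative choices, not examples from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative objective: f(x) = ||x - c||_2^2 for a fixed target point c
c = np.array([1.0, -2.0])

def f(x):
    return np.sum((x - c) ** 2)

# Constraint set C: the nonnegative orthant {x : x >= 0}, expressed as bounds
bounds = [(0.0, None), (0.0, None)]

# Solve  minimize_x f(x) subject to x in C,  starting from a feasible point
result = minimize(f, x0=np.zeros(2), bounds=bounds)
print("x* =", result.x)    # approximately [1, 0]: the projection of c onto C
print("f* =", result.fun)
```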

  7. Example: Weber point
     - Given a collection of cities (assumed to lie on the 2D plane), how can we find the location that minimizes the sum of distances to all cities?
     - Denote the locations of the cities as y_1, …, y_m
     - Write as the optimization problem:

         minimize_x  ∑_{i=1}^{m} ‖x − y_i‖_2
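
One classical way to compute the Weber point is Weiszfeld's fixed-point iteration; the sketch below (assuming NumPy, with made-up city coordinates) is purely illustrative and is not necessarily the method used in the lecture.

```python
import numpy as np

# Hypothetical city locations y_1, ..., y_m on the 2D plane
cities = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 4.0]])

def weber_objective(x, ys):
    """Sum of Euclidean distances from x to every city."""
    return np.sum(np.linalg.norm(ys - x, axis=1))

def weiszfeld(ys, iters=200, eps=1e-9):
    """Weiszfeld's fixed-point iteration for the Weber point."""
    x = ys.mean(axis=0)                  # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(ys - x, axis=1)
        d = np.maximum(d, eps)           # avoid division by zero at a city
        w = 1.0 / d
        x = (w[:, None] * ys).sum(axis=0) / w.sum()
    return x

x_star = weiszfeld(cities)
print("Weber point:", x_star, "objective:", weber_objective(x_star, cities))
```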

  8. Example: image deblurring
     [Figure from (O'Connor and Vandenberghe, 2014): (a) Original image. (b) Blurry, noisy image. (c) Restored image.]
     - Given a corrupted image Y ∈ ℝ^{m×n}, reconstruct the image by solving the optimization problem:

         minimize_X  ∑_{i,j} (Y_{ij} − (K ∗ X)_{ij})^2 + μ ∑_{i,j} ((X_{i,j} − X_{i,j+1})^2 + (X_{i+1,j} − X_{ij})^2)^{1/2}

       where K ∗ denotes convolution with a blurring filter
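
As a rough sketch of how such a problem can be handed to an off-the-shelf solver (assuming CVXPY and NumPy), the snippet below drops the blur operator K, which reduces the problem to total-variation denoising using CVXPY's built-in tv atom; the image data and the weight μ are placeholders, and this is a simplification of the slide's full deblurring formulation.

```python
import numpy as np
import cvxpy as cp

# Hypothetical noisy grayscale image Y (random data, purely for illustration)
m, n = 64, 64
Y = np.random.rand(m, n)
mu = 0.1                      # regularization weight, an illustrative value

X = cp.Variable((m, n))
# Data-fidelity term plus total-variation smoothness penalty
objective = cp.Minimize(cp.sum_squares(X - Y) + mu * cp.tv(X))
cp.Problem(objective).solve()

X_restored = X.value          # the restored image as a NumPy array
```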

  9. Example: robot trajectory planning
     - Many robotic planning tasks are more complex than shortest path, e.g. they involve robot dynamics and require "smooth" controls
     - Common to formulate the planning problem as an optimization task
     - Robot state x_t and control inputs u_t:

         minimize_{x_{1:T}, u_{1:T−1}}  ∑_{t=1}^{T−1} ‖u_t‖_2^2
         subject to  x_{t+1} = f_dynamics(x_t, u_t)
                     x_t ∈ FreeSpace, ∀t
                     x_1 = x_init,  x_T = x_goal

       Figure from (Schulman et al., 2014)
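
A hedged sketch of this formulation (assuming CVXPY and NumPy): to keep the problem convex it uses hypothetical linear double-integrator dynamics x_{t+1} = A x_t + B u_t and omits the FreeSpace constraint, which in practice is what makes real trajectory planning hard.

```python
import numpy as np
import cvxpy as cp

T = 20                                   # horizon length (illustrative)
dt = 0.1
# Double-integrator dynamics in 2D: state = [position, velocity]
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])

x = cp.Variable((T, 4))                  # states x_1, ..., x_T
u = cp.Variable((T - 1, 2))              # controls u_1, ..., u_{T-1}

x_init = np.array([0.0, 0.0, 0.0, 0.0])
x_goal = np.array([1.0, 1.0, 0.0, 0.0])

constraints = [x[0] == x_init, x[T - 1] == x_goal]
for t in range(T - 1):
    constraints.append(x[t + 1] == A @ x[t] + B @ u[t])   # dynamics

# Minimize the sum of squared control effort, as on the slide
problem = cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints)
problem.solve()
print("optimal control cost:", problem.value)
```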

  10. Example: machine learning
     - As we will see in much more detail shortly, virtually all (supervised) machine learning algorithms boil down to solving an optimization problem

         minimize_θ  ∑_{i=1}^{m} ℓ(h_θ(x_i), y_i)

       where x_i ∈ 𝒳 are inputs, y_i ∈ 𝒴 are outputs, ℓ is a loss function, and h_θ is a hypothesis function parameterized by θ, the parameters of the model we are optimizing over
     - Much more on this soon
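
For instance, least-squares regression fits this template with h_θ(x) = θᵀx and ℓ(h, y) = (h − y)²; the sketch below (assuming NumPy, with synthetic data) minimizes the summed loss by plain gradient descent, as an illustration rather than as the specific algorithm the course will use.

```python
import numpy as np

# Synthetic regression data: inputs x_i in R^3, outputs y_i in R
np.random.seed(0)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

# Hypothesis h_theta(x) = theta^T x, squared-error loss l(h, y) = (h - y)^2
def loss(theta):
    return np.sum((X @ theta - y) ** 2)

# Plain gradient descent on the summed loss
theta = np.zeros(3)
step = 1e-3
for _ in range(1000):
    grad = 2 * X.T @ (X @ theta - y)   # gradient of the summed squared error
    theta -= step * grad

print("learned parameters:", theta)
print("final loss:", loss(theta))
```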

  11. The benefit of optimization
     - One of the key benefits of looking at problems in AI as optimization problems: we separate out the definition of the problem from the method for solving it
     - For many classes of problems, there are off-the-shelf solvers that will let you solve even large, complex problems, once you have put them in the right form

  12. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  13. Classes of optimization problems
     - Many different names for types of optimization problems: linear programming, quadratic programming, nonlinear programming, semidefinite programming, integer programming, geometric programming, mixed linear binary integer programming (the list goes on and on, and it can all get a bit confusing)
     - We're instead going to focus on two dimensions: convex vs. nonconvex and constrained vs. unconstrained

                         Convex                   Nonconvex
        Constrained      Linear programming       Integer programming
        Unconstrained    Most machine learning    Deep learning

  14. Constrained vs. unconstrained
     [Figure: two contour plots of f(x) over (x_1, x_2), one with an unconstrained minimizer x⋆ and one with a feasible set C and constrained minimizer x⋆]
     - Unconstrained:  minimize_x f(x)
     - Constrained:    minimize_x f(x)  subject to  x ∈ C
     - In unconstrained optimization, every point x ∈ ℝ^n is feasible, so the singular focus is on minimizing f(x)
     - In contrast, for constrained optimization, it may be difficult to even find a point x ∈ C
     - Often leads to very different methods for optimization (more next lecture)

  15. Convex vs. nonconvex optimization
     [Figure: a convex function f_1(x) and a nonconvex function f_2(x)]
     - Originally, researchers distinguished between linear (easy) and nonlinear (hard) problems
     - But in the 80s and 90s, it became clear that this wasn't the right distinction; the key difference is between convex and nonconvex problems
     - Convex problem:

         minimize_x  f(x)
         subject to  x ∈ C

       where f is a convex function and C is a convex set

  16. Convex sets
     - A set C is convex if, for any x, y ∈ C and 0 ≤ θ ≤ 1,

         θx + (1 − θ)y ∈ C

       [Figure: a nonconvex set and a convex set]
     - Examples:
       - All points: C = ℝ^n
       - Intervals: C = {x ∈ ℝ^n | l ≤ x ≤ u} (elementwise inequality)
       - Linear equalities: C = {x ∈ ℝ^n | Ax = b} (for A ∈ ℝ^{m×n}, b ∈ ℝ^m)
       - Intersection of convex sets: C = ⋂_{i=1}^{m} C_i

  17. Convex functions
     - A function f : ℝ^n → ℝ is convex if, for any x, y ∈ ℝ^n and 0 ≤ θ ≤ 1,

         f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

       [Figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph of f]
     - Convex functions "curve upwards" (or at least not downwards)
     - If f is convex then −f is concave
     - If f is both convex and concave, it is affine, and must be of the form

         f(x) = ∑_{i=1}^{n} a_i x_i + b
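
A quick numerical sanity check of this definition (not from the slides, assuming NumPy): sample random points and mixing weights and verify that the inequality holds for the convex function f(x) = ‖x‖₂².

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)        # squared Euclidean norm, a convex function

rng = np.random.default_rng(0)
violations = 0
for _ in range(10000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    if lhs > rhs + 1e-9:         # allow a tiny numerical slack
        violations += 1

print("violations of the convexity inequality:", violations)   # expect 0
```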

  18. Examples of convex functions
     - Exponential: f(x) = exp(ax), a ∈ ℝ
     - Negative logarithm: f(x) = −log x, with domain x > 0
     - Squared Euclidean norm: f(x) = ‖x‖_2^2 ≡ x^T x ≡ ∑_{i=1}^{n} x_i^2
     - Euclidean norm: f(x) = ‖x‖_2
     - Non-negative weighted sum of convex functions:

         f(x) = ∑_{i=1}^{m} w_i f_i(x),   w_i ≥ 0,  f_i convex

  19. Poll: convex sets and functions
     Which of the following functions or sets are convex?
     1. A union of two convex sets, C = C_1 ∪ C_2
     2. The set {x ∈ ℝ^2 | x ≥ 0, x_1 x_2 ≥ 1}
     3. The function f : ℝ^2 → ℝ, f(x) = x_1 x_2
     4. The function f : ℝ^2 → ℝ, f(x) = x_1^2 + x_2^2 + x_1 x_2

  20. Convex optimization
     - The key aspect of convex optimization problems that makes them tractable is that all local optima are global optima
     - Definition: a point x is globally optimal if x is feasible and there is no feasible y such that f(y) < f(x)
     - Definition: a point x is locally optimal if x is feasible and there is some R > 0 such that for all feasible y with ‖x − y‖_2 ≤ R, f(x) ≤ f(y)
     - Theorem: for a convex optimization problem, all locally optimal points are globally optimal

  21. Proof of global optimality
     - Proof: Suppose x is locally optimal (with optimality radius R), and suppose there exists some feasible y such that f(y) < f(x)
     - Now consider the point

         z = θx + (1 − θ)y,   θ = 1 − R / (2‖x − y‖_2)

       1) Since x, y ∈ C (the feasible set), we also have z ∈ C (by convexity of C)
       2) Furthermore, since f is convex,

         f(z) = f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) < f(x)

       and

         ‖x − z‖_2 = ‖x − ((1 − R/(2‖x − y‖_2))x + (R/(2‖x − y‖_2))y)‖_2 = (R/(2‖x − y‖_2)) ‖x − y‖_2 = R/2

     - Thus, z is feasible, within radius R of x, and has lower objective value, contradicting the supposed local optimality of x ∎

  22. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  23. The gradient
     - A key concept in solving optimization problems is the notion of the gradient of a function (the multivariate analogue of the derivative)
     - For f : ℝ^n → ℝ, the gradient is defined as the vector of partial derivatives

         ∇_x f(x) ∈ ℝ^n = [ ∂f(x)/∂x_1 ;  ∂f(x)/∂x_2 ;  … ;  ∂f(x)/∂x_n ]

       [Figure: contours of f(x) over (x_1, x_2) with the gradient ∇_x f(x) drawn at a point]
     - Points in the direction of steepest increase of the function f
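
A common way to sanity-check a gradient is to compare the analytic partial derivatives against central finite differences; the sketch below (assuming NumPy, with an arbitrarily chosen example function) does exactly that.

```python
import numpy as np

def f(x):
    # Example objective: f(x) = x_1^2 + 3 x_1 x_2 (an illustrative choice)
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # Analytic gradient: the vector of partial derivatives of f
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient by central finite differences."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([1.0, 2.0])
print("analytic :", grad_f(x0))               # [8., 3.]
print("numerical:", numerical_gradient(f, x0))
```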
