

  1. 15-780 – Graduate Artificial Intelligence: Optimization
     J. Zico Kolter (this lecture) and Ariel Procaccia
     Carnegie Mellon University, Spring 2017

  2. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  3. Logistics
     - HW0: some unintentional ambiguity about the "no late days" criteria
     - To be clear, in all future assignments, the policy is:
       - You have 5 late days, no more than 2 on any assignment
       - If you use up your five late days, you will receive 20% off per day for those two days
       - If you submit any homework more than 2 days late, you will receive zero credit
     - All homework, both programming and written portions, must be written up independently
     - All students who submitted HW0 have been taken off the waitlist

  4. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  5. Continuous optimization
     - The problems we have seen so far in class (i.e., search) involve making decisions over a discrete space of choices
     - An amazing property:

                               Discrete search    (Convex) optimization
         Variables             Discrete           Continuous
         # Solutions           Finite             Infinite
         Solution complexity   Exponential        Polynomial

     - One of the most significant trends in AI in the past 15 years has been the integration of optimization methods throughout the field

  6. Optimization definitions
     - We'll write optimization problems like this:

         minimize_x  f(x)
         subject to  x ∈ C

       which should be interpreted to mean: we want to find the value of x that achieves the smallest possible value of f(x), out of all points in C
     - Important terms:
       - x ∈ ℝ^n – optimization variable (vector with n real-valued entries)
       - f : ℝ^n → ℝ – optimization objective
       - C ⊆ ℝ^n – constraint set
       - x⋆ ≡ argmin_{x ∈ C} f(x) – optimal solution
       - f⋆ ≡ f(x⋆) ≡ min_{x ∈ C} f(x) – optimal objective
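
To make the notation concrete, here is a minimal sketch (assuming NumPy and SciPy are available) of handing a tiny problem of this form to an off-the-shelf solver; the particular objective f(x) = ‖x − c‖₂² and the nonnegative-orthant constraint set are illustrative choices, not examples from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative objective: f(x) = ||x - c||_2^2 for a fixed target point c
c = np.array([1.0, -2.0])

def f(x):
    return np.sum((x - c) ** 2)

# Constraint set C: the nonnegative orthant {x : x >= 0}, expressed as bounds
bounds = [(0.0, None), (0.0, None)]

# Solve  minimize_x f(x) subject to x in C,  starting from a feasible point
result = minimize(f, x0=np.zeros(2), bounds=bounds)
print("x* =", result.x)    # approximately [1, 0]: the projection of c onto C
print("f* =", result.fun)
```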

  7. Example: Weber point
     - Given a collection of cities (assumed to lie on the 2D plane), how can we find the location that minimizes the sum of distances to all cities?
     - Denote the locations of the cities as y_1, …, y_m
     - Write as the optimization problem:

         minimize_x  ∑_{i=1}^{m} ‖x − y_i‖_2
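
One classical way to compute the Weber point is Weiszfeld's fixed-point iteration; the sketch below (assuming NumPy, with made-up city coordinates) is purely illustrative and is not necessarily the method used in the lecture.

```python
import numpy as np

# Hypothetical city locations y_1, ..., y_m on the 2D plane
cities = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [5.0, 4.0]])

def weber_objective(x, ys):
    """Sum of Euclidean distances from x to every city."""
    return np.sum(np.linalg.norm(ys - x, axis=1))

def weiszfeld(ys, iters=200, eps=1e-9):
    """Weiszfeld's fixed-point iteration for the Weber point."""
    x = ys.mean(axis=0)                  # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(ys - x, axis=1)
        d = np.maximum(d, eps)           # avoid division by zero at a city
        w = 1.0 / d
        x = (w[:, None] * ys).sum(axis=0) / w.sum()
    return x

x_star = weiszfeld(cities)
print("Weber point:", x_star, "objective:", weber_objective(x_star, cities))
```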

  8. Example: image deblurring
     [Figure from (O'Connor and Vandenberghe, 2014): (a) Original image. (b) Blurry, noisy image. (c) Restored image.]
     - Given a corrupted image Y ∈ ℝ^{m×n}, reconstruct the image by solving the optimization problem:

         minimize_X  ∑_{i,j} (Y_{ij} − (K ∗ X)_{ij})^2 + μ ∑_{i,j} ((X_{i,j} − X_{i,j+1})^2 + (X_{i+1,j} − X_{ij})^2)^{1/2}

       where K ∗ denotes convolution with a blurring filter
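
As a rough sketch of how such a problem can be handed to an off-the-shelf solver (assuming CVXPY and NumPy), the snippet below drops the blur operator K, which reduces the problem to total-variation denoising using CVXPY's built-in tv atom; the image data and the weight μ are placeholders, and this is a simplification of the slide's full deblurring formulation.

```python
import numpy as np
import cvxpy as cp

# Hypothetical noisy grayscale image Y (random data, purely for illustration)
m, n = 64, 64
Y = np.random.rand(m, n)
mu = 0.1                      # regularization weight, an illustrative value

X = cp.Variable((m, n))
# Data-fidelity term plus total-variation smoothness penalty
objective = cp.Minimize(cp.sum_squares(X - Y) + mu * cp.tv(X))
cp.Problem(objective).solve()

X_restored = X.value          # the restored image as a NumPy array
```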

  9. Example: robot trajectory planning
     - Many robotic planning tasks are more complex than shortest path, e.g. they involve robot dynamics and require "smooth" controls
     - Common to formulate the planning problem as an optimization task
     - Robot state x_t and control inputs u_t:

         minimize_{x_{1:T}, u_{1:T−1}}  ∑_{t=1}^{T−1} ‖u_t‖_2^2
         subject to  x_{t+1} = f_dynamics(x_t, u_t)
                     x_t ∈ FreeSpace, ∀t
                     x_1 = x_init,  x_T = x_goal

       Figure from (Schulman et al., 2014)
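
A hedged sketch of this formulation (assuming CVXPY and NumPy): to keep the problem convex it uses hypothetical linear double-integrator dynamics x_{t+1} = A x_t + B u_t and omits the FreeSpace constraint, which in practice is what makes real trajectory planning hard.

```python
import numpy as np
import cvxpy as cp

T = 20                                   # horizon length (illustrative)
dt = 0.1
# Double-integrator dynamics in 2D: state = [position, velocity]
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])

x = cp.Variable((T, 4))                  # states x_1, ..., x_T
u = cp.Variable((T - 1, 2))              # controls u_1, ..., u_{T-1}

x_init = np.array([0.0, 0.0, 0.0, 0.0])
x_goal = np.array([1.0, 1.0, 0.0, 0.0])

constraints = [x[0] == x_init, x[T - 1] == x_goal]
for t in range(T - 1):
    constraints.append(x[t + 1] == A @ x[t] + B @ u[t])   # dynamics

# Minimize the sum of squared control effort, as on the slide
problem = cp.Problem(cp.Minimize(cp.sum_squares(u)), constraints)
problem.solve()
print("optimal control cost:", problem.value)
```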

  10. Example: machine learning
     - As we will see in much more detail shortly, virtually all (supervised) machine learning algorithms boil down to solving an optimization problem

         minimize_θ  ∑_{i=1}^{m} ℓ(h_θ(x_i), y_i)

       where x_i ∈ 𝒳 are inputs, y_i ∈ 𝒴 are outputs, ℓ is a loss function, and h_θ is a hypothesis function parameterized by θ, the parameters of the model we are optimizing over
     - Much more on this soon
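
For instance, least-squares regression fits this template with h_θ(x) = θᵀx and ℓ(h, y) = (h − y)²; the sketch below (assuming NumPy, with synthetic data) minimizes the summed loss by plain gradient descent, as an illustration rather than as the specific algorithm the course will use.

```python
import numpy as np

# Synthetic regression data: inputs x_i in R^3, outputs y_i in R
np.random.seed(0)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

# Hypothesis h_theta(x) = theta^T x, squared-error loss l(h, y) = (h - y)^2
def loss(theta):
    return np.sum((X @ theta - y) ** 2)

# Plain gradient descent on the summed loss
theta = np.zeros(3)
step = 1e-3
for _ in range(1000):
    grad = 2 * X.T @ (X @ theta - y)   # gradient of the summed squared error
    theta -= step * grad

print("learned parameters:", theta)
print("final loss:", loss(theta))
```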

  11. The benefit of optimization
     - One of the key benefits of looking at problems in AI as optimization problems: we separate out the definition of the problem from the method for solving it
     - For many classes of problems, there are off-the-shelf solvers that will let you solve even large, complex problems, once you have put them in the right form

  12. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  13. Classes of optimization problems
     - Many different names for types of optimization problems: linear programming, quadratic programming, nonlinear programming, semidefinite programming, integer programming, geometric programming, mixed linear binary integer programming (the list goes on and on, and it can all get a bit confusing)
     - We're instead going to focus on two dimensions: convex vs. nonconvex and constrained vs. unconstrained

                         Convex                   Nonconvex
        Constrained      Linear programming       Integer programming
        Unconstrained    Most machine learning    Deep learning

  14. Constrained vs. unconstrained
     [Figure: two contour plots of f(x) over (x_1, x_2), one with an unconstrained minimizer x⋆ and one with a feasible set C and constrained minimizer x⋆]
     - Unconstrained:  minimize_x f(x)
     - Constrained:    minimize_x f(x)  subject to  x ∈ C
     - In unconstrained optimization, every point x ∈ ℝ^n is feasible, so the singular focus is on minimizing f(x)
     - In contrast, for constrained optimization, it may be difficult to even find a point x ∈ C
     - Often leads to very different methods for optimization (more next lecture)

  15. Convex vs. nonconvex optimization
     [Figure: a convex function f_1(x) and a nonconvex function f_2(x)]
     - Originally, researchers distinguished between linear (easy) and nonlinear (hard) problems
     - But in the 80s and 90s, it became clear that this wasn't the right distinction; the key difference is between convex and nonconvex problems
     - Convex problem:

         minimize_x  f(x)
         subject to  x ∈ C

       where f is a convex function and C is a convex set

  16. Convex sets
     - A set C is convex if, for any x, y ∈ C and 0 ≤ θ ≤ 1,

         θx + (1 − θ)y ∈ C

       [Figure: a nonconvex set and a convex set]
     - Examples:
       - All points: C = ℝ^n
       - Intervals: C = {x ∈ ℝ^n | l ≤ x ≤ u} (elementwise inequality)
       - Linear equalities: C = {x ∈ ℝ^n | Ax = b} (for A ∈ ℝ^{m×n}, b ∈ ℝ^m)
       - Intersection of convex sets: C = ⋂_{i=1}^{m} C_i

  17. Convex functions
     - A function f : ℝ^n → ℝ is convex if, for any x, y ∈ ℝ^n and 0 ≤ θ ≤ 1,

         f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

       [Figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph of f]
     - Convex functions "curve upwards" (or at least not downwards)
     - If f is convex then −f is concave
     - If f is both convex and concave, it is affine, and must be of the form

         f(x) = ∑_{i=1}^{n} a_i x_i + b
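
A quick numerical sanity check of this definition (not from the slides, assuming NumPy): sample random points and mixing weights and verify that the inequality holds for the convex function f(x) = ‖x‖₂².

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)        # squared Euclidean norm, a convex function

rng = np.random.default_rng(0)
violations = 0
for _ in range(10000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    if lhs > rhs + 1e-9:         # allow a tiny numerical slack
        violations += 1

print("violations of the convexity inequality:", violations)   # expect 0
```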

  18. Examples of convex functions
     - Exponential: f(x) = exp(ax), a ∈ ℝ
     - Negative logarithm: f(x) = −log x, with domain x > 0
     - Squared Euclidean norm: f(x) = ‖x‖_2^2 ≡ x^T x ≡ ∑_{i=1}^{n} x_i^2
     - Euclidean norm: f(x) = ‖x‖_2
     - Non-negative weighted sum of convex functions:

         f(x) = ∑_{i=1}^{m} w_i f_i(x),   w_i ≥ 0,  f_i convex

  19. Poll: convex sets and functions
     Which of the following functions or sets are convex?
     1. A union of two convex sets, C = C_1 ∪ C_2
     2. The set {x ∈ ℝ^2 | x ≥ 0, x_1 x_2 ≥ 1}
     3. The function f : ℝ^2 → ℝ, f(x) = x_1 x_2
     4. The function f : ℝ^2 → ℝ, f(x) = x_1^2 + x_2^2 + x_1 x_2

  20. Convex optimization
     - The key aspect of convex optimization problems that makes them tractable is that all local optima are global optima
     - Definition: a point x is globally optimal if x is feasible and there is no feasible y such that f(y) < f(x)
     - Definition: a point x is locally optimal if x is feasible and there is some R > 0 such that for all feasible y with ‖x − y‖_2 ≤ R, f(x) ≤ f(y)
     - Theorem: for a convex optimization problem, all locally optimal points are globally optimal

  21. Proof of global optimality
     - Proof: Suppose x is locally optimal (with optimality radius R), and suppose there exists some feasible y such that f(y) < f(x)
     - Now consider the point

         z = θx + (1 − θ)y,   θ = 1 − R / (2‖x − y‖_2)

       1) Since x, y ∈ C (the feasible set), we also have z ∈ C (by convexity of C)
       2) Furthermore, since f is convex,

         f(z) = f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) < f(x)

       and

         ‖x − z‖_2 = ‖x − ((1 − R/(2‖x − y‖_2))x + (R/(2‖x − y‖_2))y)‖_2 = (R/(2‖x − y‖_2)) ‖x − y‖_2 = R/2

     - Thus, z is feasible, within radius R of x, and has lower objective value, contradicting the supposed local optimality of x ∎

  22. Outline
     - Introduction to optimization
     - Types of optimization problems, convexity
     - Solving optimization problems

  23. The gradient
     - A key concept in solving optimization problems is the notion of the gradient of a function (the multivariate analogue of the derivative)
     - For f : ℝ^n → ℝ, the gradient is defined as the vector of partial derivatives

         ∇_x f(x) ∈ ℝ^n = [ ∂f(x)/∂x_1 ;  ∂f(x)/∂x_2 ;  … ;  ∂f(x)/∂x_n ]

       [Figure: contours of f(x) over (x_1, x_2) with the gradient ∇_x f(x) drawn at a point]
     - Points in the direction of steepest increase of the function f
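
A common way to sanity-check a gradient is to compare the analytic partial derivatives against central finite differences; the sketch below (assuming NumPy, with an arbitrarily chosen example function) does exactly that.

```python
import numpy as np

def f(x):
    # Example objective: f(x) = x_1^2 + 3 x_1 x_2 (an illustrative choice)
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad_f(x):
    # Analytic gradient: the vector of partial derivatives of f
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient by central finite differences."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x0 = np.array([1.0, 2.0])
print("analytic :", grad_f(x0))               # [8., 3.]
print("numerical:", numerical_gradient(f, x0))
```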
