compsci 514: algorithms for data science
Cameron Musco, University of Massachusetts Amherst. Fall 2019.
Lecture 18
logistics
- Problem Set 3 on Spectral Methods due this Friday at 8pm.
- Can turn in without penalty until Sunday at 11:59pm.
1
summary
Last Class:
- Power method for computing the top singular vector of a matrix.
- High-level discussion of Krylov methods and block versions for
computing more singular vectors.
- Power method is an iterative algorithm for solving the non-convex
optimization problem:
max_{⃗v : ∥⃗v∥₂² ≤ 1} ⃗vᵀXᵀX⃗v.
This Class (and until Thanksgiving):
- More general iterative algorithms for optimization, specifically
gradient descent and its variants.
- What are these methods, when are they applied, and how do you
analyze their performance?
- Small taste of what you can find in COMPSCI 590OP or 690OP.
2
discrete vs. continuous optimization
Discrete (Combinatorial) Optimization: (traditional CS algorithms)
- Graph Problems: min-cut, max flow, shortest path, matchings,
maximum independent set, traveling salesman problem
- Problems with discrete constraints or outputs: bin-packing,
scheduling, sequence alignment, submodular maximization
- Generally searching over a finite but exponentially large set of
possible solutions. Many of these problems are NP-Hard.
Continuous Optimization: (not covered in the core CS curriculum. Touched on in ML/advanced algorithms, maybe.)
- Unconstrained convex and non-convex optimization.
- Linear programming, quadratic programming, semidefinite
programming
3
continuous optimization examples
4
mathematical setup
Given some function f : Rd → R, find ⃗θ⋆ with:
f(⃗θ⋆) = min_{⃗θ∈Rd} f(⃗θ) + ϵ
Typically up to some small approximation factor ϵ. Often under some constraints:
- ∥⃗θ∥₂ ≤ 1, ∥⃗θ∥₁ ≤ 1.
- A⃗θ ≤ ⃗b, ⃗θᵀA⃗θ ≥ 0.
- ⃗1ᵀ⃗θ = ∑_{i=1}^d ⃗θ(i) ≤ c.
5
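For example (an added illustration, not from the slides): find ⃗θ minimizing f(⃗θ) = ∥X⃗θ − ⃗y∥₂² subject to the constraint ∥⃗θ∥₂ ≤ 1 (least squares regression restricted to the unit ball); the machine learning examples that follow are unconstrained or regularized problems of this same form.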
why continuous optimization?
Modern machine learning centers around continuous optimization.
Typical Set Up: (supervised machine learning)
- Have a model, which is a function mapping inputs to predictions
(neural network, linear function, low-degree polynomial, etc.).
- The model is parameterized by a parameter vector (weights in a
neural network, coefficients in a linear function or polynomial).
- Want to train this model on input data, by picking a parameter
vector such that the model does a good job mapping inputs to
predictions on your training data.
This training step is typically formulated as a continuous optimization problem.
6
optimization in ml
Example 1: Linear Regression
Model: M_⃗θ : Rd → R with M_⃗θ(⃗x) := ⟨⃗θ, ⃗x⟩ = ⃗θ(1)·⃗x(1) + . . . + ⃗θ(d)·⃗x(d).
Parameter Vector: ⃗θ ∈ Rd (the regression coefficients)
Optimization Problem: Given data points (training points) ⃗x1, . . . ,⃗xn (the rows of data matrix X ∈ Rn×d) and labels y1, . . . , yn ∈ R, find ⃗θ∗ minimizing the loss function:
L(⃗θ, X) = ∑_{i=1}^n ℓ(M_⃗θ(⃗xi), yi) + λ∥⃗θ∥₂²,
where ℓ is some measurement of how far M_⃗θ(⃗xi) is from yi.
- ℓ(M_⃗θ(⃗xi), yi) = (M_⃗θ(⃗xi) − yi)² (least squares regression)
- yi ∈ {−1, 1} and ℓ(M_⃗θ(⃗xi), yi) = ln(1 + exp(−yi·M_⃗θ(⃗xi))) (logistic regression)
7
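For concreteness, a minimal numpy sketch of these two loss functions (the variable names and the choice of numpy are illustrative assumptions, not from the slides):

import numpy as np

def squared_loss(theta, X, y, lam=0.0):
    # L(theta, X) = sum_i (theta^T x_i - y_i)^2 + lam * ||theta||_2^2
    residuals = X @ theta - y
    return np.sum(residuals ** 2) + lam * np.sum(theta ** 2)

def logistic_loss(theta, X, y, lam=0.0):
    # y_i in {-1, +1}; L(theta, X) = sum_i ln(1 + exp(-y_i * theta^T x_i)) + lam * ||theta||_2^2
    margins = y * (X @ theta)
    return np.sum(np.log1p(np.exp(-margins))) + lam * np.sum(theta ** 2)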
optimization in ml
Example 2: Neural Networks
Model: M_⃗θ : Rd → R. M_⃗θ(⃗x) = ⟨⃗wout, σ(W2 σ(W1 ⃗x))⟩.
Parameter Vector: ⃗θ ∈ R^(# edges) (the weights on every edge)
Optimization Problem: Given data points ⃗x1, . . . ,⃗xn and labels y1, . . . , yn ∈ R, find ⃗θ∗ minimizing the loss function:
L(⃗θ, X) = ∑_{i=1}^n ℓ(M_⃗θ(⃗xi), yi)
8
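A minimal numpy sketch of this forward pass, assuming a ReLU nonlinearity for σ (the function and variable names are illustrative):

import numpy as np

def relu(z):
    # one common choice for the nonlinearity sigma
    return np.maximum(z, 0.0)

def two_layer_net(x, W1, W2, w_out):
    # M_theta(x) = <w_out, sigma(W2 sigma(W1 x))>
    return w_out @ relu(W2 @ relu(W1 @ x))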
optimization in ml
L(⃗θ, X) = ∑_{i=1}^n ℓ(M_⃗θ(⃗xi), yi)
- Supervised means we have labels y1, . . . , yn for the training points.
- Solving the final optimization problem has many different names:
likelihood maximization, empirical risk minimization, minimizing
training loss, etc.
- Continuous optimization is also very common in unsupervised
learning (PCA, spectral clustering, etc.).
- Generalization tries to explain why minimizing the loss L(⃗θ, X) on
the training points also minimizes the loss on future test points,
i.e., gives good predictions on future inputs.
9
optimization algorithms
Choice of optimization algorithm for minimizing f(⃗θ) will depend on many things:
- The form of f (in ML, depends on the model & loss function).
- Any constraints on ⃗θ (e.g., ∥⃗θ∥ < c).
- Other constraints, such as memory constraints.
L(⃗θ, X) = ∑_{i=1}^n ℓ(M_⃗θ(⃗xi), yi)
What are some popular optimization algorithms?
10
gradient descent
This class: Gradient descent (and some important variants)
- An extremely simple greedy iterative method that can be applied
to almost any continuous function we care about optimizing.
- Often not the ‘best’ choice for any given function, but it is the
approach of choice in ML since it is simple, general, and often
works very well.
- At each step, it tries to move towards the lowest nearby point in
the function that it can – in the direction opposite the gradient.
11
multivariate calculus review
Let ⃗ei ∈ Rd denote the ith standard basis vector, ⃗ei = [0, 0, . . . , 1, . . . , 0] (1 at position i).
Partial Derivative:
∂f/∂⃗θ(i) = lim_{ϵ→0} [f(⃗θ + ϵ·⃗ei) − f(⃗θ)] / ϵ.
Directional Derivative:
D_⃗v f(⃗θ) = lim_{ϵ→0} [f(⃗θ + ϵ⃗v) − f(⃗θ)] / ϵ.
12
multivariate calculus review
Gradient: Just a ‘list’ of the partial derivatives.
⃗∇f(⃗θ) = [∂f/∂⃗θ(1), ∂f/∂⃗θ(2), . . . , ∂f/∂⃗θ(d)]
Directional Derivative in Terms of the Gradient:
D_⃗v f(⃗θ) = lim_{ϵ→0} [f(⃗θ + ϵ(⃗e1·⃗v(1) + ⃗e2·⃗v(2) + . . . + ⃗ed·⃗v(d))) − f(⃗θ)] / ϵ
≈ ⃗v(1)·∂f/∂⃗θ(1) + ⃗v(2)·∂f/∂⃗θ(2) + . . . + ⃗v(d)·∂f/∂⃗θ(d) = ⟨⃗v, ⃗∇f(⃗θ)⟩.
13
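A quick numerical check of this identity, using a finite difference in place of the limit (an added illustrative sketch, not from the slides):

import numpy as np

f = lambda theta: np.sum(theta ** 2)      # f(theta) = ||theta||_2^2
grad_f = lambda theta: 2 * theta          # its gradient

theta = np.array([1.0, -2.0, 0.5])
v = np.array([0.3, 0.4, -0.1])
eps = 1e-6

fd_estimate = (f(theta + eps * v) - f(theta)) / eps   # approximates D_v f(theta)
print(fd_estimate, np.dot(v, grad_f(theta)))          # the two values should nearly agree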
function access
Often the functions we are trying to optimize are very complex (e.g., a neural network). We will assume access to: Function Evaluation: Can compute f(⃗ θ) for any ⃗ θ. Gradient Evaluation: Can compute ⃗ ∇f(⃗ θ) for any ⃗ θ. In neural networks:
- Function evaluation is called a forward pass (propagate an
input through the network).
- Gradient evaluation is called a backward pass (compute the
gradient via chain rule, using backpropagation).
14
gradient example
Running Example: Least squares regression. Given input points ⃗x1, . . . ,⃗xn (the rows of data matrix X ∈ Rn×d) and labels y1, . . . , yn (the entries of ⃗y ∈ Rn), find ⃗θ∗ minimizing:
L(⃗θ, X) = ∑_{i=1}^n (⃗θᵀ⃗xi − yi)² = ∥X⃗θ − ⃗y∥₂².
By the chain rule:
∂L(⃗θ, X)/∂⃗θ(j) = ∑_{i=1}^n 2·(⃗θᵀ⃗xi − yi) · ∂(⃗θᵀ⃗xi − yi)/∂⃗θ(j) = ∑_{i=1}^n 2·(⃗θᵀ⃗xi − yi)·⃗xi(j),
since ∂(⃗θᵀ⃗xi − yi)/∂⃗θ(j) = ∂(⃗θᵀ⃗xi)/∂⃗θ(j) = lim_{ϵ→0} [(⃗θ + ϵ⃗ej)ᵀ⃗xi − ⃗θᵀ⃗xi] / ϵ = lim_{ϵ→0} ϵ⃗ejᵀ⃗xi / ϵ = ⃗xi(j).
15
gradient example
Partial derivative for least squares regression:
∂L(⃗θ, X)/∂⃗θ(j) = ∑_{i=1}^n 2·(⃗θᵀ⃗xi − yi)·⃗xi(j).
Stacking the d partial derivatives into a vector:
⃗∇L(⃗θ, X) = ∑_{i=1}^n 2·(⃗θᵀ⃗xi − yi)·⃗xi = 2Xᵀ(X⃗θ − ⃗y).
16
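A small numpy check of this gradient formula against a finite-difference estimate of one coordinate (an added sketch; the random data is purely illustrative):

import numpy as np

def least_squares_grad(theta, X, y):
    # grad L(theta, X) = 2 X^T (X theta - y)
    return 2 * X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
theta = rng.normal(size=3)

loss = lambda t: np.sum((X @ t - y) ** 2)
j, eps = 1, 1e-6
e_j = np.zeros(3)
e_j[j] = 1.0
fd = (loss(theta + eps * e_j) - loss(theta)) / eps    # finite-difference partial derivative
print(fd, least_squares_grad(theta, X, y)[j])         # should agree closely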
gradient example
Gradient for least squares regression via a linear algebraic approach:
⃗∇L(⃗θ, X) = ⃗∇∥X⃗θ − ⃗y∥₂², which again gives 2Xᵀ(X⃗θ − ⃗y).
17
gradient descent greedy approach
Gradient descent is a greedy iterative optimization algorithm: Starting at ⃗θ(0), in each iteration let ⃗θ(i) = ⃗θ(i−1) + η⃗v, where η is a (small) ‘step size’ and ⃗v is a direction chosen to minimize f(⃗θ(i−1) + η⃗v).
D_⃗v f(⃗θ(i−1)) = lim_{ϵ→0} [f(⃗θ(i−1) + ϵ⃗v) − f(⃗θ(i−1))] / ϵ.
So for small η:
f(⃗θ(i)) − f(⃗θ(i−1)) = f(⃗θ(i−1) + η⃗v) − f(⃗θ(i−1)) ≈ η · D_⃗v f(⃗θ(i−1)) = η · ⟨⃗v, ⃗∇f(⃗θ(i−1))⟩.
We want to choose ⃗v minimizing ⟨⃗v, ⃗∇f(⃗θ(i−1))⟩ – i.e., pointing in the direction of ⃗∇f(⃗θ(i−1)) but with the opposite sign.
18
gradient descent pseudocode
Gradient Descent
- Choose some initialization ⃗θ(0).
- For i = 1, . . . , t:
- ⃗θ(i) = ⃗θ(i−1) − η·⃗∇f(⃗θ(i−1))
- Return ⃗θ(t), as an approximate minimizer of f(⃗θ).
Step size η is chosen ahead of time or adapted during the algorithm (details to come).
- For now assume η stays the same in each iteration.
When will this algorithm work well?
19
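A minimal numpy implementation of this pseudocode, applied to the least squares running example (the step size, iteration count, and synthetic data below are illustrative choices, not values from the slides):

import numpy as np

def gradient_descent(grad_f, theta0, eta, t):
    # theta_i = theta_{i-1} - eta * grad f(theta_{i-1}), for i = 1, ..., t
    theta = theta0.copy()
    for _ in range(t):
        theta = theta - eta * grad_f(theta)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_theta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

grad_L = lambda theta: 2 * X.T @ (X @ theta - y)   # gradient of ||X theta - y||_2^2
theta_hat = gradient_descent(grad_L, np.zeros(5), eta=1e-3, t=2000)
print(theta_hat)   # approaches the least squares solution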
Gradient Descent Update: ⃗ θ(i) = ⃗ θ(i−1) − η∇f(⃗ θ(i−1))
20
conditions for gradient descent convergence
Convex Functions: After sufficient iterations, gradient descent will converge to an approximate minimizer θ̂ with:
f(θ̂) ≤ f(θ∗) + ϵ = min_θ f(θ) + ϵ.
Examples: least squares regression, logistic regression, sparse regression (lasso), regularized regression, SVMs, ...
Non-Convex Functions: After sufficient iterations, gradient descent will converge to an approximate stationary point θ̂ with:
∥⃗∇f(θ̂)∥₂ ≤ ϵ.
Examples: neural networks, clustering, mixture models.
21
stationary point vs. local minimum
Why, for non-convex functions, do we only guarantee convergence to an approximate stationary point rather than an approximate local minimum?
22
well-behaved functions
Gradient Descent Update: ⃗ θ(i) = ⃗ θ(i−1) − η∇f(⃗ θ(i−1))
23
well-behaved functions
Both Convex and Non-convex: Need to assume the function is well behaved in some way.
- Lipschitz (size of gradient is bounded): For all ⃗θ and some G,
∥⃗∇f(⃗θ)∥₂ ≤ G.
- Smooth (direction/size of gradient is not changing too
quickly): For all ⃗ θ1, ⃗ θ2 and some β, ∥⃗ ∇f(⃗ θ1) − ⃗ ∇f(⃗ θ2)∥2 ≤ β · ∥⃗ θ1 − ⃗ θ2∥2.
25
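As a concrete (added) example for the running least squares loss L(⃗θ, X) = ∥X⃗θ − ⃗y∥₂², whose gradient is 2Xᵀ(X⃗θ − ⃗y):
\[
\nabla L(\theta_1, X) - \nabla L(\theta_2, X) = 2X^TX(\theta_1 - \theta_2)
\;\Longrightarrow\;
\|\nabla L(\theta_1, X) - \nabla L(\theta_2, X)\|_2 \le 2\|X^TX\|_2 \cdot \|\theta_1 - \theta_2\|_2,
\]
so least squares is β-smooth with β = 2∥XᵀX∥₂ (twice the largest eigenvalue of XᵀX). Its gradient grows without bound as ∥⃗θ∥₂ → ∞, so it is G-Lipschitz only when ⃗θ is restricted to a bounded region.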
Gradient Descent analysis for convex functions.
26
convexity
Definition – Convex Function: A function f : Rd → R is convex if and only if, for any ⃗ θ1, ⃗ θ2 ∈ Rd and λ ∈ [0, 1]: (1 − λ) · f(⃗ θ1) + λ · f(⃗ θ2) ≥ f ( (1 − λ) · ⃗ θ1 + λ · ⃗ θ2 )
27
convexity
Corollary – Convex Function: A function f : Rd → R is convex if and only if, for any ⃗ θ1, ⃗ θ2 ∈ Rd and λ ∈ [0, 1]: f(⃗ θ2) − f(⃗ θ1) ≥ ⃗ ∇f(⃗ θ1)T ( ⃗ θ2 − ⃗ θ1 )
28
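As a concrete (added) check that the least squares loss satisfies this corollary: with f(⃗θ) = ∥X⃗θ − ⃗y∥₂² and ⃗∇f(⃗θ) = 2Xᵀ(X⃗θ − ⃗y),
\[
f(\theta_2) = \|X\theta_1 - y + X(\theta_2 - \theta_1)\|_2^2
= f(\theta_1) + 2(X\theta_1 - y)^T X(\theta_2 - \theta_1) + \|X(\theta_2 - \theta_1)\|_2^2
\;\ge\; f(\theta_1) + \nabla f(\theta_1)^T(\theta_2 - \theta_1),
\]
since the dropped term ∥X(⃗θ2 − ⃗θ1)∥₂² is non-negative.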
gd analysis – convex functions
Assume that:
- f is convex.
- f is G-Lipschitz (i.e., ∥⃗∇f(⃗θ)∥₂ ≤ G for all ⃗θ).
- ∥⃗θ0 − ⃗θ∗∥₂ ≤ R, where ⃗θ0 is the initialization point.
Gradient Descent
- Choose some initialization ⃗θ0 and set η = R/(G√t).
- For i = 1, . . . , t:
- ⃗θi = ⃗θi−1 − η⃗∇f(⃗θi−1)
- Return θ̂ = arg min_{⃗θi ∈ {⃗θ0, . . . , ⃗θt}} f(⃗θi).
29
gd analysis proof
Theorem – GD on Convex Lipschitz Functions: For a convex, G-Lipschitz function f, GD run with t ≥ R²G²/ϵ² iterations, η = R/(G√t), and starting point within radius R of θ∗, outputs θ̂ satisfying: f(θ̂) ≤ f(θ∗) + ϵ.
Step 1: For all i, f(θi) − f(θ∗) ≤ (∥θi − θ∗∥₂² − ∥θi+1 − θ∗∥₂²)/(2η) + ηG²/2. Visually:
30
gd analysis proof
Theorem – GD on Convex Lipschitz Functions: For a convex, G-Lipschitz function f, GD run with t ≥ R²G²/ϵ² iterations, η = R/(G√t), and starting point within radius R of θ∗, outputs θ̂ satisfying: f(θ̂) ≤ f(θ∗) + ϵ.
Step 1: For all i, f(θi) − f(θ∗) ≤ (∥θi − θ∗∥₂² − ∥θi+1 − θ∗∥₂²)/(2η) + ηG²/2. Formally:
31
gd analysis proof
Theorem – GD on Convex Lipschitz Functions: For a convex, G-Lipschitz function f, GD run with t ≥ R²G²/ϵ² iterations, η = R/(G√t), and starting point within radius R of θ∗, outputs θ̂ satisfying: f(θ̂) ≤ f(θ∗) + ϵ.
Step 1: For all i, f(θi) − f(θ∗) ≤ (∥θi − θ∗∥₂² − ∥θi+1 − θ∗∥₂²)/(2η) + ηG²/2.
Step 1.1: ∇f(θi)ᵀ(θi − θ∗) ≤ (∥θi − θ∗∥₂² − ∥θi+1 − θ∗∥₂²)/(2η) + ηG²/2 ⟹ Step 1.
32
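A sketch of where Step 1.1 comes from, filled in here from the GD update θi+1 = θi − η∇f(θi) and the Lipschitz bound (the extracted slides show only the statement):
\[
\|\theta_{i+1} - \theta^*\|_2^2 = \|\theta_i - \eta\nabla f(\theta_i) - \theta^*\|_2^2
= \|\theta_i - \theta^*\|_2^2 - 2\eta\,\nabla f(\theta_i)^T(\theta_i - \theta^*) + \eta^2\|\nabla f(\theta_i)\|_2^2,
\]
and bounding ∥∇f(θi)∥₂ ≤ G and rearranging gives Step 1.1. Step 1 then follows via the convexity corollary above, which gives f(θi) − f(θ∗) ≤ ∇f(θi)ᵀ(θi − θ∗).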
gd analysis proof
Theorem – GD on Convex Lipschitz Functions: For a convex, G-Lipschitz function f, GD run with t ≥ R²G²/ϵ² iterations, η = R/(G√t), and starting point within radius R of θ∗, outputs θ̂ satisfying: f(θ̂) ≤ f(θ∗) + ϵ.
Step 1: For all i, f(θi) − f(θ∗) ≤ (∥θi − θ∗∥₂² − ∥θi+1 − θ∗∥₂²)/(2η) + ηG²/2
⟹ Step 2: (1/T)·∑_{i=1}^T [f(θi) − f(θ∗)] ≤ R²/(2η·T) + ηG²/2.
33
gd analysis proof
Theorem – GD on Convex Lipschitz Functions: For a convex, G-Lipschitz function f, GD run with t ≥ R²G²/ϵ² iterations, η = R/(G√t), and starting point within radius R of θ∗, outputs θ̂ satisfying: f(θ̂) ≤ f(θ∗) + ϵ.
Step 2: (1/T)·∑_{i=1}^T [f(θi) − f(θ∗)] ≤ R²/(2η·T) + ηG²/2.
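The transcript is cut off here. For completeness, a sketch of how Step 2 gives the theorem with the slides' step size η = R/(G√t) and T = t iterations (this wrap-up is reconstructed, not part of the extracted text):
\[
\frac{1}{t}\sum_{i=1}^{t}\big(f(\theta_i) - f(\theta^*)\big) \le \frac{R^2}{2\eta t} + \frac{\eta G^2}{2} = \frac{RG}{2\sqrt{t}} + \frac{RG}{2\sqrt{t}} = \frac{RG}{\sqrt{t}},
\]
and since f(θ̂) = min_i f(θi) is at most this average, f(θ̂) − f(θ∗) ≤ RG/√t ≤ ϵ whenever t ≥ R²G²/ϵ².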