compsci 514: algorithms for data science


  1. compsci 514: algorithms for data science. Cameron Musco, University of Massachusetts Amherst. Fall 2019. Lecture 18.

  2. logistics
     • Problem Set 3 on Spectral Methods is due this Friday at 8pm.
     • Can turn in without penalty until Sunday at 11:59pm.

  3-4. summary
     Last Class:
     • Power method for computing the top singular vector of a matrix.
     • Power method is an iterative algorithm for solving the non-convex optimization problem: max_{v : ∥v∥₂² ≤ 1} v^T X^T X v.
     • High-level discussion of Krylov methods, block versions for computing more singular vectors.
     This Class (and until Thanksgiving):
     • More general iterative algorithms for optimization, specifically gradient descent and its variants.
     • What are these methods, when are they applied, and how do you analyze their performance?
     • Small taste of what you can find in COMPSCI 590OP or 690OP.
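To make the recap concrete, here is a minimal power-iteration sketch in NumPy for approximating the top right singular vector; it is my own illustration, not code from the lecture, and the function name and fixed iteration count are arbitrary choices.

```python
import numpy as np

def power_method(X, num_iters=100):
    """Approximate the top right singular vector of X by power iteration on X^T X."""
    d = X.shape[1]
    v = np.random.randn(d)          # random start; nonzero with probability 1
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = X.T @ (X @ v)           # multiply by X^T X without forming it explicitly
        v /= np.linalg.norm(v)      # renormalize to stay on the unit sphere
    return v

# Usage: compare against NumPy's SVD on a small random matrix.
X = np.random.randn(200, 50)
v = power_method(X)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
print(abs(v @ Vt[0]))               # close to 1 once power iteration has converged
```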

  5-7. discrete vs. continuous optimization
     Discrete (Combinatorial) Optimization: (traditional CS algorithms)
     • Graph problems: min-cut, max flow, shortest path, matchings, maximum independent set, traveling salesman problem.
     • Problems with discrete constraints or outputs: bin-packing, scheduling, sequence alignment, submodular maximization.
     • Generally searching over a finite but exponentially large set of possible solutions. Many of these problems are NP-Hard.
     Continuous Optimization: (not covered in the core CS curriculum; touched on in ML/advanced algorithms, maybe.)
     • Unconstrained convex and non-convex optimization.
     • Linear programming, quadratic programming, semidefinite programming.

  8-9. continuous optimization examples

  10-12. mathematical setup
     Given some function f : R^d → R, find θ⋆ with:
         f(θ⋆) = min_{θ ∈ R^d} f(θ) + ϵ.
     Typically up to some small approximation factor ϵ.
     Often under some constraints:
     • ∥θ∥₂ ≤ 1, ∥θ∥₁ ≤ 1.
     • Aθ ≤ b, θ^T A θ ≥ 0.
     • 1^T θ = ∑_{i=1}^d θ(i) ≤ c.
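As one concrete instance of this setup (my own illustration, not from the slides), the sketch below minimizes f(θ) = ∥Aθ − b∥₂² over the constraint set ∥θ∥₂ ≤ 1 using projected gradient descent; the step-size rule and iteration count are illustrative assumptions.

```python
import numpy as np

def project_l2_ball(theta):
    """Project theta onto the constraint set {theta : ||theta||_2 <= 1}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= 1 else theta / norm

def minimize_least_squares(A, b, step=None, num_iters=500):
    """Projected gradient descent on f(theta) = ||A theta - b||_2^2 with ||theta||_2 <= 1."""
    n, d = A.shape
    if step is None:
        # 1 / (2 * sigma_max(A)^2) is a safe step size for this smooth f
        step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)
    theta = np.zeros(d)
    for _ in range(num_iters):
        grad = 2 * A.T @ (A @ theta - b)        # gradient of ||A theta - b||^2
        theta = project_l2_ball(theta - step * grad)
    return theta

A = np.random.randn(100, 10)
b = np.random.randn(100)
theta = minimize_least_squares(A, b)
print(np.linalg.norm(A @ theta - b) ** 2)        # approximate constrained minimum of f
```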

  13-16. why continuous optimization?
     Modern machine learning centers around continuous optimization.
     Typical Set Up: (supervised machine learning)
     • Have a model, which is a function mapping inputs to predictions (neural network, linear function, low-degree polynomial, etc.).
     • The model is parameterized by a parameter vector (weights in a neural network, coefficients in a linear function or polynomial).
     • Want to train this model on input data, by picking a parameter vector such that the model does a good job mapping inputs to predictions on your training data.
     This training step is typically formulated as a continuous optimization problem.

  17-20. optimization in ml
     Example 1: Linear Regression
     Model: M_θ : R^d → R with M_θ(x) = ⟨θ, x⟩ = θ(1)·x(1) + ... + θ(d)·x(d).
     Parameter Vector: θ ∈ R^d (the regression coefficients).
     Optimization Problem: Given data points (training points) x₁, ..., xₙ (the rows of the data matrix X ∈ R^{n×d}) and labels y₁, ..., yₙ, find θ minimizing the loss function:
         L_X(θ) := ∑_{i=1}^n ℓ(M_θ(xᵢ), yᵢ),
     where ℓ(M_θ(xᵢ), yᵢ) is some measurement of how far M_θ(xᵢ) is from yᵢ.
     • ℓ(M_θ(xᵢ), yᵢ) = (M_θ(xᵢ) − yᵢ)² (least squares regression).
     • ℓ(M_θ(xᵢ), yᵢ) = ln(1 + exp(−yᵢ · M_θ(xᵢ))), with yᵢ ∈ {−1, 1} (logistic regression).
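A small sketch (my own illustration, not from the slides) of the two losses above for the linear model M_θ(x) = ⟨θ, x⟩; the function names and the synthetic data are assumptions made only for the example.

```python
import numpy as np

def predictions(X, theta):
    """M_theta(x_i) = <theta, x_i> for every row x_i of the data matrix X."""
    return X @ theta

def least_squares_loss(X, y, theta):
    """L_X(theta) = sum_i (M_theta(x_i) - y_i)^2."""
    return np.sum((predictions(X, theta) - y) ** 2)

def logistic_loss(X, y, theta):
    """L_X(theta) = sum_i ln(1 + exp(-y_i * M_theta(x_i))), with labels y_i in {-1, +1}."""
    return np.sum(np.log1p(np.exp(-y * predictions(X, theta))))

# Usage on tiny synthetic data.
X = np.random.randn(100, 5)
theta_true = np.random.randn(5)
y_real = X @ theta_true + 0.1 * np.random.randn(100)   # real-valued labels for least squares
y_sign = np.sign(X @ theta_true)                        # +/-1 labels for logistic regression
theta = np.zeros(5)
print(least_squares_loss(X, y_real, theta), logistic_loss(X, y_sign, theta))
```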
