CS 6316 Machine Learning: Review of Linear Algebra and Probability
Yangfeng Ji
Department of Computer Science, University of Virginia
Overview
- 1. Course Information
- 2. Basic Linear Algebra
- 3. Probability Theory
- 4. Statistical Estimation
Course Information
Instructors
◮ Yangfeng Ji
  ◮ Office hour: Wednesday 11 AM – 12 PM
  ◮ Office: Rice 510
◮ Hanjie Chen (TA)
  ◮ Office hour: Tuesday and Thursday 1 PM – 2 PM
  ◮ Office: Rice 442
◮ Kai Lin (TA)
  ◮ Office hour: TBD
Goal
Understand the basic concepts and models from a computational perspective. Specifically, this course will
◮ provide a wide coverage of basic topics in machine learning
  ◮ Examples: PAC learning, linear predictors, SVM, boosting, kNN, decision trees, neural networks, etc.
◮ discuss a few fundamental concepts in each topic
  ◮ Examples: learnability, generalization, overfitting/underfitting, VC dimension, max-margin methods, etc.
Textbook
Shalev-Shwartz and Ben-David. Understanding Machine Learning: From Theory to Algorithms. 2014.
https://www.cse.huji.ac.il/~shais/UnderstandingMachineLearning/index.html
Outline
This course will cover the basic materials on the following topics
- 1. Learning theory
- 2. Linear classification and regression
- 3. Model selection and validation
- 4. Boosting and support vector machines
- 5. Neural networks
- 6. Clustering and dimensionality reduction
Outline (II)
The following topics will not be the emphasis of this course
◮ Statistical modeling
  ◮ Statistical Learning and Graphical Models by Farzad Hassanzadeh
◮ Deep learning
  ◮ Deep Learning for Visual Recognition by Vicente Ordonez-Roman
Reference Courses
For fans of machine learning:
◮ Shalev-Shwartz. Understanding Machine Learning. 2014
◮ Mohri. Foundations of Machine Learning. Fall 2018
Reference Books
For fans of machine learning:
◮ Hastie, Tibshirani, and Friedman. The Elements of Statistical Learning (2nd Edition). 2009
◮ Murphy. Machine Learning: A Probabilistic Perspective. 2012
◮ Bishop. Pattern Recognition and Machine Learning. 2006
◮ Mohri, Rostamizadeh, and Talwalkar. Foundations of Machine Learning (2nd Edition). 2018
Homework and Grading Policy
◮ Homeworks (75%)
  ◮ Five homeworks, each worth 15%
◮ Final project (22%)
  ◮ Project proposal: 5%
  ◮ Midterm report: 5%
  ◮ Final project presentation: 6%
  ◮ Final project report: 6%
◮ Class attendance (3%): we will take attendance at three randomly-selected lectures. Each is worth 1%.
Grading Policy
The final grade is threshold-based instead of percentage-based
Late Penalty
◮ Homework submissions will be accepted up to 72 hours late, with a 20% deduction on the points per 24 hours as a penalty
◮ It is usually better if students just turn in what they have on time
◮ Submissions will not be accepted more than 72 hours late
◮ Do not submit the wrong homework: the late penalty will be applied if you resubmit after the deadline
Violation of the Honor Code
Plagiarism examples:
◮ in a homework submission, copying answers directly from others (even with some minor changes)
◮ in a report, copying text from a published paper (even with some minor changes)
◮ in code, using someone else's functions/implementations without acknowledging the contribution
Webpages
◮ Course webpage: http://yangfengji.net/uva-ml-course/
  which contains all the information you need about this course
◮ Piazza: https://piazza.com/virginia/spring2020/cs6316/home
Basic Linear Algebra
Linear Equations
Consider the following system of equations:
$$\begin{aligned} 4x_1 - 5x_2 &= -13 \\ -2x_1 + 3x_2 &= 9 \end{aligned} \tag{1}$$
In matrix notation, it can be written in a more compact form:
$$Ax = b \tag{2}$$
with
$$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix} \tag{3}$$
Basic Notations
$$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix}$$
◮ $A \in \mathbb{R}^{m \times n}$: a matrix with $m$ rows and $n$ columns
  ◮ The element on the $i$-th row and the $j$-th column is denoted as $a_{i,j}$
◮ $x \in \mathbb{R}^n$: a vector with $n$ entries. By convention, an $n$-dimensional vector is often thought of as a matrix with $n$ rows and 1 column, known as a column vector.
  ◮ The $i$-th element is denoted as $x_i$
Vector Norms
◮ A norm of a vector x is informally a measure of the
“length” of the vector.
◮ Formally, a norm is any function f : Rn → R that satisfies
four properties
1. $f(x) \geq 0$ for any $x \in \mathbb{R}^n$
2. $f(x) = 0$ if and only if $x = 0$
3. $f(ax) = |a| \cdot f(x)$ for any $x \in \mathbb{R}^n$ and any scalar $a \in \mathbb{R}$
4. $f(x + y) \leq f(x) + f(y)$ for any $x, y \in \mathbb{R}^n$
ℓ2 Norm
The ℓ2 norm of a vector $x \in \mathbb{R}^n$ is defined as
$$\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2} \tag{4}$$
Exercise: prove that the ℓ2 norm satisfies all four properties
ℓ1 Norm
The ℓ1 norm of a vector $x \in \mathbb{R}^n$ is defined as
$$\|x\|_1 = \sum_{i=1}^n |x_i| \tag{5}$$
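A minimal NumPy sketch of both norms (the example vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0])

# l2 norm: square root of the sum of squared entries, Eq. (4)
l2 = np.sqrt(np.sum(x ** 2))   # 5.0
# l1 norm: sum of absolute values of the entries, Eq. (5)
l1 = np.sum(np.abs(x))         # 7.0

# NumPy's built-in norms agree with the definitions
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(l1, np.linalg.norm(x, 1))
```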
Quiz
For a two-dimensional vector $x = (x_1, x_2) \in \mathbb{R}^2$, which of the following plots shows the set $\|x\|_1 = 1$?
[Three plots (a), (b), (c) of candidate unit balls in the $(x_1, x_2)$ plane]
Answer: (b)
Dot Product
The dot product of $x, y \in \mathbb{R}^n$ is defined as
$$\langle x, y \rangle = x^T y = \sum_{i=1}^n x_i y_i \tag{6}$$
where $x^T$ is the transpose of $x$.
◮ $\|x\|_2^2 = \langle x, x \rangle$
◮ If $x = (0, 0, \ldots, 1, \ldots, 0)$ with a 1 in the $i$-th position, then $\langle x, y \rangle = y_i$
◮ If $x$ is a unit vector ($\|x\|_2 = 1$), then $\langle x, y \rangle$ is the projection of $y$ on the direction of $x$
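A small NumPy sketch of these two bullet points (the vectors are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 0.0])   # a unit vector, also a one-hot vector
y = np.array([2.0, 3.0])

# A one-hot x picks out the corresponding coordinate of y
print(np.dot(x, y))        # 2.0, the first coordinate of y

# <y, y> equals the squared l2 norm of y
print(np.dot(y, y), np.linalg.norm(y, 2) ** 2)   # both 13.0
```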
Cauchy-Schwarz Inequality
For all $x, y \in \mathbb{R}^n$,
$$|\langle x, y \rangle| \leq \|x\|_2 \|y\|_2 \tag{7}$$
with equality if and only if $x = \alpha y$ for some $\alpha \in \mathbb{R}$.
Proof: Let $\tilde{x} = x / \|x\|_2$ and $\tilde{y} = y / \|y\|_2$; then $\tilde{x}$ and $\tilde{y}$ are both unit vectors. Based on the geometric interpretation on the previous slide, we have
$$\langle \tilde{x}, \tilde{y} \rangle \leq 1 \tag{8}$$
with equality if and only if $\tilde{x} = \tilde{y}$.
Frobenius Norm
The Frobenius norm of a matrix $A = [a_{i,j}] \in \mathbb{R}^{m \times n}$, denoted by $\|\cdot\|_F$, is defined as
$$\|A\|_F = \Big( \sum_i \sum_j a_{i,j}^2 \Big)^{1/2} \tag{9}$$
◮ The Frobenius norm can be interpreted as the ℓ2 norm of a vector when treating $A$ as a vector of size $mn$.
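A quick NumPy check of this interpretation, computing the Frobenius norm three equivalent ways (the matrix reuses Eq. (3)):

```python
import numpy as np

A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])

f1 = np.sqrt(np.sum(A ** 2))            # directly from the definition in Eq. (9)
f2 = np.linalg.norm(A, 'fro')           # NumPy's built-in Frobenius norm
f3 = np.linalg.norm(A.reshape(-1), 2)   # l2 norm of A flattened to a vector of size mn
assert np.isclose(f1, f2) and np.isclose(f2, f3)
```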
Two Special Matrices
◮ The identity matrix, denoted as $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else:
$$I = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} \tag{10}$$
◮ A diagonal matrix, denoted as $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, is a matrix where all non-diagonal elements are 0:
$$D = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \tag{11}$$
Inverse
The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is denoted as $A^{-1}$, the unique matrix such that
$$A^{-1}A = I = AA^{-1} \tag{12}$$
◮ Non-square matrices do not have inverses (by definition)
◮ Not all square matrices are invertible
◮ The solution of the linear equations in Eq. (1) is $x = A^{-1}b$
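A minimal NumPy sketch solving the system from Eq. (1) via the inverse:

```python
import numpy as np

# The system from Eq. (1): A x = b
A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])
b = np.array([-13.0, 9.0])

x = np.linalg.inv(A) @ b   # x = A^{-1} b, as on this slide
print(x)                   # [3. 5.]

# In practice np.linalg.solve is preferred; it avoids forming A^{-1} explicitly
assert np.allclose(x, np.linalg.solve(A, b))
```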
Orthogonal Matrices
◮ Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $\langle x, y \rangle = 0$
◮ A square matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns are orthogonal to each other and normalized (orthonormal):
$$\langle u_i, u_j \rangle = 0, \quad \|u_i\|_2 = 1, \quad \|u_j\|_2 = 1 \tag{13}$$
for all $i, j \in [n]$ with $i \neq j$
◮ Furthermore, $U^T U = I = U U^T$, which further implies $U^{-1} = U^T$
Symmetric Matrices
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is defined by
$$A^T = A \tag{14}$$
or, in other words,
$$a_{i,j} = a_{j,i} \quad \forall i, j \in [n] \tag{15}$$
Comments:
◮ The identity matrix $I$ is symmetric
◮ A diagonal matrix is symmetric
Eigen Decomposition
Every symmetric matrix $A$ can be decomposed as
$$A = U \Lambda U^T \tag{16}$$
with
◮ $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ a diagonal matrix (see Two Special Matrices)
◮ $U$ an orthogonal matrix (see Orthogonal Matrices)
◮ Exercise: if $A$ is invertible, show $A^{-1} = U \Lambda^{-1} U^T$ with $\Lambda^{-1} = \mathrm{diag}(1/\lambda_1, \ldots, 1/\lambda_n)$
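A short NumPy sketch of the decomposition (the symmetric matrix is an arbitrary example); `np.linalg.eigh` is specialized for symmetric matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # an arbitrary symmetric matrix

lam, U = np.linalg.eigh(A)   # eigenvalues and orthonormal eigenvectors
Lam = np.diag(lam)

# Verify A = U Lam U^T (Eq. (16)) and that U is orthogonal
assert np.allclose(A, U @ Lam @ U.T)
assert np.allclose(U.T @ U, np.eye(2))

# The exercise on this slide: A^{-1} = U Lam^{-1} U^T
A_inv = U @ np.diag(1.0 / lam) @ U.T
assert np.allclose(A_inv, np.linalg.inv(A))
```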
Symmetric Positive Semidefinite Matrices
A symmetric matrix $P \in \mathbb{R}^{n \times n}$ is positive semidefinite if and only if
$$x^T P x \geq 0 \tag{17}$$
for all $x \in \mathbb{R}^n$.
Equivalently, writing the eigendecomposition (previous slide) of $P$ as
$$P = U \Lambda U^T \tag{18}$$
with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, positive semidefiniteness means
$$\lambda_i \geq 0 \tag{19}$$
Symmetric Positive Definite Matrices
A symmetric matrix $P \in \mathbb{R}^{n \times n}$ is positive definite if and only if
$$x^T P x > 0 \tag{20}$$
for all nonzero $x \in \mathbb{R}^n$.
◮ Eigenvalues of $P$: $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with
$$\lambda_i > 0 \tag{21}$$
◮ Exercise: if one of the eigenvalues $\lambda_i < 0$, show that you can find a vector $x$ such that $x^T P x < 0$
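A small NumPy sketch that tests semidefiniteness through the eigenvalues; `is_positive_semidefinite` is a helper name introduced here, not from the slides:

```python
import numpy as np

def is_positive_semidefinite(P, tol=1e-10):
    """Check x^T P x >= 0 for all x via the eigenvalues of the symmetric matrix P."""
    lam = np.linalg.eigvalsh(P)   # eigenvalues of a symmetric matrix
    return bool(np.all(lam >= -tol))

# G = B^T B is always positive semidefinite, since x^T G x = ||Bx||_2^2 >= 0
B = np.random.randn(3, 2)
print(is_positive_semidefinite(B.T @ B))                # True
print(is_positive_semidefinite(np.array([[1.0, 2.0],
                                         [2.0, 1.0]])))  # False: eigenvalues 3 and -1
```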
Quiz
The identity matrix I is
◮ a diagonal matrix?
◮ a symmetric matrix?
◮ an orthogonal matrix?
◮ a positive (semi-)definite matrix?
Answer: all of the above.
Further reference: [Kolter and Do, 2015]
Probability Theory
What is Probability?
The probability of landing heads is 0.52
Two interpretations
Frequentist: Probability represents the long-run frequency of an event
◮ If we flip the coin many times, we expect it to land heads about 52% of the time
Bayesian: Probability quantifies our (un)certainty about an event
◮ We believe the coin has a 52% chance of landing heads on the next toss
Bayesian Interpretation
Example scenarios of the Bayesian interpretation of probability [figures omitted]
Binary Random Variables
◮ Event $X$, such as
  ◮ the coin will land heads on the next toss
  ◮ it will rain tomorrow
◮ Sample space of $X$: {false, true}, or for simplicity {0, 1}
◮ Probability: $P(X = x)$, or $P(x)$ for short
  ◮ Let $X$ be the event that the coin will land heads on the next toss; then the probability from the previous example is
$$P(X = 1) = 0.52 \tag{22}$$
Bernoulli Distribution
Given a binary random variable $X$ with sample space {0, 1},
$$P(X = x) = \theta^x (1 - \theta)^{1-x}$$
with a single parameter $\theta = P(X = 1)$.
[Portrait: Jacob Bernoulli]
Tossing a Coin Twice?
◮ Let $X$ be the number of heads
◮ Sample space of $X$: {0, 1, 2}
◮ Assuming we use the same coin, the probability distribution of $X$ is
  ◮ $P(X = 0) = (1 - \theta)^2$
  ◮ $P(X = 2) = \theta^2$
  ◮ $P(X = 1) = \theta(1 - \theta) + (1 - \theta)\theta = 2\theta(1 - \theta)$
General Case: Binomial Distribution
Consider a general case in which we toss the coin $n$ times. The number of heads $Y$ then follows a binomial distribution:
$$P(Y = k) = \binom{n}{k} \theta^k (1 - \theta)^{n-k} \tag{23}$$
where
$$\binom{n}{k} = \frac{n!}{k!(n - k)!}$$
is the binomial coefficient and $n! = n \cdot (n - 1) \cdot (n - 2) \cdots 1$.
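A minimal Python sketch of Eq. (23) using only the standard library; `binomial_pmf` is a helper name introduced here:

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(Y = k) for n coin tosses with heads probability theta, per Eq. (23)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

# Tossing the coin twice recovers the previous slide
theta = 0.52
print(binomial_pmf(0, 2, theta))   # (1 - theta)^2
print(binomial_pmf(1, 2, theta))   # 2 theta (1 - theta)
print(binomial_pmf(2, 2, theta))   # theta^2

# The probabilities over k = 0..n sum to 1
assert abs(sum(binomial_pmf(k, 10, theta) for k in range(11)) - 1) < 1e-9
```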
Tossing a Die
How to define the corresponding random variable?
◮ $X \in \{1, 2, 3, 4, 5, 6\}$
◮ $X \in \{100000, 010000, 001000, 000100, 000010, 000001\}$ (a one-hot encoding)
Categorical Distribution
$$P(X = x) = \prod_{k=1}^{6} \theta_k^{x_k} \tag{24}$$
where
◮ $x = (x_1, x_2, \ldots, x_6)$,
◮ $x_k \in \{0, 1\}$, and
◮ $\{\theta_k\}_{k=1}^{6}$ are the parameters of this distribution; $\theta_k$ is the probability of side $k$ showing up.
Multinomial Distribution
Repeating the previous event $n$ times, the corresponding probability distribution is modeled as
$$P(X = x) = \binom{n}{x_1 \cdots x_K} \prod_{k=1}^{K} \theta_k^{x_k} \tag{25}$$
where $x = (x_1, \ldots, x_K)$ and each $x_k \in \{0, 1, 2, \ldots, n\}$ indicates the number of times side $k$ shows up,
$$\binom{n}{x_1 \cdots x_K} = \frac{n!}{x_1! \cdots x_K!}$$
and the $\{x_k\}$ satisfy the constraint $\sum_{k=1}^{K} x_k = n$.
Gaussian Distribution
A random variable $X \in \mathbb{R}$ is said to follow a normal (or Gaussian) distribution $N(\mu, \sigma^2)$ if its probability density function is given by
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big) \tag{26}$$
◮ $\mu$: mean
◮ $\sigma^2$: variance
◮ Probability of $X \in [a, b]$: $P(a \leq X \leq b) = \int_a^b f(x)\, dx$
[Plot of the density curve]
Gaussian Distribution (II)
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(x - \mu)^2}{2\sigma^2} \Big) \tag{27}$$
Three examples of Gaussian distributions:
◮ Blue: $N(0, 1)$ (the standard normal distribution)
◮ Red: $N(0, 2)$
◮ Green: $N(1, 1)$
[Plot of the three density curves]
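A minimal NumPy sketch of the density in Eq. (26); `gaussian_pdf` is a helper name introduced here:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2), from Eq. (26)/(27)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

print(gaussian_pdf(0.0, 0.0, 1.0))   # peak of N(0, 1): 1/sqrt(2*pi) ~= 0.3989

# The density integrates to 1 (numerical check on a fine grid)
grid = np.linspace(-10, 10, 100001)
print(np.trapz(gaussian_pdf(grid, 0.0, 1.0), grid))   # ~= 1.0
```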
Probability of Two Random Variables
Modeling two random variables together with a joint distribution
$$P(X, Y) \tag{28}$$
Related concepts:
◮ Independence
◮ Conditional probability and the chain rule
◮ Bayes' rule
Independence
Definition: Two random variables $X$ and $Y$ are independent of each other if we can represent the joint probability as the product of their marginal distributions for any values of $X$ and $Y$; mathematically,
$$P(X, Y) = P(X) \cdot P(Y) \tag{29}$$
Marginal distributions:
$$P(X) = \sum_Y P(X, Y) \tag{30}$$
$$P(Y) = \sum_X P(X, Y) \tag{31}$$
Example:
◮ $X$: whether it is cloudy
◮ $Y$: whether it will rain

P(X, Y)    X = 0   X = 1
Y = 0      0.35    0.15
Y = 1      0.05    0.45
Conditional Probability
Conditional probability of $Y$ given $X$:
$$P(Y \mid X) = \frac{P(X, Y)}{P(X)} \tag{32}$$
Example: document classification
◮ $X$: a document
◮ $Y$: the label of this document
A special case: if $X$ and $Y$ are independent,
$$P(Y \mid X) = P(Y) \tag{33}$$
Intuitively, this means knowing $X$ does not provide any new information about $Y$.
Conditional Probability (II)
◮ $X$: whether it is cloudy
◮ $Y$: whether it will rain

P(X, Y)    X = 0   X = 1
Y = 0      0.35    0.15
Y = 1      0.05    0.45

◮ $P(Y \mid X = 1)$:
  ◮ $P(Y = 0 \mid X = 1) = 0.25$
  ◮ $P(Y = 1 \mid X = 1) = 0.75$
◮ $P(Y)$: $P(Y = 0) = P(Y = 1) = 0.5$
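A small NumPy sketch that recovers these numbers from the joint table:

```python
import numpy as np

# Joint distribution P(X, Y) from the table: rows are Y = 0, 1; columns are X = 0, 1
P = np.array([[0.35, 0.15],
              [0.05, 0.45]])

P_X = P.sum(axis=0)   # marginal P(X): [0.4, 0.6]
P_Y = P.sum(axis=1)   # marginal P(Y): [0.5, 0.5]

# Conditional P(Y | X = 1) = P(X = 1, Y) / P(X = 1), per Eq. (32)
print(P[:, 1] / P_X[1])   # [0.25, 0.75], matching the slide

# X and Y are not independent here: P(X, Y) != P(X) P(Y)
print(np.allclose(P, np.outer(P_Y, P_X)))   # False
```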
Multivariate Gaussian
The probability density function of a multivariate Gaussian distribution $N(\mu, \Sigma)$ is defined as
$$f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\Big( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big) \tag{34}$$
where
◮ $\mu$ is the $n$-dimensional mean vector, and
◮ $\Sigma$ is the $n \times n$ covariance matrix.
Covariance Matrix Σ
Assuming $\mu = 0$, the probability density function is
$$f(x) \propto \exp\Big( -\frac{1}{2} x^T \Sigma^{-1} x \Big) \tag{35}$$
In general, $\Sigma$ is required to be a symmetric positive definite matrix.
[Contour plots for $\Sigma = I$ and $\Sigma = \mathrm{diag}(2, 1)$]
Sampling from Gaussians
[Scatter plots of samples: (a) $\Sigma = I$, (b) $\Sigma = \mathrm{diag}(2, 1)$]
Exercise: Sample from an arbitrary Gaussian distribution
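One standard approach to this exercise, sketched in NumPy under the assumption that $\Sigma$ is positive definite: draw $z \sim N(0, I)$ and transform it with a Cholesky factor of $\Sigma$ (the $\mu$ and $\Sigma$ below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# If z ~ N(0, I) and Sigma = L L^T (Cholesky), then x = mu + L z ~ N(mu, Sigma)
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((10000, 2))
x = mu + z @ L.T

# Empirical mean and covariance should be close to mu and Sigma
print(x.mean(axis=0))
print(np.cov(x, rowvar=False))
```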
Sum Rule
Given two random variables $X$ and $Y$ describing the same experiment, without any additional assumption we have
$$P(X \cup Y) = P(X) + P(Y) - P(X \cap Y) \tag{36}$$
◮ If $X \cap Y = \emptyset$, then $P(X \cap Y) = 0$ and
$$P(X \cup Y) = P(X) + P(Y) \tag{37}$$
◮ Exercise: Prove the following inequality by generalizing the sum rule:
$$P\Big( \bigcup_{i=1}^n X_i \Big) \leq \sum_{i=1}^n P(X_i) \tag{38}$$
This inequality is called the union bound.
Chain Rule
Any joint probability of two random variables can be decomposed as
$$P(X, Y) = P(X) \cdot P(Y \mid X) = P(Y) \cdot P(X \mid Y) \tag{39}$$
No independence assumption is needed.
The chain rule can be easily generalized:
$$\begin{aligned}
P(X_1, X_2, \ldots, X_k) &= P(X_1)\, P(X_2, \ldots, X_k \mid X_1) \\
&= P(X_1)\, P(X_2 \mid X_1)\, P(X_3, \ldots, X_k \mid X_2, X_1) \\
&= P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_2, X_1) \cdots P(X_k \mid X_1, \ldots, X_{k-1})
\end{aligned} \tag{40}$$
Inverse Probability
Given
◮ $P(Y)$: the prior probability, and
◮ $P(X \mid Y)$: the conditional probability of $X$ given $Y$,
we can compute the probability $P(Y \mid X)$ using Bayes' rule:
$$P(Y \mid X) = \frac{P(Y)\, P(X \mid Y)}{P(X)} \tag{41}$$
where
$$P(X) = \sum_Y P(Y)\, P(X \mid Y) \tag{42}$$
Example: The burglar alarm
Two binary random variables, alarm $A$ and burglar $B$:
◮ $P(A = 1 \mid B = 1) = 0.99$: a burglary happens, the alarm rings
◮ $P(A = 1 \mid B = 0) = 0.001$: no burglary happens, the alarm still rings
◮ $P(B = 1) = 0.01$: the burglary rate
Question: if the alarm rang, what is the probability that a burglary happened?
$$P(B = 1 \mid A = 1) = ? \tag{43}$$
Example: The burglar alarm (II)
◮ $P(A = 1 \mid B = 1) = 0.99$: burglary happens ⇒ alarm rings
◮ $P(A = 1 \mid B = 0) = 0.001$: burglary does not happen ⇒ alarm rings
◮ $P(B = 1) = 0.01$: burglary rate
Question: if the alarm rang, what is the probability that a burglary happened?
$$P(B = 1 \mid A = 1) = \frac{P(B = 1)\, P(A = 1 \mid B = 1)}{P(A = 1 \mid B = 1)\, P(B = 1) + P(A = 1 \mid B = 0)\, P(B = 0)} = \frac{0.01 \times 0.99}{(0.01 \times 0.99) + (0.001 \times (1 - 0.01))} \approx 0.91$$
Further question: What if $P(A = 1 \mid B = 0) = 0.01$?
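A minimal Python sketch of this computation; `posterior_burglar` is a helper name introduced here:

```python
def posterior_burglar(p_a_given_b1, p_a_given_b0, p_b1):
    """P(B = 1 | A = 1) via Bayes' rule, Eq. (41), with P(A = 1) from Eq. (42)."""
    p_a1 = p_a_given_b1 * p_b1 + p_a_given_b0 * (1 - p_b1)
    return p_a_given_b1 * p_b1 / p_a1

print(posterior_burglar(0.99, 0.001, 0.01))   # ~= 0.909, as on the slide
# The further question: a noisier alarm with P(A = 1 | B = 0) = 0.01
print(posterior_burglar(0.99, 0.01, 0.01))
```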
Expectation
The expectation or expected value of a function $h(x)$ with respect to a probability distribution $P(X)$ is defined as
$$E[h(x)] = \sum_x P(x)\, h(x) \tag{44}$$
The number of ice creams [Eisenstein, 2018]:
◮ If it is sunny, Lucia will eat four ice creams
◮ If it is rainy, she will eat only one ice cream
◮ There is a 90% chance it will be rainy
The expected number of ice creams she will eat is
$$(1 - 0.9) \times 4 + 0.9 \times 1 = 1.3 \tag{45}$$
Mean
◮ Let $h(x) = x$; then the expectation is the mean value of the random variable $X$. For a discrete random variable,
$$E[X] = \sum_x x\, P(x) \tag{46}$$
or, for a continuous random variable,
$$E[X] = \int_x x\, f(x)\, dx \tag{47}$$
◮ For a Bernoulli distribution $P(X)$ with parameter $\theta$, $P(X = x) = \theta^x (1 - \theta)^{1-x}$,
$$E[X] = 1 \cdot \theta + 0 \cdot (1 - \theta) = \theta \tag{48}$$
Variance
The variance of a random variable gives a measure of how much the values of this random variable vary:
$$\begin{aligned}
\mathrm{Var}[X] &= E\big[(X - E[X])^2\big] \\
&= E\big[X^2 - 2X E[X] + E[X]^2\big] \\
&= E[X^2] - 2E[X]\, E[X] + E[X]^2 \\
&= E[X^2] - E[X]^2
\end{aligned} \tag{49}$$
Variance: Example
For a Bernoulli distribution $P(X)$ with parameter $\theta$, $P(X = x) = \theta^x (1 - \theta)^{1-x}$,
$$\mathrm{Var}[X] = E[X^2] - E[X]^2 = \theta - \theta^2 \tag{50}$$
Exercise: Compute the mean and variance of a categorical distribution
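A small NumPy sketch that checks the closed-form mean and variance against samples:

```python
import numpy as np

theta = 0.52
rng = np.random.default_rng(0)
samples = rng.binomial(1, theta, size=100000)   # Bernoulli(theta) draws

# Empirical estimates should be close to the closed forms
print(samples.mean(), theta)                    # mean: theta, Eq. (48)
print(samples.var(), theta * (1 - theta))       # variance: theta - theta^2, Eq. (50)
```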
Statistical Estimation
Statistics is, in a certain sense, the inverse of probability theory.
◮ Observed: values of random variables
◮ Unknown: the model
◮ Task: infer the model from the observed data
Likelihood-based Estimation
For a probability distribution $P(X; \theta)$ with $\theta$ as the unknown parameter, likelihood-based estimation with observations $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ requires two steps:
1. Define a likelihood function with the observations
2. Optimize the likelihood function to estimate $\theta$
Likelihood Function
The likelihood function of $\theta$ is defined as
$$L(\theta) = \prod_{i=1}^n P(x^{(i)}; \theta) \tag{51}$$
Alternatively, we often use the log-likelihood function to avoid numerical issues:
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log P(x^{(i)}; \theta) \tag{52}$$
Maximum Likelihood Estimation
Maximum likelihood estimation: a method of estimating the parameter by maximizing the (log-)likelihood function
$$\hat{\theta} = \operatorname*{argmax}_{\theta} \ell(\theta) \tag{53}$$
Usually, this can be done by setting the following derivative to zero:
$$\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \log P(x^{(i)}; \theta)}{\partial \theta} \tag{54}$$
Example: Bernoulli Distribution
Consider a Bernoulli distribution $P(X; \theta)$ with the parameter $\theta = P(X = 1; \theta)$ unknown:
$$P(X = x; \theta) = \theta^x (1 - \theta)^{1-x} \tag{55}$$
With $n$ observations $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$, the log-likelihood function is
$$\ell(\theta) = \sum_{i=1}^n \log P(x^{(i)}; \theta) = \sum_{i=1}^n \big\{ x^{(i)} \log \theta + (1 - x^{(i)}) \log(1 - \theta) \big\} \tag{56}$$
Example: Bernoulli Distribution (II)
The derivative with respect to $\theta$ is
$$\frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^n \Big\{ \frac{x^{(i)}}{\theta} - \frac{1 - x^{(i)}}{1 - \theta} \Big\} \tag{57}$$
Setting $\frac{\partial \ell(\theta)}{\partial \theta} = 0$, we have
$$\hat{\theta} = \frac{\sum_{i=1}^n x^{(i)}}{n} \tag{58}$$
Example: Bernoulli Distribution (III)
Assume the $n = 7$ observations are $\{0, 1, 1, 0, 0, 1, 0\}$; then
$$\hat{\theta} = \frac{3}{7} \tag{59}$$
Likelihood principle: with $x$ observed, all relevant information for inferring $\theta$ is contained in the likelihood function.
Further reference: [Murphy, 2012, Chap 5 & 6]
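A minimal NumPy sketch of this example: the closed-form estimate from Eq. (58), plus a grid search over the log-likelihood in Eq. (56) as a numerical check:

```python
import numpy as np

x = np.array([0, 1, 1, 0, 0, 1, 0])   # the n = 7 observations from this slide

# Closed-form MLE from Eq. (58): the sample mean
theta_hat = x.mean()
print(theta_hat)                       # 3/7 ~= 0.4286

# Numerical check: theta_hat maximizes the log-likelihood in Eq. (56)
thetas = np.linspace(0.01, 0.99, 99)
loglik = np.array([np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas])
print(thetas[np.argmax(loglik)])       # ~= 0.43, the grid point closest to 3/7
```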
References
◮ Eisenstein, J. (2018). Natural Language Processing. MIT Press.
◮ Kolter, Z. and Do, C. (2015). Linear Algebra Review and Reference.
◮ Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.