Linear Algebra for Machine Learning - Sargur N. Srihari - PowerPoint PPT Presentation



SLIDE 1

Linear Algebra for Machine Learning

Sargur N. Srihari
srihari@cedar.buffalo.edu

SLIDE 2

Overview

  • Linear algebra is based on continuous math rather than discrete math
    – Computer scientists have little experience with it
  • Essential for understanding ML algorithms
  • Here we discuss:
    – Scalars, vectors, matrices, tensors
    – Multiplying matrices and vectors
    – Inverse, span, linear independence
    – SVD, PCA

SLIDE 3

Scalar

  • Single number
  • Represented in lower-case italic, e.g., x
    – E.g., let x ∈ ℝ be the slope of the line
      • Defining a real-valued scalar
    – E.g., let n ∈ ℕ be the number of units
      • Defining a natural-number scalar

SLIDE 4

Vector

  • An array of numbers, arranged in order
  • Each number is identified by an index
  • Vectors are shown in lower-case bold: x
  • If each of the n elements is in ℝ, then x ∈ ℝ^n
  • We think of vectors as points in space
    – Each element gives the coordinate along an axis

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} ⇒ x^T = [x_1, x_2, .., x_n]

SLIDE 5

Matrix

  • 2-D array of numbers
  • Each element identified by two indices
  • Denoted by bold typeface: A
  • Elements indicated as A_{m,n}
    – E.g., A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}
  • A_{i,:} is the ith row of A; A_{:,j} is the jth column of A
  • If A has height m and width n with real-valued entries, then A ∈ ℝ^{m×n}

SLIDE 6

Tensor

  • Sometimes we need an array with more than two axes
  • An array arranged on a regular grid with a variable number of axes is referred to as a tensor
  • Denote a tensor with bold typeface: A
  • Element (i,j,k) of a tensor is denoted by A_{i,j,k}

SLIDE 7

Transpose of a Matrix

  • Mirror image across the principal diagonal:
      A = \begin{bmatrix} A_{1,1} & A_{1,2} & A_{1,3} \\ A_{2,1} & A_{2,2} & A_{2,3} \\ A_{3,1} & A_{3,2} & A_{3,3} \end{bmatrix} ⇒ A^T = \begin{bmatrix} A_{1,1} & A_{2,1} & A_{3,1} \\ A_{1,2} & A_{2,2} & A_{3,2} \\ A_{1,3} & A_{2,3} & A_{3,3} \end{bmatrix}
  • Vectors are matrices with a single column
    – Often written in-line using the transpose: x = [x_1, .., x_n]^T
  • Since a scalar is a matrix with one element, a = a^T

SLIDE 8

Matrix Addition

  • If A and B have the same shape (height m, width n):
      C = A + B ⇒ C_{i,j} = A_{i,j} + B_{i,j}
  • A matrix can be multiplied by a scalar, and a scalar can be added to a matrix:
      D = aB + c ⇒ D_{i,j} = aB_{i,j} + c
  • A vector can be added to a matrix (non-standard matrix algebra):
      C = A + b ⇒ C_{i,j} = A_{i,j} + b_j
    – Called broadcasting, since vector b is added to each row of A (see the sketch below)
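
A minimal NumPy sketch of broadcasting (the slides do not name a library, so NumPy is an assumption): adding a vector b to a matrix A adds b to every row.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])      # shape (2, 3)
b = np.array([10., 20., 30.])     # shape (3,)

C = A + b                         # b is broadcast (added) to each row of A
# C[i, j] == A[i, j] + b[j]
print(C)
# [[11. 22. 33.]
#  [14. 25. 36.]]
```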

SLIDE 9

Multiplying Matrices

  • For the product C = AB to be defined, A has to have the same number of columns as the number of rows of B
  • If A is of shape m×n and B is of shape n×p, then the matrix product C is of shape m×p
  • Note that the standard product of two matrices is not just the element-wise product of the individual elements

C = AB ⇒ C_{i,j} = \sum_k A_{i,k} B_{k,j}

SLIDE 10

Multiplying Vectors

  • The dot product of two vectors x and y of the same dimensionality is the matrix product x^T y
  • Conversely, the matrix product C = AB can be viewed as computing each element C_{i,j} as the dot product of row i of A and column j of B
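
A short NumPy sketch (assumed) illustrating both views of the matrix product, plus the vector dot product:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])          # shape (3, 2)
B = np.array([[7., 8., 9.],
              [10., 11., 12.]])   # shape (2, 3)

C = A @ B                         # shape (3, 3): (m x n)(n x p) -> (m x p)

# Each C[i, j] is the dot product of row i of A with column j of B
i, j = 1, 2
print(C[i, j], A[i, :] @ B[:, j])   # both print 75.0

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
print(x @ y)                        # dot product x^T y = 32.0
```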

SLIDE 11

Matrix Product Properties

  • Distributivity over addition: A(B + C) = AB + AC
  • Associativity: A(BC) = (AB)C
  • Not commutative: AB = BA is not always true
  • The dot product between vectors is commutative: x^T y = y^T x
  • The transpose of a matrix product has a simple form: (AB)^T = B^T A^T

SLIDE 12

Linear Transformation

  • Ax = b
    – where A ∈ ℝ^{n×n} and b ∈ ℝ^n
    – More explicitly (n equations in n unknowns):
      A_{1,1}x_1 + A_{1,2}x_2 + .... + A_{1,n}x_n = b_1
      A_{2,1}x_1 + A_{2,2}x_2 + .... + A_{2,n}x_n = b_2
      ....
      A_{n,1}x_1 + A_{n,2}x_2 + .... + A_{n,n}x_n = b_n
    – In matrix form:
      A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & \ddots & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{bmatrix} (n×n),  x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} (n×1),  b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} (n×1)
  • Can view A as a linear transformation of vector x to vector b
  • Sometimes we wish to solve for the unknowns x = {x_1,..,x_n} when A and b provide constraints

SLIDE 13

Identity and Inverse Matrices

  • Matrix inversion is a powerful tool to analytically solve Ax = b
  • It needs the concept of the identity matrix
  • The identity matrix does not change the value of a vector when we multiply the vector by it
    – Denote the identity matrix that preserves n-dimensional vectors as I_n
    – Formally, I_n ∈ ℝ^{n×n} and ∀x ∈ ℝ^n, I_n x = x
    – Example of I_3:
      I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

SLIDE 14

Matrix Inverse

  • The inverse of a square matrix A is defined by A⁻¹A = I_n
  • We can now solve Ax = b as follows:
      Ax = b
      A⁻¹Ax = A⁻¹b
      I_n x = A⁻¹b
      x = A⁻¹b
  • This depends on being able to find A⁻¹
  • If A⁻¹ exists, there are several methods for finding it
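
A minimal NumPy sketch (assumed) of solving Ax = b via the inverse, compared against a direct solver; the matrix values here are arbitrary illustrations:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

x_inv = np.linalg.inv(A) @ b     # x = A^{-1} b
x_solve = np.linalg.solve(A, b)  # solves Ax = b without forming A^{-1}

print(x_inv, x_solve)            # both approximately [0.8, 1.4]
```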

SLIDE 15

Solving Simultaneous Equations

  • Ax = b
    – where A is (M+1) × (M+1)
    – x is (M+1) × 1: the set of weights to be determined
    – b is (M+1) × 1
  • Two closed-form solutions:
    1. Matrix inversion: x = A⁻¹b
    2. Gaussian elimination

SLIDE 16

Linear Equations: Closed-Form Solutions

  • 1. Matrix formulation: Ax = b
    – Solution: x = A⁻¹b
  • 2. Gaussian elimination followed by back-substitution
    – Row operations from the worked example on the slide (the example system is shown only as an image): L2 − 3L1 → L2, L3 − 2L1 → L3, L2/4 → L2
    – A generic elimination sketch follows below
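
Below is a hedged illustration of Gaussian elimination with back-substitution, assuming a square nonsingular system; the partial pivoting and the example matrix are editorial additions, not from the slides.

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination with partial pivoting,
    then back-substitution. A: (n, n), b: (n,)."""
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)

    # Forward elimination: reduce A to upper-triangular form
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))        # pivot row (partial pivoting)
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]

    # Back-substitution on the upper-triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[2., 1., 1.], [6., 2., 1.], [-2., 2., 1.]])
b = np.array([1., -1., 7.])
print(gaussian_elimination(A, b))   # matches np.linalg.solve(A, b): [-1. 2. 1.]
```
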
SLIDE 17

Example: System of Linear Equations in Linear Regression

  • Instead of Ax = b
  • We have Φw = t
    – where Φ is the design matrix of m features (basis functions ϕ_i(x_j)) for samples x_j, and t is the vector of sample targets
    – We need the weights w to be used with the m basis functions to determine the output:
      y(x, w) = \sum_{i=1}^{m} w_i ϕ_i(x)
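
Since Φ is generally not square, software typically solves Φw ≈ t in the least-squares sense; a minimal NumPy sketch (the library, the polynomial basis, and the toy data are assumptions):

```python
import numpy as np

# Toy data roughly following t = 1 + x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([1.0, 2.1, 5.2, 9.8, 17.1])            # targets

# Design matrix with polynomial basis functions 1, x, x^2
Phi = np.vstack([x**0, x**1, x**2]).T               # shape (5, 3)

# Solve Phi w ≈ t in the least-squares sense
w, residuals, rank, sv = np.linalg.lstsq(Phi, t, rcond=None)
print(w)                                            # approximately [1, 0, 1] for this data
```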

SLIDE 18

Disadvantage of Closed-Form Solutions

  • If A⁻¹ exists, the same A⁻¹ can be used for any given b
    – But A⁻¹ often cannot be represented with sufficient precision
    – So it is not used in practice
  • Gaussian elimination also has disadvantages:
    – Numerical instability (division by a small number)
    – O(n³) cost for an n × n matrix
  • Software solutions use the value of b in finding x
    – E.g., the difference (derivative) between b and the output is used iteratively

SLIDE 19

How Many Solutions for Ax = b Exist?

  • A system of equations with n variables and m equations is
      A_{1,1}x_1 + A_{1,2}x_2 + .... + A_{1,n}x_n = b_1
      A_{2,1}x_1 + A_{2,2}x_2 + .... + A_{2,n}x_n = b_2
      ....
      A_{m,1}x_1 + A_{m,2}x_2 + .... + A_{m,n}x_n = b_m
  • The solution is x = A⁻¹b
  • In order for A⁻¹ to exist, Ax = b must have exactly one solution for every value of b
    – It is also possible for the system of equations to have no solutions, or infinitely many solutions, for some values of b
    – It is not possible to have more than one but fewer than infinitely many solutions
      • If x and y are solutions, then z = αx + (1−α)y is a solution for any real α

SLIDE 20

Span of a Set of Vectors

  • Span of a set of vectors: the set of points obtained by a linear combination of those vectors
    – A linear combination of vectors {v^{(1)},.., v^{(n)}} with coefficients c_i is \sum_i c_i v^{(i)}
    – The system of equations Ax = b can be written as Ax = \sum_i x_i A_{:,i}
      • A column of A, i.e., A_{:,i}, specifies travel in direction i
      • How much we need to travel is given by x_i
      • This is a linear combination of the columns
    – Thus determining whether Ax = b has a solution is equivalent to determining whether b is in the span of the columns of A
      • This span is referred to as the column space or range of A
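
As an editorial illustration (NumPy assumed), whether b lies in the span of the columns of A can be checked by comparing the rank of A with the rank of the augmented matrix [A b]:

```python
import numpy as np

A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])                 # columns span a 2-D plane in R^3

b_in  = np.array([2., 3., 5.])           # in the span: 2*col1 + 3*col2
b_out = np.array([2., 3., 0.])           # not in the span

for b in (b_in, b_out):
    aug = np.column_stack([A, b])
    solvable = np.linalg.matrix_rank(aug) == np.linalg.matrix_rank(A)
    print(solvable)                      # True, then False
```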

SLIDE 21

Conditions for a Solution to Ax = b

  • For a solution to exist for every b ∈ ℝ^m, we require the column space of A to be all of ℝ^m
    – Necessary condition: n ≥ m
    – Sufficient condition: the matrix contains at least one set of m linearly independent columns
      • If columns are linear combinations of other columns, the column space is less than ℝ^m
      • The columns are then linearly dependent, and a square matrix with linearly dependent columns is singular
  • For the matrix inverse to be used, the matrix must be square, i.e., m = n, and all columns must be linearly independent
  • For non-square and singular matrices
    – Methods other than matrix inversion are used

SLIDE 22

Norms

  • Used for measuring the size of a vector
  • Norms map vectors to non-negative values
  • The norm of vector x is the distance from the origin to x
    – It is any function f that satisfies:
      f(x) = 0 ⇒ x = 0
      f(x + y) ≤ f(x) + f(y)   (Triangle Inequality)
      ∀α ∈ ℝ,  f(αx) = |α| f(x)

SLIDE 23

LP Norm

  • Definition:
      ||x||_p = ( \sum_i |x_i|^p )^{1/p}
  • L2 norm
    – Called the Euclidean norm, written simply as ||x||
    – The squared Euclidean norm is the same as x^T x
  • L1 norm
    – Useful when 0 and non-zero values have to be distinguished (since L2 increases slowly near the origin, e.g., 0.1² = 0.01)
  • L∞ norm
    – Called the max norm:
      ||x||_∞ = max_i |x_i|
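
A quick NumPy sketch (assumed) of the common norms:

```python
import numpy as np

x = np.array([3., -4., 0.])

l2 = np.linalg.norm(x)                  # Euclidean norm: sqrt(9 + 16) = 5.0
l1 = np.linalg.norm(x, ord=1)           # sum of absolute values = 7.0
linf = np.linalg.norm(x, ord=np.inf)    # max absolute value = 4.0

print(l2, l1, linf)
print(l2**2, x @ x)                     # squared L2 norm equals x^T x = 25.0
```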

SLIDE 24

Size of a Matrix

  • Frobenius norm:
      ||A||_F = ( \sum_{i,j} A_{i,j}^2 )^{1/2}
  • It is analogous to the L2 norm of a vector

SLIDE 25

Angle between Vectors

  • The dot product of two vectors can be written in terms of their L2 norms and the angle θ between them:
      x^T y = ||x||_2 ||y||_2 cos θ
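
For instance (NumPy assumed), the angle can be recovered from the dot product:

```python
import numpy as np

x = np.array([1., 0.])
y = np.array([1., 1.])
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # 45.0
```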

SLIDE 26

Special Kinds of Matrices

  • Diagonal matrix
    – Mostly zeros, with non-zero entries only on the diagonal
    – diag(v) is a square diagonal matrix with diagonal elements given by the entries of vector v
    – Multiplying diag(v) by vector x only needs to scale each element x_i by v_i:
      diag(v) x = v ⊙ x
  • Symmetric matrix
    – Is equal to its transpose: A = A^T
    – E.g., a distance matrix is symmetric, with A_{i,j} = A_{j,i}
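
A small NumPy sketch (assumed) of the identity diag(v) x = v ⊙ x:

```python
import numpy as np

v = np.array([2., 3., 4.])
x = np.array([1., 10., 100.])

print(np.diag(v) @ x)   # [  2.  30. 400.]  full matrix-vector product
print(v * x)            # [  2.  30. 400.]  element-wise scaling, same result
```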

SLIDE 27

Special Kinds of Vectors

  • Unit vector
    – A vector with unit norm: ||x||_2 = 1
  • Orthogonal vectors
    – A vector x and a vector y are orthogonal to each other if x^T y = 0
    – The vectors are at 90 degrees to each other
  • Orthogonal matrix
    – A square matrix whose rows are mutually orthonormal
    – A⁻¹ = A^T

SLIDE 28

Matrix Decomposition

  • Matrices can be decomposed into factors to learn universal properties about them that are not discernible from their representation
    – E.g., from the decomposition of an integer into prime factors, 12 = 2×2×3, we can discern that:
      • 12 is not divisible by 5, and
      • any multiple of 12 is divisible by 3
      • These properties hold even though the representations of 12 in binary and decimal are different
  • Analogously, a matrix is decomposed into eigenvalues and eigenvectors to discern universal properties

SLIDE 29

Eigenvector

  • An eigenvector of a square matrix A is a non-zero vector v such that multiplication by A only changes the scale of v:
      Av = λv
    – The scalar λ is known as the eigenvalue
  • If v is an eigenvector of A, so is any rescaled vector sv (s ≠ 0); moreover, sv still has the same eigenvalue. Thus we usually look for a unit eigenvector

(Figure from Wikipedia)

SLIDE 30

Eigenvalue and Characteristic Polynomial

  • Consider Av = w, where
      A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & \ddots & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{bmatrix},  v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix},  w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}
  • If v and w are scalar multiples, i.e., if Av = λv, then v is an eigenvector of the linear transformation A, and the scale factor λ is the eigenvalue corresponding to that eigenvector
  • This is the eigenvalue equation of matrix A
    – Stated equivalently as (A − λI)v = 0
    – This has a non-zero solution v if and only if |A − λI| = 0
  • The characteristic polynomial |A − λI|, of degree n, can be factored as
      |A − λI| = (λ_1 − λ)(λ_2 − λ)…(λ_n − λ)
  • The λ_1, λ_2, …, λ_n are the roots of the polynomial and are the eigenvalues of A

SLIDE 31

Example of Eigenvalue/Eigenvector

  • Consider the matrix
      A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
  • Taking the determinant of (A − λI), the characteristic polynomial is
      |A − λI| = \begin{vmatrix} 2−λ & 1 \\ 1 & 2−λ \end{vmatrix} = 3 − 4λ + λ²
  • It has roots λ = 1 and λ = 3, which are the two eigenvalues of A
  • The eigenvectors are found by solving for v in Av = λv, and are
      v_{λ=1} = \begin{bmatrix} 1 \\ −1 \end{bmatrix},  v_{λ=3} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
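
The example can be checked numerically (NumPy assumed); note that np.linalg.eig returns unit-norm eigenvectors, so they are rescaled versions of the vectors above:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)        # [3. 1.] (order not guaranteed)
print(eigvecs)        # columns are unit eigenvectors, proportional to [1, 1] and [1, -1]

# Verify Av = λv for each eigenpair
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))   # True
```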

SLIDE 32

Example of Eigenvector

(Figure from Wikipedia: vectors shown as grid points)

SLIDE 33

Eigendecomposition

  • Suppose that matrix A has n linearly independent eigenvectors {v^{(1)},..,v^{(n)}} with eigenvalues {λ_1,..,λ_n}
  • Concatenate the eigenvectors (as columns) to form matrix V
  • Concatenate the eigenvalues to form vector λ = [λ_1,..,λ_n]
  • The eigendecomposition of A is then given by
      A = V diag(λ) V⁻¹
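
A minimal NumPy sketch (assumed) reconstructing A from its eigendecomposition; the matrix is an arbitrary example. It also previews the fact from Slide 43 that det(A) equals the product of the eigenvalues:

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

lam, V = np.linalg.eig(A)                 # eigenvalues and eigenvector matrix
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)

print(np.allclose(A, A_rebuilt))          # True: A = V diag(λ) V^{-1}
print(np.prod(lam), np.linalg.det(A))     # both 10.0: det equals product of eigenvalues
```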

SLIDE 34

Decomposition of Symmetric Matrix

  • Every real symmetric matrix A can be decomposed into real-valued eigenvectors and eigenvalues:
      A = QΛQ^T
    – where Q is an orthogonal matrix composed of the eigenvectors of A: {v^{(1)},..,v^{(n)}}
      • Orthogonal matrix: its columns are orthogonal, i.e., v^{(i)T} v^{(j)} = 0 for i ≠ j
    – Λ is a diagonal matrix of the eigenvalues {λ_1,..,λ_n}
  • We can think of A as scaling space by λ_i in direction v^{(i)}
    – See the figure on the next slide
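
For real symmetric matrices, a dedicated routine can be used; a small sketch of A = QΛQ^T (NumPy and its eigh routine are assumptions, the slides name no library):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                      # real symmetric matrix

lam, Q = np.linalg.eigh(A)                    # eigh: for symmetric/Hermitian matrices
print(lam)                                    # [1. 3.]
print(np.allclose(Q @ np.diag(lam) @ Q.T, A)) # True: A = Q Λ Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q is orthogonal
```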

SLIDE 35

Effect of Eigenvectors and Eigenvalues

  • Example of a 2×2 matrix
  • Matrix A with two orthonormal eigenvectors
    – v^{(1)} with eigenvalue λ_1, v^{(2)} with eigenvalue λ_2

(Figures: plot of unit vectors u ∈ ℝ² (a circle), and plot of the vectors Au (an ellipse) in the variables x_1 and x_2)

SLIDE 36

Eigendecomposition Is Not Unique

  • The eigendecomposition is A = QΛQ^T
    – where Q is an orthogonal matrix composed of eigenvectors of A
  • The decomposition is not unique when two eigenvalues are the same
  • By convention, order the entries of Λ in descending order
    – Under this convention, the eigendecomposition is unique if all eigenvalues are unique

SLIDE 37

What Does Eigendecomposition Tell Us?

  • Many useful facts about the matrix can be obtained
  • 1. The matrix is singular iff any of the eigenvalues are zero
  • 2. It can be used to optimize quadratic expressions of the form
      f(x) = x^T A x  subject to  ||x||_2 = 1
    – Whenever x is equal to a (unit) eigenvector, f equals the corresponding eigenvalue
    – The max value of f is the max eigenvalue; the min value is the min eigenvalue
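
A quick numerical check of fact 2 (NumPy assumed; the symmetric example matrix and the random-sampling approach are editorial choices):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])               # symmetric, eigenvalues 1 and 3

rng = np.random.default_rng(0)
xs = rng.normal(size=(100000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)   # random unit vectors

f = np.einsum('ij,jk,ik->i', xs, A, xs)           # f(x) = x^T A x for each x
print(f.min(), f.max())                           # close to 1.0 and 3.0
```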

SLIDE 38

Positive Definite Matrix

  • A matrix whose eigenvalues are all positive is called positive definite
    – Positive or zero: positive semidefinite
  • If the eigenvalues are all negative, it is negative definite
  • Positive semidefinite matrices guarantee that x^T A x ≥ 0; positive definite matrices further guarantee that x^T A x > 0 for x ≠ 0

SLIDE 39

Singular Value Decomposition (SVD)

  • Eigendecomposition has the form: A = V diag(λ) V⁻¹
    – If A is not square, the eigendecomposition is undefined
  • SVD is a decomposition of the form A = UDV^T
  • SVD is more general than eigendecomposition
    – Used with any matrix rather than only symmetric ones
    – Every real matrix has an SVD (not true of the eigendecomposition)

SLIDE 40

SVD Definition

  • We write A as the product of three matrices: A = UDV^T
    – If A is m×n, then U is defined to be m×m, D is m×n, and V is n×n
    – D is a diagonal matrix, not necessarily square
      • The elements of the diagonal of D are called singular values
    – U and V are orthogonal matrices
      • The column vectors of U are called the left singular vectors; those of V are the right singular vectors
    – The left singular vectors of A are eigenvectors of AA^T
    – The right singular vectors of A are eigenvectors of A^T A
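
A short NumPy sketch (assumed) of the SVD and its shapes; np.linalg.svd returns the singular values as a vector and returns V^T rather than V:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])               # m x n = 2 x 3

U, s, Vt = np.linalg.svd(A)                # s holds the singular values
print(U.shape, s.shape, Vt.shape)          # (2, 2) (2,) (3, 3)

# Rebuild A = U D V^T with D as an m x n diagonal matrix
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ D @ Vt, A))          # True

# Squared singular values are eigenvalues of A A^T (and of A^T A)
print(np.sort(s**2), np.sort(np.linalg.eigvalsh(A @ A.T)))
```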

SLIDE 41

Moore-Penrose Pseudoinverse

  • The most useful feature of SVD is that it can be used to generalize matrix inversion to non-square matrices
  • Practical algorithms for computing the pseudoinverse of A are based on the SVD:
      A⁺ = VD⁺U^T
    – where U, D, V are the SVD of A
    – The pseudoinverse D⁺ of D is obtained by taking the reciprocal of its nonzero elements and then taking the transpose of the resulting matrix
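
A minimal sketch (NumPy assumed) comparing the SVD-based construction of A⁺ with np.linalg.pinv; in this example all singular values are nonzero:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])                       # 3 x 2, not square

U, s, Vt = np.linalg.svd(A)

# D+: reciprocal of the nonzero singular values, placed in an n x m matrix
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)

A_plus = Vt.T @ D_plus @ U.T                   # A+ = V D+ U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
```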

SLIDE 42

Trace of a Matrix

  • The trace operator gives the sum of the elements along the diagonal:
      Tr(A) = \sum_i A_{i,i}
  • The Frobenius norm of a matrix can be represented as
      ||A||_F = ( Tr(AA^T) )^{1/2}
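
A quick NumPy check (assumed) of the trace form of the Frobenius norm:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

print(np.trace(A))                  # 5.0
print(np.linalg.norm(A, 'fro'))     # sqrt(1 + 4 + 9 + 16) = sqrt(30)
print(np.sqrt(np.trace(A @ A.T)))   # same value: ||A||_F = sqrt(Tr(A A^T))
```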

SLIDE 43

Determinant of a Matrix

  • The determinant of a square matrix, det(A), is a mapping to a scalar
  • It is equal to the product of all eigenvalues of the matrix
  • It measures how much multiplication by the matrix expands or contracts space

SLIDE 44

Example: PCA

  • A simple ML algorithm is Principal Components Analysis
  • It can be derived using only knowledge of basic linear algebra

SLIDE 45

PCA Problem Statement

  • Given a collection of m points {x^{(1)},..,x^{(m)}} in ℝ^n, represent them in a lower dimension
    – For each point x^{(i)}, find a code vector c^{(i)} in ℝ^l
    – If l is smaller than n, it will take less memory to store the points
    – This is lossy compression
    – Find an encoding function f(x) = c and a decoding function x ≈ g(f(x))

SLIDE 46

PCA Using Matrix Multiplication

  • One choice of decoding function is to use matrix multiplication: g(c) = Dc, where D ∈ ℝ^{n×l}
    – D is a matrix with l columns
  • To keep the encoding easy, we require the columns of D to be orthogonal to each other
    – To constrain solutions, we require the columns of D to have unit norm
  • We need to find the optimal code c* given D
  • Then we need the optimal D

SLIDE 47

Finding the Optimal Code Given D

  • To generate the optimal code point c* given input x, minimize the distance between the input point x and its reconstruction g(c*):
      c* = argmin_c ||x − g(c)||_2
    – Using the squared L2 norm instead of the L2 norm, the function being minimized is equivalent to
      (x − g(c))^T (x − g(c))
  • Using g(c) = Dc (and the constraint D^T D = I_l), the optimal code can be shown to be equivalent to
      c* = argmin_c  −2x^T Dc + c^T c

SLIDE 48

Optimal Encoding for PCA

  • Using vector calculus:
      ∇_c(−2x^T Dc + c^T c) = 0
      −2D^T x + 2c = 0
      c = D^T x
  • Thus we can encode x using a matrix-vector operation
    – To encode, we use f(x) = D^T x
    – For PCA reconstruction, since g(c) = Dc, we use r(x) = g(f(x)) = DD^T x
    – Next we need to choose the encoding matrix D

SLIDE 49

Method for Finding the Optimal D

  • Revisit the idea of minimizing the L2 distance between inputs and reconstructions
    – But we cannot consider the points in isolation
    – So minimize the error over all points, using the Frobenius norm:
      D* = argmin_D ( \sum_{i,j} ( x_j^{(i)} − r(x^{(i)})_j )^2 )^{1/2}   subject to D^T D = I_l
  • Use the design matrix X ∈ ℝ^{m×n}
    – Given by stacking all the vectors describing the points
  • To derive the algorithm for finding D*, start by considering the case l = 1
    – In this case D is just a single vector d

SLIDE 50

Final Solution to PCA

  • For l = 1, the optimization problem is solved using eigendecomposition
    – Specifically, the optimal d is given by the eigenvector of X^T X corresponding to the largest eigenvalue
  • More generally, the matrix D is given by the l eigenvectors of X^T X corresponding to the largest eigenvalues (proof by induction)
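
To close, a compact NumPy sketch (library assumed) of this PCA recipe: D is formed from the top-l eigenvectors of X^T X, encoding is f(x) = D^T x, and reconstruction is r(x) = DD^T x. Mean-centering X first is common practice but is not discussed in the slides, so it is treated here as an optional assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: m = 200 points in R^3 lying mostly in a 2-D subspace
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
X = X - X.mean(axis=0)                            # optional centering (assumption, see above)

l = 2
eigvals, eigvecs = np.linalg.eigh(X.T @ X)        # ascending eigenvalues of X^T X
D = eigvecs[:, ::-1][:, :l]                       # columns: top-l eigenvectors

codes = X @ D                 # each row is the code D^T x for the corresponding x
X_rec = codes @ D.T           # each row is the reconstruction D D^T x

print(D.shape)                                    # (3, 2): n x l
print(np.mean(np.sum((X - X_rec) ** 2, axis=1)))  # small reconstruction error
```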