 
Maths Knowledge Overview - for Part 1, COMP24111

Tingting Mu
tingtingmu@manchester.ac.uk
School of Computer Science
University of Manchester
Manchester M13 9PL, UK

Editor: NA

1. Linear Algebra Basics

1.1 Basic Concepts and Notations

A matrix is a rectangular array of numbers arranged in rows and columns. By $X \in \mathbb{R}^{m \times n}$, we denote a matrix $X$ with $m$ rows and $n$ columns of real-valued numbers. The notation $X = [x_{ij}]$ (or $X = [x_{i,j}]$) indicates that the element of $X$ at its $i$-th row and $j$-th column is denoted by $x_{ij}$ (or $x_{i,j}$):

$$
X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn}
\end{bmatrix}. \tag{1}
$$

For instance,

$$
A = \begin{bmatrix}
1.2 & 4 & -0.4 \\
3 & 0 & 1
\end{bmatrix} \tag{2}
$$

is a $2 \times 3$ matrix containing two rows and three columns.

Given a matrix $X$, the notation $X_{:,i}$ is usually used to denote its $i$-th column. Its $i$-th row can be denoted by $X_{i,:}$. Its element at the $i$-th row and $j$-th column, which is referred to as the $ij$-th element, can be denoted by $X_{ij}$.

A row vector is a matrix with one row. By $x = [x_1, x_2, \ldots, x_n]$, we denote a row vector of dimension $n$. For instance, the 2nd row of the matrix $A$ in Eq. (2) is $A_{2,:} = [3, 0, 1]$.

A column vector is a matrix with one column. By

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
$$

we denote a column vector of dimension $n$. For instance, the 3rd column of the matrix $A$ in Eq. (2) is

$$
A_{:,3} = \begin{bmatrix} -0.4 \\ 1 \end{bmatrix}.
$$
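The row and column notation can be tried out with a short Python sketch. It stores the matrix $A$ of Eq. (2) as a nested list; the helper names `shape`, `row` and `column` are hypothetical and chosen only for illustration. Note that Python indexes from 0, whereas the notes index from 1, so the helpers subtract 1.

```python
# The 2 x 3 matrix A from Eq. (2), stored as a list of rows.
A = [[1.2, 4.0, -0.4],
     [3.0, 0.0, 1.0]]


def shape(X):
    """Return (m, n): the number of rows and columns of X."""
    return len(X), len(X[0])


def row(X, i):
    """Return the i-th row X_{i,:}, using the 1-based index of the notes."""
    return list(X[i - 1])


def column(X, j):
    """Return the j-th column X_{:,j}, using the 1-based index of the notes."""
    return [r[j - 1] for r in X]


print(shape(A))      # number of rows and columns
print(row(A, 2))     # the 2nd row, A_{2,:}
print(column(A, 3))  # the 3rd column, A_{:,3}
```

A numerical library would normally replace these helpers; plain lists are used here only to keep the sketch self-contained.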
The $i$-th element of a vector $x$, which can be either a row or column vector, is denoted by $x_i$.

A matrix with the same number of rows and columns is called a square matrix. A square matrix with ones on the diagonal and zeros everywhere else is called the identity matrix, typically denoted by $I$:

$$
I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}. \tag{3}
$$

An identity matrix of size $n$ is denoted by $I_n \in \mathbb{R}^{n \times n}$. A matrix with all the non-diagonal elements equal to 0 is called a diagonal matrix, typically denoted by $D = \mathrm{diag}([d_1, d_2, \ldots, d_n])$:

$$
D = \begin{bmatrix}
d_1 & 0 & 0 & \cdots & 0 \\
0 & d_2 & 0 & \cdots & 0 \\
0 & 0 & d_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_n
\end{bmatrix}. \tag{4}
$$

Clearly, $I = \mathrm{diag}([1, 1, \ldots, 1])$. A diagonal matrix formed from the $n$-dimensional vector $x$ is $\mathrm{diag}(x)$, written as

$$
\mathrm{diag}(x) = \begin{bmatrix}
x_1 & 0 & 0 & \cdots & 0 \\
0 & x_2 & 0 & \cdots & 0 \\
0 & 0 & x_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & x_n
\end{bmatrix}. \tag{5}
$$

1.2 Matrix Operations

A summary of some frequently used matrix operations is provided below.

• The transpose of a matrix $X$, denoted by $X^T$, is formed by "flipping" the rows and columns: $(X^T)_{ij} = X_{ji}$. For instance,

$$
\begin{bmatrix}
1 & 0 & 0 & -7 \\
-2 & 4 & 1 & 0
\end{bmatrix}^T
=
\begin{bmatrix}
1 & -2 \\
0 & 4 \\
0 & 1 \\
-7 & 0
\end{bmatrix}. \tag{6}
$$

It has the property of $(X^T)^T = X$.

• The sum operation is applied to two matrices of the same size. Given two $m \times n$ matrices $X$ and $Y$, their sum is calculated entrywise such that $(X + Y)_{ij} = X_{ij} + Y_{ij}$. For instance,

$$
\begin{bmatrix}
1 & 0 & 0 & -7 \\
-2 & 4 & 1 & 0
\end{bmatrix}
+
\begin{bmatrix}
0 & 0 & 0 & 1 \\
1 & 2 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 + 0 & 0 + 0 & 0 + 0 & -7 + 1 \\
-2 + 1 & 4 + 2 & 1 + 1 & 0 + 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & -6 \\
-1 & 6 & 2 & 1
\end{bmatrix}. \tag{7}
$$

It has the property of $(X + Y)^T = X^T + Y^T$.
• The product of a number (also called a scalar) and a matrix is referred to as scalar multiplication. Given a scalar $c$ and a matrix $X$, their scalar multiplication is computed by multiplying every entry of $X$ by $c$ such that $(cX)_{ij} = c(X)_{ij}$. For instance,

$$
2 \begin{bmatrix}
1 & 0 & 0 & -7 \\
-2 & 4 & 1 & 0
\end{bmatrix}
=
\begin{bmatrix}
2 \times 1 & 2 \times 0 & 2 \times 0 & 2 \times (-7) \\
2 \times (-2) & 2 \times 4 & 2 \times 1 & 2 \times 0
\end{bmatrix}
=
\begin{bmatrix}
2 & 0 & 0 & -14 \\
-4 & 8 & 2 & 0
\end{bmatrix}. \tag{8}
$$

It has the property of $(cX)^T = cX^T$.

• The multiplication operation is defined over two matrices where the number of columns of the left matrix has to be the same as the number of rows of the right matrix. Given an $m \times n$ matrix $X$ and an $n \times p$ matrix $Y$, their multiplication is denoted by $XY$, where

$$
(XY)_{ij} = \sum_{k=1}^{n} X_{ik} Y_{kj}. \tag{9}
$$

An illustrative example of calculating the multiplication of a $4 \times 2$ matrix $A = [a_{i,j}]$ and a $2 \times 3$ matrix $B = [b_{i,j}]$ is shown in Figure 1.

[Figure 1: An illustration of calculating matrix multiplication. The figure highlights the entries $a_{1,1}b_{1,2} + a_{1,2}b_{2,2}$ and $a_{3,1}b_{1,3} + a_{3,2}b_{2,3}$, which by Eq. (9) are $(AB)_{1,2}$ and $(AB)_{3,3}$. The figure is adapted from the Wikipedia page on matrix multiplication.]

Given matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{n \times p}$ and $D \in \mathbb{R}^{p \times q}$, some properties of the matrix multiplication are shown in the following:

$$
A(B + C) = AB + AC, \tag{10}
$$
$$
(B + C)D = BD + CD, \tag{11}
$$
$$
(AB)D = A(BD), \tag{12}
$$
$$
(AB)^T = B^T A^T. \tag{13}
$$
• The trace operation is defined for a square matrix $X \in \mathbb{R}^{n \times n}$, denoted by $\mathrm{tr}(X)$. It is the sum of all the diagonal elements in the matrix, given by

$$
\mathrm{tr}(X) = \sum_{i=1}^{n} X_{ii}. \tag{14}
$$

Given two square matrices $X$ and $Y$ of size $n$, and two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$, some properties of the trace are shown in the following:

$$
\mathrm{tr}(X) = \mathrm{tr}(X^T), \tag{15}
$$
$$
\mathrm{tr}(X + Y) = \mathrm{tr}(X) + \mathrm{tr}(Y), \tag{16}
$$
$$
\mathrm{tr}(AB) = \mathrm{tr}(BA). \tag{17}
$$

• The inverse of a square matrix $X$ of size $n$ is denoted by $X^{-1}$, which is the unique matrix such that

$$
XX^{-1} = X^{-1}X = I. \tag{18}
$$

Non-square matrices do not have inverses by definition. For some square matrices, their inverse may not exist. We say that $X$ is invertible (or non-singular) if $X^{-1}$ exists, and non-invertible (or singular) otherwise. Given two invertible square matrices $X$ and $Y$ of the same size, some properties of the inverse are shown in the following:

$$
(X^{-1})^{-1} = X, \tag{19}
$$
$$
(X^T)^{-1} = (X^{-1})^T, \tag{20}
$$
$$
(XY)^{-1} = Y^{-1} X^{-1}. \tag{21}
$$

• Given two $n$-dimensional column vectors $x$ and $y$, the quantity $x^T y$ is called the inner product (or dot product) of the two vectors, which is a real number computed by

$$
x^T y = [x_1, x_2, \ldots, x_n] \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i. \tag{22}
$$

• A norm of a vector $x$ is informally a measure of the "length" of the vector, and is usually denoted by $\|x\|$. Assuming $x$ is an $n$-dimensional column vector, the commonly used Euclidean norm (also called the $l_2$-norm) is given by

$$
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}. \tag{23}
$$

Another example of the norm is the $l_1$-norm, given by

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|. \tag{24}
$$
• A norm can also be defined for a matrix. For example, the Frobenius norm of an $m \times n$ matrix $X$ is given by

$$
\|X\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij}^2} = \sqrt{\mathrm{tr}(X^T X)} = \sqrt{\mathrm{tr}(X X^T)}. \tag{25}
$$

1.3 Symmetric Matrices

Given a square matrix $X \in \mathbb{R}^{n \times n}$, it is symmetric if $X = X^T$. For instance, the following $4 \times 4$ matrix is symmetric:

$$
\begin{bmatrix}
1 & 0 & 0 & -7 \\
0 & 4 & 3 & 0 \\
0 & 3 & 2 & 1 \\
-7 & 0 & 1 & -1.6
\end{bmatrix}. \tag{26}
$$

Given an arbitrary square matrix $X \in \mathbb{R}^{n \times n}$, the matrix $X + X^T$ is symmetric.

2. Calculus Basics

2.1 Derivative and Differentiation Rules

Given a function of a real variable $f(x): \mathbb{R} \to \mathbb{R}$, its derivative $f'(x)$ (or $\frac{df}{dx}$ in Leibniz's notation) measures the rate at which the function value changes with respect to the change of the input variable $x$, where

$$
f'(x) = \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \tag{27}
$$

This gives the trivial case that the derivative of a constant function is zero.

The tangent line to the graph of a function $f(x)$ at a chosen input value is the straight line that "just touches" the function curve at that point. The slope of the tangent line is equal to the derivative of the function at the chosen value (see Figure 2 for example).

The process of finding a derivative is called differentiation. Here is a summary of rules for computing the derivative of a function in calculus, referred to as differentiation rules.

• Linearity: For any functions $f(x)$ and $g(x)$ and any real numbers $a$ and $b$, the derivative of the function $h(x) = af(x) + bg(x)$ with respect to $x$ is

$$
h'(x) = af'(x) + bg'(x). \tag{28}
$$

Its special cases include the constant factor rule $(af)' = af'$, the sum rule $(f + g)' = f' + g'$, and the subtraction rule $(f - g)' = f' - g'$.

• Product rule: For any functions $f(x)$ and $g(x)$, the derivative of the function $h(x) = f(x)g(x)$ with respect to $x$ is

$$
h'(x) = f'(x)g(x) + f(x)g'(x). \tag{29}
$$
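As a rough numerical sanity check of the limit definition in Eq. (27) and the product rule in Eq. (29), the following Python sketch compares a difference quotient with a small step against the product-rule formula. The choice of $f(x) = x^2$ and $g(x) = \sin x$, the step size `dx`, and the function names are assumptions made only for this illustration.

```python
import math


def difference_quotient(func, x, dx=1e-6):
    """Approximate func'(x) by the quotient in Eq. (27) with a small, fixed dx."""
    return (func(x + dx) - func(x)) / dx


def h(x):
    """h(x) = f(x) g(x) with the assumed choices f(x) = x^2 and g(x) = sin(x)."""
    return x ** 2 * math.sin(x)


def h_prime(x):
    """Product rule, Eq. (29): f'(x) g(x) + f(x) g'(x) = 2x sin(x) + x^2 cos(x)."""
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)


x0 = 1.0
approx = difference_quotient(h, x0)
exact = h_prime(x0)
print(approx, exact, abs(approx - exact))
```

The two values should agree to several decimal places. The remaining gap comes from using a finite `dx` rather than taking the limit.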