Maths Knowledge Overview - for Part 1, COMP24111

Tingting Mu
tingtingmu@manchester.ac.uk
School of Computer Science
University of Manchester
Manchester M13 9PL, UK

1. Linear Algebra Basics

1.1 Basic Concepts and Notations

A matrix is a rectangular array of numbers arranged in rows and columns. By $X \in \mathbb{R}^{m \times n}$, we denote a matrix $X$ with $m$ rows and $n$ columns of real-valued numbers. The notation $X = [x_{ij}]$ (or $X = [x_{i,j}]$) indicates that the element of $X$ at its $i$-th row and $j$-th column is denoted by $x_{ij}$ (or $x_{i,j}$):

$$
X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1n} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2n} \\
x_{31} & x_{32} & x_{33} & \cdots & x_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \cdots & x_{mn}
\end{bmatrix}. \tag{1}
$$

For instance,

$$
A = \begin{bmatrix} 1.2 & 4 & -0.4 \\ 3 & 0 & 1 \end{bmatrix} \tag{2}
$$

is a $2 \times 3$ matrix containing two rows and three columns.

Given a matrix $X$, the notation $X_{:,i}$ is usually used to denote its $i$-th column, and $X_{i,:}$ denotes its $i$-th row. Its element at the $i$-th row and $j$-th column, which is referred to as the $ij$-th element, can be denoted by $X_{ij}$.

A row vector is a matrix with one row. By $x = [x_1, x_2, \ldots, x_n]$, we denote a row vector of dimension $n$. For instance, the 2nd row of the matrix $A$ in Eq. (2) is $A_{2,:} = [3, 0, 1]$.

A column vector is a matrix with one column. By

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
$$

we denote a column vector of dimension $n$. For instance, the 3rd column of the matrix $A$ in Eq. (2) is

$$
A_{:,3} = \begin{bmatrix} -0.4 \\ 1 \end{bmatrix}.
$$
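To make the indexing notation concrete, the matrix $A$ of Eq. (2) can be stored as a list of rows. This is a minimal sketch in plain Python (no external library, so it stays self-contained); the helper names `shape`, `element`, `row` and `column` are hypothetical, and they take the 1-based indices used in the text and convert them to Python's 0-based indices.

```python
# Matrix A from Eq. (2), stored as a list of rows (2 x 3).
A = [
    [1.2, 4, -0.4],
    [3, 0, 1],
]

def shape(X):
    """Return (number of rows, number of columns) of X."""
    return len(X), len(X[0])

def element(X, i, j):
    """Return X_ij, using the 1-based indices i, j of the text."""
    return X[i - 1][j - 1]

def row(X, i):
    """Return the i-th row X_{i,:} (1-based) as a list."""
    return list(X[i - 1])

def column(X, j):
    """Return the j-th column X_{:,j} (1-based) as a list of its entries."""
    return [r[j - 1] for r in X]

print(shape(A))          # (2, 3)
print(row(A, 2))         # [3, 0, 1]
print(column(A, 3))      # [-0.4, 1]
print(element(A, 1, 2))  # 4
```

Storing a matrix as a list of rows makes row access direct, while a column has to be gathered from every row; numerical libraries hide this detail, but the indexing convention is the same.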

The $i$-th element of a vector $x$, which can be either a row or a column vector, is denoted by $x_i$. A matrix with the same number of rows and columns is called a square matrix. A square matrix with ones on the diagonal and zeros everywhere else is called the identity matrix, typically denoted by $I$:

$$
I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}. \tag{3}
$$

An identity matrix of size $n$ is denoted by $I_n \in \mathbb{R}^{n \times n}$. A matrix with all the non-diagonal elements equal to 0 is called a diagonal matrix, typically denoted by $D = \mathrm{diag}([d_1, d_2, \ldots, d_n])$:

$$
D = \begin{bmatrix}
d_1 & 0 & 0 & \cdots & 0 \\
0 & d_2 & 0 & \cdots & 0 \\
0 & 0 & d_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_n
\end{bmatrix}. \tag{4}
$$

Clearly, $I = \mathrm{diag}([1, 1, \ldots, 1])$. A diagonal matrix formed from the $n$-dimensional vector $x$ is $\mathrm{diag}(x)$, written as

$$
\mathrm{diag}(x) = \begin{bmatrix}
x_1 & 0 & 0 & \cdots & 0 \\
0 & x_2 & 0 & \cdots & 0 \\
0 & 0 & x_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & x_n
\end{bmatrix}. \tag{5}
$$

1.2 Matrix Operations

A summary of some frequently used matrix operations is provided below.

• The transpose of a matrix $X$, denoted by $X^T$, is formed by "flipping" the rows and columns: $(X^T)_{ij} = X_{ji}$. For instance,

$$
\begin{bmatrix} 1 & 0 & 0 & -7 \\ -2 & 4 & 1 & 0 \end{bmatrix}^T
= \begin{bmatrix} 1 & -2 \\ 0 & 4 \\ 0 & 1 \\ -7 & 0 \end{bmatrix}. \tag{6}
$$

It has the property $(X^T)^T = X$.

• The sum operation is applied to two matrices of the same size. Given two $m \times n$ matrices $X$ and $Y$, their sum is calculated entrywise such that $(X + Y)_{ij} = X_{ij} + Y_{ij}$. For instance,

$$
\begin{bmatrix} 1 & 0 & 0 & -7 \\ -2 & 4 & 1 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1+0 & 0+0 & 0+0 & -7+1 \\ -2+1 & 4+2 & 1+1 & 0+1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & -6 \\ -1 & 6 & 2 & 1 \end{bmatrix}.
$$

It has the property

$$
(X + Y)^T = X^T + Y^T. \tag{7}
$$
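The definitions above can be checked against the worked examples. The following is a minimal sketch in plain Python, assuming matrices are lists of rows; the function names `identity`, `diag`, `transpose` and `add` are illustrative choices, not part of any library. It reproduces Eq. (6), the sum example, and the properties $(X^T)^T = X$ and Eq. (7).

```python
def identity(n):
    """Return the n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def diag(d):
    """Return the square diagonal matrix diag(d) built from the vector d."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]

def transpose(X):
    """Return X^T, where (X^T)_ij = X_ji."""
    return [list(col) for col in zip(*X)]

def add(X, Y):
    """Entrywise sum X + Y; both matrices must have the same size."""
    if len(X) != len(Y) or any(len(rx) != len(ry) for rx, ry in zip(X, Y)):
        raise ValueError("X and Y must have the same size")
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

X = [[1, 0, 0, -7], [-2, 4, 1, 0]]
Y = [[0, 0, 0, 1], [1, 2, 1, 1]]

print(transpose(X))                    # [[1, -2], [0, 4], [0, 1], [-7, 0]]
print(add(X, Y))                       # [[1, 0, 0, -6], [-1, 6, 2, 1]]
print(identity(3) == diag([1, 1, 1]))  # True
print(transpose(transpose(X)) == X)    # True: (X^T)^T = X
print(transpose(add(X, Y)) == add(transpose(X), transpose(Y)))  # True: Eq. (7)
```

`zip(*X)` walks the rows in parallel, so each tuple it yields is one column of `X`, which is exactly a row of $X^T$.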

• The product of a number (also called a scalar) and a matrix is referred to as scalar multiplication. Given a scalar $c$ and a matrix $X$, their scalar multiplication is computed by multiplying every entry of $X$ by $c$, such that $(cX)_{ij} = c X_{ij}$. For instance,

$$
2 \begin{bmatrix} 1 & 0 & 0 & -7 \\ -2 & 4 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 \times 1 & 2 \times 0 & 2 \times 0 & 2 \times (-7) \\ 2 \times (-2) & 2 \times 4 & 2 \times 1 & 2 \times 0 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & 0 & -14 \\ -4 & 8 & 2 & 0 \end{bmatrix}. \tag{8}
$$

It has the property $(cX)^T = cX^T$.

• Matrix multiplication is defined for two matrices when the number of columns of the left matrix equals the number of rows of the right matrix. Given an $m \times n$ matrix $X$ and an $n \times p$ matrix $Y$, their product is denoted by $XY$. It is an $m \times p$ matrix with

$$
(XY)_{ij} = \sum_{k=1}^{n} X_{ik} Y_{kj}. \tag{9}
$$

An illustrative example of calculating the product of a $4 \times 2$ matrix $A = [a_{i,j}]$ and a $2 \times 3$ matrix $B = [b_{i,j}]$ is shown in Figure 1.

[Figure 1: An illustration of calculating matrix multiplication; for example, $(AB)_{1,2} = a_{1,1} b_{1,2} + a_{1,2} b_{2,2}$ and $(AB)_{3,3} = a_{3,1} b_{1,3} + a_{3,2} b_{2,3}$. The figure is adapted from the Wikipedia page on matrix multiplication.]

Given matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{n \times p}$ and $D \in \mathbb{R}^{p \times q}$, some properties of matrix multiplication are shown in the following:

$$
A(B + C) = AB + AC, \tag{10}
$$

$$
(B + C)D = BD + CD, \tag{11}
$$

$$
(AB)D = A(BD), \tag{12}
$$

$$
(AB)^T = B^T A^T. \tag{13}
$$
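Eq. (9) translates directly into a triple loop. Below is a minimal plain-Python sketch, assuming matrices are lists of rows; `scale` and `matmul` are hypothetical helper names, and the numeric entries of the $4 \times 2$ and $2 \times 3$ matrices are made up for illustration, since Figure 1 uses symbolic entries. It reproduces Eq. (8) and checks property (13).

```python
def scale(c, X):
    """Scalar multiplication cX: multiply every entry of X by c."""
    return [[c * x for x in r] for r in X]

def matmul(X, Y):
    """Product XY with (XY)_ij = sum_k X_ik Y_kj, as in Eq. (9)."""
    n = len(X[0])
    if n != len(Y):
        raise ValueError("number of columns of X must equal number of rows of Y")
    p = len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(X))]

def transpose(X):
    """Return X^T."""
    return [list(col) for col in zip(*X)]

X = [[1, 0, 0, -7], [-2, 4, 1, 0]]
print(scale(2, X))  # [[2, 0, 0, -14], [-4, 8, 2, 0]], as in Eq. (8)

# A (4 x 2) times B (2 x 3) gives a 4 x 3 matrix; the entries are illustrative.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, -1], [2, 1, 0]]
AB = matmul(A, B)
print(AB)  # [[5, 2, -1], [11, 4, -3], [17, 6, -5], [23, 8, -7]]
print(transpose(AB) == matmul(transpose(B), transpose(A)))  # True: Eq. (13)
```

The size check mirrors the condition stated in the text: the inner dimension $n$ must match, and the result takes its row count from $X$ and its column count from $Y$.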

• The trace operation is defined for a square matrix $X \in \mathbb{R}^{n \times n}$ and is denoted by $\mathrm{tr}(X)$. It is the sum of all the diagonal elements of the matrix, given by

$$
\mathrm{tr}(X) = \sum_{i=1}^{n} X_{ii}. \tag{14}
$$

Given two square matrices $X$ and $Y$ of size $n$, and two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$, some properties of the trace are shown in the following:

$$
\mathrm{tr}(X) = \mathrm{tr}(X^T), \tag{15}
$$

$$
\mathrm{tr}(X + Y) = \mathrm{tr}(X) + \mathrm{tr}(Y), \tag{16}
$$

$$
\mathrm{tr}(AB) = \mathrm{tr}(BA). \tag{17}
$$

• The inverse of a square matrix $X$ of size $n$ is denoted by $X^{-1}$, which is the unique matrix such that

$$
X X^{-1} = X^{-1} X = I. \tag{18}
$$

Non-square matrices do not have inverses by definition. For some square matrices, the inverse may not exist. We say that $X$ is invertible (or non-singular) if $X^{-1}$ exists, and non-invertible (or singular) otherwise. Given two invertible square matrices $X$ and $Y$ of the same size, some properties of the inverse are shown in the following:

$$
(X^{-1})^{-1} = X, \tag{19}
$$

$$
(X^T)^{-1} = (X^{-1})^T, \tag{20}
$$

$$
(XY)^{-1} = Y^{-1} X^{-1}. \tag{21}
$$

• Given two $n$-dimensional column vectors $x$ and $y$, the quantity $x^T y$ is called the inner product (or dot product) of the two vectors. It is a real number computed by

$$
x^T y = [x_1, x_2, \ldots, x_n] \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i. \tag{22}
$$

• A norm of a vector $x$ is informally a measure of the "length" of the vector, and is usually denoted by $\|x\|$. Assuming $x$ is an $n$-dimensional column vector, the commonly used Euclidean norm (also called the $\ell_2$-norm) is given by

$$
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}. \tag{23}
$$

Another example of a norm is the $\ell_1$-norm, given by

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|. \tag{24}
$$
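The trace, the inverse, the inner product and the two vector norms can all be computed in a few lines. This is a minimal plain-Python sketch under stated assumptions: matrices are lists of rows, the helper names are hypothetical, and the inverse is restricted to the $2 \times 2$ case (using the standard adjugate formula $\frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$) rather than a general algorithm. The matrix entries are illustrative; $A$ is $2 \times 3$ and $B$ is $3 \times 2$, so $AB$ and $BA$ have different sizes yet equal traces, as Eq. (17) states.

```python
import math

def trace(X):
    """Sum of the diagonal elements of a square matrix, Eq. (14)."""
    if any(len(r) != len(X) for r in X):
        raise ValueError("trace is defined for square matrices only")
    return sum(X[i][i] for i in range(len(X)))

def matmul(X, Y):
    """Product XY, Eq. (9); assumes the sizes are compatible."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(X):
    """Inverse of a 2 x 2 matrix via the adjugate formula; raises if singular."""
    (a, b), (c, d) = X
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (not invertible)")
    return [[d / det, -b / det], [-c / det, a / det]]

def dot(x, y):
    """Inner product x^T y = sum_i x_i y_i, Eq. (22)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))

def l2_norm(x):
    """Euclidean norm sqrt(x^T x), Eq. (23)."""
    return math.sqrt(dot(x, x))

def l1_norm(x):
    """Sum of absolute values, Eq. (24)."""
    return sum(abs(xi) for xi in x)

A = [[1, 2, 3], [4, 5, 6]]     # 2 x 3
B = [[1, 0], [0, 1], [2, -1]]  # 3 x 2
print(trace(matmul(A, B)), trace(matmul(B, A)))  # 6 6, as Eq. (17) predicts

X = [[2, 1], [1, 1]]
X_inv = inverse_2x2(X)
print(X_inv)             # [[1.0, -1.0], [-1.0, 2.0]]
print(matmul(X, X_inv))  # [[1.0, 0.0], [0.0, 1.0]], i.e. Eq. (18)

x = [3, -4]
print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(l2_norm(x), l1_norm(x))     # 5.0 7
```

For larger matrices one would normally rely on a linear-algebra library rather than an explicit inverse formula; the $2 \times 2$ case is used here only because its formula is short enough to verify by hand.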

• A norm can also be defined for a matrix. For example, the Frobenius norm of an $m \times n$ matrix $X$ is given by

$$
\|X\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij}^2} = \sqrt{\mathrm{tr}(X^T X)} = \sqrt{\mathrm{tr}(X X^T)}. \tag{25}
$$

1.3 Symmetric Matrices

A square matrix $X \in \mathbb{R}^{n \times n}$ is symmetric if $X = X^T$. For instance, the following $4 \times 4$ matrix is symmetric:

$$
\begin{bmatrix}
1 & 0 & 0 & -7 \\
0 & 4 & 3 & 0 \\
0 & 3 & 2 & 1 \\
-7 & 0 & 1 & -1.6
\end{bmatrix}. \tag{26}
$$

Given an arbitrary square matrix $X \in \mathbb{R}^{n \times n}$, the matrix $X + X^T$ is symmetric, because $(X + X^T)^T = X^T + (X^T)^T = X^T + X$.

2. Calculus Basics

2.1 Derivative and Differentiation Rules

Given a function of a real variable $f(x): \mathbb{R} \to \mathbb{R}$, its derivative $f'(x)$ (or $\frac{df}{dx}$ in Leibniz's notation) measures the rate at which the function value changes with respect to a change in the input variable $x$, where

$$
f'(x) = \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \tag{27}
$$

It follows directly that the derivative of a constant function is zero, since the numerator $f(x + \Delta x) - f(x)$ is then always 0. The tangent line to the graph of a function $f(x)$ at a chosen input value is the straight line that "just touches" the function curve at that point. The slope of the tangent line is equal to the derivative of the function at the chosen value (see Figure 2 for an example).

The process of finding a derivative is called differentiation. Here is a summary of the rules for computing the derivative of a function, referred to as differentiation rules.

• Linearity: For any functions $f(x)$ and $g(x)$ and any real numbers $a$ and $b$, the derivative of the function $h(x) = af(x) + bg(x)$ with respect to $x$ is

$$
h'(x) = a f'(x) + b g'(x). \tag{28}
$$

Its special cases include the constant factor rule $(af)' = af'$, the sum rule $(f + g)' = f' + g'$, and the subtraction rule $(f - g)' = f' - g'$.

• Product rule: For any functions $f(x)$ and $g(x)$, the derivative of the function $h(x) = f(x)g(x)$ with respect to $x$ is

$$
h'(x) = f'(x) g(x) + f(x) g'(x). \tag{29}
$$
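The remaining definitions in this part can be sanity-checked numerically. The sketch below is plain Python with hypothetical helper names; it computes the Frobenius norm from the double-sum form of Eq. (25), tests symmetry on the matrix of Eq. (26) and on $X + X^T$, and checks the linearity and product rules, Eqs. (28) and (29). As an assumption, the derivative is approximated with a central difference $\frac{f(x + \Delta x) - f(x - \Delta x)}{2\Delta x}$ at a small fixed step, a common variant of the quotient in Eq. (27) that is usually more accurate than the one-sided form; the test functions $\sin(x)$ and $x^2$ are illustrative choices.

```python
import math

def transpose(X):
    """Return X^T."""
    return [list(col) for col in zip(*X)]

def add(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def frobenius(X):
    """Frobenius norm: square root of the sum of squared entries, Eq. (25)."""
    return math.sqrt(sum(x * x for r in X for x in r))

def is_symmetric(X):
    """True if X is square and X equals X^T."""
    return all(len(r) == len(X) for r in X) and X == transpose(X)

def derivative(f, x, dx=1e-6):
    """Central-difference estimate of f'(x) with a small fixed step dx."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

# The symmetric 4 x 4 matrix of Eq. (26) and a non-symmetric example.
S = [[1, 0, 0, -7], [0, 4, 3, 0], [0, 3, 2, 1], [-7, 0, 1, -1.6]]
X = [[1, 2], [3, 4]]
print(is_symmetric(S), is_symmetric(X))    # True False
print(is_symmetric(add(X, transpose(X))))  # True
print(frobenius(X) ** 2)                   # approximately 30 = 1 + 4 + 9 + 16

def h(t):
    """h(t) = f(t) g(t) with f = sin and g(t) = t^2."""
    return math.sin(t) * t ** 2

def lin(t):
    """a f(t) + b g(t) with a = 2, b = -3, f = sin and g(t) = t^2."""
    return 2 * math.sin(t) - 3 * t ** 2

x0 = 1.3
product_rule = math.cos(x0) * x0 ** 2 + math.sin(x0) * 2 * x0  # Eq. (29)
print(abs(derivative(h, x0) - product_rule) < 1e-6)            # True
print(abs(derivative(lin, x0) - (2 * math.cos(x0) - 6 * x0)) < 1e-6)  # True: Eq. (28)
print(derivative(lambda t: 5.0, 2.0))  # 0.0: a constant has zero derivative
```

A finite difference only approximates the limit in Eq. (27): making `dx` too small eventually amplifies floating-point rounding error, so the comparisons above use a tolerance rather than exact equality.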
