SLIDE 1
The Role of Linear Algebra in EEG Analysis
Robert Hergenroder II
December 20, 2016

Abstract: In this paper we explore the role of linear algebra in processing electroencephalography signals via two separate, but complementary, processes: Independent Component Analysis and Principal Component Analysis.
SLIDE 2
SLIDE 3
An Introduction to EEG technologies
◮ Electroencephalography, or EEG, measures electrical signals on the scalp.
◮ It is designed to record neuron activity.
◮ In comparison to other brain measurement devices, such as Magnetic Resonance Imaging, EEG technology allows for more precise measurements with regard to time (in the millisecond range).
◮ Linear algebra allows for configuring the data in an optimal manner.
SLIDE 4
Figure 1
Figure: The electrode placement and measurements of an EEG.
SLIDE 5
There are various algorithms for processing the raw data to look at differing aspects of modeling cognition as it relates to EEG. In particular, we are going to look at Independent Component Analysis and Principal Component Analysis, and at how they work together to produce more easily read data.
SLIDE 6
Independent Component Analysis, or ICA
◮ ICA is composed of different techniques to process multivariate data.
◮ ICA is used to remove "noise" (skin galvanization, eye movement, etc.) from raw data.
◮ ICA is able to identify independent sources within various signals.
◮ ICA includes techniques for finding:
  ◮ A, a coefficient matrix
  ◮ b, a noise vector
  ◮ s, a source vector
◮ which satisfy the following equation:

  x = A s + b
SLIDE 7
Figure: The importance of ICA
SLIDE 8
Constructing a Linear System
◮ ICA allows us to look at the data in a linear way.
◮ Each measurement is a point in an n-dimensional space, where n is the number of channels the EEG is gathering data from.
◮ There are t measurements, each a vector in R^n associated with a particular time.
This is illustrated for n = 3 in Figure 2, where each data point translates to a vector x.
SLIDE 9
Figure 2
Figure: Each point represents an x vector in R^3.
SLIDE 10
Our Data Matrix
Each data point translates to an x vector:

X = [ x_1  x_2  ...  x_t ],   each x_j ∈ R^n

X is an n × t matrix where the rows are a given channel's electrical output across time, and the columns are a single time "snapshot" of what every channel's electrical charge is:

    [ x_11  x_12  ...  x_1t ]
X = [ x_21  x_22  ...  x_2t ]
    [  ...   ...  ...   ... ]
    [ x_n1  x_n2  ...  x_nt ]
SLIDE 11
An example matrix
We shall define an example matrix as follows:

X = [ x_1  x_2  x_3 ] = [ 1  1  0 ]
                        [ 2  2  1 ]
                        [ 1  0  1 ]

Ergo channel one measured electrical signals of 1, 1, and 0 at t = 1, t = 2, and t = 3 respectively.
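As a sketch of how this example matrix might be held in code (using NumPy; rows are channels, columns are time snapshots, as described above):

```python
import numpy as np

# Rows are channels, columns are time "snapshots" (n = 3 channels, t = 3 samples).
X = np.array([
    [1, 1, 0],   # channel 1 at t = 1, 2, 3
    [2, 2, 1],   # channel 2
    [1, 0, 1],   # channel 3
])

print(X[0])      # channel 1's readings across time  → [1 1 0]
print(X[:, 0])   # the t = 1 snapshot across all channels  → [1 2 1]
```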
SLIDE 12
Decomposing our Data Matrix to a set of basis vectors
These x vectors can be further decomposed over a set of basic source vectors, s, which are linearly independent:

x_1, ..., x_t  ⇒  S = [ s_1  s_2  ...  s_n ],   each s_i ∈ R^n
SLIDE 13
The simplest source matrix
These source vectors can be represented most simply as the n × n identity matrix:

S = [ s_1  s_2  ...  s_n ] = I,

although through ICA algorithms we are able to produce a more functional, and efficient, matrix of source data.
SLIDE 14
Defining a Coefficient Matrix
The decomposition of the input data over a set of basis vectors allows us to assign a coefficient vector a to each data point vector x, with entries a_it corresponding to the source vectors:

x_t = Σ_{i=1}^{n} a_it s_i = S a_t

By collecting the vectors we are able to produce the equation in matrix form:

X = SA

where X is our collection of data point vectors in an n × t matrix, S is a collection of decomposed basis vectors in an n × n matrix, and A is the collection of coefficient vectors in an n × t matrix.
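As a sketch of the relation X = SA in code: when the source matrix S is known and invertible, the coefficients can be recovered by solving a linear system (the values below are this paper's example matrix, with S taken as the identity):

```python
import numpy as np

S = np.eye(3)                  # simplest source matrix: the identity
X = np.array([[1., 1., 0.],
              [2., 2., 1.],
              [1., 0., 1.]])   # example data matrix from the slides

A = np.linalg.solve(S, X)      # solves S A = X column by column

assert np.allclose(S @ A, X)   # X = S A holds
```

With S = I the coefficients simply reproduce the data, A = X, which is what the worked example on the following slides shows.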
SLIDE 15
An Example Coefficient Matrix

In our example, we will use our previously defined X matrix, taking the source matrix to be the identity for ease of calculation:

x_1 = (1, 2, 1)^T = S a_1 = I (1, 2, 1)^T
x_2 = (1, 2, 0)^T = S a_2 = I (1, 2, 0)^T
x_3 = (0, 1, 1)^T = S a_3 = I (0, 1, 1)^T
SLIDE 16
Example Matrices
And with our example matrix:

X = SA = [ 1  0  0 ] [ 1  1  0 ]
         [ 0  1  0 ] [ 2  2  1 ]
         [ 0  0  1 ] [ 1  0  1 ]

At this point the data can undergo Principal Component Analysis to reduce the matrices' ranks and make processing more efficient.
SLIDE 17
Principal Component Analysis

Principal Component Analysis, or PCA, is used to determine a more accurate matrix by distinguishing a more efficient matrix to express the source vectors and by identifying noise vectors. This allows for a reduction in dimensionality, making the data more concise. Our original data, the x vectors, are thought to be composed of two main portions: a source vector, s, and a noise vector, b:

x = s + b

This is done by examining correlations between the components of our x vectors to distinguish the sources with the largest variance, which denote unique signals. A covariance matrix is mapped which relates the correlations between our x channels.
SLIDE 18
Further delineations of PCA
Noise and sources are arbitrarily defined. While ICA allows for each component to be distinguished, PCA maps channel correlations and is often utilized as a pre-processing step before ICA algorithms are run. The first step in PCA of EEG data is to find a "covariance matrix."
SLIDE 19
Finding a Covariance Matrix

Covariance is defined as the expected change between two data points across time. This allows researchers to see the correlations between the data points. Correlation between data points is presumed to represent noise and redundancy. To begin finding a covariance matrix, it is assumed that the data is "zero mean," i.e. that the mean of each channel has been removed. Such data fulfills the following equation:

(1/t) Σ_{j=1}^{t} [X]_ij = 0,   for i = 1, ..., n

Zeroing the mean has a number of benefits in data analysis: coefficients represent each data point more accurately as correlation is minimized, and it allows for a superposition of data points which more acutely reflects their variance.
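The zero-mean step can be sketched in NumPy by subtracting each channel's mean across time:

```python
import numpy as np

X = np.array([[1., 1., 0.],
              [2., 2., 1.],
              [1., 0., 1.]])   # the example data matrix

# Subtract each row's (channel's) mean so that (1/t) * sum_j [X]_ij = 0.
X_zm = X - X.mean(axis=1, keepdims=True)

# Every channel now sums to zero across time.
assert np.allclose(X_zm.sum(axis=1), 0.0)
print(X_zm)
```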
SLIDE 20
Zeroing the mean of our example
Our example matrix therefore becomes:

(1/3) Σ_{t=1}^{3} [X]_1t = (1/3)(1 + 1 + 0) = 2/3
  ⇒ x_1^T = ( 1 − 2/3,  1 − 2/3,  0 − 2/3 ) = ( 1/3,  1/3,  −2/3 )

(1/3) Σ_{t=1}^{3} [X]_2t = (1/3)(2 + 2 + 1) = 5/3
  ⇒ x_2^T = ( 2 − 5/3,  2 − 5/3,  1 − 5/3 ) = ( 1/3,  1/3,  −2/3 )

(1/3) Σ_{t=1}^{3} [X]_3t = (1/3)(1 + 0 + 1) = 2/3
  ⇒ x_3^T = ( 1 − 2/3,  0 − 2/3,  1 − 2/3 ) = ( 1/3,  −2/3,  1/3 )
SLIDE 21
Our new zero-mean matrix
X = [  1/3   1/3  −2/3 ]
    [  1/3   1/3  −2/3 ]
    [  1/3  −2/3   1/3 ]
SLIDE 22
Defining a Covariance Matrix

The covariance matrix itself is defined by the following equation:

Σ = (1/t) Σ_{j=1}^{t} x_j x_j^T = (1/t) X X^T

When Σ_ij = 0, the channels x_i and x_j are entirely uncorrelated.
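A quick numerical check of this definition on the zero-mean example (a NumPy sketch):

```python
import numpy as np

X_zm = np.array([[ 1/3,  1/3, -2/3],
                 [ 1/3,  1/3, -2/3],
                 [ 1/3, -2/3,  1/3]])   # zero-mean example data

t = X_zm.shape[1]
Sigma = (X_zm @ X_zm.T) / t             # Σ = (1/t) X X^T

print(Sigma * 9)                        # entries are ninths: 2, 2, -1, etc.
```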
SLIDE 23
Our example covariance matrix
Our example covariance matrix would be computed as follows:

Σ = (1/3) X X^T

  = (1/3) [  1/3   1/3  −2/3 ] [  1/3   1/3   1/3 ]
          [  1/3   1/3  −2/3 ] [  1/3   1/3  −2/3 ]
          [  1/3  −2/3   1/3 ] [ −2/3  −2/3   1/3 ]

  = (1/3) [  2/3   2/3  −1/3 ]
          [  2/3   2/3  −1/3 ]
          [ −1/3  −1/3   2/3 ]

  = [  2/9   2/9  −1/9 ]
    [  2/9   2/9  −1/9 ]
    [ −1/9  −1/9   2/9 ]

Ergo in our example all the channels are correlated in some manner.
SLIDE 24
Decomposing the matrix to its eigensystem
By diagonalizing the covariance matrix we are able to minimize redundancy and maximize the interesting dynamics. Since the interesting dynamics we are searching for are typically limited, this can allow for a reduction in the dimensionality of the matrix. To begin, we find the eigenvalues, λ, and eigenvectors, v, of Σ:

Σ = Σ_{i=1}^{n} λ_i v_i v_i^T = V Λ V^T
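This eigendecomposition can be sketched numerically with NumPy's symmetric eigensolver (`eigh` returns eigenvalues in ascending order):

```python
import numpy as np

Sigma = np.array([[ 2/9,  2/9, -1/9],
                  [ 2/9,  2/9, -1/9],
                  [-1/9, -1/9,  2/9]])   # the example covariance matrix

# eigh is the right solver for a symmetric matrix such as Σ.
lam, V = np.linalg.eigh(Sigma)

# The eigensystem reconstructs Σ = V Λ V^T.
assert np.allclose(V @ np.diag(lam) @ V.T, Sigma)

# Two non-zero eigenvalues, (3 ± √3)/9, and one numerically zero eigenvalue.
print(lam)
```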
SLIDE 25
Our example's eigensystem

And with our example Σ, the eigenvalues are

λ_1 = (1/9)(3 + √3),   λ_2 = (1/9)(3 − √3),   λ_3 = 0,

with corresponding (unnormalized) eigenvectors

v_1 = ( (1/2)(−1 − √3),  (1/2)(−1 − √3),  1 )^T
v_2 = ( (1/2)(−1 + √3),  (1/2)(−1 + √3),  1 )^T
v_3 = ( −1,  1,  0 )^T

so that Σ = V Λ V^T once the eigenvectors are normalized to unit length.
SLIDE 26
Decomposition allows for a reduction in rank
Σ is full rank if and only if all of its eigenvalues are positive. If some of the eigenvalues are equal to zero, we are able to reduce the rank of the matrix to r, the number of non-zero eigenvalues:

x = Σ_{i=1}^{r} a_i v_i = S a

Typically this reduction makes processing more efficient and quicker, without any loss of potentially important data. The vectors are collected into the familiar form:

X = SA

where X is our n × t data matrix, S is our n × r matrix of relevant (non-zero-eigenvalue) eigenvectors, and A is our r × t coefficient matrix, which gives our x vectors' locations in the reduced r-dimensional space.
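The rank reduction can be sketched on the example: Σ has one zero eigenvalue, so r = 2, and the data is recovered exactly from only two components:

```python
import numpy as np

X_zm = np.array([[ 1/3,  1/3, -2/3],
                 [ 1/3,  1/3, -2/3],
                 [ 1/3, -2/3,  1/3]])   # zero-mean example data
Sigma = (X_zm @ X_zm.T) / 3

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-12            # relevant, i.e. non-zero, eigenvalues
S = V[:, keep]                # n x r matrix of retained eigenvectors
A = S.T @ X_zm                # r x t coefficients in the reduced space

# X = S A holds exactly with just r = 2 components.
assert np.allclose(S @ A, X_zm)
print(keep.sum())             # → 2
```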
SLIDE 27
Whitening our Example
The whitening process as applied to our example:

X* = [ 3/√(3 + √3)       0       ] [ (1/2)(−1 − √3)  (1/2)(−1 − √3)  1 ] [  1/9   1/9  −2/9 ]
     [      0       3/√(3 − √3)  ] [ (1/2)(−1 + √3)  (1/2)(−1 + √3)  1 ] [  1/9   1/9  −2/9 ]
                                                                         [ −2/9  −2/9   1/9 ]

   = [ −(1/3)(3 + √3)    −(1/3)(3 + √3)    1/2 + (5/6)√3   ]
     [ −(1/3)(9 + 5√3)   −(1/3)(9 + 5√3)   11/6 + (19/6)√3 ]
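As a sketch, the standard whitening transform X* = Λ_r^{−1/2} V_r^T X can be checked numerically on the example: after whitening, the retained components have identity covariance. (This sketch uses orthonormal eigenvectors from `eigh`, so its exact entries differ from the slide's by the eigenvector scaling.)

```python
import numpy as np

X_zm = np.array([[ 1/3,  1/3, -2/3],
                 [ 1/3,  1/3, -2/3],
                 [ 1/3, -2/3,  1/3]])   # zero-mean example data
t = X_zm.shape[1]
Sigma = (X_zm @ X_zm.T) / t

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-12                       # drop the zero eigenvalue (r = 2)

# Whitening: scale each retained component by 1/sqrt(λ_i).
W = np.diag(1.0 / np.sqrt(lam[keep])) @ V[:, keep].T
X_white = W @ X_zm

# The whitened data has identity covariance: (1/t) X* X*^T = I.
assert np.allclose((X_white @ X_white.T) / t, np.eye(2))
```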
SLIDE 28
A breakdown of the ICA process
At this point, the data is ready to be run through Independent Component Analysis. ICA constructs a neural network which learns from the given parameters to separate the data into the signals we want and noise; ergo, the exact algorithm used will vary based upon the desired results. Although there are many, many different ICA algorithms, two main ones are used in the analysis of EEG data: Infomax and FastICA. These ICA algorithms take the channel data, x*, the source data, s, and the coefficient matrix, A*, of the following equation:

x* = A* s

and remove the unwanted noise data, b.
SLIDE 29
The Infomax Algorithm
The Infomax algorithm was first utilized to translate EEG data in 1995 by Bell and Sejnowski. It acts as an unsupervised learning algorithm based on maximizing the entropy (an indication of randomness in a variable) of the output produced by the linear transformations. Maximizing the entropy therefore minimizes the mutual correlations between channels. Infomax relies on finding the probability density function, i.e. the likelihood of our source vectors, s, falling within a particular range. This constructs a mixing matrix from a learning algorithm; with every iteration the matrix becomes more precise. The mixing matrix is defined as follows:
SLIDE 30
The Infomax Algorithm
ΔW ∝ (W^T)^{−1} + (1 − 2y) x^T,   Δw_0 ∝ 1 − 2y

where W is the mixing matrix, y is a logistic transform of the estimated sources, and x is our pre-processed data vector. At this point the data needs to be processed via computer, as the process is iterative and would take years to calculate by hand, if it could be done at all. The most commonly utilized software is MATLAB, although others have coded it in Python.
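A minimal sketch of the iterative update, assuming the logistic nonlinearity from Bell and Sejnowski's formulation; the random data, learning rate, and iteration count here are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

n, t = 2, 500
X = rng.standard_normal((n, t))   # stand-in for pre-processed (whitened) channel data

W = np.eye(n)                     # unmixing matrix, refined on every iteration
lr = 0.01                         # illustrative learning rate

for _ in range(100):
    Y = 1.0 / (1.0 + np.exp(-(W @ X)))              # logistic nonlinearity y = g(Wx)
    # Bell-Sejnowski update: dW ∝ (W^T)^{-1} + (1 - 2y) x^T, averaged over samples.
    dW = np.linalg.inv(W.T) + ((1.0 - 2.0 * Y) @ X.T) / t
    W += lr * dW
```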
SLIDE 31
FastICA
The methodologies behind FastICA are very similar to Infomax: create an artificial learning environment which assigns a weight factor to each piece of source data to minimize correlation. It involves finding a function and its derivatives to maximize non-Gaussianity.
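FastICA's fixed-point update can be sketched in NumPy (a minimal one-unit version with g = tanh; the two synthetic non-Gaussian sources and the mixing matrix below are illustrative, not EEG recordings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic independent, non-Gaussian sources, mixed linearly: x = A s.
t = 5000
s = np.vstack([np.sign(rng.standard_normal(t)), rng.laplace(size=t)])
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
x = A @ s
x -= x.mean(axis=1, keepdims=True)

# FastICA assumes whitened (identity-covariance) input.
lam, V = np.linalg.eigh(np.cov(x))
x_w = np.diag(lam ** -0.5) @ V.T @ x

# One-unit fixed-point iteration with g(u) = tanh(u):
#   w <- E[x g(w^T x)] - E[g'(w^T x)] w, then renormalize.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(50):
    u = w @ x_w
    g, g_prime = np.tanh(u), 1.0 - np.tanh(u) ** 2
    w = (x_w * g).mean(axis=1) - g_prime.mean() * w
    w /= np.linalg.norm(w)

# w^T x_w estimates one of the sources (up to sign and scale).
est = w @ x_w
```

In practice one would call a library implementation, such as scikit-learn's FastICA, rather than hand-rolling this loop.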
SLIDE 32
Example
Unfortunately, my MATLAB and Python skills are insufficient to produce my own example. Instead, here is some EEG data which visualizes the difference an ICA algorithm makes.
SLIDE 33
Visual representation of ICA process
SLIDE 34