SLIDE 1

The Role of Linear Algebra in EEG Analysis

Robert Hergenroder II
December 20, 2016

SLIDE 2

Abstract

In this paper we explore the role of linear algebra in processing electroencephalography signals via two separate, but complementary, processes: Independent Component Analysis and Principal Component Analysis.

SLIDE 3

An Introduction to EEG technologies

◮ Electroencephalography, or EEG, measures electrical signals on the scalp.
◮ It is designed to record neuron activity.
◮ In comparison to other brain measurement devices, such as Magnetic Resonance Imaging, EEG technology allows for more precise measurements with respect to time (in the millisecond range).
◮ Linear Algebra allows for configuring the data in an optimal manner.

SLIDE 4

Figure 1

Figure: The electrode placement and resulting measurements of an EEG.

SLIDE 5

There are various algorithms for processing the raw data to look at differing aspects of modeling cognition as it relates to EEG. In particular, we are going to look at Independent Component Analysis and Principal Component Analysis, and how they work together to produce data that is easier to read.

SLIDE 6

Independent Component Analysis, or ICA

◮ ICA comprises a family of techniques for processing multivariate data.
◮ ICA is used to remove "noise" (skin galvanization, eye movement, etc.) from raw data.
◮ ICA is able to identify independent sources within various signals.
◮ ICA includes techniques for finding:
  ◮ A, a coefficient matrix
  ◮ b, a noise vector
  ◮ s, a source vector
◮ which together satisfy the equation x = As + b.
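As a sketch of this model (not from the original slides; the dimensions, random seed, and noise scale below are illustrative assumptions), the mixing equation can be simulated directly in NumPy:

```python
# Illustrative sketch of the ICA mixing model x = A s + b:
# a source vector s is mixed by a coefficient matrix A and
# corrupted by a noise vector b to give the observed channels x.
import numpy as np

rng = np.random.default_rng(0)

n = 3                              # number of channels / sources (assumed)
s = rng.standard_normal(n)         # source vector s
A = rng.standard_normal((n, n))    # coefficient (mixing) matrix A
b = 0.1 * rng.standard_normal(n)   # small noise vector b

x = A @ s + b                      # observed measurement
print(x)
```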

SLIDE 7

Figure: The importance of ICA

SLIDE 8

Constructing a Linear System

◮ ICA allows us to look at the data in a linear way.
◮ Each measurement is a vector in an n-dimensional space, where n is the number of channels the EEG is gathering data from.
◮ There are t measurements, each a vector in Rⁿ associated with a particular measurement time.

This is illustrated for n = 3 in Figure 2, where each data point translates to a vector x.

SLIDE 9

Figure 2

Figure: Each point represents an x vector in R³.

SLIDE 10

Our Data Matrix

Each data point translates to an x vector:

\[
X = \begin{pmatrix} x_1 & x_2 & \cdots & x_t \end{pmatrix}, \qquad x_j \in \mathbb{R}^n
\]

X is an n × t matrix where the rows are a given channel's electrical output across time, and the columns are a single time "snapshot" of what every channel's electrical charge is.

\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1t} \\
x_{21} & x_{22} & \cdots & x_{2t} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nt}
\end{pmatrix}
\]

SLIDE 11

An example matrix

We shall define an example matrix as follows:

\[
X = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} =
\begin{pmatrix}
1 & 1 & 0 \\
2 & 2 & 1 \\
1 & 0 & 1
\end{pmatrix}
\]

Ergo channel one measured electrical signals of 1, 1, and 0 at t = 1, t = 2, and t = 3, respectively.

SLIDE 12

Decomposing our Data Matrix to a set of basis vectors

These x vectors can be further decomposed into a set of basis source vectors, s, which are linearly independent. Every x vector decomposes over the same set:

\[
x_1, \dots, x_n \;\Rightarrow\; S = \begin{pmatrix} s_1 & s_2 & \cdots & s_n \end{pmatrix}, \qquad s_i \in \mathbb{R}^n
\]
SLIDE 13

The simplest source matrix

These source vectors can be represented most simply as the n × n identity matrix:

\[
S = \begin{pmatrix} s_1 & s_2 & \cdots & s_n \end{pmatrix} =
\begin{pmatrix}
1 & & \\
& \ddots & \\
& & 1
\end{pmatrix} = I
\]

although through ICA algorithms we are able to produce a more functional, and efficient, matrix of source data.

SLIDE 14

Defining a Coefficient Matrix

The decomposition of input data onto a basis allows us to assign a coefficient, a, to each source vector s, corresponding to the values of the data point vectors, x:

\[
x = \sum_{i=1}^{n} a_{it}\, s_i = S\,a
\]

By collecting the vectors we are able to produce the equation form:

X = SA

X is our collection of data point vectors in an n × t matrix, S is a collection of decomposed basis vectors in an n × n matrix, and A is the collection of coefficient vectors in an n × t matrix.

SLIDE 15

An Example Coefficient Matrix

In our example, we use our previously defined X matrix and define our source matrix as the identity matrix for ease of calculation.

\[
x_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = S a_1 =
\begin{pmatrix} 1 & & \\ & 1 & \\ & & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
\qquad
x_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} = S a_2 =
\begin{pmatrix} 1 & & \\ & 1 & \\ & & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}
\qquad
x_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = S a_3 =
\begin{pmatrix} 1 & & \\ & 1 & \\ & & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}
\]

SLIDE 16

Example Matrices

And with our example matrix:

\[
X = SA = \begin{pmatrix} 1 & & \\ & 1 & \\ & & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 0 \\ 2 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\]

At this point the data can undergo Principal Component Analysis to reduce the matrices' ranks and make the processing more efficient.
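A quick numerical check of this factorization (with the identity source matrix, the coefficient matrix is just the data itself):

```python
# Verify X = S A for the trivial choice S = I, A = X.
import numpy as np

X = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])
S = np.eye(3)       # identity source matrix
A = X.copy()        # coefficients equal the data when S = I
assert np.allclose(X, S @ A)
```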

SLIDE 17

Principal Component Analysis

Principal Component Analysis, or PCA, is used to determine a more accurate matrix by distinguishing a more efficient matrix to express the source vectors and by identifying noise vectors. This allows for a reduction in dimensionality, making the data more concise.

Our original data, the x vectors, are thought to be composed of two main portions: a source vector, s, and a noise vector, b:

x = s + b

This is done by examining correlations between the components of our x vectors to distinguish the sources with the largest variance, which denote unique signals. A covariance matrix is mapped which relates the correlations between our x channels.

SLIDE 18

Further delineations of PCA

Noise and sources are arbitrarily defined. While ICA allows for each component to be distinguished, PCA maps channel correlations and is often utilized as a pre-processing step before ICA algorithms are run. The first step in PCA of EEG data is to find a "covariance matrix."

SLIDE 19

Finding a Co-variance Matrix

Covariance is defined as the expected change between two data points across time. This allows researchers to see the correlations between the data points. Correlation between data points is presumed to represent noise and redundancy.

To begin finding a covariance matrix it is assumed that the data is "zero mean," i.e. that the mean of each channel has been removed. Such data fulfills the following equation:

\[
\frac{1}{t}\sum_{j=1}^{t} [X]_{ij} = 0, \qquad i = 1, \dots, n
\]

Zeroing the mean has a number of benefits in data analysis: coefficients represent each data point more accurately as correlation is minimized, and it allows for a superposition of data points which more acutely reflects their variance.
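As a small sketch of this step, each channel's mean can be subtracted in one line of NumPy (the matrix is the slides' running example):

```python
# Zero-mean the data: subtract each channel's (row's) mean across time,
# so every row of the centered matrix sums to zero.
import numpy as np

X = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 1.0],
              [1.0, 0.0, 1.0]])

Xc = X - X.mean(axis=1, keepdims=True)
print(Xc)               # matches the zero-mean matrix on the next slides
print(Xc.sum(axis=1))   # ≈ [0, 0, 0]
```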

SLIDE 20

Zeroing the mean of our example

Our example matrix therefore becomes:

\[
\frac{1}{3}\sum_{t=1}^{3}[X]_{1t} = \frac{1}{3}(1+1+0) = \frac{2}{3}
\;\Rightarrow\;
x_1^T = \left(1-\tfrac{2}{3},\; 1-\tfrac{2}{3},\; 0-\tfrac{2}{3}\right) = \left(\tfrac{1}{3},\; \tfrac{1}{3},\; -\tfrac{2}{3}\right)
\]

\[
\frac{1}{3}\sum_{t=1}^{3}[X]_{2t} = \frac{1}{3}(2+2+1) = \frac{5}{3}
\;\Rightarrow\;
x_2^T = \left(2-\tfrac{5}{3},\; 2-\tfrac{5}{3},\; 1-\tfrac{5}{3}\right) = \left(\tfrac{1}{3},\; \tfrac{1}{3},\; -\tfrac{2}{3}\right)
\]

\[
\frac{1}{3}\sum_{t=1}^{3}[X]_{3t} = \frac{1}{3}(1+0+1) = \frac{2}{3}
\;\Rightarrow\;
x_3^T = \left(1-\tfrac{2}{3},\; 0-\tfrac{2}{3},\; 1-\tfrac{2}{3}\right) = \left(\tfrac{1}{3},\; -\tfrac{2}{3},\; \tfrac{1}{3}\right)
\]

SLIDE 21

Our new zero-mean matrix

\[
X = \begin{pmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{1}{3}
\end{pmatrix}
\]

SLIDE 22

Defining a Covariance Matrix

The covariance matrix itself is defined by the following equation:

\[
\Sigma = \frac{1}{t}\sum_{j=1}^{t} x_j x_j^T = \frac{1}{t} X X^T
\]

When Σ_ij = 0, the channels x_i and x_j are entirely uncorrelated.
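Under this convention, the covariance matrix of the centered example is one matrix product (a sketch, reusing the zero-mean matrix from above):

```python
# Covariance matrix Σ = (1/t) X Xᵀ of the zero-mean data.
import numpy as np

Xc = np.array([[ 1/3,  1/3, -2/3],
               [ 1/3,  1/3, -2/3],
               [ 1/3, -2/3,  1/3]])

t = Xc.shape[1]
Sigma = (Xc @ Xc.T) / t
print(Sigma)   # entries 2/9 and -1/9, as computed on the next slide
```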

SLIDE 23

Our example covariance matrix

Our example covariance matrix is computed as follows:

\[
\Sigma = \frac{1}{3}
\begin{pmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{1}{3}
\end{pmatrix}
\begin{pmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
-\tfrac{2}{3} & -\tfrac{2}{3} & \tfrac{1}{3}
\end{pmatrix}
= \frac{1}{3}
\begin{pmatrix}
\tfrac{2}{3} & \tfrac{2}{3} & -\tfrac{1}{3} \\
\tfrac{2}{3} & \tfrac{2}{3} & -\tfrac{1}{3} \\
-\tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{2}{3}
\end{pmatrix}
= \begin{pmatrix}
\tfrac{2}{9} & \tfrac{2}{9} & -\tfrac{1}{9} \\
\tfrac{2}{9} & \tfrac{2}{9} & -\tfrac{1}{9} \\
-\tfrac{1}{9} & -\tfrac{1}{9} & \tfrac{2}{9}
\end{pmatrix}
\]

Ergo in our example all the channels are correlated in some manner.

SLIDE 24

Decomposing the matrix to its eigensystem

By diagonalizing the covariance matrix we are able to minimize redundancy and maximize the interesting dynamics. Since the interesting dynamics we are searching for are typically limited in number, this can allow for a reduction in the dimensionality of the matrix. To begin, we find the eigenvalues, λ, and eigenvectors, v, of Σ:

\[
\Sigma = \sum_{i=1}^{n} \lambda_i v_i v_i^T = V \Lambda V^T
\]
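As a sketch, NumPy's symmetric eigensolver produces this decomposition (with normalized eigenvectors) for the example Σ:

```python
# Eigendecomposition Σ = V Λ Vᵀ of the example covariance matrix.
import numpy as np

Sigma = np.array([[ 2/9,  2/9, -1/9],
                  [ 2/9,  2/9, -1/9],
                  [-1/9, -1/9,  2/9]])

lam, V = np.linalg.eigh(Sigma)   # eigenvalues (ascending) and eigenvectors
print(lam)                       # one eigenvalue is numerically zero
assert np.allclose(Sigma, V @ np.diag(lam) @ V.T)
```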

SLIDE 25

Our examples eigensystem

And with our example Σ (eigenvectors written unnormalized):

\[
\Sigma =
\begin{pmatrix}
\tfrac{1}{2}(-1-\sqrt{3}) & \tfrac{1}{2}(-1+\sqrt{3}) & -1 \\
\tfrac{1}{2}(-1-\sqrt{3}) & \tfrac{1}{2}(-1+\sqrt{3}) & 1 \\
1 & 1 & 0
\end{pmatrix}
\begin{pmatrix}
\tfrac{1}{9}(3+\sqrt{3}) & & \\
& \tfrac{1}{9}(3-\sqrt{3}) & \\
& & 0
\end{pmatrix}
\begin{pmatrix}
\tfrac{1}{2}(-1-\sqrt{3}) & \tfrac{1}{2}(-1-\sqrt{3}) & 1 \\
\tfrac{1}{2}(-1+\sqrt{3}) & \tfrac{1}{2}(-1+\sqrt{3}) & 1 \\
-1 & 1 & 0
\end{pmatrix}
\]

SLIDE 26

Decomposition allows for a reduction in rank

Σ is full rank if and only if all of the eigenvalues are positive. If some of the eigenvalues are equal to zero, we are able to reduce the rank of the matrix to r, the number of non-zero eigenvalues:

\[
x = \sum_{i=1}^{r} a_i v_i = S\,a
\]

Typically this reduction makes processing more efficient and quicker, without any loss of potentially important data. The vectors are collected into the familiar form:

X = SA

where X is our n × t data matrix, S is our n × r matrix of relevant (non-zero-eigenvalue) eigenvectors, and A is our r × t coefficient matrix, which gives each x vector's location in the reduced r-dimensional space.
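A sketch of this reduction on the running example, where one eigenvalue is zero and the data is reproduced exactly from r = 2 components:

```python
# Rank reduction: keep eigenvectors with non-zero eigenvalues and
# express the centered data X in that r-dimensional basis, X = S A.
import numpy as np

Xc = np.array([[ 1/3,  1/3, -2/3],
               [ 1/3,  1/3, -2/3],
               [ 1/3, -2/3,  1/3]])
Sigma = (Xc @ Xc.T) / Xc.shape[1]

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-12              # non-zero eigenvalues only
S = V[:, keep]                  # n x r matrix of retained eigenvectors
A = S.T @ Xc                    # r x t coefficient matrix
assert np.allclose(Xc, S @ A)   # no loss of data
```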

SLIDE 27

Whitening our Example

The whitening process rescales the data along each retained eigenvector, dividing by the square root of the corresponding eigenvalue so that each component ends up with unit variance; the scale factors 3/(3 ± √3) below fold this division together with the normalization of the unnormalized eigenvectors above. As defined by our example:

\[
X^* =
\begin{pmatrix}
\frac{3}{3+\sqrt{3}} & \\
& \frac{3}{3-\sqrt{3}}
\end{pmatrix}
\begin{pmatrix}
\tfrac{1}{2}(-1-\sqrt{3}) & \tfrac{1}{2}(-1-\sqrt{3}) & 1 \\
\tfrac{1}{2}(-1+\sqrt{3}) & \tfrac{1}{2}(-1+\sqrt{3}) & 1
\end{pmatrix}
\begin{pmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & \tfrac{1}{3} & -\tfrac{2}{3} \\
\tfrac{1}{3} & -\tfrac{2}{3} & \tfrac{1}{3}
\end{pmatrix}
=
\begin{pmatrix}
\tfrac{1}{2}(1-\sqrt{3}) & -1 & \tfrac{1}{2}(1+\sqrt{3}) \\
\tfrac{1}{2}(1+\sqrt{3}) & -1 & \tfrac{1}{2}(1-\sqrt{3})
\end{pmatrix}
\]

The two rows of X* are uncorrelated and each has unit variance across the three time points.
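The same computation in NumPy, written in the conventional form X* = Λ_r^(-1/2) Sᵀ X (a sketch; sign and row order of the components may differ from the hand computation, since eigenvectors are only determined up to sign):

```python
# PCA whitening: project onto the retained eigenvectors and divide each
# component by the square root of its eigenvalue (unit variance).
import numpy as np

Xc = np.array([[ 1/3,  1/3, -2/3],
               [ 1/3,  1/3, -2/3],
               [ 1/3, -2/3,  1/3]])
Sigma = (Xc @ Xc.T) / Xc.shape[1]

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-12
S, lam_r = V[:, keep], lam[keep]

X_white = np.diag(1 / np.sqrt(lam_r)) @ S.T @ Xc
print((X_white @ X_white.T) / Xc.shape[1])   # ≈ identity matrix
```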

SLIDE 28

A breakdown of the ICA process

At this point, the data is ready to be run through Independent Component Analysis. ICA constructs a neural network which learns from the given parameters to separate the data into the signals we want and noise. Ergo, the exact algorithm used varies based upon the desired results. Although there are many different ICA algorithms, two main ones are used in the analysis of EEG data: Infomax and FastICA. These ICA algorithms take the channel data, x*, the source data, s, and the coefficient matrix, A*, of the equation

x* = A* s

and remove the unwanted noise data, b.

SLIDE 29

The Infomax Algorithm

The Infomax algorithm was first utilized to translate EEG data in 1995 by Bell and Sejnowski. It acts as an unsupervised learning algorithm based on maximizing the entropy (an indication of randomness in a variable) of the output produced by the linear transformations. Maximizing the entropy therefore minimizes the mutual correlations between channels. Infomax relies on finding the probability density function, i.e. the likelihood of our source vectors, s, falling within a particular range. This constructs a mixing matrix from a learning algorithm; with every iteration the matrix becomes more precise. The mixing matrix is defined as follows:

SLIDE 30

The Infomax Algorithm

\[
\Delta W \propto (W^T)^{-1} + (1 - 2y)\,x^T, \qquad \Delta w \propto 1 - 2y
\]

where W is the mixing matrix, y = g(Wx + w) is the sigmoid-transformed output, and x is our pre-processed data vector; this is the standard Bell and Sejnowski form of the learning rule for a logistic nonlinearity g. At this point the data needs to be processed via computer, as the process is iterative and would take years to calculate by hand, if it could be done at all. The most commonly utilized software is MATLAB, although others have coded it in Python.
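A compact toy sketch of this update rule on synthetic data (the learning rate, iteration count, and Laplace-distributed sources are illustrative assumptions, not part of the original slides):

```python
# Toy batch Infomax: ΔW ∝ (Wᵀ)⁻¹ + (1 − 2y) xᵀ with a logistic
# nonlinearity, averaged over all samples per iteration.
import numpy as np

rng = np.random.default_rng(0)
n, t = 2, 2000
S_true = rng.laplace(size=(n, t))       # super-Gaussian sources
A_mix = rng.standard_normal((n, n))     # unknown mixing matrix
X = A_mix @ S_true                      # observed channel mixtures

W = np.eye(n)
lr = 0.1
for _ in range(500):
    U = np.clip(W @ X, -50.0, 50.0)     # clip to avoid exp overflow
    Y = 1.0 / (1.0 + np.exp(-U))        # logistic outputs
    W += lr * (np.linalg.inv(W.T) + (1 - 2 * Y) @ X.T / t)

print(W @ A_mix)   # ≈ scaled permutation matrix when unmixing succeeds
```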

SLIDE 31

FastICA

The methodology behind FastICA is very similar to that of Infomax: create an artificial learning environment which assigns a weight factor to each piece of source data to minimize correlation. It involves finding a function and its derivatives to maximize non-Gaussianity.
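In practice FastICA is rarely hand-coded; as a brief usage sketch (assuming scikit-learn is available, with synthetic signals standing in for real EEG channels):

```python
# Separating two mixed synthetic signals with scikit-learn's FastICA.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
time = np.linspace(0, 8, 2000)
S_true = np.c_[np.sin(2 * time),            # source 1: sinusoid
               np.sign(np.sin(3 * time))]   # source 2: square wave
X = S_true @ rng.standard_normal((2, 2)).T  # observed mixtures (t x 2)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)   # estimated sources, up to scale/order
A_est = ica.mixing_            # estimated mixing matrix
print(S_est.shape, A_est.shape)
```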

SLIDE 32

Example

Unfortunately, my MATLAB and Python skills are insufficient to produce my own example. Instead, here is some EEG data which visualizes the difference an ICA algorithm makes.

SLIDE 33

Visual representation of ICA process

SLIDE 34

Conclusion

EEG analysis is very important to emerging technologies. Brain-computer interfaces rely heavily on EEG data interpretation, as they are non-invasive and typically more comfortable for users than MRI technologies. Although the current algorithms are fairly efficient, they still have long processing times and a slight delay. New algorithms are consistently being produced, and each has its own merit depending on the task at hand.