Radina Nikolic, flowCAP 2010, NIH, Sep. 21-22, 2010



slide-1
SLIDE 1

Radina Nikolic, BCIT

flowCAP 2010, NIH, Sep. 21-22, 2010

slide-2
SLIDE 2

Self‐Organizing Map (SOM)

Kohonen (1981) network

Idea: von der Malsburg

Unsupervised learning

Artificial neural network

Two layers

High-dimensional to 2D

Topology preserving

slide-3
SLIDE 3

Flow Cytometry (FCM) Data

Multidimensional

Large data sets

High-throughput

Growing amount of data

Standard data format

Data analysis

Cell population identification

slide-4
SLIDE 4

Why SOM for FCM?

Efficient for large high-dimensional data sets

No assumptions on underlying data distributions

Neurobiological background

Simple but extensible mathematical model

Widely used in various domains

Few attempts to use in flow cytometry

slide-5
SLIDE 5

Neurobiological Background

The most realistic computational model of brain functions

Paradigm to explain functional structures of the brain

Self-organization

Adaptive features

Multidimensional sensory inputs in human cortex are represented as 2D maps, topology conserving

To what extent can it be regarded as a biophysical model?

slide-6
SLIDE 6

Mathematical Model

Initialization

  • Randomly generate synaptic weight vector values (wi)
  • Choose an initial learning rate e and neighbourhood function (dx)

Sampling

  • Randomly select a cell from the input data set (x)

Competition

  • Determine winning neuron k (best matching unit) in output layer

$\lVert w_k - x \rVert = \min_i \lVert w_i - x \rVert$

Cooperation

  • Find the neighbouring neurons

Synaptic Adaptation

  • Update the weight vectors of the winning neuron and neighbours

$w_i = w_i + e \cdot h \cdot d_x(x_j, w_i)$

Convergence

  • Check Criteria
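
The steps on this slide can be sketched in plain Python. This is an illustration only, not the flowKoh code (the deck relies on the R kohonen package). Several choices below are assumptions that the slide does not state: d_x(x_j, w_i) is read as the difference x_j - w_i, h is taken to be a Gaussian neighbourhood weight on a rectangular grid, the learning rate e decays linearly, and the convergence criterion is simply a fixed number of iterations. The names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competition: index k of the node minimising ||w_i - x||."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, rows, cols, n_iter=1000, e0=0.05, e1=0.01, seed=0):
    """Train a small rectangular SOM; returns one weight vector per node."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: random synaptic weight vectors w_i, one per output node.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    # Convergence (assumed): stop after a fixed number of iterations.
    for t in range(n_iter):
        frac = t / n_iter
        e = e0 + (e1 - e0) * frac                   # assumed linear decay of e
        radius = max(radius0 * (1.0 - frac), 0.5)   # assumed shrinking radius
        # Sampling: pick one input vector (one cell) at random.
        x = rng.choice(data)
        k = best_matching_unit(weights, x)
        k_row, k_col = divmod(k, cols)
        for i, w in enumerate(weights):
            row, col = divmod(i, cols)
            grid_dist = math.hypot(row - k_row, col - k_col)
            # Cooperation: Gaussian weight h, largest at the winning node.
            h = math.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))
            # Synaptic adaptation: w_i = w_i + e * h * (x - w_i).
            for d in range(dim):
                w[d] += e * h * (x[d] - w[d])
    return weights
```

Because every update is a step of size e * h (always between 0 and 1) towards the sampled cell, the weight vectors stay inside the range spanned by the initial weights and the data.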
slide-7
SLIDE 7

SOM Applications

Importance of applying SOM in a proper way emphasized by Kohonen:

The SOM is a clustering, visualization and abstraction method

For classification, pattern recognition and decision support, Learning Vector Quantization (LVQ) should be used

For automatic feature extraction and invariant detection, use Adaptive-subspace SOM (ASSOM)

slide-8
SLIDE 8

Proposed Approach ‐ flowKoh

FCM data loading

  • flowCore method read.FCS
  • If no Bioconductor software is available: read.csv

Data pre-processing

Clustering

  • R kohonen package for SOM

Labels generated and saved

Results visualization (optional)
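
A minimal Python sketch of the pre-processing and label-saving steps follows, for illustration only. The slide does not say which pre-processing flowKoh applies, so min-max scaling is an assumption, as are the helper names `min_max_scale`, `assign_labels` and `write_labels` and the tiny example values. The codebook stands in for the weight vectors of an already trained SOM.

```python
import csv
import io
import math


def min_max_scale(events):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*events))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in events]


def assign_labels(events, codebook):
    """Label each event with the 1-based index of its nearest codebook vector."""
    return [
        1 + min(range(len(codebook)), key=lambda i: math.dist(codebook[i], e))
        for e in events
    ]


def write_labels(labels, handle):
    """Save one label per line, in event order."""
    writer = csv.writer(handle)
    for label in labels:
        writer.writerow([label])


# Hypothetical two-parameter events and a two-node codebook.
events = [[10.0, 200.0], [12.0, 210.0], [90.0, 20.0], [95.0, 25.0]]
scaled = min_max_scale(events)
codebook = [[0.0, 1.0], [1.0, 0.0]]
labels = assign_labels(scaled, codebook)

buffer = io.StringIO()  # stands in for a label file on disk
write_labels(labels, buffer)
```

Writing to an in-memory buffer keeps the sketch self-contained; a real run would pass an open file handle instead.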

slide-9
SLIDE 9

Kohonen Package Parameters

R kohonen package allows for learning in both

  • unsupervised mode: kohonen::som
  • supervised mode: kohonen::bdk, kohonen::xyf

Number of iterations: rlen = 100

Learning rate: alpha = c(0.05, 0.01)

Neighbourhood function (radius): default

Map topology: default (rectangular)
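
The two-element learning rate can be read as a start and an end value. The sketch below assumes a linear decline from 0.05 to 0.01 over the rlen = 100 iterations, which is how the kohonen documentation describes alpha. The exact granularity of the package's schedule is not shown on the slide, and `linear_alpha` is a hypothetical helper, not a package function.

```python
def linear_alpha(step, rlen=100, start=0.05, end=0.01):
    """Learning rate at a 0-based step, declining linearly from start to end."""
    if rlen <= 1:
        return start
    frac = step / (rlen - 1)
    return start + (end - start) * frac


# One learning-rate value per iteration for the parameters on this slide.
schedule = [linear_alpha(t) for t in range(100)]
```

Early iterations therefore move weight vectors about five times further than the last ones, which lets the map organize coarsely first and fine-tune later.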

slide-10
SLIDE 10

Clustering Results Obtained

Challenge 3

Three flowCAP datasets clustered

Dataset    Samples  Events  Dims  Runtime (s)
GvHD       12       14000   6     21.86
DLBCL      30       10000   5     28.26
StemCell   30       5000    5     26.70

slide-11
SLIDE 11

Behind the Map View

SOM presents a simplified view of a highly complex data set

Each node in the map is one cluster

All the data associated with a given node may be made available via that node.

Position on the map may be representative of a wide variety of variables
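
The idea that all data associated with a node can be reached through that node can be illustrated with a small lookup table. This is a sketch under assumptions: `events_by_node` is a hypothetical helper and the label values are made up, standing in for the node assigned to each event after clustering.

```python
from collections import defaultdict


def events_by_node(labels):
    """Map each node label to the indices of the events assigned to it."""
    members = defaultdict(list)
    for event_index, node in enumerate(labels):
        members[node].append(event_index)
    return dict(members)


labels = [3, 1, 3, 2, 1]  # hypothetical node assignments for five events
node_members = events_by_node(labels)
```

Looking up `node_members[3]` then returns the indices of every event that was mapped to node 3.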
slide-12
SLIDE 12

Visualization and Feature Selection

GvHD Datasets

FSC and SSC excluded

Two populations

  • FL1.H/FL3.H
  • FL2.H/FL3.H/FL4.H

Sample 11: exception

Scientifically valid?

slide-13
SLIDE 13

Interpretation and Assessment

Simple to implement, difficult to analyze and interpret

How accurate are the clustering results?

How meaningful are the clusters identified?

Does visualization help us in feature selection?

Stability (convergence) and map plasticity?

Is there any correlation between patterns observed and biological outcome (diagnosis)?

slide-14
SLIDE 14

Resources

Kohonen T., Self-Organizing Maps. Springer, May 2006.

Von der Malsburg C., Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85-100, 1973.

Wehrens R., Self- and Super-organizing Maps in R: The kohonen Package. Journal of Statistical Software, October 2007, Volume 21, Issue 5.

Wilkins M. F., A comparison of some neural and non-neural methods for identification of phytoplankton from flow cytometry data. Computer Applications in the Biosciences, 12(1):9-18, 1996.

slide-15
SLIDE 15

flowCAP Initiative

From software development perspective

Collaboration with BCCA (SSL Project)

Critical for both new/existing algorithms

  • Standard dataset test cases
  • Evaluation criteria: objective assessment measure(s)

Important

  • Feature extraction
  • Scientific validation: guidelines
  • Set of criteria: how to be flowCAP compliant

slide-16
SLIDE 16

Special Thanks

BCCA Terry Fox Lab

Oxford Bioinformatics Programme

BCIT

NIH

Summit Audiences