Radina Nikolic (BCIT), flowCAP 2010, NIH, Sep. 21-22, 2010
Self-Organizing Map (SOM)
Kohonen (1981) network
Idea: von der Malsburg
Unsupervised learning
Artificial neural network
Two layers
High-dimensional to 2D
Topology preserving
Flow Cytometry (FCM) Data
Multidimensional
Large data sets
High-throughput
Growing amount of data
Standard data format
Data analysis
Cell population identification
Why SOM for FCM?
Efficient for large high-dimensional data sets
No assumptions on underlying data distributions
Neurobiological background
Simple but extensible mathematical model
Widely used in various domains
Few attempts to use in flow cytometry
Neurobiological Background
The most realistic computational model of brain functions
Paradigm to explain functional structures of the brain
Self-organization
Adaptive features
Multidimensional sensory inputs in human cortex are represented as 2D maps, topology conserving
To what extent can it be regarded as a biophysical model?
Mathematical Model
Initialization
- Randomly generate synaptic weight vector values (wi)
- Choose an initial learning rate e and neighbourhood function (dx)
Sampling
- Randomly select cell from input data set (x) layer
Competition
- Determine winning neuron k (best matching unit) in output layer
  $\|w_k - x\| = \min_i \|w_i - x\|$
Cooperation
- Find the neighbourhood neurons
Synaptic Adaptation
- Update the weight vectors of the winning neuron and neighbours
  $w_i = w_i + e \cdot h \cdot dx(x_j, w_i)$
Convergence
- Check Criteria
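
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method used in the talk: the difference term dx(xj, wi) is read as the standard SOM form (x - w_i), the neighbourhood function h is taken to be Gaussian with a shrinking radius, the learning rate e decays linearly, and a fixed iteration count stands in for the convergence criterion. The names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def train_som(data, rows, cols, n_iter=1000, e0=0.05, e1=0.01, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: random synaptic weight vectors w_i.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        e = e0 + (e1 - e0) * frac                   # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)   # assumed shrinking radius
        # Sampling: pick one input vector x at random.
        x = rng.choice(data)
        # Competition: winning neuron k minimizes ||w_i - x||.
        k = min(weights, key=lambda i: math.dist(weights[i], x))
        # Cooperation and adaptation: pull k and its grid neighbours towards x.
        for i, w in weights.items():
            grid_dist = math.dist(i, k)
            if grid_dist <= radius:
                h = math.exp(-grid_dist ** 2 / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += e * h * (x[d] - w[d])
    # Convergence: a fixed number of iterations is used as the stopping rule.
    return weights


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda i: math.dist(weights[i], x))


# Toy data with two well-separated groups, for illustration only.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(data, rows=2, cols=2, n_iter=500)
print(best_matching_unit(som, [0.05, 0.0]))
```

Because each update is a convex combination of the old weight and the sampled input, weights initialized in [0, 1] stay in [0, 1] when the data are scaled to that range.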
SOM Applications
Importance of applying SOM in a proper way emphasized by Kohonen:
- The SOM is a clustering, visualization and abstraction method
- For classification, pattern recognition and decision support, Learning Vector Quantization (LVQ) should be used
- For automatic feature extraction and invariant detection, use Adaptive-Subspace SOM (ASSOM)
Proposed Approach ‐ flowKoh
FCM data loading
- flowCore method read.FCS
- If no Bioconductor software: read.csv
Data pre-processing
Clustering
- R kohonen package for SOM
Labels generated and saved
Results visualization (optional)
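
The CSV fallback path of this pipeline can be illustrated with a small Python sketch. It is not the flowKoh implementation, which is written in R. The helper names (`load_events`, `min_max_scale`, `labels_to_csv`) are hypothetical, min-max scaling is only an assumed example of the unspecified pre-processing step, and the clustering step is replaced by a placeholder threshold.

```python
import csv
import io


def load_events(csv_text):
    """Parse CSV text with a header row into (channel names, numeric rows)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [[float(v) for v in row] for row in reader if row]
    return header, rows


def min_max_scale(rows):
    """Rescale every channel to [0, 1]; a constant channel maps to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def labels_to_csv(labels):
    """Serialize one cluster label per event, in event order."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["label"])
    writer.writerows([[lab] for lab in labels])
    return out.getvalue()


# Made-up events on two channels, for illustration only.
header, events = load_events("FL1.H,FL3.H\n10,200\n30,100\n20,150\n")
scaled = min_max_scale(events)
# Placeholder for the clustering step: threshold on the first scaled channel.
labels = [1 if row[0] < 0.5 else 2 for row in scaled]
print(labels_to_csv(labels))
```

Keeping the labels in event order lets them be matched back to the original file row by row.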
Kohonen Package Parameters
R kohonen package allows for learning in both
- unsupervised mode: kohonen::som
- supervised mode: kohonen::bdk, kohonen::xyf
Number of iterations: rlen = 100
Learning rate: alpha = c(0.05, 0.01)
Neighbourhood function (radius): default
Map topology: default (rectangular)
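
The two-element alpha is the start and end value of a learning rate that the kohonen package decreases linearly over the rlen iterations. The Python sketch below shows what such a schedule looks like. The function name `linear_alpha` is hypothetical, and the exact interpolation points used inside the package are an assumption here.

```python
def linear_alpha(rlen=100, start=0.05, end=0.01):
    """Learning rate per iteration, declining linearly from start to end."""
    if rlen == 1:
        return [start]
    step = (start - end) / (rlen - 1)
    return [start - step * t for t in range(rlen)]


alphas = linear_alpha()
print(alphas[0], alphas[len(alphas) // 2], alphas[-1])
```

A larger starting value lets the map organize quickly at first, while the smaller final value limits changes in later iterations.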
Clustering Results Obtained
Challenge 3: three flowCAP datasets clustered
Dataset    Samples  Events  Dims  Runtime (s)
GvHD       12       14000   6     21.86
DLBCL      30       10000   5     28.26
StemCell   30       5000    5     26.70
Behind the Map View
SOM presents a simplified view of a highly complex data set
Each node in the map is one cluster
All the data associated with a given node may be made available via that node.
Position on the map may be representative of a wide variety of variables
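
The idea that each node is one cluster, and that its data can be reached through the node, can be sketched as a lookup from node to event indices. This Python example is illustrative only: `assign_to_nodes` is a hypothetical helper, and the two-node codebook is hand-picked rather than trained.

```python
import math
from collections import defaultdict


def assign_to_nodes(codebook, events):
    """Map each event to its closest node; returns {node: [event indices]}."""
    members = defaultdict(list)
    for idx, x in enumerate(events):
        node = min(codebook, key=lambda n: math.dist(codebook[n], x))
        members[node].append(idx)
    return dict(members)


# Hypothetical 1x2 map with hand-picked weight vectors, for illustration only.
codebook = {(0, 0): [0.1, 0.1], (0, 1): [0.9, 0.9]}
events = [[0.0, 0.2], [1.0, 0.8], [0.2, 0.1], [0.7, 0.9]]
clusters = assign_to_nodes(codebook, events)
print(clusters)
```

The node key doubles as the cluster label, so the list stored under a node is exactly the set of events that node represents.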
Visualization and Feature Selection
GvHD datasets
FSC and SSC excluded
Two populations
FL1.H/FL3.H
FL2.H/FL3.H/FL4.H
Sample 11: exception
Scientifically valid?
Interpretation and Assessment
Simple to implement, difficult to analyze and interpret
How accurate are the clustering results?
How meaningful are the clusters identified?
Does visualization help us in feature selection?
Stability (convergence) and map plasticity?
Is there any correlation between patterns observed and biological outcome (diagnosis)?
Resources
Kohonen T., Self-Organizing Maps. Springer, May 2006.
Von der Malsburg C., Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85-100, 1973.
Wehrens R., Self- and Super-organizing Maps in R: The kohonen Package. Journal of Statistical Software, October 2007, Volume 21, Issue 5.
Wilkins M. F., A comparison of some neural and non-neural methods for identification of phytoplankton from flow cytometry data. Computer Applications in the Biosciences, 12(1):9-18, 1996.
flowCAP Initiative
From software development perspective
Collaboration with BCCA (SSL Project)
Critical for both new/existing algorithms
- Standard dataset test cases
- Evaluation criteria: objective assessment measure(s)
Important
- Feature extraction
- Scientific validation: guidelines