SamSPECTRAL: Efficient spectral clustering on flow cytometry data

SamSPECTRAL: Efficient spectral clustering on flow cytometry data. Habil Zare, PhD Candidate, Terry Fox Laboratory, British Columbia Cancer Agency, and Department of Computer Science, University of British Columbia, Vancouver, Canada


slide-1
SLIDE 1

SamSPECTRAL: Efficient spectral clustering on flow cytometry data

Habil Zare

PhD Candidate

Terry Fox Laboratory, British Columbia Cancer Agency, and Department of Computer Science, University of British Columbia, Vancouver, Canada

Joint work with: Parisa Shooshtari

Supervisors: Dr. Arvind Gupta, Dr. Ryan Brinkman & Dr. Andrew Weng

FlowCAP summit, September 2010

slide-2
SLIDE 2

High dimensionality

slide-3
SLIDE 3

Biology: Identifying Cell Populations

Computer science: Clustering Data Points

slide-4
SLIDE 4

mathematics

Challenge!

Graph Theory

Spectral Clustering

slide-5
SLIDE 5

Technical,

Spectral Clustering

slide-6
SLIDE 6

?

Computational limitations:

  • 1000 events: Memory: 0.1 GB, Time: 1 minute OK
  • 100,000 events: Memory: 1000 GB, rent for 50 years!
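
The two memory figures above are consistent with quadratic growth. As a rough check, the sketch below assumes that memory is dominated by a dense n x n similarity matrix (an assumption, not something stated on the slide) and scales the 1000-event figure up to 100,000 events. The function names are hypothetical and only for illustration.

```python
def similarity_entries(n_events):
    """Number of entries in a dense n x n similarity matrix."""
    return n_events * n_events


def scaled_memory_gb(base_events, base_gb, n_events):
    """Scale a measured memory figure, assuming memory grows with n^2."""
    growth = similarity_entries(n_events) / similarity_entries(base_events)
    return base_gb * growth


# 100 times more events means 100^2 = 10,000 times more matrix entries.
estimate = scaled_memory_gb(base_events=1_000, base_gb=0.1, n_events=100_000)
print(f"Estimated memory for 100,000 events: about {estimate:.0f} GB")
```

Under this assumption the estimate lands on the order of 1000 GB, which is why reducing the number of points before clustering matters.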
slide-7
SLIDE 7

statistics

Challenge!

Sampling

uniformly

slide-8
SLIDE 8

creativity

Challenge!

“Faithful” Sampling

faithful

slide-9
SLIDE 9

Comparison with uniform sampling:

slide-10
SLIDE 10

Data reduction

slide-11
SLIDE 11

Technical,

“Faithful” Sampling

Faithful (Information Preserving) Sampling Algorithm, assuming the parameter h (neighborhood) is set:

  • a. Label all data points as unregistered.
  • b. Pick a random unregistered point p and find all unregistered data points within distance h from p.
  • c. Put all of these points in a set called community p and label them as registered. p is called the representative of this community.
  • d. Repeat the above two steps until no unregistered points are left.
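
Steps a to d can be sketched in a few lines of Python. This is a minimal illustration, not the SamSPECTRAL implementation: it assumes Euclidean distance and a NumPy array of events, and the function name `faithful_sample` and its return values are hypothetical.

```python
import numpy as np


def faithful_sample(points, h, seed=0):
    """Sketch of faithful sampling (assumes Euclidean distance).

    Returns the indices of the representatives and, for every point,
    the index (into the representatives) of its community.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    unregistered = np.ones(n, dtype=bool)    # step a: nothing is registered yet
    community = np.full(n, -1)               # -1 means "no community assigned"
    representatives = []

    while unregistered.any():                # step d: repeat until none are left
        # Step b: pick a random unregistered point p ...
        p = rng.choice(np.flatnonzero(unregistered))
        # ... and find all unregistered points within distance h of p.
        dist = np.linalg.norm(points - points[p], axis=1)
        members = unregistered & (dist <= h)
        # Step c: these points form the community of p and become registered.
        community[members] = len(representatives)
        representatives.append(p)
        unregistered &= ~members

    return np.array(representatives), community


# Usage on synthetic data: 2000 events in 3 dimensions.
events = np.random.default_rng(1).normal(size=(2000, 3))
reps, comm = faithful_sample(events, h=0.5)
print(len(reps), "representatives for", len(events), "events")
```

Each point ends up within distance h of its representative, and no two representatives are within h of each other, so sparse regions keep their own representatives instead of being lost as they can be under uniform sampling.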
slide-12
SLIDE 12

Rare Populations:

cancer stem cells, detection of fetal cells in maternal blood, leukemia and malaria diagnosis, etc.

  • Consist of only 0.1% to 2% of total events
  • SamSPECTRAL distinguished the rare population correctly in 27/34 (79%) samples.
  • Successful on all populations larger than 0.15%
  • FLAME [11/34 (32%)] and flowMerge [9/34 (26%)]
slide-13
SLIDE 13

Other Applications:

vaccine design, leukemia classification, lymphoma diagnosis, ...

300 * 7 * 5: 1 month

SamSPECTRAL: 1 day

follicular

SLL

slide-14
SLIDE 14

Automatic identification of cell population for lymphoma diagnosis

  • Tube: CD19, CD5 and CD3
  • 5-dimensional clustering by SamSPECTRAL

SLL

follicular

slide-15
SLIDE 15

100 Patients

  • DLBC
  • Follicular
  • MCL
  • SLL
slide-16
SLIDE 16

MCL vs SLL

Three novel phenotypes for differential diagnosis between MCL and SLL

Verified on 110 lymphoma patients

capable of correctly discriminating:

  • all 43/43 (100%) MCL cases
  • 65/67 (98%) SLL cases

previously known flow cytometry signatures:

  • 27/43 (63%) MCL cases
  • 48/67 (72%) SLL cases
slide-17
SLIDE 17

FlowCAP Results:

slide-18
SLIDE 18

Future Work:

  • Improving SamSPECTRAL (spectral clustering is flexible)
  • Using SamSPECTRAL to make biological discoveries

(Analysis of thousands of lymphoma and leukemia patients is now possible, both to build subtype classifiers and to discover novel biomarkers)

  • Facilitating clinical diagnosis based on flow cytometry

Biologist collaborators are most welcome!

slide-19
SLIDE 19

Reference:

Data reduction for spectral clustering to analyze high throughput flow cytometry data

Thanks to Brinkman lab

And: The MITACS Network of Centres of Excellence, Canadian Cancer Society grant #700374, and NIH/NIBIB grant EB008400

slide-20
SLIDE 20

Supplementary slides …

slide-21
SLIDE 21

Information retrieval and number of spectral clusters

slide-22
SLIDE 22

Comparative Results (1):

slide-23
SLIDE 23

Comparative Results (2):

slide-24
SLIDE 24

Resolution: 10^14 cells

slide-25
SLIDE 25

Microscope

17th century

Flow Cytometer

20th century

slide-26
SLIDE 26

Difficulty for humans in understanding high-dimensional data:

slide-27
SLIDE 27

Limitations of manual gating:

  • Time consuming
  • Which dimension to gate first?
  • “Unknown” populations, yet potentially interesting from a biological and clinical point of view

Challenges of computer-based clustering:

  • “small” populations
  • Adjacent populations
  • Non-elliptical shape populations