Transfer Learning for Auto-gating of Flow Cytometry Data Gyemin Lee - - PowerPoint PPT Presentation


Transfer Learning for Auto-gating of Flow Cytometry Data. Gyemin Lee, Lloyd Stoolman, Clayton Scott (University of Michigan). ICML 2011 Workshop on Unsupervised and Transfer Learning, July 2, 2011.


SLIDE 1

Transfer Learning for Auto-gating of Flow Cytometry Data

Gyemin Lee, Lloyd Stoolman, Clayton Scott
University of Michigan
ICML 2011 Workshop on Unsupervised and Transfer Learning
July 2, 2011

Lee, Stoolman, Scott (University of Michigan) TL for Auto-gating of Flow Cytometry Data ICML 2011 workshop (July 2, 2011) 1 / 13

SLIDE 2

Flow Cytometry

A technique for rapidly quantifying physical and chemical properties of large numbers of cells.

  e.g., size, shape, and fluorescent antigen attributes

Applications: diagnosis of blood-related diseases such as acute leukemia, chronic lymphoproliferative disorders, and malignant lymphomas

FS    SS    CD45  CD4   CD8   CD3
790   626   592   177   252   303
496   477   675   485   306   383
684   553   548   180   325   322
681   588   563   221   258   272
632   565   531   134   41
...   ...   ...   ...   ...   ...

  Each column corresponds to a measured feature.
  Each row corresponds to a cell.
  An experiment yields 10,000 to 100,000 cells (rows).

SLIDE 3

Gating

Typical flow cytometry data analysis involves visualizing multiple 2-dimensional scatter plots and manually selecting subsets of cells from them.

⇓

Gating ⇒ assigning a binary label $y_i \in \{-1, 1\}$ to every cell $x_i$
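To make the idea of gating as binary labeling concrete, here is a minimal sketch of a rectangular gate on two features. The function name `rectangular_gate` and the gate bounds are hypothetical, chosen only for illustration; the four cells reuse the FS and SS values from the example table on the Flow Cytometry slide.

```python
import numpy as np

def rectangular_gate(X, lo, hi):
    """Label each row of X as +1 if it lies inside the box [lo, hi], else -1."""
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    return np.where(inside, 1, -1)

# FS and SS values of the first four cells in the example table.
cells = np.array([[790, 626], [496, 477], [684, 553], [681, 588]])

# Hypothetical gate bounds (not from the slides): 600 <= FS <= 800, 500 <= SS <= 650.
labels = rectangular_gate(cells, lo=np.array([600, 500]), hi=np.array([800, 650]))
print(labels)  # [ 1 -1  1  1]
```

A manual gate drawn by an expert is typically a free-form polygon rather than a box; the box is used here only because it keeps the sketch short.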

SLIDE 4

Gating

The distribution of cell populations differs from patient to patient.

SLIDE 5

Automated Gating

Problems of manual gating

  labor-intensive and time-consuming
  highly subjective and not standardized
  modern clinical laboratories see dozens of cases per day

⇒ It is highly desirable to automate gating.

Automated gating

  In flow cytometry data analysis, more than 70% of studies focused on automated gating techniques [1].
  In automated gating, the majority of approaches rely on unsupervised clustering or mixture modeling.

[1] Bashashati & Brinkman, 2009

SLIDE 7

Auto-gating as a Transfer Learning Problem

Given

  $M$ labeled source datasets $D_m = \{(x_{m,i}, y_{m,i})\}_{i=1}^{N_m} \sim P_m$ for $m = 1, \dots, M$
  an unlabeled target dataset $T = \{x_{t,i}\}_{i=1}^{N_t} \sim P_t$

Goal: assign labels $\{\hat{y}_{t,i}\}_{i=1}^{N_t}$ to $T$ with low misclassification error.

[Diagram: the source datasets $D_1, D_2, \dots, D_M$ and the target $T$ yield the predicted labels $\{\hat{y}_{t,i}\}_{i=1}^{N_t}$]

SLIDE 8

Our Approach (1/2)

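Consider linear decision functions $f_{\text{test}}(x) = \langle w, x \rangle + b \gtrless 0$.

1. Summarize the expert knowledge $f_m$ from each of the $M$ source datasets $D_m$ to build a baseline classifier $f_0$:

$D_1 \Rightarrow f_1, \quad D_2 \Rightarrow f_2, \quad \dots, \quad D_M \Rightarrow f_M \quad \Longrightarrow \quad f_0 = \langle w_0, x \rangle + b_0 \gtrless 0$ (baseline)

where

$f_m : (w_m, b_m) \leftarrow \mathrm{SVM}(D_m), \quad m = 1, \dots, M$
$f_0 : (w_0, b_0) \leftarrow \text{robust mean}(\{(w_m, b_m)\}_m)$

[Figure: baseline decision boundary $f_0$]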
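The baseline step can be sketched as follows. The slides do not say which SVM solver or which robust mean is used, so this sketch substitutes two stand-ins that are assumptions of this example: a linear classifier trained by subgradient descent on the regularized hinge loss in place of $\mathrm{SVM}(D_m)$, and the coordinate-wise median in place of the robust mean. The function names and hyperparameters (`lam`, `epochs`, `lr`) are likewise hypothetical.

```python
import numpy as np

def fit_linear_hinge(X, y, lam=0.01, epochs=200, lr=0.1):
    """Stand-in for SVM(D_m): subgradient descent on the regularized hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points that violate the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

def baseline_classifier(sources):
    """Combine per-source (w_m, b_m) into (w_0, b_0).

    Assumption: the coordinate-wise median stands in for the robust mean.
    """
    params = []
    for X, y in sources:
        w, b = fit_linear_hinge(X, y)
        params.append(np.append(w, b))
    center = np.median(np.array(params), axis=0)
    return center[:-1], center[-1]
```

Each element of `sources` is a pair `(X, y)` with `X` of shape `(N_m, d)` and labels `y` in `{-1, +1}`. The median is only one of several possible robust location estimates and is used here because it needs no tuning.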

SLIDE 9

Our Approach (2/2)

2. Transfer the knowledge by adapting $f_0$ to the target task $T$ based on the low-density separation principle:

$(f_0, T) \Rightarrow f_t = \langle w_t, x \rangle + b_t \gtrless 0$

Adjust the hyperplane parameters $(w, b)$ so that the decision boundary passes through a region where the marginal density of $T$ is low.
Find $(w_t, b_t)$ near $(w_0, b_0)$ that minimizes the number of data points inside the margin:

$\sum_{i=1}^{N_t} I\left\{ \frac{|\langle w_t, x_{t,i} \rangle + b_t|}{\|w_t\|} < \Delta \right\}$

[Figure: adapted decision boundary $f_t$]
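The margin-count objective is simple to evaluate directly. The following minimal sketch computes it for a given hyperplane; the function name `margin_count` is an assumption of this example, not a name from the slides.

```python
import numpy as np

def margin_count(w, b, X, delta):
    """Number of rows of X whose distance to the hyperplane <w, x> + b = 0 is below delta."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    return int(np.sum(dist < delta))

# Four points on the first axis, hyperplane x_1 = 0, margin half-width 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [-0.5, 0.0]])
print(margin_count(np.array([1.0, 0.0]), 0.0, X, delta=1.0))  # 2
```

Because the distance is normalized by $\|w\|$, rescaling $(w, b)$ by a positive constant leaves the count unchanged.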

SLIDE 10

Auto-gating Example

Comparison of the gating from the baseline ($f_0$) and the proposed transfer learning ($f_t$) with the gating by the expert (true).

[Figure: three panels labeled true, $f_0$, and $f_t$]

SLIDE 11

Experiments - setup

[Figure: number of cells ($\times 10^4$) per case; series: total cells, (+) labeled cells]

35 peripheral blood datasets were provided by the Department of Pathology, University of Michigan.

Leave-one-out setting

  choose a dataset as the target task $T$
  hide the labels of $T$
  treat the other datasets as source tasks $D_m$, $m = 1, \dots, 34$
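The leave-one-out protocol can be sketched as a small generator. The name `leave_one_out` and the representation of a dataset as an `(X, y)` pair are assumptions of this example.

```python
def leave_one_out(datasets):
    """Yield (sources, X_target, y_hidden) with each dataset used once as the target.

    datasets: list of (X, y) pairs. The target labels y_hidden are returned
    separately so they can be used for evaluation only, never for training.
    """
    for t in range(len(datasets)):
        X_target, y_hidden = datasets[t]
        sources = datasets[:t] + datasets[t + 1:]
        yield sources, X_target, y_hidden
```

With 35 datasets this produces 35 splits, each with 34 source tasks, matching the setting described on the slide.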

SLIDE 12

Experiments - results

Our Transfer Learning Approach

  $f_0$: baseline classifier with no adaptation
  $f_t$: classifier adapted to $T$ by varying both the direction and the bias

Reference Approaches

  Pooling: merge all the source data and learn a classifier on the merged dataset
  Oracle: standard SVM trained with the true labels of the target task data

SLIDE 13

Experiments - results

          Pool   f0     ft     Oracle
avg       9.81   3.70   2.49   2.12
std err   1.68   0.54   0.30   0.27

⇒ Our strategy can successfully replicate what experts do in the field without a labeled training set for the target task.

SLIDE 14

Conclusion and Forthcoming work

Conclusion

  We cast flow cytometry auto-gating as a transfer learning problem.
  By combining transfer learning with the low-density separation criterion for class separation, our strategy can leverage expert-gated datasets for the automatic gating of a new unlabeled dataset.

Forthcoming work

  General kernel-based framework
  Generalization error analysis
  Joint with Gilles Blanchard

SLIDE 15

Our Approach - detail

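2-1. Varying bias

For a grid of biases $\{s_j\}$, count the points inside the margin, smooth the counts, and move the bias to a nearby minimum:

$c_j \leftarrow \sum_{i=1}^{N_t} I\left\{ \frac{|\langle w, x_{t,i} \rangle + b - s_j|}{\|w\|} < \Delta \right\}, \quad \forall j$ (count)
$\hat{p}(z) \leftarrow \sum_j c_j \, \delta(z - s_j) * \frac{1}{\sqrt{2\pi}\, h} \exp\left( -\frac{z^2}{2h^2} \right)$ (smooth)
$z^* \leftarrow \text{gradient descent}(\hat{p}(z), 0)$ (find minimizing bias)
$b_{\text{new}} \leftarrow b - z^*$ (update bias)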
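The bias update can be sketched as follows. Several details are assumptions of this example rather than statements from the slides: the counts are normalized by their sum so that the step size does not depend on $N_t$, the step size `lr` and iteration count `steps` are arbitrary, and the names `smoothed_counts` and `update_bias` are hypothetical.

```python
import numpy as np

def smoothed_counts(z, grid, counts, h):
    """Gaussian-smoothed, normalized counts p_hat(z) and the derivative d p_hat / dz."""
    diff = z - grid
    kern = np.exp(-diff**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    total = max(counts.sum(), 1.0)  # guard against an all-zero count vector
    p = np.sum(counts * kern) / total
    dp = np.sum(counts * kern * (-diff / h**2)) / total
    return p, dp

def update_bias(w, b, X, delta, grid, h, lr=1.0, steps=500):
    """Shift the bias toward a nearby low-density region of the target data X."""
    scores = X @ w + b
    norm_w = np.linalg.norm(w)
    # c_j: number of points within delta of the hyperplane shifted by s_j.
    counts = np.array(
        [np.sum(np.abs(scores - s) / norm_w < delta) for s in grid], dtype=float
    )
    z = 0.0  # start from the baseline bias, i.e. zero shift
    for _ in range(steps):
        _, dp = smoothed_counts(z, grid, counts, h)
        z -= lr * dp
    return b - z, z
```

Plain gradient descent started at zero finds a local minimum near the baseline, which fits the goal of staying close to $(w_0, b_0)$; it does not search for the global minimum of $\hat{p}$.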

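2-2. Varying normal vector

Let $w_t = w_0 + a_t v_t$, where $v_t = \mathrm{eig}(\mathrm{cov}([w_1, \dots, w_M]))$. For a grid of change amounts $\{a_k\}$, count the points inside the margin, smooth the counts, and move to a nearby minimum:

$c_k \leftarrow \sum_{i=1}^{N_t} I\left\{ \frac{|\langle w_0 + a_k v_t, x_{t,i} \rangle + b|}{\|w_0 + a_k v_t\|} < 1 \right\}$ (count)
$g(a) \leftarrow \sum_k c_k \, \delta(a - a_k) * \frac{1}{\sqrt{2\pi}\, h} \exp\left( -\frac{a^2}{2h^2} \right)$ (smooth)
$a_t \leftarrow \text{gradient descent}(g(a), 0)$ (find minimizing $a_t$)
$w_{\text{new}} \leftarrow w_0 + a_t v_t$ (update direction)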
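The direction step can be sketched in the same spirit. The slides write $v_t = \mathrm{eig}(\mathrm{cov}(\cdot))$ without saying which eigenvector is meant; this sketch assumes the eigenvector with the largest eigenvalue. The function names are hypothetical, and the feature dimension is assumed to be at least 2. The smoothing and gradient-descent steps follow the same pattern as in step 2-1 and are omitted.

```python
import numpy as np

def principal_direction(W):
    """Unit eigenvector of cov([w_1, ..., w_M]) with the largest eigenvalue.

    W has shape (M, d), one source normal vector w_m per row, with d >= 2.
    Assumption: the leading eigenvector is the one intended by eig(cov(.)).
    """
    cov = np.cov(W, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def direction_counts(w0, b, v, X, a_grid, delta=1.0):
    """c_k: number of points inside the margin of w0 + a_k * v, for each a_k."""
    counts = []
    for a in a_grid:
        w = w0 + a * v
        dist = np.abs(X @ w + b) / np.linalg.norm(w)
        counts.append(int(np.sum(dist < delta)))
    return np.array(counts)
```

The sign of an eigenvector is arbitrary, so the grid $\{a_k\}$ should cover both positive and negative values.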
