Use of FLOCK + Friedman-Rafsky (F-R) in Challenge 1 and 4 (PowerPoint presentation)




SLIDE 1

Use of FLOCK + Friedman-Rafsky (F-R) in Challenge 1 and 4

Mengya Liu, Southern Methodist University; Rick Stanton, JCVI; Richard Scheuermann, JCVI

N01-AI40076 (BISC); U01-AI089859 (HIPC); R01-EB008400 (Gottardo R, PI)

SLIDE 2

General cross-sample comparison challenge

  • Algorithms like FLOCK identify data clusters in multidimensional FCM data one file at a time
  • We would like to compare equivalent populations across multiple samples

  • Previous approach:
    • Either select a "representative" sample as a template or concatenate data from multiple files
    • Generate a centroid list using FLOCK
    • Cluster each sample file separately using the centroid list
  • Problems associated with representative or concatenated
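The centroid-based previous approach can be sketched as follows. This is a minimal illustration, not FLOCK's implementation: it assumes the template (or concatenated) sample already carries cluster labels, takes each centroid as the mean of its cluster, and assigns every event in another file to the nearest centroid by Euclidean distance. FLOCK's own centroid and assignment rules may differ, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Event = Sequence[float]


def centroids_from_template(events: List[Event], labels: List[str]) -> Dict[str, Tuple[float, ...]]:
    """Mean position of each labeled cluster in the template (or concatenated) sample."""
    groups: Dict[str, List[Event]] = defaultdict(list)
    for event, label in zip(events, labels):
        groups[label].append(event)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def assign_to_centroids(events: List[Event], centroids: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Give each event the label of its nearest centroid (Euclidean distance)."""
    return [
        min(centroids, key=lambda label: math.dist(event, centroids[label]))
        for event in events
    ]


if __name__ == "__main__":
    template = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    centroids = centroids_from_template(template, ["pop1", "pop1", "pop2", "pop2"])
    print(centroids)                                                 # {'pop1': (0.0, 1.0), 'pop2': (10.0, 11.0)}
    print(assign_to_centroids([(1.0, 1.0), (9.0, 9.0)], centroids))  # ['pop1', 'pop2']
```

One consequence of this design: a population that is absent from the template has no centroid, so its events are forced into whichever template population happens to be closest.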
SLIDE 3

Friedman-Rafsky (F-R) algorithm concept

  • Multivariate generalization of the Wald-Wolfowitz (WW) runs test
  • WW is a non-parametric statistical test of whether two populations have the same distribution
  • Null hypothesis: both populations have the same distribution
  • Label the N total cells: m cells from population A and n cells from population B, then combine them (N = m + n)
  • Sort the combined cells by their measured value
  • The test statistic is a function of the total number of runs, R
  • where R = the number of maximal sequences of identical labels in the sorted list
  • Examples:
  • R = 2 for A A A A B B B B
  • R = 7 for A B A A B A B A
  • Null hypothesis rejected for small values of R
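The univariate runs test on this slide can be sketched as below. It is a minimal illustration, assuming no tied values and using the standard large-sample normal approximation for R (mean 2mn/N + 1, variance 2mn(2mn − N) / (N²(N − 1))); the function names are hypothetical. `count_runs` reproduces the two examples above.

```python
import math
from typing import Sequence, Tuple


def count_runs(labels: Sequence[str]) -> int:
    """R = number of maximal blocks of identical consecutive labels."""
    if len(labels) == 0:
        return 0
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def wald_wolfowitz(values_a: Sequence[float], values_b: Sequence[float]) -> Tuple[int, float]:
    """Runs test on pooled, sorted, labeled values; returns (R, lower-tail p-value)."""
    pooled = sorted([(v, "A") for v in values_a] + [(v, "B") for v in values_b])
    labels = [label for _, label in pooled]
    m, n = len(values_a), len(values_b)
    total = m + n
    r = count_runs(labels)
    # Large-sample mean and variance of R under the null hypothesis
    mean_r = 2 * m * n / total + 1
    var_r = 2 * m * n * (2 * m * n - total) / (total ** 2 * (total - 1))
    z = (r - mean_r) / math.sqrt(var_r)
    # Few runs (small R) is evidence against H0, so take the lower tail
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return r, p_value


if __name__ == "__main__":
    print(count_runs("AAAABBBB"))                      # 2
    print(count_runs("ABAABABA"))                      # 7
    print(wald_wolfowitz([1, 2, 3, 4], [5, 6, 7, 8]))  # R = 2, p about 0.011
```

The normal approximation is only reasonable when m and n are not tiny; for very small samples an exact or permutation null would be safer.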
SLIDE 4

Friedman-Rafsky (F-R) algorithm concept – Minimal Spanning Tree

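Figure: (a) pool the samples of the two sets; (b) calculate the minimal spanning tree; (c) remove the edges linking different samples

The minimal spanning tree allows the runs test to generalize to multivariate data: after step (c), R is the number of disjoint subtrees left, i.e., the number of removed edges plus one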
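Steps (a) to (c) can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and Prim's algorithm on the complete graph (O(N²) in the number of pooled events, one reason downsampling is needed); the function names are hypothetical. Placing points on a line reproduces the runs examples from the previous slide.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def minimal_spanning_tree(points: List[Point]) -> List[Tuple[int, int]]:
    """Step (b): Prim's algorithm on the complete Euclidean graph, O(N^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each point
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def friedman_rafsky_runs(sample_a: List[Point], sample_b: List[Point]) -> int:
    """Steps (a)-(c): pool, build the MST, cut cross-sample edges, count subtrees R."""
    points = list(sample_a) + list(sample_b)                     # (a) pool the two samples
    labels = ["A"] * len(sample_a) + ["B"] * len(sample_b)
    edges = minimal_spanning_tree(points)                        # (b) N - 1 edges
    kept = [(i, j) for i, j in edges if labels[i] == labels[j]]  # (c) drop cross edges
    # A forest with N nodes and k edges has N - k components, so
    # R = N - len(kept) = (number of cut edges) + 1.
    return len(points) - len(kept)


def on_a_line(xs: Sequence[float]) -> List[Point]:
    """Helper: place 1-D values on the x-axis."""
    return [(float(x), 0.0) for x in xs]


if __name__ == "__main__":
    print(friedman_rafsky_runs(on_a_line([0, 1, 2, 3]), on_a_line([4, 5, 6, 7])))  # 2 (A A A A B B B B)
    print(friedman_rafsky_runs(on_a_line([0, 2, 3, 5, 7]), on_a_line([1, 4, 6])))  # 7 (A B A A B A B A)
```

On collinear, evenly spaced points the minimal spanning tree is just the sorted path, which is why the multivariate statistic collapses to the univariate runs count here.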

SLIDE 5

F-R Advantages and Drawbacks

Advantages:

  • Non-parametric method: no knowledge of the distribution parameters is needed
  • Able to discriminate population characteristics that are hard to describe parametrically (skew, odd shapes)
  • Can provide feedback to automated gating algorithms when the number of populations is unknown
    • For example, if two subpopulations in sample 1 are matched to the same single subpopulation in sample 2, then either sample 1 was over-partitioned or sample 2 was not partitioned enough
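The over/under-partitioning check in the last bullet can be sketched as a scan over the matched pairs. This is a minimal illustration, assuming the matched pairs are already available as (sample 1 population, sample 2 population) tuples, for example from a p-value cutoff. The function name is hypothetical, and the mirror-image check (one sample 1 population matching several in sample 2) is an assumed extension of the same logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_feedback(matched: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Report many-to-one matches between the populations of two samples."""
    by_pop2: Dict[str, List[str]] = defaultdict(list)
    by_pop1: Dict[str, List[str]] = defaultdict(list)
    for pop1, pop2 in matched:
        by_pop2[pop2].append(pop1)
        by_pop1[pop1].append(pop2)
    return {
        # several sample 1 populations share one sample 2 population:
        # sample 1 over-partitioned, or sample 2 not partitioned enough
        "many_to_one": {p2: sorted(p1s) for p2, p1s in by_pop2.items() if len(p1s) > 1},
        # mirror case: one sample 1 population matches several in sample 2
        "one_to_many": {p1: sorted(p2s) for p1, p2s in by_pop1.items() if len(p2s) > 1},
    }


if __name__ == "__main__":
    pairs = [("s1_pop1", "s2_popA"), ("s1_pop2", "s2_popA"), ("s1_pop3", "s2_popB")]
    print(partition_feedback(pairs))
    # {'many_to_one': {'s2_popA': ['s1_pop1', 's1_pop2']}, 'one_to_many': {}}
```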

Drawbacks:

  • Computationally expensive, so the data need to be downsampled
SLIDE 6

Implementation of the F-R algorithm

For two samples:

  • Get the auto-gating results from FLOCK or any other auto-gating software
  • For every pair of populations, one from sample A and the other from sample B:
    • If either population has more than 100 events (predetermined, changeable), take a random sample of 100
    • Apply the F-R test to the sampled population(s) to obtain the p-value
    • Repeat 20 times (predetermined, changeable)
    • Calculate the averaged p-value
  • Repeat the procedure for all pairs to obtain the p-value matrix
  • Set a predetermined cutoff to identify the matched pairs (the cutoff may need adjusting for different shifts)
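The procedure above can be sketched end to end in Python. This is a minimal illustration, not the authors' code: it assumes Euclidean distance, Prim's algorithm for the minimal spanning tree, and a label-permutation null distribution for the F-R statistic (the slides do not say how the p-value is computed). The 100-event limit and the 20 repeats come from the slide; the 0.05 cutoff, the 199 permutations, and all function names are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Event = Sequence[float]


def mst_edges(points: List[Event]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph, O(N^2)."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def fr_pvalue(a: List[Event], b: List[Event], rng: random.Random, n_perm: int = 199) -> float:
    """F-R test: R = cross-sample MST edges + 1; small R means different distributions."""
    labels = [0] * len(a) + [1] * len(b)
    edges = mst_edges(list(a) + list(b))

    def runs(lab: List[int]) -> int:
        return 1 + sum(lab[i] != lab[j] for i, j in edges)

    observed = runs(labels)
    # The pooled MST does not depend on the labels, so under the null we can
    # shuffle labels on the fixed tree and count how often R is as small.
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        hits += runs(labels) <= observed
    return (hits + 1) / (n_perm + 1)


def pvalue_matrix(pops_a: Dict[str, List[Event]], pops_b: Dict[str, List[Event]],
                  max_events: int = 100, n_repeats: int = 20,
                  seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Averaged F-R p-value for every (population in A, population in B) pair."""
    rng = random.Random(seed)

    def downsample(events: List[Event]) -> List[Event]:
        return rng.sample(events, max_events) if len(events) > max_events else events

    return {
        (name_a, name_b): mean(
            fr_pvalue(downsample(ev_a), downsample(ev_b), rng) for _ in range(n_repeats)
        )
        for name_a, ev_a in pops_a.items()
        for name_b, ev_b in pops_b.items()
    }


def matched_pairs(matrix: Dict[Tuple[str, str], float], cutoff: float = 0.05) -> List[Tuple[str, str]]:
    """Pairs whose averaged p-value does not reject 'same distribution'."""
    return sorted(pair for pair, p in matrix.items() if p >= cutoff)


if __name__ == "__main__":
    gen = random.Random(1)

    def cloud(cx: float, cy: float, k: int = 60) -> List[Event]:
        return [(gen.gauss(cx, 1.0), gen.gauss(cy, 1.0)) for _ in range(k)]

    sample_a = {"A1": cloud(0, 0), "A2": cloud(10, 10)}
    sample_b = {"B1": cloud(0, 0), "B2": cloud(10, 10)}
    matrix = pvalue_matrix(sample_a, sample_b, max_events=20, n_repeats=5)
    print(matched_pairs(matrix))
```

The demo shrinks the event limit and repeat count only to keep it fast; with the slide's values (100 events, 20 repeats) the same calls apply unchanged.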

SLIDE 7

Simulation of data to characterize performance

Figure: experimental data and simulated data

SLIDE 8

Movement of simulated data to characterize performance

SLIDE 9

Movement of simulated data to characterize performance

SLIDE 10

Use of FLOCK + Friedman-Rafsky (F-R) in Challenge 1 and 4

FLOCK can be accessed via the ImmPort website

SLIDE 11

Use of FLOCK + Friedman-Rafsky (F-R) in Challenge 1 and 4: Processing Steps

  • Identify populations with FLOCK
  • Map FLOCK populations to T Cell target populations for a representative T Cell sample (target = Stanford 1)

SLIDE 12

Use of FLOCK + Friedman-Rafsky (F-R) in Challenge 1 and 4: Processing Steps

  • Apply the F-R algorithm to perform cross-sample associations across the other samples
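One way to turn averaged F-R p-values into cross-sample associations is sketched below. It is a minimal illustration, assuming the p-values between the target sample (Stanford 1) and one other sample are held in a dict keyed by (target population, other population), and that each population in the other sample takes the name of the target population with the largest p-value at or above a cutoff. The slides state only that a predetermined cutoff identifies matched pairs, so the best-match rule, the 0.05 cutoff, the function name, and the population names in the demo are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple


def transfer_labels(pvalues: Dict[Tuple[str, str], float],
                    cutoff: float = 0.05) -> Dict[str, Optional[str]]:
    """Name each population of the other sample after its best-matching target population."""
    best: Dict[str, Tuple[str, float]] = {}
    for (target, other), p in pvalues.items():
        # keep only pairs that pass the cutoff, and prefer the largest p-value
        if p >= cutoff and (other not in best or p > best[other][1]):
            best[other] = (target, p)
    others = sorted({other for _, other in pvalues})
    # None marks a population with no acceptable match in the target sample
    return {other: best[other][0] if other in best else None for other in others}


if __name__ == "__main__":
    pvals = {
        ("CD4", "flock_1"): 0.61, ("CD8", "flock_1"): 0.01,
        ("CD4", "flock_2"): 0.02, ("CD8", "flock_2"): 0.44,
        ("CD4", "flock_3"): 0.001, ("CD8", "flock_3"): 0.003,
    }
    print(transfer_labels(pvals))
    # {'flock_1': 'CD4', 'flock_2': 'CD8', 'flock_3': None}
```

Unmatched populations (None) are exactly the cases worth inspecting by hand, since they may be populations the target sample lacks or artifacts of the clustering.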

SLIDE 13

Cross-sample comparisons – Challenge 4 T Cell data

Target dataset (Stanford 1) compared with the other datasets using p-values from the F-R test

SLIDE 14

Future Directions

  • Better accommodate differences in gains across instruments (shifts, dilations)
  • Evaluate and incorporate lessons learned here at FlowCAP III