SLIDE 1

A Robust Recursive Partitioning Algorithm for Mining Multiple Populations

Jose Alvir (1), Javier Cabrera (2), Frank Caridi (1), Ha Nguyen (1)
(1) Pfizer Inc, (2) Rutgers University
Rutgers Biostatistics Day, 4/25/2008

SLIDE 2

The Challenge of Personalized Medicine

  • Drugs do not work for everybody
  • Certain drugs may work better for certain individuals than other drugs do
  • Individuals may need more or less of a drug than other individuals

SLIDE 3

The Challenge of Personalized Medicine

  • Shift from individuals to groups of individuals with similar characteristics
  • Search for subgroups where response is maximal
  • Classification techniques like CART are available

SLIDE 4

Pima Indians Diabetes Data Set

768 females at least 21 yrs old of Pima Indian heritage

Variable                        Mean     SD
Number of times pregnant        3.8      3.4
Plasma glucose concentration    120.9    32
Diastolic blood pressure        69.1     19.4
Triceps skin fold thickness     20.5     16
2-Hour serum insulin            79.8     115.2
Body Mass Index                 32       7.9
Diabetes pedigree function      0.5      0.3
Age                             33.2     11.8

Diabetes: 268/768

SLIDE 5

Classic Example of CART: Pima Indians & Diabetes

  • 768 Pima Indian females, 21+ years old; 268 tested positive for diabetes
  • 8 predictors: PRG, PLASMA, BP, THICK, INSULIN, BODY, PEDIGREE, AGE

[Figure: CART tree]
Split rules: PLASMA < 127.5, AGE < 28.5, BODY < 30.95, BODY < 26.35, PLASMA < 99.5, PEDIGREE < 0.561, BODY < 29.95, PLASMA < 145.5, PLASMA < 157.5, AGE < 30.5, BP < 61
Terminal-node diabetes proportions: 0.01325, 0.17500, 0.04878, 0.18180, 0.40480, 0.73530, 0.14630, 0.51430, 1.00000, 0.32500, 0.72310, 0.86960
SLIDE 6

ARF – Activity Region Finder

  • Identify High Activity Regions
  • Find regions where concentration of “success” is highest, unlike other classification trees (e.g. CHAID, CART) that aim to predict response across the entire range
  • Splitting a node when there is substantial evidence that the response is higher/lower in the child node (compared to the parent node)
  • Written in R
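
The idea of looking for a region where "success" is concentrated, rather than partitioning the whole range, can be illustrated with a minimal sketch. It is not the ARF code (which is written in R): the function name `best_interval`, the z-score against the parent rate as the measure of "substantial evidence", the `min_size` parameter, and the toy data are all assumptions made for illustration.

```python
import math

def best_interval(x, y, min_size=10):
    """Exhaustively search intervals of one predictor for the strongest
    evidence that the success rate is higher than in the parent node.

    x: predictor values (assumed distinct); y: 0/1 responses.
    Hypothetical helper for illustration, not the ARF implementation.
    """
    n = len(x)
    p0 = sum(y) / n  # success rate in the parent node (must be strictly between 0 and 1)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    # prefix[k] = number of successes among the k smallest x values
    prefix = [0]
    for i in order:
        prefix.append(prefix[-1] + y[i])
    best = None
    for a in range(n):
        for b in range(a + min_size, n + 1):
            m = b - a                          # observations in the interval
            p = (prefix[b] - prefix[a]) / m    # success rate in the interval
            z = (p - p0) / math.sqrt(p0 * (1 - p0) / m)
            if best is None or z > best["z"]:
                best = {"lo": xs[a], "hi": xs[b - 1], "n": m, "p": p, "z": z}
    return best

# Toy data (made up): 80% success for x in [60, 79], 20% elsewhere.
x = list(range(100))
y = [(0 if v % 5 == 2 else 1) if 60 <= v < 80 else (1 if v % 5 == 2 else 0) for v in x]
region = best_interval(x, y, min_size=10)
print(region)
```

On this toy data the search returns the planted interval [60, 79]. A tree-style procedure would then repeat the search inside the selected region on the remaining predictors.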
SLIDE 7

Alvir J, Cabrera J, Caridi F, Nguyen H. Mining Clinical Trial Data. In Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data, edited by Xingquan (Hill) Zhu and Ian Davidson, 2007

SLIDE 8

ARF applied to Pima Indian data

[Figure: ARF tree] Nodes:

  • DATASET: n=768; p=35%
  • PLASMA [155,199]: n=122; p=80%
  • PLASMA [128,152]: n=153; p=49%
  • BODY [29.9,45.7]: n=92; p=88%
  • AGE [29,56]: n=199; p=35%
  • BODY [30.3,67.1]: n=99; p=64%
  • PEDIGREE [0.344,1.394]: n=55; p=96%
  • PEDIGREE [0.439,1.057]: n=38; p=82%

Subset   Definition                                                                  %Success   n
1        PLASMA in [155,199] & BODY in [29.9,45.7] & PEDIGREE in [0.344,1.394]       96.364     55
2        PLASMA in [128,152] & BODY in [30.3,67.1] & PEDIGREE in [0.439,1.057]       81.579     38
3        PLASMA in [0,127] & AGE in [29,56]                                          35.176     199
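
Each ARF subset is a conjunction of interval conditions, so assigning a patient to a subset is a simple lookup. The sketch below encodes the three subsets from the table; the names `SUBSETS` and `assign_subset`, the choice of inclusive interval bounds, and the example records are assumptions for illustration.

```python
# Subset definitions copied from the table above; bounds are treated as inclusive (an assumption).
SUBSETS = [
    ("1", {"PLASMA": (155, 199), "BODY": (29.9, 45.7), "PEDIGREE": (0.344, 1.394)}),
    ("2", {"PLASMA": (128, 152), "BODY": (30.3, 67.1), "PEDIGREE": (0.439, 1.057)}),
    ("3", {"PLASMA": (0, 127), "AGE": (29, 56)}),
]

def assign_subset(record):
    """Return the label of the first subset whose interval conditions all hold, else None."""
    for label, conditions in SUBSETS:
        if all(lo <= record[var] <= hi for var, (lo, hi) in conditions.items()):
            return label
    return None

# Made-up records, not taken from the data set.
high_risk = {"PLASMA": 160, "BODY": 35.0, "PEDIGREE": 0.5, "AGE": 40}
low_plasma = {"PLASMA": 100, "BODY": 28.0, "PEDIGREE": 0.2, "AGE": 30}
unmatched = {"PLASMA": 100, "BODY": 28.0, "PEDIGREE": 0.2, "AGE": 22}
print(assign_subset(high_risk), assign_subset(low_plasma), assign_subset(unmatched))
```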

SLIDE 9

Differences between CART & ARF trees

  • Best node for the CART tree has 9 observations with 100% diabetes
  • ARF tree has a node of 55 observations with 96% rate of diabetes
  • The node from CART has a high probability of occurring by chance
  • ARF tree produces sketches that summarize only important information and downplay less interesting information
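
One rough way to see why the smaller node carries less evidence is to compare each node's success rate with the overall rate of 268/768, scaled by the standard error for that node size. This is only an illustrative calculation, not the authors' own: the helper name `z_vs_baseline` is hypothetical, the count of 53 successes is derived from 96.364% of 55, and the comparison ignores the many candidate splits a tree search examines (which makes small pure nodes even easier to find by chance).

```python
import math

def z_vs_baseline(successes, n, p0):
    """z-score of a node's success rate against a baseline rate p0 (normal approximation)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

p0 = 268 / 768                       # overall diabetes rate in the data set
z_cart = z_vs_baseline(9, 9, p0)     # CART node: 9 observations, 100% diabetes
z_arf = z_vs_baseline(53, 55, p0)    # ARF node: 55 observations, about 96% diabetes
print(round(z_cart, 2), round(z_arf, 2))
```

Under these assumptions the ARF node scores more than twice as high as the CART node, even though its observed rate is slightly lower.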

SLIDE 10

Ziprasidone Placebo Controlled Trials 4- & 6-wk U.S. trials

  • Protocol 104 – 4 weeks N=195
  • Protocol 106 – 4 weeks N=132
  • Protocol 114 – 6 weeks N=299
  • Protocol 115 – 6 weeks N=325

85 subjects on haloperidol excluded

Total N = 951

SLIDE 11

Ziprasidone Data

N by dose (mg/day) & Protocol #

Dose    104    106    114    115
PBO     47     47     92     80
10      46     -      -      -
40      55     43     -      86
80      47     -      104    -
120     -      42     -      76
160     -      -      103    -
200     -      -      -      83

SLIDE 12

Ziprasidone Data Mining Variables

Outcomes: Change in BPRS Total score
Predictors: age, sex, race, protocol, dose, baseline clinical ratings (positive Sx, CGI-S, anergia, depressive Sx, AIMS), duration of illness in years, current smoking status

SLIDE 13

Patient Characteristics
Total = 951

Characteristic    N      %
Male              700    74
Race
  White           620    65
  Black           234    25
  Other           97     10
Smoker            716    75

SLIDE 14

Data Definitions

  • AIMS = mean of AIMS total/5 and TD severity
  • BPRS total & Sx scores (positive, depression, anergia) – absolute minimum is zero (items scored with minimum = 0 and not 1)
  • Positive Sx score – sum of conceptual disorganization, hallucinatory behavior, unusual thought content, suspiciousness
  • Depression – sum of anxiety, guilt feelings, depressive mood
  • Anergia – sum of blunted affect, emotional withdrawal, motor retardation
  • Residual BPRS change – residual (observed minus predicted) from regressing LOCF BPRS total on baseline BPRS
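
The residual change definition can be sketched as an ordinary least-squares fit of the endpoint score on the baseline score, keeping observed minus predicted. This is a minimal illustration under that assumption; the function name `residual_change` and the scores are made up, and the slides do not say whether the actual regression included other terms.

```python
def residual_change(baseline, endpoint):
    """Residuals (observed minus predicted) from a simple linear regression
    of endpoint score on baseline score. Hypothetical helper for illustration."""
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(endpoint) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, endpoint))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(baseline, endpoint)]

# Made-up BPRS totals for four patients.
baseline_bprs = [30, 40, 50, 60]
locf_bprs = [25, 38, 41, 58]
residuals = residual_change(baseline_bprs, locf_bprs)
print([round(r, 2) for r in residuals])
```

A negative residual means the patient ended lower (better) than predicted from baseline alone; by construction the residuals average to zero across the sample.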

SLIDE 15

Patient Characteristics

Variable                Mean     S.D.    Range
Baseline Positive Sx    12.7     3.4     4, 24
Baseline Depression     5.5      3.3     0, 17
Baseline CGI-S          4.8      0.8     3, 7
Baseline Anergia        6.0      3.4     0, 18
Baseline AIMS           0.4      0.6     0, 4
Duration of illness     16.0     9.6     0, 54
Age                     38.7     10.1    18, 72
Residual change                  13.1    -45, 65
BPRS change             -5.1     13.4    -58, 55
Baseline BPRS           35.9     11.0    14, 86

SLIDE 16
SLIDE 17
SLIDE 18

The Challenge of Personalized Medicine revisited

  • Can we identify subgroups for which the drug is more effective than placebo or other drugs?
  • Are there subsets for which a low dose is better than placebo?
  • Are there subsets for which a high dose is better than a low dose or vice versa?

SLIDE 19

Conventional tree methods can only answer these questions indirectly.

In conventional modelling:

  • The X space is defined by one sample
  • We estimate the conditional mean of a response variable given a set of predictors.

SLIDE 20

Comparative efficacy

Subsets where:

  • the drug is more effective than placebo or other drugs
  • low dose is better than placebo
  • high dose is better than a low dose or vice versa

In these situations:

  • The X space is defined by two or more samples.
  • We estimate the conditional difference of means or in general a function of the conditional means.
  • We extend ARF to the differences between two or more means
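
The quantity being estimated here, a difference of means conditional on a region of the predictor space, can be sketched as follows. The function name `mean_difference`, the field names (`arm`, `basedep`, `change`), the cut point, and the records are all made up for illustration; they are not the trial data or the extended ARF code.

```python
from statistics import mean

def mean_difference(records, in_region):
    """Mean response on drug minus mean response on placebo, restricted to
    records that fall in the region. Returns None if either arm is empty."""
    treated = [r["change"] for r in records if in_region(r) and r["arm"] == "drug"]
    control = [r["change"] for r in records if in_region(r) and r["arm"] == "placebo"]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)

# Made-up records: "change" is change in a symptom score (negative = improvement).
records = [
    {"arm": "drug", "basedep": 2, "change": -12},
    {"arm": "drug", "basedep": 1, "change": -10},
    {"arm": "drug", "basedep": 6, "change": -4},
    {"arm": "drug", "basedep": 7, "change": -2},
    {"arm": "placebo", "basedep": 2, "change": -3},
    {"arm": "placebo", "basedep": 1, "change": -1},
    {"arm": "placebo", "basedep": 6, "change": -3},
    {"arm": "placebo", "basedep": 8, "change": -5},
]

overall = mean_difference(records, lambda r: True)
low_dep = mean_difference(records, lambda r: r["basedep"] < 2.5)
high_dep = mean_difference(records, lambda r: r["basedep"] >= 2.5)
print(overall, low_dep, high_dep)
```

In this toy example the overall drug-placebo difference hides a subgroup where the drug does much better and another where it does not, which is the kind of region a search over conditional differences is meant to surface.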

SLIDE 21

47th Interscience Conference on Antimicrobial Agents and Chemotherapy, Chicago, September 17-20, 2007

Symptom Resolution with Azithromycin Extended Release Versus Amoxicillin/Clavulanate in Patients with Acute Sinusitis in a General Practice Physician Environment

J. F. Piccirillo (1), B. F. Marple (2), C. S. Roberts (3), J. R. Frytak (4), V. F. Schabert (5), J. C. Wegner (4), H. Bhattacharyya (3), S. P. Sanchez (3)

(1) Washington University School of Medicine, St Louis, MO
(2) University of Texas Southwestern Medical Center, Dallas, TX
(3) Pfizer Inc, New York, NY
(4) i3 Innovus, Eden Prairie, MN
(5) Integral Health Decisions Inc, Santa Barbara, CA

SLIDE 22

Sample Characteristics

SLIDE 23

Ziprasidone: 120 mg/160 mg Vs Placebo MULTIRESPONSE CART

[Figure: CART tree]
Split rules: BASEDEP < 2.5, DURATILL >= 16.5, BASEPOS < 15.5, BCGIS < 5.5, DURATILL < 7.5, DURATILL >= 3.5, RACE = bde, DURATILL >= 13, ANERGIA >= 8.5
Terminal nodes: -6.9380 (n=37), 5.4806 (n=57), -8.5192 (n=30), 7.8737 (n=27), 0.8727 (n=62), -1.0445 (n=26), 7.1425 (n=94), 13.8646 (n=37), 14.3053 (n=31), 12.5481 (n=86)

SLIDE 24

Ziprasidone: 120 mg/160 mg Vs Placebo TOP THREE SPLITS

[Figure: three x-y plot panels, one per split variable: BASEDEP, BASEPOS, DURATIL]

SLIDE 25

Software

  • These two ARF applications are being incorporated into PfarMineR, a suite of statistical methods for EDA and Data Mining
  • ARF is available at: http://www.rci.rutgers.edu/~cabrera/dm/DM.html