A Robust Recursive Partitioning Algorithm for Mining Multiple Populations
Jose Alvir (1), Javier Cabrera (2), Frank Caridi (1), Ha Nguyen (1)
(1) Pfizer Inc, (2) Rutgers University
Rutgers Biostatistics Day, 4/25/2008
The Challenge of Personalized Medicine
- Drugs do not work for everybody
- Certain drugs may work better for certain individuals than other drugs
- Individuals may need more or less of a drug than other individuals
The Challenge of Personalized Medicine
- Shift from individuals to groups of individuals with similar characteristics
- Search for subgroups where response is maximal
- Classification techniques like CART are available
The Challenge of Personalized Medicine
Pima Indians Diabetes Data Set
768 females at least 21 yrs old of Pima Indian heritage

Variable                        Mean     SD
Number of times pregnant         3.8     3.4
Plasma glucose concentration   120.9    32
Diastolic blood pressure        69.1    19.4
Triceps skin fold thickness     20.5    16
2-Hour serum insulin            79.8   115.2
Body Mass Index                 32       7.9
Diabetes pedigree function       0.5     0.3
Age                             33.2    11.8
Diabetes                       268/768
Classic Example of CART: Pima Indians & Diabetes
- 768 Pima Indian females, 21+ years old; 268 tested positive for diabetes
- 8 predictors: PRG, PLASMA, BP, THICK, INSULIN, BODY, PEDIGREE, AGE
[Figure: CART tree for the Pima Indian data.
Splits: PLASMA<127.5, AGE<28.5, BODY<30.95, BODY<26.35, PLASMA<99.5, PEDIGREE<0.561, BODY<29.95, PLASMA<145.5, PLASMA<157.5, AGE<30.5, BP<61.
Leaf values: 0.01325, 0.17500, 0.04878, 0.18180, 0.40480, 0.73530, 0.14630, 0.51430, 1.00000, 0.32500, 0.72310, 0.86960.]
ARF – Activity Region Finder
- Identify High Activity Regions
- Find regions where concentration of “success” is highest, unlike other classification trees (e.g. CHAID, CART) that aim to predict response across the entire range
- Split a node when there is substantial evidence that the response is higher/lower in the child node (compared to the parent node)
- Written in R
Alvir J, Cabrera J, Caridi F, Nguyen H. Mining Clinical Trial Data. In Knowledge Discovery and Data Mining: Challenges and Realities with Real World Data, edited by Xingquan (Hill) Zhu and Ian Davidson, 2007
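The idea of splitting only when the response rate in a child region is convincingly higher than in the parent can be sketched as follows. This is a minimal illustration in Python, not the authors' R implementation: the function name `best_interval`, the grid of candidate cut points, the minimum node size, and the one-sided normal-approximation test are all assumptions made for the example.

```python
import math

def best_interval(x, y, min_n=20, n_cuts=10, z_crit=2.33):
    """Search intervals [lo, hi] of one predictor for a high success rate.

    x: predictor values; y: 0/1 responses. Returns (lo, hi, n, rate, z) for
    the interval with the largest z-score against the parent rate, or None
    if no interval clears z_crit. All thresholds are illustrative assumptions.
    """
    n_parent = len(y)
    p_parent = sum(y) / n_parent
    xs = sorted(x)
    # Candidate cut points taken from evenly spaced order statistics.
    cuts = sorted({xs[min(n_parent - 1, i * n_parent // n_cuts)]
                   for i in range(n_cuts + 1)})
    best = None
    for i, lo in enumerate(cuts):
        for hi in cuts[i:]:
            inside = [yi for xi, yi in zip(x, y) if lo <= xi <= hi]
            n = len(inside)
            if n < min_n or n == n_parent:
                continue  # too small to trust, or identical to the parent
            rate = sum(inside) / n
            # One-sided z-test of the child rate against the parent rate.
            se = math.sqrt(p_parent * (1 - p_parent) / n)
            z = (rate - p_parent) / se if se > 0 else 0.0
            if z > z_crit and (best is None or z > best[4]):
                best = (lo, hi, n, rate, z)
    return best

# Toy data (hypothetical): successes concentrate where x is between 60 and 79.
x = list(range(100))
y = [1 if 60 <= v < 80 else 0 for v in x]
result = best_interval(x, y)
print(result)
```

Applying such a search recursively to the selected region, one predictor at a time, would give nested interval rules of the kind ARF reports; the actual ARF stopping and selection rules are described in the reference above.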
[Figure: ARF tree]
DATASET  n=768; p=35%
  PLASMA [155,199]  n=122; p=80%
    BODY [29.9,45.7]  n=92; p=88%
      PEDIGREE [0.344,1.394]  n=55; p=96%
  PLASMA [128,152]  n=153; p=49%
    BODY [30.3,67.1]  n=99; p=64%
      PEDIGREE [0.439,1.057]  n=38; p=82%
  AGE [29,56]  n=199; p=35%  (within PLASMA [0,127], per the subset table)
ARF applied to Pima Indian data
Subset  Definition                                                               %Success    n
1       PLASMA in [155,199] & BODY in [29.9,45.7] & PEDIGREE in [0.344,1.394]     96.364    55
2       PLASMA in [128,152] & BODY in [30.3,67.1] & PEDIGREE in [0.439,1.057]     81.579    38
3       PLASMA in [0,127] & AGE in [29,56]                                        35.176   199
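Each ARF subset above is a conjunction of closed intervals, one per variable. The Python sketch below shows one way to encode such a rule and to back out the number of successes implied by the reported percentages. The dictionary layout and the helper `in_subset` are illustrative assumptions, and the example patient is hypothetical.

```python
# Rules copied from the table: variable -> (lower, upper), both inclusive.
subsets = {
    1: {"PLASMA": (155, 199), "BODY": (29.9, 45.7), "PEDIGREE": (0.344, 1.394)},
    2: {"PLASMA": (128, 152), "BODY": (30.3, 67.1), "PEDIGREE": (0.439, 1.057)},
    3: {"PLASMA": (0, 127), "AGE": (29, 56)},
}
reported = {1: (96.364, 55), 2: (81.579, 38), 3: (35.176, 199)}  # (%success, n)

def in_subset(row, rule):
    """True if every variable named in the rule falls inside its interval."""
    return all(lo <= row[var] <= hi for var, (lo, hi) in rule.items())

# Number of successes implied by each reported percentage and node size.
implied = {k: round(pct / 100 * n) for k, (pct, n) in reported.items()}
print(implied)

# Hypothetical patient, for illustration only.
row = {"PLASMA": 160, "BODY": 33.0, "PEDIGREE": 0.5, "AGE": 40}
matches = [k for k, rule in subsets.items() if in_subset(row, rule)]
print(matches)
```

The implied counts are 53 of 55, 31 of 38, and 70 of 199 successes for subsets 1, 2, and 3.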
Differences between CART & ARF trees
- Best node for the CART tree has 9 observations with 100% diabetes
- ARF tree has a node of 55 observations with 96% rate of diabetes
- The node from CART has a high probability of occurring by chance
- ARF tree produces sketches that summarize only important information and downplay less interesting information
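One simple way to see why a 9-observation node is weaker evidence than a 55-observation node is to compare confidence intervals for the two rates. The sketch below uses the Wilson score interval, which is a choice made for this illustration and not part of the original analysis. It also does not account for the search over many candidate nodes, which is the main reason small pure nodes arise by chance. The count 53 of 55 is taken from the 96.364% reported for the ARF node.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# CART's best node: 9 of 9 with diabetes. ARF's node: 53 of 55.
bounds = {}
for label, k, n in [("CART", 9, 9), ("ARF", 53, 55)]:
    bounds[label] = wilson_interval(k, n)
    lo, hi = bounds[label]
    print(f"{label}: {k}/{n}, interval ({lo:.3f}, {hi:.3f})")
```

Despite its 100% observed rate, the 9-observation node has a noticeably lower interval bound than the 55-observation node.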
Ziprasidone Placebo Controlled Trials 4- & 6-wk U.S. trials
- Protocol 104 – 4 weeks N=195
- Protocol 106 – 4 weeks N=132
- Protocol 114 – 6 weeks N=299
- Protocol 115 – 6 weeks N=325
85 subjects on haloperidol excluded
Total N = 951
Ziprasidone Data
N by dose (mg/day) & Protocol #

Dose     104   106   114   115
PBO       47    47    92    80
10        46     -     -     -
40        55    43     -    86
80        47     -   104     -
120        -    42     -    76
160        -     -   103     -
200        -     -     -    83
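As a quick consistency check, the dose-by-protocol counts can be summed per protocol and compared with the protocol sizes listed earlier (195, 132, 299, 325; total 951). The assignment of counts to protocols below is as read from the table, and the variable names are illustrative.

```python
# N by protocol and dose (mg/day); only doses used in a protocol are listed.
n_by_protocol = {
    104: {"PBO": 47, "10": 46, "40": 55, "80": 47},
    106: {"PBO": 47, "40": 43, "120": 42},
    114: {"PBO": 92, "80": 104, "160": 103},
    115: {"PBO": 80, "40": 86, "120": 76, "200": 83},
}

# Per-protocol totals and the grand total across all four trials.
totals = {protocol: sum(arms.values()) for protocol, arms in n_by_protocol.items()}
grand_total = sum(totals.values())
print(totals, grand_total)
```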
Ziprasidone Data Mining Variables
Outcomes: Change in BPRS Total score
Predictors: age, sex, race, protocol, dose, baseline clinical ratings (positive Sx, CGI-S, anergia, depressive Sx, AIMS), duration of illness in years, current smoking status
Patient Characteristics Total = 951
Characteristic     N     %
Male             700    74
Race
  White          620    65
  Black          234    25
  Other           97    10
Smoker           716    75
Data Definitions
- AIMS = mean of AIMS total/5 and TD severity
- BPRS total & Sx scores (positive, depression, anergia) – absolute minimum is zero (items scored with minimum = 0 and not 1)
- Positive Sx score – sum of conceptual disorganization, hallucinatory behavior, unusual thought content, suspiciousness
- Depression – sum of anxiety, guilt feelings, depressive mood
- Anergia – sum of blunted affect, emotional withdrawal, motor retardation
- Residual BPRS change – Residual (observed minus predicted) LOCF BPRS total regressed on baseline BPRS
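The residual BPRS change defined above can be sketched as a simple least-squares regression of the LOCF BPRS total on the baseline BPRS, keeping observed minus predicted for each subject. The function name `residual_change` and the five example scores are hypothetical. The slides do not say whether other covariates entered the regression, so a single-predictor fit is assumed here.

```python
def residual_change(baseline, endpoint):
    """Residuals from a least-squares regression of endpoint on baseline.

    Returns observed minus predicted endpoint score for each subject.
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(endpoint) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, endpoint))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(baseline, endpoint)]

# Hypothetical scores for five subjects (not trial data).
baseline_bprs = [30, 35, 40, 45, 50]
locf_bprs = [28, 30, 41, 38, 47]
res = residual_change(baseline_bprs, locf_bprs)
print([round(r, 2) for r in res])
```

By construction the residuals sum to zero and are uncorrelated with the baseline score, so the outcome is adjusted for baseline severity.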
Patient Characteristics

Variable                 Mean    S.D.    Range
Baseline Positive Sx     12.7     3.4    4, 24
Baseline Depression       5.5     3.3    0, 17
Baseline CGI-S            4.8     0.8    3, 7
Baseline Anergia          6.0     3.4    0, 18
Baseline AIMS             0.4     0.6    0, 4
Duration of illness      16.0     9.6    0, 54
Age                      38.7    10.1    18, 72
Residual change                  13.1    -45, 65
BPRS change              -5.1    13.4    -58, 55
Baseline BPRS            35.9    11.0    14, 86
- Can we identify subgroups for which the drug is more effective than placebo or other drugs?
- Are there subsets for which a low dose is better than placebo?
- Are there subsets for which a high dose is better than a low dose or vice versa?
The Challenge of Personalized Medicine revisited
Conventional tree methods can only answer these questions indirectly.
In conventional modelling:
- The X space is defined by one sample
- We estimate the conditional mean of a response variable given a set of predictors.
Comparative efficacy
Subsets where:
- the drug is more effective than placebo or other drugs
- low dose is better than placebo
- high dose is better than a low dose or vice versa
In these situations:
- The X space is defined by two or more samples.
- We estimate the conditional difference of means or, in general, a function of the conditional means.
- We extend ARF to the differences between two or more means
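To make the two-sample setting concrete, the sketch below estimates the difference in mean response between two arms inside a candidate region of one covariate, together with a Welch-type standardized statistic. It is a minimal illustration under stated assumptions, not the authors' extension of ARF: the function `region_effect`, the arm labels, and the toy rows are all hypothetical.

```python
import math

def region_effect(rows, var, lo, hi):
    """Difference in mean response (drug minus placebo) where lo <= var <= hi.

    rows: dicts with keys var, "arm" ("drug" or "placebo") and "y".
    Returns (difference, standardized statistic, n_drug, n_placebo).
    """
    groups = {"drug": [], "placebo": []}
    for r in rows:
        if lo <= r[var] <= hi:
            groups[r["arm"]].append(r["y"])

    def mean_var(values):
        m = sum(values) / len(values)
        return m, sum((a - m) ** 2 for a in values) / (len(values) - 1)

    m1, v1 = mean_var(groups["drug"])
    m0, v0 = mean_var(groups["placebo"])
    n1, n0 = len(groups["drug"]), len(groups["placebo"])
    se = math.sqrt(v1 / n1 + v0 / n0)  # unequal-variance standard error
    return m1 - m0, (m1 - m0) / se, n1, n0

# Toy rows (hypothetical): y is a change score, negative means improvement.
rows = [
    {"x": 1, "arm": "drug", "y": -10}, {"x": 2, "arm": "drug", "y": -12},
    {"x": 3, "arm": "drug", "y": -8}, {"x": 1, "arm": "placebo", "y": -2},
    {"x": 2, "arm": "placebo", "y": 0}, {"x": 3, "arm": "placebo", "y": -4},
    {"x": 9, "arm": "drug", "y": -1}, {"x": 9, "arm": "placebo", "y": -1},
]
diff, stat, n_drug, n_placebo = region_effect(rows, "x", 1, 3)
print(diff, round(stat, 2), n_drug, n_placebo)
```

A region search would then look for intervals where this difference, and not the response rate of a single sample, is largest relative to its standard error.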
47th Interscience Conference on Antimicrobial Agents and Chemotherapy, Chicago, September 17-20, 2007
Symptom Resolution with Azithromycin Extended Release Versus Amoxicillin/Clavulanate in Patients with Acute Sinusitis in a General Practice Physician Environment
J. F. Piccirillo (1), B. F. Marple (2), C. S. Roberts (3), J. R. Frytak (4), V. F. Schabert (5), J. C. Wegner (4), H. Bhattacharyya (3), S. P. Sanchez (3)
(1) Washington University School of Medicine, St Louis, MO
(2) University of Texas Southwestern Medical Center, Dallas, TX
(3) Pfizer Inc, New York, NY
(4) i3 Innovus, Eden Prairie, MN
(5) Integral Health Decisions Inc, Santa Barbara, CA
Sample Characteristics
[Figure: multiresponse CART tree.
Splits: BASEDEP<2.5, DURATILL>=16.5, BASEPOS<15.5, BCGIS<5.5, DURATILL<7.5, DURATILL>=3.5, RACE=bde, DURATILL>=13, ANERGIA>=8.5.
Leaf values: -6.9380 (n=37), 5.4806 (n=57), -8.5192 (n=30), 7.8737 (n=27), 0.8727 (n=62), -1.0445 (n=26), 7.1425 (n=94), 13.8646 (n=37), 14.3053 (n=31), 12.5481 (n=86).]
Ziprasidone: 120 mg/160 mg Vs Placebo MULTIRESPONSE CART
[Figure: three panels plotting y against x, labelled BASEDEP, BASEPOS, and DURATIL]
Ziprasidone: 120 mg/160 mg Vs Placebo TOP THREE SPLITS
Software
- These two ARF applications are being incorporated into PfarMineR, a suite of statistical methods for EDA and Data Mining
- ARF is available at: