Statistics in high-content biology

SLIDE 1

Statistics in high-content biology

Rebecca Walls

Advanced Science & Technology Laboratory

SLIDE 2

2 Rebecca Walls, Non-Clinical Statistics Conference 2008, Leuven

Outline

Introduction and aim of high-content biology
Predicting liver toxicity in vivo
Distinguishing distinct modes of compound action

SLIDE 3

Current issues facing the pharmaceutical industry

All pharmaceutical companies face high attrition of compounds through the discovery and development process
Two key issues that face project progression are:
Safety and toxicity
Efficacy in disease process
Need to know more about the mechanism of action and toxicity of our compounds at an earlier stage in the discovery process
More information enables front-loading of risk, early go/no-go decisions and improvements in toxicological attrition

SLIDE 4

High-content biological assays

Attempt to use in vitro cell models to mimic the complexity of an in vivo situation
Advanced imaging techniques are used to generate large, complex datasets describing the response of a population of cells to a compound
Aim is to build predictive models or ‘fingerprints’ from the multiparametric assay data for well-characterised compounds that elicit known responses
Fingerprints are applied to new drugs to predict their biological mechanism of action and toxicity

SLIDE 5

Cell culture

[Figure labels: cells, media layer]

Cells are extracted from a source tissue, e.g. rat hepatocytes or tumour-derived cell lines
Cells are plated into multi-well plates, typically hundreds or thousands of cells per well
Each well is like a test tube where we can test a single prototype drug
Cells grown in the well can be labelled and imaged

SLIDE 6

HCB cellular profiling

Nucleus: DNA content, size, shape, cell division, fragmentation, micronuclei
ER/Golgi: protein trafficking, secretion
Mitochondria: viability, mass, activity, cellular distribution, pre-apoptotic indicators
Cytoskeleton: tubulin, actin, fibre content, length, mitotic arrest
Apoptosis: membrane markers, blebbing, necrosis
Cell morphology: count, area, form, roundness, length/breadth, perimeter

General imaging indicators

SLIDE 7

Statistical challenges

Information captured for each feature is a dynamic response to the compound over an 8-point dose range
Datasets possess a three-dimensional, cube-like structure
Traditional multivariate approaches are difficult to apply to this type of data directly

[Figure: data cube with axes features, compounds and doses]
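
One way to picture the cube structure, offered only as an illustrative sketch, is a compounds x doses x features array that has to be unfolded into a two-dimensional matrix before standard multivariate methods can be applied. The dimensions and random values below are placeholders, not the assay data, and the two unfoldings shown are assumptions about how such data might be arranged.

```python
import numpy as np

# Hypothetical dimensions: 5 compounds, 8 doses, 3 features (placeholders).
n_compounds, n_doses, n_features = 5, 8, 3
rng = np.random.default_rng(0)
cube = rng.normal(size=(n_compounds, n_doses, n_features))

# Option 1: one row per compound, every dose/feature pair becomes a column.
by_compound = cube.reshape(n_compounds, n_doses * n_features)

# Option 2: one row per compound/dose combination, one column per feature.
by_well = cube.reshape(n_compounds * n_doses, n_features)

print(by_compound.shape)  # (5, 24)
print(by_well.shape)      # (40, 3)
```

Option 1 keeps each compound as a single observation but multiplies the number of variables by the number of doses; option 2 keeps the feature count fixed but treats doses of the same compound as separate rows.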

SLIDE 8

Case study 1: Predicting liver toxicity in vivo

Drug-induced liver toxicity is one of the most common causes of drug non-approval
Early in vitro identification of compounds with hepatotoxic risk would allow their de-selection early in the drug development process

[Figure: liver toxicity in the animal and in the lab; panels: cell death (necrosis), fatty liver (steatosis), phospholipidosis, cholestasis]

SLIDE 9

Predicting steatosis - data

Primary rat hepatocytes treated with a set of 60 compounds at a range of doses, consisting of known steatotics and non-steatotics
Bespoke algorithms designed to quantify differences in localisation and morphology of lipid droplets in the cells
Generates 32 different continuous measurements per cell
Averaged over the cell population to give well-level measurements for each compound and dose combination
Use partial least squares modelling (stepwise) with the steatotic annotation as a binary response
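
The averaging and modelling steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the data are simulated, a single well per compound is assumed for simplicity, the stepwise variable selection is omitted, and the helper names `pls1_fit` and `pls1_predict` are hypothetical. The fit uses the standard NIPALS algorithm for a single response.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weights: covariance of each feature with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # latent score for each compound
        tt = t @ t
        p = Xk.T @ t / tt              # X loading
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    """Return the continuous predictive score for each row of X."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

# Simulated stand-in data: 60 compounds, 200 cells per well, 32 features.
rng = np.random.default_rng(0)
compound_effect = rng.normal(size=(60, 1, 32))
cells = compound_effect + rng.normal(size=(60, 200, 32))
X = cells.mean(axis=1)                 # average over cells to get well-level data
y = (X[:, 0] > 0).astype(float)        # made-up binary annotation (1 = steatotic)

model = pls1_fit(X, y, n_components=2)
scores = pls1_predict(model, X)        # continuous predictive score
predicted_steatotic = scores > 0.5
```

Treating the binary annotation as a numeric 0/1 response gives a continuous score per compound, which is what makes a ranking of compounds possible; the 0.5 cut-off is an arbitrary choice for the sketch.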

[Figure: three dose-response panels labelled HT0053, HT1042 and HT1102; x-axis: dose, 0.5 to 10000; y-axis ticks: 5000 to 20000]

SLIDE 10

Polynomial model

Fit a cubic polynomial to the dose-response data for each feature
The t-statistics for each term in the cubic form a new set of variables
Only a small number of variables is required to generate the greatest predictivity
After cross-validation, the polynomial model is approximately 10% better than the range model
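
The feature construction described above can be sketched as below. This is an illustration under stated assumptions: the dose values are hypothetical, the cubic is fitted on centred log10 dose (the slides do not say which dose scale was used), and the function name `cubic_t_statistics` is invented for the example.

```python
import numpy as np

def cubic_t_statistics(dose, response):
    """Fit a cubic in centred log10(dose) by least squares; return the 4 t-statistics."""
    x = np.log10(dose)
    x = x - x.mean()                               # centring reduces collinearity
    design = np.vander(x, N=4, increasing=True)    # columns: 1, x, x^2, x^3
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ beta
    dof = len(response) - design.shape[1]          # 8 doses - 4 terms = 4
    sigma2 = resid @ resid / dof                   # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta / np.sqrt(np.diag(cov))            # estimate / standard error

# Hypothetical 8-point dose range and simulated responses for one compound.
doses = np.array([1, 3, 10, 30, 100, 300, 1000, 3000], dtype=float)
rng = np.random.default_rng(0)
n_features = 32
responses = 2.0 * np.log10(doses)[:, None] + rng.normal(scale=0.1, size=(8, n_features))

# Four t-statistics per feature become the compound's new variables.
variables = np.concatenate(
    [cubic_t_statistics(doses, responses[:, j]) for j in range(n_features)]
)
print(variables.shape)  # (128,)
```

Using t-statistics rather than raw coefficients puts every term on a common, noise-adjusted scale, so a compound is summarised by one row of variables instead of an 8-dose profile per feature.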

[Figure: fitted curves for proportion of edge fat; panels: non-steatotic, steatotic]
[Figure: sensitivity against specificity, comparing 50 variables with 1 variable]

SLIDE 11

Advantages of model

Based on predictive scores, compounds can be ranked in order of steatotic effect
Bootstrapping, incorporating random x-resampling, used to generate 95% confidence intervals for the predicted score
High-confidence, high-steatotic-effect compounds can be de-selected
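
A percentile bootstrap for a predicted score might look like the sketch below. It is not the presenter's procedure: it assumes that "random x-resampling" means resampling whole compounds (rows) with replacement, it uses ordinary least squares as a stand-in for the partial least squares model, and the data and the function name `bootstrap_score_interval` are made up for illustration.

```python
import numpy as np

def bootstrap_score_interval(X, y, x_new, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the predicted score of one compound."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    design_new = np.concatenate([[1.0], x_new])    # intercept plus features
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample compounds with replacement
        design = np.column_stack([np.ones(n), X[idx]])
        beta, *_ = np.linalg.lstsq(design, y[idx], rcond=None)   # refit the model
        scores[b] = design_new @ beta              # re-predict the target compound
    alpha = 1.0 - level
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Simulated data: 60 compounds, 3 summary variables, binary annotation.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(float)

lo, hi = bootstrap_score_interval(X, y, X[0], n_boot=500)
print(f"95% interval for compound 1: [{lo:.2f}, {hi:.2f}]")
```

A compound whose whole interval sits high on the score scale is the "high confidence, high steatotic effect" case; a high score with a wide interval is a weaker basis for de-selection.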

[Figure: steatotic effect (y-axis) by compound (x-axis)]

SLIDE 12

Case study 2: Identifying distinct modes of compound action

Morphology high-content assay developed specifically to examine microtubules and actin filaments as oncology targets
Describes how drugs influence the entire complex cellular phenotype (i.e. multiple targets)
102 compounds screened through the morphology assay
Primary aims are:
Identify which compounds are active in the assay, i.e. which are ‘hits’?
Differentiate compound hits that have distinct morphological effects
Cluster hits together that have similar effects
138 features for each compound, tested over 8 doses
310 control wells

SLIDE 13

Principal components analysis

PCA used in an attempt to reduce the dimension of the dataset, yielding 6 principal components which explain close to 80% of variation
Mahalanobis distance is a powerful means of determining how similar an unknown sample is to a known one
Differs from Euclidean distance in that it takes into account the covariance between variables

The Mahalanobis distance from a group of values with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ and covariance matrix $\Sigma$ for a multivariate vector $x = (x_1, x_2, \ldots, x_p)^T$ is defined as

$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
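
As a rough sketch of how the two steps on this slide fit together, the code below projects simulated data onto 6 principal components and then measures the Mahalanobis distance of one sample from a reference group. The data are random placeholders, scaling each feature to unit variance before PCA is an assumption, and the function names are illustrative.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardise columns, then project onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # proportion of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

def mahalanobis(x, mu, sigma):
    """D_M(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(sigma, diff)))

# Simulated data: 100 wells x 20 features; the first 50 rows act as the reference group.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
scores, explained = pca_scores(X, n_components=6)

reference = scores[:50]
mu = reference.mean(axis=0)
sigma = np.cov(reference, rowvar=False)    # 6 x 6 covariance of the reference group

d = mahalanobis(scores[75], mu, sigma)
print(f"variance explained by 6 components: {explained.sum():.2f}")
print(f"Mahalanobis distance of sample 75: {d:.2f}")
```

With an identity covariance matrix the function reduces to the Euclidean distance, which is the contrast the slide draws: the covariance term rescales directions in which the reference group is naturally more spread out.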

SLIDE 14

Using the Mahalanobis distance

[Figure: density of squared Mahalanobis distances for non-hits and hits]

Working on the PCA scores on the 6 principal components, the covariance matrix of the control cloud was calculated
For each compound at every dose, the squared Mahalanobis distance to the centre of mass was calculated and compared to a chi-squared distribution with 6 degrees of freedom at some pre-chosen significance level, α
An adjustment was made to control the false discovery rate
A compound with a significant result for at least one of the doses along its range was deemed to be an ‘active hit’
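
The hit-calling procedure above can be sketched as follows. The slides do not name the false discovery rate adjustment, so the Benjamini-Hochberg procedure is used here as an assumption; the squared distances are simulated, α = 0.05 is an arbitrary choice, and the function names are illustrative. The chi-squared tail probability uses the closed form that holds for an even number of degrees of freedom.

```python
import math
import numpy as np

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-squared variable with even degrees of freedom."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

def benjamini_hochberg(pvals, alpha):
    """Return a boolean array marking the tests rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# Simulated squared Mahalanobis distances: 10 compounds x 8 doses.
rng = np.random.default_rng(0)
d2 = rng.chisquare(6, size=(10, 8))
d2[0] += 60.0                                  # make compound 0 clearly active

pvals = np.array([[chi2_sf_even(v, 6) for v in row] for row in d2])
reject = benjamini_hochberg(pvals.ravel(), alpha=0.05).reshape(d2.shape)
hits = reject.any(axis=1)                      # significant for at least one dose
print(hits)
```

Applying the adjustment across all compound-dose tests at once, rather than per compound, is one possible choice; the slides do not say which family of tests was adjusted.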

SLIDE 15

Distinguishing distinct phenotypes

[Figure: image panels for Buffer and Compounds A to G, with the following annotations]
Homogeneous nuclei and cell shape
Stabilised cell-cell junctions: results in ‘clumpy’ cells
No single cells
Aneuploidy: big nuclei
Increased cell size

SLIDE 16

Acknowledgements

Discovery Statistics
Chris Harbron

Advanced Science and Technology Laboratory
Ed Ainscow
Neil Carragher
Andy Hargreaves
Mike Sullivan
Helen Garside
James Pilling
Lisa Rice
Tom Houslay
Peter Caie
Alex Ingleston-Orme