
SLIDE 1

Reconstructing Spatiotemporal Gene Expression from Partial Observations

Dustin Cartwright¹

April 7, 2010

¹ Joint with David Orlando, Siobhan Brady, Bernd Sturmfels, and Philip Benfey. Research supported by the DARPA project Fundamental Laws of Biology.

SLIDE 2

Arabidopsis root

SLIDES 3-5

Arabidopsis root

Gene expression microarrays are a tool to understand dynamics and regulatory processes. There are two ways of separating cells in the lab:

◮ Chemically, using 18 markers (colors in diagram A)

◮ Physically, using 13 longitudinal sections (red lines in diagram B)

SLIDES 6-8

Measurement along two axes

◮ Markers measure variation among cell types.

◮ Longitudinal sections measure variation along developmental stage.

A naïve approach would use variation among each set of experiments as a proxy for variation along each of the two axes.

SLIDES 9-10

Problem with the naïve approach

Correspondence between markers and cell types is imperfect. For example, the sample labelled APL consists of a mixture of two cell types:

section | phloem | phloem companion cells
   12   |  1/16  |  1/16
   ⋮    |   ⋮    |   ⋮
    7   |  1/16  |  1/16
    6   |  1/16  |
    ⋮   |   ⋮    |
    3   |  1/16  |
  2, 1  | (columella)

SLIDES 11-13

Problem with the naïve approach

Similarly, the longitudinal sections do not all have the same mixture of cells. For example:

◮ In each of sections 1-5, 30-50% of the cells are lateral root cap cells.

◮ In sections 6-12, there are no lateral root cap cells.

Conclusion: we need to analyze each transcript across all 31 (= 13 + 18) experiments to model the expression pattern in the whole root.

SLIDES 14-16

Model

◮ There is an expression level for each combination of a cell type and a section.

◮ Each marker and each longitudinal section measures a linear combination of these expression levels.

◮ The coefficients of these linear combinations are determined by:

  ◮ the numbers of cells present in each section
  ◮ the marker selection patterns

This gives an under-constrained system: 31 (= 13 + 18) measurements but 129 expression levels.

SLIDES 17-20

Assumption

Since the system is under-constrained, we make the following assumption:

◮ The dependence of the expression level on the section is independent of its dependence on the cell type.

◮ More precisely, the expression level in section i and cell type j is x_i y_j for some vectors x and y.

Example

If the expression level is either 0 or 1 (off or on), then our assumption says that it is 1 exactly on the combinations of some subset of the sections with some subset of the cell types.
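The off/on case of this assumption can be sketched in a few lines of Python (a hypothetical toy example, not code from the talk): with x and y taken as 0/1 indicator vectors, the rank-one product x_i y_j is 1 exactly on the product of the two chosen subsets.

```python
# Toy illustration (hypothetical example): under the rank-one assumption,
# an on/off expression pattern is the product of a subset of sections
# with a subset of cell types.

sections = [0, 1, 2, 3]        # 4 sections
cell_types = [0, 1, 2]         # 3 cell types

on_sections = {1, 2}           # subset of sections where the gene is on
on_types = {0, 2}              # subset of cell types where the gene is on

# Indicator vectors x and y.
x = [1.0 if i in on_sections else 0.0 for i in sections]
y = [1.0 if j in on_types else 0.0 for j in cell_types]

# Expression level in section i and cell type j is x_i * y_j.
expression = [[x[i] * y[j] for j in cell_types] for i in sections]

for i in sections:
    for j in cell_types:
        expected = 1.0 if (i in on_sections and j in on_types) else 0.0
        assert expression[i][j] == expected
```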

SLIDES 21-23

Non-negative bilinear equations

Equating the expression levels from the above model with actual observations gives a system of bilinear equations:

x^t A^(1) y = o_1
. . .
x^t A^(k) y = o_k
x_1 + · · · + x_n = 1  (normalization)

where A^(1), . . . , A^(k) are n × m non-negative matrices (cell mixtures) and o_1, . . . , o_k are positive scalars (observed expression levels).

We want approximate solutions with x and y non-negative vectors of dimensions n × 1 and m × 1 respectively.
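The forward direction of this measurement model is easy to sketch (illustrative numbers, not real data from the talk): each experiment ℓ observes o_ℓ = x^t A^(ℓ) y for a known non-negative mixture matrix A^(ℓ).

```python
# Sketch of the measurement model (hypothetical numbers): each experiment
# observes a bilinear form o = x^t A y of the hidden factors x and y.

def observe(A, x, y):
    """Return x^t A y for an n x m matrix A and vectors x (length n), y (length m)."""
    return sum(A[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Hypothetical 2 x 2 example: two sections, two cell types.
x = [0.6, 0.4]          # per-section factors (normalized: sum to 1)
y = [1.0, 2.0]          # per-cell-type factors

A1 = [[1, 0], [0, 1]]   # experiment 1 weights the (i, j) combinations
A2 = [[1, 1], [1, 0]]   # experiment 2 mixes a different set of cells

o1 = observe(A1, x, y)  # 0.6*1 + 0.4*2 = 1.4
o2 = observe(A2, x, y)  # 0.6*1 + 0.6*2 + 0.4*1 = 2.2
```

The inverse problem on the slide runs the other way: given the A^(ℓ) and the o_ℓ, recover non-negative x and y.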
SLIDES 24-26

Kullback-Leibler divergence

Maximum likelihood estimation: given a model (a function f : Θ → R^k) and empirical counts for each of the k events, determine the parameters which maximize the probability of the counts given the model.

Equivalently, the maximum likelihood parameters minimize the Kullback-Leibler divergence between the predicted distribution and the empirical distribution (= the normalized counts):

D(o ∥ f(θ)) := ∑_{ℓ=1}^{k} ( o_ℓ log( o_ℓ / f_ℓ(θ) ) − o_ℓ + f_ℓ(θ) )

With the two additional terms, the generalized Kullback-Leibler divergence provides a measure of the difference between any two positive vectors.
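The displayed formula translates directly into code (a minimal sketch; the extra −o_ℓ + f_ℓ terms cancel when both vectors sum to 1, which is what makes the generalized form work for arbitrary positive vectors):

```python
import math

def gen_kl(o, f):
    """Generalized Kullback-Leibler divergence between positive vectors:
    D(o || f) = sum_l  o_l * log(o_l / f_l) - o_l + f_l.
    Non-negative, and zero exactly when o == f."""
    return sum(ol * math.log(ol / fl) - ol + fl for ol, fl in zip(o, f))

# D(o || o) = 0, and the divergence is positive for distinct vectors.
assert abs(gen_kl([1.0, 2.0], [1.0, 2.0])) < 1e-12
assert gen_kl([2.0, 1.0], [1.0, 2.0]) > 0.0
```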

SLIDE 27

Finding maximum likelihood parameters

Two statistical methods for finding maximum likelihood parameters:

◮ Expectation Maximization: reduces solving the mixture model (summation) to solving the underlying equations.

◮ Iterative Proportional Fitting: solves log-linear (monomial) equations.

SLIDES 28-32

Expectation Maximization

Want to solve:

∑_{i,j} A^(ℓ)_{ij} x_i y_j = o_ℓ   for ℓ = 1, . . . , k    (1)

◮ Start with guesses x̃, ỹ.

◮ Estimate the contribution of the (i, j) term on the left side of equation (1) needed to obtain equality:

  e_{ijℓ} := o_ℓ · A^(ℓ)_{ij} x̃_i ỹ_j / ( ∑_{i′,j′} A^(ℓ)_{i′j′} x̃_{i′} ỹ_{j′} )

◮ Find an approximate solution to the system:

  A^(ℓ)_{ij} x_i y_j ≈ e_{ijℓ}

◮ Repeat until convergence.
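The E-step above distributes each observation o_ℓ over the (i, j) terms in proportion to the current guess, so the estimated contributions sum back to o_ℓ exactly. A minimal sketch on a hypothetical small instance (not the authors' code):

```python
# Sketch of the E-step: split each observation o_l across the (i, j)
# terms in proportion to the current guess x, y.

def e_step(A, o, x, y):
    """A: list of k matrices (n x m); o: k observations; x, y: current guesses.
    Returns e[l][i][j] = o_l * A^(l)_ij x_i y_j / (x^t A^(l) y)."""
    n, m = len(x), len(y)
    e = []
    for Al, ol in zip(A, o):
        total = sum(Al[i][j] * x[i] * y[j] for i in range(n) for j in range(m))
        e.append([[ol * Al[i][j] * x[i] * y[j] / total for j in range(m)]
                  for i in range(n)])
    return e

# Hypothetical instance: k = 2 experiments, n = 3 sections, m = 2 cell types.
A = [[[1, 2], [0, 1], [1, 0]],
     [[2, 0], [1, 1], [0, 2]]]
o = [3.3, 2.7]
x = [1/3, 1/3, 1/3]
y = [1.0, 1.0]

e = e_step(A, o, x, y)
# Each slice of e sums back to the corresponding observation.
for el, ol in zip(e, o):
    assert abs(sum(v for row in el for v in row) - ol) < 1e-9
```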

SLIDES 33-35

Iterative Proportional Fitting

Want to minimize the Kullback-Leibler divergence of:

A^(ℓ)_{ij} x_i y_j ≈ e_{ijℓ}

Simplify: A_{ij} x_i y_j ≈ e_{ij} for 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Algorithm:

◮ Adjust x̃_i:  x̃_i ← x̃_i · ( ∑_j e_{ij} ) / ( ∑_j A_{ij} x̃_i ỹ_j )

◮ Adjust ỹ_j:  ỹ_j ← ỹ_j · ( ∑_i e_{ij} ) / ( ∑_i A_{ij} x̃_i ỹ_j )

◮ Iterate until convergence.
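Putting the two pieces together gives a minimal end-to-end sketch (a hypothetical tiny instance, not the authors' implementation; folding the E-step into the IPF ratios and summing over j and ℓ for x, and over i and ℓ for y, yields the multiplicative updates below):

```python
# End-to-end EM/IPF sketch on a tiny feasible instance (hypothetical data).
# Updates are multiplicative, so x and y stay positive throughout.

def solve_bilinear(A, o, n, m, iters=5000):
    x = [1.0 / n] * n
    y = [1.0] * m
    k = len(o)
    for _ in range(iters):
        # p_l = x^t A^(l) y: current prediction for each observation.
        p = [sum(A[l][i][j] * x[i] * y[j]
                 for i in range(n) for j in range(m)) for l in range(k)]
        # Multiplicative update for x (E-step folded into the IPF ratio).
        x = [x[i]
             * sum(o[l] / p[l] * sum(A[l][i][j] * y[j] for j in range(m))
                   for l in range(k))
             / sum(sum(A[l][i][j] * y[j] for j in range(m)) for l in range(k))
             for i in range(n)]
        p = [sum(A[l][i][j] * x[i] * y[j]
                 for i in range(n) for j in range(m)) for l in range(k)]
        # Symmetric update for y.
        y = [y[j]
             * sum(o[l] / p[l] * sum(A[l][i][j] * x[i] for i in range(n))
                   for l in range(k))
             / sum(sum(A[l][i][j] * x[i] for i in range(n)) for l in range(k))
             for j in range(m)]
        # Enforce the normalization x_1 + ... + x_n = 1 by rescaling,
        # moving the scale into y so predictions are unchanged.
        s = sum(x)
        x = [xi / s for xi in x]
        y = [yj * s for yj in y]
    return x, y

# Feasible instance built from a known x*, y*, so a solution exists:
# o_l = x*^t A^(l) y* with x* = [0.5, 0.3, 0.2] and y* = [1, 2].
A = [[[1, 2], [0, 1], [1, 0]],
     [[2, 0], [1, 1], [0, 2]],
     [[1, 1], [1, 0], [2, 1]]]
o = [3.3, 2.7, 2.6]

x, y = solve_bilinear(A, o, n=3, m=2)
for l in range(3):
    pred = sum(A[l][i][j] * x[i] * y[j] for i in range(3) for j in range(2))
    assert abs(pred - o[l]) / o[l] < 1e-2
```

Since the system is under-constrained, the recovered x, y need not equal the generating x*, y*; the check is only that all observations are matched.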

SLIDE 36

Back to the Arabidopsis root

Using this algorithm, we estimated the expression profiles of 30,000 transcripts in several hours.

SLIDE 37

Validation

A: reconstructed expression levels. B and C: same transcript visualized using green fluorescent protein (GFP).

SLIDE 38

Generalization: positive root finding

The EM/IPF-based algorithm can be generalized to find exact or approximate positive solutions to polynomial systems of equations:

∑_{α∈S} a_{ℓα} x^α = o_ℓ   for ℓ = 1, . . . , k,

where

◮ S is a finite set of exponent vectors,
◮ the coefficients a_{ℓα} are all non-negative,
◮ the o_ℓ are positive, and
◮ the exponents satisfy a technical condition (it suffices that the system be homogeneous or multi-homogeneous).