Engineering Genetic Circuits Chris J. Myers Lecture 2: Learning - - PowerPoint PPT Presentation



SLIDE 1

Engineering Genetic Circuits

Chris J. Myers

Lecture 2: Learning Models

Chris J. Myers (Lecture 2: Learning Models) Engineering Genetic Circuits 1 / 98

SLIDE 2

Johann Von Neumann

The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.

SLIDE 3

Scott Adams

There are many methods for predicting the future. For example, you can read horoscopes, tea leaves, tarot cards, or crystal balls. Collectively, these methods are known as “nutty methods.” Or you can put well-researched facts into sophisticated computer models, more commonly referred to as “a complete waste of time.”

SLIDE 4

Henri Theil

Models are to be used, not believed.

SLIDE 5

Introduction

The first step of an engineering approach is to use results from experiments to construct a mathematical model for the system of interest. DNA microarrays can measure the expression levels of thousands of mRNA targets simultaneously. Today, hundreds of samples may be run in a microarray experiment. Given this data and an abstract class of potential models for various network configurations, how can we decide the most likely network configuration that generated this data? Even the largest experiments do not provide enough samples for high statistical significance. Current technology has a high noise-to-signal ratio.

SLIDE 6

Overview

Experimental methods
Experimental data
Cluster analysis
Learning Bayesian networks
Learning causal networks
Experimental design

SLIDE 7

Fluorescent Proteins

One of the most direct ways to see what is happening within a genetic circuit is to add a reporter gene that codes for a fluorescent protein. One example is green fluorescent protein (GFP), which comes from a jellyfish and fluoresces green when exposed to blue light. Since fluorescent proteins are typically not harmful to a cell, they allow the experimenter to observe the workings of a genetic circuit in a living cell. The downside is that only a small number of colors is available, making it impossible to observe more than a few genes at a time.

SLIDE 8

Fluorescent Protein Art

(Courtesy of Roger Tsien)

SLIDE 9

Fluorescent Proteins as a Static Reporter

(Courtesy of Guet et al., Science 2002)

SLIDE 10

Fluorescent Proteins as a Dynamic Reporter

(Courtesy of Michael Elowitz)

SLIDE 11

DNA Microarrays

In 1995, DNA microarrays were developed to allow the expression level of thousands of genes to be monitored simultaneously. They have led to an explosion in the amount of biological data generated. DNA microarrays are chips made of glass, plastic, or silicon. These chips have an array of thousands (or tens of thousands) of single-stranded complementary DNA (cDNA) probes of about 20 bases. When strands of mRNA extracted from cells during an experiment are hybridized to complementary probes, they fluoresce.

SLIDE 12

A DNA Microarray Experiment

(Courtesy of National Human Genome Research Institute).

SLIDE 13

Time Series Data

Microarrays are used to measure gene expression levels, which are correlated with the levels of mRNA present. During an experiment, several microarray measurements can be taken at different times to generate time series data. This provides data on changes in expression patterns over time in response to a stimulus. While microarrays allow substantially more genes to be observed than fluorescent proteins do, cells must be destroyed to extract their mRNA. Therefore, time series data describes a population of cells rather than a single cell. Another limitation is that expression level does not precisely track protein levels, since these depend on degradation rates and other factors.

SLIDE 14

Proteomics

There are about 25,000 genes in the human genome, but there may be more than 500,000 proteins in the human proteome. Many transcripts must result in many different proteins through alternative splicing and other post-transcriptional modifications. This further complicates the ability of DNA microarray results to give an accurate estimate of protein levels. The goal of proteomic experimental techniques is to determine the full set of proteins produced in a cell in a given situation.

SLIDE 15

2D Gel Electrophoresis

(Courtesy of http://www.lecb.ncifcrf.gov/2DgelDataSets/)

SLIDE 16

Mass Spectrometry

(Courtesy of United States Geological Survey)

SLIDE 17

Mass Spectrometry Result

(Courtesy of http://en.wikipedia.org/wiki/Protein_mass_spectrometry)

SLIDE 18

Protein and DNA Interactions

Using data from the methods just described, one can infer the connectivity of the genetic circuit that generated this data. Experimental techniques can also be employed to determine which proteins interact as well as which proteins serve as transcription factors for particular genes. Two such techniques:

Two-hybrid screening
ChIP-on-chip, also known as genome-wide location analysis

SLIDE 19

Two-Hybrid System

(Courtesy of http://en.wikipedia.org/wiki/Two-hybrid_screening)

SLIDE 20

Genome-Wide Location Analysis

(Courtesy of Agilent Technologies)

SLIDE 21

Experimental Data

Model learning begins with experimental data, $E$. Each data point in $E$ is a 3-tuple $\langle e, \tau, \nu \rangle$:

$e \in \mathbb{N}$ is a natural number representing the experiment number,

$\tau \in \mathbb{R}$ is the time at which the species values were measured, and

$\nu \in (\mathbb{R} \cup \{L\} \cup \{H\} \cup \{-\})^{|S|}$ is the state of each species $s \in S$. $L$ and $H$ represent that a species is mutated low or high, respectively. '−' represents an unknown value.

$\nu(s)$ denotes the value of species $s$ for that data point. $|E|$ represents the total number of data points within $E$.
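One way to hold such data points in code is sketched below. The class name `DataPoint`, the helper `is_measured`, and the sample values are illustrative assumptions, not part of the lecture; the markers "L", "H", and "-" stand for mutated low, mutated high, and unknown.

```python
from dataclasses import dataclass
from typing import Dict, Union

# A species value is either a measurement or one of the markers
# "L" (mutated low), "H" (mutated high), "-" (unknown).
Value = Union[float, str]

@dataclass
class DataPoint:
    e: int                  # experiment number
    tau: float              # time at which the values were measured
    nu: Dict[str, Value]    # species name -> value

def is_measured(point: DataPoint, s: str) -> bool:
    """True when nu(s) is a real measurement rather than a marker."""
    return isinstance(point.nu[s], (int, float))

# Hypothetical data set E with two points from experiment 1.
E = [
    DataPoint(e=1, tau=0.0, nu={"CI": 0.0, "CII": "L", "N": "-"}),
    DataPoint(e=1, tau=5.0, nu={"CI": 12.0, "CII": "L", "N": 8.0}),
]
```

Here `len(E)` plays the role of $|E|$ and `point.nu[s]` the role of $\nu(s)$.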

SLIDE 22

Experimental Data Points: Example

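[Figure: table of the experimental data points E for experiments e = 1 through e = 20. Each row is one data point: a time τ and the species values ν for CI, CII, CIII, Cro, and N, with L marking species mutated low.]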

SLIDE 23

Bin Assignments

Data is discretized into $n$ bins, where $n$ is typically 3 or 4. Each bin is a range of values $\Phi_j(s) = [\theta_j(s), \theta_{j+1}(s))$, where $j = 0$ to $n-1$ and $\theta(s) = \langle \theta_0(s), \ldots, \theta_n(s) \rangle$ are levels with $\theta_0(s) = 0$ and $\theta_n(s) = \infty$. A bin assignment, $b \in \{0, \ldots, n-1, *\}^{|S|}$, assigns each $s \in S$ to a bin. The notation $b(s)$ indicates the bin assignment for species $s$ in $b$. A bin assignment of '∗' for $s$ indicates that there is no bin assignment to $s$. The bin for a '∗' is defined by $\Phi_*(s) = [0, \infty)$. A bin assignment that includes ∗'s is called a partial bin assignment.
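The mapping from a measured value to its bin can be sketched as follows. The function name `bin_of` is an assumption for illustration; the levels 0, 33, 67, ∞ are the ones shown for CIII later in the lecture.

```python
import bisect
import math

def bin_of(value, levels):
    """Return j such that levels[j] <= value < levels[j + 1].

    `levels` is the list theta_0, ..., theta_n with theta_0 = 0 and
    theta_n = infinity; `value` is assumed to be a finite measurement.
    """
    if value < levels[0]:
        raise ValueError("value lies below the lowest level")
    return bisect.bisect_right(levels, value) - 1

theta_CIII = [0, 33, 67, math.inf]   # three bins: [0,33), [33,67), [67,inf)
```

With these levels, a CIII value of 45 falls in bin 1 and a value of 100 falls in bin 2. The lower bound of each bin is inclusive, so 33 also belongs to bin 1.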

SLIDE 24

Discretize and Encode the Data

[Figure: histogram of the time series data values for CIII (x-axis: expression level; y-axis: number of times seen), shown with the experimental data table for experiments 1 through 20 (columns: Time, CI, CII, CIII, Cro, N).]

SLIDE 25

Discretize and Encode the Data

[Figure: histogram of the time series data values for CIII (x-axis: expression level; y-axis: number of times seen), shown with the experimental data table for experiments 1 through 20 (columns: Time, CI, CII, CIII, Cro, N).]

$\theta(\mathrm{CIII}) = \langle 0, 33, 67, \infty \rangle$

SLIDE 26

Discretize and Encode the Data

[Figure: histogram of the time series data values for CIII (x-axis: expression level; y-axis: number of times seen), shown with the table of bin assignments for experiments 1 through 20 (columns: Time, CI, CII, CIII, Cro, N).]

$\theta(\mathrm{CIII}) = \langle 0, 33, 67, \infty \rangle$

SLIDE 27

Discretize and Encode the Data

[Figure: histogram of the time series data values for CIII (x-axis: expression level; y-axis: number of times seen), shown with the experimental data table for experiments 1 through 20 (columns: Time, CI, CII, CIII, Cro, N).]

$\theta(\mathrm{CIII}) = \langle 0, 7, 31, \infty \rangle$

SLIDE 28

Discretize and Encode the Data

[Figure: histogram of the time series data values for CIII (x-axis: expression level; y-axis: number of times seen), shown with the table of bin assignments for experiments 1 through 20 (columns: Time, CI, CII, CIII, Cro, N).]

$\theta(\mathrm{CIII}) = \langle 0, 7, 31, \infty \rangle$

SLIDE 29

Clustering Algorithms

Often applied to microarray data taken over a variety of conditions or a series of time points. Assume that genes that are active at the same time are likely involved in the same regulatory process. Also assume that genes are grouped and within a group the genes produce the same expression profile. Due to noise and other uncertainties, groupings are not clear. Goal: determine the original groupings of the genes. Assume that there exists a method to determine the pairwise distance between the expression profiles of any two genes. Many algorithms have been proposed for clustering.

SLIDE 30

K-Means

Partitions N genes into K clusters. Begins with K initial clusters, either determined by the user or chosen at random. For each cluster, computes its centroid (i.e., the average expression profile of the genes in the cluster). Reassigns each gene to the cluster whose centroid is closest to the expression pattern of the gene. Centroids are recalculated and the process repeats until there is no change.
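The loop described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance between expression profiles and a user-supplied initial clustering; the function names and the toy profiles are not from the lecture.

```python
def distance(p, q):
    """Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def centroid(profiles):
    """Average expression profile of a non-empty list of profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def k_means(profiles, assignment, max_iter=100):
    """Refine an initial cluster index per gene until nothing changes."""
    k = max(assignment) + 1
    assignment = list(assignment)
    for _ in range(max_iter):
        centroids = {}
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:                  # a cluster that emptied is dropped
                centroids[c] = centroid(members)
        updated = [min(centroids, key=lambda c: distance(p, centroids[c]))
                   for p in profiles]
        if updated == assignment:        # no gene moved: converged
            break
        assignment = updated
    return assignment

# Four hypothetical genes, two time points each, badly clustered at first.
genes = [[0, 0], [0, 1], [10, 10], [10, 11]]
clusters = k_means(genes, [0, 1, 0, 1])
```

For this toy input the first pass already moves the two low-expression genes into one cluster and the two high-expression genes into the other. A random initial assignment would work the same way but can converge to a different partition.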

SLIDE 31

K-Means Example

SLIDE 35

K-Means Clustering: Phage λ Data

$\pi_{K1}$ = {cI}
$\pi_{K2}$ = {B, C, U, V, H, M, L, K, I, J}
$\pi_{K3}$ = {A, D, E, Z, G, T, S, R}
$\pi_{K4}$ = {cro, cII, cIII, N, Q, xis, int, O}

(Data courtesy of Osterhout et al., BMC Microbiology 2007)

SLIDE 36

Agglomerative Hierarchical Clustering

Begins with N clusters each containing a single gene. Combines two clusters with smallest distance apart where distance is between their average expression profiles. Continues for N − 1 steps at which point all the genes are merged into a hierarchical tree.
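A compact sketch of this procedure follows, assuming Euclidean distance between average profiles. The function names and the toy profiles are illustrative, and the merge list stands in for the hierarchical tree.

```python
def distance(p, q):
    """Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def mean_profile(cluster, profiles):
    """Average expression profile of the genes (indices) in a cluster."""
    rows = [profiles[g] for g in cluster]
    return [sum(column) / len(rows) for column in zip(*rows)]

def agglomerate(profiles):
    """Return the N - 1 merges, each a pair of clusters of gene indices."""
    clusters = [(g,) for g in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        # Pick the two clusters whose average profiles are closest.
        a, b = min(pairs, key=lambda ab: distance(
            mean_profile(clusters[ab[0]], profiles),
            mean_profile(clusters[ab[1]], profiles)))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges

genes = [[0, 0], [0, 1], [10, 10], [10, 12]]
tree = agglomerate(genes)
```

Reading `tree` from first to last merge gives the hierarchy bottom-up; cutting it before the last few merges yields a flat clustering.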

SLIDE 37

Hierarchical Clustering: Phage λ Data

$\pi_{H1}$ = {cI}
$\pi_{H2}$ = {cro, cII, cIII, N, Q, xis, int, O}
$\pi_{H3}$ = {S, R}
$\pi_{H4}$ = {A, D, E, Z, G, T, B, C, U, V, H, M, L, K, I, J}

(Data courtesy of Osterhout et al., BMC Microbiology 2007)

SLIDE 38

Clustering Summary

Using clustering results, one can potentially determine which genes produce proteins with similar functions. Clustering results do not shed light on how these genes and their protein products interact.

SLIDE 39

Bayesian Networks

Given expression data, E, learning techniques allow one to infer the network connectivity that best matches E. Bayesian networks are a promising tool to learn connectivity. A Bayesian network represents a joint probability distribution. Represented with directed acyclic graph, G, whose vertices correspond to random variables, X1, . . . , Xn, for gene expression level. Connections represent dependencies between random variables.

SLIDE 40

Dependence

P(X,Y) is joint distribution over two variables X and Y. X and Y are independent if P(X,Y) = P(X)P(Y) for all values of X and Y (equivalently, P(X|Y) = P(X)). When X and Y are dependent, value of Y gives information about X. Correlation is sufficient but not necessary condition for dependence. When X and Y are dependent, this is represented in the Bayesian network by an arc between them. If the arc is directed from X to Y, X is a parent of Y.
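The independence condition can be checked directly on a small joint distribution. The sketch below is illustrative only: the function name, the tolerance, and both example distributions are assumptions.

```python
def is_independent(joint, tol=1e-9):
    """Check P(X, Y) = P(X) P(Y) for every pair of values.

    `joint` maps (x, y) to P(X = x, Y = y).
    """
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p     # marginal P(X = x)
        p_y[y] = p_y.get(y, 0.0) + p     # marginal P(Y = y)
    return all(abs(joint.get((x, y), 0.0) - p_x[x] * p_y[y]) <= tol
               for x in p_x for y in p_y)

# Built as a product of marginals (0.4, 0.6) and (0.7, 0.3): independent.
independent = {(0, 0): 0.28, (0, 1): 0.12, (1, 0): 0.42, (1, 1): 0.18}
# Y always equals X: knowing Y fixes X, so the variables are dependent.
dependent = {(0, 0): 0.5, (1, 1): 0.5}
```

In the second distribution, knowing Y determines X, which is the sense in which the value of Y gives information about X.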

SLIDE 41

Markov Assumption

Associated with each $X_i$ is a conditional distribution, $\theta$, given its parents. Graph $G$ encodes the Markov assumption: each variable $X_i$ is independent of its non-descendants given its parents in $G$. This is known as conditional independence, and graph $G$ implies a set of conditional independence assumptions, $\mathrm{Ind}(G)$. Using the Markov assumption, the joint PDF can be decomposed:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$$

where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$.

SLIDE 42

Simple Bayesian Network

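[Figure: Bayesian network over the nodes A, B, and C with the conditional probability tables below.]

A   P(A)
0   0.5
1   0.5

A   P(B=1|A)   P(B=0|A)
0   0.1        0.9
1   0.2        0.8

B   P(C=1|B)   P(C=0|B)
0   0.75       0.25
1   0.35       0.65

Ind(G) = {A ⊥⊥ C | B}

P(A,B,C) = P(A)P(B|A)P(C|B)

P(A = 1, B = 0, C = 0) = 0.5 × 0.8 × 0.25 = 0.1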
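The factorization can be evaluated directly from the three tables. The sketch below reads the unlabeled rows of the extracted tables as the value 0, which is an assumption; it agrees with the worked value of 0.1 on the slide. The dictionary and function names are illustrative.

```python
# Conditional probability tables for the chain A -> B -> C.
# Rows without a label in the extracted slide are read as value 0.
P_A = {0: 0.5, 1: 0.5}
P_B_given_A = {0: {1: 0.1, 0: 0.9}, 1: {1: 0.2, 0: 0.8}}
P_C_given_B = {0: {1: 0.75, 0: 0.25}, 1: {1: 0.35, 0: 0.65}}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) * P(b | a) * P(c | b)."""
    return P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]

p_100 = joint(1, 0, 0)   # 0.5 * 0.8 * 0.25
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
```

Summing `joint` over all eight assignments gives 1 up to floating-point rounding, a quick sanity check that the tables define a proper distribution.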

SLIDE 43

Another Bayesian Network

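[Figure: Bayesian network over the nodes A, B, C, D, and E with the conditional probability tables below.]

E   P(E)
0   0.6
1   0.4

A   P(A)
0   0.35
1   0.65

A   E   B=1   B=0
0   0   0.1   0.9
0   1   0.8   0.2
1   0   0.7   0.3
1   1   0.4   0.6

B   C=1    C=0
0   0.75   0.25
1   0.35   0.65

A   D=1   D=0
0   0.1   0.9
1   0.8   0.2

A is a common cause of B and D. If A is not measured, it is a hidden common cause.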

SLIDE 44

Another Bayesian Network

[Figure: the same network and conditional probability tables as on the previous slide.]

Ind(G) = {A ⊥⊥ C | B, A ⊥⊥ E, B ⊥⊥ D | A, C ⊥⊥ D | A, C ⊥⊥ D | B, C ⊥⊥ E | B, D ⊥⊥ E}

SLIDE 45

Another Bayesian Network

[Figure: the same network and conditional probability tables as on the previous slide.]

P(A,B,C,D,E) = P(A)P(B|A,E)P(C|B)P(D|A)P(E)

SLIDE 46

Equivalence Classes of Bayesian Networks

More than one graph can imply the same set of independences. Graphs $X \to Y$ and $X \leftarrow Y$ both have $\mathrm{Ind}(G) = \emptyset$.

$G$ and $G'$ are equivalent if $\mathrm{Ind}(G) = \mathrm{Ind}(G')$. Equivalent graphs have the same underlying undirected graph but may disagree on the direction of some edges. An equivalence class is represented by a partially directed graph (PDAG) where edges can be $X \to Y$, $X \leftarrow Y$, or $X - Y$ (undirected).

SLIDE 47

Equivalent Bayesian Networks

[Figure: equivalent Bayesian networks over the nodes A, B, and C.]

SLIDE 48

Equivalent Bayesian Networks

[Figure: equivalent Bayesian networks over the nodes A, B, C, D, and E.]

SLIDE 49

Learning Bayesian Networks

Given a training set of experimental data, $E$, find a network $\langle G, \theta \rangle$ that best matches $E$. Evaluate using the Bayesian scoring metric:

$$P(G \mid E) = \frac{P(E \mid G)\,P(G)}{P(E)}$$

$$\mathrm{Score}(G : E) = \log P(G \mid E) = \log P(E \mid G) + \log P(G) + C$$

where $C = -\log P(E)$ is constant and

$$P(E \mid G) = \int P(E \mid G, \theta)\,P(\theta \mid G)\,d\theta$$

is the marginal likelihood. The choice of priors $P(G)$ and $P(\theta \mid G)$ influences the score.

SLIDE 50

Learning Bayesian Networks (cont)

Given priors and data, learning amounts to finding the structure G that maximizes the score. This is NP-hard, so heuristics like greedy random search are used. For example, beginning with some initial network, a greedy random search would select an edge to add, delete, or reverse. It would then compute this network's score, and if it is better than that of the previous network, it would keep the change. This process is repeated until no improvement is found.
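The search loop can be sketched as below. This is not the lecture's implementation: `score` is a hypothetical stand-in for Score(G : E), graphs are sets of directed edges, and the variant shown tries the single-edge changes in random order and keeps the first one that improves the score.

```python
import itertools
import random

def has_cycle(nodes, edges):
    """Depth-first check for a directed cycle."""
    children = {n: [v for (u, v) in edges if u == n] for n in nodes}
    state = {n: 0 for n in nodes}        # 0 unvisited, 1 on stack, 2 done
    def visit(n):
        state[n] = 1
        for c in children[n]:
            if state[c] == 1 or (state[c] == 0 and visit(c)):
                return True
        state[n] = 2
        return False
    return any(state[n] == 0 and visit(n) for n in nodes)

def neighbours(nodes, edges):
    """Graphs one edge addition, deletion, or reversal away."""
    for u, v in itertools.permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                  # delete
            yield (edges - {(u, v)}) | {(v, u)}     # reverse
        elif (v, u) not in edges:
            yield edges | {(u, v)}                  # add

def greedy_search(nodes, score, start=frozenset(), seed=0):
    rng = random.Random(seed)
    current = frozenset(start)
    best = score(current)
    improved = True
    while improved:
        improved = False
        candidates = list(neighbours(nodes, current))
        rng.shuffle(candidates)
        for g in candidates:
            if not has_cycle(nodes, g) and score(g) > best:
                current, best, improved = g, score(g), True
                break
    return current

# Toy score (assumption): closeness to a known target structure.
target = frozenset({("A", "B"), ("B", "C")})
toy_score = lambda g: -len(g ^ target)
learned = greedy_search(["A", "B", "C"], toy_score)
```

With a real scoring metric the search can stop at a local optimum, which is why restarts or randomized moves are commonly added; the toy score here has no such traps.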

SLIDE 51

Efficient Learning Algorithms

Number of graphs is super-exponential in number of variables. Sparse candidate algorithm identifies small number of candidate parents for each gene based on local statistics. Pitfall is early choices can overly restrict the search space. Adapting the candidate sets during the search can help.

SLIDE 52

Applying Bayesian Networks to Expression Data

By learning Bayesian network, can answer questions like which genes depend on which other genes. Expression level of each gene modeled as a random variable. Need to define local probability model for each variable. Discretize gene expression into 3 categories: significantly lower, similar to, or significantly greater than control. Discretizing can lose information, but more levels can be used if more resolution in experimental data. Control expression level either determined experimentally or the average expression level can be used. Meaning of “significantly” defined by setting threshold to ratio between measured expression and control.

SLIDE 53

Phage λ Discretized Expression Data

cIII  N  cII  cro  cI  Probability
0     0  0    0    0   0.05
0     0  0    0    1   0.18
0     0  0    1    0   0.06
0     0  0    1    1   0.10
0     0  1    0    0   0.0
0     0  1    0    1   0.04
0     0  1    1    0   0.02
0     0  1    1    1   0.02
0     1  0    0    0   0.01
0     1  0    0    1   0.05
0     1  0    1    0   0.05
0     1  0    1    1   0.02
0     1  1    0    0   0.03
0     1  1    0    1   0.0
0     1  1    1    0   0.03
0     1  1    1    1   0.0
1     0  0    0    0   0.0
1     0  0    0    1   0.02
1     0  0    1    0   0.01
1     0  0    1    1   0.0
1     0  1    0    0   0.01
1     0  1    0    1   0.01
1     0  1    1    0   0.01
1     0  1    1    1   0.0
1     1  0    0    0   0.01
1     1  0    0    1   0.01
1     1  0    1    0   0.01
1     1  0    1    1   0.0
1     1  1    0    0   0.20
1     1  1    0    1   0.02
1     1  1    1    0   0.02
1     1  1    1    1   0.0

SLIDE 54

Phage λ Bayesian Network

[Figure: Bayesian network over the nodes N, cIII, cII, cI, and cro with the conditional probability tables below.]

N   P(N)
0   0.54
1   0.46

N   cIII=1   cIII=0
0   0.12     0.88
1   0.58     0.42

N   cIII   cII=1   cII=0
0   0      0.18    0.82
0   1      0.48    0.52
1   0      0.34    0.66
1   1      0.87    0.13

cII   cI=1   cI=0
0     0.66   0.34
1     0.24   0.76

cI   cro=1   cro=0
0    0.43    0.57
1    0.33    0.67

P(cIII,N,cII,cro,cI) = P(N)P(cIII|N)P(cII|N,cIII)P(cI|cII)P(cro|cI)

SLIDE 55

Comparison

cIII  N  cII  cro  cI  Orig  BN
0     0  0    0    0   0.05  0.08
0     0  0    0    1   0.18  0.17
0     0  0    1    0   0.06  0.08
0     0  0    1    1   0.10  0.17
0     0  1    0    0   0.00  0.04
0     0  1    0    1   0.04  0.01
0     0  1    1    0   0.02  0.04
0     0  1    1    1   0.02  0.01
0     1  0    0    0   0.01  0.03
0     1  0    0    1   0.05  0.06
0     1  0    1    0   0.05  0.03
0     1  0    1    1   0.02  0.06
0     1  1    0    0   0.03  0.03
0     1  1    0    1   0.00  0.01
0     1  1    1    0   0.03  0.03
0     1  1    1    1   0.00  0.01
1     0  0    0    0   0.00  0.01
1     0  0    0    1   0.02  0.01
1     0  0    1    0   0.01  0.01
1     0  0    1    1   0.00  0.01
1     0  1    0    0   0.01  0.01
1     0  1    0    1   0.01  0.01
1     0  1    1    0   0.01  0.01
1     0  1    1    1   0.00  0.01
1     1  0    0    0   0.01  0.01
1     1  0    0    1   0.01  0.01
1     1  0    1    0   0.01  0.01
1     1  0    1    1   0.00  0.01
1     1  1    0    0   0.20  0.10
1     1  1    0    1   0.02  0.04
1     1  1    1    0   0.02  0.10
1     1  1    1    1   0.00  0.04

SLIDE 57

Phage λ Bayesian Network

[Figure: Bayesian network over the nodes cIII, N, cII, cI, and cro with the conditional probability tables below.]

cIII   P(cIII)
0      0.58
1      0.42

cIII   N=1    N=0
0      0.29   0.71
1      0.80   0.20

N   cIII   cII=1   cII=0
0   0      0.18    0.82
0   1      0.48    0.52
1   0      0.34    0.66
1   1      0.87    0.13

cII   cI=1   cI=0
0     0.66   0.34
1     0.24   0.76

cI   cro=1   cro=0
0    0.43    0.57
1    0.33    0.67

P(cIII,N,cII,cro,cI) = P(cIII)P(N|cIII)P(cII|N,cIII)P(cI|cII)P(cro|cI)

SLIDE 58

Finding Features

The difficulty is that the data covers thousands of genes but often only a few dozen samples; on the positive side, networks are typically sparse. A set of plausible networks needs to be considered. One may characterize features common to a set of networks. Markov relations: Is Y in the Markov blanket of X? Order relations: Is X an ancestor of Y (or a cause)? Confidence is the likelihood that a feature is actually true:

$$\mathrm{confidence}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$$

where $m$ is the number of potential networks considered, $G_i$ is a potential network, and $f(G_i)$ is 1 if $f$ is a feature of $G_i$ and 0 otherwise. The bootstrap method, which considers multiple subsets of the experimental data, can be used to generate the potential networks.
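The confidence estimate is a simple average over candidate networks. In the sketch below, networks are represented as sets of directed edges and the three candidates are made-up examples. `bootstrap_sample` shows one common resampling step, drawing with replacement; the learner that would turn each resampled data set into a network is left out.

```python
import random

def bootstrap_sample(data, rng):
    """Resample the data points with replacement (same size as the input)."""
    return rng.choices(data, k=len(data))

def confidence(feature, networks):
    """Fraction of the candidate networks in which the feature holds."""
    return sum(1 for g in networks if feature(g)) / len(networks)

# Hypothetical networks, e.g. learned from three bootstrap samples.
networks = [
    {("N", "cIII"), ("N", "cII")},
    {("N", "cII"), ("cII", "cI")},
    {("cIII", "N"), ("cII", "cI")},
]
has_edge_N_cII = lambda g: ("N", "cII") in g
conf_N_cII = confidence(has_edge_N_cII, networks)

resampled = bootstrap_sample(list(range(10)), random.Random(0))
```

Here the edge from N to cII appears in two of the three networks, so its confidence is 2/3.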

SLIDE 59

Bayesian Networks Discussion

Clustering approaches can only find correlations. Bayesian analysis can potentially discover causal relationships and interactions between genes. Probabilistic semantics are good for noisy biological systems. One can focus on extracting features rather than finding a single model. Bayesian analysis can assist with experimental design. Bayesian networks, though, are limited to acyclic graphs. Most (if not all) genetic circuits include feedback control.

SLIDE 60

Dynamic Bayesian Networks

A dynamic Bayesian network (DBN) unrolls the cyclic graph $T$ times. Nodes in the DBN are random variables $X_1^{(t)}, \ldots, X_n^{(t)}$, where $t$ equals 1 to $T$. The joint PDF can be decomposed as follows:

$$P(X_1^{(1)}, \ldots, X_n^{(T)}) = \prod_{t=1}^{T} \prod_{i=1}^{n} P(X_i^{(t)} \mid \mathrm{Pa}(X_i^{(t)}))$$

DBNs require time series experimental data.

SLIDE 61

DBN for the Phage λ Decision Circuit

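$$P(N^{(1)}, cI^{(1)}, cII^{(1)}, cIII^{(1)}, cro^{(1)}, N^{(2)}, cI^{(2)}, cII^{(2)}, cIII^{(2)}, cro^{(2)}) =$$
$$P(N^{(1)})\,P(cI^{(1)})\,P(cII^{(1)})\,P(cIII^{(1)})\,P(cro^{(1)})$$
$$\times\, P(N^{(2)} \mid cro^{(1)}, cI^{(1)})\,P(cI^{(2)} \mid cro^{(1)}, cI^{(1)}, cII^{(1)})\,P(cII^{(2)} \mid N^{(1)}, cI^{(1)}, cIII^{(1)})$$
$$\times\, P(cIII^{(2)} \mid N^{(1)}, cI^{(1)}, cro^{(1)})\,P(cro^{(2)} \mid cI^{(1)}, cro^{(1)})$$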

SLIDE 62

Causal Networks

A Bayesian network represents correlative relationships, but ultimately we are interested in knowing causal relationships. In a causal network, parents are interpreted as immediate causes. Causal Markov assumption states that given values of variable’s immediate causes, it is independent of earlier causes. Causal networks model not only distribution of observations but also effects of interventions. In causal networks, X → Y and Y → X are not equivalent.

SLIDE 63

Learning Causal Networks

DBN approaches typically must perform an expensive global search, and they have difficulty learning networks with tight feedback. The method described here uses local analysis to efficiently learn networks with tight feedback. This method determines the likelihood that a gene’s expression increases in the next time point given the current gene expression levels. These likelihoods are then used to determine influences between the genes in the genetic circuit. Result is a directed graph representation of the genetic circuit.

SLIDE 64

Possible Genetic Circuit Models

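[Figure: possible connections among the species CII, CI, CIII, Cro, and N; there are $8.47 \times 10^{11}$ possible models.]

Number of models = $3^{|S|^2}$, where $|S|$ is the number of species.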
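The count on the slide can be reproduced with a one-line calculation. The function name is illustrative; the formula is the one given above, applied to the five species CI, CII, CIII, Cro, and N.

```python
def num_models(num_species):
    """Number of models: 3 raised to the power |S| squared."""
    return 3 ** (num_species ** 2)

five_species = num_models(5)   # 3 ** 25
```

For five species this gives 847,288,609,443, which is the slide's 8.47 × 10^11. Adding a single species raises the exponent from 25 to 36, which is why exhaustive enumeration is not practical.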

SLIDE 65

Influence Vectors

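[Figure: possible influences on CIII from CI, CII, Cro, and N, with one influence still marked "?".]

Influence vector $i$ for CIII over ⟨CI, CII, CIII, Cro, N⟩: $i = \langle r, n, n, u, a \rangle$

u = unknown
n = no connection
a = activation
r = repression

Act(i) = {N}
Rep(i) = {CI}
Par(i) = {N, CI}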

SLIDE 66

Scoring Influence Vectors

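$$\Gamma = \{(\langle e,\tau,\nu\rangle, \langle e',\tau',\nu'\rangle) \mid \langle e,\tau,\nu\rangle \in E \wedge \langle e',\tau',\nu'\rangle \in E \wedge (e = e') \wedge (\tau < \tau') \wedge \neg\exists \langle e,\tau'',\nu''\rangle \in E.\,(\tau < \tau'') \wedge (\tau'' < \tau')\}$$

$$\mathit{inc}(s) = \{(\langle e,\tau,\nu\rangle, \langle e',\tau',\nu'\rangle) \in \Gamma \mid \nu(s) < \nu'(s)\}$$

$$\mathit{val}(s) = \{(\langle e,\tau,\nu\rangle, \langle e',\tau',\nu'\rangle) \in \Gamma \mid \nu(s) \notin \{L, H, -\} \wedge \nu'(s) \notin \{L, H, -\}\}$$

$$\mathit{bin}(b) = \{(\langle e,\tau,\nu\rangle, \langle e',\tau',\nu'\rangle) \in \Gamma \mid \forall s' \in S.\ \nu(s') \in \Phi_{b(s')}(s')\}$$

$$P(\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b)) = \frac{|\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b)|}{|\Gamma|}$$

$$P(\mathit{val}(s) \cap \mathit{bin}(b)) = \frac{|\mathit{val}(s) \cap \mathit{bin}(b)|}{|\Gamma|}$$

$$P(\mathit{inc}(s) \mid \mathit{val}(s) \cap \mathit{bin}(b)) = \frac{P(\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b))}{P(\mathit{val}(s) \cap \mathit{bin}(b))} = \frac{|\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b)|}{|\mathit{val}(s) \cap \mathit{bin}(b)|}$$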
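These definitions translate into a short counting routine. The sketch below is one possible reading, not the lecture's code: data points are `(e, tau, nu)` tuples with `nu` a dict, measurement times within an experiment are assumed distinct, and a '*' entry is treated as no constraint at all, a simplification under which marker values also pass. All names and the sample data are assumptions.

```python
import math
from collections import defaultdict

MARKERS = {"L", "H", "-"}   # mutated low, mutated high, unknown

def successive_pairs(E):
    """Gamma: pairs of consecutive data points within each experiment."""
    by_experiment = defaultdict(list)
    for e, tau, nu in E:
        by_experiment[e].append((tau, nu))
    gamma = []
    for points in by_experiment.values():
        points.sort(key=lambda point: point[0])
        gamma.extend((first[1], second[1])
                     for first, second in zip(points, points[1:]))
    return gamma

def in_bin(value, j, levels):
    """True when value lies in bin j; '*' places no constraint."""
    if j == "*":
        return True
    return isinstance(value, (int, float)) and levels[j] <= value < levels[j + 1]

def prob_increase(E, s, b, levels):
    """|inc(s) & val(s) & bin(b)| / |val(s) & bin(b)|, or None if empty."""
    inc_count = val_count = 0
    for nu, nu_next in successive_pairs(E):
        if nu[s] in MARKERS or nu_next[s] in MARKERS:
            continue                                 # pair not in val(s)
        if not all(in_bin(nu[sp], b[sp], levels[sp]) for sp in b):
            continue                                 # pair not in bin(b)
        val_count += 1
        inc_count += nu[s] < nu_next[s]              # pair in inc(s)
    return inc_count / val_count if val_count else None

# Hypothetical data for two species.
E = [
    (1, 0, {"N": 10, "CII": 5}), (1, 5, {"N": 20, "CII": 40}),
    (1, 10, {"N": 15, "CII": 50}),
    (2, 0, {"N": "L", "CII": 5}), (2, 5, {"N": "L", "CII": 6}),
]
levels = {"N": [0, 33, 67, math.inf], "CII": [0, 33, 67, math.inf]}
p_low_CII = prob_increase(E, "N", {"N": "*", "CII": 0}, levels)
p_any_CII = prob_increase(E, "N", {"N": "*", "CII": "*"}, levels)
```

Experiment 2 contributes nothing because N is mutated low there, so its pairs fall outside val(N).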

SLIDE 67

Probability of Increase: Example

<*,i,j,*,*>

[Figure: N's probability of increase (%) for each bin i of CII and each bin j of CIII.]

(courtesy of Barker (2007))

SLIDE 68

Probability of Increase: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII))
⟨∗,0,0,∗,∗⟩           40
⟨∗,0,1,∗,∗⟩           49
⟨∗,0,2,∗,∗⟩           70
⟨∗,1,0,∗,∗⟩           58
⟨∗,1,1,∗,∗⟩           42
⟨∗,1,2,∗,∗⟩           38
⟨∗,2,0,∗,∗⟩           66
⟨∗,2,1,∗,∗⟩           38
⟨∗,2,2,∗,∗⟩           26

SLIDE 69

Probability Ratios

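To determine trends, a ratio is formed between two probabilities using two partial bin assignments, $b$ and $b'$:

$$\frac{P(\mathit{inc}(s) \mid \mathit{val}(s) \cap \mathit{bin}(b))}{P(\mathit{inc}(s) \mid \mathit{val}(s) \cap \mathit{bin}(b'))} = \frac{|\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b)|}{|\mathit{val}(s) \cap \mathit{bin}(b)|} \cdot \frac{|\mathit{val}(s) \cap \mathit{bin}(b')|}{|\mathit{inc}(s) \cap \mathit{val}(s) \cap \mathit{bin}(b')|}$$

The partial bin assignment, $b'$, is called the base:

$$b'(s) = \begin{cases} * & \text{if } i(s) = \text{'n'} \\ 0 & \text{if } (i(s) = \text{'a'} \wedge |Rep(i)| \le |Act(i)|) \vee (i(s) = \text{'r'} \wedge |Rep(i)| > |Act(i)|) \\ n - 1 & \text{otherwise} \end{cases}$$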
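The base assignment and the ratios can be sketched as follows. Two things here are assumptions: the value 0 for the middle case of b′(s), which is missing from the extracted slide and inferred from the base ⟨∗,0,0,∗,∗⟩ used in the example, and the influence vector itself, a hypothetical one in which CII and CIII repress N, chosen because it produces that base. The probabilities are the percentages from the example table for N.

```python
def base_assignment(i, n):
    """Base partial bin assignment b' for influence vector i and n bins.

    `i` maps each species to 'a', 'r', or 'n'.
    """
    act = sum(1 for v in i.values() if v == "a")
    rep = sum(1 for v in i.values() if v == "r")
    base = {}
    for s, v in i.items():
        if v == "n":
            base[s] = "*"
        elif (v == "a" and rep <= act) or (v == "r" and rep > act):
            base[s] = 0          # assumed value, see lead-in
        else:
            base[s] = n - 1
    return base

# Hypothetical vector: CII and CIII repress N.
i_N = {"CI": "n", "CII": "r", "CIII": "r", "Cro": "n", "N": "n"}
base = base_assignment(i_N, 3)

# P(N increases | b(CII), b(CIII)) in percent, keyed by (b(CII), b(CIII)).
prob = {(0, 0): 40, (0, 1): 49, (0, 2): 70,
        (1, 0): 58, (1, 1): 42, (1, 2): 38,
        (2, 0): 66, (2, 1): 38, (2, 2): 26}
base_key = (base["CII"], base["CIII"])
ratios = {b: p / prob[base_key] for b, p in prob.items() if b != base_key}
```

Each ratio divides a bin's probability by the base probability of 40, for example 70 / 40 = 1.75 for ⟨∗,0,2,∗,∗⟩.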

SLIDE 70

Probability Ratios: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII))   Ratio
⟨∗,0,0,∗,∗⟩           40
⟨∗,0,1,∗,∗⟩           49                        1.23
⟨∗,0,2,∗,∗⟩           70                        1.75
⟨∗,1,0,∗,∗⟩           58                        1.45
⟨∗,1,1,∗,∗⟩           42                        1.05
⟨∗,1,2,∗,∗⟩           38                        0.95
⟨∗,2,0,∗,∗⟩           66                        1.65
⟨∗,2,1,∗,∗⟩           38                        0.95
⟨∗,2,2,∗,∗⟩           26                        0.65

SLIDE 71

Scoring Influence Vectors

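More activating influences (i.e., $|Rep(i)| \le |Act(i)|$): a ratio below $T_r$ is a vote against, a ratio between $T_r$ and $T_a$ is neutral, and a ratio above $T_a$ is a vote for, where $T_r < 1 < T_a$.

More repressing influences (i.e., $|Rep(i)| > |Act(i)|$): a ratio below $T_r$ is a vote for, a ratio between $T_r$ and $T_a$ is neutral, and a ratio above $T_a$ is a vote against.

The final score is determined using the following equation:

$$\mathrm{score} = \frac{v_f - v_a}{v_f + v_a + v_n}$$

A score greater than zero indicates support for the vector, while a negative score indicates that there is no support for the vector.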
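The voting and scoring step can be sketched as below. The threshold values `T_A = 1.15` and `T_R = 0.75` are assumptions picked so that the votes match the example table for N; the lecture text does not state them. The ratios are the ones from that table, scored under the "more repressing influences" rule.

```python
T_A = 1.15   # assumed activation threshold
T_R = 0.75   # assumed repression threshold

def vote(ratio, more_activating, t_a=T_A, t_r=T_R):
    """Classify one ratio as a vote for ('vf'), against ('va'), or neutral ('vn')."""
    if t_r <= ratio <= t_a:
        return "vn"
    is_high = ratio > t_a
    # High ratios support activation; low ratios support repression.
    return "vf" if is_high == more_activating else "va"

def score(votes):
    """(vf - va) / (vf + va + vn)."""
    vf, va, vn = votes.count("vf"), votes.count("va"), votes.count("vn")
    return (vf - va) / (vf + va + vn)

ratios = [1.23, 1.75, 1.45, 1.05, 0.95, 1.65, 0.95, 0.65]
votes = [vote(r, more_activating=False) for r in ratios]
final = score(votes)
```

With one vote for, four against, and three neutral, the score is (1 − 4) / 8 = −0.375, so this data does not support the vector being scored.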

SLIDE 72

Scoring Influence Vectors: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII))   Ratio   Vote
⟨∗,0,0,∗,∗⟩           40
⟨∗,0,1,∗,∗⟩           49                        1.23    va
⟨∗,0,2,∗,∗⟩           70                        1.75    va
⟨∗,1,0,∗,∗⟩           58                        1.45    va
⟨∗,1,1,∗,∗⟩           42                        1.05    vn
⟨∗,1,2,∗,∗⟩           38                        0.95    vn
⟨∗,2,0,∗,∗⟩           66                        1.65    va
⟨∗,2,1,∗,∗⟩           38                        0.95    vn
⟨∗,2,2,∗,∗⟩           26                        0.65    vf

SLIDE 73

Scoring with a Control Set

When scoring an influence vector, $i$, for species $s$, the probability of increase can be influenced by the level of $s$. Similarly, when comparing two influence vectors, $i$ and $i'$, it is useful to control for the species in $i'$ when evaluating the score for $i$, and vice versa. In both cases, the bins can be partitioned using a control set, $G$. Now consider all assignments to species in $Par(i) \cup G$. The base bin assignment agrees with the values in $b$ for each member of $G$:

$$b'(s) = \begin{cases} b(s) & \text{if } s \in G \\ * & \text{if } i(s) = \text{'n'} \\ 0 & \text{if } (i(s) = \text{'a'} \wedge |Rep(i)| \le |Act(i)|) \vee (i(s) = \text{'r'} \wedge |Rep(i)| > |Act(i)|) \\ n - 1 & \text{otherwise} \end{cases}$$

SLIDE 74

Scoring with a Control Set: Example

[Figure: N's probability of increase (%) for each bin i of CII and each bin j of CIII, in separate panels for the bins of N in the control set, including ⟨∗,i,j,∗,0⟩ and ⟨∗,i,j,∗,2⟩.]

(courtesy of Barker (2007))

SLIDE 75

Scoring with a Control Set: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII),b(N))   Ratio   Vote
⟨∗,0,0,∗,0⟩           40                             base
⟨∗,0,1,∗,0⟩           58                             1.45    va
⟨∗,0,2,∗,0⟩           83                             2.08    va
⟨∗,1,0,∗,0⟩           67                             1.66    va
⟨∗,1,1,∗,0⟩           55                             1.37    va
⟨∗,1,2,∗,0⟩           59                             1.47    va
⟨∗,2,0,∗,0⟩           100                            2.5     va
⟨∗,2,1,∗,0⟩           44                             1.09    vn
⟨∗,2,2,∗,0⟩           36                             0.90    vn

SLIDE 76

Scoring with a Control Set: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII),b(N))   Ratio   Vote
⟨∗,0,0,∗,1⟩           55                             base
⟨∗,0,1,∗,1⟩           40                             0.72    vf
⟨∗,0,2,∗,1⟩           50                             0.90    vn
⟨∗,1,0,∗,1⟩           54                             0.98    vn
⟨∗,1,1,∗,1⟩           37                             0.67    vf
⟨∗,1,2,∗,1⟩           41                             0.75    vn
⟨∗,2,0,∗,1⟩           0                              0.00    vf
⟨∗,2,1,∗,1⟩           42                             0.76    vn
⟨∗,2,2,∗,1⟩           27                             0.49    vf

SLIDE 77

Scoring with a Control Set: Example

⟨CI,CII,CIII,Cro,N⟩   P(N↑ | b(CII),b(CIII),b(N))   Ratio   Vote
⟨∗,0,0,∗,2⟩           27                             base
⟨∗,0,1,∗,2⟩           22                             0.81    vn
⟨∗,0,2,∗,2⟩           50                             1.85    va
⟨∗,1,0,∗,2⟩           30                             1.11    vn
⟨∗,1,1,∗,2⟩           28                             1.04    vn
⟨∗,1,2,∗,2⟩           28                             1.04    vn
⟨∗,2,0,∗,2⟩           100                            3.70    va
⟨∗,2,1,∗,2⟩           30                             1.11    vn
⟨∗,2,2,∗,2⟩           24                             0.88    vn

SLIDE 78

Learning Influences Overview

Select initial influence vector set.
Combine influence vectors.
Compete influence vectors.

SLIDE 79

Unknown Influence Vector

[Figure: CIII with unknown influences ("?") from CI, CII, Cro, and N.]

Influence vector for CIII over ⟨CI, CII, CIII, Cro, N⟩: ⟨u, u, n, u, u⟩

SLIDE 80

Selecting Initial Influence Vectors: CI ⊣ CIII

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn
CII   nrnnn
Cro   nnnrn
N     nnnnr
CI    annnn
CII   nannn
Cro   nnnan
N     nnnna

[Figure: bar chart of CIII’s rising probability (%) versus CI level (0, 1, 2), shown for CIII at level 0.]

           CI L0    CI L1    CI L2    L1 / L0    L2 / L0
CIII L0    19.0%    1.7%     1.0%     0.09       0.05
CIII L1    17.1%    2.6%     1.2%     0.15       0.07
CIII L2    11.6%    2.7%     1.1%     0.23       0.09

slide-94
SLIDE 94

Selecting Initial Influence Vectors: CI ⊣ CIII

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn         6
CII   nrnnn
Cro   nnnrn
N     nnnnr
CI    annnn
CII   nannn
Cro   nnnan
N     nnnna

[Figure: bar chart of CIII’s rising probability (%) versus CI level (0, 1, 2), shown for CIII at level 2.]

           CI L0    CI L1    CI L2    L1 / L0    L2 / L0
CIII L0    19.0%    1.7%     1.0%     0.09       0.05
CIII L1    17.1%    2.6%     1.2%     0.15       0.07
CIII L2    11.6%    2.7%     1.1%     0.23       0.09

Each of the six ratios (L1 / L0 and L2 / L0 at each CIII level) is far below 1, so each casts one vote for CI ⊣ CIII.
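A sketch of how the votes might be tallied from a probability table like the one above: each CIII level contributes the ratios L1 / L0 and L2 / L0, and each ratio casts one vote. The thresholds 0.75 and 1.15 and the function name `tally_votes` are assumptions for illustration; they are not stated on this slide.

```python
def tally_votes(table, t_r=0.75, t_a=1.15):
    """Count (vf, va, vn) for a repression hypothesis.

    `table` maps each level of the target to its rising probabilities at
    levels 0, 1, 2 of the candidate parent. Thresholds are assumed values.
    """
    vf = va = vn = 0
    for l0, *higher in table.values():
        for p in higher:
            ratio = p / l0          # L1 / L0, then L2 / L0
            if ratio <= t_r:
                vf += 1             # parent high, target rises less: vote for
            elif ratio >= t_a:
                va += 1             # parent high, target rises more: vote against
            else:
                vn += 1             # no clear change: neutral vote
    return vf, va, vn


# CIII's rising probability (%) at CI levels 0, 1, 2, per CIII level.
ci_table = {
    "CIII L0": (19.0, 1.7, 1.0),
    "CIII L1": (17.1, 2.6, 1.2),
    "CIII L2": (11.6, 2.7, 1.1),
}
print(tally_votes(ci_table))  # (6, 0, 0)
```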

slide-95
SLIDE 95

Selecting Initial Influence Vectors: CII ⊣ CIII

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn         6
CII   nrnnn         2     3
Cro   nnnrn
N     nnnnr
CI    annnn
CII   nannn
Cro   nnnan
N     nnnna

[Figure: bar chart of CIII’s rising probability (%) versus CII level (0, 1, 2), shown for CIII at level 0.]

           CII L0   CII L1   CII L2   L1 / L0    L2 / L0
CIII L0    3.1%     13.7%    -        4.32       -
CIII L1    4.4%     7.4%     12.6%    1.65       2.83
CIII L2    19.4%    5.5%     5.8%     0.28       0.35

slide-96
SLIDE 96

Selecting Initial Influence Vectors: Cro ⊣ CIII

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn         6
CII   nrnnn         2     3
Cro   nnnrn         6
N     nnnnr
CI    annnn
CII   nannn
Cro   nnnan
N     nnnna

[Figure: bar chart of CIII’s rising probability (%) versus Cro level (0, 1, 2), shown for CIII at level 2.]

           Cro L0   Cro L1   Cro L2   L1 / L0    L2 / L0
CIII L0    11.55%   1.84%    1.47%    0.16       0.13
CIII L1    14.20%   4.74%    3.10%    0.33       0.22
CIII L2    9.70%    5.02%    4.17%    0.52       0.43

slide-97
SLIDE 97

Selecting Initial Influence Vectors: N ⊣ CIII

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn         6
CII   nrnnn         2     3
Cro   nnnrn         6
N     nnnnr         5           1
CI    annnn
CII   nannn
Cro   nnnan
N     nnnna

[Figure: bar chart of CIII’s rising probability (%) versus N level (0, 1, 2), shown for CIII at level 1.]

           N L0     N L1     N L2     L1 / L0    L2 / L0
CIII L0    5.4%     2.9%     3.7%     0.53       0.68
CIII L1    9.3%     7.2%     6.7%     0.78       0.71
CIII L2    8.6%     6.4%     6.1%     0.75       0.71

slide-98
SLIDE 98

Selecting Initial Influence Vectors: Others

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn
CI    rnnnn         6     0     0
CII   nrnnn         2     3     0
Cro   nnnrn         6     0     0
N     nnnnr         5     0     1
CI    annnn         0     6     0
CII   nannn         3     2     0
Cro   nnnan         0     6     0
N     nnnna         0     5     1

slide-99
SLIDE 99

Selecting Initial Influence Vectors: Scoring

Score = (vf − va) / (vf + va + vn)

[Scale: scores run from 1.0 down to −1.0, with the Keep/Discard boundary marked at 0.75.]

CIII’s Influence Vector Scores

Influence Vector    vf    va    vn    Score
CI    rnnnn         6     0     0      1.0
CII   nrnnn         2     3     0     −0.2
Cro   nnnrn         6     0     0      1.0
N     nnnnr         5     0     1      0.833
CI    annnn         0     6     0     −1.0
CII   nannn         3     2     0      0.2
Cro   nnnan         0     6     0     −1.0
N     nnnna         0     5     1     −0.833

slide-101
SLIDE 101

Selecting Initial Influence Vectors: Scoring

Score = (vf − va) / (vf + va + vn)

[Scale: scores run from 1.0 down to −1.0, with the Keep/Discard boundary marked at 0.75.]

CIII’s Influence Vector Scores (after discarding)

Influence Vector    vf    va    vn    Score
CI    rnnnn         6     0     0     1.0
Cro   nnnrn         6     0     0     1.0
N     nnnnr         5     0     1     0.833
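The scoring and filtering step can be sketched as follows. The score formula and vote counts come from the slides; the boundary value 0.75 is read off the slide's Keep/Discard scale, and treating it as an inclusive cut-off is an assumption, as are the names `score` and `KEEP_THRESHOLD`.

```python
def score(vf, va, vn):
    """Score = (vf - va) / (vf + va + vn), as on the scoring slide."""
    return (vf - va) / (vf + va + vn)


# Boundary from the slide's Keep/Discard scale; inclusive comparison is assumed.
KEEP_THRESHOLD = 0.75

# (species, influence vector) -> (vf, va, vn) for target CIII
votes = {
    ("CI", "rnnnn"): (6, 0, 0),
    ("CII", "nrnnn"): (2, 3, 0),
    ("Cro", "nnnrn"): (6, 0, 0),
    ("N", "nnnnr"): (5, 0, 1),
    ("CI", "annnn"): (0, 6, 0),
    ("CII", "nannn"): (3, 2, 0),
    ("Cro", "nnnan"): (0, 6, 0),
    ("N", "nnnna"): (0, 5, 1),
}

scores = {vec: score(*counts) for vec, counts in votes.items()}
kept = [vec for vec, s in scores.items() if s >= KEEP_THRESHOLD]
print(kept)
```

With these counts the sketch keeps the three repression vectors for CI, Cro, and N, matching the reduced table on the slide.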

slide-102
SLIDE 102

Combining Influence Vectors: {CI,Cro} ⊣ CIII

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
CI        rnnnn        6     0     0     1.0
Cro       nnnrn        6     0     0     1.0
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        1

[Figure: CIII’s probability of increase (%) versus CI and Cro levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]

slide-110
SLIDE 110

Combining Influence Vectors: {CI,Cro} ⊣ CIII

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
CI        rnnnn        6     0     0     1.0
Cro       nnnrn        6     0     0     1.0
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        24    0     0     1.0

[Figure: CIII’s RPD versus CI and Cro levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]

slide-111
SLIDE 111

Combining Influence Vectors: {CI,N} ⊣ CIII

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
CI        rnnnn        6     0     0     1.0
Cro       nnnrn        6     0     0     1.0
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        24    0     0     1.0
CI, N     rnnnr        21    0     3     0.875

[Figure: CIII’s RPD versus N and CI levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]

slide-113
SLIDE 113

Combining Influence Vectors: {Cro,N} ⊣ CIII

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
CI        rnnnn        6     0     0     1.0
Cro       nnnrn        6     0     0     1.0
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        24    0     0     1.0
Cro, N    nnnrr        23    0     1     0.958

[Figure: CIII’s RPD versus N and Cro levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]

slide-115
SLIDE 115

Combining Influence Vectors: Remove Subsets

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
CI        rnnnn        6     0     0     1.0
Cro       nnnrn        6     0     0     1.0
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        24    0     0     1.0

[Figure: CIII’s RPD versus N and Cro levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]

slide-116
SLIDE 116

Combining Influence Vectors: Remove Subsets

Proteins produced from separate genes can work together to influence genes.

1. Consider influence vectors in the set two at a time.
2. Merge if their combined score outweighs their individual scores.
3. Remove subsets.

Influence Vector       vf    va    vn    Score
N         nnnnr        5     0     1     0.833
CI, Cro   rnnrn        24    0     0     1.0

[Figure: CIII’s RPD versus N and Cro levels (0, 1, 2), with panels for CIII at 0 and CIII at 2.]
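The three combining steps above can be sketched as follows, using the scores from the tables. The merge test used here (combined score at least as large as both individual scores) is an assumption chosen because {CI, Cro} is merged at a tie of 1.0 while {CI, N} and {Cro, N} are not; the function name `combine` is hypothetical.

```python
from itertools import combinations

# Scores from the slides; keys are the sets of repressing species.
single = {
    frozenset({"CI"}): 1.0,
    frozenset({"Cro"}): 1.0,
    frozenset({"N"}): 0.833,
}
combined = {
    frozenset({"CI", "Cro"}): 1.0,
    frozenset({"CI", "N"}): 0.875,
    frozenset({"Cro", "N"}): 0.958,
}


def combine(single, combined):
    """Merge pairs, then remove vectors that are subsets of a merged vector."""
    result = dict(single)
    # Steps 1 and 2: consider vectors two at a time and merge when the
    # combined score is at least both individual scores (assumed rule).
    for a, b in combinations(single, 2):
        pair = a | b
        if pair in combined and combined[pair] >= max(single[a], single[b]):
            result[pair] = combined[pair]
    # Step 3: drop any vector that is a proper subset of another kept vector.
    return {v: s for v, s in result.items()
            if not any(v < other for other in result)}


merged = combine(single, combined)
print(sorted(sorted(v) for v in merged))
```

Under these assumptions the result is the pair {CI, Cro} plus the single vector for N, which is the set shown after subsets are removed.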

slide-117
SLIDE 117

Competing Influence Vectors

Set levels of species that activate or repress to reduce correlation effects of unrelated genes (like a mutational experiment).
Select a pair of influence vectors and obtain votes for both vectors in the combined state space.
Discard the influence vector with the most neutral score.

Influence Vector       vf    va    vn    Score
N         nnnnr        12    27    15    0.27
CI, Cro   rnnrn

[Figure: CIII’s probability of increase (%) versus N level (0, 1, 2), for the states <0,∗,0,0,j> and <2,∗,2,2,j>.]

slide-118
SLIDE 118

Competing Influence Vectors

Set levels of species that activate or repress to reduce correlation effects of unrelated genes (like a mutational experiment).
Select a pair of influence vectors and obtain votes for both vectors in the combined state space.
Discard the influence vector with the most neutral score.

Influence Vector       vf    va    vn    Score
N         nnnnr        12    27    15    0.27
CI, Cro   rnnrn        71    0     1     0.98

[Figure: CIII’s RPD versus CI and Cro levels (0, 1, 2), with panels for N, CIII at <0,0> and N, CIII at <2,2>.]

slide-119
SLIDE 119

Competing Influence Vectors

Set levels of species that activate or repress to reduce correlation effects of unrelated genes (like a mutational experiment).
Select a pair of influence vectors and obtain votes for both vectors in the combined state space.
Discard the influence vector with the most neutral score.

Influence Vector       vf    va    vn    Score
N         nnnnr        12    27    15    0.27
CI, Cro   rnnrn        71    0     1     0.98

[Figure: network over N, CII, CI, CIII, and Cro.]

slide-121
SLIDE 121

Competing Influence Vectors

Set levels of species that activate or repress to reduce correlation effects of unrelated genes (like a mutational experiment).
Select a pair of influence vectors and obtain votes for both vectors in the combined state space.
Discard the influence vector with the most neutral score.

Influence Vector       vf    va    vn    Score
CI, Cro   rnnrn        71    0     1     0.98

[Figure: network over N, CII, CI, CIII, and Cro.]
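A minimal sketch of the final discard step, assuming that "most neutral" means the score closest to zero in absolute value. The function name `compete` and the string labels are illustrative.

```python
def compete(scores):
    """Return (survivors, discarded) for a pair of competing influence vectors.

    The vector whose score is closest to zero is treated as the most neutral
    one and is discarded (an assumed reading of the slide).
    """
    discarded = min(scores, key=lambda vec: abs(scores[vec]))
    survivors = {vec: s for vec, s in scores.items() if vec != discarded}
    return survivors, discarded


# Scores obtained in the combined state space, as listed on the slide.
scores = {"N: nnnnr": 0.27, "CI, Cro: rnnrn": 0.98}
survivors, discarded = compete(scores)
print(discarded)
```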

slide-122
SLIDE 122

Final Result

[Figure: learned network over N, CII, CI, CIII, and Cro.]

Influence vectors (rows: influenced gene; columns: influencing species)

        CI    CII   CIII  Cro   N
CI      n     a     n     n     n
CII     r     n     n     n     n
CIII    r     n     n     r     n
Cro     r     n     n     n     n
N       r     n     n     r     n
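To read the final result as a list of arcs, one could unpack each influence vector as in the sketch below. It assumes each row of the matrix is the influence vector of the gene named on that row, with positions ordered CI, CII, CIII, Cro, N; the names `SPECIES`, `KIND`, and `arcs` are hypothetical.

```python
SPECIES = ["CI", "CII", "CIII", "Cro", "N"]

# One influence vector per target gene; position i refers to SPECIES[i].
final = {
    "CI": "nannn",
    "CII": "rnnnn",
    "CIII": "rnnrn",
    "Cro": "rnnnn",
    "N": "rnnrn",
}

KIND = {"a": "activates", "r": "represses"}


def arcs(vectors, species=SPECIES):
    """List (source, kind, target) for every non-neutral entry."""
    return [
        (src, KIND[code], target)
        for target, vector in vectors.items()
        for src, code in zip(species, vector)
        if code in KIND
    ]


for src, kind, target in arcs(final):
    print(f"{src} {kind} {target}")
```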

slide-123
SLIDE 123

Experimental Design

Learning method considers and discards many potential models.
Alternative models may actually be correct and mistakenly discarded.
It may be useful to design an experiment that would provide data to either support the selected model or potentially an alternative model.

slide-124
SLIDE 124

Alternative Models

[Figure: alternative network models relating species A, B, and C through promoter P1.]

slide-125
SLIDE 125

Evaluation of Causal Network Learning Method

[Figure: design-flow diagram organized into Modeling, Analysis, and Synthesis. Labels: Biological Knowledge, Construct Experiments, Set of Experiments, Perform Experiments, Experimental Data, Learn Model, SBML Model, Models, Abstraction/Simulation, Simulation Data, Logic Equations, HDL, ATACS, TechMap, Library, Genetic Circuit, DNA Sequence, Construct Plasmid, Plasmid, Insert into Host.]

slide-126
SLIDE 126

Evaluation of Causal Network Learning Method

[Figure: design-flow diagram organized into Modeling, Analysis, and Synthesis, with the experimental path replaced by simulation. Labels: Biological Knowledge, Construct Simulations, Set of Simulations, Perform Simulations, Simulation Data, Learn Model, SBML Model, Models, Abstraction/Simulation, Logic Equations, HDL, ATACS, TechMap, Library, Genetic Circuit, DNA Sequence, Construct Plasmid, Plasmid, Insert into Host.]

slide-127
SLIDE 127

Genetic Circuits Analyzed

[Figure: example 20-gene network drawn over genes A-M (note: the 7 genes N-T are unconnected) and example 10-gene network over genes A-J.]

48 networks inspired by Guet et al.
10 random 10-gene networks
10 Yu et al. 20-gene networks

slide-128
SLIDE 128

Recall Results

[Figure: scatter plot of recall, GeneNet versus Yu’s DBN tool, both axes from 0 to 1.]

BioSim wins in 57 and ties in 7 of the 68 cases.

slide-130
SLIDE 130

Precision Results

[Figure: scatter plot of precision, GeneNet versus Yu’s DBN tool, both axes from 0 to 1.]

BioSim wins in 17 and ties in 13 of the 47 cases.
Note that Yu’s method found no arcs in 21 cases.
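The slides do not define recall and precision, so the sketch below assumes the standard definitions over sets of arcs: recall is the fraction of true arcs that were learned, and precision is the fraction of learned arcs that are true. The function name and the toy arcs are hypothetical.

```python
def recall_precision(true_arcs, learned_arcs):
    """Return (recall, precision) for two sets of (source, target) arcs.

    A value is None when its denominator is zero, for example precision
    when the learner reports no arcs at all.
    """
    correct = len(true_arcs & learned_arcs)
    recall = correct / len(true_arcs) if true_arcs else None
    precision = correct / len(learned_arcs) if learned_arcs else None
    return recall, precision


# Toy example (not from the slides): two true arcs, two learned arcs, one shared.
true_arcs = {("A", "B"), ("B", "C")}
learned_arcs = {("A", "B"), ("C", "A")}
print(recall_precision(true_arcs, learned_arcs))  # (0.5, 0.5)
```

Under this definition precision is undefined when no arcs are found, which is consistent with the precision comparison covering 47 cases after the 21 cases in which Yu’s method found no arcs are set aside.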

slide-132
SLIDE 132

Runtime Results


slide-133
SLIDE 133

Sources

Experimental methods:
  Berg et al. (2002), Watson et al. (2003), and Alberts et al. (2002).
  http://en.wikipedia.org.

Cluster analysis:
  Idea of cluster analysis - Tryon (1939)
  K-means clustering - MacQueen (1967)
  Hierarchical clustering - Johnson (1967)
  Cluster analysis of gene expression data - Tavazoie et al. (1999), Eisen et al. (1998), D’haeseleer et al. (2000), etc.

Learning Bayesian networks:
  Tutorial - Heckerman (1996)
  Application to biological systems - Friedman et al. (2000), Hartemink et al. (2001), Pe’er (2005), Sachs et al. (2005), etc.
  DBNs - Ong et al. (2002), Husmeier et al. (2003), Nachman et al. (2004), Yu et al. (2004), Bernard and Hartemink (2005), Beal et al. (2005), etc.

Learning causal networks:
  Barker et al. (2006, 2007, 2010).