
SLIDE 1

Experiments Data analysis Summary and conclusions

Stochastic modelling of genome-wide robotic screens for genetic interaction in budding yeast

Darren Wilkinson

http://tinyurl.com/darrenjw

School of Mathematics & Statistics, Newcastle University, UK

Solving big data challenges

ICMS, Edinburgh, 5th–8th May, 2015

Darren Wilkinson — ICMS, 6/5/2015 Genetic interaction in budding yeast

SLIDE 2

Overview

Background: Budding yeast as a model for genetics

High-throughput robotic genetic experiments

Image analysis and data processing

Stochastic modelling of growth curves

Hierarchical modelling of genetic interaction

Summary and conclusions

Joint work with Jonathan Heydari, Conor Lawless and David Lydall (and others in the “Lydall lab”)

SLIDE 3

Lydall lab HTP yeast SGA robotic screens

Yeast Lab

David Lydall’s (budding) yeast lab uses a range of high-throughput (HTP) technologies for genome-wide screening for interactions relevant to DNA damage response and repair pathways, with a particular emphasis on telomere maintenance

Much of this work centres around the use of robotic protocols in conjunction with genome-wide knockout libraries and synthetic genetic array (SGA) technology to screen for genetic interactions with known telomere maintenance genes

Quantitative fitness analysis (QFA) is the term we use for our system of robotic image capture, data handling, image analysis and data modelling

SLIDE 4

Basic structure of an experiment

1. Introduce a mutation (such as cdc13-1) into an SGA query strain, and then use SGA technology (and a robot) to cross this strain with the single deletion library in order to obtain a new library of double mutants

2. Inoculate the strains into liquid media, grow up to saturation, then spot back on to solid agar 4 times

3. Incubate the 4 different copies at different temperatures (treatments), and image the plates multiple times to see how quickly the different strains are growing

4. Repeat steps 2 and 3 four times (to get some idea of experimental variation)

5. Repeat steps 2 to 4 with a “control” library that does not include the query mutation

SLIDE 5

Some numbers relating to an experiment

Initial SGA work (introducing mutations into the query and the library) takes around 1 month of calendar time, and several days of robot time

The inoculation, spotting and imaging of the 8 repeats takes 1 month of calendar time, and around 2 weeks of robot time

The experiment uses around £5,000 of consumables (plastics and media)

The library is distributed across 72 96-well plates or 18 solid agar plates (in 384 format, or 1536 in quadruplicate)

If each plate is imaged 30 times, there will be around 35k high-resolution photographs of plates in 384 format, corresponding to around 13 million colony growth measurements (400k time series)

This is big data!
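The orders of magnitude quoted on this slide can be checked with a few lines of arithmetic. This is only a sanity check on the rounded figures given above (35k photographs, 384 colonies per plate, 30 images per plate); the variable names are illustrative.

```python
# Rough sanity check of the data volumes quoted on the slide.
# All inputs are the rounded figures from the slide itself.
photographs = 35_000        # "around 35k" plate images
colonies_per_plate = 384    # 384-format plates
images_per_plate = 30       # each plate imaged 30 times

measurements = photographs * colonies_per_plate
time_series = measurements // images_per_plate

# The library occupies the same number of positions in either layout.
wells_96 = 72 * 96
spots_384 = 18 * 384

print(f"{measurements:,} colony measurements")   # of the order of 13 million
print(f"{time_series:,} time series")            # of the order of 400k
print(wells_96 == spots_384)
```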

SLIDE 6

Data analysis pipeline Growth curve modelling Modelling genetic interaction

Data analysis pipeline

Image processing (from images to colony size measurements)

Fitness modelling (from colony size growth curves to strain fitness measures)

Modelling genetic interaction (from strain fitness measures to identification of genetically interacting strains, ranked by effect size)

Possible to carry out the three stages separately, but benefits to joint modelling through borrowed strength and proper propagation of uncertainty. Not practical to integrate the image processing step into the joint model, but possible to jointly model the second two stages.

SLIDE 7

Growth curve

[Figure: growth curves for his3Δ and htz1Δ, panels A and B. Axes: Time since inoculation (h), Normalised cell density (AU)]

SLIDE 8

Growth curve modelling

We want something between a simple smoothing of the data and a detailed model of yeast cell growth and division

Logistic growth models are ideal: simple semi-mechanistic models with interpretable parameters related to strain fitness

Basic deterministic model:

$$\frac{dx}{dt} = r x (1 - x/K),$$

subject to initial condition $x = P$ at $t = 0$

$r$ is the growth rate and $K$ is the carrying capacity

Analytic solution:

$$x(t) = \frac{K P e^{rt}}{K + P(e^{rt} - 1)}$$
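A minimal sketch of the deterministic model, checking numerically that the analytic solution satisfies the ODE. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

def logistic(t, r, K, P):
    """Analytic solution x(t) of dx/dt = r x (1 - x/K) with x(0) = P."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))

def logistic_rhs(x, r, K):
    """Right-hand side of the logistic ODE."""
    return r * x * (1.0 - x / K)

# Hypothetical parameter values, chosen only for illustration.
r, K, P = 3.0, 0.15, 0.001

# Compare a central finite difference of the solution with the ODE right-hand side.
t, h = 1.5, 1e-5
numeric_slope = (logistic(t + h, r, K, P) - logistic(t - h, r, K, P)) / (2 * h)
model_slope = logistic_rhs(logistic(t, r, K, P), r, K)
print(numeric_slope, model_slope)
```

The two printed slopes should agree to several decimal places; the curve starts at P and saturates at K.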

SLIDE 9

Statistical model

Model observational measurements $\{Y_{t_1}, Y_{t_2}, \ldots\}$ with

$$Y_{t_i} = x_{t_i} + \varepsilon_{t_i}$$

Can fit to observed data $y_{t_i}$ using non-linear least squares or MCMC

Can fit all (400k) time courses simultaneously in a large hierarchical model which effectively borrows strength, especially across repeats, but also across genes

Generally works well (fine for most of the downstream scientific applications), but fit is often far from perfect...
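To illustrate the least-squares option for a single growth curve, here is a deliberately crude sketch: it simulates one noisy curve and minimises the sum of squared residuals by exhaustive search over a grid of (r, K), with P treated as known. The grid search stands in for a proper non-linear optimiser, and all numerical values are hypothetical.

```python
import math
import random

def logistic(t, r, K, P):
    """Analytic logistic solution with x(0) = P."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))

def sse(params, times, ys, P):
    """Sum of squared residuals for a candidate (r, K)."""
    r, K = params
    return sum((y - logistic(t, r, K, P)) ** 2 for t, y in zip(times, ys))

# Simulate one noisy growth curve from hypothetical "true" values.
random.seed(1)
r_true, K_true, P, noise_sd = 3.0, 0.15, 0.001, 0.003
times = [0.25 * i for i in range(29)]          # 0 to 7 days
ys = [logistic(t, r_true, K_true, P) + random.gauss(0.0, noise_sd) for t in times]

# Crude least squares: exhaustive search over a grid of (r, K) values.
grid = [(1.0 + 0.1 * i, 0.05 + 0.005 * j) for i in range(41) for j in range(41)]
r_hat, K_hat = min(grid, key=lambda p: sse(p, times, ys, P))
print(r_hat, K_hat)
```

Fitting each curve independently like this ignores the borrowing of strength across repeats and genes that the hierarchical model provides.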

SLIDE 10

Fitting the logistic curve

[Figure: colony size data for YAL003W. Axes: Time (days), Colony size]

SLIDE 11

Improved modelling of colony growth curves

Could use a generalised logistic model (Richards’ curve) which breaks the symmetry in the shape of “take off” and “landing”

$$\frac{dx}{dt} = r x \left(1 - (x/K)^{\nu}\right)$$

This helps, but doesn’t address the real problem of strongly auto-correlated residuals

Better to introduce noise into the dynamics to get a logistic growth diffusion process

SLIDE 12

Stochastic logistic growth diffusion

Well-known stochastic generalisation of the logistic growth equation, expressed as an Itô stochastic differential equation (SDE):

$$dX_t = r X_t (1 - X_t/K)\,dt + \xi^{-1/2} X_t\,dW_t$$

The drift is exactly as for the deterministic model

The diffusion term injects some noise into the dynamics

The multiplicative noise ensures that this defines a non-negative stochastic process
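Sample paths of this SDE can be approximated with the Euler-Maruyama scheme. The sketch below is a generic discretisation, not the method used in the talk, and the parameter values are hypothetical. Note that the discretised path is only an approximation: unlike the exact process, Euler-Maruyama does not by itself guarantee non-negativity, although with a small time step negative values are extremely unlikely.

```python
import math
import random

def simulate_logistic_sde(r, K, P, xi, T, n_steps, rng):
    """Euler-Maruyama path of dX = r X (1 - X/K) dt + xi^(-1/2) X dW, X_0 = P."""
    dt = T / n_steps
    sigma = xi ** -0.5
    path = [P]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over dt
        path.append(x + r * x * (1.0 - x / K) * dt + sigma * x * dw)
    return path

# Hypothetical parameter values, for illustration only.
rng = random.Random(42)
path = simulate_logistic_sde(r=3.0, K=0.15, P=0.001, xi=11.0, T=7.0,
                             n_steps=7000, rng=rng)
print(len(path), path[0], path[-1])
```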

SLIDE 13

Sample trajectories from the logistic diffusion

[Figure: “Stochastic logistic growth”, sample paths of x against Time]

SLIDE 14

Statistical model

Model observational measurements $\{Y_{t_1}, Y_{t_2}, \ldots\}$ with

$$Y_{t_i} = X_{t_i} + \varepsilon_{t_i}$$

where $X_{t_i}$ refers to our realisation of the diffusion process

Need somewhat sophisticated algorithms to fit these sorts of SDE models to discrete time data

Standard algorithms would require knowledge of the transition kernel of the diffusion process, but this is not available for the logistic diffusion

Lots of work on Bayesian inference for intractable diffusions (Golightly & W, ’05, ’06, ’08, ’10, ’11), but this won’t scale to simultaneous fitting of tens of thousands of realisations

SLIDE 15

Approximating the stochastic logistic diffusion

Computational constraints mean that we can only really consider working with diffusions having tractable transition kernels (as then we can apply standard MCMC methods for discrete time problems)

Would therefore like a tractable approximation to the stochastic logistic diffusion

Román–Román & Torres–Ruiz (2012) propose just such an approximation:

$$dX_t = \frac{b r}{e^{rt} + b} X_t\,dt + \xi^{-1/2} X_t\,dW_t,$$

where $b = (K/P) - 1$, and use it to fit measured growth curves to data
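One way to see where this drift comes from (a sketch, not necessarily how the original authors present it) is to rewrite the deterministic logistic solution in terms of $b = (K/P) - 1$ and compute its per-capita growth rate:

```latex
\begin{aligned}
x(t) &= \frac{K P e^{rt}}{K + P(e^{rt} - 1)} = \frac{K}{1 + b e^{-rt}}, \\
\frac{1}{x}\frac{dx}{dt} &= \frac{r b e^{-rt}}{1 + b e^{-rt}} = \frac{b r}{e^{rt} + b}.
\end{aligned}
```

The approximation therefore replaces the state-dependent factor $r(1 - X_t/K)$ in the drift with a function of time alone, which makes the SDE linear in $X_t$. As $t \to \infty$ this drift coefficient tends to zero, leaving only the multiplicative noise term.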

SLIDE 16

The logistic diffusion and the RRTR approximation

[Figure: two panels, “Stochastic logistic growth” and “RRTR logistic growth model”, each showing sample paths of x against Time]

The RRTR model behaves asymptotically like geometric Brownian motion (GBM), whereas the true process is mean reverting

SLIDE 17

Linear noise approximation (LNA)

The RRTR model has the desirable feature of log-normal increments, but has problems with long-term behaviour

Alternatively, if we apply a log transformation to the logistic diffusion and then carry out a linear noise approximation, the result will also be a process with log-normal increments, but will have mean-reverting behaviour which is clearly desirable here

Putting $U_t = \log X_t$, Itô’s formula gives

$$dU_t = \left(r - \frac{1}{2\xi} - \frac{r}{K} e^{U_t}\right) dt + \xi^{-1/2}\,dW_t$$
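The step from the SDE for $X_t$ to the SDE for $U_t$ can be sketched as follows, applying Itô's formula to $f(x) = \log x$, for which $f'(x) = 1/x$ and $f''(x) = -1/x^2$:

```latex
\begin{aligned}
dU_t &= \frac{1}{X_t}\,dX_t - \frac{1}{2}\,\frac{1}{X_t^2}\,(dX_t)^2,
\qquad (dX_t)^2 = \xi^{-1} X_t^2\,dt, \\
dU_t &= r\left(1 - \frac{X_t}{K}\right)dt + \xi^{-1/2}\,dW_t - \frac{1}{2\xi}\,dt \\
     &= \left(r - \frac{1}{2\xi} - \frac{r}{K}\,e^{U_t}\right)dt + \xi^{-1/2}\,dW_t .
\end{aligned}
```

The $-1/(2\xi)$ term is the Itô correction; the noise becomes additive on the log scale.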

SLIDE 18

Linear noise approximation (LNA)

Decompose $U_t$ into a deterministic component and a stochastic residual process

$$U_t = v_t + Z_t$$

where $v_t$ solves the deterministic part

$$\frac{dv_t}{dt} = r - \frac{1}{2\xi} - \frac{r}{K} e^{v_t}$$

Subtracting out the deterministic solution from $U_t$ leaves a residual process of the form

$$dZ_t = \frac{r}{K} e^{v_t}\left(1 - e^{Z_t}\right) dt + \xi^{-1/2}\,dW_t$$

SLIDE 19

Linear noise approximation (LNA)

Applying the linear approximation $1 - e^{Z_t} \simeq -Z_t$ to linearise the drift gives

$$dZ_t = -\frac{r}{K} e^{v_t} Z_t\,dt + \xi^{-1/2}\,dW_t$$

Substituting in for $v_t$ then gives

$$dZ_t = -\frac{a b P e^{at}}{b P (e^{at} - 1) + a} Z_t\,dt + \xi^{-1/2}\,dW_t,$$

where $a = r - 1/(2\xi)$ and $b = r/K$

This is a (zero) mean-reverting time-varying Ornstein–Uhlenbeck (OU) process, and can be solved exactly, giving a normal transition kernel
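A sketch of the two steps used on this slide, under the assumption that the deterministic component starts at $v_0 = \log P$. Writing $y_t = e^{v_t}$ turns the equation for $v_t$ into a logistic equation with rate $a$ and carrying capacity $a/b$, and a linear SDE with time-varying rate $\theta(t) = b e^{v_t}$ has a Gaussian transition density:

```latex
\begin{aligned}
\frac{dy_t}{dt} &= a y_t - b y_t^2
  \;\Longrightarrow\;
  y_t = e^{v_t} = \frac{a P e^{at}}{b P (e^{at} - 1) + a}, \\
Z_t \mid Z_s &\sim N\!\left( Z_s\, e^{-\int_s^t \theta(u)\,du},\;
  \xi^{-1} \int_s^t e^{-2\int_u^t \theta(w)\,dw}\,du \right),
  \qquad \theta(u) = b\,e^{v_u}, \\
\int_s^t \theta(u)\,du &= a(t - s) - (v_t - v_s).
\end{aligned}
```

The last identity follows from $dv_t/dt = a - b e^{v_t}$, so the mean of the kernel is available in closed form. The variance integrand is then a combination of exponentials in $au$, so it can be integrated term by term or evaluated by numerical quadrature.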

SLIDE 20

The logistic diffusion and the LNA

[Figure: two panels, “Stochastic logistic growth” and “LNA logistic diffusion”, each showing sample paths against Time]

The (log)LNA is a very good approximation to the true process, with tractable log-normal increments

SLIDE 21

Further simplifications and approximations

The LNA is a good model with a tractable transition kernel

We can implement standard discrete time MCMC methods to estimate model parameters together with the unobserved latent trajectories

Embedding in a hierarchical model is straightforward

These methods work fine for hundreds of growth curves, but are still problematic for tens of thousands of growth curves
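To make "tractable transition kernel" concrete, the following sketch draws a path from the log-scale LNA at discrete observation times by sampling exactly from its normal transition kernel. It assumes $v_0 = \log P$ and $Z_0 = 0$, uses the closed-form mean factor $e^{-a(t-s) + v_t - v_s}$, and evaluates the variance integral by the trapezoidal rule for simplicity. Function names and parameter values are hypothetical, and this is not the authors' implementation.

```python
import math
import random

def v(t, a, b, P):
    """Deterministic log-scale path: v_t = log(a P e^{at} / (b P (e^{at} - 1) + a))."""
    e = math.exp(a * t)
    return math.log(a * P * e / (b * P * (e - 1.0) + a))

def mean_factor(s, t, a, b, P):
    """exp(-integral_s^t b e^{v_u} du), via the identity a (t - s) - (v_t - v_s)."""
    return math.exp(-a * (t - s) + v(t, a, b, P) - v(s, a, b, P))

def kernel_variance(s, t, a, b, P, xi, n=200):
    """(1/xi) * integral_s^t mean_factor(u, t)^2 du, by the trapezoidal rule."""
    h = (t - s) / n
    f = [mean_factor(s + i * h, t, a, b, P) ** 2 for i in range(n + 1)]
    return (h * (sum(f) - 0.5 * (f[0] + f[-1]))) / xi

def sample_lna_path(times, r, K, P, xi, rng):
    """Draw X at the given times from the log-scale LNA, starting from Z = 0."""
    a, b = r - 0.5 / xi, r / K
    z, xs = 0.0, [P]
    for s, t in zip(times[:-1], times[1:]):
        m = mean_factor(s, t, a, b, P)
        sd = math.sqrt(kernel_variance(s, t, a, b, P, xi))
        z = m * z + sd * rng.gauss(0.0, 1.0)    # exact draw from the normal kernel
        xs.append(math.exp(v(t, a, b, P) + z))
    return xs

# Hypothetical parameter values, for illustration only.
times = [0.25 * i for i in range(29)]
xs = sample_lna_path(times, r=3.0, K=0.15, P=0.001, xi=11.0, rng=random.Random(0))
print(xs[-1])
```

Because each step is an exact draw, the time grid can be as coarse as the observation times, unlike an Euler-type scheme.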

SLIDE 22

Integrating out the latent process

If we are prepared to assume linear Gaussian error on the log scale, we can use Kalman filtering techniques to integrate out the latent process (but this isn’t very plausible)

Alternatively, we could apply a LNA directly to the logistic diffusion (without first transforming), and assume linear Gaussian error on that scale (Heydari et al, 2013)

This latter approach turns out to be better, despite the fact that the LNA approximation to the true process isn’t quite as good

More important to have a plausible error structure than a super-accurate approximation to the stochastic process
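For readers unfamiliar with how Kalman filtering "integrates out" a latent process, here is a generic scalar sketch. It computes the marginal log-likelihood of the observations for a time-homogeneous linear Gaussian state-space model. For the LNA the transition coefficient and variance would vary with time and there would be a deterministic offset; those details are omitted here, and all names and values are illustrative rather than taken from the talk.

```python
import math

def kalman_loglik(ys, m0, c0, F, Q, R):
    """Log marginal likelihood of ys under
         x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q),   x_0 ~ N(m0, c0)
         y_t = x_t + e_t,        e_t ~ N(0, R),   t = 1, 2, ...
    with the latent x_t integrated out by the Kalman filter (scalar case)."""
    m, c, ll = m0, c0, 0.0
    for y in ys:
        # Predict the latent state one step ahead.
        m_pred = F * m
        c_pred = F * F * c + Q
        # One-step-ahead predictive density of the observation.
        s = c_pred + R
        ll += -0.5 * (math.log(2.0 * math.pi * s) + (y - m_pred) ** 2 / s)
        # Update the state distribution with the observation.
        gain = c_pred / s
        m = m_pred + gain * (y - m_pred)
        c = (1.0 - gain) * c_pred
    return ll

# Hypothetical values, for illustration only.
print(kalman_loglik([0.1, 0.3, 0.2], m0=0.0, c0=1.0, F=0.9, Q=0.05, R=0.01))
```

The returned value depends only on the model parameters, so it can be used directly inside an MCMC scheme without sampling the latent trajectory.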

SLIDE 23

Growth curve model

[Figure: directed graph (plate diagram) of the hierarchical growth curve model, with plates for Population, Repeat and Time Point]

SLIDE 24

Colony fitness

The results of model fitting are estimates (or posterior distributions) of r and K for each yeast colony, and also the corresponding gene level parameters

Both r and K are indicative of colony fitness; keep separate where possible

Often useful to have a scalar measure of fitness. Many possibilities, including rK, r log K, or MDR × MDP, where MDR is the maximal doubling rate and MDP is the maximal doubling potential

Statistical summaries can be fed in as data to the next level of analysis (or, ultimately, modelled jointly as a giant hierarchical model)
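The slide does not define MDR and MDP. The sketch below assumes one natural reading under the logistic model: MDR is the reciprocal of the time taken for the curve to grow from P to 2P, and MDP is the number of doublings needed to get from P to K. Both definitions are assumptions here, they require K > 2P, and the parameter values are hypothetical.

```python
import math

def logistic(t, r, K, P):
    """Analytic logistic solution with x(0) = P."""
    e = math.exp(r * t)
    return K * P * e / (K + P * (e - 1.0))

def mdr(r, K, P):
    """Maximal doubling rate (assumed definition): 1 / time for x to go from P to 2P."""
    return r / math.log(2.0 * (K - P) / (K - 2.0 * P))

def mdp(K, P):
    """Maximal doubling potential (assumed definition): doublings from P to K."""
    return math.log(K / P) / math.log(2.0)

r, K, P = 3.0, 0.15, 0.001   # hypothetical values
print(mdr(r, K, P), mdp(K, P), mdr(r, K, P) * mdp(K, P))
```

With time measured in days, the product MDR × MDP has units of doublings per day.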

SLIDE 25

Multiplicative model

Consider two genes with alleles a/A and b/B with a and b representing “wild type” (note that A and B could potentially represent knock-outs of a and b)

Four genotypes: ab, Ab, aB, AB. Use [·] to denote some quantitative phenotypic measure (e.g. “fitness”) for each genotype

Multiplicative model of genetic independence:

$[AB] \times [ab] = [Ab] \times [aB]$: no epistasis

$[AB] \times [ab] > [Ab] \times [aB]$: synergistic epistasis

$[AB] \times [ab] < [Ab] \times [aB]$: antagonistic epistasis

Perhaps simpler if re-written in terms of relative fitness:

$$\frac{[AB]}{[ab]} = \frac{[Ab]}{[ab]} \times \frac{[aB]}{[ab]}$$

SLIDE 26

Genetic independence and HTP data

Suppose that we have scaled our data so that it is consistent with a multiplicative model. What do we expect to see?

The independence model $[AB] \times [ab] = [Ab] \times [aB]$ translates to

$$[\text{query}:abc\Delta] \times [\text{wt}] = [\text{query}] \times [abc\Delta]$$

In other words

$$[\text{query}:abc\Delta] = \frac{[\text{query}]}{[\text{wt}]} \times [abc\Delta]$$

That is, the double-mutant differs from the single-deletion by a constant multiplicative factor that is independent of the particular single-deletion

i.e. a scatter-plot of double against single will show them all lying along a straight line
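A small sketch of the independence relation for a single deletion: it computes the double-mutant fitness expected under the multiplicative model and labels the sign of the deviation using the terminology of the multiplicative-model slide. The fitness values are made up for illustration, and a real analysis would of course account for measurement noise rather than use a fixed tolerance.

```python
def expected_double(query, wt, single):
    """Double-mutant fitness expected under multiplicative independence."""
    return (query / wt) * single

def classify(double, query, wt, single, tol=1e-9):
    """Label the sign of [double] x [wt] - [query] x [single]."""
    diff = double * wt - query * single
    if abs(diff) <= tol:
        return "no epistasis"
    return "synergistic epistasis" if diff > 0 else "antagonistic epistasis"

# Made-up fitness values, for illustration only.
wt, query, single = 100.0, 50.0, 80.0
print(expected_double(query, wt, single))
print(classify(40.0, query, wt, single))
print(classify(60.0, query, wt, single))
print(classify(10.0, query, wt, single))
```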

SLIDE 27

Statistical modelling

Assume that $F_{clm}$ is the fitness measurement for repeat m of gene deletion l in condition c (c = 1 for the single deletion and c = 2 for the corresponding double-mutant)

Model:

$$F_{clm} \sim N(\hat{F}_{cl}, 1/\nu_{cl})$$

$$\log \hat{F}_{cl} = \alpha_c + Z_l + \delta_l \gamma_{cl}$$

$$\delta_l \sim \mathrm{Bern}(p)$$

$\delta_l$ is a variable selection indicator of genetic interaction

Then usual Bayesian hierarchical stuff...
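Simulating from the model can help to see what the indicator does. The sketch below is a forward simulation only, and it makes several assumptions that are not stated on the slide: the interaction term is taken to act only in condition c = 2, $Z_l$ and $\gamma_{2l}$ are drawn from zero-mean normal distributions, and a single precision is shared by all genes and conditions. All names and numerical values are hypothetical.

```python
import math
import random

def simulate_fitness(n_genes, n_repeats, alpha, p, sd_z, sd_gamma, nu, rng):
    """Simulate F[c][l][m] from the model on the slide.
    Assumption (not from the slide): gamma only acts in condition c = 2."""
    data = {1: [], 2: []}
    delta = [1 if rng.random() < p else 0 for _ in range(n_genes)]
    for l in range(n_genes):
        z = rng.gauss(0.0, sd_z)              # gene-level effect Z_l (assumed normal)
        gamma = rng.gauss(0.0, sd_gamma)      # interaction effect gamma_{2l} (assumed normal)
        for c in (1, 2):
            log_f_hat = alpha[c] + z + (delta[l] * gamma if c == 2 else 0.0)
            f_hat = math.exp(log_f_hat)
            # Repeats scatter around f_hat with precision nu (variance 1/nu).
            data[c].append([rng.gauss(f_hat, nu ** -0.5) for _ in range(n_repeats)])
    return data, delta

# Hypothetical settings, for illustration only.
rng = random.Random(3)
alpha = {1: math.log(60.0), 2: math.log(30.0)}
data, delta = simulate_fitness(n_genes=5, n_repeats=4, alpha=alpha, p=0.2,
                               sd_z=0.3, sd_gamma=0.5, nu=4.0, rng=rng)
print(delta)
```

For genes with $\delta_l = 0$, the expected double-mutant fitness is the single-deletion fitness times the constant $e^{\alpha_2 - \alpha_1}$, which is the straight-line relationship of the independence model.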

SLIDE 28

Genetic interaction model

[Figure: directed graph (plate diagram) of the hierarchical genetic interaction model, with plates for Population, Condition and Repeat]

SLIDE 29

Genetic interaction results

[Figure: scatter plot with points labelled by gene name. Axes: fitness (F) of rfΔ ura3Δ double mutants at 27°C (doublings / day) and fitness (F) of rfΔ cdc13-1 double mutants at 27°C (doublings / day)]

SLIDE 30

Ongoing work Summary References

Ongoing work

Integrating stochastic dynamics into the full joint model

Extension to more sophisticated experimental designs: e.g. “all-by-all” experiments

Currently running an experiment with 150 gene knockouts (in the cdc13-1 background) on a single plate, using each in turn as the query mutation, to map out the full set of pairwise interactions for the subset

Requires an extension and some modification of the current hierarchical models

Will allow an in-depth study of the prevalence of genetic interaction, and allow consideration of alternative notions of genetic interaction and epistasis, which could distinguish between “direct” and “indirect” interactions

SLIDE 31

Big data issues

Understanding conflict between model and data in big data contexts: does more data demand more complex models?

Model simplifications and improvements to MCMC (block updates and proposals, reparameterisations, GVS, etc.)

Basic parallelisation strategies (parallel chains, parallelised single chain)

Investigation of data parallel strategies (consensus Monte Carlo, etc.)

SLIDE 32

Summary

Modern bioscience is generating large, complex data sets which require sophisticated modelling in order to answer questions of scientific interest

Big data forces trade-offs between statistical accuracy and computational tractability

Stochastic dynamic models are much more flexible than deterministic models, but come at a computational cost; the LNA can sometimes represent an excellent compromise

Notions of genetic interaction translate directly to statistical models of interaction

Big hierarchical variable selection models are useful in genomics, but can be computationally challenging

SLIDE 33

Funding acknowledgements

Major funders of this work:

BBSRC

MRC

Wellcome Trust

Cancer Research UK

SLIDE 34

References

Addinall, S. G., Holstein, E., Lawless, C., Yu, M., Chapman, K., Taschuk, M., Young, A., Ciesiolka, A., Lister, A., Wipat, A., Wilkinson, D. J., Lydall, D. A. (2011) Quantitative fitness analysis shows that NMD proteins and many other protein complexes suppress or enhance distinct telomere cap defects, PLoS Genetics, 7:e1001362.

Heydari, J. J., Lawless, C., Lydall, D. A., Wilkinson, D. J. (2014) Bayesian hierarchical modelling for inferring genetic interactions in yeast, in submission.

Heydari, J. J., Lawless, C., Lydall, D. A., Wilkinson, D. J. (2014) Fast Bayesian parameter estimation for stochastic logistic growth models, BioSystems, 122:55-72.

Lawless, C., Wilkinson, D. J., Addinall, S. G., Lydall, D. A. (2010) Colonyzer: automated quantification of characteristics of microorganism colonies growing on solid agar, BMC Bioinformatics, 11:287.

Wilkinson, D. J. (2009) Stochastic modelling for quantitative description of heterogeneous biological systems, Nature Reviews Genetics, 10(2):122-133.

Wilkinson, D. J. (2011) Stochastic Modelling for Systems Biology, second edition. Chapman & Hall/CRC Press.