Metabolic pathway identification via unsupervised methods
Max Conway

Outline
● What a metabolic model is, and why you would want one
● How to make one
  ○ Basic data format
  ○ Steady state assumption
  ○ Biomass maximization assumption
● Controlling metabolism with gene expression
● Building up a multiplex network
● Collapsing it back down again with our take on Similarity Network Fusion
● Pathway labelling:
  ○ Linear approaches
  ○ Decision Trees
  ○ Restricted Boltzmann machine

Basic data format
● Input table or SBML file
● Can be transformed to stoichiometric matrix
Input table:
  Name          Reaction                            Min    Max
  Respiration   C6H12O6 + 6 O2 → 6 CO2 + 6 H2O      0      100
  Ex: Glucose   → C6H12O6                           -100   1
  Ex: Oxygen    → O2                                -100   10
  Ex: CO2       → CO2                               -100   0
  Ex: Water     → H2O                               -100   10
Stoichiometric matrix:
                C6H12O6   O2    CO2   H2O
  Respiration   -1        -6    6     6
  Ex: Glucose   1         0     0     0
  Ex: Oxygen    0         1     0     0
  Ex: CO2       0         0     1     0
  Ex: Water     0         0     0     1
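The transformation from reaction table to stoichiometric matrix can be sketched in a few lines. This is a minimal illustration, not the tooling used in the talk: the equation syntax (ASCII `->` in place of →, coefficients separated from metabolite names by a space) and the helper names `parse_side` and `stoichiometric_row` are assumptions made for the example.

```python
import re

# Reaction table from the slide: name -> (equation, min flux, max flux).
REACTIONS = {
    "Respiration": ("C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O", 0, 100),
    "Ex: Glucose": ("-> C6H12O6", -100, 1),
    "Ex: Oxygen": ("-> O2", -100, 10),
    "Ex: CO2": ("-> CO2", -100, 0),
    "Ex: Water": ("-> H2O", -100, 10),
}
METABOLITES = ["C6H12O6", "O2", "CO2", "H2O"]


def parse_side(side):
    """Turn 'C6H12O6 + 6 O2' into {'C6H12O6': 1.0, 'O2': 6.0}."""
    terms = {}
    for term in filter(None, (t.strip() for t in side.split("+"))):
        match = re.fullmatch(r"(?:(\d+(?:\.\d+)?)\s+)?(\S+)", term)
        coefficient, metabolite = match.groups()
        terms[metabolite] = float(coefficient) if coefficient else 1.0
    return terms


def stoichiometric_row(equation):
    """One matrix row: consumed metabolites negative, produced positive."""
    left, right = equation.split("->")
    row = dict.fromkeys(METABOLITES, 0.0)
    for metabolite, coefficient in parse_side(left).items():
        row[metabolite] -= coefficient
    for metabolite, coefficient in parse_side(right).items():
        row[metabolite] += coefficient
    return [row[m] for m in METABOLITES]


S = {name: stoichiometric_row(eq) for name, (eq, _, _) in REACTIONS.items()}
for name, row in S.items():
    print(f"{name:12s}", row)
```

The rows of `S` reproduce the stoichiometric matrix on the slide; the Min and Max columns are carried separately as flux bounds.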

Steady state assumption
● The reaction table and stoichiometric matrix tell us what reactions exist, and rough speed limits, but we need stronger assumptions to better understand how reactions relate.
● Therefore, we assume that the network is in steady state.
[Figure: diagram with nodes labelled Photosynthesis, Respiration, Water, CO2, Oxygen, Glucose, ADP and ATP]
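In practice, steady state means that every metabolite is produced exactly as fast as it is consumed. A minimal check, assuming the toy stoichiometric matrix and bounds from the data format slide; the two flux vectors and the function names are made up for illustration.

```python
# Rows are reactions, columns are metabolites (C6H12O6, O2, CO2, H2O),
# copied from the toy stoichiometric matrix on the data format slide.
S = [
    [-1, -6, 6, 6],  # Respiration
    [1, 0, 0, 0],    # Ex: Glucose
    [0, 1, 0, 0],    # Ex: Oxygen
    [0, 0, 1, 0],    # Ex: CO2
    [0, 0, 0, 1],    # Ex: Water
]
# (min, max) flux for each reaction, in the same row order.
BOUNDS = [(0, 100), (-100, 1), (-100, 10), (-100, 0), (-100, 10)]


def net_production(fluxes):
    """Net rate of change of each metabolite for a given flux vector."""
    return [sum(v * row[j] for v, row in zip(fluxes, S)) for j in range(len(S[0]))]


def is_steady_state(fluxes, tol=1e-9):
    """True if all fluxes respect their bounds and no metabolite accumulates."""
    within_bounds = all(lo <= v <= hi for v, (lo, hi) in zip(fluxes, BOUNDS))
    balanced = all(abs(x) <= tol for x in net_production(fluxes))
    return within_bounds and balanced


# Hypothetical flux vectors: the first balances every metabolite,
# the second lets CO2 and water accumulate.
balanced_fluxes = [1, 1, 6, -6, -6]
unbalanced_fluxes = [1, 1, 6, 0, 0]

print(is_steady_state(balanced_fluxes), net_production(balanced_fluxes))
print(is_steady_state(unbalanced_fluxes), net_production(unbalanced_fluxes))
```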

Biomass maximization
We need more constraints:
● Steady state constrains the model to possible phenotypes
● But which of these phenotypes is the one chosen by nature?
● The fittest one!
● We use linear programming on the constraints and stoichiometric matrix to find the model with highest biomass output.
Once we’ve got the fittest phenotype, we can find out what other properties it has:
● How would it respond to changes of condition?
● What metabolites would it produce?
● What can we do to make it produce more of the metabolites we’d like?
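The linear programming step can be sketched with `scipy.optimize.linprog`. The toy table from the data format slide has no biomass reaction, so this example maximizes the Respiration flux as a stand-in objective; that substitution is an assumption for illustration only.

```python
import numpy as np
from scipy.optimize import linprog

# Rows are reactions, columns are metabolites (C6H12O6, O2, CO2, H2O).
S = np.array([
    [-1, -6, 6, 6],  # Respiration
    [1, 0, 0, 0],    # Ex: Glucose
    [0, 1, 0, 0],    # Ex: Oxygen
    [0, 0, 1, 0],    # Ex: CO2
    [0, 0, 0, 1],    # Ex: Water
], dtype=float)
bounds = [(0, 100), (-100, 1), (-100, 10), (-100, 0), (-100, 10)]

# linprog minimizes, so maximizing the Respiration flux means minimizing
# its negative. Respiration stands in for biomass here (an assumption).
objective = np.zeros(S.shape[0])
objective[0] = -1.0

# Steady state: for every metabolite, production minus consumption is zero.
result = linprog(objective, A_eq=S.T, b_eq=np.zeros(S.shape[1]), bounds=bounds)
optimal_fluxes = result.x
print(result.success, np.round(optimal_fluxes, 6))
```

In this toy model the optimum is limited by the glucose exchange bound (Max = 1), which is the kind of "response to changes of condition" question the slide raises: changing a bound and re-solving shows how the fittest phenotype shifts.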

Adding Gene Expression
● Map gene expressions to flux bounds
● Use Colombos gene expression compendium
● Create a set of 2369 flux distributions with associated gene expressions
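The slide does not say which function maps expression to bounds, so the following is only a hypothetical sketch: each reaction's bounds are scaled linearly by the expression of its associated gene relative to a baseline. The rule, the expression values and the name `scale_bounds` are all assumptions, and real models usually combine several genes per reaction first.

```python
def scale_bounds(bounds, expression, baseline=1.0):
    """Scale a reaction's (min, max) flux bounds by relative gene expression.

    Hypothetical rule: expression equal to the baseline leaves the bounds
    unchanged, lower expression tightens them, higher expression relaxes them.
    """
    factor = max(expression / baseline, 0.0)
    lower, upper = bounds
    return (lower * factor, upper * factor)


# Unconstrained bounds for two reactions of the toy model.
base_bounds = {"Respiration": (0, 100), "Ex: Glucose": (-100, 1)}

# Made-up expression values for two conditions (not taken from Colombos).
conditions = {
    "condition_a": {"Respiration": 1.0, "Ex: Glucose": 0.5},
    "condition_b": {"Respiration": 0.2, "Ex: Glucose": 2.0},
}

# One set of flux bounds per condition; each would then be solved separately.
constrained = {
    condition: {
        reaction: scale_bounds(base_bounds[reaction], expression[reaction])
        for reaction in base_bounds
    }
    for condition, expression in conditions.items()
}
print(constrained)
```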

Building up a multiplex network
2369 individuals, each with:
● 4280 Gene expressions
● 1260 internal fluxes
● ~10 external fluxes
How do we interpret all this information?
Pivot the network:
● Before:
  ○ Nodes are reactions and metabolites
  ○ Edges are fluxes
  ○ Layers are individuals
● After:
  ○ Nodes are individuals
  ○ Edges are correlations
  ○ Layers are datasets (fluxes or genes)
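The pivot can be sketched with NumPy: each dataset is a table with one row per individual, and correlating the rows gives an individual-by-individual layer. The sizes and the random data below are placeholders, not the real 2369-individual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals = 6  # placeholder; the talk uses 2369

# One table per dataset, rows = individuals, columns = measured variables.
# Random numbers stand in for real gene expressions and fluxes.
layers_raw = {
    "genes": rng.normal(size=(n_individuals, 40)),
    "fluxes": rng.normal(size=(n_individuals, 12)),
}

# np.corrcoef treats each row as a variable, so the result has one row and
# one column per individual: nodes are individuals, edges are correlations.
multiplex = {name: np.corrcoef(table) for name, table in layers_raw.items()}

for name, layer in multiplex.items():
    print(name, layer.shape)
```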

Similarity Network Fusion
Basic similarity network fusion:
● First transform to similarity network (vs distance)
● Iteratively move each edge similarity closer to the mean of the parallel edges in other layers
● Wait for convergence
We used a weighted mean, rather than an unweighted mean. This makes sense because our layers are not equivalent to each other.
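The three steps above, with the weighted mean, can be sketched as follows. This follows the simplified description on the slide rather than the full published Similarity Network Fusion procedure; the exponential kernel, the step size, the layer weights and the toy distance matrices are all assumptions.

```python
import numpy as np


def distance_to_similarity(distance):
    """Map distances to similarities in (0, 1]; the kernel is an assumption."""
    scale = distance[distance > 0].mean()
    return np.exp(-distance / scale)


def fuse(layers, weights, step=0.5, tol=1e-8, max_iter=1000):
    """Pull each layer towards the weighted mean of the other layers."""
    layers = [np.array(layer, dtype=float) for layer in layers]
    weights = np.asarray(weights, dtype=float)
    for _ in range(max_iter):
        updated = []
        for k, layer in enumerate(layers):
            others = [j for j in range(len(layers)) if j != k]
            target = sum(weights[j] * layers[j] for j in others) / weights[others].sum()
            updated.append((1 - step) * layer + step * target)
        change = max(np.abs(new - old).max() for new, old in zip(updated, layers))
        layers = updated
        if change < tol:  # the layers have stopped moving: converged
            break
    return np.average(layers, axis=0, weights=weights)


# Toy distance matrices between three individuals, one per dataset.
distances = [
    np.array([[0, 1, 4], [1, 0, 3], [4, 3, 0]], dtype=float),  # genes
    np.array([[0, 2, 2], [2, 0, 5], [2, 5, 0]], dtype=float),  # internal fluxes
    np.array([[0, 1, 1], [1, 0, 2], [1, 2, 0]], dtype=float),  # external fluxes
]
layers = [distance_to_similarity(d) for d in distances]
fused = fuse(layers, weights=[2.0, 1.0, 1.0])
print(np.round(fused, 3))
```

Because every update is a convex combination, each fused edge stays between the smallest and largest value of that edge across the input layers; the weights decide where in that range it settles.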

Results
Heat map of spectral clustering of fused network
● Orange top bar: 5-deoxyribose exchange
● Green side bar: biomass
X and Y axes are individuals, blue colour intensity is similarity.
But what does it mean?
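For readers unfamiliar with spectral clustering, here is a minimal two-cluster sketch based on the sign of the Fiedler vector of the graph Laplacian. The talk does not state which variant or how many clusters were used, so this is an illustration of the general technique on a made-up similarity matrix, not a reproduction of the result.

```python
import numpy as np


def spectral_bipartition(similarity):
    """Split nodes into two clusters using the Fiedler vector's sign."""
    degree = similarity.sum(axis=1)
    laplacian = np.diag(degree) - similarity
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue (the Fiedler vector) separates the clusters.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Made-up fused similarity between five individuals: two tight groups.
similarity = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
])
labels = spectral_bipartition(similarity)

# Reordering rows and columns by cluster is what makes blocks visible
# in a heat map of the similarity matrix.
order = np.argsort(labels, kind="stable")
reordered = similarity[np.ix_(order, order)]
print(labels)
```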

What does it mean/what next?
● Network clusterings are often hard to interpret
● Implicit model in network algorithms is often less obvious than in tabular algorithms
● Want to look at identifying structure within networks, such as subsystems

Labelling pathways
● Multiple valid labellings
● Subsystem annotations exist, but don’t tell us much
● A good model should be able to predict fluxes from other fluxes
● The structure of the model gives us the pathways
● We need an interpretable model

Linear approaches
Correlation with important fluxes
● Choose some important exchange fluxes (e.g. biomass, O2 excretion)
● See which reactions correlate with them
● Choosing more exchange fluxes gives us more information
Principal Component Analysis
● Natural conclusion of correlation based approach
● Look at every pair
● Loadings give us the amount of influence of each reaction
But:
● Can’t deal with nonlinearity
● Can only tell us average coefficient over all conditions
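Both linear approaches fit in a short NumPy sketch. The flux table is synthetic (two reactions that follow a hypothetical uptake rate and one that is pure noise), so the numbers only illustrate how correlations and loadings are read, not any result from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
n_conditions = 200

# Synthetic data: two reaction fluxes follow uptake linearly, one is noise.
uptake = rng.uniform(0, 10, size=n_conditions)
fluxes = np.column_stack([
    uptake,
    2.0 * uptake + rng.normal(scale=0.1, size=n_conditions),
    rng.normal(size=n_conditions),
])
biomass = 0.5 * uptake + rng.normal(scale=0.1, size=n_conditions)

# Approach 1: correlate every reaction with one important exchange flux.
correlations = np.array([
    np.corrcoef(fluxes[:, j], biomass)[0, 1] for j in range(fluxes.shape[1])
])

# Approach 2: PCA via SVD of the standardized flux table, which uses the
# correlations between every pair of reactions at once.
standardized = (fluxes - fluxes.mean(axis=0)) / fluxes.std(axis=0)
_, singular_values, components = np.linalg.svd(standardized, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)
loadings = components[0]  # influence of each reaction on the first component

print(np.round(correlations, 3))
print(np.round(explained, 3), np.round(loadings, 3))
```

Both outputs are single coefficients per reaction, which is the limitation the slide points out: they summarize the average linear relationship over all conditions.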

Decision tree
Regression tree, using R’s Cubist package.
● Build a decision tree
● Break it down into a set of rules
● Group the observations by the rules
● Interpolate using a regression model based on the remaining variables
Pros:
● Fast to build and run
● Piecewise-linear model makes sense given the structure of the dataset
● Highly accurate: cross-validated correlation > 0.99
Cons:
● Only predicts one flux at a time
● No obvious way to have one model predict all fluxes
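To show why a piecewise-linear model suits flux data, here is a hand-rolled sketch with a single split and one linear model per group. It is not Cubist and does not reproduce its algorithm; the capped-flux data, the single-variable split search and the function names are assumptions for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line through the points; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def fit_one_split(x, y, min_leaf=5):
    """Find the single threshold whose two linear models fit best."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf):
        threshold = (x[i - 1] + x[i]) / 2
        left, right = fit_line(x[:i], y[:i]), fit_line(x[i:], y[i:])
        residual = np.concatenate([
            y[:i] - (left[0] * x[:i] + left[1]),
            y[i:] - (right[0] * x[i:] + right[1]),
        ])
        sse = float(residual @ residual)
        if best is None or sse < best[0]:
            best = (sse, threshold, left, right)
    return best[1:]  # (threshold, left model, right model)


def predict(model, x):
    """Apply the rule 'x <= threshold' and then the matching linear model."""
    threshold, left, right = model
    return np.where(x <= threshold, left[0] * x + left[1], right[0] * x + right[1])


# Synthetic flux that tracks uptake until it hits a capacity bound of 6.
rng = np.random.default_rng(2)
uptake = rng.uniform(0, 10, size=200)
flux = np.minimum(uptake, 6.0)

model = fit_one_split(uptake, flux)
print("threshold:", round(float(model[0]), 2))
```

Flux bounds produce exactly this kind of kink, which a single global linear model would smooth over. Note that the sketch predicts one target flux, matching the first item under Cons.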

Restricted Boltzmann Machine
A neural network that predicts its own inputs
Pros:
● Simple change from classification network
● Adjustable model complexity (depth and width)
● Nonlinear
Cons:
● Slow to train
[Figure: network diagram with labels "Simplified model" and "Fluxes"]
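A very small binary RBM trained with one step of contrastive divergence (CD-1) shows the idea of a network that reconstructs its own inputs. The talk gives no architecture or training details, so everything here is an assumption: a single hidden layer, fluxes binarized to active/inactive, made-up patterns, and the class name `TinyRBM`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyRBM:
    """Binary RBM with one hidden layer, trained by CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def reconstruct(self, v):
        """Deterministic pass: visible -> hidden -> visible probabilities."""
        h = sigmoid(v @ self.W + self.c)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0, lr):
        h0 = sigmoid(v0 @ self.W + self.c)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.b)
        h1 = sigmoid(v1 @ self.W + self.c)
        n = len(v0)
        # Data statistics minus reconstruction statistics.
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)


# Made-up data: six reactions, each condition activates one of two groups
# (1 = reaction carries flux, 0 = inactive).
active = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)

rng = np.random.default_rng(3)
rbm = TinyRBM(n_visible=6, n_hidden=2, rng=rng)
error_before = np.mean((active - rbm.reconstruct(active)) ** 2)
for _ in range(2000):
    rbm.cd1_step(active, lr=0.1)
error_after = np.mean((active - rbm.reconstruct(active)) ** 2)
print(error_before, error_after)
```

After training, the columns of `rbm.W` indicate which reactions each hidden unit groups together, which is the sense in which the structure of the model could suggest pathways.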

Summary
● Flux balance analysis metabolic models are detailed, steady-state network models
● We estimate how continuous gene expression values affect them
● Looking at many gene expression vectors gives us a large multiplex network
● Similarity Network Fusion can help simplify this, but we still need more interpretability
● Linear dimension reduction can only take us so far
● Decision trees model the data well, but are not well suited to unsupervised use
● RBMs are more appropriate for nonlinear unsupervised learning

Thanks! Questions?
Max Conway, Claudio Angione, Pietro Lio’
conway.max1@gmail.com
github.com/maxconway
