SLIDE 1

Uncovering Protein Functions Through Multi-Layer Tissue Networks

Marinka Zitnik

marinka@cs.stanford.edu

Joint work with Jure Leskovec

SLIDE 2

Why tissues?

A unified view of cellular functions across human tissues is essential for understanding biology, interpreting genetic variation, and developing therapeutic strategies

[Greene et al. 2015, Yeger & Sharan 2015, GTEx and others]

Marinka Zitnik, Stanford, ISMB/ECCB 2017 2

SLIDE 3

What Does My Protein Do?

Goal: Given a set of proteins and possible functions, predict each protein’s association with each function

Proteins × (Functions, Tissues) → [0,1]

WNT1 × (Midbrain development, Substantia nigra) → 0.9

RPT6 × (Angiogenesis, Blood) → 0.05

[Figure: PPI network in substantia nigra tissue (WNT1, midbrain development); PPI network in blood tissue (RPT6, angiogenesis)]

SLIDE 4

Existing Research

§ Guilty by association: a protein’s function is determined by the proteins it interacts with [Zuberi et al. 2013, Radivojac et al. 2013, Kramer et al. 2014, Yu et al. 2015, and many others]

§ No tissue-specificity

§ Protein functions are assumed constant across organs and tissues:

§ Functions in heart are the same as in skin

Lack of methods for predicting protein functions in different biological contexts
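
To make the guilty-by-association idea concrete, here is a minimal sketch that scores each function by the fraction of a protein's annotated interaction partners that carry it. The function name, the toy proteins P1..P4, and the scoring rule are illustrative assumptions, not the method of any cited paper.

```python
from collections import Counter

def guilty_by_association(ppi, annotations, protein):
    """Score each function by the fraction of annotated neighbors carrying it."""
    # Only neighbors that have at least one known annotation can vote.
    neighbors = [n for n in ppi.get(protein, ()) if n in annotations]
    if not neighbors:
        return {}
    counts = Counter(fn for n in neighbors for fn in annotations[n])
    return {fn: c / len(neighbors) for fn, c in counts.items()}

# Hypothetical toy PPI network (adjacency sets) and annotations.
ppi = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"angiogenesis"},
    "P3": {"angiogenesis", "midbrain development"},
}

scores = guilty_by_association(ppi, annotations, "P1")
```

Nothing in this sketch depends on tissue: the same network and the same scores are used everywhere, which is the limitation this slide points out.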

SLIDE 5

Challenges

§ Tissues have an inherently multiscale, hierarchical organization

§ Tissues are related to each other:

§ Proteins in biologically similar tissues have similar functions [Greene et al. 2015, ENCODE 2016]

§ Proteins are missing in some tissues

§ Interaction networks are tissue-specific

§ Many tissues have no annotations

SLIDE 6

Machine Learning in Networks

[Figure: machine learning maps proteins in a PPI network (WNT1, INA, DLG5, GPR4, ETS1, NDNF, RHOA, HPSE) to function labels such as angiogenesis and midbrain development]

Multi-label node classification: midbrain development, angiogenesis, etc.

SLIDE 7

Machine Learning Lifecycle

Raw networks → Node and edge profiles → Learning algorithm → Prediction model

Downstream task: Protein function prediction

Feature engineering

Automatically learn the features

§ Machine learning lifecycle: This feature, that feature

§ Every single time!

SLIDE 8

Feature Learning in Multi-Layer Graphs

OhmNet: Unsupervised feature learning for multi-layer networks

Vectors, node embeddings: $f_i, f_j, f_k$ (layers) and $f_3, f_2, f_1$ (scales), each mapping a node $u \to \mathbb{R}^d$

[Figure: node $u$ shown in three layers, with hierarchy scales “3”, “2”, “1”]

SLIDE 9

Features in Multi-Layer Tissue Network

§ Given: Layers $\{G_i\}_i$, hierarchy $\mathcal{M}$

§ Layers $\{G_i\}_{i=1..T}$ are in leaves of $\mathcal{M}$

§ Goal: Learn functions $f_i : V_i \to \mathbb{R}^d$

§ Multi-scale model:

§ Learn node embeddings at each possible scale

§ Layers $i$, $j$, $k$, $l$

§ Scales “3”, “2”, “1”
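
As a rough illustration of the input described above, the sketch below stores each layer as an adjacency dictionary and the hierarchy as a child-to-parent map. The class name, the toy edges, and every parent relation other than “2” being the parent of layers i and j are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class MultiLayerNetwork:
    layers: dict   # layer name -> {node: set of neighbor nodes}
    parent: dict   # hierarchy element -> its parent (the root has no entry)

    def ancestors(self, element):
        """Hierarchy elements on the path from `element` up to the root."""
        path = []
        while element in self.parent:
            element = self.parent[element]
            path.append(element)
        return path

    def scales(self):
        """Internal (non-leaf) hierarchy elements."""
        return set(self.parent.values())

    def embedding_targets(self):
        """Every layer and every scale gets its own embedding function."""
        return sorted(set(self.layers) | self.scales())

# Hypothetical four-layer network with scales "3", "2", "1" ("1" is the root).
net = MultiLayerNetwork(
    layers={
        "G_i": {"u": {"v"}, "v": {"u"}},
        "G_j": {"u": {"w"}, "w": {"u"}},
        "G_k": {"u": {"v"}, "v": {"u"}},
        "G_l": {"v": {"w"}, "w": {"v"}},
    },
    parent={"G_i": "2", "G_j": "2", "G_k": "3", "G_l": "3", "2": "1", "3": "1"},
)
```

Note that node u is absent from layer G_l here, mirroring the point that proteins can be missing in some tissues.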

SLIDE 10

OhmNet Learning Approach

OhmNet has two components:

1. Single-layer objectives

Nodes with similar network neighborhoods in each layer are embedded close together

2. Hierarchical dependency objectives

Nodes in nearby network layers in the hierarchy share similar features
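
One plausible way to write the two components as a single objective is sketched below. It assumes a skip-gram-style neighborhood likelihood for each layer and a squared-distance penalty between an element and its parent; the symbols $N_i(u)$ (neighborhood of node $u$ in layer $i$), $\pi(j)$ (parent of element $j$ in the hierarchy), $\Omega_i$, $C_j$, and the trade-off weight $\lambda$ are notation introduced here and do not appear on the slides.

```latex
% Single-layer objective for layer i: nodes should predict their neighborhoods
\Omega_i = \sum_{u \in V_i} \log \Pr\bigl(N_i(u) \mid f_i(u)\bigr)

% Hierarchical dependency for element j with parent \pi(j)
C_j = \frac{1}{2} \sum_{u} \bigl\lVert f_j(u) - f_{\pi(j)}(u) \bigr\rVert_2^2

% Combined objective over all layers i and all non-root hierarchy elements j
\max_{f} \; \sum_{i} \Omega_i \;-\; \lambda \sum_{j} C_j
```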

SLIDE 11

Single-Layer Objectives

§ Intuition: For each layer, embed nodes to $d$ dimensions by preserving their similarity

§ Two nodes are similar if their neighborhoods are similar

§ For node $u$ in layer $i$ we define nearby nodes as nodes in $G_i$ visited by random walks starting at $u$
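
The notion of "nearby nodes" can be sketched with plain uniform random walks inside one layer. This is a simplified stand-in: the walk strategy, the parameter names `walk_length` and `num_walks`, and the toy graph are assumptions, and the slides do not specify how the walks are biased or how long they run.

```python
import random

def random_walk_neighborhood(adj, start, walk_length=5, num_walks=10, seed=0):
    """Collect the nodes visited by uniform random walks that start at `start`."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            # Sort so the walk is reproducible for a fixed seed.
            neighbors = sorted(adj.get(node, ()))
            if not neighbors:
                break
            node = rng.choice(neighbors)
            visited.add(node)
    visited.discard(start)  # a node is not its own neighbor
    return visited

# Hypothetical single layer: a path A-B-C plus a disconnected pair D-E.
layer = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"D"},
}
nearby = random_walk_neighborhood(layer, "A")
```

Because the walks never leave the layer they start in, each tissue yields its own neighborhoods even for the same protein.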

SLIDE 12

Dependencies Between Network Layers

§ Intuition: Proteins in biologically similar tissues share similar features

§ Use tissue hierarchy to recursively regularize features at $i$ to be similar to features in $i$’s parent

“2” is a parent of $G_i$ and $G_j$

OhmNet generates multi-scale node embeddings
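
A minimal sketch of such a regularizer follows, assuming "similar" is measured by half the squared Euclidean distance between a node's vector in an element and its vector in that element's parent. The function name, the toy 2-D vectors, and the choice of distance are assumptions for illustration.

```python
def hierarchy_penalty(embeddings, parent):
    """Sum of 0.5 * squared distance between each node's vector and its
    vector in the parent element, over all elements that have a parent."""
    total = 0.0
    for element, vectors in embeddings.items():
        par = parent.get(element)
        if par is None:
            continue  # the root has no parent to be pulled toward
        for node, vec in vectors.items():
            parent_vec = embeddings[par].get(node)
            if parent_vec is None:
                continue  # node not represented at the parent scale
            total += 0.5 * sum((a - b) ** 2 for a, b in zip(vec, parent_vec))
    return total

# Scale "2" is the parent of layers G_i and G_j; vectors are hypothetical.
parent = {"G_i": "2", "G_j": "2"}
embeddings = {
    "G_i": {"u": [1.0, 0.0]},
    "G_j": {"u": [0.0, 1.0]},
    "2": {"u": [0.5, 0.5]},
}
penalty = hierarchy_penalty(embeddings, parent)
```

The penalty shrinks to zero only when both layers agree with their shared parent, which is how sibling tissues end up with related features.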

SLIDE 13

Data: 107 Tissue Layers

§ Layers are PPI nets:

§ Nodes: proteins

§ Edges: tissue-specific PPIs

§ Node labels:

§ “Cortex development” in renal cortex tissue

§ “Artery morphogenesis” in artery tissue

[Figure: tissue hierarchy with one layer highlighted. Visible tissue labels: FemaleReproductiveSystem, Choroid, Eye, NervousSystem, Placenta, Integument, Retina, Hindbrain, PancreaticIslet, Basophil, SpinalCord, Spermatid, EndocrineGland, ReproductiveSystem, ParietalLobe, Hepatocyte, CorpusCallosum, Pons, TemporalLobe, Pancreas, Oviduct, BloodPlasma, Lens, Glia]

SLIDE 14

Experimental Setup

§ Protein function prediction is a multi-label node classification task

§ Every node (protein) is assigned one or more labels (functions)

§ Setup:

§ Learn OhmNet embeddings for multi-layer tissue network

§ Train a classifier for each function based on a fraction of proteins and all their functions

§ Predict functions for new proteins
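
The last two steps of this setup can be sketched as one binary classifier per function trained on embedding vectors. The slides do not name the classifier, so the plain logistic regression below, the hyperparameters, the toy 2-D embeddings, and all function names are assumptions.

```python
import math

def _sigmoid_score(w, b, x):
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            err = _sigmoid_score(w, b, x) - target
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_per_function(embeddings, labels, functions):
    """Train one binary classifier per function on the labeled proteins."""
    proteins = sorted(labels)
    X = [embeddings[p] for p in proteins]
    return {
        fn: train_logistic(X, [1.0 if fn in labels[p] else 0.0 for p in proteins])
        for fn in functions
    }

def predict_proba(model, x):
    w, b = model
    return _sigmoid_score(w, b, x)

# Hypothetical embeddings; P5 is a "new" protein with no labels.
embeddings = {
    "P1": [1.0, 0.0], "P2": [0.9, 0.1],
    "P3": [0.0, 1.0], "P4": [0.1, 0.9],
    "P5": [0.95, 0.05],
}
labels = {
    "P1": {"angiogenesis"}, "P2": {"angiogenesis"},
    "P3": {"midbrain development"}, "P4": {"midbrain development"},
}
models = train_per_function(
    embeddings, labels, ["angiogenesis", "midbrain development"]
)
p_angio = predict_proba(models["angiogenesis"], embeddings["P5"])
p_midbrain = predict_proba(models["midbrain development"], embeddings["P5"])
```

Each function gets an independent score in [0, 1], so a protein can receive several functions at once, matching the multi-label framing.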

SLIDE 15

Tissue-Specific Protein Functions

[Figure: bar chart comparing OhmNet with protein function prediction methods, mono-layer network embeddings, and tensor decompositions; value label shown: 0.756]

>10% improvement over function prediction methods

>18% improvement over non-hierarchical versions of the dataset

>15% improvement over matrix-based methods

SLIDE 16

Case Study: 9 Brain Tissues

[Figure: two-level hierarchy. Tissues: frontal lobe, medulla oblongata, pons, substantia nigra, midbrain, parietal lobe, occipital lobe, temporal lobe, cerebellum. Internal nodes: brainstem, brain.]

9 brain tissue PPI networks in two-level hierarchy

SLIDE 17

Multi-Scale Node Embeddings

[Figure: node embeddings at the brainstem and brain scales]

SLIDE 18

Annotating Proteins in a New Tissue

§ Transfer protein functions to an unannotated tissue

§ Task: Predict functions in target tissue without access to any annotation/label in that tissue

Target tissue    Tissue-specific (OhmNet)    Tissue non-specific    Improvement
Placenta         0.758                       0.684                  11%
Spleen           0.779                       0.712                  10%
Liver            0.741                       0.553                  34%
Forebrain        0.755                       0.632                  20%
Blood plasma     0.703                       0.540                  40%
Smooth muscle    0.729                       0.583                  25%
Average          0.746                       0.617                  21%

Reported are AUROC values (see paper for other metrics)
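
For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative and are not data from the talk.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    # Each (positive, negative) pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for one function in one tissue.
example = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which gives a scale for reading the table above.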

SLIDE 19

Conclusions

§ Unsupervised feature learning for multi-layer networks

§ Learned embeddings can be used for any downstream prediction task: node classification, node clustering, link prediction

§ OhmNet predicts protein functions across biological contexts

A shift from flat networks to large multiscale systems in biology

SLIDE 20

snap.stanford.edu/ohmnet

Poster A-294

Travel Award
