Mapping the sub-cellular proteome, Laurent Gatto (lg390@cam.ac.uk)



slide-1
SLIDE 1

Mapping the sub-cellular proteome

Laurent Gatto
lg390@cam.ac.uk – @lgatt0
http://www.damtp.cam.ac.uk/user/lg390/
Slides @ https://zenodo.org/record/1063508
22 Nov 2017, Cambridge Computational Biology Institute

slide-2
SLIDE 2

These slides are available under a Creative Commons CC-BY license. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.

slide-3
SLIDE 3

Plan

◮ Spatial proteomics
◮ The LOPIT pipeline
◮ Improving on LOPIT
    ◮ Experimental advances: hyperLOPIT
    ◮ Computational advances: Transfer learning
◮ Biological applications
    ◮ Dual-localisation
    ◮ Trans-localisation
◮ Open development: R/Bioconductor software

slide-4
SLIDE 4

Regulations

slide-5
SLIDE 5

Cell organisation

Spatial proteomics is the systematic study of protein localisations.

Image from Wikipedia http://en.wikipedia.org/wiki/Cell_(biology).

slide-6
SLIDE 6

Spatial proteomics - Why?

Localisation is function

◮ The cellular sub-division allows cells to establish a range of distinct micro-environments, each favouring different biochemical reactions and interactions and, therefore, allowing each compartment to fulfil a particular functional role.
◮ Localisation and sequestration of proteins within sub-cellular niches is a fundamental mechanism for the post-translational regulation of protein function.

Re-localisation in

◮ Differentiation: Tfe3 in mouse ESC (Betschinger et al., 2013).
◮ Activation of biological processes.

Examples later.

slide-7
SLIDE 7

Spatial proteomics - Why?

Mis-localisation

Disruption of the targeting/trafficking process alters proper sub-cellular localisation, which in turn perturbs the cellular functions of the proteins.

◮ Abnormal protein localisation leading to the loss of functional effects in diseases (Laurila and Vihinen, 2009).
◮ Disruption of the nuclear/cytoplasmic transport (nuclear pores) has been detected in many types of carcinoma cells (Kau et al., 2004).

slide-8
SLIDE 8

Spatial proteomics - How, experimentally

◮ Single cell direct observation, by tagging: GFP, epitope, protein-specific antibody.
◮ Population level, by subcellular fractionation and quantitative mass spectrometry, grouped by number of fractions:
    ◮ Cataloguing: 1 fraction (pure fraction catalogue); 2 fractions, enriched and crude (subtractive proteomics, enrichment).
    ◮ Relative abundance: n discrete fractions (invariant rich fraction, clustering); n continuous fractions, gradient approaches (PCP with χ², LOPIT with PCA and PLS-DA).

Figure : Organelle proteomics approaches (Gatto et al., 2010)

slide-9
SLIDE 9

Fusion proteins and immunofluorescence

Figure : Targeted protein localisation.

slide-10
SLIDE 10

Fusion proteins and immunofluorescence

Figure : Example of discrepancies between IF and FPs as well as between FP tagging at the N and C termini (Stadler et al., 2013).

slide-11
SLIDE 11

Spatial proteomics - How, experimentally

[Overview figure: single cell direct observation (tagging: GFP, epitope, protein-specific antibody) vs. population-level subcellular fractionation with quantitative mass spectrometry (cataloguing and relative abundance approaches, from 1 fraction to n continuous gradient fractions: PCP, LOPIT).]

Figure : Organelle proteomics approaches (Gatto et al., 2010). Gradient approaches: Dunkley et al. (2006), Foster et al. (2006).

⇒ Explorative/discovery approaches, steady-state global localisation maps.

slide-12
SLIDE 12

[Workflow figure: cell lysis, then fractionation/centrifugation, then quantitation/identification by mass spectrometry. Organelle label shown in the figure: e.g. Mitochondrion.]

slide-13
SLIDE 13

Quantitation data and organelle markers

       Fraction1   Fraction2   ...   Fractionm   markers
p1     q1,1        q1,2        ...   q1,m        unknown
p2     q2,1        q2,2        ...   q2,m        loc1
p3     q3,1        q3,2        ...   q3,m        unknown
p4     q4,1        q4,2        ...   q4,m        loci
...    ...         ...         ...   ...         ...
pj     qj,1        qj,2        ...   qj,m        unknown
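The layout above can be sketched as a small data structure. This is a minimal illustration in plain Python, not the representation used by MSnbase or pRoloc; the protein names, the numeric values and the helper `normalise_profile` are hypothetical, and scaling each profile to sum to 1 is an assumption made here to express relative abundance across fractions.

```python
# Hypothetical toy data: one abundance value per fraction for each protein,
# plus a marker label ("unknown" when the localisation is not annotated).
quant = {
    "p1": [10.0, 30.0, 60.0],
    "p2": [50.0, 30.0, 20.0],
    "p3": [12.0, 28.0, 60.0],
    "p4": [45.0, 35.0, 20.0],
}
markers = {"p1": "unknown", "p2": "loc1", "p3": "unknown", "p4": "loc2"}


def normalise_profile(values):
    """Scale a profile so that its values sum to 1 across fractions."""
    total = sum(values)
    return [v / total for v in values]


profiles = {prot: normalise_profile(vals) for prot, vals in quant.items()}

# Split proteins into labelled markers and proteins of unknown localisation.
labelled = {p: m for p, m in markers.items() if m != "unknown"}
unlabelled = [p for p, m in markers.items() if m == "unknown"]
```

The labelled proteins play the role of training data, and the unlabelled ones are the proteins to be classified.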

slide-14
SLIDE 14

Visualisation and classification

[Plots: correlation profiles across fractions 1, 2, 4, 5, 7, 8, 11, 12 for ER, Golgi, mit/plastid, PM and Vacuole; principal component analysis (PC1 vs. PC2) with classes ER, Golgi, mit/plastid, PM and vacuole, and point types marker, PLS−DA and unknown.]

Figure : From Gatto et al. (2010), Arabidopsis thaliana data from Dunkley et al. (2006)

slide-15
SLIDE 15

Data analysis

[Table: quantitation matrix with proteins prot1 ... protn as rows, Fraction1 ... Fractionm as columns (values qi,1 ... qi,m), and a markers column whose entries are organelle1, organelle2, ..., organellek or unknown.]

[Plot: Principal Component Analysis Plot, PC1 (64.36%) vs. PC2 (22.34%).]

Supervised machine learning

Labelled marker proteins are used to match unlabelled proteins (of unknown localisation) with similar profiles; these are then classified as residents of the matching markers' organelle class.
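As a hedged illustration of this idea, the sketch below classifies one unlabelled profile by majority vote among its k nearest marker profiles (k-nearest neighbours with Euclidean distance). It is not the pRoloc implementation; the marker names, profiles and the functions `euclidean` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(profile, marker_profiles, marker_labels, k=3):
    """Return (predicted class, vote fraction) from the k nearest markers."""
    ranked = sorted(
        marker_profiles,
        key=lambda name: euclidean(profile, marker_profiles[name]),
    )
    votes = Counter(marker_labels[name] for name in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical marker profiles over four fractions.
marker_profiles = {
    "m1": [0.60, 0.25, 0.10, 0.05],
    "m2": [0.55, 0.30, 0.10, 0.05],
    "m3": [0.58, 0.27, 0.10, 0.05],
    "m4": [0.05, 0.10, 0.25, 0.60],
    "m5": [0.05, 0.15, 0.25, 0.55],
    "m6": [0.06, 0.10, 0.30, 0.54],
}
marker_labels = {"m1": "ER", "m2": "ER", "m3": "ER",
                 "m4": "PM", "m5": "PM", "m6": "PM"}

prediction = knn_classify([0.57, 0.28, 0.10, 0.05],
                          marker_profiles, marker_labels)
```

The vote fraction returned alongside the class is one simple notion of a classification score; other classifiers, such as support vector machines, produce their own scores.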

slide-16
SLIDE 16

Supervised ML

[Plots: two PCA panels, PC1 (48.41%) vs. PC2 (23.85%), titled "Organelle markers" and "Classification (SVM)". Classes: 40S Ribosome, 60S Ribosome, Actin cytoskeleton, Cytosol, Endoplasmic reticulum/Golgi apparatus, Endosome, Extracellular matrix, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Non−chromatin, Peroxisome, Plasma membrane, Proteasome, unknown.]

Figure : Support vector machines classifier (after classification cutoff) on the embryonic stem cell data from Christoforou et al. (2016).
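The "classification cutoff" mentioned in the caption can be pictured as a score threshold: predictions whose score falls below it stay unassigned. The sketch below is only an assumption about how such a cutoff might be applied; the function `apply_cutoff`, the protein names, the scores and the single global threshold of 0.8 are all hypothetical.

```python
def apply_cutoff(predictions, cutoff=0.8):
    """Keep a predicted class only if its score reaches the cutoff."""
    return {
        protein: (label if score >= cutoff else "unknown")
        for protein, (label, score) in predictions.items()
    }


# Hypothetical (class, score) pairs, e.g. produced by an SVM or k-NN classifier.
predictions = {
    "protA": ("Mitochondrion", 0.95),
    "protB": ("Cytosol", 0.55),
    "protC": ("Lysosome", 0.80),
}
assigned = apply_cutoff(predictions, cutoff=0.8)
```

A stricter cutoff assigns fewer proteins but with higher confidence, which is why better sub-cellular resolution translates into more confident assignments.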

slide-17
SLIDE 17

Importance of annotation

[Plot: PCA, PC1 (58.53%) vs. PC2 (29.96%). Classes: ER/Golgi, mitochondrion, PM, unknown.]

Incomplete annotation, and therefore lack of training data, for many/most organelles. Drosophila data from Tan et al. (2009).

slide-18
SLIDE 18

Semi-supervised learning: novelty detection

[Plots: two PCA panels, PC1 (58.53%) vs. PC2 (29.96%). Left classes: ER/Golgi, mitochondrion, PM, unknown. Right classes: Cytoskeleton, ER, Golgi, Lysosome, mitochondrion, Nucleus, Peroxisome, PM, Proteasome, Ribosome 40S, Ribosome 60S.]

Figure : Left: Original Drosophila data from Tan et al. (2009). Right: After semi-supervised learning and classification, Breckels et al. (2013).

slide-19
SLIDE 19

Improving on LOPIT

Improving means obtaining better sub-cellular resolution, which increases the number of proteins that can be confidently assigned to a sub-cellular niche.

[Plots: two PCA panels. Left: PC1 (40.28%) vs. PC2 (25.7%), classes 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome, unknown. Right: PC1 (48.41%) vs. PC2 (23.85%), classes 40S Ribosome, 60S Ribosome, Actin cytoskeleton, Cytosol, Endoplasmic reticulum/Golgi apparatus, Endosome, Extracellular matrix, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Non−chromatin, Peroxisome, Plasma membrane, Proteasome, unknown.]

Figure : E14TG2a embryonic stem cells: old (left) vs. new, better resolved (right) experiments (Christoforou et al., 2016).

slide-20
SLIDE 20

Improving on LOPIT

Improving means obtaining better sub-cellular resolution, which increases the number of proteins that can be confidently assigned to a sub-cellular niche ⇒ biological discoveries.

◮ LOPIT: Dunkley et al. (2006), Gatto et al. (2014a)
◮ Computational: transfer learning: Breckels et al. (2016a)
◮ Experimental: hyperLOPIT: Christoforou et al. (2016), Mulvey et al. (2017)
◮ See also Breckels et al. (2016b)

⇒ Biological discoveries

slide-21
SLIDE 21

Experimental advances: hyperLOPIT

Figure : From Mulvey et al. (2017) Using hyperLOPIT to perform high-resolution mapping of the spatial proteome.

slide-22
SLIDE 22

[Plots: two PCA panels. Left: PC1 (40.28%) vs. PC2 (25.7%), classes 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome, unknown. Right: PC1 (50.56%) vs. PC2 (24.34%), classes 40S Ribosome, 60S Ribosome, Actin cytoskeleton, Cytosol, Endoplasmic reticulum/Golgi apparatus, Endosome, Extracellular matrix, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Non−chromatin, Peroxisome, Plasma membrane, Proteasome, unknown.]

Figure : E14TG2a LOPIT on 8 fractions (using iTRAQ 8-plex) and 1109 proteins vs. hyperLOPIT on 10 fractions (using TMT 10-plex) and SPS-MS3 for 5032 proteins.

slide-23
SLIDE 23

Computational advances: Transfer learning

What about using additional data, such as annotations from the Gene Ontology (GO), sequence features (pseudo amino acid composition), signal peptides, trans-membrane domains (length, number, ...), images (IF, FP), prediction software, ...? Compared with experimental data, such auxiliary data are:

◮ From a user perspective: "free/cheap" vs. expensive and time-consuming experiments.
◮ Abundant (all proteins, 100s of features) vs. (experimentally) limited/targeted (1000s of proteins, 6 to 20 features).
◮ For localisation in the system at hand: low vs. high quality.
◮ Static vs. dynamic.

slide-24
SLIDE 24

Transfer learning

What about annotation data from repositories such as the Gene Ontology (GO), sequence features, signal peptides, transmembrane domains, images, prediction software, ...?

Transfer learning

Support/complement the primary target domain (experimental data) with features from auxiliary data (annotation, imaging, PPI, ...) without compromising the integrity of our primary data (Breckels et al., 2016a).

slide-25
SLIDE 25

[Workflow figure. Primary experimental data: cell lysis, fractionation/centrifugation, then quantitation/identification by mass spectrometry (organelle label shown: e.g. Mitochondrion). Auxiliary dry data: database query, extract GO CC terms, convert terms to binary. Both data sets are then visualised.]

PRIMARY EXPERIMENTAL DATA (proteins x1 ... xn)

          X113    X114   X115    X116   X117   X118    X119    X121
O00767    0.1361  0.150  0.1062  0.147  0.277  0.1429  0.0380  0.00338
P51648    0.1914  0.205  0.0566  0.165  0.237  0.0996  0.0180  0.02727
Q2TAA5    0.1297  0.201  0.0546  0.146  0.292  0.1463  0.0206  0.00902
Q9UKV5    0.0939  0.207  0.0419  0.204  0.344  0.1098  0.0000  0.00000
...       ...     ...    ...     ...    ...    ...     ...     ...

AUXILIARY DRY DATA (proteins x1 ... xn, terms GO1 ... GOA)

          GO:0016021  GO:0005789  GO:0005783  ...
O00767    1           1           1           ...
P51648    1           1           0           ...
Q2TAA5    1           1           0           ...
Q9UKV5    0           0           0           ...
...       ...         ...         ...         ...
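The "convert terms to binary" step can be sketched as follows: each protein becomes a row of 0/1 flags, one per GO cellular component term. This is a minimal illustration, not the code used for the slides; the function `go_to_binary` is hypothetical and the annotation sets are only an illustrative reading of the example rows above.

```python
def go_to_binary(annotations):
    """Build a protein x GO-term 0/1 matrix from a dict of term sets."""
    terms = sorted({term for ts in annotations.values() for term in ts})
    matrix = {
        protein: [1 if term in ts else 0 for term in terms]
        for protein, ts in annotations.items()
    }
    return terms, matrix


# Illustrative GO CC annotations for four proteins and three terms.
annotations = {
    "O00767": {"GO:0016021", "GO:0005789", "GO:0005783"},
    "P51648": {"GO:0016021", "GO:0005789"},
    "Q2TAA5": {"GO:0016021", "GO:0005789"},
    "Q9UKV5": set(),
}
terms, matrix = go_to_binary(annotations)
```

Here the columns are ordered alphabetically by term identifier; a protein with no annotation for any retained term gets an all-zero row.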

slide-26
SLIDE 26

Transfer learning, based on Wu and Dietterich (2004), where P and A denote the primary and auxiliary data:

Class-weighted kNN

$$V(c_i)_j = \theta^{*} n^{P}_{ij} + (1 - \theta^{*})\, n^{A}_{ij}$$

[Plot: PCA, PC1 (40.28%) vs. PC2 (25.7%). Classes: 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome, unknown.]

Linear programming SVM

$$f(x, v; \alpha^{P}, \alpha^{A}, b) = \sum_{l=1}^{m} y_l \left[ \alpha^{P}_{l} K^{P}(x_l, x) + \alpha^{A}_{l} K^{A}(v_l, v) \right] + b$$
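The class-weighted kNN vote can be illustrated with a short sketch. It assumes that n^P and n^A count, for one protein, how many of its k nearest neighbours in the primary and auxiliary data belong to each class, and that a separate weight θ is available per class (an assumption suggested by the "class-weighted" name). The function `combined_votes`, the neighbour lists and the weights are hypothetical; how the weights are optimised is not shown.

```python
from collections import Counter


def combined_votes(primary_neighbours, auxiliary_neighbours, theta):
    """Weighted vote per class: theta * n_P + (1 - theta) * n_A."""
    n_p = Counter(primary_neighbours)
    n_a = Counter(auxiliary_neighbours)
    return {
        cls: w * n_p[cls] + (1 - w) * n_a[cls]
        for cls, w in theta.items()
    }


# Hypothetical classes of the k = 3 nearest neighbours of one protein
# in the primary (experimental) and auxiliary (annotation) data.
primary = ["ER", "ER", "PM"]
auxiliary = ["PM", "PM", "PM"]
theta = {"ER": 1.0, "PM": 1 / 3}  # hypothetical class weights

votes = combined_votes(primary, auxiliary, theta)
winner = max(votes, key=votes.get)
```

A weight of 1 means a class is decided from the primary data alone, while smaller weights let the auxiliary neighbours contribute more to that class.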
slide-27
SLIDE 27

[Figure, panels A to E: two PCA plots (PC1 3.43%, PC2 2.08%; and PC1 40.28%, PC2 25.7%); F1 scores for the Combined, Primary and Auxiliary classifiers, overall and per class; classifier weight (1/3, 2/3, 1) per class. Classes: 40S Ribosome, 60S Ribosome, Cytosol, Endoplasmic reticulum, Lysosome, Mitochondrion, Nucleus − Chromatin, Nucleus − Nucleolus, Plasma membrane, Proteasome.]

Data from mouse stem cells (E14TG2a).

slide-28
SLIDE 28

[Plots: two PCA panels (PC1 40.28%, PC2 25.7%; and PC1 48.41%, PC2 23.85%); scores for knn, knn−TL, svm and svm−TL by outcome (correct, incorrect); ROC curves, FPR (1 − specificity) vs. TPR (sensitivity), for k−NN, k−NN TL (Breckels), k−NN TL (Wu), SVM and SVM TL (Breckels).]

Figure : From Breckels et al. (2016a), Learning from heterogeneous data sources: an application in spatial proteomics.

slide-29
SLIDE 29

Biological discoveries

◮ Multi-localisation
◮ Trans-localisation

Dependent on good sub-cellular resolution.

slide-30
SLIDE 30

Dual-localisation

Proteins may be present simultaneously in several organelles (e.g. trafficking). Simulation on A. thaliana data from Dunkley et al. (2006) (Gatto et al., 2014b) (left). Example from embryonic stem cells (Christoforou et al., 2016) (right).

[Plot: PCA, PC1 (64.36%) vs. PC2 (22.34%). Classes: ER lumen, ER membrane, Golgi, Mitochondrion, Plastid, PM, Ribosome, TGN, vacuole, unknown.]
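One simple way to picture a dual-localised protein is as a weighted mixture of two organelle profiles. The sketch below is an illustrative assumption, not necessarily how the simulation cited above was performed; the function `mix_profiles` and the two profiles are hypothetical.

```python
def mix_profiles(profile_a, profile_b, proportion_a):
    """Profile of a protein split between two organelles."""
    return [
        proportion_a * a + (1 - proportion_a) * b
        for a, b in zip(profile_a, profile_b)
    ]


# Hypothetical organelle profiles over four fractions (each sums to 1).
er = [0.60, 0.25, 0.10, 0.05]
pm = [0.05, 0.10, 0.25, 0.60]

# A protein present in equal amounts in both compartments.
mixed = mix_profiles(er, pm, 0.5)
```

Varying the proportion from 0 to 1 moves the mixed profile from one organelle cluster to the other, so such proteins tend to fall between clusters on a PCA plot.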
slide-31
SLIDE 31

Dual-localisation (continued)

From Betschinger et al. (2013)

[Plot: Mouse ESC (E14TG2a) in serum LIF, PCA, PC1 (50.05%) vs. PC2 (24.61%), with Tfe3 highlighted. Classes: Actin cytoskeleton, Cytosol, Endosome, ER/GA, Extracellular matrix, Lysosome, Mitochondria, Nucleus − Chromatin, Nucleus − Nucleolus, Peroxisome, Plasma Membrane, Proteasome, Ribosome 40S, Ribosome 60S, unknown.]
slide-32
SLIDE 32

Spatial dynamics

Trans-localisation event during monocyte to macrophage differentiation

Investigating the effect of the LPS-mediated inflammatory response in human monocytic cells (THP-1).

Data

◮ Triplicate temporal profiling (0, 2, 4, 6, 12, 24 hours).
◮ Triplicate spatial profiling (0 vs 12 hours): early trafficking, before actual morphological differentiation at 24h.

Work led by Dr Claire Mulvey, Cambridge Centre for Proteomics.

slide-33
SLIDE 33

[Plots: two PCA panels. "Unstimulated": PC1 (36.64%) vs. PC2 (20.7%). "LPS 12hrs": PC1 (37.4%) vs. PC2 (19.23%). Classes: Cytosol, Endoplasmic Reticulum, Golgi Apparatus, Lysosome, Mitochondria, Nucleus, Peroxisome, plasma mem, unknown.]

Figure : Spatial maps: unstimulated and LPS-treated.

slide-34
SLIDE 34

[Plots: two PCA panels with PKCA and PKCB highlighted. "Unstimulated": PC1 (36.64%) vs. PC2 (20.7%). "LPS 12hrs": PC1 (37.4%) vs. PC2 (19.23%). Classes: Cytosol, Endoplasmic Reticulum, Golgi Apparatus, Lysosome, Mitochondria, Nucleus, Peroxisome, plasma mem, unknown.]

Figure : Relocation of Protein Kinase C alpha and beta from the cytosol to the plasma membrane, driving maturation into a differentiated macrophage phenotype.
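Comparing two spatial maps can be reduced, in its simplest form, to listing proteins that are confidently assigned in both conditions but to different compartments. The sketch below is an illustrative assumption rather than the analysis behind the figure; the function `relocated` and the proteins X1 and X2 are hypothetical, while the PKCA and PKCB assignments follow the caption above.

```python
def relocated(before, after):
    """Proteins assigned in both maps, but to different classes."""
    return {
        protein: (before[protein], after[protein])
        for protein in before
        if protein in after
        and "unknown" not in (before[protein], after[protein])
        and before[protein] != after[protein]
    }


# Class assignments per condition; X1 and X2 are hypothetical proteins.
unstimulated = {"PKCA": "Cytosol", "PKCB": "Cytosol",
                "X1": "Nucleus", "X2": "unknown"}
lps_12h = {"PKCA": "plasma mem", "PKCB": "plasma mem",
           "X1": "Nucleus", "X2": "Lysosome"}

moves = relocated(unstimulated, lps_12h)
```

Proteins that are unassigned in either condition are deliberately excluded, since a change from or to "unknown" is not evidence of relocation.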

slide-35
SLIDE 35

[Plots: two PCA panels with STAT2, STAT3 and STAT6 highlighted. "Unstimulated": PC1 (36.64%) vs. PC2 (20.7%). "LPS 12hrs": PC1 (37.4%) vs. PC2 (19.23%). Classes: Cytosol, Endoplasmic Reticulum, Golgi Apparatus, Lysosome, Mitochondria, Nucleus, Peroxisome, plasma mem, unknown.]

Figure : Relocation of Signal transducer and activator of transcription 6 (STAT6) from the cytosol to the nucleus, activating anti-bacterial and anti-viral-like response. Validated by microscopy; see also Chen et al. (2011).

slide-36
SLIDE 36

Beyond organelles: application to PPI/Protein complexes

[Plot: PCA titled "markers", PC1 (47.02%) vs. PC2 (22.25%). Classes: 14−3, 19S, 20S, 40S, 60S, CCT, eIF3, Ku70/Ku80, PA28, Rab, unknown.]

Figure : Data on proteasome complexes from Fabre et al. Mol Syst Biol (2015), DOI: 10.15252/msb.20145497


slide-38
SLIDE 38

But none of this would matter if it wasn’t reproducible! Try it out yourselves:

slide-39
SLIDE 39

R/Bioconductor:

◮ Software for spatial proteomics.
◮ Ecosystem for high throughput biology data analysis and comprehension.

slide-40
SLIDE 40

Software for mass spectrometry and (spatial) proteomics

Bioconductor: open source, enabling reproducible research and understanding of the data (not a black box), and driving scientific innovation.

◮ mzR – low level access to raw and identification mass spectrometry data (Chambers et al., 2012).
◮ MSnbase – infrastructure to handle quantitative data and meta-data (Gatto and Lilley, 2012) (500 unique IP downloads/month in 2016).
◮ pRoloc and pRolocGUI – dedicated visualisation and ML infrastructure for spatial proteomics (Gatto et al., 2014a) (200 unique IP downloads/month in 2016). Try it out at https://lgatto.shinyapps.io/christoforou2015/
◮ pRolocdata – structured and annotated spatial proteomics data (Gatto et al., 2014a).
◮ And more generally RforProteomics (Gatto and Christoforou, 2014) (160 unique IP downloads/month in 2016).

slide-41
SLIDE 41

MSnbase mzR pRoloc pRolocGUI pRolocdata

slide-42
SLIDE 42

http://www.bioconductor.org

slide-43
SLIDE 43

Bioconductor: open source and coordinated open development, enabling reproducible research and understanding of the data (not a black box), and driving scientific innovation.

◮ Bioconductor core team (led by Dr. Martin Morgan)
◮ Common infrastructure
◮ Common documentation standards
◮ Common testing infrastructure
◮ Open package technical peer review

Quick getting started guide: https://lgatto.github.io/2017_11_09_Rcourse_Jena/navigating-the-bioconductor-project.html

slide-44
SLIDE 44

Figure : Dependency graph containing 41 MS and proteomics-tagged packages (out of 100+) and their dependencies. Showing all packages and deps would produce a big hairball.

slide-45
SLIDE 45

MSnbase example

Figure : Contributions to the MSnbase package since its creation, the last one leading to common proteomics/metabolomics infrastructure.

More details: https://lgatto.github.io/msnbase-contribs/

slide-46
SLIDE 46

References I

J Betschinger, J Nichols, S Dietmann, P D Corrin, P J Paddison, and A Smith. Exit from pluripotency is gated by intracellular redistribution of the bHLH transcription factor Tfe3. Cell, 153(2):335–47, Apr 2013. doi: 10.1016/j.cell.2013.03.012.

L M Breckels, S B Holden, D Wojnar, C M Mulvey, A Christoforou, A Groen, M W Trotter, O Kohlbacher, K S Lilley, and L Gatto. Learning from heterogeneous data sources: An application in spatial proteomics. PLoS Comput Biol, 12(5):e1004920, May 2016a. doi: 10.1371/journal.pcbi.1004920.

LM Breckels, L Gatto, A Christoforou, AJ Groen, KS Lilley, and MW Trotter. The effect of organelle discovery upon sub-cellular protein localisation. J Proteomics, 88:129–40, Aug 2013.

LM Breckels, CM Mulvey, KS Lilley, and L Gatto. A Bioconductor workflow for processing and analysing spatial proteomics data [version 1; referees: awaiting peer review]. F1000Research, 5(2926), 2016b. doi: 10.12688/f1000research.10411.1.

MC Chambers et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol, 30(10):918–20, Oct 2012.

H Chen, H Sun, F You, W Sun, X Zhou, L Chen, J Yang, Y Wang, H Tang, Y Guan, W Xia, J Gu, H Ishikawa, D Gutman, G Barber, Z Qin, and Z Jiang. Activation of STAT6 by STING is critical for antiviral innate immunity. Cell, 147(2):436–46, Oct 2011. doi: 10.1016/j.cell.2011.09.022.

A Christoforou, C M Mulvey, L M Breckels, A Geladaki, T Hurrell, P C Hayward, T Naake, L Gatto, R Viner, A Martinez Arias, and K S Lilley. A draft map of the mouse pluripotent stem cell spatial proteome. Nat Commun, 7:8992, Jan 2016. doi: 10.1038/ncomms9992.

TPJ Dunkley, S Hester, IP Shadforth, J Runions, T Weimar, SL Hanton, JL Griffin, C Bessant, F Brandizzi, C Hawes, RB Watson, P Dupree, and KS Lilley. Mapping the Arabidopsis organelle proteome. PNAS, 103(17):6518–6523, Apr 2006.

LJ Foster, CL de Hoog, Y Zhang, Y Zhang, X Xie, VK Mootha, and M Mann. A mammalian organelle map by protein correlation profiling. Cell, 125(1):187–199, Apr 2006.

L Gatto and A Christoforou. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta, 1844(1 Pt A):42–51, Jan 2014.

L Gatto and KS Lilley. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics, 28(2):288–9, Jan 2012.

slide-47
SLIDE 47

References II

L Gatto, JA Vizcaino, H Hermjakob, W Huber, and KS Lilley. Organelle proteomics experimental designs and analysis. Proteomics, 2010.

L Gatto, L M Breckels, S Wieczorek, T Burger, and K S Lilley. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics, Jan 2014a.

L Gatto, LM Breckels, T Burger, DJ Nightingale, AJ Groen, C Campbell, N Nikolovski, CM Mulvey, A Christoforou, M Ferro, and KS Lilley. A foundation for reliable spatial proteomics data analysis. MCP, 13(8):1937–52, Aug 2014b.

TR Kau, JC Way, and PA Silver. Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer, 4(2):106–17, Feb 2004.

K Laurila and M Vihinen. Prediction of disease-related mutations affecting protein localization. BMC Genomics, 10:122, 2009.

C M Mulvey, L M Breckels, A Geladaki, N K Britovšek, DJH Nightingale, A Christoforou, M Elzek, M J Deery, L Gatto, and K S Lilley. Using hyperLOPIT to perform high-resolution mapping of the spatial proteome. Nat Protoc, 12(6):1110–1135, Jun 2017. doi: 10.1038/nprot.2017.026.

DJL Tan, H Dvinge, A Christoforou, P Bertone, A Arias Martinez, and KS Lilley. Mapping organelle proteins and protein complexes in Drosophila melanogaster. J Proteome Res, 8(6):2667–2678, Jun 2009.

P Wu and TG Dietterich. Improving SVM accuracy by training on auxiliary data sources. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, New York, NY, USA, 2004. ACM.

slide-48
SLIDE 48

Acknowledgements

◮ Lisa Breckels, Computational Proteomics Unit, Cambridge (ML, algo)
◮ Kathryn Lilley, Cambridge Centre for Proteomics (Proteomics)
◮ Funding: BBSRC, Wellcome Trust
◮ Slides: https://zenodo.org/record/1063508
◮ License: CC-BY

Thank you for your attention