Biomedical Data I Kelly Ruggles, PhD Methods in Quantitative - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Biomedical Data I

Kelly Ruggles, PhD Methods in Quantitative Biology

slide-2
SLIDE 2

Biomedical Data Types

Mass Spectrometry, Next Generation Sequencing, Imaging, Clinical

slide-3
SLIDE 3

Biomedical Data Types

Mass Spectrometry, Next Generation Sequencing, Imaging, Clinical

Molecular Data

slide-4
SLIDE 4

Diversity of Omics in Biomedicine

Mutation calls, Copy Number, Gene Expression, DNA methylation/Epigenetics, MicroRNA, Metabolomics, Phenotype Data, Proteomics, Phosphoproteomics

  • Genome
  • Long term information storage
  • Transcriptome
  • Retrieval of information
  • Proteome
  • Short term information storage
  • Interactome
  • Signaling networks
  • Metabolome, Lipidome
  • State
slide-5
SLIDE 5

MultiOmics

Ellis et al., Cancer Discovery 2013

slide-6
SLIDE 6

Next Generation Sequencing

  • Based on the dideoxy method but incorporates additional innovations:
  • Each nucleotide is attached to a removable fluorescent molecule and a chain-terminating chemical adduct (instead of a 3’ OH group)
  • The complementary nucleotide is covalently incorporated and a picture is taken
  • The label and 3’-OH blocking group are enzymatically washed away
  • A DNA library is constructed using PCR amplification of all DNA fragments in the genome
  • Each amplified DNA fragment is attached to a solid support
  • Each cluster contains about 1000 identical copies of a small piece of the genome

slide-7
SLIDE 7

Next Generation Sequencing (NGS)

Figure labels: Sanger, NGS

Illumina and NGS: Since 2005 the data output of NGS has more than doubled each year, and the $1000 genome makes genomics-integrated personalized medicine a real possibility

slide-8
SLIDE 8

Paired-end Sequencing

  • Sequencing both ends of the DNA fragments in the sequence library

  • Aligning forward and reverse reads as read pairs
  • Produces double the reads for the same time/effort
  • Alignment of read pairs is more accurate
slide-9
SLIDE 9

Multiplexing

  • Allows libraries to be pooled and sequenced simultaneously during a single sequencing run
  • Unique index sequences are added to each DNA fragment during library preparation so they can be identified during data analysis
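
The index-based identification described above can be sketched as follows. This is a minimal illustration, not an instrument's actual demultiplexing routine: the sample names, the 4-base index sequences, and the convention that the index sits at the start of each read are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical index sequences assigned to each library during preparation.
INDEX_TO_SAMPLE = {"ACGT": "sample_A", "TGCA": "sample_B"}
INDEX_LENGTH = 4  # assumed index length for this sketch


def demultiplex(reads):
    """Group pooled reads by the index found at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:INDEX_LENGTH], read[INDEX_LENGTH:]
        # Reads whose index matches no library are set aside.
        sample = INDEX_TO_SAMPLE.get(index, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


pooled = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCGGG", "NNNNTTTTTT"]
demuxed = demultiplex(pooled)
```

Exact matching is the simplest choice; tolerating a mismatch in the index would be a natural extension.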

slide-10
SLIDE 10

NGS Instrumentation

Illumina HiSeq 4000, Illumina HiSeq 2500, Oxford Nanopore MinION, Illumina MiSeq

  • Whole genomes, exomes, targeted sequencing
  • RNA-Seq
  • Chromatin Immunoprecipitation sequencing (ChIP-Seq)
  • High output but not as flexible as HiSeq 2500
  • Whole genomes, exomes, targeted sequencing
  • RNA-Seq
  • ChIP-Seq
  • Assay for transposase-accessible chromatin with high-throughput seq (ATAC-Seq)
  • Metagenomics
  • 16S Ribosomal sequencing
  • Higher cost per Mb compared to HiSeq but fastest run times and longest Illumina read lengths
  • Long reads sequencing
  • Bacterial and viral genomes
  • VERY small (USB device), low cost and extremely long reads but very high error rates

slide-11
SLIDE 11

Oxford Nanopore MinION

  • Nanopore sequences the fragment of any length
  • Streams data in real time

https://nanoporetech.com/applications/dna-nanopore-sequencing

slide-12
SLIDE 12

NGS Methods

  • Whole Genome Sequencing
  • Exome Sequencing (2% of the genome)
  • De novo sequencing (no reference sequence available)
  • Short insert paired end (high coverage fills in gaps)
  • Long-insert mate pair sequencing
  • RNA-Seq
  • Methylation Sequencing
  • ChIP Sequencing
  • Ribosome profiling
slide-13
SLIDE 13

Publicly Available ‘Omics Datasets

TCGA: The Cancer Genome Atlas

  • Generated comprehensive genomic maps of 33 tumor types
  • Includes copy number, RNA-seq, methylation, miRNA, mutation calls, etc.

IGSR: The International Genome Sample Resource

  • Started as the “1000 genomes project”
  • Cataloging human genetic variants across 2,000+ samples globally

ENCODE: Encyclopedia of DNA Elements

  • Goal is to build a comprehensive parts list of functional elements in the human genome
  • Includes ChIP-seq, RNA-seq, Hi-C, 5C, DNase-seq, ATAC-seq, methylation, etc.

LINCS: Library of Integrated Cellular Signatures

  • Catalogs how cells respond to 16,000+ genetic and environmental stressors
  • Gene expression, proteomics, phosphorylation, ...

slide-14
SLIDE 14

Understanding Gene Regulation and Epigenetics

ChIP-Seq

  • Chromatin is immunoprecipitated and the recovered DNA is sequenced
  • Identifies binding sites of DNA-associated proteins

DNase-Seq/FAIRE-Seq

  • Identifies DNaseI hypersensitive sites (open chromatin = active genes)

Hi-C/5C

  • DNA crosslinked and sequenced
  • Spatial organization of chromatin (promoter/enhancer regions)

Bisulfite Sequencing (WGBS, RRBS)

  • Reads methylation status at the genome level

slide-15
SLIDE 15

Assessing Copy Number and Mutation Status by Genome Sequencing

Workflow: Tumor Sample → Genomic DNA Isolation → Library Preparation → Load on Flow Cell → Next Generation Sequencing → Sequence Alignment

Single Nucleotide Polymorphisms (SNPs)

  • Single base-pair sites that vary in a population (figure example: T/C)
  • Have been found to act as “drivers” of tumor progression

Copy Number Variation (CNV)

  • Changes in the genome due to duplication or deletion of large regions of DNA

slide-16
SLIDE 16

SNPs and Disease

  • Mendelian (monogenic) vs. multigenic
  • For multigenic studies, DNA is collected from a large number of people with the disease and compared to those who do not have the disease.
  • Associated SNPs indicate that a nearby allele is likely responsible for the increased risk (based on DNA linkage)

Genome Wide Association Study!

slide-17
SLIDE 17

Common Disease Common Variant Hypothesis

  • Predicts that common disease-causing alleles (variants) will be found in all human populations which manifest a given disease.

slide-18
SLIDE 18

Genome-Wide Association Studies (GWAS)

  • Measures and analyzes DNA sequence variations across the genome to identify genetic risk factors for common diseases
  • Has also been used to identify genetic associations with drug metabolism
  • SNP arrays are run on each person (1+ million SNPs)
  • Typically a case-control design
  • Allele count for each SNP is evaluated and a chi-squared test is used to identify variants associated with the trait
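
As a rough illustration of the allele-count test in the last bullet, the sketch below runs a chi-squared test with one degree of freedom on a 2x2 table of allele counts for a single SNP. The counts are invented for the example, and no continuity correction or multiple-testing adjustment is applied, both of which a real GWAS would have to consider.

```python
import math


def allelic_chi_squared(case_counts, control_counts):
    """Chi-squared test (1 d.f.) on a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele B).
    Returns (chi-squared statistic, p-value).
    """
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: cases carry allele A more often than controls.
chi2, p = allelic_chi_squared(case_counts=(60, 40), control_counts=(40, 60))
```

With roughly a million SNPs tested per study, the per-SNP p-value has to clear a far stricter threshold than the usual 0.05.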

slide-19
SLIDE 19

SNP arrays

  • An array of 25 bp oligonucleotide sequences is laid across the chip surface
  • Sample’s DNA is amplified and a marker is attached
  • DNA is hybridized to the array
  • Array is scanned to quantify the relative amount bound

slide-20
SLIDE 20

GWAS Example: APOE epsilon 4 and Alzheimer’s

  • APOE is an apolipoprotein
  • Essential for the normal catabolism of TG-rich lipoproteins
  • Mediates cholesterol metabolism
  • Transports cholesterol to neurons
  • Original study measured 502,627 SNPs in over 1000 AD cases/controls
  • Found APOE locus (SNP rs4420638, 14 kb distal to APOE) as having association with late onset AD

Coon K et al. (2007) J Clin Psychiatry 68(4):613-8

slide-21
SLIDE 21

VCF File Format

# Meta-information lines

Columns:

  • 1. Chromosome
  • 2. Position
  • 3. ID (ex: dbSNP)
  • 4. Reference base
  • 5. Alternative allele
  • 6. Quality score
  • 7. Filter (PASS=passed filters)
  • 8. Info (ex: SOMATIC, VALIDATED..)
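
The eight columns listed above can be read with a few lines of code. The sketch below is a simplified reader and not a full VCF parser: it assumes tab-delimited lines, ignores any genotype columns after INFO, and uses an invented example record (the ID `rsEXAMPLE` is a placeholder).

```python
def parse_info(info):
    """INFO is semicolon-separated; entries are KEY=VALUE pairs or bare flags."""
    parsed = {}
    if info == ".":
        return parsed
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        parsed[key] = value if value else True
    return parsed


def parse_vcf_line(line):
    """Split one tab-delimited VCF data line into the eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternative alleles may be listed
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": parse_info(info),
    }


def read_vcf(lines):
    """Yield parsed records, skipping meta-information and header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Invented example content, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\trsEXAMPLE\tT\tC\t50\tPASS\tSOMATIC;DP=30",
]
records = list(read_vcf(example))
passed = [r for r in records if r["filter"] == "PASS"]
```
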
slide-22
SLIDE 22

SNP Databases

  • dbSNP: full collection of all SNPs identified
  • COSMIC: Database of somatic mutations in human cancer
  • openSNP: allows you to upload your SNP data (and make it publicly available!). Can be downloaded by researchers.
  • IGSR: Started as the 1000 genomes project, now contains data from over 3K individuals from around the world
  • GO exome: SNP database from lung, heart and blood disorder patients

slide-23
SLIDE 23

Methods involved in SNP calling

General Steps:

  • Align to genome reference
  • Alignment recalibration
  • Raw variant calling
  • Quality assignment
  • Variant filtering

Variant Calling Pipelines:

  • VarScan
  • Pindel
  • Somatic Sniper
  • Radia
  • Muse
  • MuTect
  • Indelocator

Ellrott et al., 2018

slide-24
SLIDE 24

Assessing Copy Number and Mutation Status by Genome Sequencing

Gene Expression

  • Normalized expression of genes in all samples
  • Can be used for differential expression analysis

Alternative Splicing

  • Splicing of exons, creating new protein isoforms
  • Alternative splicing changes are frequently found in cancer
  • Loss of functional domains may also be a disease driver

Workflow: Tumor Sample → RNA Isolation → Library Preparation → Load on Flow Cell → Next Generation Sequencing → Sequence Alignment

slide-25
SLIDE 25

Microarrays: Studying the expression of groups of genes

  • Allows one to measure the expression of thousands of genes at a time
  • Has been used to compare patterns of gene expression in different tissues, at different times, and under different conditions.

Steps:

  • 1. Isolate mRNA.
  • 2. Make cDNA by reverse transcription, using fluorescently labeled nucleotides.
  • 3. Apply the cDNA mixture to a microarray, a different gene in each spot. The cDNA hybridizes with any complementary DNA on the microarray.
  • 4. Rinse off excess cDNA; scan microarray for fluorescence. Each fluorescent spot represents a gene expressed in the tissue sample.

Figure labels: Tissue sample; mRNA molecules; Labeled cDNA molecules (single strands); DNA fragments representing specific genes; DNA microarray with 2,400 human genes

slide-26
SLIDE 26

RNA-Seq

slide-27
SLIDE 27

RNA-Seq

  • Uses NGS to reveal the presence and quantity of RNA in a sample at a specific point in time
  • Alternative splicing, post-transcriptional modifications, gene fusions, SNPs/mutations

  • Gene annotation
  • Coding and non-coding RNA
slide-28
SLIDE 28

Library Construction

slide-29
SLIDE 29

RNA-Seq Methods

  • RNAs are converted into a cDNA fragment library
  • Sequence adapters (blue) are added to cDNA fragments
  • Short sequence reads from each cDNA are obtained
  • Reads are aligned to a reference sequence and classified as exonic reads, junction reads or poly(A) end-reads
  • Used to generate a base-resolution expression profile for each gene

Wang et al, 2009

slide-30
SLIDE 30

RNA-Seq Data Analysis

Figure labels: Paired-end short reads; Alignment to genome; De Novo Assembly; Exon 1, Exon 2; Transcript X; Reference genome

slide-31
SLIDE 31

Coverage and Depth

NGS Category              Application               Recommended coverage (x) or reads (millions)
Whole Genome Sequencing   SNV detection             10-33x
Whole Genome Sequencing   CNV                       1-8x
Whole Exome Sequencing    SNV detection             100x
RNA-Seq                   Differential Expression   10-25 Million
RNA-Seq                   Alternative splicing      50-100 Million
RNA-Seq                   De novo assembly          >100 Million

https://genohub.com/recommended-sequencing-coverage-by-application/

  • Coverage: % of the reference genome sequenced
  • Depth: redundancy of coverage. On average, a base has been sequenced a certain number of times (10X, 20X, …)
  • Reads: the number of uniquely mapped reads
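
The two definitions above can be made concrete with a small calculation. The sketch below is illustrative only: the read count, read length, genome size, and read intervals are made-up numbers, and the per-base tally is written for clarity rather than for genome-scale data.

```python
def average_depth(num_reads, read_length, genome_size):
    """Average depth = total sequenced bases / size of the reference."""
    return num_reads * read_length / genome_size


def breadth_and_depth(read_intervals, genome_size):
    """Tally per-base depth from aligned reads given as (start, end), end exclusive.

    Returns (coverage as a fraction of the reference, mean depth).
    """
    per_base = [0] * genome_size
    for start, end in read_intervals:
        for pos in range(start, end):
            per_base[pos] += 1
    covered = sum(1 for d in per_base if d > 0)
    return covered / genome_size, sum(per_base) / genome_size


# Hypothetical run: 600 million 150 bp reads against a 3 Gb genome.
depth = average_depth(num_reads=600_000_000, read_length=150,
                      genome_size=3_000_000_000)

# Toy reference of 10 bases with two overlapping reads.
coverage, mean_depth = breadth_and_depth([(0, 5), (3, 8)], genome_size=10)
```

In the toy case, 8 of 10 bases are covered (80% coverage) even though the mean depth is 1X, which is why the two quantities are reported separately.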

slide-32
SLIDE 32

Comparing microarray and RNA-seq

Wang et al, 2009

slide-33
SLIDE 33

Applications

  • Disease transcriptomics
  • Fusion transcripts
  • Mutations
  • TCGA has characterized thousands of primary tumors
  • ENCODE has characterized dozens of cell lines
  • Genome annotation
  • RNA editing
  • Identifies novel and low frequency RNAs associated with disease processes

  • microRNA expression and disease
  • Have found miRNAs that are overexpressed in cancer
slide-34
SLIDE 34

BED File Format

Columns:

  • 1. Chromosome
  • 2. Chromosome Start
  • 3. Chromosome End
  • 4. Name
  • 5. Score
  • 6. Strand (+ or -)
  • 7-9. Display info
  • 10. # blocks (exons)
  • 11. Size of blocks
  • 12. Start of blocks
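
To show how columns 10 to 12 describe exons, the sketch below converts the block sizes and block starts of one 12-column BED line into genome coordinates. Block starts are offsets relative to the chromosome start in column 2. The record used here is hypothetical (name `example_tx`, coordinates chosen to be self-consistent) and the function name is an illustration, not part of any library.

```python
def parse_bed12(line):
    """Parse one whitespace-delimited BED12 line and locate its blocks."""
    f = line.split()
    chrom, start, end = f[0], int(f[1]), int(f[2])
    block_count = int(f[9])
    # Trailing commas are allowed in the size and start lists.
    sizes = [int(x) for x in f[10].split(",") if x]
    offsets = [int(x) for x in f[11].split(",") if x]
    if not (len(sizes) == len(offsets) == block_count):
        raise ValueError("block count does not match block lists")
    # Each block runs from start + offset to start + offset + size.
    blocks = [(start + off, start + off + size)
              for off, size in zip(offsets, sizes)]
    return {"chrom": chrom, "start": start, "end": end,
            "name": f[3], "score": int(f[4]), "strand": f[5],
            "blocks": blocks}


# Hypothetical record with three blocks.
line = "chr1 1000 25852 example_tx 1000 + 1000 25852 0 3 126,78,3 0,4509,24849"
record = parse_bed12(line)
```

The last block ends exactly at the chromosome end, which is a useful sanity check on any BED12 record.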
slide-35
SLIDE 35

BED File

Chr   Start     End        Name       Score  Strand  Display info        # blocks  Block size  Block start
chr5  11106617  111091469  NP_004763  1000   -       11106617 111091469  3         126, 78, 3  0, 4509, 24849

Chr 5, starting at 11106617

slide-36
SLIDE 36

BED File

Chr   Start     End        Name       Score  Strand  Display info        # blocks  Block size  Block start
chr5  11106617  111091469  NP_004763  1000   -       11106617 111091469  3         126, 78, 3  0, 4509, 24849

Block 1 (size 126, start 0) on Chr 5: from 11106617 + 0 to 11106617 + 126

slide-37
SLIDE 37

BED File

Chr   Start     End        Name       Score  Strand  Display info        # blocks  Block size  Block start
chr5  11106617  111091469  NP_004763  1000   -       11106617 111091469  3         126, 78, 3  0, 4509, 24849

Block 2 (size 78, start 4509) on Chr 5: from 11106617 + 4509 to 11106617 + 4509 + 78

slide-38
SLIDE 38

BED File

Chr   Start     End        Name       Score  Strand  Display info        # blocks  Block size  Block start
chr5  11106617  111091469  NP_004763  1000   -       11106617 111091469  3         126, 78, 3  0, 4509, 24849

Block 3 (size 3, start 24849) on Chr 5: from 11106617 + 24849 to 11106617 + 24849 + 3

slide-39
SLIDE 39

BED Junction Files

Chr 5: Block 1, Block 2, Block 3

  • Junction 1: Block 1 to Block 2
  • Junction 2: Block 2 to Block 3
  • Junction 3: Block 1 to Block 3

slide-40
SLIDE 40

Fusion Genes

Fusion Location

.…AGAACTGGAAGAATTGG*AATGGTAGATAACGCAGATCATCT..…

Find consensus sequence

slide-41
SLIDE 41

Gene Fusions

  • Hybrid genes that combine two or more original genes
  • Frequently act as drivers of cancer progression

Figure labels: Gene X (Exon 1, Exon 2); Gene Y (Exon 1, Exon 2); fusion of Gene X Exon 1 with Gene Y Exon 2; Chr 1, Chr 2

Mechanisms: Chromosomal translocation, Interstitial deletion, Chromosomal inversion

Stephens, et al. Complex landscape of somatic rearrangement in human breast cancer genomes. Nature 2009

slide-42
SLIDE 42

UCSC Genome Browser

http://genome.ucsc.edu/

Tracks shown: RNA-Seq: Expression; RNA-Seq: coverage; Global PNNL; Global WashU; Phospho PNNL; Somatic Variants; Germline Variants; RefSeq Genes; Alt. Splicing Junctions; Global Pep PNNL; Phospho PNNL; Global Pep WashU

slide-43
SLIDE 43

Protein Identification and Quantitation by Mass Spectrometry

Workflow steps (figure labels): Tumor Sample, Lysis, Digestion, Fractionation, Peptides, Tandem Mass Spectrometry (intensity vs. m/z), Identity, Quantity

Discovery Proteomics:

  • Used to measure global protein expression (whole cell proteome)
  • Can enrich for phosphopeptides to measure phosphorylation status

Targeted Proteomics:

  • Hypothesis driven analysis
  • Select proteins and representative peptides of these proteins to measure prior to run

slide-44
SLIDE 44

Traditional Affinity-based proteomics

Using antibodies to quantify proteins

Western Blot, RPPA, Immunofluorescence, Immunohistochemistry, ELISA

slide-45
SLIDE 45

Proteomics: Reverse Phase Protein Array (RPPA)

  • High-throughput immuno-based assay
  • Measures levels of protein expression and phosphorylation
  • Protein lysates are arrayed as microspots on glass slides and probed with ~200 antibodies

slide-46
SLIDE 46

Proteomic Spectral Identification

High throughput shotgun MS/MS

Requires no knowledge of peptides present, uses mass difference to determine next AA in peptide chain.

slide-47
SLIDE 47

Protein Identification and Quantitation by Mass Spectrometry

Workflow steps (figure labels): Tumor Sample, Lysis, Digestion, Fractionation, Peptides, Tandem Mass Spectrometry (intensity vs. m/z), Identity, Quantity

Database search:

  • 1. Pick Protein from the Protein Sequence DB
  • 2. in silico digestion
  • 3. Pick Peptide
  • 4. All fragment masses (m/z)
  • 5. Compare, Score, Test Significance
  • 6. Repeat for all proteins/peptides to find best match
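
A simplified sketch of the search loop follows. Several things here are assumptions rather than statements from the slide: trypsin is taken as the enzyme (cleaving after K or R unless followed by P, with no missed cleavages), candidates are matched on intact peptide mass only, whereas real search engines score fragment masses, and the protein name and sequence are invented.

```python
import re

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein, min_length=6):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in peptides if len(p) >= min_length]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_precursor(observed_mass, proteins, tolerance=0.01):
    """Return (protein, peptide) pairs whose mass is within tolerance (Da)."""
    hits = []
    for name, sequence in proteins.items():      # pick protein
        for peptide in tryptic_digest(sequence):  # digest, pick peptide
            if abs(peptide_mass(peptide) - observed_mass) <= tolerance:
                hits.append((name, peptide))
    return hits


# Hypothetical one-protein database.
database = {"protein_X": "SAMPLERPEPTIDEKGGGGGGK"}
peptides = tryptic_digest(database["protein_X"])
hits = match_precursor(488.234, database)
```

Note that the R in `SAMPLERPEPTIDEK` is not cleaved because it is followed by P.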

slide-48
SLIDE 48

Protein Inference in Discovery Proteomics

Computational issues with protein inference:

  • Generating a reliable list of proteins from identified peptides is not straightforward
  • ‘One hit wonders’ = some proteins only have a single peptide identified
  • Difficult to infer proteins based on a single peptide due to possible false-positives
  • Multiple proteins can also be supported by one peptide, and determining which protein it belongs to is difficult

Huang T et al (2012) Protein inference: a review

Examples:

  • (1) Proteins 1 and 2 have the same set of identified peptides; with no other supporting information we cannot determine which protein is in the sample
  • (2) Protein 3 is a one-hit wonder and cannot be reliably mapped
  • (3) Protein 4 has two peptides identified which do not map to another protein, so we can assume that this protein is present
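
The three example cases can be reproduced with a toy classifier. This is a sketch of the reasoning only, not a published inference algorithm; the peptide names and the labels returned are invented for illustration.

```python
from collections import defaultdict


def infer_proteins(peptide_to_proteins):
    """Classify proteins from a mapping of identified peptide -> candidate proteins."""
    protein_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_peptides[protein].add(peptide)

    # Proteins supported by exactly the same peptides cannot be told apart.
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)

    calls = {}
    for peptides, proteins in groups.items():
        # Peptides that map to one protein only.
        unique = [p for p in peptides if len(peptide_to_proteins[p]) == 1]
        for protein in proteins:
            if len(proteins) > 1:
                calls[protein] = "indistinguishable"
            elif len(peptides) == 1:
                calls[protein] = "one-hit wonder"
            elif len(unique) >= 2:
                calls[protein] = "present"
            else:
                calls[protein] = "ambiguous"
    return calls


# Hypothetical identifications mirroring the three examples.
identified = {
    "pepA": ["Protein1", "Protein2"],
    "pepB": ["Protein1", "Protein2"],
    "pepC": ["Protein3"],
    "pepD": ["Protein4"],
    "pepE": ["Protein4"],
}
calls = infer_proteins(identified)
```
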

slide-49
SLIDE 49

Fragmentation

  • Top down: Intact proteins are ionized and introduced to a mass analyzer
  • Bottom up: Proteins are enzymatically digested and then introduced to a mass analyzer

slide-50
SLIDE 50

Peptide Fragmentation

  • b ions: Charge retained on the N-terminus
  • y ions: Charge retained on the C-terminus
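
As a worked illustration, the sketch below computes singly charged b and y ion m/z values for a short peptide, treating b ions as N-terminal fragments and y ions as C-terminal fragments. The mass table is deliberately limited to the residues used in the example, and the peptide `GAK` is arbitrary.

```python
# Monoisotopic residue masses in Da (only the residues used below).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "K": 128.09496}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_ions[i] is b(i+1); y_ions[i] is the complementary y ion from the
    same backbone cleavage, so y_ions[-1] is y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # N-terminal fragment: residues before the cleavage plus a proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # C-terminal fragment: remaining residues plus water and a proton.
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("GAK")
```

The difference between consecutive b ions (or consecutive y ions) equals one residue mass, which is what lets a spectrum be read as a sequence.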

slide-51
SLIDE 51

Liquid Chromatography (LC)-MS/MS

Figure: LC, Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector; spectra (intensity vs. mass/charge) recorded over time

  • Passes pressurized liquid solvent containing the sample through a column filled with a solid adsorbent material.
  • Adsorbent material causes different flow rates for different peptides → separation

slide-52
SLIDE 52

Mass Spectrometry Data

Dimensions: Time, Peptide m/z, Peptide intensity, Peptide fragment m/z, Peptide fragment intensity, ...

slide-53
SLIDE 53

Example data: MALDI-TOF

Figure: Peptide intensity vs m/z

slide-54
SLIDE 54

Example data: ESI-LC-MS/MS

Figure panels: Peptide intensity vs m/z vs time (MS); Fragment intensity vs m/z (MS/MS, % relative abundance, precursor [M+2H]2+)

slide-55
SLIDE 55

Identification: Tandem MS

slide-56
SLIDE 56

Protein Sequence Database Search

Workflow steps (figure labels): Tumor Sample, Lysis, Digestion, Fractionation, Peptides, Tandem Mass Spectrometry (intensity vs. m/z), Identity, Quantity

  • 1. Pick Protein from the Protein Sequence DB
  • 2. in silico digestion
  • 3. Pick Peptide
  • 4. All fragment masses (m/z)
  • 5. Compare, Score, Test Significance
  • 6. Repeat for all proteins/peptides to find best match

slide-57
SLIDE 57

Spectrum Library Search

Workflow steps (figure labels): Lysis, Fractionation, Digestion, LC-MS/MS, MS/MS

  • 1. Pick Spectrum from the Spectrum Library
  • 2. Compare, Score, Test Significance
  • 3. Repeat for all spectra
  • 4. Identified Proteins

slide-58
SLIDE 58

Using MS/MS for quantitative protein assays

  • Most clinical and biology studies in the literature rely on measurement of proteins which already have antibodies available
  • Goal of targeted proteomics is to create reliable, high quality assays to measure proteins that do not require antibodies and instead rely on MS/MS
  • Focuses on a subset of proteins of interest
  • Disease related changes in proteins
  • Signaling processes
  • Highly multiplexed alternative method to western blots/antibodies
  • Can focus on unique and informative peptides for protein of interest

  • Hypothesis driven questions!

Nature Method of the Year 2012

slide-59
SLIDE 59

Mass Spectrometry based proteomic quantitation

Workflow steps (figure labels): Lysis, Fractionation, Digestion, LC-MS

Shotgun proteomics: Data Dependent Acquisition (DDA)

  • 1. MS: Records m/z
  • 2. MS/MS: Selects peptides based on abundance and fragments them
  • 3. Protein database search for peptide identification

Targeted MS: Uses predefined set of peptides

  • 1. MS: Select precursor ion
  • 2. MS/MS: Precursor fragmentation
  • 3. Use Precursor-Fragment pairs for identification

slide-60
SLIDE 60

Mass Analyzers

Quadrupole

  • Often used in targeted MS/MS
  • Linear array of 4 symmetrical rods
  • Filters sample ions based on m/z
  • Examples: Triple Quadrupole (QqQ), Quadrupole Time of Flight (QqTOF)

Ion Trap

  • Often used in shotgun proteomics
  • Ring electrode and two end cap electrodes
  • Examples: LTQ XL Linear Ion Trap, Velos

TOF (Time of Flight)

  • Ion m/z determined via a time measurement
  • Examples: MALDI-TOF, QqTOF

OT/ICR (Orbitrap)

  • Barrel-like electrode and co-axial inner electrode
  • Traps ions in an orbital motion around spindle
  • Examples: Q Exactive Quadrupole-Orbitrap, Orbitrap Fusion

Domon B & Aebersold R. Nature Biotechnology. 28(7), 710-721 (2010)
slide-61
SLIDE 61

Proteomics File Formats

Deutsch EW. Mol Cell Proteomics. 11(12), 1612-21 (2012)

slide-62
SLIDE 62

MS Output Files

  • Proprietary vendor formats
  • Complex open formats
  • Simple text formats

All instruments collect profile-mode data, but vendor raw files written out after each run can contain one or more of these types, depending on user input.

Deutsch EW. Mol Cell Proteomics. 11(12), 1612-21 (2012)

slide-63
SLIDE 63

mzML format

  • Contains all information for a single MS run including metadata about the spectra and the spectra themselves
  • Either centroided or profile mode
  • Encoded in an XML format

Deutsch EW. Proteomics. 8(14), 2776-7 (2008) http://www.psidev.info/mzml_1_0_0%20

slide-64
SLIDE 64

MGF (Mascot Generic Format) format

  • Before the appearance of XML formats, it was common to convert the output files into simple text files with only information on the spectra
  • The most common text format is the MGF file
  • Encodes multiple MS/MS spectra into one file via m/z intensity pairs separated by headers
  • Unfortunately, important metadata is lost in this conversion, hindering the development of advanced proteomic tools

retention_time peptide_mass peak_area
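
The "m/z intensity pairs separated by headers" layout can be read with a short parser. The sketch below assumes the common MGF conventions of `BEGIN IONS` / `END IONS` delimiters and `KEY=VALUE` header lines; the spectrum shown is invented, and the parser keeps header values as plain strings rather than interpreting them.

```python
def parse_mgf(lines):
    """Parse MGF text into a list of spectra with header params and peaks."""
    spectra, current = [], None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, _, value = line.partition("=")
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


# Invented example spectrum.
example = """\
BEGIN IONS
TITLE=example spectrum 1
PEPMASS=488.2343
CHARGE=1+
147.1128 350.0
218.1499 1200.5
END IONS
""".splitlines()
spectra = parse_mgf(example)
```
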

slide-65
SLIDE 65

Sequence File Formats

  • FASTA
  • Series of entries of single header followed by one or more lines of a sequence
  • Variable header lines
  • PEFF (PSI Extended FASTA Format)
  • Conventional FASTA
  • Hash-mark initiated header lines with strict header syntax
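
A minimal FASTA reader shows the "single header followed by one or more lines of sequence" layout. The entries below are invented, and the handling of `#` lines is a simplification: file-level lines starting with a hash mark are skipped rather than interpreted as PEFF headers.

```python
def parse_fasta(lines):
    """Return {header: sequence}; a '>' line starts each entry."""
    entries, header, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            # Blank lines and hash-mark header lines are skipped in this sketch.
            continue
        if line.startswith(">"):
            if header is not None:
                entries[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequence may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        entries[header] = "".join(chunks)
    return entries


# Invented example entries.
example = [
    "# example file-level header line",
    ">protein_X hypothetical protein",
    "SAMPLERPEPTIDEK",
    "GGGGGGK",
    ">protein_Y another hypothetical protein",
    "MKTAYIAK",
]
sequences = parse_fasta(example)
```
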
slide-66
SLIDE 66

Nesvizhskii AI, Methods Mol Biol. 367, 87-119 (2007)

Protein Identification Software

slide-67
SLIDE 67

Metabolites

  • Metabolites represent both the downstream output of the genome and the upstream input from the environment
  • Allows us to explore gene-environment interactions
  • Considered to best reflect the activities of the cell at a functional level

  • Endogenous: Gene-derived
  • Exogenous: Environmentally-derived
slide-68
SLIDE 68

Metabolomics

  • “Systematic study of unique chemical fingerprints that specific cellular processes leave behind”
  • Metabolome: complete set of small-molecule metabolites found in a biological sample
  • Metabolome responds to nutrients, stress and disease long before the transcriptome or proteome

  • Examples:
  • Metabolic intermediates
  • Hormones
  • Signaling molecules
  • Secondary metabolites
  • Antibiotics
  • Drugs
slide-69
SLIDE 69

Gas Chromatography-Mass Spectrometry (GC-MS)

  • GC separates out different molecules based on their affinity for the stationary phase of the column
  • Elute at different times
  • MS breaks each molecule into fragments and detects these using their mass to charge ratio

slide-70
SLIDE 70

Nuclear magnetic resonance (NMR)

  • Does not require separation of analytes
  • Applies an external magnetic field which transfers energy to nuclei of each molecule
  • Measures energy emitted when the nuclei return to base level
  • The resulting signal is an NMR spectrum which can be used to identify/measure the metabolite

slide-71
SLIDE 71

Human Metabolome Project

  • Funded by Genome Canada
  • Using high-throughput technology (GC-MS, NMR)
  • Through this, Human Metabolome Database (HMDB) was created
  • Contains 41,993 metabolites
  • Includes both water and lipid soluble entries
  • Version 3 has information on metabolites in 600+ human diseases

http://www.hmdb.ca/

slide-72
SLIDE 72

MetaboLights

  • Database for metabolomics experiments and data
  • Cross-species, cross-technique
  • Allows for user-submission
  • 23,343 metabolites
  • 2056 species

slide-73
SLIDE 73

Metabolomics and Drug Discovery

  • Metabolomics offers a more cost-effective and productive route for drug discovery and development
  • For example, if a metabolite or set of metabolites is identified as being causal, then the drug target/pathway is known
  • Drugs to inhibit TMAO production for atherosclerosis
  • Metabolomics has also been used to detect early drug toxicity in trials
  • COMET (the Consortium for Metabonomic Toxicology) brought together 5 pharma companies to create a system for these predictions
  • These methods are still not widely used in pharma, likely because NMR is not sensitive enough, but with better technology their use will likely increase

slide-74
SLIDE 74

Metabolomic Integration

Johnson, Ivanisevic and Siuzdak, 2016

slide-75
SLIDE 75

Membrane Components

Lipid Classes and Functions

Membrane Components

  • Glycerophospholipids: Choline glycerophospholipid (PC), Ethanolamine glycerophospholipid (PE), Phosphatidylinositol (PI), Phosphatidylglycerol (PG), Phosphatidic Acid (PA), Phosphatidylserine (PS), Cardiolipin (CL)
  • Sphingolipids: Sphingomyelin (SM), Galactosylceramide (GalCer), Glucosylceramide (GluCer)
  • Cholesterol
  • Glycolipids
  • Sulfatide
  • Gangliosides

Energy Storage

  • Free Fatty Acid (FFA), Triacylglycerol (TAG), Diacylglycerol (DAG), Monoacylglycerol (MAG), Acyl-CoA, Acylcarnitine

Signaling

  • DAG, MAG, Acyl-CoA, Acylcarnitine, FFA, Eicosanoids, Steroids
  • Sphingolipids: Ceramide, Sphingosine, Sphingoid-1-phosphate (S1P)

slide-76
SLIDE 76

Lipidomics

  • Developed in 2003 to study metabolism of the lipidome
  • Allows us to quantify changes in individual lipid classes and subclasses that reflect changes in metabolism

Han, X. (2016) Lipidomics for studying metabolism. Nat. Rev. Endocrinol. doi:10.1038/nrendo.2016.98
slide-77
SLIDE 77

Lipidomics Techniques

  • Electrospray Ionization (ESI)-MS/Shotgun Lipidomics
  • Direct infusion of lipid solution
  • Difficult to quantitate due to low dynamic range
  • LC-MS
  • HPLC separated
  • Powerful for targeted analysis of low abundance lipids
  • Ion-mobility-MS
  • Post-ionization separation technique
  • Still in development
  • MALDI-MS
  • Primarily used for imaging
  • High sensitivity
slide-78
SLIDE 78

Lipidomics Applications

  • Identification of novel lipid classes and molecular species
  • Development of quantitative methods for large-scale lipid analysis
  • Pathway analysis in disease
  • Biomarker identification
  • Tissue mapping of altered lipid distribution in organs
  • Development of bioinformatics approaches for high-throughput

processing

slide-79
SLIDE 79

Lipidomics Databases

  • Lipid Library http://lipidlibrary.co.uk
  • Lipid Bank http://lipidbank.jp
  • Cyberlipids http://www.cyberlipid.org
  • LIPIDAT www.lipidat.chemystry.ohio-state.edu/home.stm
  • Lipidomics Expertise Platform www.lipidomics-expertise
  • Provides summaries of structure, functions and extraction methods