abundance profiles
play

Abundance profiles The suffix ome refers to a totality of some sort - PDF document

30 Mar 15 Omics sciences Abundance profiles The suffix ome refers to a totality of some sort Gene (genetics) Genome Genomics Transcript (RNA) Transcriptome Transcriptomics Protein Proteome


  1. 30 ‐ Mar ‐ 15 Omics sciences Abundance profiles • The suffix ‐ ome refers to a totality of some sort • Gene (genetics) • Genome • Genomics • Transcript (RNA) • Transcriptome • Transcriptomics • Protein • Proteome • Proteomics DNA RNA Protein • Metabolite • Metabolome • Metabolomics Bas E. Dutilh • Lipid • Lipidome • Lipidomics Systems Biology: Bioinformatic Data Analysis • Microbe • Microbiome • Microbiomics (?!) Utrecht University, March 30 th 2015 DNA sequencing Massively parallel sequencing • First generation – Chain termination sequencing • Sanger • Second generation – Massively parallel sequencing • Illumina (MiSeq) • Ion Torrent • Many thousands, up to billions of short DNA sequences • Third generation – ~50 ‐ 500 base pairs – Single molecule sequencing – Bad quality nucleotides need to be removed (trimming)! • Oxford Nanopore (MinION) • These reads are randomly sampled from the total • Pacific Biosciences (PacBio) DNA/RNA content of a sample – This gives a detailed overview of the sequences in the sample 1

  2. 30 ‐ Mar ‐ 15 Who/what is present in the sample? Micro ‐ arrays • Last week we discussed how to annotate these • Micro ‐ arrays are another way of identifying sequences in a sequencing reads by aligning to a reference database sample – Fast, heuristic similarity search programs are used for this – Micro ‐ arrays can only identify previously known sequences – That is why DNA sequencing is now the standard • The result is an overview of the genes/functions/microbes and their relative abundances in the sequencing reads A micro ‐ array is a glass slide that contains pieces of DNA with a known sequence q High g Many reads Many reads Low Few reads Functions Species Green/red labeled sample or taxa sequences hydridize to the sequences on the micro ‐ array These results are just lists of numbers DNA: human microbiome • Which bacterial phyla are present in different • DNA: metagenome: list of all microbial human body sites? genes and organisms and relative • Which metabolic functions do they encode? abundances in an environment • Can we use this to understand the differences – Micro ‐ organisms play important between people? roles in ecology and thus in health • RNA: transcriptome lists all genes and their relative expression values in a cell/tissue expression values in a cell/tissue – Gene expression is important for phenotype • Protein: proteome lists all proteins in a sample and their relative abundances – Proteins perform most of the functions in a cell • A list of numbers is also known as a a x multidimensional vector, for example: a y a z Different healthy people DNA: microbiome studies RNA: discover cancer biomarker genes • Can we improve the prognosis for cancer patients by analyzing their gene expression profile? Jack Gilbert Jack Gilbert Up ‐ regulated genes Argonne National Lab Argonne National Lab Down ‐ regulated genes ts Patient Genes 2

  3. 30 ‐ Mar ‐ 15 RNA: heat shock response • How do Saccharomyces cerevisiae (yeast) genes respond to increased growth temperature? Up ‐ regulated genes Down ‐ regulated genes s Gene 5 15 30 60 Time (minutes) High ‐ throughput sequencing Scaling • Same organism, different tissues or body sites • When analyzing transcriptome data, we find that the RNA expression of gene A consists of: – For example: brain versus liver, mouth versus gut – 4,000 reads in a healthy tissue sample • Same tissue, same organism – 10,000 reads in a tumor sample – For example: treatment versus control, tumor versus healthy • Can we conclude that gene A is over ‐ expressed in cancer? • Same tissue, different organisms – The total volume of the transcriptomic datasets are: – For example: wildtype versus knock ‐ out/transgenic/mutant, • 80 000 sequencing reads from the healthy tissue 80,000 sequencing reads from the healthy tissue comparing monozygotic twin pairs comparing monozygotic twin pairs • 500,000 sequencing reads from the tumor • Time course experiments 4,000 10,000 – For example: effect of a treatment, development of a tissue, Healthy: = 0.05 Tumor: = 0.02 80,000 500,000 response of microbiota to environmental change – No, the gene is actually expressed much lower in the tumor Comparing read counts Gene expression in time • To compare samples, we need to: • Normalized and scaled gene expression values – Scale the numbers so that they add up to 1 – For example, expression of genes in aging Arabidopsis leaves • This accounts for differences in the sample size (total number of reads) 0. 25 • Divide each number by the total number of reads – Normalize numbers so that they are (close to) normally 0. 20 distributed pression Gene 1 • This is important in many statistical tests 0 0. 15 15 • Biological data are often logarithmically distributed – if so, you could take Bi l i l d t ft l ith i ll di t ib t d if ld t k Gene 2 G 2 Abundance/Exp the logarithm of the number of reads to normalize 0. 10 Gene 3 5 0.0 0 Value 1 2 3 4 5 6 7 8 9 10 Time/environments/samples… …or leaves… Log (value) 3

  4. 30 ‐ Mar ‐ 15 Microbial abundance in time Research setup 1. Design experimental conditions and sampling strategy • Normalized and scaled microbial abundance values – For example, presence of pathogens on rotting Arabidopsis leaves 2. Extract DNA/RNA/protein 0. 25 3. Sequence nucleotides or proteins 20 0. pression 4. Quality control of sequencing reads or peptides Microbe 1 0 0. 15 15 Microbe 2 Mi b 2 Abundance/Exp 5. Annotate (e.g. align reads to database) and count 0. 10 Microbe 3 6. Normalize and scale the counts 0.0 5 7. Compare samples, clustering (next lecture) 0 1 2 3 4 5 6 7 8 9 10 8. Interpret results and perform verification experiments Time/environments/samples… …or leaves… Quantifying similarity between vectors Distance matrices • Based on these measurements, which genes/microbes/etc are more • Distance matrix • Similarity matrix inverse similar to each other? • Abundance/expression levels 0 x y 1 1 ‐ x 1 ‐ y 0. 25 are most similar between and • Abundance/expression patterns inverse x 0 z 1 ‐ x 1 1 ‐ z 0. 20 are most similar between and pression y z 0 1 ‐ y 1 ‐ z 1 0. 0 15 15 • We can use a distance measure W di t Abundance/Exp to quantify the (dis ‐ )similarity 10 between the lists 0. – Many different distance distance = 1 ‐ similarity measures exist 0.0 5 0 1 2 3 4 5 6 7 8 9 10 Time/Environments/Samples Manhattan distance (levels) Euclidean distance (levels) d AB = (X A – X B ) 2 + (Y A – Y B ) 2 2 2 d AB = |X A – X B | + |Y A – Y B | = + d AB d AB 0 0 0.265 0.799 0.103 0.253 0 0 0.265 0.534 0.103 0.178 0 0 0.799 0.534 0.253 0.178 Example: Example: • • d 2 = (0.20 – 0.15) 2 + d = |0.20 – 0.15| + ( A (Y A – Y B ) 2 (Y A – Y B ) 2 ( A B ) B ) (0.17 – 0.15) 2 + |0.17 – 0.15| + (X A – X B ) 2 (X A – X B ) 2 (0.16 – 0.16) 2 + |0.16 – 0.16| + (0.20 – 0.15) 2 + |0.20 – 0.15| + 1 0.20 0.15 0.12 1 0.20 0.15 0.12 (0.20 – 0.16) 2 + |0.20 – 0.16| + 2 0.17 0.15 0.09 2 0.17 0.15 0.09 3 (0.17 – 0.16) 2 + 3 0.16 0.16 0.08 0.16 0.16 0.08 |0.17 – 0.16| + 4 (0.16 – 0.15) 2 + 4 0.20 0.15 0.11 0.20 0.15 0.11 |0.16 – 0.15| + 5 5 0.20 0.16 0.12 (0.20 – 0.15) 2 + 0.20 0.16 0.12 |0.20 – 0.15| + 6 6 0.17 0.16 0.10 0.17 0.16 0.10 (0.18 – 0.16) 2 + |0.18 – 0.16| + 7 7 0.16 0.15 0.08 0.16 0.15 0.08 (0.16 – 0.15) 2 = 0.0105  d = 0.103 |0.16 – 0.15| = 0.265 8 0.20 0.15 0.12 8 0.20 0.15 0.12 d = 0.799 d = 0.253 9 0.18 0.16 0.11 9 0.18 0.16 0.11 d = 0.534 d = 0.178 10 0.16 0.15 0.08 10 0.16 0.15 0.08 4

  5. 30 ‐ Mar ‐ 15 Comparing patterns instead of distances Compare patterns instead of distances • Correlation can be used to quantify the similarity between • Correlation can be used to quantify 1 0 ‐ 0.35 1.35 ‐ 0.16 1.16 patterns the similarity between patterns 1 ‐ 0.35 1.35 0 0.97 0.03 0.25 0. 25 0.25 1 0.20 0.15 0.12 0 1 ‐ 0.16 1.16 0.03 0.97 2 0.17 0.15 0.09 r = ‐ 0.35 3 0.16 0.16 0.08 0.2 r = ‐ 0.35 20 0. 0.2 4 ression pression 0.20 0.15 0.11 Low correlation Little correlation ession 5 0.20 0.16 0.12 6 0.15 0.17 0.16 0.10 0 0. 15 15 0 15 0.15 Abundance/Exp Abundance/Exp Abundance/Expr 7 0.16 0.15 0.08 8 0.20 0.15 0.12 0.1 9 0.18 0.16 0.11 0. 10 0.1 10 0.16 0.15 0.08 0.05 0.0 5 0.05 r = 0.97 r = 0.97 1 0 ‐ 0.35 1.35 ‐ 0.16 1.16 High correlation Positive correlation 0 0 0 0 1 ‐ 0.35 1.35 0.97 0.03 0 0.05 0.1 0.15 0.2 0.25 1 2 3 4 5 6 7 8 9 10 0 0.05 0.1 0.15 0.2 0.25 Abundance/Expression of Time/Environments/Samples Abundance/Expression of 0 1 ‐ 0.16 1.16 0.03 0.97 5

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend