Computational Methods in Systems Biology

Nir Friedman, Maya Schuldiner

What is Biology?
“A branch of knowledge that deals with living organisms and vital processes”
• The hottest scientific frontier of our times
  • Many great processes have been figured out
  • Much is still unknown
• Tremendous impact on medicine
  • Diagnosis, prognosis, and treatment

Biological Systems are Complex
• The system is NOT just the sum of its parts

Baker's Yeast (Saccharomyces cerevisiae)
• Used to make bread and beer
• The simplest cell that still resembles human cells

The Age of Genomes
• 404 complete microbial genomes (thousands in progress)
• 31 complete eukaryotic genomes (315 in progress!)
• 3 complete plant genomes (6 in progress)

Organism    Genome size   Genes
Bacteria    1.6 Mb        1,600
Eukaryote   13 Mb         ~6,000
Animal      100 Mb        ~20,000
Human       3 Gb          ~30,000?

[Figure: timeline of sequenced genomes, 1995 to 2010, ending with "Individual genomes?"]

What is Systems Biology?
“Systems biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behavior of that system”
• The last decades led to a revolution in how we can examine and understand biological systems
Characterized by
• High-throughput assays
• Integration of multiple forms of experiments & knowledge
• Mathematical modeling

Ask Not What Systems Biology Can Do For You…

Why Biology for NIPS Crowd?
• Quantity
  • Data-intense discipline: too vast for manual interpretation
• Systematic
  • Collection of data on all genes/proteins/…
• Multi-faceted
  • Measurements of complementary aspects of cellular function, development and disease states
  • Challenge of integration and fusion of multiple data
Has the potential to be medically applicable!

Flow of Information in Biology
DNA          RNA             Protein               Phenotype
Recipe       Working copy    The resulting dish    The review
(in safe)

The “Post-Genomic Era”: Systematic is Not Just More Assays
DNA
• Genomic sequences
• Variations within a population
• …
RNA
• Quantity
• Structure
• Degradation rate
• …
Protein
• Quantity
• Location
• Modifications
• Interactions
• …
Phenotype
• Genetic interventions
• Environmental interventions
• …

Outline: DNA, RNA, Protein, Phenotype
DNA
• Stores genetically inherited information
• Sequence of four nucleotide types (A, C, G, T)
• Two complementary strands creating base pairs (bp)
• 10^5 bp in bacteria, 3x10^9 in humans, 6x10^13 in wheat

Understanding Genome Sequences
~3,289,000,000 characters:
aattgtgctctgcaaattatgatagtgatctgtatttactacgtgcatat
attttgggccagtgaatttttttctaagctaatatagttatttggacttt
tgacatgactttgtgtttaattaaaacaaaaaaagaaattgcagaagtgt
tgtaagcttgtaaaaaaattcaaacaatgcagacaaatgtgtctcgcagt
cttccactcagtatcatttttgtttgtaccttatcagaaatgtttctatg
tacaagtctttaaaatcatttcgaacttgctttgtccactgagtatatta
tggacatcttttcatggcaggacatatagatgtgttaatggcattaaaaa
taaaacaaaaaactgattcggccgggtacggtggctcacgcctgtaatcc
cagcactttgggagatcgaggagggaggatcacctgaggtcaggagttac
agacatggagaaaccccgtctctactaaaaatacaaaattagcctggcgt
ggtggcgcatgcctgtaatcccagctactcgggaggctgaggcaggagaa
tcgcttgaacccgggagcggaggttgcggtgagccgagatcgcaccgttg
cactccagcctgggcgacagagcgaaactgtctcaaacaaacaaacaaaa
aaacctgatacatggtatgggaagtacattgtttaaacaatgcatggaga
tttaggttgtttccagtttttactggcacagatacggcaatgaatataat
tttatgtatacattcatacaaatatatcggtggaaaattcctagaagtgg
aatggctgggtcagtgggcattcatattgagaaattggaaggatgttgtc
aaactctgcaaatcagagtattttagtcttaacctctcttcttcacaccc
ttttccttggaagaaagctaaatttagacttttaaacacaaaactccatt
ttgagacccctgaaaatctgggttcaaagtgtttgaaaattaaagcagag
gctttaatttgtacttatttaggtataatttgtactttaaagttgttcca
. . .
Goal: Identify components encoded in the DNA sequence

Open Reading Frame
ATGCTCAGCGTGACCTCA . . . CAGCGTTAA
M  L  S  V  T  S   . . . Q  R  STP
• Protein-encoding DNA sequence consists of a sequence of 3-letter codons
• Starts with the START codon (ATG)
• Ends with a STOP codon (TAA, TAG, or TGA)

Finding Open Reading Frames
ATGCTCAGCGTGACCTCA . . . CAGCGTTAA
M  L  S  V  T  S   . . . Q  R  STP
Try all possible starting points
• 3 possible offsets
• 2 possible strands
A simple algorithm finds all ORFs in a genome
• Many of these are spurious (are not real genes)
• How do we focus on the real ones?

Using Additional Genomes
Basic premise: “What is important is conserved”
Evolution = Variation + Selection
• Variation is random
• Selection reflects function
Idea:
• Instead of studying a single genome, compare related genomes
• A real open reading frame will be conserved

Phylogenetic Tree of Yeasts
[Figure: phylogenetic tree, annotated "~10M years", over S. cerevisiae, S. paradoxus, S. mikatae, S. bayanus, C. glabrata, S. castellii, K. lactis, A. gossypii, K. waltii, D. hansenii, C. albicans, Y. lipolytica, N. crassa, M. graminearum, M. grisea, A. nidulans, S. pombe]
Kellis et al, Nature 2003
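The ORF search described in the "Finding Open Reading Frames" slide (3 offsets, 2 strands, ATG to the first in-frame stop codon) can be sketched roughly as follows. This is a minimal illustration, not the method of Kellis et al.; the function names, the `min_codons` parameter, and the toy input (the visible ends of the slide's example joined together, with the elided middle left out) are assumptions made for the example.

```python
# Illustrative ORF scan: every reading frame on both strands, ATG ... stop.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=1):
    """Return (start, end) pairs; end is exclusive and includes the stop codon."""
    found = []
    for offset in range(3):                      # 3 possible offsets
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # first START in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None                     # look for the next ORF
    return found


def find_orfs(dna, min_codons=1):
    """Scan both strands; coordinates refer to the strand that was scanned."""
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):  # 2 strands
        for start, end in orfs_in_strand(seq, min_codons):
            results.append((strand, start, end, seq[start:end]))
    return results


# Toy input (hypothetical): the slide's example with the ". . ." simply dropped.
for hit in find_orfs("ATGCTCAGCGTGACCTCACAGCGTTAA"):
    print(hit)
```

Raising `min_codons` is one crude way to discard very short hits, but as the slide notes, length alone does not separate real genes from spurious ORFs, which is what motivates the comparative approach.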

Evolution of Open Reading Frame
S. cerevisiae  ATGCTCAGCGTGACCTCA . . .
S. paradoxus   ATGCTCAGCGTGACATCA . . .
S. mikatae     ATGCTCAGGGTGACA--A . . .
S. bayanus     ATGCTCAGG---ACA--A . . .
[Figure labels: conserved positions; variable positions; a deletion; frame shift changes interpretation of downstream seq]

Examples
[Figure: alignments of a spurious ORF and a confirmed ORF; labels: variable, frame shift, ATG not conserved, conserved, sequencing error]
• Greedy algorithm to find conserved ORFs is surprisingly effective (> 99% accuracy) on verified yeast data [Kellis et al, Nature 2003]

Defining Conservation
Naïve approach
• Consensus between all species
Problem:
• Rough grained
• Ignores distances between species
• Ignores the tree topology
Goal:
• More sensitive and robust methods
[Figure: example alignment with conserved and variable columns; % conserved per column: 100, 33, 55, 55]

Probabilistic Model of Evolution
[Figure: phylogenetic tree over Aardvark, Bison, Chimp, Dog, Elephant]
• Random variables: sequence at current-day taxa or at ancestors
• Potentials/conditional distributions: represent the probability of evolutionary changes along each branch

Parameterization of Phylogenies
Assumptions:
• Positions (columns) are independent of each other
• Each branch is a reversible continuous-time discrete-state Markov process
\[ P(a \to c \mid t + t') = \sum_b P(a \to b \mid t)\, P(b \to c \mid t') \]
\[ P(a)\, P(a \to b \mid t) = P(b)\, P(b \to a \mid t) \]
governed by a rate matrix Q
\[ Q_{a,b} = \left. \frac{d}{dt} P(a \to b \mid t) \right|_{t=0} \]
\[ P(a \to b \mid t) = \left[ e^{tQ} \right]_{a,b} \]

Conserved vs. unconserved
Two hypotheses:
• Conserved: short branches (fewer mutations)
• Unconserved: long branches (more mutations)
[Figure: the same four-leaf tree (leaves 1 to 4) drawn under each hypothesis]
Use
\[ \log \frac{P(\text{position} \mid \text{unconserved})}{P(\text{position} \mid \text{conserved})} \]
[Boffelli et al, Science 2003]
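To make the "Parameterization of Phylogenies" and "Conserved vs. unconserved" slides concrete, here is a rough sketch that computes the transition probabilities from e^{tQ} and the fast/slow log ratio for one alignment column. Several things here are assumptions and not from the slides: the Jukes-Cantor rate matrix, the uniform root distribution, the toy tree and its branch lengths, the rate factors 1.0 and 0.2, and all function names. The likelihood is computed with the standard pruning recursion over the tree.

```python
import math

NUCS = "ACGT"
# Jukes-Cantor rate matrix (an assumption): all substitutions equally likely,
# scaled so that one unit of time is one expected substitution per site.
Q = [[-1.0 if a == b else 1.0 / 3.0 for b in range(4)] for a in range(4)]


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate e^{tQ}: Taylor series on tQ / 2^squarings, then repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    small = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def column_likelihood(tree, column, rate):
    """P(column | tree, rate). A node is a leaf name or a list of (child, branch_length)."""
    def partial(node):
        if isinstance(node, str):                       # leaf: observed nucleotide
            return [1.0 if n == column[node] else 0.0 for n in NUCS]
        vec = [1.0] * 4
        for child, length in node:
            p = transition_matrix(Q, rate * length)     # rate stretches every branch
            below = partial(child)
            for a in range(4):
                vec[a] *= sum(p[a][b] * below[b] for b in range(4))
        return vec
    return sum(0.25 * v for v in partial(tree))          # uniform root distribution


def log_fast_over_slow(tree, column, fast=1.0, slow=0.2):
    """Log ratio of the unconserved (long-branch) to conserved (short-branch) hypothesis."""
    return math.log(column_likelihood(tree, column, fast)
                    / column_likelihood(tree, column, slow))


# Hypothetical tree and branch lengths, for illustration only.
TREE = [("aardvark", 0.2), ("bison", 0.2), ([("chimp", 0.1), ("dog", 0.1)], 0.1)]
same = {"aardvark": "A", "bison": "A", "chimp": "A", "dog": "A"}
mixed = {"aardvark": "A", "bison": "C", "chimp": "G", "dog": "T"}
print(log_fast_over_slow(TREE, same), log_fast_over_slow(TREE, mixed))
```

With this toy setup an identical column gets a negative score (the conserved hypothesis explains it better) and a fully mixed column gets a positive one. Unlike the naive consensus count, the score depends on the branch lengths and the tree topology.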

Genes Are Better Conserved
[Figure: % conserved plotted against log Fast/Slow]
[Boffelli et al, Science 2003]

Challenges
Other types of genomic elements
• Small polypeptides (peptohormones, neuropeptides)
• RNA coding genes
  • rRNA, tRNA, snoRNA…
  • miRNA
• Regulatory regions

Transcription Factor Binding Sites
[Figure from Essential Cell Biology, p. 268]

Regulatory Elements
• Relatively short words (6-20 bp)
• Recognition is not perfect
  • Binding sites allow variations
• Often conserved

Challenges
Other types of genomic elements
• Small polypeptides (peptohormones, neuropeptides)
• RNA coding genes
  • rRNA, tRNA, snoRNA…
  • miRNA
• Regulatory regions
Recognition of elements without comparisons
• Clearly the sequence contains enough information to “parse” it within the living cell

Outline: DNA, RNA, Protein, Phenotype
RNA
• Copied from DNA template
• Conveys information (mRNA)
• Can also perform function (tRNA, rRNA, …)
• Single stranded, four nucleotide types (A, C, G, U)
• For each expressed gene there can be as few as 1 molecule and up to 10,000 molecules per cell
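The "Regulatory Elements" slide says binding sites are short words whose recognition tolerates variation. One common way to model that, offered here only as a sketch and not as the method used in the talk, is a position weight matrix: log-odds scores are estimated per position from example sites, and every window of a sequence is scored against them. The example sites, the pseudocount, the uniform background, and the threshold are all hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Per-position log2 odds of each nucleotide versus a uniform background."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {n: pseudocount for n in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({n: math.log2((counts[n] / total) / background) for n in "ACGT"})
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, word, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, seq[i:i + width], round(score, 2)))
    return hits


# Hypothetical aligned example sites (7 bp, with variation at two positions).
SITES = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
PWM = build_pwm(SITES)
print(scan("CCTGACTCAGG", PWM, threshold=5.0))
```

Because the score is a sum over positions, a site that differs from the most common word at one tolerant position still scores above the threshold, while unrelated windows fall well below it.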

Gene Expression
[Figure labels: Transcription, Translation]
• Same DNA content
• Very different phenotype
• Difference is in regulation of expression of genes

High Throughput Gene Expression
Extract RNA, hybridize to a microarray: expression levels of 10,000s of genes in one experiment

Dynamic Measurements
Conditions
• Time courses
• Different perturbations (genetic & environmental)
• Biopsies from different patient populations
• …
[Figure: expression matrix, genes by conditions]
Gasch et al. Mol. Cell 2001

Expression: Supervised Approaches
Labeled samples, then feature selection + classification, then classifier confidence
P-value ≤ 0.027
• Potential diagnosis/prognosis tool
• Characterizes the disease state ⇒ insights about underlying processes
Segman et al, Mol. Psych. 2005

Expression: Unsupervised
[Figure panels: PCA, Cluster]
Eisen et al. PNAS 1998; Alter et al, PNAS 2000

Papers ⇒ Compendia
26 datasets from Whitehead and Stanford:
Various tumors, Viral infection, Stimulated PBMC, B lymphoma, Breast cancer, Stimulated immune, Fibroblast EWS/FLI, Prostate cancer, Fibroblast infection, Neuro tumors, Fibroblast serum, NCI60, Gliomas, HeLa cell cycle, Leukemia, Lung cancer, Liver cancer
Segal et al Nat. Gen. 2004
