Introduction to Bioinformatics http://theory.bio.uu.nl/BDA/2015

21‐Mar‐15 1

Introduction to Bioinformatics

Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, March 19th 2015

Info and documentation

  • http://theory.bio.uu.nl/BDA/2015
  • http://www.google.com

– … but only for guidance and hints: never take the internet for granted

  • Campbell Biology, 9th or 10th edition, Pearson

  • Reader

– Printed in black and white
– Download the full-color PDF at: http://theory.bio.uu.nl/BDA/2015/BioInf2015.pdf
– Errata: http://theory.bio.uu.nl/BDA/2015/errata.html

Evaluation

  • Final mark course

– 2/3: mark for Mathematics/Theoretical Biology
– 1/3: mark for Bioinformatic Data Analysis

  • Bioinformatics: mark of the written exam only

– NOTE: this differs from the information in the studiegids (study guide)!
– Date: April 9th 2015, 17:00-20:00, in Educatorium Gamma

  • Bonus point

– NOTE: this differs from the information in the studiegids (study guide)!
– Complete all practicals and have them signed off by the assistant
– In case of emergencies, you may be late by at most one class
– Hand in your mini-article on time (deadline: April 7th 2015) through http://theory.bio.uu.nl/sb/rooster.html
– The bonus point is only added to the mark of the written exam if that mark is above 4 before the addition
– The maximum mark is 10

How would you figure out the function of a protein?

  • X-ray structure
  • Activity assay
  • Knock-out mouse
  • BLAST search

How about for all proteins in a genome?

Genome sizes

Chaos chaos (1.4 Tb, Friz 1968)

  • Tb: terabase pairs (10^12 bp)
  • Gb: gigabase pairs (10^9 bp)
  • Mb: megabase pairs (10^6 bp)
  • kb: kilobase pairs (10^3 bp)
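As a quick check of these prefixes, the minimal Python sketch below converts sizes written with them into base pairs and compares two genome sizes from these notes. The table `PREFIX_TO_BP` and the helper `to_bp` are hypothetical names chosen for illustration.

```python
# Multipliers for the base-pair unit prefixes (powers of ten).
PREFIX_TO_BP = {
    "bp": 1,
    "kb": 10**3,
    "Mb": 10**6,
    "Gb": 10**9,
    "Tb": 10**12,
}


def to_bp(size: float, unit: str) -> int:
    """Convert a size such as (3, "Gb") into a whole number of base pairs.

    Hypothetical helper; raises KeyError for units not in PREFIX_TO_BP.
    """
    # round() absorbs floating-point error, e.g. in 1.4 * 10**12.
    return round(size * PREFIX_TO_BP[unit])


human = to_bp(3, "Gb")     # human genome, ~3 Gb
chaos = to_bp(1.4, "Tb")   # Chaos chaos, 1.4 Tb (Friz 1968)
print(human)               # 3000000000
print(chaos // human)      # 466
```

Taken at face value, the reported Chaos chaos genome would be more than 400 times the size of the human genome.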

Gene density and non‐coding DNA

  • Mammals (including humans) have the lowest gene density

– Number of genes in a given length of DNA

  • Introns within genes
  • Non-coding DNA between genes

Components of the human genome

  • 20,000 – 25,000 protein‐coding genes (1.5%)
  • Introns (25.9%)
  • Transposable elements (44.7%)

– DNA transposons
– Long terminal repeat (LTR) retrotransposons
– Short interspersed nuclear elements (SINEs)
– Long interspersed nuclear elements (LINEs)
– Endogenous retroviruses
– Miniature inverted-repeat transposable elements (MITEs)

Largest genomes

  • Largest sequenced genome: Loblolly pine (Pinus taeda), 20,000,000,000 bp (20 Gb)
  • Kinugasasō (Paris japonica): 149,000,000,000 bp (149 Gb)

Smallest genomes

  • Eukaryota

– Free-living: Ostreococcus tauri (12.6 Mb)
– Endosymbiotic: Encephalitozoon intestinalis (2.3 Mb)

  • Bacteria and Archaea

– Free-living: Mycoplasma genitalium (580 kb)
– Endosymbiotic: Candidatus Carsonella ruddii (160 kb)

  • Viruses

– Circoviridae (1.8 kb – only two proteins!)

Genetic diversity

  • Phylogenetic Tree of Life

[Figure: phylogenetic tree of life: Eukaryotes and Prokaryotes (Bacteria and Archaea)]

Human genome

  • 3,000,000,000 bp (3 Gb)
  • Human Genome Project (HGP)

– 1990-2003
– Draft genome sequence completed in 2000

  • Reference genome

– Source: blood (female donors) and sperm (male donors)
– Samples were taken from many donors, but only a few were used, to protect donor identities
– The sequence is not from one individual: more than 70% comes from one male donor

  • Cost of the HGP: $3,000,000,000

– Target: the $1,000 genome

Genome sequencing

[Figure: genome sequencing workflow: cloned genome → segments of known order → fragment and sequence → assemble sequences → consensus genome]
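The "fragment and sequence, then assemble" step can be illustrated with a toy greedy assembler: it repeatedly merges the two reads with the longest suffix-prefix overlap until no overlaps remain. This is a minimal sketch under strong assumptions (error-free reads, no repeats, a hypothetical minimum overlap `min_len` of 3 bases); the function names are made up, and real assemblers instead build overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Only overlaps of at least `min_len` bases count; otherwise 0 is returned.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the match only if the rest of a (its suffix) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by maximal overlap; return the resulting contigs."""
    # Drop exact duplicates (keeping order) and reads contained in other reads.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]

    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several separate contigs remain
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Toy, error-free reads taken from the made-up sequence ATGGCGTACCGAT.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGAT"]))  # ['ATGGCGTACCGAT']
```

The greedy choice is simple but can mis-assemble genomes with repeats longer than the reads, which is one reason genome projects either order cloned segments first or rely on more sophisticated shotgun assemblers.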

Whole Genome Shotgun (WGS) approach

Personal genome sequences

[Figure: differences between personal genome sequences: Craig Venter vs. James Watson, ~2,000,000 differences; each of them vs. the reference genome, ~5,000,000 differences]
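In its simplest form, counting "differences" between two genomes means comparing aligned sequences position by position. The Python sketch below assumes the sequences are already aligned to the same length and counts only single-base substitutions; real genome comparisons also have to handle insertions, deletions and larger rearrangements. The function name and the sequences are made up for illustration.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Made-up 10 bp stretches of a reference and a personal genome.
print(count_differences("ACGTACGTAC", "ACGTTCGTAA"))  # 2

# Scale from the slide: ~5,000,000 differences in ~3,000,000,000 bp.
print(f"{5_000_000 / 3_000_000_000:.2%}")  # 0.17%
```

So even millions of differences amount to only a small fraction of a percent of the genome.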

Your personal genome sequence

So we have a $200 personal genome…

  • …now the million dollar question is:

What can I learn from my 3,000,000,000 A’s, C’s, G’s, and T’s?

Personalized medicine

  • From reactive to proactive medicine

[Figure: Sergey Brin (co-founder, co-investor) carries an LRRK2 polymorphism on chromosome 12, with a risk of Parkinson's disease of 28% at age 59, 51% at age 69 and 74% at age 79]

From reactive to proactive medicine

– Identify high-risk alleles
– Adapt lifestyle (e.g. risk of high blood pressure)
– Preventive screening or treatment (e.g. risk of cancer)

  • Pharmacogenomics:

– Impact of genetic variation on response to medication

Biology is Big Data science

[Figure: number of sequenced genomes over time, compared with Moore's law: the number of transistors on a chip, and with it computer power, doubles roughly every 2 years]
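The doubling rule in Moore's law is exponential growth, N(t) = N0 · 2^(t/T), with a doubling time T of about 2 years. The Python sketch below (function names are hypothetical) computes the growth factor over a given period and, conversely, the doubling time implied by two counts; this is one simple way to compare how fast sequencing data grows relative to computer power.

```python
import math


def growth_factor(years: float, doubling_time: float = 2.0) -> float:
    """Growth factor after `years` when a quantity doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


def doubling_time(count_start: float, count_end: float, years: float) -> float:
    """Doubling time implied by growth from count_start to count_end over `years`."""
    return years * math.log(2) / math.log(count_end / count_start)


print(growth_factor(10))  # 32.0
print(growth_factor(20))  # 1024.0
```

For example, a made-up quantity that grows 1,000-fold in 10 years gives `doubling_time(1, 1000, 10)` of about 1.0 year, i.e. it doubles roughly twice as fast as Moore's law.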

Omics sciences

  • The suffix -ome refers to a totality of some sort

  • Gene (genetics) → Genome → Genomics
  • Transcript (RNA) → Transcriptome → Transcriptomics
  • Protein → Proteome → Proteomics
  • Metabolite → Metabolome → Metabolomics
  • Lipid → Lipidome → Lipidomics
  • Microbe → Microbiome → Microbiomics (?!)

[Figure: DNA → RNA → protein]

Genomics

  • Identify differences in gene content between genomes
  • Discover new species: “Biological Dark Matter”
  • Analyze genome evolution
  • Predict gene functions
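Identifying differences in gene content between genomes comes down, once genes have been grouped into comparable families, to set operations. The Python sketch below assumes that grouping has already been done (in practice with homology searches such as BLAST, which is the hard part); the function name and the gene family identifiers are made up for illustration.

```python
def compare_gene_content(genes_a: set[str], genes_b: set[str]) -> dict[str, set[str]]:
    """Split two gene sets into genes shared by both and genes unique to each."""
    return {
        "shared": genes_a & genes_b,
        "only_a": genes_a - genes_b,
        "only_b": genes_b - genes_a,
    }


# Made-up gene family identifiers for two hypothetical genomes.
genome_a = {"famA", "famB", "famC", "famD"}
genome_b = {"famB", "famC", "famE"}

result = compare_gene_content(genome_a, genome_b)
for key in ("shared", "only_a", "only_b"):
    print(key, sorted(result[key]))
# shared ['famB', 'famC']
# only_a ['famA', 'famD']
# only_b ['famE']
```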

[Figure: Chordata ↔ Echinodermata]

Metagenomics

[Figure: metagenomics workflow: sample → filter → microbes or viruses (image: Lisa Brown for …)]

Human microbiome and virome

  • In your body:

– ~10^13 human cells
– ~10^14 bacteria
– ~10^15 viruses

Bioinformatics

  • Bioinformatics: the study of informatic processes in biotic systems

Paulien Hogeweg and Ben Hesper (Utrecht University, 1970)

  • Bioinformatic Data Analysis: using computational methods to analyze biological data

Bioinformatics in Utrecht today