SLIDE 1

Structural Bioinformatics

Davide Baù Staff Scientist

Genome Biology Group (CNAG) Structural Genomics Group (CRG) dbau@pcb.ub.cat

SLIDE 2

Proteins

SLIDE 3

Amino Acids

SLIDE 4

The peptide bond

Properties: a peptide bond is a covalent bond formed between two molecules when the carboxyl group of one molecule reacts with the amino group of the other, releasing a molecule of water (H2O). Polypeptides and proteins are chains of amino acids held together by peptide bonds.
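Because one water molecule is released per peptide bond, the average mass of a linear peptide is the sum of its free amino acid masses minus one water per bond. The minimal Python sketch below shows that arithmetic; the small table of average masses is an illustrative subset assumed for this example, not a complete residue library.

```python
# Average masses (Da) of a few free amino acids, keyed by one-letter code.
# Illustrative subset only (an assumption of this sketch); extend as needed.
AVERAGE_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "V": 117.15,  # valine
    "C": 121.16,  # cysteine
    "L": 131.17,  # leucine
}
WATER = 18.015  # average mass of H2O, released once per peptide bond


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide: sum of free amino acids minus one water per bond."""
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(AVERAGE_MASS[aa] for aa in sequence.upper())
    n_bonds = len(sequence) - 1  # a chain of n residues has n - 1 peptide bonds
    return total - n_bonds * WATER


print(peptide_mass("GA"))
```

For the dipeptide Gly-Ala this gives about 75.07 + 89.09 - 18.02, i.e. roughly 146.14 Da.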

SLIDE 5

Adapted from http://oregonstate.edu

Only two backbone bonds per residue can rotate freely: N–Cα (angle Φ) and Cα–C(O) (angle Ψ).

The peptide bond

The peptide bond is planar: rotation about the C(O)–N bond is fixed.
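Since only the N–Cα and Cα–C(O) bonds rotate, a residue's backbone conformation can be summarized by two torsion angles. A minimal sketch, assuming atom coordinates are plain (x, y, z) tuples, computes the signed torsion angle defined by four consecutive atoms:

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees (range -180..180) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Under the usual convention, Φ of residue i is dihedral(C[i-1], N[i], CA[i], C[i]) and Ψ is dihedral(N[i], CA[i], C[i], N[i+1]); the coordinate lists C, N and CA are hypothetical names for the backbone atoms.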

SLIDE 6

In protein structures, Φ and Ψ angles fall within allowed regions (displayed in green and red). Secondary structure elements are defined by specific pairs of Φ and Ψ angles.

Ramachandran plots

Image credits: http://www.imb-jena.de/~rake
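As a rough illustration of how (Φ, Ψ) pairs map onto secondary structure, the sketch below assigns an angle pair to the nearest of a few textbook canonical values. The canonical angles are idealized reference points and the nearest-point rule is an assumption made for this example; real assignment tools such as DSSP rely on hydrogen-bond patterns rather than on angles alone.

```python
from typing import Dict, Tuple

# Idealized textbook backbone angles (degrees). Real residues scatter widely
# around these points; the nearest-point rule below is only an illustration.
CANONICAL: Dict[str, Tuple[float, float]] = {
    "right-handed alpha helix": (-57.0, -47.0),
    "antiparallel beta sheet": (-139.0, 135.0),
    "parallel beta sheet": (-119.0, 113.0),
    "left-handed helix": (57.0, 47.0),
}


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, respecting 360-degree periodicity."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def nearest_region(phi: float, psi: float) -> str:
    """Name of the canonical (phi, psi) point closest to the given pair."""
    def dist2(center: Tuple[float, float]) -> float:
        dphi = angular_diff(phi, center[0])
        dpsi = angular_diff(psi, center[1])
        return dphi * dphi + dpsi * dpsi

    return min(CANONICAL, key=lambda name: dist2(CANONICAL[name]))


print(nearest_region(-60.0, -45.0))
```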

SLIDE 7

Take home message

Proteins: chains of amino acids held together by peptide bonds
Configuration: defined by a limited set of allowed Φ and Ψ angle pairs
Role: fundamental constituents of the cell

SLIDE 8

Summary

Protein structural levels

Image credits: http://iitb.vlab.co.in/

Primary, secondary, tertiary and quaternary structure

SLIDE 9

Protein structure relevance

The biochemical function (activity) of a protein is defined by its interactions with other molecules. The biological function is in large part a consequence of these interactions. The 3D structure is more informative than sequence because interactions are determined by residues that are close in space but are frequently distant in sequence.
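The observation that residues close in space can be far apart in sequence is often made concrete with a contact map. A minimal sketch, assuming one Cα coordinate per residue, an 8 Å distance cutoff and a minimum sequence separation of 6 residues (common choices, but assumptions here), lists such long-range contacts:

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def long_range_contacts(ca: Sequence[Coord], cutoff: float = 8.0,
                        min_separation: int = 6) -> List[Tuple[int, int]]:
    """Residue pairs (i, j) closer than `cutoff` angstroms in space
    but at least `min_separation` positions apart in sequence."""
    contacts = []
    for i in range(len(ca)):
        # Start at i + min_separation so sequence neighbours are skipped.
        for j in range(i + min_separation, len(ca)):
            if math.dist(ca[i], ca[j]) < cutoff:
                contacts.append((i, j))
    return contacts
```

Applied to a toy hairpin, the two strands facing each other produce contacts between the chain ends even though they are the most distant residues in sequence.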

SLIDE 10

Protein prediction vs protein determination

Experimental data: X-ray crystallography, NMR
Inferred data: comparative modeling, threading, ab initio

SLIDE 11

Utility of protein structure models, despite errors

D. Baker & A. Sali. Science 294, 93, 2001.

SLIDE 12

NMR spectroscopy

Nuclear magnetic resonance

[Figure: TOCSY and NOESY spectra (chemical shifts in ppm) with cross-peaks labelled by residue and proton type]
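NOESY cross-peaks are useful for structure determination because, in the simple isolated spin-pair approximation, NOE intensity falls off with the sixth power of the inter-proton distance. The sketch below calibrates an unknown distance against a reference peak of known distance; the function name and the single-reference calibration are assumptions for illustration, and in practice such estimates are usually converted into distance ranges (restraints) rather than exact values.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate a proton-proton distance from an NOE cross-peak intensity.

    Assumes I is proportional to r**-6 (isolated spin pair), calibrated on a
    reference peak of known distance: r = r_ref * (I_ref / I) ** (1/6).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)
```

Because of the sixth-power dependence, a peak 64 times weaker than the reference corresponds to a distance only twice as long, which is why NOEs are informative mainly for short (roughly sub-5 to 6 Å) distances.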

SLIDE 13

Superimposition of the ensemble of lowest-energy structures of a peptide.

NMR spectroscopy

Nuclear magnetic resonance
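Comparing the models in such an ensemble typically means superimposing them and computing a root-mean-square deviation (RMSD). A minimal NumPy sketch of Kabsch superposition, assuming two conformers given as N x 3 arrays with atoms in the same order:

```python
import numpy as np


def superposed_rmsd(p, q) -> float:
    """RMSD between two conformers (N x 3, same atom order) after optimal
    rigid-body superposition with the Kabsch algorithm."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # forbid improper rotations (reflections)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal rotation mapping p onto q
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

For an NMR ensemble one might compute this value for every model against the first model or against an average structure; which reference to use is left open here.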

SLIDE 14

X-ray crystallography

SLIDE 15

X-ray crystallography

SLIDE 16

Take home message

Protein types: fibrous, membrane, globular
Biochemical function: activity depends on the 3D structure
Evolution: structure is more conserved than sequence

SLIDE 17

Nucleic acids

DNA and RNA

SLIDE 18

Nucleic acids

DNA and RNA

DNA and RNA are polymers made up of repeating units called nucleotides. Each nucleotide is composed of a nitrogen-containing nucleobase, a monosaccharide sugar and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds between the sugar of one nucleotide and the phosphate of the next (phosphodiester bonds). DNA (deoxyribonucleic acid) encodes the genetic information. RNA (ribonucleic acid) is implicated in various biological roles, including the coding, decoding, regulation and expression of genes.

SLIDE 19

The nucleotides

DNA nucleotide: sugar, phosphate group and nitrogenous base

Bases: guanine (G), adenine (A), thymine (T) or cytosine (C)

SLIDE 20

The nucleotides

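DNA nucleotide: sugar, phosphate group and nitrogenous base

Bases: guanine (G), adenine (A), thymine (T) or cytosine (C)

RNA nucleotide: the sugar (ribose) carries an additional OH group, and uracil (U) replaces thymine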

SLIDE 21

Nitrogenous bases

DNA: cytosine (C), guanine (G), thymine (T), adenine (A)

SLIDE 22

Nitrogenous bases

DNA: cytosine (C), guanine (G), thymine (T), adenine (A)
RNA: uracil (U) in place of thymine (T)

SLIDE 23

The phosphodiester bond

[Figure: nucleotides linked through phosphodiester bonds; P = phosphate, S = sugar, B = base]

SLIDE 24

The phosphodiester bond


SLIDE 25

Helix stability

Hydrogen bonds and base-stacking interactions

The two types of base pairs form different numbers of hydrogen bonds (2 for A-T, 3 for G-C). The DNA double helix is maintained largely by intra-strand base-stacking interactions, which are stronger for GC than for AT. The stability of double-stranded DNA also depends on sequence and length: DNA with a high GC content is more stable than DNA with a low GC content.
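The hydrogen-bond counts above translate into simple arithmetic on a sequence. A minimal sketch, assuming a perfectly paired Watson-Crick duplex described by one of its strands:

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def duplex_hydrogen_bonds(seq: str) -> int:
    """Watson-Crick hydrogen bonds in a fully paired duplex, given one strand:
    2 per A-T pair and 3 per G-C pair."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 3 * gc
```

These numbers only count hydrogen bonds; as stated above, base stacking contributes much of the helix stability, so they are a rough proxy rather than a stability model.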

SLIDE 26

Base pairing

DNA

SLIDE 27

Base pairing

RNA
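Base-pairing rules determine the sequence of the partner strand, which runs antiparallel. A minimal sketch using only Watson-Crick pairs (A-T or A-U, and G-C); it deliberately ignores the G-U wobble pairs that are common in RNA:

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Sequence of the antiparallel partner strand, written 5' to 3'."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return "".join(pairs[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGC"))
```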

SLIDE 28

Nucleic acids helical structures

A-DNA B-DNA Z-DNA

SLIDE 29

Nucleic acids helical structures

                              A-DNA    B-DNA    Z-DNA
Helix sense                   Right    Right    Left
bp per turn                   11       10       12
Vertical rise per bp (Å)      2.56     3.4      3.7
Rotation per bp (degrees)     +33      +36      -30
Helical diameter (Å)          23       19       18
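The rotation and rise values listed above are linked by simple arithmetic: the twist per base pair is 360° divided by the base pairs per turn (negative for the left-handed Z form), and the helical pitch is the base pairs per turn times the rise. A small sketch reproducing this from the listed values:

```python
# (bp per turn, rise per bp in angstroms, +1 right-handed / -1 left-handed)
HELICES = {
    "A": (11, 2.56, +1),
    "B": (10, 3.4, +1),
    "Z": (12, 3.7, -1),
}


def twist_per_bp(bp_per_turn: float, sign: int) -> float:
    """Rotation between successive base pairs, in degrees."""
    return sign * 360.0 / bp_per_turn


def pitch(bp_per_turn: float, rise: float) -> float:
    """Length of one full helical turn along the axis, in angstroms."""
    return bp_per_turn * rise


for name, (n, rise, sign) in HELICES.items():
    print(f"{name}-DNA: {twist_per_bp(n, sign):+.1f} deg/bp, pitch {pitch(n, rise):.1f} A")
```

This gives a twist of about +32.7° per bp and a pitch of about 28 Å for A-DNA, +36° and 34 Å for B-DNA, and -30° and about 44 Å for Z-DNA, consistent with the rounded rotation values listed above.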

SLIDE 30

Nucleic acids helical structures

A-DNA B-DNA Z-DNA

SLIDE 31

Major and minor groove

Major groove Minor groove

SLIDE 32

The helical structure and DNA

Rosalind Franklin

SLIDE 33

Take home message

DNA and RNA: polymers of nucleotide units
Nucleotides: nucleobase (G, C, A, T in DNA; U instead of T in RNA) + sugar + phosphate
DNA: stores the genetic information
RNA: implicated in various biological processes

SLIDE 34

Genomes

Limited data types

SLIDE 35

Activity, organization, processes

The role of chromatin structure

SLIDE 36

Chromatin definition

Chromatin is composed of DNA complexed with histone proteins and other biomolecules. Chromatin formation enables the genome to be hierarchically packaged, or condensed, so that it fits inside the nucleus. The degree of compaction also allows the cell to modulate gene transcription, DNA repair, recombination and replication. Chromatin structure is considered highly dynamic.

SLIDE 37

Chromatin structures

SLIDE 38

The nuclear organization of DNA

Chromosome Chromatin fibre Nucleosome

Adapted from Richard E. Ballermann, 2012

SLIDE 39

The resolution gap

What do we “really” know?

[Figure: our knowledge across scales, with axes for resolution, time (s), volume and DNA length (nt), each spanning several orders of magnitude; labels IDM and INM]

SLIDE 40

The nucleosome

Gene Histone Histone tail Methyl group Acetyl group DNA Histone proteins

SLIDE 41

The nucleosome & chromatin marks

Gene Histone Histone tail Methyl group Acetyl group DNA Histone proteins

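Histone modifications and their association with transcription:
H3K4: mono-, di- and tri-methylation associated with activation
H3K9: mono-methylation and acetylation with activation; di- and tri-methylation with repression
H3K14: acetylation with activation
H3K27: mono-methylation with activation; di- and tri-methylation with repression
H3K79: mono- and di-methylation with activation; tri-methylation with activation or repression
H4K20: mono-methylation with activation; tri-methylation with repression
H2BK5: mono-methylation with activation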

SLIDE 42

Euchromatin and heterochromatin

Euchromatin: chromatin that is located away from the nuclear lamina, is generally less densely packed, and contains actively transcribed genes.
Heterochromatin: chromatin that is near the nuclear lamina, tightly condensed and transcriptionally silent.

Electron microscopy

SLIDE 43

Complex genome organization

Takizawa, T., Meaburn, K. J. & Misteli, T. Cell 135, 9–13 (2008)

Chromosome size Gene density Expression

SLIDE 44

Lamina-genome interactions

[Figure: nuclear membrane and nuclear lamina; internal chromatin (mostly active) versus lamina-associated domains (repressed); genes and mRNA; "unlocking" of stem-cell, cell-cycle and neuronal genes]

Most genes in lamina-associated domains (LADs) are transcriptionally silent, suggesting that lamina-genome interactions are widely involved in the control of gene expression.

Adapted from Molecular Cell 38, 603-613, 2010

SLIDE 45

Complex genome organization

Cavalli, G. & Misteli, T. Nat Struct Mol Biol 20, 290–299 (2013)

DNA Chromatin domains Superdomains Chromosome territories Lamina Transcription hub Centromere cluster Nuclear pore Inactive Active Non-coding Nucleus Marina Corral

SLIDE 46

Chromatin loops

Loops bring distal genomic regions into close proximity to one another, which in turn can have profound effects on gene transcription. Enhancers can lie hundreds of kilobases, or even a megabase or more, away from their target genes, upstream or downstream (or even on a separate chromosome).

[Figure: enhancers looping to their target genes and the resulting gene activity]

SLIDE 47

Main approaches

SLIDE 48

5C technology

http://my5C.umassmed.edu

Job Dekker

Dostie et al. Genome Res (2006) vol. 16 (10) pp. 1299-309

SLIDE 49

Biomolecular structure determination: 2D-NOESY data
Chromosome structure determination: 3C-based data

Structure determination using Hi-C data
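The analogy above suggests treating contact frequencies like NOE-derived restraints. One simple way to sketch this (not necessarily the method used by the author's group, which these slides do not detail) is to convert frequencies to distances with an assumed power law, d = f^(-1/α), and embed the resulting distance matrix in 3D with classical multidimensional scaling (MDS). The exponent α and the requirement that every pair has a non-zero frequency are assumptions of this example; real Hi-C matrices are sparse and noisy, so practical methods must handle missing and unreliable contacts.

```python
import numpy as np


def frequencies_to_distances(freq, alpha: float = 1.0) -> np.ndarray:
    """Assumed power law: frequency ~ distance**(-alpha), so d = f**(-1/alpha)."""
    freq = np.asarray(freq, dtype=float)
    with np.errstate(divide="ignore"):
        dist = np.power(freq, -1.0 / alpha)
    np.fill_diagonal(dist, 0.0)  # a locus is at distance 0 from itself
    if not np.all(np.isfinite(dist)):
        raise ValueError("every off-diagonal pair needs a non-zero frequency")
    return dist


def classical_mds(dist, dim: int = 3) -> np.ndarray:
    """Coordinates (n x dim) whose pairwise distances approximate `dist`."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ d2 @ j                      # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]         # keep the `dim` largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

The coordinates are only defined up to rotation, translation and reflection, so models are compared after superposition rather than coordinate by coordinate.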

SLIDE 50

Interpreting chromatin interaction data

Nuclear envelope or lamina; subnuclear body or transcription factory

Interaction types: protein-complex-mediated interaction, direct interaction, bystander interaction, baseline (polymer) interaction, interaction with the same subnuclear structures

Adapted from Dekker et al. (2013) Nat Rev Genetics

SLIDE 51

Hi-C data and genomic tracks data

[Figure: Hi-C interaction map of mouse chromosome 18 (20 Mb scale) showing interaction enrichment and depletion, aligned with RefSeq genes and DNase I sensitivity tracks]

Adapted from Dekker et al. (2013) Nat Rev Genetics
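Because contact frequency decreases steadily with genomic distance (the baseline polymer interaction), enrichment and depletion are usually judged relative to the average contact count at each distance. A minimal observed/expected sketch, assuming an already bias-corrected, symmetric contact matrix with one row per genomic bin:

```python
import numpy as np


def observed_over_expected(contacts) -> np.ndarray:
    """Divide each entry by the mean count on its diagonal (same genomic distance),
    so values > 1 indicate enrichment and values < 1 depletion."""
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    oe = np.zeros_like(m)
    for k in range(n):
        diag = np.diagonal(m, offset=k)  # all bin pairs separated by k bins
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - k)
            oe[idx, idx + k] = diag / expected
            oe[idx + k, idx] = diag / expected  # keep the matrix symmetric
    return oe
```

A matrix whose counts depend only on distance becomes uniformly 1, so anything left after this step reflects structure beyond the polymer baseline.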

SLIDE 52

Adapted from Dekker et al. (2013) Nat Rev Genetics

[Figure: Hi-C map of human chromosome 14 showing interaction preference, A and B compartments, and TADs (scale bars: 20 Mb and 2 Mb)]

Genome Organization

Dekker, J., Marti-Renom, M. A. & Mirny, L. A. Nat Rev Genet (2013)
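A widely used way to obtain A/B compartments from a Hi-C map is to correlate the rows of the observed/expected matrix and take the leading eigenvector of that correlation matrix: bins with the same sign share a compartment. A minimal NumPy sketch, assuming a symmetric observed/expected matrix without empty rows:

```python
import numpy as np


def compartment_track(oe) -> np.ndarray:
    """Leading eigenvector of the correlation matrix of an O/E map.
    Its sign is arbitrary: bins of one sign form one compartment, the rest the other."""
    corr = np.corrcoef(np.asarray(oe, dtype=float))  # row-by-row correlation
    vals, vecs = np.linalg.eigh(corr)
    return vecs[:, np.argmax(vals)]
```

Because an eigenvector's sign is arbitrary, it is usually oriented with an external track such as gene density or GC content, so that positive values correspond to the gene-rich, active A compartment.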

SLIDE 53

[Figure: interaction map of a region of chromosome X (about 99 to 103 Mb) partitioned into TADs labelled A to I; colour scale: median count in 30-kb windows, up to 98% of max; 1 Mb scale bar]

Topologically associating domains (TADs) are typically hundreds of kilobases to about a megabase in size. Loci located within a TAD tend to interact more frequently with each other than with loci located outside their domain. The human and mouse genomes are each composed of over 2,000 TADs, covering over 90% of the genome.

Topologically Associating Domains (TADs)
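Because loci inside a TAD contact each other more than loci outside it, boundaries can be located by sliding a small square along the diagonal of the contact map and looking for minima in the contacts it captures; this idea is often called an insulation score. A minimal sketch, assuming a symmetric matrix of normalized counts and a user-chosen window size in bins:

```python
import numpy as np


def insulation_score(contacts, window: int) -> np.ndarray:
    """Mean contact count between the `window` bins upstream and the `window`
    bins downstream of each bin; low values mark candidate TAD boundaries.
    Bins too close to the matrix edge get NaN."""
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window):
        score[i] = m[i - window:i, i + 1:i + window + 1].mean()
    return score
```

Published implementations typically normalize the score (for example as a log2 ratio to its mean) and filter the resulting minima; this version only returns the raw window means.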

SLIDE 54

Take home message

Chromatin = DNA + (histone) proteins + other biomolecules
The genome is well organized and hierarchically packaged
Histone modifications affect chromatin structure and activity
3C-like data measure the frequency of interaction between distant loci
