SLIDE 1 Structural Bioinformatics
Davide Baù Staff Scientist
Genome Biology Group (CNAG) Structural Genomics Group (CRG) dbau@pcb.ub.cat
SLIDE 2
Proteins
SLIDE 3
Amino Acids
SLIDE 4
The peptide bond
Properties A peptide bond is a covalent bond formed between two molecules when the carboxyl group of one molecule reacts with the amino group of the other molecule, causing the release of a molecule of water (H2O). Polypeptides and proteins are chains of amino acids held together by peptide bonds.
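Because each peptide bond releases one molecule of water, the mass of a linear peptide is the sum of its free amino acids minus one water per bond. The following is a minimal sketch of that bookkeeping; the mass table and the function name `peptide_mass` are illustrative assumptions, not part of the slides, and only three amino acids are included.

```python
# Average molecular masses (g/mol) of a few free amino acids.
# Textbook values, included only to make the sketch runnable (assumption).
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015  # g/mol; one H2O is released per peptide bond


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: free amino acids minus the released water."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    n_bonds = len(sequence) - 1  # n residues are joined by n - 1 peptide bonds
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - n_bonds * WATER_MASS


if __name__ == "__main__":
    # Dipeptide Gly-Ala: two residues, one peptide bond, one water lost.
    print(peptide_mass("GA"))
```

A single residue has no peptide bond, so `peptide_mass("G")` simply returns the mass of free glycine.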
SLIDE 5 Adapted from http://oregonstate.edu
Only 2 bonds can freely rotate: Cα–N (Φ) and Cα–C(O) (Ψ)
The peptide bond
The peptide bond is planar (labeled "Fixed" in the figure)
SLIDE 6 Protein structures Φ and Ψ angles fall within allowed regions (displayed in green and red). Secondary structure elements are defined by specific pairs of Φ and Ψ angles.
Ramachandran plots
Image credits: http://www.imb-jena.de/~rake
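Φ and Ψ are dihedral angles, each defined by four consecutive backbone atoms: Φ by C(i-1), N(i), Cα(i), C(i) and Ψ by N(i), Cα(i), C(i), N(i+1). The sketch below shows one common way to compute a dihedral angle from four 3D points; the helper names are hypothetical and the coordinates in the usage lines are toy values, not real protein atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy points: the fourth atom is rotated a quarter turn from the first.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the (Φ, Ψ) pair of every residue on a two-dimensional grid gives the Ramachandran plot.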
SLIDE 7
Take home message
Proteins: chains of amino acids held together by peptide bonds
Configuration: defined by limited pairs of Φ and Ψ angles
Role: fundamental constituents of the cell
SLIDE 8 Summary
Protein structural levels
Image credits: http://iitb.vlab.co.in/
Primary, Secondary, Tertiary, Quaternary
SLIDE 9
Protein structure relevance
The biochemical function (activity) of a protein is defined by its interactions with other molecules. The biological function is in large part a consequence of these interactions. The 3D structure is more informative than sequence because interactions are determined by residues that are close in space but are frequently distant in sequence.
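The point about residues that are close in space but distant in sequence can be made concrete with a contact search over 3D coordinates. This is a minimal sketch under stated assumptions: one point per residue, a distance cutoff and a minimum sequence separation chosen only for illustration, and a toy hairpin instead of a real structure.

```python
import math


def long_range_contacts(coords, cutoff=8.0, min_separation=4):
    """Pairs (i, j) that are within `cutoff` in space but at least
    `min_separation` positions apart in sequence."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


# Toy hairpin: four residues going out, four coming back 5 units away.
HAIRPIN = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]

if __name__ == "__main__":
    print(long_range_contacts(HAIRPIN, cutoff=6.0, min_separation=4))
```

In the toy hairpin the first and last residues are seven positions apart in sequence yet only 5 units apart in space, which a sequence-only view would not reveal.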
SLIDE 10
Protein prediction vs protein determination
Experimental data: X-ray, NMR
Inferred data: comparative modeling, threading, ab initio
SLIDE 11 Utility of protein structure models, despite errors
- D. Baker & A. Sali. Science 294, 93, 2001.
SLIDE 12 NMR spectroscopy
Nuclear magnetic resonance
Figure: TOCSY and NOESY spectra (both axes in ppm). TOCSY cross-peaks are labeled by proton type and residue (for example αR20, βV2); NOESY cross-peaks are labeled by sequential residue pairs (for example 20/21).
SLIDE 13 Superimposition of the ensemble of lowest energy structures
NMR spectroscopy
Nuclear magnetic resonance
SLIDE 14
X-ray crystallography
SLIDE 15
X-ray crystallography
SLIDE 16
Take home message
Protein types: fibrous, membrane, globular
Biochemical function: activity depends on the 3D structure
Evolution: structure is more conserved than sequence
SLIDE 17
Nucleic acids
DNA and RNA
SLIDE 18
Nucleic acids
DNA and RNA
DNA and RNA are polymers made up of repeating units called nucleotides. Each nucleotide is composed of a nitrogen-containing nucleobase, a monosaccharide sugar and a phosphate group. The nucleotides are joined to one another in a chain by covalent sugar-phosphate (phosphodiester) bonds. DNA (deoxyribonucleic acid) encodes the genetic information. RNA (ribonucleic acid) is implicated in various biological roles, including coding, decoding, regulation, and expression of genes.
SLIDE 19 The nucleotides
DNA Sugar Phosphate group Nitrogenous base
Guanine (G), Adenine (A), Thymine (T), or Cytosine (C)
SLIDE 20 The nucleotides
DNA Sugar Phosphate group Nitrogenous base
Guanine (G), Adenine (A), Thymine (T), or Cytosine (C)
RNA: the sugar carries a 2'-OH group, and Uracil (U) replaces Thymine (T)
SLIDE 21
Nitrogenous bases
DNA: Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
SLIDE 22
Nitrogenous bases
DNA: Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
RNA: Adenine (A), Cytosine (C), Guanine (G), Uracil (U)
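At the sequence level, the only difference between the DNA and RNA alphabets is Thymine versus Uracil. The sketch below checks that a string uses the DNA alphabet and rewrites it in the RNA alphabet; the function names are hypothetical, and the conversion is a plain letter substitution rather than a model of transcription.

```python
DNA_BASES = set("ACGT")
RNA_BASES = set("ACGU")


def is_dna(seq: str) -> bool:
    """True if the sequence is non-empty and uses only A, C, G, T."""
    return bool(seq) and set(seq.upper()) <= DNA_BASES


def dna_to_rna(seq: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (T becomes U)."""
    if not is_dna(seq):
        raise ValueError("not a DNA sequence: " + repr(seq))
    return seq.upper().replace("T", "U")


if __name__ == "__main__":
    print(dna_to_rna("GATTACA"))
```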
SLIDE 23
The phosphodiester bond
Figure labels: P (phosphate), S (sugar), B (base)
SLIDE 24
The phosphodiester bond
Figure labels: P (phosphate), S (sugar), B (base)
SLIDE 25
Helix stability
Hydrogen bonds and base-stacking interactions
The two types of base pairs form different numbers of hydrogen bonds (2 for AT, 3 for GC). The DNA double helix is maintained largely by intra-strand base-stacking interactions (GC > AT). The stability of the dsDNA form also depends on sequence and length. DNA with high GC-content is more stable than DNA with low GC-content.
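GC-content and the hydrogen-bond count follow directly from the numbers above (2 per AT pair, 3 per GC pair). The sketch below computes both for one strand of a duplex; the function names are illustrative, and the hydrogen-bond count is only a rough proxy, since base stacking contributes most of the stability and is not modeled here.

```python
# Hydrogen bonds formed by each base with its Watson-Crick partner.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def watson_crick_h_bonds(seq: str) -> int:
    """Total hydrogen bonds in the duplex formed by seq and its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    for s in ("AATT", "ATGC", "GGCC"):
        print(s, gc_content(s), watson_crick_h_bonds(s))
```

For sequences of equal length, a higher GC-content always gives a higher hydrogen-bond count, consistent with the stability ordering stated on the slide.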
SLIDE 26
Base pairing
DNA
SLIDE 27
Base pairing
RNA
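Base pairing means that one strand determines the other: A pairs with T (U in RNA) and G pairs with C, and the two strands run antiparallel. A minimal sketch of the resulting reverse complement follows; the function name and the `rna` flag are assumptions made for illustration.

```python
# Translation tables for Watson-Crick partners.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Sequence of the paired strand, read 5' to 3'."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    # Complement each base, then reverse because the strands are antiparallel.
    return seq.upper().translate(table)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(reverse_complement("AUGC", rna=True))
```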
SLIDE 28
Nucleic acids helical structures
A-DNA B-DNA Z-DNA
SLIDE 29 Nucleic acids helical structures
Parameter                   A       B       Z
Helix sense                 R       R       L
bp per turn                 11      10      12
Vertical rise per bp (Å)    2.56    3.4     3.7
Rotation per bp (degrees)   +33     +36
Helical diameter (Å)        23      19      18
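The helical parameters listed above are related by simple arithmetic: the length of a helix is the number of base pairs times the rise per bp, and an idealized rotation per bp is 360 degrees divided by the bp per turn, with the sign given by the helix sense. The sketch below encodes the slide's values for bp per turn, rise and sense; the rotation it returns is derived from that idealized relation, not read from the slide.

```python
# Values taken from the helical-parameter slide; "R" is right-handed, "L" left-handed.
HELIX_FORMS = {
    "A": {"bp_per_turn": 11, "rise": 2.56, "sense": "R"},
    "B": {"bp_per_turn": 10, "rise": 3.4, "sense": "R"},
    "Z": {"bp_per_turn": 12, "rise": 3.7, "sense": "L"},
}


def helix_length(n_bp: int, form: str = "B") -> float:
    """Length in angstroms of a helix of n_bp base pairs."""
    return n_bp * HELIX_FORMS[form]["rise"]


def rotation_per_bp(form: str) -> float:
    """Idealized rotation per bp in degrees; negative for a left-handed helix."""
    params = HELIX_FORMS[form]
    sign = 1 if params["sense"] == "R" else -1
    return sign * 360 / params["bp_per_turn"]


if __name__ == "__main__":
    for name in HELIX_FORMS:
        print(name, helix_length(100, name), rotation_per_bp(name))
```

For B-DNA this gives 36 degrees per bp and 34 Å per full turn of 10 bp, matching the values on the slide.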
SLIDE 30
Nucleic acids helical structures
A-DNA B-DNA Z-DNA
SLIDE 31
Major and minor groove
Major groove Minor groove
SLIDE 32
The helical structure and DNA
Rosalind Franklin
SLIDE 33
Take home message
DNA and RNA: polymers of nucleotide units
Nucleotides: nucleobase (G, C, A, T or U) + sugar + phosphate
DNA: stores the genetic information
RNA: implicated in various biological processes
SLIDE 34
Genomes
Limited data types
SLIDE 35 The role of chromatin structure
Figure labels: hormone, activity, organization, processes
SLIDE 36
Chromatin definition
Chromatin is composed of DNA complexed with histone proteins and other biomolecules. Chromatin formation enables the genome to be hierarchically packaged, or condensed, so that it fits inside the nuclear space. This compaction also makes it possible to modulate gene transcription, DNA repair, recombination, and replication. Chromatin structure is considered highly dynamic.
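A back-of-the-envelope calculation shows why packaging is needed. The sketch below multiplies a number of base pairs by the B-DNA rise of 3.4 Å (0.34 nm) per bp; the genome size and the nucleus diameter are round-number assumptions introduced here for illustration and do not come from the slides.

```python
RISE_NM = 0.34  # B-DNA rise per base pair, in nanometres

# Assumptions for illustration only (not from the slides):
HUMAN_DIPLOID_BP = 2 * 3.1e9  # roughly 3.1 billion bp per haploid genome
NUCLEUS_DIAMETER_M = 10e-6    # a nucleus on the order of 10 micrometres


def dna_length_m(n_bp: float) -> float:
    """Contour length in metres of n_bp base pairs of B-DNA."""
    return n_bp * RISE_NM * 1e-9


def linear_compaction_ratio(n_bp: float, container_m: float) -> float:
    """How many times longer the stretched DNA is than its container."""
    return dna_length_m(n_bp) / container_m


if __name__ == "__main__":
    print(dna_length_m(HUMAN_DIPLOID_BP))
    print(linear_compaction_ratio(HUMAN_DIPLOID_BP, NUCLEUS_DIAMETER_M))
```

Under these assumptions the stretched DNA is about two metres long, several orders of magnitude longer than the nucleus is wide.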
SLIDE 37
Chromatin structures
SLIDE 38 The nuclear organization of DNA
Chromosome Chromatin fibre Nucleosome
Adapted from Richard E. Ballermann, 2012
SLIDE 39 The resolution gap
What do we “really” know?
Figure: knowledge plotted against resolution, time (s), volume and DNA length (nt), each on a powers-of-ten scale. Labels: IDM, INM
SLIDE 40 The nucleosome
Gene Histone Histone tail Methyl group Acetyl group DNA Histone proteins
SLIDE 41 The nucleosome & chromatin marks
Gene Histone Histone tail Methyl group Acetyl group DNA Histone proteins
Modification      H3K4        H3K9        H3K14       H3K27       H3K79                   H4K20       H2BK5
mono-methylation  activation  activation  -           activation  activation              activation  activation
di-methylation    activation  repression  -           repression  activation              -           -
tri-methylation   activation  repression  -           repression  activation, repression  -           repression
acetylation       -           activation  activation  -           -                       -           -
(-: no entry)
SLIDE 42 Euchromatin and heterochromatin
Euchromatin: chromatin that is located away from the nuclear lamina, is generally less densely packed, and contains actively transcribed genes.
Heterochromatin: chromatin that is near the nuclear lamina, tightly condensed, and transcriptionally silent.
Electron microscopy
SLIDE 43
Complex genome organization
Takizawa, T., Meaburn, K. J. & Misteli, T. Cell 135, 9–13 (2008)
Chromosome size Gene density Expression
SLIDE 44 Lamina-genome interactions
Figure labels: nuclear membrane; nuclear lamina; internal chromatin (mostly active); lamina-associated domains (repressed); genes; mRNA; AC; "unlocking" gene; stem-cell genes; cell-cycle gene; neuronal gene
Most genes in Lamina Associated Domains are transcriptionally silent, suggesting that lamina-genome interactions are widely involved in the control of gene expression
Adapted from Molecular Cell 38, 603-613, 2010
SLIDE 45 Complex genome organization
Cavalli, G. & Misteli, T. Nat Struct Mol Biol 20, 290–299 (2013)
Figure labels: DNA, chromatin domains, superdomains, chromosome territories, lamina, transcription hub, centromere cluster, nuclear pore, inactive, active, non-coding, nucleus. Illustration: Marina Corral
SLIDE 46
Chromatin loops
Loops bring distal genomic regions into close proximity to one another. This in turn can have profound effects on gene transcription. Enhancers can be thousands of kilobases away from their target genes in any direction (or even on a separate chromosome).
Figure labels: gene, enhancers, gene activity
SLIDE 47
Main approaches
SLIDE 48 5C technology
http://my5C.umassmed.edu
Job Dekker
Dostie et al. Genome Res (2006) vol. 16 (10) pp. 1299-309
SLIDE 49
Biomolecular structure determination: 2D-NOESY data
Chromosome structure determination: 3C-based data
Structure determination using Hi-C data
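The analogy with NOESY suggests treating interaction frequencies as a source of distance restraints: loci that interact often are assumed to be close in space. The sketch below uses an inverse power law to turn a frequency into a target distance. This is a hypothetical illustration, not the method used by the author or by any specific tool; the function name, the exponent `alpha` and the `scale` factor are all assumptions.

```python
def contact_to_distance(freq: float, alpha: float = 1.0, scale: float = 1.0):
    """Map an interaction frequency to a target distance (arbitrary units).

    Assumes distance = scale / freq**alpha, i.e. more frequent contacts
    imply shorter distances. Returns None when no contact was observed,
    because zero frequency gives no usable restraint under this model.
    """
    if freq < 0:
        raise ValueError("frequency must be non-negative")
    if freq == 0:
        return None
    return scale / freq ** alpha


if __name__ == "__main__":
    # Hypothetical frequencies between pairs of loci.
    for f in (0, 1, 4, 16):
        print(f, contact_to_distance(f, alpha=0.5))
```

Restraints of this kind could then be fed to an optimization that searches for 3D models consistent with as many of them as possible, in the same spirit as structure calculation from NOE-derived distances.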
SLIDE 50 Interpreting chromatin interaction data
Figure labels: nuclear envelope; subnuclear body; factory
Types of interaction: protein-complex-mediated interaction; direct interaction; bystander interaction; baseline (polymer) interaction; interaction with the same subnuclear structures
Adapted from Dekker et al. (2013) Nat Rev Genetics
SLIDE 51 Hi-C data and genomic tracks data
Figure: Hi-C map of mouse chromosome 18 showing interaction enrichment and depletion (scale bar: 20 Mb)
Adapted from Dekker et al. (2013) Nat Rev Genetics
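A Hi-C map like the one described above is, at its core, a matrix of counts: each observed pair of interacting positions is assigned to a pair of genomic bins. The sketch below builds such a symmetric matrix for one chromosome; the input format (0-based positions below the chromosome length) and the function name are assumptions, and real pipelines add filtering and normalization steps that are not shown.

```python
def contact_matrix(pairs, chrom_length: int, bin_size: int):
    """Count interaction pairs per pair of fixed-size bins (symmetric matrix)."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


if __name__ == "__main__":
    # Toy data: three interaction pairs on a 3,000-bp "chromosome".
    toy_pairs = [(100, 2500), (150, 2900), (1200, 1300)]
    for row in contact_matrix(toy_pairs, chrom_length=3000, bin_size=1000):
        print(row)
```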
SLIDE 52 Genome Organization
Adapted from Dekker et al. (2013) Nat Rev Genetics
Figure: human chromosome 14, showing A-B compartments (A compartments, B compartments, interaction preference) and TADs (scale bars: 20 Mb and 2 Mb)
Dekker, J., Marti-Renom, M. A. & Mirny, L. A. Nat Rev Genet (2013)
SLIDE 53
Figure: Hi-C contact map of chrX, 99 Mb to 103 Mb, with TADs labeled A to I (scale bar: 1 Mb; color scale: median count in 30-kb window, 2 to 98% of max)
Topologically associating domains (TADs) can span up to hundreds of kb.
Loci located within a TAD tend to interact more frequently with each other than with loci located outside their domain.
The human and mouse genomes are each composed of over 2,000 TADs, covering over 90% of the genome.
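The defining property of TADs, more frequent interactions within a domain than across domains, can be checked directly on a contact matrix. The sketch below compares the mean contact count of bin pairs inside the same domain with that of pairs in different domains; the toy matrix, the domain boundaries and the function name are hypothetical.

```python
def intra_inter_means(matrix, tads):
    """Mean contact count within TADs and between TADs.

    `tads` is a list of (start_bin, end_bin) half-open intervals.
    Bins that belong to no TAD are ignored.
    """
    domain_of = {}
    for idx, (start, end) in enumerate(tads):
        for b in range(start, end):
            domain_of[b] = idx
    intra, inter = [], []
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if i in domain_of and j in domain_of:
                same = domain_of[i] == domain_of[j]
                (intra if same else inter).append(matrix[i][j])
    if not intra or not inter:
        raise ValueError("need at least one intra-TAD and one inter-TAD pair")
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy 4-bin matrix with two domains of two bins each.
TOY_MATRIX = [
    [0, 9, 1, 1],
    [9, 0, 2, 1],
    [1, 2, 0, 8],
    [1, 1, 8, 0],
]

if __name__ == "__main__":
    print(intra_inter_means(TOY_MATRIX, [(0, 2), (2, 4)]))
```

A clearly higher intra-domain mean, as in the toy matrix, is the pattern that appears as squares along the diagonal of a Hi-C map.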
Topologically Associating Domains (TADs)
SLIDE 54
Take home message
Chromatin = DNA + (histone) proteins + other biomolecules
The genome is well organized and hierarchically packaged
Histone modifications affect chromatin structure and activity
3C-like data measure the frequency of interaction between distant loci