CSE182-L7
Protein structure basics
Protein sequencing via MS
Quiz
- What research won the Nobel prize in Chemistry in 2004?
- In 2002?
A structural view of proteins
CS view of a protein
>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRYF
YNAKAGLCQTFVYGGCRAKRNNFKSAED
CMRTCGGAIGPWENL
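From the CS point of view, a protein is a header line plus a string over the 20-letter amino-acid alphabet. A minimal sketch of reading one such record, assuming the UniProt-style `db|accession|entry_name description` header shown above; `parse_fasta_record` is a hypothetical helper, not a library function.

```python
def parse_fasta_record(text: str) -> dict:
    """Split one FASTA record into header fields and a sequence string.

    Hypothetical helper; assumes a header like '>db|accession|entry_name description'.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header, seq_lines = lines[0], lines[1:]
    if not header.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    ident, _, description = header[1:].partition(" ")
    db, accession, entry_name = ident.split("|")
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        # The sequence may be wrapped over several lines; join them into one string.
        "sequence": "".join(seq_lines),
    }


record = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRYF
YNAKAGLCQTFVYGGCRAKRNNFKSAED
CMRTCGGAIGPWENL"""

protein = parse_fasta_record(record)
print(protein["accession"], len(protein["sequence"]))  # P00974 100
```

Once parsed, everything downstream (motif search, digestion, mass calculation) works on the plain `sequence` string.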
Protein structure basics
Side chains determine amino-acid type
- The residues differ in their properties.
- Aspartic acid (D) and glutamic acid (E) are acidic residues.
Bond angles form structural constraints
Various constraints determine 3D structure
- Constraints
  - Structural constraints due to physicochemical properties
  - Constraints due to bond angles
  - H-bond formation
- Surprisingly, only a few conformations are seen over and over again.
Alpha-helix
- 3.6 residues per turn
- H-bonds between residue i and residue i+4 stabilize the structure.
- First proposed by Linus Pauling
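With 3.6 residues per turn, each successive residue sits 360/3.6 = 100 degrees further around the helix axis. A minimal helical-wheel sketch that uses only that number; `helical_wheel_angles` is a hypothetical helper and the example sequence is made up.

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # 100 degrees per residue


def helical_wheel_angles(sequence: str) -> list[tuple[str, float]]:
    """Hypothetical helper: angle (0-360 degrees) of each residue viewed down the helix axis."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0) for i, aa in enumerate(sequence)]


for aa, angle in helical_wheel_angles("AELKKLLEE"):  # made-up sequence
    print(f"{aa}: {angle:6.1f}")
```

Plotting these angles on a circle shows which residues line up on the same face of the helix.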
Beta-sheet
- Each strand by itself has 2 residues per turn and is not stable.
- Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.
- Beta sheets have long-range interactions that stabilize the structure, while alpha-helices have local interactions.
Domains
- The basic structures (helix, strand, loop) combine to form complex 3D structures.
- Certain combinations are popular: many sequences, but only a few folds.
3D structure
- Predicting tertiary structure is an important problem in bioinformatics.
- Premise: Clues to structure can be found in the sequence.
- While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.
- The PDB (Protein Data Bank) is a compendium of structures.
PDB
Protein Domains
- An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
- Example: The zinc finger domain is a DNA-binding domain.
- What is a domain? Part of a sequence that can fold independently and is present in other sequences as well.
Proteins containing zf domains
How can we find a motif corresponding to a zf domain?
Domain review
- What is a domain?
- How are domains expressed?
  - Motifs (regular expressions & others)
  - Multiple alignments
  - Profiles
  - Profile HMMs
Prosite
http://us.expasy.org/prosite/
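A PROSITE pattern is essentially a regular expression written in its own notation. A minimal sketch that translates the common pattern elements (x, [..], {..}, repeat counts, < and > anchors) into Python `re` syntax; `prosite_to_regex` is a hypothetical helper that ignores rarer syntax, and the C2H2 zinc-finger pattern and toy sequence are used only as an illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate common PROSITE pattern elements to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # '<' anchors at the N-terminus
        suffix = "$" if element.endswith(">") else ""    # '>' anchors at the C-terminus
        element = element.strip("<>")
        # Separate the residue spec from an optional repeat count "(n)" or "(n,m)".
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # any residue except those listed
        # Single letters and [..] classes are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# A C2H2 zinc-finger pattern in PROSITE syntax, used here as an illustration.
zf_regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(zf_regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

toy_sequence = "MKCAACGKSLSRKDNLKRHQRTHTGE"  # made-up sequence containing one match
match = re.search(zf_regex, toy_sequence)
print(match.group() if match else "no match")
```

Regular expressions give a yes/no answer; the profiles and HMMs listed above instead score how well a region fits the domain.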
Protein Domain databases
- Motifs
  - PROSITE: regular expressions & profiles
  - BLOCKS: multiple alignments
  - Pfam: HMMs
PFAM
http://www.sanger.ac.uk/Software/Pfam/
How are Proteins Sequenced?
Mass Spec 101:
Nobel Citation, 2002
Mass Spectrometry
Sample Preparation
Enzymatic Digestion (Trypsin) + Fractionation
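Trypsin cuts on the C-terminal side of lysine (K) and arginine (R). A minimal in-silico digestion sketch, assuming the common simplified rule that the cut is blocked when the next residue is proline (P); missed cleavages and other exceptions are ignored, and `tryptic_digest` is a hypothetical helper.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Hypothetical helper: cut after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide that does not end in K/R
    return peptides


print(tryptic_digest("AKRPGKC"))  # ['AK', 'RPGK', 'C']
```

Each resulting peptide is what the mass spectrometer measures, which is why database search works on peptides rather than whole proteins.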
Single Stage MS
Mass Spectrometry (LC-MS): 1 MS spectrum / second
Tandem MS
Secondary Fragmentation
Ionized parent peptide
The peptide backbone
H-...-NH-CH(R[i-1])-CO-NH-CH(R[i])-CO-NH-CH(R[i+1])-CO-...-OH
(N-terminus on the left, C-terminus on the right; each NH-CH(R)-CO unit is one amino-acid residue: i-1, i, i+1.)
The peptide backbone breaks to form fragments with characteristic masses.
Ionization
H-...-NH-CH(R[i-1])-CO-NH-CH(R[i])-CO-NH-CH(R[i+1])-CO-...-OH + H+
Adding a proton (H+) gives the ionized parent peptide.
Fragment ion generation
H-...-NH-CH(R[i-1])-CO | NH-CH(R[i])-CO-NH-CH(R[i+1])-CO-...-OH
The backbone breaks at a peptide bond (marked |); the piece that keeps the proton (H+) is the ionized peptide fragment, with a characteristic mass.
Tandem MS for Peptide ID
Residue   b ion   y ion
S            88    1166
G           145    1080
F           292    1022
L           405     875
E           534     762
E           663     633
D           778     504
E           907     389
L          1020     260
K          1166     147
[Figure: MS/MS spectrum, % intensity vs. m/z (100 to 1000), with b-ion and y-ion peaks and the [M+2H]2+ precursor marked.]
Peak Assignment
[Figure: the same spectrum with peaks labeled y2 to y9, b3 to b9, and [M+2H]2+.]
Peak assignment implies sequence (residue tag) reconstruction!
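Consecutive b ions differ by exactly one residue mass, so an assigned b-ion ladder can be read off as a sequence. A minimal sketch using nominal (integer) residue masses and the b-ion values from the peak table; `read_b_ladder` and its tolerance are assumptions for illustration. Nominal masses cannot separate L from I (both 113) or K from Q (both 128).

```python
# Nominal (integer) residue masses in Da; isobaric pairs share one entry.
NOMINAL_RESIDUE_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "L/I", 114: "N", 115: "D", 128: "K/Q", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def read_b_ladder(b_ions: list[float], tolerance: float = 0.3) -> list[str]:
    """Hypothetical helper: infer residues from a sorted b-ion ladder starting at b1."""
    prefix_masses = [0.0] + [b - 1.0 for b in b_ions]  # b = P + 1, so P = b - 1
    residues = []
    for lighter, heavier in zip(prefix_masses, prefix_masses[1:]):
        delta = heavier - lighter
        nominal = round(delta)
        if abs(delta - nominal) > tolerance or nominal not in NOMINAL_RESIDUE_MASS:
            residues.append("?")  # gap: no single residue explains this difference
        else:
            residues.append(NOMINAL_RESIDUE_MASS[nominal])
    return residues


b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(read_b_ladder(b_ladder))
# ['S', 'G', 'F', 'L/I', 'E', 'E', 'D', 'E', 'L/I']
```

Real spectra rarely show a complete ladder, which is why the text speaks of reconstructing a residue tag rather than the whole sequence.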
Database Searching for peptide ID
- For every peptide from a database:
  - Generate a hypothetical spectrum
  - Compute a correlation between the observed (experimental) and hypothetical spectra
- Choose the best-scoring peptide
- Database searching is very powerful and is the de facto standard for MS.
  - Sequest, Mascot, and many others
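The search loop can be sketched in a few lines. This is a minimal illustration, not the Sequest or Mascot score: the "correlation" here is a shared-peak count (observed peaks within a tolerance of a predicted singly charged b or y ion), using the b = P+1 and y = S+19 offsets from the ion-type slide. The helper names, candidate peptides, and observed peaks are all made up for illustration.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def theoretical_spectrum(peptide: str) -> list[float]:
    """Hypothetical helper: singly charged b ions (prefix + 1) and y ions (suffix + 19)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        peaks.append(prefix + 1.0)           # b ion
        peaks.append(total - prefix + 19.0)  # complementary y ion
    return sorted(peaks)


def shared_peak_count(observed: list[float], predicted: list[float], tol: float = 0.5) -> int:
    """Number of observed peaks within tol of some predicted peak."""
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


def search(observed: list[float], database: list[str]) -> tuple[str, int]:
    """Score every candidate peptide and return the best one with its score."""
    scored = [(shared_peak_count(observed, theoretical_spectrum(pep)), pep) for pep in database]
    best_score, best_peptide = max(scored)
    return best_peptide, best_score


observed_peaks = [88.0, 147.1, 260.2, 292.1, 389.2, 405.2, 504.3, 534.3, 663.3, 875.4, 1022.5]
candidates = ["SGFLEEDELK", "SGFLEEDEKL", "GSFLEEDELK", "PEPTIDE"]
print(search(observed_peaks, candidates))  # ('SGFLEEDELK', 11)
```

Note how the near-miss candidates (two residues swapped) lose only one or two matched peaks; discriminating such cases is what more elaborate scores are designed for.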
Spectra: the real story
- Noise peaks
- Ions, not prefixes & suffixes
- Mass-to-charge ratio, and not mass
  - Multiply charged ions
- Isotope patterns, not single peaks
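Peptide fragmentation possibilities (ion types)
[Figure: cleavage sites along the backbone -NH-CH(R[i])-CO-NH-CH(R')-CO-NH-, giving N-terminal ions a(i), b(i), c(i) and C-terminal ions x(n-i), y(n-i), z(n-i); also labeled: d(i+1), v(n-i), w(n-i), y(n-i-1), b(i+1). Panels: low energy fragments, high energy fragments.]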
Ion types and offsets
- P = prefix residue mass
- S = suffix residue mass
- b-ions = P+1 (one added proton)
- y-ions = S+19 (water, 18, plus a proton)
- a-ions = P-27 (a b-ion that has lost CO, 28)
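The offsets translate directly into code. A minimal sketch using nominal (integer) residue masses for the peptide in the peak table, SGFLEEDELK (its residue column read N- to C-terminus); `fragment_ladders` is a hypothetical helper. The b ions match the table exactly; y9 comes out at 1079 rather than 1080 because nominal masses drop the fractional part of each residue mass.

```python
# Nominal (integer) residue masses in Da for the residues in this example.
NOMINAL_MASS = {"S": 87, "G": 57, "F": 147, "L": 113, "E": 129, "D": 115, "K": 128}


def fragment_ladders(peptide: str) -> dict[str, list[int]]:
    """Hypothetical helper: singly charged a, b and y ions for every backbone cleavage."""
    masses = [NOMINAL_MASS[aa] for aa in peptide]
    ladders = {"a": [], "b": [], "y": []}
    for i in range(1, len(peptide)):
        prefix = sum(masses[:i])  # P: residues 1..i
        suffix = sum(masses[i:])  # S: residues i+1..n
        ladders["a"].append(prefix - 27)
        ladders["b"].append(prefix + 1)
        ladders["y"].append(suffix + 19)
    ladders["y"].reverse()  # list y1, y2, ... in increasing mass
    return ladders


ions = fragment_ladders("SGFLEEDELK")
print(ions["b"])  # [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(ions["y"])  # [147, 260, 389, 504, 633, 762, 875, 1022, 1079]
```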
Mass-Charge ratio
- The X-axis is (M+Z)/Z
  - Z=1 implies that the peak is at M+1
  - Z=2 implies that the peak is at (M+2)/2
  - M=1000, Z=2: the peak position is at 501
- Suppose you see a peak at 501. Is the mass 500, or is it 1000?
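The peak position and the ambiguity in the question can be made concrete. A minimal sketch using the slide's approximation that each charge adds 1 Da (one proton); `peak_position` and `candidate_masses` are hypothetical helpers.

```python
def peak_position(mass: float, charge: int) -> float:
    """Hypothetical helper: m/z of a peptide of mass M carrying Z protons, (M + Z) / Z."""
    return (mass + charge) / charge


def candidate_masses(mz: float, max_charge: int = 3) -> dict[int, float]:
    """Hypothetical helper: invert (M + Z) / Z for each charge state, M = Z * mz - Z."""
    return {z: z * mz - z for z in range(1, max_charge + 1)}


print(peak_position(1000, 2))   # 501.0
print(candidate_masses(501.0))  # {1: 500.0, 2: 1000.0, 3: 1500.0}
```

A single peak at 501 is consistent with every charge state; one common way to tell Z is the spacing of the isotope peaks, which sit about 1/Z apart on the m/z axis.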
Isotopic peaks
- Ex: Consider peptide SAM
  - Mass = 308.12802 (monoisotopic, singly protonated)
  - You should see: a single peak at 308.13
  - Instead, you see: a cluster of peaks, labeled 308.13 through 310.13
Isotopes
- C-12 is the most common. Suppose C-13 occurs with probability 1%.
- EX: SAM
  - Composition: C11 H22 N3 O5 S1
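Two numbers on these slides can be checked with a short calculation: the monoisotopic mass from the elemental composition, and, under the slide's assumption that each carbon is C-13 with probability 1%, the chance that exactly k of the 11 carbons are C-13 (a binomial distribution). A minimal sketch; the element masses are standard monoisotopic values, the helper names are hypothetical, and all other isotopes (N-15, O-18, S-34, ...) are ignored.

```python
from math import comb

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915, "S": 31.972071}
C13_SHIFT = 1.003355  # mass difference between C-13 and C-12


def monoisotopic_mass(composition: dict[str, int]) -> float:
    """Hypothetical helper: sum of monoisotopic element masses."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def carbon_isotope_pattern(n_carbons: int, p13: float = 0.01, max_k: int = 3) -> list[float]:
    """Hypothetical helper: P(exactly k carbons are C-13), k = 0..max_k, atoms independent."""
    return [comb(n_carbons, k) * p13**k * (1 - p13) ** (n_carbons - k) for k in range(max_k + 1)]


sam = {"C": 11, "H": 22, "N": 3, "O": 5, "S": 1}
base = monoisotopic_mass(sam)
print(f"{base:.5f}")  # 308.12802
for k, prob in enumerate(carbon_isotope_pattern(sam["C"])):
    print(f"{base + k * C13_SHIFT:.2f}  {prob:.4f}")
```

With only 11 carbons the monoisotopic peak dominates; the C-13 peaks grow relative to it as the number of carbons increases, which is why isotope clusters become prominent for larger peptides.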