Protein Sequence Analysis



SLIDE 1

Protein Sequence Analysis

SLIDE 2

Protein sequence motifs

  • Premise: the sequence of a protein gives clues about its structure and function.
  • In the 1980s, scientists looked directly for clusters of residues that were indicative of function.

SLIDE 3

PROSITE

In some cases the sequence of an unknown protein is too distantly related to any protein of known structure to detect its resemblance by overall sequence alignment. However, relationships can be revealed by the occurrence in its sequence of a particular cluster of residue types, which is variously known as a pattern, motif, signature or fingerprint. These motifs arise because specific region(s) of a protein which may be important, for example, for their binding properties or for their enzymatic activity are conserved in both structure and sequence. These structural requirements impose very tight constraints on the evolution of this small but important portion(s) of a protein sequence. The use of protein sequence patterns or profiles to determine the function of proteins is becoming very rapidly one of the essential tools of sequence analysis. Many authors (3,4) have recognized this reality. Based on these observations, we decided in 1988 to actively pursue the development of a database of regular expression-like patterns, which would be used to search against sequences of unknown function.

Kay Hofmann, Philipp Bucher, Laurent Falquet and Amos Bairoch

The PROSITE database, its status in 1999

SLIDE 4

Basic idea

  • It is a heuristic approach. Start with the following:
    • A collection of sequences with the same function.
    • Regions/residues known to be significant for maintaining structure and function.
  • Develop a pattern of conserved residues around the residues of interest.
  • Iterate for appropriate sensitivity and specificity.
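One way to make "appropriate sensitivity and specificity" concrete is to count how a candidate motif behaves on sequences whose family membership is already known. The following is a minimal Python sketch, assuming such labels are available; `evaluate_pattern` is a hypothetical helper, and the motif is written as a Python regular expression rather than in PROSITE syntax.

```python
import re
from typing import Iterable, Tuple

def evaluate_pattern(regex: str,
                     labeled: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a motif on labeled sequences.

    `labeled` holds (sequence, is_family_member) pairs whose labels are
    assumed to be known in advance.
    """
    tp = fn = tn = fp = 0
    for seq, is_member in labeled:
        hit = re.search(regex, seq) is not None
        if is_member:
            if hit:
                tp += 1   # true positive: family member found
            else:
                fn += 1   # false negative: family member missed
        elif hit:
            fp += 1       # false positive: unrelated sequence matched
        else:
            tn += 1       # true negative: unrelated sequence rejected
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Toy data: the three members come from the slide 5 alignment example;
# the two "non-members" are made-up sequences for illustration only.
data = [
    ("ALRDFATHDDF", True),
    ("SMTAEATHDSI", True),
    ("ECDQAATHEAS", True),
    ("MKVLAATHKLL", False),
    ("GGSATHEGG", False),
]
print(evaluate_pattern("ATH[DE]", data))  # (1.0, 0.5)
```

Each refinement round would tighten the pattern to remove false positives (raising specificity) while checking that no true members are lost (keeping sensitivity).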

SLIDE 5

From alignment to regular expressions

ALRDFATHDDF
SMTAEATHDSI
ECDQAATHEAS
Pattern from the conserved block: A-T-H-[DE]

  • Search Swiss-Prot with the resulting pattern
  • Refine pattern to eliminate false positives
  • Iterate
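The step from alignment to pattern can be sketched column by column. This is a minimal Python sketch under our own assumptions: a column with a single residue becomes that residue, a column with at most `max_choices` distinct residues becomes a bracketed set, anything else becomes `x`, and unconserved columns at either end are trimmed. The helper names and the threshold are hypothetical, not the procedure PROSITE curators actually followed.

```python
from typing import List

def column_element(column: str, max_choices: int = 2) -> str:
    """Summarize one alignment column as a PROSITE-style element.

    max_choices is a hypothetical threshold: columns with more distinct
    residues than this are treated as unconserved ('x').
    """
    residues = sorted(set(column) - {"-"})   # ignore gap characters
    if len(residues) == 1:
        return residues[0]
    if 1 < len(residues) <= max_choices:
        return "[" + "".join(residues) + "]"
    return "x"

def alignment_to_pattern(aligned: List[str], max_choices: int = 2) -> str:
    """Turn equal-length aligned sequences into a PROSITE-style pattern."""
    columns = ["".join(col) for col in zip(*aligned)]
    elements = [column_element(c, max_choices) for c in columns]
    # Unconserved columns at either end carry no information; trim them
    while elements and elements[0] == "x":
        elements.pop(0)
    while elements and elements[-1] == "x":
        elements.pop()
    # Collapse interior runs of 'x' into x(n), as PROSITE writes them
    out: List[str] = []
    run = 0
    for e in elements:
        if e == "x":
            run += 1
            continue
        if run:
            out.append("x" if run == 1 else f"x({run})")
            run = 0
        out.append(e)
    return "-".join(out)

print(alignment_to_pattern(["ALRDFATHDDF", "SMTAEATHDSI", "ECDQAATHEAS"]))
# A-T-H-[DE]
```

Changing `max_choices` changes which columns count as conserved, and therefore shifts the sensitivity/specificity trade-off that the refine-and-iterate loop tunes.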
SLIDE 6

Zinc finger domain

SLIDE 7

Proteins containing zinc finger (zf) domains

How can we find a motif corresponding to a zf domain?

SLIDE 8

The sequence analysis perspective

  • Zinc finger motif
    • C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
    • Two conserved C and two conserved H
  • How can we search a database using these motifs?
    • The motif is described using a regular expression. What is a regular expression?
    • How can we search for a match to a regular expression? Not allowed to use Perl :-)
  • The 'regular expression' motif is weak. How can we make it stronger?
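Taking up the "no Perl" challenge, a PROSITE motif such as the zinc finger pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H can be matched without any regular-expression engine. The following is a minimal Python sketch that supports only a subset of PROSITE syntax (single residues, [..] sets, x, and (n) or (n,m) repeat counts; no {..} exclusions or < > anchors). The names `parse_prosite`, `match_at`, `find_matches` and `prosite_to_regex` are hypothetical, and the test sequence is invented. The last lines cross-check the hand-written matcher against Python's `re` module.

```python
import re
from typing import FrozenSet, List, Optional, Tuple

# One element: allowed residues (None means 'x', any residue), min and max repeats
Element = Tuple[Optional[FrozenSet[str]], int, int]

def parse_prosite(pattern: str) -> List[Element]:
    """Parse a simplified PROSITE pattern, e.g. 'C-x(2,4)-C-[LIVMFYWC]'."""
    elements: List[Element] = []
    for token in pattern.split("-"):
        if "(" in token:                       # repeat count: x(3) or x(2,4)
            core, counts = token.rstrip(")").split("(")
            bounds = counts.split(",")
            lo, hi = int(bounds[0]), int(bounds[-1])
        else:
            core, lo, hi = token, 1, 1
        if core == "x":
            allowed = None
        elif core.startswith("["):
            allowed = frozenset(core[1:-1])
        else:
            allowed = frozenset(core)
        elements.append((allowed, lo, hi))
    return elements

def match_at(elements: List[Element], seq: str, pos: int, i: int = 0) -> bool:
    """Backtracking: can elements[i:] match seq starting at pos?"""
    if i == len(elements):
        return True
    allowed, lo, hi = elements[i]
    for n in range(lo, hi + 1):                # try each allowed repeat count
        end = pos + n
        if end > len(seq):
            break
        if allowed is not None and any(c not in allowed for c in seq[pos:end]):
            break                              # longer runs contain the same bad residue
        if match_at(elements, seq, end, i + 1):
            return True
    return False

def find_matches(elements: List[Element], seq: str) -> List[int]:
    """All start positions where the motif matches (overlaps allowed)."""
    return [p for p in range(len(seq)) if match_at(elements, seq, p)]

def prosite_to_regex(elements: List[Element]) -> str:
    """The same motif as a Python regular expression, for cross-checking."""
    parts = []
    for allowed, lo, hi in elements:
        atom = "." if allowed is None else "[" + "".join(sorted(allowed)) + "]"
        if (lo, hi) != (1, 1):
            atom += f"{{{lo},{hi}}}"
        parts.append(atom)
    return "".join(parts)

ZF = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
toy = "MKT" + "CAACAAALAAAAAAAAHAAAH" + "GG"   # made-up sequence, not a real protein
motif = parse_prosite(ZF)
print(find_matches(motif, toy))                          # [3]
print(re.search(prosite_to_regex(motif), toy).start())   # 3
```

Plain backtracking can revisit the same (position, element) pair many times; for long patterns with many variable-length gaps, memoizing `match_at` on `(pos, i)` keeps the cost polynomial.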

SLIDE 9

SLIDE 10

Profiles

SLIDE 11

Scoring Profiles

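$$S(i,j) = \sum_{k} f_{ik}\, M[k,j]$$

where $f_{ik}$ is the frequency of residue $k$ at position $i$ of the alignment and $M[k,j]$ is the substitution-matrix score between residues $k$ and $j$.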
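The profile score $S(i,j) = \sum_k f_{ik}\, M[k,j]$ can be sketched directly: count column frequencies $f_{ik}$ from an alignment, combine them with a substitution matrix $M$, then slide the resulting position-specific score table along a query sequence. This is a minimal Python sketch with several assumptions: the match=2 / mismatch=-1 matrix is a hypothetical placeholder (a real profile would use a matrix such as PAM or BLOSUM), gaps are ignored when counting frequencies, and the alignment block and query are toy data.

```python
from typing import Dict, List, Optional, Tuple

AMINO = "ACDEFGHIKLMNPQRSTVWY"
Matrix = Dict[str, Dict[str, float]]

def toy_matrix(match: float = 2.0, mismatch: float = -1.0) -> Matrix:
    """Placeholder substitution matrix M[k][j]; stands in for PAM/BLOSUM."""
    return {k: {j: (match if k == j else mismatch) for j in AMINO} for k in AMINO}

def build_profile(aligned: List[str], M: Matrix) -> List[Dict[str, float]]:
    """Return S, where S[i][j] = sum over k of f_ik * M[k][j]."""
    S = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]   # gaps ignored (assumption)
        if not residues:
            S.append({j: 0.0 for j in AMINO})
            continue
        # f_ik: fraction of aligned sequences with residue k in column i
        f = {k: residues.count(k) / len(residues) for k in set(residues)}
        S.append({j: sum(fk * M[k][j] for k, fk in f.items()) for j in AMINO})
    return S

def best_window(S: List[Dict[str, float]], seq: str) -> Optional[Tuple[float, int]]:
    """Slide the profile along seq; return (best score, start) or None."""
    best = None
    for start in range(len(seq) - len(S) + 1):
        score = sum(S[i][seq[start + i]] for i in range(len(S)))
        if best is None or score > best[0]:
            best = (score, start)
    return best

S = build_profile(["ATHD", "ATHD", "ATHE"], toy_matrix())
print(best_window(S, "GGATHEGG"))   # best window starts at index 2 ("ATHE")
```

Unlike the yes/no answer of a regular expression, every window receives a graded score, and a residue never observed in a column is penalized according to the substitution matrix rather than rejected outright.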