http://vision.unipv.it/ PROTEIN STRUCTURE ANALYSIS THROUGH HOUGH - - PowerPoint PPT Presentation

SLIDE 1

PROTEIN STRUCTURE ANALYSIS THROUGH HOUGH TRANSFORM AND RANGE TREE

VIRGINIO CANTONI, Dipartimento di Informatica e Sistemistica, Università di PAVIA, virginio.cantoni@unipv.it ELIO MATTIA, Center for Systems Chemistry, Rijksuniversiteit Groningen, Groningen, The Netherlands, E.Mattia@rug.nl

http://vision.unipv.it/

SLIDE 2

Overview

  • Searching in a database of protein structures
    – Pairwise comparison
    – All-to-all comparison
    – Search for a structural “motif”

SLIDE 3

General Hough transform approach to protein structure comparison

3C Vision: Cues, Contexts and Channels, Elsevier (April 2011)

  • V. Cantoni, S. Levialdi, B. Zavidovique (Università di Pavia, Roma, Université de Paris XI)

SLIDE 4

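Paul Hough, 1959: straight lines

Image plane: $y = m x + q$

Parameter space (slope-intercept form): $q = y_i - m x_i$, with $-\infty < m, q < +\infty$

Parameter space (normal form): $\rho = x \cos\theta + y \sin\theta$, with $0 < \rho < L\sqrt{2}$ and $-\pi \le \theta \le \pi$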
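The voting step for the normal form can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `hough_lines`, the 1-degree sampling of $\theta$ and the unit bin width for $\rho$ are all assumptions. It also samples $\theta$ in $[0, \pi)$ with a signed $\rho$, which is a common alternative to the ranges given above.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in (theta index, rho index) cells for a set of edge points."""
    acc = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta            # theta sampled in [0, pi)
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(k, round(rho / rho_step))] += 1     # one vote per (point, theta)
    return acc

# Ten collinear points on the horizontal line y = 5 (hypothetical test data).
points = [(x, 5) for x in range(0, 100, 10)]
acc = hough_lines(points)
best_cell, best_votes = acc.most_common(1)[0]
print(best_cell, best_votes)  # cell with the most votes and its count
```

Every point on the same line votes for the same $(\theta, \rho)$ cell, so the line shows up as a peak in the accumulator.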

SLIDE 5

Example

SLIDE 6

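Richard Duda and Peter Hart, 1972: circles

Image plane: a circle with center $(x_c, y_c)$ and radius $r$ satisfies $f((x, y), x_c, y_c, r) = (y - y_c)^2 + (x - x_c)^2 - r^2 = 0$.

Parameter space: with $m_i = dy_i/dx_i$ the slope of the tangent at the edge point $(x_i, y_i)$, the center lies on the normal through that point, $y_c = -\frac{1}{m_i} x_c + \left(y_i + \frac{x_i}{m_i}\right)$.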
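A minimal sketch of center voting for circles, under a simplifying assumption: the radius is known and no gradient information is used, so each edge point votes for the whole circle of candidate centers rather than for the normal line alone. The function name `hough_circle_centers` and the test data are hypothetical.

```python
import math
from collections import Counter

def hough_circle_centers(points, radius, n_angles=360):
    """Each edge point votes once for every grid cell that may hold the center."""
    acc = Counter()
    for x, y in points:
        cells = set()                                # avoid double votes per cell
        for k in range(n_angles):
            a = 2 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(a)),
                       round(y - radius * math.sin(a))))
        for cell in cells:
            acc[cell] += 1
    return acc

# Twelve points on a circle of radius 5 centered at (10, 20).
edge = [(10 + 5 * math.cos(math.radians(30 * j)),
         20 + 5 * math.sin(math.radians(30 * j))) for j in range(12)]
acc_c = hough_circle_centers(edge, radius=5)
center, votes_c = acc_c.most_common(1)[0]
print(center, votes_c)
```

When the tangent slope is available, the vote for each point can be restricted to the normal line, which reduces the number of spurious votes.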

SLIDE 7

Voting example: circle

SLIDE 8

Dana H. Ballard, 1981: Generalized HT

Mapping rule: $x_0 = x + r \cos\alpha$; $y_0 = y + r \sin\alpha$
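The mapping rule can be illustrated with a reduced form of the generalized Hough transform. The sketch below assumes a translation-only search and a reference table that is not indexed by gradient direction (the full method does index it that way); `build_r_table`, `ght_votes` and the small L-shaped model are hypothetical.

```python
import math
from collections import Counter

def build_r_table(model_points, ref):
    """Store (r, alpha) from each model point to the reference point."""
    table = []
    for x, y in model_points:
        dx, dy = ref[0] - x, ref[1] - y
        table.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return table

def ght_votes(image_points, r_table):
    """Apply x0 = x + r cos(alpha), y0 = y + r sin(alpha) for every table entry."""
    acc = Counter()
    for x, y in image_points:
        for r, alpha in r_table:
            acc[(round(x + r * math.cos(alpha)),
                 round(y + r * math.sin(alpha)))] += 1
    return acc

model = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2)]   # L-shaped contour
table = build_r_table(model, ref=(1, 1))
image = [(x + 7, y - 3) for x, y in model]                  # model shifted by (7, -3)
acc_g = ght_votes(image, table)
ref_found, votes_g = acc_g.most_common(1)[0]
print(ref_found, votes_g)
```

The peak marks where the reference point of the model falls in the image, i.e. the original reference point moved by the same shift as the shape.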

SLIDE 9

Voting example: key 1

SLIDE 10

Basics on Proteins

  • A protein is an ordered sequence of amino acids.
  • Building blocks: 20 amino acid residues.
  • Three-dimensional shapes (“folds”) vary enormously.

SLIDE 11

Levels of protein structure representation

  • Primary structure
  • Secondary structure
  • Tertiary structure
  • Quaternary structure

SLIDE 12

Primary structure: the sequence of amino acids

SLIDE 13

Secondary structures

Three basic components:

  • α-helix
  • β-sheet
  • Loops (linear connections between the components)

SLIDE 14

The α-helix

  • One of the most closely packed arrangements of residues.
  • ~40% of residues in globular proteins

SLIDE 15

The β-sheet

Loosely packed arrangement of residues.

Figure panels: antiparallel, parallel, twisted.

SLIDE 16

Secondary Structures Representation

  • Secondary structures are represented as linear vectors (segments): the axis for an alpha helix and the best-fit segment for a strand.
  • An alignment algorithm matches each helix segment to a helix with a known axis in order to determine the helix axis. Sheet strands are fitted directly with segments.

SLIDE 17

Secondary Structure Determination

  • Programs: DSSP and STRIDE.
  • On average, the two programs assigned 4.8% of the target residues differently, with this number reaching 12% for certain targets.

SLIDE 18

Distribution of segment length

SLIDE 19

Protein Structure Comparison

Given a motif, domain, or protein, which folds in the PDB are the most similar?

SLIDE 20

Secondary structure representation

  • Each segment is associated with a secondary structure and is displayed as a cylinder.
  • The protein is represented by an ordered sequence of cylinders, each carrying one of two labels: helix or strand.

SLIDE 21

GHT applied to proteins

  • For every protein, two quantities are calculated first for every secondary structure and stored in the GH reference table (RT): the distance $r$ of the secondary structure from a reference point (RP, e.g., the geometric center of the protein), and the angle $\theta$ between the direction of the secondary structure in 3D space and the segment linking the center of that secondary structure with the RP.
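A minimal sketch of how such a reference table could be built. Several choices here are assumptions rather than details from the slides: each secondary structure is given as a (start, end, label) segment, the RP is taken as the mean of the segment midpoints, and the angle is measured from the segment direction to the vector pointing from the midpoint to the RP.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reference_table(segments):
    """Return the reference point and one (label, r, theta) row per segment."""
    mids = [tuple((s + e) / 2 for s, e in zip(start, end))
            for start, end, _ in segments]
    rp = tuple(sum(c) / len(mids) for c in zip(*mids))   # assumed RP: mean midpoint
    rows = []
    for (start, end, label), mid in zip(segments, mids):
        axis, to_rp = sub(end, start), sub(rp, mid)
        r = math.sqrt(dot(to_rp, to_rp))
        if r == 0:
            theta = 0.0                                  # angle undefined at the RP itself
        else:
            cos_t = dot(axis, to_rp) / (math.sqrt(dot(axis, axis)) * r)
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
        rows.append((label, r, theta))
    return rp, rows

# Two hypothetical segments: one helix along x, one strand along y.
segments = [((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), "helix"),
            ((4.0, -1.0, 0.0), (4.0, 1.0, 0.0), "strand")]
rp, rt = reference_table(segments)
for row in rt:
    print(row)
```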

SLIDE 22

Following the GHT approach (simplified 2D representation)

Figure panels: query protein (scaled 0.5), mapping rule, votes space; helices and strands.

SLIDE 23

Following the GHT approach

Figure panels: query protein, mapping rule, votes space; helices and strands.

SLIDE 24

Proteins: the 3D solution

SLIDE 25

GH parameter spaces

Credits: Elio Mattia

SLIDE 26

GHT applied to proteins

  • In the 3D space of a given “object protein”, every secondary structure of a “model protein” casts votes along a circle of points, starting from every secondary structure of the object protein.
  • If the proteins are similar in shape, the circles all intersect at a single point.

SLIDE 27

Main characteristics

  • In 3D, the mapping rule for each compatible correspondence is a circle on a plane perpendicular to the axis of the secondary structure.
  • Other information can be exploited to increase the S/N ratio:
    – the length of the secondary structure
    – the properties of the residues contained in the SS
    – any other (biochemical, morphological, etc.) peculiarities
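The circular mapping rule can be sketched as follows, assuming that a model entry supplies a distance $r$ and an angle $\theta$ measured from the segment direction: the candidate reference points then lie on a circle centered on the axis at offset $r \cos\theta$, with radius $r \sin\theta$. The function name `voting_circle` and the number of votes per circle are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def voting_circle(center, axis, r, theta, n_votes=36):
    """Sample the points at distance r from `center` that make angle theta with `axis`."""
    d = unit(axis)
    # Any vector not parallel to d serves to build a basis of the perpendicular plane.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(d, helper))
    v = cross(d, u)
    c = tuple(center[i] + r * math.cos(theta) * d[i] for i in range(3))
    rad = r * math.sin(theta)
    votes = []
    for k in range(n_votes):
        phi = 2 * math.pi * k / n_votes
        votes.append(tuple(c[i] + rad * (math.cos(phi) * u[i] + math.sin(phi) * v[i])
                           for i in range(3)))
    return votes

# A strand centered at (4, 0, 0) along y, with r = 2 and theta = 90 degrees.
circle = voting_circle(center=(4.0, 0.0, 0.0), axis=(0.0, 2.0, 0.0),
                       r=2.0, theta=math.pi / 2)
print(len(circle), circle[9])
```

The `n_votes` argument plays the role of the mesh of the voting circle: more samples give a denser circle at a higher cost.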

SLIDE 28

The implementation

  • The voting space is smoothed by accumulating, for each point, the nearby votes (within a given radius).
  • After smoothing, the highest peaks in the voting space are detected. High votes that lie close to a peak without being its top are not picked.
  • Only the relevant votes are stored in memory: there is no matrix with all the possible cells.
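A minimal sketch of sparse vote storage with peak detection, assuming the smoothed votes are kept in a dictionary keyed by cell coordinates and that peaks are picked greedily from the highest vote down. The name `find_peaks` and the sample values are hypothetical; the actual detection rule used by the authors may differ.

```python
import math

def find_peaks(votes, tolerance, max_peaks=5):
    """votes: dict mapping (x, y, z) cells to smoothed vote counts (sparse storage)."""
    peaks = []
    for cell, score in sorted(votes.items(), key=lambda kv: kv[1], reverse=True):
        # Skip high votes that sit within `tolerance` of an already accepted peak.
        if all(math.dist(cell, p) > tolerance for p, _ in peaks):
            peaks.append((cell, score))
            if len(peaks) == max_peaks:
                break
    return peaks

smoothed_votes = {(0, 0, 0): 10, (1, 0, 0): 9, (8, 8, 8): 7, (8, 9, 8): 3}
peaks = find_peaks(smoothed_votes, tolerance=2.0)
print(peaks)
```

Only cells that received votes appear in the dictionary, so memory grows with the number of votes rather than with the volume of the voting space.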

SLIDE 29

Smoothing Algorithm

  • Smoothing is performed by accumulating votes within a given radius, for every point in the vote space.
  • The classic version, i.e., checking every vote for the vicinity condition, proved too time-consuming for applications, with a time complexity of $O(n^2)$, where $n$ is the number of votes in the vote space.
  • The smoothing problem can be seen as an “orthogonal search” problem, i.e., finding the points within a given cube in space.
  • A dedicated data structure, the range tree, has been implemented to solve this problem with $O(n \log^3 n)$ complexity.
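A compact sketch of range-tree-based smoothing. It is an illustrative textbook-style layout, not the authors' implementation: each node on one coordinate holds an associated structure on the next coordinate, and the last coordinate is a sorted array searched by bisection. The names `build`, `query` and `smooth`, and the exact-distance filter applied after the cube query, are assumptions.

```python
import bisect
import math

def build(points, dim=0):
    """Build the tree level for coordinate `dim` from a non-empty list of points."""
    pts = sorted(points, key=lambda p: p[dim])
    return _build(pts, dim, len(pts[0]))

def _build(pts, dim, ndim):
    if dim == ndim - 1:          # last coordinate: sorted array is enough
        return {"keys": [p[dim] for p in pts], "pts": pts}
    node = {"lo": pts[0][dim], "hi": pts[-1][dim],
            "assoc": build(pts, dim + 1), "left": None, "right": None}
    if len(pts) > 1:
        mid = len(pts) // 2
        node["left"] = _build(pts[:mid], dim, ndim)
        node["right"] = _build(pts[mid:], dim, ndim)
    return node

def query(node, box, dim=0):
    """Return the points inside the axis-aligned box [(lo, hi), ...]."""
    lo, hi = box[dim]
    if "keys" in node:
        i = bisect.bisect_left(node["keys"], lo)
        j = bisect.bisect_right(node["keys"], hi)
        return node["pts"][i:j]
    if node["hi"] < lo or node["lo"] > hi:
        return []                                    # disjoint from the range
    if lo <= node["lo"] and node["hi"] <= hi:
        return query(node["assoc"], box, dim + 1)    # fully inside: next coordinate
    return query(node["left"], box, dim) + query(node["right"], box, dim)

def smooth(votes, radius):
    """Sum, for every voted cell, the votes found within `radius` of it."""
    tree = build(list(votes))
    out = {}
    for cell in votes:
        box = [(c - radius, c + radius) for c in cell]
        out[cell] = sum(votes[p] for p in query(tree, box)
                        if math.dist(p, cell) <= radius)
    return out

# Deterministic hypothetical votes on a small 3D grid.
votes = {((i * 7) % 11, (i * 5) % 13, (i * 3) % 7): i % 4 + 1 for i in range(60)}
smoothed = smooth(votes, radius=2.0)
```

The cube query returns a superset of the sphere of the given radius, so the distance check only touches the few candidates found by the tree instead of all $n$ votes.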

SLIDE 30

Orthogonal range tree

Figure labels: X-range tree, Y-range tree.

SLIDE 31

Orthogonal range tree

SLIDE 32

The implementation

  • The comparison of ONE (1) object protein with MANY (N) model proteins is accomplished by sorting the votes of the top peaks in the spaces of each of the (N) model proteins.
  • The sorting is carried out in TWO ways: either the smoothed votes themselves are sorted, or the differences between the two highest peaks in each of the (N) voting spaces are sorted.
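The two sorting criteria can be sketched as below. The function name `rank_models`, the model identifiers and the vote values are hypothetical, and the handling of a voting space with a single peak (gap taken against zero) is an assumption.

```python
def rank_models(top_peaks, by="height"):
    """top_peaks: dict mapping a model id to its smoothed peak votes, in any order."""
    def score(peaks):
        s = sorted(peaks, reverse=True)
        if by == "height":
            return s[0]                               # highest smoothed vote
        return s[0] - (s[1] if len(s) > 1 else 0)     # gap between the two highest peaks
    return sorted(top_peaks, key=lambda m: score(top_peaks[m]), reverse=True)

top_peaks = {"A": [30, 28], "B": [25, 5], "C": [12, 11]}
by_height = rank_models(top_peaks, by="height")
by_gap = rank_models(top_peaks, by="gap")
print(by_height, by_gap)
```

The two criteria can disagree: a model with a slightly lower but isolated peak may rank above one whose top peak has a close competitor.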

SLIDE 33

First results

SLIDE 34

Testing on Motif Retrieval

  • The developed algorithm makes a new approach to protein structural comparison available.
  • The main application of this new approach is to classify protein structures and to retrieve structural motifs that are characteristic of a given protein function.
  • Indeed, tests were performed on motif retrieval.
  • As an example, a motif (present in the Ubiquitin Conjugating Enzyme) was found in other proteins which are known to contain it.
  • Further testing will be done with the parallel implementation of the software.

SLIDE 35

Room for much experimentation

  • Computationally, the results might vary substantially if any of the following parameters are varied:
    – The mesh of the voting space (in ångströms)
    – The mesh of the voting circle (how many votes on each circle)
    – The radius of smoothing
    – The radius of tolerance for avoiding “false peaks” when detecting peaks
    – The normalization factor (linear, square root, etc.)