SLIDE 1

Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique

LF-2004.08

Protein functions prediction

  • Introduction
  • Signal peptides
  • Transmembrane regions and topology
  • PTM (post-translational modifications)
  • Low complexity and biased regions
  • Repeats
  • Coils
  • Secondary structure
  • Antigenic peptides
  • Domain/Motifs
  • Tools
  • The EMBOSS package

SLIDE 2

Different techniques

Algorithms

  • Sliding window, Nearest Neighbor
  • Patterns, regular expression
  • Weight matrices
  • HMM, profiles
  • Neural Networks
  • Rules

Sliding window

THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW

Score1, Score2, ..., Scoren

Width (or size) = 11, step = 5
Results are usually displayed as a graph.
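The sliding-window idea above can be sketched in a few lines of Python. This is only an illustration: the function name `sliding_window_scores` is hypothetical, and the Kyte-Doolittle hydropathy scale is an assumed choice of per-residue score (the slide does not name a scale). Each window of width 11 is scored by its mean value, and the window then moves forward by 5 residues.

```python
from typing import Dict, List, Optional, Tuple

# Kyte-Doolittle hydropathy values (an assumed scale; any per-residue scale works).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_scores(
    seq: str,
    width: int = 11,
    step: int = 5,
    scale: Optional[Dict[str, float]] = None,
    default: float = 0.0,
) -> List[Tuple[int, float]]:
    """Return (1-based window start, mean score) for each full window."""
    if scale is None:
        scale = KYTE_DOOLITTLE
    scores = []
    # Only full-width windows are scored; a trailing partial window is skipped.
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        mean = sum(scale.get(res, default) for res in window) / width
        scores.append((start + 1, mean))
    return scores


# The test sequence from the slide, with width 11 and step 5.
for position, score in sliding_window_scores(
    "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
):
    print(position, round(score, 2))
```

Plotting the scores against the window positions gives the kind of graph the slide refers to.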

SLIDE 3

Patterns / regular expression

Pattern: <A-x-[ST](2)-x(0,1)-{V}
Regexp: ^A.[ST]{2}.?[^V]
Text: The sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine, two times, followed by any amino acid or nothing, followed by any amino acid except a valine.

Only the syntax differs…
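To make the correspondence between the two syntaxes concrete, here is a minimal Python sketch that translates a PROSITE-style pattern into a regular expression. The function name `prosite_to_regex` is hypothetical, and only the elements used on the slide are handled (`<`, `>`, `x`, `[..]`, `{..}`, and `(n)` or `(n,m)` repeats); it is not a full implementation of the PROSITE syntax.

```python
import re

# One pattern element: a residue spec followed by an optional repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into an equivalent regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = match.groups()
        if residue == "x":               # any amino acid
            core = "."
        elif residue.startswith("{"):    # any amino acid except those listed
            core = "[^" + residue[1:-1] + "]"
        else:                            # single residue or [..] alternative
            core = residue
        if repeat:
            core += "{" + repeat + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
print(regex)
print(bool(re.search(regex, "AGSTKL")), bool(re.search(regex, "AGSTV")))
```

The sketch emits `.{0,1}` where the slide writes `.?`; the two forms are equivalent.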

Weight matrices (PSSM)
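As a rough illustration of how a weight matrix can be built and used, the following Python sketch derives log-odds scores from a small gap-free alignment and scans a sequence with them. Everything here is an assumption made for the example: the function names, the uniform background frequency, the pseudocount of 1, and the toy alignment. Real PSSM tools use more careful weighting and background models.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build one log-odds column per alignment position (uniform background)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        column = [s[col] for s in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            res: math.log2(((column.count(res) + pseudocount) / total) / background)
            for res in AMINO_ACIDS
        })
    return pssm


def score_window(pssm: List[Dict[str, float]], window: str) -> float:
    """Sum the column scores; unknown residues get the column's lowest score."""
    return sum(col.get(res, min(col.values())) for col, res in zip(pssm, window))


def best_hit(pssm: List[Dict[str, float]], seq: str) -> Tuple[float, int]:
    """Return (best score, 0-based start) over all full-width windows."""
    width = len(pssm)
    return max((score_window(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


toy_pssm = build_pssm(["ACD", "ACE", "ACD"])  # toy alignment, not real data
print(best_hit(toy_pssm, "GGGACDGGG"))
```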

SLIDE 4

HMM / profiles

Neural Networks

General principle: Example:

SLIDE 5

Signals found in proteins

N-ter
  • exportation - secretion
  • mitochondria
  • chloroplast

internal
  • NLS (nuclear localization signal)

C-ter
  • GPI-anchor (Glycosyl Phosphatidyl Inositol)
  • Other membrane anchors (see PTM)
  • Other unknown?

Signals detection tools

  • SignalP
  • MitoProt
  • ChloroP
  • Predotar
  • PSort
  • TargetP
  • Sigcleave (EMBOSS)
  • Phobius
  • Big-PI
  • DGPI

SLIDE 6

Transmembrane regions

  • Detection (signal peptide, hydropathy, helices)
  • Organisation (topology)

Transmembrane detection tools

  • TMHMM
  • TMPred
  • TopPred2
  • DAS
  • HMMTop
  • Tmap (EMBOSS)
  • Mixture of tools: Phobius, ConPred II

SLIDE 7

Post translational modifications

  • Phosphorylation: S, T, Y
  • N-glycosylation: N
  • O-glycosylation: S, T, (HO)K
  • Acetylation, methylation: D, E, K
  • Sulfation: Y
  • Farnesylation, myristylation, palmitoylation, geranylgeranylation, GPI-anchor: C, N-ter, C-ter
  • Ubiquitination and family: K, N-ter
  • Inteins (protein splicing)
  • Pre-translational: selenoprotein (C)

PTM detection

Pattern prediction (PROSITE)
  • Short or weak signal
  • Frequent hit producer

Best method is experimental
  • MS/MS detection

Most methods use « rules » joining pattern detection and knowledge to predict sites.
  • NetOGlyc - Prediction of type O-glycosylation sites in mammalian proteins
  • DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium
  • YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences
  • NetPhos - Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
  • NMT - Prediction of N-terminal N-myristoylation
  • Sulfinator - Prediction of tyrosine sulfation sites
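The "short or weak signal, frequent hit producer" caveat can be illustrated with a small Python sketch that scans for the N-glycosylation consensus N-{P}-[ST]-{P} (to my knowledge the PROSITE N-glycosylation pattern). The function name and the toy sequence are hypothetical. Because the motif is only four residues long, many matches in real proteins will not be true modification sites, which is why the dedicated predictors listed above add further rules.

```python
import re
from typing import List

# N-{P}-[ST]-{P} written as a regular expression; the lookahead lets
# overlapping candidate sites be reported as well.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def n_glycosylation_candidates(seq: str) -> List[int]:
    """Return 1-based positions of asparagines matching the consensus."""
    return [m.start() + 1 for m in N_GLYC.finditer(seq.upper())]


# Toy sequence (not a real protein): only the first asparagine qualifies,
# the other two are excluded by a proline in a forbidden position.
print(n_glycosylation_candidates("MNGTANPSNATP"))
```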

SLIDE 8

Low complexity regions

  • repeats
  • compositional bias
  • PEST

Low complexity / Repeats

  • DUST (DNA) / SEG: de novo detection
  • RepeatMasker (DNA): search collection
  • REP: search collection
  • REPRO, Radar: de novo detection
  • PEST, PESTFind: de novo detection
  • EMBOSS (DNA): einverted, equicktandem, etandem, palindrome
  • EMBOSS (protein): oddcomp
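
De novo detection of compositionally biased regions can be approximated by measuring the Shannon entropy of each window. The Python sketch below is a simplified stand-in and not the algorithm used by SEG or DUST; the function names, the window width of 12 and the threshold are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_starts(seq: str, width: int = 12, threshold: float = 2.2) -> List[int]:
    """Return 1-based starts of windows whose entropy is at or below the threshold."""
    return [
        i + 1
        for i in range(len(seq) - width + 1)
        if shannon_entropy(seq[i:i + width]) <= threshold
    ]


# Toy sequence (not a real protein) ending in a poly-glutamine stretch.
print(low_complexity_starts("MKTAYIAKQRQQQQQQQQQQQQ"))
```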

SLIDE 9

Coils

Helix of helix

coiled-coil

Leu-zipper

Coils detection

  • COILS: weight matrices
  • Paircoil, Multicoil: pairwise correlation
  • Marcoil: HMM
  • Pepcoil (EMBOSS): weight matrices

SLIDE 10

Secondary structure

Structure to predict
  • Alpha-helices
  • Beta-sheets
  • Turns
  • Random coil

Prediction tools
  • Garnier (EMBOSS)
  • PHD
  • DSC
  • PREDATOR
  • NNSSP
  • Jpred
  • Jnet
  • Many others

Antigenic peptide

Peptides binding to MHC
  • class I: 8, 9, 10 mers
  • class II: 15 mers (3+9+3)

Depend highly on MHC type
Use of experimental knowledge

Databases of known peptides
  • SYFPEITHI
  • HLA_Bind (BIMAS)
  • MAPPP combined expert
  • Antigenic (EMBOSS)
  • Many more

Prediction of proteasome cleavage sites
  • NetChop
  • PaProc
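
The peptide lengths quoted above define the candidates a predictor has to score. The Python sketch below simply enumerates them: all 8-, 9- and 10-mers for class I, and 15-mers split into 3 + 9 + 3 (flank, core, flank) for class II. The function names and the toy sequence are hypothetical, and no binding prediction is attempted; that step depends on the MHC type and on experimental data such as the databases listed above.

```python
from typing import Iterator, Tuple


def class1_candidates(seq: str, lengths=(8, 9, 10)) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, peptide) for every peptide of the given lengths."""
    for k in lengths:
        for i in range(len(seq) - k + 1):
            yield i + 1, seq[i:i + k]


def class2_candidates(seq: str, core: int = 9, flank: int = 3) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (1-based start, N-flank, core, C-flank) for every 15-mer."""
    total = core + 2 * flank
    for i in range(len(seq) - total + 1):
        peptide = seq[i:i + total]
        yield i + 1, peptide[:flank], peptide[flank:flank + core], peptide[flank + core:]


toy = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence, one of each amino acid
print(sum(1 for _ in class1_candidates(toy)))
print(next(class2_candidates(toy)))
```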

SLIDE 11

Domain / Motif

All the protein domain descriptors
  • PROSITE
  • PFAM
  • SMART
  • PRODOM
  • BLOCKS
  • PRINTS
  • TIGRfam
  • …

Federation: InterPro

Many techniques
  • Patterns, Regexp
  • PSSM (PSI-BLAST)
  • Profiles
  • HMM

Other Tools

You can find some of them on our servers

www.ch.embnet.org

Or on ExPASy server

www.expasy.org/tools

Or ask Google!!

www.google.com

SLIDE 12

European Molecular Biology Open Software Suite

How to use EMBOSS/Jemboss at SIB

SLIDE 13

  • Free Open Source (for most Unix platforms)
  • GCG successor (compatible with GCG file format)
  • More than 150 programs (ver. 2.9.0)
  • Easy to install locally, but no interface, requires local databases
  • Unix command-line only

Interfaces
  • Jemboss, www2gcg, w2h, wemboss… (with account)
  • Pise, EMBOSS-GUI, SRSWWW (no account)
  • Staden, Kaptain, CoLiMate, Jemboss (local)

Access: www.emboss.org or emboss.sourceforge.net

Some details

  • Format USA
  • 'asis'::Sequence[start:end:reverse]
  • Format::'@'ListFile[start:end:reverse]
  • Format::'list':ListFile[start:end:reverse]
  • Format::Database:Entry[start:end:reverse]
  • Format::Database-SearchField:Word[start:end:reverse]
  • Format::File:Entry[start:end:reverse]
  • Format::File:SearchField:Word[start:end:reverse]
  • Format::Program Program-parameters '|'[start:end:reverse]
  • Example: fasta::Swissprot:UBP5_HUMAN[200:300]
  • Databases
  • Any can be added, use showdb to display the available databases
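
To show how the pieces of a USA fit together, here is a small Python sketch that splits the Format::Database:Entry[start:end:reverse] form into its components. It is an illustration only, not EMBOSS's own parser: the function name `parse_usa` and the returned dictionary keys are hypothetical, only this one form is handled (no list files, search fields or program pipes), and the reverse flag is assumed to be written as `r`.

```python
import re
from typing import Dict, Optional, Union

# Optional "format::", a database or file name, an optional ":entry",
# and an optional "[start:end]" or "[start:end:r]" range.
_USA = re.compile(
    r"^(?:(?P<fmt>\w+)::)?"
    r"(?P<source>[^:\[\]]+)"
    r"(?::(?P<entry>[^:\[\]]+))?"
    r"(?:\[(?P<start>\d*):(?P<end>\d*)(?::(?P<rev>r))?\])?$"
)


def _to_int(text: Optional[str]) -> Optional[int]:
    return int(text) if text else None


def parse_usa(usa: str) -> Dict[str, Union[str, int, bool, None]]:
    """Split a simple USA into format, source, entry, start, end and reverse."""
    match = _USA.match(usa.strip())
    if match is None:
        raise ValueError(f"not a supported USA form: {usa!r}")
    return {
        "format": match.group("fmt"),
        "source": match.group("source"),
        "entry": match.group("entry"),
        "start": _to_int(match.group("start")),
        "end": _to_int(match.group("end")),
        "reverse": match.group("rev") is not None,
    }


print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```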

SLIDE 14

Databases

  • showdb

Displays information on the currently available databases

# Name          Type ID  Qry All Comment
# ====          ==== ==  === === =======
ipr_fetch       P    OK  OK  OK  InterPro current by fetch
ipi_fetch       P    OK  OK  OK  IPI current by fetch
refseq_fetch    P    OK  OK  OK  refseq current by fetch
repbase_fetch   P    OK  OK  OK  repbase current by fetch
swiss_fetch     P    OK  OK  OK  SwissProt current by fetch
swissprot       P    OK  OK  OK  SWISSPROT sequences
trembl          P    OK  OK  OK  TREMBL sequences
trembl_fetch    P    OK  OK  OK  trembl current by fetch
tremblnew       P    OK  OK  OK  TREMBL New sequences
ug_fetch        P    OK  OK  OK  Unigene by fetch
embl            N    OK  OK  OK  EMBL release
emhum           N    OK  OK  OK  EMBL release, Human section by emboss index
emrod           N    OK  OK  OK  EMBL release, Rodent section by emboss index
emvrt           N    OK  OK  OK  EMBL release, Vertebrate (nonhuman, nonrodent)

  • seqret (seqretall, seqretset, seqretsplit)
  • entret (for complete untouched entry, e.g., for unigene, interpro, swissprot…)
  • Possible to define your own « .embossrc » file

Some tools for DNA
  • redata: Search REBASE for enzyme name, references, suppliers etc.
  • remap: Display a sequence with restriction cut sites, translation etc.
  • restover: Finds restriction enzymes that produce a specific overhang
  • restrict: Finds restriction enzyme cleavage sites
  • showseq: Display a sequence with features, translation etc.
  • silent: Silent mutation restriction enzyme scan
  • cirdna: Draws circular maps of DNA constructs
  • lindna: Draws linear maps of DNA constructs
  • revseq: Reverse and complement a sequence
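
What revseq and restrict do can be mimicked, very roughly, in a few lines of Python. This sketch is not the EMBOSS implementation: the function names are hypothetical, only unambiguous A/C/G/T bases are handled, and recognition sites are matched as plain strings without modelling cut positions. The test sequence is the first 60 bases of the ECLAC entry used in the remap example, and TCGA and GCGC are the recognition sequences of TaqI and HhaI.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> List[int]:
    """Return 1-based start positions of a recognition site (overlaps allowed)."""
    seq, site = seq.upper(), site.upper()
    positions = []
    index = seq.find(site)
    while index != -1:
        positions.append(index + 1)
        index = seq.find(site, index + 1)
    return positions


eclac_60 = "GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT"
print(reverse_complement(eclac_60[:10]))
print(find_sites(eclac_60, "TCGA"), find_sites(eclac_60, "GCGC"))
```

The site counts agree with the frequencies in the remap report: one hit for TaqI and two for HhaI.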

SLIDE 15

Example: remap

ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

(cut-site markers above and below the strands are not reproduced)

GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA

# Enzymes that cut   Frequency   Isoschizomers
AciI                 1
Bsc4I                1
BsiSI                1
BssKI                1
Bsu6I                1
HhaI                 2
Hin6I                2           HinP1I,HspAI
TaqI                 1

# Enzymes that do not cut
AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII Hin4I HindII HindIII HpyCH4IV KpnI NotI

Example: cirdna

  • File: ../../data/data.cirp

Start 1001
End 4270

group
label
Block 1011 1362 3
ex1
endlabel
label
Tick 1610 8
EcoR1
endlabel
label
Block 1647 1815 1
endlabel
label
Tick 2459 8
BamH1
endlabel
label
Block 4139 4258 3
ex2
endlabel
endgroup

group
label
Range 2541 2812 [ ] 5
Alu
endlabel
label
Range 3322 3497 > < 5
MER13
endlabel
endgroup

SLIDE 16

Example: plotorf

EMBOSS format input/output

UFO (Universal Feature Object)
  • gff, swissprot, embl, pir, nbrf (with or without sequence)

Alignments
  • Multiple and pairwise, many flavors (FASTA, MSF, SRS…)

Reports
  • Feature (UFO), SRS, motif, seqtable, excel, diffseq, listfile (USA), etc…

Sequences (compatible with USA)
  • Many!!! E.g., fasta, clustal, gcg, paup, gff, embl, swissprot, acedb, abi, etc…

SLIDE 17

Web interfaces

PISE (Pasteur Institute Software Environment)

http://www-alt.pasteur.fr/~letondal/Pise/

wEMBOSS (Belgium & Argentina) (not yet at SIB)

http://www.wemboss.org

Pise a tool to generate Web interfaces for Molecular Biology programs

http://emboss.ch.embnet.org/Pise

SLIDE 18

http://www.wemboss.org

SLIDE 19

Launch Jemboss

http://emboss.ch.embnet.org/Jemboss

Launch Jemboss

First time only… Each time…

SLIDE 20

Jemboss windows

Jemboss windows other systems

SLIDE 21

Summary

  • Anonymous web access through Pise
  • Registered access through Jemboss
  • Registered access through command-line (requires UNIX skills)

Please report problems!

Exercises

DEA Exercises web based sequence analysis

  • The goal of this exercise is to use web based tools for protein sequence analysis
  • a) Take this TrEMBL sequence (Q9X252) and try a BLAST against swissprot with the complete protein or with the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.
  • b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
  • c) Use the different profile, motif, and pattern databases to get more information about the domain(s) you found.

  • d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
  • List of useful links:
  • basic BLAST or advanced BLAST or PSI-BLAST
  • TMPred prediction tool for transmembrane regions (or TMHMM)
  • COILS prediction tool for coiled-coil regions
  • SignalP prediction tool for signal-peptide cleavage site
  • Profile, domain, motifs databases and search sites:
  • PFSCAN
  • InterPro (Pfam, PRINTS, PROSITE, SMART)
  • HITS