Blast summary. Basic ideas: Alignment - PowerPoint PPT Presentation



SLIDE 1

Blast summary

  • Basic ideas:
      • Alignment (global/local/affine gaps)
      • Scoring matrices: DNA, amino acid (PAM, BLOSUM62), position-specific (later in the course)
      • p-value
      • Seed selection, algorithms for keyword search
  • Flavors: blastn, blastx, tblastn…
  • Other variants: psi-blast (later in the course)
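The seed-selection idea can be sketched as an exact word lookup: index every length-k word of the query, then scan the subject for the same words. Each shared word is a seed that BLAST would then try to extend into a local alignment. This is a minimal sketch under stated assumptions: it uses exact word matches only (protein BLAST also accepts "neighborhood" words that score above a threshold under a matrix such as BLOSUM62, which is omitted here), it stops before extension, and the helper names `build_word_index` and `find_seeds` are hypothetical.

```python
def build_word_index(query: str, k: int) -> dict[str, list[int]]:
    """Map each length-k word of the query to the positions where it starts."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return index


def find_seeds(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both share an exact k-letter word."""
    index = build_word_index(query, k)
    seeds = []
    # One pass over the subject: each subject word costs a single hash-table lookup.
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


# Example with DNA words of length 3 to keep the output small.
print(find_seeds("ACGTAC", "TTACGT", 3))  # -> [(3, 1), (0, 2), (1, 3)]
```

A hash table is only one choice for the keyword-search step; automaton-based methods such as Aho-Corasick are another way to find all query words in a single scan of the subject.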
SLIDE 2

Assignment 2 schematic

Figure labels: query (genomic sequence), subject (amino-acid sequence), predicted cDNA, exons, 3’ UTR.

Why does it not match the subject perfectly?
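A genomic (nucleotide) query compared against a protein subject is the blastx setting: the query is translated in all six reading frames (three on each strand) and each translation is compared to the protein. A minimal sketch of just the six-frame translation step, assuming the standard genetic code (NCBI translation table 1) and A/C/G/T input; the helper names and the "+1..-3" frame labels are illustrative choices, not BLAST's API.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# first base varying slowest and third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate from offset `frame`; stop codons become '*', unknown codons 'X'."""
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(frame, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict[str, str]:
    """Translations of the three forward (+) and three reverse (-) frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


print(six_frames("ATGGCC"))
```

Only the aligned coding stretches of a genomic query will translate into something resembling the subject; the translation itself knows nothing about exon boundaries or UTRs.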

SLIDE 3
SLIDE 4

SLIDE 5

Proteins

SLIDE 6

CS view of a protein

>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRY
FYNAKAGLCQTFVYGGCRAKRNNFKSA
EDCMRTCGGAIGPWENL
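From the CS point of view, this record is just a header line plus a string over the 20-letter amino-acid alphabet. A minimal sketch that parses one FASTA record like the one above and computes simple string statistics; the function name `parse_fasta` is hypothetical, and splitting the identifier on `|` assumes the `sp|accession|entry name` layout shown here rather than handling every FASTA header style.

```python
from collections import Counter

RECORD = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDT
SNQAKAQRPDFCLEPPYTGPCKARIIRY
FYNAKAGLCQTFVYGGCRAKRNNFKSA
EDCMRTCGGAIGPWENL"""


def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single FASTA record into (header without '>', sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    header = lines[0][1:]
    sequence = "".join(lines[1:])  # wrapped sequence lines are concatenated
    return header, sequence


header, seq = parse_fasta(RECORD)
db, accession, entry_name = header.split()[0].split("|")  # 'sp', 'P00974', 'BPT1_BOVIN'
composition = Counter(seq)  # residue -> count

print(accession, entry_name, len(seq))  # prints: P00974 BPT1_BOVIN 100
print(composition.most_common(3))
```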

SLIDE 7

Protein structure basics

SLIDE 8

Bond angles form structural constraints

SLIDE 9

Alpha-helix

  • 3.6 residues per turn.

  • Hydrogen bonds between the backbone C=O of residue i and the N-H of residue i+4 stabilize the structure.

  • First proposed by Linus Pauling.
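The figure of 3.6 residues per turn fixes how far each residue rotates around the helix axis: 360° / 3.6 = 100° per residue, so 18 residues span exactly 5 turns. A small sketch that places residues on a helical wheel (a projection looking down the helix axis) using only that number; `wheel_angles` is a hypothetical helper, and the rise along the axis is ignored.

```python
RESIDUES_PER_TURN = 3.6  # alpha-helix value from the slide
# 360 / 3.6 = 100 degrees of rotation per residue (rounded to absorb float error).
DEGREES_PER_RESIDUE = round(360.0 / RESIDUES_PER_TURN, 6)


def wheel_angles(sequence: str) -> list[tuple[str, float]]:
    """Helical-wheel angle (0 to 360 degrees) of each residue around the helix axis."""
    return [
        (residue, (i * DEGREES_PER_RESIDUE) % 360.0)
        for i, residue in enumerate(sequence)
    ]


for residue, angle in wheel_angles("ACDEFGHIK"):
    print(f"{residue} {angle:5.0f}")
```

Because 100° does not divide 360° evenly, successive residues fan out around the wheel instead of stacking, and the pattern only repeats after 18 residues.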

SLIDE 10

Beta-sheet

  • Each strand by itself has 2 residues per turn and is not stable.

  • Adjacent strands hydrogen-bond to form stable beta-sheets, parallel or anti-parallel.

  • Beta-sheets are stabilized by long-range interactions (paired strands can be far apart in the sequence), while alpha-helices are stabilized by local interactions.

SLIDE 11

Domains

  • The basic structures (helix, strand, loop) combine to form complex 3D structures.

  • Certain combinations are popular: there are many sequences, but only a few folds.

SLIDE 12

3D structure

  • Predicting tertiary structure is an important problem in bioinformatics.

  • Premise: clues to structure can be found in the sequence.

  • While de novo tertiary structure prediction is hard, there are many intermediate goals that are tractable.

SLIDE 13

Protein Domains

  • An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.

  • Example: the zinc finger domain is a DNA-binding domain.

SLIDE 14

Zinc Finger domain

SLIDE 15

Proteins containing zf domains

How can we find a motif corresponding to a zf domain?

SLIDE 16

The sequence analysis perspective

  • Zinc finger motif: #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]
  • 2 conserved C and 2 conserved H

  • How can we search a database using these motifs?
  • The ‘regular expression’ motif is weak. How can we make it stronger?
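One way to search a database with the zinc finger motif is to translate its PROSITE-like notation into a regular expression and scan each sequence. A sketch under these assumptions: `X` means any residue, `X(a-b)` and `Xn` mean a run of a to b or exactly n residues, and `[H/C]` means H or C. The slide does not define `#`, so the hypothetical parameter `hash_class` treats it as any residue by default; `motif_to_regex` and `find_motif` are illustrative names.

```python
import re

# One motif element: a residue letter, '#', or a bracketed choice like [H/C],
# optionally followed by a repeat count: (a-b), (n), or a bare n as in X3.
ELEMENT = re.compile(r"(\[[A-Z/]+\]|[A-Z#])(?:\((\d+)(?:-(\d+))?\)|(\d+))?")
# Split on '-' separators, but not on the '-' inside a range such as X(1-5).
SEPARATOR = re.compile(r"-(?![^(]*\))")


def motif_to_regex(motif: str, hash_class: str = ".") -> str:
    """Convert a dash-separated motif into a Python regex string.

    hash_class is a hypothetical stand-in for '#', which the slide leaves undefined.
    """
    parts = []
    for element in SEPARATOR.split(motif):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse motif element {element!r}")
        symbol, low, high, exact = m.groups()
        if symbol == "X":
            token = "."  # X: any residue
        elif symbol == "#":
            token = hash_class  # '#': assumption, see docstring
        elif symbol.startswith("["):
            token = "[" + symbol[1:-1].replace("/", "") + "]"  # [H/C] -> [HC]
        else:
            token = symbol  # a conserved residue such as C or H
        if low is not None:
            token += f"{{{low},{high}}}" if high else f"{{{low}}}"
        elif exact is not None:
            token += f"{{{exact}}}"
        parts.append(token)
    return "".join(parts)


ZF_MOTIF = "#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]"
pattern = re.compile(motif_to_regex(ZF_MOTIF))


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


print(motif_to_regex(ZF_MOTIF))  # prints: ..C.{1,5}C.{3}..{5}..{2}H.{3,6}[HC]
```

A regular expression only reports match or no match, with no score for how well a sequence fits, and a permissive reading of `#` makes chance matches more likely; both are concrete senses in which such a motif is weak.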