Practical Bioinformatics, Mark Voorhies, 5/19/2015



SLIDE 1

Practical Bioinformatics

Mark Voorhies 5/19/2015

Mark Voorhies Practical Bioinformatics

SLIDE 4

Review – Documentation

A program should communicate its intent to both the computer and human programmers.
  • Comments
  • Docstrings
  • Code and inputs defining a complete protocol
  • Positive and negative controls
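
As a small illustration of these points, the sketch below documents a function with a docstring and a comment, then checks it with one positive and one negative control. The function name `is_dna` and the test strings are hypothetical examples, not taken from the course material.

```python
def is_dna(seq):
    """Return True if seq is a non-empty string made only of A, C, G, T.

    This docstring is what help(is_dna) shows to a human reader.
    """
    # Upper-case first so that "acgt" is accepted as well as "ACGT".
    return len(seq) > 0 and set(seq.upper()) <= set("ACGT")


# Positive control: an input that must be accepted.
assert is_dna("GATTACA")
# Negative control: an input that must be rejected (protein-like letters).
assert not is_dna("MKVLW")
```

If either control fails, the problem lies in the code or in our understanding of it, which is worth knowing before running it on real data.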

SLIDE 6

Review – “Top Down” design

  • Experiment in the shell
  • Factor working code into functions and modules
  • Refine from problem-specific to generally applicable functions
  • “As simple as possible, but no simpler”

SLIDE 7

Dictionaries

dictionary = {"A": "T", "T": "A", "G": "C", "C": "G"}
dictionary["G"]
dictionary["N"] = "N"
"C" in dictionary  # Python 3 form; Python 2 also allowed dictionary.has_key("C")
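
The same dictionary operations can be written as a short runnable script. This is a sketch for Python 3, where the Python 2 method `dict.has_key` no longer exists and the `in` operator is used instead. The variable name `complement` and the `"X"` lookup are illustrative choices.

```python
# Map each base to its Watson-Crick partner.
complement = {"A": "T", "T": "A", "G": "C", "C": "G"}

print(complement["G"])            # look up the value stored under a key
complement["N"] = "N"             # add (or overwrite) an entry
print("C" in complement)          # membership test on the keys
print(complement.get("X", "?"))   # lookup with a fallback instead of a KeyError
```

Indexing with a missing key (for example `complement["X"]`) raises `KeyError`, which is why `get` with a default is often convenient for ambiguous bases.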

SLIDE 8

Dictionaries

geneticCode = {"TTT": "F", "TTC": "F", "TTA": "L", "TTG": "L",
               "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
               "ATT": "I", "ATC": "I", "ATA": "I", "ATG": "M",
               "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
               "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S",
               "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",
               "ACT": "T", "ACC": "T", "ACA": "T", "ACG": "T",
               "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
               "TAT": "Y", "TAC": "Y", "TAA": "*", "TAG": "*",
               "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
               "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
               "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
               "TGT": "C", "TGC": "C", "TGA": "*", "TGG": "W",
               "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
               "AGT": "S", "AGC": "S", "AGA": "R", "AGG": "R",
               "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G"}

SLIDE 9

Whiteboard Image

SLIDE 10

Exercise: Transforming sequences

1. Write a function to return the antisense strand of a DNA sequence in 3’→5’ orientation.
2. Write a function to return the complement of a DNA sequence in 5’→3’ orientation.
3. Write a function to translate a DNA sequence.
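
One possible sketch of these three functions follows. It reads exercise 1 as a base-by-base complement and exercise 2 as the reverse complement, which is an interpretation rather than the official solution. The function names are hypothetical, the codon table is the standard genetic code built from a compact string instead of a literal dictionary, and `translate` is assumed to read frame 0, emit `*` for stop codons, and ignore a trailing partial codon.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO))


def antisense_3to5(seq):
    """Exercise 1: complement each base, keeping the positions unchanged."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """Exercise 2: complement each base, then reverse to read 5'->3'."""
    return antisense_3to5(seq)[::-1]


def translate(seq):
    """Exercise 3: translate frame 0; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(GENETIC_CODE[seq[i:i + 3]] for i in range(0, usable, 3))


print(antisense_3to5("ATGC"))
print(reverse_complement("ATGC"))
print(translate("ATGGCCTAA"))
```

Bases outside A, C, G, T raise `KeyError` here; whether to pass them through or reject them is a design decision left open by the exercise.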

SLIDE 12

Why compare sequences?

  • To find genes with a common ancestor
  • To infer conserved molecular mechanism and biological function
  • To find short functional motifs
  • To find repetitive elements within a sequence
  • To predict cross-hybridizing sequences (e.g., in microarray design)
  • To find genomic origin of imperfectly sequenced fragments (e.g., in deep sequencing experiments)
  • To predict nucleotide secondary structure

SLIDE 13

Whiteboard Image

SLIDE 16

Nomenclature

Homologs: heritable elements with a common evolutionary origin.
Orthologs: homologs arising from speciation.
Paralogs: homologs arising from duplication and divergence within a single genome.
Xenologs: homologs arising from horizontal transfer.
Ohnologs: homologs arising from whole genome duplication.

SLIDE 18

Dotplots

1. Unbiased view of all ungapped alignments of two sequences.
2. Noise can be filtered by applying a smoothing window to the diagonals.
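
A minimal sketch of both ideas: `dotplot` marks every position pair with identical letters, and `smooth` keeps a cell only when enough of the next `window` cells along its diagonal are matches. The function names, the window and threshold parameters, and the text rendering are assumptions made for illustration; other smoothing schemes (for example, windows centered on each cell) are equally valid.

```python
def dotplot(x, y):
    """Return a matrix m with m[i][j] = 1 where x[i] == y[j], else 0."""
    return [[1 if a == b else 0 for b in y] for a in x]


def smooth(m, window, threshold):
    """Keep cell (i, j) only if the diagonal run of `window` cells
    starting at (i, j) contains at least `threshold` matches."""
    rows = len(m)
    cols = len(m[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            hits = sum(m[i + k][j + k] for k in range(window))
            if hits >= threshold:
                out[i][j] = 1
    return out


def render(m):
    """Draw the matrix as text, one row per line."""
    return "\n".join("".join("*" if c else "." for c in row) for row in m)


raw = dotplot("GATTACA", "GATTACA")
print(render(raw))
print()
print(render(smooth(raw, window=3, threshold=3)))
```

Each diagonal of the plot corresponds to one ungapped alignment offset, so runs that survive the filter point at offsets worth scoring.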

SLIDE 19

Types of alignments

Global alignment: Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch).
Local alignment: An optimal pair of subsequences is taken from the two sequences and globally aligned (e.g., Smith-Waterman).
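
The difference between the two can be seen in a score-only dynamic programming sketch in the spirit of Needleman-Wunsch and Smith-Waterman. The function name, the +1/-1 match/mismatch scores, and the linear gap penalty of -1 per gap position are assumptions for illustration; there is no traceback, so only the best score is returned.

```python
def align_score(x, y, match=1.0, mismatch=-1.0, gap=-1.0, local=False):
    """Best alignment score of x and y (score only, no traceback)."""
    rows, cols = len(x) + 1, len(y) + 1
    f = [[0.0] * cols for _ in range(rows)]
    if not local:
        # Global: gaps at the start of either sequence are charged.
        for i in range(1, rows):
            f[i][0] = i * gap
        for j in range(1, cols):
            f[0][j] = j * gap
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cell = max(f[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                       f[i - 1][j] + gap,     # x[i-1] against a gap
                       f[i][j - 1] + gap)     # y[j-1] against a gap
            if local:
                cell = max(cell, 0.0)         # local: restart rather than go negative
                best = max(best, cell)        # local: best cell anywhere in the table
            f[i][j] = cell
    # Global alignments must end in the bottom-right corner.
    return best if local else f[-1][-1]


print(align_score("ACGT", "AGT"))
print(align_score("TTTGATTACATTT", "CCGATTACACC", local=True))
```

The only differences between the two modes are the boundary initialization, the floor at zero, and where the final score is read from.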

SLIDE 23

Exercise: Scoring an ungapped alignment

s = {"A": {"A": 1.0, "T": -1.0, "G": -1.0, "C": -1.0},
     "T": {"A": -1.0, "T": 1.0, "G": -1.0, "C": -1.0},
     "G": {"A": -1.0, "T": -1.0, "G": 1.0, "C": -1.0},
     "C": {"A": -1.0, "T": -1.0, "G": -1.0, "C": 1.0}}

$$S(x, y) = \sum_{i=1}^{N} s(x_i, y_i)$$

1. Given two equal length sequences and a scoring matrix, return the alignment score for a full length, ungapped alignment.
2. Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.
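
One way to approach both parts is sketched below. `score_ungapped` sums the matrix entries column by column. `best_offset` slides `y` along `x` and scores only the overlapping positions, which is an assumption: the exercise does not say how overhanging ends should be treated, and ties are resolved here in favor of the smallest offset. Both function names are hypothetical.

```python
s = {"A": {"A": 1.0, "T": -1.0, "G": -1.0, "C": -1.0},
     "T": {"A": -1.0, "T": 1.0, "G": -1.0, "C": -1.0},
     "G": {"A": -1.0, "T": -1.0, "G": 1.0, "C": -1.0},
     "C": {"A": -1.0, "T": -1.0, "G": -1.0, "C": 1.0}}


def score_ungapped(x, y, s):
    """Part 1: sum of s[x_i][y_i] over a full length, ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(s[a][b] for a, b in zip(x, y))


def best_offset(x, y, s):
    """Part 2: return (offset, score) for the best ungapped alignment.

    Offset k aligns y[0] with x[k]; k is negative when y starts before x.
    Only overlapping positions are scored (overhangs are free).
    """
    best = None
    for k in range(-(len(y) - 1), len(x)):
        start = max(0, k)
        end = min(len(x), k + len(y))
        score = sum(s[x[i]][y[i - k]] for i in range(start, end))
        if best is None or score > best[1]:
            best = (k, score)
    return best


print(score_ungapped("GATT", "GACT", s))
print(best_offset("TTGATTACA", "GATTACA", s))
```

Because overhangs cost nothing in this sketch, a short overlap with no mismatches can outscore a long overlap with a few; penalizing overhangs or normalizing by overlap length are alternatives to consider.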

SLIDE 24

Whiteboard Image

SLIDE 26

Exercise: Scoring a gapped alignment

1. Given two equal length gapped sequences (where “-” represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.
2. Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

$$S_{gapped}(x, y) = S(x, y) + \sum_{i \in gaps} (G + E \cdot len(i))$$
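
Both scoring schemes can be sketched as follows. The function names are hypothetical, and two assumptions are made: no column holds a gap in both sequences, and a gap in one sequence that is immediately followed by a gap in the other counts as two separate gaps. Under the formula above, a gap of length n costs G + E·n, so a single-base gap costs -12 with the example values.

```python
s = {"A": {"A": 1.0, "T": -1.0, "G": -1.0, "C": -1.0},
     "T": {"A": -1.0, "T": 1.0, "G": -1.0, "C": -1.0},
     "G": {"A": -1.0, "T": -1.0, "G": 1.0, "C": -1.0},
     "C": {"A": -1.0, "T": -1.0, "G": -1.0, "C": 1.0}}


def score_linear(x, y, s, gap=-1.0):
    """Part 1: matrix score for aligned bases, `gap` for each gapped column."""
    if len(x) != len(y):
        raise ValueError("gapped sequences must have equal length")
    total = 0.0
    for a, b in zip(x, y):
        total += gap if "-" in (a, b) else s[a][b]
    return total


def score_affine(x, y, s, G=-11.0, E=-1.0):
    """Part 2: each gap of length n contributes G + E * n."""
    if len(x) != len(y):
        raise ValueError("gapped sequences must have equal length")
    total = 0.0
    prev = None  # sequence holding the gap in the previous column, if any
    for a, b in zip(x, y):
        cur = "x" if a == "-" else "y" if b == "-" else None
        if cur is None:
            total += s[a][b]
        else:
            if cur != prev:
                total += G  # a new gap opens in this column
            total += E      # every gapped column extends the gap by one base
        prev = cur
    return total


print(score_linear("GAT-ACA", "GATTACA", s))
print(score_affine("GAT-ACA", "GATTACA", s))
print(score_affine("GA--ACA", "GATTACA", s))
```

With affine penalties, one two-base gap is much cheaper than two one-base gaps, which reflects the idea that a single insertion or deletion event can span several bases.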
