Practical Bioinformatics, Mark Voorhies, 4/16/2018


SLIDE 1

Practical Bioinformatics

Mark Voorhies 4/16/2018

Mark Voorhies Practical Bioinformatics

SLIDE 2

JavaTreeView link-out for ENSEMBL Mouse

http://www.ensembl.org/Mus musculus/Gene/Summary?g=HEADER

SLIDE 3

Science!

SLIDE 4

Example Pipeline: Overview

SLIDE 5

Example Pipeline: Overview

Pipeline stages: Generate Samples; Transfer & Archival; Pre-process; Genome Coverage; Transcriptome Profile; Differential Expression; Annotation/Analysis; Paper

~2.5-4 years (publish)

SLIDE 6

Example Pipeline: Overview

Pipeline stages: Generate Samples; Transfer & Archival; Pre-process; Genome Coverage; Transcriptome Profile; Differential Expression; Annotation/Analysis; Paper

Annotations: ~2.5-4 years (publish); ~1 day

SLIDE 7

Example Pipeline: Overview

Pipeline stages: Generate Samples; Transfer & Archival; Pre-process; Genome Coverage; Transcriptome Profile; Differential Expression; Annotation/Analysis; Paper

Annotations: ~2.5-4 years (publish); ~1 day; Follow-up Experiments

SLIDE 9

Example Pipeline: Overview

SLIDE 10

Example Pipeline: Details

SLIDE 11

GSE88801 Pipelines

SLIDE 12

EM: Expectation Maximization

[Figure: workflow of the online EM algorithm. Labels on the figure:
Input: capture target sequences; fragment and sequence; align to target references.
Online EM algorithm: get next read pair; calculate assignment probabilities; update masses; update parameters.
Constrain estimated counts.
Output: relative abundances; estimated counts; effective counts; augmented alignment file.
Other labels: error probabilities; bias; targets.]

Roberts and Pachter, Nature Methods 10:71
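
As a rough illustration of the "calculate assignment probabilities" and "update" steps, the Python sketch below runs a plain batch EM on a toy read-to-target compatibility matrix. It is not the online algorithm from the cited paper, and it ignores fragment length, bias, and error models. The function name `em_abundance` and the toy data are assumptions made for this example.

```python
import numpy as np

def em_abundance(compat, n_iter=200, tol=1e-10):
    """Batch EM for relative abundances (toy model, hypothetical helper).

    compat[r, t] is 1 if read r is compatible with target t, else 0.
    Every read must be compatible with at least one target.
    Returns (rho, counts): relative abundances and estimated counts.
    """
    compat = np.asarray(compat, dtype=float)
    n_reads, n_targets = compat.shape
    rho = np.full(n_targets, 1.0 / n_targets)  # uniform starting guess
    for _ in range(n_iter):
        # E-step: probability that each read came from each target.
        weighted = compat * rho
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: abundances are the normalized expected counts.
        new_rho = resp.sum(axis=0) / n_reads
        converged = np.abs(new_rho - rho).max() < tol
        rho = new_rho
        if converged:
            break
    return rho, rho * n_reads

# Toy data: 3 reads unique to target A, 1 unique to B, 2 ambiguous.
toy = [[1, 0]] * 3 + [[0, 1]] + [[1, 1]] * 2
rho, counts = em_abundance(toy)
print(rho, counts)
```

In this toy case the ambiguous reads end up split 3:1 between the two targets, in proportion to the evidence from the uniquely assigned reads.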

SLIDE 13

Abundance estimation with kallisto

    export transcriptome="GRCm38_all_mRNA"
    # Quantify each sample listed (one name per line) in sample_names.txt
    while read i; do
        export jobname="${i}.${transcriptome}.fr"
        # 4 threads, single-end reads, fragment length mean 250 and sd 50
        kallisto quant -i "${transcriptome}.idx" \
            -t 4 --single --fr-stranded -l 250 -s 50 \
            -o "${jobname}" "${i}_1.fastq.gz" \
            > "${jobname}.log" \
            2> "${jobname}.err"
    done < sample_names.txt

SLIDE 14

Linear Least Squares

SLIDE 15

Linear Least Squares

\[ b_i = \frac{y_i}{\sigma_i} \]

SLIDE 16

Linear Least Squares

\[ A_{ij} = \frac{f_j(x_i)}{\sigma_i} \]

SLIDE 17

Linear Least Squares

\[ \chi^2 = |A \cdot a - b|^2 \]

SLIDE 18

Linear Least Squares

\[ a = \sum_{i=1}^{M} \left( \frac{U_i \cdot b}{s_i} \right) V_i \]
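
The quantities on these slides can be put together in a few lines of NumPy. The sketch below scales the data by the measurement errors, builds the design matrix from a list of basis functions, and assembles the solution from the singular value decomposition. The function name `svd_least_squares`, the `rcond` cutoff for small singular values, and the straight-line example are assumptions made for illustration.

```python
import numpy as np

def svd_least_squares(x, y, sigma, basis, rcond=1e-12):
    """Minimize chi^2 = |A.a - b|^2 using the SVD of A (hypothetical helper).

    basis is a list of functions f_j, so that
    A_ij = f_j(x_i) / sigma_i and b_i = y_i / sigma_i.
    Returns (a, chi2).
    """
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    A = np.column_stack([f(x) for f in basis]) / sigma[:, None]
    b = y / sigma
    # A = U diag(s) V^T; column i of U is U_i, row i of Vt is V_i.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    a = np.zeros(A.shape[1])
    for i in range(len(s)):
        if s[i] > rcond * s[0]:  # skip (near-)zero singular values
            a += (U[:, i] @ b) / s[i] * Vt[i]
    chi2 = float(np.sum((A @ a - b) ** 2))
    return a, chi2

# Example: fit y = a0 + a1 * x to noise-free points on the line y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.5, 1.0, 1.0, 2.0]
a, chi2 = svd_least_squares(x, y, sigma, [np.ones_like, lambda t: t])
print(a, chi2)
```

Dropping terms with very small singular values is one common way to keep the fit stable when the basis functions are nearly degenerate; the cutoff used here is arbitrary.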

SLIDE 19

Multiple Hypothesis Testing

http://xkcd.com/882/
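
To put numbers on the problem the comic illustrates, the Python sketch below computes the chance of at least one false positive across 20 independent tests at a 0.05 threshold, and applies two standard corrections: Bonferroni and Benjamini-Hochberg. The slide does not say which correction the lecture covers; the function names and the example p-values are assumptions made for illustration.

```python
import numpy as np

def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of >= 1 false positive among n independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni(pvalues, alpha=0.05):
    """Reject only p-values below alpha divided by the number of tests."""
    p = np.asarray(pvalues, dtype=float)
    return p < alpha / p.size

def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of rejections controlling the false discovery rate."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n
    passed = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# One "hit" at p = 0.04 among 20 tests (made-up p-values).
pvals = [0.04] + list(np.linspace(0.1, 0.95, 19))
print(familywise_error_rate(20))
print(bonferroni(pvals).any(), benjamini_hochberg(pvals).any())
```

With these made-up values the single p = 0.04 result is rejected by neither correction, even though it would pass an uncorrected 0.05 threshold.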

SLIDE 20

Final Homework

Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:

1 Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).

2 Write a function to create the dynamic programming matrix and initialize the first row and column.

3 Write a function to fill in the rest of the matrix.

4 Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.

5 Write a backtrace function to read the optimal alignment from the filled-in matrix.

If that isn't enough to keep you occupied, try implementing Smith-Waterman local alignment and/or non-zero gap opening penalties.
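
For checking your own work, here is one possible Python sketch of steps 2 through 5 in a single function. It uses a linear gap penalty, which is what zero gap opening cost amounts to. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for illustration, not values from the slides, and ties are broken in favor of the diagonal move.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty (no gap-opening cost).

    Returns (score, aligned_seq1, aligned_seq2).
    """
    n, m = len(seq1), len(seq2)
    # Step 2: create the matrices and initialize the first row and column.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], ptr[0][j] = j * gap, "left"
    # Steps 3 and 4: fill the matrix, storing a pointer to the best sub-solution.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + s, "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            score[i][j], ptr[i][j] = max(options, key=lambda o: o[0])
    # Step 5: trace back from the bottom-right corner to the origin.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a1)), "".join(reversed(a2))

print(needleman_wunsch("GATTACA", "GATACA"))
```

Storing the pointers while filling keeps the backtrace a simple walk; an alternative is to recompute the best predecessor of each cell during the backtrace and skip the second matrix.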
