Practical Bioinformatics (Mark Voorhies, 4/6/2017)

SLIDE 1

Practical Bioinformatics

Mark Voorhies 4/6/2017

Mark Voorhies Practical Bioinformatics

SLIDE 2

Loading and re-loading your functions

    # Use import the first time you load a module
    # (And keep using import until it loads
    # successfully)
    import my_module
    my_module.my_function(42)

    # Once a module has been loaded, use reload to
    # force python to read your new code
    from importlib import reload
    reload(my_module)
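The import-then-reload cycle can be tried end to end with a throwaway module. This is only a sketch: the temporary directory, the `write_module` helper, and the two function bodies are assumptions made for the demonstration; only the names `my_module` and `my_function` come from the slide.

```python
import importlib
import os
import sys
import tempfile

# Demo-only setting: skip bytecode caching so the edited source is always re-read.
sys.dont_write_bytecode = True

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "my_module.py")

def write_module(body):
    """Hypothetical helper: (re)write the source of my_module."""
    with open(module_path, "w") as handle:
        handle.write(body)

# First version of the module, then a normal import.
write_module("def my_function(x):\n    return x + 1\n")
sys.path.insert(0, workdir)
importlib.invalidate_caches()

import my_module
first = my_module.my_function(42)

# Edit the source on disk, then force Python to read the new code.
write_module("def my_function(x):\n    return x * 2  # edited version\n")
importlib.reload(my_module)
second = my_module.my_function(42)

print(first, second)
```

Without the `reload` call, the second call would still run the first version of `my_function`, because Python keeps the already-imported module in memory.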


SLIDE 5

Pearson distances

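Pearson similarity

$$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2}\,\sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$

Pearson distance

$$d(x, y) = 1 - s(x, y)$$

Euclidean distance

$$\frac{\sum_{i}^{N} (x_i - y_i)^2}{N}$$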
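The three measures on this slide translate into a few lines of Python. This is a minimal sketch, not the course's reference code: the function names are made up here, the denominator is assumed to be the product of the square roots of the two sums (as in the standard Pearson correlation), and an offset of 0.0 is assumed to give the uncentered form while the mean of each vector gives the centered form.

```python
import math

def pearson_similarity(x, y, x_offset=0.0, y_offset=0.0):
    """s(x, y) with explicit offsets; offsets of 0.0 give the uncentered form."""
    numerator = sum((xi - x_offset) * (yi - y_offset) for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum((xi - x_offset) ** 2 for xi in x))
    norm_y = math.sqrt(sum((yi - y_offset) ** 2 for yi in y))
    return numerator / (norm_x * norm_y)

def pearson_distance(x, y, x_offset=0.0, y_offset=0.0):
    """d(x, y) = 1 - s(x, y)."""
    return 1.0 - pearson_similarity(x, y, x_offset, y_offset)

def euclidean_distance(x, y):
    """Sum of squared differences divided by N, as the slide writes it."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def mean(values):
    return sum(values) / len(values)

# Made-up example vectors.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]      # y = 2 * x
z = [11.0, 12.0, 13.0, 14.0]  # z = x + 10

uncentered_xy = pearson_similarity(x, y)
uncentered_xz = pearson_similarity(x, z)
centered_xz = pearson_similarity(x, z, mean(x), mean(z))
```

Scaling a profile leaves both forms at 1, but adding a constant only leaves the centered form at 1, which is the practical difference between the two offset choices.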

SLIDE 6

Comparing all measurements for two genes

[Figure: Comparing two expression profiles (r = 0.97). Axes: TLC1 log2 relative expression, YFG1 log2 relative expression.]

SLIDE 7

Comparing all genes for two measurements

[Figure: Array 1, log2 relative expression vs. Array 2, log2 relative expression.]

SLIDE 8

Comparing all genes for two measurements

[Figure: Array 1, log2 relative expression vs. Array 2, log2 relative expression, annotated "Euclidean Distance".]

SLIDE 9

Comparing all genes for two measurements

[Figure: Array 1, log2 relative expression vs. Array 2, log2 relative expression, annotated "Uncentered Pearson".]

SLIDE 10

Measure all pairwise distances under distance metric
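As a rough sketch of this step, the loop below fills a symmetric matrix of distances between every pair of rows. The names `pairwise_distances` and `uncentered_pearson_distance` and the toy rows are assumptions for illustration; any distance function taking two equal-length lists could be passed in.

```python
import math

def uncentered_pearson_distance(x, y):
    """1 - uncentered Pearson similarity (offsets fixed at zero)."""
    numerator = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - numerator / (norm_x * norm_y)

def pairwise_distances(rows, distance):
    """Return an n x n list of lists with distance(rows[i], rows[j])."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(rows[i], rows[j])
            matrix[i][j] = d  # fill both halves: the metric is symmetric
            matrix[j][i] = d
    return matrix

# Made-up profiles: row 1 is a scaled copy of row 0, row 2 is row 0 reversed.
rows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
distances = pairwise_distances(rows, uncentered_pearson_distance)
```

Only the upper triangle is computed; for n genes that is n(n-1)/2 distance evaluations.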

SLIDE 11

Hierarchical Clustering
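One way to picture the procedure is a naive agglomerative loop: start with every row in its own cluster and repeatedly merge the closest pair. The sketch below assumes average linkage and the mean squared difference as the distance; the slides do not state which linkage is used, and all function names and data here are illustrative.

```python
def mean_squared_distance(x, y):
    """Sum of squared differences divided by the vector length."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def average_linkage(cluster_a, cluster_b, rows, distance):
    """Mean distance over all cross-cluster pairs of row indices."""
    total = sum(distance(rows[i], rows[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def hierarchical_cluster(rows, distance=mean_squared_distance):
    """Return the merges as (members_a, members_b, linkage_distance) tuples."""
    clusters = [[i] for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], rows, distance)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Replace the two clusters with their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Made-up profiles forming two obvious groups.
profiles = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
merges = hierarchical_cluster(profiles)
```

The list of merges, read in order, is the information a dendrogram draws: which groups join and at what distance. This version recomputes linkages from scratch at every step, so it is only suitable for small inputs.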

SLIDE 16

"It's hard work at times, but you have to be realistic. If you have a large database with many variables and your goal is to get a good understanding of the interrelationships, then, unless you get lucky, this complex structure is bound to require some hard work to understand."

Bill Cleveland and Rick Becker
http://stat.bell-labs.com/project/trellis/interview.html

SLIDE 17

Using JavaTreeView

SLIDE 18

Adjust pixel settings for global view

SLIDE 20

Select annotation columns

SLIDE 22

Select URL for gene annotations

SLIDE 24

Activate and detach annotation window

SLIDE 27

Clustering exercises – Negative controls

Write functions to reproduce the shuffling controls in figure 3 of the Eisen paper (removing correlations among genes and/or arrays).

SLIDE 28

Clustering exercises – Negative controls

Write functions to reproduce the shuffling controls in figure 3 of the Eisen paper (removing correlations among genes and/or arrays).

    def shuffleGenes(self, seed=None):
        """Shuffle expression matrix by row."""
        import random
        if seed is not None:
            random.seed(seed)
        # Index over geneName, which the next lines reorder;
        # list() lets the indices be shuffled in place under Python 3
        indices = list(range(len(self.geneName)))
        random.shuffle(indices)
        genes = [self.geneName[i] for i in indices]
        self.geneName = genes
        annotations = [self.geneAnn[i] for i in indices]
        # Store the reordered annotations (not the gene names)
        self.geneAnn = annotations
        num = [self.num[i] for i in indices]
        self.num = num

SLIDE 31

Clustering exercises – Negative controls

Write functions to reproduce the shuffling controls in figure 3 of the Eisen paper (removing correlations among genes and/or arrays).

    def shuffleRows(self, seed=None):
        """Permute ratio values within rows."""
        import random
        if seed is not None:
            random.seed(seed)
        for i in self.num:
            random.shuffle(i)

    def shuffleCols(self, seed=None):
        """Permute ratio values within columns."""
        import random
        if seed is not None:
            random.seed(seed)
        # Transpose the expression matrix
        cols = []
        for col in range(len(self.num[0])):
            cols.append([row[col] for row in self.num])
        # Shuffle
        for i in cols:
            random.shuffle(i)
        # Transpose back to original orientation
        # (one output row per entry in each column)
        self.num = []
        for row in range(len(cols[0])):
            self.num.append([col[row] for col in cols])
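The two shuffles can also be written as stand-alone functions on a plain list of lists, which makes them easy to check on a small matrix. This is a sketch under assumptions: the function names, the use of a local `random.Random` generator, and the toy matrix are not from the slides, and these versions return copies rather than modifying an object in place.

```python
import random

def shuffle_rows(matrix, seed=None):
    """Return a copy with values permuted independently within each row."""
    rng = random.Random(seed)  # local generator; global random state untouched
    shuffled = [list(row) for row in matrix]
    for row in shuffled:
        rng.shuffle(row)
    return shuffled

def shuffle_cols(matrix, seed=None):
    """Return a copy with values permuted independently within each column."""
    rng = random.Random(seed)
    cols = [list(col) for col in zip(*matrix)]  # transpose
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]    # transpose back

# Made-up 4 x 3 matrix (4 genes, 3 arrays).
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
by_row = shuffle_rows(matrix, seed=1)
by_col = shuffle_cols(matrix, seed=1)
```

A useful sanity check for either control is that the shape is unchanged and each row (or column) still holds the same set of values, only in a different order.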

SLIDE 32

Homework

1. Explore different clustering methods and/or distance methods
2. Try additional shufflings of the data: how do they affect your ability to cluster the data? Cf. figure 3 of the Eisen paper
     • Permute the columns
     • Independently permute the columns of each row
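Before running the full clustering, the two shufflings listed above can be compared on a pair of rows. The sketch below reads "permute the columns" as applying one shared column order to every row, which is an interpretation; the function names and the small matrix are hypothetical.

```python
import math
import random

def uncentered_pearson(x, y):
    """Uncentered Pearson similarity (offsets fixed at zero)."""
    numerator = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return numerator / (norm_x * norm_y)

def permute_columns(matrix, seed=None):
    """Apply one shared column permutation to every row."""
    rng = random.Random(seed)
    order = list(range(len(matrix[0])))
    rng.shuffle(order)
    return [[row[j] for j in order] for row in matrix]

def permute_within_rows(matrix, seed=None):
    """Permute the values of each row independently."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in matrix]

# Made-up data: rows 0 and 1 rise together, row 2 does not.
matrix = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.0], [4.0, 1.0, 3.0, 2.0]]

original = uncentered_pearson(matrix[0], matrix[1])
shared = permute_columns(matrix, seed=3)
same_order = uncentered_pearson(shared[0], shared[1])
independent = permute_within_rows(matrix, seed=3)
scrambled = uncentered_pearson(independent[0], independent[1])
```

A shared column permutation leaves every gene-to-gene similarity unchanged, so it should not affect how genes cluster, whereas independent permutation within each row breaks the pairing between measurements and generally changes the similarity.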
