Practical Bioinformatics
Mark Voorhies
5/15/2015

Gotchas

Indentation matters
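A minimal illustration of this gotcha, using two hypothetical functions (the names are not from the slides) that differ only in how far one line is indented:

```python
def sum_inside_loop(values):
    """The return is indented under the for loop, so it runs on the first pass."""
    total = 0
    for v in values:
        total += v
        return total  # exits after the first element


def sum_after_loop(values):
    """The return lines up with the for statement, so it runs after the loop."""
    total = 0
    for v in values:
        total += v
    return total


print(sum_inside_loop([1, 2, 3]))  # 1
print(sum_after_loop([1, 2, 3]))   # 6
```

Both functions are valid Python, so the interpreter reports no error; only the result differs.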

Clustering exercises – Visualizing the distance matrix

Loading and re-loading your functions

    # Use import the first time you load a module
    # (and keep using import until it loads successfully)
    import my_module
    my_module.my_function(42)

    # Once a module has been loaded, use reload to
    # force Python to read your new code
    # (in Python 3, first run: from importlib import reload)
    reload(my_module)
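The import/reload cycle can be tried end to end with a throwaway module. This is only a sketch: it assumes Python 3 (where reload lives in importlib), and the module name my_demo_module and its contents are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Avoid stale cached bytecode when the file is rewritten in quick succession.
sys.dont_write_bytecode = True

# Write a throwaway module (hypothetical name) into a temporary directory.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "my_demo_module.py")
with open(module_path, "w") as handle:
    handle.write("def my_function(x):\n    return x + 1\n")

# Make the directory importable, as on the working-directory slide.
sys.path.append(workdir)
importlib.invalidate_caches()

import my_demo_module
first = my_demo_module.my_function(42)

# Edit the source on disk, as you would in your editor.
with open(module_path, "w") as handle:
    handle.write("def my_function(x):\n    return x * 2  # edited\n")

# A second import would reuse the already-loaded module;
# reload re-reads the file and replaces the old definitions.
importlib.reload(my_demo_module)
second = my_demo_module.my_function(42)

print(first, second)  # 43 84
```

Names imported with "from my_demo_module import my_function" keep pointing at the old function after a reload, which is one reason to call functions through the module name.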

Setting Canopy’s working/import directory

OS X:
  1. Open a terminal
  2. cd path/to/working/directory
  3. env PYTHONPATH="$PYTHONPATH:$PWD" canopy

Windows (or OS X):
  1. Start canopy
  2. %cd path/to/working/directory
  3. import sys, os
     sys.path.append(os.getcwd())

Pearson distances

Pearson similarity:

$$s(x, y) = \frac{1}{N} \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right), \qquad \phi_G = \sqrt{\frac{1}{N} \sum_{i}^{N} (G_i - G_{\mathrm{offset}})^2}$$

The factors of N cancel, so equivalently:

$$s(x, y) = \sum_{i}^{N} \left( \frac{x_i - x_{\mathrm{offset}}}{\phi_x} \right) \left( \frac{y_i - y_{\mathrm{offset}}}{\phi_y} \right), \qquad \phi_G = \sqrt{\sum_{i}^{N} (G_i - G_{\mathrm{offset}})^2}$$

Substituting \phi_x and \phi_y:

$$s(x, y) = \frac{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})(y_i - y_{\mathrm{offset}})}{\sqrt{\sum_{i}^{N} (x_i - x_{\mathrm{offset}})^2} \, \sqrt{\sum_{i}^{N} (y_i - y_{\mathrm{offset}})^2}}$$

Pearson distance:

$$d_{\mathrm{uncentered}}(x, y) = 1 - s(x, y)$$

Euclidean distance:

$$\frac{\sum_{i}^{N} (x_i - y_i)^2}{N}$$
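The formulas above translate fairly directly into Python. The sketch below is one possible reading, not the course's reference solution: the function names and the centered flag are assumptions, with an offset of 0 taken to mean the uncentered case and the mean of each profile the centered case.

```python
import math


def pearson_similarity(x, y, centered=False):
    """Pearson similarity s(x, y) for two equal-length lists of numbers.

    centered=False uses offsets of 0 (uncentered); centered=True uses the means.
    """
    n = len(x)
    x_offset = sum(x) / float(n) if centered else 0.0
    y_offset = sum(y) / float(n) if centered else 0.0
    numerator = sum((xi - x_offset) * (yi - y_offset) for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum((xi - x_offset) ** 2 for xi in x))
    norm_y = math.sqrt(sum((yi - y_offset) ** 2 for yi in y))
    return numerator / (norm_x * norm_y)


def pearson_distance(x, y, centered=False):
    """Pearson distance d(x, y) = 1 - s(x, y)."""
    return 1.0 - pearson_similarity(x, y, centered)


def euclidean_distance(x, y):
    """Mean squared difference, as in the formula above (no square root)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / float(len(x))


print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(pearson_distance([1, 2, 3], [3, 2, 1], centered=True))
print(euclidean_distance([1, 2, 3], [2, 4, 6]))
```

A profile whose values all equal its offset has zero norm and makes the similarity undefined; this sketch does not guard against that case and would raise ZeroDivisionError.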

Clustering exercises – Negative controls

Write functions to reproduce the shuffling controls in figure 3 of the Eisen paper (removing correlations among genes and/or arrays).

    def shuffleGenes(self, seed=None):
        """Shuffle expression matrix by row."""
        import random
        if seed is not None:
            random.seed(seed)
        # One shared permutation keeps names, annotations, and ratios aligned
        indices = list(range(len(self.geneName)))
        random.shuffle(indices)
        genes = [self.geneName[i] for i in indices]
        self.geneName = genes
        annotations = [self.geneAnn[i] for i in indices]
        self.geneAnn = annotations
        num = [self.num[i] for i in indices]
        self.num = num

    def shuffleRows(self, seed=None):
        """Permute ratio values within rows."""
        import random
        if seed is not None:
            random.seed(seed)
        for i in self.num:
            random.shuffle(i)

    def shuffleCols(self, seed=None):
        """Permute ratio values within columns."""
        import random
        if seed is not None:
            random.seed(seed)
        # Transpose the expression matrix
        cols = []
        for col in range(len(self.num[0])):
            cols.append([row[col] for row in self.num])
        # Shuffle
        for i in cols:
            random.shuffle(i)
        # Transpose back to original orientation
        self.num = []
        for row in range(len(cols[0])):
            self.num.append([col[row] for col in cols])
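One way to check a shuffling control is to confirm what it preserves: shuffling within rows should leave the set of values in each row unchanged, and shuffling within columns should do the same for each column. The sketch below uses a hypothetical minimal ExpressionMatrix class (only the num attribute is taken from the slides) and transposes with zip rather than explicit loops.

```python
import random


class ExpressionMatrix(object):
    """Hypothetical minimal container: num holds one list of ratios per gene."""

    def __init__(self, num):
        self.num = [list(row) for row in num]

    def shuffleRows(self, seed=None):
        """Permute ratio values within rows."""
        if seed is not None:
            random.seed(seed)
        for row in self.num:
            random.shuffle(row)

    def shuffleCols(self, seed=None):
        """Permute ratio values within columns."""
        if seed is not None:
            random.seed(seed)
        cols = [list(col) for col in zip(*self.num)]  # transpose
        for col in cols:
            random.shuffle(col)
        self.num = [list(row) for row in zip(*cols)]  # transpose back


original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

by_row = ExpressionMatrix(original)
by_row.shuffleRows(seed=1)
rows_preserved = [sorted(r) for r in by_row.num] == original

by_col = ExpressionMatrix(original)
by_col.shuffleCols(seed=1)
cols_preserved = ([sorted(c) for c in zip(*by_col.num)]
                  == [sorted(c) for c in zip(*original)])

print(rows_preserved, cols_preserved)  # True True
```

Passing a seed makes a shuffle reproducible, which helps when comparing clustering output across runs.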

Comparing all measurements for two genes

[Figure: scatter plot, "Comparing two expression profiles (r = 0.97)". x-axis: TLC1 log2 relative expression; y-axis: YFG1 log2 relative expression.]

Comparing all genes for two measurements

[Figure: scatter plot. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

Comparing all genes for two measurements: Euclidean Distance

[Figure: the same scatter plot, annotated for Euclidean distance. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

Comparing all genes for two measurements: Uncentered Pearson

[Figure: the same scatter plot, annotated for uncentered Pearson. x-axis: Array 1, log2 relative expression; y-axis: Array 2, log2 relative expression.]

Measure all pairwise distances under distance metric
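A minimal sketch of this step, assuming expression profiles are stored as equal-length lists and the metric is any function of two profiles. The names distance_matrix and mean_squared_difference are illustrative; the metric shown is the Euclidean distance as defined earlier in the slides.

```python
def distance_matrix(rows, distance):
    """Return the symmetric matrix D with D[i][j] = distance(rows[i], rows[j])."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The metric is assumed symmetric, so compute each pair once.
        for j in range(i + 1, n):
            d = distance(rows[i], rows[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


def mean_squared_difference(x, y):
    """Euclidean distance from the slides: sum of squared differences over N."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / float(len(x))


profiles = [[0.0, 1.0], [0.0, 3.0], [2.0, 1.0]]
D = distance_matrix(profiles, mean_squared_difference)
for row in D:
    print(row)
```

The diagonal is left at 0.0, which assumes the distance from a profile to itself is zero. Only the upper triangle is computed, which halves the work for N profiles.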

Hierarchical Clustering

Scripting Cluster

Running Cluster3 from the command line:
  OS X:    /Applications/Cluster.app/Contents/MacOS/Cluster
  Windows: /Program Files/Stanford University/Cluster3/Cluster.com

- Command-line programs are like functions
- "man program" is like "help(function)"
- Use the subprocess module to run command-line programs from within Python.
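To make the "programs are like functions" analogy concrete, here is a small sketch with the subprocess module. The child program is the Python interpreter itself, chosen only because it is certain to be installed; it stands in for any command-line tool.

```python
import subprocess
import sys

# The argument list plays the role of function arguments,
# and the captured standard output plays the role of the return value.
output = subprocess.check_output([sys.executable, "-c", "print(6 * 7)"])

# check_output returns bytes; decode and parse them like any other text.
result = int(output.decode().strip())
print(result)  # 42
```

check_output raises CalledProcessError if the program exits with a non-zero status, so a failed run does not pass silently.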

Programs as functions

USAGE: cluster [options]

  -f filename   File loading
  -u jobname    Allows you to specify a different name for the output files
                (default is derived from the input file name)
  -g [0..8]     Specifies the distance measure for gene clustering
                  0: No gene clustering
                  1: Uncentered correlation
                  2: Pearson correlation
                  3: Uncentered correlation, absolute value
                  4: Pearson correlation, absolute value
                  5: Spearman’s rank correlation
                  6: Kendall’s tau
                  7: Euclidean distance
                  8: City-block distance
                (default: 0)
  -m [msca]     Specifies which hierarchical clustering method to use
                  m: Pairwise complete-linkage
                  s: Pairwise single-linkage
                  c: Pairwise centroid-linkage
                  a: Pairwise average-linkage
                (default: m)
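The usage text above can be wrapped in a Python function so that a Cluster run reads like a function call. This is a sketch under assumptions: the helper names, the keyword spellings, and the input file name my_data.txt are invented here, and only the -f, -u, -g and -m options listed above are used.

```python
import subprocess

# Option values copied from the usage text above (a subset of -g is shown).
GENE_DISTANCES = {"none": 0, "uncentered": 1, "pearson": 2, "euclidean": 7}
LINKAGE_METHODS = {"complete": "m", "single": "s", "centroid": "c", "average": "a"}


def cluster_command(program, filename, gene_distance="uncentered",
                    method="complete", jobname=None):
    """Build the argument list for one Cluster run (does not execute it)."""
    command = [program, "-f", filename,
               "-g", str(GENE_DISTANCES[gene_distance]),
               "-m", LINKAGE_METHODS[method]]
    if jobname is not None:
        command += ["-u", jobname]
    return command


def run_cluster(*args, **kwargs):
    """Run Cluster and return its exit status (requires Cluster to be installed)."""
    return subprocess.call(cluster_command(*args, **kwargs))


# Hypothetical input file; the program path is the OS X location from the slides.
command = cluster_command("/Applications/Cluster.app/Contents/MacOS/Cluster",
                          "my_data.txt", gene_distance="pearson", method="average")
print(" ".join(command))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces, such as the Windows install location.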
