Practical Bioinformatics
Mark Voorhies 5/22/2015
Mark Voorhies Practical Bioinformatics
PAM (Dayhoff) and BLOSUM matrices
PAM1 matrix originally calculated from manual alignments of highly conserved sequences (myoglobin, cytochrome C, etc.)
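As a rough illustration of how a PAM1-style matrix is typically used: PAM-n substitution probabilities are conventionally obtained by raising the PAM1 mutation probability matrix to the n-th power, and scores are then scaled log-odds against background frequencies. The sketch below uses a hypothetical two-letter alphabet with made-up probabilities (not Dayhoff's data); `mat_mul`, `mat_pow`, `log_odds`, and the scale factor of 10 are illustrative assumptions.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    """Raise square matrix M to the non-negative integer power n."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result

def log_odds(M, freqs, scale=10.0):
    """Scaled log-odds: scale * log10(M[i][j] / freqs[j])."""
    return [[scale * math.log10(M[i][j] / freqs[j]) for j in range(len(M))]
            for i in range(len(M))]

# Hypothetical toy "PAM1": row i, column j holds P(residue i -> residue j)
# after one unit of evolutionary distance. The values are made up.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_freqs = [0.5, 0.5]  # assumed background frequencies

toy_pam250 = mat_pow(toy_pam1, 250)
toy_scores = log_odds(toy_pam250, toy_freqs)
```

With these toy numbers, identities keep a positive score and substitutions a negative one, but both shrink toward zero as n grows, which is the usual intuition for why high-numbered PAM matrices suit distant comparisons.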
import networkx as nx
from numpy import array
try:
    import Pycluster
except ImportError:
    import Bio.Cluster as Pycluster

class ScoreCluster:
    def __init__(self, S, alpha_aa="ACDEFGHIKLMNPQRSTVWY"):
        """Initialize from numpy array of scaled log odds scores."""
        (x, y) = S.shape
        assert(x == y == len(alpha_aa))
        # Interpret the largest score as a distance of zero
        D = max(S.reshape(x**2)) - S
        # Maximum-linkage clustering, with a user-supplied distance matrix
        tree = Pycluster.treecluster(data=None, distancematrix=D, method="m")
        # Use NetworkX to read out the amino-acids in clustered order
        G = nx.DiGraph()
        for (n, i) in enumerate(tree):
            for j in (i.left, i.right):
                # Internal node n has index -(n+1); leaves are >= 0
                G.add_edge(-(n + 1), j)
        self.ordering = [i for i in nx.dfs_preorder_nodes(G, -len(tree)) if (i >= 0)]
        self.names = "".join(alpha_aa[i] for i in self.ordering)
        self.C = self.permute(S)

    def permute(self, S):
        """Given square matrix S in alphabetical order, return rows and
        columns of S permuted to match the clustered order."""
        return array([[S[i][j] for j in self.ordering] for i in self.ordering])
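To make the read-out step concrete without Pycluster or NetworkX installed, the following sketch walks a hand-written tree using the same indexing convention as the class above (leaves are indices >= 0, the n-th merge is internal node -(n+1), and the last merge is the root). The `toy_tree` pairs, `toy_alphabet`, and the `leaf_order` helper are hypothetical and only illustrate a depth-first preorder traversal.

```python
def leaf_order(tree):
    """Return leaf indices in depth-first preorder.

    tree is a list of (left, right) pairs; entry n describes internal
    node -(n+1), and non-negative values are leaves.
    """
    order = []
    stack = [-len(tree)]          # the last merge is the root
    while stack:
        node = stack.pop()
        if node >= 0:
            order.append(node)    # leaf: record it
        else:
            left, right = tree[-node - 1]
            stack.append(right)   # push right first so left is visited first
            stack.append(left)
    return order

# Hypothetical merges for 4 items: (0, 2) -> node -1, (1, 3) -> node -2,
# (-1, -2) -> node -3 (the root).
toy_tree = [(0, 2), (1, 3), (-1, -2)]
toy_alphabet = "ACDE"
order = leaf_order(toy_tree)
names = "".join(toy_alphabet[i] for i in order)
```

Here `order` comes out as `[0, 2, 1, 3]` and `names` as `"ADCE"`, so items that were merged early end up adjacent, which is what makes the permuted score matrix show blocks of similar residues.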
for i in 50 100 150 200 250 300 350 400 450; do
    head -n $i -q G217B_iron.fasta Pb01_iron.fasta > temp.fasta;
    time clustalw -infile=temp.fasta -type=DNA -align;
done

Sequences (1:2) Aligned. Score:
Guide tree file created: [temp.dnd]
There are 1 groups
Start of Multiple Alignment
Aligning...
Group 1: Delayed
Alignment Score 7238
CLUSTAL-Alignment file created [temp.aln]

real 0m3.400s
user 0m3.388s
sys 0m0.012s
#!/usr/bin/env python
# Time-stamp: <ParseTimes.py 2011-03-29 21:10:59 Mark Voorhies>
"""Parse wall times from a log file on stdin and write them as a CSV
formatted column for Excel/OpenOffice/etc to stdout. If command line
arguments are given, treat them as a second column."""

from csv import writer
import re

# \s+ (rather than a greedy .*) keeps multi-digit minutes intact
time_re = re.compile(r"^real\s+(?P<minutes>\d+)m(?P<seconds>\d+\.\d+)s", re.M)

if(__name__ == "__main__"):
    import sys
    args = sys.argv[1:]
    out = writer(sys.stdout)
    i = 0
    for t in time_re.finditer(sys.stdin.read()):
        try:
            y = args[i]
            i += 1
        except IndexError:
            y = ""
        out.writerow((float(t.group("minutes"))*60+float(t.group("seconds")), y))
    del out
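A quick way to check a parser like this is to run the pattern over a pasted snippet of `time` output. The sketch below assumes the bash `time` format seen in the CLUSTALW run (`real 0m3.400s`); `parse_wall_seconds` and `sample_log` are illustrative names, and the second timing in the sample is made up to exercise the multi-digit minutes case.

```python
import re

# Matches lines such as "real    0m3.400s" from the bash `time` builtin.
TIME_RE = re.compile(r"^real\s+(?P<minutes>\d+)m(?P<seconds>\d+\.\d+)s", re.M)

def parse_wall_seconds(log_text):
    """Return every wall-clock ("real") time in log_text, in seconds."""
    return [float(m.group("minutes")) * 60 + float(m.group("seconds"))
            for m in TIME_RE.finditer(log_text)]

# First block copies the timing shown for the CLUSTALW run; the second
# block is a made-up example with a two-digit minutes field.
sample_log = """\
real\t0m3.400s
user\t0m3.388s
sys\t0m0.012s
real\t12m1.500s
user\t12m0.000s
sys\t0m0.100s
"""

times = parse_wall_seconds(sample_log)
```

Only the `real` lines are picked up, so `user` and `sys` times never leak into the CSV column.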
data <- read.csv("timings.csv", header = FALSE, col.names = c("t", "n"))
x <- log(data$n*80)
y <- log(data$t/60)
f <- lm(y ~ x)
x0 <- 0:40000
a <- exp(f$coeff[1])
b <- f$coeff[2]
pdf("ClustalwTimings.pdf")
plot(data$n*80, data$t/60, xlab = "length/bp", ylab = "time/minutes",
     main = "CLUSTALW timings on Intel Core2 T7300@2.00GHz, 32bit")
points(x0, a*x0^b, col = "blue", type = "l")
legend("topleft", c("y = (1.8e-9)x^(2.08)"), col = "blue", lty = 1)
dev.off()
[Figure: CLUSTALW timings on Intel Core2 T7300@2.00GHz, 32bit. x axis: length/bp; y axis: time/minutes; fitted curve: y = (1.8e-9)x^(2.08)]
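The fit in the R script is an ordinary least-squares line in log-log space: if t = a * x^b, then log t = log a + b * log x, so the slope of the line is the exponent. A minimal Python sketch of the same idea follows. The data points are synthetic, generated from the fitted curve quoted in the legend rather than measured, and `fit_power_law` is an illustrative helper, not part of the original scripts.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via a line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope of the line = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept of the line = log(a)
    return a, b

# Synthetic data: x values follow the n*80 scaling used in the R script,
# and times are generated from y = 1.8e-9 * x**2.08, NOT real measurements.
lengths = [n * 80 for n in (50, 100, 150, 200, 250, 300, 350, 400, 450)]
minutes = [1.8e-9 * x ** 2.08 for x in lengths]

a, b = fit_power_law(lengths, minutes)
```

An exponent close to 2 is what one would expect from a pairwise dynamic-programming alignment, whose cost grows with the product of the two sequence lengths.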
time bl2seq -p blastn -i G217B_iron.fasta -j Pb01_iron.fasta -e 1e-6 > temp.blastn

real 0m0.342s
user 0m0.080s
sys 0m0.032s
Modified from the INFERNAL User Guide – Nawrocki, Kolbe, and Eddy