Practical Bioinformatics Mark Voorhies 4/3/2018


Mean

    def mean(x):
        s = 0.0
        for i in x:
            s += i
        return s / len(x)

    def mean(x):
        return sum(x) / float(len(x))
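The two definitions return the same value. The `float()` call in the second one matters under Python 2, where `/` between two integers floors the result. A minimal sketch of the difference, written for Python 3 with `//` standing in for Python 2's integer `/` (the sample values are made up):

```python
def mean(x):
    # float() guards against integer division under Python 2
    return sum(x) / float(len(x))

values = [1, 2, 4]                          # illustrative integers
true_mean = mean(values)                    # 7 / 3.0
floored = sum(values) // len(values)        # what Python 2's "/" gives for two ints

print(true_mean)
print(floored)  # 2
```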


Standard Deviation

$$\sigma_x = \sqrt{\frac{\sum_{i}^{N} (x_i - \bar{x})^2}{N - 1}}$$

    def stdev(x):
        m = mean(x)
        s = 0.0
        for i in x:
            s += (i - m) ** 2
        return (s / (len(x) - 1)) ** .5
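One way to sanity-check the function above is to compare it with `statistics.stdev` from the Python 3 standard library, which also uses the N − 1 denominator. The data values below are made up for illustration:

```python
import statistics

def mean(x):
    return sum(x) / float(len(x))

def stdev(x):
    # Sample standard deviation (N - 1 in the denominator)
    m = mean(x)
    s = 0.0
    for i in x:
        s += (i - m) ** 2
    return (s / (len(x) - 1)) ** .5

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
ours = stdev(data)
reference = statistics.stdev(data)
print(abs(ours - reference) < 1e-12)  # True
```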


Pearson's Correlation Coefficient

$$r(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

    def pearson(x, y):
        mx = mean(x)
        my = mean(y)
        sxy = 0.0
        ssx = 0.0
        ssy = 0.0
        for i, j in zip(x, y):
            dx = i - mx
            dy = j - my
            sxy += dx * dy
            ssx += dx ** 2
            ssy += dy ** 2
        return sxy / ((ssx * ssy) ** .5)
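A quick check of the function on made-up vectors: a perfectly increasing linear relationship should give r = 1, a perfectly decreasing one r = −1, and a noisier pairing something in between.

```python
def mean(x):
    return sum(x) / float(len(x))

def pearson(x, y):
    mx = mean(x)
    my = mean(y)
    sxy = ssx = ssy = 0.0
    for i, j in zip(x, y):
        dx = i - mx
        dy = j - my
        sxy += dx * dy
        ssx += dx ** 2
        ssy += dy ** 2
    return sxy / ((ssx * ssy) ** .5)

x = [1, 2, 3, 4, 5]                  # illustrative values
r_up = pearson(x, [2 * v + 1 for v in x])   # y rises linearly with x
r_down = pearson(x, [-v for v in x])        # y falls linearly with x
r_noisy = pearson(x, [1, 3, 2, 5, 4])       # same trend, shuffled a little

print(r_up, r_down, r_noisy)
```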

[T]he relational graphic – in its barest form, the scatterplot and its variants – is the greatest of all graphical designs. It links at least two variables, encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables.
–Edward Tufte

Collections of objects

    # A list is a mutable sequence of objects
    my_list = [1, 3.1415926535, "GATACA", 4, 5]
    # Indexing
    my_list[0] == 1
    my_list[-1] == 5
    # Assigning by index
    my_list[0] = "ATG"
    # Slicing
    my_list[1:3] == [3.1415926535, "GATACA"]
    my_list[:2] == ["ATG", 3.1415926535]
    my_list[3:] == [4, 5]
    # Assigning a second name to a list
    also_my_list = my_list
    # Assigning to a copy of a list
    my_other_list = my_list[:]
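The difference between a second name and a copy shows up as soon as the list is modified. A minimal sketch using the same names as the slide:

```python
my_list = [1, 3.1415926535, "GATACA", 4, 5]

also_my_list = my_list       # second name for the same list object
my_other_list = my_list[:]   # new list holding the same elements (shallow copy)

my_list[0] = "ATG"           # modify through the original name

print(also_my_list[0])            # ATG  (the alias sees the change)
print(my_other_list[0])           # 1    (the copy does not)
print(also_my_list is my_list)    # True
print(my_other_list is my_list)   # False
```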

Subject, verb that noun!

    return_value = object.function(parameter, ...)

"Object, do function to parameter"

    file = open("myfile.txt")
    file.read()
    file.readlines()
    for line in file:
    string.split() and string.join()
    file.write()
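The calls listed above can be strung together in a short round trip. This is only a sketch: the file is written to a temporary directory, and its contents (a two-column, tab-delimited table) are invented for the example.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as fp:          # file.write()
    fp.write("gene\tvalue\n")
    fp.write("YFG1\t0.5\n")

with open(path) as fp:               # file.read(): whole file as one string
    text = fp.read()

with open(path) as fp:               # file.readlines(): list of lines
    lines = fp.readlines()

fields = []
with open(path) as fp:               # for line in file: one line at a time
    for line in fp:
        fields.append(line.rstrip("\n").split("\t"))   # string.split()

rejoined = ",".join(fields[1])       # string.join()
print(rejoined)  # YFG1,0.5
```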

Binary files are like genomic DNA

    hexdump -C computers.png

    fp = open("computers.png", "rb")
    fp.read(50)
    fp.close()
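Reading the first bytes in binary mode is often enough to tell file types apart. The sketch below builds two small stand-in files rather than assuming `computers.png` exists. The 8-byte PNG signature is standard, but `looks_binary` is a deliberately crude, hypothetical heuristic (it only looks for a NUL byte and would misclassify UTF-16 text, for example).

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # first 8 bytes of every PNG file

def read_head(path, n=50):
    # Open in binary mode so the bytes come back undecoded
    with open(path, "rb") as fp:
        return fp.read(n)

def looks_binary(head):
    # Crude assumption: plain text rarely contains NUL bytes
    return b"\x00" in head

workdir = tempfile.mkdtemp()
png_path = os.path.join(workdir, "fake.png")    # stand-in, not a real image
txt_path = os.path.join(workdir, "notes.txt")

with open(png_path, "wb") as fp:
    fp.write(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")
with open(txt_path, "wb") as fp:
    fp.write(b"ATGGATACA\n")

png_head = read_head(png_path)
txt_head = read_head(txt_path)
print(png_head.startswith(PNG_SIGNATURE), looks_binary(png_head))  # True True
print(looks_binary(txt_head))                                       # False
```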

Text files are like ORFs

    hexdump -C 3_4_2010.txt

OS X sometimes uses CR newlines

    hexdump -C macfile.txt
    tr '\r' '\n' < macfile.txt > unixfile.txt
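The same byte-for-byte substitution can be done from Python. A minimal sketch operating on an in-memory byte string (the sample content is made up). Like `tr`, it would turn a CRLF pair into two LFs, so it is only appropriate for CR-only files.

```python
def cr_to_lf(data):
    # Replace every CR byte (0x0d) with an LF byte (0x0a)
    return data.replace(b"\r", b"\n")

mac_bytes = b"line one\rline two\r"    # illustrative CR-terminated content
unix_bytes = cr_to_lf(mac_bytes)
print(unix_bytes)  # b'line one\nline two\n'
```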

Windows uses CRLF newlines

    hexdump -C dosfile.txt
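Given the three conventions (LF, CR, CRLF), a small helper can report which one a chunk of bytes uses and normalize it to UNIX newlines. The function names are hypothetical, and a file that mixes conventions is simply reported as CRLF if any CRLF pair is present.

```python
def newline_style(data):
    # Check CRLF first, since it contains both CR and LF
    if b"\r\n" in data:
        return "CRLF"
    if b"\r" in data:
        return "CR"
    if b"\n" in data:
        return "LF"
    return None

def to_unix(data):
    # Collapse CRLF pairs first, then convert any remaining lone CRs
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

dos_bytes = b"a\tb\r\nc\td\r\n"   # illustrative Windows-style content
print(newline_style(dos_bytes))   # CRLF
print(to_unix(dos_bytes))         # b'a\tb\nc\td\n'
```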

supp2data.csv
CSV File

    open("supp2data.csv")

File object, CSV File

    next(open("supp2data.csv"))

single line, File object, CSV File

    open("supp2data.csv").read()

whole file, File object, CSV File

    next(csv.reader(open("supp2data.csv")))

list, reader, File object, CSV File
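To see what the reader layer adds, the sketch below feeds `csv.reader` a made-up two-line table through `io.StringIO` instead of opening `supp2data.csv`. The column names and values are invented. Each call to `next` yields one row as a list of strings, with quoted commas kept inside their cell and an empty trailing cell returned as an empty string.

```python
import csv
import io

# Hypothetical stand-in for the contents of a CSV file
csv_text = 'gene,annotation,cond1,cond2\nYFG1,"kinase, putative",0.5,\n'

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # first row
row = next(reader)      # second row

print(header)  # ['gene', 'annotation', 'cond1', 'cond2']
print(row)     # ['YFG1', 'kinase, putative', '0.5', '']
```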

    next(csv.reader(urlopen("http://example.com/csv")))

list, reader, urllib object, Web service, CSV File
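Under Python 3, `urllib.request.urlopen` returns a stream of bytes while `csv.reader` expects text, so the layering above needs one extra decoding step. A minimal sketch, assuming the service sends UTF-8: the helper names are hypothetical, and an `io.BytesIO` object stands in for the network response so that nothing is actually downloaded.

```python
import csv
import io
from urllib.request import urlopen

def first_row(binary_stream, encoding="utf-8"):
    # Wrap the byte stream so csv.reader receives decoded text.
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    return next(csv.reader(text_stream))

def first_row_from_url(url):
    # Not called here; shows where urlopen would plug in
    with urlopen(url) as response:
        return first_row(response)

# Stand-in for a web response (made-up content, CRLF newlines)
fake_response = io.BytesIO(b"gene,cond1\r\nYFG1,0.5\r\n")
header = first_row(fake_response)
print(header)  # ['gene', 'cond1']
```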

The CDT file format

Minimal CLUSTER input
Cluster3 CDT output
Tab delimited (\t)
UNIX newlines (\n)
Missing values → empty cells
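The three conventions (tabs, UNIX newlines, empty cells for missing values) translate into a few lines of parsing. This sketch covers only a run of numeric cells, not the full CDT layout. The function names and the sample line are hypothetical, and representing a missing value as `None` is one possible choice.

```python
def parse_cell(cell):
    # Missing values are stored as empty cells
    if cell == "":
        return None
    return float(cell)

def parse_values(line):
    # Strip only the newline: a plain strip() would also remove
    # trailing tabs and silently drop empty cells at the end of a row
    cells = line.rstrip("\n").split("\t")
    return [parse_cell(c) for c in cells]

values = parse_values("0.5\t\t-1.2\t\n")   # made-up row of log ratios
print(values)  # [0.5, None, -1.2, None]
```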

Homework

1. Download and install JavaTreeView.
2. Try reading the first few bytes of different files on your computer. Can you distinguish binary files from text files?
3. Create a simple data table in your favorite spreadsheet program and save it in a text format (e.g., save as CSV or tab-delimited text from Excel [1]). Practice reading the data from Python.
4. Write a function to dissect supp2data.cdt into three lists of strings (gene names, gene annotations, and experimental conditions) and one matrix (list of lists) of log ratio values (as floats, using None or 0. to represent missing values).
5. If you are familiar with Python classes, write a CDT class based on the parse in the previous exercise. Provide methods for looking up annotations and log ratios by gene name.

[1] Note for Mac users: Excel will offer you Macintosh and DOS/Windows text formats. Choose DOS/Windows; otherwise, Python will think that the entire file is a single line.
