Systematic Annotation, Mark Voorhies, 4/5/2012


slide-1
SLIDE 1

Systematic Annotation

Mark Voorhies 4/5/2012

Mark Voorhies Systematic Annotation

slide-2
SLIDE 2

Review

RTFM PNAS 95:14863


slide-3
SLIDE 3

The Gene Ontology

Three directed acyclic graphs (aspects):
  • Biological Process
  • Molecular Function
  • Cellular Component
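
Because each aspect is a directed acyclic graph rather than a tree, a term can have more than one parent, and a gene annotated to a term is implicitly annotated to every ancestor of that term. The sketch below walks such a graph to collect ancestors. The term names and the `PARENTS` dictionary are made-up toy data for illustration, not real GO identifiers.

```python
# Toy DAG: each term maps to its direct parents.
# All names here are hypothetical, not real GO terms or IDs.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "sugar metabolism": ["metabolism"],
    "sugar transport": ["transport"],
    # Two parents: this is what makes the structure a DAG, not a tree.
    "sugar import for catabolism": ["sugar metabolism", "sugar transport"],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("sugar import for catabolism")))
```

A gene annotated to the most specific toy term would therefore also count toward both "metabolism" and "transport" when tallying genes per term.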


slide-4
SLIDE 4

The Gene Ontology


slide-5
SLIDE 5

The Gene Ontology


slide-6
SLIDE 6

The AmiGO browser


slide-7
SLIDE 7

The Gene Ontology

How might we annotate genes with GO terms? How do we calculate the significance of the GO terms associated with a particular group of genes?

slide-9
SLIDE 9

Associating GO terms

How might we annotate genes with GO terms?
  • By sequence homology (e.g., BLAST)
  • By domain homology (e.g., InterProScan)
  • Mapping from an annotated relative (e.g., INPARANOID)
  • Human curation of the literature (e.g., SGD)


slide-10
SLIDE 10

Associating GO terms: Evidence codes

Experimental
  • EXP: Inferred from Experiment
  • IDA: Inferred from Direct Assay
  • IPI: Inferred from Physical Interaction
  • IMP: Inferred from Mutant Phenotype
  • IGI: Inferred from Genetic Interaction
  • IEP: Inferred from Expression Pattern

Computational Analysis
  • ISS: Inferred from Sequence or Structural Similarity
  • ISO: Inferred from Sequence Orthology
  • ISA: Inferred from Sequence Alignment
  • ISM: Inferred from Sequence Model
  • IGC: Inferred from Genomic Context
  • RCA: Inferred from Reviewed Computational Analysis

Author Statement
  • TAS: Traceable Author Statement
  • NAS: Non-traceable Author Statement

Curator Statement Evidence Codes
  • IC: Inferred by Curator
  • ND: No biological Data available

Automatically-assigned
  • IEA: Inferred from Electronic Annotation

Obsolete
  • NR: Not Recorded
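
One practical use of evidence codes is to filter annotations by how they were obtained, for example keeping only experimentally supported ones. The sketch below groups the codes as listed on this slide and filters a list of annotations. The `(gene, term, code)` tuple layout and the example genes and terms are assumptions for illustration, not a real annotation file format.

```python
# Evidence codes grouped as on the slide.
EVIDENCE_GROUPS = {
    "Experimental": ["EXP", "IDA", "IPI", "IMP", "IGI", "IEP"],
    "Computational Analysis": ["ISS", "ISO", "ISA", "ISM", "IGC", "RCA"],
    "Author Statement": ["TAS", "NAS"],
    "Curator Statement": ["IC", "ND"],
    "Automatically-assigned": ["IEA"],
    "Obsolete": ["NR"],
}

# Reverse lookup: evidence code -> group name.
CODE_TO_GROUP = {
    code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes
}


def filter_annotations(annotations, allowed_groups):
    """Keep (gene, term, code) tuples whose code falls in an allowed group."""
    return [a for a in annotations if CODE_TO_GROUP.get(a[2]) in allowed_groups]


# Hypothetical annotations, for illustration only.
annotations = [
    ("geneA", "term1", "IDA"),
    ("geneB", "term1", "IEA"),
    ("geneC", "term2", "ISS"),
]
curated = filter_annotations(annotations, {"Experimental"})
print(curated)
```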

slide-11
SLIDE 11

The Gene Ontology

How might we annotate genes with GO terms? How do we calculate the significance of the GO terms associated with a particular group of genes?

slide-15
SLIDE 15

Sampling with replacement: Mutagenesis

How many transformants do we have to screen in order to “cover” a genome?

Probability that a transformant has one disrupted gene: $p_m$
Number of genes in the organism: $N_g$

Probability that a specific gene is disrupted in a specific transformant:

\[ p_d = p_m \left(\frac{1}{N_g}\right) = \frac{p_m}{N_g} \tag{1} \]

Probability of not disrupting that gene:

\[ p_u = 1 - \frac{p_m}{N_g} \tag{2} \]

slide-19
SLIDE 19

Sampling with replacement: Mutagenesis

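Probability of not disrupting that gene:

\[ p_u = 1 - \frac{p_m}{N_g} \tag{3} \]

The probability of not disrupting that gene in any of n independent transformants is:

\[ p_{u,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{4} \]

And the probability of disrupting that gene at least once in n independent transformants is:

\[ p_{d,n} = 1 - p_{u,n} = 1 - \left(1 - \frac{p_m}{N_g}\right)^n \tag{5} \]

This is also the expected genome coverage, i.e., the expected fraction of genes disrupted at least once.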
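
The coverage formula can be evaluated directly, and inverted to answer the opening question of how many transformants to screen. The sketch below does both. The values `P_M = 1.0` and `N_GENES = 6000` are illustrative assumptions, not numbers taken from the slides.

```python
import math


def coverage(n, p_m, n_genes):
    """Expected genome coverage after n transformants: 1 - (1 - p_m/N_g)^n."""
    return 1.0 - (1.0 - p_m / n_genes) ** n


def transformants_needed(target, p_m, n_genes):
    """Smallest n whose expected coverage is at least `target` (0 < target < 1).

    Solves 1 - p_u^n >= target for n, with p_u = 1 - p_m/N_g.
    """
    p_u = 1.0 - p_m / n_genes
    return math.ceil(math.log(1.0 - target) / math.log(p_u))


# Assumed example values (not from the slides).
P_M = 1.0       # every transformant carries one disruption
N_GENES = 6000  # a genome of roughly this many genes

n95 = transformants_needed(0.95, P_M, N_GENES)
print(n95, coverage(n95, P_M, N_GENES))
```

Because the per-gene miss probability shrinks geometrically, each additional increment of coverage costs more transformants than the last, so the required n grows quickly as the target approaches 1.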


slide-20
SLIDE 20

Sampling with replacement: Mutagenesis

[Figure: p_i (expected coverage, 0.0 to 1.0) plotted against n (number of transformants, up to 200,000)]
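
The parameters behind the plotted curve are not recoverable from the slide, but the formula can be checked against a simple simulation. The sketch below draws disrupted genes uniformly with replacement and compares the observed fraction of genes hit to $1 - (1 - p_m/N_g)^n$. The values of `P_M` and `N_GENES` and the random seed are arbitrary assumptions chosen to keep the run small.

```python
import random


def simulate_coverage(n, p_m, n_genes, rng):
    """Fraction of genes hit at least once among n simulated transformants."""
    hit = set()
    for _ in range(n):
        # With probability p_m the transformant carries a disruption,
        # which lands in a uniformly chosen gene (sampling with replacement).
        if rng.random() < p_m:
            hit.add(rng.randrange(n_genes))
    return len(hit) / n_genes


rng = random.Random(0)        # fixed seed so the run is repeatable
P_M, N_GENES = 0.8, 2000      # assumed values, not from the slides

for n in (1000, 5000, 10000):
    expected = 1.0 - (1.0 - P_M / N_GENES) ** n
    observed = simulate_coverage(n, P_M, N_GENES, rng)
    print(n, round(expected, 3), round(observed, 3))
```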

slide-25
SLIDE 25

Sampling with replacement: General Cases

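Calculating the probability of zero events was easy:

\[ p_{0,n} = \left(1 - \frac{p_m}{N_g}\right)^n \tag{6} \]

What about exactly k events? Binomial distribution:

\[ p_{k,n} = \binom{n}{k} p_m^k (1 - p_m)^{n-k} \tag{7} \]

In Eq. 7, $p_m$ stands for the per-trial probability of the event; with per-trial probability $p_m/N_g$ and $k = 0$ it reduces to Eq. 6.

What if there is more than one type of event? Multinomial distribution:

\[ p_{k_1,k_2,\ldots,n} = \frac{n!}{\prod_i k_i!} \prod_i p_i^{k_i} \tag{8} \]

with $\sum_i k_i = n$ and $\sum_i p_i = 1$.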
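
Both distributions are short to compute with the standard library. The sketch below is one way to write them; the function names and the example numbers are illustrative choices, not part of the original slides.

```python
from math import comb, factorial, prod


def binomial_pmf(k, n, p):
    """P(exactly k events in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def multinomial_pmf(counts, probs):
    """P(observing counts[i] events of type i), where probs[i] sum to 1."""
    n = sum(counts)
    # Multinomial coefficient n! / (k_1! k_2! ...), kept as an exact integer.
    coefficient = factorial(n)
    for k in counts:
        coefficient //= factorial(k)
    return coefficient * prod(p**k for p, k in zip(probs, counts))


# Zero events in 10 trials with per-trial probability 0.1, i.e. 0.9**10.
print(binomial_pmf(0, 10, 0.1))

# With only two event types the multinomial reduces to the binomial.
print(multinomial_pmf([3, 7], [0.2, 0.8]), binomial_pmf(3, 10, 0.2))
```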

slide-29
SLIDE 29

Sampling without replacement: GO Annotation

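The binomial distribution assumes that event probabilities are constant:

\[ p_{k,n} = \binom{n}{k} p_m^k (1 - p_m)^{n-k} \tag{9} \]

What if there are m virulence factors in our genome, and every time we discover one it is magically removed from our library? Hypergeometric distribution:

\[ p_{k,m,n} = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}} \tag{10} \]

where N is the total number of genes, n is the number drawn, and k is the number of virulence factors among those drawn.

More than one disjoint type of label:

\[ p_{k_1,k_2,\ldots,m_1,m_2,\ldots,n} = \frac{\prod_i \binom{m_i}{k_i}}{\binom{N}{n}} \tag{11} \]

with $\sum_i m_i = N$ and $\sum_i k_i = n$.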
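
The hypergeometric formula is what answers the earlier question about the significance of GO terms in a group of genes: the p-value for enrichment is the probability of seeing at least k labelled genes by chance. The sketch below sums the upper tail of the distribution. The function names and the example counts (`N = 6000`, `m = 100`, `n = 50`, `k = 5`) are assumptions for illustration, not data from the slides.

```python
from math import comb


def hypergeom_pmf(k, m, n, N):
    """P(exactly k labelled genes in n draws without replacement).

    N is the total number of genes and m is how many of them carry the label.
    """
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def enrichment_p_value(k, m, n, N):
    """P(at least k labelled genes in the sample): the upper tail."""
    return sum(hypergeom_pmf(j, m, n, N) for j in range(k, min(m, n) + 1))


# Assumed example: 6000 genes, 100 carry the term, a cluster of 50 genes
# contains 5 of them. By chance we would expect 50 * 100 / 6000, under one.
print(enrichment_p_value(5, 100, 50, 6000))
```

Summing the tail rather than reporting the single term for exactly k is the usual choice, since any count of k or more would be at least as surprising.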


slide-30
SLIDE 30

Extracting gene lists from JavaTreeView


slide-31
SLIDE 31

The SGD GO Slim Mapper


slide-32
SLIDE 32

Multiple Hypothesis Testing

http://xkcd.com/882/
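
The linked comic tests 20 jelly bean colors at p < 0.05 and finds one "significant" result. The sketch below computes how likely at least one false positive is in that situation, and what happens under a Bonferroni-adjusted threshold. Bonferroni is shown only as one standard correction; the slide does not say which correction was presented, and the tests are assumed to be independent.

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) across n independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


ALPHA, N_TESTS = 0.05, 20  # the setup in the comic

uncorrected = familywise_error_rate(ALPHA, N_TESTS)
corrected = familywise_error_rate(bonferroni_threshold(ALPHA, N_TESTS), N_TESTS)
print(uncorrected, corrected)
```

The same concern applies when testing every GO term against a gene list, since each term is a separate hypothesis.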

slide-35
SLIDE 35

Alternatives to Hierarchical Clustering

  • GORDER and pre-clustering by SOM
  • Pre-calling number of clusters: k-means and k-medians
  • Principal Component Analysis (PCA)


slide-36
SLIDE 36

Homework

Download PyMol
