P. falciparum: Examination of Correlation Between Spatial Location and Temporal Expression of Genes


CAMDA Conference, 11 November 2004. JB Christian, C Shaw, J Noyola-Martinez, MC Gustin, DW Scott and R Guerra.


SLIDE 1
P. falciparum: Examination of Correlation Between Spatial Location and Temporal Expression of Genes

CAMDA Conference, 11 November 2004

JB Christian, C Shaw, J Noyola-Martinez, MC Gustin, DW Scott and R Guerra

SLIDE 2

Motivations:

  • Evidence for correlation in literature
      – Printing artifact
      – Biological
  • Improving on Bozdech threshold
  • Develop a visualization and statistical testing methodology

SLIDE 3

Biological Motivations

[Figure: three schematics of regulation of neighboring genes.
  • Operon control (bacteria): promoter, ORF1, ORF2, mRNA
  • Upstream Activating Sequences (yeast): UAS1, UAS2, ORF1, ORF2, mRNAs
  • Locus Control Region (mammalian globin cluster): LCR1, ORF1, ORF2, mRNAs]

SLIDE 4

Hypothesis and Statistic

  • Statistical: Correlation between chromosomal location and gene expression?
  • Biological: Gene order random?
  • H0: no correlation between location on chromosome and expression
  • Consider correlations in partitions
SLIDE 5

Approach

  • Covariogram: General Tool
  • Partition Chromosome, Develop Statistic
  • Permutation Testing Framework
  • Check for Confounding Factors
  • Biological Significance

SLIDE 6

Issues

  • Confounding (printing) or other artifacts
  • Account for inter-gene distances (as opposed to adjacent pairwise correlation)
  • Significance of correlation
  • Operon
SLIDE 7

Methods: Data

  • Need gene information (plasmodb.org has annotated fastA files):

TCAAGCAATTGTTAGATGAGAACAATAGGAAGAATTTAAATTTTAATGAT
CTGGTTATACACCCTTGGTGGTCTTATAAGAATTAA
>Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) hypothetical protein Location=join(124752..124823,124961..125719)
ATGATATTTCATAAATGCTTTAAAATTTGTTCGCTCTCTTGTACTGTTTT
ATGGGTTACCGCCATATCATCGATCATTCAACCAGACAAACAACAAGAAA

  • Normalized gpr files (2-D loess, centered and scaled)
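
The header line in the fastA excerpt carries the chromosome, gene name and coordinates needed later for distances. A minimal sketch of pulling those fields out, assuming every header follows the pipe-delimited layout of the one example shown; the function name `parse_header` and the choice of midpoint as (start + end) / 2 are illustrative assumptions, not taken from the slides:

```python
import re

def parse_header(header: str) -> dict:
    """Parse one annotated fastA header into gene name, chromosome and span."""
    # Assumed layout: >source|chromosome|gene|... Location=...
    fields = header.lstrip(">").split("|")
    chrom, gene = fields[1], fields[2]
    # Collect every start..end pair, whether or not it sits inside join(...).
    location = header.split("Location=", 1)[1]
    exons = [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]
    start = min(a for a, _ in exons)
    end = max(b for _, b in exons)
    return {"gene": gene, "chrom": chrom, "start": start, "end": end,
            "midpoint": (start + end) / 2}

header = (">Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) "
          "hypothetical protein Location=join(124752..124823,124961..125719)")
record = parse_header(header)
print(record)
```

Taking the outermost coordinates of the join gives the 124752:125719 span that the next slide lists for PFA0135w.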

SLIDE 8

Methods: Data

  • FastA sequence: 5400 predicted genes
  • QC Microarray: 3800 genes, 5100 probes
  • Intersection: 3500 genes with common gene name

Example of matching on gene name:

  FastA:      PFA0135w 124752:125719 bp
  Microarray: PFA0135w probe a16122_1 t1,t2,…, t48
  Merged:     PFA0135w 124752:125719 bp probe a16122_1 t1,t2,…, t48
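
The intersection step can be pictured as a join of two lookups keyed by gene name. This is only a sketch: the dictionary layout is an assumption, the two unmatched gene names are made up for illustration, and the expression values are zero placeholders rather than real data.

```python
# Gene spans keyed by gene name, as would come from the fastA headers.
# "GENE_ONLY_IN_FASTA" is a hypothetical entry with no matching probe.
gene_spans = {"PFA0135w": (124752, 125719), "GENE_ONLY_IN_FASTA": (1, 100)}

# Probe records keyed by gene name; 48 time points per probe (t1..t48).
# Expression values are placeholders, and "GENE_ONLY_ON_ARRAY" is hypothetical.
probes = {
    "PFA0135w": {"probe": "a16122_1", "expr": [0.0] * 48},
    "GENE_ONLY_ON_ARRAY": {"probe": "probe_x", "expr": [0.0] * 48},
}

# Keep only genes present in both sources, then merge their records.
common = sorted(gene_spans.keys() & probes.keys())
merged = {g: {"span": gene_spans[g], **probes[g]} for g in common}
print(common, merged["PFA0135w"]["span"], merged["PFA0135w"]["probe"])
```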

SLIDE 9

Methods: Covariograms

$$\gamma(x, y; d_a, d_b) = \mathrm{Ave}\left[\rho(x, y) \mid d_a \le \mathrm{dist}(x, y) < d_b\right]$$

  • Covariogram 1: distance is chromosomal location:

$$d(g_i, g_j) = \left| g_{i,\mathrm{midpt(chr\ loc)}} - g_{j,\mathrm{midpt(chr\ loc)}} \right|$$

  • Covariogram 2: distance is printed microarray location:

$$d(g_i, g_j) = \sqrt{\left(g_{i,x} - g_{j,x}\right)^2 + \left(g_{i,y} - g_{j,y}\right)^2}$$
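
The covariogram definition translates into a short computation: take all pairwise Pearson correlations between expression profiles, then average those whose distance falls in each bin [d_a, d_b). The sketch below uses NumPy; the function names, the `genes x time points` array layout and the toy profiles are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def chromosomal_distance(midpoints):
    """Covariogram 1 distance: absolute difference of gene midpoints."""
    m = np.asarray(midpoints, dtype=float)
    return np.abs(m[:, None] - m[None, :])

def array_distance(xy):
    """Covariogram 2 distance: Euclidean distance between printed spot positions."""
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def covariogram(expr, dist, bin_edges):
    """Average pairwise Pearson correlation within each distance bin [d_a, d_b)."""
    corr = np.corrcoef(expr)                   # genes x genes correlation matrix
    iu = np.triu_indices(expr.shape[0], k=1)   # each unordered gene pair once
    r, d = corr[iu], dist[iu]
    out = []
    for d_a, d_b in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (d >= d_a) & (d < d_b)
        out.append(r[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(out)

# Toy profiles (not real data): genes 0 and 1 rise together, gene 2 falls.
t = np.arange(48, dtype=float)
expr_demo = np.vstack([t, 2 * t + 1, -t])
dist_demo = chromosomal_distance([0, 10, 100])
gamma = covariogram(expr_demo, dist_demo, [0, 50, 200])
print(gamma)
```

Passing `array_distance(...)` instead of `chromosomal_distance(...)` gives the second covariogram, which is how a printing artifact would be separated from a chromosomal effect.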

SLIDE 10

[Figure: four covariogram panels. Chr 6: Covariogram 1; Chr 6: Covariogram 2; Chr 10: Covariogram 1; Chr 10: Covariogram 2]

SLIDE 11

Methods: Partitioning

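  • Partition
  • Avg of all pairwise Pearson correlations

[Figure: chromosome axis marked at 0 kb, 60 kb and 120 kb, with two example partitions]

7 genes, $\binom{7}{2}$ pairwise correlations:

$$\bar{r}_1 = \frac{1}{21} \sum_{i=1}^{21} r_i$$

3 genes, $\binom{3}{2}$ pairwise correlations:

$$\bar{r}_2 = \frac{1}{3} \sum_{i=1}^{3} r_i$$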
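
A minimal sketch of the partition statistic, assuming genes are assigned to a partition by their midpoint and that partitions are non-overlapping and start at 0; the function names and toy profiles are illustrative only.

```python
import numpy as np

def mean_pairwise_corr(expr):
    """Average of all pairwise Pearson correlations among the rows of expr."""
    n = expr.shape[0]
    if n < 2:
        return np.nan                       # no pairs to correlate
    corr = np.corrcoef(expr)
    return corr[np.triu_indices(n, k=1)].mean()

def partition_stats(midpoints, expr, size_bp, chrom_length):
    """Return (start, n_genes, r_bar) for each partition of width size_bp."""
    midpoints = np.asarray(midpoints)
    stats = []
    for start in range(0, chrom_length, size_bp):
        members = (midpoints >= start) & (midpoints < start + size_bp)
        stats.append((start, int(members.sum()), mean_pairwise_corr(expr[members])))
    return stats

# Toy profiles (not real data): genes 0 and 1 rise together, gene 2 falls.
t = np.arange(48, dtype=float)
expr_demo = np.vstack([t, 2 * t + 1, -t])
stats = partition_stats([10_000, 20_000, 70_000], expr_demo, 60_000, 120_000)
for start, n_genes, r_bar in stats:
    print(start, n_genes, r_bar)
```

A partition with fewer than two genes has no pairwise correlations, so the sketch reports NaN there rather than a number.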

SLIDE 12

Methods: Partitioning

  • Chr 6, 40 kb partition
  • Significant?
SLIDE 13

Methods: Permutation Test

  • $\bar{r}$ in a 40kb interval on chr 6
  • Permutation test
  • Null distribution
  • Estimated p-values

[Figure: permutation scheme. Header row: genes g1, g2, g3, g4. Rows Obs, Perm(1), Perm(2), …, Perm(n) each list the expression profiles e1, e2, e3, e4 assigned to those genes. Observed $\bar{r} = .50$]
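
One way to sketch the permutation test in code. Several choices here are assumptions, since the slides do not spell them out: each permutation reassigns expression profiles to gene positions across the whole chromosome, the test is one-sided (large $\bar{r}$ is extreme), and the p-value uses the common (1 + count) / (1 + n) estimate. The data are synthetic.

```python
import numpy as np

def mean_pairwise_corr(expr):
    """Average of all pairwise Pearson correlations among the rows of expr."""
    n = expr.shape[0]
    if n < 2:
        return np.nan
    return np.corrcoef(expr)[np.triu_indices(n, k=1)].mean()

def permutation_pvalue(expr, in_interval, n_perm=1000, seed=0):
    """Compare the observed r_bar in an interval with a permutation null."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_corr(expr[in_interval])
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Assumption: shuffle which profile sits at which gene position,
        # then recompute r_bar for the positions inside the interval.
        shuffled = expr[rng.permutation(expr.shape[0])]
        null[b] = mean_pairwise_corr(shuffled[in_interval])
    # One-sided estimate; the +1 keeps the estimate away from exactly zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p

# Synthetic chromosome: 20 genes, 48 time points; the first 4 share a signal.
rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 48)
noise = rng.normal(size=(20, 48))
expr_demo = noise.copy()
expr_demo[:4] = np.sin(t) + 0.1 * noise[:4]
in_interval = np.zeros(20, dtype=bool)
in_interval[:4] = True

observed, null, p = permutation_pvalue(expr_demo, in_interval, n_perm=500)
print(round(float(observed), 3), p)
```

The `null` array is what a histogram titled "Distribution of $\bar{r}$ in 40 kb interval" would display, with the observed value marked against it.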

SLIDE 14

Methods: Permutation Test

  • Distribution of $\bar{r}$ in 40 kb interval
  • Obs: $\bar{r} = .57$, $n_{\text{genes}} = 2$, p-val = .22

SLIDE 15

Methods: Permutation Test

  • Distribution of $\bar{r}$ in 40 kb interval
  • Obs: $\bar{r} = .72$, $n_{\text{genes}} = 6$, p-val ≤ .001

SLIDE 16

Methods: Permutation Test

  • Distribution of $\bar{r}$ in 40 kb interval
  • Obs: $\bar{r} = .49$, $n_{\text{genes}} = 9$, p-val = .002

SLIDE 17

Methods: Permutation Test

  • Distribution of $\bar{r}$ in 40 kb interval
  • Obs: $\bar{r} = .018$, $n_{\text{genes}} = 12$, p-val = .475

SLIDE 18

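Significant Intervals (Chr 7)

[Figure: significant intervals on Chr 7, labeled by partition size: 10kb, 20kb, 40kb, 60kb, 80kb, 100kb]

SLIDE 19

Significant Intervals (Chr 7)

[Figure: significant intervals on Chr 7, labeled by partition size: 10kb, 20kb, 40kb, 60kb, 80kb, 100kb]

SLIDE 20

Significant Intervals (Chr 7)

[Figure: significant intervals on Chr 7, labeled by partition size: 10kb, 20kb, 40kb, 60kb, 80kb, 100kb]

SLIDE 21

[Figure: untitled plot with labels 10kb, 20kb, 40kb, 60kb, 80kb, 100kb]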

SLIDE 22

  • MAL6P1.273: hypothetical protein
  • MAL6P1.272: ribonuclease
  • MAL6P1.271: cdc2-like protein kinase
  • MAL6P1.268: hypothetical protein
  • MAL6P1.267: hypothetical protein
  • MAL6P1.266: hypothetical protein
  • MAL6P1.265: pyridoxine kinase
  • MAL6P1.263: hypothetical protein
  • MAL6P1.260: hypothetical protein
  • MAL6P1.259: hypothetical protein
  • MAL6P1.258: malate:quinone oxidoreductase
  • MAL6P1.257: hypothetical protein

SLIDE 23

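Intervals (Chr 6)

| Start Loc | Size kb | End kb | Start kb | ngenes | Avg Cor | p-val |
|---|---|---|---|---|---|---|
|       | 10  | 560   | 550   | 3  | 0.86 | 0.003 |
| 10000 | 20  | 570   | 550   | 3  | 0.86 | 0.004 |
| 2500  | 10  | 562.5 | 552.5 | 3  | 0.86 | 0.002 |
| 75000 | 100 | 775   | 675   | 14 | 0.27 | 0.003 |
| 30000 | 60  | 750   | 690   | 10 | 0.44 | 0.003 |
| 30000 | 40  | 750   | 710   | 9  | 0.39 | 0.004 |
| 10000 | 40  | 970   | 930   | 5  | 0.76 | 0.001 |
| 30000 | 60  | 990   | 930   | 8  | 0.57 | 0.001 |
| 15000 | 20  | 955   | 935   | 2  | 0.96 |       |
| 20000 | 40  | 980   | 940   | 5  | 0.64 | 0.002 |
| 60000 | 80  | 1020  | 940   | 11 | 0.39 | 0.003 |
| 5000  | 20  | 965   | 945   | 4  | 0.76 | 0.002 |
| 45000 | 60  | 1005  | 945   | 9  | 0.51 |       |
| 10000 | 20  | 970   | 950   | 4  | 0.76 | 0.002 |
| 5000  | 10  | 965   | 955   | 3  | 0.87 | 0.004 |
| 7500  | 10  | 967.5 | 957.5 | 3  | 0.87 | 0.002 |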

SLIDE 24

Results: Summary Table

| Chromosome | 10kb | 60kb | 100kb | 10kb in 60kb |
|---|---|---|---|---|
| Chr 3  | 3/400  | 0/68  | 0/40  |   |
| Chr 4  | 10/476 | 5/80  | 2/48  | 4 |
| Chr 5  | 6/528  | 1/88  | 3/56  |   |
| Chr 14 | 4/1304 | 2/220 | 1/132 |   |

SLIDE 25

Conclusions

  • Statistical: Significance for both small regions of strong correlation and large regions of weak correlation
  • Biological: Evidence for regulation at multiple levels