Assessing Effect of Cross-Hybridization on Oligonucleotide Microarrays



slide-1
SLIDE 1

Assessing Effect of Cross-Hybridization on Oligonucleotide Microarrays

  • S. Kachalo, J. Liang
  • Dept. of Bioengineering, University of Illinois at Chicago

slide-2
SLIDE 2

Abstract

A prediction method to assess non-specific binding based on sequence similarity between probe and target would aid in understanding and interpreting global expression profile analysis. In this work we consider a linear hybridization model and estimate the binding coefficients using quadratic programming. We demonstrate that the estimated binding coefficients are correlated with the similarity of nucleotide sequences between probes and targets. We show that cross-hybridization can be detected for probes that share a similarity of 7 or more nucleotides with the target. We introduce a binding-patterns technique for predicting the binding coefficients. Our results suggest that further development based on nucleotide sequence can be fruitful.
slide-3
SLIDE 3

Data set

Transcripts: (1) 37777_at, (2) 684_at, (3) 1597_at, (4) 38734_at, (5) 39058_at, (6) 36311_at, (7) 36889_at, (8) 1024_at, (9) 36202_at, (10) 36085_at, (11) 40322_at, (12) 407_at, (13) 1091_at, (14) 1708_at

Expts         Concentrations
A             0.25 0.5 1 2 4 8 16 32 64 128 512 1024
B             0.25 0.5 1 2 4 8 16 32 64 128 256 0.25 1024
C             0.5 1 2 4 8 16 32 64 128 256 512 0.5 0.25
D             1 2 4 8 16 32 64 128 256 512 1024 1 0.25 0.5
E             2 4 8 16 32 64 128 256 512 1024 2 0.5 1
F             4 8 16 32 64 128 256 512 1024 0.25 4 1 2
G             8 16 32 64 128 256 512 1024 0.25 0.5 8 2 4
H             16 32 64 128 256 512 1024 0.25 0.5 1 16 4 8
I             32 64 128 256 512 1024 0.25 0.5 1 2 32 8 16
J             64 128 256 512 1024 0.25 0.5 1 2 4 64 16 32
K             128 256 512 1024 0.25 0.5 1 2 4 8 128 32 64
L             256 512 1024 0.25 0.5 1 2 4 8 16 256 64 128
M, N, O, P    512 1024 0.25 0.5 1 2 4 8 16 32 512 128 256
Q, R, S, T    1024 0.25 0.5 1 2 4 8 16 32 64 1024 256 512

Human portion of the Affymetrix Latin Square data set: 59 chips × 409,600 probes; 14 targets with known concentrations plus an unknown complex target, in 3 groups of experiments

slide-4
SLIDE 4

Common assumptions

  • The main contribution to PM or MM probe intensity is made by its specific target.
  • Non-specific target binding is about equal for PM and MM probes.

Unfortunately, that is not always true…

slide-5
SLIDE 5

DNA binding model

Association rate: $R^{(+)} \sim X \, N_{\text{unoccupied}}$

Dissociation rate: $R^{(-)} \sim N_{\text{bound}}$

Equilibrium: $R^{(+)} = R^{(-)}$, so $N_{\text{bound}} / N_{\text{unoccupied}} \sim X$

$X$ - target concentration

slide-6
SLIDE 6

Linear and nonlinear dependency

[Plot: probe 1597_at [416:507]; x-axis: concentration]

Linear dependency ($N_{\text{unoccupied}} \gg N_{\text{bound}}$): $N_{\text{unoccupied}} \approx \text{const}$, $N_{\text{bound}} \sim X$

Saturation ($N_{\text{unoccupied}} \ll N_{\text{bound}}$): $N_{\text{bound}} \approx \text{const}$, $N_{\text{unoccupied}} \sim 1/X$

Dependency is linear if target concentration is low
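The linear and saturated regimes can be illustrated with a small numerical sketch. It assumes a fixed total number of probe sites, `n_total`, and a proportionality constant `k` such that N_bound / N_unoccupied = k·X. Both names and their values are hypothetical and do not come from the slides.

```python
def bound_sites(x, k=0.001, n_total=1000.0):
    """Number of bound probe sites at equilibrium.

    Assumes N_bound / N_unoccupied = k * x and
    N_bound + N_unoccupied = n_total (k and n_total are hypothetical).
    """
    return n_total * k * x / (1.0 + k * x)


# Low concentration: doubling x roughly doubles the bound amount (linear regime).
low_ratio = bound_sites(2.0) / bound_sites(1.0)

# High concentration: doubling x barely changes the bound amount (saturation).
high_ratio = bound_sites(2.0e6) / bound_sites(1.0e6)

print(f"low-concentration ratio:  {low_ratio:.3f}")
print(f"high-concentration ratio: {high_ratio:.3f}")
```

Under these assumptions the first ratio stays close to 2 and the second close to 1, which is the behaviour the slide describes for low and high target concentrations.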

slide-7
SLIDE 7

Distribution of dependencies

[Plots: intensity vs. concentration for probes 36889_at [327:417], 684_at [527:198], and 684_at [517:489]]

PM: 6 11 11 196
MM: 26 36 28 134

Most probes show a typical concentration-intensity curve: linear at low concentrations and nonlinear at higher concentrations

slide-8
SLIDE 8

Linear model

$Y_{ik} = \sum_j B_{ij} X_{jk} + \varepsilon_{ik}$

  • $X_{jk} \ge 0$: concentration of j-th target in k-th experiment;
  • $Y_{ik} \ge 0$: signal intensity of i-th probe in k-th experiment;
  • $B_{ij}$: binding coefficient for i-th probe and j-th target;
  • $\varepsilon_{ik}$: random noise.

Knowledge of binding coefficients can reduce calculation of target concentrations to a simple linear algebra problem!
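A minimal sketch of that linear algebra step, for one experiment: given a known matrix of binding coefficients, the target concentrations are estimated by ordinary least squares. The function name, the example matrix and the concentrations are hypothetical, the matrix is assumed to have full column rank, and no non-negativity constraint is enforced here.

```python
def solve_concentrations(B, y):
    """Least-squares estimate of target concentrations x from y ≈ B x.

    B[i][j] is the binding coefficient of probe i for target j and
    y[i] is the intensity of probe i in a single experiment.
    Solves the normal equations (B^T B) x = B^T y by Gaussian elimination.
    """
    n_probes, n_targets = len(B), len(B[0])
    # Build the augmented matrix [B^T B | B^T y].
    A = [
        [sum(B[i][r] * B[i][c] for i in range(n_probes)) for c in range(n_targets)]
        + [sum(B[i][r] * y[i] for i in range(n_probes))]
        for r in range(n_targets)
    ]
    # Forward elimination with partial pivoting.
    for col in range(n_targets):
        pivot = max(range(col, n_targets), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n_targets):
            factor = A[r][col] / A[col][col]
            for c in range(col, n_targets + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    x = [0.0] * n_targets
    for r in range(n_targets - 1, -1, -1):
        tail = sum(A[r][c] * x[c] for c in range(r + 1, n_targets))
        x[r] = (A[r][n_targets] - tail) / A[r][r]
    return x


# Hypothetical example: 3 probes, 2 targets; the third probe binds both targets.
B = [[2.0, 0.0],
     [0.0, 3.0],
     [1.0, 0.5]]
true_x = [4.0, 8.0]
y = [sum(b * x for b, x in zip(row, true_x)) for row in B]  # noise-free intensities
estimated_x = solve_concentrations(B, y)
print(estimated_x)
```

With noise-free intensities the estimate reproduces the concentrations used to generate them; with real data the cross-hybridizing probe contributes to the estimate instead of being treated as pure background.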

slide-9
SLIDE 9

Calculating binding coefficients

[Plot: error ratio by probe]

For each probe:

$Y_k = \sum_j B_j X_{jk} + \varepsilon_k$

minimize $\sum_k \varepsilon_k^2$ subject to $B_j \ge 0$

  • it's a quadratic programming problem

Random model for comparison: $Y_k = Y + \hat{\varepsilon}_k$

Error ratio: $\sum_k \varepsilon_k^2 / \sum_k \hat{\varepsilon}_k^2$, compared against $1/10$
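The per-probe fit can be sketched as a small non-negative least-squares problem. The solver below uses cyclic coordinate descent, which is only one simple way to solve this quadratic program; the slides do not name a solver. The comparison model is taken here to be a constant equal to the mean intensity, which is an assumption, and all data values are hypothetical.

```python
def fit_binding_coefficients(X, y, n_sweeps=500):
    """Minimize sum_k (y[k] - sum_j b[j] * X[j][k])**2 subject to b[j] >= 0.

    X[j][k] is the concentration of target j in experiment k and y[k] is the
    probe intensity in experiment k. Solved by cyclic coordinate descent
    (the choice of solver is an assumption, not taken from the slides).
    """
    n_targets, n_expts = len(X), len(y)
    b = [0.0] * n_targets
    residual = list(y)  # residual[k] = y[k] - sum_j b[j] * X[j][k]
    for _ in range(n_sweeps):
        for j in range(n_targets):
            norm = sum(v * v for v in X[j])
            if norm == 0.0:
                continue
            # Best unconstrained update for b[j], then clip at zero.
            step = sum(X[j][k] * residual[k] for k in range(n_expts)) / norm
            new_bj = max(0.0, b[j] + step)
            delta = new_bj - b[j]
            for k in range(n_expts):
                residual[k] -= delta * X[j][k]
            b[j] = new_bj
    return b, residual


def error_ratio(y, residual):
    """Squared error of the linear fit relative to a constant-intensity model.

    The constant is assumed to be the mean of y; y must not be constant.
    """
    mean_y = sum(y) / len(y)
    baseline = sum((v - mean_y) ** 2 for v in y)
    return sum(r * r for r in residual) / baseline


# Hypothetical data: 2 targets, 4 experiments, noise-free intensities.
X = [[1.0, 2.0, 4.0, 8.0],
     [8.0, 1.0, 2.0, 4.0]]
y = [3.0 * a + 0.5 * c for a, c in zip(X[0], X[1])]
b, residual = fit_binding_coefficients(X, y)
ratio = error_ratio(y, residual)
print(b, ratio)
```

A small ratio means the linear model explains the probe's intensities much better than a constant does.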

slide-10
SLIDE 10

Results

[Plot; axis label: binding coefficient]

The binding coefficients obtained correlate with sequence similarity measures such as:

  • Longest common substring size
  • Smith-Waterman local alignment score (correlation is over 60%)
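Both similarity measures can be computed with short dynamic programs. The sketch below is illustrative: the scoring values (`match`, `mismatch`, `gap`) and the example sequences are assumptions, and both sequences are assumed to be written in the same orientation.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty.

    The scoring values are assumptions chosen for illustration.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences, not taken from the data set.
probe = "ACGTACGTTAGC"
target = "TTACGTTAGGA"
lcs_len = longest_common_substring(probe, target)
sw_score = smith_waterman_score(probe, target)
print(lcs_len, sw_score)
```

The longest common substring only rewards exact contiguous matches, while the local alignment score also tolerates mismatches and gaps at a cost.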

slide-11
SLIDE 11

Binding patterns contributions

$B = \sum_a n_a C_a + \text{error}$

  • $C_a \ge 0$: contribution of each type of match to the binding coefficient;
  • $n_a$: number of matches of each type.
slide-12
SLIDE 12

Calculating contributions

[Plot: contribution vs. match length]

Quadratic programming problem: minimize total error for all probe-target pairs under conditions:

a) $C_L \ge 0$
b) $C_{L+1} \ge C_L$

  • $C_L$: contribution of a match of length L to the binding coefficient.
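One way to handle both conditions at once is to write each contribution as a running sum of non-negative increments, which turns the problem into plain non-negative least squares. The sketch below takes that route with a simple coordinate-descent solver. The reparametrization, the solver, the function name and the example counts are all assumptions for illustration, not the method used on the slides.

```python
def fit_monotone_contributions(counts, b_obs, n_sweeps=2000):
    """Fit contributions with 0 <= c[0] <= c[1] <= ... by least squares.

    counts[p][l] is the number of matches in the l-th length class for
    probe-target pair p; b_obs[p] is the measured binding coefficient.
    Writes c[l] = d[0] + ... + d[l] with d[m] >= 0 and fits d.
    """
    n_pairs, n_lengths = len(counts), len(counts[0])
    # Column m of the design counts all matches in length class >= m.
    design = [[sum(row[m:]) for row in counts] for m in range(n_lengths)]
    d = [0.0] * n_lengths      # non-negative increments
    residual = list(b_obs)     # residual[p] = b_obs[p] - prediction for pair p
    for _ in range(n_sweeps):
        for m in range(n_lengths):
            norm = sum(v * v for v in design[m])
            if norm == 0.0:
                continue
            step = sum(design[m][p] * residual[p] for p in range(n_pairs)) / norm
            new_d = max(0.0, d[m] + step)
            delta = new_d - d[m]
            for p in range(n_pairs):
                residual[p] -= delta * design[m][p]
            d[m] = new_d
    # Cumulative sums turn the increments back into contributions.
    contributions, total = [], 0.0
    for inc in d:
        total += inc
        contributions.append(total)
    return contributions


# Hypothetical example: 4 probe-target pairs, 3 match-length classes.
counts = [[3, 1, 0],
          [0, 2, 1],
          [1, 0, 2],
          [2, 2, 0]]
true_c = [0.1, 0.5, 2.0]
b_obs = [sum(n * c for n, c in zip(row, true_c)) for row in counts]
fitted = fit_monotone_contributions(counts, b_obs)
print(fitted)
```

When the data would favour a shorter match contributing more than a longer one, this parametrization pools the two classes to a common value instead of violating condition b).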
slide-13
SLIDE 13

Estimated binding coefficients

[Plot: binding coefficient estimation vs. binding coefficient]

Binding coefficients estimated via binding-pattern contributions are 71% correlated with the experimental binding coefficients

slide-14
SLIDE 14

Suggestions for Further Experiments

  • Lower target concentrations;
  • Lower dynamic range of target concentrations;
  • Less correlation between target concentrations: random concentrations rather than an ordered Latin Square;

  • No complex target.
slide-15
SLIDE 15

Summary

  • Sequence information should be utilized in microarray data analyses and microarray design;
  • Targets sharing a similarity of 7 or more nucleotides with the probe sequence make a detectable contribution to its intensity;
  • Probe intensity can be assumed to be a linear function of target concentrations for a reasonable range of concentrations;
  • If binding coefficients are known, a linear binding model can give more accuracy than the traditional algorithm.

slide-16
SLIDE 16

Thank you!