Deconvoluting BAC-gene Relationships Using a Physical Map

slide-1
SLIDE 1

Deconvoluting BAC-gene Relationships Using a Physical Map

Y. Wu (1), L. Liu (1), T. Close (2), S. Lonardi (1)

(1) Department of Computer Science & Engineering
(2) Department of Botany & Plant Sciences

slide-2
SLIDE 2

Stefano Lonardi, Department of Computer Science and Engineering, CSB 2007

Selective sequencing

  • Many organisms are unlikely to be sequenced in the near future due to the large size and highly repetitive content of their genomes
  • Selective sequencing: obtain the sequence of a small set of BAC clones that contain a specific set of genes of interest
  • How do we identify these BAC clones? This is the BAC-gene deconvolution problem

slide-3
SLIDE 3

An illustration of the problem

slide-4
SLIDE 4

An illustration of the problem

slide-5
SLIDE 5

An illustration of the problem

slide-6
SLIDE 6

Hybridization with probes

  • The presence of a gene in a BAC can be determined by a hybridization experiment (e.g., using a unique probe designed from it)
  • Given that BAC clones and probes typically number in the tens of thousands, carrying out an experiment for each (BAC, probe) pair is usually infeasible
  • Group testing (or pooling) has to be used

slide-7
SLIDE 7

Hybridization with pools of probes

  • Probes can be arranged into pools for group testing. However, achieving exact deconvolution with this strategy could still be infeasible due to the large number of pools
  • Question: Can we use a small number of pools (e.g., a 1- or 2-decodable pool design) and still achieve accurate deconvolution?

slide-8
SLIDE 8

Dealing with the limitations of pooling

  • Answer: Yes, if one compensates for the lack of information obtained by a weak pooling design with the knowledge of the overlapping structure of the BACs
  • In this way, the number of pools required is reduced ⇒ less expensive and less time-consuming

slide-9
SLIDE 9

Hybridization data

h(b,p)=1 (pool p hybridizes to BAC b)

  – b must contain at least one of the probes/genes represented by p
  – positive information

h(b,p)=0 (pool p does not hybridize to BAC b)

  – b cannot contain any of the probes/genes represented by p
  – negative information
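As a minimal sketch of these two rules, the following Python computes h(b,p) from known BAC contents and pool contents. The function name and the toy data are illustrative assumptions and do not come from the slides.

```python
from typing import Dict, Set, Tuple


def hybridization_table(
    bac_probes: Dict[str, Set[str]],
    pool_probes: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], int]:
    """Return h(b, p) = 1 iff BAC b contains at least one probe of pool p."""
    return {
        (b, p): int(bool(contents & members))
        for b, contents in bac_probes.items()
        for p, members in pool_probes.items()
    }


# Hypothetical toy data: which probes each BAC truly contains,
# and which probes were mixed into each pool.
bac_probes = {"b1": {"u1"}, "b2": {"u1", "u4"}, "b3": set()}
pool_probes = {"p1": {"u1", "u2"}, "p2": {"u3", "u4"}}

h = hybridization_table(bac_probes, pool_probes)
```

In a real experiment only h and the pool contents are observed; the BAC contents are the unknown that deconvolution tries to recover.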

slide-10
SLIDE 10

Deconvolution problem

  • Given h(b,p) for all pairs (b,p), the deconvolution problem is to establish a one-to-many assignment between the probes p and the clones b in such a way that it satisfies the value of h
    1. Basic deconvolution: uses only the information obtained from group testing
    2. Improved deconvolution: also uses the physical map

slide-11
SLIDE 11

Input to the basic deconvolution

Hybridization table

h    p1  p2  p3  p4
b1   1   0   0   0
b2   1   1   0   0
b3   0   1   1   0
b4   0   0   1   1
b5   0   0   0   1

p_i is a pool, b_j is a BAC, u_k is a probe/gene

slide-12
SLIDE 12

Input to the basic deconvolution

Hybridization table

h    p1  p2  p3  p4
b1   1   0   0   0
b2   1   1   0   0
b3   0   1   1   0
b4   0   0   1   1
b5   0   0   0   1

Pool content table

[Table: rows p1 to p4, columns u1 to u9; each pool contains three probes (column positions of the 1s not preserved)]

p_i is a pool, b_j is a BAC, u_k is a probe/gene

slide-13
SLIDE 13

Positive information

[Table: one row per hybridizing pair (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4); columns u1 to u9; each row marks with a 1 the three probes contained in pool p (column positions of the 1s not preserved)]

p_i is a pool, b_j is a BAC, u_k is a probe/gene

slide-14
SLIDE 14

Negative information

[Table: rows b1 to b5, columns u1 to u9; cell contents not preserved]

p_i is a pool, b_j is a BAC, u_k is a probe/gene

slide-15
SLIDE 15

Combining positive & negative

[Table: one row per hybridizing pair (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4); columns u1 to u9; each row marks with a 1 the three probes contained in pool p (column positions of the 1s not preserved)]

p_i is a pool, b_j is a BAC, u_k is a probe/gene

slide-16
SLIDE 16

Combining positive & negative

[Table: one row per hybridizing pair (b1,p1), (b2,p1), (b2,p2), (b3,p2), (b3,p3), (b4,p3), (b4,p4), (b5,p4); columns u1 to u9. After the negative information is applied, the rows for (b2,p1) and (b4,p4) retain three 1s and every other row retains two (column positions of the 1s not preserved)]

  • Each row represents a constraint to be satisfied
  • If a row contains only one "1", then the relationship between the BAC and probe is resolved exactly

p_i is a pool, b_j is a BAC, u_k is a probe/gene
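The combination of positive and negative information can be sketched in Python as follows. The hybridizing pairs are the ones listed on the slides; the pool contents are a hypothetical assignment (three probes per pool) chosen only for illustration, and the function name is likewise an assumption.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def basic_deconvolution(
    h: Dict[Pair, int], pools: Dict[str, Set[str]]
) -> Dict[Pair, Set[str]]:
    """For every positive pair (b, p), return the probes of p that b may contain."""
    # Negative information: a BAC cannot contain any probe of a pool
    # it did not hybridize to.
    excluded: Dict[str, Set[str]] = {b: set() for b, _ in h}
    for (b, p), value in h.items():
        if value == 0:
            excluded[b] |= pools[p]
    # Positive information: at least one surviving probe of p is in b.
    return {
        (b, p): pools[p] - excluded[b]
        for (b, p), value in h.items()
        if value == 1
    }


# Hypothetical pool contents (not recoverable from the slides).
pools = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u3", "u4", "u5"},
    "p3": {"u5", "u6", "u7"},
    "p4": {"u7", "u8", "u9"},
}
positives = {
    ("b1", "p1"), ("b2", "p1"), ("b2", "p2"), ("b3", "p2"),
    ("b3", "p3"), ("b4", "p3"), ("b4", "p4"), ("b5", "p4"),
}
h = {
    (b, p): int((b, p) in positives)
    for b in ("b1", "b2", "b3", "b4", "b5")
    for p in pools
}

constraints = basic_deconvolution(h, pools)
# A constraint with a single candidate is resolved exactly.
resolved = {pair: c for pair, c in constraints.items() if len(c) == 1}
```

With these assumed pools every constraint keeps at least two candidate probes, so nothing is resolved exactly by group testing alone.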

slide-17
SLIDE 17

Physical map-assisted deconvolution

  • Basic deconvolution is not sufficient
  • BACs are assembled into contigs by FPC (a contig is a set of BAC clones)
  • We assume the probes are unique ⇒ each probe can belong to exactly one contig

[Figure: Contig 1, Contig 2]
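The uniqueness assumption can be expressed as a simple check on a candidate assignment. The sketch below is only an illustration: the contig membership, the probe-to-BAC assignments, and the function names are hypothetical.

```python
from typing import Dict, Set


def contigs_per_probe(
    assignment: Dict[str, Set[str]], contig_of: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each probe to the set of contigs spanned by its assigned BACs."""
    return {
        probe: {contig_of[b] for b in bacs}
        for probe, bacs in assignment.items()
    }


def respects_physical_map(
    assignment: Dict[str, Set[str]], contig_of: Dict[str, str]
) -> bool:
    """True iff no probe is assigned to BACs from more than one contig."""
    spans = contigs_per_probe(assignment, contig_of)
    return all(len(contigs) <= 1 for contigs in spans.values())


# Hypothetical physical map: BAC -> contig.
contig_of = {
    "b1": "contig1", "b2": "contig1", "b3": "contig1",
    "b4": "contig2", "b5": "contig2",
}

consistent = {"u1": {"b1", "b2"}, "u8": {"b4", "b5"}}
inconsistent = {"u5": {"b3", "b4"}}  # spans contig1 and contig2
```

An assignment such as `inconsistent` would be ruled out by the physical map even though group testing alone might allow it.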

slide-18
SLIDE 18

Optimization problem

  • We formulate the following optimization problem
  • The problem is NP-complete (proof in the paper, reduction from 3SAT)

slide-19
SLIDE 19

Integer Linear Programming

  • The optimization problem can be solved via integer linear programming (ILP)

slide-20
SLIDE 20

LP and randomized rounding

  • The ILP is relaxed to the corresponding LP, then the LP is solved exactly (via the GLPK package)
  • The optimal solution to the LP is mapped to a valid solution to the ILP via randomized rounding
  • We prove that our method achieves approximation ratio (1 - e^-1)
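The slides do not reproduce the LP itself, so the following is only a generic sketch of the rounding step under an assumed variable layout: `fractional[u][c]` is taken to be the LP value of "probe u lies in contig c", with the values for each probe summing to 1. Sampling one contig per probe with those probabilities yields an integral solution in which every probe belongs to exactly one contig. It is not claimed to be the authors' exact procedure.

```python
import random
from typing import Dict


def randomized_rounding(
    fractional: Dict[str, Dict[str, float]], rng: random.Random
) -> Dict[str, str]:
    """Pick one contig per probe, with probability equal to its LP value."""
    assignment: Dict[str, str] = {}
    for probe, weights in fractional.items():
        contigs = list(weights)
        probabilities = [weights[c] for c in contigs]
        # random.Random.choices samples with replacement; k=1 gives one pick.
        assignment[probe] = rng.choices(contigs, weights=probabilities, k=1)[0]
    return assignment


# Hypothetical fractional LP solution (assumed layout, see above).
fractional = {
    "u1": {"contig1": 1.0, "contig2": 0.0},
    "u2": {"contig1": 0.3, "contig2": 0.7},
}

rounded = randomized_rounding(fractional, random.Random(0))
```

A variable that the LP already sets to 1 is always kept by the rounding, while fractional variables are resolved at random.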

slide-21
SLIDE 21

Experimental results on rice genome

  • Whole genome sequence for rice is available
  • BAC library and fingerprinting data are available from AGI
  • BAC-end sequences are also available from GenBank
  • Physical map was built using FPC
  • Coordinates of the BACs on the genome were determined by BLASTing BAC-end sequences against the genome

slide-22
SLIDE 22

Experimental results on rice genome

  • Rice unigenes are available from NCBI
  • Unique probes for the unigenes were designed by the OligoSpawn software
  • Experiments focused on chromosome I
  • Probe pools were designed following the shifted transversal design (STD)
  • Dataset: 2,002 probes and 2,629 BACs
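To give a feel for how a pooling design of this kind places probes into pools, here is a small sketch of the polynomial construction commonly associated with STD. It assumes a prime q, builds only layers 0 to k-1 with k ≤ q, and omits any further layer; the function name and parameters are illustrative, and the slides do not state which parameters were used in the experiments.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def std_pools(n: int, q: int, k: int) -> Tuple[Dict[Tuple[int, int], Set[int]], int]:
    """Place probes 0..n-1 into k layers of q pools each (q prime, k <= q).

    In layer j, probe i goes to pool sum_c j^c * digit_c(i) mod q, where
    digit_c(i) is the c-th base-q digit of i.
    """
    # gamma is the smallest integer with q^(gamma+1) >= n.
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1
    pools: Dict[Tuple[int, int], Set[int]] = {}
    for j in range(k):
        for i in range(n):
            index = sum(
                pow(j, c) * ((i // q ** c) % q) for c in range(gamma + 1)
            ) % q
            pools.setdefault((j, index), set()).add(i)
    return pools, gamma


pools, gamma = std_pools(n=9, q=3, k=3)

# Largest number of pools shared by any two distinct probes.
max_shared = max(
    sum(1 for members in pools.values() if a in members and b in members)
    for a, b in combinations(range(9), 2)
)
```

Because two distinct polynomials of degree at most gamma agree on at most gamma points modulo a prime, any two probes share at most gamma pools, which is what keeps the design decodable with few pools.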

slide-23
SLIDE 23

Experimental results

[Figure: 1-decodable pooling design]

slide-24
SLIDE 24

Experimental results

[Figure: 2-decodable pooling design]

slide-25
SLIDE 25

Experimental results

slide-26
SLIDE 26

Findings

  • We proposed a new method to solve the BAC-gene deconvolution problem based on integer linear programming
  • Experimental results show that our method is accurate and effective

slide-27
SLIDE 27

Thank you

  • Funding
  • Serdar Bozdag (UC Riverside) for providing the rice data (fingerprinting and hybridization)