Comparative Genomics: Computational Challenges, Bernard M.E. Moret

SLIDE 1

Comparative Genomics: Computational Challenges

Bernard M.E. Moret

Laboratory for Computational Biology and Bioinformatics

EPFL

Nantes, 6/8/09

SLIDE 2

Overview

  • Comparative approaches
  • The genome and its evolution
  • High-throughput data and computation
  • What do we want to know?
  • Comparing two genomes
  • Comparing multiple genomes
  • Ancestral reconstruction
  • Challenges

SLIDE 4

Comparative Approaches

Nothing in biology makes sense except in the light of evolution

Th. Dobzhansky, The American Biology Teacher, 1973.

  • an evolutionary perspective requires models of evolution
  • a model of evolution requires data that reveals evolution
  • data that reveals evolution must come from several organisms or tissues
  • hence working in the light of evolution requires comparative approaches

From the point of view of experimentalists and medical researchers

  • some organisms, incl. humans, are difficult to study in a lab setting
  • some experiments cannot be performed on some organisms, incl. humans, for practical or ethical reasons
  • hence learning about these organisms is best done by studying others

SLIDE 5

Characteristics of Comparative Approaches

Comparative approaches

  • rely on identification of conserved patterns
  • develop evolutionary models for observed changes
  • use conserved patterns for datamining and evolutionary models for analysis of mined data

Translated to comparative genomics:

  • data: whole-genome sequences
  • conserved patterns: subsequences, distribution statistics, or combinations
  • models: duplication/loss/rearrangement at the genome level, mutation/indel for nucleotides
  • datamining: important components, anchors, clusters, syntenic blocks

SLIDE 6

Comparative Genomics Vocabulary

  • syntenic block: conserved pattern (subject to microrearrangements), used to denote a conserved block of genes (10 kbp to 1 Mbp); the term was originally used in genetics to denote colocation on the same chromosome
  • genomic alignment: sequence-level alignment of complete genomes, with block-level rearrangements, duplications, and losses
  • positive (Darwinian) selection: selects for favorable traits; accelerates observed change in affected regions
  • negative (filtering) selection: selects against changes of most kinds; slows down observed change in affected regions
  • ancestral reconstruction: inference of the putative contents and arrangement of the genome of a common ancestor
  • genomic signature: originally, distribution of dinucleotide frequencies; now, characterization of common patterns in a group of genomes

SLIDE 8

Evolution of the Genome

What evolutionary events affect the genome?

  • nucleotide-level: “classical” sequence evolution (mutations and indels)
  • genomic rearrangements: inversions, transpositions, translocations, and chromosomal fusion and fission
  • duplication: gene retrotransposition, tandem duplication, segmental duplication, whole-genome duplication
  • loss: point mutation, segmental deletion, neofunctionalization
  • recombination: meiotic recombination, hybridization, lateral gene transfer

SLIDE 9

Evolutionary Models

And how well understood are they?

  • nucleotide-level: well established models with good statistics
  • genomic rearrangements: enormous work in the last 10 years, but still parameter-poor
  • duplication/loss: established work in lineage sorting (divergent gene evolution due to paralogs), much attention to whole-genome duplication, just starting on segmental duplications
  • recombination: established work in population genetics, much work on identifying lateral gene transfer, detailed work on recombination just starting

SLIDE 11

High-throughput data

Slowly pervading laboratory-based biology:

  • sequencing: the original high-throughput data source, now easily the most economical high-tech lab instrument
  • gene expression: microarrays and their ilk, now inexpensive and in very widespread use
  • transcription profiling: ChIP-chip, ChIP-seq, and future products, soon to replace microarrays
  • mass spectrometry: for protein analysis and sequencing; now also for mixed samples (metaproteomics)
  • SNP assays: for precise genotyping of humans
  • other domains: cell signalling, metabolomics (e.g., fluxes), 3D imaging, time series, etc.

SLIDE 12

High-throughput sequencing

Developed for the human genome project, now also used for:

  • de novo sequencing: still the most challenging
  • resequencing: verify base calls, test assemblies, etc.
  • deep sequencing: dense sampling or high coverage
  • metagenomics: random sampling of microbial communities
  • current technologies (454, Illumina) generate around 4 Gbp per half-day run
  • next-gen technologies may yield over 20 Gbp per one-hour run, at better than 50x coverage, with very short (20 bp) fragments

SLIDE 13

Can computation keep up?

Computer power still follows Moore’s law, doubling every 12–15 months. However:

  • That power is getting harder to use (parallelism is hard to exploit).
  • Data accumulates faster than Moore’s law (sequence data alone doubles every year).
  • This comparison presupposes running times linear in the data size, but most genomic analysis algorithms scale worse than linearly.
  • High-performance computing does not help much: the fastest machines can provide only a 10^3–10^4 speedup.
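The gap between data growth and compute growth can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, the doubling periods are the ones quoted on the slide (12 months for data, 15 months taken as the slow end of the compute range), and the running-time exponent is an assumed parameter, not a measured one.

```python
def growth(years, doubling_months):
    """Growth factor after `years` for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 * years / doubling_months)


def shortfall(years, data_doubling=12.0, compute_doubling=15.0, exponent=2.0):
    """Ratio of required work to available compute power (both equal to 1 at year 0).

    `exponent` is an assumed polynomial running-time exponent:
    1 means linear in the data size, 2 means quadratic.
    """
    work = growth(years, data_doubling) ** exponent
    power = growth(years, compute_doubling)
    return work / power


if __name__ == "__main__":
    for k in (1.0, 2.0):
        print(f"exponent {k}: shortfall after 5 years = {shortfall(5, exponent=k):.0f}x")
```

Under these assumed rates, a linear-time analysis falls behind by a factor of 2 after five years, while a quadratic one falls behind by a factor of 64.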

SLIDE 14

Genome-scale computing

Running time is only one facet of the problem.

  • Comparing several genomes of a few Gbp each requires a lot of memory: at least 128GB per node.
  • Available and not too expensive: a 16-core compute node with 128GB memory and 500GB disk can be had for $20K.
  • Still rare: most compute clusters have “thin” nodes (2–8GB of memory), unsuited to whole-genome analysis.
  • 2010 architectures will pack 64–128 cores per node, with 0.5–1TB memory: great for comparing a few genomes, but by then we will have 100s of vertebrate genomes...

SLIDE 16

Basic annotation per genome

We want to identify:

  • all coding genes (or exons);
  • all noncoding genes;
  • gene families;
  • SINEs, LINEs, and other repeat elements;
  • regions under positive, neutral, or negative selection.

Beyond this basic level, we want to identify

  • gene clusters, operons, alternative splicing scenarios, etc.;
  • gene function.

SLIDE 17

Comparative annotation

Using pairwise comparative approaches, we can ask for

  • all pairwise homologies;
  • orthology and paralogy relationships within gene families (within limits due to lack of phylogenetic information);
  • syntenic blocks;
  • mapping of the syntenic blocks between the two genomes (simple translocations and transpositions, with inversions);
  • translation of functional annotations from each genome into the other.

SLIDE 18

A simple bacterial example: Rickettsiae

  • R. conorii (Med. spotted fever) vs. R. prowazekii (typhus), about 15%

from Ogata et al., Science 293(5537):2093–2098 (2001)

[Figure: gene maps of corresponding regions of the two genomes, with genome coordinates on the axes and gene names (e.g., trxB1, lgtD, uvrD, tdcB, lon) marking genes found in both.]

SLIDE 19

Phylogenetic annotation

Once we introduce phylogenetic information, we can study mechanisms and reconstruct events. We can ask for the history of:

  • duplications and losses of genes;
  • rearrangements;
  • intron gains and losses;
  • recombinations of various types;
  • lateral gene transfer;
  • and any other events of interest.

SLIDE 20

Example: pathogenic genes in fungi

From Soanes et al., The Plant Cell 19:3318–3326 (2007)

SLIDE 22

Genomic alignment

Combines sequence-level alignment with block-level rearrangements, duplications, and losses.

  • Large syntenic blocks reduce computational work in eukaryotic genomes and may also serve as approximations for functional clusters.
  • Sequence evolution is parameterized separately for each block.
  • Tools include Mauve and Shuffle-Lagan, with MultiZ (align against the human genome) used for vertebrates.

Much remains to be done:

  • Handling of rearrangements/duplications/losses is very limited to date.
  • Handling of rearrangements in local alignments is poor.
  • Determining syntenic blocks is thus hard, hence the many tools: GRIMM-Synteny, ADHoRe, Cinteny, CS7, FISH, OrthoCluster, TEAM, etc.

SLIDE 23

Example: comparative mapping

Human chromosomal regions colored by mouse chromosomal code

Note: large color blocks may be composed of many syntenic blocks.

Mapping is only a first alignment step: analysis of rearrangements among elements, mapping within each chromosome, and local alignment of syntenic blocks must all follow.

SLIDE 24

Example: synteny for cotton/Arabidopsis

From Rong et al., Genome Research 15:1198–1210 (2005). Solid and dotted vertical colored lines denote syntenic blocks found by FISH and CS7.

SLIDE 25

Visualizing the output

Two obvious problems:

  • How do we condense the data for presentation from genomic scales (billions) to human scales (tens)?
  • What forms of data presentation are most useful to researchers?

For the first:

  • visual aids: zooming interfaces, perspective walls
  • mathematical techniques: projections
  • semantic techniques: high-level representations

For the second, much depends on:

  • the problem
  • the researcher’s training
  • the researcher’s perceptual abilities

SLIDE 26

Genomic rearrangements

Rearrangements alter the order and strandedness of genomic regions, from subsequences through genes to syntenic blocks.

To study rearrangements, genomes are represented by ordered sequences of signed indices, each index representing a gene or syntenic block.

Rearrangements can be characterized by

  • their outcome: breakpoints
  • their mechanism: inversions, transpositions, translocations
  • a mathematical model: permutations, the Nadeau-Taylor model, double-cut-and-join (DCJ)

In any framework, they present challenging algorithmic questions.

SLIDE 27

Rearrangements: viewpoints

G1=(1 2 3 4 5 6 7 8)
G2=(1 2 −5 −4 −3 6 7 8)

breakpoints (arrows) are missing adjacencies

[Figure: example gene orders illustrating an inversion, a transposition, and an inverted transposition.]

Double-cut-and-join makes two cuts in the genome, then reglues the ends.

  • With one cut on each of two chromosomes, translocation and fusion can occur.
  • With two cuts on the same chromosome, inversion and fission can occur.
  • Two successive DCJs, one with fission, one with fusion, cause a block exchange.
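The breakpoint and inversion viewpoints can be sketched in a few lines for the G1 and G2 of this slide. This is a minimal illustration, not the author's code: the function names are hypothetical, genomes are assumed to be linear signed gene orders over genes 1..n, and the frame elements 0 and n+1 are added by assumption to pin down the two chromosome ends.

```python
def adjacencies(genome):
    """Set of adjacencies of a linear signed gene order.

    The pair (a, b) read on one strand is the same adjacency as (-b, -a)
    read on the other, so each pair is stored in one canonical form.
    """
    framed = [0] + list(genome) + [len(genome) + 1]
    return {min((a, b), (-b, -a)) for a, b in zip(framed, framed[1:])}


def breakpoints(g1, g2):
    """Number of adjacencies of g1 that are missing from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def invert(genome, i, j):
    """Inversion: reverse the segment genome[i:j] and flip the sign of each gene."""
    genome = list(genome)
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


if __name__ == "__main__":
    g1 = [1, 2, 3, 4, 5, 6, 7, 8]
    g2 = invert(g1, 2, 5)        # inverts the segment 3 4 5
    print(g2)
    print(breakpoints(g1, g2))
```

A single inversion creates at most two breakpoints; here the adjacencies (2, 3) and (5, 6) of G1 are the ones missing from G2.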

SLIDE 28

The breakpoint graph

A graph representation of two (or more) orderings of genes. One ordering is used to represent the identity permutation.

Each gene is represented by two vertices (+ and -), the current permutation is given by solid edges, the identity by dashed edges.

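[Figure: breakpoint graph of the signed permutation (−2 4 3 −1), with end vertices L and R and gene vertices 1+, 1−, 2+, 2−, 3+, 3−, 4+, 4−.]

Every vertex has degree 2, with one solid and one dashed edge. Thus the graph decomposes into alternating cycles (here two cycles).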
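The cycle decomposition can be checked mechanically. The sketch below counts the alternating cycles for a signed permutation; the function name and the integer encoding are assumptions made for illustration (gene x becomes the vertex pair 2x−1, 2x, and the frame vertices 0 and 2n+1 play the role of L and R), one common convention rather than necessarily the one used for the figure.

```python
def breakpoint_graph_cycles(perm):
    """Count alternating cycles in the breakpoint graph of a signed permutation.

    Solid edges join the facing ends of consecutive genes in `perm`;
    dashed edges join the facing ends of consecutive genes in the identity.
    """
    n = len(perm)
    seq = [0]                                  # left frame vertex (L)
    for g in perm:
        a, b = 2 * abs(g) - 1, 2 * abs(g)      # the two ends of gene |g|
        seq.extend((a, b) if g > 0 else (b, a))
    seq.append(2 * n + 1)                      # right frame vertex (R)

    solid = {}
    for i in range(0, len(seq), 2):            # pairs (seq[0],seq[1]), (seq[2],seq[3]), ...
        u, v = seq[i], seq[i + 1]
        solid[u], solid[v] = v, u

    def dashed(v):
        # In the identity, vertex 2k is adjacent to vertex 2k+1.
        return v + 1 if v % 2 == 0 else v - 1

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:                   # follow solid, then dashed, until the cycle closes
            seen.add(v)
            w = solid[v]
            seen.add(w)
            v = dashed(w)
    return cycles


if __name__ == "__main__":
    print(breakpoint_graph_cycles([-2, 4, 3, -1]))   # the permutation of this slide
```

With this framing the identity permutation on n genes yields n + 1 cycles, the maximum possible, and the slide's permutation yields two.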

SLIDE 29

Results on inversions

  • formulation and first results (1986)
  • breakthrough theorem (Hannenhalli and Pevzner 1997): edit distance = # genes - # cycles in BP graph + # hurdles + # fortresses
  • optimal O(n) distance computation (2001)
  • use in unichromosomal phylogenetic reconstruction (2001)
  • evolutionary distance estimators (2002)
  • use in multichromosomal phylogenetic reconstruction (2002)
  • improved theoretical framework (2002)
  • inversions combined with duplication/loss (2004)
  • probabilistic framework (2007)
  • optimal ~O(n log n) sorting (2009)

SLIDE 30

Results on DCJ

DCJ unifies various rearrangements: inversions, transpositions, block exchanges, translocations, fissions, and fusions are all aspects of DCJ.

  • formulation of the problem (2005)
  • improved theoretical framework (2006)
  • evolutionary distance estimators (2008)
  • probabilistic framework (2008)
  • fast median computation (2008)

DCJ is particularly useful for foundational work in combinatorics, algorithmics, and statistics, but also appears to work well with biological data.
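As an illustration of why DCJ is convenient to work with, the sketch below computes a DCJ distance with the adjacency-graph formula d = N − (C + I/2), where N is the number of genes, C the number of cycles, and I the number of odd-length paths. The genome encoding (a list of `(genes, circular)` chromosomes), the helper names, and the assumption that both genomes carry the same genes exactly once are choices made for this example only.

```python
def _partners(genome):
    """Map each gene extremity to its neighbour in `genome` (None at a telomere)."""
    partner = {}
    for genes, circular in genome:
        ends = []                                    # (left, right) extremity of each gene
        for g in genes:
            tail, head = (abs(g), "t"), (abs(g), "h")
            ends.append((tail, head) if g > 0 else (head, tail))
        for left, right in ends:
            partner[left] = partner[right] = None
        joins = list(zip(ends, ends[1:]))
        if circular:
            joins.append((ends[-1], ends[0]))        # close the chromosome
        for (_, right), (left, _) in joins:
            partner[right], partner[left] = left, right
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - (C + I/2) for two genomes on the same gene set."""
    pa, pb = _partners(genome_a), _partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain the same genes")
    seen, cycles, odd_paths = set(), 0, 0
    # Paths start at a telomere of either genome; each extremity is one edge.
    for first, second in ((pa, pb), (pb, pa)):
        for start in first:
            if first[start] is not None or start in seen:
                continue
            cur, edges = start, 1
            seen.add(cur)
            side, other = second, first
            while side[cur] is not None:             # alternate between the two genomes
                cur = side[cur]
                seen.add(cur)
                edges += 1
                side, other = other, side
            odd_paths += edges % 2
    # Whatever has not been visited lies on cycles.
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        cur = start
        while cur not in seen:
            seen.add(cur)
            nxt = pb[cur]
            seen.add(nxt)
            cur = pa[nxt]
    return len(pa) // 2 - cycles - odd_paths // 2


if __name__ == "__main__":
    a = [([1, 2, 3, 4], True)]                       # one circular chromosome
    b = [([1, -3, -2, 4], True)]                     # same, with genes 2 3 inverted
    print(dcj_distance(a, b))
```

In this sketch an inversion and a fusion of two linear chromosomes each cost a single DCJ operation, which matches the unifying role described above.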

SLIDE 31

Open questions

Research on rearrangements has yielded a wealth of results, but much remains to be done to make it useful in practice.

How do we:

  • determine the relative importance of operations?
  • parameterize operations as a function of the locations and lengths of affected segments?
  • estimate breakpoint reuse (rearrangement hotspots)?
  • characterize and optimize rearrangement scenarios with additional dependencies or constraints?
  • specify sufficient biological constraints to make ancestral reconstruction possible?

SLIDE 33

Multiple Mauve alignment of Yersinia spp.

SLIDE 34

Two is easy, more is hard

An optimization problem that can be solved efficiently for two objects often becomes intractable for three or more objects.

Examples include

  • satisfiability of Boolean formulae in conjunctive normal form (easy for clauses of 2 literals, hard for clauses of 3 literals),
  • matching (easy for matching pairs, hard for matching triples),
  • problems on graphs of fixed degree (trivial on graphs of degree 2, often hard on graphs of degree 3), etc.

In comparative genomics, the two basic problems exhibit this same behavior.

  • Sequence alignment, and hence also genomic alignment, is easy for two sequences, hard for more.
  • Finding the rearrangement median between genomes is easy for two genomes, hard for more.
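The "easy for two" half of the alignment claim is the classical quadratic-time dynamic program. A minimal sketch of global pairwise alignment scoring follows; the function name and the scoring values (match +1, mismatch −1, gap −2) are arbitrary assumptions chosen for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of s and t (Needleman-Wunsch recurrence).

    Runs in O(len(s) * len(t)) time and keeps only one row of the table.
    """
    prev = [j * gap for j in range(len(t) + 1)]      # aligning "" against prefixes of t
    for i, a in enumerate(s, 1):
        cur = [i * gap]                              # aligning a prefix of s against ""
        for j, b in enumerate(t, 1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))     # three matches and one gap
```

The same recurrence extended to k sequences fills a k-dimensional table, so its cost grows exponentially in k, which is the "hard for more" half of the statement.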

SLIDE 35

Multiple sequence alignment

It remains the “single point of failure” of comparative genomics.

  • all methods attempt to reduce multiple alignment to a series of pairwise alignments
  • every popular tool is based on progressive alignment, using some assumed (or heuristically built) phylogenetic tree
  • even the best alignment packages (MAFFT, ProbCons, Muscle) handle only point mutations and indels
  • results tend to be poor for sequences with significant divergence

In comparative genomics, the phylogeny is often known, yet even then progressive alignment may be poor.

SLIDE 36

Sankoff's problem

The best formulation of multiple sequence alignment is due to Sankoff (1975).

Given n sequences to align, find a binary tree on n leaves, an assignment of the n sequences to the n leaves, and n-1 sequences labelling the internal nodes of the tree (“ancestral” sequences), that together optimize the sum, taken over all edges of the tree, of the pairwise alignment scores of the sequences associated with the two endpoints of each edge.

  • Sequences can be replaced by genomes, as gene maps, gene orders, full genome sequences, etc.
  • Edge scores can be parsimony-based or reflect likelihood under some model.
  • This formulation requires reconstruction of the full history of the given sequences from their last common ancestor, a very hard task.
  • No tool exists for this problem at present, except for small-scale work on gene-order data.
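The objective in Sankoff's formulation is simple to evaluate once a tree and its internal labels are fixed; the difficulty lies in searching over both. The sketch below scores one candidate solution. It assumes a parsimony-style edge score (plain edit distance, to be minimized), and the tree, node names, and sequences are made up for illustration.

```python
def edit_distance(s, t):
    """Unit-cost edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j - 1] + (a != b), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def tree_score(edges, labels):
    """Sum, over all tree edges, of the pairwise score between endpoint labels."""
    return sum(edit_distance(labels[u], labels[v]) for u, v in edges)


if __name__ == "__main__":
    # Hypothetical rooted tree on three leaves a, b, c with internal nodes X and R.
    edges = [("R", "X"), ("X", "a"), ("X", "b"), ("R", "c")]
    leaves = {"a": "ACGT", "b": "ACGA", "c": "TCGT"}
    good = dict(leaves, X="ACGT", R="ACGT")          # one candidate set of ancestors
    poor = dict(leaves, X="ACGA", R="TCGT")          # another candidate set
    print(tree_score(edges, good), tree_score(edges, poor))
```

Solving the full problem would mean minimizing `tree_score` over every tree topology and every choice of ancestral labels, which is what makes it so hard.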

SLIDE 37

Example: Sankoff's problem

[Figure legend: ancestral reconstruction, pairwise alignment, observed sequence]

All pairwise alignments involve at least one ancestral sequence.

SLIDE 38

The median problem

The “other” big computational problem in comparative genomics is deceptively simple:

given k genomes (usually 3), find a new genome that minimizes the sum of the pairwise genomic distances from itself to the given k genomes

Finding a median is

  • a key step in phylogenetic reconstruction and thus for Sankoff’s problem
  • the most common approach to ancestral reconstruction

Median optimization is intractable under most measures of genomic distance.
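For very small genomes the median can be found by exhaustive search, which makes the definition concrete and shows why it does not scale: there are 2^n · n! signed gene orders on n genes. The sketch below assumes linear signed gene orders over genes 1..n and uses the breakpoint distance as the genomic distance; the function names and the choice of distance are illustrative assumptions only.

```python
from itertools import permutations, product


def adjacencies(genome):
    """Canonical adjacency set of a linear signed gene order framed by 0 and n+1."""
    framed = [0] + list(genome) + [len(genome) + 1]
    return {min((a, b), (-b, -a)) for a, b in zip(framed, framed[1:])}


def breakpoint_distance(g, h):
    """Number of adjacencies of g that are absent from h."""
    return len(adjacencies(g) - adjacencies(h))


def brute_force_median(genomes):
    """Try every signed gene order and keep one minimizing the sum of distances."""
    n = len(genomes[0])
    best, best_score = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            cand = tuple(s * g for s, g in zip(signs, order))
            score = sum(breakpoint_distance(cand, g) for g in genomes)
            if best_score is None or score < best_score:
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    trio = [(1, 2, 3, 4), (1, -3, -2, 4), (1, 2, -3, 4)]
    print(brute_force_median(trio))
```

With n = 4 the search visits 384 candidates; at n = 10 it would already need several billion, hence the interest in bounds, heuristics, and decomposition.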

SLIDE 39

Taming median computations

We cannot avoid the problem entirely (except with progressive alignment), but we can find ways of estimating or approximating medians quickly:

  • progressive alignment: replaces median by a pairwise alignment profile
  • minimum spanning tree: easily computed, then altered heuristically to produce a phylogenetic tree
  • tight bounding on edge scores: based on mixed integer-linear programming, one set for each tree
  • greedy methods: in a median of three, repeatedly move from one end towards the other two simultaneously; fork when no such move exists
  • compression methods: identify commuting or noninterfering operations (for which a single path suffices)
  • decomposition methods: successfully used for DCJ medians

SLIDE 40

The UPenn/UCSC aligner

The UC Santa Cruz Human Genome browser includes 27 additional tracks for 27 other vertebrate genomes, including reptiles, amphibians, and fishes.

The alignment is a combination of star alignment (everything is referenced to the human genome) and progressive alignment, produced by MultiZ through a complex data-handling pipeline.

Independent assessments of the alignments find less than 10% of the alignments to be “suspicious.”

Vertebrates are all very closely related, making alignment much easier, but their genomes are huge, so this pipeline is a major achievement.

SLIDE 41

Issue in alignment of zebrafish to vertebrates

Part of the 28-vertebrate genome alignment at UCSC.

From Prakash and Tompa, Genome Biology 8:R124 (2007).

SLIDE 42

ECR browser tracks

(very similar to the UCSC browser tracks)

SLIDE 43

Assessing genomic alignments

Most genomic alignments published to date are of:

  • strains of the same species of bacteria (e.g., E. coli)
  • minimally divergent species in the same genus (e.g., Yersinia spp.)
  • larger taxonomic groups with a short history (e.g., vertebrates)

In all cases, the small divergence facilitates the work. Genomic alignments are tested by the community mostly by examining local sequence alignments (using the tracks).

  • not a high-throughput process!
  • ignores rearrangement handling
  • lacks supporting evolutionary scenarios

SLIDE 45

Ancestral genomes

Medians are used to reconstruct ancestral genomes, but the two are quite distinct.

Medians:

  • store intermediate algorithmic results
  • answer a very simple optimization criterion

Ancestral genomes:

  • biological constraints are presumably numerous, but we know very little about them, and so good probabilistic models are lacking
  • reconstruction is severely underconstrained
  • “optimal” solutions (i.e., medians) abound

SLIDE 46

Negative results

For moderately diverged genomes, current biological constraints do not suffice.

[Figure: edit distance from reconstructed label to true label, plotted for each internal node, under five mixes of operations: inversion only; inversions and insertions; inversions and deletions; all operations with low insertions/deletions; all operations with high insertions/deletions.]

12 proteobacteria, from Earnest-DeYoung et al., WABI 2004

Nodes 3, 4, 5, and 7 are two levels above the leaves and show large errors under all mixes of operations.

SLIDE 47

Positive results

  • Tracking gene clusters or operons across lineages.
  • Ancestral genomes claimed for mammalian genomes at coarse resolution.
  • Assembly approach instead of median: ancestral genome in a star tree built from many known syntenic blocks by selecting some and assembling them into a (possibly incomplete) genome.
  • Signature approach by identifying rearrangements common to all shortest evolutionary paths.

SLIDE 49

Modelling challenges

  • Details of rearrangements: affected areas by position and length, forbidden areas (centromeres?), relative frequencies.
  • Details of duplications: affected areas by position and length, relative frequencies.
  • Interactions between duplications/losses and rearrangements.
  • Constraints from gene clusters or operons.
  • Combining gene-level and sequence-level models.
  • Effects of the type of evolutionary selection.

SLIDE 50

Algorithmic challenges

  • Automatic syntenic block identification.
  • Scalable rearrangement handling.
  • Sankoff’s problem.
  • Orthology assignment using sequence- and gene-level models.
  • Alignment using sequence- and gene-level models.
  • Presentation of results.

SLIDE 51

Assessment challenges

  • Automated assessment tools using independent methods.
  • Data collection for such assessments.
  • Resampling (in the style of “bootstrapping”) methods.
  • Interactive presentation of results.
  • High-throughput lab methods for verification.

SLIDE 52

Thank you
