Comparative Genomics: Computational Challenges
Bernard M.E. Moret
Laboratory for Computational Biology and Bioinformatics
EPFL
Nantes, 6/8/09 – p.
Overview
Comparative approaches
The genome and its evolution
High-throughput data
(originally used in genetics to denote colocation on the same chromosome)
from Ogata et al., Science 293(5537):2093–2098 (2001)
[Figure: aligned gene-order maps of two genomes, with genome coordinates and gene labels such as trxB1, lgtD, uvrD, tdcB, and lon.]
From Soanes et al., The Plant Cell 19:3318–3326 (2007)
Note: large color blocks may be composed of many syntenic blocks.
From Rong et al., Genome Research 15:1198–1210 (2005). Solid and dotted vertical colored lines denote syntenic blocks found by FISH and CS7.
G1 = (1 2 3 4 5 6 7 8)
G2 = (1 2 −5 −4 −3 6 7 8)
Breakpoints (arrows) are missing adjacencies.
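As a minimal sketch of the definition above, the following counts the adjacencies of G1 that are missing from G2. It assumes linear gene orders and does not count the chromosome ends as adjacencies; the function names are hypothetical and chosen only for illustration.

```python
def adjacencies(genome):
    """Return the set of signed adjacencies of a linear gene order.

    The adjacency (a, b) read on one strand is the same as (-b, -a)
    read on the other strand, so each one is stored in a canonical form.
    Assumption: chromosome ends are not treated as adjacencies.
    """
    result = set()
    for a, b in zip(genome, genome[1:]):
        result.add(max((a, b), (-b, -a)))
    return result


def breakpoints(reference, other):
    """Count the adjacencies of `reference` that are missing from `other`."""
    return len(adjacencies(reference) - adjacencies(other))


G1 = [1, 2, 3, 4, 5, 6, 7, 8]
G2 = [1, 2, -5, -4, -3, 6, 7, 8]
print(breakpoints(G1, G2))  # 2: the adjacencies (2, 3) and (5, 6) are missing
```

The adjacencies (3, 4) and (4, 5) survive in G2 because they appear there on the opposite strand, as (−4, −3) and (−5, −4).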
[Figure: a genome on genes 1 to 8 and the results of three rearrangements: inversion, transposition, and inverted transposition.]
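The three rearrangements named here can be sketched on signed gene orders stored as Python lists. This is an illustrative sketch only: it assumes linear gene orders and half-open index ranges, and the function names are hypothetical.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] to just after genome[j:k] (i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Like transposition, but the moved segment is also inverted."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


G1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(inversion(G1, 2, 5))                  # [1, 2, -5, -4, -3, 6, 7, 8]
print(transposition(G1, 1, 4, 6))           # [1, 5, 6, 2, 3, 4, 7, 8]
print(inverted_transposition(G1, 1, 4, 6))  # [1, 5, 6, -4, -3, -2, 7, 8]
```

The first call reproduces the pair G1, G2 = (1 2 −5 −4 −3 6 7 8): a single inversion of the segment 3 4 5.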
With one cut on each of two chromosomes, translocation and fusion can occur. With two cuts on the same chromosome, inversion and fission can occur. Two successive DCJs, one with fission, one with fusion, cause a block exchange.
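The cases listed above can be checked on a small example. The sketch below models a double-cut-and-join (DCJ) operation on a genome stored as a set of adjacencies between gene extremities. It is a simplification made for illustration: it assumes circular chromosomes only (no telomeres), labels the two extremities of gene g as (g, "t") and (g, "h"), and uses hypothetical helper names throughout.

```python
def ends(gene):
    """Return the (left, right) extremities of a signed gene as it is read."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def circular_genome(*chromosomes):
    """Build the adjacency set of circular chromosomes given as signed lists."""
    adjs = set()
    for chrom in chromosomes:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            adjs.add(frozenset((ends(a)[1], ends(b)[0])))
    return adjs


def dcj(adjs, first, second, cross=False):
    """Cut adjacencies first=(a, b) and second=(c, d), then rejoin the ends.

    The free ends are rejoined as (a, c), (b, d), or as (a, d), (b, c)
    when cross is True.
    """
    (a, b), (c, d) = first, second
    cut = {frozenset(first), frozenset(second)}
    if not cut <= adjs:
        raise ValueError("both adjacencies must be present in the genome")
    if cross:
        joined = {frozenset((a, d)), frozenset((b, c))}
    else:
        joined = {frozenset((a, c)), frozenset((b, d))}
    return (adjs - cut) | joined


def count_circular_chromosomes(adjs):
    """Count chromosomes by walking gene -> adjacency -> gene until closing."""
    partner = {}
    for adj in adjs:
        x, y = tuple(adj)
        partner[x], partner[y] = y, x
    seen, count = set(), 0
    for start in partner:
        if start in seen:
            continue
        count += 1
        x = start
        while x not in seen:
            gene, end = x
            other = (gene, "h" if end == "t" else "t")  # cross the gene
            seen.update((x, other))
            x = partner[other]                          # cross the adjacency
    return count


start = circular_genome([1, 2, 3, 4])
cut1 = ((1, "h"), (2, "t"))
cut2 = ((3, "h"), (4, "t"))

inverted = dcj(start, cut1, cut2)            # two cuts, same chromosome
split = dcj(start, cut1, cut2, cross=True)   # same cuts, other rejoining
fused = dcj(split, ((1, "h"), (4, "t")), ((2, "t"), (3, "h")))

print(inverted == circular_genome([1, -3, -2, 4]))  # inversion
print(count_circular_chromosomes(split))            # fission: 2 chromosomes
print(count_circular_chromosomes(fused))            # fusion: back to 1
```

The same two cuts produce either an inversion or a fission, depending only on how the four free ends are rejoined; one cut on each of the two resulting chromosomes fuses them again.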
[Figure: graph on the gene extremities 1+, 1−, 2−, 3+, 3−, 4+, 4− and the end markers L and R.]
Given n sequences to align, find a binary tree on n leaves, an assignment of the n sequences to the n leaves, and n−1 sequences labelling the internal nodes of the tree (“ancestral” sequences) that together optimize the sum, taken over all edges of the tree, of the pairwise alignment scores of the sequences associated with the two endpoints of each edge.
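To make the objective concrete, the sketch below evaluates it for one candidate solution: a fixed rooted tree, a fixed leaf assignment, and fixed ancestral sequences. It assumes unit-cost edit distance as the pairwise alignment score (so the sum is minimized), which is a choice made here for illustration; the sequences, node names, and function names are hypothetical. It only scores a candidate and does not search for the optimum.

```python
def edit_distance(s, t):
    """Unit-cost edit distance, used here as the pairwise alignment score."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # match or substitute
        prev = curr
    return prev[-1]


def tree_alignment_cost(edges, labels):
    """Sum the pairwise scores over all edges of a labelled tree."""
    return sum(edit_distance(labels[u], labels[v]) for u, v in edges)


# n = 3 leaves (A, B, C) and n - 1 = 2 internal nodes (X, R).
edges = [("R", "X"), ("R", "C"), ("X", "A"), ("X", "B")]
leaves = {"A": "ACGT", "B": "ACGA", "C": "TCGT"}

candidate_1 = {**leaves, "X": "ACGT", "R": "ACGT"}
candidate_2 = {**leaves, "X": "ACGA", "R": "TCGT"}

print(tree_alignment_cost(edges, candidate_1))  # 2
print(tree_alignment_cost(edges, candidate_2))  # 3
```

Both candidates use the same tree and leaves; only the ancestral labels differ, and the first gives the lower total.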
[Figure labels: ancestral reconstruction; pairwise alignment.]
From Prakash and Tompa, Genome Biology 8:R124 (2007).
(very similar to the UCSC browser tracks)
[Figure: edit distance from reconstructed label to true label, plotted against internal node. Series: Inversion Only; Inversions and Insertions; Inversions and Deletions; All Ops - Low Insertions/Deletions; All Ops - High Insertions/Deletions.]