GENOME DUPLICATION AND GENE ANNOTATION: AN EXAMPLE FOR A REFERENCE PLANT SPECIES

  1. GENOME DUPLICATION AND GENE ANNOTATION: AN EXAMPLE FOR A REFERENCE PLANT SPECIES. Alessandra Vigilante, Mara Sangiovanni, Chiara Colantuono, Luigi Frusciante and Maria Luisa Chiusano. Dept. of Soil, Plant, Environmental and Animal Production Sciences, CAB (Computer Aided Biosciences) group. vigilantealessandra@gmail.com Web: http://cab.unina.it Contact: chiusano@unina.it

  2. BACKGROUND: Arabidopsis thaliana as a reference genome. Arabidopsis thaliana WAS THE FIRST PLANT GENOME TO BE COMPLETELY SEQUENCED.

  4. BACKGROUND: Arabidopsis thaliana as a reference genome. A REFERENCE GENOME SHOULD BE: FULLY RELIABLE; SAFELY ANNOTATED; WELL UNDERSTOOD IN TERMS OF EVOLUTIONARY HISTORY. THE Arabidopsis thaliana GENOME IS: GENE DENSE; COMPLEX BECAUSE HIGHLY DUPLICATED AND CLAIMED TO BE ARCHEOPOLYPLOID; STILL NOT EXHAUSTIVELY ANNOTATED.

  6. BACKGROUND: Arabidopsis thaliana as a reference genome. Whole genome duplication events. [Figure: whole genome duplication events, with V. vinifera and A. thaliana labelled.] Source: Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature (2007).

  8. STRATEGY: Unraveling the Arabidopsis thaliana genome. IT COULD BE USEFUL TO REVIEW THE GENOME IN TERMS OF RELATIONSHIPS BETWEEN DUPLICATED GENES/PARALOGS: NETWORKS OF PARALOG GENES.

  9. Gene duplication analysis: pipeline. Arabidopsis thaliana proteome (TAIR9 release) -> all-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost's formula) -> singleton genes and duplicated genes -> network extraction -> networks of duplicated genes.
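
The filtering step of the pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation. The slide names only "Rost's formula", so the sketch assumes it refers to the HSSP-curve percent-identity cutoff (Rost, 1999) with offset n = 0. It also assumes that BLASTp hits have already been parsed into tuples of (query, subject, percent identity, alignment length, E-value), with one identifier per gene. The function names and gene identifiers are hypothetical.

```python
import math


def rost_identity_threshold(length, n=0.0):
    """Percent-identity cutoff of the HSSP curve for an alignment of `length` residues.

    Assumption: this is the curve the slide calls "Rost's formula"; n is the offset.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return n + 480.0 * length ** exponent
    return n + 19.5


def is_paralog_hit(query, subject, pident, length, evalue, max_evalue=1e-10, n=0.0):
    """Keep a hit only if it is not a self-hit and passes both cutoffs."""
    if query == subject:
        return False
    if evalue >= max_evalue:  # slide: E < 10^-10
        return False
    return pident >= rost_identity_threshold(length, n)


def split_genes(all_genes, hits):
    """Return (paralogy edges, duplicated genes, singleton genes)."""
    edges = set()
    for query, subject, pident, length, evalue in hits:
        if is_paralog_hit(query, subject, pident, length, evalue):
            edges.add(frozenset((query, subject)))  # undirected: A-B equals B-A
    duplicated = {gene for edge in edges for gene in edge}
    singletons = set(all_genes) - duplicated
    return edges, duplicated, singletons


if __name__ == "__main__":
    # Hypothetical identifiers and hit values, for illustration only.
    genes = ["geneA", "geneB", "geneC", "geneD"]
    hits = [
        ("geneA", "geneB", 40.0, 200, 1e-30),
        ("geneB", "geneA", 40.0, 200, 1e-30),
        ("geneA", "geneC", 90.0, 50, 1e-5),
        ("geneD", "geneD", 100.0, 300, 0.0),
    ]
    edges, duplicated, singletons = split_genes(genes, hits)
    print(sorted(duplicated), sorted(singletons))
```

Storing each accepted pair as a `frozenset` makes the relation symmetric, which matches the undirected representation of paralogies used later in the deck.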

  13. Gene duplication analysis: pipeline. A. thaliana GENES AND PARALOGIES ARE REPRESENTED AS AN (UNDIRECTED) GRAPH G(V,E), WHERE V = {v_1, ..., v_N} = genes AND E = {e_1, ..., e_M} = paralogies.
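
One way to read "network extraction" on the graph G(V,E) is as finding its connected components. The sketch below takes that reading as an assumption, since the slides do not state the algorithm. It uses a plain breadth-first search. The function name and gene identifiers are hypothetical.

```python
from collections import defaultdict, deque


def extract_networks(genes, paralogies):
    """Return the connected components of the undirected paralogy graph.

    Genes with no paralogy (singletons) are not reported as networks.
    """
    adjacency = defaultdict(set)
    for a, b in paralogies:
        if a == b:
            continue  # ignore self-pairs; a gene is not its own paralog
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for gene in genes:
        if gene in seen or gene not in adjacency:
            continue
        component = set()
        queue = deque([gene])
        seen.add(gene)
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        networks.append(component)
    return networks


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
    paralogies = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
    for network in extract_networks(genes, paralogies):
        print(sorted(network))
```

Under this reading, two genes end up in the same network even when they are linked only through intermediate paralogs, as g1 and g3 are here.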

  16. RESULTS: Gene duplication analysis. Arabidopsis thaliana proteome (TAIR9 release): 27169 -> all-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost's formula) -> duplicated genes: 21843; singleton genes: 5326 -> networks of duplicated genes: 3017.

  17. RESULTS: Gene duplication analysis. [Figure: bar chart by network size class, axis labelled "Genes"; size classes 2, 3-9, 10-30, 31-207, 208-5168; vertical axis from 0 to 1400.] A network contains all and only the genes that share at least one paralogy relationship. Each gene belongs to one and only one network.
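
The two statements above can be checked mechanically once the networks are available as sets of genes. The sketch below is an illustration under assumptions. It takes the size classes from the figure labels and counts networks per class, although the transcript does not say whether the chart counts networks or genes. The function names and gene identifiers are hypothetical.

```python
from collections import Counter

# Size classes as read from the figure labels (genes per network).
SIZE_CLASSES = [(2, 2), (3, 9), (10, 30), (31, 207), (208, 5168)]


def class_label(low, high):
    return str(low) if low == high else f"{low}-{high}"


def size_class(size):
    """Map a network size to the label of its size class."""
    for low, high in SIZE_CLASSES:
        if low <= size <= high:
            return class_label(low, high)
    raise ValueError(f"network size {size} is outside the listed classes")


def check_disjoint(networks):
    """Raise if any gene belongs to more than one network."""
    seen = set()
    for network in networks:
        overlap = seen & network
        if overlap:
            raise ValueError(f"genes in more than one network: {sorted(overlap)}")
        seen |= network


def count_networks_by_class(networks):
    """Count networks per size class, keeping the class order of the figure."""
    check_disjoint(networks)
    counts = Counter(size_class(len(network)) for network in networks)
    return {class_label(low, high): counts.get(class_label(low, high), 0)
            for low, high in SIZE_CLASSES}


if __name__ == "__main__":
    # Hypothetical networks, for illustration only.
    networks = [{"g1", "g2"}, {"g3", "g4"}, {"g5", "g6", "g7"}]
    print(count_networks_by_class(networks))
```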

  18. RESULTS: Networks applications. NETWORKS ARE A USEFUL TOOL TO DEEPLY INVESTIGATE RELATIONSHIPS BETWEEN SUBSETS OF GENES SHARING DUPLICATION RELATIONSHIPS. NETWORKS CAN BE A USEFUL TOOL TO REFINE GENE ANNOTATION.

  19. RESULTS: Networks applications for the annotation of unknown information. 19% of the proteome is still annotated as "unknown protein". NETWORKS CAN HELP IN REFINING GENE ANNOTATION.
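
The slides do not describe how networks are used to refine annotation. As a purely illustrative sketch, one simple possibility is to list, for each gene labelled "unknown protein", the annotations carried by the other members of its network. The function name, gene identifiers and annotation strings below are hypothetical, and the result is a set of candidate hints for manual review, not an annotation.

```python
from collections import Counter

UNKNOWN = "unknown protein"


def annotation_hints(networks, annotation, unknown_label=UNKNOWN):
    """For each unknown gene, list annotations of known genes in the same network.

    Returns {gene: [(annotation, count), ...]}, most frequent annotation first.
    Unknown genes whose network has no annotated member get an empty list.
    """
    hints = {}
    for network in networks:
        members = sorted(network)  # sorted for reproducible ordering
        known = Counter(
            annotation[gene]
            for gene in members
            if annotation.get(gene, unknown_label) != unknown_label
        )
        for gene in members:
            if annotation.get(gene, unknown_label) == unknown_label:
                hints[gene] = known.most_common()
    return hints


if __name__ == "__main__":
    # Hypothetical identifiers and labels, for illustration only.
    networks = [{"geneA", "geneB", "geneC"}, {"geneD", "geneE"}]
    annotation = {
        "geneA": "protein kinase",
        "geneB": "protein kinase",
        "geneC": "unknown protein",
        "geneD": "unknown protein",
        "geneE": "unknown protein",
    }
    print(annotation_hints(networks, annotation))
```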

  21. RESULTS: Networks applications for the study of relationships between gene families. Networks can be a useful tool for highlighting evolutionary relationships between different gene families.

  22. RESULTS: Networks of duplicated genes. [Figure: content not recoverable.]

  25. RESULTS: 2-gene networks.

  26. RESULTS: 2-gene networks. [Figure: Arabidopsis thaliana chromosomes.]

  27. RESULTS: 2-gene networks. [Figure panel: all protein-coding genes.]

  28. RESULTS: 2-gene networks. [Figure panel: all genes involved in two-gene networks.]

  29. RESULTS: 2-gene networks and singleton genes. AT LEAST 5% OF THE ENTIRE PROTEOME HAS A SINGLE PARALOGY RELATIONSHIP. AT LEAST 20% OF THE ENTIRE PROTEOME CONSISTS OF SINGLETON GENES. ABOUT ONE QUARTER OF THE ENTIRE PROTEOME HAS AT MOST ONE PARALOGY RELATIONSHIP.

  30. RESULTS: Singleton genes analysis. WHAT ABOUT SINGLETON GENES?

  31. RESULTS: Singleton genes analysis. Singleton genes (not having protein-coding paralogs) -> BLASTn of mRNA sequences against ESTs -> singleton genes validated or not validated by ESTs. Count shown on the slide: 3588.
