GENOME DUPLICATION AND GENE ANNOTATION: AN EXAMPLE FOR A REFERENCE PLANT SPECIES



slide-1
SLIDE 1

Alessandra Vigilante, Mara Sangiovanni, Chiara Colantuono, Luigi Frusciante and Maria Luisa Chiusano

  • Dept. of Soil, Plant, Environmental and Animal Production Sciences

GENOME DUPLICATION AND GENE ANNOTATION: AN EXAMPLE FOR A REFERENCE PLANT SPECIES.

CAB (Computer Aided Biosciences) group Web: http://cab.unina.it

vigilantealessandra@gmail.com Web: http://cab.unina.it Contact: chiusano@unina.it

slide-2
SLIDE 2

BACKGROUND: Arabidopsis thaliana as a reference genome

Arabidopsis thaliana WAS THE FIRST PLANT GENOME TO BE COMPLETELY SEQUENCED

slide-4
SLIDE 4

BACKGROUND: Arabidopsis thaliana as a reference genome

A REFERENCE GENOME SHOULD BE:

  • FULLY RELIABLE
  • SAFELY ANNOTATED
  • WELL UNDERSTOOD IN TERMS OF EVOLUTIONARY HISTORY

Arabidopsis thaliana GENOME IS:

  • GENE DENSE
  • COMPLEX BECAUSE HIGHLY DUPLICATED AND CLAIMED TO BE ARCHEOPOLYPLOID
  • STILL NOT EXHAUSTIVELY ANNOTATED

slide-6
SLIDE 6

BACKGROUND: Arabidopsis thaliana as a reference genome

Whole genome duplication events

  • A. thaliana
  • V. vinifera

Jaillon O, et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature (2007).

slide-8
SLIDE 8

STRATEGY: Unraveling Arabidopsis thaliana genome

IT COULD BE USEFUL TO REVIEW THE GENOME IN TERMS OF RELATIONSHIPS BETWEEN DUPLICATED GENES/PARALOGS

NETWORKS OF PARALOG GENES

slide-9
SLIDE 9

Gene duplication analysis: pipeline

  • Arabidopsis thaliana proteome (TAIR9 release)
  • All-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost’s formula)
  • Duplicated genes / Singleton genes
  • Network extraction
  • Networks of duplicated genes
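The filtering step of the pipeline can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the slides give only the E-value cutoff and the name "Rost's formula", so the length-dependent identity threshold below uses the commonly cited HSSP-curve form from Rost (1999), and the offset `n`, the function names, and the row layout `(query, subject, percent_identity, aligned_length, e_value)` are all hypothetical.

```python
import math

E_VALUE_CUTOFF = 1e-10  # cutoff stated on the slide (E < 10^-10)


def rost_threshold(aligned_length, n=0.0):
    """Minimum percent identity for a given alignment length.

    Assumed form of the HSSP curve (Rost, 1999); the slides do not
    say which variant or which offset n the authors used.
    """
    if aligned_length <= 11:
        base = 100.0
    elif aligned_length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
        base = 480.0 * aligned_length ** exponent
    else:
        base = 19.5
    return min(100.0, n + base)


def paralog_pairs(blast_rows):
    """Return unordered gene pairs that pass both filters.

    blast_rows: iterable of hypothetical tuples
    (query, subject, percent_identity, aligned_length, e_value).
    """
    pairs = set()
    for query, subject, pident, length, evalue in blast_rows:
        if query == subject:
            continue  # ignore self hits
        if evalue >= E_VALUE_CUTOFF:
            continue  # not significant enough
        if pident < rost_threshold(length):
            continue  # below the length-dependent identity curve
        pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Toy rows with made-up gene identifiers
rows = [
    ("G1", "G1", 100.0, 300, 0.0),    # self hit, skipped
    ("G1", "G2", 45.0, 200, 1e-40),   # passes both filters
    ("G2", "G1", 45.0, 200, 1e-40),   # same pair, reverse direction
    ("G3", "G4", 20.0, 100, 1e-12),   # identity below the curve
    ("G5", "G6", 60.0, 150, 1e-5),    # E-value too high
]
filtered = paralog_pairs(rows)
```

Storing each pair in sorted order makes the relationship symmetric, which matches the undirected view of paralogy used later in the deck.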

slide-13
SLIDE 13

Gene duplication analysis: pipeline

A. thaliana GENES AND PARALOGIES ARE REPRESENTED AS AN (UNDIRECTED) GRAPH G(V,E), WHERE:

  • V = {v1, ..., vN} = genes
  • E = {e1, ..., eM} = paralogies
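As an illustration of the graph G(V,E), the sketch below stores genes as dictionary keys and paralogies as symmetric adjacency sets. The function names and gene identifiers are hypothetical, and treating a gene with no edges as a singleton is an assumption consistent with the pipeline labels rather than something the slides spell out.

```python
def build_paralogy_graph(genes, paralogies):
    """Build an undirected graph as a dict: gene -> set of paralogs."""
    adjacency = {gene: set() for gene in genes}
    for a, b in paralogies:
        # setdefault tolerates pairs naming a gene missing from `genes`
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def split_duplicated_and_singletons(adjacency):
    """Genes with at least one edge are duplicated; the rest are singletons."""
    singletons = {gene for gene, neighbours in adjacency.items() if not neighbours}
    duplicated = set(adjacency) - singletons
    return duplicated, singletons


# Toy example with made-up identifiers
V = ["G1", "G2", "G3", "G4"]
E = [("G1", "G2"), ("G2", "G3")]
graph = build_paralogy_graph(V, E)
duplicated, singletons = split_duplicated_and_singletons(graph)
```

Adjacency sets keep edge lookups cheap and make the undirected nature explicit, since every edge is recorded from both endpoints.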

slide-16
SLIDE 16

RESULTS: Gene duplication analysis

  • Arabidopsis thaliana proteome (TAIR9 release): 27169
  • All-against-all BLASTp versus protein-coding genes (E < 10^-10, Rost’s formula)
  • Duplicated genes: 21843
  • Singleton genes: 5326
  • Networks of duplicated genes: 3017

slide-17
SLIDE 17

RESULTS: Gene duplication analysis

[Figure: genes per network size class; size classes 2, 3-9, 10-30, 31-207, 208-5168; axis label: Genes]

A network contains all and only the genes that share at least one paralogy relationship. Each gene belongs to one and only one network.
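The network definition above (every gene in exactly one network, networks grouping genes linked by paralogy) matches the connected components of an undirected graph. The sketch below extracts them with a breadth-first search and bins genes by network size. It is an illustration only: the slides do not say which algorithm was used, the function names are hypothetical, the size classes are copied from the figure axis, and counting genes rather than networks per class is an assumption based on the axis label.

```python
from collections import Counter, deque

# Size classes read from the figure axis
SIZE_CLASSES = [(2, 2), (3, 9), (10, 30), (31, 207), (208, 5168)]


def extract_networks(adjacency):
    """Return connected components (as sets) of genes having at least one edge."""
    seen = set()
    networks = []
    for start, neighbours in adjacency.items():
        if start in seen or not neighbours:
            continue  # already assigned, or a singleton
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            for other in adjacency[gene]:
                if other not in seen:
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        networks.append(component)
    return networks


def size_class_label(size):
    """Map a network size to its class label, or 'other' if outside all classes."""
    for low, high in SIZE_CLASSES:
        if low <= size <= high:
            return str(low) if low == high else f"{low}-{high}"
    return "other"


def genes_per_size_class(networks):
    """Count how many genes fall in each network size class."""
    counts = Counter()
    for network in networks:
        counts[size_class_label(len(network))] += len(network)
    return dict(counts)


# Toy graph with made-up identifiers
toy = {
    "a": {"b"}, "b": {"a"},
    "c": {"d", "e"}, "d": {"c"}, "e": {"c"},
    "f": set(),
}
toy_networks = extract_networks(toy)
toy_counts = genes_per_size_class(toy_networks)
```

Because components are disjoint by construction, each gene ends up in exactly one network, as the slide requires.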

slide-18
SLIDE 18

RESULTS: Networks applications

  • NETWORKS CAN BE A USEFUL TOOL TO REFINE GENE ANNOTATION
  • NETWORKS ARE A USEFUL TOOL TO DEEPLY INVESTIGATE RELATIONSHIPS BETWEEN SUBSETS OF GENES SHARING DUPLICATION RELATIONSHIPS

slide-19
SLIDE 19

RESULTS: Networks applications for the annotation of unknown information

NETWORKS CAN HELP IN REFINING GENE ANNOTATION

19% of the proteome is still annotated as “unknown protein”.

slide-21
SLIDE 21

RESULTS: Networks applications for study of relationships between gene families

Networks can be a useful tool for highlighting evolutionary relationships between different gene families.

slide-22
SLIDE 22

RESULTS: Networks of duplicated genes

slide-26
SLIDE 26

RESULTS: 2-gene networks

Arabidopsis thaliana chromosomes

slide-27
SLIDE 27

RESULTS: 2-gene networks

All protein-coding genes

slide-28
SLIDE 28

RESULTS: 2-gene networks

All genes involved in two-gene networks

slide-29
SLIDE 29

RESULTS: 2-gene networks and singleton genes

  • AT LEAST 5% OF THE ENTIRE PROTEOME HAS A SINGLE PARALOGY RELATIONSHIP
  • AT LEAST 20% OF THE ENTIRE PROTEOME CONSISTS OF SINGLETON GENES
  • ABOUT ONE QUARTER OF THE ENTIRE PROTEOME HAS AT MOST ONE PARALOGY RELATIONSHIP

slide-30
SLIDE 30

RESULTS: Singleton genes analysis

WHAT ABOUT SINGLETON GENES?

slide-31
SLIDE 31

RESULTS: Singleton genes analysis

Singleton genes not having protein-coding paralogs: 3588

  • BLASTn of mRNA sequences against ESTs
  • Singleton genes validated or not validated by ESTs

slide-32
SLIDE 32

RESULTS: Singleton genes analysis

Singleton genes not having protein-coding paralogs: 3588

  • Comparative analysis (BioMart, Ensembl)
  • Orthologs of singleton genes in different species (O. sativa, V. vinifera, S. bicolor, P. trichocarpa)

slide-36
SLIDE 36

RESULTS: Singleton genes analysis

SINGLETON GENES ANALYSIS RESULTS

3588 singleton genes

  • 1072 not confirmed by ESTs
  • 2516 confirmed by ESTs
  • 481 present only in A. thaliana
  • 1072 present only in A. thaliana
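
The breakdown above crosses two properties of each singleton gene: EST support and presence in other species. A minimal sketch of that classification with set operations follows. It assumes that "present only in A. thaliana" means no ortholog was found in the compared species. The function name, category labels, and gene identifiers are hypothetical, and the inputs would come from the BLASTn and comparative analysis steps, which are not reproduced here.

```python
def classify_singletons(singletons, est_confirmed, has_ortholog):
    """Partition singleton genes by EST support and ortholog presence.

    singletons:    iterable of singleton gene identifiers
    est_confirmed: genes with a supporting EST hit
    has_ortholog:  genes with an ortholog in at least one compared species
    """
    singletons = set(singletons)
    confirmed = singletons & set(est_confirmed)
    not_confirmed = singletons - confirmed
    with_ortholog = singletons & set(has_ortholog)
    return {
        "confirmed, with ortholog": confirmed & with_ortholog,
        "confirmed, species only": confirmed - with_ortholog,
        "not confirmed, with ortholog": not_confirmed & with_ortholog,
        "not confirmed, species only": not_confirmed - with_ortholog,
    }


# Toy example with made-up identifiers
classes = classify_singletons(
    singletons=["S1", "S2", "S3", "S4"],
    est_confirmed=["S1", "S2", "X9"],   # X9 is not a singleton and is ignored
    has_ortholog=["S1", "S3"],
)
sizes = {label: len(genes) for label, genes in classes.items()}
```

The four categories are disjoint and together cover every singleton, so their sizes always add up to the input count.
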
slide-37
SLIDE 37

RESULTS: Singleton genes analysis

SINGLETON GENES ANALYSIS RESULTS

2516 confirmed by ESTs

slide-38
SLIDE 38

RESULTS: Singleton genes analysis

SINGLETON GENES ANALYSIS RESULTS

481 out of 2516 present only in A. thaliana

slide-39
SLIDE 39

RESULTS: Singleton genes analysis

SINGLETON GENES ANALYSIS RESULTS

1110 not confirmed by ESTs and present only in A. thaliana

slide-40
SLIDE 40

RESULTS: Singleton genes analysis

Singleton genes not having protein-coding paralogs: 3588

  • BLASTn vs: pseudogenes, transposons, other RNA, intergenic regions
  • Singleton genes having neither protein-coding nor non-protein-coding paralogs, nor any significant hit in the genome: 250

slide-41
SLIDE 41

RESULTS: Singleton genes analysis

Singleton genes not having protein-coding paralogs: 3588

Searching for ORF annotation errors

  • BLASTx of transcript sequences against protein sequences

Gene family: metallothionein

  • MT1A (METALLOTHIONEIN 1A)
  • MT1B (METALLOTHIONEIN 1B)
  • Binds to and detoxifies excess copper and other metals, limiting oxidative damage.
  • Copper ion binding / metal ion binding

slide-42
SLIDE 42

CONCLUSIONS

This work was helpful to highlight several crucial issues related to gene annotation.

Networks of paralogs:

  • Are useful for the investigation of the gene annotation of reference and novel genomes
  • Provide a useful tool for deeper investigations of gene families
  • Reveal intriguing issues on the genome organization of Arabidopsis thaliana
slide-43
SLIDE 43
FUNDING & ACKNOWLEDGEMENTS

  • Prof. Luigi Frusciante
  • Alessandra Vigilante, BIOLOGIST
  • Maria Luisa Chiusano, PRINCIPAL INVESTIGATOR
  • Mara Sangiovanni, COMPUTER SCIENTIST
  • Mario Aversano, TECHNICAL ASSISTANT
  • Alessandra Traini, PHYSICIST
  • Nunzio D’Agostino, BIOLOGIST
  • Miriam Di Filippo, BIOLOGIST
  • Chiara Colantuono, STUDENT

slide-44
SLIDE 44

Reggia di Portici, XVIII century

THANK YOU FOR YOUR ATTENTION!