SLIDE 1

GO2PUB PubMed Query Tool Based on Semantic Expansion of Gene Ontology Terms, a Lipid Metabolism Case Study

Charles Bettembourg, Christian Diot, Anita Burgun and Olivier Dameron

INRA UMR598 - INSERM U936

03/07/2012

— Bettembourg, Diot, Burgun, Dameron — (INRA/INSERM)GO2PUB 03/07/2012 1 / 39

SLIDE 4

Context: Literature search

PubMed: more than 20 million citations, with fast and continuous growth
Numerous queries to build... and relevant results to select
Need for automatic search tools

SLIDE 8

Requirements

Precise and complex queries
Low silence and noise

[Figure: the available articles (relevant and non-relevant) and the search results; relevant articles missing from the results are silence, non-relevant articles in the results are noise]

SLIDE 9

Relevance measures: Precision

[Figure: PubMed search results, relevant and non-relevant articles, silence and noise]

$$\text{Precision} = \frac{\text{relevant retrieved documents}}{\text{all retrieved documents}} = \frac{\text{relevant retrieved}}{\text{relevant retrieved} + \text{noise}}$$

Precision is the ratio between the relevant retrieved documents and all the results returned by the search tool for a query.

SLIDE 10

Relevance measures: Recall

[Figure: PubMed search results, relevant and non-relevant articles, silence and noise]

$$\text{Recall} = \frac{\text{relevant retrieved documents}}{\text{all relevant documents}} = \frac{\text{relevant retrieved}}{\text{relevant retrieved} + \text{silence}}$$

Recall is the ratio between the relevant retrieved documents and all the relevant documents available in the database.

SLIDE 11

Relevance measures: F-Score

F-Score: a measure combining precision and recall

$$F_\beta = \frac{(1 + \beta^2) \cdot (\text{Precision} \cdot \text{Recall})}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$

$$F_1 = \frac{2 \cdot (\text{Precision} \cdot \text{Recall})}{\text{Precision} + \text{Recall}}$$
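The three measures can be computed from raw counts. The following Python sketch shows how. The function names and the counts in the usage lines are illustrative assumptions and are not part of GO2PUB.

```python
def precision(relevant_retrieved: int, retrieved: int) -> float:
    """Relevant retrieved / all retrieved; the complement is the share of noise."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Relevant retrieved / all relevant; the complement is the share of silence."""
    return relevant_retrieved / relevant_total if relevant_total else 0.0


def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """(1 + b^2) * p * r / (b^2 * p + r); beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Hypothetical counts: 10 results, 6 of them relevant, 8 relevant articles in the database.
p = precision(6, 10)
r = recall(6, 8)
print(round(f_beta(p, r), 3))          # F1 -> 0.667
print(round(f_beta(p, r, beta=2), 3))  # F2 -> 0.714
```

With β = 1 the score is the harmonic mean of precision and recall. Raising β shifts the weight toward recall, that is, toward avoiding silence.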

SLIDE 12

Domain-specific vocabulary

Application: literature search for species-specific metabolisms
◮ e.g. lipid metabolism in chicken

Several methods and tools
◮ Interfaces to set filters on PubMed queries
◮ Text-mining approaches
  ⋆ Natural Language Processing
  ⋆ Latent Semantic Analysis

But: these approaches need a corpus of domain-specific vocabulary

SLIDE 14

Complex querying process

Writing exhaustive and complex queries relies on domain-specific knowledge
◮ e.g. lipid metabolism

A complex query needs many keywords
◮ This conflicts with the user-friendliness requirement

Automatic query enrichment using ontologies reconciles both requirements

Ontology definition (Bard, 2004)

An ontology is a formal way of representing knowledge in which concepts are described both by their meaning and their relationship to each other.

SLIDE 16

Gene Ontology

Controlled vocabulary
Hierarchy with inheritance
More than 34,000 terms to describe:
◮ Biological Processes
◮ Molecular Functions
◮ Cellular Components

SLIDE 18

Gene Ontology Annotations

Functional annotation of genes
Multi-species
One GO term may annotate many genes
Each gene can be used as a PubMed keyword

Main idea

The genes annotated by a GO term of interest or by one of its descendants can be used as keywords in a PubMed query.

SLIDE 20

Expansion illustration

[Figure: GO hierarchy and annotated genes]
Regulation of fatty acid metabolic process (GO:0019217)
  Regulation of fatty acid biosynthetic process (GO:0042304)
    Negative regulation of fatty acid biosynthetic process (GO:0045717)
    Positive regulation of fatty acid biosynthetic process (GO:0045723)
Annotated genes shown: PPAR, CAV1, BRCA1, ChREBP, APOA1

Extending the query to the descendants of a term is important, and this is not supported by other tools
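The expansion step can be sketched in Python as a traversal of the GO hierarchy followed by a union of annotations. The hierarchy below uses the four terms of the illustration. The assignment of each gene to a particular term is an assumption made for the example, because the slide does not say which term annotates which gene. The names CHILDREN, ANNOTATIONS and expanded_genes are hypothetical. A GO term can have several parents, so the `seen` set prevents visiting a term twice.

```python
from collections import deque

# Is_a hierarchy (parent -> children) built from the terms of the illustration.
CHILDREN: dict[str, list[str]] = {
    "GO:0019217": ["GO:0042304"],
    "GO:0042304": ["GO:0045717", "GO:0045723"],
    "GO:0045717": [],
    "GO:0045723": [],
}

# Hypothetical direct annotations: which gene sits on which term is assumed.
ANNOTATIONS: dict[str, set[str]] = {
    "GO:0019217": {"PPAR"},
    "GO:0042304": {"APOA1"},
    "GO:0045717": {"BRCA1", "CAV1"},
    "GO:0045723": {"ChREBP"},
}


def descendants(term: str) -> set[str]:
    """Return the term itself plus all its descendants (breadth-first traversal)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expanded_genes(term: str) -> set[str]:
    """Genes annotated by the term or by any of its descendants."""
    return set().union(*(ANNOTATIONS.get(t, set()) for t in descendants(term)))


print(sorted(expanded_genes("GO:0019217")))  # all five genes
print(sorted(expanded_genes("GO:0045717")))  # only the genes of that leaf term
```

Querying with the top term collects the genes of the whole subtree. Querying with a leaf term collects only its own genes, which mirrors why expansion to descendants matters.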

SLIDE 21

Example

GO:0019217 (Regulation of fatty acid metabolic process): 14 annotated genes, giving 57 symbols, names and synonyms used as keywords

SLIDE 24

Semantic expansion is useful

GO:0019217 (Regulation of fatty acid metabolic process), Chicken

Without query expansion:
◮ 2 articles concerning only 1 gene

With query expansion:
◮ 9 articles concerning 7 genes

SLIDE 25

GO2PUB website: http://go2pub.genouest.org

SLIDE 26

Fill the form

[Screenshot: form field "Please enter a GO term", with "lipi" typed as example input]

SLIDE 27

Query example

[Screenshot of a query example]

SLIDE 30

Relevance analysis

Comparison of GO2PUB with the PubMed query system and with GoPubMed

Qualitative analysis
◮ Selection of the relevant results of 3 very specific queries sent to GO2PUB, PubMed and GoPubMed
◮ Calculation of Precision, Recall and F-score for each tool

Generalization study
◮ Comparison of the results obtained with 20 random GO terms
◮ Selection of relevant results and computation of Precision, Recall and F-score for GO2PUB and GoPubMed

SLIDE 31

Qualitative analysis

3 very specific queries about:
◮ Lipid biosynthesis in chicken liver
◮ Lipid transport in human blood
◮ Regulation of lipase activity in human cell membrane

SLIDE 33

Relevance criteria

Article selection
◮ Blind selection: PubMed, GoPubMed and GO2PUB results mixed together
◮ 2 reviewers: a biologist and a bioinformatician

A relevant article has to:
◮ describe at least one gene product
  ⋆ role, interactions, activation conditions
◮ focus on the chosen metabolism and species
◮ not focus exclusively on the effect of supplementation with a substance

SLIDE 34

Query 1: Lipid biosynthesis in chicken liver

GO term: GO:0008610 = Lipid biosynthetic process
Species: Gallus gallus = Chickens
Major topic: Liver
MeSH term: Lipid metabolism
Equivalent query on GoPubMed
Similar query on PubMed

SLIDE 35

Query 1 results

[Screenshot of the Query 1 results]

SLIDE 36

Query 1 results selection

[Figure: Venn diagrams of the PubMed, GoPubMed (exp) and GO2PUB results for "Lipogenesis in chicken liver". (a) Distribution of all results for Q1; (b) distribution of the results considered relevant among (a)]

SLIDE 37

Relevance of Q1 results

| | GO2PUB | GoPM (std) | GoPM (exp) | PubMed |
|---|---|---|---|---|
| (a) Number of results | 24 | 16 | 16 | 19 |
| (b) Relevant among (a) | 13 | 5 | 5 | 8 |
| Precision | 0.542 | 0.313 | 0.313 | 0.421 |
| Relative recall | 0.813 | 0.313 | 0.313 | 0.5 |
| F-score | 0.650 | 0.313 | 0.313 | 0.457 |

GoPM = GoPubMed; std = standard query; exp = manually expanded query
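As a consistency check, the Q1 figures can be recomputed from rows (a) and (b). Relative recall is taken here, as is usual, to be a tool's relevant results divided by the pool of relevant articles found by all tools. The slide does not state the size of that pool. The sketch therefore assumes 16 relevant articles, the value implied by GO2PUB's 13 relevant results and its relative recall of 0.813.

```python
# Counts copied from the Q1 table: tool -> (number of results, relevant among them).
Q1 = {
    "GO2PUB": (24, 13),
    "GoPM (std)": (16, 5),
    "GoPM (exp)": (16, 5),
    "PubMed": (19, 8),
}
# Assumption: size of the pool of relevant articles, inferred from 13 / 0.813 ≈ 16.
POOLED_RELEVANT = 16


def scores(n_results: int, n_relevant: int, pooled: int) -> tuple[float, float, float]:
    """Return (precision, relative recall, F1) for one tool."""
    precision = n_relevant / n_results
    relative_recall = n_relevant / pooled
    total = precision + relative_recall
    f1 = 2 * precision * relative_recall / total if total else 0.0
    return precision, relative_recall, f1


for tool, (n, rel) in Q1.items():
    p, r, f = scores(n, rel, POOLED_RELEVANT)
    print(f"{tool:<11} P={p:.3f} R={r:.3f} F={f:.3f}")
```

Python formats exact halves such as 0.8125 and 0.3125 as 0.812 and 0.312, because it rounds the stored binary value half-to-even. A few printed values therefore differ from the slide in the last digit.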

SLIDE 38

Query 2: Lipid transport in human blood

GO term: GO:0006869 = Lipid transport
Species: Homo sapiens
Major topic: Blood
MeSH term: Lipid metabolism
Equivalent query on GoPubMed
Similar query on PubMed

SLIDE 39

Query 2 results selection

[Figure: Venn diagrams of the PubMed, GoPubMed (exp) and GO2PUB results for "Lipid transport in human blood". (a) Distribution of all results for Q2; (b) distribution of the results considered relevant among (a)]

SLIDE 40

Relevance of Q2 results

| | GO2PUB | GoPM (std) | GoPM (exp) | PubMed |
|---|---|---|---|---|
| (a) Number of results | 16 | 9 | 9 | 45 |
| (b) Relevant among (a) | 10 | 6 | 6 | 2 |
| Precision | 0.625 | 0.667 | 0.667 | 0.044 |
| Relative recall | 0.769 | 0.462 | 0.462 | 0.133 |
| F-score | 0.690 | 0.546 | 0.546 | 0.067 |

SLIDE 41

Query 3: Regulation of lipase activity in human cell membrane

GO term: GO:0060191 = Regulation of lipase activity
Species: Homo sapiens
MeSH terms: Cell Membrane, Lipid metabolism
Equivalent query on GoPubMed
Similar query on PubMed

SLIDE 42

Query 3 results selection

[Figure: Venn diagrams of the PubMed, GoPubMed (exp) and GO2PUB results for "Regulation of lipase activity in human cell membrane". (a) Distribution of all results for Q3; (b) distribution of the results considered relevant among (a)]

SLIDE 43

Relevance of Q3 results

| | GO2PUB | GoPM (std) | GoPM (exp) | PubMed |
|---|---|---|---|---|
| (a) Number of results | 24 | 6 | 8 | 23 |
| (b) Relevant among (a) | 6 | 5 | 6 | 2 |
| Precision | 0.25 | 0.833 | 0.75 | 0.087 |
| Relative recall | 0.667 | 0.556 | 0.667 | 0.182 |
| F-score | 0.364 | 0.667 | 0.706 | 0.118 |

SLIDE 45

Generalization study

Comparison of the results obtained by GO2PUB and GoPubMed using 20 random GO terms
Query pattern: "a random GO term + a species (mouse) + a publication date limit (2011) + a keyword (the GO term name)"
GO terms randomly selected among all Biological Process terms with a granularity similar to that of the 3 GO terms used in the qualitative study
◮ The granularity of a term depends on the mean length of its paths to the root and on its number of descendants
◮ GO terms of the generalization study: mean path length to the root between 3.5 and 5.25 edges, and between 35 and 244 descendants
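The slide names two ingredients of granularity but does not say how they are combined. The Python sketch below therefore only computes those two quantities on a small hypothetical DAG, where the term names ROOT and A to E are invented. The first quantity is the mean length, in edges, over all paths from a term to the root. The second is the number of distinct descendants. Enumerating every path works for a toy graph but can grow quickly on the full GO.

```python
from functools import lru_cache

# Hypothetical toy DAG (child -> parents); "ROOT" stands in for a GO root term.
PARENTS: dict[str, list[str]] = {
    "ROOT": [],
    "A": ["ROOT"],
    "B": ["ROOT"],
    "C": ["A", "B"],
    "D": ["C"],
    "E": ["C", "A"],
}


@lru_cache(maxsize=None)
def path_lengths(term: str) -> tuple[int, ...]:
    """Length in edges of every distinct path from `term` up to the root."""
    if not PARENTS[term]:
        return (0,)
    return tuple(n + 1 for parent in PARENTS[term] for n in path_lengths(parent))


def mean_depth(term: str) -> float:
    """Mean path length from `term` to the root."""
    lengths = path_lengths(term)
    return sum(lengths) / len(lengths)


def n_descendants(term: str) -> int:
    """Number of distinct terms below `term` (the term itself is not counted)."""
    children: dict[str, list[str]] = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen: set[str] = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)


for t in ("C", "E"):
    print(t, round(mean_depth(t), 2), n_descendants(t))
```

Term E has three paths to the root, of lengths 3, 3 and 2, so its mean depth is 8/3. This shows why a mean over paths, rather than a single depth, is needed in a DAG.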

SLIDE 46

Distribution of the results

[Figure: distribution of the GO2PUB and GoPubMed results in the generalization study]

SLIDE 47

Comparison with Q1, Q2 and Q3

Qualitative study
◮ GO2PUB: 21.33 articles on average = 77 % of the total
◮ GoPubMed: 11.0 articles on average = 40 % of the total
◮ Common set: 4.67 articles on average = 17 % of the total

Generalization study
◮ GO2PUB: 46 articles on average = 70 % of the total
◮ GoPubMed: 25.1 articles on average = 38 % of the total
◮ Common set: 5.75 articles on average = 9 % of the total

Similar profiles in the 2 studies
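The percentages are consistent with reading "the total" as the union of the two result sets, |GO2PUB| + |GoPubMed| - |common|. Below is a short Python check under that assumption. The function name `shares` is hypothetical.

```python
def shares(go2pub: float, gopubmed: float, common: float) -> dict[str, float]:
    """Percentage of the union (|A| + |B| - |A ∩ B|) covered by each set."""
    union = go2pub + gopubmed - common
    return {
        "GO2PUB": 100 * go2pub / union,
        "GoPubMed": 100 * gopubmed / union,
        "Common set": 100 * common / union,
    }


# Average counts copied from the slide.
for study, counts in {"Qualitative": (21.33, 11.0, 4.67),
                      "Generalization": (46, 25.1, 5.75)}.items():
    rounded = {name: round(value) for name, value in shares(*counts).items()}
    print(study, rounded)
# Qualitative {'GO2PUB': 77, 'GoPubMed': 40, 'Common set': 17}
# Generalization {'GO2PUB': 70, 'GoPubMed': 38, 'Common set': 9}
```

The three shares add up to more than 100 % because the common set is counted inside both tools' shares.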

SLIDE 48

Relevance analysis for 7 queries of the generalization study

3 lipid-related queries

| GO term | Tool | (a) Number of results | (b) Relevant among (a) | Precision | Relative recall | F-score |
|---|---|---|---|---|---|---|
| GO:0044242 | GPM | 4 | 3 | 0.750 | 0.136 | 0.231 |
| GO:0044242 | G2P | 26 | 20 | 0.769 | 0.864 | 0.814 |
| GO:0008299 | GPM | 9 | 1 | 0.111 | 0.333 | 0.167 |
| GO:0008299 | G2P | 16 | 2 | 0.125 | 0.667 | 0.211 |
| GO:0008654 | GPM | 25 | 5 | 0.200 | 0.417 | 0.270 |
| GO:0008654 | G2P | 27 | 11 | 0.407 | 0.917 | 0.564 |

| GO term | (c) Total relevant | (d) Common results | (e) Relevant among (d) |
|---|---|---|---|
| GO:0044242 | 22 | 1 | 1 |
| GO:0008299 | 3 | 2 | |
| GO:0008654 | 12 | 5 | 4 |

GPM = GoPubMed, G2P = GO2PUB

SLIDE 49

Relevance analysis for 7 queries of the generalization study

4 queries about other topics

| GO term | Tool | (a) Number of results | (b) Relevant among (a) | Precision | Relative recall | F-score |
|---|---|---|---|---|---|---|
| GO:0050658 | GPM | 25 | 7 | 0.280 | 0.875 | 0.424 |
| GO:0050658 | G2P | 10 | 2 | 0.200 | 0.250 | 0.222 |
| GO:0033013 | GPM | 15 | 3 | 0.200 | 0.600 | 0.300 |
| GO:0033013 | G2P | 19 | 3 | 0.158 | 0.600 | 0.250 |
| GO:0006805 | GPM | 30 | 17 | 0.567 | 0.680 | 0.618 |
| GO:0006805 | G2P | 23 | 14 | 0.609 | 0.560 | 0.583 |
| GO:0048284 | GPM | 24 | 10 | 0.417 | 0.625 | 0.500 |
| GO:0048284 | G2P | 17 | 9 | 0.529 | 0.563 | 0.545 |

| GO term | (c) Total relevant | (d) Common results | (e) Relevant among (d) |
|---|---|---|---|
| GO:0050658 | 9 | 3 | 1 |
| GO:0033013 | 6 | 6 | 2 |
| GO:0006805 | 26 | 7 | 4 |
| GO:0048284 | 16 | 8 | 4 |

GPM = GoPubMed, G2P = GO2PUB

SLIDE 54

Relevance assessment

Both GO2PUB and GoPubMed are useful for retrieving relevant articles missed by PubMed
Better performance of GO2PUB for 2 of the 3 qualitative queries...
...but the differences varied among the queries
◮ GO2PUB performance decreases from Q1 to Q3
◮ The number of descendants of the chosen GO term decreases from Q1 to Q3
  ⋆ The more descendants a GO term has, the more genes it is likely to annotate
  ⋆ The more genes a GO term annotates, the greater the gene-based enrichment

The results of the qualitative study can be generalized

SLIDE 57

Importance of query expansion

Query expansion is automatic with GO2PUB
GO2PUB retrieves far fewer results without query expansion
Queries are not automatically expanded by GoPubMed
We manually expanded the GoPubMed queries and compared them to GO2PUB
◮ The added value of semantic expansion was null for Q1 and Q2 and substantial for Q3 (+33%)
◮ Query expansion is therefore essential for GO2PUB, and would be a valuable extension for GoPubMed

SLIDE 61

Minimizing silence and noise

Most of the results retrieved by both tools are relevant
◮ Taking the intersection of GoPubMed and GO2PUB results decreases noise

Each tool yields relevant articles that are missed by the other
◮ Taking the union of their results decreases silence
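Here is a toy Python illustration with invented article IDs and relevance judgements, not data from the study. The intersection keeps only articles found by both tools, and in this example it has the highest precision. The union can only gain recall. On real queries, the precision gain of the intersection is a tendency rather than a guarantee. The recall gain of the union always holds, because the union contains each tool's results.

```python
# Hypothetical result sets (article IDs) and relevance judgements, for illustration only.
gopubmed: set[int] = {1, 2, 3, 4, 5, 6}
go2pub: set[int] = {4, 5, 6, 7, 8, 9, 10}
relevant: set[int] = {2, 4, 5, 6, 8, 9, 11}  # article 11 is missed by both tools


def precision(results: set[int]) -> float:
    """Share of the results that are relevant (low value = much noise)."""
    return len(results & relevant) / len(results)


def recall(results: set[int]) -> float:
    """Share of the relevant articles that were retrieved (low value = much silence)."""
    return len(results & relevant) / len(relevant)


for name, results in {
    "GoPubMed": gopubmed,
    "GO2PUB": go2pub,
    "intersection": gopubmed & go2pub,
    "union": gopubmed | go2pub,
}.items():
    print(f"{name:<12} precision={precision(results):.2f} recall={recall(results):.2f}")
```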

SLIDE 64

Conclusion

GO2PUB retrieves relevant results missed by GoPubMed, even when GoPubMed queries are manually expanded. Conversely, GoPubMed's text-mining approach finds relevant articles missed by GO2PUB. These two very different information retrieval strategies therefore complement each other.

SLIDE 65

Thank you for your attention

Thanks to BioGenOuest for hosting and support
Thanks to my supervisors: Olivier Dameron, Christian Diot

SLIDE 67

GO terms of the analysed generalization queries

GO:0044242 = cellular lipid catabolic process
GO:0008299 = isoprenoid biosynthetic process
GO:0008654 = phospholipid biosynthetic process
GO:0050658 = RNA transport
GO:0033013 = tetrapyrrole metabolic process
GO:0006805 = xenobiotic metabolic process
GO:0048284 = organelle fusion

SLIDE 69

Query building

GO2PUB query pattern

(Symbol[TIAB] OR Name[TIAB] OR Synonyms[TIAB] of all genes of the requested taxon annotated by the requested GO term) AND ("requested taxon"[MH]) AND ("MeSH term"[MAJR]) AND ("year 1"[PDat] : "year 2"[PDat])

Example: (Symbol[TIAB] OR Name[TIAB] OR Synonyms[TIAB] of all genes of Chickens annotated by GO:0019217) AND ("Chickens"[MH]) AND ("Gene Expression Regulation"[MAJR]) AND ("2005"[PDat] : "2010"[PDat])
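Below is a Python sketch of how a query string following this pattern could be assembled. The function name and its parameters are hypothetical. The gene keywords (symbols, names and synonyms of the annotated genes) are assumed to have been collected beforehand. The field tags [TIAB], [MH], [MAJR] and [PDat] are standard PubMed search tags. The resulting string could then be submitted to PubMed, for example through the NCBI E-utilities esearch service.

```python
def build_go2pub_query(gene_keywords: list[str], taxon: str, mesh_term: str,
                       year_from: int, year_to: int) -> str:
    """Assemble a PubMed query string following the GO2PUB query pattern.

    `gene_keywords` is assumed to already hold the symbols, names and synonyms
    of the genes of `taxon` annotated by the requested GO term.
    """
    # Quote each keyword so multi-word gene names are searched as phrases.
    genes = " OR ".join(f'"{kw}"[TIAB]' for kw in gene_keywords)
    return (
        f"({genes})"
        f' AND ("{taxon}"[MH])'
        f' AND ("{mesh_term}"[MAJR])'
        f' AND ("{year_from}"[PDat] : "{year_to}"[PDat])'
    )


# Hypothetical, shortened keyword list; a real expansion yields many more keywords.
query = build_go2pub_query(["PPAR", "CAV1", "caveolin 1"], "Chickens",
                           "Gene Expression Regulation", 2005, 2010)
print(query)
```

The parentheses around the OR-ed gene keywords matter. Without them, PubMed would combine the OR and AND operators from left to right, and the filters would apply only to the last keyword.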

SLIDE 70

Sent query

[Screenshot of the query sent to PubMed]

SLIDE 71

Inter-rater agreement on Q1, Q2 and Q3

Cohen's kappa coefficient was 0.83, i.e. almost perfect agreement
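Cohen's kappa compares the observed agreement p_o between the two reviewers with the agreement p_e expected by chance from each reviewer's label frequencies: κ = (p_o - p_e) / (1 - p_e). Below is a minimal Python sketch. The ten relevance judgements are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a.keys() | freq_b.keys()) / n**2
    if p_e == 1.0:  # both raters used one and the same label for every item
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented judgements for 10 articles: R = relevant, N = not relevant.
biologist = list("RRRRRNNNNN")
bioinformatician = list("RRRRNNNNNN")
print(round(cohens_kappa(biologist, bioinformatician), 2))  # 0.8
```

In this example the reviewers agree on 9 of 10 articles (p_o = 0.9), but half of that agreement is expected by chance (p_e = 0.5). Kappa is therefore 0.8 rather than 0.9.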