Toward Community-Driven, Shared Literature Annotation Resources


SLIDE 1

Licensed under a Creative Commons Attribution 2.1 Japan license (c) 2 1 3 Jin-Dong Kim (金進東)

Toward Community-Driven, Shared Literature Annotation Resources

Jin-Dong Kim Database Center for Life Science (DBCLS)

SLIDE 2

DBCLS

  • Database Center for Life Science
    ✔ A government-funded research center
    ✔ For integration of databases of life sciences
    ✔ It annually organizes
      ➔ BioHackathon series
      ➔ BLAH series

SLIDE 3

DBCLS

[Diagram: NII, NIG, DBCLS, ..., NBDC, ROIS, JST, DDBJ; labels: Research Organization, Funding Agency, Genetics, Informatics, Service, R&D, Database]

SLIDE 4

Introduction to DBCLS

[Diagram: NII, NIG, DBCLS, ..., NBDC, ROIS, JST, DDBJ, PDBj; labels: Research Organization, Funding Agency, Genetics, Informatics, Service, R&D, Database]

SLIDE 6

“Inheritance involves the passing of discrete units of inheritance”

Gregor Mendel, Proceedings of the Natural History Society of Brünn, 1866

SLIDE 8

Heredity

genome

SLIDE 9

“all media are extensions of our human senses, bodies and minds.”

Marshall McLuhan, The Medium Is the Massage, 1967

SLIDE 10

Extension of Heredity

genome literature

SLIDE 11

SLIDE 12

Cell

SLIDE 13

Society

transcribed transcribed translated

SLIDE 15

Science Literature

  • Scholarly articles, Textbooks, …
  • Authoritative information
  • Accumulation of scientific knowledge
  • Basis for new discoveries
  • Repeatedly accessed

SLIDE 16

My research is based on ...

SLIDE 17

Literature Indexing

SLIDE 18

Geospatial Indexing

SLIDE 19

Google Maps (Washington D.C.)

SLIDE 20

Entities

SLIDE 21

(Geospatial) Pathways

SLIDE 22

Entities

SLIDE 23

(Linguistic) Pathways

SLIDE 24

Why do people use Google Maps?

  • Useful
    ✔ Contents
  • Easy to use
    ✔ Interface
      ➔ To access
      ➔ To exchange
      ➔ To create
      ➔ To reuse

Geospatial annotations

SLIDE 25

Literature annotation

  • Do we have good contents?
    ✔ Many groups are producing annotations.
  • Do we have good ways to access them?
    ✔ Annotation resources are scattered and isolated.

Let’s link them to each other and share them, all together. BLAH!

SLIDE 26

PubAnnotation

✔ Is a repository of literature annotation
✔ Is based on a scalable storage system
✔ Specifically aims at PubMed and PMC
✔ Solicits contribution of annotations from the community
✔ Solves various problems for sharing the annotations
  ➔ Alignment
  ➔ Global addressing system
  ➔ REST APIs

SLIDE 27

SLIDE 28

Challenges for Integration

  • Format is not standardized
    ✔ Many proprietary formats
  • Texts are changed
      ➔ PubMed, PMC change texts
      ➔ Web masters change texts
      ➔ Annotation projects change texts
    ✔ For
      ➔ Cleaning
      ➔ Convenience for annotation
        – Unicode → ASCII

SLIDE 29

Challenges for Integration

  • Format is not standardized
    ✔ A matter of conversion
  • Texts are changed
    ✔ Breaks stand-off annotation
      ➔ Character offsets become invalid
    ✔ Solution
      ➔ Sequence alignment (BLAST!)
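
Why changed texts break stand-off annotation can be shown with a minimal Python sketch. The two strings are hypothetical variants of one phrase (a Unicode original and an ASCII-converted copy), and the dictionary layout of the annotation is an assumption made for illustration, not PubAnnotation's actual format.

```python
# Stand-off annotation stores character offsets instead of marking up the text.
# Both strings below are hypothetical variants of the same phrase.
original = "growth factor (TGF)-β-mediated FOXP3 induction"
ascii_variant = "growth factor (TGF)-beta-mediated FOXP3 induction"

# Annotate "FOXP3" against the original text (0-based begin, exclusive end).
begin = original.index("FOXP3")
annotation = {"begin": begin, "end": begin + len("FOXP3"), "label": "Protein"}


def covered_text(text, ann):
    """Return the substring that a stand-off annotation points at."""
    return text[ann["begin"]:ann["end"]]


# The same offsets select the protein name in the original text,
# but a shifted, unrelated substring in the ASCII-converted variant.
print(repr(covered_text(original, annotation)))
print(repr(covered_text(ascii_variant, annotation)))
```

Replacing a single character ("β") with four ("beta") shifts every later offset by three, so annotations made against one variant cannot be reused on the other without alignment.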

SLIDE 30

Alignment

PubAnnotation

GATA3-Driven Th2 Responses Inhibit TGF-1Induced FOXP3 Expression and the Formation of Regulatory T Cells

Transcription factors act in concert to induce lineage commitment towards Th1, Th2, or T regulatory (Treg) cells, and their counter-regulatory mechanisms were shown to be critical for polarization between Th1 and Th2 phenotypes. FOXP3 is an essential transcription factor for natural, thymus-derived (nTreg) and inducible Treg (iTreg) commitment; however, the mechanisms regulating its expression are as yet unknown. We describe a mechanism controlling iTreg polarization, which is overruled by the Th2 differentiation pathway. We demonstrated that interleukin 4 (IL-4) present at the time of T cell priming inhibits FOXP3. This inhibitory mechanism was also confirmed in Th2 cells and in T cells of transgenic mice overexpressing GATA-3 in T cells, which are shown to be deficient in transforming growth factor (TGF) b-mediated FOXP3 induction. This inhibition is mediated by direct binding of GATA3 to the FOXP3 promoter, which represses its transactivation process. ...

Local Annotation

This inhibitory mechanism was also confirmed in Th2 cells and in T cells of transgenic mice over-expressing GATA-3 in T cells, which are shown to be deficient in transforming growth factor (TGF)-beta-mediated FOXP3 induction.

107-113, Protein
208-213, Protein

SLIDE 31

Alignment

Upload & align

Local Annotation:
107-113, Protein
208-213, Protein

SLIDE 32

Alignment

Upload & align

Local Annotation:
107-113, Protein
208-213, Protein

PubAnnotation (after alignment):
838-844, Protein
936-941, Protein
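
The "upload & align" step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in aligner, which is not the alignment method PubAnnotation itself uses; `char_map` and `transfer_span` are hypothetical helper names; and `REFERENCE` is only a two-sentence excerpt of the abstract. Offsets are computed from the strings as transcribed here, so they will not reproduce the numbers on the slide.

```python
from difflib import SequenceMatcher

# Excerpt of the reference text and the locally annotated variant,
# transcribed from the slide (only two sentences of the abstract are used).
REFERENCE = (
    "We demonstrated that interleukin 4 (IL-4) present at the time of "
    "T cell priming inhibits FOXP3. This inhibitory mechanism was also "
    "confirmed in Th2 cells and in T cells of transgenic mice "
    "overexpressing GATA-3 in T cells, which are shown to be deficient in "
    "transforming growth factor (TGF) b-mediated FOXP3 induction."
)
LOCAL = (
    "This inhibitory mechanism was also confirmed in Th2 cells and in "
    "T cells of transgenic mice over-expressing GATA-3 in T cells, which "
    "are shown to be deficient in transforming growth factor "
    "(TGF)-beta-mediated FOXP3 induction."
)


def char_map(source, target):
    """Map each source character index to its aligned target index (or None)."""
    mapping = [None] * len(source)
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def transfer_span(mapping, begin, end):
    """Translate a (begin, end) span; return None if a boundary is unaligned."""
    if not (0 <= begin < end <= len(mapping)):
        return None
    new_begin, new_last = mapping[begin], mapping[end - 1]
    if new_begin is None or new_last is None:
        return None
    return new_begin, new_last + 1


mapping = char_map(LOCAL, REFERENCE)
for name in ("GATA-3", "FOXP3"):
    begin = LOCAL.index(name)
    end = begin + len(name)
    print(name, (begin, end), "->", transfer_span(mapping, begin, end))
```

Mapping the first and last character of a span, rather than the two boundary positions, keeps a span from stretching across text that was inserted or deleted right next to it.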

SLIDE 33

Alignment

  • Jin-Dong Kim, “A generalized LCS algorithm and its application to corpus alignment”, Proceedings of the 6th International Joint Conference on Natural Language Processing (IJCNLP), pp.14-18, 2013
    ✔ A definite solution to the text variant problem
    ✔ It can align even full paper articles sourced by two different groups.

SLIDE 34

Aligned annotations from different groups

SLIDE 35

Alignment

  • Aligned annotations

✔ http://www.pubannotation.org/

SLIDE 36

Global addressing system

  • Persistently preserve the texts of all the articles from PubMed / PMC(OA)
    ✔ UTF-8
    ✔ ASCII conversion is provided
  • Offset indices are stably maintained

SLIDE 37

A case of Google Maps

SLIDE 38

A case of PubAnnotation

  • Example of URL
    ✔ http://pubannotation.org/docs/sourcedb/PubMed/sourceid/10022882/spans/606-710/annotations/visualize
  • How to get the URL (Example)
    ✔ http://pubannotation.org/docs/sourcedb/PubMed/sourceid/10022882
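
The addressing scheme in the example URLs can be composed programmatically. The sketch below only mirrors the URL pattern shown on this slide; the function names are hypothetical, and whether the live service still accepts exactly these paths is an assumption.

```python
BASE_URL = "http://pubannotation.org"  # host as written on the slide


def doc_url(sourcedb, sourceid):
    """Address a whole document by its source database and source ID."""
    return f"{BASE_URL}/docs/sourcedb/{sourcedb}/sourceid/{sourceid}"


def span_visualize_url(sourcedb, sourceid, begin, end):
    """Address the annotations on one character span of a document."""
    if not (0 <= begin < end):
        raise ValueError("expected 0 <= begin < end")
    return f"{doc_url(sourcedb, sourceid)}/spans/{begin}-{end}/annotations/visualize"


print(doc_url("PubMed", "10022882"))
print(span_visualize_url("PubMed", "10022882", 606, 710))
```

Because the document text and its offsets are kept stable, such a URL can serve as a persistent address for a piece of text and the annotations on it.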

SLIDE 39

Global addressing system

  • Persistently preserve the texts of all the articles from PubMed / PMC(OA)
    ✔ UTF-8
    ✔ ASCII conversion is provided
  • Offset indices are stably maintained

SLIDE 40

Exchange

Hi Bill, what is the diagnostic test for MERS infection?

Check this link out. FYR, I've annotated it using NCIt, OBI, and SNOMEDCT.

SLIDE 42

Currently, in PubAnnotation

SLIDE 43

PubAnnotation

  • New version to be released soon
    ✔ Performance improved
    ✔ Interface improved
    ✔ BioC conversion to be supported
      ➔ Thanks to the NCBI team
    ✔ Bug fixes

SLIDE 44

TextAE

  • To access/edit annotations
    ✔ http://textae.pubannotation.org
    ✔ Has fully RESTful APIs

SLIDE 45

PubDictionaries

  • To share dictionary resources
    ✔ Current version
      ➔ http://pubdictionaries.org
    ✔ New version
      ➔ http://new.pubdictionaries.org

SLIDE 46

Different perspectives Shared Annotation Targets

SLIDE 47

Many literature annotation projects

SLIDE 48

None of them is complete

SLIDE 51

Then, why don't we collect & link them?

SLIDE 52

Biomedical Linked Annotation Hackathon

SLIDE 53

BLAH

  • Biomedical Linked Annotation Hackathon
    ✔ BLAH ➔ Feb. 2015, Kashiwa
    ✔ BLAH2 ➔ Nov. 2015, Mishima / Ito
    ✔ BLAHMUC ➔ Oct. 2016, Munich

SLIDE 54

BLAH

  • Biomedical Linked Annotation Hackathon
    ✔ BLAH ➔ Feb. 2015, Kashiwa
    ✔ BLAH2 ➔ Nov. 2015, Mishima / Ito
    ✔ BLAHMUC ➔ Oct. 2016, Munich
    ✔ BLAH3 ➔ Jan. 2017, Tokyo

SLIDE 55

Thank you! Happy October Blah!