French on the Internet Prof. SJ. Darmoni, MD, PhD TIBS, LITIS Lab - - PowerPoint PPT Presentation


CISMeF: Catalog & Index of Health Resources in French on the Internet. Prof. SJ. Darmoni, MD, PhD, TIBS, LITIS Lab, Rouen University Hospital & Rouen Medical School, France. Email: Stefan.Darmoni@chu-rouen.fr. MIE Oslo, August 2011.


slide-1
SLIDE 1

CISMeF

Catalog & Index of Health Resources in French on the Internet

  • Prof. SJ. Darmoni, MD, PhD

TIBS, LITIS Lab Rouen University Hospital & Rouen Medical School, France

Email: Stefan.Darmoni@chu-rouen.fr

MIE Oslo, August 2011

slide-2
SLIDE 2

Introduction

  • Quality-controlled subject gateways (or portals) were defined by Koch as Internet services that apply a comprehensive set of quality measures to support systematic resource discovery.
  • CISMeF = quality-controlled health gateway for French institutional health resources
      • www.cismef.org


slide-3
SLIDE 3

Introduction

  • The objective of CISMeF (Catalog and Index of French-speaking resources) is to assist health professionals and lay people in their search for electronic information available on the Internet. CISMeF covers healthcare disciplines and medical sciences.
  • CISMeF was originally initiated by Rouen University Hospital (RUH).
      • URL: http://www.cismef.org & http://www.chu-rouen.fr/cismef
  • CISMeF began in February 1995
  • Doc’CISMeF in 2000: creation of a generic search tool using the CISMeF semi-informal ontology
      • URL: http://doccismef.chu-rouen.fr/

Methods of Information in Medicine 2000 Jan;39(1):30-35
Medical Informatics & The Internet in Medicine 2001;26(3):165-178

slide-4
SLIDE 4

CISMeF terminology

  • Two standard tools for organising information:
      • the MeSH (Medical Subject Headings) thesaurus from the US National Library of Medicine
      • several metadata element sets
          • the Dublin Core metadata format + CISMeF-specific fields
          • for teaching resources, the IEEE 1484 LOM metadata format: 11 elements of the LOM Educational category => DC.Education
          • for evidence-based medicine resources, CISMeF-specific fields: level of evidence + method used to evaluate it
          • the HIDDEL metadata set, used to enhance the transparency, trust and quality of health information on the Internet
  • Do not reinvent the wheel +++ but adapt it

DC-2004, International Conference on Dublin Core and Metadata Applications
Stud Health Technol Inform. 2003;95:707-712
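As an illustration of the "Dublin Core + specific fields" idea above, here is a minimal Python sketch of a catalogue record. The class name, the field names and the sample values are all hypothetical; they are not the actual CISMeF schema, only one plausible way of combining a few Dublin Core elements with extra fields for resource type and level of evidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical record: a few Dublin Core elements plus CISMeF-style extras."""

    # A subset of the standard Dublin Core elements
    title: str
    creator: str
    language: str
    identifier: str  # typically the resource URL
    subject: List[str] = field(default_factory=list)  # e.g. MeSH descriptors

    # Extra fields (illustrative names, NOT the real CISMeF field names)
    resource_types: List[str] = field(default_factory=list)
    level_of_evidence: Optional[str] = None
    evidence_method: Optional[str] = None

    def is_evidence_based(self) -> bool:
        # An EBM record carries both a level of evidence and the rating method
        return self.level_of_evidence is not None and self.evidence_method is not None


# Made-up sample record
record = ResourceRecord(
    title="Example clinical guideline",
    creator="Example national society",
    language="fr",
    identifier="http://example.org/guideline",
    subject=["Asthma"],
    resource_types=["guideline"],
    level_of_evidence="A",
    evidence_method="hypothetical grading scale",
)
```

Keeping the standard elements and the local extensions in one flat record mirrors the "adapt it" message of the slide: the Dublin Core part stays interoperable, while the extra fields remain optional.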

slide-5
SLIDE 5

MeSH ‘enhancements’

  • The heterogeneity of Internet health resources led the CISMeF team to enhance the MeSH thesaurus with the introduction of two new concepts, plus predefined queries:
      • resource types (N≈300)
      • metaterms (N≈120)
      • predefined queries (N≈200)

Health Information and Libraries Journal 2004 Dec;21(4):253-61

slide-6
SLIDE 6

MeSH ‘enhancements’

  • Improvement of the MeSH thesaurus itself
      • Addition of 10,000 French synonyms, including (ambiguous) acronyms
      • Manual translation of 6,000 definitions (semi-automatic translation of the rest of the MeSH to follow soon)
      • French translation of >20,000 MeSH Supplementary Concepts (SC) & addition of 6,000 synonyms


slide-7
SLIDE 7

Strategic revolution in 2005

  • Between 1995 & 2005: a mono-terminological world built around the MeSH
  • Since 2005: shift to a multi-terminological universe:
      • CCAM, CIM10, SNOMED Int., CIF/CIH, CISP2, DRC
      • Creation of a French Health Multi-Terminological Server (HMTS): ANR, InterSTIS
      • Multi-terminological extraction (7th FP EU, PSIP)
      • Multi-terminological information retrieval (JFIM 2009)
  • Several health terminologies are used for automatic indexing and information retrieval in the CISMeF quality-controlled health portal… and beyond
  • Can be reused in any European language if health terminologies are available in your language, in particular in Norway!

slide-8
SLIDE 8

CISMeF Information System: situation in August 2011

[Diagram labels: 32 health terminologies; Multi-terminology automatic indexing (ECMT, InfoRoute); Multi-terminology information retrieval (Doc’CISMeF search engine); European Health multi-Termino-Ontology Cross-lingual Portal (EHTOP)]

TIBS: Information processing in biology and in health

  • Prof. SJ. Darmoni
slide-9
SLIDE 9

Multi-Terminological extraction

  • Collaboration with the Vidal company
  • F-MTI & ECMT tools
      • 3 PhDs (A. Névéol, S. Pereira & S. Sakji)
  • Bag-of-words algorithm, stemming (or lemmatization)
  • Inclusion of health terminologies available in French
      • SNOMED Int, ICD 10, MeSH, MeSH SC, ICDC (included in UMLS)
      • ATC, CIF (WHO)
      • CCAM, DRC, Orphanet, TUV, CIS, CIP, INN, Brand Names
      • MedDRA, WHO-ART, LOINC (to be included)
  • Recent study on the CISMeF corpus, MonoT vs. MultiT (AMIA 2009): +7% recall; -12% precision

slide-10
SLIDE 10

Multi-Terminological extraction

  • New concept: automatic affiliation of a subheading to a MeSH term
  • Manual affiliation of a subheading to a MeSH Supplementary Concept (evaluation still to be performed)
  • Stoilos & Levenshtein distances (PhD Z. Moalla, Y2)
  • In the near future, MeSH indexing at the concept level rather than at the descriptor level
      • Of interest for rare diseases; potential collaboration

slide-11
SLIDE 11

IR in CISMeF: currently

  • Only three steps
      • Step 1: Reserved terms (∈ CISMeF terminology) OR document's title
      • Step 2: The CISMeF metadata. Mixing the reserved terms, all fields and adjacency in the titles (word adjacency: (n-1)*5)
      • Step 3: Adjacency in the plain texts. Mixing the reserved terms, all fields and adjacency in the plain texts (word adjacency: (n-1)*10)
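The word-adjacency figures above can be illustrated with a short Python sketch. It rests on an assumption the slide does not state: that (n-1)*5 and (n-1)*10 are the maximum spans, counted in words, within which all n query words must occur (factor 5 for titles, 10 for plain text). The function names and sample texts are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def max_span(n_words: int, factor: int) -> int:
    """Assumed reading of the slide: allowed span is (n-1) * factor words."""
    return (n_words - 1) * factor


def words_adjacent(query: str, text: str, factor: int) -> bool:
    """True if every query word occurs inside one window of at most max_span words."""
    wanted = set(tokenize(query))
    limit = max_span(len(wanted), factor)
    # Positions in the text of the tokens that belong to the query
    hits = [(pos, tok) for pos, tok in enumerate(tokenize(text)) if tok in wanted]
    counts: Counter = Counter()
    left = 0
    for pos, tok in hits:
        counts[tok] += 1
        # Drop leftmost hits that are duplicated further right in the window
        while counts[hits[left][1]] > 1:
            counts[hits[left][1]] -= 1
            left += 1
        if len(counts) == len(wanted) and pos - hits[left][0] <= limit:
            return True
    return False


long_text = "heart disease is common and many older patients develop kidney failure"
title_match = words_adjacent("heart failure", long_text, factor=5)   # span 10 > 5
text_match = words_adjacent("heart failure", long_text, factor=10)   # span 10 <= 10
```

Under this reading, the looser factor used for plain text accepts matches that the stricter title factor rejects, which fits the idea of progressively broader steps.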


slide-12
SLIDE 12

CISMeF Information Retrieval

  • Since 2005, four levels of indexing in CISMeF
      • Level 1: manual indexing (e.g. guidelines)
      • Level 2: supervised indexing (e.g. technical report or teaching document from national medical societies)
      • Level 3: automatic indexing (e.g. SCPs, teaching document from one medical school)
      • Level 4: extending the CISMeF corpus => Google CISMeF (restricted to publishers included in CISMeF)

slide-13
SLIDE 13


slide-14
SLIDE 14

CISMeF Information Retrieval

  • Some differences with PubMed
      • Automatically indexed resources are included
  • CISMeF resource ranking
      • Analysis of the query
      • MeSH Major (or Title) first (display of score)
          • Then, date (as in PubMed)
      • Automatic (Title or SubTitle)
      • Minor MeSH

slide-15
SLIDE 15

Multi-Terminological Information Retrieval

  • RIMT using the same health terminologies, integrated into the CISMeF back office
      • Operational in Doc’CISMeF since April 2009 (test)
      • Bi-terminological in the PSIP DIP since September 2008
  • Bag-of-words algorithm, stemming
  • Double context
      • Knowledge (CISMeF) + contextual knowledge
          • PhD Saoussen Sakji, Dec. 2010 (Tunisia)
      • Electronic Health Record (EHR)
          • PhD AD Dirieh-Dibad, 4Y, planned March 2012 (Djibouti)
slide-16
SLIDE 16

[Figure: panels labelled ATC and MeSH]

slide-17
SLIDE 17

Results in the Doc’CISMeF search engine

  • Use of multi-terminology indexing with SNOMED & MedDRA + MeSH indexing

[Figure labels: Multi-terminology manual indexing using PTS; Multi-terminology information retrieval]

slide-18
SLIDE 18

CISMeF & PTS

  • During 2009, in collaboration with 8 engineering students from INSA de Rouen, and with LERTIM & MONDECA, the CISMeF team developed a Multi-Terminology Health Portal (PTS, its French acronym).
  • Since 2007, design of a generic model to integrate the main terminologies and ontologies available in French
  • The health terminologies currently included in PTS are:
      • MeSH (+ MeSH SC + CISMeF extension), SNOMED Int, CCAM, ICD10 & ICPC2 (InterSTIS project)
      • ATC, ICF, WHO-ART, WHO-ICPS (WHO)
      • DRC, MedDRA, MEDLINEPlus
      • CIS, CIP, CAS, EC, INN, Brand Names, PSIP taxonomy, IUPAC, NCC-MERP (PSIP Project)
      • Orphanet (rare diseases), LPP & Cladimed (medical devices)
      • TUV (Vidal), FMA, SNOMED CT, LOINC
      • ADICAP, NCIT (to be included)


slide-19
SLIDE 19

HMTP generic model


slide-20
SLIDE 20

Health Multi-Terminology Portal (HMTP; PTS)

  • URL: http://pts.chu-rouen.fr/
  • Access for humans and computers (Web services)
      • Since September 2010, used daily by the CISMeF team to index Web resources manually and automatically
      • Since January 2011, MeSH is freely available (600 unique users per working day)
  • Restricted access to the other terminologies (230 registered users)
  • Cooperation with BioPortal: Clement Jonquet & Mark Musen


slide-21
SLIDE 21

Main figures

Date          Terminologies   Concepts      Synonyms      Definitions   Relations & hierarchies
May 2010      25              > 580 000     > 840 000     > 220 000     > 1 200 000
August 2011   32              > 1 100 000   > 2 300 000   > 220 000     > 4 000 000

slide-22
SLIDE 22

Future work

  • EHTOP
      • ICD10 in five European languages
      • URL: cispro.chu-rouen.fr/ehtop_site
      • Procedures & medical devices T/O
  • RIDoPI: information retrieval on EHR
      • Numerical data
      • Temporal data
      • RAVEL (2012-2014 ANR TecSan program)
  • Interface terminologies
  • Multi-lingual search engine (already multi-T/O)
  • Teaching document: http://www.univ-rouen.fr/med/breeze/SDinserm/index.htm

slide-23
SLIDE 23

Many thanks

  • Email: Stefan.Darmoni@chu-rouen.fr

slide-24
SLIDE 24
slide-25
SLIDE 25

Scientific Council, 4 May 2007

Laboratoire d’Informatique Traitement de l’Information et des Systèmes

EA 4108

TIBS Information processing in biology and in health

  • Prof. SJ. Darmoni

CISMeF terminology Encapsulated MeSH Metaterms Resource types Strategy searches Metadata

Implicit Information Retrieval: NLP, text mining, ontology

Textual automatic indexing NLP, KNN Categorization

  • B. Thirion, C. Letord
  • G. Kerdelhué, J. Piot
  • L. Soualmia
  • A. Neveol

Other Medical Terminologies & Dictionaries UMLF, VUMeF, VODeL, PIH

  • S. Pereira
  • A. Rogozan

Health Information Systems

  • P. Massari
  • B. Dahamna
  • IM. Kergourlay

LERTIM / INSA / Mondeca

Cross lingual Multi terminology Portal EHTOP

32 T/O; InterSTIS; Multi-terminology information retrieval; Multi-terminology automatic indexing; Computer-assisted coding system

MONOTERMINOLOGY 1995-2005 MULTITERMINOLOGY 2005-

Saoussen Sakji Semantic Interoperability Intra and Inter Terminologies in Health

  • T. Merabti
  • T. Lecroq
  • M. Joubert / JF. Gehanno
  • M. Joubert
  • M. Joubert / CIFRE Vidal

French Infobutton: Contextual Knowledge

  • S. Pereira
slide-26
SLIDE 26

Introduction (cont.)

  • Three priority axes:
      • evidence-based medicine
      • teaching material
      • patient information
  • >81,000 resources included
  • 10,000 unique machines/working day
  • CISMeF team in 2011: N = 14
      • 1.5 medical informaticians
      • 1 chief medical librarian + 2.5 medical librarians
      • 1 computer scientist (one junior lecturer) + 1 medical resident
      • 3 research engineers
      • 2 postdocs + 3 PhD students
  • Budget ≈ 500 K€/y; 40% RUH
  • 30 grants in the last ten years for CISMeF (TIBS, LITIS)