Structuring Medical Records with Apache Stanbol

Rafa Haro, Senior Software Engineer, Athento
• Committer, PMC Member @ Apache Stanbol, Apache ManifoldCF
• Topics: Document Analysis, NLP, Machine Learning, Semantic Technologies, ECM

Antonio Pérez Morales, Senior Software Engineer, Ixxus
• Committer @ Apache Stanbol, Apache ManifoldCF
• Topics: ECM, Semantic Search, ETL, Machine Learning

Apache Stanbol provides a set of reusable components for semantic content management. It extends existing CMSs with a number of semantic services.
(Diagram: Traditional CMS vs. Semantic CMS)

Software Architecture for Semantically Enabled CM and ECM systems

Apache Stanbol Story
• Started within the FP7 European project IKS (Interactive Knowledge Stack, 2009-2012)
• The IKS project brought together an open source community for defining and building platforms in the semantic CMS space
• Incubated in November 2010
• Successfully promoted within the CMS and ECM industry through the IKS Early Adopters Program
• Graduated to a Top-Level Apache Project in October 2012

What is a Semantic CMS?
Traditional CMS → Semantic CMS
• Atomic unit: Document → Atomic unit: Entity
• Properties as metadata (key-value schemas) → Semantic metadata (RDF)
• Keyword Search → Semantic Search
• Document Management → Knowledge Management
• Document Types → Entity Management
• Document Workflow → Ontologies
Source: "What Apache Stanbol Can Do for You?", Fabian Christ, ApacheCon Europe 2012

Key Points
• Designed to bring Semantic Technologies to existing CMSs
• Non-intrusive set of RESTful "semantic" services
• Extremely modular: use only the modules you need
• Main features:
  • Multilingual Content Enhancement: structure content through semantic metadata
  • Knowledge Base Management
  • Knowledge Models and Reasoning
  • Semantic Indexing and Search

Stanbol Components
• Stanbol components provide:
  • A RESTful API
  • Java APIs and OSGi services
• Stanbol components do NOT depend on each other; however, they can easily be combined
(Diagram: Component Layer with Enhancer, Enhancement Engines, EntityHub, ContentHub, FactStore, Ontology Manager, Reasoners, Rules, CMS Adapter)

Stanbol Components (II)
• Enhancer: extracts knowledge from unstructured parsed content
• EntityHub: manage domain entities and topics (knowledge bases)
• ContentHub: semantic indexing / search over your (semantically enhanced) content
• CMS Adapter: sync your CMS with Apache Stanbol (JCR/CMIS)
• Ontology Manager: manage your formal domain knowledge
• Reasoners & Rules: apply domain knowledge to improve / validate extracted information; refactor / refine knowledge to align it with public schemas such as schema.org

Built on Top of Apache…
• Apache Felix as OSGi environment
• Apache Sling launchers and OSGi tools
• Apache Maven for building
• Apache Clerezza as RDF framework
• Apache Jena as triple store
• Apache Solr for knowledge base management
• Apache Tika for converting input
• Apache OpenNLP for NLP processing

Integration Scenarios
• Stand-alone server (Stanbol launchers)
• Web application (servlet container)
• Embedded within an OSGi environment
Source: "What Apache Stanbol Can Do for You?", Fabian Christ, ApacheCon Europe 2012

Project Current Status
Timeline:
• Apache Stanbol incubation (Nov 2010)
• 0.9.0-incubating (Aug 2012)
• Apache Stanbol graduation (October 2012)
• IKS project ending (Dec 2012)
• 0.12.0 (March 2014)
• 1.0.0 (October 2016)
(Chart: contributions (commits) to trunk since incubation)

Project Current Status (II)
• 22 PMC members (last addition Jul 2016)
• 26 committers (last addition May 2015)
• 3-5 active committers in the last 2 years
• dev@stanbol.apache.org: 228 subscribers
• Activity has been gradually decreasing
• 3 major releases
Source: Apache Stanbol Committee Report Helper (https://reporter.apache.org/?stanbol)

Stanbol Enhancer
(Diagram: Enhancer output as RDF)
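Content reaches the Enhancer over HTTP and comes back as RDF. A minimal sketch of building that call with the JDK's java.net.http API, assuming a Stanbol launcher at http://localhost:8080, plain-text input, Turtle as the requested RDF serialization, and chain names that need no URL escaping. The /enhancer and /enhancer/chain/{chain-name} paths come from the Enhancer API; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of an Enhancer REST client. Assumptions (not from the slides): a launcher at
// http://localhost:8080, text/plain input, text/turtle output, URL-safe chain names.
class EnhancerClient {

    // POST to /enhancer (default chain), or to /enhancer/chain/{chainName} if given.
    static HttpRequest enhanceRequest(String baseUrl, String chainName, String text) {
        String path = (chainName == null) ? "/enhancer" : "/enhancer/chain/" + chainName;
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "text/turtle") // requested RDF serialization
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }

    // Sends the request and returns the response body; needs a running Stanbol.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        HttpRequest request = enhanceRequest("http://localhost:8080", null,
                "Paciente con EPOC moderada");
        System.out.println(request.method() + " " + request.uri());
        // With Stanbol running: System.out.println(send(request));
    }
}
```

Building the request separately from sending it keeps the endpoint logic testable without a server.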

Stanbol Enhancer (II)

Stanbol Enhancer (III)

Stanbol Enhancement Chains
• Define how content is processed by the Enhancer through an ExecutionPlan
• Different implementations:
  • ListChain: engines are executed sequentially, in the listed order; parallel execution of engines is not supported
  • WeightedChain: the ExecutionPlan is calculated from the engines' ordering metadata; parallel execution of engines is allowed
• API:
  • /enhancer: executes the default chain
  • /enhancer/chain/{chain-name}: executes a specific named chain
  • /enhancer/engine/{engine-name}: executes a specific named engine
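The WeightedChain idea can be sketched as follows. This is a simplified illustration, not Stanbol's WeightedChain code: engines are grouped by their ordering value, higher values first (which we believe matches Stanbol's convention, but treat it as an assumption), and engines that share a value have no ordering constraint between them, so they could run in parallel. Engine names and ordering numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified WeightedChain-style planner (illustration only, not Stanbol's code).
class ExecutionPlanSketch {

    record Engine(String name, int ordering) {}

    // Returns execution steps; every engine inside one step may run in parallel.
    static List<List<String>> plan(List<Engine> engines) {
        // Reverse order: the highest ordering value forms the first step (assumption).
        SortedMap<Integer, List<String>> byOrdering = new TreeMap<>(Comparator.reverseOrder());
        for (Engine e : engines) {
            byOrdering.computeIfAbsent(e.ordering(), k -> new ArrayList<>()).add(e.name());
        }
        return new ArrayList<>(byOrdering.values());
    }

    public static void main(String[] args) {
        // Hypothetical ordering values, chosen only for the example.
        List<Engine> engines = List.of(
                new Engine("tika", 200),
                new Engine("langdetect", 150),
                new Engine("opennlp-ner", 0),
                new Engine("entity-linking", 0),
                new Engine("dereference", -100));
        System.out.println(plan(engines));
        // [[tika], [langdetect], [opennlp-ner, entity-linking], [dereference]]
    }
}
```

A ListChain, by contrast, would correspond to one engine per step in the configured order.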

Current Enhancement Engines
• Preprocessing
  • Tika Engine: content type detection; text extraction from several document formats; metadata extraction from several document formats
• Natural Language Processing
  • Language Detection (different implementations)
  • Sentence Detection (OpenNLP, SmartCN, REST)
  • Tokenizer (OpenNLP, SmartCN, REST)
  • POS Tagging (OpenNLP, REST)
  • Chunking (OpenNLP, REST)
  • NER (OpenNLP, OpenCalais, REST)
• Entity Linking
  • Named Entity Linking
  • EntityHub Linking Engine
  • FST (Lucene Finite State Transducer) Linking Engine
  • Entity Co-mention
• Commercial Engines (OpenCalais, Zemanta, CELI…)
• Sentiment Analysis
• Disambiguation
  • DBpedia Spotlight
  • Solr MLT based
• Post-processing
  • Dereferencing

Stanbol EntityHub

Stanbol EntityHub (II)
• Manages multiple entity sources (knowledge bases)
• Allows fast entity lookup using Apache Solr
• Referenced Site (remote Linked Data + local caches) vs. Managed Site (entity CRUD API over manually configured sites)
• API:
  • Query for entities (used by the Entity Linking engines):
      curl -X POST -d "name=lyon&limit=10" \
        http://localhost:8080/entityhub/site/dbpedia/find
  • CRUD for Managed Sites
• LDPath support for:
  • Graph path retrieval (used for dereferencing), e.g. friend-names = foaf:knows/foaf:name
  • Schema translation, e.g. schema:name = rdfs:label[@en];
  • Simple reasoning
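The same EntityHub query can be issued from Java. The sketch below only builds the request with the JDK's java.net.http API: the form fields name and limit and the /entityhub/site/{site}/find path come from the curl example, while the base URL and the helper names are assumptions. As with curl -d, the body is sent form-encoded.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Java counterpart of the curl call (sketch; assumes Stanbol at http://localhost:8080).
class EntityHubFind {

    // Form-encoded body, e.g. "name=lyon&limit=10"; spaces become '+'.
    static String formBody(String name, int limit) {
        return "name=" + URLEncoder.encode(name, StandardCharsets.UTF_8) + "&limit=" + limit;
    }

    static HttpRequest findRequest(String baseUrl, String site, String name, int limit) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/entityhub/site/" + site + "/find"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(name, limit)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = findRequest("http://localhost:8080", "dbpedia", "lyon", 10);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody("lyon", 10)); // name=lyon&limit=10
    }
}
```

Sending works as for any java.net.http request, via HttpClient.send against a running instance.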

Use Case: Hexin Project - Structuring Medical Records
• R&D project for Sergas (Galician Public Health Office)
• Clinical data analysis platform supporting:
  • Clinical assistance
  • Epidemiology studies
  • Medical research
• Big Data approach for analyzing both structured historical clinical data and unstructured medical records
• Medical records are written in Spanish and Galician

Hexin: Architecture
(Architecture diagram. Components: Data Source, ETL, Big Data store (HDFS + Hive), Cassandra, Structured Events, Semantic Events, Rules, Event Detection Process, New Case Detection Process, Reference Cases, BI, Patient Validation, Analysis. Example event: PatientId, Date, URX, Symptoms: Cough, Unrest, Fever>38)

Hexin: Semantic Tagging

Hexin: Objective
"Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD" (Diabetic patient since the age of 5, with moderate COPD, GOLD grade 2)

Hexin: Solution Design
• Structure medical records using the Apache Stanbol Enhancer
• Custom ontology:
  • Symptoms
  • Diseases
  • Diagnosis Tests
  • Family and Personal History
• Custom enhancement chain:
  • Language Detection > NLP > Entity Linking > Negation Detection > Fact Extraction
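The Entity Linking step of this chain can be pictured with a toy longest-match dictionary lookup: at each token position, the longest token span whose text is an ontology label wins. This is only an illustration, not Stanbol's EntityLinking or FST engine, which add language-aware tokenization, fuzzy matching and ranking. The labels and hexin: URIs below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy longest-match dictionary linker (illustration only, not a Stanbol engine).
// Label keys must be lowercase; the hexin: URIs are hypothetical.
class DictionaryLinker {

    record Link(String surface, String entity) {}

    // Lowercases and splits on anything that is not a letter or a digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // At each position, try the longest span first (up to maxTokens tokens).
    static List<Link> link(String text, Map<String, String> labels, int maxTokens) {
        List<String> tokens = tokenize(text);
        List<Link> links = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int n = Math.min(maxTokens, tokens.size() - i); n > 0 && matched == 0; n--) {
                String candidate = String.join(" ", tokens.subList(i, i + n));
                String entity = labels.get(candidate);
                if (entity != null) {
                    links.add(new Link(candidate, entity));
                    matched = n;
                }
            }
            i += Math.max(matched, 1); // skip the matched span, or advance one token
        }
        return links;
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of(
                "diabético", "hexin:Diabetes",
                "epoc", "hexin:COPD",
                "epoc moderada", "hexin:ModerateCOPD");
        String text = "Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD";
        link(text, labels, 3).forEach(System.out::println);
    }
}
```

On the objective sentence, the two-token label "epoc moderada" is preferred over the single-token "epoc".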

Hexin: Ontology

Hexin: Ontology Indexing
• To support the Entity Linking process against the Hexin Ontology, an EntityHub site must be created
• 2 options:
  • ManagedSite: full CRUD storage (dynamic)
  • ReferencedSite: read-only remote site + local index
• Stanbol EntityHub Indexing Tool:
  • RDF → Jena TDB → Solr index
  • Configure custom namespaces, mappings and properties, e.g.:
      hexin:*
      hexin:label > rdfs:label
  • Generates an OSGi bundle with the Yard and YardSite default configurations
• Copy the index to the Stanbol /datafiles folder and install the bundle using the Apache Felix OSGi Web Console
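The effect we expect from the two mapping lines can be written out in plain Java. This is not the indexing tool's implementation, only an illustration under two assumptions: properties not matched by any mapping are not indexed, and ">" copies values (rather than moving them). Copying hexin:label into rdfs:label matters because, as far as we know, the EntityHub linking engines match against rdfs:label by default. The entity below is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration of the mapping semantics assumed above (not the indexing tool's code):
//   hexin:*                   keep every property in the hexin: namespace
//   hexin:label > rdfs:label  also index hexin:label values under rdfs:label
class FieldMappingSketch {

    static Map<String, List<String>> apply(Map<String, List<String>> entity) {
        Map<String, List<String>> indexed = new TreeMap<>();
        entity.forEach((property, values) -> {
            if (property.startsWith("hexin:")) {                 // hexin:*
                indexed.put(property, new ArrayList<>(values));
            }
        });
        List<String> labels = entity.get("hexin:label");          // hexin:label > rdfs:label
        if (labels != null) {
            indexed.computeIfAbsent("rdfs:label", k -> new ArrayList<>()).addAll(labels);
        }
        return indexed;
    }

    public static void main(String[] args) {
        // Hypothetical ontology entity; owl:sameAs matches no mapping, so it is dropped.
        Map<String, List<String>> epoc = Map.of(
                "hexin:label", List.of("EPOC"),
                "hexin:type", List.of("Disease"),
                "owl:sameAs", List.of("http://example.org/copd"));
        System.out.println(apply(epoc));
        // {hexin:label=[EPOC], hexin:type=[Disease], rdfs:label=[EPOC]}
    }
}
```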

Hexin: Enhancement Chain
Lang. Detect. → OpenNLP-Sent. → OpenNLP-Token → OpenNLP-POS → OpenNLP-Chunker → Hexin Linking → Negex → Fact Extract.
Legend:
• Pre-processing engine, available in Stanbol: Lang. Detect.
• NLP engines, available in Stanbol with their default configuration: OpenNLP-Sent., OpenNLP-Token, OpenNLP-POS, OpenNLP-Chunker
• Entity Linking engine, available in Stanbol with a custom configuration for this use case: Hexin Linking
• Custom Hexin engines, implemented for the project: Negex, Fact Extract.
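The idea behind a NegEx-style negation step can be shown with a much-simplified sketch: a finding counts as negated when a negation trigger occurs within a fixed window of tokens before it, unless a scope terminator appears in between. This is not the Hexin Negex engine; real NegEx also handles post-finding triggers and pseudo-negations. The Spanish trigger and terminator lists and the window size are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Much-simplified NegEx-style negation detection (illustration, not the Hexin engine).
// Triggers, terminators and window size are hypothetical.
class NegexSketch {

    static final Set<String> TRIGGERS = Set.of("no", "sin", "niega", "descarta");
    static final Set<String> TERMINATORS = Set.of("pero", "aunque");
    static final int WINDOW = 5;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Scans backwards from the finding at index pos, up to WINDOW tokens.
    static boolean isNegated(List<String> tokens, int pos) {
        for (int i = pos - 1; i >= Math.max(0, pos - WINDOW); i--) {
            String t = tokens.get(i);
            if (TERMINATORS.contains(t)) return false; // negation scope closed
            if (TRIGGERS.contains(t)) return true;
        }
        return false;
    }

    // Maps each single-token finding present in the text to its negation status.
    static Map<String, Boolean> classify(String text, Set<String> findings) {
        List<String> tokens = tokenize(text);
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (findings.contains(tokens.get(i))) {
                result.put(tokens.get(i), isNegated(tokens, i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify("Paciente sin fiebre pero con tos", Set.of("fiebre", "tos")));
        // {fiebre=true, tos=false}
    }
}
```

Running negation detection before fact extraction, as in the chain above, lets the extracted facts carry an "absent" flag instead of recording a symptom the patient does not have.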

Hexin: Linking

Hexin: Linking (II)

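Hexin: Custom Engines

```java
import org.apache.felix.scr.annotations.Activate;
import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Service;
import org.apache.stanbol.enhancer.servicesapi.ContentItem;
import org.apache.stanbol.enhancer.servicesapi.EngineException;
import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
import org.osgi.service.component.ComponentContext;

@Component
@Service // registered by OSGi as an EnhancementEngine service
public class MyEngine implements EnhancementEngine {

    @Activate
    protected void activate(ComponentContext ctx) {
        // initialize, configure, ... (e.g. read ctx.getProperties())
    }

    @Override
    public int canEnhance(ContentItem item) throws EngineException {
        // example check (assumption): only accept plain-text content
        if ("text/plain".equals(item.getMimeType())) {
            return ENHANCE_SYNCHRONOUS;
        }
        return CANNOT_ENHANCE;
    }

    @Override
    public void computeEnhancements(ContentItem item) throws EngineException {
        // run the engine and add results to the item's RDF graph (item.getMetadata())
        // based on the item's InputStream (item.getStream())
    }

    @Override
    public String getName() { return "my-engine"; }
}
```

• Maven build: the maven-bundle-plugin adds the OSGi metadata to MANIFEST.MF; the maven-scr-plugin adds the services metadata
• Install the resulting OSGi bundle in Stanbol: OSGi registers MyEngine as a service, no restart needed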

NLP at Apache Stanbol
