Structuring Medical Records with Apache Stanbol
Rafa Haro, Senior Software Engineer, Athento
Antonio Pérez Morales, Senior Software Engineer, Ixxus
Committer, PMC Member @ Apache Stanbol, Apache ManifoldCF
Topics: Document Analysis,
Stanbol, Apache ManifoldCF
Learning, Semantic Technologies, ECM
ManifoldCF
Machine Learning
Apache Stanbol provides a set of reusable components for semantic content management. It extends existing CMSs with a number of semantic services.
Traditional Semantic
Software Architecture for Semantically Enabled CM and ECM systems
Apache Stanbol Story
IKS (Interactive Knowledge Stack, 2009 - 2012)
Defining and Building Platforms in the Semantic CMS Space
IKS Early Adopters Program
What is a Semantic CMS?
Traditional CMS: Atomic Unit: Document; Properties as meta-data (key-value schemas); Keyword Search; Document Management; Document Types; Document Workflow
Semantic CMS: Atomic Unit: Entity; Semantic meta-data (RDF); Semantic Search; Knowledge Management; Entity Management; Ontologies
Source: "What Apache Stanbol Can Do for You?", Fabian Christ, ApacheCon Europe 2012
Key Points
Metadata
Stanbol Components
Apache Stanbol Component Layer
Apache Stanbol Reasoners
Apache Stanbol Enhancer
Apache Stanbol Rules
Apache Stanbol Ontology Manager
Apache Stanbol ContentHub
Apache Stanbol EntityHub
Apache Stanbol FactStore
Stanbol Enhancement Engines
Apache Stanbol CMS Adapter
Stanbol Components (II)
refine knowledge to align it to public schemas such as schema.org
Built on Top of Apache….
Integration Scenarios
Source: "What Apache Stanbol Can Do for You?", Fabian Christ, ApacheCon Europe 2012
(Stanbol Launchers)
(Servlet-Container)
OSGi environment
Project Current Status
Contributions (commits) to Trunk Since Incubation
Incubation (Nov 2010)
Apache Stanbol 0.9.0-incubating (Aug 2012)
Graduation (October 2012)
IKS Project Ending (Dec 2012)
Apache Stanbol 0.12.0 (March 2014)
Apache Stanbol 1.0.0 (October 2016)
Project Current Status (II)
Source: Apache Stanbol Committee Report Helper (https://reporter.apache.org/?stanbol)
Stanbol Enhancer
Stanbol Enhancer (II)
Stanbol Enhancer (III)
Stanbol Enhancement Chains
supported
engines allowed
Current Enhancement Engines
Stanbol EntityHub
Stanbol EntityHub (II)
LDPath program:
    schema:name = rdfs:label[@en];
    friend-names = foaf:knows/foaf:name;
Entity lookup:
    curl -X POST -d "name=lyon&limit=10" \
        http://localhost:8080/entityhub/site/dbpedia/find
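The same lookup can be issued from Java. The following is a minimal sketch, assuming Java 11 or later and its built-in java.net.http API. The class and method names (EntityhubFindRequest, formBody, buildRequest) are illustrative and not part of Stanbol; the URL is the local endpoint used in the curl call.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Stanbol class: builds the POST request
// that the curl example sends to the Entityhub "find" endpoint.
final class EntityhubFindRequest {

    // Endpoint taken from the curl example (local Stanbol instance, dbpedia site).
    static final String FIND_URL = "http://localhost:8080/entityhub/site/dbpedia/find";

    // Encodes the two form parameters used in the curl example.
    static String formBody(String name, int limit) {
        return "name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&limit=" + limit;
    }

    // Builds the request without sending it.
    static HttpRequest buildRequest(String name, int limit) {
        return HttpRequest.newBuilder(URI.create(FIND_URL))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(name, limit)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("lyon", 10);
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running instance:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Building the request separately from sending it keeps the sketch usable without a running Stanbol instance.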
Use Case: Hexin Project - Structuring Medical Records
clinical data and unstructured medical records
Hexin: Architecture
[Architecture diagram. Components: Data Source; URX; ETL; Big Data (HDFS + Hive); Event Detection Process; Cassandra; Reference Cases Detection Process; New Case; BI; Validation; Analysis; Patient; Rules. Event record fields: PatientId, Date, Structured Events, Semantic Events. Example symptoms: Unrest, Cough, Fever>38.]
Hexin: Semantic Tagging
Hexin: Objective
“Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD” (English: "Diabetic patient since the age of 5, with moderate COPD, GOLD grade 2")
Hexin:Solution Design
Detection > Fact Extraction
Hexin: Ontology
Hexin: Ontology Indexing
an EntityHub site must be created
configurations
using Apache Felix OSGi Web Console
Field mappings:
    hexin:*
    hexin:label > rdfs:label
Hexin: Enhancement Chain
Chain engines (diagram labels): OpenNLP-Sent., OpenNLP-Token, OpenNLP-POS, OpenNLP-Chunker, Hexin Linking, Negex, Fact Extract.
Custom Hexin Engine. Implemented for the project
Entity Linking Engine. Available in Stanbol with a Custom Configuration for this use case
NLP Engines. Available in Stanbol. Default Configuration
Pre-Processing Engine. Available in Stanbol
Hexin: Linking
Hexin: Linking (II)
Hexin: Custom Engines
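import org.apache.felix.scr.annotations.Activate;
import org.apache.felix.scr.annotations.Component;
import org.apache.felix.scr.annotations.Service;
import org.apache.stanbol.enhancer.servicesapi.ContentItem;
import org.apache.stanbol.enhancer.servicesapi.EngineException;
import org.apache.stanbol.enhancer.servicesapi.EnhancementEngine;
import org.osgi.service.component.ComponentContext;

@Component
@Service
public class MyEngine implements EnhancementEngine {

    @Activate
    public void activate(ComponentContext c) {
        // initialize, configure, ...
    }

    // Tells the Enhancer whether this engine can process the item.
    public int canEnhance(ContentItem item) throws EngineException {
        if (...item matches our expectations...) {
            return ENHANCE_SYNCHRONOUS;
        } else {
            return CANNOT_ENHANCE;
        }
    }

    public void computeEnhancements(ContentItem item) throws EngineException {
        // run the engine and add results to item's
        // RDF graph based on the item's InputStream
    }
}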
Maven build
maven-bundle-plugin: adds OSGi metadata (MANIFEST.MF)
maven-scr-plugin: adds services metadata
Result: OSGi bundle; MyEngine Service registered by OSGi
Install in Stanbol: no restart needed
NLP at Apache Stanbol
NLP at Apache Stanbol (II)
concurrent Modifications
tag (e.g. NE) lexical category (e.g. Noun)
tag (e.g. NP) lexical-category (e.g. NounPhrase)
Stanbol is an Amazing Tool
Span Types: Sentence, Chunk, Token
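To make the span types concrete, here is a self-contained sketch that breaks the example sentence into character-offset token spans. It does not use the Stanbol NLP API: the SpanSketch class, its Span type and the whitespaceTokens method are hypothetical stand-ins, and tokenization in the actual chain is done by the OpenNLP engines rather than by whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of typed text spans; not the Stanbol AnalysedText API.
final class SpanSketch {

    enum SpanType { SENTENCE, CHUNK, TOKEN }

    // A span is a typed [start, end) character range over the analysed text.
    static final class Span {
        final SpanType type;
        final int start;
        final int end;

        Span(SpanType type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        String text(String source) {
            return source.substring(start, end);
        }
    }

    // Assumption: tokens are separated by whitespace only.
    static List<Span> whitespaceTokens(String sentence) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        int n = sentence.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(sentence.charAt(i))) {
                i++; // skip separators
            }
            int start = i;
            while (i < n && !Character.isWhitespace(sentence.charAt(i))) {
                i++; // consume one token
            }
            if (i > start) {
                spans.add(new Span(SpanType.TOKEN, start, i));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Stanbol is an Amazing Tool";
        Span sentence = new Span(SpanType.SENTENCE, 0, text.length());
        System.out.println(sentence.type + " [" + sentence.start + "," + sentence.end + ")");
        for (Span token : whitespaceTokens(text)) {
            System.out.println(token.type + " [" + token.start + "," + token.end + ") "
                    + token.text(text));
        }
    }
}
```

Keeping offsets instead of copied strings lets several engines annotate the same text without duplicating it.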
Hexin Custom Engine: Negex
Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. Oct 2001;34(5):301-310.
public abstract class AbstractNegexDetector implements NegexDetector {
    @Override
    public Set<IRI> detectNegations(String language, Graph metadata, AnalysedText at) throws NegexException {}
    protected abstract boolean isNegated(String language, String concept, String sentence);
}
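NegEx decides whether a concept mention is negated by looking for trigger phrases near it. The sketch below shows one way an isNegated check could look, under strong simplifying assumptions: it only considers triggers that precede the concept, within a fixed window of tokens, and ignores the post-concept triggers, pseudo-negations and termination terms of the published algorithm. The class name, the trigger lists and the window size are illustrative and are not taken from the Hexin project; the class is standalone and does not extend AbstractNegexDetector.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical NegEx-style check; not the project's implementation.
final class SimpleNegexSketch {

    // Assumed trigger lists per language (illustrative only).
    private static final Map<String, Set<String>> TRIGGERS = Map.of(
            "es", Set.of("no", "sin", "niega", "descarta"),
            "en", Set.of("no", "not", "without", "denies"));

    // Assumed scope: a trigger must be among the last 5 tokens before the concept.
    private static final int WINDOW = 5;

    static boolean isNegated(String language, String concept, String sentence) {
        Set<String> triggers = TRIGGERS.getOrDefault(language, Set.of());
        String lower = sentence.toLowerCase(Locale.ROOT);
        int pos = lower.indexOf(concept.toLowerCase(Locale.ROOT));
        if (pos < 0) {
            return false; // concept not mentioned in this sentence
        }
        // Tokenize the text before the mention on anything that is not a letter or digit,
        // so accented characters stay inside their tokens.
        String[] before = lower.substring(0, pos).trim().split("[^\\p{L}\\p{N}]+");
        int from = Math.max(0, before.length - WINDOW);
        for (int i = from; i < before.length; i++) {
            if (triggers.contains(before[i])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNegated("es", "fiebre", "Paciente sin fiebre ni tos"));
        System.out.println(isNegated("es", "fiebre", "Paciente con fiebre"));
    }
}
```

The check takes the concept as a surface form and locates it in the sentence text, which matches the slide's note that the mention is what the algorithm is applied to.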
Hexin Custom Engine: Negex (II)
Annotations properties
surface-form (mention) for applying the algorithm
Hexin Custom Engine: Fact Extraction
“Paciente diabético desde los 5 años y con EPOC moderada grado 2 de la GOLD” (English: "Diabetic patient since the age of 5, with moderate COPD, GOLD grade 2")
Hexin Custom Engine: Fact Extraction (II)
Properties
AnalyzedText structure)
Hexin Custom Engine: Fact Extraction (III)
Hexin Custom Engine: Fact Extraction (IV)
Diabetes diagnosed when he was 5 years old
NNS VB WRB PRP VBD CD NNS JJ
ENTITY \s VB * VB[be] (CD) years old
ENTITY \s VB * VB[be] (CD)
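The pattern above can be read as a match over tokens and their POS tags. The following sketch implements one possible reading: the entity is followed by a verb, then any number of tokens, then a form of "to be", and the cardinal number (CD) right after it is captured. This interpretation, the class and method names, and the list of "be" forms are assumptions made for illustration; they are not the project's engine.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical matcher for the pattern ENTITY VB * VB[be] (CD) over POS-tagged tokens.
final class AgePatternSketch {

    // Assumed surface forms accepted for VB[be].
    private static final Set<String> BE_FORMS = Set.of("be", "is", "are", "was", "were", "been");

    // tokens and tags are parallel lists; entityIndex is the position of the linked entity.
    static Optional<String> matchCardinal(List<String> tokens, List<String> tags, int entityIndex) {
        int verb = entityIndex + 1;
        if (verb >= tokens.size() || !tags.get(verb).startsWith("VB")) {
            return Optional.empty(); // the entity must be followed by a verb
        }
        // "*": skip any tokens until a form of "to be" is directly followed by a CD token.
        for (int j = verb + 1; j + 1 < tokens.size(); j++) {
            boolean isBe = tags.get(j).startsWith("VB")
                    && BE_FORMS.contains(tokens.get(j).toLowerCase(Locale.ROOT));
            if (isBe && tags.get(j + 1).equals("CD")) {
                return Optional.of(tokens.get(j + 1)); // the captured (CD) group
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Diabetes", "diagnosed", "when", "he", "was", "5", "years", "old");
        List<String> tags = List.of("NNS", "VB", "WRB", "PRP", "VBD", "CD", "NNS", "JJ");
        System.out.println(matchCardinal(tokens, tags, 0).orElse("no match"));
    }
}
```

Matching on tags rather than on raw text is what lets the same pattern cover different wordings between the entity and the number.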
Thanks for your attention!