Building a Linked Data Graph for Education, Dr Tom Heath



SLIDE 1

Building a Linked Data Graph for Education

Dr Tom Heath
tom.heath@talis.com
Talis Education Ltd
SemTechBiz, London, September 2012

SLIDE 2

What do we mean by an 'Education Graph'?

The set of all (connected) entities that have an active or potential bearing on the education process

(initially focused on higher education)

SLIDE 3

Why do we care about an 'Education Graph'?

Because learning is artificially disconnected.

An education graph can bridge those disconnects.

SLIDE 4

Components of the Education Graph

  • Course Content
  • Scheduling
  • Course Descriptions
  • Students and Teachers
  • Publishers
  • Biblio Databases
  • Wikipedia
  • YouTube
  • Subject Codes
  • Facebook etc
  • Khan Academy, P2PU, etc

SLIDE 6

Talis Aspire Reading Lists

SLIDE 7

Talis Aspire Reading Lists

  • >40 enterprise customers in the UK and beyond
      • = 1/3 of UK universities
  • 10,000s of reading lists
  • 100,000s of learning resources
  • Heavy usage with interesting peaks in demand
SLIDE 8

Talis Aspire Reading Lists

  • An RDF-native application, hosted by us
  • Backed by a hosted triplestore
  • Migrating to MongoDB
  • Linked Data views available on the public Web
  • A real, live Linked Data application with paying customers
  • (Probably) the most heavily used Linked Data application in the education domain

SLIDE 9

Components of the Education Graph

  • Course Content
  • Scheduling
  • Course Descriptions
  • Students and Teachers
  • Publishers
  • Biblio Databases
  • Wikipedia
  • YouTube
  • Khan Academy, P2PU, etc
  • Subject Codes
  • Facebook etc

SLIDE 10

Building and Enriching the Education Graph

SLIDE 11

From Plain Text to a 'Biblio-graph-ic' Record

  • Problem
      • Only some data is entered in structured form
      • Legacy data is typically plain text citations
  • Our Approach
      • Pre-process citation text with regex
      • Pass through heavily modified version of FreeCite
      • Clean output again with regex
      • Return as JSON object
      • Pass through entity reconciliation process...
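
The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the Talis implementation: the `parse` step is a hypothetical regex stand-in for the modified FreeCite parser, it only handles citations shaped like "Authors (Year) Title. Publisher.", and all function and field names are assumptions.

```python
import json
import re


def preprocess(citation: str) -> str:
    """Collapse whitespace and strip a leading list marker such as '12.' or '[3]'."""
    text = re.sub(r"\s+", " ", citation).strip()
    return re.sub(r"^(\[\d+\]|\d+\.)\s*", "", text)


# Hypothetical stand-in for the citation parser step (assumption, not FreeCite).
CITATION_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\s*"
    r"(?P<title>[^.]+)\.\s*(?P<publisher>.*?)\.?$"
)


def parse(text: str) -> dict:
    """Split a citation into fields, or keep the raw text if it does not match."""
    match = CITATION_PATTERN.match(text)
    return match.groupdict() if match else {"raw": text}


def clean(record: dict) -> dict:
    """Trim stray separators from each field and drop empty fields."""
    cleaned = {key: value.strip(" ,;:") for key, value in record.items()}
    return {key: value for key, value in cleaned.items() if value}


def citation_to_json(citation: str) -> str:
    """Run the three steps in order and return the record as a JSON object."""
    return json.dumps(clean(parse(preprocess(citation))), sort_keys=True)


if __name__ == "__main__":
    print(citation_to_json(
        " 12.  Heath, T. and Bizer, C. (2011) Linked Data: "
        "Evolving the Web into a Global Data Space. Morgan & Claypool. "
    ))
```

Keeping unmatched citations as a `raw` field, rather than discarding them, leaves them available for a later pass or manual review.
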
SLIDE 12

Enhancing Data Quality with Entity Reconciliation

  • Validate the accuracy of the record by matching against high-quality reference data sources
      • OpenLibrary, OpenKB (serials/journals), CrossRef
  • Books:
      • match on a precise edition, roll up to work, map to our canonical resource
  • Articles:
      • enrich the graph describing the resource using OpenKB, search CrossRef using enriched description, map to our canonical resource
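
As a rough sketch of the book path (edition, then work, then canonical resource), assuming in-memory lookup tables in place of the real reference sources. The identifiers, table contents, and function names below are made up for illustration, and no real OpenLibrary API is called.

```python
import re
from typing import Optional

# Hypothetical reference data: edition identifier -> work identifier.
# In practice this lookup would be answered by a reference data source.
EDITION_TO_WORK = {
    "isbn:0000000001": "work:A",  # first edition of work A (made-up ISBN)
    "isbn:0000000002": "work:A",  # second edition of the same work
    "isbn:0000000003": "work:B",
}

# Hypothetical mapping from works to our own canonical resources.
WORK_TO_CANONICAL = {
    "work:A": "resource:1",
    "work:B": "resource:2",
}


def normalise_isbn(raw: str) -> str:
    """Drop hyphens and spaces so differently formatted ISBNs compare equal."""
    return "isbn:" + re.sub(r"[^0-9Xx]", "", raw).upper()


def reconcile_book(record: dict) -> Optional[str]:
    """Match a precise edition, roll up to its work, map to a canonical resource."""
    isbn = record.get("isbn")
    if not isbn:
        return None
    work = EDITION_TO_WORK.get(normalise_isbn(isbn))
    if work is None:
        return None
    return WORK_TO_CANONICAL.get(work)
```

Under these assumptions, two records citing different editions of the same work resolve to the same canonical resource, and a record that cannot be matched returns `None` instead of a guess.
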

SLIDE 13

A Happy By-Product

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

SLIDE 14

Unifying the Institutional Sub-Graphs

  • Goal
      • Create a cross-institution portion of the education graph, centred around learning resources
  • Process
      • Harvest the data from each Reading List store
      • Repeat the entity reconciliation process
          – Retain the links mapping canonical resources to institutional data
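
A minimal sketch of that process, assuming each institution's store has already been harvested into a dictionary of item URI to record. The `reconcile_by_title` function is a deliberately naive, hypothetical stand-in for the reconciliation step, and the URIs are examples only.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set, Tuple

Record = Dict[str, str]
Store = Dict[str, Record]  # item URI -> harvested record


def reconcile_by_title(record: Record) -> Optional[str]:
    """Naive stand-in for reconciliation: key resources on a normalised title."""
    words = record.get("title", "").lower().split()
    return "canonical:" + "-".join(words) if words else None


def unify(
    institution_stores: Dict[str, Store],
    reconcile: Callable[[Record], Optional[str]],
) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each canonical resource to the institutional items that resolve to it."""
    links: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for institution, items in institution_stores.items():
        for item_uri, record in items.items():
            canonical = reconcile(record)
            if canonical is not None:
                # Retain the link back to the institutional data.
                links[canonical].add((institution, item_uri))
    return dict(links)


if __name__ == "__main__":
    stores = {
        "uni-a": {"http://a.example.org/items/1": {"title": "Linked Data"}},
        "uni-b": {"http://b.example.org/items/9": {"title": " linked  data "}},
    }
    print(unify(stores, reconcile_by_title))
```

Storing `(institution, item URI)` pairs against each canonical resource is one way to keep the cross-institution view while still being able to trace every entry back to its source store.
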

SLIDE 15

Warehousing the Education Graph

  • Goal
      • a single point of access to the education graph
  • Applications
      • analytics and business intelligence (for us and customers)
      • data science experiments
  • Approach
      • based on Apache Jena TDB/Fuseki + MongoDB
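
To illustrate the kind of analytics question a single access point makes possible, here is a sketch that builds a SPARQL query URL for a Fuseki-style endpoint. The dataset name `education`, the `ex:includes` property, and the example.org namespace are all assumptions, not the vocabulary used by Talis. The code only constructs the request URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed vocabulary: ex:includes links a reading list to a resource it cites.
TOP_RESOURCES_QUERY = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?resource (COUNT(DISTINCT ?list) AS ?lists)
WHERE { ?list ex:includes ?resource }
GROUP BY ?resource
ORDER BY DESC(?lists)
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL Protocol GET request against the endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    # Hypothetical local endpoint; 3030 is Fuseki's default port.
    print(build_query_url("http://localhost:3030/education/sparql",
                          TOP_RESOURCES_QUERY))
```

Under the assumed vocabulary, the query ranks resources by how many distinct reading lists include them.
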
SLIDE 16

Applications of the Education Graph

SLIDE 17

Directory of Quality Learning Resources

SLIDE 18
SLIDE 19

Recommending Learning Resources

SLIDE 20

Reflections

SLIDE 21

Reflections

  • What we think matters may not be important
      • e.g. native RDF apps vs. Linked Data views online
SLIDE 22

Reflections

  • What we think matters may not be important
      • e.g. native RDF apps vs. Linked Data views online
  • Tooling Reality Check
      • how do we measure up compared to e.g. NoSQL databases, Hadoop, etc?

SLIDE 23

Reflections

  • What we think matters may not be important
      • e.g. native RDF apps vs. Linked Data views online
  • Tooling Reality Check
      • how do we measure up compared to e.g. NoSQL databases, Hadoop, etc?
  • Is numerical data a second-class citizen of the graph?

SLIDE 24

Questions?

Web: talisaspire.com
Twitter: @talisaspire
YouTube: youtube.com/user/TalisAspire
Facebook: facebook.com/talisaspire