
slide-1
SLIDE 1

Personal information management systems and knowledge integration

Serge Abiteboul

Inria & Ecole Normale Supérieure Cachan

serge.abiteboul@inria.fr http://abiteboul.com

slide-2
SLIDE 2

Organization

  • 1. Personal data
  • 2. The Pims
    – 1. The concept of Pims
    – 2. The Pims are arriving and that is cool
  • 3. Research issues
  • 4. An illustration with the Thymeflow system

Disc 2016 Serge Abiteboul 2

slide-3
SLIDE 3
  • 1. Personal data
slide-4
SLIDE 4

Personal data out there


slide-5
SLIDE 5

Personal data out there

  • Variety

– Structured, semi-structured, unstructured
– Metadata and knowledge (RDF)
– Different languages, terminologies, ontologies, structures

  • Veracity

– Varying quality: errors, opinions, missing data…
– Varying importance: hard to assess

  • Velocity

– Changes, staleness…
– Recent data is typically very valuable

− Volume (???)

– Growing, but not Big Data

+ Distributed

– In many autonomous systems that act as silos
– Different systems, protocols

slide-6
SLIDE 6

Bad news (1)

  • Loss of functionalities because of fragmentation

– You don’t know where your data is, how to keep it up to date, or sometimes even how to get it
– Difficult to do global search, maintenance, synchronization, archiving...

  • Loss of control over the data

– Difficult to control privacy
– Difficult to control sharing
– Leaks of private information

  • Loss of freedom

– Vendor lock-in

slide-7
SLIDE 7

Bad news (2)

  • A few companies concentrate most of the world’s data and analytic power

– They have the means to destroy business competition in large portions of the economy

  • A few companies control all your personal data

– They determine what information you are exposed to
– They guide many of your decisions
– They potentially infringe on your privacy and freedom

slide-8
SLIDE 8

2. The Pims

From "Managing your digital life with a Personal information management system", with Benjamin André & Daniel Kaplan, Communications of the ACM, 2015

slide-9
SLIDE 9

Alternatives

  • Continue with this increasing mess

– See a shrink to overcome the frustration

  • Gather all your data in one platform

– Google, Apple, Facebook, …, a newcomer
– See a shrink to overcome resentment

  • Study 2 years to become a geek

– Geeks know how to manage their information
– See a shrink to survive the experience

Where do you keep your data?

slide-10
SLIDE 10

Or move to Pims!

"A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory."
Vannevar Bush, The Atlantic Monthly, 1945

Definition for this talk: a Personal Information Management System is a cloud system that manages all the information of a person

One Pims, two Pims… many Pims

slide-11
SLIDE 11

The Pims: a change in paradigm

Many Web services, each one running

  • On some unknown machines
  • With your data
  • Some software

Your Pims

  • Your machine
  • With your data

– possibly a replica of data from systems you like

  • Wrapper to some software

– External service

  • Or your software

– Decentralized service

slide-12
SLIDE 12

The Pims are (I believe) arriving!

Why? For 3 kinds of reasons:

  • Society
  • Technology
  • Industry


slide-13
SLIDE 13

Society is ready to move

  • Growing resentment

– Against companies: intrusive marketing, cryptic personalization and business decisions (e.g., on pricing), creepy "big data" inferences
– Against governments: NSA and its European counterparts

  • Increasing awareness of the dissymmetry

– between what these systems know about a person, and what the person actually knows

  • Emerging understanding of the value of personal data for individuals

– Quantified self

slide-14
SLIDE 14

Society is ready to move (2)

  • Privacy control: regulations in Europe
  • Information symmetry: Vendor relation management
  • Many reports/proposals that affirm the ownership of personal data by the person
  • Personal data disclosure initiatives

– Smart Disclosure (US), MiData (UK), MesInfos (France)
– Several large companies (network operators, banks, retailers, insurers…) agreeing to share with customers the personal data that they have about them

slide-15
SLIDE 15

Technology is gearing up

  • System administration is easier

– Abstraction technologies for servers
– Virtualization and configuration management tools

  • Open-source alternatives to proprietary online services are increasingly available
  • Price of machines is going down

– A hosted low-cost server is as cheap as 5€/month
– Paying is no longer a barrier for a majority of people

You may have friends already doing it

slide-16
SLIDE 16

Technology is gearing up (2)

  • Many systems & projects

– Lifestreams, Stuff-I’ve-Seen, Haystack, MyLifeBits, Connections, Seetrieve, Personal Dataspaces, or deskWeb
– YunoHost, Amahi, ArkOS, OwnCloud or Cozy Cloud

  • Some on particular aspects

– Mailpile for mail
– Lima for a Dropbox-like service, but at home
– Personal NAS (network-attached storage), e.g., Synology
– Personal data store SAMI of Samsung...

  • Many more

slide-17
SLIDE 17

Industry is interested

Pre-digital companies

  • E.g., hotels or banks
  • Disintermediated from their customers by pure Internet players such as Google, Amazon, Booking.com, Mint
  • In Pims, they can rebuild direct interaction
  • The playing field is neutral

– Unlike on the Internet where they have less data

  • They can offer new services without compromising privacy

slide-18
SLIDE 18

Industry is interested (2)

Home appliances companies

  • Many devices deployed at home or in datacenters

– Internet service provider “boxes”, NAS servers, “smart” meters provided by energy vendors, home automation systems, “digital lockers”…

  • Personal data spaces dedicated to specific usage
  • Could evolve to become more generic
  • Control of private Internet of things

slide-19
SLIDE 19

Industry is interested (3)

Pure Internet players

  • Amazon: great know-how in providing services
  • Facebook, Google: cannot afford to be out of a movement in personal data management
  • Very far from their business model based on personal advertisement
  • Moving to this new market would require major changes & the clarification of the relationship with users w.r.t. data monetization

slide-20
SLIDE 20

Advantages – rebalance the Web

  • User control over their data

– Who has access to what, under what rules, to do what

  • User empowerment

– They choose services freely & they can leave a service

  • Participation in a more “neutral” Web

– With the “network effect”, the main platforms are accumulating data/customers and distorting competition
– The Pims bring back fairness on the Web
– Good practices are encouraged, e.g., interoperability, portability

slide-21
SLIDE 21

The Pims will primarily arrive because of new functionalities

This is (for me) the key ingredient for adoption
New functionalities ➸ New opportunities
New playing field for startups
New playing field for researchers

slide-22
SLIDE 22
  • 3. Research issues with the Pims

From "Personal Information Management Systems", tutorial at the Extending Database Technology conference (EDBT), 2015, with Amélie Marian

slide-23
SLIDE 23

R&D issues we will not consider much

Some old problems revisited

  • Epsilon-principle (epsilon-user-administration)
  • Backups & Task sequencing
  • Access control & Exchange of information
  • Security (e.g. works @ INRIA Rocquencourt)
  • Connected objects control

slide-24
SLIDE 24

R&D issues we will briefly illustrate

Some old problems revisited

  • Personal information integration
  • Synchronization
  • Personalization and context awareness
  • Personal data analysis

slide-25
SLIDE 25
  • 4. An illustration with the Thymeflow system

Demo at the International Conference on Information and Knowledge Management (CIKM’16), with David Montoya, Thomas Pellissier-Tanon, Fabian M. Suchanek

slide-26
SLIDE 26

Pims are first about data integration

[Figure: a grid of users (Alice, Lulu, Mimi, Zaza) against services (location, webSearch, calendar, mail, contacts, Facebook, TripAdvisor, banks, WhatsApp). One axis, illustrated by Facebook, is the integration of the users of a service; the other, illustrated by Alice, is the integration of the services of a user.]

slide-27
SLIDE 27

Or rather on knowledge integra8on

  • Data / Information ➼ Knowledge

– Personal data/info management is getting too complicated
– Machines prefer structured knowledge to unstructured information or semantic-free data

  • Thesis: Let us turn all our information into a distributed knowledge base

ERC Webdam, http://webdam.inria.fr (ended in 2015)

slide-28
SLIDE 28

The Thymeflow Knowledge Base

  • Thymeflow is a KB, an extension of a person’s memory

– Episodic memory (typically related to spatio-temporal events) and
– Semantic memory (knowledge that holds irrespective of any such event)

  • Thymeflow’s knowledge is

– Extracted from all the information traces of the person
– Obtained from the Web (Wikidata, OpenStreetMap…)
– Derived by software modules that analyze the KB

  • Thymeflow is an application for the Web and mobile phones

– Loading: calendar, contacts, mails, geolocation (GPS), social networks…
– Deriving links between these data sources and other knowledge bases
– Supporting query processing and data analytics

slide-29
SLIDE 29

Architecture

[Figure: a set of synchronizers loads and synchronizes data sources into the Thymeflow KB; a set of enrichers enriches the KB and queries external sources; the KB is stored as a persistent KB and supports querying, visualization and analytics.]

  • Backend:

– HTTP server
– REST API
– SPARQL endpoint (Sesame)

  • Frontend: Web app
  • Mobile app

– for geolocation
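
The architecture on this slide can be read as a two-stage pipeline: synchronizers feed triples into the KB, then enrichers derive new triples from what is already there. The sketch below only illustrates that reading. All function names, the `personal:sentBy` predicate, and the single pass over the enrichers are assumptions made for the example, not Thymeflow's actual code.

```python
def run_pipeline(synchronizers, enrichers):
    """Build an in-memory KB (a set of triples) from hypothetical components."""
    kb = set()
    # Stage 1: each synchronizer returns triples extracted from one data source.
    for synchronize in synchronizers:
        kb |= set(synchronize())
    # Stage 2: each enricher reads the current KB and returns derived triples.
    for enrich in enrichers:
        kb |= set(enrich(frozenset(kb)))
    return kb


def contacts_synchronizer():
    # Stand-in for a contacts source (illustrative data).
    return [("personal:alice", "schema:email", "alice@example.org")]


def mail_synchronizer():
    # Stand-in for a mail source (illustrative data).
    return [("personal:msg1", "schema:sender", "alice@example.org")]


def sender_enricher(kb):
    # Link each message to the agent that owns the sender address.
    owner_of = {o: s for s, p, o in kb if p == "schema:email"}
    return [(s, "personal:sentBy", owner_of[o])
            for s, p, o in kb
            if p == "schema:sender" and o in owner_of]


kb = run_pipeline([contacts_synchronizer, mail_synchronizer], [sender_enricher])
```

Keeping synchronizers and enrichers as plain functions over a shared set of triples makes it easy to add a new data source without touching the others, which is the property the diagram suggests.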

slide-30
SLIDE 30

RDF knowledge base

  • RDF model

– RDF Triples subject–predicate–object

  • Schema

– http://schema.org/
– http://thymeflow.com/personal

  • Most useful classes

– personal:Agent
– schema:Event
– schema:Place
– schema:EmailMessage
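
To make the subject–predicate–object model concrete, here is a minimal sketch that stores triples as Python tuples and answers simple pattern queries. The resource identifiers (`personal:alice`, `personal:meeting1`, `personal:cafe`) are invented for the example; only the class names come from the slide, and a real deployment would use an RDF store rather than a list.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative data only; identifiers are hypothetical.
KB = [
    Triple("personal:alice", "rdf:type", "personal:Agent"),
    Triple("personal:alice", "schema:name", "Alice"),
    Triple("personal:meeting1", "rdf:type", "schema:Event"),
    Triple("personal:meeting1", "schema:location", "personal:cafe"),
    Triple("personal:cafe", "rdf:type", "schema:Place"),
]


def match(kb: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in kb
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]


# All resources typed as schema:Event.
events = [t.subject for t in match(KB, predicate="rdf:type", obj="schema:Event")]
```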


slide-31
SLIDE 31

Query examples

  • At what time do I usually send emails?
  • Full-text query in my entire memory
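
The first question boils down to grouping sent messages by hour of day. The sketch below assumes the sending timestamps have already been pulled out of the KB as ISO 8601 strings (for instance from a `schema:dateSent` property); the sample values and the function name are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def usual_sending_hours(date_sent_values, top=3):
    """Count sent emails per hour of day and return the most frequent hours."""
    hours = Counter(datetime.fromisoformat(value).hour
                    for value in date_sent_values)
    return hours.most_common(top)


# Hypothetical timestamps of sent emails.
sent = [
    "2016-03-01T09:15:00",
    "2016-03-01T09:40:00",
    "2016-03-02T22:05:00",
    "2016-03-03T09:02:00",
]

print(usual_sending_hours(sent, top=1))  # prints [(9, 3)]
```

In the system itself such a question would be posed against the SPARQL endpoint; the aggregation logic is the same.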


slide-32
SLIDE 32

Main component: synchronizer

  • Transform data into knowledge and synchronize a data source with the knowledge base

Examples

  • CalDavSynchronizer/CardDavSynchronizer:

– Manage iCalendar (.ical) and vCard (.vcf)

  • EmailSynchronizer

– IMAP to connect to mail servers
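
As an illustration of "transforming data into knowledge", the sketch below turns one vCard into triples. It is deliberately simplified: it ignores vCard line folding, grouping and most properties, and the property-to-predicate mapping and the `personal:contact1` identifier are assumptions for the example rather than the mapping Thymeflow uses.

```python
def vcard_to_triples(vcard_text, subject):
    """Convert a (simplified) vCard into subject-predicate-object triples."""
    # Assumed mapping from vCard property names to schema.org predicates.
    mapping = {
        "FN": "schema:name",
        "EMAIL": "schema:email",
        "TEL": "schema:telephone",
    }
    triples = [(subject, "rdf:type", "personal:Agent")]
    for line in vcard_text.splitlines():
        if ":" not in line:
            continue
        name_part, value = line.split(":", 1)
        # Drop parameters such as ";TYPE=work" to get the bare property name.
        prop = name_part.split(";", 1)[0].upper()
        value = value.strip()
        if prop in mapping and value:
            triples.append((subject, mapping[prop], value))
    return triples


CARD = """BEGIN:VCARD
VERSION:3.0
FN:Alice Martin
EMAIL;TYPE=work:alice@example.org
TEL:+33 1 23 45 67 89
END:VCARD"""

triples = vcard_to_triples(CARD, "personal:contact1")
```

A real synchronizer would also have to track which triples came from which source, so that later changes in the address book can be propagated to the KB.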


slide-33
SLIDE 33

Update propagation

[Figure: update propagation between data sources, the Thymeflow KB and the persistent KB. Panels: "From data sources to KB" and "From KB to data sources (1)"; the second direction is marked "???" on the slide.]

slide-34
SLIDE 34

Main component: enricher

  • Align concepts coming from different data sources
  • Add knowledge to the KB

Examples

  • Align agents based on, e.g., their names, emails…
  • Add geolocations to calendar events
  • Add semantics to places physically visited
  • Align calendar events to places physically visited
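
Agent alignment can be pictured as grouping records that share an identifying attribute. The sketch below merges agents that share an email address (case-insensitively), using a small union-find structure. It is one simple way to do it, chosen for illustration; the slide does not say how Thymeflow's enricher works, and the agent identifiers and addresses are invented.

```python
def align_agents(agents):
    """Group agent ids that share at least one email address.

    agents: dict mapping agent id -> set of email addresses.
    Returns a list of sets of agent ids.
    """
    parent = {agent: agent for agent in agents}

    def find(a):
        # Follow parent links to the representative, halving paths on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # normalized email -> first agent seen with it
    for agent, emails in agents.items():
        for email in emails:
            key = email.strip().lower()
            if key in first_seen:
                union(first_seen[key], agent)
            else:
                first_seen[key] = agent

    groups = {}
    for agent in agents:
        groups.setdefault(find(agent), set()).add(agent)
    return list(groups.values())


# Hypothetical agents coming from three different sources.
agents = {
    "contacts:1": {"Alice@Example.org"},
    "mail:7": {"alice@example.org", "a.martin@work.example"},
    "facebook:3": {"a.martin@work.example"},
    "mail:9": {"bob@example.org"},
}
groups = align_agents(agents)
```

Here the three records for Alice end up in one group even though `contacts:1` and `facebook:3` share no address directly, because `mail:7` links them. Matching on names would need fuzzier comparison and is left out.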


slide-35
SLIDE 35

Data analytics

  • Small data analysis with Pims

– Learn from personal data, e.g.,
    • Personal health and well-being
    • Digital personal assistant: notification & planning
– Issues
    • Much smaller amounts of data – statistics harder
    • Varying data quality: imprecision, inconsistencies

  • Big data analysis from Pims

– Aggregate data from large number of Pims
– Derive knowledge useful for Pims, e.g., traffic jams
– Issue: data privacy

slide-36
SLIDE 36

Conclusion

Goal
Make the digital world a better place to live in
The Pims seem a promising direction for that
Lots of research issues remaining
