Open Data Science Initiative, Neil D. Lawrence, data@sheffield, 16th December 2015



SLIDE 1

Open Data Science Initiative

Neil D. Lawrence

data@sheffield

16th December 2015

SLIDE 2

Challenges for Companies

◮ Trying to dominate the modern interconnected data market (e.g. Amazon, Google, Facebook): buying up talent and competitors.
◮ or trying to exploit current ‘data silos’ (e.g. Tesco Clubcard, Experian): monetising our data today (limited shelf life?)
◮ or trying to understand their own systems (the internal Google search)
◮ or new companies with new ideas that will generate data.

SLIDE 3

Challenges for Companies

◮ How do they break the natural data monopoly?
◮ How do they access the necessary expertise?

SLIDE 4

Challenges in Science

Data sharing is more widely accepted but:

◮ Most analysis is simple statistical tests or exploratory modelling with PCA or clustering.
◮ Few scientists understand these methodologies; they apply them as a black box.
◮ There is an understanding gap between the data & scientist and the data scientist.
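To make the "black box" point concrete, here is a minimal sketch of what exploratory modelling with PCA actually computes. It is not taken from the talk: the data are synthetic, the function name `pca` is hypothetical, and NumPy's SVD is just one common way to obtain the principal components.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their top principal components (hypothetical helper)."""
    # Centre each column so the components describe variance about the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured by each retained direction.
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    # Coordinates of each data point in the reduced space.
    scores = Xc @ components.T
    return scores, components, explained_variance

# Synthetic data (an assumption for illustration): 200 points in 3-D
# that vary mostly along a single direction, plus a little noise.
rng = np.random.default_rng(0)
direction = np.array([[2.0, 1.0, 0.5]])
X = rng.normal(size=(200, 1)) @ direction + 0.05 * rng.normal(size=(200, 3))

scores, components, explained_variance = pca(X, n_components=2)
print("fraction of retained variance per component:",
      explained_variance / explained_variance.sum())
```

The first component should line up with the direction used to generate the data, which is the kind of sanity check that is easy to skip when the method is treated as a black box.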

SLIDE 5

Challenges in Health

◮ Ensure the privacy of patients is respected.
◮ Leverage the wide range of data available for wider societal benefit.

SLIDE 6

International Development

◮ Exploit new telecommunications infrastructure to leap-frog developed countries.
◮ Needs mechanisms for data sharing that retain the individual’s control.
◮ Widespread education of local talent in code and model development.

SLIDE 7

Common Strands

◮ Improving access to data whilst balancing the individual’s right to privacy against society’s need to advance.
◮ Advancing methodologies: development of the methodologies needed to characterize large, interconnected, complex data sets.
◮ Analysis empowerment: giving scientists, clinicians, students, commercial and academic partners the ability to analyze their own data with the latest methodologies.

SLIDE 8

Open Data Science: A Magic Bullet?

◮ Make new methodologies available as widely and rapidly as possible with as few conditions on their use as possible.
◮ Educate commercial, scientific and medical partners in the use of these methodologies.
◮ Act to achieve a balance between data sharing for societal benefit and the right of an individual to own their own data.

SLIDE 9

Achieving This

◮ Use BSD-like licenses on software.
◮ Educate our partners (summer schools, courses etc).
◮ Act to achieve a balance between data sharing for societal benefit and rights of the individual.

SLIDE 10

Make Analysis Available

SLIDE 11

Educating

But we need to do much more!

SLIDE 12

Digital Identity and Data Ownership

SLIDE 13

Data Warehousing

SLIDE 14

Blog Post

SLIDE 15

Blog Post

SLIDE 16
SLIDE 17

Modern Tools: Github

SLIDE 18

Modern Tools: Reddit

SLIDE 19

Modern Tools: IPython Notebook

SLIDE 20

Literate Computing

SLIDE 21

Example: Prediction of Malaria Incidence in Uganda

◮ Work with John Quinn and Martin Mubangizi (Makerere University, Uganda)
◮ See http://air.ug/research.html.

SLIDE 22

Malaria Prediction in Uganda

[Figure: map spanning 29°E to 35°E and 2°S to 4°N. Data: SRTM/NASA, from http://dds.cr.usgs.gov/srtm/version2_1]

SLIDE 23

Malaria Prediction in Uganda

[Figure: Nagongera / Tororo (multiple output model). Panels: Sentinel - all patients; Sentinel - patients with malaria; HMIS - all patients; Satellite - rain; W. station - temperature.]

SLIDE 24

Malaria Prediction in Uganda

[Figure: Mubende. Panels: sparse regression incidence; multiple output incidence. Horizontal axis: time (days).]

SLIDE 25

GP School at Makerere

SLIDE 26

Early Warning Systems

SLIDE 27

Early Warning Systems