SLIDE 1

DATA SCIENCE ECOSYSTEM

  • M. Tamer Özsu, U. Waterloo
  • Nancy Reid, U. Toronto
  • Raymond Ng, UBC

SLIDE 2

Canadian Data Science Workshop

DATA SCIENCE/BIG DATA IN THE NEWS…

SLIDE 3

DATA SCIENCE EVERYWHERE!...

SLIDE 6

DATA SCIENCE VOCABULARY

SLIDE 9

WHAT IS DATA SCIENCE?

  • “Data science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.”
  • “Data science intends to analyze and understand actual phenomena with ‘data’. In other words, the aim of data science is to reveal the features or the hidden structure of complicated natural, human, and social phenomena with data from a different point of view from the established or traditional theory and method.”

SLIDE 10

WHAT IS DATA SCIENCE?

  • Fourth paradigm
  • “… change of all sciences moving from observational, to theoretical, to computational and now to the 4th Paradigm – Data-Intensive Scientific Discovery”

SLIDE 11

WHAT IS IMPORTANT?

Need to solve a real problem using data… No applications, no data science.

SLIDE 12

DATA SCIENCE AS A UNIFIER

[Figure: Data Science shown as a unifier of Machine/Statistical Learning, Data Management, Visualization, Mathematical Optimization, Application Domain Expertise, Humanities, Social Science, and Law]

SLIDE 13

DATA SCIENCE AND BIG DATA

  • They are not the “same thing”
  • Big data = crude oil
  • Big data is about extracting “crude oil”, transporting it in “mega tankers”, siphoning it through “pipelines”, and storing it in “massive silos”
  • Data science is about refining the “crude oil”

Carlos Somohano, Founder, Data Science London

SLIDE 15

DATA SCIENCE AND ARTIFICIAL INTELLIGENCE

[Figure: relationship between Data Science, Artificial Intelligence, and ML/DM/Analytics]

“Data science produces insights. Machine learning produces predictions.”

SLIDE 16

DATA SCIENCE APPLICATION EXAMPLES

  • Fraud detection
  • Investigate fraud patterns in past data
  • Early detection is important
  • Before damage propagates
  • Harder than late detection
  • Precision is important
  • False positives and false negatives are both bad
  • Real-time analytics
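
The point about false positives and false negatives can be made concrete with precision and recall. The following is a minimal sketch, not something taken from the talk: the labels are hypothetical toy data (1 = fraud, 0 = legitimate), and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, for illustration only (not real results).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # actual: 1 = fraud
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # flagged by a detector

tp, fp, fn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} precision={precision:.2f} recall={recall:.2f}")
```

A false positive lowers precision (a legitimate transaction is flagged), while a false negative lowers recall (a fraudulent one is missed), which is why both are costly.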

SLIDE 17

DATA SCIENCE APPLICATION EXAMPLES

  • Recommender systems
  • The ability to offer unique personalized service
  • Increase sales, click-through rates, conversions, …
  • Netflix recommender system valued at $1B per year
  • Amazon recommender system drives a 20-35% lift in sales annually
  • Collaborative filtering at scale
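
As a rough illustration of collaborative filtering, here is a minimal user-based sketch in plain Python. It is an assumption-laden toy, not the method used by Netflix or Amazon: the ratings, user names, and the helper functions `cosine`, `predict`, and `recommend` are all hypothetical, and production systems work at a very different scale.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data (toy values, illustration only).
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5, "D": 4},
    "dan": {"B": 4, "C": 2, "D": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None


def recommend(user, ratings, k=1):
    """Return up to k unseen items with the highest predicted rating."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = [(predict(user, i, ratings), i) for i in candidates]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


print(recommend("bob", ratings))
```

The design choice here is the simplest one: users who rated the same items similarly are treated as neighbours, and their ratings are averaged with similarity weights to score items the target user has not seen.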

SLIDE 18

DATA SCIENCE APPLICATION EXAMPLES

  • Predicting why patients are being readmitted
  • Reduce costs
  • Improve population health
  • Find the “why” behind specific populations being readmitted
  • Data lakes of multiple data sources
  • Investigate ties between readmission and socioeconomic data points, patient history, genetics, …

SLIDE 21

DATA SCIENCE APPLICATION EXAMPLES

  • “Smart cities”
  • Not well-defined
  • Generally refers to using data and ICT to
  • Better plan communities
  • Better manage assets
  • Reduce costs
  • Deploy open data to better engage with community

SLIDE 22

DATA SCIENCE APPLICATION EXAMPLES

  • Moneyball
  • How to build a baseball team on a very low budget by relying on data
  • Sabermetrics: the statistical analysis of baseball data to objectively evaluate performance
  • 2002 record of 103-59 was joint best in MLB
  • Team salary budget: $40 million
  • Other team: Yankees
  • Team salary budget: $120 million
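
The budget comparison can be worked through as payroll per win. This is a small sketch using only the figures on the slide; it assumes the Yankees also finished with 103 wins, which is how "joint best" is read here, and the function name `payroll_per_win` is illustrative.

```python
def payroll_per_win(payroll_musd, wins):
    """Salary budget (in $ millions) spent per regular-season win."""
    return payroll_musd / wins


WINS = 103  # from the 103-59 record; assumed equal for both teams ("joint best")

low_budget = payroll_per_win(40, WINS)   # the $40 million team
yankees = payroll_per_win(120, WINS)     # the $120 million team

print(f"Low-budget team: ${low_budget:.2f}M per win")
print(f"Yankees:         ${yankees:.2f}M per win")
print(f"Ratio:           {yankees / low_budget:.1f}x")
```

Under that assumption the low-budget team paid roughly a third as much per win, which is the point of the example.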

SLIDE 23

HOLISTIC APPROACH TO DATA SCIENCE

[Figure: holistic approach to data science. Labels: Core; Data Acquisition; Data Preservation; Modeling & Analysis; Management of Big Data; Making Data Trustable & Usable; Data Security & Privacy; Dissemination & Visualization; Ethics, Policy & Social Impact; Application (four instances)]

SLIDE 29

CORE RESEARCH ISSUES & INTERACTIONS

Making Data Trustable & Usable

  • Data cleaning
  • Sampling
  • Data provenance

Big Data Management

  • Data lakes
  • Batch & online access
  • Platforms

Modelling & Analysis

  • Models & methods for data lakes
  • Unsupervised classification & AI

Data Visualization & Dissemination

  • Visualization for wider audience
  • Visualization for data exploration
  • Open data technologies

Interactions

  • DM support for provenance
  • Data preparation for big data management
  • Cleaning for data analysis
  • DM for ML
  • ML for DM
  • Visual analytics