The Data Science Process
Polong Lin Big Data University Leader & Data Scientist IBM polong@ca.ibm.com
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last …
Business Understanding, Data Understanding, Data Preparation, Analytic Approach, Data Requirements, Data Collection, Modeling, Evaluation, Deployment, Feedback
Cross Industry Standard Process for Data Mining
Business Understanding
Business Understanding, Analytic Approach
Data Understanding, Data Requirements, Data Collection
Anscombe's Quartet
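The point of Anscombe's Quartet for data understanding can be checked numerically. The sketch below uses the published quartet values (Anscombe, 1973) and only the Python standard library; the helper name `summarize` is made up for this illustration. It shows that the four datasets share nearly identical summary statistics, which is why plotting the data matters before modeling.

```python
from statistics import mean

# Anscombe's quartet: datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean of x, mean of y, least-squares slope, Pearson r)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / sxx, sxy / (sxx * syy) ** 0.5


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (mx, my, slope, r) in summaries.items():
    print(f"{name}: mean_x={mx:.2f} mean_y={my:.2f} slope={slope:.3f} r={r:.3f}")
```

All four rows print roughly the same means, slope, and correlation, even though scatter plots of the four datasets look very different.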
Data Preparation
Modeling
Data Preparation, Evaluation
[Age: 18, Sex: M, BMI: 23, Exercise: Frequent, Hobbies: Golf, …] -> Model -> CLUSTER A
[Age: 45, Sex: F, BMI: 28, Exercise: Frequent, Hobbies: Baseball, …] -> Model -> CLUSTER B
[Age: 83, Sex: F, BMI: 25, Exercise: Sedentary, Hobbies: Gymnastics, …] -> Model -> CLUSTER C
[Age: 28, Sex: M, BMI: 23, Exercise: Normal, Hobbies: Softball, …] -> Model -> CLUSTER B
[Age: 30, Sex: F, BMI: 25, Exercise: Normal, Hobbies: Golf, …] -> Model -> CLUSTER A
[Age: 15, Sex: M, BMI: 22, Exercise: Frequent, Hobbies: Golf, …] -> Model -> CLUSTER A
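The clustering slide shows unlabeled records going into a model and coming out with cluster assignments. The slides do not say which algorithm is used, so the following is only a minimal k-means sketch under stated assumptions: it uses just the Age and BMI values from the six records, seeds the centroids with the first three points, and picks k = 3. It is not expected to reproduce the illustrative A/B/C assignments on the slide.

```python
# (age, BMI) pairs taken from the six records on the slide;
# the other fields are left out of this sketch.
points = [(18, 23), (45, 28), (83, 25), (28, 23), (30, 25), (15, 22)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    # Assumption: seed centroids with the first k points (deterministic).
    centroids = [tuple(float(v) for v in p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 1, 2, 0, 0, 0]
```

Because Age spans a much wider range than BMI, it dominates the distance here; in practice features would usually be scaled during data preparation.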
[Age: 32, Sex: M, BMI: 23, Exercise: Frequent, …, Condition: Disorder 1]
[Age: 45, Sex: F, BMI: 28, Exercise: Frequent, …, Condition: Healthy]
[Age: 63, Sex: F, BMI: 21, Exercise: Sedentary, …, Condition: Disorder 2]
[Age: 48, Sex: M, BMI: 23, Exercise: Sedentary, …, Condition: ________] -> Model -> Disorder 1
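The classification slide shows labeled records training a model that then fills in the missing Condition for a new record. The slides do not name the model, so the sketch below is an assumption: a 1-nearest-neighbor classifier with a Gower-style distance (range-scaled differences for numeric fields, 0/1 mismatch for categorical fields), using only the four fields visible on the slide. The names `train`, `query`, and `predict` are hypothetical, and three training rows are far too few for a real model.

```python
# Labeled records from the slide (elided fields are omitted).
train = [
    ({"age": 32, "sex": "M", "bmi": 23, "exercise": "Frequent"}, "Disorder 1"),
    ({"age": 45, "sex": "F", "bmi": 28, "exercise": "Frequent"}, "Healthy"),
    ({"age": 63, "sex": "F", "bmi": 21, "exercise": "Sedentary"}, "Disorder 2"),
]
# The new record whose Condition is unknown.
query = {"age": 48, "sex": "M", "bmi": 23, "exercise": "Sedentary"}

NUMERIC = ("age", "bmi")
CATEGORICAL = ("sex", "exercise")


def feature_ranges(rows):
    """Range of each numeric field in the training rows (at least 1)."""
    ranges = {}
    for field in NUMERIC:
        values = [row[field] for row in rows]
        ranges[field] = (max(values) - min(values)) or 1
    return ranges


def distance(a, b, ranges):
    """Gower-style distance: scaled numeric gaps plus categorical mismatches."""
    numeric = sum(abs(a[f] - b[f]) / ranges[f] for f in NUMERIC)
    categorical = sum(a[f] != b[f] for f in CATEGORICAL)
    return numeric + categorical


def predict(train, query):
    """Return the label of the training record closest to the query."""
    ranges = feature_ranges([row for row, _ in train])
    _, label = min(train, key=lambda item: distance(item[0], query, ranges))
    return label


prediction = predict(train, query)
print(prediction)  # Disorder 1
```

With these three rows the nearest record happens to be the first one, so the sketch agrees with the slide; a different distance or feature set could easily give another answer, which is what the evaluation stage is meant to catch.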
Evaluation, Modeling
Deployment, Feedback
Variable #503
for your employees to gain skills in data science