Data science for business, Sebastian Sauer


10.5.2019 Data science for business Data science for business Sebastian Sauer file:///Users/sebastiansaueruser/Documents/Publikationen/blog_ses/data_se/public/slides/data-science-business/intro-data-science-talk-2019-05-14.html#31 1/27


SLIDE 1

Data science for business

Sebastian Sauer

SLIDE 2

Five Questions

on the use of data science for business

  • 1. What's the meaning of data science, machine learning, and all these fancy terms?
  • 2. What's the best model out there?
  • 3. How do I know whether my model is doing well or badly?
  • 4. Can you give me a cookbook for data science?
  • 5. What are the core concepts of the field?

SLIDE 3

  • 1. What's the meaning of data science, machine learning, and all these fancy terms?

SLIDE 4

statistical models: probability theory

machine learning: algorithmic models

Source: Wikipedia by en:User:RolandH

SLIDE 5

'data science' is a popular term

Google Trends (2019-04-32) of data analysis jargon

[Line chart: hits (y-axis, 25 to 100) by date (x-axis, 2015 to 2019), one line per keyword: artificial intelligence, data mining, Data science, machine learning, Predictive analytics, predictive modeling, statistical modeling]

SLIDE 6

What's data science?

Depends on whom you ask.

SLIDE 7

Common theme

Art and science of learning from data

$Y = f(X) + \epsilon$

$\hat{Y} = \hat{f}(X)$
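A minimal sketch of this idea in base R. The "true" function f, the sample size, and the noise level are made up for illustration and are not taken from the talk: we simulate Y = f(X) + ϵ and then let lm() learn an estimate of f from the data alone.

```r
set.seed(42)                         # make the simulation reproducible

# Hypothetical "true" function f, chosen only for illustration
f <- function(x) 2 + 3 * x

n   <- 200
x   <- runif(n, min = 0, max = 10)   # predictor X
eps <- rnorm(n, mean = 0, sd = 1)    # noise term
y   <- f(x) + eps                    # Y = f(X) + eps

# Learn f_hat from the data; the model never sees f itself
f_hat <- lm(y ~ x)

# Y_hat = f_hat(X): predictions for the observed X
y_hat <- predict(f_hat)

coef(f_hat)                          # estimates should be close to 2 and 3
```

In real projects f is unknown; all we ever get to see is the data, which is why the estimate has to be judged by its predictions.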

SLIDE 8

Machine learning: Feed the computer data, not rules

Source: Molnar, C. (2019). Interpretable Machine Learning [ePub Book]. Morrisville, NC: Christoph Molnar.

SLIDE 9

  • 2. What's the best model out there?

SLIDE 10

A lot of models out there

package caret

library(caret)     # provides getModelInfo()
library(magrittr)  # provides the %>% pipe
getModelInfo() %>% names() %>% length()
## [1] 238

Showing 1 to 5 of 238 entries:

name  value
1     ada
2     AdaBag
3     AdaBoost.M1
4     adaboost
5     amdai

SLIDE 11

Wait, tell me which model is best

SLIDE 12

Black box models

  • Random forests
  • Support vector machines
  • Neural networks
  • ...

  • less interpretable
  • more accurate (at times)
  • less robust

"White box" models

  • Linear regression
  • k-nearest neighbours
  • Decision trees
  • ...

  • more interpretable
  • less accurate (at times)
  • more robust

There is no single best model

SLIDE 13

Blackbox models do not explain

Source: Molnar, C. (2019). Interpretable Machine Learning [ePub Book]. Morrisville, NC: Christoph Molnar.

SLIDE 14

Ensemble learners show a good track record

Source: Sauer, S. (2018). Moderne Datenanalyse mit R: Daten einlesen, aufbereiten, visualisieren und modellieren. Wiesbaden: Springer.

SLIDE 15

The fit of a model depends on, e.g., the linearity of associations
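A small sketch in base R of how linearity affects fit. Both data sets are simulated and purely hypothetical: the same straight-line model is fitted to a linear and to a U-shaped association, and R² is compared.

```r
set.seed(1)
x <- seq(-3, 3, length.out = 100)

# Two hypothetical data sets: one linear, one U-shaped association
y_linear <- 1 + 2 * x + rnorm(100, sd = 0.5)
y_curved <- x^2       + rnorm(100, sd = 0.5)

# Same model class (a straight line) fitted to both
r2_linear <- summary(lm(y_linear ~ x))$r.squared
r2_curved <- summary(lm(y_curved ~ x))$r.squared

# Allowing a quadratic term lets the model follow the curve
r2_curved_quad <- summary(lm(y_curved ~ x + I(x^2)))$r.squared

round(c(linear = r2_linear, curved = r2_curved, curved_quad = r2_curved_quad), 2)
```

The straight line does well on the linear data and poorly on the curved data; adding the quadratic term closes most of that gap.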

SLIDE 16

  • 3. How do I know whether my model is doing well or badly?

SLIDE 17

Short answer: The less error, the better the model
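One way to put a number on "error", sketched in base R. The observed values and the two sets of predictions are invented for illustration; RMSE and MAE are two common error measures, not necessarily the ones used in the talk.

```r
# Hypothetical observed values and the predictions of two competing models
observed <- c(10, 12, 15, 11, 14)
pred_a   <- c(11, 12, 14, 10, 15)
pred_b   <- c(13,  9, 18,  8, 17)

# Root mean squared error: penalises large misses more heavily
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Mean absolute error: average size of a miss
mae <- function(obs, pred) mean(abs(obs - pred))

rmse(observed, pred_a)   # smaller error: model A predicts better here
rmse(observed, pred_b)
mae(observed, pred_a)
mae(observed, pred_b)
```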

SLIDE 18

Wait ...

Which model do you prefer?

SLIDE 19

  • 4. Can you give me a cookbook for data science?

SLIDE 20

Classify stuff

Estimate stuff

Step 1: Choose your model(s)

SLIDE 21

Overfitting

Underfitting

Step 2: Build model fed on historical data
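Overfitting and underfitting can be made visible with a sketch like the following, in base R. The data are simulated and the polynomial degrees are arbitrary choices for illustration: models of increasing flexibility are fitted to "historical" training data, and their error is compared on held-out data.

```r
set.seed(123)

# Hypothetical data: a smooth curve plus noise
n <- 60
x <- runif(n, 0, 1)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
d <- data.frame(x = x, y = y)

# Split into "historical" training data and held-out test data
train_idx <- sample(n, size = 40)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Fit polynomials of increasing flexibility and record both errors
degrees <- c(1, 3, 15)
errors <- sapply(degrees, function(k) {
  fit <- lm(y ~ poly(x, degree = k), data = train)
  c(train = rmse(train$y, predict(fit, newdata = train)),
    test  = rmse(test$y,  predict(fit, newdata = test)))
})
colnames(errors) <- paste0("degree_", degrees)
round(errors, 2)
```

Training error can only go down as the polynomial gets more flexible. Test error typically does not follow: a model that is too rigid underfits, and one that is too flexible tends to chase the noise.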

SLIDE 22

Step 3: Predict the future

Run the model on new data

SLIDE 23

Step 4: Evaluate the model
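The four steps can be sketched end to end in base R. This is one possible walk-through under stated assumptions, not the workflow shown in the talk: the built-in mtcars data, the outcome mpg, the two candidate formulas, and the 22/10 split are all arbitrary choices for illustration.

```r
set.seed(2019)

# Built-in example data; predicting fuel economy (mpg) is an arbitrary choice
data(mtcars)
train_idx <- sample(nrow(mtcars), size = 22)
train <- mtcars[train_idx, ]    # "historical" data
test  <- mtcars[-train_idx, ]   # stands in for new, unseen data

# Step 1: choose your model(s), here two candidate linear regressions
formulas <- list(small = mpg ~ wt,
                 large = mpg ~ wt + hp + qsec)

# Step 2: build each model on the historical data
fits <- lapply(formulas, lm, data = train)

# Step 3: predict the future, i.e. run the models on new data
preds <- lapply(fits, predict, newdata = test)

# Step 4: evaluate the models by their error on the new data
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
test_rmse <- sapply(preds, rmse, obs = test$mpg)
round(test_rmse, 2)
```

Evaluating on data the model has not seen during fitting is the key design choice here; error on the training data alone would favour the more flexible model by construction.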

SLIDE 24

Here's one way to get going

Source: https://www.williamrchase.com/slides/slide_img/throw_into_pool.gif

SLIDE 25

Some literature explaining core concepts of data science

  • Grolemund, G., & Wickham, H. (2016). R for Data Science. Retrieved from https://books.google.de/books?id=aZRYrgEACAAJ
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 6). New York City, NY: Springer.
  • Sauer, S. (2019). Moderne Datenanalyse mit R: Daten einlesen, aufbereiten, visualisieren und modellieren (1. Auflage 2019). Wiesbaden: Springer.

SLIDE 26

Sebastian Sauer

sebastiansauer
https://data-se.netlify.com/
sebastian.sauer@data-divers.com
sauer_sebastian

Get slides here: https://data-se.netlify.com/slides/afd_ecda2019/afd-modeling-ECDA-2019.pdf

CC-BY

SLIDE 27

Reproducibility

Versions of employed software as of 2019-05-10, running this OS: macOS Mojave 10.14.4.

Built with R, R version 3.5.3 (2019-03-11), RStudio 1.2.1335, xaringan, on the shoulders of giants

Source Code: XXX

Icons are from FontAwesome, licenced under CC-BY-4 (details)

R-Packages used: assertthat_0.2.1, backports_1.1.4, broom_0.5.2, Cairo_1.5-10, caret_6.0-82, cellranger_1.1.0, class_7.3-15, cli_1.1.0, codetools_0.2-16, colorspace_1.4-1, crayon_1.3.4, crosstalk_1.0.0, data.table_1.12.2, digest_0.6.18, dplyr_0.8.0.1, DT_0.5, evaluate_0.13, forcats_0.4.0, foreach_1.4.4, generics_0.0.2, ggplot2_3.1.1, ggrepel_0.8.0, glue_1.3.1.9000, gower_0.2.0, gridExtra_2.3, gtable_0.3.0, gtrendsR_1.4.2, haven_2.1.0, hms_0.4.2, htmltools_0.3.6, htmlwidgets_1.3, httpuv_1.5.1, httr_1.4.0, icon_0.1.0, ipred_0.9-8, iterators_1.0.10, jsonlite_1.6, knitr_1.22, labeling_0.3, later_0.8.0, lattice_0.20-38, lava_1.6.5, lazyeval_0.2.2, lubridate_1.7.4, magrittr_1.5, MASS_7.3-51.1, Matrix_1.2-15, mime_0.6, ModelMetrics_1.2.2, modelr_0.1.4, munsell_0.5.0, nlme_3.1-137, nnet_7.3-12, pillar_1.3.1, pkgconfig_2.0.2, plotly_4.9.0, plyr_1.8.4, prodlim_2018.04.18, promises_1.0.1, purrr_0.3.2, R6_2.4.0, Rcpp_1.0.1, readr_1.3.1, readxl_1.3.1, recipes_0.1.5, reshape2_1.4.3, rlang_0.3.4, rmarkdown_1.12.6, rpart_4.1-13, rprojroot_1.3-2, rstudioapi_0.10, rvest_0.3.3, scales_1.0.0, sessioninfo_1.1.1.9000, shiny_1.3.1, stringi_1.4.3, stringr_1.4.0, survival_2.43-3, tibble_2.1.1, tidyr_0.8.3, tidyselect_0.2.5, tidyverse_1.2.1, timeDate_3043.102, viridisLite_0.3.0, withr_2.1.2, xaringan_0.9, xaringanthemer_0.2.0, xfun_0.6, xml2_1.2.0, xtable_1.8-3, yaml_2.2.0

Last update 2019-05-10
