Welcome! Julia Silge, Data Scientist at Stack Overflow (DataCamp)

SLIDE 1

DataCamp Supervised Learning in R: Case Studies

Welcome!

SUPERVISED LEARNING IN R: CASE STUDIES

Julia Silge

Data Scientist at Stack Overflow

SLIDE 2

In this course, you will...

  • use exploratory data analysis to prepare for predictive modeling
  • explore which modeling approaches to use for different kinds of data
  • practice implementing supervised machine learning for classification and regression

SLIDE 3

Supervised machine learning

  • Regression
  • Classification

SLIDE 4

Case studies

  • Fuel efficiency for cars
  • Stack Overflow Developer Survey
  • Voter turnout
  • Predict age of nuns from survey responses

SLIDE 5

Fuel efficiency

SLIDE 6

Fuel efficiency

From the US Department of Energy

> cars2018
# A tibble: 1,144 x 15
   Model          `Model Index` Displacement Cylinders Gears Transmission   MPG
   <chr>                  <dbl>        <dbl>     <dbl> <dbl> <chr>        <dbl>
 1 Acura NSX               57.0         3.50      6.00  9.00 Manual        21.0
 2 ALFA ROMEO 4C          410           1.80      4.00  6.00 Manual        28.0
 3 Audi R8 AWD             65.0         5.20      10.0  7.00 Manual        17.0
 4 Audi R8 RWD             71.0         5.20      10.0  7.00 Manual        18.0
 5 Audi R8 Spyde…          66.0         5.20      10.0  7.00 Manual        17.0
 6 Audi R8 Spyde…          72.0         5.20      10.0  7.00 Manual        18.0
 7 Audi TT Roads…          46.0         2.00      4.00  6.00 Manual        26.0
 8 BMW M4 DTM Ch…         488           3.00      6.00  7.00 Manual        20.0
 9 Bugatti Chiron          38.0         8.00      16.0  7.00 Manual        11.0
10 Chevrolet COR…         278           6.20      8.00  8.00 Automatic     18.0
# ... with 1,134 more rows, and 8 more variables: Aspiration <chr>, `Lockup
#   Torque Converter` <chr>, Drive <chr>, `Max Ethanol` <dbl>, `Recommended
#   Fuel` <fct>, `Intake Valves Per Cyl` <dbl>, `Exhaust Valves Per Cyl` <dbl>,
#   `Fuel injection` <chr>

SLIDE 7

Fuel efficiency

From the US Department of Energy

> names(cars2018)
 [1] "Model"                   "Model Index"
 [3] "Displacement"            "Cylinders"
 [5] "Gears"                   "Transmission"
 [7] "MPG"                     "Aspiration"
 [9] "Lockup Torque Converter" "Drive"
[11] "Max Ethanol"             "Recommended Fuel"
[13] "Intake Valves Per Cyl"   "Exhaust Valves Per Cyl"
[15] "Fuel injection"

SLIDE 8

Special characters in variable names

> cars2018 %>%
+     select(`Fuel injection`)
# A tibble: 1,144 x 1
   `Fuel injection`
   <chr>
 1 Direct ignition
 2 Direct ignition
 3 Direct ignition
 4 Direct ignition
 5 Direct ignition
 6 Direct ignition
 7 Direct ignition
 8 Direct ignition
 9 Multipoint/sequential ignition
10 Direct ignition
# ... with 1,134 more rows
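The backtick rule is not specific to select(). As a minimal sketch in base R, assuming a small made-up data frame (toy_cars and its values are hypothetical, not taken from cars2018), the same quoting works with `$` and inside a model formula:

```r
# Hypothetical toy data with non-syntactic column names (spaces),
# mimicking names such as `Fuel injection` in cars2018.
# check.names = FALSE stops data.frame() from rewriting the names.
toy_cars <- data.frame(
  "Fuel injection" = c("Direct ignition", "Multipoint/sequential ignition"),
  "Max Ethanol"    = c(10, 15),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Backticks let a name with spaces appear where a bare name is expected.
injection <- toy_cars$`Fuel injection`

# Indexing by string also works and needs no backticks.
ethanol <- toy_cars[["Max Ethanol"]]

# Formulas need backticks too; this intercept-only model just
# estimates the mean of the `Max Ethanol` column.
fit <- lm(`Max Ethanol` ~ 1, data = toy_cars)
```

Without check.names = FALSE, data.frame() would silently rename the columns to Fuel.injection and Max.Ethanol, which is one reason names with spaces tend to show up in data read by tidyverse tools.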

SLIDE 9

Exploratory data analysis

SLIDE 10

Exploratory data analysis

library(tidyverse)

  • ggplot2
  • dplyr
  • tidyr
  • others!

To learn more about the tidyverse, visit this page.

SLIDE 11

Time to train some models!

SUPERVISED LEARNING IN R: CASE STUDIES

SLIDE 12

Getting started with caret

SUPERVISED LEARNING IN R: CASE STUDIES

Julia Silge

Data Scientist at Stack Overflow

SLIDE 13

Predicting fuel efficiency

SLIDE 14

Tools for predictive modeling

THE PACKAGE CARET

SLIDE 15

SLIDE 16

Training data and testing data with caret

> library(caret)
>
> in_train <- createDataPartition(cars_vars$Aspiration,
+                                 p = 0.8, list = FALSE)
> training <- cars_vars[in_train,]
> testing <- cars_vars[-in_train,]
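To make the idea behind the split concrete, here is a rough base-R sketch of a stratified 80/20 partition. It is an illustration of the technique, not caret's implementation: it uses the built-in mtcars data as a stand-in, with `cyl` playing the role of Aspiration, and the names cars_toy and strata are hypothetical.

```r
set.seed(1234)  # arbitrary seed so the split is reproducible

# Stand-in data: mtcars ships with R; `cyl` plays the role of Aspiration.
cars_toy <- mtcars
strata <- factor(cars_toy$cyl)

# Split the row indices by stratum, then keep roughly 80% of each group,
# so every level is represented in similar proportions on both sides.
in_train <- unlist(lapply(
  split(seq_len(nrow(cars_toy)), strata),
  function(idx) {
    n_keep <- ceiling(0.8 * length(idx))
    idx[sample.int(length(idx), n_keep)]
  }
))

training <- cars_toy[in_train, ]
testing  <- cars_toy[-in_train, ]
```

Sampling within each level, rather than across all rows at once, keeps a rare category from ending up entirely in the training set or entirely in the testing set.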

SLIDE 17

Training data and testing data with caret

  • Build your model with your training data
  • Choose your model with your validation data
  • Evaluate your model with your testing data

SLIDE 18

Training a model

  • Train a model

> fit_lm <- train(log(MPG) ~ ., method = "lm", data = training,
+                 trControl = trainControl(method = "none"))

  • Evaluate that model using yardstick
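With trainControl(method = "none") no resampling takes place, so the model is fit once to the training data. The two steps on this slide can be sketched in base R as follows. This is a stand-in under stated assumptions, not the course's code: it uses mtcars instead of cars2018, a plain random split, two arbitrarily chosen predictors, and a hand-written RMSE in place of a yardstick call.

```r
set.seed(1234)  # arbitrary seed for a reproducible split

# Simple random split of the built-in mtcars data (26 train, 6 test).
in_train <- sample(nrow(mtcars), size = 26)
training <- mtcars[in_train, ]
testing  <- mtcars[-in_train, ]

# Step 1: train a linear model on the log of fuel efficiency.
# wt and hp are illustrative predictors, not the course's choice.
fit_lm_toy <- lm(log(mpg) ~ wt + hp, data = training)

# Step 2: evaluate on the held-out rows with root mean squared error,
# computed on the log scale because the model predicts log(mpg).
pred <- predict(fit_lm_toy, newdata = testing)
rmse_log <- sqrt(mean((log(testing$mpg) - pred)^2))
```

Because the outcome is log-transformed, the error is in log units; exp(pred) would return predictions to miles per gallon.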

SLIDE 19

Evaluating a model

THE PACKAGE YARDSTICK

SLIDE 20

Let's practice!

SUPERVISED LEARNING IN R: CASE STUDIES

SLIDE 21

Training a model with resampling

SUPERVISED LEARNING IN R: CASE STUDIES

Julia Silge

Data Scientist at Stack Overflow

SLIDE 22

Bootstrap resampling

Sample with replacement from the original dataset
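A single bootstrap resample can be sketched in base R like this. The example assumes the built-in mtcars data as a stand-in for the course data, and the object names are hypothetical.

```r
set.seed(1234)  # arbitrary seed for reproducibility

n <- nrow(mtcars)

# Draw n row indices with replacement: some rows appear more than
# once, others not at all.
boot_idx <- sample.int(n, size = n, replace = TRUE)
boot_sample <- mtcars[boot_idx, ]

# Rows that were never drawn form the "out-of-bag" set, which can
# serve as held-out data for this particular resample.
oob_idx <- setdiff(seq_len(n), boot_idx)
oob_sample <- mtcars[oob_idx, ]
```

The resample has the same number of rows as the original data. For large datasets, about 63% of the original rows are expected to appear in any one resample, leaving the rest out-of-bag.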

SLIDE 23

SLIDE 24

SLIDE 25

Bootstrap resampling with caret

> cars_rf_bt <- train(log(MPG) ~ ., method = "rf",
+                     data = training,
+                     trControl = trainControl(method = "boot"))

SLIDE 26

Comparing predicted to real values

   `log(MPG)` `Linear regression` `Random forest`
        <dbl>               <dbl>           <dbl>
 1       2.89                2.79            2.83
 2       2.89                3.00            2.89
 3       3.26                3.22            3.26
 4       3.14                3.09            3.10
 5       3.26                3.22            3.26
 6       2.89                3.11            2.98
 7       2.48                2.59            2.51
 8       2.71                2.81            2.82
 9       3.37                3.29            3.27
10       2.83                2.90            2.90
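One way to turn such a table into a single number per model is root mean squared error. The sketch below retypes only the ten rows shown on this slide into a data frame; the column names truth, linear and forest and the helper rmse() are hypothetical, and ten rows are far too few to rank the models reliably.

```r
# The ten rows printed on the slide, retyped by hand.
results <- data.frame(
  truth  = c(2.89, 2.89, 3.26, 3.14, 3.26, 2.89, 2.48, 2.71, 3.37, 2.83),
  linear = c(2.79, 3.00, 3.22, 3.09, 3.22, 3.11, 2.59, 2.81, 3.29, 2.90),
  forest = c(2.83, 2.89, 3.26, 3.10, 3.26, 2.98, 2.51, 2.82, 3.27, 2.90)
)

# Root mean squared error between observed and predicted values.
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

rmse_linear <- rmse(results$truth, results$linear)
rmse_forest <- rmse(results$truth, results$forest)

# All values are on the log scale; exp() returns them to MPG.
mpg_truth <- exp(results$truth)
```

On these ten rows the random forest column has the smaller error, but the comparison that matters is the one made on the full testing set.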

SLIDE 27

Visualizing model predictions

SLIDE 28

Let's practice!

SUPERVISED LEARNING IN R: CASE STUDIES