Surveying Catholic sisters in 1967
Julia Silge, Data Scientist at Stack Overflow

  1. DataCamp, Supervised Learning in R: Case Studies
     Surveying Catholic sisters in 1967
     Julia Silge, Data Scientist at Stack Overflow

  2. Conference of Major Superiors of Women Sisters' Survey
     - Fielded in 1967 with over 600 questions
     - Responses from over 130,000 sisters in almost 400 congregations
     - Data is freely available

  3. Opinions and attitudes in the 1960s

     Response                      Code
     Disagree very much            1
     Disagree somewhat             2
     Neither agree nor disagree    3
     Agree somewhat                4
     Agree very much               5

     Check out the survey's codebook.
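
The slides average these codes as plain numbers. For tables and plots, it can help to attach the labels. Below is a minimal base R sketch. It assumes the answers are stored as integers 1 to 5, as in `sisters67`. The names `likert_labels` and `code_to_label()` are hypothetical and are not part of the course.

```r
# Labels for response codes 1-5, in the order given in the table above
likert_labels <- c("Disagree very much", "Disagree somewhat",
                   "Neither agree nor disagree", "Agree somewhat",
                   "Agree very much")

# Hypothetical helper: turn integer codes into an ordered factor of labels
code_to_label <- function(code) {
  factor(code, levels = 1:5, labels = likert_labels, ordered = TRUE)
}

# v116 answers of the first five sisters in the sisters67 printout (slide 4)
answers <- c(1L, 2L, 1L, 5L, 2L)
v116_labels <- code_to_label(answers)
v116_labels
```

Keep the integer codes for averaging and modeling, and convert to an ordered factor only for display. Because the factor is ordered, the labels still sort from strongest disagreement to strongest agreement.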

  4. Opinions and attitudes in the 1960s

     > sisters67
     # A tibble: 77,112 x 67
          age sister  v116  v117  v118  v119  v120  v121  v122  v123  v124  v125
        <dbl>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
      1  60.0      1     1     1     3     5     1     1     3     5     3     1
      2  70.0      2     2     2     4     4     1     3     1     5     4     1
      3  60.0      3     1     1     3     2     2     3     1     1     3     1
      4  60.0      4     5     1     2     4     1     3     4     3     3     4
      5  50.0      5     2     3     3     3     2     2     1     5     2     5
      6  40.0      7     4     3     2     5     4     3     1     5     2     5
      7  50.0      9     5     4     5     4     4     5     3     5     4     2
      8  40.0     10     5     4     3     5     1     3     5     5     5     4
      9  30.0     11     2     2     3     5     1     3     3     5     1     3
     10  30.0     12     4     1     5     5     1     4     3     5     1     5
     # ... with 77,102 more rows, and 55 more variables: v126 <int>, v127 <int>,
     #   v128 <int>, v129 <int>, v130 <int>, v131 <int>, v132 <int>, v133 <int>,
     #   v134 <int>, v135 <int>, v136 <int>, v137 <int>, v138 <int>, v139 <int>,
     #   v140 <int>, v141 <int>, v142 <int>, v143 <int>, v144 <int>, v145 <int>,
     #   v146 <int>, v147 <int>, v148 <int>, v149 <int>, v150 <int>, v151 <int>,
     #   v152 <int>, v153 <int>, v154 <int>, v155 <int>, v156 <int>, v157 <int>,
     #   v158 <int>, v159 <int>, v160 <int>, v161 <int>, v162 <int>, v163 <int>,
     #   v164 <int>, v165 <int>, v166 <int>, v167 <int>, v168 <int>, v169 <int>,
     #   v170 <int>, v171 <int>, v172 <int>, v173 <int>, v174 <int>, v175 <int>,
     #   v176 <int>, v177 <int>, v178 <int>, v179 <int>, v180 <int>

  6. Opinions and attitudes in the 1960s
     - "Catholics should boycott indecent movies."
     - "In the past 25 years, this country has moved dangerously close to socialism."
     - "I would rather be called an idealist than a practical person."

  7. Tidy your data

     > sisters67 %>%
     +   select(-sister) %>%
     +   gather(key, value, -age)
     # A tibble: 5,012,280 x 3
          age key   value
        <dbl> <chr> <int>
      1  60.0 v116      1
      2  70.0 v116      2
      3  60.0 v116      1
      4  60.0 v116      5
      5  50.0 v116      2
      6  40.0 v116      4
      7  50.0 v116      5
      8  40.0 v116      5
      9  30.0 v116      2
     10  30.0 v116      4
     # ... with 5,012,270 more rows
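
Since tidyr 1.0.0, `gather()` has been superseded by `pivot_longer()`. Below is a sketch of the same reshape with the newer function. It uses only the first three sisters and three question columns copied from the `sisters67` printout, so the result is small enough to check by eye. The name `toy_sisters` is introduced here for illustration.

```r
library(tidyr)

# First three sisters and three questions, copied from the sisters67 printout
toy_sisters <- data.frame(
  age    = c(60, 70, 60),
  sister = c(1L, 2L, 3L),
  v116   = c(1L, 2L, 1L),
  v117   = c(1L, 2L, 1L),
  v118   = c(3L, 4L, 3L)
)

# Drop the respondent ID, then stack every question column into key/value
# pairs, keeping each answer's age alongside it
toy_tidy <- pivot_longer(
  toy_sisters[names(toy_sisters) != "sister"],
  cols = -age,
  names_to = "key",
  values_to = "value"
)
toy_tidy
```

The two functions differ in row order. `pivot_longer()` keeps each sister's answers together, while `gather()` stacks one column after another. That is why the slide's output starts with ten `v116` rows. Counts and grouped summaries come out the same either way.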

  8. Let's practice!

  9. Exploratory data analysis with tidy data

  10. Counting agreement

      > tidy_sisters %>%
      +   count(value)
      # A tibble: 5 x 2
        value       n
        <int>   <int>
      1     1 1303555
      2     2  844311
      3     3  645401
      4     4 1108859
      5     5 1110154
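
Converting these counts to proportions makes them easier to compare. The total also works as a consistency check against the 5,012,280 rows of the tidy data. The base R sketch below uses only the numbers on this slide.

```r
# Counts of each response code over all answers, from count(value)
agreement <- data.frame(
  value = 1:5,
  n     = c(1303555, 844311, 645401, 1108859, 1110154)
)

# Share of all answers that used each code
agreement$prop <- agreement$n / sum(agreement$n)
round(agreement$prop, 3)  # 0.260 0.168 0.129 0.221 0.221

# 77,112 sisters x 65 questions (v116-v180): the counts cover every cell,
# so no answers are missing
stopifnot(sum(agreement$n) == 77112 * 65)
```

More than a quarter of all answers are "Disagree very much" (code 1). "Neither agree nor disagree" (code 3) is the least common choice.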

  11. Overall agreement with age

      > tidy_sisters %>%
      +   group_by(age) %>%
      +   summarise(value = mean(value))
      # A tibble: 9 x 2
          age value
        <dbl> <dbl>
      1  20.0  2.86
      2  30.0  2.81
      3  40.0  2.83
      4  50.0  2.94
      5  60.0  3.10
      6  70.0  3.26
      7  80.0  3.42
      8  90.0  3.51
      9 100    3.60
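
The counts on slide 10 give a baseline for this table: the mean code over all 5,012,280 answers. Below is a base R sketch that uses only the numbers printed on these two slides. The names `counts` and `by_age` are introduced here for illustration.

```r
# Number of answers with each code, from count(value)
codes  <- 1:5
counts <- c(1303555, 844311, 645401, 1108859, 1110154)

# Overall mean code, weighting each code by how often it was chosen
overall_mean <- weighted.mean(codes, counts)
round(overall_mean, 2)  # 2.98

# Mean code by age group, from summarise(value = mean(value))
by_age <- c(`20` = 2.86, `30` = 2.81, `40` = 2.83, `50` = 2.94, `60` = 3.10,
            `70` = 3.26, `80` = 3.42, `90` = 3.51, `100` = 3.60)

# Age groups whose average agreement falls below the overall baseline
names(by_age)[by_age < overall_mean]  # "20" "30" "40" "50"
```

The 20 to 50 age groups fall below the overall average and the 60 to 100 groups sit above it. This is consistent with the upward trend in the table.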

  12. Agreement on questions by age

      tidy_sisters %>%
        filter(key %in% paste0("v", 153:170)) %>%
        group_by(key, value) %>%
        summarise(age = mean(age)) %>%
        ggplot(aes(value, age, color = key)) +
        geom_line(alpha = 0.5, size = 1.5) +  # ggplot2 >= 3.4.0: use linewidth = 1.5
        geom_point(size = 2) +
        facet_wrap(~key)
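
Each point in these panels is the mean age of the sisters who gave a particular answer to a particular question. The sketch below runs the same `group_by(key, value)` and `summarise()` steps on ten rows copied from the `sisters67` printout (column `v116`), so the arithmetic can be checked by hand. It assumes dplyr 1.0.0 or later for the `.groups` argument. With so few rows, code 3 never appears, so this is only an illustration of the computation and not a finding about the survey.

```r
library(dplyr)

# Ages and v116 answers for the first ten sisters, copied from the printout
toy_tidy <- data.frame(
  age   = c(60, 70, 60, 60, 50, 40, 50, 40, 30, 30),
  key   = "v116",
  value = c(1L, 2L, 1L, 5L, 2L, 4L, 5L, 5L, 2L, 4L)
)

# For each question and answer code: mean age of the sisters who chose it
age_by_answer <- toy_tidy %>%
  group_by(key, value) %>%
  summarise(age = mean(age), n = n(), .groups = "drop")
age_by_answer
```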

  15. Freedom of speech
      "People who don't believe in God have as much right to freedom of speech as anyone else."

  17. Conservatism and heritage
      "I like conservatism because it represents a stand to preserve our glorious heritage."

  19. Vietnam War
      "Catholics as a group should consider active opposition to US participation in Vietnam."

  22. Let's explore!

  23. Predicting age with supervised machine learning

  24. Build models
      - "rpart"
      - "xgbLinear"
      - "gbm"

  25. Choosing between multiple models

      library(caret)

      # age ~ . predicts age from every other column in the training set
      ## CART
      sisters_cart <- train(age ~ ., method = "rpart", data = training)

      ## xgboost (linear booster)
      sisters_xg <- train(age ~ ., method = "xgbLinear", data = training)

      ## gbm
      sisters_gbm <- train(age ~ ., method = "gbm", data = training)

  27. Why three data partitions?
      Don't overestimate how well your model is performing!

  28. Why three data partitions?

      > validation %>%
      +   mutate(prediction = predict(sisters_xg, validation)) %>%
      +   rmse(truth = age, estimate = prediction)
      [1] 13.27101
      > testing %>%
      +   mutate(prediction = predict(sisters_xg, testing)) %>%
      +   rmse(truth = age, estimate = prediction)
      [1] 13.36945
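
For reference, RMSE is the square root of the mean squared difference between true and predicted values. A validation RMSE of about 13 therefore means predictions are typically off by roughly 13 years. Below is a minimal sketch. The helper `rmse_manual()` is hypothetical, and the ages are made up.

```r
# Root mean squared error: the typical size of a prediction error
rmse_manual <- function(truth, estimate) {
  stopifnot(length(truth) == length(estimate))
  sqrt(mean((truth - estimate)^2))
}

truth    <- c(60, 70, 50)  # made-up ages
estimate <- c(55, 72, 50)  # made-up predictions
rmse_manual(truth, estimate)  # sqrt((25 + 4 + 0) / 3), about 3.11
```

yardstick's `rmse()` computes the same quantity. Recent yardstick versions return it as a one-row tibble (the number is in `.estimate`) rather than the bare number printed on the slide. `rmse_vec()` returns the number directly.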

  29. Let's practice!

  30. You made it!

  31. Predicting age

      > metrics(model_results, truth = age, estimate = CART)
      # A tibble: 1 x 2
         rmse   rsq
        <dbl> <dbl>
      1  14.8 0.170
      > metrics(model_results, truth = age, estimate = XBG)
      # A tibble: 1 x 2
         rmse   rsq
        <dbl> <dbl>
      1  13.3 0.338
      > metrics(model_results, truth = age, estimate = GBM)
      # A tibble: 1 x 2
         rmse   rsq
        <dbl> <dbl>
      1  12.8 0.382
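
The two metrics on this slide measure different things. RMSE is the typical size of a prediction error in years. yardstick's `rsq()` is the squared correlation between truth and estimate; `rsq_trad()` gives the traditional 1 - SS_res/SS_tot version. A correlation-based R² ignores a constant offset, so the two metrics can disagree. The base R sketch below uses made-up ages and two hypothetical models, `close` and `offset`.

```r
rmse_manual <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
# Squared Pearson correlation, the quantity yardstick's rsq() reports
rsq_cor <- function(truth, estimate) cor(truth, estimate)^2

truth <- c(30, 40, 50, 60, 70)  # made-up ages
preds <- data.frame(
  close  = c(35, 40, 45, 65, 70),  # small errors in both directions
  offset = truth + 10              # perfectly correlated, but always 10 years high
)

results <- data.frame(
  model = names(preds),
  rmse  = sapply(preds, rmse_manual, truth = truth),
  rsq   = sapply(preds, rsq_cor, truth = truth)
)
results

# Choosing by RMSE picks "close", even though "offset" has rsq = 1
results$model[which.min(results$rmse)]
```

On this slide the two metrics happen to agree: GBM has both the lowest RMSE and the highest R².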

  32. Predicting age
      - Build your model with your training data
      - Choose your model with your validation data
      - Evaluate your model with your testing data
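
The three-step recipe can be sketched end to end in base R. This is an illustration under stated assumptions, not the course's code:
- The data are simulated.
- `lm()` stands in for the caret models.
- The 60/20/20 split is an arbitrary choice.
- `rmse_manual()` is a hypothetical helper.

```r
set.seed(2018)

# Hypothetical simulated data (not the survey): age in 10-year bins, one
# answer column (q1) that rises with age and one (q2) that is pure noise
n <- 1000
age <- sample(seq(20, 100, by = 10), n, replace = TRUE)
sim <- data.frame(
  age = age,
  q1  = pmin(5, pmax(1, round(1 + (age - 20) / 20 + rnorm(n)))),
  q2  = sample(1:5, n, replace = TRUE)
)

# Randomly assign rows to three partitions (sizes add up to n = 1000)
partition  <- sample(rep(c("training", "validation", "testing"), times = c(600, 200, 200)))
training   <- sim[partition == "training", ]
validation <- sim[partition == "validation", ]
testing    <- sim[partition == "testing", ]

rmse_manual <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# 1. Build candidate models with the training data
candidates <- list(
  q2_only = lm(age ~ q2, data = training),
  q1_q2   = lm(age ~ q1 + q2, data = training)
)

# 2. Choose a model with the validation data
validation_rmse <- sapply(candidates, function(fit) {
  rmse_manual(validation$age, predict(fit, validation))
})
best <- names(which.min(validation_rmse))

# 3. Evaluate only the chosen model, once, on the testing data
testing_rmse <- rmse_manual(testing$age, predict(candidates[[best]], testing))
validation_rmse
testing_rmse
```

Only the validation RMSE influences which model is picked. The testing RMSE is computed once at the end, so that choice does not bias it.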

  33. Diverse data, powerful tools
      - Fuel efficiency of cars
      - Developers working remotely in the Stack Overflow survey
      - Voter turnout in 2016
      - Catholic nuns' ages based on beliefs and attitudes

  35. Practical machine learning
      - Dealing with class imbalance
      - Improving performance with resampling (bootstrap, cross-validation)
      - Hyperparameter tuning?
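
Cross-validation, one of the resampling methods named here, can be sketched in a few lines of base R. In caret, the same idea is requested by passing `trainControl(method = "cv", number = 5)` to `train()` through its `trControl` argument. By default, `train()` resamples with the bootstrap instead. The data below are simulated, and k = 5 is an assumption for illustration.

```r
set.seed(42)

# Hypothetical simulated data: age rises with one 1-5 answer column
n <- 500
sim <- data.frame(q1 = sample(1:5, n, replace = TRUE))
sim$age <- 20 + 15 * sim$q1 + rnorm(n, sd = 10)

k <- 5
# Assign each row to one of k folds of equal size, in random order
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, measure RMSE on the held-out fold, repeat for every fold
fold_rmse <- vapply(seq_len(k), function(i) {
  fit <- lm(age ~ q1, data = sim[fold != i, ])
  held_out <- sim[fold == i, ]
  sqrt(mean((held_out$age - predict(fit, held_out))^2))
}, numeric(1))

cv_rmse <- mean(fold_rmse)  # cross-validated estimate of out-of-sample RMSE
fold_rmse
cv_rmse
```

Every row is used for fitting k - 1 times and for evaluation exactly once. The averaged RMSE is therefore a less noisy estimate than a single holdout split of the same data.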

  36. Practical machine learning
      - Try out multiple modeling approaches for each new problem
      - Overall, gradient tree boosting and random forest perform well

  37. Never skip exploratory data analysis

  38. Go train some models!
