32nd International Conference on Technology in Collegiate Mathematics
VIRTUAL CONFERENCE ictcm.com | #ICTCM
A Unified Introduction to Predictive Model Building for Undergraduate Researchers
Hasthika Rupasinghe * Lasanthi Watagoda * Alan Arnholt
Appalachian State University
ICTCM 2020
Hasthika Rupasinghe and Lasanthi Watagoda A Unified Introduction to Model Building ICTCM 2020 1 / 14
Outline
1 Problem
2 Our approach
3 Classroom trials
4 Structure
  Guided Lab I: Data Cleaning
  Guided Lab II: Linear Model Fitting
  Guided Lab III: Non-Linear Model Fitting
Problems
- Detailed explanations of many of the algorithms researchers use to create predictive models, along with directions for implementing those algorithms in software, are not commonly found in undergraduate textbooks.
- One challenge instructors face when using a standard text is providing activities that mimic a data scientist's experience, since the data sets that accompany standard texts are generally clean and ready to be analyzed.
- A second challenge is the plethora of R packages, each with its own syntax, from which one may choose to implement the numerous statistical learning algorithms.
Our approach
This work presents a unified introduction to building supervised prediction models using the caret package and provides guided labs where readers:
- Question the integrity of a data set and correct data entries to create a "clean" data set
- Build several models while making minimal changes to the R syntax
- Practice reproducible research
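As a minimal sketch of what "minimal changes to the R syntax" can look like, the example below fits two models to the Boston data with caret's train() function. The choice of models (linear regression and k-nearest neighbors), the 5-fold cross-validation, and the seed value are illustrative assumptions, not the settings used in the guided labs.

```r
library(caret)  # unified modeling interface
library(MASS)   # provides the Boston data set

# 5-fold cross-validation, shared by both models (an assumed setting)
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression of median home value on all other variables
set.seed(2020)  # fix the seed so the resampling is reproducible
fit_lm <- train(medv ~ ., data = Boston, method = "lm", trControl = ctrl)

# k-nearest neighbors: the call is the same except for `method`
# and the centering/scaling that distance-based methods usually need
set.seed(2020)
fit_knn <- train(medv ~ ., data = Boston, method = "knn",
                 preProcess = c("center", "scale"), trControl = ctrl)

# Cross-validated performance for each model
fit_lm$results
fit_knn$results
```

Setting the seed before each call keeps the results repeatable from one run to the next, which supports the reproducible-research goal listed above.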
Note:
Instructors:
The material in this article is suitable for use in classes where the instructors have advanced degrees in statistics and experience using R in the classroom.
Students:
Students must have some knowledge of linear regression models (for Lab II) and classification models (for Lab III).
Classroom tested
The guided labs have been used with two undergraduate classes. The labs were implemented in courses where the students were already using R and R Markdown and had been exposed to ggplot2.
Data Science II (STT 3860), where the students used the guided project, has as prerequisites:
- a standard undergraduate (non-calculus-based) introductory statistics course
- a data visualization and management course (Data Science I, STT 2860)
Statistical Data Analysis II (STT 3851) has Statistical Data Analysis I (STT 3850) as a prerequisite.
Structure
The guided labs are hosted on RStudio Cloud and on GitHub:
- Questioning and Cleaning the bodyfat data Lab: GitHub repository, rstudio.cloud project
- Linear models with the bodyfat data Lab: GitHub repository, rstudio.cloud project
- Non-linear models with the bodyfat data Lab: GitHub repository, rstudio.cloud project
Instructor manual
Instructors are welcome to email hasthika@appstate.edu, lasanthi@appstate.edu, or arnholtat@appstate.edu to request an instructor version of the labs.
Data
Boston Data
The Boston data set from the MASS package written by Ripley (2019) is used to illustrate various steps in predictive model building.
BodyFat Data
We use the data set provided in the article "Fitting Percentage of Body Fat to Simple Body Measurements" by Johnson (1996).
Lab I: Questioning and Cleaning the Body Fat Data
Guided Lab I: Data Cleaning
https://rstudio.cloud/project/1164604
The purpose of this activity is to have the reader critically question, evaluate, and clean the original BodyFat data.
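One way to start questioning a data set is to derive a quantity whose plausible range is well known and flag the records that fall outside it. The sketch below does this with body mass index on a small made-up data frame. The column names, the values, and the cutoffs of 15 and 60 are all hypothetical and are not taken from the BodyFat data or from the lab.

```r
# Hypothetical toy records (NOT the real BodyFat data)
toy <- data.frame(
  id        = 1:4,
  weight_lb = c(154.25, 173.25, 205.00, 184.75),
  height_in = c(67.75, 72.25, 29.50, 72.25)
)

# Body mass index from pounds and inches: 703 * weight / height^2
toy$bmi <- 703 * toy$weight_lb / toy$height_in^2

# Flag records whose BMI is outside an assumed plausible adult range
suspect <- subset(toy, bmi < 15 | bmi > 60)
suspect
```

A flagged record is a prompt for investigation rather than automatic deletion: the reader still has to decide whether the entry can be corrected from other information or should be set aside.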
Lab II: Fitting Linear Regression Models to Body Fat Data
Guided Lab II: Linear Model Fitting
https://rstudio.cloud/project/323646
The purpose of this activity is to have the reader create several regression models that predict body fat using some or all of the body measurements (explanatory variables) found in the Body Fat data.
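The idea of comparing a model built on some predictors with one built on all of them can be sketched as follows. Because the column names of the Body Fat data are not given here, the example uses the Boston data instead. The 80/20 split, the two predictors chosen for the smaller model, and the seed are assumptions made for illustration only.

```r
library(caret)  # createDataPartition(), train(), postResample()
library(MASS)   # provides the Boston data set

set.seed(2020)  # make the split and the resampling reproducible

# Hold out roughly 20% of the rows as a test set
in_train  <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train_set <- Boston[in_train, ]
test_set  <- Boston[-in_train, ]

ctrl <- trainControl(method = "cv", number = 5)

# Model using some of the explanatory variables
fit_some <- train(medv ~ lstat + rm, data = train_set,
                  method = "lm", trControl = ctrl)

# Model using all of the explanatory variables
fit_all <- train(medv ~ ., data = train_set,
                 method = "lm", trControl = ctrl)

# Test-set RMSE, R-squared, and MAE for each model
perf_some <- postResample(predict(fit_some, newdata = test_set), test_set$medv)
perf_all  <- postResample(predict(fit_all,  newdata = test_set), test_set$medv)
rbind(some = perf_some, all = perf_all)
```

Evaluating both models on rows that were not used for fitting gives a fairer comparison than in-sample fit statistics, which always favor the larger model.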
Lab III: Fitting Non-Linear Regression Models to Body Fat Data
Guided Lab III: Non-Linear Model Fitting
https://rstudio.cloud/project/1169242
The purpose of this activity is to have the reader create several non-linear regression models that predict body fat using some or all of the body measurements (explanatory variables) found in the Body Fat data.
Thank You!