

SLIDE 1

Introduction to Data Science

Winter Semester 2019/20 Oliver Ernst

TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik

Lecture Slides

SLIDE 2

Contents I

1 What is Data Science?
2 Learning Theory
  2.1 What is Statistical Learning?
  2.2 Assessing Model Accuracy
3 Linear Regression
  3.1 Simple Linear Regression
  3.2 Multiple Linear Regression
  3.3 Other Considerations in the Regression Model
  3.4 Revisiting the Marketing Data Questions
  3.5 Linear Regression vs. K-Nearest Neighbors
4 Classification
  4.1 Overview of Classification
  4.2 Why Not Linear Regression?
  4.3 Logistic Regression
  4.4 Linear Discriminant Analysis
  4.5 A Comparison of Classification Methods

5 Resampling Methods

SLIDE 3

Contents II

  5.1 Cross Validation
  5.2 The Bootstrap
6 Linear Model Selection and Regularization
  6.1 Subset Selection
  6.2 Shrinkage Methods
  6.3 Dimension Reduction Methods
  6.4 Considerations in High Dimensions
  6.5 Miscellanea
7 Nonlinear Regression Models
  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models
8 Tree-Based Methods
  8.1 Decision Tree Fundamentals
  8.2 Bagging, Random Forests and Boosting

SLIDE 4

Contents III

9 Unsupervised Learning

  9.1 Principal Components Analysis
  9.2 Clustering Methods

SLIDE 5

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 6

Nonlinear Regression Models

Chapter overview

  • Despite the benefits of simplicity and interpretability of the standard linear model for regression, it will suffer from large bias if the model generating the data depends nonlinearly on the predictors.

  • In this chapter we explore methods which make the linear regression model more flexible by using linear combinations of nonlinear functions of the predictors, specifically

    1 polynomial and piecewise polynomial functions,
    2 piecewise constant functions,
    3 piecewise polynomial functions with penalty terms and
    4 generalized additive model functions.

SLIDE 7

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 8

Nonlinear Regression Models

Polynomial Regression

  • For univariate models, polynomial regression replaces the simple linear regression model Y = β0 + β1X + ε with a polynomial of degree d > 1 in the predictor variable:

    Y = β0 + β1X + β2X^2 + · · · + βdX^d + ε.

SLIDE 9

Nonlinear Regression Models

Polynomial Regression

  • For univariate models, polynomial regression replaces the simple linear regression model Y = β0 + β1X + ε with a polynomial of degree d > 1 in the predictor variable:

    Y = β0 + β1X + β2X^2 + · · · + βdX^d + ε.

  • High-degree polynomials are often difficult to handle due to their oscillatory behavior and their unboundedness for large arguments, so that degrees higher than 4 can become problematic if employed naively.

SLIDE 10

Nonlinear Regression Models

Polynomial Regression

  • For univariate models, polynomial regression replaces the simple linear regression model Y = β0 + β1X + ε with a polynomial of degree d > 1 in the predictor variable:

    Y = β0 + β1X + β2X^2 + · · · + βdX^d + ε.

  • High-degree polynomials are often difficult to handle due to their oscillatory behavior and their unboundedness for large arguments, so that degrees higher than 4 can become problematic if employed naively.

  • Example: Wage data set: income and demographic information for males who reside in the central Atlantic region of the United States. Fit response wage [in $1000] to predictor age by LS using a polynomial of degree d = 4.
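
A minimal sketch of such a degree-4 LS fit in Python, using synthetic stand-in arrays for the Wage variables age and wage (the actual data set is not reproduced here):

```python
import numpy as np

# Hypothetical stand-in for the Wage data set.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=300)
wage = 50 + 5 * age - 0.05 * age**2 + rng.normal(0, 30, size=300)

# Design matrix with columns 1, x, x^2, x^3, x^4; ordinary LS fit.
X = np.vander(age, N=5, increasing=True)
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Evaluate the fitted degree-4 polynomial on a grid of ages.
grid = np.linspace(20, 80, 200)
fit = np.vander(grid, N=5, increasing=True) @ beta
```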

SLIDE 11

Nonlinear Regression Models

Polynomial Regression

[Figure: two panels titled “Degree−4 Polynomial”: left, wage against age; right, Pr(Wage > 250 | Age) on a 0.00–0.20 scale, with a rug of observations along the age axis.]

Left: Polynomial (d = 4) LS fit of wage against age (solid blue) with 95% confidence interval (blue dashed). Right: Model of event {wage > 250} using logistic regression with d = 4, fitted posterior probability (solid blue) with 95% confidence interval (blue dashed).

SLIDE 12

Nonlinear Regression Models

Polynomial Regression

Left panel in previous figure:

  • Given fit at particular age value x0,

    f̂(x0) = β̂0 + β̂1x0 + β̂2x0^2 + β̂3x0^3 + β̂4x0^4,

    use variance/covariance estimates of the β̂j to estimate variance of f̂(x0).

SLIDE 13

Nonlinear Regression Models

Polynomial Regression

Left panel in previous figure:

  • Given fit at particular age value x0,

    f̂(x0) = β̂0 + β̂1x0 + β̂2x0^2 + β̂3x0^3 + β̂4x0^4,

    use variance/covariance estimates of the β̂j to estimate variance of f̂(x0).

  • If Ĉ ∈ R^{5×5} is the estimated covariance matrix of the β̂j, then

    Var f̂(x0) = ℓ0⊤ Ĉ ℓ0, where ℓ0 = (1, x0, x0^2, . . . , x0^4)⊤.

SLIDE 14

Nonlinear Regression Models

Polynomial Regression

Left panel in previous figure:

  • Given fit at particular age value x0,

    f̂(x0) = β̂0 + β̂1x0 + β̂2x0^2 + β̂3x0^3 + β̂4x0^4,

    use variance/covariance estimates of the β̂j to estimate variance of f̂(x0).

  • If Ĉ ∈ R^{5×5} is the estimated covariance matrix of the β̂j, then

    Var f̂(x0) = ℓ0⊤ Ĉ ℓ0, where ℓ0 = (1, x0, x0^2, . . . , x0^4)⊤.

  • Estimated pointwise standard error of f̂(x0) is the square root of this variance.

  • Repeating the calculation for all x0 and plotting ±2× standard error (corresponds to ≈ 95% confidence interval for normally distributed errors) yields the dashed lines.
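
A sketch of this computation, continuing the synthetic setup above. Here Ĉ is estimated as σ̂²(XᵀX)⁻¹, the standard linear-model estimate (an assumption; the slides do not fix the estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=300)
wage = 50 + 5 * age - 0.05 * age**2 + rng.normal(0, 30, size=300)

X = np.vander(age, N=5, increasing=True)          # columns 1, x, ..., x^4
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Estimated covariance matrix C = sigma^2 (X^T X)^{-1} of the coefficients.
resid = wage - X @ beta
sigma2 = resid @ resid / (len(wage) - X.shape[1])
C = sigma2 * np.linalg.inv(X.T @ X)

# Var f(x0) = l0^T C l0 with l0 = (1, x0, x0^2, x0^3, x0^4)^T.
grid = np.linspace(20, 80, 200)
L = np.vander(grid, N=5, increasing=True)         # each row is an l0^T
se = np.sqrt(np.einsum("ij,jk,ik->i", L, C, L))   # pointwise standard errors
lower, upper = L @ beta - 2 * se, L @ beta + 2 * se   # ~95% band
```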

SLIDE 15

Nonlinear Regression Models

Polynomial Regression

Right panel in previous figure:

  • Observations seem to fall into 2 classes: high earners (> $250K) and low earners; treat wage as binary response variable with these two groups.

  • Using logistic regression, can predict this binary response using polynomial functions of predictor age.

  • This corresponds to fitting

    P(yi > 250 | xi) = exp(β0 + β1xi + · · · + βdxi^d) / (1 + exp(β0 + β1xi + · · · + βdxi^d)).

  • Gray marks in figure denote ages of high and low earners.

  • Solid blue: fitted probabilities of being high/low earner given age; dashed blue gives 95% confidence interval (very wide).

  • Only 79 of the n = 3000 observations are high earners, which results in high variance of the coefficient estimates and therefore wide confidence intervals.
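
A hedged sketch of this fit with scikit-learn; the data below are synthetic placeholders with hypothetical age-dependent log-odds, and a nearly unpenalized LogisticRegression stands in for a plain maximum-likelihood fit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: binary "high earner" indicator with age-dependent odds.
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, size=3000)
z = -4 + 0.08 * (age - 20) - 0.001 * (age - 20) ** 2   # hypothetical log-odds
high = rng.binomial(1, 1 / (1 + np.exp(-z)))

# Polynomial features age, ..., age^4, rescaled for numerical conditioning;
# the intercept is handled by the model itself.
a = (age - 50) / 30
X = np.vander(a, N=5, increasing=True)[:, 1:]
clf = LogisticRegression(C=1e6, max_iter=5000).fit(X, high)

# Fitted posterior probability of being a high earner on an age grid.
grid = (np.linspace(20, 80, 200) - 50) / 30
p = clf.predict_proba(np.vander(grid, N=5, increasing=True)[:, 1:])[:, 1]
```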

SLIDE 16

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 17

Nonlinear Regression Models

Step Functions

Idea:

  • Polynomials are globally defined on the domain of the predictor(s) X.

SLIDE 18

Nonlinear Regression Models

Step Functions

Idea:

  • Polynomials are globally defined on the domain of the predictor(s) X.
  • To model more locally varied response behavior, divide domain of X into subdomains and use a different response model on each.
  • Simplest case: different constant function on each subinterval.
  • Amounts to converting a continuous variable into an unordered categorical variable.

SLIDE 19

Nonlinear Regression Models

Step Functions

  • Introduce “cut points” c1 < c2 < · · · < cK in range of X, construct K + 1 new (dummy) variables with indicator function 1(·):

    C0(X) = 1(X < c1), C1(X) = 1(c1 ≤ X < c2), . . . ,
    CK−1(X) = 1(cK−1 ≤ X < cK), CK(X) = 1(cK ≤ X).    (7.1)

  • Since the events are exhaustive and mutually exclusive, we have ∑_{k=0}^K Ck(X) ≡ 1.

  • Now fit LS model using C1(X), . . . , CK(X) as predictors⁹:

    yi = β0 + β1C1(xi) + β2C2(xi) + · · · + βKCK(xi) + εi.

    βj (j > 0): average increase in response for X ∈ [cj, cj+1) relative to X < c1.

⁹Omit C0 as this is redundant with the intercept.
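
A minimal sketch of the piecewise constant fit (7.1) in Python, with hypothetical cut points and synthetic stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)
age = rng.uniform(20, 80, size=500)
wage = 60 + 1.2 * age + rng.normal(0, 25, size=500)   # synthetic stand-in

cuts = np.array([35.0, 50.0, 65.0])        # cut points c1 < c2 < c3 (K = 3)
bins = np.digitize(age, cuts)              # bin index 0..K per observation

# Dummy columns C_1(x_i), ..., C_K(x_i); C_0 is omitted (redundant with the
# intercept), so the intercept estimates the mean response for X < c1.
C = (bins[:, None] == np.arange(1, len(cuts) + 1)).astype(float)
X = np.column_stack([np.ones_like(age), C])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
# beta[j], j > 0: average increase in response in the j-th bin vs. X < c1.
```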

SLIDE 20

Nonlinear Regression Models

Step Functions

[Figure: two panels titled “Piecewise Constant”: left, wage against age; right, Pr(Wage > 250 | Age) on a 0.00–0.20 scale, with a rug of observations along the age axis.]

Left: piecewise constant fit of wage against age (solid) with 95% confidence band (dashed). Right: modeling event {wage > 250} using logistic regression (solid) with 95% confidence band (dashed).

SLIDE 21

Nonlinear Regression Models

Step Functions

Previous figure:

  • Left: capturing response behavior requires choosing the cut points appropriately. Increasing trend of wage with age clearly missed in first bin.

  • Right: logistic regression fits

    P(yi > 250 | xi) = exp(β0 + β1C1(xi) + · · · + βKCK(xi)) / (1 + exp(β0 + β1C1(xi) + · · · + βKCK(xi)))

    to predict probability of being high earner given age.

SLIDE 22

Nonlinear Regression Models

Step Functions

Previous figure:

  • Left: capturing response behavior requires choosing the cut points appropriately. Increasing trend of wage with age clearly missed in first bin.

  • Right: logistic regression fits

    P(yi > 250 | xi) = exp(β0 + β1C1(xi) + · · · + βKCK(xi)) / (1 + exp(β0 + β1C1(xi) + · · · + βKCK(xi)))

    to predict probability of being high earner given age. Piecewise constant approximation popular in biostatistics and epidemiology, where bins often correspond to 5-year age groups.

SLIDE 23

Nonlinear Regression Models

General regression functions

  • Polynomial and piecewise constant regression are examples of the basis function approach, where a linear combination of transformations {bk(X)}, k = 1, . . . , K, of the predictor variables is used for fitting:

    yi = β0 + β1b1(xi) + β2b2(xi) + · · · + βKbK(xi) + εi.

  • Basis functions bk chosen a priori. Examples:

    bk(xi) = xi^k                  (polynomial regression),
    bk(xi) = 1(ck ≤ xi < ck+1)     (piecewise constant regression).

  • Model still linear in the coefficients, hence all inferential methods of linear LS still applicable (standard errors for coefficient estimates, F-statistics for model significance etc.).

  • Many possible choices: wavelets, Fourier modes, splines, etc.
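
The recipe is the same for any basis: build the design matrix column by column and solve one LS problem. A small sketch with an arbitrary, purely illustrative basis choice:

```python
import numpy as np

def design_matrix(x, basis):
    """Columns 1, b_1(x), ..., b_K(x) for a list of basis functions."""
    return np.column_stack([np.ones_like(x)] + [b(x) for b in basis])

# Example basis: the model stays linear in the coefficients regardless of
# how nonlinear the b_k themselves are.
basis = [np.sin, np.cos, lambda x: x, lambda x: x**2]

rng = np.random.default_rng(3)
x = rng.uniform(0, 6, size=200)
y = 1 + 0.5 * x + 2 * np.sin(x) + rng.normal(0, 0.3, size=200)

B = design_matrix(x, basis)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)   # ordinary LS machinery applies
```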

SLIDE 24

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 25

Nonlinear Regression Models

Piecewise polynomials

  • As in piecewise constant models, introduce partition of X domain into subintervals.

  • Fit a different low-degree polynomial in each subinterval.

  • E.g. cubic:

    yi = β0 + β1xi + β2xi^2 + β3xi^3 + εi,    (7.2)

    with separate coefficients β0, β1, β2, β3 in each subinterval.

  • Spline terminology: cut points called knots.

  • Piecewise cubic with single knot at X = c:

    yi = β0,1 + β1,1xi + β2,1xi^2 + β3,1xi^3 + εi,  if xi < c,
    yi = β0,2 + β1,2xi + β2,2xi^2 + β3,2xi^3 + εi,  if xi ≥ c.

SLIDE 26

Nonlinear Regression Models

Piecewise polynomials

[Figure: “Piecewise Cubic” fit, wage against age.]

A piecewise cubic fit of wage against age for the Wage data set. Note the discontinuity at the (single) knot c = 50. Model has 8 = 2 × 4 degrees of freedom.

SLIDE 27

Nonlinear Regression Models

Piecewise polynomials with constraints

[Figure: “Continuous Piecewise Cubic” fit, wage against age.]

A piecewise cubic fit of the same data, now with the added constraint that the two polynomials should agree at the knot. This still leaves a ‘kink’ at the knot, i.e., a discontinuity of the first derivative.

SLIDE 28

Nonlinear Regression Models

Piecewise polynomials with constraints

[Figure: “Cubic Spline” fit, wage against age.]

A piecewise cubic fit of the same data, now with the added constraint that the two polynomials as well as their first and second derivatives should agree at the knot.

SLIDE 29

Nonlinear Regression Models

Piecewise polynomials with constraints

[Figure: “Linear Spline” fit, wage against age.]

A piecewise linear fit of the same data with continuity constraint.

SLIDE 30

Nonlinear Regression Models

Splines

  • Cubic spline with K knots: 4 + K degrees of freedom.

  • General definition of (univariate) spline: piecewise polynomial of degree d with continuity of derivatives of orders 0, 1, 2, . . . , d − 1.

  • Cubic spline model with K knots can be modeled as

    yi = β0 + β1b1(xi) + β2b2(xi) + · · · + βK+3bK+3(xi) + εi

    using appropriate basis functions.

  • One possible basis (cubic case): start off with monomials x, x^2, x^3, then add for each knot ξ one truncated monomial

    h(x, ξ) := (x − ξ)^3_+ := (x − ξ)^3 if x > ξ,  0 otherwise.

  • Adding single basis function h(x, ξ) to model (7.2) will introduce discontinuity only in third derivative at x = ξ.
SLIDE 31

Nonlinear Regression Models

LS regression with splines

To fit LS regression model with cubic splines using K knots {ξk}, k = 1, . . . , K, use the K + 3 predictor variables X, X^2, X^3, h(X, ξ1), . . . , h(X, ξK).

[Figure: “Natural Cubic Spline” and “Cubic Spline” fits, wage against age.]

Cubic and natural (linear beyond the boundary knots) spline fits using 3 knots on a subset of the Wage data. Note the large variance of the cubic spline near the endpoints.
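
A sketch of this truncated-power-basis fit (plain cubic spline, not the natural variant) with knots at quantiles of synthetic stand-in data:

```python
import numpy as np

def h(x, xi):
    """Truncated cubic monomial (x - xi)^3_+ ."""
    return np.where(x > xi, (x - xi) ** 3, 0.0)

def cubic_spline_design(x, knots):
    """Intercept plus the K + 3 predictors x, x^2, x^3, h(x, xi_1..K)."""
    cols = [np.ones_like(x), x, x**2, x**3] + [h(x, xi) for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(4)
age = rng.uniform(20, 70, size=400)
wage = 60 + 40 * np.sin((age - 20) / 15) + rng.normal(0, 20, size=400)

knots = np.percentile(age, [25, 50, 75])   # K = 3 knots at uniform quantiles
X = cubic_spline_design(age, knots)
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
```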

SLIDE 32

Nonlinear Regression Models

Choice of spline knots

  • Spline most flexible near knots; place these where most variability is expected.
  • Common practice: space knots uniformly; choose the number of degrees of freedom and have software place knots at uniform quantiles.

[Figure: “Natural Cubic Spline” fit: left, wage against age; right, Pr(Wage > 250 | Age) from logistic regression, each with 95% confidence band and a rug of observations.]

SLIDE 33

Nonlinear Regression Models

Choice of spline knots

Previous figure:

  • Fit natural cubic spline to Wage data. Three knots, chosen automatically at 25th, 50th and 75th percentiles of age.

  • Requested 4 DOF, leading to 3 interior knots. Actually: 5 knots including 2 boundary knots. Corresponds to 9 = 5 + 4 DOF for a cubic spline. Two natural constraints at each boundary knot enforce linearity, leaving 5 = 9 − 4 DOF. One DOF absorbed in intercept leaves 4 DOF.

  • Right panel: Logistic regression modeling binary event {wage > 250}. Shown: fitted posterior probability.

  • Choosing number of knots: trial and error or cross-validation.

SLIDE 34

Nonlinear Regression Models

Choice of spline knots

[Figure: ten-fold CV mean squared error (≈1600–1680) against degrees of freedom (2–10) for a natural cubic spline (left) and a cubic spline (right).]

Ten-fold CV MSE for selecting DOF when fitting splines to Wage data. Clear result: one DOF is not adequate.

SLIDE 35

Nonlinear Regression Models

Comparison with polynomials

  • Spline regression often superior to polynomial.
  • More stable, as flexibility comes from variation of coefficients of low-degree polynomials and knot placement.

[Figure: “Natural Cubic Spline” vs. “Polynomial” fits, wage against age.]

For Wage data: comparison of natural cubic spline with 15 DOF to polynomial of degree 15. Latter shows spurious variation near endpoints.

SLIDE 36

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 37

Nonlinear Regression Models

Smoothing splines

  • Fitting data with smooth function g: want small RSS = ∑_{i=1}^n (yi − g(xi))^2.

  • With no constraints on g, can always attain RSS = 0 by interpolating the data, leading to overfitting.

  • Ensure smoothness by adding penalty term: minimize

    ∑_{i=1}^n (yi − g(xi))^2 + λ ∫ g′′(t)^2 dt    (7.3)

    with tuning parameter λ controlling weight assigned to smoothness.

  • Limiting values: λ = 0 corresponds to no smoothing, leading to interpolation for sufficiently many DOF; λ → ∞ tends to linear LS fit.

  • λ controls bias-variance tradeoff of smoothing spline.

  • Can show: minimizer of (7.3) is natural cubic spline with knots at x1, . . . , xn. Not the natural cubic spline of the basis function approach, but a shrunken version, degree of shrinkage controlled by λ.
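
The exact minimizer of (7.3) is a natural cubic spline; as a hedged illustration of the penalty's effect one can discretize, replacing ∫ g′′(t)^2 dt by squared second differences of g at the sorted xi. This is only a discrete analogue, not the spline solution itself:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = np.sort(rng.uniform(0, 1, size=n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=n)

# Second-difference matrix: row i computes g[i] - 2 g[i+1] + g[i+2].
D = np.diff(np.eye(n), n=2, axis=0)

# Minimize ||y - g||^2 + lam * ||D g||^2, i.e. solve (I + lam D^T D) g = y.
lam = 10.0
g = np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)   # smoothed fit at the x_i
```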

SLIDE 38

Nonlinear Regression Models

Smoothing splines: effective DOF

  • Smoothing spline: natural cubic spline with knots at x1, . . . , xn, i.e., n DOF.

  • Can show: as λ increases from 0 to ∞, the effective degrees of freedom dfλ decrease from n to 2.

  • Smoothing spline has nominally n DOF, but these are heavily constrained, i.e., “shrunk”, by higher weighting of the penalty term.

  • Measure of flexibility of smoothing splines: dfλ.

  • Mapping from observation vector y ∈ R^n to the vector ĝλ ∈ R^n of fitted values of the smoothing spline with penalty parameter λ is linear, i.e.,

    ĝλ = Sλ y,  Sλ ∈ R^{n×n}.

    Effective DOF defined by dfλ := tr Sλ = ∑_{i=1}^n [Sλ]i,i.
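
In the discrete analogue sketched above the smoother matrix is Sλ = (I + λDᵀD)⁻¹, so dfλ = tr Sλ can be computed directly; this illustrates the definition, not actual spline software:

```python
import numpy as np

n = 100
D = np.diff(np.eye(n), n=2, axis=0)        # second-difference matrix

def smoother_matrix(lam):
    """S_lambda with g_hat = S_lambda y in the discrete analogue."""
    return np.linalg.inv(np.eye(n) + lam * (D.T @ D))

for lam in [1e-4, 1e-1, 1e2, 1e5]:
    df = np.trace(smoother_matrix(lam))    # df_lambda = tr S_lambda
    print(f"lambda = {lam:8.0e}   effective DOF = {df:6.2f}")
# The trace decreases toward 2 (a straight line) as lambda grows.
```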

SLIDE 39

Nonlinear Regression Models

Smoothing splines: choosing λ

  • For smoothing splines no need to choose knot number and locations; each predictor observation xi is a knot.

  • Remaining problem is choice of smoothing parameter λ.

  • Obvious option: choose λ to minimize CV estimates of RSS.

  • For smoothing splines the LOOCV error can be computed at nearly the cost of a single fit:

    RSSCV(λ) = ∑_{i=1}^n (yi − ĝλ^(−i)(xi))^2 = ∑_{i=1}^n ( (yi − ĝλ(xi)) / (1 − [Sλ]i,i) )^2,

    where ĝλ^(−i)(xi) is the value of the smoothing spline fitted with all but the i-th observation, and ĝλ(xi) the value of the smoothing spline using all observations.

  • Similar “magic formula” in (5.1) for LS regression.
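
The shortcut makes the LOOCV curve cheap for any linear smoother ĝλ = Sλy. A sketch, again using the discrete analogue for Sλ:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = np.sort(rng.uniform(0, 1, size=n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=n)
D = np.diff(np.eye(n), n=2, axis=0)

def loocv_rss(lam):
    """RSS_CV(lambda) from a single fit via the 1 - [S]_{ii} shortcut."""
    S = np.linalg.inv(np.eye(n) + lam * (D.T @ D))
    resid = y - S @ y
    return np.sum((resid / (1 - np.diag(S))) ** 2)

# Scan a lambda grid and keep the LOOCV minimizer.
lams = 10.0 ** np.linspace(-4, 4, 17)
best_lam = lams[np.argmin([loocv_rss(l) for l in lams])]
```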

SLIDE 40

Nonlinear Regression Models

Smoothing splines: choosing λ

[Figure: “Smoothing Spline” fits of wage against age; legend: 16 Degrees of Freedom (red), 6.8 Degrees of Freedom, LOOCV (blue).]

Smoothing spline fit to Wage data. Red: specified 16 effective DOF. Blue: λ determined by LOOCV, resulting in dfλ = 6.8.

SLIDE 41

Contents

7 Nonlinear Regression Models

  7.1 Polynomial Regression
  7.2 Step Functions
  7.3 Regression Splines
  7.4 Smoothing Splines
  7.5 Generalized Additive Models

SLIDE 42

Nonlinear Regression Models

Generalized additive models

  • Up to now: single predictor X, extensions of simple linear regression.
  • Here: consider extensions of multiple linear regression of response Y on predictors X1, . . . , Xp.
  • Framework: generalized additive models (GAMs).
  • Allow nonlinear functions of Xj while maintaining additivity.
  • Can be applied with quantitative and qualitative responses.

SLIDE 43

Nonlinear Regression Models

GAMs for regression

  • Extend standard multiple linear regression model

    yi = β0 + β1xi,1 + β2xi,2 + · · · + βpxi,p + εi

    to

    yi = β0 + ∑_{j=1}^p fj(xi,j) + εi.

  • Additive: compute separate fj for each Xj, then add.

  • Example: Consider natural splines and task of fitting model

    wage = β0 + f1(year) + f2(age) + f3(education) + ε    (7.4)

    from Wage data set, with quantitative variables year, age and qualitative variable education ∈ {<HS, HS, <Coll, Coll, >Coll}. Fit f1, f2 using natural splines, f3 using separate constant for each value (dummy variable approach).

SLIDE 44

Nonlinear Regression Models

GAMs for regression

  • Fit entire model (7.4) using LS: expand each function in a natural spline basis or dummy variables, resulting in a single large regression matrix.
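
A sketch of this single-design-matrix LS fit, with a synthetic stand-in for the Wage data, an ordinary truncated-power cubic spline basis substituted for a natural spline, and illustrative knot locations (all assumptions, not the slides' exact setup):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({                       # synthetic stand-in for Wage
    "year": rng.integers(2003, 2010, size=n).astype(float),
    "age": rng.uniform(20, 80, size=n),
    "education": rng.choice(["<HS", "HS", "<Coll", "Coll", ">Coll"], size=n),
})
df["wage"] = (100 + 2 * (df["year"] - 2003) + 0.5 * df["age"]
              + 15 * (df["education"] == ">Coll") + rng.normal(0, 20, size=n))

def h(x, xi):                             # truncated cubic monomial
    return np.where(x > xi, (x - xi) ** 3, 0.0)

def spline_cols(x, knots):                # cubic spline basis, no intercept
    return [x, x**2, x**3] + [h(x, xi) for xi in knots]

cols = (spline_cols(df["year"].to_numpy() - 2003, [2.0, 4.0])    # f1(year)
        + spline_cols(df["age"].to_numpy(), [35.0, 50.0, 65.0])  # f2(age)
        + [(df["education"] == lvl).to_numpy(float)              # f3: dummies
           for lvl in ["HS", "<Coll", "Coll", ">Coll"]])         # "<HS" = base
X = np.column_stack([np.ones(n)] + cols)
beta, *_ = np.linalg.lstsq(X, df["wage"].to_numpy(), rcond=None)
```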

SLIDE 45

Nonlinear Regression Models

GAMs for regression

[Figure: three panels showing f1(year), f2(age) and f3(education) for the fitted GAM.]

Relationship of each feature and response (wage). f1 and f2 are natural splines in year and age with 4 and 5 DOF, respectively. f3 is a step function fit to qualitative predictor education.

SLIDE 46

Nonlinear Regression Models

GAMs for regression

[Figure: three panels showing f1(year), f2(age) and f3(education) for the fitted GAM.]

Same as before, except f1 and f2 are smoothing splines with 4 and 5 DOF, respectively. Fitting smoothing splines is more difficult than fitting natural splines; standard software solves an optimization problem via an algorithm known as backfitting.
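
A hedged sketch of the backfitting idea: cycle over the predictors, each time smoothing the partial residuals against one predictor. The smoother below is the crude second-difference penalizer from earlier, merely a placeholder for a proper smoothing-spline routine:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
y = np.sin(x1) + 0.5 * x2**2 + rng.normal(0, 0.2, size=n)

D = np.diff(np.eye(n), n=2, axis=0)

def smooth(x, r, lam=5.0):
    """Smooth residuals r against x via a second-difference penalty."""
    order = np.argsort(x)
    g = np.linalg.solve(np.eye(n) + lam * (D.T @ D), r[order])
    out = np.empty(n)
    out[order] = g                        # undo the sort
    return out

beta0 = y.mean()
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):                       # backfitting iterations
    f1 = smooth(x1, y - beta0 - f2)
    f1 -= f1.mean()                       # center each f_j for identifiability
    f2 = smooth(x2, y - beta0 - f1)
    f2 -= f2.mean()
```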

SLIDE 47

Nonlinear Regression Models

GAMs: benefits and shortcomings

+ GAMs allow fitting nonlinear fj to each Xj in order to capture nonlinear dependencies.
+ Potentially more accurate predictions of response Y.
+ Model still additive; effect of each Xj can be examined separately, useful for inference.
+ Smoothness of each fj can be summarized via (effective) DOF.
− Additivity is a restriction; interactions can be missed. Can add interaction terms manually by adding predictors Xj × Xk or low-degree interaction functions fj,k(Xj, Xk).

GAMs are a useful compromise between linear and fully nonparametric methods such as random forests and boosting (later).
