
Stat 451 Lecture Notes 01: Introduction
Ryan Martin, UIC
www.math.uic.edu/~rgmartin
Based on parts of: Dalgaard’s ISwR book, Chapter 1 in Givens & Hoeting, and Chapter 7 of Lange.
Updated: January 13, 2016

What to compute?
Stat 451 is a course about computational statistics, so it is important first to discuss what we want to compute in a statistics problem. Here, we are basically concerned with two kinds of things:
- maximizing the likelihood function
- integrating a “posterior distribution”
The former notion should be familiar from your experience with maximum likelihood in Stat 411. The latter may be new to you: it’s “Bayesian”. Next is a brief introduction to these concepts, along with a non-trivial illustration.

Maximum likelihood
Suppose we have $n$ independent observations, $Y_1, \ldots, Y_n$, and the density/mass function $p_\theta$ for these observations depends on an unknown parameter $\theta$. The likelihood and log-likelihood functions are
$$L(\theta) = \prod_{i=1}^n p_\theta(Y_i) \quad \text{and} \quad \ell(\theta) = \sum_{i=1}^n \log p_\theta(Y_i).$$
The maximum likelihood estimator (MLE) $\hat\theta$ of $\theta$, based on data, maximizes the likelihood, i.e.,
$$\hat\theta = \arg\max_\theta L(\theta) \iff \dot\ell(\hat\theta) = 0.$$
So we need to be able to optimize functions and/or find their roots.
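As a minimal sketch of the optimization task described above, the following R code maximizes a log-likelihood numerically. The exponential model, the simulated data, and the search interval are assumptions chosen only for illustration (they are not from the slides); the exponential model is convenient because its MLE has the closed form 1 / mean(y), which gives something to compare against.

```r
# Illustrative data: 50 draws from an exponential distribution (assumed model)
set.seed(1)
y <- rexp(50, rate = 2)

# Log-likelihood: sum of log densities, as in the definition of l(theta)
loglik <- function(theta) sum(dexp(y, rate = theta, log = TRUE))

# Maximize over a bounded interval with base R's optimize()
fit <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)
theta.hat <- fit$maximum

# Closed-form MLE for the exponential rate, for comparison
theta.closed <- 1 / mean(y)
c(numerical = theta.hat, closed.form = theta.closed)
```

For a one-dimensional parameter optimize() is enough; for vector parameters, optim() plays the same role.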

Maximum likelihood (cont.)
Besides producing an estimate of the unknown parameter, we might also like to assess its uncertainty. In Stat 411 you learn that, under some conditions, when the sample size $n$ is large, the distribution of $\hat\theta$ is approximately normal with mean $\theta$ and variance $I(\theta)^{-1}$, where $I(\theta)$ is the Fisher information matrix:
$$I(\theta) = E_\theta\{\dot\ell(\theta)\,\dot\ell(\theta)^\top\} = -E_\theta\{\ddot\ell(\theta)\}.$$
Then an approximate 95% confidence interval for $\theta_j$ is
$$\hat\theta_j \pm 1.96 \sqrt{[I(\hat\theta)^{-1}]_{jj}}, \quad j = 1, \ldots, d.$$
So, computing derivatives and inverting matrices are important.
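The interval above can be sketched in R as follows. This is an illustration under stated assumptions, not the course's own code: the normal model, the (mu, log sigma) parametrization, and the simulated data are made up, and the numerically differentiated Hessian returned by optim() (the observed information) is used as a plug-in stand-in for $I(\hat\theta)$.

```r
# Illustrative data from a normal model (assumed for this sketch)
set.seed(2)
y <- rnorm(100, mean = 1, sd = 2)

# Negative log-likelihood with theta = (mu, log sigma), so d = 2
negloglik <- function(theta) {
  -sum(dnorm(y, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}

# Minimize and ask optim() for a numerical Hessian at the solution
fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)
theta.hat <- fit$par

# Observed information (Hessian of the negative log-likelihood),
# used here in place of the expected Fisher information
info.hat <- fit$hessian
se <- sqrt(diag(solve(info.hat)))

# Approximate 95% intervals, one row per component of theta
ci <- cbind(lower = theta.hat - 1.96 * se, upper = theta.hat + 1.96 * se)
rownames(ci) <- c("mu", "log.sigma")
ci
```

Both ingredients named on the slide appear here: derivatives (the Hessian) and a matrix inverse (solve()).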

Bayesian approach
The Bayesian approach is based on using the rules of probability for inference. Start with a prior distribution for $\theta$, with density/mass function $\pi(\theta)$, basically just a weight function. Combined with the likelihood via Bayes's theorem, this yields a conditional (posterior) distribution for $\theta$, given $Y$, as
$$\pi(\theta \mid Y) = \frac{L(\theta)\,\pi(\theta)}{\int L(u)\,\pi(u)\,du} \propto L(\theta)\,\pi(\theta).$$
Now we treat $\pi(\theta \mid Y)$ as the object of interest and the goal is to produce various summaries, such as mean, variance, quantiles, probabilities, etc. So, integrating functions will be important.
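A small sketch of the integration task, assuming Bernoulli data with a uniform prior on (0, 1); both the data and the prior are invented for illustration. In this special case the posterior mean is known to be (s + 1) / (n + 2), where s is the number of successes, so the numerical answer can be checked.

```r
# Illustrative binary data (made up for this sketch)
y <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 1)
n <- length(y)
s <- sum(y)

# Likelihood L(theta) and prior density pi(theta)
lik <- function(theta) theta^s * (1 - theta)^(n - s)
prior <- function(theta) dunif(theta, 0, 1)

# Normalizing constant: the integral of L(u) * pi(u) du
norm.const <- integrate(function(u) lik(u) * prior(u), 0, 1)$value

# Posterior density and its mean, both via numerical integration
post <- function(theta) lik(theta) * prior(theta) / norm.const
post.mean <- integrate(function(u) u * post(u), 0, 1)$value

# Known closed form for this prior, for comparison
exact.mean <- (s + 1) / (n + 2)
c(numerical = post.mean, exact = exact.mean)
```

One-dimensional quadrature like integrate() does not scale to high-dimensional $\theta$, which is part of why integration methods get their own treatment in a course like this.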

Example: probit regression
$Y_1, \ldots, Y_n$ are independent (not iid) binary observations. Specifically,
$$Y_i \overset{\text{ind}}{\sim} \mathrm{Ber}\bigl(\Phi(x_i^\top \theta)\bigr), \quad i = 1, \ldots, n,$$
where:
- “Ber” denotes a Bernoulli distribution;
- $x_1, \ldots, x_n$ are fixed $d$-dimensional covariates;
- $\theta$ is a $d$-dimensional parameter vector; and
- $\Phi$ is the standard normal distribution function. (Other cdfs can be used, but then the model isn’t called “probit”.)
Exercise:
- write out the log-likelihood function
- calculate the Fisher information matrix
- ...
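One possible way to explore this example numerically is sketched below. Everything specific here is an assumption made for illustration: the simulated covariates, the "true" parameter value, and the choice of optim() with BFGS. The log-likelihood coded here is the Bernoulli one implied by the model statement, and the result is compared with R's built-in glm() probit fit.

```r
# Simulated data: intercept plus one covariate, so d = 2 (illustrative only)
set.seed(3)
n <- 200
X <- cbind(1, rnorm(n))
theta.true <- c(-0.5, 1)
y <- rbinom(n, size = 1, prob = pnorm(drop(X %*% theta.true)))

# Negative log-likelihood of the probit model
negloglik <- function(theta) {
  eta <- drop(X %*% theta)
  # log Phi(eta) and log(1 - Phi(eta)), computed on the log scale for stability
  -sum(y * pnorm(eta, log.p = TRUE) +
       (1 - y) * pnorm(eta, lower.tail = FALSE, log.p = TRUE))
}

# Numerical maximization of the likelihood
fit <- optim(c(0, 0), negloglik, method = "BFGS")
theta.hat <- fit$par

# Built-in probit regression for comparison
glm.fit <- glm(y ~ X[, 2], family = binomial(link = "probit"))
rbind(optim = theta.hat, glm = unname(coef(glm.fit)))
```

The two rows should agree closely, since both maximize the same likelihood.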

Remarks
This course will mainly study how to solve certain optimization and integration problems that arise in statistics applications. We’ll need some background on general numerical methods. Software will also be important; we will use R.
Some of what we discuss in the class will be simple, other things more difficult. My goal is that students completing the course will have sufficient background to read current papers on computational statistics and implement their methods.

Outline
1. Review of statistical inference
2. Introduction to R
   - Basics
   - R session
   - R graphics
   - R programming
   - Data entry
3. Math and stat tools

General facts about R
- R is a free implementation of the S language, on which the commercial S-PLUS software is also based.
- It can be downloaded for free (http://cran.r-project.org) for Windows, Mac, and Unix computers.
- The environment is interactive by default, like a calculator, but users can create files of R code (called scripts) on the side which can be run all at once within R.
- It is possible to write code that works together with lower-level programming languages like C and FORTRAN (for speed).
- R is powerful because of its flexibility: users can easily define their own functions or modify existing functions to suit their needs.

Arithmetic
Among other things, R can do arithmetic like a calculator. Basic binary (arithmetic) operations are:
  +         Addition
  -         Subtraction
  *         Multiplication
  /         Division
  ^ or **   Exponentiation
  %/%       Integer division
  %%        Modulus (remainder)
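A quick illustration of these operators, using the arbitrary numbers 7 and 2. The results are collected into a named vector only so they print together.

```r
# Each binary operator from the table, applied to 7 and 2
res <- c(
  sum    = 7 + 2,
  diff   = 7 - 2,
  prod   = 7 * 2,
  quot   = 7 / 2,
  pow    = 7 ^ 2,
  pow2   = 7 ** 2,    # ** is read by R as ^
  intdiv = 7 %/% 2,   # 3
  mod    = 7 %% 2     # 1
)
res

# Integer division and remainder fit together: 7 = 2 * 3 + 1
2 * (7 %/% 2) + 7 %% 2
```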

Variables and assignments
Even with fairly routine calculations it is helpful to be able to store some intermediate values. R allows users to assign a value to a particular variable.
Syntax: x <- 7
This means that the value 7 is assigned to the variable x.
Note:
- The assignment symbol <- is to be treated as a single character, an arrow pointing to the left.
- One can use an equal sign in place of the assignment symbol, but this is not recommended.
- In very old versions of R (before 1.8.0) the underscore was also an assignment symbol and could not be used in variable names. In current R, underscores are allowed in names; separating words with a period, e.g., pred.value, is also common.

Expressions and objects
In R, the user enters an expression and the system evaluates it and produces output. These expressions need not be formulas; they can generate graphs, output data sets, etc.
Expressions work on objects, basically anything that can be assigned to a variable. But the syntax used is expression/object specific. In what follows we will discuss several important types of expressions and objects.
Use str(X) to view the “structure” of the object X.
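To make the idea of objects concrete, here is a short sketch that assigns three different kinds of object and inspects each with str(). The object names and contents are arbitrary choices for illustration.

```r
# Three different kinds of object, each assigned to a variable
x <- c(3.0, 2.8, 3.4)               # numeric vector
f <- function(u) u^2                # function
L <- list(a = 1:3, b = "text")      # list with mixed contents

# str() prints a compact description of each object's structure
str(x)
str(f)
str(L)
```

class(x), class(f), and class(L) give a shorter answer when only the type of object is of interest.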

Functions and arguments
Functions in R can take many forms: there’s the kind that look like mathematical functions, say log(x), and the kind that don’t, say plot(x, y, pch=2). The common feature is that there is a set of parentheses containing those arguments that the function applies to.
Two “types” of arguments:
- Positional: variable recognized by position in the list.
- Named: variable recognized by name.
Some functions don’t have arguments, some have default arguments, and some allow “arbitrary” arguments.
R has an extensive list of built-in functions that can do all sorts of things, and it’s easy to write your own functions since the function syntax in R is the same as ordinary R syntax.
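The argument types described above can be seen in a small user-defined function. The names power and total are hypothetical, made up for this sketch.

```r
# A function with one required argument and one default argument
power <- function(x, p = 2) x^p

power(3)              # positional x, default p = 2
power(3, 3)           # both arguments matched by position
power(p = 3, x = 2)   # both matched by name, so order does not matter

# "Arbitrary" arguments are passed along with ...
total <- function(...) sum(...)
total(1, 2, 3)
```

Mixing the two styles is common in practice: the first argument or two by position, the rest by name, as in plot(x, y, pch=2).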

Vectors
Numeric vectors are fairly straightforward. There are basically two other kinds of vectors (complex vectors also exist):
- Character
- Logical
Character vectors have elements made up of character strings; e.g.
    names <- c('Small', 'Medium', 'Large')
Logical vectors have elements TRUE or FALSE, and are very useful for indexing data sets. An example of how to get a logical vector:
    > gpa <- c(3.0, 2.8, 3.4, 3.7, 3.9, 3.3)
    > gpa > 3.5
    [1] FALSE FALSE FALSE  TRUE  TRUE FALSE
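Continuing the gpa example, the following sketch shows how a logical vector can be used for indexing. The variable name high is an arbitrary choice.

```r
gpa <- c(3.0, 2.8, 3.4, 3.7, 3.9, 3.3)

# Logical vector marking the entries above 3.5
high <- gpa > 3.5

gpa[high]      # keep only the elements where high is TRUE
which(high)    # positions of the TRUE elements
sum(high)      # TRUE counts as 1, so this is the number above 3.5
mean(high)     # proportion above 3.5
```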

Vectors (cont.)
Three functions to create vectors:
- c(): concatenate
- seq(): patterned sequence
- rep(): repeat something
A vector must contain elements of the same “type”, so what happens if two variables x and y of different types are concatenated? The general (and non-informative) answer is that they are coerced into types that match. For example:
    > c(FALSE, 7)
    [1] 0 7
    > c(11.7, 'abc')
    [1] "11.7" "abc"
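A brief sketch of the three constructors and of coercion; the particular values are arbitrary. Checking class() makes the coercion rule visible: logical gives way to numeric, and numeric gives way to character.

```r
# The three vector constructors
a <- c(1, 5, 9)                       # concatenate
b <- seq(0, 1, by = 0.25)             # patterned sequence: 0, 0.25, ..., 1
d <- rep(c("lo", "hi"), times = 3)    # repeat the whole vector 3 times
e <- rep(1:2, each = 2)               # repeat each element twice

# Coercion when types are mixed
class(c(FALSE, 7))        # "numeric"
class(c(11.7, "abc"))     # "character"
class(c(TRUE, "abc"))     # "character"
```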

Vectors (cont.)
An interesting feature of R is that it does “vectorized arithmetic.” That is, R will apply arithmetic operations (and some other functions) elementwise, in a natural way. For example:
    > x <- c(7, 10, 11)
    > y <- seq(5, 3, by=-1)
    > x + y
    [1] 12 14 14
If the two vectors are not of the same length, the shorter one gets “recycled”; R issues a warning (not an error) if the length of the longer vector is not a multiple of the length of the shorter vector.
When defining your own functions, remember to be careful about assuming they will vectorize how you want them to!
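The recycling rule and the caution about user-defined functions can be sketched as follows. The function name pos.part is hypothetical; it uses if(), which looks at a single condition and therefore does not vectorize by itself.

```r
# Recycling: the shorter vector is reused to match the longer one
x <- c(7, 10, 11, 4)
x + c(1, 2)                      # same as x + c(1, 2, 1, 2)

# Lengths 3 and 2: recycling still happens, but with a warning
w <- tryCatch(c(7, 10, 11) + c(1, 2),
              warning = function(cond) "warning")
w

# A scalar function written with if() handles one value at a time
pos.part <- function(u) if (u > 0) u else 0

# Two ways to get a version that works elementwise
pos.part.v1 <- function(u) pmax(u, 0)   # rewrite with a vectorized primitive
pos.part.v2 <- Vectorize(pos.part)      # wrap the scalar function

pos.part.v1(c(-1, 2, 3))
pos.part.v2(c(-1, 2, 3))
```

Rewriting with vectorized building blocks such as pmax() or ifelse() is usually faster than Vectorize(), which loops over the elements internally.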

Matrices and arrays
A natural extension of a vector is a matrix, which is just a vector with a double index. Example: M <- matrix(1:6, nrow=3, ncol=2).
In R, matrices are almost always treated just like vectors. (The only time R treats matrices in a linear algebra sort of way is when the user asks R to do something “linear algebra like” such as matrix multiplication.)
The rbind() and cbind() functions can be used to append two or more matrices by rows and columns, respectively. Rows and columns can be named with the rownames() and colnames() functions.
More generally, R can work with an array (a vector with n indices), but these are a bit less common, perhaps because they’re hard to visualize.
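The points above, illustrated with the matrix M from the slide. The extra column 7:9, the row and column labels, and the array dimensions are arbitrary choices for this sketch.

```r
M <- matrix(1:6, nrow = 3, ncol = 2)   # filled column by column

# Treated like a vector: single-index access and elementwise arithmetic
M[3]          # third element in column-major order
M * M         # elementwise product, not a matrix product

# "Linear algebra like" operations must be requested explicitly
G <- t(M) %*% M    # 2 x 2 matrix product

# Appending a column and naming rows and columns
M2 <- cbind(M, 7:9)
rownames(M2) <- c("a", "b", "c")
colnames(M2) <- c("u", "v", "w")
M2

# An array with three indices
A <- array(1:24, dim = c(2, 3, 4))
dim(A)
```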

Data frames
A data frame is R’s version of what we think of as a data matrix or data set. The columns represent variables and the rows represent cases.
This idea is similar to a matrix, but matrices must be entirely of the same type, while data frames can have a mixture of numeric, character, and logical variables.
To create a data frame: D <- data.frame( list-of-variables )
We’ll talk about reading files into a data frame later.
Many statistical routines in R (e.g., linear regression) are designed to operate on data frames.
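A minimal sketch of building and using a data frame. The column names and values are invented for illustration; the last line shows a statistical routine, lm(), taking the data frame through its data argument.

```r
# A small data frame mixing character, numeric, and logical columns
D <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  gpa    = c(3.0, 3.7, 3.9),
  honors = c(FALSE, TRUE, TRUE)
)
str(D)

# Columns are accessed with $, rows can be selected with a logical vector
D$gpa
D[D$gpa > 3.5, ]

# A regression routine operating on the data frame
fit <- lm(gpa ~ honors, data = D)
coef(fit)
```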
