
  1. Stat 451 Lecture Notes 01: Introduction
     Ryan Martin, UIC, www.math.uic.edu/~rgmartin
     Based on parts of: Dalgaard’s ISwR book, Chapter 1 in Givens & Hoeting, and Chapter 7 of Lange.
     Updated: January 13, 2016

  2. What to compute?
     Stat 451 is a course about computational statistics, so it is important first to discuss what we want to compute in a statistics problem. Here, we are basically concerned with two kinds of things:
     - maximizing the likelihood function;
     - integrating a “posterior distribution”.
     The former notion should be familiar from your experience with maximum likelihood in Stat 411. The latter may be new to you; it is “Bayesian”. Next is a brief introduction to these concepts, along with a non-trivial illustration.

  3. Maximum likelihood
     Suppose we have $n$ independent observations, $Y_1, \ldots, Y_n$, and the density/mass function $p_\theta$ for these observations depends on an unknown parameter $\theta$. The likelihood and log-likelihood functions are
     $$L(\theta) = \prod_{i=1}^n p_\theta(Y_i) \quad \text{and} \quad \ell(\theta) = \sum_{i=1}^n \log p_\theta(Y_i).$$
     The maximum likelihood estimator (MLE) $\hat\theta$ of $\theta$, based on data, maximizes the likelihood, i.e.,
     $$\hat\theta = \arg\max_\theta L(\theta) \iff \dot\ell(\hat\theta) = 0.$$
     So we need to be able to optimize and/or find roots of functions.
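
As a minimal sketch of the two routes on this slide (optimization and root finding), assume simulated data from an exponential model with rate $\theta$, so that $p_\theta(y) = \theta e^{-\theta y}$. The model, the true rate, and the search interval are assumptions chosen for illustration, not taken from the notes.

```r
# Simulated data: the exponential model and the true rate 2 are illustrative assumptions.
set.seed(451)
y <- rexp(50, rate = 2)

# Log-likelihood: sum of log p_theta(y_i) for the exponential density
loglik <- function(theta) sum(dexp(y, rate = theta, log = TRUE))

# Score function (derivative of the log-likelihood): n / theta - sum(y)
score <- function(theta) length(y) / theta - sum(y)

# Route 1: maximize the log-likelihood directly over an interval
mle_opt <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE, tol = 1e-8)$maximum

# Route 2: find the root of the score function
mle_root <- uniroot(score, interval = c(0.01, 20), tol = 1e-10)$root

# Closed-form MLE for this model, used only as a check
mle_exact <- 1 / mean(y)
c(optimize = mle_opt, uniroot = mle_root, exact = mle_exact)
```

In this model the MLE is available in closed form, so the numerical answers can be checked; in most problems of interest only the numerical routes are available.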

  4. Maximum likelihood (cont.)
     Besides producing an estimate of the unknown parameter, we might also like to assess its uncertainty. In Stat 411 you learn that, under some conditions, when the sample size $n$ is large, the distribution of $\hat\theta$ is approximately normal with mean $\theta$ and variance $I(\theta)^{-1}$, where $I(\theta)$ is the Fisher information matrix:
     $$I(\theta) = E_\theta\{\dot\ell(\theta)\,\dot\ell(\theta)^\top\} = -E_\theta\{\ddot\ell(\theta)\}.$$
     Then an approximate 95% confidence interval for $\theta_j$ is
     $$\hat\theta_j \pm 1.96 \cdot \sqrt{[I(\hat\theta)^{-1}]_{jj}}, \quad j = 1, \ldots, d.$$
     So, computing derivatives and inverting matrices is important.
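
The interval above can be sketched numerically as follows. Two assumptions are made here that are not in the notes: the data are simulated from a normal model with parameter $\theta = (\mu, \log\sigma)$, and the Fisher information is replaced by the observed information, i.e., the numerically differentiated Hessian of the negative log-likelihood returned by `optim()`.

```r
# Simulated data: normal model with theta = (mu, log sigma); an illustrative assumption.
set.seed(1)
y <- rnorm(100, mean = 1, sd = 2)

# Negative log-likelihood; sigma is parametrized on the log scale to keep it positive
negloglik <- function(par) -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))

# Minimize and ask optim() for a numerical Hessian at the solution
fit <- optim(c(0, 0), negloglik, method = "BFGS", hessian = TRUE)
theta_hat <- fit$par

# Observed information (Hessian of the negative log-likelihood) stands in for I(theta_hat)
cov_hat <- solve(fit$hessian)      # matrix inversion
se <- sqrt(diag(cov_hat))          # square roots of the diagonal entries

# Approximate 95% confidence intervals, one row per component of theta
ci <- cbind(lower = theta_hat - 1.96 * se, upper = theta_hat + 1.96 * se)
rownames(ci) <- c("mu", "log_sigma")
ci
```

For the mean, the standard error obtained this way should be close to the familiar $\hat\sigma/\sqrt{n}$.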

  5. Bayesian approach
     The Bayesian approach is based on using the rules of probability for inference. Start with a prior distribution for $\theta$, with density/mass function $\pi(\theta)$, basically just a weight function. Bayes’ theorem then yields a conditional (posterior) distribution for $\theta$, given $Y$:
     $$\pi(\theta \mid Y) = \frac{L(\theta)\,\pi(\theta)}{\int L(u)\,\pi(u)\,du} \propto L(\theta)\,\pi(\theta).$$
     Now we treat $\pi(\theta \mid Y)$ as the object of interest, and the goal is to produce various summaries, such as mean, variance, quantiles, probabilities, etc. So, integrating functions will be important.
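
A minimal sketch of posterior summaries by numerical integration, assuming a binomial likelihood with a Beta(2, 2) prior; the data values and the prior are made up for illustration. This case has a closed-form Beta posterior, which is used only to check the numerical answers.

```r
# Illustrative data and prior (assumptions): y successes in n trials, Beta(a, b) prior
y <- 7; n <- 20
a <- 2; b <- 2

# Unnormalized posterior: L(theta) * pi(theta)
unnorm <- function(theta) dbinom(y, size = n, prob = theta) * dbeta(theta, a, b)

# Normalizing constant: the integral in the denominator of Bayes' formula
norm_const <- integrate(unnorm, lower = 0, upper = 1)$value
post_dens <- function(theta) unnorm(theta) / norm_const

# Posterior mean and a posterior probability, both by numerical integration
post_mean <- integrate(function(theta) theta * post_dens(theta), 0, 1)$value
post_prob <- integrate(post_dens, 0, 0.5)$value    # P(theta <= 0.5 | Y)

# Compare with the exact Beta(a + y, b + n - y) posterior
c(numerical = post_mean, exact = (a + y) / (a + b + n))
c(numerical = post_prob, exact = pbeta(0.5, a + y, b + n - y))
```

One-dimensional quadrature like this does not scale to high-dimensional $\theta$, which is one reason integration gets so much attention in computational statistics.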

  6. Example: probit regression
     $Y_1, \ldots, Y_n$ are independent (not iid) binary observations. Specifically,
     $$Y_i \overset{\text{ind}}{\sim} \mathrm{Ber}\bigl(\Phi(x_i^\top \theta)\bigr), \quad i = 1, \ldots, n,$$
     where:
     - “Ber” denotes a Bernoulli distribution;
     - $x_1, \ldots, x_n$ are fixed $d$-dimensional covariates;
     - $\theta$ is a $d$-dimensional parameter vector; and
     - $\Phi$ is the standard normal distribution function. (Other cdfs can be used, but then the model isn’t called “probit”.)
     Exercise:
     - write out the log-likelihood function;
     - calculate the Fisher information matrix;
     - ...
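
One possible way to check an answer to the first exercise item numerically is sketched below. It assumes the log-likelihood $\ell(\theta) = \sum_i \{ y_i \log \Phi(x_i^\top\theta) + (1 - y_i) \log[1 - \Phi(x_i^\top\theta)] \}$ and simulated data with $d = 2$; the sample size, covariates, and true parameter are illustrative assumptions. The result is compared with R's built-in `glm()` probit fit.

```r
# Simulated probit data (all values here are illustrative assumptions)
set.seed(451)
n <- 200
X <- cbind(1, rnorm(n))                # n x d covariate matrix, d = 2 (intercept + slope)
theta_true <- c(-0.5, 1)
y <- rbinom(n, size = 1, prob = pnorm(drop(X %*% theta_true)))

# Probit log-likelihood; log.p = TRUE computes log Phi and log(1 - Phi) stably
probit_loglik <- function(theta) {
  eta <- drop(X %*% theta)
  sum(y * pnorm(eta, log.p = TRUE) +
      (1 - y) * pnorm(eta, lower.tail = FALSE, log.p = TRUE))
}

# Maximize by minimizing the negative log-likelihood
fit <- optim(c(0, 0), function(theta) -probit_loglik(theta), method = "BFGS")

# Reference fit from glm(); "- 1" because X already contains the intercept column
glm_fit <- glm(y ~ X - 1, family = binomial(link = "probit"))
rbind(optim = fit$par, glm = unname(coef(glm_fit)))
```

The two rows should agree to a few decimal places; the Fisher information part of the exercise is left as stated.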

  7. Remarks
     - This course will mainly study how to solve certain optimization and integration problems that arise in statistics applications.
     - We’ll need some background on general numerical methods.
     - Software will also be important; we will use R.
     - Some of what we discuss in the class will be simple, other things more difficult.
     - My goal is that students completing the course will have sufficient background to read current papers on computational statistics and implement their methods.

  8. Outline
     1. Review of statistical inference
     2. Introduction to R
        - Basics
        - R session
        - R graphics
        - R programming
        - Data entry
     3. Math and stat tools

  9. General facts about R
     - R is a free implementation of the S language, the same language used by the commercial S-PLUS software.
     - It can be downloaded for free ( http://cran.r-project.org ) for Windows, Mac, and Unix computers.
     - The environment is interactive by default, like a calculator, but users can create files of R code (called scripts) on the side which can be run all at once within R.
     - It is possible to write code that works together with lower-level programming languages like C and FORTRAN (for speed).
     - R is powerful because of its flexibility: users can easily define their own functions or modify existing functions to suit their needs.

  10. Arithmetic
      Among other things, R can do arithmetic like a calculator. Basic binary (arithmetic) operations are:

          +          Addition
          -          Subtraction
          *          Multiplication
          /          Division
          ^ or **    Exponentiation
          %/%        Integer division
          %%         Modulus (remainder)
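
A short sketch of these operators at the R prompt; the numbers are arbitrary. Note that integer division rounds down and the remainder takes the sign of the divisor, which matters for negative numbers.

```r
# Each operator applied to the same pair of (arbitrary) numbers
res <- c(
  add      = 7 + 2,      # 9
  subtract = 7 - 2,      # 5
  multiply = 7 * 2,      # 14
  divide   = 7 / 2,      # 3.5
  power    = 7 ^ 2,      # 49
  power2   = 7 ** 2,     # 49, same as ^
  int_div  = 7 %/% 2,    # 3
  modulus  = 7 %% 2      # 1
)
res

# Negative numbers: integer division rounds down (floor)
neg_div <- (-7) %/% 2    # -4
neg_mod <- (-7) %% 2     # 1, since -7 = 2 * (-4) + 1
```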

  11. Variables and assignments
      Even with fairly routine calculations it is helpful to be able to store some intermediate values. R allows users to assign a value to a particular variable. Syntax:

          x <- 7

      This means that the value 7 is assigned to the variable x. Notes:
      - The assignment symbol <- is to be treated as a single symbol, an arrow pointing to the left; do not put a space between < and -.
      - An equal sign can be used in place of the assignment arrow, but this is not recommended.
      - The underscore was once an assignment symbol as well, but this is no longer the case in current versions of R, where underscores are allowed in variable names. A period is another common separator, e.g., pred.value.
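
A small sketch of storing and reusing intermediate values; the variable names and numbers are arbitrary.

```r
x <- 7                      # assign the value 7 to x
y = 3                       # an equal sign also assigns at top level; <- is the usual choice
pred.value <- x * y + 1     # intermediate results can be stored and reused
pred.value                  # typing a name prints its value

x <- x + 1                  # a variable can be overwritten using its own current value
x
```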

  12. Expressions and objects
      In R, the user enters an expression and the system evaluates it and produces output. These expressions need not be formulas; they can generate graphs, output data sets, etc. Expressions work on objects, basically anything that can be assigned to a variable, but the syntax used is expression/object specific. In what follows we will discuss several important types of expressions and objects. Use str(X) to view the “structure” of the object X.
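
A brief sketch of inspecting a few different kinds of objects with `str()` and `class()`; the objects themselves are made up for illustration.

```r
# Three different kinds of objects (arbitrary examples)
v <- c(1.5, 2, 3.25)                 # numeric vector
f <- function(x) x^2                 # functions are objects too
l <- list(a = 1:3, b = "text")       # a list can mix types

str(v)        # compact description of the vector
str(l)        # one line per list component
class(v)
class(f)

# str() prints its description; capture.output() stores that text as a character vector
v_description <- capture.output(str(v))
```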

  13. Functions and arguments
      Functions in R can take many forms: there’s the kind that look like mathematical functions, say log(x), and the kind that don’t, say plot(x, y, pch=2). The common feature is that there is a set of parentheses containing those arguments that the function applies to. There are two “types” of arguments:
      - Positional: the argument is recognized by its position in the list.
      - Named: the argument is recognized by its name.
      Some functions don’t have arguments, some have default arguments, and some allow “arbitrary” arguments. R has an extensive list of built-in functions that can do all sorts of things, and it’s easy to write your own functions since the function syntax in R is the same as ordinary R syntax.
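
The argument types above can be sketched with a small user-defined function. The function `power_sum` is hypothetical, written only for this illustration; `log()` and `Sys.Date()` are built in.

```r
# Hypothetical function: x is required, p has a default, and ... passes extras on to sum()
power_sum <- function(x, p = 2, ...) sum(x^p, ...)

r_default    <- power_sum(c(1, 2, 3))                    # default p = 2: 1 + 4 + 9 = 14
r_positional <- power_sum(c(1, 2, 3), 3)                 # positional: second argument is p
r_named      <- power_sum(p = 3, x = c(1, 2, 3))         # named: order does not matter
r_dots       <- power_sum(c(1, NA, 3), na.rm = TRUE)     # "arbitrary" argument via ...

# Built-in functions follow the same rules
log(100, base = 10)     # named
log(100, 10)            # positional
today <- Sys.Date()     # a function called with no arguments
```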

  14. Vectors
      Numeric vectors are fairly straightforward. There are basically two other kinds of vectors, character and logical. (Complex vectors also exist.)
      - Character vectors have elements made up of character strings; e.g.

            names <- c('Small', 'Medium', 'Large')

      - Logical vectors have elements TRUE or FALSE, and are very useful for indexing data sets. An example of how to get a logical vector:

            > gpa <- c(3.0, 2.8, 3.4, 3.7, 3.9, 3.3)
            > gpa > 3.5
            [1] FALSE FALSE FALSE  TRUE  TRUE FALSE
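
A short sketch of why logical vectors are useful for indexing, reusing the gpa vector from the slide; the cutoff 3.5 is the one used there.

```r
gpa <- c(3.0, 2.8, 3.4, 3.7, 3.9, 3.3)

high <- gpa > 3.5             # logical vector, one TRUE/FALSE per element
selected <- gpa[high]         # keep only the elements where high is TRUE
n_high <- sum(high)           # TRUE counts as 1, FALSE as 0
positions <- which(high)      # indices of the TRUE elements

selected
n_high
positions
```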

  15. Vectors (cont.)
      Three functions to create vectors:
      - c(): concatenate
      - seq(): patterned sequence
      - rep(): repeat something
      A vector must contain elements of the same “type”, so what happens if two variables x and y of different types are concatenated? The general (and non-informative) answer is that they are coerced into types that match. For example:

          > c(FALSE, 7)
          [1] 0 7
          > c(11.7, 'abc')
          [1] "11.7" "abc"
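
A brief sketch of the three constructors and of coercion; the particular values are arbitrary. Coercion moves toward the more general type, in the order logical, integer, numeric, character.

```r
a  <- c(1, 5, 9)                       # concatenate individual values
s1 <- seq(0, 1, by = 0.25)             # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(1, 10, length.out = 4)       # 1 4 7 10
r1 <- rep(c("a", "b"), times = 3)      # "a" "b" "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)       # "a" "a" "b" "b"

# Coercion when types are mixed
class(c(TRUE, 2L))        # "integer"
class(c(FALSE, 7))        # "numeric"
class(c(11.7, "abc"))     # "character"
```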

  16. Vectors (cont.)
      An interesting feature of R is that it does “vectorized arithmetic.” That is, R will apply arithmetic operations (and some other functions) element by element in a natural way. For example:

          > x <- c(7, 10, 11)
          > y <- seq(5, 3, by=-1)
          > x + y
          [1] 12 14 14

      If the two vectors are not of the same length, the shorter one gets “recycled”; R issues a warning (not an error) if the length of the longer vector is not a multiple of the length of the shorter vector. When defining your own functions, remember to be careful about assuming they will vectorize how you want them to!
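
A small sketch of recycling and of the vectorization caveat; the vectors and the function `pos_part` are made up for illustration. The `tryCatch()` call is used here only to capture the warning message instead of printing it.

```r
x <- c(7, 10, 11, 2)

r_even   <- x + c(1, 100)    # length 2 recycled twice: 8 110 12 102, no warning
r_scalar <- x * 2            # a single number is recycled to every element

# Length 3 does not divide length 4: R still computes a result but issues a warning
msg <- tryCatch(x + 1:3, warning = function(w) conditionMessage(w))
msg

# A user-defined function is vectorized only if its body is.
# if (z > 0) ... expects a single TRUE/FALSE, so ifelse() is used instead.
pos_part <- function(z) ifelse(z > 0, z, 0)
r_pos <- pos_part(c(-1, 2, -3))    # 0 2 0
```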

  17. Matrices and arrays
      A natural extension of a vector is a matrix, which is just a vector with a double index. Example: M <- matrix(1:6, nrow=3, ncol=2).
      - In R, matrices are almost always treated just like vectors. (The only time R treats matrices in a linear algebra sort of way is when the user asks R to do something “linear algebra like”, such as matrix multiplication.)
      - The rbind() and cbind() functions can be used to append two or more matrices by rows and columns, respectively.
      - Rows and columns can be named with the rownames() and colnames() functions.
      More generally, R can work with an array (a vector with n indices), but these are a bit less common, perhaps because they’re hard to visualize.
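
A short sketch of these points using the matrix M from the slide; the extra rows, columns, and names are arbitrary.

```r
M <- matrix(1:6, nrow = 3, ncol = 2)    # filled column by column: columns (1,2,3) and (4,5,6)

len  <- length(M)      # 6: a matrix is a vector with a dim attribute
fifth <- M[5]          # single (vector-style) index: 5
elementwise <- M * M   # ordinary arithmetic is element by element
gram <- t(M) %*% M     # %*% asks for a genuine matrix product (2 x 2)

M_cols <- cbind(M, 7:9)         # append a column: 3 x 3
M_rows <- rbind(M, c(0L, 0L))   # append a row: 4 x 2

rownames(M) <- c("a", "b", "c")
colnames(M) <- c("u", "v")
named_entry <- M["b", "v"]      # index by name: 5

A <- array(1:24, dim = c(2, 3, 4))   # an array with three indices
dim(A)
```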

  18. Data frames
      A data frame is R’s version of what we think of as a data matrix or data set. The columns represent variables and the rows represent cases. This idea is similar to a matrix, but matrices must be entirely of the same type, while data frames can have a mixture of numeric, character, and logical variables. To create a data frame:

          D <- data.frame( list-of-variables )

      We’ll talk about reading files into a data frame later. Many statistical routines in R (e.g., linear regression) are designed to operate on data frames.
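
A minimal sketch of building a data frame from variables of different types and passing it to a statistical routine; the column names and values are made up for illustration.

```r
# Hypothetical data: one character, two numeric, and one logical variable
D <- data.frame(
  id    = c("a", "b", "c", "d"),
  x     = c(1, 2, 3, 4),
  y     = c(2.1, 3.9, 6.2, 7.8),
  large = c(FALSE, FALSE, TRUE, TRUE)
)

str(D)                        # one line per variable, showing its type
big_rows <- D[D$y > 5, ]      # logical indexing selects cases (rows)

# Linear regression reads its variables from the data frame
fit <- lm(y ~ x, data = D)
coef(fit)
```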
