Probability and Statistics for Computer Science

Probability and Statistics for Computer Science “The statement that ‘The average US family has 2.6 children’ invites mockery” – Prof. Forsyth reminds us about critical thinking Credit: wikipedia Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.27.2020

Last lecture ✺ Welcome/Orientation ✺ Big picture of the contents ✺ Lecture 1 - Data Visualization & Summary (I) ✺ Some feedback

Warm up question: ✺ What kind of data is a letter grade? ✺ What do you usually ask about the stats of an exam with numerical scores?

Objectives ✺ Grasp Summary Statistics ✺ Learn more Data Visualization for Relationships

Summarizing 1D continuous data For a data set $\{x\}$, also written $\{x_i\}$, we summarize with: ✺ Location parameters ✺ Scale parameters

Summarizing 1D continuous data ✺ Mean: $\mathrm{mean}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} x_i$ It's the centroid of the data geometrically; by identifying the data set at that point, you find the center of balance.
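The mean formula above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lecture; the function name `mean` and the data values are made up.

```python
# Minimal sketch of the mean; the data values are made up for illustration.
def mean(xs):
    """Return (1/N) * sum of the items in xs."""
    return sum(xs) / len(xs)

data = [1.0, 2.0, 3.0, 6.0]
m = mean(data)
print(m)  # 3.0
```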

Properties of the mean ✺ Scaling data scales the mean: $\mathrm{mean}(\{k \cdot x_i\}) = k \cdot \mathrm{mean}(\{x_i\})$ ✺ Translating the data translates the mean: $\mathrm{mean}(\{x_i + c\}) = \mathrm{mean}(\{x_i\}) + c$
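Both properties can be checked numerically. The sketch below uses `statistics.mean` from the Python standard library; the data, the scale factor `k`, and the shift `c` are arbitrary choices for illustration.

```python
from statistics import mean

data = [1.0, 2.0, 3.0, 6.0]   # made-up values
k, c = -2.0, 5.0              # arbitrary scale factor and shift

scaled_mean = mean([k * x for x in data])
shifted_mean = mean([x + c for x in data])

print(scaled_mean == k * mean(data))   # True
print(shifted_mean == mean(data) + c)  # True
```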

Less obvious properties of the mean ✺ The signed distances from the mean sum to 0: $\sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\})) = 0$ ✺ The mean minimizes the sum of the squared distances from any real value $\mu$: $\operatorname{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$
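A quick numerical check of these two properties, as a sketch only: the minimization is done by brute force over a coarse grid of candidate values of μ, which is an assumption made for illustration and not how one would minimize in practice.

```python
from statistics import mean

data = [1.0, 2.0, 3.0, 6.0]  # made-up values
m = mean(data)

# Property 1: signed distances from the mean sum to 0.
signed_sum = sum(x - m for x in data)

# Property 2: the mean minimizes the sum of squared distances.
def cost(mu):
    return sum((x - mu) ** 2 for x in data)

candidates = [i / 10 for i in range(0, 71)]  # 0.0, 0.1, ..., 7.0
best = min(candidates, key=cost)

print(signed_sum)  # 0.0
print(best, m)     # 3.0 3.0
```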

Q1: ✺ What is the answer for $\mathrm{mean}(\mathrm{mean}(\{x_i\}))$? A. $\mathrm{mean}(\{x_i\})$ B. unsure C. 0

Standard Deviation (σ) ✺ The standard deviation: $\mathrm{std}(\{x_i\}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2}$, or equivalently $\mathrm{std}(\{x_i\}) = \sqrt{\mathrm{mean}(\{(x_i - \mathrm{mean}(\{x_i\}))^2\})}$
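The definition above divides by N, so it corresponds to the population standard deviation. A minimal sketch, with made-up data, compared against `statistics.pstdev` from the Python standard library:

```python
import math
from statistics import pstdev

def std(xs):
    """Population standard deviation, dividing by N as in the formula above."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
s = std(data)
print(s)                              # 2.0
print(math.isclose(s, pstdev(data)))  # True
```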

Q2. Can a standard deviation of a dataset be -1? A. YES B. NO

Properties of the standard deviation ✺ Scaling data scales the standard deviation: $\mathrm{std}(\{k \cdot x_i\}) = |k| \cdot \mathrm{std}(\{x_i\})$ ✺ Translating the data does NOT change the standard deviation: $\mathrm{std}(\{x_i + c\}) = \mathrm{std}(\{x_i\})$
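A short check of the scaling and translation properties, using a negative `k` to show why the absolute value is needed. The data, `k`, and `c` are arbitrary illustrative choices.

```python
import math
from statistics import pstdev

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
k, c = -3.0, 10.0                                # arbitrary scale and shift

std_scaled = pstdev([k * x for x in data])
std_shifted = pstdev([x + c for x in data])

print(math.isclose(std_scaled, abs(k) * pstdev(data)))  # True
print(math.isclose(std_shifted, pstdev(data)))          # True
```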

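Standard deviation: Chebyshev’s inequality (1st look) ✺ At most $\frac{N}{k^2}$ items are $k$ or more standard deviations ($\sigma$) away from the mean ✺ Rough justification: assume mean = 0 and consider the extreme case with $N - \frac{N}{k^2}$ items at $0$ and $\frac{0.5N}{k^2}$ items at each of $-k\sigma$ and $k\sigma$. Then $\mathrm{std} = \sqrt{\frac{1}{N}\left[\left(N - \frac{N}{k^2}\right) 0^2 + \frac{N}{k^2} (k\sigma)^2\right]} = \sigma$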
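The bound can be checked on any data set. The sketch below counts the items at least k standard deviations from the mean and compares the count with N/k²; the data and the helper name `count_far` are made up for illustration.

```python
from statistics import mean, pstdev

def count_far(xs, k):
    """Count items at least k standard deviations from the mean."""
    m, s = mean(xs), pstdev(xs)
    return sum(1 for x in xs if abs(x - m) >= k * s)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]  # made-up, with one extreme value
n = len(data)
for k in (1.5, 2, 3):
    # Chebyshev: the count never exceeds n / k**2.
    print(k, count_far(data, k), "<=", n / k**2)
```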

Variance (σ²) ✺ Variance = (standard deviation)²: $\mathrm{var}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mathrm{mean}(\{x_i\}))^2$ ✺ Scaling and translating behave similarly to the standard deviation: $\mathrm{var}(\{k \cdot x_i\}) = k^2 \cdot \mathrm{var}(\{x_i\})$ and $\mathrm{var}(\{x_i + c\}) = \mathrm{var}(\{x_i\})$
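As a sketch, the relation between variance and standard deviation and the two properties above can be verified with `statistics.pvariance` (the population variance, which divides by N). The data, `k`, and `c` are made up.

```python
import math
from statistics import pstdev, pvariance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
k, c = -3.0, 10.0                                # arbitrary scale and shift

var = pvariance(data)
var_scaled = pvariance([k * x for x in data])
var_shifted = pvariance([x + c for x in data])

print(math.isclose(var, pstdev(data) ** 2))  # True
print(math.isclose(var_scaled, k**2 * var))  # True
print(math.isclose(var_shifted, var))        # True
```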

Q3: Standard deviation ✺ What is the value of $\mathrm{std}(\mathrm{mean}(\{x_i\}))$? A. 0 B. 1 C. unsure

Standard Coordinates/normalized data ✺ The mean tells where the data set is and the standard deviation tells how spread out it is. If we are interested only in comparing the shape, we could define: $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$ ✺ We say $\{\hat{x}_i\}$ is in standard coordinates

Q4: Mean of standard coordinates ✺ μ of $\{\hat{x}_i\}$ is: A. 1 B. 0 C. unsure, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q5: Standard deviation (σ) of standard coordinates ✺ σ of $\{\hat{x}_i\}$ is: A. 1 B. 0 C. unsure, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q6: Variance of standard coordinates ✺ Variance of $\{\hat{x}_i\}$ is: A. 1 B. 0 C. unsure, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Q7: Estimate the range of data in standard coordinates ✺ Estimate as closely as possible; 90% of the data is within: A. [-10, 10] B. [-100, 100] C. [-1, 1] D. [-4, 4] E. others, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$

Summary stats of standard Coordinates/normalized data

Standard Coordinates/normalized data to μ=0, σ=1, σ²=1 ✺ Data in standard coordinates always has mean = 0, standard deviation = 1, and variance = 1. ✺ Such data is unit-less; plots based on it are sometimes more comparable ✺ We see such normalization very often in statistics
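A minimal sketch of converting data to standard coordinates and confirming the summary statistics stated above. The helper name `standardize` and the data are made up; the population standard deviation (dividing by N) is assumed, matching the earlier definition.

```python
import math
from statistics import mean, pstdev

def standardize(xs):
    """Subtract the mean and divide by the population standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up; mean 5, std 2
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(math.isclose(mean(z), 0.0, abs_tol=1e-12))  # True
print(math.isclose(pstdev(z), 1.0))               # True
```

Note that the helper would divide by zero if every item were equal, since the standard deviation is then 0.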

Median ✺ To organize the data we first sort it ✺ Then if the number of items N is odd median = middle item's value if the number of items N is even median = mean of middle 2 items' values
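The procedure above translates directly into code. A minimal sketch with made-up inputs:

```python
def median(xs):
    """Sort, then take the middle item (odd N) or the mean of the middle two (even N)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([5, 1, 3]))     # 3
print(median([5, 1, 3, 2]))  # 2.5
```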

Properties of Median ✺ Scaling data scales the median: $\mathrm{median}(\{k \cdot x_i\}) = k \cdot \mathrm{median}(\{x_i\})$ ✺ Translating data translates the median: $\mathrm{median}(\{x_i + c\}) = \mathrm{median}(\{x_i\}) + c$

Percentile ✺ The k-th percentile is the value such that k% of the data items are smaller than or equal to it ✺ The median is roughly the 50th percentile
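Several percentile conventions exist. The sketch below assumes the nearest-rank convention (the smallest data value with at least k% of the items less than or equal to it), which is one reading of the definition above and not necessarily the one used in the course; the data are made up.

```python
import math

def percentile(xs, k):
    """Nearest-rank k-th percentile: smallest item with at least k% of items <= it."""
    s = sorted(xs)
    rank = max(1, math.ceil(k * len(s) / 100))
    return s[rank - 1]

data = [15, 20, 35, 40, 50]   # made-up values
print(percentile(data, 20))   # 15
print(percentile(data, 50))   # 35
print(percentile(data, 100))  # 50
```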

Q8: Scaling effect on percentiles ✺ Scaling data scales the percentile A. True B. False

Q9: Translating effect on percentiles ✺ Translating data does NOT change the percentile A. True B. False

Interquartile range ✺ iqr = (75th percentile) - (25th percentile) ✺ Scaling data scales the interquartile range: $\mathrm{iqr}(\{k \cdot x_i\}) = |k| \cdot \mathrm{iqr}(\{x_i\})$ ✺ Translating data does NOT change the interquartile range: $\mathrm{iqr}(\{x_i + c\}) = \mathrm{iqr}(\{x_i\})$
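A sketch of the interquartile range using `statistics.quantiles` (Python 3.8+) with its default quartile convention, which may differ slightly from the percentile definition used in the course. The data, `k`, and `c` are made up.

```python
import math
from statistics import quantiles

def iqr(xs):
    """75th percentile minus 25th percentile, using the library's default quartiles."""
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1

data = [1.0, 3.0, 4.0, 7.0, 8.0, 10.0, 15.0]  # made-up values
k, c = -2.0, 100.0                            # arbitrary scale and shift

print(iqr(data))  # 7.0
print(math.isclose(iqr([k * x for x in data]), abs(k) * iqr(data)))  # True
print(math.isclose(iqr([x + c for x in data]), iqr(data)))           # True
```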

Box plots ✺ Boxplots ✺ Simpler than histogram ✺ Good for outliers ✺ Easier to use for comparison [Figure: Vehicle death by region; y-axis: DEATH] Data from https://www2.stetson.edu/~jrasp/data.htm

Boxplot details, outliers ✺ How to define outliers? (the default) [Figure labels: Outlier (> 1.5 iqr); Whisker (< 1.5 iqr); Box (Interquartile Range, iqr); Median]
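The default rule can be sketched as follows, assuming the common reading that a point is an outlier when it lies more than 1.5 iqr beyond the edge of the box. Plotting tools differ in how they compute quartiles, so the exact cut-offs may vary; the helper name `boxplot_outliers` and the data are made up.

```python
from statistics import quantiles

def boxplot_outliers(xs):
    """Return items more than 1.5 * iqr below the 25th or above the 75th percentile."""
    q1, _, q3 = quantiles(xs, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [x for x in xs if x < low or x > high]

data = [1.0, 3.0, 4.0, 7.0, 8.0, 10.0, 40.0]  # made-up values
print(boxplot_outliers(data))  # [40.0]
```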

Discussion ✺ Pick a group to debate

Sensitivity of summary statistics to outliers ✺ mean and standard deviation are very sensitive to outliers ✺ median and interquartile range are not sensitive to outliers
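To illustrate, the sketch below replaces one value in a made-up data set with an extreme one and compares the four summaries before and after. The quartiles come from `statistics.quantiles` with its default convention, which is an assumption for illustration.

```python
from statistics import mean, median, pstdev, quantiles

def iqr(xs):
    q1, _, q3 = quantiles(xs, n=4)
    return q3 - q1

clean = [1.0, 3.0, 4.0, 7.0, 8.0, 10.0, 15.0]  # made-up values
dirty = clean[:-1] + [1500.0]                  # largest value replaced by an outlier

for name, xs in (("clean", clean), ("dirty", dirty)):
    # mean and std jump; median and iqr stay the same
    print(name, mean(xs), pstdev(xs), median(xs), iqr(xs))
```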

Modes ✺ Modes are peaks in a histogram ✺ If there is more than one mode, we should be curious as to why

Multiple modes ✺ We have seen the “iris” data which looks to have several peaks Data: “iris” in R

Example: bimodal distribution ✺ Modes may indicate multiple populations Data: Erythrocyte cells in healthy humans Piagnerelli, JCP 2007

Tails and Skews Credit: Prof.Forsyth

Looking at relationships in data ✺ Finding relationships between features in a data set or many data sets is one of the most important tasks in data science

Heatmap ✺ Display a matrix of data via a gradient of color(s) [Figure: Summarization of 4 locations’ annual mean temperature by month]

3D bar chart ✺ Transparent 3D bar chart is good for small # of samples across categories

Relationship between data feature and time ✺ Example: How does Amazon’s stock change over 1 year? Take out the pair of features x: Day, y: AMZN

Relationship between data features ✺ Example: does the weight of people relate to their height? ✺ x: HEIGHT, y: WEIGHT

The visual way for continuous features ✺ Time series plot ✺ Scatter plot

Time Series Plot: Stock of Amazon

Scatter plot ✺ A most effective tool for geographic data and 2D data in general. It should be your first step with a new 2D dataset.

Scatter plot ✺ Body Fat data set

Scatter plot ✺ Scatter plot with density

Scatter plot ✺ Outliers removed & standardized

Scatter plot ✺ Coupled with heatmap to show a 3rd feature

Correlation seen from scatter plots [Figure panels: Zero correlation; Positive correlation; Negative correlation] Credit: Prof. Forsyth

What kind of Correlation? ✺ lines of code in a database and number of bugs ✺ GPA and hours spent playing video games ✺ earnings and happiness Credit: Prof. David Varodayan

Correlation doesn’t mean causation ✺ Shoe size is correlated to reading skills, but it doesn’t mean making feet grow will make one person read faster.

Assignments ✺ HW1 due Thurs. Sept. 3. ✺ Quiz 1 (open 4:30pm today until Sat.) ✺ Reading up to Chapter 2.1 ✺ Next time: the quantitative part of correlation coefficient

Additional References ✺ Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability" ✺ Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

See you next time!
