Probability and Statistics for Computer Science

Probability and Statistics for Computer Science
"The statement that 'The average US family has 2.6 children' invites mockery." Prof. Forsyth reminds us about critical thinking.
Credit: wikipedia
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.27.2020

Last lecture
• Welcome/Orientation
• Big picture of the contents
• Lecture 1 - Data Visualization & Summary (I)
• Some feedback

Warm up question:
• What kind of data is a letter grade?
• What do you usually ask about the statistics of an exam with numerical scores?

Objectives
• Grasp Summary Statistics
• Learn more Data Visualization for Relationships

Summarizing 1D continuous data
For a data set {x} of N items, also written {x_i}, we summarize with:
• Location parameters: mean (μ), median, mode
• Scale parameters: standard deviation (σ), variance (σ²), interquartile range (iqr)

Summarizing 1D continuous data
• Mean
$\operatorname{mean}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} x_i$
Geometrically, it is the centroid of the data: the point at which the data set balances.
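A minimal sketch of the mean formula in Python. The data values are made up for illustration and do not come from the lecture.

```python
def mean(xs):
    """Arithmetic mean: (1/N) times the sum of the N items."""
    return sum(xs) / len(xs)

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical data set, N = 5
m = mean(data)                     # 25 / 5 = 5.0

# "Center of balance": the signed distances to the mean cancel out.
balance = sum(x - m for x in data)
```

The standard library's `statistics.fmean` computes the same quantity; the hand-written version is shown only to mirror the formula.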

[Handwritten example, only partly legible: a small data set {x_i} = 1, 6, 3, 4, 12, 5, 2, ... used to illustrate that $\sum_{i} (x_i - \operatorname{mean}(\{x_i\})) = 0$.]

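Properties of the mean
• Scaling data scales the mean: $\operatorname{mean}(\{k \cdot x_i\}) = k \cdot \operatorname{mean}(\{x_i\})$
• Translating the data translates the mean: $\operatorname{mean}(\{x_i + c\}) = \operatorname{mean}(\{x_i\}) + c$
• Combined (handwritten note): $\operatorname{mean}(\{a \cdot x_i + c\}) = a \cdot \operatorname{mean}(\{x_i\}) + c$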
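The scaling and translation rules can be spot-checked numerically. This is a sketch on made-up data; the constants `k` and `c` are arbitrary choices, not values from the lecture.

```python
import math
from statistics import fmean

data = [1.0, 6.0, 3.0, 4.0]  # hypothetical data set, mean 3.5
k, c = 3.0, -2.0             # arbitrary scale and shift

# Scaling every item by k scales the mean by k.
scaling_holds = math.isclose(fmean([k * x for x in data]), k * fmean(data))

# Adding c to every item adds c to the mean.
translation_holds = math.isclose(fmean([x + c for x in data]), fmean(data) + c)

# Both at once: mean({k*x_i + c}) = k * mean({x_i}) + c.
affine_holds = math.isclose(fmean([k * x + c for x in data]), k * fmean(data) + c)
```

`math.isclose` is used instead of `==` because floating-point arithmetic can introduce tiny rounding differences.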

Less obvious properties of the mean
• The signed distances from the mean sum to 0:
$\sum_{i=1}^{N} (x_i - \operatorname{mean}(\{x_i\})) = 0$
• Among all real values μ, the mean is the one that minimizes the sum of squared distances to the data:
$\operatorname{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \operatorname{mean}(\{x_i\})$

Prove: $\sum_{i=1}^{N} (x_i - \operatorname{mean}(\{x_i\})) = 0$
LHS $= \sum_{i=1}^{N} x_i - N \cdot \operatorname{mean}(\{x_i\}) = \sum_{i=1}^{N} x_i - N \cdot \frac{1}{N} \sum_{i=1}^{N} x_i = 0$

Prove: $\operatorname{argmin}_{\mu} \sum_{i=1}^{N} (x_i - \mu)^2 = \operatorname{mean}(\{x_i\})$
Let $f(\mu) = \sum_{i=1}^{N} (x_i - \mu)^2$ and $g_i = x_i - \mu$, so that $\frac{dg_i}{d\mu} = -1$. Set the derivative to zero:
$\frac{df}{d\mu} = \sum_{i=1}^{N} 2 g_i \frac{dg_i}{d\mu} = -2 \sum_{i=1}^{N} g_i = 0$
$-2 \sum_{i=1}^{N} (x_i - \mu) = 0$
$\sum_{i=1}^{N} x_i - N \mu = 0$
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i = \operatorname{mean}(\{x_i\})$
So the argmin is the mean.
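As a numerical sanity check of the argmin result, the sketch below scans candidate values of μ on a grid and compares the best one with the mean. The data, the grid range, and the step size are all illustrative assumptions.

```python
def sum_sq(xs, mu):
    """Sum of squared distances from the items to the value mu."""
    return sum((x - mu) ** 2 for x in xs)

data = [1.0, 6.0, 3.0, 4.0, 12.0]  # hypothetical data set
m = sum(data) / len(data)          # 26 / 5 = 5.2

# Candidate values 0.00, 0.01, ..., 13.00 covering the data range.
candidates = [i / 100 for i in range(0, 1301)]
best = min(candidates, key=lambda mu: sum_sq(data, mu))

# Moving away from the mean in either direction increases the cost.
cost_at_mean = sum_sq(data, m)
cost_left = sum_sq(data, m - 0.5)
cost_right = sum_sq(data, m + 0.5)
```

A grid search only approximates the minimizer in general; here the mean happens to lie on the grid, so `best` coincides with `m`.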

Q1:
• What is the answer for mean(mean({x_i}))?
A. mean({x_i}) B. unsure C. 0

Standard Deviation (σ)
• The standard deviation
$\operatorname{std}(\{x_i\}) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \operatorname{mean}(\{x_i\}))^2} = \sqrt{\operatorname{mean}(\{(x_i - \operatorname{mean}(\{x_i\}))^2\})}$
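A short sketch of this definition in Python, on made-up data. Note that the formula divides by N, which matches `statistics.pstdev` (the population standard deviation); `statistics.stdev` divides by N − 1 and would give a different number.

```python
import math
from statistics import pstdev

def std(xs):
    """Standard deviation with the 1/N convention used on the slide."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data set
s = std(data)  # mean is 5, squared deviations sum to 32, sqrt(32 / 8) = 2.0

# The standard library's population standard deviation agrees.
matches_pstdev = math.isclose(s, pstdev(data))
```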

Q2. Can a standard deviation of a dataset be -1? A. YES B. NO

Properties of the standard deviation
• Scaling data scales the standard deviation: $\operatorname{std}(\{k \cdot x_i\}) = |k| \cdot \operatorname{std}(\{x_i\})$
• Translating the data does NOT change the standard deviation: $\operatorname{std}(\{x_i + c\}) = \operatorname{std}(\{x_i\})$


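Standard deviation: Chebyshev's inequality (1st look)
• At most $\frac{N}{k^2}$ items are k or more standard deviations (σ) away from the mean.
• Rough justification: assume mean = 0, with $N - \frac{N}{k^2}$ items at 0, $\frac{0.5N}{k^2}$ items at −kσ, and $\frac{0.5N}{k^2}$ items at kσ. Then
$\operatorname{std} = \sqrt{\frac{1}{N} \left[ \left( N - \frac{N}{k^2} \right) 0^2 + \frac{N}{k^2} (k\sigma)^2 \right]} = \sigma$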
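The bound can be checked empirically. The sketch below uses a made-up sample (Gaussian draws with a fixed seed, an arbitrary choice) and also rebuilds the configuration from the rough justification with the illustrative values N = 200, k = 2, σ = 1.5.

```python
import random
from statistics import fmean, pstdev

def fraction_at_least(xs, k):
    """Fraction of items at least k standard deviations from the mean."""
    m, s = fmean(xs), pstdev(xs)
    return sum(abs(x - m) >= k * s for x in xs) / len(xs)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # hypothetical data

# Chebyshev: the fraction at distance >= k*sigma never exceeds 1 / k^2.
bound_holds = all(fraction_at_least(sample, k) <= 1 / k**2
                  for k in (1.5, 2, 3, 4))

# Configuration from the justification: N - N/k^2 = 150 items at 0,
# and 0.5 * N/k^2 = 25 items at each of +k*sigma and -k*sigma (= +/-3.0).
tight = [0.0] * 150 + [3.0] * 25 + [-3.0] * 25
tight_std = pstdev(tight)  # sqrt(50 * 9 / 200) = 1.5, i.e. sigma
```

The bound holds for any data set, not only Gaussian samples; for well-behaved data the observed fraction is usually far below 1/k².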

Variance (σ²)
• Variance = (standard deviation)²
$\operatorname{var}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \operatorname{mean}(\{x_i\}))^2$
• Scaling and translating behave similarly to the standard deviation:
$\operatorname{var}(\{k \cdot x_i\}) = k^2 \cdot \operatorname{var}(\{x_i\})$
$\operatorname{var}(\{x_i + c\}) = \operatorname{var}(\{x_i\})$
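A sketch that spot-checks the scaling and translation rules for both variance and standard deviation, using the standard library's population versions (`pvariance` and `pstdev`, which divide by N). The data and the constants `k` and `c` are made up; `k` is negative on purpose so that the `|k|` in the standard deviation rule matters.

```python
import math
from statistics import pstdev, pvariance

data = [1.0, 6.0, 3.0, 4.0, 12.0]  # hypothetical data set
k, c = -3.0, 7.0                   # arbitrary scale (negative) and shift

scaled = [k * x for x in data]
shifted = [x + c for x in data]

var_scales_by_k_squared = math.isclose(pvariance(scaled), k**2 * pvariance(data))
std_scales_by_abs_k = math.isclose(pstdev(scaled), abs(k) * pstdev(data))
var_unchanged_by_shift = math.isclose(pvariance(shifted), pvariance(data))
std_unchanged_by_shift = math.isclose(pstdev(shifted), pstdev(data))

# Variance is the square of the standard deviation.
var_is_std_squared = math.isclose(pvariance(data), pstdev(data) ** 2)
```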

Q3: Standard deviation
• What is the value of std(mean({x_i}))?
A. 0 B. 1 C. unsure

Standard Coordinates/normalized data
• The mean tells where the data set is and the standard deviation tells how spread out it is. If we are interested only in comparing the shape, we could define, for every i:
$\hat{x}_i = \frac{x_i - \operatorname{mean}(\{x_i\})}{\operatorname{std}(\{x_i\})}$
• We say $\{\hat{x}_i\}$ is in standard coordinates.

Q4: Mean of standard coordinates
• μ of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \operatorname{mean}(\{x_i\})}{\operatorname{std}(\{x_i\})}$, is:
A. 1 B. 0 C. unsure

Q5: Standard deviation (σ) of standard coordinates
• σ of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \operatorname{mean}(\{x_i\})}{\operatorname{std}(\{x_i\})}$, is:
A. 1 B. 0 C. unsure

Q6: Variance of standard coordinates
• Variance of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \operatorname{mean}(\{x_i\})}{\operatorname{std}(\{x_i\})}$, is:
A. 1 B. 0 C. unsure

Q7: Estimate the range of data in standard coordinates
• Estimate as closely as possible: 90% of the data $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \operatorname{mean}(\{x_i\})}{\operatorname{std}(\{x_i\})}$, is within:
A. [-10, 10] B. [-100, 100] C. [-1, 1] D. [-4, 4] E. others


Summary stats of standard Coordinates/normalized data

Standard Coordinates/normalized data to μ=0, σ=1, σ²=1
• Data in standard coordinates always has mean = 0, standard deviation = 1, and variance = 1.
• Such data is unit-less, so plots based on it are sometimes more comparable.
• We see such normalization very often in statistics.
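The summary above can be confirmed with a small sketch. The helper name `standardize` and the data are illustrative assumptions; the guard for a zero standard deviation is an addition, since the definition divides by std({x_i}).

```python
import math
from statistics import fmean, pstdev, pvariance

def standardize(xs):
    """Map a data set to standard coordinates: (x - mean) / std."""
    m, s = fmean(xs), pstdev(xs)
    if s == 0:
        raise ValueError("standard coordinates are undefined when std is 0")
    return [(x - m) / s for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical, mean 5, std 2
z = standardize(data)

mean_is_zero = abs(fmean(z)) < 1e-12
std_is_one = math.isclose(pstdev(z), 1.0)
var_is_one = math.isclose(pvariance(z), 1.0)
```

Because the result is unit-less, two data sets measured in different units can be standardized this way and plotted on the same axes.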

Additional References
• Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability"
• Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

See you next time See You!
