
Probability and Statistics for Computer Science

  1. Probability and Statistics for Computer Science • "The statement that 'The average US family has 2.6 children' invites mockery." Prof. Forsyth reminds us about critical thinking (credit: Wikipedia) • Hongye Liu, Teaching Assistant Professor, CS361, UIUC, 8.27.2020

  2. Last lecture • Welcome/orientation • Big picture of the contents • Lecture 1: Data Visualization & Summary (I) • Some feedback

  3. Warm-up questions: • What kind of data is a letter grade? • What do you usually ask about the statistics of an exam with numerical scores?

  4. Objectives • Grasp summary statistics • Learn more data visualization for relationships

  5. Summarizing 1D continuous data • For a data set $\{x\}$, also written $\{x_i\}$, with N items, we summarize with: • Location parameters: mode, mean ($\mu$), median • Scale parameters: interquartile range (IQR), standard deviation ($\sigma$), variance ($\sigma^2$)
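As a rough illustration, the summaries listed above can be computed with Python's standard-library `statistics` module. This is a minimal sketch on a hypothetical data set; `pstdev`/`pvariance` divide by N, matching the definitions on these slides, and `quantiles` uses its default "exclusive" method, so another quartile convention may give a slightly different IQR.

```python
import statistics

# Hypothetical data set used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)                                  # N items
mean = statistics.fmean(data)                  # location: mean (mu)
median = statistics.median(data)              # location: median
mode = statistics.mode(data)                  # location: mode (most frequent value)
std = statistics.pstdev(data)                 # scale: standard deviation (sigma), divides by N
var = statistics.pvariance(data)              # scale: variance (sigma^2), divides by N
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                                 # scale: interquartile range

print(n, mean, median, mode, std, var, iqr)
```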

  6. Summarizing 1D continuous data • Mean: $\mathrm{mean}(\{x_i\}) = \frac{1}{N}\sum_{i=1}^{N} x_i$ • Geometrically, the mean is the centroid of the data: if you place the data points on a number line and put a pivot at the mean, the data set balances there.
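The definition translates directly into code. A minimal sketch, assuming a plain Python list as the data set (the function name `mean` and the data are illustrative only):

```python
from statistics import fmean

def mean(xs):
    """Mean of a 1D data set, straight from the definition: (1/N) * sum of x_i."""
    if len(xs) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data, N = 8
print(mean(data))                # 5.0
print(fmean(data))               # 5.0, standard-library equivalent
```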

  7. (Handwritten worked example on a small data set $\{x_i\}$.)

  8. Properties of the mean • Scaling data scales the mean: $\mathrm{mean}(\{k x_i\}) = k\,\mathrm{mean}(\{x_i\})$ • Translating data translates the mean: $\mathrm{mean}(\{x_i + c\}) = \mathrm{mean}(\{x_i\}) + c$ • Combined: $\mathrm{mean}(\{a x_i + c\}) = a\,\mathrm{mean}(\{x_i\}) + c$
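A quick numerical check of the scaling and translation properties (a sanity check on one hypothetical data set, not a proof). The values of `k` and `c` are arbitrary:

```python
import math
from statistics import fmean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
k, c = 3.0, -10.0                                # arbitrary scale factor and shift

scaled = [k * x for x in data]
shifted = [x + c for x in data]

# mean({k * x_i}) = k * mean({x_i})
assert math.isclose(fmean(scaled), k * fmean(data))
# mean({x_i + c}) = mean({x_i}) + c
assert math.isclose(fmean(shifted), fmean(data) + c)
print(fmean(scaled), fmean(shifted))  # 15.0 -5.0
```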

  9. Less obvious properties of the mean • The signed distances from the mean sum to 0: $\sum_{i=1}^{N}(x_i - \mathrm{mean}(\{x_i\})) = 0$ • The mean minimizes the sum of squared distances to a real value: $\mathrm{argmin}_{\mu}\sum_{i=1}^{N}(x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$
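Both properties can be checked numerically. This sketch uses a brute-force grid search over candidate values of $\mu$ in [0, 10] with step 0.01; the grid is an arbitrary illustrative choice and is not how the property is proved (the slides prove it analytically).

```python
from statistics import fmean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
m = fmean(data)

# Signed distances from the mean sum to 0 (up to floating-point rounding).
signed_sum = sum(x - m for x in data)

def sum_sq(mu):
    """Sum of squared distances from the data to a candidate value mu."""
    return sum((x - mu) ** 2 for x in data)

# Brute-force search over a grid of candidate mu values in [0, 10], step 0.01.
candidates = [i / 100 for i in range(0, 1001)]
best_mu = min(candidates, key=sum_sq)

print(signed_sum, best_mu, m)  # 0.0 5.0 5.0
```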

  10. Proof that $\sum_{i=1}^{N}(x_i - \mathrm{mean}(\{x_i\})) = 0$: LHS $= \sum_{i=1}^{N} x_i - N\,\mathrm{mean}(\{x_i\}) = \sum_{i=1}^{N} x_i - N \cdot \frac{1}{N}\sum_{i=1}^{N} x_i = 0$ = RHS.

  11. Proof that $\mathrm{argmin}_{\mu}\sum_{i=1}^{N}(x_i - \mu)^2 = \mathrm{mean}(\{x_i\})$: set the derivative with respect to $\mu$ to zero: $\frac{d}{d\mu}\sum_{i=1}^{N}(x_i - \mu)^2 = -2\sum_{i=1}^{N}(x_i - \mu) = 0$

  12. $-2\sum_{i=1}^{N}(x_i - \mu) = 0 \;\Rightarrow\; \sum_{i=1}^{N} x_i - N\mu = 0 \;\Rightarrow\; \mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \mathrm{mean}(\{x_i\})$. The second derivative is $2N > 0$, so this stationary point is the minimum.

  13. Q1: What is $\mathrm{mean}(\mathrm{mean}(\{x_i\}))$? A. $\mathrm{mean}(\{x_i\})$ B. unsure C. 0

  14. Standard deviation ($\sigma$) • The standard deviation: $\mathrm{std}(\{x_i\}) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mathrm{mean}(\{x_i\}))^2} = \sqrt{\mathrm{mean}(\{(x_i - \mathrm{mean}(\{x_i\}))^2\})}$
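A minimal sketch of the formula above (the helper name `std` and the data are illustrative). Note the convention: this definition divides by N, which matches `statistics.pstdev`; `statistics.stdev` divides by N − 1 (the sample standard deviation), so it gives a somewhat larger value.

```python
import math
import statistics

def std(xs):
    """Standard deviation as defined on the slide: sqrt of the mean squared deviation (divides by N)."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(std(data))                 # 2.0
print(statistics.pstdev(data))   # 2.0, population convention (divides by N)
print(statistics.stdev(data))    # sample convention (divides by N - 1), a different quantity
```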

  15. Q2: Can the standard deviation of a data set be −1? A. Yes B. No

  16. Properties of the standard deviation • Scaling data scales the standard deviation by the absolute value of the factor: $\mathrm{std}(\{k x_i\}) = |k|\,\mathrm{std}(\{x_i\})$ • Translating data does NOT change the standard deviation: $\mathrm{std}(\{x_i + c\}) = \mathrm{std}(\{x_i\})$
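A numerical check of both properties on a hypothetical data set. A negative `k` is chosen deliberately to show why the factor is $|k|$ rather than $k$:

```python
import math
from statistics import pstdev

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
k, c = -3.0, 100.0                               # negative scale factor, arbitrary shift

# std({k * x_i}) = |k| * std({x_i}); a standard deviation is never negative.
assert math.isclose(pstdev([k * x for x in data]), abs(k) * pstdev(data))
# std({x_i + c}) = std({x_i}); shifting moves the data but not its spread.
assert math.isclose(pstdev([x + c for x in data]), pstdev(data))
print(pstdev([k * x for x in data]))  # 6.0
```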


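  18. Standard deviation: Chebyshev's inequality (1st look) • At most $N/k^2$ items are $k$ or more standard deviations ($\sigma$) away from the mean. • Rough justification: assume mean = 0, and place $N - N/k^2$ items at 0, $0.5N/k^2$ items at $k\sigma$, and $0.5N/k^2$ items at $-k\sigma$. Then $\mathrm{std} = \sqrt{\frac{1}{N}\left[\left(N - \frac{N}{k^2}\right)\cdot 0^2 + \frac{N}{k^2}(k\sigma)^2\right]} = \sigma$. So a data set with standard deviation $\sigma$ can have exactly $N/k^2$ items at distance $k\sigma$; any more items that far away would push the standard deviation above $\sigma$.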
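An empirical check of the bound, as a sketch: the data here are hypothetical draws from a skewed (exponential) distribution with a fixed seed, chosen only to show that the bound holds even for non-symmetric data. For any finite data set with the divide-by-N standard deviation, the count never exceeds $N/k^2$.

```python
import random
from statistics import fmean, pstdev

random.seed(0)
data = [random.expovariate(1.0) for _ in range(10_000)]  # skewed hypothetical data
N = len(data)
m, s = fmean(data), pstdev(data)

for k in (1.5, 2, 3, 4):
    far = sum(1 for x in data if abs(x - m) >= k * s)  # items at least k std from the mean
    bound = N / k**2                                    # Chebyshev: at most N / k^2 such items
    print(k, far, bound)
    assert far <= bound
```

The bound is usually loose: for most real data far fewer than $N/k^2$ items lie that far out, which is why it is a worst-case guarantee rather than an estimate.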

  19. Variance ($\sigma^2$) • Variance = (standard deviation)$^2$: $\mathrm{var}(\{x_i\}) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mathrm{mean}(\{x_i\}))^2$ • Scaling and translating behave similarly to the standard deviation: $\mathrm{var}(\{k x_i\}) = k^2\,\mathrm{var}(\{x_i\})$, $\mathrm{var}(\{x_i + c\}) = \mathrm{var}(\{x_i\})$
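A short check that the variance is the squared standard deviation and scales by $k^2$ (so the sign of `k` no longer matters). Data, `k`, and `c` are hypothetical:

```python
import math
from statistics import pstdev, pvariance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
k, c = -3.0, 100.0

# var({x_i}) = std({x_i})^2, both using the divide-by-N convention.
assert math.isclose(pvariance(data), pstdev(data) ** 2)
# var({k * x_i}) = k^2 * var({x_i})
assert math.isclose(pvariance([k * x for x in data]), k**2 * pvariance(data))
# var({x_i + c}) = var({x_i})
assert math.isclose(pvariance([x + c for x in data]), pvariance(data))
print(pvariance(data), pvariance([k * x for x in data]))  # 4.0 36.0
```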

  20. Q3: Standard deviation • What is the value of $\mathrm{std}(\mathrm{mean}(\{x_i\}))$? A. 0 B. 1 C. unsure

  21. Standard coordinates/normalized data • The mean tells where the data set is, and the standard deviation tells how spread out it is. If we are interested only in comparing the shape, we can define $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$ for every $i$. • We say $\{\hat{x}_i\}$ is in standard coordinates.
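A minimal sketch of the transformation, assuming a plain list of numbers and the divide-by-N standard deviation used throughout these slides (the name `standardize` is illustrative). It refuses data with zero spread, since the division would be undefined.

```python
from statistics import fmean, pstdev

def standardize(xs):
    """Map each x_i to (x_i - mean({x_i})) / std({x_i}); requires a nonzero std."""
    m, s = fmean(xs), pstdev(xs)
    if s == 0:
        raise ValueError("cannot standardize data with zero standard deviation")
    return [(x - m) / s for x in xs]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, std 2
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Each value now reads as "how many standard deviations from the mean", which is what makes differently scaled data sets comparable.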

  22. Q4: Mean of standard coordinates • The mean $\mu$ of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, is: A. 1 B. 0 C. unsure

  23. Q5: Standard deviation ($\sigma$) of standard coordinates • The $\sigma$ of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, is: A. 1 B. 0 C. unsure

  24. Q6: Variance of standard coordinates • The variance of $\{\hat{x}_i\}$, where $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, is: A. 1 B. 0 C. unsure

  25. Q7: Estimate the range of data in standard coordinates • Estimating as tightly as possible, 90% of the data $\{\hat{x}_i\}$, with $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$, lies within: A. [-10, 10] B. [-100, 100] C. [-1, 1] D. [-4, 4] E. others


  27. Summary stats of standard coordinates/normalized data

  28. Standard coordinates/normalized data: $\mu = 0$, $\sigma = 1$, $\sigma^2 = 1$ • Data in standard coordinates always has mean 0, standard deviation 1, and variance 1 (provided the original standard deviation is nonzero, so the division is defined). • Such data is unitless, so plots based on it are sometimes easier to compare. • We see this kind of normalization very often in statistics.

  29. Additional references • Charles M. Grinstead and J. Laurie Snell, "Introduction to Probability" • Morris H. DeGroot and Mark J. Schervish, "Probability and Statistics"

  30. See you next time!
