
Compsci 201: Collections, Hashing, Objects



  1. Compsci 201: Collections, Hashing, Objects
     Susan Rodger
     January 31, 2020
     CompSci 201, Spring 2020

  2. G is for …
     • Git
       • Version control that's ubiquitous
     • Garbage Collection
       • Java recycles
     • Google
       • How to find Stack Overflow

  3. Announcements
     • Assignment P1 due yesterday
       • You are in the grace period through midnight
     • APT-3 due Tues, Feb 4
       • Can still turn in Friday until 11:59pm
     • Discussion 4 on Feb 3
       • Prediscussion, do before, out today
     • Reading on calendar
       • Slowing down … Nothing posted …

  4. Plan for the Day
     • Generic classes: ArrayList to HashSet
       • From ArrayList to HashSet to Collections to …
     • From Object.equals to Object.hashCode
       • Everything is an Object, what can an object do?
     • Maps, Interfaces, Analysis
       • Next week and next assignment

  5. ArrayList Review
     • What is an ArrayList?
       • A class that "wraps an array"
       • Part of java.util.Collections hierarchy
       • Almost an array: constant-time access to any element given an index (independent of N)
     • How are elements added?
       • New array allocated, values copied, continue

  6. DIYAD ArrayList
     • Do It Yourself Algorithm and Datastructure
       • SimpleStringArrayList: some methods
       • GrowableStringArrayList: more methods
     • Differences between +100, +1000, and *2
       • Helper methods are private: checkSize()

  7. SimpleStringArrayList
     • DIYAD - I want to write an ArrayList class
     • State to define an array
     • Methods to
       • Constructor: create an array of fixed size
       • Add an element to an array
       • Get an element from an array

  8. SimpleStringArrayList (part 1)

  9. SimpleStringArrayList (part 2)
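
The slides describe SimpleStringArrayList as state that holds an array, a constructor that creates a fixed-size array, and methods to add and get. The following is a minimal sketch under those stated assumptions, not the course's actual code: the field name myStorage is borrowed from a later slide, while mySize, size(), the capacity parameter, and the exceptions thrown are hypothetical choices made for illustration.

```java
// Hypothetical sketch of a fixed-size string list; not the course's code.
class SimpleStringArrayList {
    private final String[] myStorage; // fixed-size backing array (name from a later slide)
    private int mySize;               // number of elements stored so far (assumed name)

    // Constructor: create an array of fixed size.
    SimpleStringArrayList(int capacity) {
        myStorage = new String[capacity];
        mySize = 0;
    }

    // Add an element; fails once the fixed array is full.
    void add(String s) {
        if (mySize >= myStorage.length) {
            throw new IllegalStateException("list is full");
        }
        myStorage[mySize] = s;
        mySize += 1;
    }

    // Get an element by index; a single array access, independent of size.
    String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return myStorage[index];
    }

    int size() {
        return mySize;
    }
}

class SimpleStringArrayListDemo {
    public static void main(String[] args) {
        SimpleStringArrayList list = new SimpleStringArrayList(2);
        list.add("apple");
        list.add("banana");
        System.out.println(list.get(1)); // prints banana
        // A third add would throw, because the array never grows.
    }
}
```

The fixed capacity is the limitation the later analysis slide points at: adding up to the capacity is fine, adding one more is not.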

  10. GrowableStringArrayList
      • DIYAD - write another ArrayList class

  11. DIYAD ArrayList
      • Do It Yourself Algorithm and Datastructure
        • SimpleStringArrayList: some methods
        • GrowableStringArrayList: more methods
      • Differences between these two classes?
        • Growable: grows as needed, not static

  12. GrowableStringArrayList (part 1)

  13. GrowableStringArrayList (part 2)

  14. GrowableStringArrayList (part 3)
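
The slides say the growable version "grows as needed" and uses a private helper checkSize(). A minimal sketch of that idea follows. It is an assumption, not the course's code: the doubling policy, the starting capacity of 1, the mySize field, and the capacity() accessor (exposed only so the growth can be observed) are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a growable string list; not the course's code.
class GrowableStringArrayList {
    private String[] myStorage = new String[1]; // starting capacity of 1 is an assumption
    private int mySize = 0;

    // Private helper named on the slides; the doubling body is an assumption.
    // When the array is full: create a new one, copy the values, use it.
    private void checkSize() {
        if (mySize >= myStorage.length) {
            myStorage = Arrays.copyOf(myStorage, myStorage.length * 2);
        }
    }

    void add(String s) {
        checkSize();
        myStorage[mySize] = s;
        mySize += 1;
    }

    String get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return myStorage[index];
    }

    int size() {
        return mySize;
    }

    // For illustration only: lets a caller watch the backing array grow.
    int capacity() {
        return myStorage.length;
    }
}

class GrowableStringArrayListDemo {
    public static void main(String[] args) {
        GrowableStringArrayList list = new GrowableStringArrayList();
        for (int k = 0; k < 5; k++) {
            list.add("s" + k);
        }
        System.out.println(list.size() + " elements, capacity " + list.capacity());
    }
}
```

Swapping `myStorage.length * 2` for `myStorage.length + 100` would give the additive policy the slides compare against.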

  15. Analysis via Pictures Again
      • Growing array by doubling each time
        • Create/copy 1, 2, 4, 8, 16, …, $2^N$
      • If $X = 2^N$, we've created $2 \times 2^N - 1$, or $2X - 1$
        • Roughly X, where "roughly" is defined later

  16. Analysis of Diyad ArrayLists
      • SimpleStringArrayList
        • Add 10,000 strings? ok. Add one more? BAD
      • GrowableStringArrayList
        • Add as many strings as memory allows, how?
      • ConformingArrayList
        • Is-a java.util.List, also stores any Object type
        • Must implement List methods, interface

  17. DIYAD Ideas
      • Move from String to GrowableString to Generic
        • Lots of work to fit in with Collections hierarchy
        • For our own work? Easier! All of Java? Harder!
      • Differences between +10, +1000, *2 and *1.2
        • How do we measure empirically?
        • How do we measure analytically?
        • Private method checkSize()

  18. Diyad ArrayList Growth
      • When internal array full? Create new, copy, use
        • Want efficient add, get, set when done repeatedly
        • Not efficient if resize with +1, +100, +1000
        • Efficient if resize with *2 or *1.25

  19. Analysis with Math+Pictures
      • If we grow by adding 1 (or 100 or 1000)
        • Copy 1, then 2, then 3, then … then N
      • $1 + 2 + \dots + N = N(N+1)/2$
        • Same as 100+200+300+…
        • Roughly $N^2$
        • Divide by 2, multiply by 100
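
The additive sum can be written out in full to see why the increment only changes the constant. In the worked form below, the increment k is a symbol introduced here for illustration, and N is assumed to be a multiple of k.

```latex
\[
\underbrace{1 + 2 + \dots + N}_{\text{grow by } 1}
  = \sum_{i=1}^{N} i
  = \frac{N(N+1)}{2}
  \approx \frac{N^2}{2}
\]
\[
\underbrace{k + 2k + \dots + N}_{\text{grow by } k}
  = k \sum_{j=1}^{N/k} j
  = k \cdot \frac{(N/k)(N/k + 1)}{2}
  = \frac{N(N+k)}{2k}
  \approx \frac{N^2}{2k}
\]
```

With k = 100 the total is about $N^2/200$: smaller by a constant factor, but still quadratic in N.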

  20. Analysis via Math+Pictures Again
      • Growing array by doubling each time
        • Create/copy 1, 2, 4, 8, 16, …, $2^N$
        • Total is $1 + 2 + \dots + 2^N = 2^{N+1} - 1$
      • If $X = 2^N$, we've created $2 \times 2^N - 1$, or $2X - 1$
        • Roughly X, where "roughly" is defined later
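
The doubling total is a geometric series; written out with the slide's own symbols X and N, the steps are:

```latex
\[
\sum_{i=0}^{N} 2^i
  = 2^{N+1} - 1
  = 2 \cdot 2^N - 1
  = 2X - 1,
  \qquad X = 2^N
\]
```

As a quick check with N = 3: 1 + 2 + 4 + 8 = 15 = 2 · 8 − 1, so reaching an array of size X costs fewer than 2X copies in total.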

  21. Runtimes summarized
      • Re-sizing geometrically and additively
        • Allocate new array, copy all pointers/references

      size        time (x 2)   time (x 1.25)   time (+10000)
      1000000     0.028        0.051           1.507
      2000000     0.037        0.087           1.585
      3000000     0.053        0.117           2.740
      4000000     0.066        0.153           5.146
      5000000     0.117        0.218           7.304
      6000000     0.121        0.338           8.315
      7000000     0.143        0.303           10.428
      8000000     0.211        0.398           14.233
      9000000     0.270        0.452           21.434
      10000000    0.260        0.468           21.927

  22. Diyad ArrayList Summary
      • If we grow additively: +1, or +100, or +1000
        • Performance is quadratic: for an array of N elements we expect $N^2$ time (allocate/copy)
      • If we grow geometrically: *2, *1.2, *3
        • Performance is linear: for an array of N elements we expect N time (allocate/copy)
      • Ignore constants: $N^2/2$ or $200N$ or $100 N^2$ or …
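
One way to see the quadratic versus linear claim without timing anything is to count how many element copies each resize policy performs. The sketch below is an illustration under assumptions, not the course's measurement code: the class and method names are hypothetical, the additive policy starts at a capacity equal to its increment, and the geometric policy starts at capacity 1.

```java
// Hypothetical sketch: counts copies made by each growth policy; no real array is allocated.
class GrowthCostDemo {
    // Copies made while appending n items one at a time when a full array
    // of capacity c is replaced by one of capacity c + increment.
    static long copiesAdditive(int n, int increment) {
        long copies = 0;
        int capacity = increment;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;        // every stored element is copied over
                capacity += increment;
            }
        }
        return copies;
    }

    // Same count when a full array of capacity c is replaced by one of
    // capacity c * factor (always growing by at least one slot).
    static long copiesGeometric(int n, double factor) {
        long copies = 0;
        int capacity = 1;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size;
                capacity = Math.max(capacity + 1, (int) (capacity * factor));
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        for (int n = 1_000_000; n <= 4_000_000; n *= 2) {
            System.out.printf("n=%d  +10000: %d  *2: %d  *1.25: %d%n",
                    n,
                    copiesAdditive(n, 10_000),
                    copiesGeometric(n, 2.0),
                    copiesGeometric(n, 1.25));
        }
    }
}
```

Doubling n should roughly quadruple the additive count and roughly double the geometric ones, which is the quadratic versus linear distinction the summary draws.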

  23. WOTO http://bit.ly/201spring20-0131-1

  24. Maria Klawe
      • President of Harvey Mudd
      • Dean of Engineering at Princeton, ACM Fellow, College Dropout (and re-enroller)
      "I personally believe that the most important thing we have to do today is use technology to address societal problems, especially in developing regions."
      "Coding is today's language of creativity. All our children deserve a chance to become creators instead of consumers of computer science."

  25. Generic ConformingArrayList
      • Rather than String, use generic type parameter
        • Can use E, T, Type, any identifier <E>
        • Similar to code for GrowableStringArrayList
      • java.util.List
        • Interface

  26. Can E be anything? String, Point, …
      • Method .equals that works as expected for E!
        • Internal array myStorage contains Objects
        • ConformingArrayList<String>
        • Which .equals is called? Object or String?
      • Runtime decision, not compile time decision
        • What does elt reference/point to? String!!!
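
The runtime-dispatch point can be illustrated with a small generic list. This is a simplified sketch, not the course's ConformingArrayList: the class name GenericArrayListSketch and the contains method are hypothetical, and unlike the class on the slides it does not implement java.util.List. The names myStorage and elt follow the slide.

```java
import java.util.Arrays;

// Hypothetical, simplified generic list; it does NOT implement java.util.List.
class GenericArrayListSketch<E> {
    private Object[] myStorage = new Object[2]; // internal array holds Objects
    private int mySize = 0;

    void add(E elt) {
        if (mySize >= myStorage.length) {
            myStorage = Arrays.copyOf(myStorage, myStorage.length * 2);
        }
        myStorage[mySize] = elt;
        mySize += 1;
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= mySize) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (E) myStorage[index]; // safe: only E values are ever stored
    }

    // elt is declared as Object, but elt.equals(...) runs the equals method
    // of whatever class the object actually is at runtime.
    boolean contains(Object target) {
        for (int k = 0; k < mySize; k++) {
            Object elt = myStorage[k];
            if (elt != null && elt.equals(target)) {
                return true;
            }
        }
        return false;
    }
}

class GenericArrayListSketchDemo {
    public static void main(String[] args) {
        GenericArrayListSketch<String> words = new GenericArrayListSketch<>();
        words.add(new String("duke"));
        // String.equals compares characters, so a distinct but equal String is found.
        System.out.println(words.contains(new String("duke"))); // prints true

        GenericArrayListSketch<Object> things = new GenericArrayListSketch<>();
        things.add(new Object());
        // Object.equals compares identity, so a different Object is not found.
        System.out.println(things.contains(new Object())); // prints false
    }
}
```

The same `elt.equals(target)` line behaves differently in the two cases because the decision about which equals to run is made at runtime, from the object elt refers to.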

  27. Why Diyad?
      • Traditionally use ArrayList<E>: client code
        • Understand methods via API
        • Problem solving in many contexts
      • Efficiency: a.get(1) as fast as a.get(1000)
      • Why efficient? Understanding by analysis
        • From the internal array which is efficient
        • From doubling on resize rather than adding one

  28. Toward Applications
      • We can speak with a limited vocabulary
        • Learn vocabulary then speak, then read
      • We can also write code similarly
        • Eventually debugging may require understanding how .equals works
      • https://arxiv.org/pdf/1711.00975.pdf
        Scalable Streaming Tools for Analyzing N-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

  29. Massive Data sets
      • How do we find what #hashtags are trending on Twitter in real-time?
        • 6,000 tweets/second, 350,000/minute, …
        • Do we weight by tweeter-importance?
      • Must be able to look up very quickly, cannot skim through all hashtags/all data
        • Conveniently, we use hashing and hash tables!
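
To make the "look up very quickly" point concrete, here is a small sketch that tallies hashtags with a hash table. It assumes java.util.HashMap as the hash table and uses made-up sample tweets; the class and method names are hypothetical, and real trending detection (time windows, weighting by tweeter) is well beyond this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts hashtag occurrences using a hash table.
class HashtagCounter {
    // Returns a map from lower-cased hashtag to the number of times it appears.
    // Simplification: words are split on whitespace, punctuation is not stripped.
    static Map<String, Integer> countHashtags(String[] tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : tweet.split("\\s+")) {
                if (word.startsWith("#") && word.length() > 1) {
                    // One hash-table lookup and update per hashtag,
                    // no scan through previously seen hashtags.
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tweets = {          // made-up sample data
            "go #Duke go",
            "#duke wins #ACC"
        };
        Map<String, Integer> counts = countHashtags(tweets);
        System.out.println("#duke appears " + counts.get("#duke") + " times");
    }
}
```

Each update costs roughly the same no matter how many distinct hashtags have been seen, which is what makes the approach workable at thousands of tweets per second.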
