Powerset Explorer: A Datamining Application
Jordan Lee

Background
• PAST
  – Datamining accomplished with human intuition
• PRESENT
  – Computer-aided, with AI and brute-force CPU cycles
• FUTURE
  – Enter PowersetViewer…

Dataset
• Alphabet
  – Items that can be found in transactions
  – E.g. apples, bread, chips
• Transaction
  – Set of items (unordered)
  – E.g. Tx1 = { Apples, Chips }
  – E.g. Tx2 = { Bread }
• Transaction database
  – Collection of transactions (unordered, possibly with repeats)
  – E.g. Walmart transaction logs

Example Dataset
• Student enrollment database
  – Alphabet = courses
    ◦ { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }
  – Transaction = courses a student is enrolled in
    ◦ #29389002 -> { CPSC124, PHIL120, ENGL112 }
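The definitions above map onto ordinary collections. Here is a minimal Java sketch (Java is assumed because the project uses the BigInteger class). The class name TransactionDb and its support method are hypothetical and do not come from Powerset Explorer. Each transaction is stored as a set inside a list, so repeated transactions are kept, and the sketch counts how many transactions contain a given set of items.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a transaction is an unordered set of items, and a
// transaction database is a list, so identical transactions can repeat.
final class TransactionDb {
    private final List<Set<String>> transactions;

    TransactionDb(List<Set<String>> transactions) {
        this.transactions = List.copyOf(transactions);
    }

    // Number of transactions that contain every item of the given itemset
    // (usually called the itemset's support).
    int support(Set<String> itemset) {
        int count = 0;
        for (Set<String> tx : transactions) {
            if (tx.containsAll(itemset)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TransactionDb db = new TransactionDb(List.of(
                Set.of("Apples", "Chips"),   // Tx1 from the slide
                Set.of("Bread"),             // Tx2 from the slide
                Set.of("Apples", "Chips"))); // hypothetical repeat of Tx1
        System.out.println(db.support(Set.of("Apples")));          // prints 2
        System.out.println(db.support(Set.of("Apples", "Bread"))); // prints 0
    }
}
```

The third transaction is invented for illustration. It shows why a list, not a set of transactions, matches "possibly with repeats".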

Example Dataset (cont’d)
• Student enrollment database
  – Alphabet = courses
    ◦ { CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }
  – Transaction = courses a student is enrolled in
    ◦ #29389002 -> { CPSC124, PHIL120, ENGL112 }
  – Transaction DB = list of student course schedules

    72423298 5 676 1701 3046 3900 1327
    38578546 7 175 178 1182 1701 3038 680 3912
    7660625 5 326 676 1701 3038 3908
    43359163 3 1177 1699 4317
    26495781 6 676 1177 1701 3038 3900 4275
    48536452 4 1699 2339 1327 2826
    64251972 6 676 1177 1701 3038 3900 2549
    23212318 5 676 1701 3040 3813 3900
    19820119 5 104 676 1699 3038 3900
    65954629 4 480 676 3040 3908
    54392012 5 676 1701 3038 3813 3899
    85833501 5 676 1699 3040 3813 3900
    65136197 5 676 1699 3038 3900 2580

Why?
• Why is this interesting?
  – Consumer transaction logs -> trends in consumer buying
  – Student enrollment database -> trends in enrollment
    ◦ What electives do most undergraduate computer science students take?
    ◦ Departments can determine which joint majors would fit the student population.
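To make the electives question concrete, here is a hedged sketch that reads enrollment records and counts how many schedules include each course. It assumes each record is "student ID, number of courses, then course IDs". The count field matches the number of IDs on every line of the excerpt, but the format itself is an inference, not something the slides document. The names EnrollmentParser, parse and courseFrequencies are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch assuming each record reads "<student id> <course count> <course id>...";
// this layout is inferred from the excerpt, not documented in the slides.
final class EnrollmentParser {

    // One parsed record: a student ID and the course IDs in that transaction.
    static final class Enrollment {
        final long studentId;
        final List<Integer> courses;

        Enrollment(long studentId, List<Integer> courses) {
            this.studentId = studentId;
            this.courses = courses;
        }
    }

    // Parses one record and checks the declared count against the IDs present.
    static Enrollment parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("missing id or count: " + line);
        }
        long studentId = Long.parseLong(tokens[0]);
        int declared = Integer.parseInt(tokens[1]);
        if (tokens.length - 2 != declared) {
            throw new IllegalArgumentException("expected " + declared + " courses: " + line);
        }
        List<Integer> courses = new ArrayList<>();
        for (int i = 2; i < tokens.length; i++) {
            courses.add(Integer.parseInt(tokens[i]));
        }
        return new Enrollment(studentId, courses);
    }

    // Counts, for each course ID, how many records (transactions) contain it.
    static Map<Integer, Integer> courseFrequencies(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // A transaction is a set, so each course counts at most once per record.
            for (int course : new HashSet<>(parse(line).courses)) {
                counts.merge(course, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> excerpt = List.of(
                "72423298 5 676 1701 3046 3900 1327",
                "43359163 3 1177 1699 4317",
                "26495781 6 676 1177 1701 3038 3900 4275");
        System.out.println(courseFrequencies(excerpt).get(676)); // prints 2
    }
}
```

On these three records from the excerpt, course 676 appears in two schedules. Ranking the map by count gives a rough view of popular courses, assuming the numeric IDs can be mapped back to course names.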

Why? (cont’d)
• Dataset sizes are growing exponentially
  – Human intuition has reached its limits
  – Requires computers and AI (expensive)
  – Information visualization can scale the power of human intuition

Powerset Explorer
• Code base from TreeJuxtaposer (Munzner)
  – AccordionDrawer package

[Slide: TreeJuxtaposer]

Powerset Explorer (cont’d)
• Goals
  – Focus + context exploration using grids
  – Guaranteed visibility
  – Marking of groups
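Guaranteed visibility means a marked item stays visible even when the grid has more cells than the screen has pixels. The sketch below illustrates that idea along a single axis. It assumes equally sized cells and no focus + context stretching. It is not taken from the TreeJuxtaposer code base, and GuaranteedVisibility and markedPixels are hypothetical names.

```java
import java.util.BitSet;

// Simplified, hypothetical sketch of guaranteed visibility along one grid axis.
// Assumes cellCount cells of equal size (no focus + context stretching) drawn
// into pixelCount pixels. A marked cell colours every pixel it overlaps, which
// is always at least one, so a mark never vanishes when cells are sub-pixel.
final class GuaranteedVisibility {

    static BitSet markedPixels(BitSet markedCells, int cellCount, int pixelCount) {
        if (cellCount <= 0 || pixelCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        BitSet pixels = new BitSet(pixelCount);
        for (int cell = markedCells.nextSetBit(0);
                cell >= 0 && cell < cellCount;
                cell = markedCells.nextSetBit(cell + 1)) {
            // First pixel touched by the cell: floor(cell * P / N); long avoids overflow.
            int start = (int) ((long) cell * pixelCount / cellCount);
            // One past the last pixel touched: ceil((cell + 1) * P / N), always > start.
            int end = (int) (((long) (cell + 1) * pixelCount + cellCount - 1) / cellCount);
            pixels.set(start, end);
        }
        return pixels;
    }

    public static void main(String[] args) {
        // 2^14 subsets squeezed into 400 pixels; subset 5000 is marked.
        BitSet marks = new BitSet();
        marks.set(5000);
        System.out.println(markedPixels(marks, 1 << 14, 400)); // prints {122}
    }
}
```

In the example, about 41 subsets share each pixel, yet the single marked subset still lights pixel 122.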

Milestones Status Update
• #1 Completion of the basic visualization of a randomized database of small set size (~10)
• #2 Addition of a single level of “marking”
• #3 Addition of multiple levels of “marking” (6)
• #4 Addition of background marking to demarcate areas of sets containing different numbers of items
• #5 Implementation of multiple constraints
• #6 Increase of the maximum possible dataset size to at least 100
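Milestone #4 groups sets by how many items they contain. One way to obtain such regions is to order the subsets by population count, so that all subsets of the same size form one contiguous band. This sketch assumes subsets are encoded as int bitmasks over the alphabet. CardinalityBands and its methods are hypothetical names, and the brute-force sort is meant only for small alphabets.

```java
import java.util.Arrays;

// Hypothetical sketch for milestone #4: encode each subset of an n-item alphabet
// as an int bitmask; its number of items is the mask's population count. Sorting
// by that count makes equal-size subsets contiguous, giving one band per size.
final class CardinalityBands {

    // All 2^n masks, ordered by subset size, then by mask value within a size.
    static int[] orderBySize(int n) {
        if (n < 0 || n > 20) {
            throw new IllegalArgumentException("brute force: small alphabets only");
        }
        Integer[] masks = new Integer[1 << n];
        for (int m = 0; m < masks.length; m++) {
            masks[m] = m;
        }
        Arrays.sort(masks, (a, b) -> {
            int bySize = Integer.compare(Integer.bitCount(a), Integer.bitCount(b));
            return bySize != 0 ? bySize : Integer.compare(a, b);
        });
        return Arrays.stream(masks).mapToInt(Integer::intValue).toArray();
    }

    // Number of subsets of each size k = 0..n (the binomial coefficients C(n, k)).
    static long[] bandSizes(int n) {
        long[] sizes = new long[n + 1];
        for (int mask : orderBySize(n)) {
            sizes[Integer.bitCount(mask)]++;
        }
        return sizes;
    }
}
```

For example, bandSizes(4) returns [1, 4, 6, 4, 1]: one empty set, four single-item sets, six pairs, four triples and the full set.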

Difficulties
• Multiple constraints are difficult to implement on the current server-side dataminer
• Cannot enumerate the powerset of an alphabet with more than 14 elements (integer = 32 bits)
  – Solution: use the Java class BigInteger
• High CPU and memory usage
  – Solution: upgrade computer!
    ◦ hack

Current Status
• Reduced database
    8680433 3 0 7 5
    2768129 2 6 4
    6385608 5 1 9 10 9 11
    147924 5 5 2 9 5 2
    234140 3 11 4 8
    4331093 4 4 6 0 0
    3158394 5 12 1 12 5 4
    5797538 6 11 4 3 13 12 4
    6243191 1 5
    5872060 4 3 8 9 6
• Property file
    0 CPSC 325 75.0 3
    1 PHIL 327 84.0 1
    2 ANTH 329 45.0 2
    3 MATH 327 0.0 3
    4 PSYC 328 0.0 1
    5 ENGL 329 0.0 2
    6 APSC 540 0.0 1
    7 MECH 541 0.0 1
    8 STAT 543 0.0 1
    9 SPAN 201 71.0 1
    10 FREN 258 76.0 2
    11 ECON 260 84.0 1
    12 LING 295 42.0 1
    13 EECE 302 73.0 1
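The BigInteger fix listed under Difficulties can be sketched as follows. The sketch assumes a subset is encoded as a bitmask in which bit i marks item i of the alphabet, for example the indices 0 to 13 in the property file. The class and method names are hypothetical. BigInteger (setBit, testBit, shiftLeft) has no fixed word width, but enumerating every subset still takes 2^n steps, which is consistent with the high CPU usage also noted under Difficulties.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch: a subset of the alphabet is stored as a BigInteger bitmask in which
// bit i is set when item i (e.g. a property-file index) is present. BigInteger
// has no fixed width, so the encoding is not capped at 32 or 64 items.
final class BigPowerset {

    // Encodes item indices as a subset mask; repeated indices collapse, as in a set.
    static BigInteger encode(int... items) {
        BigInteger mask = BigInteger.ZERO;
        for (int item : items) {
            mask = mask.setBit(item);
        }
        return mask;
    }

    // Decodes a subset mask back into item indices in ascending order.
    static List<Integer> decode(BigInteger mask) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < mask.bitLength(); i++) {
            if (mask.testBit(i)) {
                items.add(i);
            }
        }
        return items;
    }

    // Size of the powerset of an n-item alphabet, 2^n, exact for any n.
    static BigInteger powersetSize(int n) {
        return BigInteger.ONE.shiftLeft(n);
    }

    // Visits every subset mask from 0 to 2^n - 1. This still takes 2^n steps,
    // so full enumeration stays practical only for small n.
    static void forEachSubset(int n, Consumer<BigInteger> visitor) {
        BigInteger end = powersetSize(n);
        for (BigInteger s = BigInteger.ZERO; s.compareTo(end) < 0; s = s.add(BigInteger.ONE)) {
            visitor.accept(s);
        }
    }

    public static void main(String[] args) {
        // Assumed reading of the reduced-database line "8680433 3 0 7 5":
        // a transaction containing items 0, 7 and 5.
        BigInteger tx = encode(0, 7, 5);
        System.out.println(tx);                // prints 161
        System.out.println(decode(tx));        // prints [0, 5, 7]
        System.out.println(powersetSize(100)); // prints 1267650600228229401496703205376
    }
}
```

The main method treats "8680433 3 0 7 5" as student 8680433 taking items 0, 7 and 5. That reading is an assumption based on the count field, which matches the number of indices on each reduced-database line.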

Questions?
