prograf.ic.uff.br
Altobelli B. Mantuan and Leandro A. F. Fernandes
{amantuan, laffernandes}@ic.uff.br
Acknowledgments
Graphics Processing Research Laboratory at IC-UFF, Brazil

Itemset mining
• Frequent pattern / Itemset: A set of one or more items that occurs frequently in a dataset
  ▪ k-itemset: X = {x_1, …, x_k}
• Finding all the itemsets
  ▪ combinatorial problem

Items bought
  Beer, Nuts, Diaper
  Beer, Coffee, Diaper
  Beer, Diaper, Eggs
  Nuts, Eggs, Milk
  Nuts, Coffee, Diaper, Eggs, Beer

Example: Itemset = {Beer, Diaper}, Support = 4
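The support in the example can be counted directly from the transaction list. A minimal sketch using the five transactions shown on the slide; the function name `support` is an illustrative choice and does not come from the presentation.

```python
# Transactions from the "Items bought" example on the slide.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Beer"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


print(support({"Beer", "Diaper"}, transactions))  # 4
```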

Itemset mining procedure
• Existing solutions for generating frequent itemsets
  ▪ Threshold parameters
    o It is difficult to perceive the influence of the parameter
  ▪ Amount of itemsets retrieved
  ▪ Search space
    o A large area of the search space is unnecessarily explored

It would be interesting to use other information to reduce the search space for generating itemsets.
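The sensitivity to the threshold can be seen even on a toy dataset. The sketch below is a brute-force enumeration (not any of the algorithms compared in this presentation) over the five-transaction "Beer, Diaper" example, counting how many itemsets survive each minimum-support value. The names `frequent_itemsets` and `min_support` are illustrative assumptions.

```python
from itertools import combinations

# Five-transaction toy dataset (the "Beer, Diaper" example).
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Beer"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(transactions, min_support):
    """Brute force over all non-empty item subsets; only viable for tiny data."""
    items = sorted(set().union(*transactions))
    found = []
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            if support(candidate, transactions) >= min_support:
                found.append(candidate)
    return found


# Number of itemsets retrieved for each threshold value.
counts = {s: len(frequent_itemsets(transactions, s)) for s in range(1, 6)}
for s, c in counts.items():
    print(f"min_support={s}: {c} frequent itemsets")
```

The enumeration visits every subset of the item set, which is the combinatorial cost the slide refers to.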

Flowchart - SCIM
[Figure: SCIM flowchart]

Contributions
• The spatial contextualization of items for mining interesting itemsets in transactional databases
• A procedure for clustering items in the Solution Space of the Dual Scaling mapping
• A procedure for generating closed itemsets based on spatial contextualization

Summary
• Dual Scaling
• Overlapping clustering procedure
• SC-close procedure
• Results
• Conclusion

Dual Scaling

Stimulus (columns, in order): High BP, Aver BP, Low BP | Rare Migr., Occa. Migr., Frequ. Migr. | Young, Middle Age, Old | Low Anxiety, Mid Anxiety, High Anxiety | Light, Medium, Heavy | Short, Average, Tall

Subject   Response pattern (one column per stimulus)
1         1 0 0 | 0 0 1 | 0 0 1 | 0 0 1 | 1 0 0 | 1 0 0
2         1 0 0 | 0 0 1 | 1 0 0 | 0 0 1 | 0 1 0 | 0 0 1
3         0 0 1 | 0 0 1 | 0 0 1 | 0 0 1 | 1 0 0 | 0 0 1
4         0 0 1 | 0 0 1 | 0 0 1 | 0 0 1 | 1 0 0 | 1 0 0
5         0 1 0 | 1 0 0 | 0 1 0 | 0 1 0 | 0 0 1 | 0 1 0
...       ...
15        0 1 0 | 0 1 0 | 0 1 0 | 1 0 0 | 1 0 0 | 0 0 1

• The more often a set of stimuli occurs together in the database, the smaller the distance between those stimuli
• The lower the frequency of a stimulus, the farther from the origin it is positioned

Nishisato, Shizuhiko. (1994). Elements of Dual Scaling: An Introduction to Practical Data Analysis.
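Both properties can be illustrated with a small numerical sketch. It uses the correspondence-analysis formulation (SVD of standardized residuals), which is closely related to dual scaling, as a stand-in. This is an assumption: the slides do not state the exact normalization, so the sketch is not expected to reproduce any number shown in the presentation. The toy matrix and the name `item_coordinates` are hypothetical.

```python
import numpy as np


def item_coordinates(incidence):
    """Column (item) principal coordinates from correspondence analysis.

    Assumes every row and every column of `incidence` has a non-zero sum.
    """
    N = np.asarray(incidence, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row (transaction) masses
    c = P.sum(axis=0)                     # column (item) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    _, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (Vt.T * s) / np.sqrt(c)[:, None]


# Hypothetical toy data: 6 transactions x 4 items, two items per transaction.
# Item frequencies are 4, 4, 3 and 1; items 0 and 1 co-occur twice,
# items 0 and 3 never co-occur.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
])

coords = item_coordinates(toy)
origin_dist = np.linalg.norm(coords, axis=1)
pair_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

print("distance to origin:", np.round(origin_dist, 4))
print("pairwise distances:\n", np.round(pair_dist, 4))
```

In this toy case the rarest item (index 3) ends up farthest from the origin, and the co-occurring pair (0, 1) is closer together than the pair (0, 3) that never co-occurs.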

Dual Scaling - Working example

Subject = Transaction
Stimulus = Item

Transaction-by-item matrix

Transaction   Item: 1  2  3  4  5  6
1                   1  0  0  1  0  0
2                   1  0  0  1  0  0
3                   0  0  1  1  0  0
4                   0  0  1  0  1  0
5                   0  0  1  0  1  0
6                   0  0  1  0  1  0
7                   0  0  1  0  1  0
8                   0  0  1  0  1  0
9                   0  0  1  0  0  1
10                  0  1  0  0  1  0
11                  0  1  0  0  0  1
12                  0  1  0  0  0  1
13                  0  1  0  0  0  1
14                  0  1  0  1  0  0
15                  0  1  0  0  0  1

𝜓-square distance matrix for pairs of items

Item         1         2         3         4         5         6
1       0        16.6052   20.5072    3.2338   21.9519   18.4428
2      16.6052    0        10.9090   14.0740   10.4710    1.7637
3      20.5072   10.9090    0        17.5109    1.8114    9.7269
4       3.2338   14.0740   17.5109    0        21.4269   17.5219
5      21.9519   10.4710    1.8114   21.4269    0        10.8555
6      18.4428    1.7637    9.7269   17.5219   10.8555    0

Reference distance is the distance of each item to the origin of the space

Item                  1        2        3        4        5        6
Reference distance    9.7467   3.2046   3.7234   8.2790   4.6087   3.7871

• Transactions are not mapped to the solution space
• Using 𝜓-square distance for pairs of items
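The relation between item frequency and reference distance can be checked against the numbers of this working example. The sketch below only re-reads the values shown on the slide (15 transactions, 6 items, and the six reference distances); it does not recompute the Dual Scaling mapping. Variable names are illustrative.

```python
import numpy as np

# Transaction-by-item matrix of the working example (15 transactions, 6 items).
T = np.array([
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
])

# Reference distances (item to origin) as reported on the slide.
reference_distance = np.array([9.7467, 3.2046, 3.7234, 8.2790, 4.6087, 3.7871])

frequency = T.sum(axis=0)  # number of transactions containing each item

# List items from rarest to most frequent next to their reference distance.
for j in np.argsort(frequency, kind="stable"):
    print(f"item {j + 1}: frequency={frequency[j]}, "
          f"reference distance={reference_distance[j]}")
```

The two rarest items (1 and 4) are also the two farthest from the origin. Among the more frequent items the ordering is a tendency rather than a strict rule.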

Overlapping clustering procedure

𝜓-square distance matrix

        X1        X2        X3        X4        X5        X6
X1      0        16.6052   20.5072    3.2338   21.9519   18.4428
X2     16.6052    0        10.9090   14.0740   10.4710    1.7637
X3     20.5072   10.9090    0        17.5109    1.8114    9.7269
X4      3.2338   14.0740   17.5109    0        21.4269   17.5219
X5     21.9519   10.4710    1.8114   21.4269    0        10.8555
X6     18.4428    1.7637    9.7269   17.5219   10.8555    0

• 𝜓-square distance between pairs of points
• Each item point y_j is the center of cluster D_j
• Parameter es is used to define ds_j
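A rough sketch of how item-centered clusters can overlap, using the distance matrix above. It assumes that ds_j acts as a distance threshold around the center y_j, which the slide does not spell out, and it does not model how es produces ds_j. The threshold values used here (half of each item's reference distance from the working example) are purely hypothetical, chosen only to make the overlap visible.

```python
import numpy as np

# Pairwise item distances from the slide (items X1..X6).
D = np.array([
    [0.0,     16.6052, 20.5072,  3.2338, 21.9519, 18.4428],
    [16.6052,  0.0,    10.9090, 14.0740, 10.4710,  1.7637],
    [20.5072, 10.9090,  0.0,    17.5109,  1.8114,  9.7269],
    [3.2338,  14.0740, 17.5109,  0.0,    21.4269, 17.5219],
    [21.9519, 10.4710,  1.8114, 21.4269,  0.0,    10.8555],
    [18.4428,  1.7637,  9.7269, 17.5219, 10.8555,  0.0],
])

# Reference distances (item to origin) from the working example.
reference_distance = np.array([9.7467, 3.2046, 3.7234, 8.2790, 4.6087, 3.7871])


def overlapping_clusters(D, ds):
    """Build one cluster per item: cluster j holds every item within ds[j] of item j.

    Items are numbered from 1. An item may belong to several clusters.
    """
    clusters = {}
    for j in range(D.shape[0]):
        members = np.flatnonzero(D[j] <= ds[j])
        clusters[j + 1] = {int(i) + 1 for i in members}
    return clusters


# HYPOTHETICAL thresholds: not the rule used in the presentation.
ds = 0.5 * reference_distance
clusters = overlapping_clusters(D, ds)
for center, members in clusters.items():
    print(f"D_{center}: {sorted(members)}")
```

With these thresholds item 2 belongs both to its own cluster and to the cluster centered on item 6, which is the kind of overlap the procedure allows.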

SC-Close
• SC-Close uses the compact vertical structure FP-tree
  ▪ Clusters reduce the search space on the FP-tree
• SC-Close uses a CFI-tree to report only closed itemsets
• Itemset formation rule
  1. The itemset must belong to the cluster coverage
  2. At least one item from the itemset must belong to the minimum coverage
  3. Condition 2 is ignored if the minimum coverage only includes the center point
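The three conditions of the itemset formation rule can be written as a single predicate. This is a sketch under the assumption that the cluster coverage and the minimum coverage are plain sets of items and that the center point is one item; the function and argument names are hypothetical, and the FP-tree and CFI-tree machinery is not modeled.

```python
def satisfies_formation_rule(itemset, coverage, min_coverage, center):
    """Check the three itemset formation conditions for one cluster.

    coverage, min_coverage: sets of items (assumed representation).
    center: the item at the center of the cluster.
    """
    items = set(itemset)
    # Condition 1: every item must lie inside the cluster coverage.
    if not items <= set(coverage):
        return False
    # Condition 3: skip condition 2 when the minimum coverage is just the center.
    if set(min_coverage) == {center}:
        return True
    # Condition 2: at least one item must lie inside the minimum coverage.
    return bool(items & set(min_coverage))


# Hypothetical cluster centered on item 1.
coverage = {1, 2, 3, 4}
print(satisfies_formation_rule({2, 3}, coverage, {1, 2}, center=1))  # True
print(satisfies_formation_rule({3, 4}, coverage, {1, 2}, center=1))  # False
print(satisfies_formation_rule({3, 4}, coverage, {1}, center=1))     # True
print(satisfies_formation_rule({3, 5}, coverage, {1}, center=1))     # False
```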

Results
• Using 11 databases from LUCS-KDD
• Compared algorithms
  ▪ SCIM, FPClose, Slim, and TopPI
• Metrics
  ▪ Mean All-Confidence, MDL, and processing time
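The slides do not define the all-confidence metric. The sketch below assumes the usual definition, in which the all-confidence of an itemset is its support divided by the largest support among its individual items, and averages it over a set of retrieved itemsets. The toy transactions are the five-transaction "Beer, Diaper" example; function names are illustrative.

```python
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Beer"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def all_confidence(itemset, transactions):
    """supp(X) / max over x in X of supp({x}); assumed (standard) definition."""
    items = set(itemset)
    largest_item_support = max(support({x}, transactions) for x in items)
    return support(items, transactions) / largest_item_support


def mean_all_confidence(itemsets, transactions):
    """Average all-confidence over a collection of retrieved itemsets."""
    values = [all_confidence(x, transactions) for x in itemsets]
    return sum(values) / len(values)


retrieved = [{"Beer", "Diaper"}, {"Beer", "Nuts"}]
print(mean_all_confidence(retrieved, transactions))  # 0.75
```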

Mean All-Confidence
[Figures: Mean All-Confidence charts, two slides]

MDL and processing time

• MDL metric: the relative total compressed size (L%) achieved by the set of patterns retrieved by each algorithm
• Processing time: Slim and TopPI ran in single thread

Database             n        q     m       Technique         # Patterns   Time (s)     L%
Letter recognition   20,000   16    80      Slim              1,231        34.30        31.89
                                            TopPI k = 7       447          0.62         73.58
                                            SCIM dr = 0.03    89           0.48         75.28
Ecoli                336      7     26      FPClose           530          0.12         53.59
                                            Slim              25           0.16         40.15
                                            TopPI k = 3       60           0.25         68.67
                                            SCIM dr = 0.02    66           0.03         59.50
mFeat                2,000    240   1,648   Slim              5,121        10,053.93    35.82
                                            TopPI k = 3       3,567        1.35         72.89
                                            SCIM dr = 0.00    11,943       3,351.34     56.00
Connect-4            67,557   42    126     Slim              1,506        88.79        12.12
                                            TopPI k = 8       965          2.87         67.39
                                            SCIM dr = 0.00    1,002        9.98         54.28
Wine                 178      13    65      FPClose           13,169       0.42         112.52
                                            Slim              55           0.22         76.79
                                            TopPI k = 2       68           0.25         90.22
                                            SCIM dr = 0.03    60           0.04         88.92
Tic-tac-toe          958      9     27      FPClose           42,684       2.98         145.81
                                            Slim              125          0.22         53.19
                                            TopPI k = 3       250          0.31         86.27
                                            SCIM dr = 0.02    330          0.06         91.32
Page blocks          5,473    10    41      FPClose           714          0.20         3.90
                                            Slim              40           0.19         3.84
                                            TopPI k = 1       31           0.29         83.09
                                            SCIM dr = 0.00    54           0.13         38.23
Led7                 3,200    7     14      FPClose           1,936        0.20         25.77
                                            Slim              78           0.16         25.34
                                            TopPI k = 3       67           0.30         69.06
                                            SCIM dr = 0.02    15           0.04         70.54
Pen digits           10,992   16    79      Slim              1,220        45.23        38.67
                                            TopPI k = 7       401          0.55         76.14
                                            SCIM dr = 0.04    148          0.37         76.51
Pima                 768      8     36      FPClose           1,608        0.17         41.47
                                            Slim              55           0.21         31.35
                                            TopPI k = 3       717          0.31         54.57
                                            SCIM dr = 0.02    522          0.06         42.38
Waveform             5,000    21    98      Slim              717          8.49         38.84
                                            TopPI k = 9       734          0.44         77.53
                                            SCIM dr = 0.00    724          0.39         72.35

Conclusions
• We showed that spatial contextualization can be used to guide closed itemset generation
• We provided an unsupervised clustering heuristic with cluster overlapping
• We presented a procedure to reduce the search space in the FP-tree for the generation of closed itemsets

Thank you!
prograf.ic.uff.br
Project webpage
