
ZDD and its applications to intelligent processing
Shin-ichi Minato
Graduate School of Information Science and Technology, Hokkaido University, Japan

Background
BDD-based algorithms have been developed mainly in the VLSI logic design area since the early 1990s:
- Equivalence checking for combinational circuits.
- Symbolic model checking for logic / behavioral designs.
- Logic synthesis / optimization.
- Test pattern generation.
Recently, BDDs have been applied not only to VLSI design but also to more general purposes:
- Data mining (fast frequent itemset mining) [Minato2005,2008,2010].
- Computation of Bayesian networks for probabilistic system analysis [Minato2007].
Oct. 19, 2010 Shin-ichi Minato

BDD (Binary Decision Diagram) [Bryant86]
- Graph representation of Boolean function data.
- Canonical form obtained by applying reduction rules to a binary tree with a fixed variable ordering.
[Figure: a binary decision tree over the variables a, b, c (equivalent to a truth table) and the Reduced Ordered BDD obtained from it by reduction.]

BDD reduction rules
- (share) Share all equivalent nodes.
- (jump) Eliminate all redundant nodes, that is, nodes whose 0-edge and 1-edge lead to the same subgraph f.
These rules give a unique and compressed representation for a given Boolean function under a fixed variable ordering.
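The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any BDD package: the class name `BDD`, the method names `mk` and `from_function`, and the node numbering are all assumptions made for the example. It enumerates every assignment, so it only demonstrates the rules and is not a practical construction method.

```python
class BDD:
    """Illustrative reduced ordered BDD builder (hypothetical names)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}  # (var, lo, hi) -> node id; ids 0 and 1 are terminals

    def mk(self, var, lo, hi):
        # Jump rule: a node whose two edges agree is redundant.
        if lo == hi:
            return lo
        # Share rule: identical (var, lo, hi) triples map to one node.
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def from_function(self, f, var=0, assignment=()):
        # Walks the full binary decision tree (2^n leaves) bottom-up.
        if var == self.num_vars:
            return 1 if f(*assignment) else 0
        lo = self.from_function(f, var + 1, assignment + (0,))
        hi = self.from_function(f, var + 1, assignment + (1,))
        return self.mk(var, lo, hi)

    def size(self):
        return len(self.unique)  # number of non-terminal nodes


parity = BDD(3)
parity.from_function(lambda a, b, c: (a + b + c) % 2)

conj = BDD(3)
conj.from_function(lambda a, b, c: a and b)  # c is irrelevant

print(parity.size(), conj.size())
```

The binary decision tree for three variables has 7 internal nodes. With both rules applied, the parity function needs 5 nodes and `a AND b` needs 2, because every node testing the irrelevant variable `c` is removed by the jump rule.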

Effect of BDD reduction rules
- An exponential advantage can be seen in extreme cases: a structure with O(2^n) nodes is reduced to one with O(n) nodes.
- The effect depends on the instance, but the rules are effective for many practical ones.

BDD-based logic operation algorithm
- If we generate BDDs from the binary tree, it always requires exponential time and space, which is impracticable for a large number of variables.
- Innovative BDD synthesis algorithm, proposed by R. Bryant (CMU) in 1986.
- Bryant's paper was the most-cited paper for many years in the EE & CS areas.
[Figure: the (reduced) BDDs for F and G are combined by an AND operation into the (reduced) BDD for "F and G".]
A BDD can be constructed directly from the BDDs of the two operands. (Computation time is roughly linear in the BDD size.)
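The idea of building the result BDD directly from two operand BDDs can be sketched as a memoized recursion. The code below is a simplified illustration in the spirit of Bryant's synthesis algorithm, not a reproduction of it; all names (`BDD`, `mk`, `apply`, `evaluate`) are assumptions for the example.

```python
import operator
from itertools import product

class BDD:
    """Illustrative BDD with a binary 'apply' operation (hypothetical names)."""

    def __init__(self):
        self.unique = {}  # (var, lo, hi) -> node id
        self.node = {}    # node id -> (var, lo, hi); ids 0 and 1 are terminals

    def mk(self, var, lo, hi):
        if lo == hi:                        # jump rule
            return lo
        key = (var, lo, hi)
        if key not in self.unique:          # share rule
            nid = len(self.node) + 2
            self.unique[key] = nid
            self.node[nid] = key
        return self.unique[key]

    def var(self, v):
        return self.mk(v, 0, 1)

    def apply(self, op, f, g, memo=None):
        """Combine BDDs f and g with a Boolean operator on 0/1 values."""
        if memo is None:
            memo = {}
        if f in (0, 1) and g in (0, 1):
            return int(op(f, g))
        if (f, g) in memo:
            return memo[(f, g)]
        fv = self.node[f][0] if f > 1 else float("inf")
        gv = self.node[g][0] if g > 1 else float("inf")
        top = min(fv, gv)                   # split on the topmost variable
        f0, f1 = self.node[f][1:] if fv == top else (f, f)
        g0, g1 = self.node[g][1:] if gv == top else (g, g)
        result = self.mk(top,
                         self.apply(op, f0, g0, memo),
                         self.apply(op, f1, g1, memo))
        memo[(f, g)] = result
        return result

    def evaluate(self, f, assignment):
        while f > 1:
            var, lo, hi = self.node[f]
            f = hi if assignment[var] else lo
        return f


bdd = BDD()
a, b, c = bdd.var(0), bdd.var(1), bdd.var(2)
F = bdd.apply(operator.and_, a, b)   # F = a AND b
G = bdd.apply(operator.or_, b, c)    # G = b OR c
H = bdd.apply(operator.and_, F, G)   # H = F AND G

# Compare H against the formula on all 8 assignments.
agrees = all(
    bdd.evaluate(H, bits) == int((bits[0] and bits[1]) and (bits[1] or bits[2]))
    for bits in product((0, 1), repeat=3)
)
print(agrees, H == F)
```

The memo table keyed on node pairs keeps each pair from being expanded more than once, so the work is bounded by the product of the operand sizes rather than by the number of truth-table rows. Because the representation is canonical, `H` and `F` end up as the same node: `(a AND b) AND (b OR c)` simplifies to `a AND b`.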

Boolean function and combinatorial itemset
Boolean function: F = (a b ~c) V (~b c)
Combinatorial itemset (customer's choice): F = { ab, ac, c }

a b c | F
0 0 0 | 0
1 0 0 | 0
0 1 0 | 0
1 1 0 | 1   -> ab
0 0 1 | 1   -> c
1 0 1 | 1   -> ac
0 1 1 | 0
1 1 1 | 0

Operations on combinatorial itemsets can be done by BDD-based logic operations:
- Union of sets -> logical OR
- Intersection of sets -> logical AND
- Complement set -> logical NOT
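The correspondence between the Boolean function F and the itemset family { ab, ac, c } can be checked mechanically. The sketch below uses plain Python sets rather than BDDs; the helper name `family_of` and the second function `G` are assumptions introduced only for this illustration.

```python
from itertools import product

ITEMS = ("a", "b", "c")

def F(a, b, c):
    # F = (a b ~c) V (~b c), as on the slide
    return (a and b and not c) or (not b and c)

def G(a, b, c):
    # Hypothetical second function: every itemset that contains c
    return c

def family_of(func):
    """Collect the itemsets (rows of the truth table) on which func is true."""
    family = set()
    for bits in product((0, 1), repeat=len(ITEMS)):
        if func(*bits):
            family.add(frozenset(x for x, bit in zip(ITEMS, bits) if bit))
    return family

family_F = family_of(F)
family_G = family_of(G)

union_matches_or = family_of(lambda a, b, c: F(a, b, c) or G(a, b, c)) == family_F | family_G
inter_matches_and = family_of(lambda a, b, c: F(a, b, c) and G(a, b, c)) == family_F & family_G

print(sorted("".join(sorted(s)) for s in family_F))
print(union_matches_or, inter_matches_and)
```

Each row of the truth table with F = 1 is read as the set of items whose variable is 1, which is why logical OR and AND on the functions behave as union and intersection on the families.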

Zero-suppressed BDD (ZDD) [Minato93]
- A variant of BDDs for combinatorial itemsets.
- Uses a new reduction rule different from that of ordinary BDDs: eliminate all nodes whose "1-edge" directly points to the 0-terminal.
- Shares equivalent nodes in the same way as ordinary BDDs.
- If an item x does not appear in any itemset, the ZDD node of x is automatically eliminated.
- When the average appearance ratio of each item is 1%, ZDDs can be up to 100 times more compact than ordinary BDDs.
[Figure: zero-suppressed reduction (a node x whose 1-edge points to 0 is replaced by its 0-edge target f) next to ordinary BDD reduction (a node x whose two edges both point to f is replaced by f).]
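The difference between the two reduction rules can be seen by building both diagrams for the same sparse family. The following is a minimal sketch under stated assumptions: the class `DecisionDiagram`, its methods, and the example family over 100 items are all hypothetical and chosen only to illustrate the rule.

```python
class DecisionDiagram:
    """Illustrative builder that applies either the BDD or the ZDD rule."""

    def __init__(self, zero_suppressed):
        self.zero_suppressed = zero_suppressed
        self.unique = {}  # (var, lo, hi) -> node id; ids 0 and 1 are terminals

    def mk(self, var, lo, hi):
        if self.zero_suppressed and hi == 0:
            return lo   # ZDD rule: the 1-edge points to the 0-terminal
        if not self.zero_suppressed and lo == hi:
            return lo   # BDD rule: both edges point to the same node
        key = (var, lo, hi)
        return self.unique.setdefault(key, len(self.unique) + 2)

    def build(self, family, num_items, var=0):
        """family: set of frozensets of item indices in range(var, num_items)."""
        if not family:
            return 0
        if var == num_items:
            return 1    # only the empty itemset can remain here
        lo = self.build({s for s in family if var not in s}, num_items, var + 1)
        hi = self.build({s - {var} for s in family if var in s}, num_items, var + 1)
        return self.mk(var, lo, hi)


# Hypothetical sparse family over 100 items: two itemsets, {0, 1} and {50}.
family = {frozenset({0, 1}), frozenset({50})}

zdd = DecisionDiagram(zero_suppressed=True)
zdd.build(family, 100)
bdd = DecisionDiagram(zero_suppressed=False)
bdd.build(family, 100)

zdd_size = len(zdd.unique)
bdd_size = len(bdd.unique)
print(zdd_size, bdd_size)
```

In the ZDD, items that appear in no itemset produce no nodes, so only the three item occurrences remain. The BDD of the same family still has to test the unused items, because each of them must be confirmed to be 0 along every accepting path.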

BDDs/ZDDs in Knuth's book
- The latest fascicle of Knuth's book (Vol. 4-1) includes a BDD section with 140 pages and 236 exercises.
- In this section, Knuth devoted 30 pages to ZDDs, including more than 70 exercises.
- I was honored to help proofread the draft version of his article.
- Knuth recommended using "ZDD" instead of "ZBDD."
- He named the set of ZDD operations "Family Algebra."
- Knuth has developed his own BDD/ZDD package.
- His recent lecture at Oxford was titled "Fun with ZDDs."

Algebraic operations for ZDDs
Knuth valued not only the data structure of ZDDs; he was even more interested in the new algebra on ZDDs.

Basic operations (correspond to Boolean algebra):
  φ, {1}        Empty and singleton set (0/1-terminal).
  P.top         Returns the item ID at the top node of P.
  P.onset(v)    Selects the subset of itemsets including v.
  P.offset(v)   Selects the subset of itemsets excluding v.
  P.change(v)   Switches v (add / delete) on each itemset.
  ∪, ∩, \       Returns the union, intersection, and difference set.
  P.count       Counts the number of combinations in P.

New operations introduced by Minato:
  P * Q         Cartesian product set of P and Q.
  P / Q         Quotient set of P divided by Q.
  P % Q         Remainder set of P divided by Q.

Formerly I called this "unate cube set algebra," but Knuth reorganized it as "Family algebra." It is useful for many practical applications.
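The semantics of these operations can be illustrated on explicit Python sets of frozensets. This is a sketch of what the operations compute, not of how a ZDD package implements them; the function names are hypothetical. Two conventions here are assumptions: `onset` keeps v inside the selected itemsets, and the quotient by an empty family is returned as the empty family.

```python
def fam(*words):
    """Build a family of itemsets from strings, e.g. fam("ab", "c")."""
    return {frozenset(w) for w in words}

def onset(P, v):
    return {s for s in P if v in s}       # itemsets that include v

def offset(P, v):
    return {s for s in P if v not in s}   # itemsets that exclude v

def change(P, v):
    return {s ^ {v} for s in P}           # toggle v in every itemset

def product(P, Q):
    return {p | q for p in P for q in Q}  # all pairwise unions

def quotient(P, Q):
    """Itemsets r, disjoint from each q in Q, such that r | q is in P for all q."""
    result = None
    for q in Q:
        part = {p - q for p in P if q <= p}
        result = part if result is None else result & part
    return result if result is not None else set()

def remainder(P, Q):
    return P - product(Q, quotient(P, Q))


P = fam("abd", "abe", "abg", "cd", "ce", "ch")
Q = fam("ab", "c")

quo = quotient(P, Q)
rem = remainder(P, Q)
print(sorted("".join(sorted(s)) for s in quo))
print(sorted("".join(sorted(s)) for s in rem))
```

For this example the quotient is { d, e } and the remainder is { abg, ch }, so P can be written as Q * (P / Q) ∪ (P % Q), in the same way that an integer is written as divisor times quotient plus remainder.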

Frequent itemset mining
A basic and well-known problem in database analysis.

Record ID | Tuple
    1     | a b c
    2     | a b
    3     | a b c
    4     | b c
    5     | a b
    6     | a b c
    7     | c
    8     | a b c
    9     | a b c
   10     | a b
   11     | b c

Frequency threshold = 10: { b }
Frequency threshold = 8:  { ab, a, b, c }
Frequency threshold = 7:  { ab, bc, a, b, c }
Frequency threshold = 5:  { abc, ab, bc, ac, a, b, c }
Frequency threshold = 1:  { abc, ab, bc, ac, a, b, c }
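The thresholds in this example can be reproduced with a brute-force count. The sketch below is a naive enumeration over all non-empty item combinations, written only to make the definition concrete; it is not one of the mining algorithms discussed in this talk, and the function name `frequent_itemsets` is an assumption.

```python
from itertools import combinations

# The 11 records from the slide, one string of items per record.
RECORDS = ["abc", "ab", "abc", "bc", "ab", "abc", "c", "abc", "abc", "ab", "bc"]

def frequent_itemsets(records, threshold):
    """Return every non-empty itemset contained in at least `threshold` records."""
    items = sorted(set("".join(records)))
    result = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            support = sum(1 for r in records if set(combo) <= set(r))
            if support >= threshold:
                result.add("".join(combo))
    return result

for threshold in (10, 8, 7, 5, 1):
    print(threshold, sorted(frequent_itemsets(RECORDS, threshold)))
```

The enumeration visits all 2^n - 1 candidate itemsets for n items, which is exactly the cost that practical mining algorithms are designed to avoid.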

Existing itemset mining algorithms
Frequent itemset mining is one of the fundamental data mining problems.
- Apriori [Agrawal1993]: first efficient method of enumerating all frequent patterns. Breadth-first search with dynamic programming.
- Eclat [Zaki1997]: depth-first search algorithm. Less memory-consuming and, in some cases, faster than Apriori.
- FP-growth [Han2000]: depth-first search using the "FP-tree," a graph-based data structure. (Related: ZDD-growth [Minato2006].)
- LCM (Linear time Closed itemset Miner) [Uno2003]: has a theoretical output-linear time bound and is known as one of the fastest implementations.

Problem in LCM (and most of the others)
- LCM (like most of the other itemset mining algorithms) focuses on just enumerating the frequent itemsets.
- How to store and index the resulting huge number of itemsets is a different matter.
- If we want to post-process the mining results, we first have to dump the frequent itemsets into storage.
- Even though LCM is an output-linear time algorithm, it may require impracticable time and space, because the number of solutions may be exponential.
- Usually we control the output size with the minimum support threshold in an ad hoc setting, but we do not know whether this loses some important information.

"LCM over ZDDs" [Minato et al. 2008]
- LCM [Uno2003]: output-linear time algorithm for frequent itemset mining.
- ZDD [Minato93]: a compact graph-based representation for large-scale sets of combinations.
Combination of the two techniques:
- Generates large-scale frequent itemsets in main memory, with a very small overhead over the original LCM.
- Sub-linear time and space in the number of solutions when ZDD compression works well.

LCM over ZDDs: An example
The resulting frequent itemsets are obtained as ZDDs in main memory (without generating a file).
Records (ID: tuple): 1: a b c, 2: a b, 3: a b c, 4: b c, 5: a b, 6: a b c, 7: c, 8: a b c, 9: a b c, 10: a b, 11: b c
LCM over ZDDs with frequency threshold α = 7 yields F = { ab, bc, a, b, c }.
[Figure: the ZDD for F, with nodes for the items a, b, and c above the 0- and 1-terminals.]

[Figure: original LCM compared with LCM over ZDDs, plotted against the number of solutions.]

Performance of LCM over ZDDs
[Figure: bar chart of CPU time (sec), on a 0 to 400 scale, comparing the previous method (LCM-dump) with the new method (LCM over ZDDs) on the datasets mushroom, T10I4D100K, BMS-WebView-1, chess, connect, pumsb, and BMS-WebView-2. One bar exceeds the plotted range and is labeled 3843.06.]
Measured on a Linux PC, Core2Duo E6600, 2.4GHz, 2GB memory.

Post Processing after LCM over ZDDs
[Figure: Dataset 1 and Dataset 2 are each processed by LCM over ZDDs into a ZDD of all frequent itemsets; a ZDD algebraic operation on the two ZDDs yields a ZDD of the distinctive frequent itemsets.]
- We can extract distinctive itemsets by comparing the frequent itemsets of multiple databases.
- Various ZDD algebraic operations can be used to compare the huge numbers of frequent itemsets.
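The comparison step can be illustrated with ordinary Python sets standing in for the ZDDs. This is only a sketch of the set-level logic: dataset 1 is the 11-record example used earlier in the talk, dataset 2 is hypothetical data invented for this illustration, and the brute-force helper `frequent_itemsets` is an assumption, not LCM.

```python
from itertools import combinations

def frequent_itemsets(records, threshold):
    """Brute-force stand-in for a miner: all non-empty itemsets with enough support."""
    items = sorted(set("".join(records)))
    result = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if sum(1 for r in records if set(combo) <= set(r)) >= threshold:
                result.add("".join(combo))
    return result

# Dataset 1: the example records from the talk.
dataset1 = ["abc", "ab", "abc", "bc", "ab", "abc", "c", "abc", "abc", "ab", "bc"]
# Dataset 2: hypothetical records, made up for this sketch.
dataset2 = ["ac", "ac", "ac", "c", "a", "bc", "abc", "ac"]

freq1 = frequent_itemsets(dataset1, threshold=5)
freq2 = frequent_itemsets(dataset2, threshold=5)

distinctive1 = freq1 - freq2   # frequent only in dataset 1
distinctive2 = freq2 - freq1   # frequent only in dataset 2
common = freq1 & freq2         # frequent in both

print(sorted(distinctive1), sorted(distinctive2), sorted(common))
```

With ZDDs, the same difference and intersection are carried out on the compressed graphs, so the itemsets never have to be listed one by one as they are here.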

Conclusion
We presented our recent results on ZDD-based techniques for data mining and knowledge discovery.
- Huge sets of itemsets are compressed automatically.
- They can be processed efficiently with various set operations, without decompression.
- Limitation: no results are obtained when memory overflow occurs.
In the 1990s, BDDs were applied only to the VLSI design area.
- At that time, the main memory capacity was not sufficient for database applications.
- Recently, BDD/ZDD-based techniques have become practicable for many database applications.
We have started a new nation-wide "ERATO" project, "Discrete Structure Manipulation System," promoted by JST, a scientific agency of Japan.
