DSM-TKP: Mining Top-K Path Traversal Patterns over Web Click-Streams

Hua-Fu Li a, Suh-Yin Lee a, and Man-Kwan Shan b
a Department of Computer Science and Information Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan
{hfli, sylee}@csie.nctu.edu.tw
b Department of Computer Science, National Chengchi University, Taipei 116, Taiwan
mkshan@cs.nccu.edu.tw

Abstract

Online, single-pass mining of Web click-streams poses some interesting computational issues, such as the unbounded length of streaming data, a possibly very fast arrival rate, and just one scan over previously arrived click-sequences. In this paper, we propose a new single-pass algorithm, called DSM-TKP (Data Stream Mining for Top-K Path traversal patterns), for mining top-k path traversal patterns, where k is the desired number of path traversal patterns to be mined. An effective summary data structure called TKP-forest (Top-K Path forest) is used to maintain the essential information about the top-k path traversal patterns of the click-stream so far. Experimental studies show that the DSM-TKP algorithm has stable memory usage and makes only one pass over the streaming data.

1. Introduction

Recently, the database and data mining communities have focused on a new data model, where data arrive in the form of continuous streams. It is often referred to as data streams or streaming data. Mining such streaming data poses some interesting computational issues, such as the unknown or unbounded length of the stream, a possibly very fast arrival rate, and the inability to backtrack over previously arrived data elements [2, 7]. Many applications generate data streams in real time, such as sensor data generated from sensor networks, transaction flows in retail chains, Web records and click-streams in Web applications, performance measurements in network monitoring and traffic management, call records in telecommunications, and so on.

Mining clusters in evolving Web click-streams has been discussed in recent years [10, 11]. In this paper, we study the problem of mining top-k path traversal patterns in Web click-streams. The original problem of mining path traversal patterns from a large static Web click-dataset was proposed by Chen et al. [3]. Recently, Li et al. [6] proposed the first single-pass algorithm, called StreamPath, to mine the set of all path traversal patterns over continuous Web click-streams. The StreamPath algorithm requires a user-specified minimum support threshold minsup, and then mines the path traversal patterns whose estimated support values are higher than that threshold. Unfortunately, setting the minimum support threshold is quite tricky, and this leads to the following problem that may hinder its popular use.

If the value of the minimum support threshold is too small, the pattern mining algorithm may generate thousands of patterns, whereas a value that is too large may generate only a few patterns or even no answers. As it is difficult to predict how many patterns will be mined with a user-defined minimum support threshold, top-k pattern mining has been proposed.

The first top-k pattern mining algorithm, Itemset-Loop, was proposed by Fu et al. [5]. The Itemset-Loop algorithm mines the k most frequent itemsets with lengths shorter than a user-defined value of m. LOOPBACK and BOMO are FP-tree-based top-k pattern mining algorithms [4], and use the same estimation mechanism as Itemset-Loop. Moreover, experiments in [4] show that LOOPBACK and BOMO outperform Itemset-Loop. The TFP algorithm [11] is an FP-tree-based algorithm and mines the top-k closed frequent itemsets with lengths longer than a user-specified value of min_l. TSP [10] is the first algorithm to mine the top-k closed sequential patterns of lengths no less than the user-defined minimum length of mined patterns min_l. Recently, Metwally et al. [9] proposed a single-pass algorithm to mine the top-k elements over data streams. However, the top-k elements are top-k items.

In this paper, we propose an efficient single-pass algorithm called DSM-TKP (Data Stream Mining for Top-K Path traversal patterns) to mine the top-k path traversal patterns over Web click-streams. An effective summary data structure called TKP-forest (Top-K Path forest) and an efficient structure pruning mechanism called KP (K Pruning) are proposed to overcome data stream mining issues such as the bounded space requirement and approximation. Based on our knowledge, DSM-TKP is
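
The trade-off described in the introduction, between choosing a minimum support threshold and asking directly for the k most frequent patterns, can be illustrated with a small sketch. This is not the DSM-TKP algorithm and does not use a TKP-forest. It counts patterns exactly in memory and, as a simplifying assumption, treats a path traversal pattern as any contiguous subsequence of a click sequence. The function names and the toy sessions are hypothetical.

```python
from collections import Counter


def contiguous_subpaths(session):
    """Yield every contiguous subpath (length >= 1) of one click sequence."""
    n = len(session)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield tuple(session[i:j])


def count_paths(sessions):
    """Count, for each subpath, how many sessions contain it (exact, in memory)."""
    counts = Counter()
    for session in sessions:
        # set() so a subpath is counted at most once per session
        counts.update(set(contiguous_subpaths(session)))
    return counts


def by_minsup(counts, n_sessions, minsup):
    """Threshold-based selection: keep subpaths whose support is >= minsup."""
    return {p: c for p, c in counts.items() if c / n_sessions >= minsup}


def top_k(counts, k):
    """Top-k selection: the k most frequent subpaths, however frequent they are."""
    return counts.most_common(k)


# Hypothetical toy click sequences (page identifiers are made up).
sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"], ["b", "c"]]
counts = count_paths(sessions)

print(len(by_minsup(counts, len(sessions), 0.25)))  # low threshold: many patterns
print(len(by_minsup(counts, len(sessions), 1.0)))   # high threshold: very few
print(len(top_k(counts, 3)))                        # size is fixed by k
```

With a threshold, the size of the result depends on the data and is hard to predict in advance. With top-k, the user fixes the result size directly, which is the motivation the text gives for top-k pattern mining.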

