Modeling Topic Diffusion in Multi-Relational Bibliographic Information Networks
Huan Gui, Yizhou Sun, Jiawei Han, George Brova (UIUC)
SLIDE 1
SLIDE 2
Multi-relational Information Networks
- In the real world, objects are connected via
different types of relationships, forming multi-relational heterogeneous information networks
- E.g.
– In a bibliographic information network, researchers can be linked via different types of relationships
- collaboration, citation, sharing
common co-authors, co-attending conferences, etc.
– In the social network case, people are connected
- via friendships, colleague relationships, family relationships,
etc.
SLIDE 3
Multi-relational Information Networks
SLIDE 4
Goal of this paper
- They address the problem of modeling
information diffusion in multi-relational information networks
– Propose a multi-relational diffusion model
- Propose two models by extending the Linear Threshold
model
– Learn the parameters of the diffusion model
- Learning from an action log (a sequence of object sets
recording when each object is activated)
- Using MLE
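The following is a minimal sketch of the classic Linear Threshold model, with one hypothetical way to combine several relation types: a weighted sum of per-relation influence. The mixing weights `beta`, the function name `linear_threshold`, and the toy network are illustrative assumptions, not the two models proposed in the paper. The returned list has the same shape as the action log described above: one set of newly activated objects per time step.

```python
def linear_threshold(in_neighbors, beta, thresholds, seeds):
    """Simulate Linear Threshold diffusion over several relation types.

    in_neighbors[r][v] maps each in-neighbor u of v under relation r
    to an edge weight. beta[r] is an assumed mixing weight for relation r.
    Returns an action log: a list of sets, one per time step.
    """
    active = set(seeds)
    log = [set(seeds)]
    while True:
        newly = set()
        for v, theta in thresholds.items():
            if v in active:
                continue
            # Total influence on v: weighted sum over relations of the
            # edge weights coming from already-active in-neighbors.
            influence = sum(
                beta[r] * sum(w for u, w in nbrs.get(v, {}).items() if u in active)
                for r, nbrs in in_neighbors.items()
            )
            if influence >= theta:
                newly.add(v)
        if not newly:
            return log
        active |= newly
        log.append(newly)


# Toy network with two relation types (all values are made up).
in_neighbors = {
    "coauthor": {"b": {"a": 0.5}, "c": {"b": 0.5}},
    "citation": {"b": {"a": 0.5}, "c": {"a": 0.2}},
}
beta = {"coauthor": 0.5, "citation": 0.5}
thresholds = {"a": 0.5, "b": 0.4, "c": 0.3}
action_log = linear_threshold(in_neighbors, beta, thresholds, seeds={"a"})
print(action_log)
```

In this toy run, `c` is not activated by `a` alone; it crosses its threshold only after `b` becomes active, which is the cumulative behavior that distinguishes threshold models from independent-cascade models.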
SLIDE 5
Dataset
- They extracted topics from papers’ titles and abstracts:
– 79 topics in the DBLP dataset, and 30 topics in the APS dataset
– Study diffusion of these topics during selected periods when the topics show increasing popularity
SLIDE 6
Distributed Graph Summarization
SLIDE 7
Graph Summarization
- Give a compressed representation of the graph
SLIDE 8
Distributed graph processing systems
- Giraph: an open source implementation of
Pregel [8] proposed by Google
– This paper
- Others
– GraphLab: proposed by Carlos Guestrin
– Trinity: A Distributed Graph Engine on a Memory Cloud [SIGMOD 2013] by Microsoft Research Asia
- Other distributed systems from the database community
– Hadoop: open-source implementation of Google's MapReduce
– Hyracks: by Michael Carey et al. (ICDE 2011)
SLIDE 9
Algorithm
SLIDE 10
MapReduce Triangle Enumeration With Guarantees
SLIDE 11
Idea
- Divide the graph into multiple overlapping partitions,
and distribute each partition to a mapper
- Based on TTP (Triangle Type Partition)
algorithm [CIKM 2013]
- Using multiple rounds to reduce the memory
cost
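The partitioning idea can be illustrated with a simplified, single-machine sketch. It assigns each vertex a color, builds one overlapping partition per color triple, enumerates triangles locally in each partition, and reports a triangle only in the partition that matches its own color triple so that nothing is emitted twice. This is not TTP or CTTP themselves: the function name, the use of `hash` in place of a random coloring, and the single-round structure are assumptions made for illustration, and none of the memory or round optimizations are attempted.

```python
from itertools import combinations_with_replacement


def enumerate_triangles_partitioned(edges, num_colors, color_of=None):
    """Enumerate triangles via overlapping color-based partitions."""
    if color_of is None:
        # Stand-in for a random vertex coloring (assumption).
        color_of = lambda v: hash(v) % num_colors
    edge_set = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    triangles = []
    for colors in combinations_with_replacement(range(num_colors), 3):
        allowed = set(colors)
        # "Map" step: an edge belongs to this partition if both of its
        # endpoints carry one of the partition's colors.
        adj = {}
        for u, v in edge_set:
            if color_of(u) in allowed and color_of(v) in allowed:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # "Reduce" step: enumerate triangles u < v < w inside the partition.
        for u in adj:
            for v in adj[u]:
                if u >= v:
                    continue
                for w in adj[u] & adj[v]:
                    if v >= w:
                        continue
                    tri = (u, v, w)
                    # Emit only in the partition whose color triple equals
                    # the triangle's own sorted colors (exactly once).
                    if tuple(sorted(color_of(x) for x in tri)) == colors:
                        triangles.append(tri)
    return sorted(triangles)


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
found = enumerate_triangles_partitioned(edges, num_colors=2)
print(found)
```

Because every partition holds only the edges among three color classes, each local enumeration works on a fraction of the graph, at the price of sending some edges to several partitions.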
SLIDE 12
Contributions
- They propose Colored Triangle Type Partition
(CTTP), a multi-round MapReduce randomized algorithm for triangle enumeration
– Require rounds in the worst case
- E is the total number of edges
- m denotes the expected memory size of a reducer
- M the total available space.
– use M/E space per mapper, m space per reducer, and M words as total aggregate space
SLIDE 13
Results
They are the first to get the result for this graph
SLIDE 14
Component Detection in Directed Networks
SLIDE 15
Directional community
- They propose a novel concept of communities,
directional community
– nodes play two different roles, source and terminal, in a directed network
SLIDE 16
Proposed Methods
- They modify Markov Clustering (MCL) and its
variant, R-MCL
- Based on a simulation of stochastic flows on the
network
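For reference, the flow simulation behind standard MCL alternates two steps on a column-stochastic matrix: expansion (matrix squaring, which spreads flow) and inflation (element-wise powering followed by column normalization, which strengthens strong flows). The sketch below is the textbook undirected procedure in plain Python, not the directional variant proposed in the paper; the function names, the inflation value of 2, the fixed iteration count, and the cluster-extraction threshold are all assumptions.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix (two-step random-walk flow)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def extract_clusters(m, eps=1e-6):
    """Group nodes that still exchange non-negligible flow (union-find)."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def mcl(adjacency, inflation=2.0, iterations=50):
    """Run standard MCL on an adjacency matrix given as a list of lists."""
    n = len(adjacency)
    # Add self-loops, a common MCL preprocessing step.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    return extract_clusters(m)


# Two triangles joined by a single bridge edge (2-3).
bridged = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 1, 0, 0],
           [0, 0, 1, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
clusters = mcl(bridged)
print(clusters)
```

Higher inflation values generally produce finer clusterings, so the inflation parameter is the main knob to tune.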
SLIDE 17
Case Study: Twitter
- Detecting Communities from Twitter Interaction
Network
– a directed edge from a source node to a terminal node is created if any of the following interactions happens
- retweets (forwards) a tweet
- replies to a tweet
- mentions someone
SLIDE 18
Case Study: Twitter
- Source: post some tweets
- Terminal: spread the tweets
This hashtag represents the “No vull pagar” (“I don’t want to pay”) campaign, a protest in Catalonia in early April 2012