Network Analytics ER Model Towards a Conceptual View of Network - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Network Analytics ER Model Towards a Conceptual View of Network Analytics

Qing Wang Research School of Computer Science The Australian National University Australia qing.wang@anu.edu.au


slide-2
SLIDE 2

A Question

  • What is the role of conceptual modelling in big-data analytics, such as network analysis?

[Figure: "Conceptual modelling" and "Network analysis" joined by a question mark. The images are taken from Google Images.]

slide-3
SLIDE 3

Motivating Example

  • Let’s start with a traditional ER model:

[Figure: ER diagram with entity types ARTICLE, AUTHOR, JOURNAL and CONFERENCE, and relationship types CITE, WRITE and PUBLISH.]

slide-4
SLIDE 4

Motivating Example

  • Queries in a bibliographical network:
      • Collaborative communities
      • Most influential articles
      • Top-k influential researchers
      • Correlation of journal citations
      • ...

[Figure: ER diagram with entity types ARTICLE, AUTHOR, JOURNAL and CONFERENCE, and relationship types CITE, WRITE and PUBLISH.]

slide-5
SLIDE 5

Motivating Example

  • Queries in a bibliographical network:
      • Collaborative communities
      • Most influential articles
      • Top-k influential researchers
      • Correlation of journal citations
      • ...
  • Some questions:
      – Semantic integrity: Are the queries semantically relevant and consistent?
      – Analysis efficiency: Can the efficiency be improved by leveraging their semantics at the conceptual level?
      – Network dynamics: Can the queries be performed dynamically so as to predict trends?

slide-6
SLIDE 6

Network Analytics ER Model

  • We propose the Network Analytics ER Model (NAER), which extends traditional ER models in three aspects:
      • Structure, i.e., analytical types are added.
      • Manipulation, i.e., topological constructs are added.
      • Integrity, i.e., semantic constraints are extended.

slide-7
SLIDE 7

The NAER Model - Structure

  • Base types vs analytical types
      • Base types: from the data management perspective, i.e., how to control data
      • Analytical types: from the data analysis perspective, i.e., how to use data

    Base types           Analytical types
    Base entity          Analytical entity
    Base relationship    Analytical relationship

  • Base types are the root from which analytical types can be derived.

slide-8
SLIDE 8

The NAER Model - Example 1

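  • Sco for the query collaborative communities:
      • supp(author∗) = {author}
      • supp(coauthorship) = {author, article, write}

[Figure: the core ER diagram (ARTICLE, AUTHOR, JOURNAL, CONFERENCE, CITE, WRITE, PUBLISH) alongside the topology schema Sco, in which the analytical types AUTHOR* and COAUTHORSHIP are supported by AUTHOR, WRITE and ARTICLE.]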
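As an illustration of how an analytical relationship could be derived from its support, here is a minimal Python sketch. It assumes, hypothetically, that the base relationship write is stored as (author, article) pairs and that two authors are linked by coauthorship when they wrote the same article. The function name and sample data are invented for the example and are not part of the NAER model.

```python
from collections import defaultdict
from itertools import combinations


def derive_coauthorship(write):
    """Derive undirected coauthorship edges from (author, article) pairs."""
    authors_by_article = defaultdict(set)
    for author, article in write:
        authors_by_article[article].add(author)

    edges = set()
    for authors in authors_by_article.values():
        # Every pair of authors sharing an article becomes one edge,
        # stored in sorted order so each edge appears only once.
        edges.update(combinations(sorted(authors), 2))
    return edges


# Hypothetical instance of the base relationship WRITE.
write = [("ann", "p1"), ("bob", "p1"), ("bob", "p2"), ("cat", "p2"), ("dan", "p3")]
coauthorship = derive_coauthorship(write)
print(sorted(coauthorship))
```

Only author, article and write are read, which matches supp(coauthorship); the rest of the core schema is never touched.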


slide-9
SLIDE 9

The NAER Model - Example 2

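  • Sci for the queries most influential articles and top-k influential researchers:
      • supp(article∗) = {article}
      • supp(citation) = {article, cite}

[Figure: the core ER diagram (ARTICLE, AUTHOR, JOURNAL, CONFERENCE, CITE, WRITE, PUBLISH) alongside the topology schema Sci, in which the analytical types ARTICLE* and CITATION (with roles "from" and "to") are supported by ARTICLE and CITE.]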

slide-10
SLIDE 10

The NAER Model - Example 3

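  • Sjo for the query on correlation of journal citations:
      • supp(journal∗) = {journal}
      • supp(cocitation) = {article, cite, journal, publish}

[Figure: the core ER diagram (ARTICLE, AUTHOR, JOURNAL, CONFERENCE, CITE, WRITE, PUBLISH) alongside the topology schema Sjo, in which the analytical types JOURNAL* and COCITATION are supported by ARTICLE, CITE, JOURNAL and PUBLISH.]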
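The slides do not spell out how cocitation is computed from its support. The sketch below assumes one common reading: two journals are co-cited when a single article cites articles published in both. The data layout, with cite as (citing, cited) pairs and publish as (article, journal) pairs, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def derive_cocitation(cite, publish):
    """Derive journal co-citation edges (assumed definition, see lead-in)."""
    journal_of = dict(publish)  # article -> journal it was published in

    cited_journals = defaultdict(set)
    for citing, cited in cite:
        # Articles without a journal (e.g. conference papers) are skipped.
        if cited in journal_of:
            cited_journals[citing].add(journal_of[cited])

    edges = set()
    for journals in cited_journals.values():
        edges.update(combinations(sorted(journals), 2))
    return edges


# Hypothetical instances of PUBLISH and CITE.
publish = [("a1", "J1"), ("a2", "J2"), ("a3", "J3"), ("a4", "J1")]
cite = [("a3", "a1"), ("a3", "a2"), ("a4", "a2")]
cocitation = derive_cocitation(cite, publish)
print(sorted(cocitation))
```

Under this reading the derivation needs all four base types listed in supp(cocitation), which is why this support is larger than that of coauthorship or citation.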


slide-11
SLIDE 11

The NAER Model - Manipulation

  • Topological constructs are used to specify topological structures hidden underneath base entities and relationships:
      (1) cluster-by classifies elements into a set of clusters.
      (2) rank-by assigns rankings to elements.
  • A topological measure is used in each topological construct:
      • centrality – Cent: A → N, describing how central elements are in A, such as degree, betweenness and closeness centrality.
      • similarity – Simi: A×A → N, describing the similarity between two elements in A, such as q-gram, adjacency-based and distance-based similarity.
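To make the two kinds of measure concrete, the following sketch implements one instance of each over an undirected edge list: degree centrality, and an adjacency-based similarity taken here to be the Jaccard overlap of neighbour sets. That choice is an assumption, since the slides do not fix a formula, and it returns a fraction rather than a value in N. All function names are hypothetical.

```python
def build_adjacency(edges):
    """Map each element to the set of its neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cent_degree(adj, a):
    """Centrality measure: number of neighbours of element a."""
    return len(adj.get(a, set()))


def simi_adjacency(adj, a, b):
    """Similarity measure: Jaccard overlap of the neighbour sets of a and b."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical coauthorship network.
adj = build_adjacency([("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")])
print(cent_degree(adj, "cat"))
print(simi_adjacency(adj, "ann", "dan"))
```

A construct such as rank-by or cluster-by would take one of these functions as its measure argument.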


slide-12
SLIDE 12

The NAER Model - Examples

  • Each collaborative community is a group of authors in a network over Sco, measured by closeness centrality.
        cluster-by(Sco, author∗, cent-closeness).
  • The influence of an article is ranked in terms of a network over Sci, measured by indegree centrality.
        rank-by(Sci, article∗, cent-indegree).
  • Each correlation group contains journals that are correlated in a network over Sjo, measured by betweenness centrality.
        cluster-by(Sjo, journal∗, cent-betweenness).
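A rough Python sketch of what rank-by(Sci, article∗, cent-indegree) might compute is given below. It assumes citation is stored as (from, to) pairs and uses dense ranking, where ties share a rank. The tie-handling rule and the function name are assumptions, not something the slides prescribe.

```python
from collections import Counter


def rank_by_indegree(articles, citation):
    """Rank articles by how often they are cited; rank 1 is the most cited."""
    indegree = Counter(cited for _, cited in citation)

    # Sort by descending indegree, then by name for a stable order.
    ordered = sorted(articles, key=lambda a: (-indegree[a], a))

    ranks = {}
    rank = 0
    previous = None
    for article in ordered:
        if indegree[article] != previous:
            rank += 1
            previous = indegree[article]
        ranks[article] = rank
    return ranks


# Hypothetical network over Sci.
articles = ["a1", "a2", "a3", "a4"]
citation = [("a2", "a1"), ("a3", "a1"), ("a4", "a1"), ("a3", "a2"), ("a4", "a2")]
ranks = rank_by_indegree(articles, citation)
print(ranks)
```

Because every article in the input receives a rank, this particular sketch always produces a total ranking.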


slide-13
SLIDE 13

The NAER Model - Integrity

  • Integrity constraints over topological constructs:
      • disjoint (resp. overlapping) on cluster-by: clusters must be disjoint (resp. can be overlapping).
      • connected on cluster-by: for each cluster, there is a path between each pair of its members, running only through elements of the cluster.
      • edge-density on cluster-by: for each cluster, its members have more edges inside the cluster than edges to elements outside the cluster.
      • total (resp. partial) on rank-by: every element must be (resp. need not be) ranked.
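These constraints can be checked mechanically once clusters and rankings are available. The sketch below is one possible set of checkers, assuming clusters are Python sets, the network is an undirected edge list, and edge-density compares the total number of edges inside a cluster with the total number of edges leaving it. The slide wording could also be read per member, so that last reading is an assumption. All names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_disjoint(clusters):
    """disjoint: no element belongs to two clusters."""
    seen = set()
    for cluster in clusters:
        if seen & cluster:
            return False
        seen |= cluster
    return True


def is_connected(cluster, adj):
    """connected: every pair of members is joined by a path inside the cluster."""
    if not cluster:
        return True
    start = next(iter(cluster))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, set()) & cluster:  # never leave the cluster
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == cluster


def has_edge_density(cluster, adj):
    """edge-density: strictly more internal edges than outgoing edges."""
    inside = sum(len(adj.get(u, set()) & cluster) for u in cluster) // 2
    outside = sum(len(adj.get(u, set()) - cluster) for u in cluster)
    return inside > outside


def is_total(ranks, elements):
    """total: every element has a rank."""
    return all(e in ranks for e in elements)


# Hypothetical network and clustering.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
clusters = [{"a", "b", "c"}, {"d", "e"}]
print(is_disjoint(clusters), [is_connected(c, adj) for c in clusters])
print([has_edge_density(c, adj) for c in clusters])
```

In this example the second cluster is connected but fails edge-density, since it has one internal edge and one edge leaving it.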


slide-14
SLIDE 14

Analytical Framework

  • Our analytical framework has three components:
      – A relatively large core schema, i.e., base entity and relationship types
      – A number of small topology schemas, i.e., analytical entity and relationship types
      – A collection of query topics, i.e., trees, each representing a hierarchy of query object classes
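One way to picture the first two components is as plain data: a set of base type names for the core schema, and, for each topology schema, the support of its analytical types. The sketch below uses hypothetical class and method names and the supports listed on the example slides. It checks that every analytical type is supported only by base types that exist in the core schema.

```python
from dataclasses import dataclass, field


@dataclass
class TopologySchema:
    name: str
    supp: dict  # analytical type name -> set of supporting base type names


@dataclass
class Framework:
    core: set  # names of base entity and relationship types
    topologies: list = field(default_factory=list)

    def undefined_support(self):
        """Return {(schema, analytical type): base types missing from the core}."""
        problems = {}
        for schema in self.topologies:
            for analytical, base in schema.supp.items():
                missing = base - self.core
                if missing:
                    problems[(schema.name, analytical)] = missing
        return problems


core = {"article", "author", "journal", "conference", "cite", "write", "publish"}
framework = Framework(core, [
    TopologySchema("Sco", {"author*": {"author"},
                           "coauthorship": {"author", "article", "write"}}),
    TopologySchema("Sci", {"article*": {"article"},
                           "citation": {"article", "cite"}}),
    TopologySchema("Sjo", {"journal*": {"journal"},
                           "cocitation": {"article", "cite", "journal", "publish"}}),
])
print(framework.undefined_support())
```

Keeping each topology schema as a small support map over one shared core reflects the split between a relatively large core schema and a number of small topology schemas.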


slide-15
SLIDE 15

Analytical Framework

[Figure: the analytical framework, with parts labelled Query, Core Schema, Topology Schemas and Query Topics. The core schema contains ARTICLE, AUTHOR, JOURNAL, CONFERENCE, CITE, WRITE and PUBLISH. The topology schemas are Sco (AUTHOR*, COAUTHORSHIP), Sci (ARTICLE*, CITATION with roles "from" and "to") and Sjo (JOURNAL*, COCITATION). The query topics have the nodes Collaborative community, Influence of article, Influential article (VLDB), VLDB article, Correlation group, Influential researcher (top k) and Researcher.]

slide-16
SLIDE 16

Design Principles

  • But how should we design such an analytical framework in practice?
      (1) Identify data requirements
      (2) Design the core schema based on the data requirements
      (3) Identify query requirements
      (4) Design topology schemas based on the query requirements
      (5) Identify constraints

slide-17
SLIDE 17

Design Principles – Questions

Question I: What are data and query requirements?

  • Data and queries are two different kinds of requirements.
  • Queries in network analytics (NA) applications may exist in various forms, e.g.,
      • database queries in the traditional sense
      • analysis queries from a topological perspective
      • a combination of database and analysis queries
  • When designing a conceptual model for NA applications, we are particularly interested in analysis queries.

slide-18
SLIDE 18

Design Principles – Questions

Question II: How are query requirements and query topics related?

  • Queries need to be analyzed to unravel:
      • the semantic structure of a query
      • the semantic structure among a set of queries
  • Each query Q is associated with a query topic tree t(Q).
  • If t(Q1) and t(Q2) coincide over some nodes, then the two queries Q1 and Q2 are related.
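The relatedness test can be sketched in a few lines of Python, assuming each query topic tree is stored as a mapping from a node to its children. The tree shapes below are hypothetical and only reuse topic names that appear in the slides; the actual hierarchies are those drawn in the figures.

```python
def nodes(tree):
    """Collect every node of a topic tree given as {parent: [children]}."""
    found = set()
    for parent, children in tree.items():
        found.add(parent)
        found.update(children)
    return found


def related(t1, t2):
    """Two queries are related if their topic trees share at least one node."""
    return bool(nodes(t1) & nodes(t2))


# Hypothetical topic trees; the parent/child structure is assumed.
t_q1 = {"Collaborative community": []}
t_q2 = {"Influence of article": ["Influential article (VLDB)"]}
t_q4 = {"Influential researcher (top k)": ["Influence of article", "Researcher"]}

print(related(t_q2, t_q4))
print(related(t_q1, t_q2))
```

Shared nodes are where a result computed once could serve more than one query.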


slide-19
SLIDE 19

Design Principles – Questions

[Figure: the query topic trees t(Q1), t(Q2), t(Q3) and t(Q4), shown in two panels (a) and (b), over the nodes Collaborative community, Influence of article, Influential article (VLDB), VLDB article, Correlation group, Researcher and Influential researcher (top 10 in one panel, top k in the other).]

slide-20
SLIDE 20

Design Principles – Questions

Question III: How are the core and topology schemas designed?

  • Central idea:
      (1) Data requirements should be captured by the core schema.
      (2) Query requirements should be captured by a collection of topology schemas.
  • Two criteria for designing topology schemas:
      • Topology schemas should be small.
      • Topology schemas should be dynamic.

slide-21
SLIDE 21

Composition of Topology Schemas

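(a) Composed through an analytical type, i.e., HAS

[Figure: the topology schemas Sci (ARTICLE*), Sjo (JOURNAL*) and Sco (AUTHOR*), composed through the analytical relationship type HAS.]

(b) Composed through a base type, i.e., WRITE and PUBLISH

[Figure: Sco (AUTHOR*) and Sci (ARTICLE*) composed through WRITE; Sci (ARTICLE*) and Sjo (JOURNAL*) composed through PUBLISH.]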

slide-22
SLIDE 22

Conclusions and Future Work

  • We proposed the NAER model, a conceptual modelling paradigm that incorporates both data and query requirements of network analysis. It helps us to:
      • better understand the semantics of data and queries, and how they interact with each other;
      • avoid unnecessary computations in network analysis queries;
      • support comparative network analysis.
  • We plan to implement the NAER model over network analysis applications:
      • establish an analytical framework;
      • incorporate a query engine for processing topic-based queries.