SLIDE 1

READING THE NEWS THROUGH ITS STRUCTURE: NEW HYBRID CONNECTIVITY BASED APPROACHES

Doctoral Programme in Complexity Sciences (Programa de Doutoramento em Ciências da Complexidade)
Advisor: Professor Jorge Manuel Anacleto Louçã
David Manuel de Sousa Rodrigues
March 17, 2014

SLIDE 2

Outline of presentation

  • Context of this work and Related Work
  • Newspapers
  • Adaptive Networks
  • Q-analysis
  • Community detection
  • Ant Colony Optimisation
  • Hybrid Connectivity Based Approaches
  • Variation of Information and Dynamic Networks
  • Clustering News: Timelines with k-means
  • Clustering News: Community finding with Q-analysis filtering
  • Hamiltonian Paths in Q-analysis eccentricity matrices
  • Conclusions

SLIDE 3

Objectives

  • The thesis presents four approaches to the problem of identifying meaningful structure in the news published online.
  • This is a hard problem because of the high volume of data produced and the potentially high dimensionality of the data collected.

SLIDE 4

Contributions

  • The thesis shows how Hybrid Connectivity Based Approaches give insight into the structure of the news.
  • Adaptive Networks and Mutual Information
  • Clustering with k-means and feature vectors
  • Clustering news with Q-analysis pre-filtering
  • Creating Hamiltonian paths of news using Q-analysis eccentricities as distances
  • New Ant Colony Optimisation algorithm

SLIDE 5

CONTEXT AND RELATED WORK

Part I

SLIDE 6

Context: newspapers (print)

Figure panels: Portuguese circulation; UK circulation

SLIDE 7

Context: newspapers (electronic)

Internet traffic

Internet overtakes print as news outlet

SLIDE 8

Related work: Document analysis

  • Categorisation of documents (Supervised)
  • Machine learning
  • k-nearest neighbours, SVM, neural networks, etc.
  • Clustering (unsupervised)
  • Document navigation
  • Sometimes associated with clustering
  • Information retrieval

SLIDE 9

Related work: Networks

  • Network Science
  • Adaptive Networks (interplay of topology dynamics and local dynamics of networks)
  • Community Detection in Graphs: clustering the nodes of graphs
  • Q-analysis: topological description of the high dimensionality of structures

SLIDE 10

Related: Bio-inspired

  • Swarm Intelligence algorithms
  • Ant Systems
  • Ant Colony Optimisation
  • Travelling Salesman Problem
  • Anti-pheromone ideas
  • Subtractive anti-pheromone (SAP): one pheromone, subtracted from poor solutions
  • Preferential anti-pheromone (PAP): two pheromones, but used to solve bi-criterion optimisation problems

SLIDE 11

HYBRID CONNECTIVITY BASED APPROACHES

Part II

SLIDE 12

Research Opportunities

  • Finding Patterns in Data
  • Community Detection and Adaptive Networks
  • Q-analysis to describe high dimensional structures
  • Bio-inspired heuristics to solve
  • Combining different techniques to produce better algorithms for existing problems.

SLIDE 13

Hybrid Connectivity approaches

  • Hybrid?
  • This thesis proposes approaches that involve multiple techniques. Usually two techniques are used.
  • Connectivity?
  • Data is represented by entities and relations between them.
  • Binary relations (graphs)
  • n-ary relations (hypergraphs, etc.)
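
The difference between binary and n-ary relations can be illustrated with a minimal sketch. The entity names and the helper `project_to_graph` are hypothetical, chosen only for illustration; the slides do not specify any data structure.

```python
from itertools import combinations

def project_to_graph(hyperedges):
    """Flatten n-ary relations into the set of pairwise edges they imply."""
    return {
        frozenset(pair)
        for edge in hyperedges
        for pair in combinations(sorted(edge), 2)
    }

# Hypothetical data: one story linking three entities at once...
one_story = [{"minister", "budget", "strike"}]
# ...versus three separate stories, each linking only two of them.
three_stories = [
    {"minister", "budget"},
    {"budget", "strike"},
    {"minister", "strike"},
]

# Both project to the same triangle, so the graph cannot tell them apart.
same_graph = project_to_graph(one_story) == project_to_graph(three_stories)
```

Here `same_graph` is true: the pairwise projection loses the information that the three entities co-occurred in a single story, which is what an n-ary representation keeps.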

SLIDE 14

TOPIC MONITORING WITH VARIATION OF INFORMATION AND DYNAMIC NETWORKS
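
As a reference point for this part, the Variation of Information between two partitions A and B of the same items is VI(A, B) = H(A|B) + H(B|A). The sketch below computes it in nats from two label lists. The function name and the example labels are assumptions for illustration; the slides do not show how the thesis applies the measure to the adaptive network.

```python
from collections import Counter
from math import log

def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A|B) + H(B|A), in nats, for two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # n_ab / count_a[a] is p(b|a); n_ab / count_b[b] is p(a|b).
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi

# Hypothetical community labels of four nodes in two consecutive snapshots.
unchanged = variation_of_information([0, 0, 1, 1], [0, 0, 1, 1])
reshuffled = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical partitions give a value of zero, and the value grows as the partitions diverge, which is what makes the measure usable as a change signal between snapshots.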

SLIDE 15

Description

SLIDE 16

Main Results

SLIDE 17

CLUSTERING NEWS:

constructing timelines of news with k-means

SLIDE 18

Clustering with k-means

  • Objective: create clustered timelines of news to show how news depends on time.
  • Possibility to trace the origins of stories back in time
  • Create an interface for story navigation
  • Approach: tf-idf feature vectors clustered with k-means
  • Write interactive software for news navigation (part of Theseus)
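
The approach named above can be sketched in plain Python. This is not the Theseus code: the tokenised documents are assumed to be non-empty, the farthest-first initialisation is an illustrative choice to keep the run deterministic, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised tf-idf vectors) for tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] / len(d) * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def timelines(labels, dates):
    """Group document indices by cluster, ordered by publication date."""
    groups = {}
    for idx, (label, date) in enumerate(zip(labels, dates)):
        groups.setdefault(label, []).append((date, idx))
    return {label: [i for _, i in sorted(items)] for label, items in groups.items()}

# Hypothetical mini-corpus: two political and two sports stories.
docs = [
    ["election", "vote", "parliament"],
    ["vote", "election", "minister"],
    ["football", "goal", "match"],
    ["match", "goal", "league"],
]
dates = ["2014-03-02", "2014-03-01", "2014-03-04", "2014-03-03"]
_, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
story_timelines = timelines(labels, dates)
```

Each entry of `story_timelines` is one cluster read in date order, which is the raw material for a navigable timeline. The value of k has to be chosen by the user.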

SLIDE 19

Clustering with k-means

SLIDE 20

CLUSTERING NEWS:

finding communities with Q-analysis filtering
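
A minimal sketch of what Q-analysis filtering can look like, under stated assumptions: each news item is treated as a simplex whose vertices are its keywords, and two items are q-near when they share at least q + 1 keywords. Connected components stand in for the community detection step, which the slides do not specify. The data and function names are hypothetical.

```python
from itertools import combinations

def q_filtered_graph(simplices, q):
    """Nodes are simplices of dimension >= q; edges join q-near simplices."""
    nodes = {name for name, verts in simplices.items() if len(verts) - 1 >= q}
    edges = {
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if len(simplices[a] & simplices[b]) - 1 >= q
    }
    return nodes, edges

def connected_components(nodes, edges):
    """Stand-in for community detection: components of the filtered graph."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical news items described by their keywords.
news = {
    "n1": {"election", "vote", "minister"},
    "n2": {"election", "vote", "budget"},
    "n3": {"budget", "tax"},
    "n4": {"football"},
}
nodes_q1, edges_q1 = q_filtered_graph(news, q=1)
fraction_kept = len(nodes_q1) / len(news)
clusters_q1 = connected_components(nodes_q1, edges_q1)
```

Raising q discards weakly connected items and weak links before clustering, at the cost of keeping a smaller fraction of the vertices. The threshold q is a user-defined parameter.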

SLIDE 21

Clustering with no filtering

SLIDE 22

Fraction of vertices in resulting graphs

SLIDE 23

Fraction of vertices in maximal cluster in relation to that particular subgraph

SLIDE 24

Number of Clusters

SLIDE 25

Modularity of the resulting clustering
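
For readers unfamiliar with the measure, the sketch below computes the Newman-Girvan modularity of a partition of an undirected, unweighted graph: the sum over communities of (internal edges / m) minus (total degree / 2m) squared, where m is the number of edges. That this is the exact variant plotted on the slide is an assumption; the graph and names are hypothetical.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity for an undirected, unweighted edge list."""
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = {}  # edges with both ends in the same community
    degree = {}    # summed degree of each community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0) + 1
        degree[cb] = degree.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items()
    )

# Hypothetical graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the graph at the bridge scores higher than putting every node in one community, which scores zero.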

SLIDE 26

Software developed for visualisation of case study (on CD)

SLIDE 27

HAMILTONIAN PATHS IN Q-ANALYSIS ECCENTRICITY MATRICES

SLIDE 28

Two threads

  • Development of a novel Travelling Salesman Problem algorithm
  • In collaboration with Vitorino Ramos [Rodrigues, 2011, Ramos 2011, Ramos 2013]
  • Application of Q-analysis eccentricity matrices as distance matrices in the construction of directed Hamiltonian paths in the TSP.
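
The second thread can be sketched as follows. The pairwise eccentricity used here, ecc(a, b) = (dim(a) - q_ab) / (q_ab + 1) with q_ab the dimension of the face shared by a and b, is an assumed pairwise reading of Atkin's eccentricity and may differ from the definition in the thesis. It is asymmetric, which is why the resulting path is directed. A greedy nearest-neighbour walk stands in for the ant algorithm; data and names are hypothetical.

```python
import math

def eccentricity_matrix(simplices):
    """Assumed pairwise eccentricity between simplices given as vertex sets."""
    names = sorted(simplices)
    ecc = {}
    for a in names:
        for b in names:
            if a == b:
                continue
            shared_dim = len(simplices[a] & simplices[b]) - 1
            own_dim = len(simplices[a]) - 1
            if shared_dim < 0:
                ecc[a, b] = math.inf  # nothing shared: infinitely eccentric
            else:
                ecc[a, b] = (own_dim - shared_dim) / (shared_dim + 1)
    return names, ecc

def greedy_path(names, dist, start):
    """Baseline directed Hamiltonian path: always move to the nearest unvisited item."""
    path = [start]
    remaining = set(names) - {start}
    while remaining:
        here = path[-1]
        nxt = min(sorted(remaining), key=lambda n: dist[here, n])
        path.append(nxt)
        remaining.remove(nxt)
    return path

# Hypothetical news items described by their keywords.
news = {"a": {"x", "y", "z"}, "b": {"x", "y"}, "c": {"y", "w"}}
names, ecc = eccentricity_matrix(news)
reading_order = greedy_path(names, ecc, start="a")
```

The path visits every news item exactly once, moving at each step to the item from which the current one is least eccentric, so it can be read as a suggested reading order.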

SLIDE 29

2nd Order Swarm Intelligence

  • Pharaoh's ants (Monomorium pharaonis) deposit a pheromone as a 'no entry' signal to mark unrewarding foraging paths.

  • Double Pheromone Model on top of traditional ACS.
  • Traditional positive reinforcement pheromone
  • Use of Negative Pheromone to block bad paths.
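
The double pheromone idea can be sketched on a small TSP. This is a simplified Ant System style loop, not the ACS variant of the thesis: the transition rule (positive pheromone and inverse distance, damped by one plus the negative pheromone), the update amounts, and all parameter names and values are assumptions made for illustration.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def deposit(matrix, tour, amount):
    """Add `amount` to both directions of every edge of a closed tour."""
    for i in range(len(tour)):
        a, b = tour[i], tour[(i + 1) % len(tour)]
        matrix[a][b] += amount
        matrix[b][a] += amount

def double_pheromone_tsp(dist, ants=10, iterations=50, alpha=1.0, beta=2.0,
                         gamma=1.0, rho=0.1, seed=0):
    """Toy two-pheromone ant search; expects at least two cities."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[1.0] * n for _ in range(n)]  # positive, attracting pheromone
    neg = [[0.0] * n for _ in range(n)]  # negative, "no entry" pheromone
    best_tour, best_len = None, math.inf
    for _ in range(iterations):
        tours = []
        for _ant in range(ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                candidates = sorted(unvisited)
                # Assumed rule: attraction from pos and 1/distance, damped by neg.
                weights = [
                    pos[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                    / (1.0 + neg[i][j]) ** gamma
                    for j in candidates
                ]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append((tour_length(tour, dist), tour))
        tours.sort()
        (good_len, good), (bad_len, bad) = tours[0], tours[-1]
        if good_len < best_len:
            best_tour, best_len = good, good_len
        # Evaporate both pheromones, then reinforce the best tour of this
        # iteration and mark the worst one with negative pheromone.
        for row in pos + neg:
            for j in range(n):
                row[j] *= 1.0 - rho
        deposit(pos, good, 1.0 / good_len)
        if bad_len > good_len:
            deposit(neg, bad, 1.0 / good_len)
    return best_tour, best_len

# Hypothetical instance: the four corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
tour, length = double_pheromone_tsp(dist)
```

The negative pheromone only lowers the probability of an edge and never forbids it, so a path marked as bad can still be rediscovered once the mark evaporates.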

SLIDE 30

Results – Static problems

SLIDE 31

Influence of negative pheromone

SLIDE 32

Application to dynamic problems: recovery patterns

SLIDE 33

Application to the News

SLIDE 34

Software Developed (on CD)

SLIDE 35

CONCLUSIONS

SLIDE 36

Main Contributions of this work

  • 4 approaches based on the connectivity of the system that reveal the underlying structure of the news.
  • Each has advantages and disadvantages

SLIDE 37

SLIDE 38

Main Contributions of this work

  • 4 approaches based on the connectivity of the system that reveal the underlying structure of the news.
  • Each has advantages and disadvantages
  • New bio-inspired optimisation algorithm for the TSP (adaptable to new problems)
  • Software for gathering, processing, and visualising these systems (Theseus)

SLIDE 39

Limitations and Perspectives

  • A posteriori analysis: AN+VI
  • User-defined parameters to fit the models: q in Q-analysis filtering, k in k-means, TTL in AN+VI
  • Size of data matrices: feature vectors in k-means or eccentricity matrices in SOSI
  • Information-theory-based measures as signal detectors for change
  • Bio-inspired methods for new swarm intelligence algorithms
  • Topology-based methods to reduce the space of exploration of solutions

SLIDE 40

Limitations and Perspectives

  • There is no universal solution or general panacea applicable to complex systems.
  • Hybrid approaches have both advantages and disadvantages.
  • Complexity practitioners need to engineer problem-driven solutions.
  • Q-analysis and any other low-level description that manipulates the data as little as possible are important.
  • Bio-inspired algorithms are useful.
  • Traditional combinatorial algorithms will cope badly with the exponential growth of data.
