The network-untangling problem: From interactions to activity timelines
Polina Rozenshtein (Nordea DS Lab, Finland)
Nikolaj Tatti (University of Helsinki, Finland)
Aristides Gionis (Aalto University, Finland)
ECML/PKDD’17 + journal extension

Temporal networks
• Temporal graph $G = (V, E)$
• $V$ – set of entities (e.g., people, sensors, locations)
• Edges $(u, v, t) \in E$ – instantaneous interactions between entities
• $u, v \in V$
• $t$ is the time of the interaction
• Examples: tweets, emails, comments on social networks
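A temporal network of this kind can be held as a plain list of (u, v, t) triples. The sketch below is illustrative only: the toy edges and the helper name `timestamps_by_node` are assumptions, not part of the slides. It collects the interaction times of each entity.

```python
from collections import defaultdict

# Hypothetical toy data: each edge (u, v, t) is an interaction between u and v at time t.
edges = [
    ("brexit", "economy", 1),
    ("brexit", "tory", 2),
    ("economy", "tory", 5),
    ("brexit", "economy", 7),
]

def timestamps_by_node(edges):
    """Return, for each node, the sorted list of its distinct interaction times."""
    times = defaultdict(set)
    for u, v, t in edges:
        times[u].add(t)
        times[v].add(t)
    return {u: sorted(ts) for u, ts in times.items()}

node_times = timestamps_by_node(edges)
print(node_times)
```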

Problem setting
• consider a set of entities
• entities can become active or inactive
• entities interact over time, forming a temporal network
• each interaction is attributed to an active entity
• can we reconstruct the activity timeline that best explains the observed temporal network?
• assumption: being active is costly, so we want to minimize the total activity time

Motivating example
• analyze a discussion on Twitter about a topic (e.g., Brexit)
• entities are hashtags
• two hashtags interact if they appear in the same tweet
• summarize the discussion by reconstructing a timeline
• pick a set of important hashtags and the time intervals in which they are active

Motivating example
[Figure: activity intervals of the hashtags #economy, #brexit, #negotiations, #hardbrexit and #tory along a time axis]

Problem formulation: preliminaries
• given a temporal network $G = (V, E)$ with $E = \{(u, v, t)\}$
• $I_u = [s_u, e_u]$ – activity interval of $u \in V$ (starts at $s_u$ and ends at $e_u$)
• find a set of activity intervals for all nodes
• at most $k$ per node $u \in V$
• An activity timeline of $G$ is a set of activity intervals $\mathcal{T} = \{I_{ui}\}_{u \in V,\, i \in [1, k]}$
• The timeline $\mathcal{T}$ covers the temporal network $G$ if for each edge $(u, v, t) \in E$ we have $t \in I_{ui}$ or $t \in I_{vi}$ for some $i \in [1, k]$.

Problem formulation
Problem 1. (Sum-Span)
• Find a timeline $\mathcal{T} = \{I_{ui}\}_{u \in V,\, i \in [1, k]}$ that covers $G$ and minimizes the total length of $\mathcal{T}$.
Problem 2. (Max-Span)
• Find a timeline $\mathcal{T} = \{I_{ui}\}_{u \in V,\, i \in [1, k]}$ that covers $G$ and minimizes the maximum length of the intervals in $\mathcal{T}$.
• For ease of analysis, consider $k = 1$ and $k > 1$ separately
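To make the two objectives concrete, here is a minimal sketch that checks whether a timeline covers a temporal network and evaluates both costs. The representation (a dict from node to a list of closed intervals) and the function names are assumptions chosen for illustration.

```python
def covers(timeline, edges):
    """True if every edge (u, v, t) falls inside an interval of u or of v.

    timeline maps node -> list of (start, end) closed intervals.
    """
    def active(node, t):
        return any(s <= t <= e for s, e in timeline.get(node, []))
    return all(active(u, t) or active(v, t) for u, v, t in edges)

def sum_span(timeline):
    """Sum-Span objective: total length of all intervals."""
    return sum(e - s for ivs in timeline.values() for s, e in ivs)

def max_span(timeline):
    """Max-Span objective: length of the longest interval."""
    return max((e - s for ivs in timeline.values() for s, e in ivs), default=0)

# Hypothetical toy instance.
edges = [("a", "b", 1), ("a", "c", 4), ("b", "c", 9)]
timeline = {"a": [(1, 4)], "c": [(9, 9)]}
print(covers(timeline, edges), sum_span(timeline), max_span(timeline))
```

Here node a explains the interactions at times 1 and 4, and a zero-length interval of c explains the one at time 9.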

1-Sum-Span
Problem 1-Sum-Span is NP-hard.
Consider the subproblem Coalesce:
• Assume we are also given one active time point $m_u$ for each vertex $u \in V$.
• Find an optimal activity timeline $\mathcal{T}$ that contains the corresponding active time points $\{m_u\}_{u \in V}$.

1-Sum-Span
• Coalesce admits a linear-time 2-approximation based on a binary LP formulation.
• Define a variable $x_{ut} \in \{0, 1\}$ for each vertex $u \in V$ and timestamp $t \in T(u)$ (the moments of interactions of $u$).
• $x_{ut} = 1$ indicates that $t$ is either the beginning or the end of the active interval of $u$.
• Binary LP:
  – Cost function: $\min \sum_{u,t} |t - m_u| \, x_{ut}$
  – Constraints to ensure feasibility

1-Sum-Span
• Relax the integrality constraints and write the dual
• A maximal solution to the dual program yields a 2-approximation for Coalesce
• A maximal solution can be found in one pass ($O(m)$, Alg. Maximal)
Iterate to solve 1-Sum-Span (Alg. Inner):
• Start with $m_u = (\min T(u) + \max T(u)) / 2$
• Run Maximal and update $m_u$
• Repeat until no improvement
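The outer loop of Inner can be sketched as below. Note the loud assumption: the slides do not spell out Maximal, so `coalesce_heuristic` is a simple stand-in (each edge goes to the endpoint whose active point is nearer in time) and is not the dual-based 2-approximation. Updating each active point to the midpoint of the current interval, the tolerance `tol` and the iteration cap are likewise illustrative choices.

```python
from collections import defaultdict

def coalesce_heuristic(edges, m):
    """Stand-in for Maximal (NOT the slides' algorithm): assign each edge to the
    endpoint whose active point m[u] is closer in time, then take the hull.
    Every returned interval contains its node's active point."""
    lo, hi = dict(m), dict(m)
    for u, v, t in edges:
        w = u if abs(t - m[u]) <= abs(t - m[v]) else v
        lo[w] = min(lo[w], t)
        hi[w] = max(hi[w], t)
    return {u: (lo[u], hi[u]) for u in m}

def total_length(timeline):
    return sum(e - s for s, e in timeline.values())

def inner(edges, max_iter=100, tol=1e-9):
    """Alternate between solving Coalesce and re-centering the active points."""
    times = defaultdict(list)
    for u, v, t in edges:
        times[u].append(t)
        times[v].append(t)
    # Initial active point: midpoint of each node's first and last interaction.
    m = {u: (min(ts) + max(ts)) / 2 for u, ts in times.items()}
    best = coalesce_heuristic(edges, m)
    for _ in range(max_iter):
        # Assumed update rule: move each active point to its interval's midpoint.
        m = {u: (s + e) / 2 for u, (s, e) in best.items()}
        cand = coalesce_heuristic(edges, m)
        if total_length(cand) >= total_length(best) - tol:
            break  # no improvement
        best = cand
    return best, total_length(best)

# Hypothetical toy instance.
edges = [("a", "b", 0), ("a", "b", 1), ("a", "c", 2), ("c", "d", 10), ("c", "d", 11)]
timeline, total = inner(edges)
print(timeline, total)
```

Swapping in a real Coalesce solver only requires replacing `coalesce_heuristic`; the loop structure stays the same.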

k-Sum-Span
k-Sum-Span is inapproximable.
Consider the subproblem k-Coalesce:
• Assume we are also given $k$ active time points $m_{ui}$ for each vertex $u \in V$
• One for each of the activity intervals of $u$
• Find an optimal activity timeline $\mathcal{T}$ that contains the corresponding active time points $\{m_{ui}\}_{u \in V,\, i \in [1, k]}$.
• Similar BLP and Alg. k-Maximal

k-Sum-Span
Iterate to solve k-Sum-Span (Alg. k-Inner):
• Start with $m_{ui}$ as the centroids of a k-clustering algorithm
• Run k-Maximal and update $m_{ui}$
• Repeat until no improvement
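The slides do not say which k-clustering algorithm supplies the initial active points. As one possible choice, the sketch below runs a plain one-dimensional Lloyd's k-means over a node's timestamps; the function name `kmeans_1d` and the deterministic seeding are assumptions.

```python
def kmeans_1d(points, k, max_iter=100):
    """One-dimensional Lloyd's k-means; returns k sorted centroids.

    Assumes len(points) >= k. Seeds with k evenly spaced order statistics,
    so the result is deterministic.
    """
    pts = sorted(points)
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

# Hypothetical timestamps of one node with two bursts of activity.
initial_points = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(initial_points)
```

Each centroid would then serve as the starting active point of one of the node's k intervals.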

1-Max-Span
1-Max-Span can be solved efficiently.
Subproblem Budget:
• Assume we are also given a set of budgets $\{b_u\}_{u \in V}$ on the interval durations, one for each vertex.
• Find an activity timeline $\mathcal{T} = \{I_u\}_{u \in V}$ that covers $G$ such that the length of each activity interval $I_u$ is at most $b_u$.

1-Max-Span
Budget can be solved optimally in linear time.
Map Budget into 2-SAT:
• Variable $x_{ut}$ for each vertex $u$ and timestamp $t \in T(u)$.
• Clause $(x_{ut} \lor x_{vt})$ – cover each edge $(u, v, t)$.
• Clause $(\lnot x_{us} \lor \lnot x_{ut})$ – ensure the budget: for each $s, t \in T(u)$ such that $s - t > b_u$, the two timestamps cannot both be selected.
• Solution for Budget: for each vertex, the time interval spanned by the timestamps whose boolean variables are True.
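The 2-SAT mapping can be sketched as follows. The budget clause is taken to be (NOT x_us OR NOT x_ut), meaning two timestamps of the same vertex that are further apart than its budget cannot both be selected. The function name `solve_budget` is hypothetical, and the solver is a textbook Kosaraju-based 2-SAT over all clause pairs, so it does not use the temporal-structure speed-up and is not linear-time.

```python
from collections import defaultdict
from itertools import combinations

def solve_budget(edges, budget):
    """Return {node: (start, end)} covering every edge with each interval no
    longer than budget[node], or None if no such timeline exists."""
    times = defaultdict(set)
    for u, v, t in edges:
        times[u].add(t)
        times[v].add(t)
    var = {(u, t): i for i, (u, t) in
           enumerate((u, t) for u in times for t in times[u])}
    n = 2 * len(var)                      # literal 2i is x_i, 2i+1 is NOT x_i
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]

    def add_clause(a, b):                 # (a OR b): NOT a -> b and NOT b -> a
        for p, q in ((a ^ 1, b), (b ^ 1, a)):
            graph[p].append(q)
            rgraph[q].append(p)

    for u, v, t in edges:                 # every edge must be covered
        add_clause(2 * var[(u, t)], 2 * var[(v, t)])
    for u, ts in times.items():           # pairs too far apart exclude each other
        for s, t in combinations(sorted(ts), 2):
            if t - s > budget[u]:
                add_clause(2 * var[(u, s)] + 1, 2 * var[(u, t)] + 1)

    # Kosaraju pass 1: iterative DFS on the implication graph, record finish order.
    order, seen = [], [False] * n
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(graph[root]))]
        while stack:
            node, it = stack[-1]
            nxt = next((w for w in it if not seen[w]), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen[nxt] = True
                stack.append((nxt, iter(graph[nxt])))
    # Kosaraju pass 2: components of the reversed graph, in topological order.
    comp, c = [-1] * n, 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = c
        stack = [root]
        while stack:
            node = stack.pop()
            for w in rgraph[node]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    chosen = defaultdict(list)
    for (u, t), i in var.items():
        if comp[2 * i] == comp[2 * i + 1]:
            return None                   # x and NOT x are equivalent: unsatisfiable
        if comp[2 * i] > comp[2 * i + 1]:
            chosen[u].append(t)           # x_ut is True
    return {u: (min(ts), max(ts)) for u, ts in chosen.items()}

# Hypothetical toy instance.
edges = [("a", "b", 1), ("a", "b", 5), ("a", "c", 9)]
timeline = solve_budget(edges, {"a": 4, "b": 0, "c": 0})
print(timeline)
```

Vertices with no selected timestamp are simply absent from the returned timeline. Because all selected timestamps of a vertex are pairwise within its budget, the interval spanning them also respects the budget.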

1-Max-Span
Linear time:
• 2-SAT can be solved in time linear in the number of clauses (Aspvall et al. [1]). We have $O(m^2)$ clauses.
• Bottleneck: SCC decomposition, $O(m^2 + m)$
• Kosaraju's algorithm [2] for SCC decomposition
• Using the temporal structure, the DFS can be performed in $O(m)$.
Solve 1-Max-Span by binary search for the optimal maximum interval length (Algorithm Budget, $O(m \log m)$).
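The binary search can be sketched independently of how feasibility is decided. In the sketch below, `min_max_span` searches over candidate lengths (all pairwise timestamp differences, a simplification that yields more candidates than necessary), and `feasible_bruteforce` is an exponential-time stand-in for the Budget oracle, usable only on tiny inputs. Both names are hypothetical.

```python
from itertools import product

def feasible_bruteforce(edges, b):
    """Stand-in for Budget with a uniform budget b: try every choice of interval
    start (one of the node's own timestamps) for every node."""
    nodes = sorted({x for u, v, _ in edges for x in (u, v)})
    times = {n: sorted({t for u, v, t in edges if n in (u, v)}) for n in nodes}
    for starts in product(*(times[n] for n in nodes)):
        start = dict(zip(nodes, starts))
        if all(start[u] <= t <= start[u] + b or start[v] <= t <= start[v] + b
               for u, v, t in edges):
            return True
    return False

def min_max_span(edges, feasible):
    """Smallest uniform budget for which a covering timeline exists."""
    ts = sorted({t for _, _, t in edges})
    # Candidate answers: differences between timestamps (includes 0 and the full range).
    cands = sorted({t - s for s in ts for t in ts if t >= s})
    lo, hi = 0, len(cands) - 1            # the full range is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(edges, cands[mid]):   # feasibility is monotone in the budget
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]

# Hypothetical toy instance: three interactions between the same two nodes.
edges = [("a", "b", 1), ("a", "b", 5), ("a", "b", 9)]
best = min_max_span(edges, feasible_bruteforce)
print(best)
```

In this instance one node must cover two of the three interactions, so no budget below 4 is feasible.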

k-Max-Span
k-Max-Span is inapproximable
• consider two nested subproblems
Subproblem k-Partition:
• Assume we are also given $k-1$ inactive time points $g_{ui}$ for each vertex $u \in V$
• One for each gap between the activity intervals of $u$
• Find an optimal activity timeline $\mathcal{T}$ that interleaves with the corresponding gap points $\{g_{ui}\}_{u \in V,\, i \in [1, k-1]}$

k-Max-Span
Problem k-Partition can be solved in polynomial time through iteration of Problem k-Budget, which sets a budget for each interval.
Subproblem k-Budget:
• Assume we are given a set of budgets $\{b_{ui}\}_{u \in V,\, i \in [1, k]}$ on the interval durations for each vertex, and $k-1$ inactive time points $g_{ui}$ for each vertex
• Find an optimal activity timeline $\mathcal{T} = \{I_{ui}\}_{u \in V,\, i \in [1, k]}$ such that the length of each activity interval $I_{ui}$ is at most $b_{ui}$ and the gap points are interleaved
k-Budget can be solved in $O(m)$ time, similarly to Budget

k-Max-Span
Iterate to solve k-Max-Span (Alg. k-Budget):
• Start with $g_{ui}$ as the midpoints of the largest intervals with no activity of node $u$
• Solve k-Partition:
  – do a binary search on budgets, solving k-Budget at each step
  – update $g_{ui}$
• Repeat until no improvement
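The initialization step can be sketched as below: for one node, take the k-1 widest stretches between consecutive interactions and place a gap point at the middle of each. The function name `initial_gap_points` is hypothetical, and ties between equally wide stretches are broken by position, which is an arbitrary choice.

```python
def initial_gap_points(timestamps, k):
    """Midpoints of the k-1 largest inactivity stretches of one node, sorted by time."""
    ts = sorted(set(timestamps))
    stretches = list(zip(ts, ts[1:]))                      # consecutive interaction pairs
    widest = sorted(stretches, key=lambda p: p[1] - p[0], reverse=True)[:k - 1]
    return sorted((a + b) / 2 for a, b in widest)

# Hypothetical timestamps of one node with three bursts of activity.
gaps = initial_gap_points([1, 2, 3, 10, 11, 20], k=3)
print(gaps)
```

With k = 1 there are no gaps to place, and the function returns an empty list.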

Summary
Problem 1: Sum-Span
• $k = 1$: NP-hard
• $k > 1$: inapproximable
• Subproblem (k-)Coalesce with inner points
• 2-approximation in linear time via BLP dual for (k-)Coalesce

Summary
Problem 2: Max-Span
• $k = 1$: polynomially solvable
• $k > 1$: inapproximable
• Subproblem (k-)Budget with budgets
• Exact solution in linear time via 2-SAT for (k-)Budget

Experiments: case study
Tweets from Helsinki region, November 2013
• Inner algorithm (1-Sum-Span)
[Figure: activity intervals of hashtags over November 2013 (x-axis: day of month). Hashtags shown: winwin, xbone, yandex, webdesign, vision, walkbase, winner, younited, zenrobotics, slush13, pureview, nuijankopautus, nokiaegm, illuusio, here, kirkkonummi, nokia, typaikka, elop, nordis, mtvema, bestvideo, bron, emaazing, ema2013, exo, bestpop, emazing, worldwideactexo, voteaustinmahone]

Experiments: case study
Helsinki Twitter, years 2011-2013
• k-Inner algorithm with k = 3 ( k-Max-S )
[Figure: activity intervals of hashtags from January 2011 to January 2014 (x-axis: time, quarterly ticks). Hashtags shown: slush2013, uutisraivaaja, slushpitstop, slush13, comingtoslush, pelotoncamp, slush12, tediili, sailfish, jolla, digitalist, aller, startups, garage48, slush11, startupsauna, aaltoes, startup, padlette, crowdfunding, slush, lumia, sxsw, digasell, ipad, supercell, tivitforesight, slush2012, slush2008]

Performance: Inner
• Synthetic dataset, with planted ground truth
• the overlap parameter is set to 0.5
• values are averaged over 100 runs
[Figure: two panels plotted against iterations (1 to 10): Quality (P, R, F) and Relative total length]

Performance: k-Inner
• Synthetic dataset, k=10 intervals
[Figure: two panels plotted against iterations (1 to 10): Quality (P, R, F) and Relative total length]

Performance: k-Budget
• Synthetic dataset, k=10 intervals
[Figure: two panels plotted against iterations (1 to 10): Quality (P, R, F) and Relative max length]

Baseline comparison
• Baseline: greedily 'cover' the longest activity intervals of the nodes.
[Figure: three panels plotted against the number of intervals (2 to 10), comparing k-Inner, k-Budget and k-Baseline: F-measure, Relative max length and Relative total length]
