A QUEST TO UNDERSTAND THE ORGANISATION OF LARGE RELATIONAL DATA


slide-1
SLIDE 1

A QUEST TO UNDERSTAND THE ORGANISATION OF LARGE RELATIONAL DATA: FROM WILD GOATS TO BITCOINS

Rémy Cazabet

slide-2
SLIDE 2

INTRODUCTION: THE QUEST

slide-3
SLIDE 3

QUEST

  • Data coming from the real world
  • Human/Animal/Natural activity
  • Complex Systems
  • “Many entities in interaction”
  • “The whole is more than the sum of its parts” (…?…)
  • The system is not understandable by reductionism: understanding each part very well is not enough to understand how the system works

slide-4
SLIDE 4

QUEST

  • Data coming from the real world
  • Human/Animal/Natural activity
  • Complex Systems
  • “Many entities in interaction”
  • “The whole is more than the sum of its parts” (…?…)
  • The system is not understandable by reductionism: understanding each part very well is not enough to understand how the system works

Note: why “understand”?

  • Goal in itself (physics, sociology, biology, (CS?)…)
  • Understanding => building good models => predicting, detecting “exceptions”, …
slide-5
SLIDE 5

TOOL: COMPLEX NETWORKS

  • Entities in relations/interaction:
  • Individuals exchange information/money/physical things
  • Genes/Proteins/Cells interact through known or unknown means
  • Web pages/articles/Patents… reference each other
  • Individuals/animals/things belong to same groups/have common traits
  • => Entities: nodes
  • => Relations: edges
  • With/Without attributes (categories, numeric, time, …)
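
The mapping above (entities as nodes, relations as edges, both optionally carrying attributes) can be sketched as a small data structure. This is only an illustration: the `Network` class, its method names and the example data are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Undirected network with optional node and edge attributes (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # frozenset({u, v}) -> attribute dict

    def add_node(self, node, **attrs):
        # Create the node if needed, then merge in any attributes.
        self.nodes.setdefault(node, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # Adding an edge implicitly adds its two endpoints.
        self.add_node(u)
        self.add_node(v)
        self.edges.setdefault(frozenset((u, v)), {}).update(attrs)

    def neighbours(self, node):
        # All nodes sharing at least one edge with `node`.
        return {n for edge in self.edges if node in edge for n in edge if n != node}


# Made-up example: individuals exchanging money, with categorical, numeric and time attributes.
net = Network()
net.add_node("alice", category="human")
net.add_edge("alice", "bob", amount=12.5, time="2008-01-31")
net.add_edge("bob", "carol", amount=3.0)
```

Storing each edge under a `frozenset` keeps the network undirected; a directed variant would use `(u, v)` tuples instead.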
slide-6
SLIDE 6

TOOL: COMPLEX NETWORKS

slide-7
SLIDE 7

TOOL: COMPLEX NETWORKS

  • Networks are interesting for their structure, their organisation
  • Are the neighbours of my neighbours also my neighbours?
  • Are individuals with the same attributes as me more likely to be my neighbours?
  • Are there “dense groups” (communities)?
  • Are some nodes more “strategically” positioned?
  • Objective: understand/discover/analyse/reproduce this structure
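
Three of the structural questions above have simple standard measures. The sketch below computes them on a made-up toy network: transitivity (are neighbours of my neighbours also my neighbours?), the fraction of edges joining nodes with the same attribute (a crude homophily indicator), and degree centrality (one possible notion of “strategic” position). The graph, the attribute values and the function names are assumptions for illustration; detecting communities is left out because it needs a dedicated algorithm.

```python
from itertools import combinations

# Hypothetical toy network: adjacency sets plus one categorical attribute per node.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
attribute = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "blue", "f": "blue"}


def transitivity(adj):
    """Fraction of connected triples (paths of length 2) that are closed into triangles."""
    closed = triples = 0
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            closed += v in adj[u]
    return closed / triples if triples else 0.0


def same_attribute_fraction(adj, attr):
    """Fraction of edges whose two endpoints share the same attribute value."""
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    same = sum(1 for edge in edges if len({attr[n] for n in edge}) == 1)
    return same / len(edges)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


print(transitivity(adjacency))
print(same_attribute_fraction(adjacency, attribute))
print(degree_centrality(adjacency))
```

On real data these values are usually compared with those of a random network with the same number of nodes and edges, to tell whether the observed structure is more than chance.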

slide-8
SLIDE 8

CHAPTER ONE : WHAT I’VE DONE

slide-9
SLIDE 9

SCIENTIFIC JOURNEY

  • PhD: Toulouse, Dynamic Community Detection in Temporal Networks
  • Postdocs:
      • Tokyo (2y), Understanding cooperation in social media
      • ENS de Lyon (1y), Understanding usage of Bicycle Sharing Systems
      • Paris (1y), Fraud detection in crypto-currencies
slide-10
SLIDE 10

IZARDS (WILD GOATS)

  • Social animals
  • 20y of observations
  • (Position/co-location)
  • Persistence of groups?
  • Despite deaths/climate change?

[Figure: one panel per year of observation, 1994 to 1999 and 2001 to 2007]
slide-11
SLIDE 11

FACEBOOK

  • Can we discover your “social circles” from your ego-networks?
  • How do you like it?
slide-12
SLIDE 12

TRENDING TOPICS

slide-13
SLIDE 13

TRENDING TOPICS

Detected event      Creation date   End date     Release date   Detection delay (days)
Devil May Cry       02/12/2007      08/08/2008   31/01/2008     60
Fable 2             06/12/2008      03/02/2009   18/12/2008     12
Gears Of War 2      14/10/2008      29/12/2008   07/11/2008     24
Assassin’s Creed    25/01/2008      26/02/2008   31/01/2008     6
Soul Calibur IV     07/07/2008      15/11/2008   31/07/2008     24
Uncharted           11/11/2007      02/01/2008   16/11/2007     5

[Figure: timeline 2007 to 2009 of detected topic terms: METAL GEAR, JEU, DESARMEMENT, METAL GEAR SOLID, VIDEO DE JEU]
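
The detection delays in the trending-topics table on this slide appear to be the number of days between the date a topic was created and the release date of the corresponding game. That reading is an assumption: the slide does not define the column, but it matches all six listed values. A minimal check (names such as `EVENTS` and `days_between` are illustrative):

```python
from datetime import datetime

# (event, creation date, release date) copied from the trending-topics table, dd/mm/yyyy.
EVENTS = [
    ("Devil May Cry", "02/12/2007", "31/01/2008"),
    ("Fable 2", "06/12/2008", "18/12/2008"),
    ("Gears Of War 2", "14/10/2008", "07/11/2008"),
    ("Assassin's Creed", "25/01/2008", "31/01/2008"),
    ("Soul Calibur IV", "07/07/2008", "31/07/2008"),
    ("Uncharted", "11/11/2007", "16/11/2007"),
]


def days_between(start, end, fmt="%d/%m/%Y"):
    """Number of days from `start` to `end`, both given as dd/mm/yyyy strings."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


# Assumed definition: delay = release date minus topic creation date.
delays = {name: days_between(created, released) for name, created, released in EVENTS}
for name, delay in delays.items():
    print(f"{name}: {delay} days")
```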

slide-14
SLIDE 14

SPACE-CORRECTED COMMUNITIES

Normal community detection

slide-15
SLIDE 15

SPACE-CORRECTED COMMUNITIES

Spatially corrected communities

slide-16
SLIDE 16

USER IDENTIFICATION IN BITCOIN

Without community detection

[Figure: comparison with the ground truth; labels are service names (btc faucet, coinbase, easycoin, easywallet, flexcoin, instawallet, paytunia, strongcoin) and numeric identifiers]

slide-17
SLIDE 17

USER IDENTIFICATION IN BITCOIN

With community detection

[Figure: legend GT, H1, H4-l2; labels are service names (btc faucet, coinbase, easycoin, easywallet, flexcoin, instawallet, paytunia, strongcoin) and numeric identifiers]

slide-18
SLIDE 18

DYNAMIC COMMUNITY DETECTION

  • “Community Discovery in Dynamic Networks: A Survey”
  • With Giulio Rossetti (Pisa)
  • 50 methods, 40-60 pages
  • To be (should be) published in ACM Computing Surveys (slooooow)

slide-19
SLIDE 19

TWITTER IN TIME OF CRISIS

[Figure: normalized retweet count over time (per hour), 6th to 24th March; series: IS only, AMP only, Mixed]

slide-20
SLIDE 20

MASSIVE PEER COOPERATION PROCESSES

slide-21
SLIDE 21

MASSIVE PEER COOPERATION PROCESSES

slide-22
SLIDE 22

MASSIVE PEER COOPERATION PROCESSES

slide-23
SLIDE 23

MASSIVE PEER COOPERATION PROCESSES

slide-24
SLIDE 24

MASSIVE PEER COOPERATION PROCESSES

[Figure: panels labelled Simple variant, Complex variant and Exploiting creation (node labels GI, LI, AGG, BB); axes and legends: user frequency, categories, 2nd category, fraction of famous videos, value (colour key 0 to 1), userRank, cumulativeFrequency, type (views, references); video categories: SINGING, CG3D, DANCE, NOCATEGORY, MASHUPS, MUSIC, MAD, MUSICALPERFORMANCE, MOVIE, ORIGINALMUSIC, ANIMATION, VOICE, OTHER, PICTURE, VOCALOIDVOICE]
slide-25
SLIDE 25

TEMPORAL PROFILES EVOLUTION

NMF: extract temporal profiles

[Figure: four weekly temporal profiles (x-axis: hour of day, Monday to Sunday), labelled “Commercial”, “Work”, “Bars-Restaurants (?)” and “Leisure”]
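
To illustrate how NMF (non-negative matrix factorisation) can extract temporal profiles, here is a minimal sketch. It assumes a stations-by-time-slots matrix of usage counts and uses the classic multiplicative-update rules as a stand-in, since the slide does not say which NMF variant was used. The data, the matrix sizes and all function names are made up for the example.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w*h, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise non-negative v (n x m) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update the profiles: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # Update the mixing weights: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


# Made-up data: two hidden profiles (morning peak, evening peak) mixed across four stations.
profiles = [[5, 9, 2, 1, 1, 0], [0, 1, 1, 2, 8, 6]]
mixing = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 1.0]]
counts = matmul(mixing, profiles)

w, h = nmf(counts, rank=2)
print("reconstruction error:", frobenius_error(counts, w, h))
```

Each row of `h` is one extracted temporal profile and each row of `w` says how strongly a station follows each profile. Naming a profile (“Work”, “Leisure”, …) remains a human interpretation step.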

slide-26
SLIDE 26

[Map, (c) TPU3: main city mall, main commercial street, main train station, main campuses of universities]
slide-27
SLIDE 27

CHAPTER 2 : WHAT I’M DOING NOW

slide-28
SLIDE 28

CHAPTER 2 : WHAT I’M DOING NOW

(Struggling)

slide-29
SLIDE 29

CHAPTER 2 : WHAT I’M DOING NOW

(Struggling) (Trying to get funding)

slide-30
SLIDE 30

DYNAMIC COMMUNITY DETECTION: EMPIRICAL EVALUATION

  • Survey : classification, qualitative comparison
  • Empirical evaluation => strengths, weaknesses, …
slide-31
SLIDE 31

CHAPTER 3: WHAT’S NEXT

slide-32
SLIDE 32

WHAT’S NEXT

  • I’m open to all opportunities
  • There are “theoretical” questions I would like to explore:
  • Community Detection vs Clustering
  • Finding automatically the best network model
  • Communities?
  • Spatial?
  • Embedding? (many works now in ML/Data Mining)
  • Core Periphery?
  • => Multi-criteria analysis/optimisation: model cost (information theory) vs model accuracy
slide-33
SLIDE 33

THANK YOU! QUESTIONS WELCOME