Structural Information: Progress Report (Center for Science of Information) - PowerPoint PPT Presentation



SLIDE 1

Science & Technology Centers Program

Center for Science of Information

Bryn Mawr, Howard, MIT, Princeton, Purdue, Stanford, Texas A&M, UC Berkeley, UC San Diego, UIUC, University of Hawaii

National Science Foundation/Science & Technology Centers Program


Structural Information: Progress Report

Wojciech Szpankowski Purdue University

(jointly with A. Grama, A. Magner, and J. Sreedharan)


SLIDE 2

Outline

  • 1. Science of Information
  • 2. TIMES: Temporal Information Maximally Extracted from Structure
  • 3. Structural Compression
  • 4. TIMES: Recovering Partial Order
  • 5. Experimental Results
      • Synthetic Network
      • Real Network
      • Functional Brain Network

SLIDE 3

What is Science of Information?

▪ Claude Shannon laid the foundation of information theory, demonstrating that problems of data transmission and compression (i.e., reliably reproducing data) can be precisely modeled, formulated, and analyzed.

▪ SCIENCE OF INFORMATION builds on Shannon’s principles to address key challenges in understanding information that nowadays is not only communicated but also acquired, curated, organized, aggregated, managed, processed, suitably abstracted and represented, analyzed, inferred, valued, secured, and used in various scientific, engineering, and socio-economic processes.

CSoI MISSION: Advance science and technology through a new quantitative understanding of the representation, communication, and processing of information in biological, physical, social, and engineering systems.

SLIDE 4

Center’s Goals

▪ Extend Information Theory to meet new challenges in biology, economics, data & social sciences, and physical distributed systems.

▪ Understand new aspects of information (embedded) in structure, time, space, semantics, dynamic information, limited resources, complexity, representation-invariant information, and cooperation & dependency.


SLIDE 6

Motivation: Infection Spread

Infection network: nodes are patients; edges are formed among friends and family. Only structural information is available. Patients are not admitted in the order in which they were infected.

SLIDE 7

Further Motivation

▪ Networks of biochemical reactions (protein-protein interaction networks): cancer proteins tend to be ancient proteins [Srivastava et al., Nature 2010]; study of the phylogenetic tree
▪ Social networks: spread of information
▪ Financial transaction networks: flow of capital
▪ Spread of infectious diseases: origin and initial carriers

[Figure: Ebola spread network [Saey, ScienceNews Dec 2015]]

SLIDE 8

Formulation

[Figure: a 12-node graph drawn twice with different node labelings: 1 3 2 10 8 4 9 5 6 7 11 12 and 3 1 10 2 4 8 9 5 6 7 11 12.]

SLIDE 9

Minimax Risk

A node age recovery problem is a tuple [equation] consisting of:
  • Random graph model
  • Random adversary function
  • Distortion measure between permutations

Node age estimator is a function [equation].

Minimax risk for a random graph model and a distortion measure: [equation], with annotated terms: random graph model, distortion measure, estimator, adversary distribution, relabeled graph.
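A minimax risk built from the ingredients named on this slide is often written in the form sketched below. The notation is an assumption for illustration, not the slide's own: $\mathcal{G}_n$ stands for the random graph model, $G$ for a graph drawn from it, $\pi$ for the adversary's random relabeling (the maximum is over its distribution), $\phi$ for the estimator, and $d$ for the distortion measure between permutations.

```latex
R^{*}(\mathcal{G}_n, d)
  \;=\; \min_{\phi}\; \max_{\pi}\;
  \mathbb{E}\!\left[\, d\big(\phi(\pi(G)),\, \pi^{-1}\big) \right],
  \qquad G \sim \mathcal{G}_n .
```

Read this way, the estimator sees only the relabeled graph $\pi(G)$ and is charged for how far its output is from the relabeling it is trying to undo.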

SLIDE 10

Structural Quantities on Graphs

Sets associated with a random graph model:
  • Set of feasible permutations
  • Set of admissible graphs (unlabeled structures)

[Figure: small example graphs with labeled nodes.]

SLIDE 11

Lower Bounds on Minimax Risk

Theorem: [equations], covering two cases:
  • Exact recovery
  • Approximate recovery (Kendall tau distance)

Annotated terms: set of admissible graphs, set of feasible permutations.
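The Kendall tau distance used for approximate recovery counts the pairs of nodes that two orderings rank in opposite order. The sketch below is a plain quadratic-time illustration; the function name and the rank-list input format are assumptions, not taken from the slides.

```python
from itertools import combinations


def kendall_tau_distance(sigma, tau):
    """Count the pairs of items that the two rankings order differently.

    sigma[i] and tau[i] are the ranks given to item i by each ranking.
    """
    if len(sigma) != len(tau):
        raise ValueError("rankings must have the same length")
    discordant = 0
    for i, j in combinations(range(len(sigma)), 2):
        # A negative product means the pair (i, j) is ordered oppositely.
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0:
            discordant += 1
    return discordant


true_order = [1, 2, 3, 4]
estimate = [2, 1, 3, 4]  # only the first two nodes are swapped
print(kendall_tau_distance(true_order, estimate))  # prints 1
```

Dividing the count by n(n-1)/2 gives a normalized distance between 0 and 1.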

SLIDE 12

Erdős–Rényi & Preferential Attachment Models

Preferential Attachment model
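As a rough illustration of preferential attachment, the sketch below grows a graph in which each arriving node sends m edges to earlier nodes chosen with probability proportional to their current degree. The seeding convention (node 0 starts with m self-loops), the function name, and the parameters are assumptions for this example; the slide's exact model definition is not captured in the transcript.

```python
import random


def preferential_attachment_edges(n, m, seed=None):
    """Grow a graph on nodes 0..n-1 and return edges as (newer, older) pairs."""
    rng = random.Random(seed)
    # Assumed convention: node 0 is seeded with m self-loops (degree 2*m).
    # `endpoints` lists every node once per unit of degree, so a uniform
    # draw from it selects a node with probability proportional to degree.
    endpoints = [0] * (2 * m)
    edges = []
    for new in range(1, n):
        # Choose all m targets from the degrees as they stood before `new` arrived.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for old in targets:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges


edges = preferential_attachment_edges(n=10, m=2, seed=0)
print(len(edges))  # prints 18, i.e. m * (n - 1)
```

Because node labels coincide with arrival times here, the true age order is known, which is what makes such synthetic graphs usable as ground truth for node age recovery.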

SLIDE 13

Bad News!

Inapproximability result for Erdős–Rényi and preferential attachment graphs

[Equation with annotated terms: adversary, estimator.]

SLIDE 14

Further Bad News!!

ML estimation is not a good approach.

The maximum likelihood estimation solution set satisfies [equation] with high probability.


SLIDE 16

Compression of Graphs & Structures

Theorem (Structural entropy for a broad class of graph models): For a broad class of random graph models, [equation].

From [equation] and [equation].

SLIDE 17

Asymmetry of Preferential attachment case

SLIDE 18

Structural Entropy for PAG


Theorem: Structural entropy of preferential attachment graphs

Theorem: Entropy of preferential attachment graphs

SLIDE 19

Results for the node age recovery problem so far are pessimistic.

Can we do better?


SLIDE 21

Partial Orders and Binning

Look for partial orders instead of total orders.

[Figure: nodes 1 to 12 grouped into five bins (Bin 1 to Bin 5), with node labels shown in the orders 1 2 3 4 7 10 11 5 6 8 9 12 and 1 2 3 4 5 6 7 8 9 10 11 12.]

SLIDE 22

Precision and Recall

Recall: how much are we able to recover?

Precision: how good are the guessed pairs?

Precision = (# of correct pairs) / (# of pairs ordered by bins, excluding those inside bins)

Density:
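The sketch below computes these three quantities for a binning against known arrival times. Precision follows the ratio given on the slide. The recall and density formulas did not survive in the transcript, so the code assumes recall = correct pairs / all pairs and density = ordered pairs / all pairs; the function name and input format are likewise illustrative.

```python
from itertools import combinations


def bin_metrics(bins, true_age):
    """bins: list of node lists, oldest bin first.
    true_age: dict mapping node -> arrival time (smaller means older)."""
    bin_of = {v: b for b, nodes in enumerate(bins) for v in nodes}
    nodes = sorted(bin_of)
    total_pairs = len(nodes) * (len(nodes) - 1) // 2
    ordered = correct = 0
    for u, v in combinations(nodes, 2):
        if bin_of[u] == bin_of[v]:
            continue  # pairs inside a bin are left unordered
        ordered += 1
        guess_u_older = bin_of[u] < bin_of[v]
        truth_u_older = true_age[u] < true_age[v]
        correct += guess_u_older == truth_u_older
    precision = correct / ordered if ordered else 0.0
    recall = correct / total_pairs if total_pairs else 0.0    # assumed definition
    density = ordered / total_pairs if total_pairs else 0.0   # assumed definition
    return precision, recall, density


ages = {1: 1, 2: 2, 3: 3, 4: 4}
print(bin_metrics([[1, 2], [3], [4]], ages))
```

Under these assumed definitions, recall equals precision times density, so coarser bins trade recall for precision.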

SLIDE 23

Constrained Optimization Problem

Different approach: phrase the problem as an integer program.

max Precision subject to [constraint], where the maximum is over the set of partial orders.

SLIDE 24

Approximating via Peeling algorithm

[Figure: nodes 1 to 12 assigned to five bins (Bin 1 to Bin 5), with node labels shown in the orders 1 2 3 4 7 10 11 5 6 8 9 12, 2 1 3 4 7 5 10 11 6 8 9 12, and 1 2 3 4 5 6 7 8 9 10 11 12.]
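One way to read the peeling idea is sketched below: repeatedly remove every node of minimum degree, treat each removed batch as one bin, and reverse the batches so that the last nodes peeled form the oldest bin. This is a minimal sketch under that assumption; the slides do not spell out the steps, and the function name and adjacency-dict input are hypothetical.

```python
def peel(adjacency):
    """adjacency: dict node -> set of neighbours (undirected simple graph).
    Returns a list of bins, oldest bin first."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    layers = []
    while adj:
        min_deg = min(len(nbrs) for nbrs in adj.values())
        # All current minimum-degree nodes are peeled together as one bin.
        layer = sorted(v for v, nbrs in adj.items() if len(nbrs) == min_deg)
        for v in layer:
            for u in adj.pop(v):
                if u in adj:
                    adj[u].discard(v)
        layers.append(layer)
    # The first layer peeled holds the youngest nodes, so reverse the order.
    return layers[::-1]


# Node 2 and node 3 attached to node 1; node 4 attached to node 2.
graph = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2}}
print(peel(graph))  # prints [[1, 2], [3, 4]]
```

Nodes within one bin are left unordered, which is what makes the output a partial order rather than a total order.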


SLIDE 26

Numerical Results

LP relaxation gives an upper bound.

SLIDE 27

Theoretical Results

Perfect pair

Theorem: Typical number of perfect pairs

SLIDE 28

Theoretical Results

Conjecture:

Number of descendants of any given vertex

SLIDE 29

Experiments: Synthetic Graphs

How robust is the algorithm?

Uniform Attachment model

Legend: ranking with bins given by the Peeling algorithm

SLIDE 30

Experiments: Real-World Networks

Legend: perfect pairs only (nodes with a directed path between them)
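Taking the description above literally, perfect pairs can be enumerated by a reachability search over a directed acyclic graph. The sketch below assumes the graph is given as a dict of direct successors, with each edge pointing from a younger node to an older one; both the representation and the edge direction are assumptions for illustration.

```python
def perfect_pairs(dag):
    """dag: dict node -> iterable of direct successors.
    Returns the set of (u, v) such that a directed path leads from u to v."""
    pairs = set()
    for source in dag:
        stack = list(dag[source])
        seen = set()
        while stack:  # depth-first search from `source`
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(dag.get(v, ()))
        pairs.update((source, v) for v in seen)
    return pairs


# Assumed direction: an edge u -> v means u attached to the older node v.
dag = {4: [2], 3: [1], 2: [1], 1: []}
print(sorted(perfect_pairs(dag)))  # prints [(2, 1), (3, 1), (4, 1), (4, 2)]
```

In this example four of the six node pairs are connected by a directed path; the pairs (2, 3) and (3, 4) are not.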

SLIDE 31

Experiments: Brain Networks

Network formation

  • The network has 46 nodes, each of which represents a region of the brain.
  • An initial network is formed from fMRI images of a human brain in the resting state.
  • Each node of this initial network is a voxel; there are 243,648 voxels.
  • Each voxel has a time series of data spanning about 350 s.
  • The Pearson correlation coefficient is computed between the time series of each pair of voxels.
  • If the correlation exceeds 0.8, we form an edge between the two voxels.
  • To form the network of regions, we take the logical OR of the rows and columns of the voxel network's adjacency matrix corresponding to each region.
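The construction steps above can be sketched with NumPy as follows. The function name, the array layout (one row per voxel), and the voxel-to-region index array are assumptions for illustration; only the Pearson correlation, the 0.8 threshold, and the logical OR come from the slide. A full correlation matrix over 243,648 voxels would not fit in memory this way, so the sketch is only meant for small inputs.

```python
import numpy as np


def region_network(series, region_of_voxel, threshold=0.8):
    """series: array of shape (num_voxels, num_timepoints).
    region_of_voxel: integer array giving the region id (0-based) of each voxel.
    Returns a boolean region-by-region adjacency matrix."""
    corr = np.corrcoef(series)        # Pearson correlation between every pair of rows
    voxel_adj = corr > threshold      # edge when the correlation exceeds the threshold
    np.fill_diagonal(voxel_adj, False)
    num_regions = int(region_of_voxel.max()) + 1
    # membership[r, v] is True when voxel v belongs to region r.
    membership = np.arange(num_regions)[:, None] == region_of_voxel[None, :]
    # Logical OR over blocks: two regions are linked if any of their voxel pairs is.
    counts = membership.astype(int) @ voxel_adj.astype(int) @ membership.T.astype(int)
    region_adj = counts > 0
    np.fill_diagonal(region_adj, False)
    return region_adj


t = np.arange(10.0)
series = np.vstack([t, 2 * t + 1, np.where(t % 2 == 0, 1.0, -1.0), -t])
print(region_network(series, np.array([0, 1, 1, 2])))
```

In the toy input, voxel 0 and voxel 1 are perfectly correlated, so region 0 and region 1 end up linked, while region 2 stays isolated.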

Find age orderings of the regions for two different species, based on fMRI images of the same activity.

Conjecture: There is a high correlation between these orderings for species that evolved from the same genetic parent.

SLIDE 32

0: bin with oldest nodes
15: bin with newest nodes

SLIDE 33

[Figure panels labeled Bin 13, Bin 5, Bin 14, Bin 12, Bin 0, Bin 3.]