SLIDE 1

Corrected network measures

Vladimir Batagelj

IMFM Ljubljana and IAM UP Koper

CMStatistics (ERCIM) 2015 Senate House, University of London – December 12-14, 2015

SLIDE 2

Outline

1. Introduction
2. Overlap weight
3. Corrected overlap weight
4. Clustering coefficient
5. Corrected clustering coefficient
6. Conclusions
7. References

Vladimir Batagelj: vladimir.batagelj@fmf.uni-lj.si
Current version of slides (December 16, 2015, 11:05): http://vlado.fmf.uni-lj.si/pub/slides/ercim15.pdf

SLIDE 3

Network element importance measures

To identify important or interesting elements (nodes, links) in a network, we often try to express our intuition about what makes an element important or interesting with an appropriate measure (index, weight), following the scheme:

the larger the measure value of an element, the more important or interesting the element.

Too often, in the analysis of networks, researchers uncritically pick some measure from the literature.

SLIDE 4

Network element importance measures

We discuss two well-known network measures: the overlap weight of an edge (Onnela et al., 2007) and the clustering coefficient of a node (Holland and Leinhardt, 1971; Watts and Strogatz, 1998). It turns out that neither is very useful for the data-analytic task of identifying important elements of a given network. The reason is that they attain their largest values on "complete" subgraphs of relatively small size, which are more likely to appear in a network than larger ones. We show how their definitions can be corrected so that they give the expected results.

SLIDE 5

Overlap weight – definition

The (topological) overlap weight of an edge e = (u : v) ∈ E in an undirected simple graph G = (V, E) is defined as

$$ o(e) = \frac{t(e)}{(\deg(u) - 1) + (\deg(v) - 1) - t(e)} $$

where t(e) is the number of triangles (cycles of length 3) to which the edge e belongs. In the case deg(u) = deg(v) = 1 we set o(e) = 0.

Introducing two auxiliary quantities m(e) = min(deg(u), deg(v)) − 1 and M(e) = max(deg(u), deg(v)) − 1, we can rewrite the definition as

$$ o(e) = \frac{t(e)}{m(e) + M(e) - t(e)}, \quad M(e) > 0 $$

and if M(e) = 0 then o(e) = 0.
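The definition can be tried out on a small graph. The following is a minimal sketch, not the author's implementation: the helper names `build_adjacency` and `overlap_weight` and the toy graph are assumptions made for illustration. It counts t(e) as the number of common neighbors of the two end-nodes.

```python
def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def overlap_weight(adj, u, v):
    """o(e) = t(e) / (m(e) + M(e) - t(e)), with o(e) = 0 when M(e) = 0."""
    t = len(adj[u] & adj[v])  # common neighbors of u and v = triangles through e
    m = min(len(adj[u]), len(adj[v])) - 1
    M = max(len(adj[u]), len(adj[v])) - 1
    return 0.0 if M == 0 else t / (m + M - t)


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(edges)
weights = {e: overlap_weight(adj, *e) for e in edges}
print(weights)
```

On this toy graph the edge (a : b), whose end-nodes both have degree 2, gets the weight 1, while the two triangle edges at the degree-3 node c get 0.5 and the pendant edge (c : d) gets 0.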

SLIDE 6

Overlap weight – properties

It holds that 0 ≤ t(e) ≤ m(e) ≤ M(e). Therefore m(e) + M(e) − t(e) ≥ t(e) + t(e) − t(e) = t(e), showing that 0 ≤ o(e) ≤ 1. The value o(e) = 1 is attained exactly when m(e) = M(e) = t(e) > 0, and the value o(e) = 0 exactly when t(e) = 0.

SLIDE 7

US Airports links 1997

[Figure: drawing of the US Airports 1997 network; nodes are labeled with airport names.]

SLIDE 8

Edges with the largest overlap

cut at 0.8

[Figure: subnetwork of the edges with overlap weight at least 0.8. Edge labels show the weights, ranging from 0.80 to 1. Node labels: Wiley Post-Will Rogers Mem, Deadhorse, Ralph Wien Memorial, Nome, St Mary’s, Aniak, Tuluksak, Akiachak, Tuntutuliak, Kongiganak, Kwigillingok, Dillingham, King Salmon, Port Heiden, Sand Point, Glacier Park Intl, Missoula Intll, Pullman/Moscow Regional, Helena Regional, Lewiston-Nez Perce County, Billings Logan Intl, Gallatin Field, Rochester Intll, La Crosse Muni, Greater Rochester Intl, Syracuse Hancock Intl, Greater Buffalo Intl, Tompkins County, Elmira/Corning Regional, Williamson County Regional, Richmond Intll, Cape Girardeau Regional, Norfolk Intl, Tulsa Intl, Drake Field, Will Rogers World, Fort Smith Regional, San Luis Obispo County-Mc Ches, Santa Maria Pub/Capt G Allan H, Gregg County, Tyler Pounds Field, Cyril E King, Alexander Hamilton.]

SLIDE 9

Zoom in

[Figure: zoomed-in part of the network. Node labels: Wiley Post-Will Rogers Mem, Deadhorse, Ralph Wien Memorial, Fairbanks Intl, Nome, St Mary’s, Aniak, Anchorage Intl, Tuluksak, Akiachak, Akiak, Kwethluk, Bethel, Napaskiak, Napakiak, Merle K {Mudhole} Smith, Tuntutuliak, Eek, Kongiganak, Kwigillingok, Quinhagak, Yakutat, Dillingham, King Salmon, Gustavus, Juneau Intl, Kodiak, St Paul Island, Sitka, Port Heiden, James A Johnson, Petersburg, Wrangell, Ketchikan Intl, Sand Point, Cold Bay, Unalaska.]

SLIDE 10

Zoom in

[Figure: further zoom into the network. Node labels: Tuluksak, Akiachak, Akiak, Kwethluk, Bethel, Napaskiak, Napakiak, Tuntutuliak, Eek, Kongiganak, Kwigillingok, Quinhagak.]

SLIDE 11

Observation

From this example we see that in real-life networks the edges with the largest overlap weight tend to be edges whose end-nodes have relatively small degrees. Because of this, the overlap weight is not very useful for data-analytic tasks that search for important elements of a given network. We can try to improve the definition of the overlap weight so that it better suits data-analytic goals.

SLIDE 12

Corrected overlap weight

For this we introduce the quantity

$$ \mu = \max_{e \in E} t(e) $$

We define the corrected overlap weight as

$$ o'(e) = \frac{t(e)}{\mu + M(e) - t(e)} $$

By the definition of µ, for every e ∈ E it holds that t(e) ≤ µ. Since M(e) − t(e) ≥ 0, also µ + M(e) − t(e) ≥ µ ≥ t(e), and therefore 0 ≤ o′(e) ≤ 1. Also, o′(e) = 0 exactly when t(e) = 0. But o′(e) = 1 exactly when µ = M(e) = t(e).
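The effect of the correction can be seen on a small example. This is a sketch under stated assumptions: the function names and the toy graph (a 4-clique next to a separate triangle) are hypothetical, and returning 0 when the denominator vanishes (an isolated edge in a triangle-free graph) is a convention chosen here, not one given on the slides.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def corrected_overlap_weights(edges):
    """Return (mu, {edge: o'(e)}) with o'(e) = t(e) / (mu + M(e) - t(e))."""
    adj = build_adjacency(edges)
    t = {(u, v): len(adj[u] & adj[v]) for u, v in edges}  # triangles per edge
    mu = max(t.values(), default=0)                       # largest triangle count
    result = {}
    for (u, v), tri in t.items():
        M = max(len(adj[u]), len(adj[v])) - 1
        denom = mu + M - tri
        # Assumed convention: weight 0 if the denominator is 0.
        result[(u, v)] = tri / denom if denom > 0 else 0.0
    return mu, result


# Hypothetical toy graph: K4 on {a, b, c, d} and a separate triangle x-y-z.
edges = list(combinations("abcd", 2)) + [("x", "y"), ("y", "z"), ("x", "z")]
mu, weights = corrected_overlap_weights(edges)
print(mu, weights)
```

With the uncorrected definition every edge of both cliques would get the weight 1. Here µ = 2, so the K4 edges keep the weight 1 while the edges of the smaller triangle drop to 0.5.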

SLIDE 13

US Airports links

with the largest corrected overlap weight, cut at 0.5

[Figure: subnetwork of the edges with corrected overlap weight at least 0.5. Edge labels show the weights, ranging from 0.50 to 0.73. Node labels: Minneapolis-St Paul Intl/Wold-, Detroit Metropolitan Wayne Cou, Chicago O’hare Intl, Newark Intl, Pittsburgh Intll, Philadelphia Intl, Stapleton Intl, Baltimore-Washington Intl, Cincinnati/Northern Kentucky I, Lambert-St Louis Intl, San Francisco Intl, Nashville Intll, Charlotte/Douglas Intl, Los Angeles Intl, The William B Hartsfield Atlan, Phoenix Sky Harbor Intl, Dallas/Fort Worth Intl, Orlando Intl.]

µ = 80

SLIDE 14

US Airports links

with the largest corrected overlap weight

u                        v                       t(e)  d(u)  d(v)  o′(e)
The WB Hartsfield Atlan  Charlotte/Douglas Intl    76   101    87  0.73077
The WB Hartsfield Atlan  Dallas/Fort Worth Intl    73   101   118  0.58871
Chicago O’hare Intl      Pittsburgh Intll          80   139    94  0.57971
Chicago O’hare Intl      Lambert-St Louis Intl     80   139    94  0.57971
Dallas/Fort Worth Intl   Chicago O’hare Intl       78   118   139  0.55714
The WB Hartsfield Atlan  Chicago O’hare Intl       77   101   139  0.54610
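The listed values can be recomputed from t(e), d(u), d(v) and the value µ = 80 reported for this network. The short check below is only a sketch: the function name `corrected_overlap` is an assumption, and the numbers are copied from the table.

```python
MU = 80  # largest number of triangles on an edge, as reported for this network

# (u, v, t(e), d(u), d(v), listed o'(e)), copied from the table
rows = [
    ("The WB Hartsfield Atlan", "Charlotte/Douglas Intl", 76, 101, 87, 0.73077),
    ("The WB Hartsfield Atlan", "Dallas/Fort Worth Intl", 73, 101, 118, 0.58871),
    ("Chicago O'hare Intl", "Pittsburgh Intll", 80, 139, 94, 0.57971),
    ("Chicago O'hare Intl", "Lambert-St Louis Intl", 80, 139, 94, 0.57971),
    ("Dallas/Fort Worth Intl", "Chicago O'hare Intl", 78, 118, 139, 0.55714),
    ("The WB Hartsfield Atlan", "Chicago O'hare Intl", 77, 101, 139, 0.54610),
]


def corrected_overlap(t, du, dv, mu):
    """o'(e) = t(e) / (mu + M(e) - t(e)) with M(e) = max(d(u), d(v)) - 1."""
    M = max(du, dv) - 1
    return t / (mu + M - t)


recomputed = [corrected_overlap(t, du, dv, MU) for _, _, t, du, dv, _ in rows]
listed = [row[-1] for row in rows]
for (u, v, *_), value in zip(rows, recomputed):
    print(f"{u} : {v}  {value:.5f}")
```

For the first row, M(e) = max(101, 87) − 1 = 100, so o′(e) = 76 / (80 + 100 − 76) = 76 / 104 ≈ 0.73077, in agreement with the table.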

SLIDE 15

US Airports links

o′(WB Hartsfield Atlanta, Charlotte/Douglas Intl) = 0.7308

[Figure: network drawing accompanying this edge; nodes are labeled with airport names.]

SLIDE 16

Comparison

[Figure: scatter plot "Overlap weights". x-axis: overlap (0.0 to 1.0); y-axis: corrOverlap (0.0 to 0.6).]

SLIDE 17

Comparison – minDeg(e)/maxDeg(e)

[Figure: two scatter plots titled "Overlap weights". Left panel: x-axis minDeg/maxDeg (0.0 to 1.0), y-axis overlap (0.0 to 1.0). Right panel: x-axis minDeg/maxDeg (0.0 to 1.0), y-axis corrOverlap (0.0 to 0.6).]

SLIDE 18

Comparison – maxDeg(e)

[Figure: two scatter plots titled "Overlap weights". Left panel: x-axis maxDeg (ticks 20 to 140), y-axis overlap (0.0 to 1.0). Right panel: x-axis maxDeg (ticks 20 to 140), y-axis corrOverlap (0.0 to 0.6).]

SLIDE 19

Comparison – minDeg(e)

[Figure: two scatter plots titled "Overlap weights". Left panel: x-axis minDeg (ticks 20 to 120), y-axis overlap (0.0 to 1.0). Right panel: x-axis minDeg (ticks 20 to 120), y-axis corrOverlap (0.0 to 0.6).]

SLIDE 20

Comparison – # of triangles

[Figure: two scatter plots titled "Overlap weights". Left panel: x-axis #triangles (ticks 20 to 80), y-axis overlap (0.0 to 1.0). Right panel: x-axis #triangles (ticks 20 to 80), y-axis corrOverlap (0.0 to 0.6).]

SLIDE 21

Clustering coefficient

For a node u ∈ V in an undirected simple graph G = (V, E), its clustering coefficient measures the local density at node u and is defined as

$$ cc(u) = \frac{|E(N(u))|}{|E(K_{\deg(u)})|} = \frac{2 \cdot E(u)}{\deg(u) \cdot (\deg(u) - 1)}, \quad \deg(u) > 1 $$

where N(u) is the set of neighbors of node u and E(u) = |E(N(u))| is the number of edges among them. If deg(u) ≤ 1 then cc(u) = 0. It is easy to see that

$$ E(u) = \frac{1}{2} \sum_{e \in S(u)} t(e) $$

where S(u) is the star in node u. It holds that 0 ≤ cc(u) ≤ 1, and cc(u) = 1 exactly when the subgraph induced by N(u) is isomorphic to K_{deg(u)}.
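The relation between E(u) and the triangle counts t(e) gives a direct way to compute the coefficient. The sketch below is illustrative only: the helper names and the toy graph are assumptions, and E(u) is obtained as half the sum of t(e) over the edges of the star S(u).

```python
def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, u):
    """cc(u) = 2 E(u) / (deg(u) (deg(u) - 1)), and 0 when deg(u) <= 1."""
    deg = len(adj[u])
    if deg <= 1:
        return 0.0
    # E(u) = 1/2 * sum of t(e) over the edges e = (u : v) of the star S(u)
    E_u = sum(len(adj[u] & adj[v]) for v in adj[u]) / 2
    return 2 * E_u / (deg * (deg - 1))


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
cc = {u: clustering_coefficient(adj, u) for u in adj}
print(cc)
```

Here the degree-2 nodes a and b reach cc = 1, while the better connected node c only reaches 1/3 and the pendant node d gets 0.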

SLIDE 22

US Airports links with clustering coefficient = 1

 1  Wiley Post-Will Rogers Mem     28  Kwethluk                       55  Kongiganak
 2  Ralph Wien Memorial            29  Hector Intll                   56  Bellingham Intl
 3  Aniak                          30  Tompkins County                57  La Crosse Muni
 4  Toledo Express                 31  Cape Girardeau Regional        58  Hilo Intll
 5  Myrtle Beach Intl              32  Merced Municipal/Macready Fie  59  Rochester Intl
 6  Rota Intl                      33  King Salmon                    60  Kapalua
 7  Jack Mc Namara Field           34  Modesto City-County--Harry Sh  61  Lihue
 8  Port Heiden                    35  Natrona County Intl            62  Mc Allen Miller Intl
 9  New Hanover Intll              36  Williamson County Regional     63  Rio Grande Valley Intl
10  Santa Maria Pub/Capt G Allan   37  Deadhorse                      64  Eareckson As
11  Fayetteville Regional/Grannis  38  Nome                           65  Corpus Christi Intl
12  Lovell Field                   39  Akiak                          66  St Petersburg/Clearwater Int
13  St Paul Island                 40  Dillingham                     67  Lehigh Valley Intll
14  Elmira/Corning Regional        41  Evansville Regional            68  Gainesville Regional
15  San Luis Obispo County-Mc Che  42  Charlottesville-Albemarle      69  Burlington Regional
16  Binghamton Regional/Edwin A L  43  Bishop Intll                   70  Lafayette Regional
17  Fort Smith Regional            44  Gunnison County                71  Tuntutuliak
18  St Mary’s                      45  Friedman Memorial              72  Tallahassee Regional
19  Asheville Regional             46  Aspen-Pitkin Co/Sardy Field    73  University Park
20  Molokai                        47  Mbs Intll                      74  Sand Point
21  Worcester Muni                 48  Kwigillingok                   75  Tyler Pounds Field
22  Drake Field                    49  Minot Intl                     76  Tweed-New Haven
23  Dubuque Regional               50  Pago Pago Intl                 77  Gregg County
24  Tri-Cities Regional Tn/Va      51  Babelthuap/Koror               78  Wilkes-Barre/Scranton Intl
25  Monterey Peninsula             52  Decatur                        79  Eastern Oregon Regional At
26  Detroit City                   53  Quincy Muni Baldwin Field      80  Stewart Intl
27  Joplin Regional                54  Rafael Hernandez

Again we see that the clustering coefficient attains its largest value in nodes with relatively small degree. The probability of getting a complete subgraph on N(u) decreases quickly as deg(u) increases.

SLIDE 23

Corrected clustering coefficient

To get a corrected version of the clustering coefficient, we proposed in Pajek to replace deg(u) in the denominator with ∆ = max_{v ∈ V} deg(v). In this paper we propose another solution: we replace deg(u) − 1 with µ,

$$ cc'(u) = \frac{2 \cdot E(u)}{\mu \cdot \deg(u)}, \quad \deg(u) > 0 $$

To show that 0 ≤ cc′(u) ≤ 1 we have to consider two cases:

  • a. deg(u) ≥ µ: for v ∈ N(u), let deg_{N(u)}(v) denote the degree of v in the subgraph induced by N(u). It equals t((u : v)), so deg_{N(u)}(v) ≤ µ and therefore

$$ 2 \cdot E(u) = \sum_{v \in N(u)} \deg_{N(u)}(v) \le \sum_{v \in N(u)} \mu = \mu \cdot \deg(u) $$

  • b. deg(u) < µ: then deg(u) − 1 ≤ µ and therefore

$$ 2 \cdot E(u) \le \deg(u) \cdot (\deg(u) - 1) \le \mu \cdot \deg(u) $$

The value cc′(u) = 1 is attained in case a on a µ-core, and in case b on K_{µ+1}.
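The corrected coefficient can be compared with the ordinary one on a small example. This is a minimal sketch under assumptions: the function names and the toy graph (a 4-clique next to a separate triangle) are hypothetical, and setting cc′(u) = 0 in a triangle-free graph (µ = 0) is a convention chosen here to avoid division by zero.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficients(edges):
    """Return (mu, cc, cc_corr) with cc'(u) = 2 E(u) / (mu * deg(u))."""
    adj = build_adjacency(edges)
    mu = max((len(adj[u] & adj[v]) for u, v in edges), default=0)
    cc, cc_corr = {}, {}
    for u, nbrs in adj.items():
        deg = len(nbrs)  # deg > 0 because every node comes from some edge
        # 2 E(u) equals the sum of t(e) over the edges of the star S(u)
        two_E = sum(len(nbrs & adj[v]) for v in nbrs)
        cc[u] = two_E / (deg * (deg - 1)) if deg > 1 else 0.0
        # Assumed convention: cc'(u) = 0 when mu = 0.
        cc_corr[u] = two_E / (mu * deg) if mu > 0 else 0.0
    return mu, cc, cc_corr


# Hypothetical toy graph: K4 on {a, b, c, d} and a separate triangle x-y-z.
edges = list(combinations("abcd", 2)) + [("x", "y"), ("y", "z"), ("x", "z")]
mu, cc, cc_corr = clustering_coefficients(edges)
print(mu, cc, cc_corr)
```

All seven nodes have cc = 1, so the ordinary coefficient cannot tell the two cliques apart. With µ = 2 the corrected coefficient stays at 1 for the K4 nodes and drops to 0.5 for the nodes of the smaller triangle.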

SLIDE 24

US Airports links

with the largest corrected clustering coefficient

Rank  Value   Id                              Rank  Value   Id
   1  0.3739  Cleveland-Hopkins Intl            26  0.2990  Minneapolis-St Paul Intl/Wold-
   2  0.3700  General Edward Lawrence Logan     27  0.2956  General Mitchell Intll
   3  0.3688  Orlando Intl                      28  0.2942  Phoenix Sky Harbor Intl
   4  0.3595  Tampa Intl                        29  0.2935  Palm Beach Intl
   5  0.3488  Cincinnati/Northern Kentucky I    30  0.2914  Charlotte/Douglas Intl
   6  0.3457  Detroit Metropolitan Wayne Cou    31  0.2881  Memphis Intl
   7  0.3455  Newark Intl                       32  0.2859  Lambert-St Louis Intl
   8  0.3429  Baltimore-Washington Intl         33  0.2847  San Diego Intl-Lindbergh Fld
   9  0.3415  Miami Intl                        34  0.2824  Pittsburgh Intll
  10  0.3405  Washington National               35  0.2762  Stapleton Intl
  11  0.3379  Nashville Intll                   36  0.2724  Washington Dulles Intl
  12  0.3359  John F Kennedy Intl               37  0.2661  Dallas/Fort Worth Intl
  13  0.3347  Philadelphia Intl                 38  0.2595  Raleigh-Durham Intll
  14  0.3335  Indianapolis Intl                 39  0.2541  Chicago O’hare Intl
  15  0.3335  La Guardia                        40  0.2489  San Francisco Intl
  16  0.3311  Mc Carran Intl                    41  0.2386  Greater Buffalo Intl
  17  0.3301  Fort Lauderdale/Hollywood Intl    42  0.2295  John Wayne Airport-Orange Coun
  18  0.3106  New Orleans Intl/Moisant Fld/     43  0.2241  Seattle-Tacoma Intl
  19  0.3095  Bradley Intl                      44  0.2211  Sarasota/Bradenton Intl
  20  0.3045  Port Columbus Intl                45  0.2207  Ontario Intl
  21  0.3038  Los Angeles Intl                  46  0.2175  Syracuse Hancock Intl
  22  0.3036  Houston Intercontinental          47  0.2163  San Jose Intll
  23  0.3036  Kansas City Intl                  48  0.2158  Norfolk Intl
  24  0.3017  Southwest Florida Intl            49  0.2144  Salt Lake City Intl
  25  0.3002  The William B Hartsfield Atlan    50  0.2056  Greater Rochester Intl

SLIDE 25

Cleveland-Hopkins Intl neighbors

[Figure: subnetwork on the neighbors of Cleveland-Hopkins Intl. Node labels: Seattle-Tacoma Intl, Minneapolis-St Paul Intl/Wold-, General Mitchell Intll, Greater Buffalo Intl, General Edward Lawrence Logan, Detroit Metropolitan Wayne Cou, Chicago O’hare Intl, Bradley Intl, Chicago Midway, Theodore Francis Green State, La Guardia, Newark Intl, John F Kennedy Intl, Pittsburgh Intll, Yampa Valley, Philadelphia Intl, Stapleton Intl, Indianapolis Intl, Atlantic City Intll, Kansas City Intl, Baltimore-Washington Intl, Cincinnati/Northern Kentucky I, Washington Dulles Intl, Washington National, Lambert-St Louis Intl, Louisville Intl, San Francisco Intl, Norfolk Intl, Nashville Intll, Mc Carran Intl, Raleigh-Durham Intll, Charlotte/Douglas Intl, Los Angeles Intl, The William B Hartsfield Atlan, Phoenix Sky Harbor Intl, Dallas/Fort Worth Intl, New Orleans Intl/Moisant Fld/, Houston Intercontinental, Orlando Intl, Tampa Intl, Sarasota/Bradenton Intl, Palm Beach Intl, Southwest Florida Intl, Fort Lauderdale/Hollywood Intl, Miami Intl.]

SLIDE 26

Comparison

[Figure: two scatter plots titled "Clustering coefficients". Left panel: x-axis clusCoef (0.0 to 1.0), y-axis corrClusCoef (0.0 to 0.3). Right panel: x-axis deg (ticks 20 to 140), y-axis #edges (ticks 200 to 1400).]

SLIDE 27

Comparison – degrees

[Figure: two scatter plots titled "Clustering coefficients". Left panel: x-axis deg (ticks 20 to 140), y-axis clusCoef (0.0 to 1.0). Right panel: x-axis deg (ticks 20 to 140), y-axis corrClusCoef (0.0 to 0.3).]

SLIDE 28

Conclusions

In the corrected measures we can replace µ with ∆. The advantage of ∆ is that it is easier to compute, but the corresponding measure is less ‘sensitive’.
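The loss of sensitivity can be illustrated by computing the corrected overlap weight with both scales. The sketch below rests on assumptions: it reads the substitution literally as t(e) / (∆ + M(e) − t(e)), and the function names and the toy graph (a 4-clique attached to a hub with five pendant leaves) are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return {node: set of neighbors} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def corrected_overlap(adj, u, v, scale):
    """t(e) / (scale + M(e) - t(e)), where scale is either mu or Delta."""
    t = len(adj[u] & adj[v])
    M = max(len(adj[u]), len(adj[v])) - 1
    denom = scale + M - t
    return t / denom if denom > 0 else 0.0  # assumed convention for denom == 0


# Hypothetical toy graph: K4 on {a, b, c, d}; d is linked to a hub h with 5 leaves.
edges = list(combinations("abcd", 2)) + [("d", "h")]
edges += [("h", leaf) for leaf in ("p", "q", "r", "s", "t")]
adj = build_adjacency(edges)

mu = max(len(adj[u] & adj[v]) for u, v in edges)   # needs triangle counts
delta = max(len(nbrs) for nbrs in adj.values())    # needs only degrees

w_mu = {e: corrected_overlap(adj, *e, mu) for e in edges}
w_delta = {e: corrected_overlap(adj, *e, delta) for e in edges}
print(mu, delta, w_mu[("a", "b")], w_delta[("a", "b")])
```

In this graph µ = 2 and ∆ = 6. The clique edge (a : b) gets the weight 1 with µ but only 1/3 with ∆, because the high-degree hub inflates ∆ although it lies on no triangle.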

SLIDE 29

References I

  • Holland, P. W., Leinhardt, S. (1971). "Transitivity in structural models of small groups". Comparative Group Studies 2: 107–124.
  • Onnela, J. P., Saramaki, J., Hyvonen, J., Szabo, G., Lazer, D., Kaski, K., Kertesz, J., Barabasi, A. L. (2007). "Structure and tie strengths in mobile communication networks". Proceedings of the National Academy of Sciences 104(18): 7332.
  • Watts, D. J., Strogatz, S. (June 1998). "Collective dynamics of 'small-world' networks". Nature 393(6684): 440–442.
  • Wikipedia: Clustering coefficient
  • Wikipedia: Overlap coefficient

SLIDE 30

Acknowledgments

This work was supported in part by the Slovenian Research Agency (research program P1-0294 and research projects J5-5537 and J1-5433). Attendance at the CMStatistics (ERCIM) 2015 Conference was partially supported by the COST Action IC1408 – CRoNoS.
