
slide-1
SLIDE 1

network analysis in bibliometrics

Lovro Šubelj

University of Ljubljana, Faculty of Computer and Information Science CWTS ‘17

slide-2
SLIDE 2

Slovenia “chicken”

  • Ljubljana
  • Alps ≤ 2864 m
  • seaside < 50 km :(
  • karst caves & wine
  • Pannonian flat like NL :)

slide-3
SLIDE 3

University of Ljubljana

  • since 1919, 271st in CWTS Leiden Ranking 2017
  • 26 members: 23 faculties & 3 academies
  • 40,110 students & 5,730 staff in 2016
slide-4
SLIDE 4

Faculty of Computer and Information Science

  • since 1996 (cs study since 1973)
  • ≈1,300 students & ≈180 staff
  • BSc, MSc, PhD: cs, prog, math, mm
  • research: cs, db, is, dm, ml, ai, nets
slide-5
SLIDE 5

networks courses

slide-6
SLIDE 6

talk outline

  • 1. reliability of bibliographic databases

Šubelj, L., Fiala, D., & Bajec, M. (2014). Scientific Reports, 4, 6496.
Šubelj, L., Bajec, M., Boshkoska, B. M., et al. (2015). PLoS ONE, 10(5), e0127390.

  • 2. modeling paper citation networks

Šubelj, L., & Bajec, M. (2013). In Proceedings of the LSNA ’13, pp. 527–530.
Šubelj, L., Žitnik, S., & Bajec, M. (2014). In Proceedings of the NetSci ’14, p. 1.

  • 3. clustering paper citation networks

Šubelj, L., Van Eck, N. J., & Waltman, L. (2016). PLoS ONE, 11(4), e0154404.

slide-7
SLIDE 7

bibliographic databases reliability

  • databases basis for research & evaluation
  • databases can differ substantially

different databases often give quite different conclusions

  • content & structure can differ substantially

coverage, timespan, features, accuracy, acquisition etc.

  • only informal notions on their reliability

particular case of reliability of structure of citation networks

slide-8
SLIDE 8

structure of citation networks

  • statistics of citation networks
  • mostly consistent with outliers
  • outliers due to data acquisition in most cases
  • comparison over one statistic
  • comparison over many statistics?

same problem in machine learning community

slide-9
SLIDE 9

methodology of database comparison

  • network statistics → residuals → database rank
  • mean ranks of databases over many statistics
  • residuals since “true database” is not known

database reliability seen as consistency with other databases
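
The residual idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the externally Studentized residual of a mean-only model, in which each database's value of a statistic is compared with the mean and spread of the remaining databases. The function name and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def studentized_residuals(values):
    """Externally Studentized residuals for a mean-only model (an assumption).

    values[i] is one network statistic measured on database i. Each value is
    compared with the mean and sample deviation of all OTHER values, so no
    "true database" is needed as a reference.
    """
    values = list(values)
    n = len(values)
    if n < 3:
        raise ValueError("need at least three databases")
    residuals = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        # stdev(rest) has n - 2 degrees of freedom; sqrt(n / (n - 1)) accounts
        # for the uncertainty of the leave-one-out mean. Raises
        # ZeroDivisionError if all remaining values are identical.
        scale = stdev(rest) * sqrt(n / (n - 1))
        residuals.append((x - mean(rest)) / scale)
    return residuals


# Hypothetical values of one statistic over five databases.
toy_statistic = [0.20, 0.22, 0.19, 0.21, 0.45]
print(studentized_residuals(toy_statistic))
```

A large absolute residual marks a database that is inconsistent with the others on that statistic.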

  • 1. Studentized statistics residuals x̂ij: two-tailed Student t-tests of H0 : x̂ij = 0 at P-value = 0.1 (Student t-distribution with d.f. N − 2)
  • 2. pairwise Spearman correlations ρij: two-tailed Fisher independence z-tests of H0 : ρij = 0 at P-value = 0.01 (standard normal distribution)
  • 3. residuals mean ranks Ri: one-tailed Friedman rank test of H0 : Ri = Rj at P-value = 0.1 (χ²-distribution with d.f. N − 1)
  • 4. residuals mean ranks Ri: two-tailed Nemenyi post-hoc test of H0 : Ri = Rj at P-value = 0.1 (Studentized range with d.f. N25)

[flowchart branch labels: ∀x̂ij : H0, ∃x̂ij : H1, ∀ρij : H0, ∃ρij : H1, H0, H1]
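
The mean-rank step can be illustrated with a short sketch. It assumes that databases are ranked per statistic by absolute residual (smaller means more consistent), which the slides do not state explicitly, and it uses the textbook chi-square form of the Friedman statistic. The function names and the toy matrix are hypothetical. Looking up the P-value and the Nemenyi post-hoc test are left out.

```python
from statistics import mean


def rank_row(values):
    """Ascending ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mean_ranks(abs_residuals):
    """abs_residuals[s][d]: absolute residual of database d on statistic s."""
    rows = [rank_row(row) for row in abs_residuals]
    n_databases = len(abs_residuals[0])
    return [mean(row[d] for row in rows) for d in range(n_databases)]


def friedman_statistic(ranks, n_statistics):
    """Friedman chi-square statistic computed from mean ranks."""
    k = len(ranks)
    spread = sum((r - (k + 1) / 2) ** 2 for r in ranks)
    return 12 * n_statistics / (k * (k + 1)) * spread


# Hypothetical absolute residuals: 4 statistics (rows) x 3 databases (columns).
toy = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.8],
    [0.3, 0.1, 0.7],
    [0.1, 0.6, 0.2],
]
R = mean_ranks(toy)
chi2 = friedman_statistic(R, n_statistics=len(toy))
print(R, chi2)
```

The statistic is compared against a χ²-distribution with one degree of freedom fewer than the number of databases. Only if that null hypothesis is rejected does a post-hoc comparison of individual mean ranks make sense.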

slide-10
SLIDE 10

comparison of citation networks

  • comparison of different citation networks

results robust to selection of networks, statistics, patterns etc.

  • comparison of different information networks

[figure A (P→P): mean ranks of databases on an axis from 1 to 6 at P-value = 0.1; databases in extraction order: WoS, Cora, arXiv, APS, PubMed, DBLP]

slide-11
SLIDE 11

comparison of bibliographic networks

  • A paper citation networks (information networks)
  • B author citation networks (social-information networks)
  • C author collaboration networks (social networks)

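[figure A (P→P): mean ranks on an axis from 1 to 6 at P-value = 0.1; databases in extraction order: WoS, Cora, arXiv, APS, PubMed, DBLP]

[figure B (A↔A): mean ranks on an axis from 1 to 6 at P-value = 0.1; databases in extraction order: Cora, arXiv, WoS, PubMed, DBLP, APS]

[figure C (A−A): mean ranks on an axis from 1 to 6 at P-value = 0.1; databases in extraction order: DBLP, WoS, Cora, APS, PubMed, arXiv]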

there is no “best” database!

slide-12
SLIDE 12

talk outline

  • 1. reliability of bibliographic databases

Šubelj, L., Fiala, D., & Bajec, M. (2014). Scientific Reports, 4, 6496.
Šubelj, L., Bajec, M., Boshkoska, B. M., et al. (2015). PLoS ONE, 10(5), e0127390.

  • 2. modeling paper citation networks

Šubelj, L., & Bajec, M. (2013). In Proceedings of the LSNA ’13, pp. 527–530.
Šubelj, L., Žitnik, S., & Bajec, M. (2014). In Proceedings of the NetSci ’14, p. 1.

  • 3. clustering paper citation networks

Šubelj, L., Van Eck, N. J., & Waltman, L. (2016). PLoS ONE, 11(4), e0154404.

slide-13
SLIDE 13

models of citation networks

  • generative models of citation networks

to reason about structure, evolution, dynamics, future etc.

  • many possible applications in bibliometrics

slide-14
SLIDE 14

forest fire network model

  • each new node i forms links as follows
  • 1. i selects initial ambassador a and links to a
  • 2. i selects some neighbors y, z of a and links to y, z
  • 3. y, z are taken as new ambassadors of i
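
The three steps above can be sketched as a small simulation. This is a simplified, undirected variant under stated assumptions: the number of neighbors selected at each ambassador is drawn from a geometric distribution with a hypothetical burning probability `p`, and the function name and parameters are illustrative rather than taken from the talk.

```python
import random


def forest_fire(n, p, seed=0):
    """Grow an undirected network of n nodes by a simplified forest fire process.

    p is a hypothetical burning probability and must satisfy 0 <= p < 1;
    on average p / (1 - p) neighbors are selected at each ambassador.
    Returns an adjacency dict {node: set of neighbors}.
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    adj = {0: set()}
    for i in range(1, n):
        a = rng.randrange(i)          # step 1: initial ambassador
        linked = {a}
        ambassadors = [a]
        while ambassadors:
            w = ambassadors.pop()
            # step 2: pick some not yet linked neighbors of the ambassador
            candidates = sorted(v for v in adj[w] if v not in linked)
            rng.shuffle(candidates)
            k = 0
            while rng.random() < p:   # geometric number of neighbors
                k += 1
            chosen = candidates[:k]
            linked.update(chosen)
            ambassadors.extend(chosen)  # step 3: they become new ambassadors
        adj[i] = set(linked)
        for v in linked:
            adj[v].add(i)
    return adj


network = forest_fire(100, p=0.3, seed=1)
print(sum(len(nbrs) for nbrs in network.values()) // 2, "links")
```

Larger values of `p` let the process spread further from the initial ambassador, which produces denser networks.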


slide-15
SLIDE 15

forest fire citation model

  • each new paper i cites as follows
  • 1. i selects initial paper a and cites a
  • 2. i selects some references y, z of a and cites y, z
  • 3. y, z are taken as new reading for i
  • then authors read all papers they cite and cite all papers they read
  • only ≈20% of references are read (Simkin & Roychowdhury, 2003)


slide-16
SLIDE 16

realistic citation model

  • each new paper i cites as follows
  • 1. i selects initial paper a and can cite a
  • 2. i selects some references y, z of a and can cite y, z
  • 3. some references are taken as new reading for i
  • read & cited papers modeled independently
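
Decoupling reading from citing can be sketched as below. This is only an illustration under simplifying assumptions: every considered reference is cited with probability `p_cite` and, independently, taken as further reading with probability `p_read`. Both parameters and the function name are hypothetical, and the model in the cited papers may select references differently.

```python
import random


def citation_model(n, p_read, p_cite, seed=0):
    """Grow a directed citation network of n papers.

    Returns {paper: set of cited papers}; citations always point to
    older papers (smaller index).
    """
    rng = random.Random(seed)
    refs = {0: set()}
    for i in range(1, n):
        a = rng.randrange(i)                   # step 1: initial paper, read
        seen = {a}
        cited = {a} if rng.random() < p_cite else set()
        reading = [a]
        while reading:
            w = reading.pop()
            for v in sorted(refs[w]):          # step 2: references of a read paper
                if v in seen:
                    continue
                seen.add(v)
                if rng.random() < p_cite:      # cited, whether or not it is read
                    cited.add(v)
                if rng.random() < p_read:      # step 3: read, whether or not cited
                    reading.append(v)
        refs[i] = cited
    return refs


net = citation_model(200, p_read=0.5, p_cite=0.5, seed=1)
print(sum(len(r) for r in net.values()), "citations")
```

With `p_cite` set to 1 the sketch reduces to a process in which every considered paper is cited, while lowering `p_read` shortens the chains of reading that start at the initial paper.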


slide-17
SLIDE 17

directed citation model

  • directed dynamics much more complicated
  • model reproduces WoS citation networks
  • clear optimum (peak) in model parameters
slide-18
SLIDE 18

implications of citation model

  • one read paper ≈

five two cited papers!

slide-19
SLIDE 19

talk outline

  • 1. reliability of bibliographic databases

Šubelj, L., Fiala, D., & Bajec, M. (2014). Scientific Reports, 4, 6496.
Šubelj, L., Bajec, M., Boshkoska, B. M., et al. (2015). PLoS ONE, 10(5), e0127390.

  • 2. modeling paper citation networks

Šubelj, L., & Bajec, M. (2013). In Proceedings of the LSNA ’13, pp. 527–530.
Šubelj, L., Žitnik, S., & Bajec, M. (2014). In Proceedings of the NetSci ’14, p. 1.

  • 3. clustering paper citation networks

Šubelj, L., Van Eck, N. J., & Waltman, L. (2016). PLoS ONE, 11(4), e0154404.

slide-20
SLIDE 20

clustering citation networks

  • clustering papers based on direct citation relations

research areas or topics of papers

  • systematic comparison of a large number of methods

network clustering and partitioning

there is no “best” method!

slide-21
SLIDE 21

thank you!

network convexity

LCN2 seminar next Friday at 4pm in Snellius