PAGERANK-RELATED METHODS FOR ANALYZING CITATION NETWORKS
Authors: Ludo Waltman and Erjia Yan
Presenter: Erjia Yan
Boğaziçi University, Istanbul ISSI, June 29
Objectives | 2
NON-RECURSIVE
citations
publications
RECURSIVE
– AuthorRank (Liu et al., 2005)
– Y-factor (Bollen et al., 2006)
– CiteRank (Walker et al., 2007)
– FutureRank (Sayyadi & Getoor, 2009)
– Eigenfactor (Bergstrom & West, 2008)
– SCImago (SCImago, 2007)
– weighted PageRank (Ding, 2011; Yan & Ding, 2011)
– …
A comparison | 3
NON-RECURSIVE RECURSIVE
A comparison | 4
– non-recursive methods take into account only the local structure of a citation network; thus, a citation originating from Nature or Science has the same weight as a citation originating from any other journal
– using recursive methods to take into account the global structure of a citation network, such that citations originating from highly cited nodes carry more weight than those originating from lowly cited nodes
Observations and motivations | 5
– the concept was first proposed by Pinski and Narin in 1976 (influence weight); PageRank was introduced as a method for ranking web pages by Brin and Page in 1998
– the PageRank value p_i of web page i is given by
$$p_i = \alpha \sum_{j \in B_i} \frac{p_j}{m_j} + (1 - \alpha) \frac{1}{n}$$
  where α denotes the damping factor parameter, B_i denotes the set of all web pages that link to web page i, m_j denotes the number of web pages to which web page j links, and n denotes the total number of web pages to be ranked.
Basics of PageRank | 6
– the larger the number of web pages that link to web page i, the higher the PageRank value of web page i
– the higher the PageRank values of the web pages that link to web page i, the higher the PageRank value of web page i
– for those web pages that link to web page i, the smaller the number of other web pages to which these web pages link, the higher the PageRank value of web page i
– the closer the damping factor parameter α is set to 1, the stronger the above effects
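These effects can be seen on a very small network. The following is a minimal Python sketch, not the presenters' implementation: the function names pagerank and spread, the three-page toy network, and the restriction to pages with at least one outgoing link are all assumptions made for illustration.

```python
def pagerank(links, alpha=0.85, n_iterations=100):
    """Iterate p_i = alpha * sum_{j in B_i} p_j / m_j + (1 - alpha) / n.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    # Assumption of this sketch: every page links to at least one page.
    if any(len(links[j]) == 0 for j in pages):
        raise ValueError("pages without outgoing links are not handled here")
    p = {i: 1.0 / n for i in pages}
    for _ in range(n_iterations):
        # Every page starts each round with the (1 - alpha) / n term.
        new_p = {i: (1.0 - alpha) / n for i in pages}
        for j in pages:
            share = alpha * p[j] / len(links[j])  # alpha * p_j / m_j
            for i in links[j]:
                new_p[i] += share
        p = new_p
    return p


def spread(scores):
    """Difference between the highest and the lowest PageRank value."""
    return max(scores.values()) - min(scores.values())


# Hypothetical toy network: A and B both link to C, and C links back to A.
toy = {"A": ["C"], "B": ["C"], "C": ["A"]}

for alpha in (0.5, 0.85):
    scores = pagerank(toy, alpha=alpha)
    print(alpha, {page: round(value, 3) for page, value in scores.items()})
```

In this toy network, C (two incoming links) ranks above A (one incoming link, from the highly ranked C), and B (no incoming links) ranks last. The gap between the highest and lowest values is wider for α = 0.85 than for α = 0.5.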
PageRank meanings | 7
– α = 1: PageRank is not guaranteed to converge
– α just below 1 (e.g., 0.9999): PageRank values are extremely sensitive to small changes in the network of links
– α = 0.5: according to Chen et al. (2007), 0.5 is preferred for citation networks, based on the assumption that authors on average browse as far as two degrees of references (references and the references’ cited references, thus 1-1/2=0.5)
– α = 0.85: the default (roughly in line with the “six degrees of separation”: 1-1/6≈0.83)
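The arithmetic linking browsing depth and damping factor can be checked in a few lines. This is a sketch under the rule of thumb used above, α = 1 - 1/d, where d is the assumed browsing depth; the function names are hypothetical.

```python
def damping_from_depth(depth):
    """Damping factor implied by a browsing depth d: alpha = 1 - 1/d."""
    return 1.0 - 1.0 / depth


def depth_from_damping(alpha):
    """Inverse relation: d = 1 / (1 - alpha)."""
    return 1.0 / (1.0 - alpha)


print(damping_from_depth(2))               # 0.5
print(round(damping_from_depth(6), 4))     # 0.8333
print(round(depth_from_damping(0.85), 2))  # 6.67
```

Under this rule, a depth of six gives about 0.83, so the correspondence with the default of 0.85 is approximate. Read the other way, α = 0.85 corresponds to a depth of roughly 6.7.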
Damping factor | 8
Applications | 9
Tutorials | 10
Tools and materials | 11
Steps 1-5 | 12
– on Windows systems, a command such as copy *.txt merged_data.txt can be entered in the Command Prompt tool
– in the resulting file, make sure to remove all lines ‘FN Thomson Reuters Web of Knowledge VR 1.0’ except for the first one and all lines ‘EF’ except for the last one
– change the extension of the text file that contains your bibliographic data from .txt to .isi
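The merge-and-clean steps could also be scripted instead of done by hand. The following Python sketch makes two assumptions: header lines start with "FN " or "VR ", and the end-of-file marker is a line containing only "EF". The function name merge_wos_exports is hypothetical.

```python
def merge_wos_exports(texts):
    """Merge the contents of several export files into one string.

    Keeps the header lines of the first file only and writes a single
    'EF' marker at the very end.
    """
    merged = []
    for index, text in enumerate(texts):
        for line in text.splitlines():
            is_header = line.startswith(("FN ", "VR "))
            is_footer = line.strip() == "EF"
            if is_header and index > 0:
                continue  # keep the header from the first file only
            if is_footer:
                continue  # a single 'EF' is appended after the loop
            merged.append(line)
    merged.append("EF")
    return "\n".join(merged) + "\n"


# Two tiny made-up exports, for illustration only.
first = "FN Thomson Reuters Web of Knowledge\nVR 1.0\nPT J\nAU Yan, E\nER\n\nEF\n"
second = "FN Thomson Reuters Web of Knowledge\nVR 1.0\nPT J\nAU Ding, Y\nER\n\nEF\n"
print(merge_wos_exports([first, second]))
```

To apply this to real files, read each .txt export into a string, pass the list to merge_wos_exports, and write the result to a file with the .isi extension (for example merged_data.isi).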
Steps 6-7 | 13
Steps 8-9 | 14
Steps 10-12 | 15
Step 13 | 16
Steps 14-19 | 17
Step 19 | 18
function p = calc_PageRank(C, alpha, n_iterations)
    % Take care of dangling nodes.
    m = sum(C, 2);
    C(m == 0, :) = 1;
    % Create a row-normalized matrix.
    n = length(C);
    m = sum(C, 2);
    C = spdiags(1 ./ m, 0, n, n) * C;
    % Apply the power method.
    p = repmat(1 / n, [1 n]);
    for i = 1:n_iterations
        p = alpha * p * C + (1 - alpha) / n;
    end
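For readers without MATLAB, the same three steps (dangling nodes, row normalization, power method) can be sketched in plain Python. This is an illustrative port using nested lists rather than sparse matrices, so it only suits small networks. The name calc_pagerank and the 3-by-3 example matrix are assumptions, not part of the tutorial data.

```python
def calc_pagerank(C, alpha, n_iterations):
    """C[j][i] is the number of citations from node j to node i."""
    n = len(C)
    # Take care of dangling nodes: an all-zero row is treated as
    # citing every node (including itself), as in the MATLAB code.
    rows = [row if sum(row) > 0 else [1] * n for row in C]
    # Create a row-normalized matrix (each row sums to 1).
    P = [[value / sum(row) for value in row] for row in rows]
    # Apply the power method, starting from a uniform vector.
    p = [1.0 / n] * n
    for _ in range(n_iterations):
        p = [alpha * sum(p[j] * P[j][i] for j in range(n)) + (1 - alpha) / n
             for i in range(n)]
    return p


# Hypothetical citation matrix: node 0 cites nodes 1 and 2,
# node 1 cites node 2, and node 2 cites nothing (a dangling node).
C = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
print([round(value, 3) for value in calc_pagerank(C, 0.85, 100)])
```

In this example, node 2 receives the highest score and node 0 the lowest. The scores sum to 1 because every row of the normalized matrix sums to 1.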
Steps 20-21 | 19
The resulting PageRank scores for the journals
Other citation network types | 20
Thank you | 21