
Inference, aggregation and graphics for top-k rank lists
Michael G. Schimek (1), Eva Budinská (2), Shili Lin (3), Alena Myšičková (4)
(1) Medical University of Graz and Danube University Krems, Austria
(2) Swiss Institute of Bioinformatics, Lausanne, Switzerland
(3) Ohio State University, Columbus, USA
(4) Humboldt University, Berlin, Germany
useR! 2009, Rennes, France, July 8-10, 2009

Motivation
- In various fields of application we are confronted with lists of distinct objects in rank order.
- The ordering might be due to a measure of strength of evidence, or to an assessment based on expert knowledge or a technical device.
- The ranking might also represent some measurement taken on the objects which might not be comparable across the lists, for instance because of different assessment technologies or levels of measurement error.
- Our aim is
  - to consolidate such lists of common objects,
  - to provide computationally tractable solutions, hence appropriate algorithms and graphs,
  - to develop an R package named TopkLists.

General assumptions
- Let us assume $\ell$ assessors or laboratories ($j = 1, 2, \ldots, \ell$) assigning rank positions to the same set of $N$ distinct objects.
- The $N$ distinct objects are assessed according to the extent to which a particular attribute is present.
- All assessors, independently of each other, rank the same objects between 1 and $N$ on the basis of relative performance.
- The ranking is from 1 to $N$, without ties.
- Missing assessments are allowed.
- The $\ell$ assessors produce $\ell$ ranked lists $\tau_j$.
- There are $(\ell^2 - \ell)/2$ possible pairs of such lists $\tau_j$.
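As a small check of the pair count stated above, the following sketch enumerates all unordered pairs of lists. The function name `list_pairs` and the labels `tau_1`, `tau_2`, ... are illustrative assumptions, not part of the presentation.

```python
from itertools import combinations

def list_pairs(ell):
    """Return all unordered pairs of list labels tau_1, ..., tau_ell."""
    labels = [f"tau_{j}" for j in range(1, ell + 1)]
    return list(combinations(labels, 2))

pairs = list_pairs(5)
# The number of pairs matches (ell^2 - ell) / 2 from the slide.
n_pairs = len(pairs)
```

For five lists, as in the gene expression example later in the talk, this gives ten pairwise comparisons.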

The problem
- Our overall goal is to identify a subset of objects that is characterized by high conformity across the lists.
- It is implied that there is similarity between the rankings, which can be evaluated by a distance measure $d$ (a permutation metric).
- Such measures are
  - Kendall's $\tau$,
  - Spearman's footrule.
- In practice we have truncated lists and incomplete rankings of objects in some or all of the lists, caused by missing assignments.
- Because of that, penalized distance measures are required.
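The two permutation metrics named above can be sketched as follows for the simplest case, two complete rankings of the same objects. The function names are illustrative, and the penalized variants needed for truncated or incomplete lists are not covered here.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count the object pairs that the two rankings order differently."""
    pos_a = {obj: r for r, obj in enumerate(rank_a)}
    pos_b = {obj: r for r, obj in enumerate(rank_b)}
    return sum(
        1
        for x, y in combinations(rank_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def spearman_footrule(rank_a, rank_b):
    """Sum of absolute differences between the positions of each object."""
    pos_b = {obj: r for r, obj in enumerate(rank_b)}
    return sum(abs(r - pos_b[obj]) for r, obj in enumerate(rank_a))

a = ["g1", "g2", "g3", "g4"]
b = ["g2", "g1", "g3", "g4"]  # same as a, with the top two objects swapped
print(kendall_tau_distance(a, b), spearman_footrule(a, b))
```

Both functions assume that the two rankings contain exactly the same objects; an object missing from one list would raise a `KeyError`, which is the situation the penalized measures are meant to handle.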

The problem continued
- In most applications, especially for large or huge numbers $N$ of objects, it is unlikely that consensus prevails.
- As a result, only the top-ranked objects matter (the remaining ones show random ordering).
- Quite often we observe a general decrease, not necessarily monotone, of the probability for consensus rankings with increasing distance from the top rank position.
- Typically there is reasonable conformity in the rankings for the first, say $k$, elements of the lists.
- This motivates the notion of top-k rank lists as known from the information retrieval literature.
- Important application field: integration and meta-analysis of gene expression data (microarray experiments).

Computational aspects and algorithms
- List aggregation by means of brute force is limited to the situation where
  - $N$ is very small,
  - $\ell$ is very small,
  - the $k$'s are equal and a priori known.
- Our purpose is to solve this computational problem for a realistic setting.
- There are 3 subtasks, each with its own algorithm:
  1. Selection of the $\hat{k}$'s for all possible pairs of lists $\tau_j$
  2. Integration of partial information from the pairs of lists via a graphical tool
  3. Calculation of a set of objects characterized by rankings of high conformity across the lists up to some global index $\bar{k}$
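To illustrate why brute force only works for very small problems, here is a minimal sketch that enumerates every ordered selection of $k$ objects and keeps the one with the smallest summed distance to the input lists. The distance used (a footrule in which an absent object is placed at position $k + 1$) and all function names are assumptions made for this example, not the measure used by the authors.

```python
from itertools import permutations

def penalized_footrule(candidate, top_list, k):
    """Footrule distance in which an absent object sits at position k + 1."""
    pos_c = {o: r for r, o in enumerate(candidate, start=1)}
    pos_l = {o: r for r, o in enumerate(top_list, start=1)}
    return sum(
        abs(pos_c.get(o, k + 1) - pos_l.get(o, k + 1))
        for o in set(pos_c) | set(pos_l)
    )

def brute_force_topk(lists, k):
    """Try every ordered choice of k objects: N! / (N - k)! candidates."""
    objects = sorted({o for lst in lists for o in lst})
    best = min(
        permutations(objects, k),
        key=lambda c: sum(penalized_footrule(c, lst[:k], k) for lst in lists),
    )
    return list(best)

lists = [["a", "b", "c", "d"], ["a", "b", "d", "c"], ["b", "a", "c", "d"]]
print(brute_force_topk(lists, k=2))
```

With $N = 120$ objects and $k = 20$, the number of candidates is far beyond what can be enumerated, which is the motivation for the stochastic search described later.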

Selection of the $\hat{k}$'s
- Moderate deviation-based inference for random degeneration in paired rank lists (Hall and Schimek, 2009)
- For the estimation of the point of degeneration $j_0$ into noise, independent Bernoulli random variables are assumed.
- A general decrease of the probability $p_j$ (need not be monotone) for concordance of rankings with increasing distance $j$ from the top rank is assumed.
- Several tuning parameters ($\delta, \nu, \ldots$) are required to account for the closeness of the assessors' rankings and the degree of randomness in the assignments.
- The algorithm represents a simplified mathematical model; it is embedded in an iterative scheme to account for irregular rankings.
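The idea of locating the point where concordance degenerates into noise can be sketched with a deliberately simplified moving-window rule. This is not the moderate deviation procedure of Hall and Schimek: the use of $\delta$ as a rank tolerance, of $\nu$ as a window width, and the fixed threshold of 0.5 are assumptions made only for illustration.

```python
def concordance_indicators(list_a, list_b, delta):
    """1 if an object's positions in both lists differ by at most delta, else 0."""
    pos_b = {obj: r for r, obj in enumerate(list_b)}
    return [
        1 if obj in pos_b and abs(pos_b[obj] - r) <= delta else 0
        for r, obj in enumerate(list_a)
    ]

def estimate_k(indicators, nu, threshold=0.5):
    """First position whose window of nu indicators has mean <= threshold."""
    for j in range(len(indicators) - nu + 1):
        window = indicators[j:j + nu]
        if sum(window) / nu <= threshold:
            return j
    return len(indicators)

# Two lists that agree on the first 8 objects and disagree afterwards.
list_a = list(range(20))
list_b = list(range(8)) + list(range(19, 7, -1))
indicators = concordance_indicators(list_a, list_b, delta=1)
k_hat = estimate_k(indicators, nu=4)
```

In this toy rule the estimate falls slightly before the true change point because a window is flagged as soon as half of it is discordant; a wider window or a different threshold shifts the estimate, which is why tuning parameters matter.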

Graphical integration of paired ranked lists
- Define a partial reference list $L_1^0$: either of the two lists with $\max_j(\hat{k}_j)$ objects among all pairwise comparisons.
- $L_1^0$ gives the ordering of the objects $O_i$ in the heatmap and defines the vertical axis.
- Take the $\{\max_j(\hat{k}_j) + \delta\}$ highest-ranking objects $O_i$ of $L_1^0$.
- The partial lists $L_2, L_3, \ldots, L_\ell$ are ordered from highest to lowest by their individual $\hat{k}_j$ when compared to the reference list $L_1^0$ (one column per list).
- In each cell we represent:
  (1) top-k membership: 'yes' is denoted by the color grey and 'no' by white;
  (2) the distance of the current object $O_i \in L_1^0$ from its position in the other list, on a color scale from red (identical) to yellow (far distant). An integer value denotes the distance, with a negative sign if to the left and a positive sign if to the right.
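The content of one heatmap column can be sketched as below. The function name, the dictionary layout, and the sign convention (position in the other list minus position in the reference list) are assumptions made for this example; the slide does not fix them precisely. A missing object is reported as `None`.

```python
def heatmap_column(reference, other, k_hat, delta=0):
    """Cell values for one list compared against the reference list."""
    pos_other = {obj: r for r, obj in enumerate(other, start=1)}
    rows = []
    for r, obj in enumerate(reference[: k_hat + delta], start=1):
        p = pos_other.get(obj)  # None if the object is missing from `other`
        rows.append({
            "object": obj,
            "in_top_k": p is not None and p <= k_hat,
            "distance": None if p is None else p - r,
        })
    return rows

reference = ["g1", "g2", "g3", "g4", "g5"]
other = ["g2", "g1", "g5", "g3"]
column = heatmap_column(reference, other, k_hat=3, delta=1)
for row in column:
    print(row)
```

Each row carries the two pieces of information encoded per cell: membership in the top-k set and the signed rank distance.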

Calculation of a set of highly conforming objects
- Cross-entropy Monte Carlo (CEMC) for consolidation of top-k objects (Lin and Ding, 2009)
- Assume a random matrix $X$ and a corresponding probability matrix $p$.
- Given the probability mass function $P_v(x)$, any realization $x$ of $X$ uniquely determines the corresponding top-k candidate list without reference to the probability matrix $p$.
- Stochastic search to find an ordering $x^*$ that corresponds to an optimal $\tau^*$ satisfying the minimization criterion.
- Iterative CEMC algorithm in two steps:
  (i) simulation step, in which random samples from $P_v(x)$ are drawn;
  (ii) update step, for improved samples increasingly concentrating around an $x^*$ (corresponding to an optimal $\tau^*$).
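The two-step iteration can be sketched with a generic cross-entropy search. This is a simplified illustration under stated assumptions, not the algorithm of Lin and Ding: the probability matrix `p` has one row per rank position, the criterion is a footrule distance with absent objects placed at position $k + 1$, and the parameters `n_samples`, `elite_frac`, `smooth`, and `n_iter` are hypothetical choices.

```python
import random

def footrule_cost(candidate, lists, k):
    """Summed footrule distance to all lists; absent objects sit at k + 1."""
    cand_pos = {o: r for r, o in enumerate(candidate, start=1)}
    total = 0
    for lst in lists:
        pos = {o: r for r, o in enumerate(lst[:k], start=1)}
        for o in set(pos) | set(cand_pos):
            total += abs(pos.get(o, k + 1) - cand_pos.get(o, k + 1))
    return total

def sample_ordering(p, n, k, rng):
    """Simulation step: draw k distinct object indices, position by position."""
    remaining = list(range(n))
    chosen = []
    for r in range(k):
        weights = [p[r][i] for i in remaining]
        idx = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(idx)
        remaining.remove(idx)
    return chosen

def cemc_topk(lists, k, n_samples=200, elite_frac=0.1, smooth=0.3,
              n_iter=30, seed=0):
    rng = random.Random(seed)
    objects = sorted({o for lst in lists for o in lst})
    n = len(objects)
    p = [[1.0 / n] * n for _ in range(k)]  # start from uniform probabilities
    best, best_cost = None, float("inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iter):
        samples = [sample_ordering(p, n, k, rng) for _ in range(n_samples)]
        scored = sorted(
            (footrule_cost([objects[i] for i in s], lists, k), s)
            for s in samples
        )
        if scored[0][0] < best_cost:
            best_cost = scored[0][0]
            best = [objects[i] for i in scored[0][1]]
        # Update step: move p toward the position frequencies of the elite.
        elite = [s for _, s in scored[:n_elite]]
        for r in range(k):
            for i in range(n):
                freq = sum(1 for s in elite if s[r] == i) / len(elite)
                p[r][i] = (1 - smooth) * p[r][i] + smooth * freq
    return best, best_cost

lists = [["a", "b", "c", "d", "e"],
         ["a", "c", "b", "d", "f"],
         ["b", "a", "c", "e", "d"]]
best, cost = cemc_topk(lists, k=3)
print(best, cost)
```

The smoothing keeps every probability strictly positive, so the sampler can still reach any ordering, while repeated updates concentrate the samples around low-cost candidates. The search is stochastic, so the result is the best candidate seen rather than a guaranteed optimum.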

Graphics tool example: top-k integration of 5 gene expression lists ($N = 120$, $\hat{k}_j \in [20, 38]$)
[Figure: heatmap comparing the lists L1 to L5, with a color scale running from 0 to 1; cells contain integer rank distances or NA.]
