Inference, aggregation and graphics for top-k rank lists


  1. Inference, aggregation and graphics for top-k rank lists
     Michael G. Schimek (1), Eva Budinská (2), Shili Lin (3), Alena Myšičková (4)
     (1) Medical University of Graz and Danube University Krems, Austria
     (2) Swiss Institute of Bioinformatics, Lausanne, Switzerland
     (3) Ohio State University, Columbus, USA
     (4) Humboldt University, Berlin, Germany
     useR! 2009, Rennes, France, July 8-10, 2009

  2. Motivation
     - In various fields of application we are confronted with lists of distinct objects in rank order.
     - The ordering might be due to a measure of strength of evidence, or to an assessment based on expert knowledge or a technical device.
     - The ranking might also represent some measurement taken on the objects which might not be comparable across the lists, for instance because of different assessment technologies or levels of measurement error.
     - Our aim is
       - to consolidate such lists of common objects,
       - to provide computationally tractable solutions, hence appropriate algorithms and graphs,
       - to develop an R package named TopkLists.

  3. General assumptions
     - Let us assume ℓ assessors or laboratories (j = 1, 2, ..., ℓ) assigning rank positions to the same set of N distinct objects.
     - The N distinct objects are assessed according to the extent to which a particular attribute is present.
     - All assessors, independently of each other, rank the same objects between 1 and N on the basis of relative performance.
     - The ranking is from 1 to N, without ties.
     - Missing assessments are allowed.
     - The ℓ assessors produce ℓ ranked lists τ_j.
     - There are (ℓ² − ℓ)/2 possible pairs of such lists τ_j.

  4. The problem
     - Our overall goal is to identify a subset of objects that is characterized by high conformity across the lists.
     - It is implied that there is similarity between the rankings, which can be evaluated by a distance measure d (a permutation metric).
     - Such measures are
       - Kendall's τ,
       - Spearman's footrule.
     - In practice we have truncated lists and incomplete rankings of objects in some or all of the lists, caused by missing assignments.
     - Because of that, penalized distance measures are required.
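
The two permutation metrics named on this slide can be sketched in a few lines of Python. This is an illustration only, not code from the TopkLists package: the function names are hypothetical, and the penalized top-k variant uses one common convention (an object absent from a truncated list is placed at position k + 1), which is an assumption and not necessarily the penalty the authors use.

```python
from itertools import combinations


def kendall_distance(a, b):
    """Kendall distance: number of object pairs ordered differently in a and b.

    Both arguments are full rankings (lists) of the same objects, best first.
    """
    pos_b = {obj: i for i, obj in enumerate(b)}
    discordant = 0
    for x, y in combinations(a, 2):  # x precedes y in ranking a
        if pos_b[x] > pos_b[y]:      # ... but follows y in ranking b
            discordant += 1
    return discordant


def footrule_distance(a, b):
    """Spearman's footrule: sum of absolute rank displacements."""
    pos_b = {obj: i for i, obj in enumerate(b)}
    return sum(abs(i - pos_b[obj]) for i, obj in enumerate(a))


def topk_footrule(a, b, k):
    """Penalized footrule for lists truncated at k.

    Assumption: an object missing from a truncated list is treated as if it
    sat at position k + 1. Other penalties are possible.
    """
    pos_a = {obj: i + 1 for i, obj in enumerate(a[:k])}
    pos_b = {obj: i + 1 for i, obj in enumerate(b[:k])}
    universe = set(pos_a) | set(pos_b)
    return sum(abs(pos_a.get(o, k + 1) - pos_b.get(o, k + 1)) for o in universe)
```

For the rankings A, B, C, D and B, A, D, C the Kendall distance is 2 (two swapped pairs) and the footrule distance is 4 (each object is displaced by one position).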

  5. The problem continued
     - In most applications, especially for large or huge numbers N of objects, it is unlikely that consensus prevails.
     - As a result, only the top-ranked objects matter (the remaining ones show random ordering).
     - Quite often we observe a general decrease, not necessarily monotone, of the probability of consensus rankings with increasing distance from the top rank position.
     - Typically there is reasonable conformity in the rankings for the first, say k, elements of the lists.
     - This motivates the notion of top-k rank lists as known from the information retrieval literature.
     - Important application field: integration and meta-analysis of gene expression data (microarray experiments).

  6. Computational aspects and algorithms
     - List aggregation by means of brute force is limited to the situation where
       - N is very small,
       - ℓ is very small,
       - the k's are equal and a priori known.
     - Our purpose is to solve this computational problem for a realistic setting.
     - There are 3 subtasks, each with its own algorithm:
       1. Selection of the k̂'s for all possible pairs of lists τ_j
       2. Integration of partial information from the pairs of lists via a graphical tool
       3. Calculation of a set of objects characterized by rankings of high conformity across the lists up to some global index k̄
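
To see why brute force only works for very small N, the sketch below enumerates every ordered top-k candidate and keeps the one with the smallest total distance to the input lists. It is a minimal illustration under stated assumptions: the function names are hypothetical, k is taken as known and equal for all lists, and the objective is the penalized footrule with missing objects placed at position k + 1.

```python
from itertools import permutations
from math import perm


def topk_footrule(cand, ranked, k):
    """Penalized footrule; objects outside a top-k list sit at position k + 1 (assumption)."""
    pos_c = {obj: i + 1 for i, obj in enumerate(cand[:k])}
    pos_r = {obj: i + 1 for i, obj in enumerate(ranked[:k])}
    universe = set(pos_c) | set(pos_r)
    return sum(abs(pos_c.get(o, k + 1) - pos_r.get(o, k + 1)) for o in universe)


def brute_force_aggregate(lists, k):
    """Try every ordered selection of k objects and return the best one with its cost."""
    objects = sorted(set().union(*lists))
    best, best_cost = None, float("inf")
    for cand in permutations(objects, k):  # N! / (N - k)! candidates
        cost = sum(topk_footrule(cand, ranked, k) for ranked in lists)
        if cost < best_cost:
            best, best_cost = list(cand), cost
    return best, best_cost


lists = [
    ["A", "B", "C", "D"],
    ["A", "B", "D", "C"],
    ["B", "A", "C", "D"],
]
print(brute_force_aggregate(lists, k=2))
print(perm(4, 2), "candidates for N = 4, k = 2")
```

With N = 4 and k = 2 there are only 12 candidates. For N = 120 and k = 20, the setting of the example at the end of the talk, the count 120!/100! exceeds 10^40, so enumeration is out of the question.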

  7. Selection of the k̂'s
     - Moderate deviation-based inference for random degeneration in paired rank lists (Hall and Schimek, 2009).
     - For the estimation of the point of degeneration j_0 into noise, independent Bernoulli random variables are assumed.
     - A general decrease of the probability p_j (need not be monotone) for concordance of rankings with increasing distance j from the top rank is assumed.
     - Several tuning parameters (δ, ν, ...) are required to account for the closeness of the assessors' rankings and the degree of randomness in the assignments.
     - The algorithm represents a simplified mathematical model; it is embedded in an iterative scheme to account for irregular rankings.
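
The idea behind this slide can be illustrated with a deliberately simplified sketch. It is not the Hall and Schimek estimator: it makes a single forward pass with no iterative scheme, and the indicator definition, the threshold form and the constant C are assumptions chosen for illustration. Each position j gets a 0/1 concordance indicator (the two lists place the object within δ positions of each other), and the scan stops at the first window of length ν whose concordance rate is no longer clearly above 1/2.

```python
import math


def concordance_indicators(list1, list2, delta):
    """I_j = 1 if the object at position j of list1 lies within delta positions in list2.

    Assumes both lists rank the same objects (no missing assessments).
    """
    pos2 = {obj: i for i, obj in enumerate(list2)}
    return [1 if abs(pos2[obj] - j) <= delta else 0 for j, obj in enumerate(list1)]


def estimate_k(indicators, nu, C=0.251):
    """Return the number of leading positions before concordance degenerates into noise.

    Assumption: degeneration is declared at the first window of length nu whose
    mean indicator is at most 1/2 + sqrt(C * log(nu) / nu).
    """
    z = math.sqrt(C * math.log(nu) / nu)
    for j in range(len(indicators) - nu + 1):
        p_hat = sum(indicators[j:j + nu]) / nu  # local concordance rate
        if p_hat - 0.5 <= z:
            return j
    return len(indicators)  # no degeneration detected


indicators = [1] * 10 + [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(estimate_k(indicators, nu=4))
```

A larger ν smooths over isolated discordant positions, while a larger δ makes the indicators more tolerant of small rank differences; this is the kind of trade-off the tuning parameters on the slide control.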

  8. Graphical integration of paired ranked lists
     - Define a partial reference list L_1^0: any one of the 2 lists with max_j(k̂_j) objects among all pairwise comparisons.
     - L_1^0 gives the ordering of the objects O_i in the heatmap and defines the vertical axis.
     - Take the {max_j(k̂_j) + δ} highest-ranking objects O_i of L_1^0.
     - The partial lists L_2, L_3, ..., L_ℓ are ordered from highest to lowest by their individual k_j when compared to the reference list L_1^0 (one column per list).
     - In each cell we represent:
       (1) top-k membership: 'yes' is denoted by the color 'grey' and 'no' by 'white';
       (2) distance of the current object O_i ∈ L_1^0 from its position in the other list, on a color scale from 'red' (identical) to 'yellow' (far distant). The integer value denotes the distance, with negative sign if to the left and positive sign if to the right.
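
The cell contents of such a heatmap can be computed as in the following sketch, which leaves the actual plotting aside. The function name and data layout are hypothetical, and two conventions are assumptions: the signed distance is taken as position in the other list minus position in the reference list, and an object counts as a top-k member when it lies within the first k̂_j positions of both lists. Objects missing from a list get None, playing the role of NA.

```python
def heatmap_cells(reference, others, k_hat, delta):
    """Compute heatmap cells for a reference list against the other lists.

    reference: ranked list of objects (best first)
    others:    dict mapping list name -> ranked list
    k_hat:     dict mapping list name -> estimated k against the reference
    Returns (column order, rows), where each row is (object, {name: (distance, member)}).
    """
    n_rows = min(len(reference), max(k_hat.values()) + delta)
    # One column per list, ordered from highest to lowest estimated k.
    columns = sorted(others, key=lambda name: k_hat[name], reverse=True)
    rows = []
    for i, obj in enumerate(reference[:n_rows]):
        cells = {}
        for name in columns:
            ranked = others[name]
            if obj in ranked:
                pos = ranked.index(obj)
                member = i < k_hat[name] and pos < k_hat[name]
                cells[name] = (pos - i, member)  # signed rank distance
            else:
                cells[name] = (None, False)      # missing assessment
        rows.append((obj, cells))
    return columns, rows


reference = ["g1", "g2", "g3", "g4", "g5"]
others = {
    "L2": ["g2", "g1", "g3", "g5", "g4"],
    "L3": ["g1", "g3", "g2", "g6", "g7"],
}
columns, rows = heatmap_cells(reference, others, k_hat={"L2": 3, "L3": 2}, delta=1)
for obj, cells in rows:
    print(obj, cells)
```

The distance would drive the red-to-yellow color scale and the membership flag the grey or white marking described on the slide.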

  9. Calculation of a set of highly conforming objects
     - Cross-entropy Monte Carlo (CEMC) for consolidation of top-k objects (Lin and Ding, 2009).
     - Assume a random matrix X and a corresponding probability matrix p.
     - Given the probability mass function P_v(x), any realization x of X uniquely determines the corresponding top-k candidate list without reference to the probability matrix p.
     - Stochastic search to find an ordering x* that corresponds to an optimal τ* satisfying the minimization criterion.
     - Iterative CEMC algorithm in two steps:
       (i) simulation step, in which random samples from P_v(x) are drawn;
       (ii) update step, yielding improved samples that increasingly concentrate around an x* (corresponding to an optimal τ*).
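
The two-step iteration on this slide follows the general cross-entropy pattern, which can be sketched as below. This is a simplified illustration, not the Lin and Ding algorithm: the objective (summed penalized footrule), the elite fraction, the smoothing weight and the fixed iteration count are all assumptions, and the names are hypothetical. The probability matrix p has one row per object and one column per top-k position; a sample fills the positions one by one, drawing among the objects not yet used.

```python
import random


def topk_footrule(cand, ranked, k):
    """Penalized footrule; objects outside a top-k list sit at position k + 1 (assumption)."""
    pos_c = {obj: i + 1 for i, obj in enumerate(cand[:k])}
    pos_r = {obj: i + 1 for i, obj in enumerate(ranked[:k])}
    universe = set(pos_c) | set(pos_r)
    return sum(abs(pos_c.get(o, k + 1) - pos_r.get(o, k + 1)) for o in universe)


def cemc_aggregate(lists, k, n_samples=200, elite_frac=0.1, weight=0.5,
                   n_iter=30, seed=0):
    """Cross-entropy style stochastic search for a top-k consensus list."""
    rng = random.Random(seed)
    objects = sorted(set().union(*lists))
    n = len(objects)
    p = [[1.0 / n] * k for _ in range(n)]  # p[i][r]: object i at position r
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_cost = None, float("inf")

    def cost(idx):
        cand = [objects[i] for i in idx]
        return sum(topk_footrule(cand, ranked, k) for ranked in lists)

    for _ in range(n_iter):
        # (i) Simulation step: draw candidate lists position by position.
        samples = []
        for _ in range(n_samples):
            free, idx = list(range(n)), []
            for r in range(k):
                pick = rng.choices(free, weights=[p[i][r] for i in free])[0]
                idx.append(pick)
                free.remove(pick)
            samples.append(idx)
        samples.sort(key=cost)
        top_cost = cost(samples[0])
        if top_cost < best_cost:
            best, best_cost = [objects[i] for i in samples[0]], top_cost
        # (ii) Update step: shift p towards the elite (lowest-cost) samples.
        elite = samples[:n_elite]
        for r in range(k):
            counts = [0] * n
            for idx in elite:
                counts[idx[r]] += 1
            for i in range(n):
                p[i][r] = (1 - weight) * p[i][r] + weight * counts[i] / n_elite
    return best, best_cost


lists = [
    ["A", "B", "C", "D"],
    ["A", "B", "D", "C"],
    ["B", "A", "C", "D"],
]
print(cemc_aggregate(lists, k=2))
```

Keeping the smoothing weight below 1 ensures that no entry of p ever reaches exactly zero, so every object stays reachable in later iterations.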

  10. Graphics tool example: top-k integration of 5 gene expression lists (N = 120, k̂_j ∈ [20, 38])
      [Figure: heatmap of the lists L1 to L5 with a color scale from 0 to 1; cells show signed integer rank distances, or NA where an object is missing from a list.]
