
Finding Outstanding Aspects and Contrast Subspaces

Jian Pei, School of Computing Science, Simon Fraser University, jpei@cs.sfu.ca


  1. Finding Outstanding Aspects and Contrast Subspaces Jian Pei School of Computing Science Simon Fraser University jpei@cs.sfu.ca

  2. CHIRC
     • Computational Health Intelligence Research Centre
       – Population health powered by big data
       – Healthcare business intelligence
       – Predictive health analytics
     • A collaborative research initiative with industry leaders
     • Technology transferred to industry
       – Multi-million US dollar financial gain per year for industry partners
     J. Pei: Finding Outstanding Aspects and Contrast Subspaces

  3. In what aspect is he most similar to cases of coronary artery disease and, at the same time, dissimilar to cases of adiposity?
     Symptoms: overweight, high blood pressure, back pain, shortness of breath, chest pain, cold sweat …

  4. Fraud Suspect Analysis
     • An insurance analyst is investigating a suspicious claim
     • How does the claim compare with normal and fraudulent claims?
       – In what aspects is the suspicious case most similar to fraudulent cases and different from normal claims?

  5. Don’t You Ever Google Yourself?
     • Big data helps one know oneself better
     • 57% of American adults search for themselves on the Internet
       – Good news: those people are better paid than those who haven’t done so! (Investors.com)
     • Egocentric analysis becomes more and more important with big data

  6. Egocentric Analysis
     • How am I different from (more often than not, better than) others?
     • In what aspects am I good?

  7. Contrast Subspace Finding
     • Given a set of labeled objects in two classes
     • For a query object q that is also labeled, the contrast subspace is the one where q is most likely to belong to the target class against the other class

  8. Related Work
     • Finding patterns and models that manifest drastic differences from one class against the other
       – Example: emerging patterns
     • Subspace outlier detection
       – The query object may not be an outlier
     • Typicality queries do not consider subspaces

  9. Problem Formulation
     • Find subspaces maximizing
       $$LC_S(q) = \frac{L_S(q \mid O_+)}{L_S(q \mid O_-)}$$
     • To avoid triviality, consider only subspaces where $L_S(q \mid O_+) \ge \delta$

  10. Density Estimation
      • Density estimated by
        $$L_S(q \mid O) = \hat{f}_S(q, O) = \frac{1}{|O| \sqrt{2\pi}\, h_S} \sum_{o \in O} e^{-\frac{dist_S(q,o)^2}{2 h_S^2}}$$
      • Then,
        $$LC_S(q, O_+, O_-) = \frac{\hat{f}_S(q, O_+)}{\hat{f}_S(q, O_-)} = \frac{|O_-|\, h_{S_-}}{|O_+|\, h_{S_+}} \cdot \frac{\sum_{o \in O_+} e^{-\frac{dist_S(q,o)^2}{2 h_{S_+}^2}}}{\sum_{o \in O_-} e^{-\frac{dist_S(q,o)^2}{2 h_{S_-}^2}}}$$
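
The density estimate and the ratio on slide 10 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy data, and the fixed bandwidths `h_pos` and `h_neg` (standing in for $h_{S_+}$ and $h_{S_-}$) are assumptions, and the slides do not say how the bandwidths are chosen at this point.

```python
import math

def dist_sq(q, o, subspace):
    """Squared Euclidean distance between q and o, using only the dimensions in subspace."""
    return sum((q[d] - o[d]) ** 2 for d in subspace)

def likelihood(q, objects, subspace, h):
    """Gaussian kernel density estimate L_S(q | O) with bandwidth h (assumed given)."""
    total = sum(math.exp(-dist_sq(q, o, subspace) / (2 * h * h)) for o in objects)
    return total / (len(objects) * math.sqrt(2 * math.pi) * h)

def likelihood_contrast(q, pos, neg, subspace, h_pos, h_neg):
    """LC_S(q, O+, O-): density of q under O+ divided by its density under O-."""
    return likelihood(q, pos, subspace, h_pos) / likelihood(q, neg, subspace, h_neg)

# Hypothetical toy data: the two classes separate on dimension 0 but not on dimension 1.
pos = [(1.0, 5.0), (1.2, 0.0), (0.9, 9.0)]
neg = [(4.0, 5.1), (4.2, 0.2), (3.9, 8.8)]
q = (1.1, 5.0)

lc_dim0 = likelihood_contrast(q, pos, neg, (0,), 0.5, 0.5)
lc_dim1 = likelihood_contrast(q, pos, neg, (1,), 0.5, 0.5)
print(lc_dim0 > lc_dim1)
```

On this toy data the contrast is far larger in the subspace made of dimension 0, which is the dimension where q sits among the positive objects and away from the negative ones.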

  11. Complexity
      • MAX SNP-hard
        – Reduction from the emerging pattern mining problem
      • No polynomial-time approximation scheme exists unless P = NP, so a good approximation algorithm is out of reach

  12. A Monotonic Bound
      • $L_S(q \mid O_+)$ is not monotonic in subspaces
      • Develop an upper bound of $L_S(q \mid O_+)$, which is monotonic in subspaces
        – Sort all the dimensions in their standard deviation descending order
        – Let $\mathcal{S}$ be the set of children of $S$ in the subspace set enumeration tree using the standard deviation descending order
        – $$L^*_S(q \mid O_+) = \frac{1}{|O_+| \sqrt{2\pi}\, \sigma'_{min}\, h'_{opt\_min}} \sum_{o \in O_+} e^{-\frac{dist_S(q,o)^2}{2 (\sigma_S\, h'_{opt\_max})^2}}$$
        – $\sigma'_{min} = \min\{\sigma_{S'} \mid S' \in \mathcal{S}\}$, $h'_{opt\_min} = \min\{h_{S'\_opt} \mid S' \in \mathcal{S}\}$, and $h'_{opt\_max} = \max\{h_{S'\_opt} \mid S' \in \mathcal{S}\}$

  13. Monotonic Bound
      For a query object $q$, a set of objects $O_+$, and subspaces $S_1$, $S_2$ such that $S_1$ is an ancestor of $S_2$ in the subspace set enumeration tree using the standard deviation descending order in $O_+$, $L^*_{S_1}(q \mid O_+) \ge L_{S_2}(q \mid O_+)$.
      Baseline algorithm time complexity: $O(2^{|D|} \cdot (|O_+| + |O_-|))$
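
The exponential baseline cost comes from scoring every non-empty subspace against both classes. A minimal sketch of such a brute-force search follows; it does not use the monotonic bound for pruning, and the function names, the toy data, the single shared bandwidth `h`, and the threshold value `delta` are all illustrative assumptions rather than anything specified in the slides.

```python
import math
from itertools import combinations

def likelihood(q, objects, subspace, h):
    """Gaussian kernel density estimate of q in the given subspace (bandwidth h assumed)."""
    total = sum(
        math.exp(-sum((q[d] - o[d]) ** 2 for d in subspace) / (2 * h * h))
        for o in objects
    )
    return total / (len(objects) * math.sqrt(2 * math.pi) * h)

def best_contrast_subspace(q, pos, neg, h, delta):
    """Score all 2^|D| - 1 non-empty subspaces and return (subspace, contrast) of the best one."""
    best_subspace, best_lc = None, -1.0
    dims = range(len(q))
    for k in range(1, len(q) + 1):
        for subspace in combinations(dims, k):
            l_pos = likelihood(q, pos, subspace, h)
            if l_pos < delta:
                continue  # triviality filter: q must be reasonably likely under O+
            l_neg = likelihood(q, neg, subspace, h)
            lc = l_pos / l_neg if l_neg > 0 else math.inf
            if lc > best_lc:
                best_subspace, best_lc = subspace, lc
    return best_subspace, best_lc

# Hypothetical toy data: only dimension 0 separates the two classes around q.
pos = [(1.0, 5.0), (1.2, 0.0), (0.9, 9.0)]
neg = [(4.0, 5.0), (4.2, 5.0), (3.9, 5.0)]
q = (1.1, 5.0)

subspace, lc = best_contrast_subspace(q, pos, neg, h=0.5, delta=0.01)
print(subspace)
```

Each subspace costs one pass over both object sets, which is where the $(|O_+| + |O_-|)$ factor in the baseline complexity comes from.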

  14. Bounding Using Neighborhoods
      • Divide the neighborhood of an object into two parts: $N^\epsilon_S(q) = \{o \in O \mid dist_S(q, o) \le \epsilon\}$ and the rest
      • Then, $L_S(q \mid O) = L^{N^\epsilon}_S(q \mid O) + L^{rest}_S(q \mid O)$, where
        $$L^{N^\epsilon}_S(q \mid O) = \frac{1}{|O| \sqrt{2\pi}\, h_S} \sum_{o \in N^\epsilon_S(q)} e^{-\frac{dist_S(q,o)^2}{2 h_S^2}}$$
        $$L^{rest}_S(q \mid O) = \frac{1}{|O| \sqrt{2\pi}\, h_S} \sum_{o \in O \setminus N^\epsilon_S(q)} e^{-\frac{dist_S(q,o)^2}{2 h_S^2}}$$

  15. Bounding the Rest
      • Let $dist_S(q \mid O)$ be the maximum distance between $q$ and all objects in $O$ in subspace $S$
        $$\frac{|O| - |N^\epsilon_S(q)|}{|O| \sqrt{2\pi}\, h_S} \cdot e^{-\frac{dist_S(q \mid O)^2}{2 h_S^2}} \le L^{rest}_S(q \mid O) \le \frac{|O| - |N^\epsilon_S(q)|}{|O| \sqrt{2\pi}\, h_S} \cdot e^{-\frac{\epsilon^2}{2 h_S^2}}$$

  16. Bounding
      For a query object $q$, a set of objects $O$ and $\epsilon \ge 0$,
      $$LL^\epsilon_S(q \mid O) \le L_S(q \mid O) \le UL^\epsilon_S(q \mid O)$$
      where
      $$LL^\epsilon_S(q \mid O) = \frac{1}{|O| \sqrt{2\pi}\, h_S} \left( \sum_{o \in N^\epsilon_S(q)} e^{-\frac{dist_S(q,o)^2}{2 h_S^2}} + (|O| - |N^\epsilon_S(q)|)\, e^{-\frac{dist_S(q \mid O)^2}{2 h_S^2}} \right)$$
      and
      $$UL^\epsilon_S(q \mid O) = \frac{1}{|O| \sqrt{2\pi}\, h_S} \left( \sum_{o \in N^\epsilon_S(q)} e^{-\frac{dist_S(q,o)^2}{2 h_S^2}} + (|O| - |N^\epsilon_S(q)|)\, e^{-\frac{\epsilon^2}{2 h_S^2}} \right)$$
      For a query object $q$, a set of objects $O_+$, a set of objects $O_-$, and $\epsilon \ge 0$,
      $$LC_S(q) \le \frac{UL^\epsilon_S(q \mid O_+)}{LL^\epsilon_S(q \mid O_-)}$$
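
The neighborhood bounds can be checked numerically. The sketch below computes the lower bound, the exact density, and the upper bound for one query in one subspace; the function names, the one-dimensional toy data, and the values of `h` and `eps` are illustrative assumptions.

```python
import math

def dist(q, o, subspace):
    """Euclidean distance between q and o restricted to the dimensions in subspace."""
    return math.sqrt(sum((q[d] - o[d]) ** 2 for d in subspace))

def likelihood_bounds(q, objects, subspace, h, eps):
    """Return (LL, L, UL): lower bound, exact value, and upper bound of L_S(q | O)."""
    dists = [dist(q, o, subspace) for o in objects]
    near = [d for d in dists if d <= eps]   # distances inside the eps-neighborhood
    n_rest = len(objects) - len(near)       # number of objects outside the neighborhood
    max_dist = max(dists)                   # maximum distance from q to any object
    norm = len(objects) * math.sqrt(2 * math.pi) * h

    def kernel(d):
        return math.exp(-d * d / (2 * h * h))

    near_sum = sum(kernel(d) for d in near)
    lower = (near_sum + n_rest * kernel(max_dist)) / norm  # rest replaced by farthest object
    upper = (near_sum + n_rest * kernel(eps)) / norm       # rest replaced by distance eps
    exact = sum(kernel(d) for d in dists) / norm
    return lower, exact, upper

# Hypothetical one-dimensional data.
objects = [(0.0,), (1.0,), (2.0,), (5.0,)]
lower, exact, upper = likelihood_bounds((0.5,), objects, (0,), h=1.0, eps=1.0)
print(lower <= exact <= upper)
```

Only the objects inside the neighborhood need their kernel values computed for the bounds; the remaining objects contribute a single shared term, which is what makes the bounds cheaper than the exact sum.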

  17. Algorithm

  18. Dimensionality of Inlying Contrast Subspaces

  19. Dimensionality of Outlying Contrast Subspaces

  20. Runtime

  21. In Which Aspects Is Johnson Good?
      [Figure: scatter plots over the axes Points/game, Personal foul, and Assist, with the point labeled "Joe" marked in each panel]

  22. Fraud Investigation
      • Given a set of claims in an insurance company
      • For a claim c, in which aspects is c most different from the other claims?

  23. Outlying/Outstanding Aspect Mining
      • Given a set of objects in a multi-dimensional space
      • For an object q, find the subspaces where q is most unusual compared to the rest of the data

  24. Differences from Outlier Detection
      • Outlier detection finds objects that are different from the rest of the data
      • The query object in outlying aspect finding may not be an outlier

  25. Problem Formulation
      • A set of objects $O$ in full space $D = \{D_1, \ldots, D_d\}$
      • Query object $q$
      • The density of $q$ measures how outlying (uncommon) $q$ is
        – Density estimation
          $$\hat{f}_h(o) = \frac{1}{n} \sum_{i=1}^{n} K_h(o - o_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{o - o_i}{h}\right)$$
      • Find a subspace where the density of $q$ is lowest?

  26. Why Rank Statistics?
      • Densities in different subspaces are not comparable
      • We compare the same set of objects in different subspaces
      • Rank statistics
        $$rank_S(o) = |\{o' \mid o' \in O,\ OutDeg(o') < OutDeg(o)\}| + 1$$
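
The rank statistic is a direct count, as the short sketch below shows. It assumes that a lower outlyingness degree (for example, a lower density) means a more outlying object, so that rank 1 is the most outlying; the function name and the density values are hypothetical.

```python
def rank_in_subspace(scores, index):
    """Rank of object `index`: 1 + number of objects with a strictly smaller score.

    scores[i] is the outlyingness degree of object i in one fixed subspace
    (assumed here to be a density, where lower means more outlying).
    """
    return sum(1 for s in scores if s < scores[index]) + 1

# Hypothetical densities of four objects in one subspace.
densities = [0.40, 0.05, 0.40, 0.22]
ranks = [rank_in_subspace(densities, i) for i in range(len(densities))]
print(ranks)  # [3, 1, 3, 2]
```

Because the same objects are ranked in every subspace, ranks are comparable across subspaces even when the raw densities are not. Tied objects share a rank under this definition.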

  27. Unsupervised Problem Formulation
      Given a set of objects $O$ in a multidimensional space $D$, a query object $q \in O$ and a maximum dimensionality threshold $0 < \ell \le |D|$, a subspace $S \subseteq D$ ($0 < |S| \le \ell$) is called a minimal outlying subspace of $q$ if
      1. (Rank minimality) there does not exist another subspace $S' \subseteq D$ ($S' \ne \emptyset$), such that $rank_{S'}(q) < rank_S(q)$; and
      2. (Subspace minimality) there does not exist another subspace $S'' \subset S$ such that $rank_{S''}(q) = rank_S(q)$.
      The problem of outlying aspect mining is to find the minimal outlying subspaces of $q$.
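
The two minimality conditions can be illustrated with an exhaustive search over small subspaces. This is a sketch under several assumptions that are not in the slides: the function names are hypothetical, the outlyingness degree is an unnormalized Gaussian product-kernel density with fixed per-dimension bandwidths `h`, a lower density means more outlying, and rank minimality is only checked against subspaces of at most `max_dim` dimensions.

```python
import math
from itertools import combinations

def quasi_density(q, objects, subspace, h):
    """Unnormalized product-kernel density of q in the subspace (h[d] is the bandwidth of dimension d)."""
    return sum(
        math.exp(-sum((q[d] - o[d]) ** 2 / (2 * h[d] ** 2) for d in subspace))
        for o in objects
    )

def rank_of(q, objects, subspace, h):
    """1 + number of objects whose density in the subspace is strictly lower than q's."""
    fq = quasi_density(q, objects, subspace, h)
    return sum(1 for o in objects if quasi_density(o, objects, subspace, h) < fq) + 1

def minimal_outlying_subspaces(q, objects, h, max_dim):
    """Subspaces with the lowest rank of q that have no proper subset reaching the same rank."""
    dims = range(len(q))
    ranks = {
        subspace: rank_of(q, objects, subspace, h)
        for k in range(1, max_dim + 1)
        for subspace in combinations(dims, k)
    }
    best = min(ranks.values())                                    # rank minimality
    winners = [set(s) for s, r in ranks.items() if r == best]
    minimal = [s for s in winners if not any(t < s for t in winners)]  # subspace minimality
    return sorted(tuple(sorted(s)) for s in minimal)

# Hypothetical data: the last object is unusual only in dimension 2.
objects = [(1.0, 1.0, 1.0), (1.1, 0.9, 1.2), (0.9, 1.1, 0.8), (1.0, 1.0, 1.1), (1.05, 0.95, 5.0)]
q = objects[4]
result = minimal_outlying_subspaces(q, objects, h=(0.5, 0.5, 0.5), max_dim=2)
print(result)  # [(2,)]
```

In this example q also has rank 1 in the subspaces {0, 2} and {1, 2}, but both contain {2}, which already reaches that rank, so only {2} is reported.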

  28. Density Estimation for Ranking
      $$\hat{f}_S(q) \sim \tilde{f}_S(q) = \sum_{o \in O} e^{-\sum_{D_i \in S} \frac{(q.D_i - o.D_i)^2}{2 h_{D_i}^2}}$$
      • Invariance
        Given a set of objects $O$ in space $S = \{D_1, \ldots, D_d\}$, define a linear transformation $g(o) = (a_1\, o.D_1 + b_1, \ldots, a_d\, o.D_d + b_d)$ for any $o \in O$, where $a_1, \ldots, a_d$ and $b_1, \ldots, b_d$ are real numbers. Let $O' = \{g(o) \mid o \in O\}$ be the transformed data set. For any objects $o_1, o_2 \in O$ such that $\tilde{f}_S(o_1) > \tilde{f}_S(o_2)$ in $O$, $\tilde{f}_S(g(o_1)) > \tilde{f}_S(g(o_2))$ if the product kernel is used and the bandwidths are set using Härdle's rule of thumb
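
The invariance property can be checked numerically on a small example. The sketch below assumes that the rule of thumb takes the commonly stated form $h = 1.06 \cdot \min\{\sigma, R/1.34\} \cdot n^{-1/5}$, with $R$ the interquartile range; the slides do not spell the rule out, so this form, the function names, the toy data, and the positive scaling factors are all assumptions.

```python
import math
import statistics

def rule_of_thumb_bandwidth(values):
    """Assumed form of the rule of thumb: 1.06 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(values)
    sigma = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 1.06 * min(sigma, (q3 - q1) / 1.34) * n ** (-1 / 5)

def quasi_densities(objects):
    """Unnormalized product-kernel density of every object, one bandwidth per dimension."""
    dims = range(len(objects[0]))
    h = [rule_of_thumb_bandwidth([o[d] for o in objects]) for d in dims]
    return [
        sum(
            math.exp(-sum((q[d] - o[d]) ** 2 / (2 * h[d] ** 2) for d in dims))
            for o in objects
        )
        for q in objects
    ]

def transform(o, a, b):
    """Per-dimension linear transformation g(o) = (a_1 * o.D_1 + b_1, ..., a_d * o.D_d + b_d)."""
    return tuple(a_i * x + b_i for a_i, b_i, x in zip(a, b, o))

# Hypothetical two-dimensional data set.
objects = [(1.0, 10.0), (1.5, 12.0), (2.0, 11.0), (2.2, 30.0), (6.0, 13.0), (1.8, 12.5)]
before = quasi_densities(objects)
after = quasi_densities([transform(o, (3.0, 0.5), (-7.0, 100.0)) for o in objects])

order_before = sorted(range(len(objects)), key=lambda i: before[i])
order_after = sorted(range(len(objects)), key=lambda i: after[i])
print(order_before == order_after)
```

The ordering is preserved because both the standard deviation and the interquartile range scale with the factor $a_i$, so each bandwidth scales by the same factor as the coordinate differences and the exponent is unchanged.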

  29. Algorithm Framework
