Jointly Modeling Relevance and Sensitivity for Search Among Sensitive Content


  1. Jointly Modeling Relevance and Sensitivity for Search Among Sensitive Content. Mahmoud F. Sayed, Douglas W. Oard

  2. Image credit: HITEC Dubai

  3. 10,045 FOIA requests; ~30k work-related emails

  4. E-Discovery (Requesting Party vs. Responding Party)
     1. Formulation → 2. Acquisition → 3. Review for Relevance → 4. Review for Privilege → 5. Analysis
     Review: ~75% of total cost, ~1 month

  5. Motivation
     ● Review is expensive ○ Hiring law firms
     ● Review is time-consuming ○ Long elapsed time between a request and its response ○ Not effective access to information
     ● Objective is to build “Search and Protection Engines” ○ Protect sensitive content (Automatic Sensitivity Classification) ○ Still retrieve relevant content (Learning to Rank) ○ Affordable ○ Fast

  6. Proposed Approaches (pipeline diagrams)
     ● Prefilter: Documents → Sensitivity Classifier (Filter) → Ranker (+ Query) → Result
     ● Postfilter: Documents → Ranker (+ Query) → Sensitivity Classifier (Filter) → Result
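The two filtering pipelines can be sketched as plain functions. This is a minimal illustration, not the authors' implementation; `rank` and `is_sensitive` are hypothetical stand-ins for the actual ranker and sensitivity classifier:

```python
def prefilter_search(query, docs, rank, is_sensitive):
    """Prefilter: remove documents the classifier flags as sensitive,
    then rank only what remains."""
    safe = [d for d in docs if not is_sensitive(d)]
    return rank(query, safe)

def postfilter_search(query, docs, rank, is_sensitive):
    """Postfilter: rank everything first, then drop flagged documents
    from the ranked result list."""
    return [d for d in rank(query, docs) if not is_sensitive(d)]
```

Prefiltering protects sensitive content before ranking ever sees it; postfiltering lets the ranker score everything and scrubs the output.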

  7. How to evaluate such approaches?

  8. Discounted Cumulative Gain (DCG)
     Gain: Retrieved = +3 (Highly Relevant), +1 (Somewhat Relevant), 0 (Not Relevant); Not Retrieved = 0
     Example ranking containing Highly Relevant, Somewhat Relevant, and Not Relevant documents: DCG@5 = 5.7
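The gain table above can be turned into a small DCG routine. A sketch only: the slide does not show the discount function, so the common log2(rank + 1) discount is assumed, and the example ranking here is made up rather than the slide's:

```python
import math

# Gains from the slide: highly relevant -> +3, somewhat relevant -> +1,
# not relevant -> 0; unretrieved documents contribute nothing.
GAIN = {"highly": 3, "somewhat": 1, "not": 0}

def dcg_at_k(labels, k):
    """DCG@k over a ranked list of relevance labels, with the
    standard log2(rank + 1) discount (an assumption here)."""
    return sum(GAIN[lab] / math.log2(rank + 1)
               for rank, lab in enumerate(labels[:k], start=1))

ranking = ["highly", "somewhat", "not"]
print(round(dcg_at_k(ranking, 5), 2))  # -> 3.63
```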

  9. Cost-Sensitive DCG (CS-DCG)
     Relevance gain: Retrieved = +3 / +1 / 0 (Highly / Somewhat / Not Relevant); Not Retrieved = 0
     Sensitivity cost: Retrieved Sensitive = -10; otherwise 0
     Example: a ranking with only Highly and Somewhat Relevant documents keeps CS-DCG@5 = 5.7; a ranking that also retrieves a Sensitive document (plus one Neither Relevant nor Sensitive) drops to CS-DCG@5 = -4.3
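CS-DCG extends DCG by adding the sensitivity cost to the per-document gain. Again a sketch: whether the paper discounts the -10 penalty by rank, as done below, is an assumption:

```python
import math

REL_GAIN = {"highly": 3, "somewhat": 1, "not": 0}
SENS_PENALTY = -10  # cost of retrieving a sensitive document, per the slide

def cs_dcg_at_k(docs, k):
    """CS-DCG@k; each doc is a (relevance_label, is_sensitive) pair.
    Relevance gain and sensitivity penalty are summed per document
    and discounted together (discounting the penalty is an assumption)."""
    total = 0.0
    for rank, (rel, sensitive) in enumerate(docs[:k], start=1):
        gain = REL_GAIN[rel] + (SENS_PENALTY if sensitive else 0)
        total += gain / math.log2(rank + 1)
    return total
```

With this definition a single retrieved sensitive document can push a query's whole score negative, which is exactly the behavior the slide's -4.3 example illustrates.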

  10. Normalized CS-DCG (nCS-DCG)
      nCS-DCG = (CS-DCG - CS-DCG_worst) / (CS-DCG_best - CS-DCG_worst), computed per query from the worst and best achievable rankings of its judged documents
      Example: with CS-DCG_worst = -19.8 and CS-DCG_best = 5.95, the ranking with CS-DCG@5 = -4.3 normalizes to nCS-DCG@5 = 0.60; the slide's other ranking (CS-DCG@5 = 5.7) gives nCS-DCG@5 = 0.71
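The normalization is a per-query min-max scaling between the worst and best achievable rankings; with the slide's numbers it reproduces the 0.60 value:

```python
def ncs_dcg(cs_dcg, cs_dcg_worst, cs_dcg_best):
    """Min-max normalize a CS-DCG score into [0, 1] using the worst
    and best achievable rankings of the query's judged documents."""
    return (cs_dcg - cs_dcg_worst) / (cs_dcg_best - cs_dcg_worst)

# Values from the slide: worst = -19.8, best = 5.95, CS-DCG@5 = -4.3
print(round(ncs_dcg(-4.3, -19.8, 5.95), 2))  # -> 0.6
```

Because the bounds depend on each query's own judged documents, nCS-DCG values are comparable across queries in a way raw CS-DCG values are not.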

  11. Experiments

  12. LETOR OHSUMED Test Collection
      ● 348,566 medical publications ○ Fields: title, abstract, Medical Subject Headings (MeSH), etc. ○ 14,430 (with relevance judgments) for evaluation ○ 334,136 for sensitivity classifier training
      ● 106 queries (~150 relevance judgments per query) ○ 3 levels: (2) Highly Relevant, (1) Somewhat Relevant, (0) Not Relevant
      ● Simulating “sensitivity” ○ 2 MeSH labels (out of 118) represent sensitive content ■ Male Urogenital Diseases [C12] ■ Female Urogenital Diseases and Pregnancy Complications [C13] ○ 12.2% of judged documents are sensitive

  13. Sensitivity is Topic-Dependent (per-topic chart: hard topics vs. easy topics)

  14. nCS-DCG@10 Comparison (chart)

  15. Proposed Approaches (extended)
      ● Prefilter: Documents → Sensitivity Classifier (Filter) → Ranker (+ Query) → Result
      ● Postfilter: Documents → Ranker (+ Query) → Sensitivity Classifier (Filter) → Result
      ● Joint: Documents (+ Sensitivity Classifier) → Ranker (+ Query) → Result, with a listwise LtR ranker optimizing nCS-DCG
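One way to optimize a listwise, non-differentiable metric such as nCS-DCG is coordinate ascent over linear feature weights, as in the Coordinate Ascent runs reported later in the deck. A minimal sketch under stated assumptions, not the authors' implementation: `judge` is a hypothetical per-query metric callback (e.g. nCS-DCG@10), and the step schedule is arbitrary:

```python
import itertools

def coordinate_ascent(queries, n_features, steps=(-0.5, 0.5), passes=5):
    """Greedy coordinate ascent over linear ranker weights, directly
    optimizing a listwise metric averaged over queries.

    `queries` is a list of (docs, judge) pairs: each doc carries a
    feature vector, and judge(ranked_docs) returns the metric value
    (e.g. nCS-DCG@10) for that query's ranking.
    """
    w = [1.0] * n_features

    def mean_metric(weights):
        total = 0.0
        for docs, judge in queries:
            ranked = sorted(
                docs,
                key=lambda d: -sum(x * wi for x, wi in zip(d["features"], weights)),
            )
            total += judge(ranked)
        return total / len(queries)

    best = mean_metric(w)
    for _ in range(passes):
        # Try perturbing one weight at a time; keep strict improvements.
        for i, step in itertools.product(range(n_features), steps):
            trial = list(w)
            trial[i] += step
            score = mean_metric(trial)
            if score > best:
                w, best = trial, score
    return w, best
```

Because the metric is evaluated on whole ranked lists, this style of training can trade a small relevance loss for a large sensitivity-penalty gain, which gradient-based pointwise objectives cannot express directly.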

  16. nCS-DCG@10 Comparison, Listwise LtR (chart)

  17. CS-DCG@10 Comparison (chart; labels: 20.7%, 44.3%, 27.3%, 25.4%)
      Can we reduce the number of queries with negative CS-DCG scores?

  18. Cluster-Based Replacement (CBR)
      ● Similar to diversity ranking
      ○ Retrieved documents are clustered (20 clusters, using repeated bisection)
      ○ Any potentially sensitive document in the result list is replaced with a less sensitive document from the same cluster
      ● Queries with negative CS-DCG: reduced from 20.7% to 11%
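The replacement step can be sketched as follows. Illustrative only: the `threshold` and the pick-the-least-sensitive policy are assumptions, and the clustering itself (e.g. repeated bisection into 20 clusters) is assumed to be done offline:

```python
def cluster_based_replacement(ranked, clusters, sensitivity, threshold=0.5):
    """Replace each likely-sensitive document in the result list with
    the least sensitive unused document from the same cluster.

    `clusters` maps doc -> cluster id; `sensitivity` maps doc -> the
    classifier's sensitivity score in [0, 1].
    """
    used = set(ranked)
    by_cluster = {}
    for d, c in clusters.items():
        by_cluster.setdefault(c, []).append(d)

    out = []
    for d in ranked:
        if sensitivity[d] < threshold:
            out.append(d)
            continue
        # Candidate replacements: same cluster, not already shown, safer.
        candidates = [x for x in by_cluster[clusters[d]]
                      if x not in used and sensitivity[x] < sensitivity[d]]
        if candidates:
            repl = min(candidates, key=lambda x: sensitivity[x])
            used.add(repl)
            out.append(repl)
        else:
            out.append(d)  # nothing safer available; keep the original
    return out
```

Swapping within a cluster keeps the replacement topically close to the document it displaces, which is why the slide likens CBR to diversity ranking.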

  19. CBR Adversely Affects nCS-DCG (nCS-DCG@10, unclustered / clustered)
                     No filter        Prefilter         Postfilter        Joint
      BM25           0.727 / 0.779*   0.800 / 0.797     0.800 / 0.797     0.727 / 0.779*
      Linear reg.    0.761 / 0.764    0.811* / 0.785    0.817* / 0.785    0.727 / 0.790*
      LambdaMART     0.765 / 0.771    0.812* / 0.788    0.823* / 0.792    0.753 / 0.786*
      AdaRank        0.756 / 0.779    0.822* / 0.792    0.817* / 0.791    0.823* / 0.799
      Coor. Ascent   0.762 / 0.781    0.816* / 0.791    0.818* / 0.790    0.842* / 0.805
      * Indicates two-tailed t-test with p < 0.05

  20. Conclusion
      ● Proposed CS-DCG and nCS-DCG to balance relevance against sensitivity
      ● The joint modeling approach yields better performance than the straightforward prefilter and postfilter approaches
      ● Cluster-based replacement can reduce the number of queries with negative CS-DCG scores

  21. Next Steps ● Train a sensitivity classifier with fewer examples ● Build test collections with real sensitivities ● Experiment with tri-state classification ○ Sensitive ○ Needs human review ○ Not Sensitive 21

  22. Data and code can be found at https://github.com/mfayoub/SASC. Thanks! Mahmoud F. Sayed, mfayoub@cs.umd.edu
