Bayesian Counterfactual Risk Minimization

Bayesian Counterfactual Risk Minimization, Ben London (blondon@)

Bayesian Counterfactual Risk Minimization. Ben London (blondon@), Amazon Music; Ted Sandler (sandler@), Amazon Music. International Conference on Machine Learning, Long Beach, CA, June 11, 2019.


  1. Bayesian Counterfactual Risk Minimization. Ben London (blondon@), Amazon Music; Ted Sandler (sandler@), Amazon Music. International Conference on Machine Learning, Long Beach, CA, June 11, 2019.

  2. Learning from Logged Data. The workflow is a loop: pull log data (e.g., user i listened to item j) → design/train a new recommendation policy → A/B test the new policy vs. the old policy → if successful, launch the new policy; if unsuccessful, go back to design/train.

  3. Problem 1: Bandit Feedback
     • Only observe outcomes from actions taken
     • e.g., only get feedback on recommendations
     Example exchange: "Alexa, play music" / "Here’s a station you might like …"

  4. Problem 2: Bias
     • Logged data is biased
     • Policy is typically not a uniform distribution
     • User typically doesn’t see everything
     • Bias affects inferences
     • Self-fulfilling prophecies; “rich get richer”
     • Miss key insights due to insufficient support
     Figure annotations: high support → better estimate; low support → who knows?

  5. IPS Policy Optimization
     • Use inverse propensity score (IPS) estimator, with logged propensity $p_i = \pi_0(a_i \mid x_i)$:
       $$\arg\min_{\pi} \; -\frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi(a_i \mid x_i)}{p_i}$$
     • IPS is an unbiased estimator of expected reward:
       $$\mathbb{E}_{(x, \rho) \sim D} \, \mathbb{E}_{a \sim \pi(x)} \left[ \rho(x, a) \right] \approx \frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi(a_i \mid x_i)}{p_i}$$
     • Caveat: logging policy must have full support
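The IPS estimator on this slide can be tried out on synthetic data. The following is a minimal sketch, not the authors' code: it assumes a context-free problem with three actions, and the reward probabilities, the logging policy `LOGGING` (playing the role of $\pi_0$), and the target policy `TARGET` (playing the role of $\pi$) are all made-up numbers. The helper names `ips_estimate` and `simulate_log` are hypothetical.

```python
import random

def ips_estimate(rewards, target_probs, propensities):
    """IPS estimate of expected reward: (1/n) * sum_i r_i * pi(a_i|x_i) / p_i."""
    terms = [r * q / p for r, q, p in zip(rewards, target_probs, propensities)]
    return sum(terms) / len(terms)

# Hypothetical context-free problem with three actions (all numbers are made up).
TRUE_REWARD = [0.2, 0.5, 0.9]  # probability of reward 1 for each action
LOGGING = [0.5, 0.3, 0.2]      # pi_0: logging policy, has full support
TARGET = [0.1, 0.2, 0.7]       # pi: policy we want to evaluate offline

def simulate_log(n, rng):
    """Draw n logged records (r_i, pi(a_i), p_i) by acting with the logging policy."""
    rewards, target_probs, propensities = [], [], []
    for _ in range(n):
        a = rng.choices(range(3), weights=LOGGING)[0]
        rewards.append(1.0 if rng.random() < TRUE_REWARD[a] else 0.0)
        target_probs.append(TARGET[a])
        propensities.append(LOGGING[a])
    return rewards, target_probs, propensities

rng = random.Random(0)
rewards, target_probs, propensities = simulate_log(100_000, rng)

estimate = ips_estimate(rewards, target_probs, propensities)
naive = sum(rewards) / len(rewards)  # plain average: reflects pi_0, not pi
true_value = sum(q * m for q, m in zip(TARGET, TRUE_REWARD))  # expected reward under pi

print(f"true value under pi: {true_value:.3f}")
print(f"IPS estimate:        {estimate:.3f}")
print(f"naive log average:   {naive:.3f}")
```

In this setup the plain average of logged rewards tracks the logging policy's value, while the reweighted estimate tracks the target policy's value. Policy optimization as written on the slide would minimize the negative of `ips_estimate` over a policy class.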

  6. IPS Policy Optimization
     • Use inverse propensity score (IPS) estimator, with logged propensity $p_i = \pi_0(a_i \mid x_i)$:
       $$\arg\min_{\pi} \; -\frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi(a_i \mid x_i)}{p_i}$$
     • Problem: IPS has high variance
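To see where the high variance comes from, one can compute the variance of a single IPS term $r \, \pi(a)/\pi_0(a)$ in closed form for a toy problem. This is a sketch under stated assumptions: a context-free problem with binary rewards, and made-up reward probabilities and policies; `ips_term_variance` is a hypothetical helper, not something from the talk.

```python
def ips_term_variance(true_reward, logging, target):
    """Variance of one IPS term r * pi(a) / pi_0(a).

    Assumes a ~ pi_0 and r ~ Bernoulli(true_reward[a]), so r**2 == r and
    E[(r * w)^2] = sum_a pi_0(a) * true_reward[a] * (pi(a) / pi_0(a))^2.
    """
    mean = sum(q * m for q, m in zip(target, true_reward))
    second_moment = sum(m * q * q / p for m, q, p in zip(true_reward, target, logging))
    return second_moment - mean ** 2

TRUE_REWARD = [0.2, 0.5, 0.9]         # made-up reward probabilities
TARGET = [0.1, 0.2, 0.7]              # pi: favors the third action
WELL_SUPPORTED = [0.5, 0.3, 0.2]      # pi_0 that tries every action fairly often
POORLY_SUPPORTED = [0.898, 0.1, 0.002]  # pi_0 that almost never tries the third action

var_on_policy = ips_term_variance(TRUE_REWARD, TARGET, TARGET)
var_well = ips_term_variance(TRUE_REWARD, WELL_SUPPORTED, TARGET)
var_poor = ips_term_variance(TRUE_REWARD, POORLY_SUPPORTED, TARGET)

print(f"logging == target:     {var_on_policy:.4f}")
print(f"well-supported pi_0:   {var_well:.4f}")
print(f"poorly supported pi_0: {var_poor:.4f}")
```

Both logging policies here have full support, so IPS stays unbiased in either case. The difference is that a tiny propensity on an action the target policy favors produces a very large importance weight, and the squared weight dominates the variance.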

  7. CRM Principle
     • Counterfactual Risk Minimization (CRM) principle:
       $$\arg\min_{\pi} \; -\frac{1}{n} \sum_{i=1}^{n} r_i \, \frac{\pi(a_i \mid x_i)}{p_i} + \lambda \sqrt{\widehat{\mathrm{Var}}(\pi, S)}$$
       where the $\lambda$ term is labeled "variance regularization"
     • Motivated by PAC risk analysis
     • Stochastic optimization of variance regularizer is tricky
     • Policy optimization for exponential models (POEM) algorithm [Swaminathan & Joachims, ICML 2015]
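The effect of the variance term can be illustrated by scoring two candidate policies on a tiny hand-made log. This is only a sketch: it is not the POEM algorithm, it evaluates the objective rather than optimizing it, and it assumes that $\widehat{\mathrm{Var}}(\pi, S)$ is the unbiased sample variance of the per-record IPS loss terms (the slide does not spell out the normalization). The log, the candidate policies, and the name `crm_objective` are all hypothetical.

```python
import math

def crm_objective(log, policy, lam):
    """IPS loss plus lam * sqrt(sample variance of the per-record loss terms).

    log    : list of (action, reward, logged propensity) records
    policy : list of action probabilities pi(a) (context-free for simplicity)
    lam    : regularization strength lambda
    """
    terms = [-r * policy[a] / p for a, r, p in log]
    n = len(terms)
    mean = sum(terms) / n
    variance = sum((t - mean) ** 2 for t in terms) / (n - 1)  # assumed estimator
    return mean + lam * math.sqrt(variance)

# Made-up log collected under the logging policy pi_0 = [0.5, 0.3, 0.2].
LOG = [(0, 1.0, 0.5), (1, 0.0, 0.3), (2, 1.0, 0.2), (0, 1.0, 0.5)]

CONSERVATIVE = [0.5, 0.3, 0.2]  # stays at the logging policy
AGGRESSIVE = [0.1, 0.1, 0.8]    # bets heavily on the rarely logged action

for lam in (0.0, 1.0):
    cons = crm_objective(LOG, CONSERVATIVE, lam)
    aggr = crm_objective(LOG, AGGRESSIVE, lam)
    print(f"lambda={lam}: conservative={cons:.4f}, aggressive={aggr:.4f}")
```

With `lam=0.0` the objective is plain IPS and the aggressive policy scores lower (better), driven by a single record with a large importance weight. With `lam=1.0` the variance penalty reverses the ranking, which is the behavior the regularizer is meant to encourage when support is thin.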
