graphing crumbling cookies



  1. Graphing Crumbling Cookies (AdKDD 2019)
     Matt Malloy, Jon Koller and Aaron Cahn

  2. What is a device graph?
     • a dataset that organizes the digital identifiers we create as we use the internet
     • identifiers (IDs): browser cookies or advertising IDs
     • a graph is a set of vertices and edges; here, a list of pairs of identifiers that are related in some way:

         id_1     id_2     score
         3D0F8F   54D3A8   3.936
         7F3E10   6FFE0A   1.400
         8764CF   10AFC8   3.440
         501EE5   62A1F3   3.045
         1F39D3   4B2686   4.763
         638581   85B16    1.917

     • related: same person or same household
     • example: two digital IDs that log in with the same email address
       [Figure: two devices, each signed in as bobfano@gmail.com]
     • Why? Targeting, content customization and accurate measurement
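As a rough sketch of the data structure (not Comscore's implementation), the sample rows from the slide's table can be loaded into an adjacency map so each ID's related IDs and scores can be looked up directly. Treating every pair as an undirected edge is an assumption, and `build_adjacency` and `EDGES` are hypothetical names.

```python
from collections import defaultdict

# Sample rows from the slide: (id_1, id_2, score).
# "85B16" is reproduced exactly as it appears on the slide.
EDGES = [
    ("3D0F8F", "54D3A8", 3.936),
    ("7F3E10", "6FFE0A", 1.400),
    ("8764CF", "10AFC8", 3.440),
    ("501EE5", "62A1F3", 3.045),
    ("1F39D3", "4B2686", 4.763),
    ("638581", "85B16", 1.917),
]


def build_adjacency(edges):
    """Turn a weighted edge list into {id: {neighbor_id: score}}.

    ASSUMPTION: edges are undirected, so each pair is stored in both directions.
    """
    adj = defaultdict(dict)
    for a, b, score in edges:
        adj[a][b] = score
        adj[b][a] = score
    return dict(adj)


graph = build_adjacency(EDGES)
print(len(graph))       # 12 distinct identifiers
print(graph["54D3A8"])  # {'3D0F8F': 3.936}
```

A dict of dicts keeps neighbor lookups cheap for a toy example; at billions of edges a real system would need a distributed store.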

  3. Building a graph using IP colocation
     • IP space is intimate
     • your devices share an IP when connected to the same WiFi router
     • you share an IP with family, friends and co-workers …
     • ideal world: static residential IPs
     • problem: IPs are dynamic; mobile-operator and corporate IPs; coffee shops
     • observation: even when IPs change, devices travel through IP space together over the course of weeks
     • basic idea: associate devices with each other, not with an IP
       [Figure: devices moving together across IP 1, IP 2, …, IP n]
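A minimal sketch of the co-travel observation, assuming device sightings arrive as hypothetical `(device_id, ip, day)` records: group devices by `(ip, day)` and, for each pair, collect the distinct IPs on which the two were seen together. A pair that keeps appearing together after its IP changes is linked to each other rather than to any single IP. The record format, the `shared_ips` helper and the toy data are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sightings: (device_id, ip, day).
OBSERVATIONS = [
    ("phone", "ip_home_a", 1), ("laptop", "ip_home_a", 1),
    ("phone", "ip_home_b", 9), ("laptop", "ip_home_b", 9),  # home IP reassigned
    ("phone", "ip_cafe", 4), ("stranger", "ip_cafe", 4),
]


def shared_ips(observations):
    """Return {(dev_a, dev_b): set of distinct IPs where the pair co-occurred}."""
    by_slot = defaultdict(set)  # (ip, day) -> devices seen there that day
    for device, ip, day in observations:
        by_slot[(ip, day)].add(device)

    pairs = defaultdict(set)
    for (ip, _day), devices in by_slot.items():
        # sorted() gives each unordered pair one canonical key
        for a, b in combinations(sorted(devices), 2):
            pairs[(a, b)].add(ip)
    return dict(pairs)


co = shared_ips(OBSERVATIONS)
for pair, ips in sorted(co.items()):
    print(pair, len(ips))
# ('laptop', 'phone') 2
# ('phone', 'stranger') 1
```

The phone and laptop stay linked across the IP change, while the coffee-shop stranger shares only one IP with the phone.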

  4. Building a graph
     [Figure: an iPhone's IPs over four days (IP 1, IP 2, IP 3), with per-day weights labeled ½, 1 and ⅛]
     • day 1: iPhone is home with PC
     • day 2: iPhone is home alone
     • day 3: iPhone is at work with 8 devices
     • day 4: iPhone is at home with PC
     • score proportional to the number of days two devices spend alone on an IP
     Malloy, M., Barford, P., Alp, E. C., Koller, J., & Jewell, A. (2017, August). Internet Device Graphs. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1913-1921). ACM.
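The slide states only that the score is proportional to the number of days two devices spend alone on an IP; the figure's fractional labels (⅛ for the day shared with 8 devices) suggest that crowded IPs count for less. The sketch below assumes each shared `(ip, day)` contributes 1/(n - 1), where n is the number of devices on that IP that day. That reproduces 1 for a day alone with the PC and 1/8 for the office day. The weighting, the IP label assigned to each day and the `colocation_scores` name are assumptions, not the published scoring function.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def colocation_scores(observations):
    """Score device pairs from (device_id, ip, day) sightings.

    ASSUMED weighting: every (ip, day) a pair shares adds 1 / (n - 1), where n is
    the number of devices seen on that ip that day. Two devices alone add 1;
    a day shared with 8 other devices adds 1/8.
    """
    by_slot = defaultdict(set)  # (ip, day) -> devices seen there that day
    for device, ip, day in observations:
        by_slot[(ip, day)].add(device)

    scores = defaultdict(Fraction)  # Fraction() == 0; keeps weights exact
    for devices in by_slot.values():
        n = len(devices)
        if n < 2:
            continue  # a device alone on an IP creates no edge
        weight = Fraction(1, n - 1)
        for a, b in combinations(sorted(devices), 2):
            scores[(a, b)] += weight
    return dict(scores)


# The four days from the slide; the IP label for each day is assumed.
coworkers = [f"dev{i}" for i in range(1, 9)]
obs = [("iPhone", "IP1", 1), ("PC", "IP1", 1),   # day 1: home with PC
       ("iPhone", "IP1", 2),                     # day 2: home alone
       ("iPhone", "IP2", 3)]                     # day 3: at work ...
obs += [(d, "IP2", 3) for d in coworkers]        # ... with 8 other devices
obs += [("iPhone", "IP3", 4), ("PC", "IP3", 4)]  # day 4: home with PC, new IP

scores = colocation_scores(obs)
print(scores[("PC", "iPhone")])    # 2
print(scores[("dev1", "iPhone")])  # 1/8
```

Because the pair is keyed by devices rather than IPs, the iPhone-PC edge keeps accumulating weight even though the home IP changed between day 1 and day 4.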

  5. Comscore's Device Graph
     • 12 countries
     • 3.4 billion nodes (cookies/advertising IDs)
     • 17.1 billion edges (relationships)
     Comparison with benchmark graphs*:

         Graph                                  Nodes    Edges
         LiveJournal                            4.8M     69M
         Twitter                                42M      1.5B
         UK web graph 2007                      109M     3.7B
         Yahoo Web                              1.4B     6.6B
         Comscore's Device Graph (April 2019)   3.4B     17.1B
         Facebook Graph 2016                    1.39B    400B

     *Adapted from: Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., & Muthukrishnan, S. (2015). One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 8(12), 1804-1815.
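To put the comparison table's sizes in perspective, average degree (2 × edges / nodes) can be computed from the listed figures. This treats every edge as undirected, which is a simplification for directed graphs such as Twitter, and it is plain arithmetic on the table's rounded numbers, not a statistic reported in the talk.

```python
# (nodes, edges) as listed in the comparison table.
GRAPHS = {
    "LiveJournal": (4.8e6, 69e6),
    "Twitter": (42e6, 1.5e9),
    "UK web graph 2007": (109e6, 3.7e9),
    "Yahoo Web": (1.4e9, 6.6e9),
    "Comscore's Device Graph (April 2019)": (3.4e9, 17.1e9),
    "Facebook Graph 2016": (1.39e9, 400e9),
}


def average_degree(nodes, edges):
    """Mean edges per node, counting each (undirected) edge at both endpoints."""
    return 2 * edges / nodes


for name, (n, e) in GRAPHS.items():
    print(f"{name:38s} {average_degree(n, e):8.1f}")
```

With these inputs the device graph averages about 10 edges per node versus roughly 576 for the Facebook graph: it has more nodes than any other graph in the table but is comparatively sparse.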

  6. Community Detection
     [Figure: finding community structure; identifiers grouped into households HH 1 to HH 5]
     • goal: group identifiers into cohorts (person- and household-level groupings)
     • community detection in graphs is a well-studied problem
     • literature and code exist for finding community structure, but not at the scale of billions of nodes and edges
     • Louvain modularity*
     *Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
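Louvain (Blondel et al.) greedily moves nodes between communities to increase modularity Q. The sketch below does not implement Louvain; it only computes Q for a given partition of a small weighted graph, which is the quantity Louvain optimizes. The toy graph of two hypothetical three-device households joined by one link and the `modularity` helper are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: (u, v, weight) tuples, each edge listed once, no self-loops.
    community: dict mapping node -> community label.
    Q = sum over communities c of  w_in(c) / m - (deg(c) / (2m))**2,
    with m the total edge weight, w_in(c) the weight of edges inside c,
    and deg(c) the summed weighted degree of the nodes in c.
    """
    m = 0.0
    w_in = defaultdict(float)
    deg = defaultdict(float)
    for u, v, w in edges:
        m += w
        deg[community[u]] += w
        deg[community[v]] += w
        if community[u] == community[v]:
            w_in[community[u]] += w
    return sum(w_in[c] / m - (deg[c] / (2 * m)) ** 2 for c in deg)


# Hypothetical toy graph: two three-device households joined by one link.
TOY = [("a", "b", 1), ("b", "c", 1), ("c", "a", 1),
       ("d", "e", 1), ("e", "f", 1), ("f", "d", 1),
       ("c", "d", 1)]
households = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
everyone = {node: 0 for node in households}

print(round(modularity(TOY, households), 3))  # 0.357
print(round(modularity(TOY, everyone), 3))    # 0.0
```

Splitting the graph into the two households scores higher than lumping every device together, which is the signal a modularity-maximizing method such as Louvain climbs.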

  7. Challenge: non-persistent IDs
     • 3.4 billion persistent IDs (in 12 countries)
     • 5-10x more non-persistent IDs
       • excluded from the graphing process
       • incognito/private browsing (session cookies)
       • ITP (Intelligent Tracking Prevention)
     • 20+ billion IDs worldwide not amenable to graphing or community detection

  8. Backfilling
     Key ideas:
     • Once cohorts of persistent IDs are defined, find the IP addresses that are associated with each cohort over time:
       $C_1 \to \{(\mathrm{IP}_1, \mathrm{day}_1), (\mathrm{IP}_2, \mathrm{day}_2), \dots\}$
     • Ruleset: if the persistent IDs observed on an IP address are essentially the same group as the cohort, assign that IP's non-persistent IDs to the cohort, where $V_p$ is the set of persistent IDs:
       if $\{ i : i \in (\mathrm{IP}_1, \mathrm{day}_1) \} \cap V_p \approx C_1$, then $C_1^{+} = \{ i : i \in (\mathrm{IP}_1, \mathrm{day}_1) \} \cup C_1$
     • Precision and recall are used to define approximate equality ($\approx$)
     • Result: an additional 2+ billion IDs assigned to cohorts in the US
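A minimal sketch of the backfilling rule, with its assumptions labeled: approximate equality is taken to mean that the persistent IDs on an `(ip, day)` slot reach a precision and a recall of at least 0.8 against the cohort (the talk gives neither thresholds nor exact definitions), and every ID on a matching slot, non-persistent ones included, joins the extended cohort C1+. The helper names and toy data are hypothetical.

```python
def approx_equal(observed, cohort, min_precision=0.8, min_recall=0.8):
    """ASSUMED test for 'observed persistent IDs ≈ cohort'.

    precision: share of the slot's persistent IDs that belong to the cohort.
    recall: share of the cohort present on the slot.
    Thresholds are illustrative, not from the talk.
    """
    if not observed or not cohort:
        return False
    overlap = len(observed & cohort)
    precision = overlap / len(observed)
    recall = overlap / len(cohort)
    return precision >= min_precision and recall >= min_recall


def backfill(cohort, slots, persistent):
    """Return the cohort extended with every ID on slots that match it.

    cohort: set of persistent IDs (C_1).
    slots: dict (ip, day) -> set of all IDs seen there, persistent or not.
    persistent: set of all persistent IDs (V_p).
    """
    extended = set(cohort)
    for ids in slots.values():
        # Compare against the original cohort, not the growing set.
        if approx_equal(ids & persistent, cohort):
            extended |= ids
    return extended


persistent = {"p1", "p2", "p3", "p4"}
cohort = {"p1", "p2"}
slots = {
    ("IP1", 1): {"p1", "p2", "s1"},        # matches cohort: s1 joins
    ("IP2", 2): {"p1", "p2", "s2", "s3"},  # matches: s2 and s3 join
    ("IP3", 3): {"p1", "p3", "p4", "s4"},  # mixed crowd: skipped
}
print(sorted(backfill(cohort, slots, persistent)))
# ['p1', 'p2', 's1', 's2', 's3']
```

Requiring both precision and recall keeps a crowded IP (such as an office or coffee shop, where the cohort is only a small part of the persistent IDs seen) from pulling strangers' session cookies into a household.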

  9. Privacy
     • The internet is great, and it's funded by ads.
       [Figure: respecting user privacy vs. efficiency (more relevant ads)]
     • Current/future landscape:
       • increases in non-persistent identifiers and rejection of third-party cookies
       • Safari, Firefox, likely more to come
       • legislation: GDPR (Europe) and CCPA (California)
       • favors large entities with login information (Google, Facebook, Apple)

  10. How to opt out
     • Reject third-party cookies.
     • Turn off your advertising ID.

  11. Questions?
     Device graph publications:
     • Graphing Crumbling Cookies, AdKDD 2019 (Malloy, Koller, Cahn)
     • Device Graphing by Example, KDD 2018 (Funkhouser, Malloy, Alp, Poon, Barford)
     • Internet Device Graphs, KDD 2017 (Malloy, Barford, Alp, Koller, Jewell)
