Graphing Crumbling Cookies
AdKDD 2019
Matt Malloy, Jon Koller and Aaron Cahn
What is a device graph?
- a dataset that organizes digital identifiers that we create as we use the internet
- identifiers (IDs): browser cookies or advertising IDs
- a graph is a set of vertices and edges
- a list of pairs of identifiers that are in some way related
- related: same person, same household
- example: two digital IDs that login with same email
- Why? Targeting, content customization and accurate measurement
[Figure: two digital IDs that log in with the same email, bobfano@gmail.com]
id_1    id_2    score
3D0F8F  54D3A8  3.936
7F3E10  6FFE0A  1.400
8764CF  10AFC8  3.440
501EE5  62A1F3  3.045
1F39D3  4B2686  4.763
638581  85B16   1.917
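As a minimal sketch of the "list of pairs" idea, the rows of the example table can be loaded into an adjacency map. Treating the edges as undirected and the names `EDGES` and `build_adjacency` are assumptions made for illustration; the slides do not describe the storage format.

```python
from collections import defaultdict

# Rows copied from the example table: (id_1, id_2, score).
EDGES = [
    ("3D0F8F", "54D3A8", 3.936),
    ("7F3E10", "6FFE0A", 1.400),
    ("8764CF", "10AFC8", 3.440),
    ("501EE5", "62A1F3", 3.045),
    ("1F39D3", "4B2686", 4.763),
    ("638581", "85B16", 1.917),
]


def build_adjacency(edges):
    """Return {id: {neighbor_id: score}} for a scored edge list.

    Assumption: edges are undirected, so each pair is stored in both directions.
    """
    adjacency = defaultdict(dict)
    for id_1, id_2, score in edges:
        adjacency[id_1][id_2] = score
        adjacency[id_2][id_1] = score
    return dict(adjacency)


graph = build_adjacency(EDGES)
print(graph["3D0F8F"])  # neighbors of one identifier with their scores
```

Looking up an identifier then returns every related identifier together with the strength of the relationship.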
Building a graph using IP-colocation
- IP space is intimate
- Your devices share an IP when connected to the same WiFi router
- You share an IP with family, friends and co-workers
- ideal world: static residential IPs
- problem: IPs are dynamic, mobile operator/corporate IPs, coffee shops
- observation: even when the IP changes, devices travel through IP-space together over the course of weeks
[Figure: devices moving together through IP1, IP2, ..., IPn]
basic idea: associate devices with each other, not IP
Building a graph
Malloy, M., Barford, P., Alp, E. C., Koller, J., & Jewell, A. (2017, August). Internet Device Graphs. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1913-1921). ACM.
[Figure: an iPhone and a PC observed across IP1, IP2 and IP3; values shown: ⅛, ½, 1]
IP1
- day 1: iPhone is home with PC
- day 2: iPhone is home alone
IP2
- day 3: iPhone is at work with 8 devices
IP3
- day 4: iPhone is at home with PC
- score proportional to number of days two devices spend alone on an IP
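The scoring rule can be sketched as a count of the days on which exactly two devices are seen on the same IP. This is only an illustration of the stated rule, not the method from the cited paper: the function name `colocation_scores`, the `(day, ip, device_id)` input shape and the device labels are assumptions, and the proportionality constant is taken to be 1.

```python
from collections import defaultdict


def colocation_scores(sightings):
    """Count, per device pair, the days the two devices were alone on an IP.

    sightings: iterable of (day, ip, device_id) tuples (hypothetical format).
    Returns {(device_a, device_b): number_of_days}, with each pair sorted.
    """
    devices_on = defaultdict(set)
    for day, ip, device in sightings:
        devices_on[(day, ip)].add(device)

    days_alone = defaultdict(set)
    for (day, _ip), devices in devices_on.items():
        if len(devices) == 2:  # the pair had this IP to itself on this day
            pair = tuple(sorted(devices))
            days_alone[pair].add(day)
    return {pair: len(days) for pair, days in days_alone.items()}


# The four-day example: home with PC, home alone, at work, home with PC.
sightings = [
    (1, "IP1", "iPhone"), (1, "IP1", "PC"),
    (2, "IP1", "iPhone"),
    (4, "IP3", "iPhone"), (4, "IP3", "PC"),
]
sightings += [(3, "IP2", "iPhone")] + [(3, "IP2", f"work-{n}") for n in range(8)]

scores = colocation_scores(sightings)
print(scores)
```

In this sketch the crowded work IP on day 3 contributes nothing, so only the iPhone and the PC end up connected even though the home IP changes between day 1 and day 4.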
Comscore’s Device Graph (April 2019)
- 12 countries
- 3.4 Billion nodes (cookies/advertising IDs)
- 17.1 Billion edges (relationships)
Comparison Benchmark Graphs*
Graph                Nodes  Edges
LiveJournal          4.8M   69M
Twitter              42M    1.5B
UK web graph 2007    109M   3.7B
Yahoo Web            1.4B   6.6B
Facebook Graph 2016  1.39B  400B
*Adapted from: Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., & Muthukrishnan, S. (2015). One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 8(12), 1804-1815.
Community Detection
[Figure: finding community structure; groups labeled HH1 to HH5]
- goal: group identifiers into cohorts (person and household level groupings)
- community detection in graphs is a well studied problem
- literature and code exist for finding community structure, but not at the scale of billions of nodes and edges
- Louvain Modularity*
*Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
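Louvain chooses communities by greedily increasing modularity, so a small worked example of that objective may help. The sketch below only evaluates modularity for a given grouping; it is not the Louvain algorithm itself and not the authors' pipeline. The function name, the toy graph and the labels `HH1` and `HH2` are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q = sum over communities c of L_c / m - (d_c / (2 m))^2.

    edges: undirected (u, v, weight) tuples without self-loops.
    community_of: {node: community label}.
    m is the total edge weight, L_c the edge weight inside community c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u, v, weight in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += weight
        degree[cv] += weight
        if cu == cv:
            internal[cu] += weight
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles of identifiers joined by a single edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
households = {"a": "HH1", "b": "HH1", "c": "HH1",
              "d": "HH2", "e": "HH2", "f": "HH2"}
one_group = {node: "all" for node in households}

q_households = modularity(edges, households)
q_one_group = modularity(edges, one_group)
print(q_households, q_one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping every identifier into one group, which is the kind of improvement a modularity-based method searches for.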
Challenge: non-persistent IDs
- 3.4 Billion persistent IDs (in 12 countries)
- 5-10x more non-persistent IDs
- excluded from graphing process
- incognito/private browsing (session cookies)
- ITP (Intelligent Tracking Prevention)
- 20+ Billion IDs worldwide not amenable to graphing or community detection
Backfilling
Key Ideas:
- Once cohorts of persistent IDs are defined, find the IP addresses that are associated with the cohort over time:
  $C_1 \rightarrow \{(IP_1, day_1), (IP_2, day_2), \ldots\}$
- Ruleset: if the persistent IDs defined by the IP address are synonymous with the group defined by the cohort, then assign the non-persistent IDs to the cohort:
  if $\{i : i \in (IP_1, day_1)\} \cap V_p \approx C_1$ then $C_1^{+} = \{i : i \in (IP_1, day_1)\} \cup C_1$
- Precision and recall are used to define approximate equality ($\approx$)
- Results: assign additional 2+ Billion IDs to cohorts in the US
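The backfilling rule can be sketched as follows, reading V_p as the set of persistent IDs (an interpretation of the slide, which does not define the symbol). The function name `backfill` and the 0.8 thresholds for precision and recall are hypothetical; the slides do not state which cutoffs are used.

```python
def backfill(ids_on_ip_day, persistent_ids, cohort,
             min_precision=0.8, min_recall=0.8):
    """Return the cohort, expanded with all IDs seen on one (IP, day) if the
    persistent IDs seen there approximately equal the cohort.

    ids_on_ip_day: set of all IDs observed on a given (IP, day).
    persistent_ids: set of persistent IDs (V_p in the rule above, assumed).
    cohort: set of persistent IDs in the cohort (C_1).
    The thresholds are illustrative assumptions, not values from the talk.
    """
    seen_persistent = ids_on_ip_day & persistent_ids
    if not seen_persistent or not cohort:
        return set(cohort)

    overlap = seen_persistent & cohort
    precision = len(overlap) / len(seen_persistent)  # share of seen IDs in cohort
    recall = len(overlap) / len(cohort)              # share of cohort that was seen
    if precision >= min_precision and recall >= min_recall:
        return set(cohort) | set(ids_on_ip_day)      # C_1 plus the IDs on the IP
    return set(cohort)


persistent = {"p1", "p2", "p3", "p4"}
cohort = {"p1", "p2", "p3"}

home = {"p1", "p2", "p3", "n1", "n2"}   # cohort plus two non-persistent IDs
cafe = {"p1", "p4", "n9"}               # mixed traffic on a shared IP

expanded = backfill(home, persistent, cohort)
unchanged = backfill(cafe, persistent, cohort)
print(sorted(expanded), sorted(unchanged))
```

The home observation matches the cohort exactly, so its non-persistent IDs are absorbed; the shared IP fails both thresholds and leaves the cohort untouched.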
Privacy
- Internet is great. It’s funded by ads.
- Current/future landscape
- Increases in non-persistent identifiers and rejection of 3rd party cookies
- Safari, Firefox, likely more to come
- Legislation - GDPR (Europe) and CCPA (California)
- Favor large entities with login information (Google, Facebook, Apple)
efficiency: more relevant ads respecting user privacy
How to opt-out
- Reject 3rd party cookies.
- Turn off your advertising ID.
Questions?
Device Graph Publications
- Graphing Crumbling Cookies, AdKDD 2019 (Malloy, Koller, Cahn)
- Device Graphing by Example, KDD 2018 (Funkhouser, Malloy, Alp, Poon, Barford)
- Internet Device Graphs, KDD 2017 (Malloy, Barford, Alp, Koller, Jewell)