Graphing Crumbling Cookies AdKDD 2019 Matt Malloy, Jon Koller and - - PowerPoint PPT Presentation

SLIDE 1

Graphing Crumbling Cookies

AdKDD 2019 Matt Malloy, Jon Koller and Aaron Cahn

SLIDE 2

What is a device graph?

  • a dataset that organizes digital identifiers that we create as we use the internet
  • identifiers (IDs): browser cookies or advertising IDs
  • a graph is a set of vertices and edges
  • a list of pairs of identifiers that are in some way related
  • related: same person, same household
  • example: two digital IDs that log in with the same email
  • Why? Targeting, content customization and accurate measurement

[Figure: two devices logged in with the same email, bobfano@gmail.com]

id_1     id_2     score
3D0F8F   54D3A8   3.936
7F3E10   6FFE0A   1.400
8764CF   10AFC8   3.440
501EE5   62A1F3   3.045
1F39D3   4B2686   4.763
638581   85B16    1.917
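
The edge list on this slide can be loaded into an adjacency map, which is a common starting point for graph processing. This is only a sketch: the name `build_adjacency` and the `min_score` cutoff are hypothetical, and the rows are copied from the slide as printed.

```python
from collections import defaultdict

# Rows copied from the slide: (id_1, id_2, score).
EDGES = [
    ("3D0F8F", "54D3A8", 3.936),
    ("7F3E10", "6FFE0A", 1.400),
    ("8764CF", "10AFC8", 3.440),
    ("501EE5", "62A1F3", 3.045),
    ("1F39D3", "4B2686", 4.763),
    ("638581", "85B16", 1.917),
]


def build_adjacency(edges, min_score=0.0):
    """Return {id: {neighbor_id: score}} for an undirected, weighted graph.

    min_score is an illustrative cutoff (an assumption, not from the talk):
    edges scoring below it are dropped.
    """
    adjacency = defaultdict(dict)
    for id_1, id_2, score in edges:
        if score >= min_score:
            # Store the edge in both directions because the relation is symmetric.
            adjacency[id_1][id_2] = score
            adjacency[id_2][id_1] = score
    return dict(adjacency)
```

Calling `build_adjacency(EDGES, min_score=3.0)` keeps only the four higher-scoring pairs from the slide.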

SLIDE 3

Building a graph using IP-colocation

  • IP space is intimate
  • Your devices share an IP when connected to the same WiFi router
  • You share an IP with family, friends and co-workers
  • ideal world: static residential IPs
  • problem: IPs are dynamic, mobile operator/corporate IPs, coffee shops
  • observation: even when the IP changes, devices travel through IP-space together over the course of weeks

[Figure: devices moving together across IP1, IP2, ..., IPn]

basic idea: associate devices with each other, not with an IP

SLIDE 4

Building a graph

Malloy, M., Barford, P., Alp, E. C., Koller, J., & Jewell, A. (2017, August). Internet Device Graphs. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1913-1921). ACM.

[Figure: timeline of one iPhone across IPs; edge labels ½ and 1]

IP1   day 1: iPhone is home with PC
      day 2: iPhone is home alone
IP2   day 3: iPhone is at work with 8 devices
IP3   day 4: iPhone is at home with PC

  • score proportional to number of days two devices spend alone on an IP
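
The scoring rule above can be sketched as a simple count. This is a minimal sketch, not the authors' implementation: it assumes observations arrive as `(day, ip, device_id)` tuples, and `colocation_scores` is a hypothetical name. It counts the days on which exactly two devices were seen on an IP. The published method may weight days differently.

```python
from collections import defaultdict


def colocation_scores(observations):
    """Count, per device pair, the days the pair was alone together on an IP.

    observations: iterable of (day, ip, device_id) tuples (assumed layout).
    Returns {(device_a, device_b): day_count} with each pair sorted.
    """
    devices_on = defaultdict(set)
    for day, ip, device in observations:
        devices_on[(day, ip)].add(device)

    scores = defaultdict(int)
    for devices in devices_on.values():
        # Only IP-days with exactly two devices count as "alone together".
        if len(devices) == 2:
            scores[tuple(sorted(devices))] += 1
    return dict(scores)


# The four-day timeline from the slide, with made-up device names.
OBSERVATIONS = (
    [(1, "IP1", "iphone"), (1, "IP1", "pc")]
    + [(2, "IP1", "iphone")]
    + [(3, "IP2", "iphone")]
    + [(3, "IP2", f"work_{k}") for k in range(8)]
    + [(4, "IP3", "iphone"), (4, "IP3", "pc")]
)
```

On this timeline only days 1 and 4 contribute, so the iPhone and PC pair gets a count of 2 and no work device is paired with the iPhone.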

SLIDE 5

Comscore’s Device Graph

Comscore’s Device Graph (April 2019)

  • 12 countries
  • 3.4 Billion nodes (cookies/advertising IDs)
  • 17.1 Billion edges (relationships)

Comparison Benchmark Graphs*

Graph                 Nodes   Edges
LiveJournal           4.8M    69M
Twitter               42M     1.5B
UK web graph 2007     109M    3.7B
Yahoo Web             1.4B    6.6B
Facebook Graph 2016   1.39B   400B

*Adapted from: Ching, A., Edunov, S., Kabiljo, M., Logothetis, D., & Muthukrishnan, S. (2015). One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment, 8(12), 1804-1815.

SLIDE 6

Community Detection

[Figure: finding community structure; communities labeled HH1, HH2, HH3, HH4, HH5]

  • goal: group identifiers into cohorts (person and household level groupings)
  • community detection in graphs is a well-studied problem
  • literature and code exist for finding community structure, but not at the scale of billions of nodes/edges
  • Louvain Modularity*

*Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
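
The Louvain method greedily maximizes modularity, a score for how well a partition separates a graph into densely connected groups. The sketch below computes only that objective for a given partition of a weighted, undirected edge list. It is not the Louvain algorithm itself, and the function name and input layout are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: list of (u, v, weight), each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    Uses Q = sum_c [ L_c / m - (d_c / (2 m))^2 ], where m is the total edge
    weight, L_c the edge weight inside community c, and d_c the summed
    weighted degree of the nodes in c.
    """
    m = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)  # L_c
    degree = defaultdict(float)    # d_c
    for u, v, weight in edges:
        degree[community_of[u]] += weight
        degree[community_of[v]] += weight
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += weight
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge; all weights are 1.
TOY_EDGES = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
TWO_GROUPS = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_GROUP = {node: 0 for node in TWO_GROUPS}
```

On the toy graph, splitting the two triangles into separate communities scores higher than placing all six nodes in one community, which is the kind of improvement a modularity-based method searches for.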

SLIDE 7

Challenge: non-persistent IDs

  • 3.4 Billion persistent IDs (in 12 countries)
  • 5-10x more non-persistent IDs
  • excluded from graphing process
  • incognito/private browsing (session cookies)
  • ITP (Intelligent Tracking Prevention)
  • 20+ Billion IDs worldwide not amenable to graphing or community detection

SLIDE 8

Backfilling

Key Ideas:

  • Once cohorts of persistent IDs are defined, find the IP addresses that are associated with the cohort over time:

      $C_1 \to \{(\mathrm{IP}_1, \mathrm{day}_1), (\mathrm{IP}_2, \mathrm{day}_2), \ldots\}$

  • Ruleset: if the persistent IDs defined by the IP address are synonymous with the group defined by the cohort, then assign the non-persistent IDs to the cohort:

      if $\{i : i \in (\mathrm{IP}_1, \mathrm{day}_1)\} \cap V_p \approx C_1$
      then $C_1^{+} = \{i : i \in (\mathrm{IP}_1, \mathrm{day}_1)\} \cup C_1$

  • Precision and recall are used to define approximate equality ($\approx$)
  • Results: assign additional 2+ Billion IDs to cohorts in the US
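
The backfilling rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: $V_p$ is read as the set of all persistent IDs, approximate equality is implemented as precision and recall both meeting a threshold, and the 0.8 thresholds and the function names are hypothetical, not values from the talk.

```python
def approx_equal(observed, cohort, min_precision=0.8, min_recall=0.8):
    """Approximate set equality via precision and recall.

    precision = |observed & cohort| / |observed|
    recall    = |observed & cohort| / |cohort|
    The default thresholds are illustrative assumptions.
    """
    if not observed or not cohort:
        return False
    overlap = len(observed & cohort)
    precision = overlap / len(observed)
    recall = overlap / len(cohort)
    return precision >= min_precision and recall >= min_recall


def backfill(cohort, ids_on_ip_day, persistent_ids,
             min_precision=0.8, min_recall=0.8):
    """Return the cohort, extended with every ID seen on one (IP, day).

    The extension happens only when the persistent IDs seen on that
    (IP, day) approximately equal the cohort; otherwise the cohort is
    returned unchanged (as a copy).
    """
    observed_persistent = ids_on_ip_day & persistent_ids
    if approx_equal(observed_persistent, cohort, min_precision, min_recall):
        return cohort | ids_on_ip_day
    return set(cohort)
```

For example, with cohort `{"p1", "p2"}` and an (IP, day) on which `{"p1", "p2", "n1"}` were seen, the non-persistent ID `"n1"` joins the cohort. If the (IP, day) instead carried `{"p1", "p3", "n2"}`, precision falls to 0.5 and the cohort is left unchanged.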

SLIDE 9

Privacy

  • Internet is great. It’s funded by ads.
  • Current/future landscape
  • Increases in non-persistent identifiers and rejection of 3rd party cookies
  • Safari, Firefox, likely more to come
  • Legislation - GDPR (Europe) and CCPA (California)
  • Favor large entities with login information (Google, Facebook, Apple)

efficiency: more relevant ads respecting user privacy

SLIDE 10

How to opt-out

  • Reject 3rd party cookies.
  • Turn off your advertising ID.

SLIDE 11

Questions?

Device Graph Publications

  • Graphing Crumbling Cookies, AdKDD 2019 (Malloy, Koller, Cahn)
  • Device Graphing by Example, KDD 2018 (Funkhouser, Malloy, Alp, Poon, Barford)
  • Internet Device Graphs, KDD 2017 (Malloy, Barford, Alp, Koller, Jewell)