SLIDE 1

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Macro- and Microscopic Analysis of the Internet Economy from Network Measurements

Jakub Mikians

UPC BarcelonaTech

March 3, 2015

SLIDE 2

  1. Introduction
  2. Macroscopic view: Interdomain Traffic Matrix
       • Introduction
       • Paper I: Characterizing ITM
       • Paper II: Synthesizing ITM
  3. Microscopic view: Price Discrimination
       • Introduction
       • Paper III: Detecting price discrimination
       • Paper IV: Crowd assisted search of PD
  4. Conclusions and Further Work
  5. Contributions

SLIDE 4

Introduction

Scale of the Internet and its economy

  • One fifth of the global GDP growth in recent years
  • 75% of the Internet's economic impact comes from traditional industries
  • Share in global GDP between 3.4% and 4.1%
  • Ranking of economies by size, with the Internet counted alongside countries:
      1. United States
      2. China
      3. Japan
      4. Germany
      5. INTERNET
      6. ...

SLIDE 5

Introduction

The biggest Internet players

  • Some of the biggest players were present in the ICT (Information and Communications Technology) market before the Internet: AT&T, Comcast, ...
  • Other companies are children of the digital economy: Google, Amazon, ...
  • Interactions between these largest players shape the Internet economy at the macro scale

SLIDE 6

Introduction

Regular user

  • On the other side is a regular user of the Internet
  • For a large user base, the Internet is an important place of work, retail and social interaction
  • 40% of the world population is online
  • Cumulative decisions of the users impact network economics
  • At the same time, the users generate a wide spectrum of personal information, which is sought by online marketing companies and e-retailers
  • Interactions between the users, retailers and service providers contribute to the Internet economy at the micro scale

SLIDE 7

Introduction

In this thesis we look at the Internet economy from two different perspectives.

Macro-scale:

  • The flow of traffic is directly related to the flow of money between ASes
  • We examine and characterize the traffic flowing between ASes
  • We propose a method to generate synthetic traffic matrices

Micro-scale:

  • We investigate an economic phenomenon at the intersection of the user's personal information and retail business: price discrimination (PD)
  • We look for empirical evidence that PD exists on the Internet
  • We present a feasible and scalable approach to investigating PD: crowdsourcing

SLIDE 9

Macroscopic view - ITM

Introduction

  • The AS level is the highest level of organization of the Internet
  • Traffic flowing between ASes can be described by the Interdomain Traffic Matrix (ITM)
  • The ITM describes traffic between the largest business entities, so it is directly related to network macroeconomics
  • Insight into the ITM → insight into the Internet macro economy
  • Knowledge of the traffic allows better peering decisions and helps to stay ahead of the competition
  • Publicly available interdomain traffic data is scarce because it is sensitive business information
  • Synthetic matrices are therefore needed for research purposes

SLIDE 10

Macroscopic view - ITM

Introduction

We investigate the characteristics of the ITM and describe it quantitatively from the perspective of a large research network. We analyse:

  • Sparsity
  • The statistical distribution of the traffic

We observe that the distribution can be related to congestion in a network.

SLIDE 11

Macroscopic view - ITM

Introduction

Knowledge of the ITM is useful in other research areas: economics, peering, routing.

We propose a novel method to generate synthetic traffic matrices. It:

  • Stems from first principles (connection-based approach)
  • Recognizes that the traffic is a mixture of different applications
  • Accounts for regional artifacts, i.e. different popularities of the content

SLIDE 13

Characterizing ITM

  • GÉANT is the most complete source of direct measurements of interdomain traffic available to researchers
  • We focus on spatial properties
  • Sampled NetFlow data

SLIDE 14

Characterizing ITM

                      trace W (1 week)    trace M (1 month)    trace Y (52 weeks)
period                Nov 22–28, 2010     Nov 1–30, 2010       from Jan 4, 2010
flows                 3.91 × 10^9         1.99 × 10^10         2.17 × 10^11
packets               3.61 × 10^12        1.74 × 10^13         1.70 × 10^14
bytes                 3.26 × 10^15        1.55 × 10^16         1.45 × 10^17
NetFlow data volume   111 GB              476 GB               5.75 TB

Table: Parameters of the GÉANT NetFlow traces.

SLIDE 15

Characterizing ITM

Sparsity

  • We define sparsity as the ratio of the number of zeros in the ITM to the number of all observable items in the matrix
  • Challenge: how do we know whether there is no traffic between two ASes, or the traffic is simply not routed through GÉANT?
  • Only a lower bound of the sparsity can be estimated
  • Observed sparsity > 45%
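The sparsity definition above can be illustrated with a minimal sketch. It is not the code used in the thesis; the matrix layout (a list of rows) and the use of `None` to mark entries that cannot be observed from the vantage point are assumptions made for this example.

```python
def sparsity(itm):
    """Ratio of zero entries to all observable entries of a traffic matrix.

    Assumption for this sketch: `itm` is a list of rows, and an entry that
    cannot be observed from the measurement point is stored as None.
    """
    observable = [v for row in itm for v in row if v is not None]
    if not observable:
        raise ValueError("the matrix has no observable entries")
    zeros = sum(1 for v in observable if v == 0)
    return zeros / len(observable)


# Toy 3x3 matrix (bytes); None marks pairs the vantage point cannot see.
toy_itm = [
    [None, 0, 120],
    [0, None, 0],
    [35, None, None],
]
print(sparsity(toy_itm))  # 3 zeros out of 5 observable entries
```

Because a zero may also mean that the traffic was routed elsewhere, a value computed this way should be read in the light of the lower-bound caveat stated on the slide.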

SLIDE 16

Characterizing ITM

Statistical distributions

  • We find that 94% of the rows are heavy-tailed (the top 15% of entries in each row account for 95% of the traffic)
  • The distributions resemble LogNormal or Pareto

Figure: Instances of the generated traffic distribution, plotted as CCDF against traffic [Bytes] on log-log axes, with the data compared to LogNormal and Pareto fits. Panels: (a) Pareto-like (D = 0.88), (b) LogNormal-like (D = 0.27), (c) in the middle (D = 0.43). The tail of the distribution varies from the “straight” Pareto-like to the “bent” LogNormal-like.
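The "top 15% of entries account for 95% of the traffic" criterion can be checked per row with a few lines of code. The sketch below is only an illustration: the function name, the rounding of the entry count and the toy row are assumptions, not taken from the thesis.

```python
def top_share(row, fraction=0.15):
    """Share of a row's total traffic carried by its largest entries.

    `fraction` is the portion of entries treated as the "top" ones.
    Rounding the entry count to the nearest integer (at least 1) is a
    choice made for this sketch.
    """
    total = sum(row)
    if total <= 0:
        raise ValueError("the row carries no traffic")
    k = max(1, round(fraction * len(row)))
    largest = sorted(row, reverse=True)[:k]
    return sum(largest) / total


# Toy row with 20 entries: three large flows and many small ones.
toy_row = [900, 50, 30] + [1] * 17
share = top_share(toy_row)
print(f"top 15% of entries carry {share:.1%} of the traffic")
print("heavy-tailed by the 95% criterion:", share >= 0.95)
```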

SLIDE 17

Characterizing ITM

Shape and throughput

Figure: Type of the distribution tail (x axis, from 0 = LogNormal to 1 = Pareto) and average throughput (y axis, Mbps, log scale). Each dot is a separate AS. The dot size indicates the number of visible non-zero prefixes (legend: 10k and 267k).

SLIDE 18

Characterizing ITM

Congestion

  • The shape of the distribution can be caused by congestion in the observed network
  • Traffic bottlenecks can cause a “tail truncation” effect [2]
  • Hypothesis: in a congested network every new connection competes for bandwidth, so there should be a negative correlation between the number of flows and throughput

[2] “I tube, you tube, everybody tubes: analyzing the world’s largest user generated content video system”, Cha, M. and Kwak, H. and Rodriguez, P. and Ahn, Y.Y. and Moon, S., Proc. of the 7th ACM SIGCOMM conference, 2007

SLIDE 19

Characterizing ITM

Congestion

Figure: Number of flows (x axis) and the median throughput in kbps (y axis) for (a) a LogNormal-like AS (congested) and (b) a Pareto-like AS (not congested). 22–24 Nov 2010, 10:00–20:00. A few extreme outliers in (b) are not drawn.

The correlation lies between −0.77 and −0.85 for LogNormal-like ASes (presumably congested) and is insignificant for Pareto-like ASes (presumably not congested).

SLIDE 20

Characterizing ITM

Correlations between rows

  • Measured for 15k+ pairs of rows
  • Calculated Spearman correlations
  • 99% of the correlations are positive, up to 0.85, with an average of 0.28
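For reference, a Spearman correlation between two rows is the Pearson correlation of their ranks. The following is a minimal standard-library sketch of that computation; the helper names and the toy rows are illustrative, and how the thesis aligned and filtered the rows before correlating them is not shown here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Two toy rows whose entries are ordered the same way, so the result is 1.
row_a = [10, 200, 3000, 40]
row_b = [1, 5, 90, 2]
print(spearman(row_a, row_b))
```

Rank correlation is a natural fit for heavy-tailed traffic volumes because a few very large entries do not dominate the result.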

SLIDE 21

Characterizing ITM

Low effective rank

  • A matrix with low effective rank can be approximated by a linear combination of a small number of rows and columns
  • Analysis of the eigenvalues confirms the low effective rank

Figure: Eigenvalues of the submatrices (x axis: eigenvalue, y axis: relative mean magnitude) for the legend entries 236x236 (1), 200x200 (2), 150x150 (5), 100x100 (19) and 50x50 (133). Only a small number of the values are significant, which indicates a low effective rank.

SLIDE 23

Synthesizing ITM

  • Research on the Internet economy, e.g. on peering decisions, requires models of the ITM
  • Data is scarce, so we need to build synthetic matrices that preserve the properties of real ITMs

Our algorithm (ITMgen):

  • Is connection-level
  • Takes into account different applications and their relative popularity
  • Models different content types (application types, forward / reverse traffic ratio)
  • Models regional effects (content is popular in one region and not in another)
  • Is an alternative to the existing gravity model [3]

[3] “An empirical approach to modeling inter-AS traffic matrices”, Chang, H. and Jamin, S. and Mao, Z.M. and Willinger, W., IMC, 2005

SLIDE 24

Synthesizing ITM

Traffic model

Traffic from AS i to AS j can be expressed as

  T_{i,j} = \sum_{\kappa} \left( S_i \, p_i^{\kappa}(j) + d_{\kappa} \, S_j \, p_j^{\kappa}(i) \right)    (1)

The two terms in the summation represent the traffic generated by a user due to application κ, and the traffic produced by that application in the reverse direction.

  • κ: application
  • p_i^κ(j): relative popularity of the content related to application κ, subjective to i
  • d_κ: (a)symmetry in the two directions of traffic due to application κ; this parameter is application-dependent
  • m_κ: contribution of each application to the overall traffic mix
  • S_i: relative size of an AS (number of users in AS i)
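To make Eq. (1) concrete, here is a minimal sketch that evaluates it for a toy set of ASes. This is not ITMgen itself: the data layout (`popularity[app][i][j]` standing for p_i^κ(j)), the function name and the decision to leave the diagonal at zero are assumptions for illustration. The mix weight m_κ is not applied separately, since Eq. (1) as shown does not contain it.

```python
from typing import Dict, List

Matrix = List[List[float]]


def synthesize_itm(sizes: List[float],
                   popularity: Dict[str, Matrix],
                   asymmetry: Dict[str, float]) -> Matrix:
    """Evaluate T[i][j] = sum_k ( S_i * p_i^k(j) + d_k * S_j * p_j^k(i) ).

    sizes[i]             -> S_i, relative size of AS i
    popularity[k][i][j]  -> p_i^k(j), popularity of AS j's content of
                            application k as seen from AS i
    asymmetry[k]         -> d_k, direction asymmetry of application k
    """
    n = len(sizes)
    itm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: intra-AS traffic is left out
            total = 0.0
            for app, d in asymmetry.items():
                p = popularity[app]
                total += sizes[i] * p[i][j] + d * sizes[j] * p[j][i]
            itm[i][j] = total
    return itm


# Two ASes, one application; all numbers are made up.
toy = synthesize_itm(
    sizes=[100.0, 10.0],
    popularity={"web": [[0.0, 1.0], [1.0, 0.0]]},
    asymmetry={"web": 2.0},
)
print(toy)  # [[0.0, 120.0], [210.0, 0.0]]
```

In the toy case, T[0][1] = 100·1 + 2·10·1 = 120 and T[1][0] = 10·1 + 2·100·1 = 210, which shows how the asymmetry parameter makes the matrix non-symmetric.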

SLIDE 25

Synthesizing ITM

Parametrizing the model

  • Parametrizing the model is challenging
  • We focus only on WEB and P2P content
  • Alexa.com: popularity of content per country, top 1 million pages
  • openbittorrent.com: P2P statistics
  • Open marketing reports, combined with P2P data and whois information, to model the sizes of ASes (number of users)
  • Packet-level traces from CESCA to model application-level characteristics

SLIDE 26

Synthesizing ITM

Parametrizing popularity of Web content

Figure: WEB popularity distribution of ASes (popularity against rank of AS, log-log axes), globally and for three example regions (DE, ES, US).

  • The “popularity” of the content hosted by different ASes in different countries is similar across regions
  • Top 1 million pages from Alexa
  • Does not reflect real traffic between ASes, but can serve as a basis of comparison

SLIDE 27

Synthesizing ITM

Parametrizing popularity of P2P content

Figure: P2P activity distribution (number of P2P peers against ranked ASes, log-log axes).

Data gathered from BitTorrent crawls

SLIDE 28

Synthesizing ITM

Parametrizing application mix

  • Application-level characteristics obtained from direct monitoring of a CESCA link for 14 days
  • WEB traffic ratio log10(d_κ): (0.4, 1.5)
  • P2P traffic ratio: (−0.87, 1.25)
  • m_P2P = 0.65, m_WEB = 0.35

SLIDE 29

Synthesizing ITM

Validating – traffic distribution

Figure: Statistical distribution (CCDF against normalized traffic, log-log axes) of the traffic (a) produced and (b) consumed by the observed ASes (AS A, AS B, AS C), for the Telefonica data (dashed line) and the model (solid line) for synthetic ITMs of different sizes (500, 1000, 2000, 3000).

SLIDE 30

Synthesizing ITM

Validating – regional effects

Figure: Regional traffic exchange. (a) Traffic exchanged with ASes within the same region, as a percentage of total traffic, by data source / ITM size (CESCA, 500, 1000, 2000, 3000, gravity); matrices of 4 different sizes are shown. (b) Regional traffic of content providers, as a percentage of traffic, for CESCA, ITMgen and the gravity model.

  • Regional exchange of the traffic is achieved by grouping ASes into 10 “regions” with an equal number of ASes and properly parametrizing p_i^κ(j)
  • The gravity model consistently underestimates regional traffic

SLIDE 31

Synthesizing ITM

Use case - cloud storage

  • The model can be used to evaluate what-if scenarios
  • Arbitrary, common-sense parameters (for the sake of the exercise):
      • Users generate an additional 5% of upstream traffic
      • Traffic is skewed: the traffic ratio log10(d_ST) follows a normal distribution N(0.7, 0.2)
  • The simulation suggests that ASes providing cloud storage will increase traffic from 16% to 20%
  • Overall traffic generated by all ASes will increase by 9.1%
  • Such what-if evaluations are important in economic research

SLIDE 33

Detecting price discrimination

  • Price discrimination: setting the price of a given product for each customer individually, according to the customer’s valuation of the product. The same product is offered to different customers at different prices
  • An economic phenomenon at the very “bottom” of the Internet economy (micro scale)
  • Research at the intersection of e-retail and users' personal information
  • Empirically demonstrate the existence of PD
  • Analyse the information vectors facilitating PD

SLIDE 34

Detecting price discrimination

Economic model of Internet services

Economic model behind the Internet services:

  • Offer a service for “free”
  • Attract users
  • Collect information
  • Monetize information

What happens with all that information?

SLIDE 35

Microscopic view - Price Discrimination

Introduction

  • Popular answer: it is used for targeted advertising
  • We investigate an alternative hypothesis: the information is used for price discrimination (PD)
  • E-commerce market size: $961 billion
  • Does price discrimination, facilitated by personal information, exist on the Internet?

SLIDE 36

Microscopic view - Price Discrimination

Introduction

  • Develop a methodology to investigate PD on the Internet
  • Find empirical evidence that PD exists on the Internet
  • Present examples of PD on multiple e-commerce websites
  • Uncover the information vectors facilitating PD
  • Show that crowdsourcing is a feasible method to investigate PD at large scale
  • Build, deploy and evaluate a working (!) system

SLIDE 38

Detecting price discrimination

Information vectors

Examined information vectors:

  • System and browser differences
  • Geographical location of the originating query
  • Certain traits of the user (e.g. affluent vs. budget conscious)

SLIDE 39

Detecting price discrimination

Findings

  • No evidence for system/browser-based PD
  • We find location-based PD
  • We find PD based on user traits (i.e. an originating URL revealing that the user is budget-sensitive)
  • We find search discrimination [4] based on user traits (personas)

[4] E.g. returning more expensive products to buyers with a higher willingness to pay. It operates on multiple products, trying to steer buyers towards an appropriate price range

SLIDE 40

Detecting price discrimination

Setup

  • Machines with different browsers and systems (no wget)
  • Proxy servers in different geographical locations (PlanetLab)
  • Trained “personas” (user profiles with particular traits)
  • 35 product categories, 200 distinct vendors
  • Over 600 concrete products
  • We manually checked that the differences cannot be explained by differences in taxation, shipping costs or customs duties

SLIDE 41

Detecting price discrimination

Different locations – different countries

Figure: Price differences at Amazon based on the customer’s geographic location (difference [%] against base price [$]; locations: US/NY, Spain, Germany, Korea, Brazil), using the prices in New York, USA as reference. For each of the considered products there exist at least two locations with different prices.

  • Differences in prices for Kindle e-books at Amazon: top 100 e-books (21% difference in the majority of cases, up to 166%)
  • Steam (store.steampowered.com, an online e-entertainment retailer): 300 products; prices differed for 20% of products between Germany and Spain, and for 3.5% of products between the US, Brazil and Korea

SLIDE 42

Detecting price discrimination

Different locations – same US state

Figure: Price differences at staples.com (labelled locations: Boston, Worcester, Springfield, Lowell). The dot sizes mark the mean price surplus for the locations, from 0% (small dots) up to 3.9% (large dots).

  • staples.com, single state, 29 products, 200 ZIP codes
  • Outskirts have higher prices than big cities!

SLIDE 43

Detecting price discrimination

Personal information – trained profiles

Figure (a): Prices (mean/min/max, in $) shown by Google to the different personas (affluent, budget, clean) for the categories digital camera, DVD player, wristwatch, headphones, jeans for men and MP3 player. The median number of products in each category per persona is 12.

Figure (b): Mean prices in $ (with std. deviations) of the top-10 results from Cheaptickets.com returned to affluent and budget personas, for Athens, Budapest, Rome, Berlin, Vienna, London, Warsaw and Barcelona. The mean difference is 15%, and it can be as high as 50%.

  • Difference in search results: search discrimination, steering users towards different prices according to their preferences
  • No price discrimination observed

SLIDE 44

Detecting price discrimination

Personal information – URL of origin

Figure: Price difference [%] at the Shoplet.com online retailer site, with and without redirection from a price aggregator, against the base price without redirection [$] (log scale).

  • The URL of origin can indicate the user’s sensitivity to prices (e.g. the customer comes from a discount site)
  • nextag.com: price aggregator; shoplet.com: retailer
  • Mean difference between prices with and without redirection: 23%

SLIDE 46

Crowd assisted search of PD

User-centric approach

  • How can a real user know if she is subject to PD?
  • Manually sampling selected sites is not enough

Crowdsourcing:

  • Allows end-users to point out examples of sites engaging in PD
  • Allows extracting prices without requiring our manual intervention
  • Broadens the scope of the measurement

SLIDE 47

Crowd assisted search of PD

The Price $heriff

  • A browser plug-in installed by a user
  • The user highlights a price on an e-retailer site
  • The user confirms the price check (by clicking a pop-up)
  • A back-end service accesses the examined page from 14 different vantage points
  • The price is extracted from the fetched pages, compared and presented to the user
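The comparison step at the end of this workflow can be sketched as follows. This is a hypothetical illustration, not the $heriff back-end: the function name, the dictionary layout, the use of `None` for a vantage point where no price could be extracted, and the assumption that all prices are already in one currency are made up for the example.

```python
def compare_prices(prices_by_vantage_point):
    """Summarize the prices of one product seen from several vantage points.

    Assumptions for this sketch: prices are already converted to a common
    currency, and None marks a vantage point where extraction failed.
    Returns None when fewer than two usable prices are available.
    """
    valid = {vp: price for vp, price in prices_by_vantage_point.items()
             if price is not None and price > 0}
    if len(valid) < 2:
        return None
    lowest = min(valid.values())
    highest = max(valid.values())
    return {
        "min": lowest,
        "max": highest,
        "ratio": highest / lowest,  # max/min price ratio
        "cheapest": sorted(vp for vp, price in valid.items() if price == lowest),
    }


# Made-up prices for a single product page.
observed = {"New York": 100.0, "UK": 120.0, "Finland": None}
summary = compare_prices(observed)
print(summary["ratio"], summary["cheapest"])
```

A ratio of 1.0 means that every vantage point saw the same price; anything above it is a candidate for closer inspection, for instance to rule out taxation or shipping effects.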

SLIDE 48

Crowd assisted search of PD

The Price $heriff – Results

  • 340 users from 18 different countries
  • Products from 600 domains
  • Afterwards we crawled 21 selected domains, with up to 100 products per retailer, on a daily basis for a week

SLIDE 49

Crowd assisted search of PD

Results – Crowd-collected dataset

Figure: Magnitude of price differences per domain (price ratios, axis from 1.0 to 2.0). Domains shown: www.amazon.com, www.hotels.com, store.steampowered.com, www.misssixty.com, www.energie.it, www.sears.com, eu.abercrombie.com, www.tuscanyleather.it, www.guess.eu, www.overstock.com, www.booking.com, www.net-a-porter.com, www.autotrader.com, shop.replay.it, www.mauijim.com, store.refrigiwear.it, store.murphynye.com, www.elnaturalista.com, www.jeansshop.com, www.kobobooks.com, www.luisaviaroma.com, store.killah.com, www.digitalrev.com, www.scitec-nutrition.es, www.staples.com, www.zavvi.com, www.bookdepository.co.uk.

  • Prices vary between 15% and 40%, and even up to 200% in extreme cases
  • Some of the retailers are not very popular, which shows that crowdsourcing is useful

SLIDE 50

Crowd assisted search of PD

Results – Crawled dataset

Figure: Magnitude of price variability per domain (price ratio, max/min price, axis from 1.0 to 2.0). Domains shown: www.chainreactioncycles.com, www.scitec-nutrition.es, www.elnaturalista.com, www.net-a-porter.com, www.homedepot.com, www.bookdepository.co.uk, store.murphynye.com, www.hotels.com, www.energie.it, www.kobobooks.com, www.misssixty.com, www.guess.eu, www.digitalrev.com, www.rightstart.com, www.amazon.com, www.mauijim.com, www.autotrader.com, store.killah.com, store.refrigiwear.it, www.tuscanyleather.it, www.luisaviaroma.com.

  • Selected retailers, found thanks to crowdsourcing, were examined with a systematic crawl
  • Price variation was a repeatable phenomenon, observed in up to 100% of requests
  • Prices vary between 10% and 30% for most of the retailers

SLIDE 51

Crowd assisted search of PD

Price variations for all products

Figure: Maximal ratio of price differences (y axis, 1 to 5) against the minimal price of the product in $ (x axis, log scale from 1 to 10K), for all stores.

  • Characterizing variations from the perspective of products
  • Is there any correlation between the price of a product and the price variation?
  • PD occurs across the entire range of products, from $10 to $10k

SLIDE 52

Crowd assisted search of PD

Investigating specific retailers

Figure: Ratio of price differences against the minimal price of the product ($) for New York, UK and Finland, at (a) www.digitalrev.com and (b) www.energie.it.

Examples of price differences with a multiplicative term or an additive term

52 / 78

slide-53
SLIDE 53

Crowd assisted search of PD

Differences per location – US

Figure: Magnitude of price difference per location – www.homedepot.com. Pair-wise panels for Albany, Boston, LA, Chicago, Lincoln and New York; axes show relative price, from 1.00 to 1.25.

Pair-wise comparisons of how prices differ between two locations for a specific retailer.
Prices are relative to the minimum price of the product across all the vantage points.
E.g. New York is consistently more expensive than Chicago, but there are also mixed pairs (Boston and Lincoln)
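The pair-wise comparison can be sketched as two steps: normalise each product's prices by its minimum across vantage points, then count, for a pair of locations, how often each one is the more expensive. This is an illustrative sketch only; the function names, the record layout and the prices are made up and do not come from the measured dataset.

```python
from collections import defaultdict


def relative_prices(observations):
    """Map product -> {location: price / min price across locations}.

    `observations` holds (product_id, location, price) tuples with
    positive prices (an assumed layout for this example).
    """
    by_product = defaultdict(dict)
    for product_id, location, price in observations:
        by_product[product_id][location] = price
    relative = {}
    for product_id, per_location in by_product.items():
        floor = min(per_location.values())
        relative[product_id] = {loc: p / floor for loc, p in per_location.items()}
    return relative


def compare_pair(relative, loc_a, loc_b):
    """Count products where loc_a is pricier, loc_b is pricier, or both match."""
    a_higher = b_higher = equal = 0
    for per_location in relative.values():
        if loc_a not in per_location or loc_b not in per_location:
            continue  # product was not observed from both locations
        diff = per_location[loc_a] - per_location[loc_b]
        if diff > 0:
            a_higher += 1
        elif diff < 0:
            b_higher += 1
        else:
            equal += 1
    return a_higher, b_higher, equal


# Made-up prices, for illustration only.
observations = [
    ("p1", "New York", 12.0), ("p1", "Chicago", 10.0),
    ("p1", "Boston", 11.0), ("p1", "Lincoln", 10.0),
    ("p2", "New York", 24.0), ("p2", "Chicago", 20.0),
    ("p2", "Boston", 20.0), ("p2", "Lincoln", 22.0),
    ("p3", "New York", 6.0), ("p3", "Chicago", 5.0),
    ("p3", "Boston", 5.0), ("p3", "Lincoln", 5.0),
]
relative = relative_prices(observations)
consistent_pair = compare_pair(relative, "New York", "Chicago")
mixed_pair = compare_pair(relative, "Boston", "Lincoln")
```

In this toy data the first pair is consistent (one location is always the more expensive), while the second pair is mixed, which mirrors the two situations named on the slide.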

53 / 78

slide-54
SLIDE 54

Crowd assisted search of PD

Differences per location – per country

Figure: Magnitude of price difference per location. Pair-wise panels per country; axes show relative price.
(a) www.amazon.com: Belgium, Brazil, Finland, Germany, Spain, USA (axes from 1.0 to 2.0)
(b) store.killah.com: Brazil, Finland, Germany, Spain, UK, USA (axes from 1.0 to 1.4)

Per-country differences are not consistent across retailers (e.g. US and Germany)

54 / 78

slide-55
SLIDE 55

Crowd assisted search of PD

Logged in users

Figure: The impact of login on the price of Kindle ebooks at www.amazon.com. Axes: product # (5 to 20); price ($, 10 to 40). Series: without login, User A, User B, User C.

Users logged in and not logged in to Amazon

55 / 78

slide-57
SLIDE 57

Conclusions and Further Work

Macroscopic view – Interdomain Traffic Matrix

Scarcity of inter-AS data makes ITM research difficult but not impossible – there is value in measuring qualitative properties of the ITM
In Paper I we analysed the GÉANT dataset. The research should be extended with other sources of data, preferably from a commercial system
Papers I and II focused on spatial properties of the ITM – research on temporal properties could shed light on the long-term evolution of the ITM
There might be a correlation between inter-AS traffic and congestion (I), which would be especially valuable for network operators
Openly available data combined with direct measurement (II) can be used to generate snapshots of the ITM
Synthesis can be improved by better parametrization of the model
The model generates static snapshots; the temporal aspect should be included as well
Our model is topology agnostic – explore how to generate synthetic topologies that match the synthetic matrices

57 / 78

slide-58
SLIDE 58

Conclusions and Further Work

Microscopic view – Price Discrimination

In III and IV we show empirically that PD exists on the Internet
In III we analysed different information vectors leading to PD and showed that prices can change according to the user’s location or personal traits
In IV we argue that crowdsourcing is a feasible method for conducting research on PD
We present an online tool aimed at Internet users
Scaling the experiment (a non-trivial engineering effort)
Use crowdsourcing not only to gather data, but also to assess the gathered data
Uncover the economic and technical mechanisms behind PD

58 / 78

slide-60
SLIDE 60

Introduction

Contributions

Macroeconomic aspects - analysing and synthesizing ITM

I “Towards a statistical characterization of the interdomain traffic matrix.” Jakub Mikians, Amogh Dhamdhere, Constantine Dovrolis, Pere Barlet-Ros, and Josep Solé-Pareta. IFIP Networking conference, 2012.
II “ITMgen - A first-principles approach to generating synthetic interdomain traffic matrices.” Jakub Mikians, Nikolaos Laoutaris, Amogh Dhamdhere, and Pere Barlet-Ros. IEEE International Conference on Communications – ICC, 2013.

Microeconomic aspects - price discrimination

III “Detecting price and search discrimination on the Internet.” Jakub Mikians, László Gyarmati, Vijay Erramilli, and Nikolaos Laoutaris. ACM HotNets, 2012.
IV “Crowd-assisted search for price discrimination in e-commerce: first results.” Jakub Mikians, László Gyarmati, Vijay Erramilli, and Nikolaos Laoutaris. ACM CoNEXT, 2013.

60 / 78

slide-61
SLIDE 61

Contributions

Other activities

Our work on price discrimination on the Internet, presented in Paper III, was mentioned in several articles in The Wall Street Journal:
“How the Journal Tested Prices and Deals Online”, WSJ, 2012, Dec
“Websites Vary Prices, Deals Based on Users’ Information”, WSJ, 2012, Dec
“Want a Deal Online? Pose as a Bargain Shopper”, WSJ, 2013, Jan

The research described in Papers III and IV was presented as an invited talk at the LAP/CPC/ICPEN conference (Antwerp, 16-17 April 2013). The London Action Plan (LAP) is a network of anti-spam government authorities and leading technologists. The International Consumer Protection Enforcement Network (ICPEN) and the EU Consumer Protection Cooperation Network (CPC) focus on broad enforcement and policy initiatives in consumer protection.

I am a co-author of a paper on personalized advertising: “Understanding Interest-based Behavioural Targeted Advertising.” Juan Miguel Carrascosa, Jakub Mikians, Ruben Cuevas, Vijay Erramilli, and Nikolaos Laoutaris. CoRR, 2014.

61 / 78

slide-62
SLIDE 62

Contributions

Processing bulk Internet traffic data

Initial work on processing bulk Internet traffic data:

V “A practical approach to portscan detection in very high-speed links.” Jakub Mikians, Pere Barlet-Ros, Josep Sanjuas-Cuxart, and Josep Solé-Pareta. Passive and Active Measurement – PAM, 2011.

62 / 78

slide-64
SLIDE 64

Characterizing ITM

Distribution of parameters

Figure: Distribution parameters as a function of throughput. Horizontal axis in both panels: throughput [Mbps], from 10^-1 to 10^3.
(a) Pareto α (vertical axis from 0.5 to 1.5)
(b) LogNormal, coefficient of variation (vertical axis from 0.1 to 0.4)

64 / 78

slide-65
SLIDE 65

Shape of the tail of a distribution

To compare the shape of the previous distribution, we define a metric D that indicates if the tail is LogNormal-like or Pareto-like. Let F be the empirical CDF of the sample, and let F_P and F_L be the CDFs of the Pareto and LogNormal distributions that fit the tail of the sample. We measure the difference in the tail using the Kolmogorov-Smirnov metric, evaluated only for values of x that are in the tail:

$$KS(F_1, F_2) = \max_x \left| F_1(x) - F_2(x) \right|$$

We define D as

$$D = \frac{KS(F, F_L)}{KS(F, F_L) + KS(F, F_P)} \qquad (2)$$
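By construction D lies between 0 and 1: it is close to 0 when the LogNormal fit is near the empirical tail and close to 1 when the Pareto fit is. The sketch below computes D for a synthetic sample. It is an illustration under assumptions, not the code used in Paper I: the helper names are invented, the two candidate CDFs use hand-picked parameters instead of a fitting procedure, and the maximum is evaluated only at the sample points in the tail.

```python
import bisect
import math


def empirical_cdf(sample):
    """Return F(x) = fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect.bisect_right(data, x) / n


def pareto_cdf(alpha, x_m):
    return lambda x: 1.0 - (x_m / x) ** alpha if x >= x_m else 0.0


def lognormal_cdf(mu, sigma):
    # Defined for x > 0 only.
    return lambda x: 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def tail_metric_d(sample, cdf_lognormal, cdf_pareto, tail_start):
    """D = KS(F, F_L) / (KS(F, F_L) + KS(F, F_P)), restricted to x >= tail_start."""
    f = empirical_cdf(sample)
    tail = [x for x in sample if x >= tail_start]
    if not tail:
        raise ValueError("no sample values in the tail")
    ks_l = max(abs(f(x) - cdf_lognormal(x)) for x in tail)
    ks_p = max(abs(f(x) - cdf_pareto(x)) for x in tail)
    return ks_l / (ks_l + ks_p)  # undefined if both distances are exactly zero


# Synthetic sample: evenly spaced quantiles of a Pareto(alpha=1.2, x_m=1) law.
n = 1000
alpha = 1.2
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]
tail_start = sample[900]  # treat the top 10% of values as the tail

# Hand-picked candidate distributions (hypothetical, not fitted).
d_value = tail_metric_d(sample, lognormal_cdf(0.0, 1.0), pareto_cdf(alpha, 1.0), tail_start)
```

Because the sample is built from Pareto quantiles, the Pareto distance is tiny and `d_value` comes out close to 1, which is the Pareto-like end of the scale.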

65 / 78

slide-66
SLIDE 66

Detecting price discrimination

Third party resources

Figure: Presence of third party resources on the sites used for training personas. Axis: from 0% to 50%. Third-party domains shown: googleapis.com, scorecardresearch.com, yieldmanager.com, google.com, fbcdn.net, facebook.net, googleadservices.com, facebook.com, doubleclick.net, google-analytics.com

66 / 78

slide-67
SLIDE 67

Crowd assisted search of PD

Results – Crowd-collected dataset

(a) Domains with the highest number of requests where price differences occurred. Axis: number of requests with price differences, from 2 to 50.
(b) Magnitude of price differences per domain. Axis: price ratios, from 1.0 to 2.0.

Domains shown in both panels: www.amazon.com, www.hotels.com, store.steampowered.com, www.misssixty.com, www.energie.it, www.sears.com, eu.abercrombie.com, www.tuscanyleather.it, www.guess.eu, www.overstock.com, www.booking.com, www.net-a-porter.com, www.autotrader.com, shop.replay.it, www.mauijim.com, store.refrigiwear.it, store.murphynye.com, www.elnaturalista.com, www.jeansshop.com, www.kobobooks.com, www.luisaviaroma.com, store.killah.com, www.digitalrev.com, www.scitec-nutrition.es, www.staples.com, www.zavvi.com, www.bookdepository.co.uk

Prices vary between 15% and 40%, and even up to 200% in extreme cases
Some of the retailers are not very popular, which is where crowdsourcing is useful

67 / 78

slide-68
SLIDE 68

Crowd assisted search of PD

Results – Crawled dataset

(c) Extent of price variations for different domains. Axis: extent of price differences, from 0.0 to 1.0.
(d) Magnitude of price variability per domain. Axis: price ratio (max/min price), from 1.0 to 2.0.

Domains shown in both panels: store.killah.com, store.refrigiwear.it, www.bookdepository.co.uk, www.digitalrev.com, www.energie.it, www.guess.eu, www.mauijim.com, www.misssixty.com, www.net-a-porter.com, www.tuscanyleather.it, store.murphynye.com, www.elnaturalista.com, www.chainreactioncycles.com, www.luisaviaroma.com, www.scitec-nutrition.es, www.hotels.com, www.kobobooks.com, www.amazon.com, www.homedepot.com, www.autotrader.com, www.rightstart.com

Selected retailers, found through crowdsourcing, were examined with a systematic crawl
Price variation was a repeatable phenomenon – observed in up to 100% of requests
Prices vary between 10% and 30% for most of the retailers

68 / 78

slide-69
SLIDE 69

Crowd assisted search of PD

Differences per locations

Figure: Magnitude of price differences per location (all). Axis: from 1.0 to 1.4. Vantage points: Belgium - Liege, Brazil - Sao Paulo, Finland - Tampere, Germany - Berlin, Spain (Linux, FF), Spain (Mac, Safari), Spain (Win, Chrome), UK - London, USA - Boston, USA - Chicago, USA - Lincoln, USA - Los Angeles, USA - New York, USA - Albany

Do users from certain locations tend to pay more for the same product than others?

69 / 78

slide-70
SLIDE 70

Crowd assisted search of PD

Differences per location – Finland

Figure: Magnitude of price differences per domain in Tampere, Finland. Axis: from 1.0 to 2.0. Domains shown: store.killah.com, store.murphynye.com, store.refrigiwear.it, www.amazon.com, www.autotrader.com, www.bookdepository.co.uk, www.chainreactioncycles.com, www.digitalrev.com, www.elnaturalista.com, www.energie.it, www.guess.eu, www.homedepot.com, www.hotels.com, www.kobobooks.com, www.luisaviaroma.com, www.mauijim.com, www.misssixty.com, www.net-a-porter.com, www.rightstart.com, www.scitec-nutrition.es, www.tuscanyleather.it

70 / 78

slide-71
SLIDE 71

Introduction

Processing bulk traffic data

Need to process bulk backbone data
Excellence in using tools for gathering and analysing network traffic
As a result of the initial exercises we developed a novel portscan detection algorithm (Paper V: “A Practical Approach to Portscan Detection in Very High-Speed Links”)

71 / 78

slide-72
SLIDE 72

Portscan Backup Slides

Early discard of handshake packets that are not needed to detect portscans
A pair of Bloom filters effectively discards up to 85% of the packets
The method requires less than 1 MB of memory to accurately monitor a 10 Gb/s link

72 / 78

slide-73
SLIDE 73

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

DRAM - too slow
Sampling - large impact on portscan detection

73 / 78

slide-74
SLIDE 74

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

Bloom filters:
Ignore legitimate handshakes (whitelist)
Ignore failed connections that do not correspond to scans (e.g. TCP retransmissions)
Drop 85% of the packets
Finding scanners – the known problem of finding the top-k elements in a data stream
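The early-discard idea can be sketched with two small Bloom filters. This is a simplified reading of the slide, not the algorithm from Paper V: the packet tuple layout, the choice to whitelist a (server, port) pair once a SYN/ACK from it is seen, the filter sizes and all names are assumptions made for illustration.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Fixed-size bit array probed with several salted hashes."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=bytes([i + 1])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def keep_for_scan_counting(packet, bf_whitelist, bf_syn):
    """Return True only for SYNs that should reach the scan counter.

    packet = (kind, client, server, port), kind is "syn" or "synack"
    (a hypothetical layout chosen for this sketch).
    """
    kind, client, server, port = packet
    service = f"{server}:{port}"
    if kind == "synack":
        bf_whitelist.add(service)  # the service answered, so treat it as legitimate
        return False
    if service in bf_whitelist:
        return False  # SYN towards a known live service
    attempt = f"{client}>{service}"
    if attempt in bf_syn:
        return False  # repeated SYN, e.g. a TCP retransmission
    bf_syn.add(attempt)
    return True


# Made-up traffic: one legitimate client, one scanner, one retransmitting client.
packets = [("synack", "10.0.0.5", "192.0.2.1", 80), ("syn", "10.0.0.5", "192.0.2.1", 80)]
packets += [("syn", "10.0.0.9", "192.0.2.1", port) for port in range(1000, 1020)]
packets += [("syn", "10.0.0.7", "192.0.2.2", 22)] * 3

bf_whitelist, bf_syn = BloomFilter(), BloomFilter()
suspicious = Counter(
    pkt[1] for pkt in packets if keep_for_scan_counting(pkt, bf_whitelist, bf_syn)
)
```

Bloom filters can return false positives, so a real deployment would size them for the expected number of flows; here they hold only a handful of entries. Only the packets that survive both filters need to be counted per source.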

74 / 78

slide-75
SLIDE 75

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

Figure: Algorithm description.

75 / 78

slide-76
SLIDE 76

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

bf whitelist – tracks legitimate connections
bf syn – tracks repeated SYN packets
After that, the connections are counted in an efficient top-k counting data structure
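The slides do not say which top-k structure is used for the final counting step. As a stand-in, the sketch below uses Space-Saving, a standard bounded-memory algorithm for finding frequent items in a stream; the function name, the capacity and the stream contents are chosen only for illustration.

```python
def space_saving(stream, capacity):
    """Approximate per-item counts using at most `capacity` counters.

    Space-Saving never underestimates a tracked item's count, and any item
    seen more than len(stream) / capacity times is guaranteed to be tracked.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


# Made-up stream of sources that passed the filters: one heavy scanner
# interleaved with many hosts that each appear once.
stream = []
for i in range(30):
    stream += ["scanner", f"host{i}"]
stream += ["scanner"] * 20

counters = space_saving(stream, capacity=5)
top_source = max(counters, key=counters.get)
```

The linear scan for the minimum keeps the sketch short; an implementation meant for a 10 Gb/s link would need a constant-time way to find the smallest counter.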

76 / 78

slide-77
SLIDE 77

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

Table: Statistics of the traces. Trace C only accounts for Syn/SynAck packets.

| | trace A | trace B | trace C | trace A0 |
| capture | 30min @ 1GigE | 2h @ OC-3 | 30min @ 10GigE | 30min @ 1GigE |
| date | 2010-05-18 | 2010-04-16 | 2010-07-29 | 2010-05-18 |
| TCP packets | 228,848,927 | 144,885,865 | 13,978,845 | 97,380,742 |
| TCP sources | 188,136 | 263,055 | 467,264 | 89,086 |
| TCP flows | 2,892,334 | 5,199,928 | 11,526,323 | 1,133,392 |
| average usage | 879.1 Mb/s | 185 Mb/s | 3.5 Gb/s | n/a |

77 / 78

slide-78
SLIDE 78

Introduction Macroscopic view: Interdomain Traffic Matrix Microscopic view: Price Discrimination Conclusions and Further Work Contributions Backup

Portscan Backup Slides

Figure: Number of sources reported as scanners as a function of the threshold, for span-dec, the original top-k and a hash table.
(a) trace A - 1 Gb/s UPC link
(b) trace B - MAWI traffic
(c) trace C - 10 Gb/s CESCA link
(d) online - 10GigE link

78 / 78