Randomness in Competitions Eli Ben-Naim Complex Systems Group & - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Randomness in Competitions

Eli Ben-Naim

Complex Systems Group & Center for Nonlinear Studies, Los Alamos National Laboratory

Sidney Redner and Federico Vazquez (Los Alamos & Boston University)
Nicholas Hengartner (Los Alamos)
Micha Ben-Naim (Los Alamos Middle School)

Talk, papers available from: http://cnls.lanl.gov/~ebn

slide-2
SLIDE 2

Plan

  • 1. Modeling competitions
  • 2. Tournaments (post season, trees)
  • 3. Leagues (regular season, complete graphs)
  • 4. Championships (new algorithm, regular graphs)
  • 5. Modeling social dynamics
slide-3
SLIDE 3

Motivation

  • Evolution: species compete, fitter wins
  • Society: people compete for social status
  • Economics: companies compete for market share

  • Arts, science, politics: awards, prizes, elections

Competition is everywhere

slide-4
SLIDE 4

Why sports?

  • Sports competition results are:
  • Accurate
  • Widely available
  • Complete

Sports as a laboratory for understanding competition

slide-5
SLIDE 5

Theme

  • Competitions are not perfectly predictable
  • Outcome of a single competition is stochastic
  • Winner of a series of competitions (league, tournament) is also subject to randomness

Randomness is inherent

slide-6
SLIDE 6
  • I. Modeling competitions
slide-7
SLIDE 7

What is the most competitive sport?

Soccer, baseball, hockey, basketball, football

Can competitiveness be quantified? How can competitiveness be quantified?

slide-8
SLIDE 8
Parity of a sports league

  • Teams ranked by win-loss record
  • Win percentage: $x = \frac{\text{number of wins}}{\text{number of games}}$
  • Standard deviation in win percentage: $\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$
  • Cumulative distribution: $F(x)$ = fraction of teams with winning percentage $< x$

Major League Baseball, American League, 2005 season-end standings

In baseball: $0.400 < x < 0.600$, $\sigma = 0.08$
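The two parity measures defined on this slide can be computed directly from a standings table. The following is a minimal sketch; the four-team standings and the function names are made up for illustration and are not data from the talk.

```python
import math

def win_percentages(standings):
    """standings maps team -> (wins, games); returns x = wins / games per team."""
    return [wins / games for wins, games in standings.values()]

def std_dev(xs):
    """sigma = sqrt(<x^2> - <x>^2)."""
    mean = sum(xs) / len(xs)
    mean_sq = sum(x * x for x in xs) / len(xs)
    return math.sqrt(mean_sq - mean * mean)

def cumulative_fraction(xs, x):
    """F(x): fraction of teams with winning percentage below x."""
    return sum(1 for value in xs if value < x) / len(xs)

# Hypothetical standings (not real data)
standings = {"A": (60, 100), "B": (50, 100), "C": (50, 100), "D": (40, 100)}
xs = win_percentages(standings)
sigma = std_dev(xs)
below_half = cumulative_fraction(xs, 0.5)
```

A small sigma indicates that the teams are bunched near x = 1/2, that is, a league with high parity.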

slide-9
SLIDE 9

Data

  • 300,000 Regular season games (all games ever played)
  • 5 Major sports leagues in United States & England

sport       league  full name                        years      games
soccer      FA      Football Association             1888-2005  43,350
baseball    MLB     Major League Baseball            1901-2005  163,720
hockey      NHL     National Hockey League           1917-2005  39,563
basketball  NBA     National Basketball Association  1946-2005  43,254
football    NFL     National Football League         1922-2004  11,770

source: http://www.shrpsports.com/ http://www.the-english-football-archive.com/

slide-10
SLIDE 10

Distribution of winning percentage clearly distinguishes sports

[Figure: F(x) versus x for NFL, NBA, NHL, MLB]

  • Baseball most competitive?
  • Football least competitive?

Standard deviation in winning percentage (data, theory)

league  σ
MLB     0.084
FA      0.102
NHL     0.120
NBA     0.150
NFL     0.210

Fort and Quirk, 1995

slide-11
SLIDE 11

“Everything should be made as simple as possible but not simpler”

Freeman Dyson

slide-12
SLIDE 12

“Simple Physics”

slide-13
SLIDE 13
The competition model

  • Two randomly selected teams play
  • Outcome of game depends on team record
  • Weaker team wins with probability q<1/2
  • Stronger team wins with probability p>1/2
  • When two equal teams play, winner picked randomly
  • Initially, all teams are equal (0 wins, 0 losses)
  • Teams play once per unit time

For two teams with $i > j$ wins:

$(i, j) \to (i+1, j)$ with probability $p$
$(i, j) \to (i, j+1)$ with probability $1 - p$

$p + q = 1$, $\langle x \rangle = 1/2$

$q = 1/2$: random → $q = 0$: deterministic
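The rules of the competition model translate almost line by line into a Monte Carlo simulation. This is a minimal sketch under stated assumptions: opponents are drawn uniformly at random, "record" means the current number of wins, and the function names, league size, and seed are illustrative choices rather than anything specified in the talk.

```python
import math
import random

def simulate_competition(n_teams, time, q, seed=0):
    """Play the record-based competition model; return each team's win count."""
    rng = random.Random(seed)
    wins = [0] * n_teams
    # Each game involves two teams, so n_teams * time / 2 games give
    # every team one game per unit time on average.
    for _ in range(n_teams * time // 2):
        a = rng.randrange(n_teams)
        b = rng.randrange(n_teams - 1)
        if b >= a:
            b += 1                      # make sure the two teams are distinct
        if wins[a] == wins[b]:
            winner = a if rng.random() < 0.5 else b    # equal teams: coin flip
        else:
            strong, weak = (a, b) if wins[a] > wins[b] else (b, a)
            winner = weak if rng.random() < q else strong
        wins[winner] += 1
    return wins

def spread(wins, time):
    """Standard deviation of the winning percentage x = wins / time."""
    xs = [w / time for w in wins]
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

random_league = spread(simulate_competition(200, 100, q=0.5), 100)
deterministic_league = spread(simulate_competition(200, 100, q=0.0), 100)
```

With q = 1/2 the records stay close together, while with q = 0 early leaders keep winning and the spread in winning percentage is much larger.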

slide-14
SLIDE 14
Rate equation approach

  • Probability distribution functions:
    $g_k$ = fraction of teams with $k$ wins
    $G_k = \sum_{j=0}^{k-1} g_j$ = fraction of teams with less than $k$ wins
    $H_k = 1 - G_{k+1} = \sum_{j=k+1}^{\infty} g_j$
  • Evolution of the probability distribution:
    $\frac{dg_k}{dt} = (1-q)\,(g_{k-1}G_{k-1} - g_k G_k) + q\,(g_{k-1}H_{k-1} - g_k H_k) + \frac{1}{2}\,(g_{k-1}^2 - g_k^2)$
    The three terms correspond to: better team wins, worse team wins, equal teams play.
  • Closed equations for the cumulative distribution:
    $\frac{dG_k}{dt} = q\,(G_{k-1} - G_k) + (1/2 - q)\,(G_{k-1}^2 - G_k^2)$

Boundary conditions: $G_0 = 0$, $G_\infty = 1$
Initial conditions: $G_k(t=0) = 1$

Nonlinear Difference-Differential Equations
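The closed equation for the cumulative distribution, dG_k/dt = q(G_{k-1} - G_k) + (1/2 - q)(G_{k-1}^2 - G_k^2), can be integrated numerically. The sketch below uses a plain forward-Euler step. The step size, the truncation of k just above t_max, and the function name are assumptions made for illustration, not the scheme used in the talk.

```python
def integrate_rate_equations(q, t_max, dt=0.05):
    """Forward-Euler integration of the closed equations for G_k(t)."""
    k_max = int(t_max) + 2            # no team can have many more than t_max wins
    G = [0.0] + [1.0] * k_max         # G_0 = 0 and G_k(t=0) = 1 for k >= 1
    for _ in range(int(round(t_max / dt))):
        new = G[:]
        for k in range(1, k_max + 1):
            rate = (q * (G[k - 1] - G[k])
                    + (0.5 - q) * (G[k - 1] ** 2 - G[k] ** 2))
            new[k] = G[k] + dt * rate
        G = new
    return G

G = integrate_rate_equations(q=0.25, t_max=100)
# G[k] approximates the fraction of teams with fewer than k wins at t = 100,
# so G[50] is the fraction of teams with winning percentage below 1/2.
```

Plotting G[k] against x = k/t for increasing t_max is one way to watch the distribution settle toward its long-time form.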

slide-15
SLIDE 15

An exact solution

  • Stronger always wins (q=0): $\frac{dG_k}{dt} = -G_k\,(G_k - G_{k-1})$
  • Transformation into a ratio: $G_k = \frac{P_k}{P_{k+1}}$
  • Nonlinear equations reduce to linear recursion: $\frac{dP_k}{dt} = P_{k-1}$
  • Exact solution: $G_k = \frac{1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{k!}t^k}{1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{(k+1)!}t^{k+1}}$

slide-16
SLIDE 16

Long-time asymptotics

  • Long-time limit: $G_k \to \frac{k+1}{t}$
  • Scaling form: $G_k \to F\!\left(\frac{k}{t}\right)$
  • Scaling function: $F(x) = x$

Seek similarity solutions. Use winning percentage as scaling variable.

[Figure: F(x) versus x at t=10, t=20, t=100, compared with scaling theory]
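The exact q = 0 solution, a ratio of truncated exponential series, can be evaluated directly and compared with the long-time limit (k+1)/t and the scaling function F(x) = x. This is a minimal sketch; the function name and the sample values of k and t are illustrative.

```python
def exact_G(k, t):
    """G_k(t) = P_k / P_{k+1}, where P_k = sum over j <= k of t**j / j!."""
    term = 1.0                 # t**0 / 0!
    partial = 1.0              # P_0
    for j in range(1, k + 1):
        term *= t / j          # t**j / j!
        partial += term        # P_j
    next_partial = partial + term * t / (k + 1)    # P_{k+1}
    return partial / next_partial

# Fixed k, large t: G_k approaches (k + 1) / t
long_time = exact_G(5, 1.0e4) * 1.0e4

# Fixed x = k / t: G_k approaches F(x) = x
half = exact_G(200, 400.0)
three_quarters = exact_G(300, 400.0)
```

The direct sum is adequate for t up to a few hundred; much larger t would overflow double precision and call for working with logarithms of the terms.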

slide-17
SLIDE 17

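Scaling analysis

  • Rate equation: $\frac{dG_k}{dt} = q\,(G_{k-1} - G_k) + (1/2 - q)\,(G_{k-1}^2 - G_k^2)$
  • Treat number of wins as continuous, $G_{k+1} - G_k \to \frac{\partial G}{\partial k}$:
    $\frac{\partial G}{\partial t} + [q + (1 - 2q)\,G]\,\frac{\partial G}{\partial k} = 0$
  • Stationary distribution of winning percentage: $G_k(t) \to F(x)$, $x = \frac{k}{t}$
  • Scaling equation: $[(x - q) - (1 - 2q)\,F(x)]\,\frac{dF}{dx} = 0$

Inviscid Burgers equation: $\frac{\partial v}{\partial t} + v\,\frac{\partial v}{\partial x} = 0$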

slide-18
SLIDE 18

Scaling solution

  • Stationary distribution of winning percentage:
    $F(x) = \begin{cases} 0 & 0 < x < q \\ \frac{x - q}{1 - 2q} & q < x < 1 - q \\ 1 & 1 - q < x \end{cases}$
  • Distribution of winning percentage is uniform:
    $f(x) = F'(x) = \begin{cases} 0 & 0 < x < q \\ \frac{1}{1 - 2q} & q < x < 1 - q \\ 0 & 1 - q < x \end{cases}$
  • Variance in winning percentage: $\sigma = \frac{1/2 - q}{\sqrt{3}}$

$q = 1/2$: perfect parity → $q = 0$: maximum disparity

[Figure: F(x) rising linearly from 0 at x = q to 1 at x = 1 − q, and f(x) flat between x = q and x = 1 − q]
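The value of σ follows from the uniform density alone. A short worked derivation, using only f(x) = 1/(1 − 2q) on q < x < 1 − q:

```latex
\begin{align}
\langle x \rangle &= \int_q^{1-q} \frac{x\,dx}{1-2q} = \frac{1}{2}, \\
\sigma^2 &= \int_q^{1-q} \frac{(x - 1/2)^2}{1-2q}\,dx
          = \frac{1}{1-2q}\cdot\frac{2}{3}\left(\frac{1-2q}{2}\right)^3
          = \frac{(1-2q)^2}{12}, \\
\sigma &= \frac{1-2q}{2\sqrt{3}} = \frac{1/2 - q}{\sqrt{3}}.
\end{align}
```

The two limits match the labels on the slide: σ vanishes at q = 1/2 and reaches its maximum, 1/(2√3), at q = 0.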

slide-19
SLIDE 19

Approach to scaling

Numerical integration of the rate equations, q=1/4

  • Winning percentage distribution approaches scaling solution
  • Correction to scaling is very large for realistic number of games
  • Large variance may be due to small number of games

$\sigma(t) = \frac{1/2 - q}{\sqrt{3}} + f(t)$, with $f(t) \sim t^{-1/2}$ (large!)

League  games
MLB     160
FA      40
NHL     80
NBA     80
NFL     16

Variance inadequate to characterize competitiveness!

[Figure, left panel: F(x) versus x, theory compared with t=100 and t=500. Right panel: σ versus t, approaching $1/(4\sqrt{3})$, with NFL and MLB marked]

slide-20
SLIDE 20

The distribution of win percentage

  • Treat q as a fitting parameter, time=number of games
  • Allows estimating q_model for different leagues

[Figure: F(x) versus x for NFL, NBA, NHL, MLB]
slide-21
SLIDE 21
The upset frequency

  • Upset frequency as a measure of predictability
  • Addresses the variability in the number of games
  • Measure directly from game-by-game results
  • Ties: count as 1/2 of an upset (small effect)
  • Ignore games by teams with equal records
  • Ignore games by teams with no record

$q = \frac{\text{number of upsets}}{\text{number of games}}$
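The counting rules for the upset frequency can be sketched as a single pass over chronological game results. The input format, the use of winning fraction as the "record", and the choice to credit a tie as half a win in the record are assumptions for illustration; the talk does not specify them.

```python
def upset_frequency(games):
    """games: chronological list of (team_a, team_b, winner); winner is None for a tie."""
    wins, played = {}, {}
    upsets = 0.0
    counted = 0
    for team_a, team_b, winner in games:
        games_a, games_b = played.get(team_a, 0), played.get(team_b, 0)
        if games_a > 0 and games_b > 0:             # skip teams with no record
            record_a = wins.get(team_a, 0.0) / games_a
            record_b = wins.get(team_b, 0.0) / games_b
            if record_a != record_b:                # skip teams with equal records
                counted += 1
                favorite = team_a if record_a > record_b else team_b
                if winner is None:
                    upsets += 0.5                   # a tie counts as half an upset
                elif winner != favorite:
                    upsets += 1.0
        played[team_a] = games_a + 1
        played[team_b] = games_b + 1
        if winner is None:
            wins[team_a] = wins.get(team_a, 0.0) + 0.5
            wins[team_b] = wins.get(team_b, 0.0) + 0.5
        else:
            wins[winner] = wins.get(winner, 0.0) + 1.0
    return upsets / counted if counted else float("nan")

# Hypothetical results (not real data)
results = [("A", "B", "A"), ("C", "D", "C"), ("A", "C", "A"),
           ("B", "D", "D"), ("A", "D", "D"), ("B", "C", None)]
q_estimate = upset_frequency(results)
```

In this toy schedule only the last two games count: one is a full upset and one is a tie, so the estimate is 1.5 / 2.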

slide-22
SLIDE 22

The upset frequency

League  q      q_model
FA      0.452  0.459
MLB     0.441  0.413
NHL     0.414  0.383
NBA     0.365  0.316
NFL     0.364  0.309

Soccer, baseball most competitive
Basketball, football least competitive
q differentiates the different sport leagues!

[Figure: q versus year for FA, MLB, NHL, NBA, NFL]

slide-23
SLIDE 23

Evolution with time

  • Parity, predictability mirror each other: $\sigma = \frac{1/2 - q}{\sqrt{3}}$
  • Football, baseball increasing competitiveness
  • Soccer decreasing competitiveness (past 60 years)

S.J. Gould, Full House: The Spread of Excellence from Plato to Darwin, 1996

[Figure: σ versus year and q versus year for FA, MLB, NHL, NBA, NFL]

slide-24
SLIDE 24
  • I. Discussion
  • Model limitation: it does not incorporate
  • Game location: home field advantage
  • Game score
  • Upset frequency dependent on relative team strength

  • Unbalanced schedule
  • Model advantages:
  • Simple, involves only 1 parameter
  • Enables quantitative analysis
slide-25
SLIDE 25
  • 1. Conclusions
  • Parity characterized by variance in winning percentage
  • Parity measure requires standings data
  • Parity measure depends on season length
  • Predictability characterized by upset frequency
  • Predictability measure requires game results data
  • Predictability measure independent of season length
  • Two-team competition model allows quantitative modeling of sports competitions

slide-26
SLIDE 26
  • 2. Tournaments

(post-season, trees)

slide-27
SLIDE 27

Single-elimination Tournaments

Binary Tree Structure

slide-28
SLIDE 28
The competition model

  • Two teams play, loser is eliminated: $N \to N/2 \to N/4 \to \cdots \to 1$
  • Teams have inherent strength (or fitness) x
  • Outcome of game depends on team strength

For $x_1 < x_2$:

$(x_1, x_2) \to x_1$ with probability $1 - q$
$(x_1, x_2) \to x_2$ with probability $q$

[Figure: teams x1, x2, x3, x4, x5 placed on the fitness axis x, running from strong to weak]

slide-29
SLIDE 29
Recursive approach

  • Number of teams: $N = 2^k = 1, 2, 4, 8, \ldots$
  • $G_N(x)$ = cumulative probability distribution function for teams with fitness less than x to win an N-team tournament
  • Closed equations for the cumulative distribution:
    $G_{2N}(x) = 2p\,G_N(x) + (1 - 2p)\,[G_N(x)]^2$

Nonlinear Recursion Equation
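The recursion G_2N(x) = 2p G_N(x) + (1 − 2p) [G_N(x)]^2 is easy to iterate round by round. The sketch below assumes a uniform fitness distribution, so that G_1(x) = x, and takes p = 1 − q; the starting distribution and the function name are assumptions for illustration.

```python
def winner_cdf(x, rounds, q):
    """G_N(x) for N = 2**rounds teams, starting from G_1(x) = x."""
    p = 1.0 - q
    G = x
    for _ in range(rounds):
        G = 2 * p * G + (1 - 2 * p) * G * G
    return G

# q = 0: the strongest (smallest x) of N = 8 teams always wins,
# so G_8(x) = 1 - (1 - x)**8.
deterministic = winner_cdf(0.1, rounds=3, q=0.0)

# Small x: the quadratic term is negligible and each round multiplies G by 2p.
small_x = winner_cdf(1.0e-6, rounds=4, q=0.25)
```

The small-x behaviour, G_N(x) ≈ (2p)^k x for N = 2^k, shows that each round concentrates the winner's fitness distribution toward small x by a factor 2p.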

slide-30
SLIDE 30
Scaling properties

  • 1. Scale of winner: $x_* \sim N^{-\ln 2p / \ln 2}$
  • 2. Scaling function: $G_N(x) \to \Psi(x/x_*)$
  • 3. Algebraic tail: $1 - \Psi(z) \sim z^{\ln 2q / \ln 2p}$

  • 1. Large tournaments produce strong winners
  • 3. High probability for an upset

[Figure: G_N(x) versus x for N=1, N=2, N=4, N=8, N=16]

slide-31
SLIDE 31

The scaling function

$\Psi(2pz) = 2p\,\Psi(z) + (1 - 2p)\,\Psi^2(z)$

Universal shape
Broad tail: $\Psi'(z) \sim z^{\ln 2q / \ln 2p - 1}$

[Figure: Ψ(z) versus z for N=2^1, N=2^4, N=2^7, N=2^10, and Ψ'(z) versus z on logarithmic axes]

slide-32
SLIDE 32

College Basketball

  • Teams ranked 1-16: well defined favorite, well defined underdog
  • 4 winners each year
  • Theory: q=0.18
  • Simulation: q=0.22
  • Data: q=0.27
  • Data: 1978-2006
  • 1600 games

rank     1   2   3   4   5  6  7  8  9  10  11  12
winners  45  24  14  10  5  6  1  4  1  0   2   0

[Figure: G16(x) versus x, comparing theory, simulation, and tournament data]

slide-33
SLIDE 33

Evolution, Men vs Women

[Figure: q versus year, 1980-2005, for men and women]

slide-34
SLIDE 34
  • 2. Conclusions
  • Strong teams fare better in large tournaments
  • Tournaments can produce major upsets
  • Distribution of winner relates parity with predictability
  • Tournaments are efficient but not fair
slide-35
SLIDE 35
  • 3. Leagues

(regular season, complete graphs)

slide-36
SLIDE 36

League champions

  • N teams with fixed ranking
  • In each game, favorite and underdog are well defined
  • Favorite wins with probability p>1/2, underdog wins with probability q<1/2, with p + q = 1
  • Each team plays t games against random opponents
  • Regular random graph
  • Team with most wins is the champion

How many games are needed for best team to win?

slide-37
SLIDE 37

Random walk approach

  • Probability team ranked n wins a game: $P_n = p\,\frac{N-n}{N-1} + q\,\frac{n-1}{N-1}$
  • Number of wins performs a biased random walk: $w_n = P_n t \pm \sqrt{D_n t}$
  • Team n can finish first at early times as long as $(2p - 1)\,\frac{n}{N}\,t \sim \sqrt{t}$
  • Rank of champion as function of N and t: $n_* \sim \frac{N}{\sqrt{t}}$

[Figure: teams ordered by rank 1, 2, 3, ..., N, with p and q marked for team n]
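The random-walk argument says that the typical rank of the champion shrinks as the season gets longer. A Monte Carlo sketch of the league model makes this visible. Opponents are drawn uniformly at random rather than from a regular random graph, ties for first place are broken at random, and the league size, season lengths, and seed are illustrative assumptions.

```python
import random

def champion_rank(n_teams, games_per_team, q, rng):
    """Rank (1 = best) of the team with the most wins after one season."""
    wins = [0] * n_teams                      # index 0 is the best team
    for _ in range(n_teams * games_per_team // 2):
        a = rng.randrange(n_teams)
        b = rng.randrange(n_teams - 1)
        if b >= a:
            b += 1                            # two distinct teams
        favorite, underdog = min(a, b), max(a, b)
        wins[underdog if rng.random() < q else favorite] += 1
    most = max(wins)
    leaders = [i for i, w in enumerate(wins) if w == most]
    return rng.choice(leaders) + 1            # ranks start at 1

def mean_champion_rank(n_teams, games_per_team, q, trials, seed=0):
    rng = random.Random(seed)
    total = sum(champion_rank(n_teams, games_per_team, q, rng)
                for _ in range(trials))
    return total / trials

short_season = mean_champion_rank(50, 2, q=0.1, trials=40)
long_season = mean_champion_rank(50, 100, q=0.1, trials=40)
```

Repeating the run for several season lengths and plotting the mean rank against t on logarithmic axes is a simple way to compare with the N/√t estimate.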

slide-38
SLIDE 38

Length of season

  • For best team to finish first: $1 \sim \frac{N}{\sqrt{t}}$
  • Each team must play: $t \sim N^2$
  • Total number of games: $T \sim N^3$

  • 1. Normal leagues are too short
  • 2. Normal leagues: rank of winner $\sim \sqrt{N}$
  • 3. League champions are a transient!

[Figure: ⟨T⟩ versus N on logarithmic axes, simulation compared with slope=3]

slide-39
SLIDE 39

Distribution of outcomes

  • Scaling distribution for the rank of champion: $Q_n(t) \sim \frac{1}{n_*}\,\psi\!\left(\frac{n}{n_*}\right)$, $n_* \sim \frac{N}{\sqrt{t}}$
  • Probability worse team wins decays exponentially: $Q_N(t) \sim \exp(-\text{const} \times t)$
  • Gaussian tail because $\psi(t^{1/2}) \sim \exp(-t)$: $\psi(z) \sim \exp(-\text{const} \times z^2)$
  • Normal league: Prob. (weakest team wins) $\sim \exp(-N)$

Leagues are fair: upset champions extremely unlikely

slide-40
SLIDE 40

Leagues versus Tournaments

16 teams, q=0.4. Probability P_n (in percent) that the team ranked n wins:

n   league  tournament
1   24.5    12.9
2   18.2    11.4
3   13.6    10.1
4   10.3    8.9
5   7.9     7.9
6   6.1     7.1
7   4.7     6.3
8   3.7     5.7
9   2.9     5.1
10  2.2     4.6
11  1.7     4.2
12  1.3     3.8
13  1.0     3.4
14  0.81    3.1
15  0.63    2.8
16  0.49    2.6

$n_* \sim \sqrt{N}$

[Figure: P_n versus n for league and tournament]
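The tournament side of this comparison can be estimated with a short Monte Carlo run. The sketch assumes a randomly drawn bracket for 16 ranked teams with the favorite winning each game with probability 1 − q. Under that assumption the best team must win four games as favorite, so its chance is 0.6^4 ≈ 0.13, and the worst team must win four games as underdog, 0.4^4 ≈ 0.026. The bracket rule and function names are assumptions, not taken from the talk.

```python
import random

def tournament_winner(n_teams, q, rng):
    """Single elimination with a random bracket; ranks run from 1 (best) to n_teams."""
    field = list(range(1, n_teams + 1))
    rng.shuffle(field)
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            favorite, underdog = sorted(field[i:i + 2])
            next_round.append(underdog if rng.random() < q else favorite)
        field = next_round
    return field[0]

def win_probabilities(n_teams, q, trials, seed=0):
    """Estimated probability of winning the tournament, indexed by rank - 1."""
    rng = random.Random(seed)
    counts = [0] * (n_teams + 1)
    for _ in range(trials):
        counts[tournament_winner(n_teams, q, rng)] += 1
    return [c / trials for c in counts[1:]]

probs = win_probabilities(16, q=0.4, trials=20000)
```

The estimated probabilities fall off slowly with rank, which is the sense in which a tournament gives weak teams a real chance of winning.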

slide-41
SLIDE 41
  • 3. Conclusions
  • Leagues are fair but inefficient
  • Leagues do not produce major upsets
slide-42
SLIDE 42
  • 4. Championships

(regular random graphs and complete graphs)

slide-43
SLIDE 43

One preliminary round

  • Preliminary round
  • Teams play a small number of games: $T \sim N t$
  • Top M teams advance to championship round: $M \sim N^{\alpha}$
  • Bottom N-M teams eliminated
  • Best team must finish no worse than M place: $t \sim \frac{N^2}{M^2}$
  • Championship round: plenty of games, $T \sim M^3$
  • Total number of games: $T \sim N^{3 - 2\alpha} + N^{3\alpha}$
  • Minimal when $M \sim N^{3/5}$, giving $T \sim N^{9/5}$

[Figure: teams ranked 1, 2, 3, ..., N, with the cutoff at rank M]
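For large N the total T ∼ N^(3−2α) + N^(3α) is dominated by the larger of the two exponents, so the best α is the one that minimizes max(3 − 2α, 3α). A small exact-arithmetic scan illustrates this; the grid resolution and names are arbitrary illustrative choices.

```python
from fractions import Fraction

def total_games_exponent(alpha):
    """Exponent of N in T ~ N**(3 - 2*alpha) + N**(3*alpha): the larger term dominates."""
    return max(3 - 2 * alpha, 3 * alpha)

# Scan alpha = j / 1000 using exact rational arithmetic
candidates = [Fraction(j, 1000) for j in range(0, 1001)]
best_alpha = min(candidates, key=total_games_exponent)
best_exponent = total_games_exponent(best_alpha)
```

The minimum sits where the two exponents are equal, 3 − 2α = 3α, which gives α = 3/5 and an exponent of 9/5.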

slide-44
SLIDE 44

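Two preliminary rounds

  • Two stage elimination: $N \to N^{\alpha_2} \to N^{\alpha_2 \alpha_1} \to 1$
  • Second round: $T_2 \sim N^{3 - 2\alpha_2} + N^{\alpha_2 (3 - 2\alpha_1)} + N^{3 \alpha_1 \alpha_2}$
  • Minimize number of games: $3 - 2\alpha_2 = \alpha_2 (3 - 2\alpha_1) \;\to\; \alpha_2 = \frac{15}{19}$
  • Further improvement in efficiency: $T \sim N^{27/19}$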

slide-45
SLIDE 45

Multiple preliminary rounds

  • Each additional round further reduces T: $T_k \sim N^{\gamma_k}$, $\gamma_k = \frac{1}{1 - (2/3)^{k+1}}$, so $\gamma_k = 3, \frac{9}{5}, \frac{27}{19}, \frac{81}{65}, \cdots$
  • Gradual elimination: $N \to N^{\frac{57}{65}} \to N^{\frac{57}{65} \cdot \frac{15}{19}} \to N^{\frac{57}{65} \cdot \frac{15}{19} \cdot \frac{3}{5}} \to 1$
  • Teams play a small number of games initially

Optimal linear scaling achieved using many rounds: $T_\infty \sim N$, $M_\infty \sim N^{1/3}$

Optimal size of playoffs!

Preliminary elimination is very efficient!
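The sequence of exponents can be generated by repeating the balancing step used for one and two preliminary rounds. The sketch assumes that adding a round in front of a format with exponent γ means choosing α so that 3 − 2α = αγ; this generalization and the function name are inferred from the two cases shown, not stated explicitly in the talk.

```python
from fractions import Fraction

def elimination_exponents(rounds):
    """Return (gammas, alphas): T_k ~ N**gammas[k]; alphas[k-1] is the cut exponent of the round added at step k."""
    gammas = [Fraction(3)]            # no preliminary round: a plain league, T ~ N^3
    alphas = []
    for _ in range(rounds):
        alpha = Fraction(3) / (2 + gammas[-1])   # solves 3 - 2*alpha = alpha * gamma
        alphas.append(alpha)
        gammas.append(3 - 2 * alpha)
    return gammas, alphas

gammas, alphas = elimination_exponents(3)
# gammas: 3, 9/5, 27/19, 81/65
# alphas: 3/5, 15/19, 57/65
```

Each generated value agrees with the closed form 1 / (1 − (2/3)^(k+1)), and the exponents approach 1 as rounds are added.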
slide-46
SLIDE 46
  • 4. Conclusions
  • Gradual elimination is fair and efficient
  • Preliminary rounds reduce the number of games
  • In preliminary round, teams play a small number of games and almost all teams advance to next round

slide-47
SLIDE 47
  • 5. Social Dynamics
slide-48
SLIDE 48

Competition and social dynamics

  • Teams are agents
  • Number of wins represents fitness or wealth
  • Agents advance by competing against each other
  • Competition is a mechanism for social differentiation
slide-49
SLIDE 49
The social diversity model

  • Agents advance by competition: for $i > j$, $(i, j) \to (i+1, j)$ with probability $p$ and $(i, j) \to (i, j+1)$ with probability $1 - p$
  • Agents decline due to inactivity: $k \to k - 1$ with rate $r$
  • Rate equations:
    $\frac{dG_k}{dt} = r\,(G_{k+1} - G_k) + p\,G_{k-1}(G_{k-1} - G_k) + (1 - p)(1 - G_k)(G_{k-1} - G_k) - \frac{1}{2}(G_k - G_{k-1})^2$
  • Scaling equations:
    $[(p + r - 1 + x) - (2p - 1)\,F(x)]\,\frac{dF}{dx} = 0$

slide-50
SLIDE 50

Social structures

  • 1. Middle class: agents advance at different rates
  • 2. Middle+lower class: some agents advance at different rates, some agents do not advance
  • 3. Lower class: agents do not advance
  • 4. Egalitarian class: all agents advance at equal rates

Bonabeau 96
Sports

slide-51
SLIDE 51

Concluding remarks

  • Mathematical modeling of competitions sensible
  • Minimalist models are a starting point
  • Randomness a crucial ingredient
  • Validation against data is necessary for predictive modeling

slide-52
SLIDE 52

“Prediction is very difficult, especially about the future.”

Niels Bohr

slide-53
SLIDE 53

Publications

  • How to Choose a Champion
    E. Ben-Naim, N.W. Hengartner, Phys. Rev. E, submitted (2007)
  • Scaling in Tournaments
    E. Ben-Naim, S. Redner, F. Vazquez, Europhysics Letters 77, 30005 (2007)
  • What is the Most Competitive Sport?
    E. Ben-Naim, F. Vazquez, S. Redner, physics/0512143
  • Dynamics of Multi-Player Games
    E. Ben-Naim, B. Kahng, and J.S. Kim, J. Stat. Mech. P07001 (2006)
  • On the Structure of Competitive Societies
    E. Ben-Naim, F. Vazquez, S. Redner, Eur. Phys. Jour. B 26, 531 (2006)
  • Dynamics of Social Diversity
    E. Ben-Naim and S. Redner, J. Stat. Mech. L11002 (2005)
slide-54
SLIDE 54

All time records of teams

[Figure: F(x) versus x, for x between 0.40 and 0.60, for NFL, NHL, MLB]