SLIDE 1

Random permutations and the two-parameter Poisson-Dirichlet distribution.

Sasha Gnedin Queen Mary, University of London

Sasha Gnedin Random permutations and the 2-parameter PD

SLIDE 6

The Pitman-Yor definition

  • PD(α, θ) is a probability law for a sequence of random frequencies $P^{\downarrow} = (P_1, P_2, \ldots)$, with $P_1 > P_2 > \cdots > 0$, $\sum_j P_j = 1$,
  • obtained by arranging in decreasing order another sequence $\tilde{P} = (\tilde{P}_1, \tilde{P}_2, \ldots)$,
  • $\tilde{P}_j = W_j \prod_{i=1}^{j-1} (1 - W_i)$, $j = 1, 2, \ldots$, where the $W_j$'s are independent, with $W_j \overset{L}{=} \mathrm{Beta}(1 - \alpha, \theta + \alpha j)$.
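The stick-breaking recipe can be sketched in a few lines of Python. This is only an illustrative truncation to finitely many terms; the function name `gem_stick_breaking` and the truncation level `n_terms` are assumptions, not part of the talk.

```python
import random

def gem_stick_breaking(alpha, theta, n_terms, rng=random):
    """First n_terms stick-breaking frequencies for parameters (alpha, theta).

    Assumes 0 <= alpha < 1 and theta > -alpha, so every Beta parameter is positive.
    """
    remaining = 1.0  # mass not yet allocated: product of (1 - W_i) so far
    freqs = []
    for j in range(1, n_terms + 1):
        w = rng.betavariate(1 - alpha, theta + alpha * j)  # W_j
        freqs.append(w * remaining)
        remaining *= 1 - w
    return freqs

# Ranking the (truncated) sequence approximates a draw of the ranked frequencies.
freqs = gem_stick_breaking(0.5, 1.0, 200, random.Random(0))
ranked = sorted(freqs, reverse=True)
```

Truncation discards the mass left in `remaining`, so the ranked output is only an approximation of a PD(α, θ) sample.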

SLIDE 7

Two algorithms for size-biased ordering

  • Conventional sampling without replacement algorithm:

For $p_1, p_2, \ldots$ with $s = \sum_j p_j < \infty$, a size-biased pick $\tilde{p}_1 := p_J$ is defined by setting $P(J = j) = p_j/s$. Removing $J$ from $\mathbb{N}$, resp. $p_J$ from $p_1, p_2, \ldots$, and iterating the SB-picking yields a size-biased permutation (SBP) of $\mathbb{N}$, resp. of $p_1, p_2, \ldots$
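For a finite list of weights, the sampling-without-replacement algorithm can be sketched as follows. The function name `size_biased_permutation` is hypothetical, and the restriction to finitely many weights is an assumption made to keep the example runnable.

```python
import random

def size_biased_permutation(p, rng=random):
    """Return indices of p in size-biased order (sampling without replacement)."""
    remaining = list(range(len(p)))
    order = []
    while remaining:
        weights = [p[i] for i in remaining]
        # Pick J among the remaining indices with probability p_J / (remaining mass).
        j = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(j)
        remaining.remove(j)
    return order
```

Calling `size_biased_permutation([0.5, 0.3, 0.2])` returns a random arrangement of `[0, 1, 2]` in which index 0 comes first with probability 0.5.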

SLIDE 11
  • Ranking algorithm to arrange $p_1, p_2, \ldots$ in SB order:

The kth iteration only deals with $p_1, \ldots, p_k$. After $1, \ldots, k$ have been arranged as $i_1, \ldots, i_k$ with $(q_1, \ldots, q_k) := (p_{i_1}, \ldots, p_{i_k})$, the relative rank $\rho_{k+1}$ of $k + 1$ is determined by moving $k + 1$ left-to-right through $i_1, \ldots, i_k$ until settling in position $\rho_{k+1} = m \in \{1, \ldots, k + 1\}$ with odds $p_{k+1} : (q_m + \cdots + q_k)$. The infinite SB order is defined by $\rho_1, \rho_2, \ldots$

  • k steps yield $1, \ldots, k$ (resp. $p_1, \ldots, p_k$) in size-biased order, showing that the finite orders are consistent under restrictions (cf. also P-Tran ’12).

  • Works also if $\sum_j p_j = \infty$, although in this case the SB order is not a well-order.

  • When $p_1 = p_2 = \cdots$ we have the ranks $\rho_k$ independent, uniform on $[k] := \{1, \ldots, k\}$, and the resulting order is the exchangeable infinite order (Aldous ’83), which restricts to $[k]$ as a uniformly distributed permutation.
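The ranking algorithm can be sketched for a finite prefix of the weights. The function name `size_biased_order_by_ranking` is hypothetical, and the sketch assumes strictly positive weights; "odds a : b" is read as settling with probability a / (a + b).

```python
import random

def size_biased_order_by_ranking(p, rng=random):
    """Arrange indices 0..len(p)-1 in size-biased order by successive ranking."""
    order = []  # current arrangement i_1, ..., i_k
    for new, p_new in enumerate(p):
        tail = sum(p[i] for i in order)  # q_m + ... + q_k, starting with m = 1
        m = 0
        while m < len(order):
            # Settle in front of position m with odds p_new : tail.
            if rng.random() < p_new / (p_new + tail):
                break
            tail -= p[order[m]]  # step past one element to the right
            m += 1
        order.insert(m, new)
    return order
```

Each step only inserts the new index into the existing arrangement, which is why the order restricted to the first k indices does not change as further weights are processed.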

SLIDE 12

Characterisation of PD by SBP

  • If $\tilde{P}_1$ is independent of $(\tilde{P}_2, \tilde{P}_3, \ldots)/(1 - \tilde{P}_1)$ then the stick-breaking factors $W_j$ are independent and (excluding some trivial cases) $P^{\downarrow} \overset{L}{=} \mathrm{PD}(\alpha, \theta)$ for some α, θ. – McCloskey ’65, P ’96, G-Haulk-P ’09

SLIDE 14

The arrangement problem

Ordered representations of PD involve

  • either an increasing jump process (random c.d.f.) $(F_t, t \ge 0)$,
  • or an interval partition of $[0, 1]$ into components of $[0, 1] \setminus Z$, for $Z$ a random measure-0 closed set.

Every such representation implies a certain arrangement $P^*$ of the frequencies $P_j^{\downarrow}$ in accord with the natural ordering of jump-times, resp. component intervals.

  • The arrangement problem concerns features of this induced order $P^*$, characterization of PD and sub-families, as well as the connection of $P^*$ to the well-orders $P^{\downarrow}$ and $\tilde{P}$.

SLIDE 18

A combinatorial counterpart of the arrangement problem

  • Recall that $\tilde{P}_j$ is the asymptotic frequency of the jth occupied table in the Dubins-Pitman Chinese Restaurant.

When the occupancy numbers are $n_1, \ldots, n_k$ ($n_1 + \cdots + n_k = n$), the next customer

  • sits at occupied table j with probability $\dfrac{n_j - \alpha}{n + \theta}$,
  • occupies a new table with probability $\dfrac{\theta + k\alpha}{n + \theta}$.
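The seating rule translates directly into a simulation. The sketch below is illustrative only; the function names are hypothetical, and it assumes 0 ≤ α < 1 and θ > −α.

```python
import random

def crp_seating_probabilities(counts, alpha, theta):
    """Probabilities for the next customer: one per occupied table, then a new table."""
    n, k = sum(counts), len(counts)
    probs = [(nj - alpha) / (n + theta) for nj in counts]
    probs.append((theta + k * alpha) / (n + theta))
    return probs

def simulate_crp(n, alpha, theta, rng=random):
    """Occupancy counts of tables, in order of first occupation, after n >= 1 customers."""
    counts = [1]  # the first customer always opens the first table
    for _ in range(n - 1):
        probs = crp_seating_probabilities(counts, alpha, theta)
        j = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        if j == len(counts):
            counts.append(1)
        else:
            counts[j] += 1
    return counts
```

The returned counts list the tables in order of appearance, which is the size-biased order of the blocks.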

SLIDE 21
  • Hence an n-sample from $P^*$ has the structure of a composition (ordered partition) $\Pi^*_n$ of the integer n, with the CRP ‘table’ occupancy counts arranged in the corresponding order. The $\Pi^*_n$’s are consistent as n varies.

Construction from $Z$, $U_1, \ldots, U_n$:

  • sample uniform[0,1] points $U_1, \ldots, U_n$
  • scan the gaps in $Z$ in the left-to-right order
  • record the sizes of clusters in each occupied gap
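The three steps above can be sketched for a toy version of $Z$. The assumption here, not made in the talk, is that $Z$ is replaced by a finite sorted list of gap endpoints; the function name `composition_from_gaps` is hypothetical.

```python
import bisect

def composition_from_gaps(z_points, us):
    """Cluster sizes of the sample points us in the gaps of z_points, left to right."""
    pts = sorted(set(z_points) | {0.0, 1.0})  # gap endpoints, including 0 and 1
    counts = [0] * (len(pts) - 1)
    for u in us:
        i = bisect.bisect_right(pts, u) - 1  # index of the gap containing u
        i = min(i, len(counts) - 1)          # a point equal to 1.0 goes to the last gap
        counts[i] += 1
    return [c for c in counts if c > 0]      # keep occupied gaps only

# Gaps [0, 0.2), [0.2, 0.5), [0.5, 1] receive 2, 1 and 1 points respectively.
example = composition_from_gaps([0.2, 0.5], [0.1, 0.15, 0.6, 0.3])
```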

SLIDE 23

Subordinator ‘bridge’ representations of PD

  • For $(S_t, t \ge 0)$ a subordinator with $S_0 = 0$ and tilted by manipulating the distribution of $(T, S_T)$, set $F_t = S_t / S_T$, $0 \le t \le T$,
  • depending on the choice of subordinator (gamma, stable, generalized gamma) some restricted range of $(\alpha, \theta) \in [0, 1) \times [0, \infty)$ may be covered – McCloskey ’65, Kingman ’75, Perman-PY ’92, PY ’97, P ’03
  • The induced order $P^*$ is exchangeable, i.e. the $P_j^{\downarrow}$ (equivalently, the $\tilde{P}_j$’s) are shuffled ‘uniformly at random’.

SLIDE 24

The self-similar representation of PD(α, 0)

  • Let $Z$ be the α-stable set (e.g., the zero set of BM for α = 1/2), so $Z \cap [0, 1]$ is the range of an α-stable subordinator $(S_t, t \ge 0)$ ‘cut’ by passing level 1.
  • The SB pick $\tilde{P}_1$ is the size of the rightmost ‘meander’ interval, while all other frequencies occur in the exchangeable order (PY ’96).
  • Exactly the same (and not just in the $n \to \infty$ regime) arrangement of parts occurs on the combinatorial level of the composition $\Pi^*$ (P ’97).

SLIDE 26

Regenerative compositions

  • For a subordinator $(S_t, t \ge 0)$ (with $S_0 = 0$) the ‘discrete’ c.d.f. $F_t := 1 - e^{-S_t}$ is known as a neutral-to-the-right prior.
  • Compositions $\Pi^*_n$ have a (characteristic) first-part deletion property: given the first part is m, deleting the part from $\Pi^*_n$ yields a copy of $\Pi^*_{n-m}$.

SLIDE 27
  • The distribution of $\Pi^*_n$ has a product form

$$p^*(n_1, \ldots, n_k) = \prod_{j=1}^{k} q(n_j + \cdots + n_k,\, n_j),$$

where (assuming zero drift)

$$q(n, m) := \frac{\Phi(n, m)}{\Phi(n)}, \qquad \Phi(\lambda) := \int_0^\infty (1 - e^{-\lambda x})\,\nu(dx), \qquad \Phi(n, m) := \binom{n}{m} \int_0^\infty (1 - e^{-x})^m e^{-(n-m)x}\,\nu(dx),$$

and ν is the Lévy measure of the subordinator.

SLIDE 29
  • The NTR/regenerative representation is an intrinsic property of the unordered objects $P^{\downarrow}$/partition structure $(\Pi_n)$, by virtue of

$$p^*(n) = E\big[\tilde{P}_1^{\,n-1}\big] = E\Big[\sum_j (P_j^{\downarrow})^n\Big].$$

  • Moreover, if (unordered) partitions $\Pi_n$ (derived by sampling from some frequencies $P^{\downarrow}$) have some kind of part-deletion property, then they can be represented by regenerative compositions $\Pi^*_n$.

SLIDE 32

The regenerative representation of PD

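The Redwoods Park Theorem: for 0 ≤ α < 1, θ ≥ 0, the PD(α, θ) is regenerative:

  • there is a Lévy measure with the upper tail $\nu_{\alpha,\theta}[x, \infty) = (1 - e^{-x})^{-\alpha} e^{-x\theta}$, $x \ge 0$,
  • which corresponds to

$$\Phi(\lambda) = \frac{\lambda\,\Gamma(1 - \alpha)\,\Gamma(\lambda + \theta)}{\Gamma(\lambda + 1 - \alpha + \theta)}, \qquad q(n, m) = \binom{n}{m} \frac{(1 - \alpha)_{m-1}}{(\theta + n - m)_m} \cdot \frac{(n - m)\alpha + m\theta}{n},$$

and agrees with the Ewens-Pitman sampling formula, e.g. $p^*(n) = \dfrac{(1 - \alpha)_{n-1}}{(1 + \theta)_{n-1}}$. Here $(x)_k := x(x + 1) \cdots (x + k - 1)$ is the rising factorial.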
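The decrement probabilities $q(n, m)$ and the product form of the composition probability can be checked numerically. This is a sketch under the assumption θ > 0 (at θ = 0 the displayed formula needs a limit); the function names are hypothetical.

```python
import random
from math import comb

def rising(x, k):
    """Rising factorial (x)_k = x (x+1) ... (x+k-1), with (x)_0 = 1."""
    out = 1.0
    for i in range(k):
        out *= x + i
    return out

def q(n, m, alpha, theta):
    """Probability that the first part of the composition of n equals m."""
    return (comb(n, m) * rising(1 - alpha, m - 1) / rising(theta + n - m, m)
            * ((n - m) * alpha + m * theta) / n)

def composition_probability(parts, alpha, theta):
    """Product form: multiply q(n_j + ... + n_k, n_j) over the parts."""
    prob, remaining = 1.0, sum(parts)
    for m in parts:
        prob *= q(remaining, m, alpha, theta)
        remaining -= m
    return prob

def sample_composition(n, alpha, theta, rng=random):
    """Sample a composition of n by repeated first-part deletion."""
    parts = []
    while n > 0:
        weights = [q(n, m, alpha, theta) for m in range(1, n + 1)]
        m = rng.choices(range(1, n + 1), weights=weights, k=1)[0]
        parts.append(m)
        n -= m
    return parts
```

For each n the values `q(n, 1), ..., q(n, n)` sum to 1, and `q(n, n)` reproduces the one-block probability $(1 - \alpha)_{n-1} / (1 + \theta)_{n-1}$.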

SLIDE 33

Subfamily PD(0, θ)

  • For $\nu_{0,\theta}(dx) = \theta e^{-x\theta}\,dx$, $x \ge 0$, $(S_t, t \ge 0)$ is compound Poisson with exponential jumps.
  • The range of $F_t = 1 - e^{-S_t}$ is a Poisson point process with rate function $\theta(1 - y)^{-1}$, $y \in [0, 1)$, hence the points are obtainable by i.i.d. beta(1, θ) stick-breaking.
  • $P^*$ is a size-biased permutation of $P^{\downarrow}$, likewise $\Pi^*_n$ has parts in the SB order.
  • If a partition structure has the SB-part deletion property, it is Ewens’ – Kingman ’78

SLIDE 34

Subfamily PD(α, α)

  • The (α, α) Lévy measure $\nu_{\alpha,\alpha}(dx) = \dfrac{\alpha e^{x}}{(e^{x} - 1)^{\alpha+1}}\,dx$, $x \ge 0$, was introduced by Lamperti in ’73. He observed that $e^{S_t}$ is a time-changed α-stable subordinator starting at 1, with the range $1 + Z_\alpha$ (for $Z_\alpha$ the α-stable regenerative set). Hence the range of $F_t = 1 - e^{-S_t}$ is $Z_\alpha/(Z_\alpha + 1)$, which is a known representation of the α-stable bridge.
  • $P^*$ is an exchangeable ordering of $P^{\downarrow}$.
  • If the first part and the last part of regenerative $\Pi^*_n$ have the same distribution (for every n), then the partition structure is Pitman’s (α, α).

SLIDE 35

Subfamily PD(α, 0)

  • The (α, 0) Lévy measure is $\nu_{\alpha,0}(dx) = \dfrac{\alpha e^{\alpha x}}{(e^{x} - 1)^{\alpha+1}}\,dx + \delta_\infty(dx)$, $x \ge 0$.
  • The killed subordinator $(S_t)$ belongs to the family of Lamperti-stable subordinators (Chaumont and Caballero ’06).
  • $P^*$ has the last part $\tilde{P}_1$, and the other parts are in exchangeable order. Same for $\Pi^*_n$. This kind of ordering is characteristic for (α, 0) among the regenerative structures.

SLIDE 36

Alternative proof of the regenerative representation for α, θ > 0

  • To construct a path of a subordinator with Lévy measure $\nu[x, \infty) = e^{-x\theta}(1 - e^{-x})^{-\alpha}$:

(a) split $[0, \infty)$ at the points $E_1 + \cdots + E_j$ of a homogeneous Poisson(θ) process,
(b) run a Lamperti-stable (α, 0) (killed) subordinator from the origin (0, 0) until crossing $E_1$ at some time $\tau_1$,
(c) start another copy of this (α, 0)-subordinator from $(E_1, \tau_1)$ and run it until passing $E_1 + E_2$, etc.

  • The construction agrees with PY ’97, Corollary to Proposition 21: For 0 ≤ α < 1, θ ≥ 0, $P^{\downarrow}$ can be obtained by first splitting the unity according to PD(0, θ), then further fragmenting each piece by an independent copy of the α-stable set.

SLIDE 37

Size-cosize biased permutations

  • Fix $\xi \in [0, \infty]$. For positive $p_1, \ldots, p_k$ with $s = p_1 + \cdots + p_k$, a ξ-biased pick is a random element $p_J$, where

$$P(J = j) = \frac{\xi p_j + (s - p_j)}{s(\xi + k - 1)}.$$

  • ∞-biased pick is size-biased,
  • 1-biased pick is uniformly random,
  • 0-biased pick is cosize biased.

Removing $J$, $p_J$ and iterating yields a ξ-biased permutation of $p_1, \ldots, p_k$, denoted $\mathrm{perm}_\xi(p_1, \ldots, p_k)$. This is only defined for $k < \infty$!
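The ξ-biased pick and its iteration can be sketched as follows. The function names are hypothetical; the handling of ξ = ∞ (as the size-biased limit) and of a single remaining element are conventions chosen for this sketch.

```python
import math
import random

def xi_biased_probabilities(p, xi):
    """P(J = j) for a xi-biased pick from positive weights p."""
    k, s = len(p), sum(p)
    if k == 1:
        return [1.0]                      # only one element left to pick
    if math.isinf(xi):
        return [pj / s for pj in p]       # limiting case: size-biased pick
    return [(xi * pj + (s - pj)) / (s * (xi + k - 1)) for pj in p]

def xi_biased_permutation(p, xi, rng=random):
    """Indices of p arranged by iterated xi-biased picks."""
    remaining = list(range(len(p)))
    order = []
    while remaining:
        probs = xi_biased_probabilities([p[i] for i in remaining], xi)
        j = rng.choices(remaining, weights=probs, k=1)[0]
        order.append(j)
        remaining.remove(j)
    return order
```

With `p = [0.5, 0.3, 0.2]`, the choice `xi = 1` gives the uniform pick and `xi = 0` gives the cosize-biased probabilities 0.25, 0.35, 0.4.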

SLIDE 38

The arrangement problem for fixed n

Let $\Pi^{\downarrow}_n$ be the (α, θ)-partition of n, and let $\xi = \theta/\alpha$.

  • the partition structure $(\Pi^{\downarrow}_n)$ has the ξ-biased pick deletion property, which is characteristic;
  • $\Pi^*_n$ is obtained from $\Pi^{\downarrow}_n$ by arranging the parts in the ξ-biased order.

ξ-biased permutations are not consistent under restrictions . . . How to extend to $p_1, p_2, \ldots$ with infinitely many $p_j > 0$?

SLIDE 39

Permutations/orders with biased record counts

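  • For $\xi \in [0, \infty]$, define $\rhd_\xi$ as a random order on $\mathbb{N}$ with independent ranks $\rho_1, \rho_2, \ldots$ such that

$$P(\rho_k = i) = \begin{cases} \dfrac{1}{k + \xi - 1} & \text{for } 1 \le i \le k - 1, \\[2mm] \dfrac{\xi}{k + \xi - 1} & \text{for } i = k. \end{cases}$$

  • $\rhd_\xi$ restricts to $[k]$ as a permutation with distribution $\dfrac{\xi^{\#\text{records}}}{(\xi)_k}$.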
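A finite restriction of the record-biased order can be sampled directly from the independent ranks. The sketch assumes 0 < ξ < ∞ (the boundary cases need separate conventions), and the function names are hypothetical.

```python
import random

def rank_pmf(k, xi):
    """Distribution of the rank rho_k on positions 1..k."""
    denom = k + xi - 1
    return [1 / denom] * (k - 1) + [xi / denom]

def sample_record_biased_order(n, xi, rng=random):
    """Arrangement of 1..n obtained by inserting k at position rho_k."""
    order = []
    for k in range(1, n + 1):
        rho = rng.choices(range(1, k + 1), weights=rank_pmf(k, xi), k=1)[0]
        order.insert(rho - 1, k)  # rho == k places k last, creating a record
    return order
```

Multiplying the rank probabilities along a rank sequence gives $\xi^{\#\text{records}}/(\xi)_k$, where a record is a step with $\rho_j = j$.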

SLIDE 40

A solution to the arrangement problem

  • Key observation:

$$\mathrm{perm}_\xi(p_1, \ldots, p_k) \overset{L}{=} \rhd_\xi \circ \mathrm{perm}_\infty(p_1, \ldots, p_k),$$

where the record-biased order $\rhd_\xi$ and the SBP $\mathrm{perm}_\infty$ are independent. The RHS makes sense for infinite $p_1, p_2, \ldots$ – P-Winkel ’09, G-Haulk-P ’09

  • Conclusion: For 0 ≤ α < 1, θ ≥ 0, the regenerative version $P^*$ of PD(α, θ) can be obtained by arranging the SB frequencies $\tilde{P}$ in the independent order $\rhd_{\theta/\alpha}$.

SLIDE 41

Self-similar, stationary regenerative representations of PD

  • $Z \subset [0, \infty)$ is self-similar if $Z \overset{L}{=} cZ$ for $c > 0$.
  • In particular, $Z \cap [0, 1]$ could be the range of $(e^{-(S_t + X)}, t \ge 0)$ for a mean-µ subordinator $(S_t)$ and independent $X$ with distribution $P(X \in dx) = \mu^{-1} \nu[x, \infty]\,dx$.
  • Then the induced composition structure $(\Pi^*_n)$ has a last-bit deletion property in the representation via binary code (e.g. 1010001 for (2,4,1)), and a version of the last-part deletion property. – Young ’05, GP ’05
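The binary code of a composition can be illustrated with a short encoder and decoder. The function names are hypothetical; the convention, read off from the example 1010001 for (2,4,1), is that each part of size m contributes a 1 followed by m − 1 zeros.

```python
def composition_to_binary(parts):
    """Encode a composition: each part m becomes '1' followed by m - 1 zeros."""
    return "".join("1" + "0" * (m - 1) for m in parts)

def binary_to_composition(code):
    """Decode a binary string that starts with '1' back into a composition."""
    parts = []
    for bit in code:
        if bit == "1":
            parts.append(1)   # a 1 opens a new part
        else:
            parts[-1] += 1    # a 0 extends the current part
    return parts
```

Deleting the last bit of the code either removes a trailing singleton part or shortens the last part by one, e.g. 1010001 becomes 101000, the code of (2,4).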

SLIDE 42
  • For 0 ≤ α < 1, θ > −α, PD(α, θ) has such a representation, where $1 - e^{-X} = \tilde{P}_1 \overset{L}{=} \mathrm{beta}(1 - \alpha, \theta + \alpha)$ and $(S_t)$ is a subordinator with $\nu[x, \infty] = e^{-x(\theta + \alpha)}(1 - e^{-x})^{-\alpha}$. – PY ’96, GP ’05
  • Then the SB pick $\tilde{P}_1$ is in the last position, while the other frequencies are arranged in the order inverse to the regenerative PD(α, θ + α).
  • Example: the BM (1/2, 0) interval partition is left-to-right regenerative (NTR), and at the same time right-to-left stationary-regenerative (hence self-similar).
  • Example: the BB (1/2, 1/2) interval partition is NTR, but also has a realization as a stationary version of regenerative PD(1/2, 1).

SLIDE 44

? + ? + ? = ?

Intelligence + effort + persistence = excellence
