Notes on the Complexity of Algorithms

Professor Cristian S. Calude
Email: cristian@cs.auckland.ac.nz

2001

The Halting Problem

Theorem. There is no algorithm for deciding whether a computer program ever halts.

Information-theoretic proof. Without restricting the generality, assume that all programs incorporate their inputs, which are coded as natural numbers. So a program may run forever, or it may eventually stop, in which case it prints a natural number.

Assume that there exists a halting program deciding whether an arbitrary program will ever halt. Construct the following program:

  • 1. read a natural N;
  • 2. generate all programs up to N bits in size;
  • 3. use the halting program to check, for each generated program, whether it halts;
  • 4. simulate the running of the halting programs among them; and
  • 5. output the double of the biggest value output by these programs.
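The construction above can be sketched in code. Since the assumed halting program cannot actually exist, the sketch below replaces it by a hand-built table over a finite toy "language"; the names TOY_PROGRAMS, halts, run and paradox are illustrative assumptions, not part of the notes.

```python
# Toy model of the diagonal program. Each "program" is a name mapped to
# (halts?, output-if-it-halts); this table plays the role of steps 2-4.

# program name -> (halts?, output if it halts)
TOY_PROGRAMS = {
    "p0": (True, 3),
    "p1": (False, None),  # models a program that runs forever
    "p2": (True, 7),
}

def halts(p):
    """Stand-in for the assumed (impossible) halting program (step 3)."""
    return TOY_PROGRAMS[p][0]

def run(p):
    """Simulate a program known to halt and collect its output (step 4)."""
    return TOY_PROGRAMS[p][1]

def paradox(programs):
    """Steps 3-5: keep the halting programs, run them, double the maximum."""
    biggest = max(run(p) for p in programs if halts(p))
    return 2 * biggest

print(paradox(TOY_PROGRAMS))  # 14: twice the largest toy output, 7
```

In the real construction the table is replaced by an enumeration of all programs of at most N bits, which is exactly where the contradiction of the next slide arises.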

The above program halts for every natural N. How long is it? It is about log N bits. Reason: to know N we need log2 N bits (in binary); the rest of the program is a constant, so our program is log N + O(1) bits.

Observe that there is a big difference between the size (in bits) of our program and the size of the output produced by this program. Indeed, for large enough N, our program will belong to the set of programs having less than N bits. Hint: log N + O(1) < N.

Accordingly, the program will be generated by itself at some stage of the computation. In this case we have got a contradiction, since our program will output a natural number two times bigger than the output produced by itself! Hence the hypothesis regarding the existence of the halting program is false.


Instantaneous Coding

Consider two alphabets Y = {y1, y2, ..., yN}, A = {a1, a2, ..., aQ}, having 2 ≤ Q < N elements. A (finite) code is an injective function ϕ : Y → A∗. An instantaneous code is a code for which ϕ(Y ) is prefix-free (a set is prefix-free if no string in the set is a proper prefix of a different string in the set).

Example: Y = {y1, y2, y3, y4}, A = {0, 1}

         y1     y2     y3     y4
  ϕ1 :   00     01     10     11
  ϕ2 :   10     110    1110   11110
  ϕ3 :   10     10     110    1110
  ϕ4 :   01     011    0111   01111
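Only ϕ1 and ϕ2 in the table are instantaneous codes. A short check (a sketch; the helper name is ours, not the notes') confirms which ranges are prefix-free, and that ϕ3 is not even injective, hence not a code at all.

```python
# Which of the four example codes are instantaneous?

def is_prefix_free(words):
    """True iff no word is a proper prefix of a different word."""
    return not any(u != v and v.startswith(u) for u in words for v in words)

phi1 = ["00", "01", "10", "11"]
phi2 = ["10", "110", "1110", "11110"]
phi3 = ["10", "10", "110", "1110"]
phi4 = ["01", "011", "0111", "01111"]

print(len(set(phi3)) == len(phi3))                 # False: phi3 not injective
print(is_prefix_free(phi1), is_prefix_free(phi2))  # True True: instantaneous
print(is_prefix_free(phi4))                        # False: "01" prefixes "011"
```

Note that ϕ4 is still a (uniquely decodable) code; it merely fails the instantaneous-decoding property.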

Kraft's Theorem. Let (ni), i = 1, 2, . . . , N, be positive integers. These numbers are the lengths of the code-strings of an instantaneous code ϕ : Y → A∗ iff

  ∑_{i=1}^{N} Q^{−ni} ≤ 1.

Proof. Let ϕ : Y → A∗ be an instantaneous code with |ϕ(yi)| = ni, 1 ≤ i ≤ N. Let ri = the number of code-strings of length i. Clearly rj = 0 for j > m = max{n1, . . . , nN}, so

  r1 ≤ Q,
  r2 ≤ (Q − r1) Q = Q^2 − r1 Q,
  r3 ≤ ((Q − r1) Q − r2) Q = Q^3 − r1 Q^2 − r2 Q,
  . . .
  rm ≤ Q^m − r1 Q^{m−1} − r2 Q^{m−2} − · · · − r_{m−1} Q.

Dividing the inequality rm ≤ Q^m − r1 Q^{m−1} − r2 Q^{m−2} − · · · − r_{m−1} Q by Q^m, we get

  ∑_{i=1}^{m} ri Q^{−i} ≤ 1.

Finally,

  ∑_{i=1}^{m} ri Q^{−i} = ∑_{j=1}^{N} Q^{−nj} ≤ 1.

Conversely,

  r1 Q^{−1} ≤ ∑_{i=1}^{m} ri Q^{−i} ≤ 1,
  r1 Q^{−1} + r2 Q^{−2} ≤ ∑_{i=1}^{m} ri Q^{−i} ≤ 1,
  . . .
  r1 Q^{−1} + r2 Q^{−2} + · · · + rm Q^{−m} ≤ ∑_{i=1}^{m} ri Q^{−i} ≤ 1.

So r1 ≤ Q, r2 ≤ (Q − r1) Q, . . . , rm ≤ Q^m − r1 Q^{m−1} − · · · − r_{m−1} Q, showing that we have enough elements to construct the instantaneous code. The inequality

  ∑_{i=1}^{N} Q^{−ni} ≤ 1

is called Kraft's inequality.
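The converse direction of the proof is effective: given lengths satisfying Kraft's inequality, code-strings can be assigned greedily, shortest first. The particular greedy counter scheme below is a standard construction assumed for this sketch; the notes only argue existence.

```python
from fractions import Fraction

def kraft_code(lengths, Q=2):
    """Build a prefix-free code with the given code-string lengths over a
    Q-letter alphabet, assuming Kraft's inequality holds."""
    if sum(Fraction(1, Q**n) for n in lengths) > 1:
        raise ValueError("Kraft's inequality fails: no instantaneous code")
    digits = "0123456789"[:Q]
    words, c, prev = [], 0, None
    for n in sorted(lengths):
        if prev is not None:
            c *= Q ** (n - prev)   # extend the counter to the new length
        w, x = "", c               # write c as an n-digit base-Q string
        for _ in range(n):
            w = digits[x % Q] + w
            x //= Q
        words.append(w)
        c, prev = c + 1, n
    return words

print(kraft_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```

For lengths (1, 2, 3, 3) the Kraft sum is exactly 1 and the construction returns the familiar code {0, 10, 110, 111}.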

A Chaitin computer, or simply a computer, is a p.c. (partially computable) function C : A∗ → A∗ having a prefix-free domain.

A universal (Chaitin) computer is a computer U : A∗ → A∗ such that for every computer C : A∗ → A∗ there exists a constant c (depending upon U and C) such that if C(x) = y, then there exists x′ ∈ A∗ such that

  U(x′) = y,  |x′| ≤ |x| + c.

Theorem. There exists a p.c. function F : N+ × A∗ → A∗ such that for every p.c. function ϕ : A∗ → A∗ with prefix-free domain there exists an i ∈ N+ such that F(i, x) = ϕ(x), for all x ∈ A∗.

Theorem. There exists a universal computer.

Proof. Let F : N+ × A∗ → A∗ be a universal p.c. function for all Chaitin computers, and define

  U(a1^i a2 x) = F(i, x),  i ≥ 1, x ∈ A∗

(here a1^i denotes the letter a1 repeated i times).

Program-size complexity induced by a computer C : A∗ → A∗ is the function

  HC(x) = min{ |u| : C(u) = x }  (min ∅ = ∞).

Fix a universal computer U and put H(x) = HU(x). The canonical program corresponding to x is

  x∗ = min{ u ∈ A∗ | U(u) = x },

where min is taken according to the quasi-lexicographical order.
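For a universal U the function H is not computable (proved later), but for a finite toy computer HC can be read off directly from the definition. The three-entry table below, with its prefix-free domain {0, 10, 110}, is an illustrative assumption of this sketch.

```python
# HC for a tiny Chaitin computer given as a finite program -> output table.
import math

C = {"0": "x", "10": "y", "110": "x"}   # domain is prefix-free, A = {0, 1}

def H_C(s):
    """Length of a shortest C-program producing s (min over empty set = inf)."""
    lengths = [len(u) for u, out in C.items() if out == s]
    return min(lengths) if lengths else math.inf

print(H_C("x"), H_C("y"), H_C("z"))  # 1 2 inf
```

Here "0" is the canonical program for "x": it is the shortest (and quasi-lexicographically least) program producing it.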

Lemma. For all x ∈ A∗:
  1) x∗ exists, x∗ ≠ λ;
  2) x = U(x∗);
  3) H(x) = |x∗|.

Corollary. For all x ∈ A∗,

  0 < H(x) < ∞.

Theorem (Invariance). For every computer C there exists a constant c such that H(x) ≤ HC(x) + c, for all x ∈ A∗.

Theorem. For all universal computers U, U′ there exists a constant c > 0 such that |HU(x) − HU′(x)| ≤ c, for all x ∈ A∗.

Theorem (Kraft-Chaitin). Let ϕ : N+ → N be a p.c. function having as domain an initial segment of N+. The following two statements are equivalent:

(1) We can effectively construct an injective p.c. function θ : dom(ϕ) → A∗ such that:
  a) for every n ∈ dom(ϕ), |θ(n)| = ϕ(n);
  b) range(θ) is prefix-free.

(2) One has:

  ∑_{i∈dom(ϕ)} Q^{−ϕ(i)} ≤ 1.

The set of canonical programs is CP = {x∗ | x ∈ A∗}.

Theorem. The set of canonical programs is immune, i.e. it is infinite and has no infinite c.e. subset.

Proof. The set CP is clearly infinite, as the function x → x∗ is injective. We proceed now by contradiction, starting with the assumption that there exists an infinite c.e. set S ⊂ CP. Let S be enumerated by the injective computable function f : N → A∗. Define the function g : N → A∗ by

  g(0) = f(0),  g(n + 1) = f(min j [ |f(j)| > n + 1 ]).

It is straightforward to check that g is (total) computable, S′ = g(N+) is c.e. and infinite, S′ ⊂ S, and |g(i)| > i for all i > 0.

We can construct a computer C such that for every i ≥ 1 there exists a string u such that C(u) = g(i) and |u| ≤ log i + 2 log log i + 1 ≤ 3 log i. By the Invariance Theorem we get a constant c1 such that for all i ∈ N,

  H(g(i)) ≤ HC(g(i)) + c1 ≤ 3 log i + c1.

We continue with a result which is interesting in itself.

Intermediate Step. There exists a constant c2 ≥ 0 such that for every x in CP one has H(x) ≥ |x| − c2.

Construct a computer D(u) = U(U(u)) and pick the constant c2 coming from the Invariance Theorem (applied to U and D).

Take x = y∗, z = x∗. One has

  D(z) = U(U(z)) = U(U(x∗)) = U(x) = U(y∗) = y,

so HD(y) ≤ H(x), and

  |x| = |y∗| = H(y) ≤ HD(y) + c2 ≤ H(x) + c2.

For i ≥ 1, if g(i) ∈ CP, then |g(i)| > i, so

  i − c2 < |g(i)| − c2 ≤ H(g(i)) ≤ 3 log i + c1,

and consequently only a finite number of elements of S′ can be in CP.

Corollary. The function f : A∗ → A∗, f(x) = x∗, is not computable.

Proof. The function f is injective and its range is exactly CP.

Theorem. The program-size complexity H(x) is semi-computable from above, but not computable.

Proof. We have to prove that the "approximation from above" of the graph of H(x), i.e. the set

  {(x, n) | x ∈ A∗, n ∈ N, H(x) < n},

is c.e. This is easy, since H(x) < n iff there exist y ∈ A∗ and t ∈ N such that |y| < n and U(y) = x in at most t steps.

For the second part of the theorem we prove a bit more, namely:

Claim. There is no p.c. function ϕ : A∗ → N with infinite domain such that H(x) = ϕ(x), for all x ∈ dom(ϕ).

Assume, by absurdity, that H(x) = ϕ(x) for all x ∈ dom(ϕ), where ϕ : A∗ → N is a p.c. function with an infinite domain. Let B ⊂ dom(ϕ) be a computable, infinite set and let f : A∗ → A∗ be the partial function given by

  f(a1^i a2) = min{ x ∈ B | H(x) ≥ Q^i },  i ≥ 1.

Since ϕ(x) = H(x) for x ∈ B, it follows that f is a p.c. function. Moreover, f has a computable graph and f takes as values strings of arbitrarily long length.

For infinitely many i > 0,

  H(f(a1^i a2)) ≥ Q^i.

(Recall that C(a1^i a2) = f(a1^i a2) is a computer.) Accordingly, in view of the Invariance Theorem, for infinitely many i > 0 we have

  Q^i ≤ H(f(a1^i a2)) ≤ HC(f(a1^i a2)) + c ≤ i + 1 + c.

This yields a contradiction.

Problem. Can the halting problem be solved if one could compute program-size complexity? The answer is affirmative.

Lemma. If an n-bit program p halts, then the time t it takes to halt satisfies H(t) ≤ n + c.

So if p has run for time T without halting, and T has the property that t ≥ T ⇒ H(t) > n + c, then p will never halt.

Consider the c.e. set of all true upper bounds on H: the set of all true statements of the form H(x) ≤ k is computably enumerable. Imagine enumerating this set, keeping track of the time. Assuming that H is computable, compute H(x) for each n-bit string x. Then enumerate the true upper bounds until we get the best possible upper bound on H(x) for all n-bit strings x.

Let β(n) be defined to be the time it takes to enumerate enough of the set of all true upper bounds on program-size complexity to obtain the correct value of H(x) for all n-bit strings x. If one is given n and β(n), or any number greater than β(n), one can use this to determine an n-bit string xmax with maximum possible complexity H(xmax) = n + H(string(n)) + O(1).

Thus any number k ≥ β(n) has

  n + H(string(n)) − c′ < H(xmax) ≤ H(string(k)) + H(string(n)) + c′′,

and so H(string(k)) > n − c′ − c′′. Thus we can use β(n), which is computable from H, together with the Lemma, to solve the halting problem as follows: an n-bit program p halts iff it halts before time β(n + c + c′ + c′′).

Theorem. There exists a natural constant c > 0 such that for all x ∈ A+,

  H(x) ≤ |x| + 2 log |x| + c.

Proof. Construct the computer C(d(x)) = x, where |d(x)| = |x| + 2 log |x| + 1. The inequality follows from the Invariance Theorem.
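One standard way to realize such a self-delimiting encoding d(x): double each bit of the binary representation of |x|, append the marker 01, then append x itself. This particular scheme (and its additive constant, which differs slightly from the slide's) is an assumption of the sketch; the notes only need |d(x)| = |x| + 2 log |x| + O(1) and a prefix-free range.

```python
def d(x):
    """Self-delimiting encoding of a nonempty binary string x:
    each bit of bin(|x|) doubled, the marker 01, then x."""
    header = "".join(bit + bit for bit in format(len(x), "b"))
    return header + "01" + x

def undo_d(s):
    """Decode d(x) back to x; the doubled-bit header makes d prefix-free."""
    i, bits = 0, ""
    while s[i : i + 2] != "01":   # header pairs are 00 or 11
        bits += s[i]
        i += 2
    n = int(bits, 2)
    return s[i + 2 : i + 2 + n]

x = "10110"
print(d(x))                # '1100110110110'
print(undo_d(d(x)) == x)   # True
# |d(x)| = |x| + 2*floor(log2 |x|) + 4, i.e. |x| + 2 log |x| + O(1)
```

Because no encoding is a proper prefix of another, the computer C(d(x)) = x has a prefix-free domain, as required of a Chaitin computer.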

Definition. Given a computer C we define the following "probabilities":

  PC(x) = ∑_{u∈A∗, C(u)=x} Q^{−|u|}.

In the case C = U we put, using the common convention, P(x) = PU(x). We say that PC(x) is the (absolute) algorithmic probability of C with output x (it measures the probability that C produces x).

Lemma. For every computer C,

  ∑_{x∈A∗} PC(x) ≤ 1.

Proof. We can write:

  ∑_{x∈A∗} PC(x) = ∑_{x∈A∗} ∑_{u∈A∗, C(u)=x} Q^{−|u|} = ∑_{u∈dom(C)} Q^{−|u|} ≤ 1,

the last inequality being Kraft's inequality for the prefix-free set dom(C).

The number

  Ω(C) = ∑_{x∈A∗} PC(x)

expresses the (absolute) halting probability of the computer C. If C = U, then put Ω = Ω(U).
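For a finite toy computer both quantities can be computed exactly. The table below (the same kind of illustrative assumption as before, with prefix-free domain {0, 10, 110}) uses exact rationals so the Kraft-style bounds can be checked without rounding.

```python
# Algorithmic probability P_C and halting probability Omega(C) for a
# finite toy Chaitin computer (table values are illustrative only).
from fractions import Fraction

Q = 2
C = {"0": "x", "10": "y", "110": "x"}   # prefix-free domain

def P_C(s):
    """P_C(s) = sum of Q^-|u| over all programs u with C(u) = s."""
    return sum(Fraction(1, Q ** len(u)) for u, out in C.items() if out == s)

Omega = sum(Fraction(1, Q ** len(u)) for u in C)   # halting probability

print(P_C("x"), P_C("y"), Omega)  # 5/8 1/4 7/8
```

Note P_C("x") = 1/2 + 1/8 counts both programs for "x", and Ω(C) = 7/8 ≤ 1, as the Lemma requires; also P_C("x") ≥ 2^{−H_C("x")} = 1/2, previewing the next lemma.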

Lemma. For every computer C and all strings x:

  PC(x) ≥ Q^{−HC(x)}.

Proof. One has

  PC(x) = ∑_{u∈A∗, C(u)=x} Q^{−|u|},

and HC(x) = |u| for some u with C(u) = x, so Q^{−HC(x)} is one term of the sum.

Scholium. For all x ∈ A∗:

  0 < P(x) < 1.

Proof. P(x) ≥ Q^{−H(x)} = Q^{−|x∗|} > 0. From ∑_{x∈A∗} P(x) ≤ 1 and the fact that each term of the series is non-zero, we deduce that P(x) < 1.

Corollary. 0 < Ω ≤ 1.

In fact Ω < 1, as Ω is random, a fact discussed later.

Coding Theorem

One has:

  H(x) = − logQ P(x) + O(1).

Proof. First notice that

  PC(x) = ∑_{u∈A∗, C(u)=x} Q^{−|u|},

and HC(x) = |u| for some u with C(u) = x.

Next we prove the formula:

  H(x) ≤ − logQ PC(x) + O(1).

The set

  T = {(x, n) ∈ A∗ × N | PC(x) > Q^{−n}}
    = {(x, n) ∈ A∗ × N | ∑_{i=1}^{m} Q^{−|yi|} > Q^{−n}, for some y1, . . . , ym ∈ dom(C) with C(yi) = x}

is c.e. Let B = {(x, n + 1) ∈ A∗ × N | (x, n) ∈ T} and put

  M = ∑_{(x,n+1)∈B} Q^{−(n+1)} = Q^{−1} ∑_{(x,n)∈T} Q^{−n}.

We shall prove that M ≤ 1.

Notation. For every real α, if Q^n < α ≤ Q^{n+1} for some integer n, then put lgQ α = n. The following relations hold true:

  1. if α > 0, then Q^{lgQ α} < α;
  2. if α > 0, then lgQ α < logQ α ≤ lgQ α + 1;
  3. if α is a positive real and m is an integer, then lgQ α ≥ m iff logQ α > m.

The first two relations are direct consequences of the definition of lgQ. If α > 0 and m is an integer, then from Q^n < α ≤ Q^{n+1} and lgQ α ≥ m we deduce

  m ≤ lgQ α = n = logQ Q^n < logQ α.

Conversely, if logQ α > m and Q^n < α ≤ Q^{n+1}, then Q^{n+1} ≥ α > Q^m, so n + 1 > m, i.e. n = lgQ α ≥ m (n, m ∈ Z).

Define the sets Nx = {n ∈ N | PC(x) > Q^{−n}}, x ∈ A∗. Since n ∈ Nx implies n + 1 ∈ Nx, it follows that Nx is infinite. Moreover,

  M = Q^{−1} ∑_{x∈A∗} ∑_{n∈Nx} Q^{−n},

and

  n ∈ Nx ⟺ PC(x) > Q^{−n} ⟺ logQ PC(x) > −n ⟺ lgQ PC(x) ≥ −n.

Accordingly,

  ∑_{n∈Nx} Q^{−n} = ∑_{n ≥ −lgQ PC(x)} Q^{−n} = (1/(Q − 1)) Q^{lgQ PC(x) + 1} < (Q/(Q − 1)) PC(x) ≤ Q PC(x),

and finally

  M = Q^{−1} ∑_{x∈A∗} ∑_{n∈Nx} Q^{−n} ≤ ∑_{x∈A∗} PC(x) ≤ 1.

Using the Kraft-Chaitin Theorem we construct a computer D : A∗ → A∗ satisfying the following property: for every (x, n) ∈ T there exists a string v ∈ A∗ such that D(v) = x and |v| = n + 1.

We prove that D satisfies the relation HD(x) = −lgQ PC(x) + 1. Notice that

  D(v) = x ⟺ (x, |v|) ∈ B ⟺ PC(x) > Q^{1−|v|},

and

  HD(x) = min{|v| | v ∈ A∗, D(v) = x}
        = min{|v| | v ∈ A∗, PC(x) > Q^{1−|v|}}
        = min{|v| | v ∈ A∗, |v| ≥ 1 − lgQ PC(x)}
        = 1 − lgQ PC(x).

Random Strings

What is a random string? A detailed analysis, at both empirical and formal levels, will suggest that the correct question is not "Is x a random string?" but "To what extent is x random?"

Berry's Paradox. Consider the number

  one million, one hundred one thousand, one hundred twenty one.

This number appears to be the first number not nameable in under ten words. However, the above expression has only nine words! The classification of numbers as interesting versus dull is likewise inherently ambiguous: there can be no dull numbers, for if there were, the first such number would be interesting on account of its dullness.

The Paradox of Randomness. Consider the following binary strings of length 32:

  x = 00000000000000000000000000000000,
  y = 10011001100110011001100110011001,
  z = 00001001100000010100000010100010,
  u = 01101000100110101101100110100101.

According to classical probability theory the strings x, y, z, u are all equally probable, i.e. the probability of each is 2^{−32}. However, a simple analysis reveals that these four strings are extremely different from the point of view of regularity.
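One crude, easily computed regularity measure (our illustration, not a notion from the notes) is the smallest p such that the string repeats with period p. It already separates the highly regular x and y from z and u, even though all four strings are equiprobable.

```python
# Smallest period of a string: least p with s[i] == s[i % p] for all i.

def smallest_period(s):
    return next(p for p in range(1, len(s) + 1)
                if all(s[i] == s[i % p] for i in range(len(s))))

x = "00000000000000000000000000000000"
y = "10011001100110011001100110011001"
z = "00001001100000010100000010100010"
u = "01101000100110101101100110100101"

print(smallest_period(x))  # 1: a single repeated symbol
print(smallest_period(y))  # 4: the block 1001 repeated
for s in (z, u):
    print(smallest_period(s))  # much larger: no short repeating block
```

Of course a short period is only one kind of regularity; program-size complexity is the measure that captures them all at once.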

Laplace's Explanation. In the game of heads and tails, if head comes up a hundred times in a row, then this appears to us extraordinary, because after dividing the nearly infinite number of combinations that can arise in a hundred throws into regular sequences, or those in which we observe a rule that is easy to grasp, and into irregular sequences, the latter are incomparably more numerous.

In other words: the non-random strings are the strings possessing some kind of regularity, and since the number of all those strings (of a given length) is small, the occurrence of such a string is extraordinary.

An Informal Argument. Every canonical program should be random, independently of whether it generates a random output or not. Indeed, assume that x is a minimal program generating y. If x is not random, then there exists a program z generating x which is substantially smaller than x. Now consider the program

  from z calculate x, then from x calculate y.

This program is only a few letters longer than z, and thus it should be much shorter than x, which was supposed to be minimal. We have reached a contradiction.

Let f : N → A∗ be an injective, computable function.

a) One has:

  ∑_{n≥0} Q^{−H(f(n))} ≤ 1.

b) Consider a computable function g : N+ → N+.

  i) If ∑_{n≥1} Q^{−g(n)} = ∞, then H(f(n)) > g(n), for infinitely many n ∈ N+.

  ii) If ∑_{n≥1} Q^{−g(n)} < ∞, then H(f(n)) ≤ g(n) + O(1).

Proof. a) It is plain that:

  ∑_{n≥0} Q^{−H(f(n))} ≤ ∑_{x∈A∗} Q^{−H(x)} ≤ ∑_{x∈A∗} P(x) ≤ 1;

the first inequality uses the injectivity of f, and the second one uses P(x) ≥ Q^{−H(x)}.

b) i) Assume first that ∑_{n≥1} Q^{−g(n)} = ∞. If there exists a natural N such that H(f(n)) ≤ g(n) for all n ≥ N, then we get a contradiction:

  ∞ = ∑_{n≥N} Q^{−g(n)} ≤ ∑_{n≥N} Q^{−H(f(n))} ≤ ∑_{n≥0} Q^{−H(f(n))} ≤ 1.

In view of the hypothesis in b) ii), there exists a natural N such that ∑_{n≥N} Q^{−g(n)} ≤ 1. We can use the Kraft-Chaitin Theorem in order to construct a computer C : A∗ → A∗ with the following property: for every n ≥ N there exists x ∈ A∗ with |x| = g(n) and C(x) = f(n). So there exists a natural c such that for all n ≥ N,

  H(f(n)) ≤ HC(f(n)) + c ≤ g(n) + c.

Examples. 1) ∑_{n≥0} Q^{−H(string(n))} ≤ 1.

2) Take g(n) = ⌊logQ n⌋. It is seen that

  ∑_{n≥1} Q^{−g(n)} = ∞,

so H(string(n)) > ⌊logQ n⌋, for infinitely many n ≥ 1.

3) For g(n) = 2⌊logQ n⌋, one has:

  ∑_{n≥1} Q^{−g(n)} ≤ Q ∑_{n≥1} 1/n^2 < ∞,

so H(string(n)) ≤ 2⌊logQ n⌋ + O(1). For Q > 2 and g(n) = ⌊log_{Q−1} n⌋, one has:

  ∑_{n≥1} Q^{−g(n)} ≤ Q ∑_{n≥1} 1/n^{log_{Q−1} Q} < ∞,

so H(string(n)) ≤ ⌊log_{Q−1} n⌋ + O(1).
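The divergence/convergence dichotomy in Examples 2 and 3 can be checked numerically for Q = 2. The block-counting argument is visible in the output: each dyadic block [2^k, 2^{k+1}) contains 2^k terms, so it contributes exactly 1 to the first series and 2^{−k} to the second. (The cutoff 2^20 is an arbitrary choice for this illustration.)

```python
# Partial sums of 2^-g(n) for g(n) = floor(log2 n) and g(n) = 2*floor(log2 n).

N = 2 ** 20
g1 = lambda n: n.bit_length() - 1          # floor(log2 n), exact for integers

S_div = sum(2.0 ** -g1(n) for n in range(1, N))
S_conv = sum(2.0 ** -(2 * g1(n)) for n in range(1, N))

print(S_div)   # 20.0: one unit per dyadic block, so the series diverges
print(S_conv)  # 2 - 2^-19: the k-th block adds 2^-k, so the series converges
```

So ⌊log n⌋ bits are infinitely often not enough to describe string(n), while 2⌊log n⌋ bits always suffice up to an additive constant.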

Theorem. For every n ∈ N, one has:

  max_{x∈A^n} H(x) = n + H(string(n)) + O(1).

The above discussion may be concluded with the following definition. Let Σ : N → N be the function defined by

  Σ(n) = max_{x∈A^n} H(x).

We define the random strings of length n to be the strings with maximal self-delimiting complexity among the strings of length n, i.e. the strings x ∈ A^n having H(x) ≈ Σ(n).

Formally, a string x ∈ A∗ is Chaitin m-random (m a natural number) if H(x) ≥ Σ(|x|) − m; x is Chaitin random if it is 0-random. The above definition depends upon the fixed universal computer U; the generality of the approach comes from the Invariance Theorem. Obviously, for every length n and for every m ≥ 0 there exists a Chaitin m-random string x of length n.

Denote by RANDC_m and RANDC, respectively, the sets of Chaitin m-random strings and Chaitin random strings.

It is worth noticing that the property of Chaitin m-randomness is asymptotic. Indeed, for x ∈ RANDC_m, the larger the difference between |x| and m is, the more random x is. There is no sharp dividing line between randomness and pattern, but it looks as though all x ∈ RANDC_m with m ≤ H(string(|x|)) have a truly random behaviour.

Question. How many strings x ∈ A^n have maximal complexity, i.e. H(x) = Σ(|x|)?

Answer: There exists a natural constant c > 0 such that

  γ(n) = #{x ∈ A^n | H(x) = Σ(|x|)} > Q^{n−c},

for all natural n.

Fix a natural base Q ≥ 2 and write γ(n) in base Q. The resulting string, over the alphabet containing the letters 0, 1, . . . , Q − 1, is itself ... random! This string is random because it represents a large number. Let (n)_Q be the base-Q representation of the natural n, and σ(n) = |(γ(n))_Q|. Then:

  H(0^{n−σ(n)} (γ(n))_Q) = Σ(n) + O(1).

How large is c? Out of Q^n strings of length n, at most

  1 + Q + Q^2 + · · · + Q^{n−m−1} = (Q^{n−m} − 1)/(Q − 1)

can be described by programs of length less than n − m. The ratio between (Q^{n−m} − 1)/(Q − 1) and Q^n is less than 10^{−i} as soon as Q^m ≥ 10^i, irrespective of the value of n. For instance, this happens in case Q = 2, m = 20, i = 6; it says that less than one in a million among the binary strings of any given length is not Chaitin 20-random.
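The one-in-a-million claim is a two-line computation; the choice n = 64 below is arbitrary, since the bound is independent of n.

```python
# Checking the slide's numbers for Q = 2, m = 20: the fraction of n-bit
# strings describable by programs shorter than n - 20 bits.
from fractions import Fraction

Q, m, n = 2, 20, 64
short = (Q ** (n - m) - 1) // (Q - 1)   # number of programs of length < n - m
ratio = Fraction(short, Q ** n)
print(float(ratio) < 1e-6)              # True: the ratio is about 9.5e-7
```

The exact rational keeps the comparison free of floating-point doubt: the bound is essentially 2^{−20} ≈ 9.5 · 10^{−7}.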

Is the Definition Adequate? A natural computational requirement: there should be no algorithmic way to recognize which strings are random.

Theorem. For every t ≥ 0, the set RANDC_t is immune.

Proof. Let us introduce the set

  Complex_t = {x ∈ A∗ | H(x) ≥ |x| − t},

and prove that the set Complex_t is immune. As RANDC_t is an infinite subset of Complex_t, we deduce that RANDC_t itself is immune.

Assume, by absurdity, that D is an infinite computable subset of Complex_t. Define the p.c. function F : A∗ → A∗ by

  F(a1^i a2) = min{ x ∈ D | |x| ≥ t + 2(i + 1) }.

It is plain that F has a computable graph. Furthermore,

  H(F(a1^i a2)) ≥ |F(a1^i a2)| − t ≥ t + 2(i + 1) − t = 2(i + 1).

For infinitely many natural i we have

  2(i + 1) ≤ H(F(a1^i a2)) ≤ HF(F(a1^i a2)) + c ≤ i + 1 + c.

This yields a contradiction.

Random Sequences

In this part we address the problem of defining the notion of random sequence. It is quite clear that all informal requirements discussed for random strings should transfer automatically to random sequences. But in this case we also have to cope with all the traps of infinity. Indeed, it is not difficult to shuffle 52 cards, but we may ask about the real significance of shuffling the points of the unit interval!

Example. Almost all real numbers, when expressed in any base, contain every possible digit or possible string of digits.

Proof. Let A = {0, 1, . . . , Q − 1}, with Q ≥ 2. Notice that for all a ∈ A and x ∈ A∗, x does not contain the digit a iff x ∈ (A \ {a})∗. Accordingly, for every k > 0 (assuming Q > 2, so that the geometric sum formula below applies),

  N(k) = #{x ∈ A∗ | |x| ≤ k, x does not contain a} = ((Q − 1)^{k+1} − 1)/(Q − 2),

and

  N(k) / #{x ∈ A∗ | |x| ≤ k} = ((Q − 1)^{k+1} − 1)(Q − 1) / ((Q^{k+1} − 1)(Q − 2)),

so

  lim_{k→∞} N(k) / #{x ∈ A∗ | |x| ≤ k} = 0.

We may now write the formula:

  lim_{k→∞} #{x ∈ A∗ | |x| ≤ k, x does contain a} / #{x ∈ A∗ | |x| ≤ k} = 1,

which shows that almost all reals, when expressed in any scale Q ≥ 2, contain every possible digit a ∈ {0, 1, . . . , Q − 1}. The case of strings of digits can be easily settled just by working with a large enough base. For instance, if the string 957 never occurs in the ordinary decimal expansion of some number, then the digit 957 never occurs in its base-1000 expansion.
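The counting formula in the proof can be verified by brute force for small parameters; Q = 3, a = 0, k = 6 below are arbitrary choices that keep the enumeration fast (and satisfy Q > 2, which the closed form needs).

```python
# Brute-force check of N(k) = ((Q-1)^(k+1) - 1)/(Q - 2) against enumeration.
from itertools import product

Q, a, k = 3, 0, 6
brute = sum(1 for L in range(k + 1)
              for s in product(range(Q), repeat=L)
              if a not in s)                       # the empty string counts
closed = ((Q - 1) ** (k + 1) - 1) // (Q - 2)
total = (Q ** (k + 1) - 1) // (Q - 1)              # all strings of length <= k

print(brute, closed, brute / total)  # 127 127 0.116...
```

Already at k = 6 only about 11.6% of ternary strings avoid the digit 0, and the fraction tends to 0 as k grows, as the limit in the proof states.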

Notation. The set of all sequences over the alphabet A is denoted by Aω, i.e.

  Aω = {x | x = x1x2 . . . xn . . . , xi ∈ A}.

For every sequence x = x1x2 . . . xn . . . ∈ Aω put:

  a) x(n) = x1x2 . . . xn ∈ A∗, n > 0;
  b) x_{m,n} = xm x_{m+1} . . . xn, in case n ≥ m > 0, and x_{m,n} = λ in the remaining cases.

For every x = x1 . . . xm ∈ A∗ and y = y1 . . . yn . . . ∈ Aω we denote by xy the concatenation sequence x1 . . . xm y1 . . . yn . . .; in particular, λy = y. For X ⊂ A∗, put XAω = {xy | x ∈ X, y ∈ Aω}. In case X is a singleton, i.e. X = {x}, we write xAω instead of XAω.

Difficulties. The main idea is to isolate the set of all sequences having "all verifiable" properties that, from the point of view of classical probability theory, are effectively satisfied with "probability one" with respect to the unbiased discrete probability.

The unbiased discrete probability on A is defined by the function h : 2^A → [0, 1],

  h(X) = #X / Q, for all subsets X ⊂ A.

Hence h({ai}) = Q^{−1}, for every 1 ≤ i ≤ Q. This uniform measure induces the product measure µ on Aω: for all strings x ∈ A∗,

  µ(xAω) = Q^{−|x|}.

If x = x1x2 . . . xn ∈ A∗ is a string of length n, then µ(xAω) = Q^{−n}, and the expression µ(. . .) can be interpreted as "the probability that a sequence y = y1y2 . . . yn . . . ∈ Aω has the first element y1 = x1, the second element y2 = x2, . . . , the nth element yn = xn". Independence means that the probability of an event of the form yi = xi does not depend upon the probability of the event yj = xj.

Every open set G ⊂ Aω is µ-measurable and

  µ(G) = ∑_{x∈X} Q^{−|x|},

where G = XAω = ∪_{x∈X} xAω, for some prefix-free subset X ⊂ A∗. Finally, S ⊂ Aω is a null set in case for every real ε > 0 there exists an open set Gε which contains S and µ(Gε) < ε. For instance, every enumerable subset of Aω is a null set.

An important result which can easily be proven is the following: the union of an enumerable sequence of null sets is still a null set.

A property P of sequences x ∈ Aω is true almost everywhere in the sense of µ in case the set of sequences not having the property P is a null set. The main example of such a property was discovered by Borel and is known as the Law of Large Numbers. Consider the binary alphabet A = {0, 1}; for every sequence x = x1x2 . . . xm . . . ∈ Aω and natural number n ≥ 1 put

  Sn(x) = x1 + x2 + · · · + xn.
2001 71

✬ ✫ ✩ ✪ Borel’s Theorem can be phrased as follows: The limit of Sn/n, when n → ∞, exists almost everywhere in the sense of µ and has the value 1/2. In other words, there exists a null set S ⊂ Aω such that for every x ∈ S, lim

n→∞ Sn(x)/n = 1

2.
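The theorem is easy to illustrate empirically (with the obvious caveat that a pseudo-random generator only mimics µ-randomness; the seed and sample size below are arbitrary choices for reproducibility).

```python
# Empirical illustration of Borel's Law of Large Numbers: S_n / n for
# pseudo-random bits should be close to 1/2 for large n.
import random

random.seed(2001)   # fixed seed so the run is reproducible
n = 100_000
S_n = sum(random.getrandbits(1) for _ in range(n))
print(S_n / n)      # close to 1/2
```

For a truly µ-random sequence the ratio converges to 1/2 exactly; the simulation only shows the finite-n behaviour.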

The above properties are asymptotic, in the sense that the infinite behaviour of a sequence x determines whether x does or does not have such a property. Kolmogorov has proven a result (known as the All or Nothing Law) stating that practically any conceivable asymptotic property is true or false almost everywhere with respect to µ. It is clear that a sequence satisfying a property false almost everywhere with respect to µ is very "particular". Accordingly, it is tempting to try to say that a sequence x is "random" iff it satisfies every property true almost everywhere with respect to µ.

Unfortunately, we may define for every sequence x the property Px as follows: y satisfies Px iff for every n ≥ 1 there exists a natural m ≥ n such that xm ≠ ym. Every Px is an asymptotic property which is true almost everywhere with respect to µ, and x itself does not have property Px. Accordingly, no sequence can verify all properties true almost everywhere with respect to µ. The above definition is vacuous!

However, there is a way to overcome the above difficulty: we consider not all asymptotic properties true almost everywhere with respect to µ, but only a sequence of such properties. So the important question becomes: which sequences of properties should be considered? Clearly, the "larger" the chosen sequence of properties is, the "more random" the sequences satisfying that sequence of properties will be. In the context of our discussion a constructive selection criterion seems quite natural. Accordingly, we will impose the minimal computational restriction on objects: each set of strings will be c.e., and every convergent process will be regulated by a computable function.

Consider the compact topological space (Aω, τ). The basic open sets are exactly the sets xAω, with x ∈ A∗. Accordingly, an open set G ⊂ Aω is of the form G = XAω, where X ⊂ A∗.

a) A constructively open set G ⊂ Aω is an open set G = XAω for which X ⊂ A∗ is c.e.

b) A constructive sequence of constructively open sets (for short, c.s.c.o. sets) is a sequence (Gm)m≥1 of constructively open sets Gm = XmAω such that there exists a c.e. set X ⊂ A∗ × N with Xm = {x ∈ A∗ | (x, m) ∈ X}, for all natural m ≥ 1.

c) A constructively null set S ⊂ Aω is a set such that there exist c.s.c.o. sets (Gm)m≥1 for which

  S ⊂ ∩_{m≥1} Gm, and lim_{m→∞} µ(Gm) = 0 constructively,

i.e. there exists an increasing, unbounded, computable function H : N → N such that µ(Gm) < Q^{−k} whenever m ≥ H(k). It is clear that µ(S) = 0 for every constructively null set, but the converse is not true.

A sequence is not random if it belongs to a constructively null set. Denote by rand the set of random sequences. The binary expansion of Ω is a random sequence.

Martin-Löf's Theorem. The set Aω \ rand is a maximal constructive null set. More precisely, Aω \ rand equals the union of all constructive null sets.

Chaitin-Schnorr Theorem. A sequence x ∈ Aω is random iff there exists a natural c > 0 such that H(x(n)) ≥ n − c, for all natural n ≥ 1.

Chaitin Theorem. A sequence x ∈ Aω is random iff lim_{n→∞} (H(x(n)) − n) = ∞.

Properties of Random Sequences. It is an intuitive fact (although not operational) that if we delete (or add) a million letters from (to) the beginning of a random sequence, the new sequence thus obtained is still random.

Notation. For u, v, y ∈ A∗ and x ∈ Aω, if x = yuz for some z ∈ Aω, then we write x(y; u → v) = yvz. Two particular cases are interesting:

  1. (Addition of a string) The case y = u = λ: x = z and x(y; u → v) = vz = vx.
  2. (Deletion of a string) The case y = v = λ: x = uz and x(y; u → v) = z.

Proposition. Let x = yuz be in Aω (y, u ∈ A∗, z ∈ Aω). The following two assertions are equivalent:

  a) The sequence x is random.
  b) For every v ∈ A∗, the sequence x(y; u → v) is random.

Proposition. Let x ∈ Aω be a sequence for which there exists a strictly increasing sequence of naturals i(k), k ≥ 1, such that the set {(i(k), x_{i(k)}) | k ≥ 1} is computable. Then x is non-random.

Example. Let x = x1x2 . . . xn . . . be in Aω. Assume that there exists 1 ≤ i ≤ Q such that the set Xi = {t ∈ N+ | xt = ai} includes an infinite c.e. set M. Then x is non-random.

(von Mises) Start with an arbitrary sequence x = x1x2 . . . xn . . . over the alphabet A = {0, 1} and define a new sequence y = y1y2 . . . yn . . . over the alphabet {0, 1, 2} by

  y1 = x1,  yn = x_{n−1} + xn, n ≥ 2.

Then y is not random, even if x is a random sequence. The motivation is simple: the strings 02 and 20 never appear in y. (Actually, there are many other strings which do not appear in y.)

A seemingly minor change in the above example makes a major difference. For x = x1x2 · · · with x1, x2, . . . ∈ {0, 1} define H(x) = y1y2 · · · with y1, y2, . . . ∈ {0, 1} by

  y1 = x1,  yi = x_{i−1} ⊕ xi, for i > 1.

If x is random, then so is H(x).
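The two transforms can be placed side by side on a (pseudo-random) prefix; the helper names are ours. The sum transform provably avoids the factors 02 and 20 (a 0 in y forces the next symbol to be at most 1), while the XOR transform is invertible, which is the intuition behind its randomness preservation.

```python
# The von Mises sum transform vs. the XOR transform on a binary prefix.
import random

def von_mises_sum(x):   # y1 = x1, yn = x_{n-1} + x_n, over {0, 1, 2}
    return [x[0]] + [x[i - 1] + x[i] for i in range(1, len(x))]

def xor_transform(x):   # y1 = x1, yi = x_{i-1} XOR x_i, over {0, 1}
    return [x[0]] + [x[i - 1] ^ x[i] for i in range(1, len(x))]

def xor_inverse(y):     # reconstruct x: x_i = x_{i-1} XOR y_i
    x = [y[0]]
    for b in y[1:]:
        x.append(x[-1] ^ b)
    return x

random.seed(0)
x = [random.getrandbits(1) for _ in range(10_000)]
y = von_mises_sum(x)
pairs = list(zip(y, y[1:]))
print((0, 2) in pairs or (2, 0) in pairs)   # False: 02 and 20 never occur
print(xor_inverse(xor_transform(x)) == x)   # True: the XOR map is invertible
```

The forbidden factors give an explicit constructively null set containing y, whereas no such computable regularity is introduced by the invertible XOR map.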

A real α ∈ (0, 1) is random if its binary expansion is a random sequence.

Theorem. a) Randomness is invariant under the change of base: if x ∈ {0, 1, . . . , i}ω, y ∈ {0, 1, . . . , j}ω, i, j ≥ 2, and

  ∑_{t=0}^{∞} xt i^{−t} = ∑_{t=0}^{∞} yt j^{−t},

then x is random iff y is random.

b) Randomness is invariant under finite permutations. In fact, a more general result holds true: if π : N → N is a computable injective function and x1x2 · · · is a random sequence, then the sequence x_{π(1)} x_{π(2)} · · · is also random.

Consider the following situation: we have to choose one of 17 options at random, truly at random. How could we achieve this with a physical device? There seem to be two ways.

First, assume we have a physical random process (say, in quantum physics) with equidistributed normalized outputs in the real interval (0, 1]. Assuming there are no problems with measuring the outcomes, we may subdivide the interval into 17 subintervals and say the outcome is i ∈ N if the observed physical outcome falls into the interval ((i − 1)/17, i/17]. Replacing a true random process by a pseudo-random number generator, this is how one usually obtains pseudo-random numbers within a given interval.

Another possibility is to use an unbiased icosahedral die, a die with 20 faces numbered from 1 through 20; geometrically this is possible. When tossing the die we accept outcomes between 1 and 17 and simply ignore outcomes between 18 and 20. Both procedures lead to the same result with probability 1; that is, the sequences obtained in either way are exactly the truly random sequences over an alphabet with 17 letters.

Let A be an alphabet and let B ⊂ A, B ≠ A, with card(B) ≥ 2. Let µA and µB be the uniform probabilities on Aω and Bω. Let h : A∗ → B∗ be the homomorphism defined, for x ∈ A, by

  h(x) = x, if x ∈ B;  h(x) = λ, if x ∈ A \ B.

The natural extension H of h to A∞ = A∗ ∪ Aω is randomness-preserving.

Eliminating a symbol preserves randomness. Systematically eliminating a string does not, in general, preserve randomness. Is there any “inverse” transformation, going from a random sequence over an alphabet to a random sequence over a larger alphabet?


  • Theorem. If U is a universal computer, then Ω_U is random.

Proof. Let f be a computable one-to-one function which enumerates PROG_U, the domain of U. Let

ω_k = Σ_{j=0}^{k} 2^{−|f(j)|}.

Clearly, (ω_k) is a computable, increasing sequence of rationals converging to Ω_U, so Ω_U is c.e. Consider the binary expansion Ω_U = 0.Ω_0 Ω_1 · · · We define a machine C as follows: on input x ∈ Σ∗, C first “tries to compute” y = U(x) and the smallest number t with ω_t ≥ 0.y. If successful, C(x) is the first (in quasi-lexicographical order) string not belonging to the set {U(f(0)), U(f(1)), . . . , U(f(t))}; otherwise (if U(x) = ∞ or t does not exist), C(x) = ∞.
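Once the enumeration f is fixed, each approximant ω_k is an exactly computable dyadic rational. The Python sketch below uses a made-up finite prefix-free domain in place of PROG_U (real universal machines have infinite domains, so this is only a toy):

```python
from fractions import Fraction

# Hypothetical prefix-free "programs" standing in for f(0), f(1), ...;
# |f(j)| is the length in bits.
progs = ["0", "10", "1100", "1101"]

def omega(k):
    # omega_k = sum_{j=0}^{k} 2^{-|f(j)|}, as an exact dyadic rational
    return sum(Fraction(1, 2 ** len(progs[j])) for j in range(k + 1))

print([str(omega(k)) for k in range(len(progs))])
# -> ['1/2', '3/4', '13/16', '7/8']  (increasing, as the proof requires)
```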

If x ∈ PROG_C and x′ is a string with U(x) = U(x′), then C(x) = C(x′). Applying this to x ∈ PROG_C and the canonical program x′ = (U(x))∗ of U(x) yields H_C(C(x)) ≤ |x′| = H_U(U(x)). Furthermore, by the universality of U, for all x ∈ PROG_C:

H_U(C(x)) ≤ H_C(C(x)) + O(1) ≤ H_U(U(x)) + O(1).   (1)

Fix n and assume that x is a string with U(x) = Ω_0 Ω_1 · · · Ω_{n−1}. Then C(x) < ∞. Let t be the smallest number (computed in the second step of the computation of C) with ω_t ≥ 0.Ω_0 Ω_1 · · · Ω_{n−1}. We have

0.Ω_0 Ω_1 · · · Ω_{n−1} ≤ ω_t < ω_t + Σ_{s=t+1}^{∞} 2^{−|f(s)|} = Ω_U ≤ 0.Ω_0 Ω_1 · · · Ω_{n−1} + 2^{−n}.   (2)

Hence, Σ_{s=t+1}^{∞} 2^{−|f(s)|} ≤ 2^{−n}, which implies |f(s)| ≥ n for every s ≥ t + 1. From the construction of C we conclude that H_U(C(x)) ≥ n: every U-program of length less than n occurs among f(0), . . . , f(t), and C(x) differs from all the outputs U(f(0)), . . . , U(f(t)), so C(x) has no U-program shorter than n. Using (1) we obtain

n ≤ H_U(C(x)) ≤ H_C(C(x)) + O(1) ≤ H_U(U(x)) + O(1) = H_U(Ω_0 Ω_1 · · · Ω_{n−1}) + O(1),

which proves that the sequence Ω_0 Ω_1 · · · is random, i.e. Ω_U is random.


  • Theorem. Let α ∈ (0, 1). The following conditions are equivalent:

  1. The real α is c.e. and random.

  2. The real α is the halting probability of some universal computer U, α = Ω_U.


Information-Theoretic Incompleteness

We will formulate all results relative to ZFC, Zermelo-Fraenkel set theory with choice.

  • Theorem. Let U be a universal computer. Then ZFC, if arithmetically sound, can prove only finitely many statements of the form “H_U(x) > m”. In fact, there is a constant c > 0 such that ZFC cannot prove the statement “H_U(x) > m” if m > H_U(ZFC) + c. So, all true statements “H_U(x) > m” with m > H_U(ZFC) + c (an infinite set) are unprovable in ZFC.

Chaitin’s Second Information-theoretic Incompleteness Theorem. Let U be a universal computer. If ZFC is arithmetically sound, then ZFC can determine the value of only finitely many bits of Ω_U. We can explicitly write down a bound on the number of bits of Ω_U which ZFC can determine, but the bound is not computable. For example, in 1997 Chaitin constructed a universal computer U_Lisp and a theory T such that T can determine the value of at most H_{U_Lisp}(T) + 15,328 bits of Ω_{U_Lisp}.

Can we ‘know’ the (finitely many) bits which ZFC can determine? For every c.e. and random real α we can construct a universal computer U such that α = Ω_U and ZFC is able to determine finitely many (but as many as we want) bits of Ω_U. Theorem (Solovay, 2000). We can effectively construct a universal Chaitin machine U_Solovay such that ZFC, if arithmetically sound, cannot determine any single bit of Ω_{U_Solovay}.


Chaitin’s theorem holds true for any universal computer, while Solovay constructed a specific computer. A computer for which Peano Arithmetic can prove its universality, and for which ZFC cannot determine more than the initial block of 1’s of the binary expansion of its halting probability, will be called a Solovay machine. Which c.e. and random reals are halting probabilities of Solovay machines? Theorem (Calude, 2001). Assume that ZFC is arithmetically sound. Then every c.e. and random real is the halting probability of a Solovay machine.

For example, if α ∈ (3/4, 7/8) is c.e. and random, then in the worst case ZFC can determine its first two bits (11), but no more. Assume that ZFC is arithmetically sound. Then every c.e. and random real α ∈ (0, 1/2) is the halting probability of a Solovay machine, so ZFC cannot determine any single bit of α. No c.e. and random real α ∈ (1/2, 1) has this property.
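A quick check of the interval example (my own worked arithmetic, not from the slides):

```latex
\[
\tfrac{3}{4} = (0.11000\ldots)_2, \qquad \tfrac{7}{8} = (0.11100\ldots)_2,
\]
\[
\alpha \in \left(\tfrac{3}{4}, \tfrac{7}{8}\right) \;\Longrightarrow\;
\alpha = (0.110\,\alpha_4\alpha_5\cdots)_2,
\]
```

so the initial block of 1’s in the expansion of α is exactly 11; by the definition of a Solovay machine, these two bits are all that ZFC can determine.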

Solovay’s result can be rephrased in terms of Diophantine equations:

  • Theorem. There exists a universal Chaitin machine U_Solovay such that ZFC, if arithmetically sound, cannot prove the true statement “the first bit of Ω_{U_Solovay} is 0”.

Theorem (Calude, 2000). For every binary string s = s_1 s_2 . . . s_n we can effectively construct a Solovay machine U_Solovay such that the binary expansion of Ω_{U_Solovay} has the string 0 s_1 s_2 . . . s_n as a prefix. Hence, the following statements

“The 0th binary digit of the expansion of Ω_{U_Solovay} is 0”,
“The 1st binary digit of the expansion of Ω_{U_Solovay} is s_1”,
“The 2nd binary digit of the expansion of Ω_{U_Solovay} is s_2”,
. . .
“The nth binary digit of the expansion of Ω_{U_Solovay} is s_n”,

are true but unprovable in ZFC.