The sentinel prefix-code counting lemma for Krieger's finite generator theorem (M2→C3) #

This file proves the self-contained combinatorial heart of the symbolic-coding step (C3) of Krieger's finite generator theorem (issue #15): the sentinel code. The classical Krieger construction (Downarowicz, Entropy in Dynamical Systems, §4.2 Lemma 4.2.5 / Exercise 3.8; Shields, The Ergodic Theory of Discrete Sample Paths, on strongly-separated codes; Lind–Marcus, Symbolic Dynamics and Coding, on prefix codes) needs to inject the (≤ kᴺ) N-names of a generator into short blocks over a small Fin l alphabet, so that a decoder reading the raw code stream can re-find the cutting points between successive blocks. The standard device is a sentinel: a fixed symbol s : Fin l reserved as a terminator and used nowhere else inside a block. Then in any concatenation of such blocks the sentinel marks exactly the block boundaries, so the stream is uniquely decodable.

We formalize the construction in its sharpest, most reusable form, decoupled from any dynamics:

The code map sentinelEncode emb s sends a data word d : List (Fin (l - 1)) (the "name" digits) to the block d.map emb ++ [s], where emb : Fin (l - 1) ↪ {a : Fin l // a ≠ s} is any embedding of the data alphabet into the non-sentinel letters. The block has length |d| + 1 and ends in the sentinel.
Counting / injection. sentinelEncode is injective (sentinelEncode_injective), and so is its fixed-length variant sentinelEncodeFn (sentinelEncodeFn_injective), which embeds the length-m data words Fin m → Fin (l-1) into the length-(m+1) sentinel blocks. There are (l-1)ᵐ such data words (card_dataWord, card_nonSentinel), so any name set of size ≤ (l-1)ᵐ injects into the blocks (exists_sentinelEncoding). The log-count bound kᴺ ≤ (l-1)ᵐ ⇔ N·log k ≤ m·log(l-1) (pow_le_pow_iff_log) then shows that blocks of length m + 1 = O(N) suffice whenever 2 ≤ l - 1, i.e. log(l-1) > 0: take m = ⌈N·log k / log(l-1)⌉.
Decodability (prefix-free / comma-free). The defining structural property: a sentinelEncode block contains the sentinel only at its last position (sentinel_count_eq_one, notMem_sentinelData). Hence in a concatenation of blocks (sentinelEncodeList) the sentinels are exactly the block ends, and the decoder recovers the block decomposition by splitting at the sentinels (sentinelEncodeList_injective, the unique-decodability statement C3 consumes).
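The encoding and the split-at-sentinel decoder described above can be modelled by a small executable sketch in Python, outside Lean. It is an illustration under stated assumptions, not the Lean definitions: letters of Fin l are modelled as integers in range(l), emb is taken to be the order-preserving map that skips s (the Lean statement allows any embedding), and the names sentinel_encode, sentinel_encode_list and sentinel_decode_list are hypothetical, chosen only to echo the Lean declarations.

```python
from typing import List, Sequence


def sentinel_encode(d: Sequence[int], l: int, s: int) -> List[int]:
    """Model of the block `d.map emb ++ [s]`.

    Data digits live in range(l - 1); code letters live in range(l).
    Assumption: emb is the order-preserving embedding that skips s.
    """
    if not 0 <= s < l:
        raise ValueError("sentinel must be a letter of the code alphabet")
    block = []
    for a in d:
        if not 0 <= a < l - 1:
            raise ValueError("data digit out of range")
        block.append(a if a < s else a + 1)  # emb: never hits s
    block.append(s)  # the sentinel terminates the block
    return block


def sentinel_encode_list(words: Sequence[Sequence[int]], l: int, s: int) -> List[int]:
    """Concatenate the sentinel blocks of a list of data words."""
    stream: List[int] = []
    for d in words:
        stream.extend(sentinel_encode(d, l, s))
    return stream


def sentinel_decode_list(stream: Sequence[int], l: int, s: int) -> List[List[int]]:
    """Recover the data words by cutting the stream at every sentinel."""
    words: List[List[int]] = []
    current: List[int] = []
    for x in stream:
        if not 0 <= x < l:
            raise ValueError("letter out of range")
        if x == s:
            words.append(current)  # a sentinel closes the current block
            current = []
        else:
            current.append(x if x < s else x - 1)  # invert emb
    if current:
        raise ValueError("stream does not end in a sentinel")
    return words
```

With l = 3 and s = 1, the words [[0, 1], [], [1, 1, 0]] encode to the stream [0, 2, 1, 1, 2, 2, 0, 1], and decoding that stream returns the original three words, including the empty one. Each block contains s exactly once, which is the property the decoder relies on.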

The interface C3 consumes #

The deliverable for the next wave is exists_sentinelEncoding. For an alphabet Fin l with a reserved sentinel s and any finite name set Name, if Fintype.card Name ≤ (l-1)ᵐ, there is an injection enc : Name ↪ List (Fin l) whose images are length-(m+1) sentinel blocks (each with count s = 1) and whose concatenations are uniquely decodable (sentinelEncodeList_injective): distinct name-streams give distinct code-streams. In particular, this applies when Name is the set of Fin k-names of length N, so that card Name ≤ kᴺ, and N · log k ≤ m · log(l-1); this is the pow_le_pow_iff_log regime, which needs two free symbols plus a sentinel (2 ≤ l - 1). The injection enc is exactly the symbolic code from which the column-coding partition is read off.
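The counting hypothesis can be checked numerically with a short Python sketch, again outside Lean and under stated assumptions: names are identified with indices in range(k**N), a name is written in base l - 1 to obtain its m data digits, and the non-sentinel letters are enumerated in increasing order. The functions min_data_length and encode_name are hypothetical helpers, not declarations from the Lean file.

```python
from typing import List


def min_data_length(k: int, N: int, l: int) -> int:
    """Smallest m with k**N <= (l - 1)**m, in exact integer arithmetic.

    For 2 <= l - 1 this is the same condition as
    N * log k <= m * log(l - 1).
    """
    if k < 1 or N < 0 or l - 1 < 2:
        raise ValueError("need k >= 1, N >= 0 and at least two data letters")
    target = k ** N
    m, power = 0, 1
    while power < target:
        power *= l - 1
        m += 1
    return m


def encode_name(index: int, m: int, l: int, s: int) -> List[int]:
    """Send a name index in range((l - 1)**m) to a length-(m + 1) sentinel block."""
    base = l - 1
    if not 0 <= s < l:
        raise ValueError("sentinel must be a letter of the code alphabet")
    if not 0 <= index < base ** m:
        raise ValueError("index does not fit in m data digits")
    digits = []
    for _ in range(m):
        index, r = divmod(index, base)  # base-(l - 1) digits, least significant first
        digits.append(r)
    # Shift digits past the sentinel so that s occurs only at the end.
    return [a if a < s else a + 1 for a in digits] + [s]
```

For k = 5, N = 3 and l = 3 there are 125 names and two data letters, so min_data_length(5, 3, 3) is 7 (2⁷ = 128 ≥ 125 while 2⁶ = 64 < 125), and every name receives a distinct block of length 8 containing the sentinel exactly once.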

References #

Tomasz Downarowicz, Entropy in Dynamical Systems, Cambridge (2011), §4.2 (Lemma 4.2.5) and Exercise 3.8 (the sentinel/marker coding of names).
Paul C. Shields, The Ergodic Theory of Discrete Sample Paths, GSM 13, AMS (1996), §I.9 (strongly-separated / marker codes).
Douglas Lind and Brian Marcus, An Introduction to Symbolic Dynamics and Coding, Cambridge (1995), §8 (prefix codes, unique decodability).

The data alphabet: non-sentinel letters of Fin l #

The sentinel encoding of a single data word #

Unique decodability of a concatenation of sentinel blocks #

Fixed-length names: the counting / injection layer #

The logarithmic length bound: blocks of length O(N) suffice #
