The sharp Shannon–McMillan–Breiman theorem (Breiman/Bruin telescoping) #

This file develops the sharp Shannon–McMillan–Breiman (SMB) theorem, sharpening the crude name-count upper bound limsup (1/n)·iₙ ≤ log (card ι) of Oseledets.Krieger.SMB to the Kolmogorov–Sinai entropy rate h(P,T), for an ergodic measure-preserving transformation T and a finite measurable partition P.

The route is the Breiman/Bruin telescoping identity (Einsiedler–Lindenstrauss–Ward, Entropy in Ergodic Theory, Ch. 2; Bruin, Ergodic Theory I, Lecture 15). The conditional information weight of the P-cell i₀ given the past coding g, condInfoWeight, is the surprise -log (μ(B-cell ∩ Pᵢ₀)/μ(B-cell)) of learning a point's P-cell is i₀ once its B-coding (the future itinerary) is known. The chain rule I_{P∨Q} = I_Q + I_{P|Q} becomes, at the level of these weights, the one-step factorization infoWeight_{n+1}(cons i₀ g) = infoWeight_n(g) + condInfoWeight i₀ g, which telescopes to iₙ(x) = ∑_{j<n} g_{n-j}(Tʲx).

Results proved here (sorry-free) #

condInfoWeight — the conditional information weight -log (μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ)) and its nonnegativity (condInfoWeight_nonneg).
ksJoinCells_cons — the cell cons-factorization cell_{n+1}(cons i₀ g) = Pᵢ₀ ∩ T⁻¹ cell_n(g).
infoWeight_succ_eq — the one-step telescoping factorization of the iterated-join information weight: peeling the first coordinate splits infoWeight_{n+1} into the n-step weight (shifted by T) plus the conditional weight of the first symbol given the T-shifted n-step past. The positivity hypothesis on the (n+1)-cell is load-bearing — the pointwise identity fails at codes whose join cell is null (where -log 0 = 0 is a junk value), which is precisely why the Breiman identity for the information functions holds only μ-a.e.
ksEntropySeq_succ_eq_add_condEntropyGivenPartition / ksEntropySeq_succ_eq_add_condEntropy — the integrated (entropy-level) chain rule H(⋁₀ⁿ) = H(⋁₀ⁿ⁻¹) + H(P | σ(T⁻¹⋁₀ⁿ⁻¹)), the μ-mean of infoWeight_succ_eq, with the conditioning in σ-algebra form.
ksEntropySeq_eq_sum_condEntropy — the telescoped Breiman sum H(⋁₀ⁿ⁻¹) = ∑_{k<n} H(P | σ(T⁻¹⋁₀ᵏ⁻¹)).
ksEntropyPartition_eq_condEntropy_iSup — the sharp Kolmogorov–Sinai rate as a conditional entropy: h(P,T) = H(P | ⋁_{j≥1} T⁻ʲP), the integral-level SMB rate (the ∫ g = h that the pointwise a.e. theorem converges to), proved unconditionally via the telescoped sum + the fixed-partition Lévy theorem condEntropy_tendsto_iSup + the Cesàro mean Filter.Tendsto.cesaro.

The remaining pointwise residual (precise) #

What is not yet proved is the pointwise a.e. convergence (1/n)·iₙ(x) → h(P,T) for μ-a.e. x under Ergodic T μ. By the integral-level identity above the target h is correct; the gap is the a.e. statement, which decomposes into (with gₖ the conditional information function):

R3/R4 (a.e. main term). (1/n)∑_{j<n} g_∞(Tʲx) → ∫ g_∞ = h a.e., from tendsto_birkhoffAverage_ae_integral and ksEntropyPartition_eq_condEntropy_iSup (∫ g_∞ = h). The a.e. Lévy limit gₖ → g_∞ is MeasureTheory.tendsto_ae_condExp (used inside condEntropy_tendsto_iSup).
R5 (Chung L¹ maximal domination — the genuine analytic gap). The Cesàro tail (1/n)∑_{j<n}(g_{n-j} − g_∞)(Tʲx) → 0 a.e. needs g* := ⨆ₖ gₖ ∈ L¹(μ). The per-cell maximal estimate μ{x ∈ Pᵢ : g* > λ} ≤ e^{−λ} (a Doob stopping-time bound on the conditional-probability martingale pₖ = E(𝟙_{Pᵢ}|Cₖ): on {τ = first k with pₖ < e^{−λ}} ∈ Cₖ, μ(Pᵢ ∩ {τ=k}) = ∫_{τ=k} pₖ ≤ e^{−λ}μ(τ=k)) gives, by the layer-cake formula, ∫ g* ≤ log(card ι) + 1 < ∞. Mathlib has Doob's maximal_ineq but not this L¹ integrability; it is a ≈150-line development (Chung 1961). This is the one item that blocks the headline.

References #

M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, Ch. 2 (SMB).
H. Bruin, Ergodic Theory I (Univ. Wien), Lecture 15 (the telescoping/Breiman proof).
L. Breiman, The individual ergodic theorem of information theory, Ann. Math. Statist. 28 (1957), 809–811; correction 31 (1960), 809–810.
K. L. Chung, A note on the ergodic theorem of information theory, Ann. Math. Statist. 32 (1961), 612–614. (The L¹ maximal domination.)

source

noncomputable def Oseledets.Krieger.condInfoWeight {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {β : Type u_3} [Fintype β] (B : Entropy.MeasurePartition μ β) (P : Entropy.MeasurePartition μ ι) (i : ι) (b : β) :

ℝ

The conditional information weight of the P-cell index i given the B-cell index b: the surprise -log (μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ)) of learning that a point of Bᵦ also lies in Pᵢ, i.e. the value of the conditional information function I_{P | σ(B)} on the cell Bᵦ for a point whose P-cell is Pᵢ. Here B may have a different (finite) index type than P.

Equations

Oseledets.Krieger.condInfoWeight B P i b = -Real.log ((μ (B.cells b ∩ P.cells i)).toReal / (μ (B.cells b)).toReal)

Instances For

source

theorem Oseledets.Krieger.condInfoWeight_nonneg {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {β : Type u_3} [Fintype β] (B : Entropy.MeasurePartition μ β) (P : Entropy.MeasurePartition μ ι) (i : ι) (b : β) :

0 ≤ condInfoWeight B P i b

The conditional information weight is nonnegative: the conditional probability μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ) lies in [0,1], so its log is nonpositive.

source

theorem Oseledets.Krieger.ksJoinCells_cons {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) (i₀ : ι) (g : Fin n → ι) :

(Entropy.ksJoin hT P (n + 1)).cells (Fin.cons i₀ g) = P.cells i₀ ∩ T ⁻¹' (Entropy.ksJoin hT P n).cells g

Cons-factorization of the flat-join cell. The cell of the (n+1)-fold iterated join at the code Fin.cons i₀ g is the P-cell i₀ (the 0-th symbol) intersected with the T-pullback of the n-fold join cell at g (the future symbols). This is the cell-level form of ⋁₀ⁿ T⁻ᵏP = P ∨ T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP).

source

theorem Oseledets.Krieger.infoWeight_succ_eq {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) (i₀ : ι) (g : Fin n → ι) (hpos : μ (P.cells i₀ ∩ T ⁻¹' (Entropy.ksJoin hT P n).cells g) ≠ 0) :

infoWeight hT P (n + 1) (Fin.cons i₀ g) = infoWeight hT P n g + condInfoWeight (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P n)) P i₀ g

One-step telescoping factorization of the information weight. Peeling the first symbol of the (n+1)-step coding splits the iterated-join information weight into the n-step weight of the future coding g plus the conditional information weight of the first symbol i₀ given the T-pullback of the n-step past: infoWeight_{n+1}(cons i₀ g) = infoWeight_n(g) + condInfoWeight (T⁻¹(⋁₀ⁿ⁻¹)) P i₀ g.

This is the pure measure-algebra core of the Breiman identity iₙ = ∑_{j<n} g_{n-j}∘Tʲ. It rests on the cell factorization ksJoinCells_cons, the measure-preservation of T (so the past cell has the same measure before and after pulling back along T), and the additivity of -log on the product μ(Pᵢ₀ ∩ T⁻¹C) = (μ(T⁻¹C ∩ Pᵢ₀)/μ(T⁻¹C)) · μ(T⁻¹C).

The positivity hypothesis hpos : μ(Pᵢ₀ ∩ T⁻¹C) ≠ 0 is load-bearing: at a code whose (n+1)-cell is null but whose n-past cell is not, the -logs do not split additively (-log 0 = 0 is a junk value), so the identity is false there. This is exactly why the telescoping identity for the information functions holds only μ-a.e.: the codes realised by points all have positive-measure cells off a null set.

source

theorem Oseledets.Krieger.ksEntropySeq_succ_eq_add_condEntropyGivenPartition {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) :

Entropy.ksEntropySeq hT P (n + 1) = Entropy.ksEntropySeq hT P n + Entropy.condEntropyGivenPartition μ (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P n)).cells P.cells

Integrated (entropy-level) telescoping chain rule. The (n+1)-step join entropy splits as the n-step join entropy of the future plus the conditional entropy of P given the T-pullback B := T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP) of the n-step past, additive over the cells of B: H(⋁₀ⁿ T⁻ᵏP) = H(⋁₀ⁿ⁻¹ T⁻ᵏP) + Hₐdd(P | T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP)).

This is the μ-integral of the pointwise one-step factorization infoWeight_succ_eq, obtained cleanly from the absolute chain rule entropy_join_eq_add_condEntropyGivenPartition applied to the join B ∨ P (which, reindexed by g' ↦ (tail g', head g'), is the (n+1)-fold join) together with the T-invariance of join entropy (entropy_pullback).

source

theorem Oseledets.Krieger.ksEntropySeq_succ_eq_add_condEntropy {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} [StandardBorelSpace α] (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) :

Entropy.ksEntropySeq hT P (n + 1) = Entropy.ksEntropySeq hT P n + Entropy.condEntropy μ (Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P n))) P.cells

Integrated telescoping chain rule, σ-algebra form. Bridging the additive-over-cells conditional entropy of ksEntropySeq_succ_eq_add_condEntropyGivenPartition to the σ-algebra conditional entropy via condEntropyGivenPartition_eq_condEntropy_generated, the (n+1)-step join entropy splits as H(⋁₀ⁿ T⁻ᵏP) = H(⋁₀ⁿ⁻¹ T⁻ᵏP) + H(P | σ(T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP))), where the conditioning σ-algebra is the one generated by the cells of the T-pullback of the n-step past join. This is the entropy-level Breiman telescoping with the conditioning expressed in the form consumed by the fixed-partition Lévy theorem condEntropy_tendsto_iSup.

source

theorem Oseledets.Krieger.ksEntropySeq_eq_sum_condEntropy {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} [StandardBorelSpace α] (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) :

Entropy.ksEntropySeq hT P n = ∑ k ∈ Finset.range n, Entropy.condEntropy μ (Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P k))) P.cells

Telescoped Breiman sum (entropy level). Iterating the σ-algebra chain rule, the n-step join entropy is the sum of the conditional entropies of P given the T-pullback of the k-step past, for k = 0, …, n-1: H(⋁₀ⁿ⁻¹ T⁻ᵏP) = ∑_{k<n} H(P | σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP))).

Each summand H(P | σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP))) is condEntropy μ (σ(T⁻¹ ksJoin P k)) P.cells, the conditional entropy of P given the k-step future past — a decreasing sequence in k (more conditioning), converging to the relative entropy rate h(P,T) by condEntropy_tendsto_iSup. This is the integral-level statement of the Breiman telescoping iₙ = ∑_{j<n} g_{n-j}∘Tʲ (its μ-mean), and combined with the Fekete limit it identifies h(P,T) = lim_k H(P | σ(T⁻¹(⋁₀ᵏ⁻¹))) (a Cesàro mean of a convergent sequence converges to the same limit).

source

theorem Oseledets.Krieger.generatedSigmaAlgebra_pullback_eq_pulledBack {α : Type u_1} [mα : MeasurableSpace α] {μ : MeasureTheory.Measure α} {T : α → α} {β : Type u_3} [Fintype β] (hT : MeasureTheory.MeasurePreserving T μ μ) (R : Entropy.MeasurePartition μ β) :

Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pullback hT R) = Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pulledBack hT R)

The generated σ-algebra of the (Fekete) pullback partition coincides with that of the (factor) pulledBack partition: both have the cells i ↦ T⁻¹' R.cells i, and the generated σ-algebra depends only on the range of the cells.

source

theorem Oseledets.Krieger.generatedSigmaAlgebra_pullback_ksJoin_mono {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) :

Monotone fun (k : ℕ) => Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P k))

The conditioning σ-algebras σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) of the Breiman telescoping form an increasing sequence in k: the (k+1)-step past join refines the k-step one, and T-pullback preserves refinement.

source

theorem Oseledets.Krieger.ksEntropyPartition_eq_condEntropy_iSup {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} [StandardBorelSpace α] [Nonempty ι] (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) :

Entropy.ksEntropyPartition hT P = Entropy.condEntropy μ (⨆ (k : ℕ), Entropy.generatedSigmaAlgebra μ (Entropy.MeasurePartition.pullback hT (Entropy.ksJoin hT P k))) P.cells

The sharp Kolmogorov–Sinai rate as a conditional entropy (integral-level SMB rate). The Kolmogorov–Sinai entropy h(P,T) equals the conditional Shannon entropy of P given the σ-algebra of the strict future ⨆ₖ σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) = σ(⋁_{j≥1} T⁻ʲP): h(P,T) = H(P | ⋁_{j≥1} T⁻ʲP).

This is the integral-level statement of the sharp SMB rate (the μ-mean of the pointwise Breiman identity iₙ/n → h), and the precise identity ∫ g = h that the pointwise a.e. theorem converges to. It is assembled, unconditionally (no ergodicity, no maximal-function domination), from the telescoped Breiman sum ksEntropySeq_eq_sum_condEntropy, the fixed-partition Lévy theorem condEntropy_tendsto_iSup applied to the increasing conditioning family (generatedSigmaAlgebra_pullback_ksJoin_mono), and the Cesàro convergence Filter.Tendsto.cesaro — the average of a convergent sequence has the same limit — matched against the Fekete limit tendsto_ksEntropySeq.

Documentation

Oseledets.Krieger.SMBSharp

The sharp Shannon–McMillan–Breiman theorem (Breiman/Bruin telescoping) #

Results proved here (sorry-free) #

The remaining pointwise residual (precise) #

References #