The sharp Shannon–McMillan–Breiman theorem (Breiman/Bruin telescoping) #
This file develops the sharp Shannon–McMillan–Breiman (SMB) theorem, sharpening the crude
name-count upper bound limsup (1/n)·iₙ ≤ log (card ι) of Oseledets.Krieger.SMB to the
Kolmogorov–Sinai entropy rate h(P,T), for an ergodic measure-preserving transformation T
and a finite measurable partition P.
The route is the Breiman/Bruin telescoping identity (Einsiedler–Lindenstrauss–Ward,
Entropy in Ergodic Theory, Ch. 2; Bruin, Ergodic Theory I, Lecture 15). The
conditional information weight of the P-cell i₀ given the past coding g,
condInfoWeight, is the surprise -log (μ(B-cell ∩ Pᵢ₀)/μ(B-cell)) of learning a point's P-cell
is i₀ once its B-coding (the future itinerary) is known. The chain rule
I_{P∨Q} = I_Q + I_{P|Q}
becomes, at the level of these weights, the one-step factorization
infoWeight_{n+1}(cons i₀ g) = infoWeight_n(g) + condInfoWeight i₀ g,
which telescopes to iₙ(x) = ∑_{j<n} g_{n-j}(Tʲx).
Results proved here (sorry-free) #
condInfoWeight— the conditional information weight-log (μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ))and its nonnegativity (condInfoWeight_nonneg).ksJoinCells_cons— the cell cons-factorizationcell_{n+1}(cons i₀ g) = Pᵢ₀ ∩ T⁻¹ cell_n(g).infoWeight_succ_eq— the one-step telescoping factorization of the iterated-join information weight: peeling the first coordinate splitsinfoWeight_{n+1}into then-step weight (shifted byT) plus the conditional weight of the first symbol given theT-shiftedn-step past. The positivity hypothesis on the(n+1)-cell is load-bearing — the pointwise identity fails at codes whose join cell is null (where-log 0 = 0is a junk value), which is precisely why the Breiman identity for the information functions holds onlyμ-a.e.ksEntropySeq_succ_eq_add_condEntropyGivenPartition/ksEntropySeq_succ_eq_add_condEntropy— the integrated (entropy-level) chain ruleH(⋁₀ⁿ) = H(⋁₀ⁿ⁻¹) + H(P | σ(T⁻¹⋁₀ⁿ⁻¹)), theμ-mean ofinfoWeight_succ_eq, with the conditioning in σ-algebra form.ksEntropySeq_eq_sum_condEntropy— the telescoped Breiman sumH(⋁₀ⁿ⁻¹) = ∑_{k<n} H(P | σ(T⁻¹⋁₀ᵏ⁻¹)).ksEntropyPartition_eq_condEntropy_iSup— the sharp Kolmogorov–Sinai rate as a conditional entropy:h(P,T) = H(P | ⋁_{j≥1} T⁻ʲP), the integral-level SMB rate (the∫ g = hthat the pointwise a.e. theorem converges to), proved unconditionally via the telescoped sum + the fixed-partition Lévy theoremcondEntropy_tendsto_iSup+ the Cesàro meanFilter.Tendsto.cesaro.
The remaining pointwise residual (precise) #
What is not yet proved is the pointwise a.e. convergence (1/n)·iₙ(x) → h(P,T) for μ-a.e.
x under Ergodic T μ. By the integral-level identity above the target h is correct; the gap
is the a.e. statement, which decomposes into (with gₖ the conditional information function):
- R3/R4 (a.e. main term).
(1/n)∑_{j<n} g_∞(Tʲx) → ∫ g_∞ = ha.e., fromtendsto_birkhoffAverage_ae_integralandksEntropyPartition_eq_condEntropy_iSup(∫ g_∞ = h). The a.e. Lévy limitgₖ → g_∞isMeasureTheory.tendsto_ae_condExp(used insidecondEntropy_tendsto_iSup). - R5 (Chung
L¹maximal domination — the genuine analytic gap). The Cesàro tail(1/n)∑_{j<n}(g_{n-j} − g_∞)(Tʲx) → 0a.e. needsg* := ⨆ₖ gₖ ∈ L¹(μ). The per-cell maximal estimateμ{x ∈ Pᵢ : g* > λ} ≤ e^{−λ}(a Doob stopping-time bound on the conditional-probability martingalepₖ = E(𝟙_{Pᵢ}|Cₖ): on{τ = first k with pₖ < e^{−λ}} ∈ Cₖ,μ(Pᵢ ∩ {τ=k}) = ∫_{τ=k} pₖ ≤ e^{−λ}μ(τ=k)) gives, by the layer-cake formula,∫ g* ≤ log(card ι) + 1 < ∞. Mathlib has Doob'smaximal_ineqbut not thisL¹integrability; it is a≈150-line development (Chung 1961). This is the one item that blocks the headline.
References #
- M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, Ch. 2 (SMB).
- H. Bruin, Ergodic Theory I (Univ. Wien), Lecture 15 (the telescoping/Breiman proof).
- L. Breiman, The individual ergodic theorem of information theory, Ann. Math. Statist. 28 (1957), 809–811; correction 31 (1960), 809–810.
- K. L. Chung, A note on the ergodic theorem of information theory, Ann. Math. Statist.
32 (1961), 612–614. (The
L¹maximal domination.)
The conditional information weight of the P-cell index i given the B-cell index b:
the surprise -log (μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ)) of learning that a point of Bᵦ also lies in Pᵢ, i.e. the
value of the conditional information function I_{P | σ(B)} on the cell Bᵦ for a point whose
P-cell is Pᵢ. Here B may have a different (finite) index type than P.
Equations
Instances For
The conditional information weight is nonnegative: the conditional probability
μ(Bᵦ ∩ Pᵢ)/μ(Bᵦ) lies in [0,1], so its log is nonpositive.
Cons-factorization of the flat-join cell. The cell of the (n+1)-fold iterated join at the
code Fin.cons i₀ g is the P-cell i₀ (the 0-th symbol) intersected with the T-pullback of
the n-fold join cell at g (the future symbols). This is the cell-level form of
⋁₀ⁿ T⁻ᵏP = P ∨ T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP).
One-step telescoping factorization of the information weight. Peeling the first symbol of
the (n+1)-step coding splits the iterated-join information weight into the n-step weight of the
future coding g plus the conditional information weight of the first symbol i₀ given the
T-pullback of the n-step past:
infoWeight_{n+1}(cons i₀ g) = infoWeight_n(g) + condInfoWeight (T⁻¹(⋁₀ⁿ⁻¹)) P i₀ g.
This is the pure measure-algebra core of the Breiman identity iₙ = ∑_{j<n} g_{n-j}∘Tʲ. It rests
on the cell factorization ksJoinCells_cons, the measure-preservation of T (so the past cell has
the same measure before and after pulling back along T), and the additivity of -log on the
product μ(Pᵢ₀ ∩ T⁻¹C) = (μ(T⁻¹C ∩ Pᵢ₀)/μ(T⁻¹C)) · μ(T⁻¹C).
The positivity hypothesis hpos : μ(Pᵢ₀ ∩ T⁻¹C) ≠ 0 is load-bearing: at a code whose
(n+1)-cell is null but whose n-past cell is not, the -logs do not split additively
(-log 0 = 0 is a junk value), so the identity is false there. This is exactly why the
telescoping identity for the information functions holds only μ-a.e.: the codes realised by
points all have positive-measure cells off a null set.
Integrated (entropy-level) telescoping chain rule. The (n+1)-step join entropy splits as
the n-step join entropy of the future plus the conditional entropy of P given the T-pullback
B := T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP) of the n-step past, additive over the cells of B:
H(⋁₀ⁿ T⁻ᵏP) = H(⋁₀ⁿ⁻¹ T⁻ᵏP) + Hₐdd(P | T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP)).
This is the μ-integral of the pointwise one-step factorization infoWeight_succ_eq, obtained
cleanly from the absolute chain rule entropy_join_eq_add_condEntropyGivenPartition applied to the
join B ∨ P (which, reindexed by g' ↦ (tail g', head g'), is the (n+1)-fold join) together
with the T-invariance of join entropy (entropy_pullback).
Integrated telescoping chain rule, σ-algebra form. Bridging the additive-over-cells
conditional entropy of ksEntropySeq_succ_eq_add_condEntropyGivenPartition to the σ-algebra
conditional entropy via condEntropyGivenPartition_eq_condEntropy_generated, the (n+1)-step join
entropy splits as
H(⋁₀ⁿ T⁻ᵏP) = H(⋁₀ⁿ⁻¹ T⁻ᵏP) + H(P | σ(T⁻¹(⋁₀ⁿ⁻¹ T⁻ᵏP))),
where the conditioning σ-algebra is the one generated by the cells of the T-pullback of the
n-step past join. This is the entropy-level Breiman telescoping with the conditioning expressed in
the form consumed by the fixed-partition Lévy theorem condEntropy_tendsto_iSup.
Telescoped Breiman sum (entropy level). Iterating the σ-algebra chain rule, the n-step
join entropy is the sum of the conditional entropies of P given the T-pullback of the k-step
past, for k = 0, …, n-1:
H(⋁₀ⁿ⁻¹ T⁻ᵏP) = ∑_{k<n} H(P | σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP))).
Each summand H(P | σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP))) is condEntropy μ (σ(T⁻¹ ksJoin P k)) P.cells, the
conditional entropy of P given the k-step future past — a decreasing sequence in k (more
conditioning), converging to the relative entropy rate h(P,T) by condEntropy_tendsto_iSup. This
is the integral-level statement of the Breiman telescoping iₙ = ∑_{j<n} g_{n-j}∘Tʲ (its μ-mean),
and combined with the Fekete limit it identifies h(P,T) = lim_k H(P | σ(T⁻¹(⋁₀ᵏ⁻¹))) (a Cesàro
mean of a convergent sequence converges to the same limit).
The generated σ-algebra of the (Fekete) pullback partition coincides with that of the
(factor) pulledBack partition: both have the cells i ↦ T⁻¹' R.cells i, and the generated
σ-algebra depends only on the range of the cells.
The conditioning σ-algebras σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) of the Breiman telescoping form an
increasing sequence in k: the (k+1)-step past join refines the k-step one, and T-pullback
preserves refinement.
The sharp Kolmogorov–Sinai rate as a conditional entropy (integral-level SMB rate).
The Kolmogorov–Sinai entropy h(P,T) equals the conditional Shannon entropy of P given the
σ-algebra of the strict future ⨆ₖ σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) = σ(⋁_{j≥1} T⁻ʲP):
h(P,T) = H(P | ⋁_{j≥1} T⁻ʲP).
This is the integral-level statement of the sharp SMB rate (the μ-mean of the pointwise Breiman
identity iₙ/n → h), and the precise identity ∫ g = h that the pointwise a.e. theorem
converges to. It is assembled, unconditionally (no ergodicity, no maximal-function domination),
from the telescoped Breiman sum ksEntropySeq_eq_sum_condEntropy, the fixed-partition Lévy theorem
condEntropy_tendsto_iSup applied to the increasing conditioning family
(generatedSigmaAlgebra_pullback_ksJoin_mono), and the Cesàro convergence
Filter.Tendsto.cesaro — the average of a convergent sequence has the same limit — matched against
the Fekete limit tendsto_ksEntropySeq.