The Shannon–McMillan–Breiman theorem (entropy equipartition) #
This file builds towards the pointwise Shannon–McMillan–Breiman (SMB) theorem — the
entropy-equipartition property that underlies Krieger's finite-generator theorem (issue #15).
For a measure-preserving T on a probability space (α, μ) and a finite measurable partition P,
the information functions iₙ(x) = -log μ(atomₙ(x)) (built in Oseledets.Krieger.InfoFunction)
satisfy
(1/n)·iₙ(x) → h(P,T) = ksEntropyPartition for μ-a.e. x.
What is proved here (sorry-free) #
Oseledets.Krieger.ae_limsup_div_infoFun_le_log_card— the crude name-count upper bound:limsup (1/n)·iₙ(x) ≤ log (card ι)a.e. This is the Algoet–Cover engine (ae_forall_eventually_div_infoFun_le) fed the uniform competing measureqₙ ≡ (card ι)⁻ⁿ, whose partition-function bound∫⁻ exp(iₙ − n·log N) ≤ 1(lintegral_exp_infoFun_sub_log_card_le_one) is the Markov–Borel–Cantelli core. No ergodic theorem, no martingale.
The crude bound is the honest Birkhoff-free part of the SMB upper half. Sharpening the rate from
log (card ι) to the Kolmogorov–Sinai entropy h(P,T) — and proving the matching lower bound —
requires the conditional-information martingale and the Birkhoff ergodic theorem; that is the
content of the blueprint below.
Blueprint for the sharp theorem (the Breiman/ELW route) #
The cleanest route (Einsiedler–Lindenstrauss–Ward, Entropy in Ergodic Theory, Ch. 2; Bruin,
Ergodic Theory I, Lecture 15; Breiman 1957/Chung 1961) is a single telescoping identity rather
than a two-sided sandwich. Write the conditional information function
I_{P|𝒜}(x) = -log (μ⟦P(x) | 𝒜⟧)(x) and gₖ(x) = I_{P | ⋁_{j=1}^{k-1} T⁻ʲP}(x) (with g₁ = I_P).
The chain rule I_{P∨Q} = I_Q + I_{P|Q} telescopes to the exact identity
iₙ(x) = ∑_{j=0}^{n-1} g_{n-j}(Tʲ x). [Bruin Lec. 15; ELW Ch. 2]
Let g = lim_k gₖ (a.e. and in L¹, by Lévy downward martingale convergence). Then
(1/n)·iₙ(x) = (1/n)∑_{j<n} g(Tʲx) + (1/n)∑_{j<n} (g_{n-j} − g)(Tʲx).
- The first term
→ ∫ g dμ = H(P | ⋁_{j≥1} T⁻ʲP) = h(P,T)by Birkhoff (repo:tendsto_birkhoffAverage_ae_integral) and the conditional-entropy formula forh. - The second (Cesàro tail)
→ 0via the Chung dominationg* := supₙ gₙ ∈ L¹, a maximal dominatorG_N = sup_{k≥N}|gₖ − g| → 0, and dominated convergence + the ergodic theorem.
The convergence gₖ → g together with ∫ gₖ = H(P | ⋁₁^{k-1} T⁻ʲP) → h is exactly the repo's
condEntropy_tendsto_iSup (the fixed-partition Lévy theorem) once I_{P|𝒜} is in place.
Dependency-ordered residual sub-lemmas (NOT proved here) #
The single genuinely missing piece of infrastructure is the pointwise conditional information
function I_{P|𝒜} and its chain rule / telescoping identity; everything else is assembled from
existing repo results. See the module note at the bottom of this file for the precise Lean
signatures and the honest assessment of the hardest residual (Chung's L¹ domination).
References #
- P. Algoet, T. Cover, A sandwich proof of the Shannon–McMillan–Breiman theorem, Ann. Probab. 16 (1988), 899–909.
- M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, Ch. 2 (SMB).
- H. Bruin, Ergodic Theory I (Univ. Wien), Lecture 15 (the telescoping/Breiman proof).
- L. Breiman, The individual ergodic theorem of information theory, Ann. Math. Statist. 28 (1957), 809–811; correction 31 (1960), 809–810.
- K. L. Chung, A note on the ergodic theorem of information theory, Ann. Math. Statist.
32 (1961), 612–614. (The
L¹maximal domination.)
The uniform partition-function bound. Feeding the Algoet–Cover engine the uniform
competing measure qₙ ≡ (card ι)⁻ⁿ amounts to checking
∫⁻ x, ofReal (exp (iₙ x − n·log N)) ∂μ ≤ 1, where N = card ι.
The integrand is constant on each itinerary fiber Fₘ = {itinerary = g}
(infoFun_eq_sum_indicator): there iₙ = -log μ(cell g), so the value is
(μ cell g)⁻¹ · N⁻ⁿ, and μ(Fₘ) = μ(cell g) (measure_itinerary_fiber). Hence the integral is
∑_g [μ(cell g) > 0] · N⁻ⁿ ≤ Nⁿ · N⁻ⁿ = 1 because the join has at most Nⁿ non-null cells.
Crude name-count upper bound (Algoet–Cover engine, uniform competing measure).
For μ-almost every x, limsup_{n} (1/n)·iₙ(x) ≤ log (card ι).
This is the Markov–Borel–Cantelli core of the SMB upper half: instantiate the abstract engine
ae_forall_eventually_div_infoFun_le at the partition's information functions and rate
R = log (card ι), with the uniform partition-function bound
lintegral_exp_infoFun_sub_log_card_le_one. It is Birkhoff-free; sharpening
log (card ι) ⤳ h(P,T) is the blueprint above.
Blueprint residuals — precise Lean signatures for the sharp theorem #
The sharp SMB theorem is reduced to the following sub-lemmas, in dependency order. None is proved
here; each is stated as the exact signature a follow-up should fill, with the cleanest known route.
The hardest is R5 (Chung's L¹ domination); the rest are mechanical given the repo infra.
R1 (conditional information function). Define, for a sub-σ-algebra 𝒜 ≤ mα,
condInfoFun 𝒜 P x := -Real.log ((condExpKernel μ 𝒜 x) (P.cells (P-index of x))).toReal
(equivalently -log (μ⟦P(x) | 𝒜⟧ x)). Prove measurability and
∫ condInfoFun 𝒜 P = condEntropy μ 𝒜 P.cells (mirrors integral_infoFun_eq).
R2 (chain rule / telescoping). infoFun hT P n x = ∑ j in Finset.range n, condInfoFun (σ of the (n-1-j)-step *future* join) P (T^[j] x) — the Bruin/ELW identity
iₙ(x) = ∑_{j<n} g_{n-j}(Tʲx). Pure measure algebra from I_{P∨Q} = I_Q + I_{P|Q}.
R3 (Lévy limit). gₖ := condInfoFun (⋁_{1}^{k-1} T⁻ʲP) P → g a.e. and in L¹, where
g := condInfoFun (⋁_{j≥1} T⁻ʲP) P. This is MeasureTheory.tendsto_ae_condExp composed with
-log, plus condEntropy_tendsto_iSup for the L¹/integral statement.
R4 (Birkhoff term). (1/n)∑_{j<n} g(Tʲx) → ∫ g dμ = H(P | ⋁_{j≥1}T⁻ʲP) = h(P,T) a.e., from
tendsto_birkhoffAverage_ae_integral and the KS conditional-entropy formula
ksEntropyPartition hT P = condEntropy μ (⋁_{j≥1} comap (T^[j]) σP) P.cells
(itself condEntropy_tendsto_iSup + the chain-rule telescoping of ksEntropySeq).
R5 (Chung domination — HARDEST). g* := ⨆ₙ gₙ ∈ L¹(μ), hence by dominated convergence the
Cesàro tail (1/n)∑_{j<n}(g_{n-j} − g)(Tʲx) → 0 a.e. This is the one genuinely analytic gap:
Mathlib has Doob's maximal inequality (maximal_ineq) but NOT the L log L/L¹ integrability of
the conditional-information maximal function for a finite partition (Chung 1961). Best route:
the explicit Chung estimate μ{g* > λ} ≤ (something)·e^{-λ} per cell, giving
∫ g* ≤ H(P) + (card ι)/e or similar; this needs a fresh ≈150-line development.
SMB (assembly). ∀ᵐ x, Tendsto (fun n => iₙ x / n) atTop (𝓝 (ksEntropyPartition hT P)),
by R2 ▸ R3 ▸ R4 ▸ R5 and a Cesàro/squeeze.
Why this is far more tractable here than general SMB (key infrastructure finding) #
In the Breiman route the conditioning σ-algebra is always σ(ksJoin hT P k) — a partition
σ-algebra, never a general sub-σ-algebra (except in the single k → ∞ Lévy limit). For a partition
σ-algebra the repo already has an explicit, computable representative of the conditional
expectation: Oseledets.Entropy.condCandidate B tⱼ (CondGivenPartitionBridge.lean) is the
piecewise-constant function ω ↦ μ(Bᵢ ∩ tⱼ)/μ(Bᵢ) on cell Bᵢ ∋ ω, with
condCandidate_ae_eq_condExp proving it is μ⟦tⱼ | σ(B)⟧. Consequently:
R1condInfoFun (σ(ksJoin Q k)) Pis just-logof the finite indicator sumcondCandidate— measurable on the nose, no generalcondExpmachinery;∫ = condEntropyiscondEntropyGivenPartition_eq_condEntropy_generated(already proved).R2the chain-rule telescoping usesentropy_join_eq_add_condEntropyGivenPartition/condEntropy_join_eq(already proved,CondChainRule.lean) lifted to the pointwise information functions — finite measure algebra on the explicitcondCandidate.R4the KS conditional-entropy identity is essentiallytendsto_condEntropy_genJoin_div(CondKSMovingLimit.lean, proved), which already showsH(A_n | σ(B_n))/n → condKsEntropyPartition; withQ = Pand the saturation⨆ₙ σ(ksJoin P n) = mα(a generator hypothesis) this isksEntropyPartition hT P = condEntropy μ (⋁_{j≥1}…) P.cells.
So the only genuinely missing analysis is R5 (Chung's L¹ domination of the conditional-
information maximal function) plus the mechanical pointwise telescoping bookkeeping — a sharply
delimited gap, NOT a from-scratch SMB.
Ergodicity (precise hypothesis for the constant limit) #
The a.e. limit = h = ksEntropyPartition (a constant) requires Ergodic T μ; the crude bound
above (ae_limsup_div_infoFun_le_log_card) needs neither ergodicity nor Birkhoff. Without
ergodicity the Birkhoff term R4 converges to the invariant conditional expectation
(μ[g | invariants T]) (the relative entropy rate), giving Algoet–Cover's non-ergodic
Theorem 3 form ∀ᵐ x, iₙ x / n → (μ[g | invariants T]) x.