The Shannon–McMillan–Breiman theorem (entropy equipartition) #

This file builds towards the pointwise Shannon–McMillan–Breiman (SMB) theorem — the entropy-equipartition property that underlies Krieger's finite-generator theorem (issue #15). For a measure-preserving T on a probability space (α, μ) and a finite measurable partition P, the information functions iₙ(x) = -log μ(atomₙ(x)) (built in Oseledets.Krieger.InfoFunction) satisfy (1/n)·iₙ(x) → h(P,T) = ksEntropyPartition for μ-a.e. x.

What is proved here (sorry-free) #

Oseledets.Krieger.ae_limsup_div_infoFun_le_log_card — the crude name-count upper bound: limsup (1/n)·iₙ(x) ≤ log (card ι) a.e. This is the Algoet–Cover engine (ae_forall_eventually_div_infoFun_le) fed the uniform competing measure qₙ ≡ (card ι)⁻ⁿ, whose partition-function bound ∫⁻ exp(iₙ − n·log N) ≤ 1 (lintegral_exp_infoFun_sub_log_card_le_one) is the Markov–Borel–Cantelli core. No ergodic theorem, no martingale.

The crude bound is the honest Birkhoff-free part of the SMB upper half. Sharpening the rate from log (card ι) to the Kolmogorov–Sinai entropy h(P,T) — and proving the matching lower bound — requires the conditional-information martingale and the Birkhoff ergodic theorem; that is the content of the blueprint below.

Blueprint for the sharp theorem (the Breiman/ELW route) #

The cleanest route (Einsiedler–Lindenstrauss–Ward, Entropy in Ergodic Theory, Ch. 2; Bruin, Ergodic Theory I, Lecture 15; Breiman 1957/Chung 1961) is a single telescoping identity rather than a two-sided sandwich. Write the conditional information function I_{P|𝒜}(x) = -log (μ⟦P(x) | 𝒜⟧)(x) and gₖ(x) = I_{P | ⋁_{j=1}^{k-1} T⁻ʲP}(x) (with g₁ = I_P). The chain rule I_{P∨Q} = I_Q + I_{P|Q} telescopes to the exact identity

iₙ(x) = ∑_{j=0}^{n-1} g_{n-j}(Tʲ x). [Bruin Lec. 15; ELW Ch. 2]

Let g = lim_k gₖ (a.e. and in L¹, by Lévy downward martingale convergence). Then (1/n)·iₙ(x) = (1/n)∑_{j<n} g(Tʲx) + (1/n)∑_{j<n} (g_{n-j} − g)(Tʲx).

The first term → ∫ g dμ = H(P | ⋁_{j≥1} T⁻ʲP) = h(P,T) by Birkhoff (repo: tendsto_birkhoffAverage_ae_integral) and the conditional-entropy formula for h.
The second (Cesàro tail) → 0 via the Chung domination g* := supₙ gₙ ∈ L¹, a maximal dominator G_N = sup_{k≥N}|gₖ − g| → 0, and dominated convergence + the ergodic theorem.

The convergence gₖ → g together with ∫ gₖ = H(P | ⋁₁^{k-1} T⁻ʲP) → h is exactly the repo's condEntropy_tendsto_iSup (the fixed-partition Lévy theorem) once I_{P|𝒜} is in place.

Dependency-ordered residual sub-lemmas (NOT proved here) #

The single genuinely missing piece of infrastructure is the pointwise conditional information function I_{P|𝒜} and its chain rule / telescoping identity; everything else is assembled from existing repo results. See the module note at the bottom of this file for the precise Lean signatures and the honest assessment of the hardest residual (Chung's L¹ domination).

References #

P. Algoet, T. Cover, A sandwich proof of the Shannon–McMillan–Breiman theorem, Ann. Probab. 16 (1988), 899–909.
M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, Ch. 2 (SMB).
H. Bruin, Ergodic Theory I (Univ. Wien), Lecture 15 (the telescoping/Breiman proof).
L. Breiman, The individual ergodic theorem of information theory, Ann. Math. Statist. 28 (1957), 809–811; correction 31 (1960), 809–810.
K. L. Chung, A note on the ergodic theorem of information theory, Ann. Math. Statist. 32 (1961), 612–614. (The L¹ maximal domination.)

source

theorem Oseledets.Krieger.lintegral_exp_infoFun_sub_log_card_le_one {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (n : ℕ) [Nonempty ι] :

∫⁻ (x : α), ENNReal.ofReal (Real.exp (infoFun hT P n x - ↑n * Real.log ↑(Fintype.card ι))) ∂μ ≤ 1

The uniform partition-function bound. Feeding the Algoet–Cover engine the uniform competing measure qₙ ≡ (card ι)⁻ⁿ amounts to checking ∫⁻ x, ofReal (exp (iₙ x − n·log N)) ∂μ ≤ 1, where N = card ι.

The integrand is constant on each itinerary fiber Fₘ = {itinerary = g} (infoFun_eq_sum_indicator): there iₙ = -log μ(cell g), so the value is (μ cell g)⁻¹ · N⁻ⁿ, and μ(Fₘ) = μ(cell g) (measure_itinerary_fiber). Hence the integral is ∑_g [μ(cell g) > 0] · N⁻ⁿ ≤ Nⁿ · N⁻ⁿ = 1 because the join has at most Nⁿ non-null cells.

source

theorem Oseledets.Krieger.ae_limsup_div_infoFun_le_log_card {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) [Nonempty ι] :

∀ᵐ (x : α) ∂μ, Filter.limsup (fun (n : ℕ) => 1 / ↑n * infoFun hT P n x) Filter.atTop ≤ Real.log ↑(Fintype.card ι)

Crude name-count upper bound (Algoet–Cover engine, uniform competing measure). For μ-almost every x, limsup_{n} (1/n)·iₙ(x) ≤ log (card ι).

This is the Markov–Borel–Cantelli core of the SMB upper half: instantiate the abstract engine ae_forall_eventually_div_infoFun_le at the partition's information functions and rate R = log (card ι), with the uniform partition-function bound lintegral_exp_infoFun_sub_log_card_le_one. It is Birkhoff-free; sharpening log (card ι) ⤳ h(P,T) is the blueprint above.

Blueprint residuals — precise Lean signatures for the sharp theorem #

The sharp SMB theorem is reduced to the following sub-lemmas, in dependency order. None is proved here; each is stated as the exact signature a follow-up should fill, with the cleanest known route. The hardest is R5 (Chung's L¹ domination); the rest are mechanical given the repo infra.

R1 (conditional information function). Define, for a sub-σ-algebra 𝒜 ≤ mα, condInfoFun 𝒜 P x := -Real.log ((condExpKernel μ 𝒜 x) (P.cells (P-index of x))).toReal (equivalently -log (μ⟦P(x) | 𝒜⟧ x)). Prove measurability and ∫ condInfoFun 𝒜 P = condEntropy μ 𝒜 P.cells (mirrors integral_infoFun_eq).

R2 (chain rule / telescoping). infoFun hT P n x = ∑ j in Finset.range n, condInfoFun (σ of the (n-1-j)-step *future* join) P (T^[j] x) — the Bruin/ELW identity iₙ(x) = ∑_{j<n} g_{n-j}(Tʲx). Pure measure algebra from I_{P∨Q} = I_Q + I_{P|Q}.

R3 (Lévy limit). gₖ := condInfoFun (⋁_{1}^{k-1} T⁻ʲP) P → g a.e. and in L¹, where g := condInfoFun (⋁_{j≥1} T⁻ʲP) P. This is MeasureTheory.tendsto_ae_condExp composed with -log, plus condEntropy_tendsto_iSup for the L¹/integral statement.

R4 (Birkhoff term). (1/n)∑_{j<n} g(Tʲx) → ∫ g dμ = H(P | ⋁_{j≥1}T⁻ʲP) = h(P,T) a.e., from tendsto_birkhoffAverage_ae_integral and the KS conditional-entropy formula ksEntropyPartition hT P = condEntropy μ (⋁_{j≥1} comap (T^[j]) σP) P.cells (itself condEntropy_tendsto_iSup + the chain-rule telescoping of ksEntropySeq).

R5 (Chung domination — HARDEST). g* := ⨆ₙ gₙ ∈ L¹(μ), hence by dominated convergence the Cesàro tail (1/n)∑_{j<n}(g_{n-j} − g)(Tʲx) → 0 a.e. This is the one genuinely analytic gap: Mathlib has Doob's maximal inequality (maximal_ineq) but NOT the L log L/L¹ integrability of the conditional-information maximal function for a finite partition (Chung 1961). Best route: the explicit Chung estimate μ{g* > λ} ≤ (something)·e^{-λ} per cell, giving ∫ g* ≤ H(P) + (card ι)/e or similar; this needs a fresh ≈150-line development.

SMB (assembly). ∀ᵐ x, Tendsto (fun n => iₙ x / n) atTop (𝓝 (ksEntropyPartition hT P)), by R2 ▸ R3 ▸ R4 ▸ R5 and a Cesàro/squeeze.

Why this is far more tractable here than general SMB (key infrastructure finding) #

In the Breiman route the conditioning σ-algebra is always σ(ksJoin hT P k) — a partition σ-algebra, never a general sub-σ-algebra (except in the single k → ∞ Lévy limit). For a partition σ-algebra the repo already has an explicit, computable representative of the conditional expectation: Oseledets.Entropy.condCandidate B tⱼ (CondGivenPartitionBridge.lean) is the piecewise-constant function ω ↦ μ(Bᵢ ∩ tⱼ)/μ(Bᵢ) on cell Bᵢ ∋ ω, with condCandidate_ae_eq_condExp proving it is μ⟦tⱼ | σ(B)⟧. Consequently:

R1 condInfoFun (σ(ksJoin Q k)) P is just -log of the finite indicator sum condCandidate — measurable on the nose, no general condExp machinery; ∫ = condEntropy is condEntropyGivenPartition_eq_condEntropy_generated (already proved).
R2 the chain-rule telescoping uses entropy_join_eq_add_condEntropyGivenPartition / condEntropy_join_eq (already proved, CondChainRule.lean) lifted to the pointwise information functions — finite measure algebra on the explicit condCandidate.
R4 the KS conditional-entropy identity is essentially tendsto_condEntropy_genJoin_div (CondKSMovingLimit.lean, proved), which already shows H(A_n | σ(B_n))/n → condKsEntropyPartition; with Q = P and the saturation ⨆ₙ σ(ksJoin P n) = mα (a generator hypothesis) this is ksEntropyPartition hT P = condEntropy μ (⋁_{j≥1}…) P.cells.

So the only genuinely missing analysis is R5 (Chung's L¹ domination of the conditional- information maximal function) plus the mechanical pointwise telescoping bookkeeping — a sharply delimited gap, NOT a from-scratch SMB.

Ergodicity (precise hypothesis for the constant limit) #

The a.e. limit = h = ksEntropyPartition (a constant) requires Ergodic T μ; the crude bound above (ae_limsup_div_infoFun_le_log_card) needs neither ergodicity nor Birkhoff. Without ergodicity the Birkhoff term R4 converges to the invariant conditional expectation (μ[g | invariants T]) (the relative entropy rate), giving Algoet–Cover's non-ergodic Theorem 3 form ∀ᵐ x, iₙ x / n → (μ[g | invariants T]) x.

Documentation

Oseledets.Krieger.SMB