Pointwise Shannon–McMillan–Breiman: the main (Birkhoff) term #
This file carries the pointwise Shannon–McMillan–Breiman theorem past the integral-level
rate identity ksEntropyPartition_eq_condEntropy_iSup (proved in SMBSharp) towards the a.e.
limit (1/n)·iₙ(x) → h(P,T) for an ergodic measure-preserving T.
The Breiman split of the information function iₙ(x) = ∑_{j<n} g_{n-j}(Tʲx) is compared with the
Birkhoff sum of the limit conditional information function
g∞(x) = ∑ᵢ 𝟙_{Pᵢ}(x) · (-log μ⟦Pᵢ | 𝒞∞⟧(x)),
where 𝒞∞ = ⨆ₖ σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) is the strict-future σ-algebra (the conditioning σ-algebra of
the sharp rate identity). This file establishes the R3/R4 a.e. main term:
condInfoFun— the conditional information functiong𝒜(x) = ∑ᵢ 𝟙_{Pᵢ}(x)·(-log μ⟦Pᵢ|𝒜⟧(x)), with its measurability (measurable_condInfoFun), nonnegativity (condInfoFun_nonneg), and the keystone integral identityintegral_condInfoFun_eq_condEntropy : ∫ g𝒜 = H(P | 𝒜).integrable_condInfoFun—g𝒜 ∈ L¹(μ)(its integral is the finiteH(P|𝒜), and it is≥ 0).integral_condInfoFun_iSup_eq—∫ g∞ = h(P,T), identifying the Birkhoff target as the sharp KS rate (viaksEntropyPartition_eq_condEntropy_iSup).ae_tendsto_birkhoffAverage_condInfoFun_iSup— R4: for ergodicT, the Birkhoff averages ofg∞converge a.e. to∫ g∞ = h(P,T).
The keystone integral identity is the per-cell pull-out ∫ 𝟙_{Pᵢ}·(-log pᵢ) = ∫ negMulLog pᵢ
where pᵢ = μ⟦Pᵢ | 𝒜⟧: since -log pᵢ is 𝒜-measurable (a function of the conditional kernel),
replacing 𝟙_{Pᵢ} by its 𝒜-conditional expectation pᵢ leaves the integral unchanged
(condExp_stronglyMeasurable_mul_of_bound), and negMulLog pᵢ = pᵢ·(-log pᵢ). The unboundedness
of -log at pᵢ = 0 is handled by a monotone truncation hₘ = min(-log pᵢ, M):
∫ 𝟙_{Pᵢ}·hₘ = ∫ pᵢ·hₘ for each M, and both sides increase to the (finite, by
negMulLog ≤ e⁻¹) limit by monotone convergence in lintegral.
References #
- L. Breiman, The individual ergodic theorem of information theory, Ann. Math. Statist. 28 (1957), 809–811; correction 31 (1960), 809–810.
- K. L. Chung, A note on the ergodic theorem of information theory, Ann. Math. Statist. 32 (1961), 612–614.
- M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, Ch. 2 (SMB).
The conditional information function of the finite partition P given the sub-σ-algebra
𝒜: g𝒜(x) = ∑ᵢ 𝟙_{Pᵢ}(x)·(-log μ⟦Pᵢ | 𝒜⟧(x)), the surprise of learning a point's P-cell once
the information in 𝒜 is known. Here μ⟦Pᵢ | 𝒜⟧(x) is realized by the regular conditional
probability (condExpKernel μ 𝒜 x) (Pᵢ). Exactly one indicator survives at each x (the one for
its own cell), so g𝒜 is the pointwise limit, as 𝒜 ↑ 𝒞∞, of the per-step information functions
in the Breiman telescoping.
Equations
- Oseledets.Krieger.condInfoFun P x = ∑ i : ι, (P.cells i).indicator (fun (y : α) => -Real.log (((ProbabilityTheory.condExpKernel μ 𝒜) y) (P.cells i)).toReal) x
Instances For
The conditional information function is measurable: a finite sum of indicators of measurable
cells, each weighted by the measurable function x ↦ -log(condExpKernel μ 𝒜 x (Pᵢ)).toReal.
The conditional information function is nonnegative: each indicator term is 𝟙_{Pᵢ}·(-log pᵢ)
with pᵢ ∈ [0,1], so -log pᵢ ≥ 0.
The keystone integral identity. The conditional information function of P given 𝒜
integrates to the conditional Shannon entropy H(P | 𝒜):
∫ condInfoFun 𝒜 P = condEntropy μ 𝒜 P.cells.
Summing the per-cell pull-out integral_indicator_neg_log_eq_integral_negMulLog over the finite
index recovers exactly the condEntropy integrand ∑ᵢ negMulLog(condExpKernel μ 𝒜 · Pᵢ).toReal.
The conditional information function is μ-integrable: a finite sum of the integrable per-cell
weighted indicators.
The strict-future conditioning σ-algebra 𝒞∞ = ⨆ₖ σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) of the sharp SMB rate.
Equations
- Oseledets.Krieger.futureSigma hT P = ⨆ (k : ℕ), Oseledets.Entropy.generatedSigmaAlgebra μ (Oseledets.Entropy.MeasurePartition.pullback hT (Oseledets.Entropy.ksJoin hT P k))
Instances For
The Birkhoff target equals the sharp KS rate. The limit conditional information function
g∞ = condInfoFun 𝒞∞ P integrates to the Kolmogorov–Sinai entropy h(P,T):
∫ g∞ = condEntropy μ 𝒞∞ P.cells = ksEntropyPartition hT P.
Combines the keystone identity with ksEntropyPartition_eq_condEntropy_iSup.
R4: the Birkhoff main term converges a.e. to h(P,T). For an ergodic measure-preserving
T, the Birkhoff averages of the limit conditional information function g∞ = condInfoFun 𝒞∞ P
converge μ-a.e. to ∫ g∞ = ksEntropyPartition hT P. This is the pointwise ergodic theorem
(tendsto_birkhoffAverage_ae_integral) applied to the integrable g∞
(integrable_condInfoFun), with the integral value supplied by
integral_condInfoFun_futureSigma_eq.
R5: Chung's L¹ maximal domination #
The Cesàro tail (1/n)∑_{j<n}(g_{n-j} − g∞)(Tʲx) → 0 of the Breiman split is killed by the Chung
maximal function g* = ⨆ₖ gₖ, where gₖ = condInfoFun (𝒞ₖ) P and
𝒞ₖ = σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) is the increasing conditioning family. The genuinely analytic content
is g* ∈ L¹(μ), which follows from the per-cell stopping-time tail estimate
μ{x ∈ Pᵢ : g* x > λ} ≤ e^{−λ} (Chung 1961) and the layer-cake formula, giving
∫ g* ≤ H(P) + 1.
This section delivers:
condLevelSigma,condInfoMaxFun— the conditioning family and the (ℝ≥0∞-valued) maximal function, with measurability (measurable_condInfoMaxFun).lintegral_min_meas_exp_le— the per-cell layer-cake estimate∫⁻_{(0,∞)} min(μ Pᵢ, e^{−t}) dt ≤ ofReal(negMulLog (μ Pᵢ).toReal) + μ Pᵢ.lintegral_condInfoMaxFun_le_of_layer— the R5L¹bound, sorry-free given the per-cell tail hypothesischungTail:∫⁻ g* ≤ ofReal (entropy μ P.cells) + 1.
The two named residual leaves (the precise missing Mathlib pieces) are:
chungTail— the Doob stopping-time tailμ{x ∈ Pᵢ : λ < g* x} ≤ ofReal e^{−λ}(a Markov bound on the conditional-probability martingalepₖ = μ⟦Pᵢ | 𝒞ₖ⟧, on the stopping timeτ = inf{k : pₖ < e^{−λ}}); and- the Maker/Breiman dominated-Cesàro step
(1/n)∑_{j<n}(g_{n-j} − g∞)(Tʲ·) → 0a.e. fromg* ∈ L¹andgₖ → g∞a.e. (not in Mathlib).
The k-th conditioning σ-algebra 𝒞ₖ = σ(T⁻¹(⋁₀ᵏ⁻¹ T⁻ʲP)) of the Breiman telescoping.
Equations
Instances For
The Chung maximal information function g* x = ⨆ₖ ofReal (gₖ x) (in ℝ≥0∞), where
gₖ = condInfoFun (𝒞ₖ) P. Working in ℝ≥0∞ makes the supremum total (it may be ∞) and feeds
the layer-cake formula directly.
Equations
- Oseledets.Krieger.condInfoMaxFun hT P x = ⨆ (k : ℕ), ENNReal.ofReal (Oseledets.Krieger.condInfoFun P x)
Instances For
The maximal information function is measurable: a countable supremum of the measurable
ofReal ∘ gₖ.
Per-cell layer-cake estimate. For a measure a ∈ [0,1], the layer-cake integrand
min(a, e^{−t}) over (0,∞) integrates to at most negMulLog a + a (equality for a ∈ (0,1]).
This is the 1-D real-analysis core of the Chung bound ∫ g* ≤ H(P) + 1: split (0,∞) at
c = −log a, where min = a below c (contributing a·(−log a) = negMulLog a) and min = e^{−t}
above (contributing e^{−c} = a).
The Chung tail integral is H(P) + 1. Summing the per-cell layer-cake estimate over the
finite partition, the layer-cake tail ∫⁻_{(0,∞)} ∑ᵢ min(μ Pᵢ, e^{−t}) dt is at most
ofReal(entropy μ P.cells) + 1: each cell contributes negMulLog(μ Pᵢ) + μ Pᵢ, and the masses sum
to μ(univ) = 1 while the negMulLog terms sum to the Shannon entropy of P.
R5: the Chung L¹ maximal bound ∫ g* ≤ H(P) + 1, reduced to the layer-cake tail leaf.
Given the layer-cake tail hypothesis hlayer
∫⁻ g* ≤ ∫⁻_{(0,∞)} ∑ᵢ min(μ Pᵢ, e^{−t}) dt,
the maximal information function g* = condInfoMaxFun hT P has ∫⁻ g* ≤ ofReal(H(P)) + 1 < ∞,
hence is in L¹. The bound is closed sorry-free by lintegral_tail_sum_le.
The hypothesis hlayer is exactly what the per-cell Chung stopping-time tail
μ{x ∈ Pᵢ : λ < g* x} ≤ e^{−λ} delivers through the layer-cake formula
(MeasureTheory.lintegral_eq_lintegral_meas_le) and the union bound over the finitely many cells:
μ{t ≤ g*} = ∑ᵢ μ{x ∈ Pᵢ : t ≤ g*} ≤ ∑ᵢ min(μ Pᵢ, e^{−t}) (each cell-tail is ≤ μ Pᵢ trivially and
≤ e^{−t} by Chung). Proving hlayer from the partition structure is the one genuinely missing
Mathlib piece (the Doob/Markov bound on the conditional-probability martingale pₖ = μ⟦Pᵢ | 𝒞ₖ⟧
along the stopping time τ = inf{k : pₖ < e^{−λ}}).
The pointwise Shannon–McMillan–Breiman theorem: full structure and remaining leaves #
This is the assembly of the pointwise SMB theorem (1/n)·iₙ(x) → h(P,T) from the proved pieces and
the two precisely-isolated remaining leaves. It records the dependency structure as an honest
theorem taking the two leaves as hypotheses, so the reduction is machine-checked even though the
leaves are not yet discharged.
The Breiman telescoping (SMBSharp.infoWeight_succ_eq) gives iₙ(x) = ∑_{j<n} g_{n−j}(Tʲx) a.e.,
so (1/n)·iₙ(x) = A_n(g∞)(x) + (1/n)∑_{j<n}(g_{n−j} − g∞)(Tʲx), where:
- the main term
A_n(g∞)(x) → h(P,T)a.e. is R4, proved:ae_tendsto_birkhoffAverage_condInfoFun_futureSigma; - the Cesàro tail
→ 0a.e. is the content of the two leaves below.
Pointwise SMB, assembled from the single remaining (Maker/Chung) leaf. For ergodic T,
the information-function averages (1/n)·iₙ(x) converge μ-a.e. to
h(P,T) = ksEntropyPartition hT P, given the one residual leaf
hTail— the Maker/Breiman dominated-Cesàro vanishing of the Cesàro tailiₙ(x)/n − A_n(g∞)(x) → 0a.e., whoseL¹domination is the Chung boundlintegral_condInfoMaxFun_le_of_layer.
The proof adds the R4 main-term limit (ae_tendsto_birkhoffAverage_condInfoFun_futureSigma,
A_n(g∞)(x) → ∫ g∞ = h(P,T)) to the vanishing tail and rewrites iₙ/n = A_n(g∞) + tail. Here
infoFun_n n plays the role of the information function iₙ (Breiman's iₙ(x) = ∑_{j<n} g_{n−j}(Tʲx), SMBSharp.infoWeight_succ_eq); the statement is iₙ-agnostic since the only
property used is the tail decomposition. Everything but hTail (condInfoFun, its integral = h,
R4, the Chung L¹ bound reduced to its tail leaf) is proved sorry-free above.