The name-count / asymptotic-equipartition covering bound (Krieger M2, C2) #

This file proves the name-count / asymptotic-equipartition (AEP) covering bound that drives the coding construction (C3) of Krieger's finite generator theorem (issue #15). For a finite measurable partition P of a measure-preserving system, with Kolmogorov–Sinai entropy h = ksEntropyPartition hT P, the bound says: for every ε > 0 and every large enough rank N, the rank-N names (atoms of the iterated join ⋁₀ᴺ⁻¹ T⁻ᵏP) needed to cover all but ε of the space number at most ⌊exp(N(h+ε))⌋.

This is the covering form of the Shannon–McMillan–Breiman upper bound, the exact object the coding combinatorics consume to turn log k > h into the existence of a Fin k-valued code.

The two halves of the argument #

The classical AEP covering argument (Walters, An Introduction to Ergodic Theory, Ch. 4; Einsiedler–Lindenstrauss–Ward, Entropy in Ergodic Theory, §2–3; Downarowicz, Entropy in Dynamical Systems, §3.1) splits into:

Pigeonhole count (unconditional, proved here in full). The "good" names — those whose join-cell has measure ≥ exp(−N·R) — number at most exp(N·R). Proof: the cells are pairwise μ-a.e. disjoint, so their total measure is the measure of their union, hence ≤ 1; with each ≥ exp(−N·R), a pigeonhole gives #good · exp(−N·R) ≤ 1. This is card_goodNames_le_exp / card_goodNames_le_exp_entropy.
Covering (the SMB upper half). The good names cover ≥ 1−ε of the space, i.e. the union of the bad cells (measure < exp(−N(h+ε))) is μ-small. This is exactly the convergence in measure μ {x | (1/N)·infoFunₙ(x) > h+ε} → 0, the upper half of the Shannon–McMillan–Breiman theorem in measure. It is parameterized here as the hypothesis UpperSMBInMeasure (a sorry-free Prop), because the unconditional sharp rate h (rather than the crude log (card ι) of Oseledets.Krieger.SMB) is the documented residual R5 of Oseledets.Krieger.SMBSharp — the Chung L¹ maximal-function domination. See the module note at the bottom for the precise minimal analytic input and the cheaper (martingale-free) block-product route that discharges it.

Main definitions #

Oseledets.Krieger.goodNames — the Finset of rank-N names whose cell has measure ≥ exp(−N·R).
Oseledets.Krieger.UpperSMBInMeasure — the in-measure SMB upper bound at rate h, the exact analytic input the covering half needs (the parameterized residual).

Main results #

Oseledets.Krieger.card_goodNames_le_exp — pigeonhole count: #goodNames ≤ exp(N·R).
Oseledets.Krieger.measure_iUnion_goodNames_ge — the good cells have union-measure ≥ μ {x | (1/N)·infoFunₙ ≤ R}, the covering content (also unconditional).
Oseledets.Krieger.exists_cover_names_card_le — the C3-facing covering bound: under UpperSMBInMeasure, for every ε > 0 and all large N there is a Finset S of rank-N names with μ (⋃ g ∈ S, cell g) ≥ 1 − ε and S.card ≤ ⌊exp(N(h+ε))⌋.

References #

P. Walters, An Introduction to Ergodic Theory, GTM 79, Springer (1982), Ch. 4 (entropy, the AEP / covering number of n-names).
M. Einsiedler, E. Lindenstrauss, T. Ward, Entropy in Ergodic Theory and Topological Dynamics, §2 (SMB) and §3 (Krieger generator).
T. Downarowicz, Entropy in Dynamical Systems, Cambridge (2011), §3.1 (AEP, covering bound).

source

noncomputable def Oseledets.Krieger.goodNames {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (N : ℕ) (R : ℝ) :

Finset (Fin N → ι)

The good names of rank N at rate R: the codes g : Fin N → ι whose iterated-join cell ⋂ₖ T⁻ᵏ P_{g k} has measure at least exp(−N·R). These are the names that carry non-negligible mass; the pigeonhole bound card_goodNames_le_exp shows there are at most exp(N·R) of them.

Equations

Oseledets.Krieger.goodNames hT P N R = {g : Fin N → ι | ENNReal.ofReal (Real.exp (-(↑N * R))) ≤ μ ((Oseledets.Entropy.ksJoin hT P N).cells g)}

Instances For

source

theorem Oseledets.Krieger.mem_goodNames {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (N : ℕ) {R : ℝ} {g : Fin N → ι} :

g ∈ goodNames hT P N R ↔ ENNReal.ofReal (Real.exp (-(↑N * R))) ≤ μ ((Entropy.ksJoin hT P N).cells g)

source

theorem Oseledets.Krieger.card_goodNames_le_exp {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (N : ℕ) (R : ℝ) :

↑(goodNames hT P N R).card ≤ Real.exp (↑N * R)

Pigeonhole count of the good names. Since the join cells are pairwise μ-a.e. disjoint, the sum of the measures of the good cells equals the measure of their union, which is at most 1; as each good cell has measure ≥ exp(−N·R), the number of good names is at most exp(N·R).

This is the elementary half of the AEP covering bound: it holds unconditionally (no SMB, no ergodicity), for any rate R. The covering half (that the good names cover most of the space) is the SMB upper bound, parameterized separately.

source

theorem Oseledets.Krieger.measure_nullAtom_eq_zero {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (N : ℕ) :

μ {x : α | μ (atomOf hT P N x) = 0} = 0

The set of points lying in a null rank-N cell. It is μ-null: it is contained in the finite union of the null join-cells, each of measure 0. This is the negligible part on which the information function iₙ(x) = -log μ(atomₙ(x)) is the junk value -log 0 = 0 and so cannot witness good-name membership.

source

theorem Oseledets.Krieger.measure_iUnion_goodNames_ge {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) (N : ℕ) (hN : 1 ≤ N) (R : ℝ) :

μ {x : α | 1 / ↑N * infoFun hT P N x ≤ R} ≤ μ (⋃ g ∈ goodNames hT P N R, (Entropy.ksJoin hT P N).cells g)

Covering content of the good names (unconditional). The union of the good rank-N cells covers at least the set {x | (1/N)·infoFunₙ(x) ≤ R} of points whose N-name information rate is at most R: such a point's own atom has measure ≥ exp(−N·R), so its name is good and it lies in the corresponding good cell. The only exception is the μ-null set of points sitting in a null cell (where infoFunₙ = -log 0 = 0 is a junk value), which is absorbed by measure_nullAtom_eq_zero.

This is the covering half's unconditional content: it reduces the covering bound μ (⋃ good cells) ≥ 1 − ε to the in-measure SMB upper bound μ {x | (1/N)·infoFunₙ > R} ≤ ε.

source

def Oseledets.Krieger.UpperSMBInMeasure {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) :

Prop

The in-measure Shannon–McMillan–Breiman upper bound at the sharp rate h = ksEntropyPartition hT P — the single parameterized analytic input the covering bound needs. It asserts that, for every ε > 0, the measure of the set of points whose N-name information rate exceeds h + ε tends to 0: μ {x | h + ε < (1/N)·infoFunₙ(x)} → 0.

This is strictly weaker than the pointwise a.e. SMB convergence (1/N)·infoFunₙ → h (it is the "in measure / McMillan L¹" form of the upper half only), and it is the exact statement the covering bound consumes. It is the documented residual of Oseledets.Krieger.SMBSharp: the integral-level rate identity ksEntropyPartition_eq_condEntropy_iSup and the Fekete rate tendsto_ksEntropySeq are already proved here; what remains is the concentration (in measure) of (1/N)·infoFunₙ around its mean ksEntropySeq N / N → h.

See the module note below for the cheapest known route to discharge it (the martingale-free block-product competing-measure bound fed to the engine Oseledets.Krieger.ae_forall_eventually_div_infoFun_le, which gives the sharp a.e. upper bound limsup (1/N)·infoFunₙ ≤ h and hence this in-measure form by dominated convergence on the bounded-below indicator).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem Oseledets.Krieger.exists_cover_names_card_le {α : Type u_1} {ι : Type u_2} [mα : MeasurableSpace α] [Fintype ι] {μ : MeasureTheory.Measure α} [MeasureTheory.IsProbabilityMeasure μ] {T : α → α} (hT : MeasureTheory.MeasurePreserving T μ μ) (P : Entropy.MeasurePartition μ ι) [Nonempty ι] (hsmb : UpperSMBInMeasure hT P) {ε : ℝ} (hε : 0 < ε) :

∀ᶠ (N : ℕ) in Filter.atTop , ∃ (S : Finset (Fin N → ι)), 1 - ENNReal.ofReal ε ≤ μ (⋃ g ∈ S, (Entropy.ksJoin hT P N).cells g) ∧ ↑S.card ≤ ↑⌊Real.exp (↑N * (Entropy.ksEntropyPartition hT P + ε))⌋

The C3-facing name-count / AEP covering bound. Assume the in-measure SMB upper bound UpperSMBInMeasure. Then for every ε > 0 and all sufficiently large rank N, there is a Finset S of rank-N names (codes g : Fin N → ι) such that

the union of the corresponding join-cells covers all but ε of the space: 1 − ENNReal.ofReal ε ≤ μ (⋃ g ∈ S, cell g), and
the number of names is at most ⌊exp(N·(h+ε))⌋: S.card ≤ ⌊exp(N·(h+ε))⌋,

where h = ksEntropyPartition hT P. The Finset is the good names goodNames hT P N (h+ε): the card bound is the unconditional pigeonhole card_goodNames_le_exp, and the covering bound combines the unconditional covering content measure_iUnion_goodNames_ge (good cells cover {(1/N)·infoFunₙ ≤ h+ε}) with UpperSMBInMeasure (the complement has measure < ε, eventually).

This is exactly the object the coding construction (C3) consumes: with Real.log k > h, picking ε small makes ⌊exp(N·(h+ε))⌋ < kᴺ, so the S names embed into Fin k-codes, covering 1 − ε of the space — the combinatorial seed of the Fin k-valued generator.

The minimal SMB input and the route to discharge `UpperSMBInMeasure` #

Everything above is unconditional except the single hypothesis UpperSMBInMeasure, the in-measure Shannon–McMillan–Breiman upper bound at the sharp rate h = ksEntropyPartition hT P: ∀ ε > 0, μ {x | (1/N)·infoFunₙ(x) > h+ε} → 0. This is the exact analytic input the AEP covering bound needs — strictly weaker than the pointwise a.e. SMB ((1/N)·infoFunₙ → h), being the upper half only and in measure (McMillan, not Breiman).

Why the crude bound and plain Markov are not enough. Oseledets.Krieger.ae_limsup_div_infoFun_le_log_card (Birkhoff-free, Markov + Borel–Cantelli) gives only the rate log (card ι) ≥ h — the integrand exp(infoFunₙ − N·R) of the engine is fixed, so the uniform competing measure is hard-wired into it and the rate cannot drop below log (card ι) by that instantiation alone. Plain Markov on infoFunₙ is also too weak: ∫ (1/N)·infoFunₙ = ksEntropySeq N / N → h, so μ {(1/N)·infoFunₙ > h+ε} ≤ (ksEntropySeq N / N)/(h+ε) → h/(h+ε), a positive constant. The content of the upper bound is the concentration of (1/N)·infoFunₙ about its mean h — genuine SMB content, not bookkeeping.

Route 1 — block / likelihood-ratio (engine + Birkhoff, no martingale). Recommended discharge. Feed the engine Oseledets.Krieger.ae_forall_eventually_div_log_le (the ∫⁻ gₙ ≤ 1 form) the likelihood ratio gₙ(x) = qₙ(name x) / μ(cellₙ(x)), where qₙ is a competing sub-probability on the N-names, not the fixed exp(infoFunₙ − N·R). For block length m, take qₙ(name) = ∏_{blocks b} μ(P_m-cell of block b) (the m-block product). Then ∫⁻ gₙ = ∑_names qₙ ≤ 1 automatically, and the engine gives, a.e., limsup (1/N)·log gₙ ≤ 0, i.e. limsup (1/N)·(infoFunₙ + log qₙ) ≤ 0, i.e. limsup (1/N)·infoFunₙ ≤ limsup −(1/N) log qₙ = limsup (1/N) ∑_{blocks} I_{P_m}(T^{jm}·). The right side is a Birkhoff average of I_{P_m} along T^m, which converges a.e. (ergodic case) to (1/m)·∫ I_{P_m} = (1/m)·H(P_m) = ksEntropySeq m / m (Oseledets.Entropy.tendsto_birkhoffAverage_ae_integral + integral_infoFun_eq). Letting m → ∞ with ksEntropySeq m / m → h (Oseledets.Entropy.tendsto_ksEntropySeq) yields the sharp a.e. upper bound limsup (1/N)·infoFunₙ ≤ h, hence UpperSMBInMeasure (the a.e. limsup bound gives, for each ε, (1/N)·infoFunₙ ≤ h+ε eventually a.e.; the bounded indicators 𝟙{(1/N)·infoFunₙ > h+ε} then → 0 in L¹ by dominated convergence — i.e. convergence in measure to 0).

This route needs ergodicity / the Birkhoff ergodic theorem (already in the repo) but no martingale, no conditional information function, no Chung L¹ maximal domination. The genuine remaining work is (a) the m-block product likelihood ratio and its ∫⁻ ≤ 1 (finite measure algebra on the append factorization ksJoinCells_append, repo KSEntropy.lean), and (b) the Birkhoff evaluation of (1/N) ∑_{blocks} I_{P_m}∘T^{jm} — an ≈100–150-line development. This is the minimal discharge, since only the upper half in measure is needed.

Route 2 — full pointwise SMB via SMBSharp (heavier). The documented residual R5 of Oseledets.Krieger.SMBSharp (Chung's L¹ maximal-function domination, ≈150 lines on top of the telescoping already proved there) gives the full a.e. SMB (1/N)·infoFunₙ → h, from which UpperSMBInMeasure is immediate. Route 1 is lighter for this corollary because it avoids the martingale entirely; both require the ergodic theorem.

Minimal analytic input named. Either (Route 1) the m-block product partition-function identity ∫⁻ (∏_{blocks} μ(P_m-cell)) / μ(cellₙ) ∂μ ≤ 1 plus a.e. Birkhoff convergence of the block average of I_{P_m}; or (Route 2) Chung's g* = ⨆ₖ gₖ ∈ L¹. Both reduce to the already-proved tendsto_ksEntropySeq for the final m → ∞ (Route 1) / k → ∞ (Route 2) passage.

Documentation

Oseledets.Krieger.NameCountSharp

The name-count / asymptotic-equipartition covering bound (Krieger M2, C2) #

The two halves of the argument #

Main definitions #

Main results #

References #

The minimal SMB input and the route to discharge `UpperSMBInMeasure` #

The name-count / asymptotic-equipartition covering bound (Krieger M2, C2) #

The two halves of the argument #

Main definitions #

Main results #

References #

The minimal SMB input and the route to discharge UpperSMBInMeasure #

The minimal SMB input and the route to discharge `UpperSMBInMeasure` #