The martingale-free Markov–Borel–Cantelli engine for the SMB upper bound #
This file develops the Algoet–Cover "Lemma 1" — the elementary, martingale-free core of the upper half of the Shannon–McMillan–Breiman (SMB) theorem that Krieger's finite-generator theorem needs. It is the one-line consequence of Markov's inequality and the Borel–Cantelli lemma:
If
gₙ ≥ 0are random variables with∫ gₙ ≤ 1for alln, thenlimsup_{n→∞} (1/n) · log gₙ ≤ 0almost everywhere.
This is exactly Lemma 1 of Algoet–Cover (1988); see the references. The whole point of the
sandwich proof is that the upper bound on the information function reduces to applying this
engine to a likelihood ratio gₙ = qₙ / pₙ of an approximating sub-probability qₙ against the
true atom measure pₙ = μ(atomₙ), because such a ratio automatically integrates to ≤ 1.
Main definitions / results #
Oseledets.Krieger.ae_forall_eventually_div_log_le— the engine (always-true form). For a sequenceg : ℕ → α → ℝ≥0∞of a.e.-measurable functions with∫⁻ gₙ ∂μ ≤ 1, almost everyxsatisfies: for everyε > 0, eventually(1/n) · log (gₙ x).toReal ≤ ε. Pure Markov + Borel–Cantelli, no ergodic theorem and no martingale convergence.Oseledets.Krieger.ae_limsup_div_log_toReal_le_zero— thelimsuprepackaging, under the extra hypothesis1 ≤ gₙ x(which makes the sequence bounded below, so itslimsupis not a junk value):limsup_n (1/n) · log (gₙ x).toReal ≤ 0a.e.Oseledets.Krieger.ae_forall_eventually_div_infoFun_le— the information-function name-count bound at an abstract rateR: given measurablef : ℕ → α → ℝ(the information functions) and the partition-function bound∫⁻ exp(fₙ − n·R) ∂μ ≤ 1, almost everyxsatisfies, for everyε > 0, eventually(1/n) · fₙ x ≤ R + ε. InstantiatingR = log #cellsgives the crude Birkhoff-free name-count boundlimsup (1/n)·infoFunₙ ≤ log #cells.
The conversion of the rate to the Kolmogorov–Sinai entropy h(P,T) is the second half of the
sandwich and requires the Birkhoff ergodic theorem to evaluate the per-symbol codelength of the
approximating measure as a time average — see the module note below. That half is intentionally
NOT in this file.
Note on the rate (why this file stops at the engine) #
The pure Markov+Borel–Cantelli engine, applied to the uniform approximation
qₙ ≡ 1/#atoms, gives only the crude bound limsup (1/n) infoFunₙ ≤ log(card ι) a.e.
To reach the sharp rate h(P,T) = ksEntropyPartition, Algoet–Cover feed the engine a non-uniform
likelihood ratio and then invoke the ergodic theorem (their eq. (33)); for the finite-partition
(Krieger) case the m-block approximation closes the gap to h via ksEntropySeq m / m → h
WITHOUT martingales. This file proves the engine and the uniform corollary; the Birkhoff step is
left to a companion file.
References #
- P. H. Algoet and T. M. Cover, A sandwich proof of the Shannon–McMillan–Breiman theorem,
Ann. Probab. 16 (1988), 899–909. (Lemma 1 is
ae_limsup_div_log_toReal_le_zero.) - M. Einsiedler, T. Ward, Ergodic Theory with a view towards Number Theory, Ch. 2.
Algoet–Cover Lemma 1, the always-true eventually form (the martingale-free engine).
If g : ℕ → α → ℝ≥0∞ are a.e.-measurable with ∫⁻ gₙ ∂μ ≤ 1 for all n, then for μ-almost
every x and every ε > 0, eventually (1/n) · log (gₙ x).toReal ≤ ε.
This is the entire probabilistic content of the SMB upper bound — Markov's inequality on the
nonnegative gₙ plus the Borel–Cantelli lemma, with the threshold ranging over ε = 1/(k+1).
No ergodic theorem, no martingale convergence. It is stated in the eventually form (rather than
as a limsup bound) because it is unconditionally true; the limsup repackaging
ae_limsup_div_log_toReal_le_zero additionally needs the sequence to be bounded below.
Algoet–Cover Lemma 1, the limsup form.
If g : ℕ → α → ℝ≥0∞ are a.e.-measurable with ∫⁻ gₙ ∂μ ≤ 1 and, pointwise, 1 ≤ gₙ x
(so that log (gₙ x).toReal ≥ 0 and the sequence (1/n)·log (gₙ x).toReal is bounded below by
0), then for μ-almost every x,
limsup_{n→∞} (1/n) · log (gₙ x).toReal ≤ 0.
The lower bound 1 ≤ gₙ x is exactly what holds in the Shannon–McMillan–Breiman application,
where gₙ x = (μ atomₙ)⁻¹ ≥ 1 since atoms have measure ≤ 1; it is genuinely needed because the
limsup of an ℝ-sequence unbounded below is a junk value.
The information-function name-count bound #
The engine above is applied here to an abstract information function infoFun : ℕ → α → ℝ
(the eventual concrete object is infoFunₙ x = -log (μ atomₙ).toReal from
Oseledets.Krieger.InfoFunction, built in a sibling file; we keep this result decoupled from it).
The single hypothesis is the partition-function bound at rate R:
∫⁻ exp(infoFunₙ − n·R) ∂μ ≤ 1. Feeding gₙ x = ofReal (exp (infoFunₙ x − n·R)) to the engine
gives, for a.e. x, that eventually (1/n)·infoFunₙ x ≤ R + ε, for every ε > 0.
The two SMB instantiations:
- Uniform / crude (
R = log #cells, Birkhoff-free).∫⁻ exp(infoFunₙ − n·log N) = N⁻ⁿ · #{atoms of positive measure} ≤ N⁻ⁿ · Nⁿ = 1, giving the boundlimsup (1/n)·infoFunₙ ≤ log N. No ergodic theorem. - Sharp (
R = h(P,T)). Requires a non-uniform competing measure and the Birkhoff ergodic theorem to evaluate its codelength as a time average (Algoet–Cover eq. (33)); the diagonalksEntropySeq m / m → hthen closesR ↓ h. Not proved here.