Conditional Shannon entropy of a finite measurable partition #
This file extends the measure-theoretic foundation for Kolmogorov–Sinai entropy started in
Oseledets.Entropy.Partition to the conditional setting. Given a probability space (α, μ)
that is standard Borel, a sub-σ-algebra 𝒜 ≤ mα, and a finite family of cells s : ι → Set α,
the conditional Shannon entropy is the μ-average of the pointwise entropy computed with respect
to the regular conditional probability κ = condExpKernel μ 𝒜:
H(s | 𝒜) = ∫ ω, ∑ᵢ negMulLog (κ(ω) (s i)).toReal ∂μ.
This is the information about the partition that remains after conditioning on the information in
𝒜. We record its three basic properties — nonnegativity, the log k upper bound for a genuine
partition, and the reduction to ordinary entropy when 𝒜 = ⊥ — together with the fundamental
fact that conditioning does not increase entropy (condEntropy_le), which is Jensen's
inequality for the concave function negMulLog applied to the disintegration of μ.
Main definitions #
Oseledets.Entropy.condEntropy: the conditional Shannon entropy∫ ω, ∑ i, negMulLog (condExpKernel μ 𝒜 ω (s i)).toReal ∂μof a finite family of cells.
Main results #
Oseledets.Entropy.condEntropy_nonneg: conditional entropy is nonnegative.Oseledets.Entropy.condEntropy_le_log_card: a partition intokcells has conditional entropy at mostlog k.Oseledets.Entropy.condEntropy_bot: conditioning on the trivial σ-algebra⊥recovers the ordinary entropy.Oseledets.Entropy.condEntropy_le: conditioning does not increase entropy.
References #
- François Le Maître, Notes on the Kolmogorov–Sinai theorem (2017), §1.
- Peter Walters, An Introduction to Ergodic Theory, Springer GTM 79, Chapter 4.
The conditional Shannon entropy
∫ ω, ∑ i, negMulLog (condExpKernel μ 𝒜 ω (s i)).toReal ∂μ of a finite family of cells
s : ι → Set α given a sub-σ-algebra 𝒜. It is the μ-average of the pointwise entropy of s
computed against the regular conditional probability condExpKernel μ 𝒜 ω, i.e. the information
about the partition remaining after conditioning on 𝒜.
Equations
- Oseledets.Entropy.condEntropy μ 𝒜 s = ∫ (ω : α), ∑ i : ι, (((ProbabilityTheory.condExpKernel μ 𝒜) ω) (s i)).toReal.negMulLog ∂μ
Instances For
For every ω, the conditional probability condExpKernel μ 𝒜 ω A of a set A has real value
in [0, 1], because condExpKernel μ 𝒜 ω is a probability measure.
The pointwise integrand of condEntropy is nonnegative: for every ω each cell has
conditional probability in [0, 1] (the kernel is Markov), where negMulLog is nonnegative.
Conditional Shannon entropy is nonnegative: the integrand is pointwise nonnegative because each
conditional cell probability lies in [0, 1].
Integrability of the single term ω ↦ negMulLog (condExpKernel μ 𝒜 ω A).toReal: the function
is measurable (composition of the continuous negMulLog with the measurable conditional
probability) and bounded by 1, hence integrable on the finite measure μ.
The integrand ω ↦ ∑ i, negMulLog (condExpKernel μ 𝒜 ω (s i)).toReal of condEntropy is
μ-integrable, as a finite sum of integrable single terms.
For a finite measurable partition P of the probability space and μ-almost every ω, the
conditional probabilities of the cells under condExpKernel μ 𝒜 ω sum to 1.
The cells are measurable, cover the whole space, and are μ-a.e. disjoint; the last fact transfers
to the conditional kernel via the disintegration condExpKernel μ 𝒜 ∘ₘ μ.trim h𝒜 = μ, so for a.e.
ω the cells are condExpKernel μ 𝒜 ω-a.e. disjoint and the kernel — being Markov — distributes
its total mass 1 over them.
Conditional version of Proposition 1 (Le Maître). A finite measurable partition of a
probability space into k cells has conditional Shannon entropy at most log k.
For μ-almost every ω the cell probabilities under condExpKernel μ 𝒜 ω sum to 1, so the
pointwise entropy is bounded by log k via the existing Jensen bound entropy_le_log_card;
integrating this constant bound over the probability measure μ yields the claim.
Conditioning on the trivial σ-algebra ⊥ recovers the ordinary Shannon entropy: the
conditional probability condExpKernel μ ⊥ ω (s i) equals the unconditional probability μ (s i)
for μ-almost every ω, so the integrand is a.e. the constant entropy μ s.
The disintegration identity ∫ ω, (condExpKernel μ 𝒜 ω A).toReal ∂μ = (μ A).toReal:
the μ-average of the conditional probability of a measurable set recovers its unconditional
probability, by the law of total probability (condExpKernel_ae_eq_condExp and integral_condExp
applied to the indicator of the set).
Conditioning does not increase entropy. For any finite measurable partition P of the
probability space and any sub-σ-algebra 𝒜, the conditional entropy is at most the unconditional
entropy.
This is Jensen's inequality (ConcaveOn.le_map_integral) for the concave function negMulLog,
applied term by term: for each cell Pᵢ, the average of negMulLog (κ(ω) Pᵢ).toReal is at most
negMulLog of the average ∫ ω, (κ(ω) Pᵢ).toReal ∂μ, which equals negMulLog (μ Pᵢ).toReal by the
disintegration integral_condExpKernel_toReal. Summing over i gives the bound.