About the mutual (conditional) information
Swiss Federal Institute of Technology (ETH Zurich)
In general, the mutual information between two random variables X and Y, I(X; Y), might be larger or smaller than their mutual information conditioned on some additional information Z, I(X; Y|Z). Such additional information Z can be seen as the output of a channel C taking X and Y as input. It is thus a natural question, with applications in fields such as information-theoretic cryptography, whether conditioning on the output Z of a fixed channel C can potentially increase the mutual information between the inputs X and Y.
In this paper, we give a necessary, sufficient, and easily verifiable criterion for the channel C (i.e., the conditional probability distribution PZ|XY) such that I(X; Y) ≥ I(X; Y|Z) holds for every joint distribution of the random variables X and Y. Furthermore, the result is generalized to channels with n inputs (for n ∈ N), that is, to conditional probability distributions of the form PZ|X1···Xn.
The mutual information I(X; Y) between two random variables X and Y is one of the basic measures in information theory. It can be interpreted as the amount of information that X gives about Y (or vice versa). In general, additional information, i.e., conditioning on an additional random variable Z, can either increase or decrease this mutual information.1 Without loss of generality2, this additional information Z can be seen as the output of a channel C with input (X, Y), which is fully specified by the conditional probability distribution PZ|XY.
In the following, we investigate the question whether, for a fixed conditional probability distribution PZ|XY (i.e., a fixed channel C with input (X, Y) and output Z), conditioning on Z can increase the mutual information between X and Y. We give a sufficient criterion, depending only on PZ|XY, such that this is not the case, i.e., I(X; Y) ≥ I(X; Y|Z) for all distributions PXY. The criterion is also necessary in the sense that, if it is not satisfied, there exists a probability distribution PXY such that I(X; Y) < I(X; Y|Z). Moreover, since our criterion is basically a simple information-theoretic expression, it can easily be handled, and the verification of whether it is satisfied by a given conditional probability distribution PZ|XY is efficient.
One possible application of this result is in the field of information-theoretic cryptography, where it is used for the analysis of secret-key agreement protocols3, but this application is not discussed in this extended abstract.
This paper is organized as follows. In Section 2, the notation and some definitions are introduced. The main theorem is stated and proved in Section 3. A generalization of the result to channels with more than two inputs, i.e., to probability distributions of the form PZ|X1···Xn, is given in Section 4. Finally, Section 5 concludes.
1 Let for example X = Y = Z be three (uniformly distributed) binary random variables. Then, conditioning on Z decreases the mutual information between X and Y. On the other hand, for two independent binary random variables X and Y, conditioning on Z = X ⊕ Y increases their mutual information.
2 At least in the context where only the three random variables X, Y and Z are considered.
3 See [?] for an example of information-theoretically secure secret-key agreement.
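To make footnote 1 concrete, the following small Python computation (a numerical sketch of ours; the helper name mutual_information is not from this paper) evaluates I(X; Y) and I(X; Y|Z) in bits for both examples:

import itertools
import math

def mutual_information(pxyz, cond_on_z=False):
    # pxyz maps (x, y, z) to its probability; returns I(X;Y) or I(X;Y|Z) in bits
    def H(marginal):
        return -sum(p * math.log2(p) for p in marginal.values() if p > 0)
    def marg(keys):
        m = {}
        for (x, y, z), p in pxyz.items():
            k = tuple({'x': x, 'y': y, 'z': z}[v] for v in keys)
            m[k] = m.get(k, 0.0) + p
        return m
    if not cond_on_z:
        return H(marg('x')) + H(marg('y')) - H(marg('xy'))
    # I(X;Y|Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)
    return H(marg('xz')) + H(marg('yz')) - H(marg('xyz')) - H(marg('z'))

# Case 1 of footnote 1: X = Y = Z uniform binary; I(X;Y) = 1, I(X;Y|Z) = 0
p1 = {(b, b, b): 0.5 for b in (0, 1)}
# Case 2: X, Y independent uniform binary, Z = X xor Y; I(X;Y) = 0, I(X;Y|Z) = 1
p2 = {(x, y, x ^ y): 0.25 for x, y in itertools.product((0, 1), repeat=2)}
print(mutual_information(p1), mutual_information(p1, cond_on_z=True))
print(mutual_information(p2), mutual_information(p2, cond_on_z=True))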
Let in the following p be a conditional probability distribution of the form
p : (x, y, z) ∈ X × Y × Z −→ p(z|x, y),
i.e., for each pair (x, y) ∈ X × Y, p(·|x, y) is a probability distribution of a random variable with range Z.
The conditional probability distribution p uniquely defines a channel4 C taking as input two random variables X and Y with ranges X and Y, respectively, and giving an output Z in the range Z. The main goal of this paper is to investigate the question whether, for such a fixed channel C, conditioning on the channel output Z can increase the mutual information (i.e., the correlation) between the two inputs X and Y with arbitrary joint distribution PXY. This motivates the following definition.
Definition 2.1. The conditional probability distribution p is called correlation free if
I(X; Y) ≥ I(X; Y|Z)
for all PXY, where X, Y and Z are random variables5 distributed according to PXY and PZ|XY := p.
Similar to the joint probability distribution PUV of two random variables U and V, which is the product of PU and PV if and only if U and V are statistically independent, we will see in Section 3 that the conditional probability distribution p can be written as a product if and only if it is correlation free.
Definition 2.2. The conditional probability distribution p is called multiplicative if it is the product of two functions r and s depending only on (z, x) and (z, y), respectively, i.e.,
p(z|x, y) = r(z, x) · s(z, y)
for all x ∈ X, y ∈ Y and z ∈ Z.
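For illustration (an example of ours, using the notation just introduced): the channel that outputs both inputs, Z := (X, Y), i.e., p((x′, y′)|x, y) = 1 if (x′, y′) = (x, y) and 0 otherwise, is multiplicative, being the product of r((x′, y′), x) := 1 if x′ = x (and 0 otherwise) and s((x′, y′), y) := 1 if y′ = y (and 0 otherwise). By contrast, the binary channel Z := X ⊕ Y is not multiplicative: p(0|0, 0) = p(0|1, 1) = 1 would force r(0, 0), s(0, 0), r(0, 1) and s(0, 1) to be nonzero, whereas p(0|0, 1) = 0 requires r(0, 0) · s(0, 1) = 0.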
It is easy to decide whether a given conditional probability distribution p is multiplicative. The following lemma shows that one only has to check the conditional independence of a certain pair of random variables.
Lemma 2.3. The conditional probability distribution p is multiplicative if and only if
I(X; Y|Z) = 0
for two independent random variables X and Y with uniform distribution on their ranges X and Y, respectively (i.e., PXY(x, y) = c for some constant c), and Z with PZ|XY := p.
Proof. Since PXY|Z=z(x, y) = PZ|XY(z|x, y) · PXY(x, y)/PZ(z) = (c/PZ(z)) · p(z|x, y), it is obvious that p is multiplicative if and only if (for each fixed z ∈ Z with PZ(z) > 0) the probability distribution PXY|Z=z can be written as a product of a function depending only on x and a function depending only on y, which is equivalent to the independence of X and Y conditioned on Z, i.e., I(X; Y|Z) = 0.
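Lemma 2.3 also yields a direct algorithmic test, as claimed above. The following Python sketch (our own code; the function name and the array encoding of p are assumptions) checks multiplicativity by verifying that, for independent and uniformly distributed inputs, every slice PXY|Z=z is a product distribution:

import numpy as np

def is_multiplicative(p, tol=1e-9):
    # p[x, y, z] = p(z|x, y); each p[x, y, :] must sum to 1
    nx, ny, nz = p.shape
    joint = p / (nx * ny)                      # P_XYZ for uniform, independent X, Y
    for z in range(nz):
        slice_xy = joint[:, :, z]              # proportional to P_XY|Z=z
        if slice_xy.sum() < tol:               # this z never occurs; nothing to check
            continue
        # product test: the slice times its total mass must equal the outer
        # product of its row and column sums (i.e., the slice is rank 1)
        px = slice_xy.sum(axis=1, keepdims=True)
        py = slice_xy.sum(axis=0, keepdims=True)
        if not np.allclose(slice_xy * slice_xy.sum(), px @ py, atol=tol):
            return False
    return True

# Z = (X, Y) encoded as z = 2*x + y: multiplicative.  Z = X xor Y: not multiplicative.
copy_xy = np.zeros((2, 2, 4)); xor = np.zeros((2, 2, 2))
for x in range(2):
    for y in range(2):
        copy_xy[x, y, 2 * x + y] = 1.0
        xor[x, y, x ^ y] = 1.0
print(is_multiplicative(copy_xy), is_multiplicative(xor))   # True False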
Theorem 3.1. A conditional probability distribution
p : (x, y, z) ∈ X × Y × Z −→ p(z|x, y)
is correlation free if and only if it is multiplicative.
4 such that p is its probability transition matrix
5 In this work, we restrict to random variables with finite entropy. However, their range might be infinite.
Proof. If p is not multiplicative, then it follows from Lemma 2.3 that I(X; Y|Z) > 0 for random variables X, Y and Z with PXY(x, y) = c (where c is some constant) and PZ|XY = p, while, obviously, I(X; Y) = 0. Consequently, conditioning on Z increases the mutual information between X and Y, i.e., p is not correlation free.
It thus remains to be shown that if, for any random variables X, Y and Z, the conditional probability distribution PZ|XY is multiplicative, then I(X; Y) ≥ I(X; Y|Z). The argument will be subdivided into two parts: in the first part, the implication is proven for a special case, called the deterministic case. In the second part, we will make use of this result to prove the general case.
Let X, Y and Z be random variables with ranges X, Y and Z, respectively, such that PZ|XY is multiplicative, i.e., PZ|XY(z|x, y) = r(z, x) · s(z, y). For the deterministic case, we additionally assume that all values r(z, x) and s(z, y) are either 0 or 1, which obviously implies that PZ|XY(z|x, y) is also 0 or 1 (for all x, y, z). The value of Z is thus uniquely determined by X and Y, i.e.,
H(Z|XY) = 0. (1)
The main idea for the proof of the deterministic case is to introduce an additional random variable Ȳ with range Y and PȲ|ZX(y|z, x) := PY|Z(y|z) (for all x, y, z). Hence we have
I(X; Ȳ|Z) = 0, (3)
i.e., X → Z → Ȳ is a Markov chain. Moreover, the value of Z is uniquely determined by the values of X and Ȳ, i.e.,
H(Z|XȲ) = 0. (4)
This can be seen as follows. Assume by contradiction that H(Z|XȲ) > 0. Then there exist (at least) two different values z1 ≠ z2 and x, y such that PXȲZ(x, y, z1) > 0 and PXȲZ(x, y, z2) > 0. Since for i = 1 and i = 2
PXȲZ(x, y, zi) = PXZ(x, zi) · PY|Z(y|zi),
where PXZ(x, zi) carries the factor r(zi, x) and PYZ(y, zi) carries the factor s(zi, y), the factors r(zi, x) and s(zi, y) must be nonzero and thus by assumption be equal to 1. Consequently, the probabilities PZ|XY(z1|x, y) and PZ|XY(z2|x, y) are both equal to 1, which is a contradiction.
From (1) and (4), the expressions I(XY; Z) and I(XȲ; Z) are both equal to H(Z) and thus
I(X; Z) + I(Y; Z) = I(X; Z) + I(Ȳ; Z) ≥ I(X; Z) + I(Ȳ; Z|X) = I(XȲ; Z) = I(XY; Z),
where the first equality holds because PȲZ = PYZ and the inequality follows from the fact that X → Z → Ȳ is a Markov chain (see (3)). Making use of some basic information theoretic equalities shows that
I(X; Y) − I(X; Y|Z) = I(X; Z) + I(Y; Z) − I(XY; Z) ≥ 0, (6)
which concludes the proof for the deterministic case.
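The basic information-theoretic equalities used in this last step are standard; spelled out (in our own rendering), they read
I(X; Y) − I(X; Y|Z) = (H(Y) − H(Y|X)) − (H(Y|Z) − H(Y|XZ))
                    = (H(Y) − H(Y|Z)) − (H(Y|X) − H(Y|XZ))
                    = I(Y; Z) − I(Y; Z|X)
                    = I(X; Z) + I(Y; Z) − I(XY; Z),
where the last equality is the chain rule I(XY; Z) = I(X; Z) + I(Y; Z|X). In particular, I(X; Z) + I(Y; Z) ≥ I(XY; Z) holds if and only if I(X; Y) ≥ I(X; Y|Z), which is the equivalence (6) used again in Section 4.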
To prove the general case, we again assume that for given random variables X, Y and Z the conditional probability distribution PZ|XY is multiplicative, but this time the factors r and s of PZ|XY might take on any value in the interval [0, 1].6
The main idea is to reduce this case to the deterministic case by constructing new random variables X̄, Ȳ and Z̄ which satisfy
I(X̄; Ȳ) = I(X; Y) (7)
and
I(X̄; Ȳ|Z̄) ≥ I(X; Y|Z), (8)
and for which the conditional probability distribution PZ̄|X̄Ȳ is again multiplicative, i.e., for all x̄, ȳ and z̄,
PZ̄|X̄Ȳ(z̄|x̄, ȳ) = r̄(z̄, x̄) · s̄(z̄, ȳ) (9)
with function values r̄(z̄, x̄), s̄(z̄, ȳ) ∈ {0, 1}. Additionally, the range of Z̄ should consist of two disjoint sets A and B satisfying a further condition (10).
6 This is the most general case, since any two factors r and s can be replaced by r̃ := r/t and s̃ := s · t, where t is a function only depending on z, such that the function values of r̃ and s̃ lie in the interval [0, 1].
Then, from the result in the deterministic case, I(X̄; Ȳ) ≥ I(X̄; Ȳ|Z̄), which, together with (7) and (8), implies I(X; Y) ≥ I(X; Y|Z).
The main task is thus to find such a construction of random variables X̄, Ȳ and Z̄ fulfilling conditions (7), (8), (9) and (10), which will be sketched in the remaining part of this section. However, in this extended abstract, we skip the proof that the following construction fulfills the above conditions.
Without loss of generality, assume that the ranges X, Y and Z of the random variables X, Y and Z, respectively, are finite and that the function values of r and s are rational numbers.7 We thus can write
r(z, x) = φ(z, x)/ρ and s(z, y) = ψ(z, y)/σ
with appropriate constants ρ, σ ∈ N and functions φ and ψ with ranges {0, 1, . . . , ρ} and {0, 1, . . . , σ}, respectively. Moreover, to simplify the notation, set Z = {0, 1, . . . , γ − 1} for an appropriate γ ∈ N.
Let U and V be independent and uniformly distributed random variables with ranges U := {0, 1, . . . , α − 1} and V := {0, 1, . . . , β − 1}, respectively, where α := ρ · γ and β := σ · γ. Moreover, for all x ∈ X and y ∈ Y, let Px and Qy be independent and uniformly distributed random variables with ranges U and V, respectively. In the following, the |X|-tuple (Px)x∈X will be denoted as P, and the |Y|-tuple (Qy)y∈Y as Q. The random variables X̄ and Ȳ should then be defined as the triples (X, U, Q) and (Y, V, P), respectively. Condition (7) is thus an immediate consequence of the independence of U, V, P and Q.
For the definition of Z̄, we need some additional notation. For all x ∈ X, y ∈ Y and z ∈ Z, fix sets Ax,z ⊆ U with |Ax,z| = φ(z, x) and By,z ⊆ V with |By,z| = ψ(z, y) such that, for any given x, the sets Ax,z (for all z) are disjoint and, for any given y, the sets By,z are disjoint (this is possible since α = ρ · γ and β = σ · γ). Set
Cx,y := {(u, v) ∈ U × V | ∃ z ∈ Z : u ∈ Ax,z ∧ v ∈ By,z}.
It is easy to verify that, for fixed x, y, u, v with (u, v) ∈ Cx,y, there is exactly one element z ∈ Z such that u ∈ Ax,z and v ∈ By,z, which we will denote as z(x, y, u, v). Furthermore, the cardinality of the set Cx,y is equal to ρ · σ, and thus the cardinalities of both Cx,y and its complement C̄x,y := (U × V) \ Cx,y are independent of x and y. This allows us to define a family of functions
fk : X × Y −→ U × V (11)
parameterized by k ∈ {1, . . . , κ}, where κ := |C̄x,y|, such that for each fixed pair (x, y) ∈ X × Y the mapping k ↦ fk(x, y) is a bijection between {1, . . . , κ} and C̄x,y.
Finally, Z̄ will be defined in such a way that its value is uniquely determined by X̄ = (X, U, Q) and Ȳ = (Y, V, P). We will distinguish two cases, depending on whether or not (Ū, V̄) ∈ CX,Y, where Ū := U − PX (mod α) and V̄ := V − QY (mod β). In the first case (if (Ū, V̄) ∈ CX,Y), let Z̄ be defined as the triple (Z′, P, Q) with Z′ := z(X, Y, Ū, V̄).
7 It is easy to verify that any triple of random variables X, Y and Z with multiplicative conditional probability distribution PZ|XY can be approximated by a triple of random variables X′, Y′ and Z′ having finite range and for which PZ′|X′Y′ is multiplicative with rational factors, such that the mutual information between X and Y (given Z) is arbitrarily close to the mutual information between X′ and Y′ (given Z′).
In the second case (if (Ū, V̄) ∉ CX,Y), let Z̄ be a triple (K, U, V), where K ∈ {1, . . . , κ} satisfies
fK(X, Y) = (Ū, V̄) (12)
and where U := (Ux)x∈X and V := (Vy)y∈Y are suitably defined |X|- and |Y|-tuples. Note that, since the function (11) is a bijection, the value of K is uniquely determined by (12). B is then defined as the set of all possible values of Z̄ of this second form, and A as the set of all possible values of the first form (Z′, P, Q).
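As a numerical sanity check of Theorem 3.1 (our own experiment, not part of the argument above), one can sample multiplicative channels together with arbitrary input distributions and verify the inequality. Channels of the form Z = (A, B), where A depends only on X and B only on Y, are multiplicative by construction; all function and variable names below are ours:

import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def mi_and_cmi(pxy, channel):
    # channel[x, y, z] = p(z|x, y); returns (I(X;Y), I(X;Y|Z)) in bits
    pxyz = pxy[:, :, None] * channel
    i_xy = entropy(pxy.sum(1)) + entropy(pxy.sum(0)) - entropy(pxy.ravel())
    i_xy_z = (entropy(pxyz.sum(1).ravel()) + entropy(pxyz.sum(0).ravel())
              - entropy(pxyz.ravel()) - entropy(pxyz.sum((0, 1))))
    return i_xy, i_xy_z

nx, ny, na, nb = 3, 3, 2, 2
for _ in range(1000):
    pa_x = rng.dirichlet(np.ones(na), size=nx)              # P_{A|X}
    pb_y = rng.dirichlet(np.ones(nb), size=ny)              # P_{B|Y}
    channel = np.einsum('xa,yb->xyab', pa_x, pb_y).reshape(nx, ny, na * nb)
    pxy = rng.dirichlet(np.ones(nx * ny)).reshape(nx, ny)   # arbitrary joint input
    i, i_z = mi_and_cmi(pxy, channel)
    assert i >= i_z - 1e-9, (i, i_z)
print("no violation of I(X;Y) >= I(X;Y|Z) found")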
The conditional probability distribution PZ|XY studied in the previous sections corresponds to a channel taking a pair of random variables as input. However, it is a natural question whether our considerations can be extended to channels with more than two inputs.
Let therefore p be a conditional probability distribution of the form
p : (x1, . . . , xn, z) −→ p(z|x1, . . . , xn)
for some n ∈ N. Then, there is a canonical extension of Definition 2.2 including this more general type of conditional probability distributions.
Definition 4.1. The conditional probability distribution p is called multiplicative if it can be written as a product
p(z|x1, . . . , xn) = r1(z, x1) · · · rn(z, xn)
for appropriate functions r1, . . . , rn.
On the other hand, the generalization of the definition of correlation freeness is motivated by the inequality
I(X; Z) + I(Y; Z) ≥ I(XY; Z),
which is equivalent to I(X; Y) ≥ I(X; Y|Z) (see (6)).
Definition 4.2. The conditional probability distribution p is called correlation free if
I(X1; Z) + · · · + I(Xn; Z) ≥ I(X1 · · · Xn; Z)
for any choice of random variables X1, . . . , Xn and Z with PZ|X1···Xn := p.
It turns out that, for these extended definitions, the equivalence between correlation freeness and
the multiplicative property of conditional probability distributions still holds.
Theorem 4.3. A conditional probability distribution
p : (x1, . . . , xn, z) −→ p(z|x1, . . . , xn)
is correlation free if and only if it is multiplicative.
Proof. We first show by induction that any multiplicative conditional probability distribution p is correlation free. Let therefore X1, . . . , Xn and Z be random variables such that PZ|X1···Xn is multiplicative, and assume that the implication is proven for probability distributions conditioned on n − 1 random variables, i.e.,
I(X1; Z) + · · · + I(Xn−1; Z) ≥ I(X1 · · · Xn−1; Z). (15)
Then
I(X1; Z) + · · · + I(Xn; Z) ≥ I(X1 · · · Xn−1; Z) + I(Xn; Z) ≥ I(X1 · · · Xn; Z),
where the last inequality is equivalent to I(X1 · · · Xn−1; Xn) ≥ I(X1 · · · Xn−1; Xn|Z) (see (6)) and therefore a direct consequence of Theorem 3.1. (Note that if PZ|X1···Xn is multiplicative, then PZ|XY for X = (X1, . . . , Xn−1) and Y = Xn is multiplicative as well.) Since (15) is trivially satisfied for n = 2, the assertion follows by induction on n.
It remains to be proven that correlation freeness of a probability distribution p implies that p is multiplicative. First, note that for random variables X1, . . . , Xn and Z
I(X1; Z) + · · · + I(Xn; Z) ≤ I(X1 · · · Xn; Z) (16)
if X1, . . . , Xn are mutually independent. Second, recall that 0 = I(X; Y) < I(X; Y|Z) for independent and uniformly distributed random variables X and Y if PZ|XY is not multiplicative (see the first part of the proof of Theorem 3.1). Again (see (6)), this is equivalent to the inequality
I(X; Z) + I(Y; Z) < I(XY; Z). (17)
Assume by contradiction that p is not multiplicative, and let X1, . . . , Xn be uniformly distributed independent random variables and Z be distributed according to PZ|X1···Xn := p. Since p is not multiplicative, there exists an index k such that PZ|XY is not multiplicative for X := Xk and Y := X1 · · · Xk−1Xk+1 · · · Xn. Hence, from (17) and (16),
I(X1 · · · Xn; Z) > I(Xk; Z) + I(X1 · · · Xk−1Xk+1 · · · Xn; Z) ≥ I(X1; Z) + · · · + I(Xn; Z),
i.e., p is not correlation free, which is a contradiction.
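A concrete instance of this contradiction (our own example) is the channel Z := X1 ⊕ X2 ⊕ X3 with three independent uniform bits: each I(Xi; Z) vanishes while I(X1X2X3; Z) = 1 bit, so the inequality of Definition 4.2 fails, in accordance with the fact that this p is not multiplicative. A short numerical check in Python:

import itertools
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# joint distribution of (x1, x2, x3, z) with z = x1 ^ x2 ^ x3
joint = {(x1, x2, x3, x1 ^ x2 ^ x3): 0.125
         for x1, x2, x3 in itertools.product((0, 1), repeat=3)}

def marginal(indices):
    m = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        m[key] = m.get(key, 0.0) + p
    return m

h_z = entropy(marginal([3]))
for i in range(3):                    # I(Xi; Z) = H(Xi) + H(Z) - H(Xi Z)
    i_xi_z = entropy(marginal([i])) + h_z - entropy(marginal([i, 3]))
    print(f"I(X{i + 1}; Z) = {i_xi_z:.3f}")      # 0.000 each
i_all = entropy(marginal([0, 1, 2])) + h_z - entropy(marginal([0, 1, 2, 3]))
print(f"I(X1X2X3; Z) = {i_all:.3f}")             # 1.000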
We have investigated in which cases the mutual information between arbitrarily distributed random variables X and Y cannot be increased when conditioning on additional information Z about X and Y, which is determined by a fixed conditional probability distribution PZ|XY. Clearly, Z can be considered as the output of a channel C with two inputs, X and Y, and probability transition matrix PZ|XY. Our main theorem gives a necessary and sufficient criterion for the channel C, i.e., for PZ|XY, such that I(X; Y) ≥ I(X; Y|Z) for all distributions PXY. Furthermore, we have shown that this result can be generalized to channels with more than two inputs.
Combining the main Theorem 3.1 and Lemma 2.3, our criterion and its consequence can be formulated as follows: if, for a fixed channel C specified by PZ|XY, conditioning on the output Z does not increase the mutual information between independent and uniformly distributed inputs X and Y,8 then conditioning on the output of C cannot increase the mutual information between inputs X and Y having any arbitrary joint distribution PXY.
8 Since the mutual information between two independent inputs X and Y is zero, this means that I(X; Y|Z) = 0.