
Blahut-Arimoto Algorithm and Code Design for Action-Dependent Source Coding Problems

arXiv:1301.6190v1 [cs.IT] 25 Jan 2013

Kasper Fløe Trillingsgaard, Osvaldo Simeone, Petar Popovski and Torben Larsen

Abstract

The source coding problem with action-dependent side information at the decoder has recently been introduced to model data acquisition in resource-constrained systems. In this paper, an efficient algorithm for numerical computation of the rate-distortion-cost function for this problem is proposed, and a convergence proof is provided. Moreover, a two-stage code design based on multiplexing is put forth, whereby the first stage encodes the actions and the second stage is composed of an array of classical Wyner-Ziv codes, one for each action. Specific coding/decoding strategies are designed based on LDGM codes and message passing. Through numerical examples, the proposed code design is shown to achieve performance close to the lower bound dictated by the rate-distortion-cost function.

Index Terms

Rate-distortion theory, side information “vending machine”, Blahut-Arimoto algorithm, code design, LDGM, message passing.

I. INTRODUCTION

The source coding problem in which the decoder can take actions that affect the availability or quality of the side information at the decoder was introduced in [1]. The problem generalizes the well-known Wyner-Ziv set-up and can be used to model data acquisition in resource-constrained systems, such as sensor networks. In the model studied in [1], each action is associated with a cost, and the system design is subject to an average cost constraint. The information-theoretic analysis of the problem was fully addressed in [1]. In this paper, instead, we tackle the practical open issues, namely the computation of the rate-distortion-cost function and code design.

Specifically, the rate-distortion-cost function for the source coding problem with action-dependent side information was derived in [1]. However, no specific algorithm was proposed for its computation. A first contribution of this paper is to propose such an algorithm by generalizing the classical Blahut-Arimoto (BA) approach, which was introduced for the Wyner-Ziv problem in [2]. Convergence of the algorithm is also proved. Moreover, while the theory in [1] demonstrates the existence of coding and decoding strategies able to achieve the rate-distortion-cost bound, practical code constructions have not been investigated yet. It is recalled that, for classical lossy source coding problems, codes that achieve the rate-distortion bound include Low Density Generator Matrix (LDGM) codes [3], polar codes [4] and trellis-based quantization codes [5]. For the Wyner-Ziv problem, efficient codes include compound LDPC/LDGM codes [6] and polar codes [4]. A second contribution of this paper is hence the study of code design for source coding problems with action-dependent side information. As shown in [1], optimal codes for this problem have a successive refinement structure, in which the first layer produces the action sequence and the refinement layer uses binning to leverage the side information at the decoder. Here, we first observe that a layered code structure in which the refinement layer uses a multiplexing of separate classical Wyner-Ziv codes, one for each action, is optimal. This allows us to simplify the code structure with respect to the successive refinement strategy in [1]. LDGM-based codes with message-passing encoding are designed and demonstrated via numerical results to perform close to the rate-distortion-cost function.

The paper is organized as follows. In Section II, the action-dependent source coding problem is described and results from [1] are summarized. In Section III, we describe the proposed algorithm for computation of the rate-distortion-cost function, and in Section IV, a practical code design is proposed. Finally, in Section V, we present numerical results for a specific example.

A. Notation

Throughout this work, we let upper case, lower case and calligraphic letters denote random variables, values and alphabets of the random variables, respectively. For jointly distributed random variables, $P_X(x)$, $P_{X|Y}(x|y)$ and $P_{X,Y}(x,y)$ denote the probability mass function (pmf) of $X$, the conditional pmf of $X$ given $Y$ and the joint pmf of $X$ and $Y$, respectively. To simplify notation, the subscripts of the pmfs may be omitted, e.g., $P(x|y)$ may be used instead of $P_{X|Y}(x|y)$. The notation $X^n$ represents the tuple $(X_1, X_2, \dots, X_n)$, and $[a,b]$, where $a,b\in\mathbb{Z}$ with $a<b$, denotes the set of integers $\{a, a+1, \dots, b-1, b\}$. Moreover, $\mathbb{Z}_+ = \{0,1,\dots\}$, $\mathbb{N} = \mathbb{Z}_+\setminus\{0\}$, and $1_{\{\mathrm{cond}\}}$ denotes the indicator function, which is one when cond is true and zero otherwise. The notation $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor and ceiling operators, respectively.

II. BACKGROUND

In this section, we recall the definition of source coding problems with action-dependent side information and review the rate-distortion-cost function obtained in [1].

Fig. 1.  Source coding with action-dependent side information.

A. System Model

The source coding problem with action-dependent side information introduced in [1] is illustrated in Fig. 1. In this problem, the source $X^n\in\mathcal{X}^n$ is memoryless and each sample is distributed according to the pmf $P_X$. At the encoder, the encoding function
\[
f : \mathcal{X}^n \to \big[1, \lfloor 2^{nR}\rfloor\big], \tag{1}
\]

maps the source $X^n$ into a message $M \in \big[1, \lfloor 2^{nR}\rfloor\big]$, where $R$ denotes the rate in bits per sample. At the decoder, an action sequence $A^n\in\mathcal{A}^n$ is chosen according to an action strategy
\[
g : \big[1, \lfloor 2^{nR}\rfloor\big] \to \mathcal{A}^n, \tag{2}
\]
which maps the message $M$ into an action sequence $A^n$. Based on $A^n$, the side information $Y^n\in\mathcal{Y}^n$ is conditionally independent and identically distributed (iid) according to the conditional pmf $P_{Y|X,A}$, so that we have
\[
P_{Y^n|X^n,A^n}(y^n|x^n,a^n) = \prod_{i=1}^{n} P_{Y|X,A}(y_i|x_i,a_i). \tag{3}
\]
The decoder makes a reconstruction $\hat{X}^n\in\hat{\mathcal{X}}^n$ of $X^n$ according to the decoding function
\[
h : \big[1, \lfloor 2^{nR}\rfloor\big] \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n, \tag{4}
\]

which maps the message $M$ and the side information $Y^n$ into the estimate $\hat{X}^n$.

The action cost function $\Delta(a) : \mathcal{A}\to\mathbb{R}_+$ is defined such that $\Delta(a) = 0$ for some $a\in\mathcal{A}$ and $\Delta_{\max} = \max_{a\in\mathcal{A}}\Delta(a) < \infty$, and the distortion function $d(x,\hat{x}) : \mathcal{X}\times\hat{\mathcal{X}}\to\mathbb{R}_+$ is defined such that for each $x\in\mathcal{X}$ there is an $\hat{x}\in\hat{\mathcal{X}}$ satisfying $d(x,\hat{x})=0$. The rate-distortion-cost tuple $(R,D,C)$ is then said to be achievable if and only if, for all $\varepsilon>0$ and all sufficiently large $n\in\mathbb{N}$, there exist an encoding function $f$, an action function $g$ and a decoding function $h$ satisfying the distortion constraint
\[
\mathrm{E}\Big[\sum_{i=1}^{n} d(X_i,\hat{X}_i)\Big] \le n(D+\varepsilon) \tag{5}
\]
and the action cost constraint
\[
\mathrm{E}\Big[\sum_{i=1}^{n} \Delta(A_i)\Big] \le n(C+\varepsilon). \tag{6}
\]
The rate-distortion-cost function, denoted as $R(D,C)$, is defined as the infimum of all rates $R$ such that the tuple $(R,D,C)$ is achievable.

Fig. 2.  Optimal encoder for source coding problems with action-dependent side information.

B. Rate-Distortion-Cost Function

The rate-distortion-cost function $R(D,C)$ was derived in [1] and is summarized below.

Lemma 1. ([1, Theorem 1]) The rate-distortion-cost function for the source coding problem with action-dependent side information is given as
\[
R(D,C) = \min\; I(X;A) + I(X;U|Y,A), \tag{7}
\]
where the joint pmf of $(X,Y,A,U)$ is of the form
\[
P_{X,Y,A,U}(x,y,a,u) = P_X(x)\,P_{U|X}(u|x)\,1_{\{\eta(u)=a\}}\,P_{Y|X,A}(y|x,a), \tag{8}
\]
and the minimization is over all pmfs $P_{U|X}$ and deterministic functions $\eta:\mathcal{U}\to\mathcal{A}$ under which the conditions
\[
\mathrm{E}\big[d(X,\hat{X}_{\mathrm{opt}}(U,Y))\big] \le D \tag{9}
\]
and
\[
\mathrm{E}[\Delta(A)] \le C \tag{10}
\]
hold. The function $\hat{X}_{\mathrm{opt}}:\mathcal{U}\times\mathcal{Y}\to\hat{\mathcal{X}}$ denotes the best estimate of $X$ given $U$ and $Y$, i.e.,
\[
\hat{X}_{\mathrm{opt}}(u,y) = \arg\min_{\hat{x}\in\hat{\mathcal{X}}} \mathrm{E}\big[d(X,\hat{x})\,\big|\,U=u, Y=y\big]. \tag{11}
\]
Moreover, the cardinality of the set $\mathcal{U}$ can be restricted as $|\mathcal{U}| \le |\mathcal{X}||\mathcal{A}| + 2$.
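To make the quantities in Lemma 1 concrete, the following Python sketch (our own illustration, not from [1]; the function name, array conventions and brute-force enumeration are ours) evaluates the objective in (7) together with the constraints (9) and (10) for a candidate pmf $P_{U|X}$ and action map $\eta$, by building the joint pmf (8) over small alphabets.

```python
import itertools
import numpy as np

def lemma1_point(PX, PU_given_X, eta, PY_given_XA, d, Delta):
    """Evaluate the rate I(X;A) + I(X;U|Y,A), the distortion in (9) with the
    optimal estimator (11), and the cost in (10), for a candidate pmf P_{U|X}
    and deterministic action map eta, via the joint pmf (8).
    Shapes: PX (nX,), PU_given_X (nU, nX), eta (nU,) of action indices,
    PY_given_XA (nY, nX, nA), d (nX, nXhat), Delta (nA,)."""
    nU, nX = PU_given_X.shape
    nY, _, nA = PY_given_XA.shape
    # joint pmf P(x, y, a, u) as in (8)
    P = np.zeros((nX, nY, nA, nU))
    for x, y, u in itertools.product(range(nX), range(nY), range(nU)):
        a = eta[u]
        P[x, y, a, u] = PX[x] * PU_given_X[u, x] * PY_given_XA[y, x, a]

    def H(p):                      # entropy in bits of a (flattened) pmf
        p = p[p > 1e-15]
        return float(-np.sum(p * np.log2(p)))

    PXA = P.sum(axis=(1, 3))                       # P(x, a)
    I_XA = H(PXA.sum(1)) + H(PXA.sum(0)) - H(PXA)
    PXYA, PYA, PYAU = P.sum(3), P.sum((0, 3)), P.sum(0)
    I_XU_YA = H(PXYA) - H(PYA) - H(P) + H(PYAU)    # = H(X|Y,A) - H(X|Y,A,U)

    PXYU = P.sum(axis=2)                           # P(x, y, u)
    # E[d(X, Xhat_opt(U, Y))]: pick the best estimate for every pair (u, y)
    dist = sum(min(np.dot(PXYU[:, y, u], d[:, xh]) for xh in range(d.shape[1]))
               for y in range(nY) for u in range(nU))
    cost = float(np.dot(P.sum(axis=(0, 1, 3)), Delta))
    return I_XA + I_XU_YA, dist, cost
```

Sweeping over candidate pmfs $P_{U|X}$ in this way is only feasible for tiny alphabets; the BA-type algorithm of Section III is the efficient alternative.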


C. Optimal Coding Strategy

The proof of achievability of the rate-distortion-cost function in [1] shows that an optimal encoder has the structure illustrated in Fig. 2 and consists of the following two steps.

• Action Coding: The source sequence $X^n$ is mapped to an action sequence $A^n$. The action sequence is selected from a codebook $\mathcal{C}_A$ of about $2^{nI(X;A)}$ codewords, each with type approximately equal to $P_A$. The index $B^k$ identifies the selected codeword $A^n$, and hence consists of $k$, approximately equal to $nI(X;A)$, bits. The selection of $A^n$ is done with the aim of ensuring that $A^n$ and $X^n$ are jointly typical with respect to the joint pmf $P_{X,A}(x,a) = P_{A|X}(a|x)P_X(x)$.

• Source Coding: Given the action sequence $A^n$, a source codebook is chosen out of a set of around $2^{nI(X;A)}$ codebooks, one for each codeword in $\mathcal{C}_A$. Each codeword $U^n$ in the selected source codebook has a joint type with $A^n$ close to $P_{A,U}$, and the number of codewords is about $2^{nI(X;U|A)}$. The source sequence is mapped to a sequence $U^n$ taken from the selected codebook with the objective of ensuring that $X^n$, $A^n$ and $U^n$ are jointly typical with respect to the joint pmf $P_{X,A,U}(x,a,u)$. Each source codebook is divided into around $2^{nI(X;U|A,Y)}$ subcodebooks, or bins, in order to leverage the side information at the receiver using Wyner-Ziv decoding.

The message $M$ is given by the concatenation of the bits $B^k$ and $B_s^{k_s}$, and thus the overall rate of the action code and the source codes is given by (7). Upon receiving the message $M$ from the encoder, the decoder first reconstructs the action sequence $A^n$. The action sequence is used to measure the side information $Y^n$. As $A^n$ is known, the decoder also knows the source codebook from which $U^n$ is selected, and $U^n$ is then recovered by using Wyner-Ziv decoding based on the side information $Y^n$. In the end, the final estimate $\hat{X}^n$ is obtained as $\hat{X}_i = \hat{X}_{\mathrm{opt}}(U_i,Y_i)$ for $i\in[1,n]$.
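For completeness, a short identity that underlies the binning step above (implicit in [1]): the Markov chain $Y - (X,A) - U$ implied by (8) gives $H(U|X,A) = H(U|X,A,Y)$, so
\begin{align*}
I(X;U|A) - I(U;Y|A) &= \big[H(U|A) - H(U|X,A)\big] - \big[H(U|A) - H(U|A,Y)\big]\\
&= H(U|A,Y) - H(U|X,A,Y) = I(X;U|A,Y).
\end{align*}
Hence each bin contains about $2^{nI(U;Y|A)}$ codewords, which is exactly the ambiguity that Wyner-Ziv decoding can resolve using the side information $Y^n$, and the bin-index rate adds up to the second term of (7).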


III. COMPUTATION OF THE RATE-DISTORTION-COST FUNCTION

In this section, we first reformulate the problem in (7) by introducing Shannon strategies. This result is then used to propose a BA-type algorithm for the computation of the rate-distortion-cost function (7).

A. Shannon Strategies

We first observe that, from Lemma 1, it is sufficient to restrict the minimization to all joint distributions for which $A$ is a deterministic function $A = \eta(U)$. Moreover, the final estimate $\hat{X}$ in (11) is a function of both $U$ and $Y$. Based on these facts, we define a Shannon strategy $T\in\mathcal{T}\subseteq\hat{\mathcal{X}}^{|\mathcal{Y}|}\times\mathcal{A}$ as a vector of cardinality $|\mathcal{Y}|+1$, in which the first $|\mathcal{Y}|$ elements are indexed by the elements of $\mathcal{Y}$ with $T(y)\in\hat{\mathcal{X}}$ for $y\in\mathcal{Y}$, and the last element is denoted $a(T)\in\mathcal{A}$. We also define the disjoint sets $\mathcal{T}_a = \{t\in\mathcal{T} : a(t)=a\}$ for all actions $a\in\mathcal{A}$. The rate-distortion-cost function (7) can be restated in terms of the defined Shannon strategies as formalized in the next proposition.

Proposition 1. Let $T\in\mathcal{T}\subseteq\hat{\mathcal{X}}^{|\mathcal{Y}|}\times\mathcal{A}$ denote a Shannon strategy vector as defined above. The rate-distortion-cost function in (7) can be expressed as
\[
R(D,C) = \min\; I(X;a(T)) + I(X;T\,|\,Y,a(T)), \tag{12}
\]
where the joint pmf $P_{X,Y,T}$ is of the form
\[
P_{X,Y,T}(x,y,t) = P_X(x)\,P_{T|X}(t|x)\,P_{Y|A,X}(y|a(t),x), \tag{13}
\]
and the minimization is over all pmfs $P_{T|X}$ under the constraints
\[
\mathrm{E}[\Delta(A)] = \sum_{t\in\mathcal{T},x\in\mathcal{X}} P_X(x)P_{T|X}(t|x)\Delta(a(t)) \le C \tag{14}
\]
and
\[
\mathrm{E}[d(X,T(Y))] = \sum_{t\in\mathcal{T},x\in\mathcal{X},y\in\mathcal{Y}} P_{X,Y,T}(x,y,t)\,d(t(y),x) \le D. \tag{15}
\]

Moreover, the cardinality of the alphabet $\mathcal{T}$ can be restricted as $|\mathcal{T}| \le |\mathcal{X}||\mathcal{A}| + 2$.

Proof: Given an alphabet $\mathcal{U}$, a pmf $P_{U|X}$ and a function $\eta:\mathcal{U}\to\mathcal{A}$, the sum of the two mutual informations in (7) can be seen to be equal to the sum of the two mutual informations in (12), and the average distortion and cost in (9) and (10) to be equal to (15) and (14), respectively, by defining $P_{T|X}$ as follows. For each $u\in\mathcal{U}$, define a strategy $t$ with $P_{T|X}(t|x) = P_{U|X}(u|x)$ such that $a(t) = \eta(u)$ and $t(y) = \hat{X}_{\mathrm{opt}}(u,y)$ for $y\in\mathcal{Y}$.

Remark. The characterization in Proposition 1 generalizes the formulation of the Wyner-Ziv rate-distortion function in terms of Shannon strategies given in [2].

The following lemma extends to the rate-distortion-cost function $R(D,C)$ some well-known properties of the rate-distortion function (see, e.g., [7], [8]). This will be useful in the next section when discussing the computation of $R(D,C)$.

Lemma 2. The following properties hold for the rate-distortion-cost function $R(D,C)$:
1) $R(D,C)$ is non-increasing, convex and continuous for $D\in[0,\infty)$ and $C\in[0,\infty)$.
2) $R(D,C)$ is strictly decreasing in $D\in[0,D_{\max}(C)]$ and $R(D_{\max}(C),C)=0$, where
\[
D_{\max}(C) = \min_{P_T} \sum_{t\in\mathcal{T},x\in\mathcal{X},y\in\mathcal{Y}} P_{X,Y,T}(x,y,t)\,d(t(y),x) \tag{16}
\]
under the constraint
\[
\mathrm{E}[\Delta(a(T))] = \sum_{t\in\mathcal{T}} \Delta(a(t))\,P_T(t) \le C. \tag{17}
\]
3) For all $D\in[0,D_{\max}(C)]$, the minimum in (12) is attained when the distortion inequality (15) is satisfied with equality.

Proof: The lemma is proved by the arguments in [8, Lemma 10.4.1].
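The proof of Proposition 1 is constructive and can be mirrored directly in code. The sketch below (our own helper names; a strategy is represented as the tuple $(t(y_1),\dots,t(y_{|\mathcal{Y}|}),a(t))$) maps a candidate $(P_{U|X},\eta,\hat{X}_{\mathrm{opt}})$ to a pmf $P_{T|X}$ over Shannon strategies and evaluates the cost (14) and the distortion (15).

```python
import numpy as np
from collections import defaultdict

def to_shannon_strategies(PU_given_X, eta, xhat_opt):
    """Map (P_{U|X}, eta, Xhat_opt) to a pmf over Shannon strategies, following
    the proof of Proposition 1.  PU_given_X: (nU, nX); eta: (nU,) of action
    indices; xhat_opt: (nU, nY) with entries in Xhat (indices)."""
    nU, nX = PU_given_X.shape
    PT_given_X = defaultdict(lambda: np.zeros(nX))
    for u in range(nU):
        t = tuple(int(v) for v in xhat_opt[u]) + (int(eta[u]),)  # strategy of u
        PT_given_X[t] += PU_given_X[u]        # P_{T|X}(t|x) += P_{U|X}(u|x)
    return dict(PT_given_X)

def check_constraints(PT_given_X, PX, PY_given_XA, d, Delta):
    """Evaluate the cost (14) and the distortion (15) for a strategy pmf."""
    cost = dist = 0.0
    for t, p_tx in PT_given_X.items():
        a = t[-1]
        for x, px in enumerate(PX):
            cost += px * p_tx[x] * Delta[a]
            for y in range(PY_given_XA.shape[0]):
                dist += px * p_tx[x] * PY_given_XA[y, x, a] * d[x, t[y]]
    return cost, dist
```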

B. Computation of the Rate-Distortion-Cost Function

Algorithm 1  BA-type Algorithm for Computation of the Rate-Distortion-Cost Function
input: Lagrange multipliers $s\le 0$ and $m\le 0$.
output: $R(D_{s,m},C_{s,m})$ with $C_{s,m}$ and $D_{s,m}$ as in (19)-(20).
initialize: $P_{T|X}$
repeat
  Compute $Q_A$ as in (25).
  Compute $Q_{T,Y}$ as in (26).
  Minimize $F(P_{T|X},Q_{T,Y},Q_A)$ with respect to $P_{T|X}$ using Algorithm 2.
until convergence
$P^*_{T|X} \leftarrow P_{T|X}$

In order to derive a BA-type algorithm to solve the problem in (12), we introduce Lagrange multipliers $m$ for the cost constraint in (14) and $s$ for the distortion constraint (15). The following proposition provides a parametric characterization of the rate-distortion-cost function in terms of the pair $(s,m)$.

Proposition 2. For each $s\le 0$ and $m\le 0$, define the rate-distortion-cost tuple $(R_{s,m},D_{s,m},C_{s,m})$ via the following equations
\[
R_{s,m} = sD_{s,m} + mC_{s,m} + \min_{P_{T|X}}\Big\{ I(X;A) + I(X;T|Y,a(T)) - s\,\mathrm{E}\big[d(X,T(Y))\big] - m\,\mathrm{E}\big[\Delta(a(T))\big] \Big\}, \tag{18}
\]
\[
C_{s,m} = \sum_{t\in\mathcal{T},x\in\mathcal{X}} P_X(x)\,P^*_{T|X}(t|x)\,\Delta(a(t)), \tag{19}
\]
\[
D_{s,m} = \sum_{t\in\mathcal{T},x\in\mathcal{X},y\in\mathcal{Y}} P_X(x)\,P^*_{T|X}(t|x)\,P_{Y|X,A}(y|x,a(t))\,d(t(y),x), \tag{20}
\]
where $P^*_{T|X}$ denotes a minimizing pmf $P_{T|X}$ for the optimization problem in (18). Then, the following facts hold:

1) The tuple $(R_{s,m},D_{s,m},C_{s,m})$ lies on the rate-distortion-cost function, i.e.,
\[
R_{s,m} = R(D_{s,m},C_{s,m}). \tag{21}
\]
2) Every point $(R,D,C)$ on the rate-distortion-cost function for $D\in[0,D_{\max}(C)]$ can be written as (18)-(20) for some $s\le 0$ and $m\le 0$.
3) The rate-distortion-cost function is given as
\[
R(D,C) = \max_{s\le 0,\, m\le 0} \big( R_{s,m} + s(D-D_{s,m}) + m(C-C_{s,m}) \big). \tag{22}
\]

Proof: The proposition above follows by strong duality as guaranteed by Slater's condition [9, Section 5.2.3], and can also be derived directly as in [7].

Given the proposition above, one can trace the rate-distortion-cost function by solving problem (18) and using (19) and (20) for all $s\le 0$ and $m\le 0$. Inspired by the standard BA approach, we now show that problem (18) can be solved by using alternate optimization with respect to $P_{T|X}$ and appropriately defined auxiliary pmfs $Q_{T,Y}$ and $Q_A$. To do this, we define the function $F(\cdot)$ of $P_{T|X}$ and the auxiliary pmfs $Q_{T,Y}$ and $Q_A$ as in (23),
\[
F(P_{T|X},Q_{T,Y},Q_A) = D_{KL}(P_{Y,A}\|Q_A) - \sum_{x\in\mathcal{X},y\in\mathcal{Y},t\in\mathcal{T}} P_{X,Y,T}(x,y,t)\log P_{Y|X,A}(y|x,a(t)) + \sum_{x\in\mathcal{X}} P_X(x)\,D_{KL}\big(P_{Y,T|X}(\cdot,\cdot\,|x)\,\|\,Q_{T,Y}\big) - s\sum_{t\in\mathcal{T},x\in\mathcal{X},y\in\mathcal{Y}} P_{X,Y,T}(x,y,t)\,d(t(y),x) - m\sum_{t\in\mathcal{T},x\in\mathcal{X}} \Delta(a(t))\,P_X(x)\,P_{T|X}(t|x), \tag{23}
\]

where $D_{KL}(P\|Q)$ denotes the Kullback-Leibler (KL) divergence, defined as $D_{KL}(P\|Q) = \sum_i P(i)\log_2\frac{P(i)}{Q(i)}$ for pmfs $P$ and $Q$ [8], and $P_{X,Y,T}$, $P_{Y,T|X}$ and $P_{Y,A}$ are calculated from the joint pmf (13). We then have the following result.

Proposition 3. For any $s\le 0$ and $m\le 0$, we have
\[
R(D_{s,m},C_{s,m}) = sD_{s,m} + mC_{s,m} + \min_{P_{T|X},\,Q_{T,Y},\,Q_A} F(P_{T|X},Q_{T,Y},Q_A), \tag{24}
\]

with (19)-(20), where the distribution $P^*_{T|X}$ denotes a minimizing distribution in (24). Moreover, the function $F(P_{T|X},Q_{T,Y},Q_A)$ is jointly convex in the pmfs $P_{T|X}$, $Q_{T,Y}$ and $Q_A$.

Proof: The proof technique for the first part is due to [10], and is based on showing that the pmf $Q_A$ minimizing $F(\cdot)$ for fixed $Q_{T,Y}$ and $P_{T|X}$ is
\[
Q_A(a) = \sum_{x\in\mathcal{X},\,t\in\mathcal{T}_a} P_X(x)\,P_{T|X}(t|x) = P_A(a), \tag{25}
\]
and the pmf $Q_{T,Y}$ minimizing $F(\cdot)$ for fixed $Q_A$ and $P_{T|X}$ is given by
\[
Q_{T,Y}(t,y) = \sum_{x\in\mathcal{X}} P_X(x)\,P_{Y|X,A}(y|x,a(t))\,P_{T|X}(t|x) = P_{T,Y}(t,y). \tag{26}
\]
The convexity of the function $F(\cdot)$ follows from the log-sum inequality [8].

Based on Proposition 3, the proposed BA-type algorithm for computation of the rate-distortion-cost function consists of alternately minimizing (24) with respect to $P_{T|X}$, $Q_{T,Y}$ and $Q_A$. Due to the convexity of (24), the algorithm is known to converge to the optimal point, similarly to [2]. The proposed algorithm is summarized in Algorithm 1. The step of minimizing $F(P_{T|X},Q_{T,Y},Q_A)$ with respect to $P_{T|X}$ is discussed in the rest of this section.
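Since (25) and (26) are available in closed form, the outer loop of Algorithm 1 is straightforward to implement. The following Python sketch (our own; the array shapes, names and the stub minimize_F_over_PT are assumptions, the latter standing for the inner step developed in the next subsection) alternates the three minimizations in (24).

```python
import numpy as np

def ba_outer_loop(PX, PY_given_XA, a_of_t, minimize_F_over_PT,
                  PT_given_X0, num_iter=200):
    """Alternating minimization of F in (24), cf. Algorithm 1.
    PX: (nX,), PY_given_XA: (nY, nX, nA), a_of_t: (nT,) int array with the
    action a(t) of each strategy, PT_given_X0: (nT, nX) initial pmf,
    minimize_F_over_PT: callable implementing the inner step (Algorithm 2),
    taking (QA, QTY) and returning an updated P_{T|X}."""
    PT_given_X = PT_given_X0.copy()
    nA = PY_given_XA.shape[2]
    for _ in range(num_iter):
        PT_X = PT_given_X * PX[None, :]                 # P(t, x)
        # (25): Q_A(a) = sum_{x, t in T_a} P_X(x) P_{T|X}(t|x) = P_A(a)
        QA = np.array([PT_X[a_of_t == a].sum() for a in range(nA)])
        # (26): Q_{T,Y}(t,y) = sum_x P_X(x) P_{Y|X,A}(y|x,a(t)) P_{T|X}(t|x)
        QTY = np.einsum('tx,yxt->ty', PT_X, PY_given_XA[:, :, a_of_t])
        # inner step: minimize F over P_{T|X} for fixed Q_A and Q_{T,Y}
        PT_given_X = minimize_F_over_PT(QA, QTY)
    return PT_given_X, QA, QTY
```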

C. Minimizing F over P_{T|X}

To minimize the function $F(P_{T|X},Q_{T,Y},Q_A)$ with respect to $P_{T|X}$ for fixed $Q_A$ and $Q_{T,Y}$, we add a Lagrange multiplier $\lambda_x$ for each equality constraint $\sum_{t\in\mathcal{T}} P_{T|X}(t|x)=1$ with $x\in\mathcal{X}$, and resort to the KKT conditions as necessary and sufficient conditions for optimality. This property of the KKT conditions follows by strong duality due to the validity of Slater's conditions for the problem [9, Section 5.2.3]. We assume $P_X(x)>0$ without loss of generality, since values of $x$ with $P_X(x)=0$ can be removed from the alphabet $\mathcal{X}$.

By strong duality, we obtain the following optimization problem
\[
\min_{\substack{P_{T|X}\ge 0 \\ \sum_{t\in\mathcal{T}} P_{T|X}(t|x)=1}} F(P_{T|X},Q_A,Q_{T,Y}) = \max_{\{\lambda_x\}\in\mathbb{R}^{|\mathcal{X}|}}\; \min_{P_{T|X}\ge 0} \left( F(P_{T|X},Q_A,Q_{T,Y}) + \sum_{x\in\mathcal{X}} \lambda_x\Big(\sum_{t\in\mathcal{T}} P_{T|X}(t|x) - 1\Big) \right). \tag{27}
\]

In the proposed approach, the outer maximization in (27) is performed using the standard subgradient method. The inner minimization is instead performed by finding the stationary points of the function. This leads to the system of equalities $g_{a|x}(P_{A|X},\mu_x) = P_{A|X}(a|x)$ for $a\in\mathcal{A}$ and $x\in\mathcal{X}$, with
\[
g_{a|x}(P_{A|X},\mu_x) = P_{A|X}(a|x)^{\beta}\left(\frac{2^{\mu_x}\,\alpha_{a,x}}{\prod_{y\in\mathcal{Y}}\Big[\sum_{\tilde{x}\in\mathcal{X}} P_X(\tilde{x})\,P_{Y|X,A}(y|\tilde{x},a)\,P_{A|X}(a|\tilde{x})\Big]^{P_{Y|X,A}(y|x,a)}}\right)^{1-\beta}, \tag{28}
\]
where
\[
\alpha_{t,x} = Q_A(a(t))\,2^{m\Delta(a(t))}\cdot 2^{\sum_{y\in\mathcal{Y}} P_{Y|X,A}(y|x,a(t))\left[s\,d(t(y),x)+\log Q_{T,Y}(t,y)\right]}, \tag{29}
\]
\[
\alpha_{a,x} = \sum_{t\in\mathcal{T}_a} \alpha_{t,x}, \tag{30}
\]
and $\beta\in(0,1)$ is a parameter of the algorithm (see Appendix A).

Proposition 4. The algorithm described in Algorithms 1 and 2 converges to the rate-distortion-cost function $R(D_{s,m},C_{s,m})$ for all $s\le 0$ and $m\le 0$.

Proof: See Appendix A.

IV. CODE DESIGN

In this section, we consider the design of specific encoders and decoders for the source coding problem with action-dependent side information. The goal is to design codes that perform close to the rate-distortion-cost function given in Lemma 1 for some fixed pmf in (8) (or, equivalently, in Proposition 1 for some fixed pmf $P_{X,Y,T}$).


Algorithm 2  Algorithm for Minimization of F with respect to P_{T|X}
input: $Q_{T,Y}$ and $Q_A$.
output: $P^*_{T|X}$.
parameters: Subgradient weights $\theta_i = 1/i$, $i\in\mathbb{N}$, and constant $\beta\in(0,1)$.
initialization: $i=0$; $\mu_x^{(0)} = 1$ for $x\in\mathcal{X}$; $P_{A|X}^{(0)}(a|x) = 1/|\mathcal{T}|$ for $a\in\mathcal{A}$, $x\in\mathcal{X}$.
repeat
  Perform fixed-point iterations on the system $P_{A|X}(a|x) = g_{a|x}(P_{A|X},\mu_x)$ for $a\in\mathcal{A}$ and $x\in\mathcal{X}$, with starting point $P_{A|X}^{(i)}$, until convergence, to obtain $P_{A|X}^{(i+1)}$.
  Update the subgradients as $\mu_x^{(i+1)} = \mu_x^{(i)} + \frac{\theta_i}{P(x)}\big(1 - \sum_{a\in\mathcal{A}} P_{A|X}^{(i+1)}(a|x)\big)$ for $x\in\mathcal{X}$.
  $i \leftarrow i+1$.
until convergence
Compute $P^*_{T|X}(t|x) = \frac{\alpha_{t,x}}{\alpha_{a(t),x}}\,P_{A|X}^{(i)}(a(t)|x)$.

(b) Decoder

Code design for source coding problems with action-dependent side information. The illustration is for A = {0, 1}.

A. Achievability via Multiplexing As explained in Section II-C, the achievability proof in [1] is based on an action codebook CA for the action sequences An of about 2nI(X;A) codewords and 2nI(X;A) source codebooks of about 2nI(X;U |A) codewords for the sequences U n , where each source codebook corresponds to an action sequence An . We also recall that binning is performed on the source codebooks in January 29, 2013

DRAFT

14

order to reduce the rate. Here, we first observe that the code design can be simplified without loss of optimality by using the encoder and decoder structures in Fig. 3. Accordingly, as in [1], the action encoder selects the action sequence An , and the corresponding index B k , from the codebook CA to the decoder, where k = dnI(X; A)e. However, rather than using 2nI(X;A) source codebooks, we utilize only |A| source codebooks Cs,a , a ∈ A. Specifically, the source codebook Cs,a has about 2nPA (a)I(X;U |A=a) codewords, and each codeword in codebook Cs,a has a length of na = dn(PA (a) + ε)e symbols for some ε > 0. To elaborate, as seen in Fig. 3(a), after action encoding, which takes place as in [1], the source X n is demultiplixed into |A| subsequences, such that the a-th subsequence Xana contains all symbols Xi for which Ai = a. Therefore, for sufficiently large n, by the law of large numbers, the number of symbols in Xana is less than na with high probability. Appropriate padding is then used to make the length of the sequence exactly na symbols. The a-th subsequence Xana is then compressed using the codebook Cs,a with the objective of ensuring that Xana and Uana are jointly typical with respect to the pmf PX,U |A (·, ·|a). Binning is performed on each source codebook so that the number of bins is 2nPA (a)I(X;U |Y,A=a) . The bin index Baka of Uana is thus of ka = dnPA (a)I(X; U |Y, A = a)e bits. Overall, the rate of the message M , consisting of ka for the source codes with a ∈ A, is I(X; A) + the indices B k for the action code and Bs,a P a∈A PA (a)I(X; U |Y, A = a) = I(X; A) + I(X; U |A, Y ) as desired.

At the decoder, as seen in Fig. 3(b), the action sequence An is reconstructed and is used to measure the side information Y n . The side information Y n is demultiplexed into |A| subsequences, such that the a-th subsequence Yana contains all symbols Yi for which Ai = a. Each of the subsequences Uana are then reconstructed by using Wyner-Ziv decoding based on the message bits Baka and the side information Yana , and the reconstructed source subsequences ˆ a,i are obtained as X ˆ a,i = X ˆ opt (Ua,i , Ya,i ) for i ∈ [1, na ], where X ˆ a,i denotes the i-th symbol X ˆ n is obtained by multiplexing the of the sequence Xana . Finally, the source reconstruction X DRAFT

January 29, 2013

15

ˆ na for a ∈ A. subsequences X a Remark. The proposed code structure also applies to the classical successive refinement problem [11] and can be used to simplify the code design proposed in [12]. B. The Action Code Based on the encoder structure in Fig. 3(a), we discuss the specific design of the action encoder. The action code CA has to ensure that the codewords An approximately have the type PA , and the action encoder must obtain a codeword An that is jointly typical with respect to the joint pmf PX,A . These conditions are satisfied by optimal source codes [4]. Optimal source codes can be designed using LDGM codes or polar codes as shown in [13] and [4], respectively. Here, we adopt LDGM codes as proposed in [13], [14]. Specifically, in the following, we define an encoder based on message passing. This uses ideas from [13] to handle the general alphabet and pmf PA , and from [14] to implement message passing and decimation. The key difference with respect to [14] is that there the goal of the encoder is to minimize the Hamming distance, while the aim in this paper is to find an action sequence that is jointly typical with the source. We use the code described by the factor graph in Fig. 4. The bottom section of the graph is a LDGM code (see, e.g. [13]). The sequence B k denotes the message bits with k = dnI(X; A)e and {gκ,l : κ ∈ [1, d], l ∈ [1, n]} denote the check variables of the LDGM code, where the choice of d is explained later. The objective of the mappings ψl : {0, 1}d × A → {0, 1} for l ∈ [1, n] is to ensure that the types of the codewords, or action variables, are approximately equal to PA [13]. Specifically, each mapping ψl applies to the subset of check variables {gκ,l }κ∈[1,d] and to the symbol al and is defined in terms of a mapping φ : {0, 1}d → A as ψl ({gκ,l }κ∈[1,d] , a) = 1{φ({gκ,l }κ∈[1,d] )=a} .

(31)

Following [13], the value of d ∈ Z+ is chosen such that there are integers νa for a ∈ A satisfying X νa νa = 2d and PA (a) ≈ d . (32) 2 a∈A January 29, 2013

DRAFT

16

Fig. 4.

Factor graph defining the action encoder.

The mapping φ is then arbitrarily chosen such that exactly va of the 2d binary sequences {gκ,l }κ∈[1,d] map to a. Given the source sequence X n , the encoder runs the sum-product algorithm with decimation as in [14] in order to obtain the message bits B k , and hence the action sequence An (see [4] for a discussion of the role of decimation in source coding problems).

C. The Source Codes Based on the proposed encoder structure in Fig. 3(a), the design of each source code Cs,a for a ∈ A is equivalent to optimal codes for classical Wyner-Ziv problems. In the special case where Xˆ = {0, 1}, and the distortion metric is Hamming, the coding problem reduces to the binary Wyner-Ziv problem with Hamming distortion which was studied in [15], [4].

V. N UMERICAL E XAMPLES To exemplify the problems of interest and to demonstrate the tools developed in this paper, we consider the source coding problem with action-dependent side information depicted in Fig. 5 DRAFT

January 29, 2013

17

and described in the following. Let X ∈ X = [1, K + 1] be a random variable with pmf    1−q if x ∈ [1, K] K , PX (x) =   q if x = K + 1

(33)

for q ∈ [0, 1]. The letters 1, . . . , K denote source outcomes that are relevant for the decoder, and thus should ideally be distinguishable by the latter, while the letter x = K + 1 represents a source outcome that is irrelevant for the decoder. Examples where this situation arises includes monitoring systems in which the decoder wishes to recover the values of a physical quantity only when above, or below, a certain pre-determined threshold. To account for this requirement, the distortion function is given by d(x, xˆ) = 1{x6=xˆ and x∈[1,K]}

(34)

i.e., the decoder is only penalized if it makes an error when x is a relevant letter. At each time i, the decoder can choose an action Ai ∈ {0, 1}, such that, if Ai = 0, the side information is given by Yi = e, where e denotes an erasure symbol, and if Ai = 1, the side information is given by Yi = Y˜i , where Y˜i is the output of an erasure channel in which Y˜ = X ∪ {e} and    p for y˜ = e    y |x) = PY˜ |X (˜ 1 − p for y˜ = x ,      0 otherwise

(35)

where p ∈ (0, 1) is the erasure probability. The action cost function ∆(·) is given by ∆(a) =

1{a=1} , which implies that the cost constraint with 0 ≤ C ≤ 1 enforces that no more than nC samples of the side information Y˜ n can be measured by the receiver.

A. Computation of the Rate-Distortion-Cost Function We apply the proposed BA-type algorithm to the described scenario in order to compute the rate-distortion-cost function. For reference, we also consider the simplified strategy, in which the January 29, 2013

DRAFT

18

Fig. 5.

The action-dependent source coding problem.

0.7 0.6 C=0.25

Rate

0.5

p=0.0 (adaptive) p=0.1 (adaptive) p=0.0 (non−adaptive) p=0.1 (non−adaptive)

0.4 C=0.50 0.3 0.2 C=0.75 0.1 0

Fig. 6.

0.02

0.04

0.06 Distortion

0.08

0.1

0.12

Computed rate-distortion-cost function R(D, C) for K = 4, erasure probability p ∈ {0.0, 0.1} and q = 12 .

Fig. 6 shows $R(D,C)$ for $K=4$, $q=1/2$ and $p\in\{0, 0.1\}$ with both adaptive and non-adaptive actions. We see that, for the given scenario, adaptive actions achieve significant gains in comparison to non-adaptive actions. Moreover, the effect of the erasures decreases as the action cost decreases, due to the reduced availability of the side information at the decoder.

B. Code Design

We now turn to the issue of code design for this scenario. We consider the case in which $p=0$, so that the measured side information is noiseless, and we adopt the code design proposed in Section IV.

We start with some analytical considerations on the rate-distortion-cost function that will be useful for designing the codes. By symmetry, the pmf $P_{A|X}$ can be written as
\[
P_{A|X}(a|x) = \begin{cases} \dfrac{C-q\gamma}{1-q} & \text{if } a=1 \text{ and } x\in[1,K]\\[4pt] \dfrac{1-q-C+q\gamma}{1-q} & \text{if } a=0 \text{ and } x\in[1,K]\\[4pt] \gamma & \text{if } a=1 \text{ and } x=K+1\\[2pt] 1-\gamma & \text{if } a=0 \text{ and } x=K+1, \end{cases} \tag{36}
\]

where $\gamma\in\big[0,\min(1,\tfrac{C}{q})\big]$ is a parameter to be determined. The mutual information $I(X;A)$ can thus be computed in terms of $P_{A|X}$ and $P_X$, and the rate-distortion-cost function in (7) is then obtained via the following optimization problem
\[
R(D,C) = \min_{\gamma\in[0,\min(1,C/q)]} I(X;A) + (1-C)\,\bar{R}\Big(\frac{D}{1-C},\,P_{X|A=0}\Big), \tag{37}
\]
where $\bar{R}(D,P_X)$ is the classical rate-distortion function of a memoryless source with pmf $P_X$. Note that we have used the fact that $I(X;U|Y,A=1)=0$, since $Y=X$ for $A=1$. From (37), it is seen that we only need to design an action code and the source code $\mathcal{C}_{s,0}$, where the latter is a classical rate-distortion code. For the action code, we use the approach proposed in Section IV, and for the source code we use the related LDGM scheme proposed in [13].

We consider the case where $q=\tfrac{1}{2}$ and $K=4$, which yields $d=2$ for both the action code $\mathcal{C}_A$ and the source code $\mathcal{C}_{s,0}$. We fix a blocklength of $n = 10\,000$, yielding LDGM codes of blocklength $20\,000$. Each point is averaged over 50 source realizations and LDGM codes. For both codes, we use the sum-product algorithm with decimation in [14]. As in [14], we use damping after 30 iterations, and the maximum number of iterations is set to 100. Nodes are decimated if their log-likelihood ratios are larger than 2. Suitable irregular degree distributions optimized for the AWGN channel are obtained from [16]. The results are shown in Fig. 7. It is seen that the resulting distortions are close to the lower bounds for both the adaptive and the non-adaptive action strategies.

Fig. 7.  Rate-distortion-cost function (lines) compared to the performance of the proposed code design (markers) with both adaptive and non-adaptive actions, for $C\in\{0.25, 0.50, 0.75\}$ (rate versus distortion).

Moreover, the theoretical gains of the adaptive action strategy over the non-adaptive one are confirmed by the practical implementation.
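As a small numerical check on (36) (our own script; it evaluates only the cost and the first term of (37), not the full minimization over γ), one can verify that the pmf in (36) meets the cost constraint with equality for every admissible γ and inspect how $I(X;A)$ varies with γ.

```python
import numpy as np

def IXA_and_cost(gamma, C=0.5, q=0.5, K=4):
    """I(X;A) in bits and E[Delta(A)] for the pmf P_{A|X} in (36)."""
    PX = np.array([(1 - q) / K] * K + [q])
    p1 = np.array([(C - q * gamma) / (1 - q)] * K + [gamma])   # P(A=1 | x)
    PA_given_X = np.vstack([1 - p1, p1])                       # rows: a = 0, 1
    PXA = PA_given_X * PX[None, :]                             # P(a, x)
    PA = PXA.sum(axis=1)
    with np.errstate(divide='ignore', invalid='ignore'):
        ratio = np.where(PXA > 0, PXA / (PA[:, None] * PX[None, :]), 1.0)
    I = float(np.sum(PXA * np.log2(ratio)))
    return I, float(PA[1])                                     # E[Delta(A)] = P(A=1)

for g in np.linspace(0.0, 1.0, 5):                             # admissible since C/q = 1 here
    I, cost = IXA_and_cost(g)
    print(f"gamma={g:.2f}  I(X;A)={I:.4f}  E[Delta]={cost:.4f}")  # cost stays at C
```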

VI. CONCLUSION

In this paper, we have considered the computation of the rate-distortion-cost function and code design for source coding problems with action-dependent side information. We have formulated the problem using Shannon strategies and proposed a BA-type algorithm that efficiently computes the rate-distortion-cost function. Convergence of this algorithm was proved. Moreover, we proposed a code design based on multiplexing that was shown, via numerical results, to perform close to the rate-distortion bound.

APPENDIX A
PROOF OF PROPOSITION 4

The BA-type algorithm detailed in Algorithms 1 and 2 is based on alternately optimizing $F(\cdot)$ in (23) with respect to $P_{T|X}$, $Q_A$ and $Q_{T,Y}$. Given the convexity of this function, shown in Proposition 3, this procedure is known to converge [17]. The optimization with respect to $Q_A$ for fixed $P_{T|X}$ and $Q_{T,Y}$, and with respect to $Q_{T,Y}$ for fixed $P_{T|X}$ and $Q_A$, are performed as in the proof of Proposition 3. Therefore, the proof is concluded once it is demonstrated that the procedure of Algorithm 2 converges to an optimal $P_{T|X}$ for fixed $Q_A$ and $Q_{T,Y}$. This is discussed next.

The procedure in Algorithm 2 for the optimization with respect to $P_{T|X}$ for fixed $Q_A$ and $Q_{T,Y}$ is based on the dual minimization (27), via an outer loop that performs subgradient iterations and an inner loop that performs fixed-point iterations to obtain a stationary point of the Lagrangian function (38) (see below). We first show that this nested-loop procedure obtains an optimal solution $P_{T|X}$ of the dual problem, and then argue that this is also a solution of the original primal problem.

Convergence of the outer loop follows immediately from the well-known properties of the subgradient approach for weights selected as $\theta_i = 1/i$ [17]. Note that the quantities $1 - \sum_{a\in\mathcal{A}} P^{(i)}(a|x)$ for $x\in\mathcal{X}$ are the subgradients with respect to $\lambda_x$ of the dual function given by the minimization in (27) [18]. Therefore, by defining $\mu_x = -\lambda_x/P(x)$, the updates of the variables $\mu_x^{(i)}$ in Algorithm 2 can be seen to correspond to the classical subgradient updates. Given the known convergence properties of the subgradient method with the weights as in Algorithm 2, the outer maximization converges [17].

Next, we need to show that we can solve the inner minimization in (27) by using the fixed-point iterations in (47) (see below). It is first shown that we can solve the minimization problem by solving a system of stationarity equations for $P(a|x)$, $a\in\mathcal{A}$, $x\in\mathcal{X}$. Then, we conclude the proof using the Banach fixed-point theorem [19]. The Lagrangian to be minimized is given by (cf. (27))
\[
L(P_{T|X},\{\lambda_x\}) = F(P_{T|X},Q_A,Q_{T,Y}) + \sum_{x\in\mathcal{X}} \lambda_x\Big(\sum_{t\in\mathcal{T}} P_{T|X}(t|x) - 1\Big). \tag{38}
\]

It is noted that the function $L$ is coercive in $P_{T|X}$, and hence by the Weierstrass theorem [20] a minimizer of $L$ exists. The minimizer must be a stationary point, i.e., it must satisfy the KKT conditions [9, Section 5.5.3]. We obtain the stationarity conditions by differentiating (38) with respect to $P(t|x)$ and equating to zero, leading to

\[
\log P(t|x) + \sum_{y\in\mathcal{Y}} P(y|x,a(t))\log P(y,a(t)) = m\Delta(a(t)) + \sum_{y\in\mathcal{Y}} P(y|x,a(t))\big[s\,d(t(y),x) + \log Q(t,y) + \log Q(a(t))\big] + \mu_x, \tag{39}
\]
or, equivalently,
\[
\log P(t|x) + \sum_{y\in\mathcal{Y}} P(y|x,a(t))\log P(y,a(t)) = \log\alpha_{t,x} + \mu_x, \tag{40}
\]

where $\alpha_{t,x}$ is given in (29) and $P(y,a)$ is calculated from the joint pmf in (13). We can then rewrite (40) by applying the exponential function to both sides and solving for $P(t|x)$,
\[
P(t|x) = \frac{2^{\mu_x}\,\alpha_{t,x}}{\prod_{y\in\mathcal{Y}}\Big[\sum_{\tilde{x}\in\mathcal{X}} P(\tilde{x})\,P(a(t)|\tilde{x})\,P(y|\tilde{x},a(t))\Big]^{P(y|x,a(t))}}, \tag{41}
\]

where $\alpha_{t,x}$ is given in (29). Note that the right-hand side only depends on $P_{T|X}$ through $P_{A|X}$, and hence, by computing $P(a|x)$ for $a\in\mathcal{A}$ and $x\in\mathcal{X}$, $P(t|x)$ can be calculated. By summing (41) over $t\in\mathcal{T}_a$, we obtain
\[
P(a|x) = \frac{2^{\mu_x}\,\alpha_{a,x}}{\prod_{y\in\mathcal{Y}}\Big[\sum_{\tilde{x}\in\mathcal{X}} P(\tilde{x})\,P(y|\tilde{x},a)\,P(a|\tilde{x})\Big]^{P(y|x,a)}}, \tag{42}
\]

where $\alpha_{a,x}$ is given in (30). Given $\{\mu_x\}$, the equalities in (42) for $a\in\mathcal{A}$ and $x\in\mathcal{X}$ form a system of $|\mathcal{A}||\mathcal{X}|$ nonlinear equations with $|\mathcal{A}||\mathcal{X}|$ unknowns, namely the $|\mathcal{A}||\mathcal{X}|$ values $P(a|x)$ for $a\in\mathcal{A}$ and $x\in\mathcal{X}$. By solving for $P(a|x)$, we can compute $P(t|x)$ as in (41). Note that the constants $\alpha_{a,x}$ are sums of exponential functions, and hence the $P(a|x)$ in (42) are strictly positive for $a\in\mathcal{A}$ and $x\in\mathcal{X}$. Now, define
\[
h_{a|x}(P_{A|X},\mu_x) = \frac{2^{\mu_x}\,\alpha_{a,x}}{\prod_{y\in\mathcal{Y}}\Big[\sum_{\tilde{x}\in\mathcal{X}} P_{X,A,Y}(\tilde{x},a,y)\Big]^{P_{Y|X,A}(y|x,a)}}, \tag{43}
\]
\[
H_{a|x}(\mathbf{q},\mu_x) = \log h_{a|x}(2^{\mathbf{q}},\mu_x), \tag{44}
\]
and
\[
G_{a|x}(\mathbf{q},\mu_x) = \log g_{a|x}(2^{\mathbf{q}},\mu_x) = \beta\,q_{a|x} + (1-\beta)\,H_{a|x}(\mathbf{q},\mu_x), \tag{45}
\]

where $\mathbf{q}\in\mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ and $2^{\mathbf{q}}\in\mathbb{R}_+^{|\mathcal{A}||\mathcal{X}|}$ are the vectors corresponding to the elements $q_{a|x} = \log P(a|x)$ and $P(a|x)$, respectively, and $\beta\in(0,1)$. Moreover, let $\mathbf{G}(\mathbf{q},\{\mu_x\})\in\mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ denote the vector collecting the functions $G_{a|x}$ for $a\in\mathcal{A}$, $x\in\mathcal{X}$. With these definitions, it is now evident that (42) is equivalent to the following equation
\[
q_{a|x} = H_{a|x}(\mathbf{q},\{\mu_x\}). \tag{46}
\]

We now show that the fixed-point iteration of the form
\[
\mathbf{q}^{(k+1)} = \mathbf{G}(\mathbf{q}^{(k)},\{\mu_x\}) \tag{47}
\]
converges towards a fixed point $\mathbf{q}^*$, which is the unique fixed point of (46), for any $\beta\in(0,1)$. Recall that the existence of a fixed point $\mathbf{q}^*$ is guaranteed by the necessity of the KKT conditions and by the Weierstrass theorem. In the following, we apply the Banach fixed-point theorem. To this end, we have to demonstrate that there is a closed subset $\Omega\subseteq\mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$ such that the vector function $\mathbf{G}$ maps vectors $\mathbf{q}\in\Omega$ into $\Omega$ and is a contraction on $\Omega$. By the existence of a fixed point $\mathbf{q}^*$, we define the subset $\Omega$ as the closed ball
\[
\Omega = B_r(\mathbf{q}^*) = \big\{\mathbf{q}\in\mathbb{R}^{|\mathcal{A}||\mathcal{X}|} : \|\mathbf{q}-\mathbf{q}^*\|_\infty \le r\big\} \tag{48}
\]
for some $r > \|\mathbf{q}^{(0)}-\mathbf{q}^*\|_\infty$. In order to show that $\mathbf{G}$ maps $\Omega$ into $\Omega$ and is a contraction, we compute the partial derivatives of $H_{a|x}(\mathbf{q})$ and $G_{a|x}(\mathbf{q})$ as follows:

\[
\frac{\partial H_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{a'|x'}} = -1_{\{\tilde{a}=a'\}} \sum_{y\in\mathcal{Y}} P(y|\tilde{x},\tilde{a})\, \frac{P(x')\,P(y|x',a')\,2^{q_{a'|x'}}}{\sum_{x\in\mathcal{X}} P(x)\,P(y|x,\tilde{a})\,2^{q_{\tilde{a}|x}}} \tag{49}
\]
and
\[
\frac{\partial G_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{a'|x'}} = \beta\,1_{\{\tilde{a}=a' \text{ and } \tilde{x}=x'\}} + (1-\beta)\,\frac{\partial H_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{a'|x'}}. \tag{50}
\]

It is clear that the derivative $\frac{\partial H_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{\tilde{a}|\tilde{x}}}$ is strictly negative for $\mathbf{q}\in\mathbb{R}^{|\mathcal{A}||\mathcal{X}|}$, since $P(x)>0$, and it can be seen that
\[
\sum_{a'\in\mathcal{A},\,x'\in\mathcal{X}} \frac{\partial H_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{a'|x'}} = -1. \tag{51}
\]
Therefore, for $\beta\in(0,1)$, we must have that
\[
\sum_{a'\in\mathcal{A},\,x'\in\mathcal{X}} \left|\frac{\partial G_{\tilde{a}|\tilde{x}}(\mathbf{q})}{\partial q_{a'|x'}}\right| < 1. \tag{52}
\]

It follows that we can bound the $l_\infty$-norm of the Jacobian of $\mathbf{G}(\mathbf{q})$, $J_{\mathbf{G}}(\mathbf{q})$, as
\[
\|J_{\mathbf{G}}(\mathbf{q})\|_\infty < 1. \tag{53}
\]
By the definition of the $l_\infty$-norm and by the mean value theorem [19], there exist values $\tilde{a}\in\mathcal{A}$, $\tilde{x}\in\mathcal{X}$ and $\zeta\in(0,1)$ such that
\begin{align}
\|\mathbf{G}(\mathbf{q}^1)-\mathbf{G}(\mathbf{q}^2)\|_\infty &= |G_{\tilde{a}|\tilde{x}}(\mathbf{q}^1) - G_{\tilde{a}|\tilde{x}}(\mathbf{q}^2)| \tag{54a}\\
&\le \|\mathbf{q}^1-\mathbf{q}^2\|_\infty \sum_{a\in\mathcal{A},\,x\in\mathcal{X}} \left|\frac{\partial G_{\tilde{a}|\tilde{x}}}{\partial q_{a|x}}\big(\zeta\mathbf{q}^1 + (1-\zeta)\mathbf{q}^2\big)\right| \tag{54b}\\
&\le \|\mathbf{q}^1-\mathbf{q}^2\|_\infty \max_{\mathbf{q}\in\Omega} \|J_{\mathbf{G}}(\mathbf{q})\|_\infty \tag{54c}\\
&\le K\,\|\mathbf{q}^1-\mathbf{q}^2\|_\infty \tag{54d}
\end{align}

for $\mathbf{q}^1,\mathbf{q}^2\in\Omega$, where the last inequality follows from the fact that $\|J_{\mathbf{G}}(\mathbf{q})\|_\infty$ must attain a maximum value $K<1$ when $\mathbf{q}\in\Omega$, since $\Omega$ is closed and bounded, by the Weierstrass theorem. The chain of inequalities in (54) demonstrates that $\mathbf{G}$ is a contraction mapping. To show that $\mathbf{G}$ maps $\Omega$ into $\Omega$, suppose $\mathbf{q}\in\Omega$. Since $\Omega$ contains the fixed point $\mathbf{q}^*$, it is then seen that
\begin{align}
\|\mathbf{G}(\mathbf{q})-\mathbf{q}^*\|_\infty &= \|\mathbf{G}(\mathbf{q})-\mathbf{G}(\mathbf{q}^*)\|_\infty \tag{55}\\
&< \|\mathbf{q}-\mathbf{q}^*\|_\infty < r, \tag{56}
\end{align}
and hence $\mathbf{G}(\mathbf{q})\in\Omega$. By invoking the Banach fixed-point theorem, the fixed-point iteration defined by (47) converges to the unique fixed point $\mathbf{q}^*$. We finally observe that, since the fixed point is unique, the minimizer of the Lagrangian function $L$ is unique, and hence the optimal $P_{T|X}$ of the primal and the dual optimization problems coincide, thus concluding the proof.


REFERENCES

[1] H. H. Permuter and T. Weissman, "Source coding with a side information 'vending machine'," IEEE Trans. Inform. Theory, vol. 57, no. 7, pp. 4530-4543, Jul. 2011.
[2] F. Dupuis, W. Yu, and F. M. J. Willems, "Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information," in Proc. IEEE Int. Symp. Inform. Theory, Jun. 2004.
[3] E. Martinian and J. S. Yedidia, "Iterative quantization using codes on graphs," in Proc. Allerton Conf. Comm., Cont. and Comp., Monticello, IL, USA, Oct. 2003.
[4] S. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," IEEE Trans. Inform. Theory, vol. 56, no. 4, pp. 1751-1768, Apr. 2010.
[5] M. Marcellin and T. Fischer, "Trellis coded quantization of memoryless and Gauss-Markov sources," IEEE Trans. Comm., vol. 38, no. 1, pp. 82-93, Jan. 1990.
[6] M. Wainwright and E. Martinian, "Low-density graph codes that are optimal for binning and coding with side information," IEEE Trans. Inform. Theory, vol. 55, no. 3, pp. 1061-1079, Mar. 2009.
[7] R. M. Gray, Source Coding Theory. Kluwer Academic Publishers, 1990.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 2006.
[9] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2009.
[10] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. 18, no. 4, pp. 460-473, Jul. 1972.
[11] W. Equitz and T. Cover, "Successive refinement of information," IEEE Trans. Inform. Theory, vol. 37, no. 2, pp. 269-275, Mar. 1991.
[12] Y. Zhang, S. Dumitrescu, J. Chen, and Z. Sun, "LDGM-based codes for successive refinement," in Proc. 47th Annual Allerton Conf., Allerton House, UIUC, Illinois, USA, Oct. 2009, pp. 1518-1524.
[13] Z. Sun, M. Shao, J. Chen, K. Wong, and X. Wu, "Achieving the rate-distortion bound with low-density generator matrix codes," IEEE Trans. Comm., vol. 58, no. 6, pp. 1643-1653, Jun. 2010.
[14] T. Filler and J. Fridrich, "Binary quantization using belief propagation with decimation over factor graphs of LDGM codes," in Proc. Allerton Conf. Comm., Cont. and Comp., Monticello, IL, USA, 2007.
[15] M. Wainwright and E. Martinian, "Low-density graph codes that are optimal for binning and coding with side information," IEEE Trans. Inform. Theory, vol. 55, no. 3, pp. 1061-1079, Mar. 2009.
[16] University of Newcastle. (2012, Dec.) LOPT - online optimisation of LDPC and RA degree distributions. [Online]. Available: http://sonic.newcastle.edu.au/ldpc/lopt
[17] B. Polyak, Introduction to Optimization. Optimization Software, Inc., 1987.
[18] D. P. Bertsekas, Convex Optimization Theory: Supplementary Chapter 6 on Convex Optimization Algorithms. Athena Scientific, 2010.
[19] E. Süli and D. Mayers, An Introduction to Numerical Analysis. Cambridge University Press, 2003.
[20] D. P. Bertsekas, Convex Optimization Theory. Athena Scientific, 2009.
