Thomas_M.Cover Information Theory (English textbook) exercise solutions.doc


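2.2 Entropy of functions. Let $X$ be a random variable taking on a finite number of values. What is the (general) inequality relationship of $H(X)$ and $H(Y)$ if
(a) $Y = 2^X$?
(b) $Y = \cos X$?

Solution: Let $y = g(x)$. Then
$$p(y) = \sum_{x:\, y = g(x)} p(x).$$
Consider any set of $x$'s that map onto a single $y$. For this set,
$$\sum_{x:\, y = g(x)} p(x) \log p(x) \le \sum_{x:\, y = g(x)} p(x) \log p(y) = p(y) \log p(y),$$
since $\log$ is a monotone increasing function and $p(x) \le \sum_{x:\, y = g(x)} p(x) = p(y)$. Extending this argument to the entire range of $X$ (and $Y$), we obtain
$$H(X) = -\sum_x p(x) \log p(x) = -\sum_y \sum_{x:\, y = g(x)} p(x) \log p(x) \ge -\sum_y p(y) \log p(y) = H(Y),$$
with equality iff $g$ is one-to-one with probability one.
(a) $Y = 2^X$ is one-to-one, and hence the entropy, which is just a function of the probabilities, does not change, i.e., $H(X) = H(Y)$.
(b) $Y = \cos X$ is not necessarily one-to-one. Hence all that we can say is that $H(X) \ge H(Y)$, with equality if cosine is one-to-one on the range of $X$.

2.16 Example of joint entropy. Let $p(x, y)$ be given by

          Y = 0    Y = 1
  X = 0    1/3      1/3
  X = 1     0       1/3

Find
(a) $H(X)$, $H(Y)$.
(b) $H(X|Y)$, $H(Y|X)$.
(c) $H(X, Y)$.
(d) $H(Y) - H(Y|X)$.
(e) $I(X; Y)$.
(f) Draw a Venn diagram for the quantities in (a) through (e).

Solution:
Fig. 1. Venn diagram.
(a) $H(X) = \frac{2}{3} \log \frac{3}{2} + \frac{1}{3} \log 3 = 0.918$ bits $= H(Y)$.
(b) $H(X|Y) = \frac{1}{3} H(X|Y=0) + \frac{2}{3} H(X|Y=1) = 0.667$ bits $= H(Y|X)$.
(c) $H(X, Y) = 3 \times \frac{1}{3} \log 3 = 1.585$ bits.
(d) $H(Y) - H(Y|X) = 0.251$ bits.
(e) $I(X; Y) = H(Y) - H(Y|X) = 0.251$ bits.
(f) See Figure 1.

2.29 Inequalities. Let $X$, $Y$ and $Z$ be joint random variables. Prove the following inequalities and find conditions for equality.
(a) $H(X, Y|Z) \ge H(X|Z)$.
(b) $I(X, Y; Z) \ge I(X; Z)$.
(c) $H(X, Y, Z) - H(X, Y) \le H(X, Z) - H(X)$.
(d) $I(X; Z|Y) \ge I(Z; Y|X) - I(Z; Y) + I(X; Z)$.

Solution:
(a) Using the chain rule for conditional entropy,
$$H(X, Y|Z) = H(X|Z) + H(Y|X, Z) \ge H(X|Z),$$
with equality iff $H(Y|X, Z) = 0$, that is, when $Y$ is a function of $X$ and $Z$.
(b) Using the chain rule for mutual information,
$$I(X, Y; Z) = I(X; Z) + I(Y; Z|X) \ge I(X; Z),$$
with equality iff $I(Y; Z|X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.
(c) Using first the chain rule for entropy and then the definition of conditional mutual information,
$$H(X, Y, Z) - H(X, Y) = H(Z|X, Y) = H(Z|X) - I(Y; Z|X) \le H(Z|X) = H(X, Z) - H(X),$$
with equality iff $I(Y; Z|X) = 0$, that is, when $Y$ and $Z$ are conditionally independent given $X$.
(d) Using the chain rule for mutual information,
$$I(X; Z|Y) + I(Z; Y) = I(X, Y; Z) = I(Z; Y|X) + I(X; Z),$$
and therefore this inequality is actually an equality in all cases.

4.5 Entropy rates of Markov chains.
(a) Find the entropy rate of the two-state Markov chain with transition matrix
$$P = \begin{bmatrix} 1 - p_{01} & p_{01} \\ p_{10} & 1 - p_{10} \end{bmatrix}.$$
(b) What values of $p_{01}$, $p_{10}$ maximize the rate of part (a)?
(c) Find the entropy rate of the two-state Markov chain with transition matrix
$$P = \begin{bmatrix} 1 - p & p \\ 1 & 0 \end{bmatrix}.$$
(d) Find the maximum value of the entropy rate of the Markov chain of part (c).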
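We expect that the maximizing value of $p$ should be less than $1/2$, since the 0 state permits more information to be generated than the 1 state.

Solution:
(a) The stationary distribution is easily calculated:
$$\mu_0 = \frac{p_{10}}{p_{01} + p_{10}}, \qquad \mu_1 = \frac{p_{01}}{p_{01} + p_{10}}.$$
Therefore the entropy rate is
$$H(X_2|X_1) = \mu_0 H(p_{01}) + \mu_1 H(p_{10}) = \frac{p_{10} H(p_{01}) + p_{01} H(p_{10})}{p_{01} + p_{10}}.$$
(b) The entropy rate is at most 1 bit because the process has only two states. This rate can be achieved if (and only if) $p_{01} = p_{10} = 1/2$, in which case the process is actually i.i.d. with $\Pr(X_i = 0) = \Pr(X_i = 1) = 1/2$.
(c) As a special case of the general two-state Markov chain (with $p_{01} = p$ and $p_{10} = 1$), the entropy rate is
$$H(X_2|X_1) = \mu_0 H(p) + \mu_1 H(1) = \frac{H(p)}{p + 1}.$$
(d) By straightforward calculus, we find that the maximum value of $H(p)/(1 + p)$ of part (c) occurs for $p = (3 - \sqrt{5})/2 \approx 0.382$. At this $p$ we have $(1 - p)^2 = p$, so $H(p) = -(1 + p) \log(1 - p)$, and the maximum value is
$$\frac{H(p)}{1 + p} = -\log(1 - p) = \log \frac{1 + \sqrt{5}}{2} \approx 0.694 \text{ bits}.$$

5.4 Huffman coding. Consider the random variable

  Symbol        x1     x2     x3     x4     x5     x6     x7
  Probability   0.49   0.26   0.12   0.04   0.04   0.03   0.02

(a) Find a binary Huffman code for $X$.
(b) Find the expected codelength for this encoding.
(c) Find a ternary Huffman code for $X$.

Solution:
(a) The Huffman tree for this distribution gives, as one valid assignment, the codewords

  Symbol   Probability   Codeword
  x1       0.49          0
  x2       0.26          10
  x3       0.12          110
  x4       0.04          11100
  x5       0.04          11101
  x6       0.03          11110
  x7       0.02          11111

(b) The expected length of the codewords for the binary Huffman code is 2.02 bits. ($H(X) = 2.01$ bits.)
(c) The ternary Huffman tree gives, as one valid assignment, the codewords

  Symbol   Probability   Codeword
  x1       0.49          0
  x2       0.26          1
  x3       0.12          20
  x4       0.04          22
  x5       0.04          210
  x6       0.03          211
  x7       0.02          212

5.9 Optimal code lengths that require one bit above entropy. The source coding theorem shows that the optimal code for a random variable $X$ has an expected length less than $H(X) + 1$. Give an example of a random variable for which the expected length of the optimal code is close to $H(X) + 1$, i.e., for any $\epsilon > 0$, construct a distribution for which the optimal code has $L > H(X) + 1 - \epsilon$.

Solution: There is a trivial example that requires almost 1 bit above its entropy. Let $X$ be a binary random variable with probability of $X = 1$ close to 1. Then the entropy of $X$ is close to 0, but the length of its optimal code is 1 bit, which is almost 1 bit above its entropy.

5.25 Shannon code. Consider the following method for generating a code for a random variable $X$ which takes on $m$ values $\{1, 2, \ldots, m\}$ with probabilities $p_1, p_2, \ldots, p_m$. Assume that the probabilities are ordered so that $p_1 \ge p_2 \ge \cdots \ge p_m$. Define
$$F_i = \sum_{k=1}^{i-1} p_k,$$
the sum of the probabilities of all symbols less than $i$. Then the codeword for $i$ is the number $F_i \in [0, 1]$ rounded off to $l_i$ bits, where $l_i = \lceil \log \frac{1}{p_i} \rceil$.
(a) Show that the code constructed by this process is prefix-free and the average length satisfies $H(X) \le L < H(X) + 1$.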
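(b) Construct the code for the probability distribution (0.5, 0.25, 0.125, 0.125).

Solution:
(a) Since $l_i = \lceil \log \frac{1}{p_i} \rceil$, we have
$$\log \frac{1}{p_i} \le l_i < \log \frac{1}{p_i} + 1,$$
which implies that
$$H(X) \le L = \sum_i p_i l_i < H(X) + 1.$$
By the choice of $l_i$, we have $2^{-l_i} \le p_i < 2^{-(l_i - 1)}$. Thus $F_j$, $j > i$, differs from $F_i$ by at least $2^{-l_i}$, and will therefore differ from $F_i$ in at least one place in the first $l_i$ bits of the binary expansion of $F_i$. Thus the codeword for $F_j$, $j > i$, which has length $l_j \ge l_i$, differs from the codeword for $F_i$ at least once in the first $l_i$ places. Thus no codeword is a prefix of any other codeword.
(b) We build the following table:

  Symbol   Probability   F_i in decimal   F_i in binary   l_i   Codeword
  1        0.5           0.0              0.0             1     0
  2        0.25          0.5              0.10            2     10
  3        0.125         0.75             0.110           3     110
  4        0.125         0.875            0.111           3     111

3.5 AEP. Let $X_1, X_2, \ldots$ be independent identically distributed random variables drawn according to the probability mass function $p(x)$, $x \in \{1, 2, \ldots, m\}$. Thus $p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i)$. We know that $-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X)$ in probability. Let $q(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} q(x_i)$, where $q$ is another probability mass function on $\{1, 2, \ldots, m\}$.
(a) Evaluate $\lim -\frac{1}{n} \log q(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$.

Solution: Since the $X_1, X_2, \ldots, X_n$ are i.i.d., so are $q(X_1), q(X_2), \ldots, q(X_n)$, and hence we can apply the strong law of large numbers to obtain
$$\lim -\frac{1}{n} \log q(X_1, X_2, \ldots, X_n) = \lim -\frac{1}{n} \sum_{i=1}^{n} \log q(X_i) = -E[\log q(X)] \quad \text{w.p. 1}$$
$$= -\sum_x p(x) \log q(x) = \sum_x p(x) \log \frac{p(x)}{q(x)} - \sum_x p(x) \log p(x) = D(p \| q) + H(p).$$

8.1 Preprocessing the output. One is given a communication channel with transition probabilities $p(y|x)$ and channel capacity $C = \max_{p(x)} I(X; Y)$. A helpful statistician preprocesses the output by forming $\tilde{Y} = g(Y)$. He claims that this will strictly improve the capacity.
(a) Show that he is wrong.
(b) Under what condition does he not strictly decrease the capacity?

Solution:
(a) The statistician calculates $\tilde{Y} = g(Y)$. Since $X \to Y \to \tilde{Y}$ forms a Markov chain, we can apply the data processing inequality. Hence for every distribution on $x$, $I(X; Y) \ge I(X; \tilde{Y})$. Let $\tilde{p}(x)$ be the distribution on $x$ that maximizes $I(X; \tilde{Y})$. Then
$$C = \max_{p(x)} I(X; Y) \ge I(X; Y)\big|_{p(x) = \tilde{p}(x)} \ge I(X; \tilde{Y})\big|_{p(x) = \tilde{p}(x)} = \max_{p(x)} I(X; \tilde{Y}) = \tilde{C}.$$
Thus, the statistician is wrong and processing the output does not increase capacity.
(b) We have equality in the above sequence of inequalities only if we have equality in the data processing inequality, i.e., for the distribution that maximizes $I(X; \tilde{Y})$, we have $X \to \tilde{Y} \to Y$ forming a Markov chain.

8.3 An additive noise channel. Find the channel capacity of the following discrete memoryless channel:
$$Y = X + Z,$$
where $\Pr\{Z = 0\} = \Pr\{Z = a\} = \frac{1}{2}$. The alphabet for $x$ is $\mathcal{X} = \{0, 1\}$. Assume that $Z$ is independent of $X$.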
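Observe that the channel capacity depends on the value of $a$.

Solution: A sum channel.
$$Y = X + Z, \qquad X \in \{0, 1\}, \quad Z \in \{0, a\}.$$
We have to distinguish various cases depending on the values of $a$.
Case $a = 0$. In this case, $Y = X$, and $\max I(X; Y) = \max H(X) = 1$. Hence the capacity is 1 bit per transmission.
Case $a \ne 0, \pm 1$. In this case, $Y$ has four possible values $0, 1, a, 1 + a$. Knowing $Y$, we know the $X$ which was sent, and hence $H(X|Y) = 0$. Hence the capacity is also 1 bit per transmission.
Case $a = 1$. In this case $Y$ has three possible output values, 0, 1, 2, and the channel is identical to the binary erasure channel with erasure probability $\alpha = 1/2$. The capacity of this channel is $1 - \alpha = 1/2$ bit per transmission.
Case $a = -1$. This is similar to the case when $a = 1$, and the capacity is also 1/2 bit per transmission.

8.5 Channel capacity. Consider the discrete memoryless channel $Y = X + Z \pmod{11}$, where
$$Z = \begin{pmatrix} 1 & 2 & 3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$
and $X \in \{0, 1, \ldots, 10\}$. Assume that $Z$ is independent of $X$.
(a) Find the capacity.
(b) What is the maximizing $p^*(x)$?

Solution: The capacity of the channel is
$$C = \max_{p(x)} I(X; Y), \qquad I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(Z) \le \log 11 - \log 3,$$
where the bound is attained when $Y$ has a uniform distribution, which occurs when $X$ has a uniform distribution.
(a) The capacity of the channel is $\log \frac{11}{3}$ bits/transmission.
(b) The capacity is achieved by a uniform distribution on the inputs: $p(X = i) = \frac{1}{11}$ for $i = 0, 1, \ldots, 10$.

8.12 Time-varying channels. Consider a time-varying discrete memoryless channel. Let $Y_1, Y_2, \ldots, Y_n$ be conditionally independent given $X_1, X_2, \ldots, X_n$, with conditional distribution given by $p(y|x) = \prod_{i=1}^{n} p_i(y_i|x_i)$, where (as in the textbook figure, lost in this copy) the channel at time $i$ is a binary symmetric channel with crossover probability $p_i$. Let $X = (X_1, X_2, \ldots, X_n)$, $Y = (Y_1, Y_2, \ldots, Y_n)$. Find $\max_{p(x)} I(X; Y)$.

Solution:
$$I(X^n; Y^n) = H(Y^n) - H(Y^n|X^n) = H(Y^n) - \sum_{i=1}^{n} H(Y_i|X_i) \le \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i|X_i) \le \sum_{i=1}^{n} (1 - h(p_i)),$$
with equality if $X_1, X_2, \ldots, X_n$ is chosen i.i.d. $\sim$ Bern(1/2). Hence
$$\max_{p(x)} I(X_1, \ldots, X_n; Y_1, \ldots, Y_n) = \sum_{i=1}^{n} (1 - h(p_i)).$$

10.2 A channel with two independent looks at $Y$. Let $Y_1$ and $Y_2$ be conditionally independent and conditionally identically distributed given $X$.
(a) Show $I(X; Y_1, Y_2) = 2 I(X; Y_1) - I(Y_1; Y_2)$.
(b) Conclude that the capacity of the channel $X \to (Y_1, Y_2)$ is less than twice the capacity of the channel $X \to Y_1$.

Solution:
(a)
$$I(X; Y_1, Y_2) = H(Y_1, Y_2) - H(Y_1, Y_2|X)$$
$$= H(Y_1) + H(Y_2) - I(Y_1; Y_2) - H(Y_1|X) - H(Y_2|X) \quad \text{(conditional independence given } X\text{)}$$
$$= I(X; Y_1) + I(X; Y_2) - I(Y_1; Y_2)$$
$$= 2 I(X; Y_1) - I(Y_1; Y_2) \quad \text{(conditionally identically distributed)}.$$
(b) The capacity of the single-look channel $X \to Y_1$ is $C_1 = \max_{p(x)} I(X; Y_1)$. The capacity of the channel $X \to (Y_1, Y_2)$ is
$$C_2 = \max_{p(x)} I(X; Y_1, Y_2) = \max_{p(x)} \left[ 2 I(X; Y_1) - I(Y_1; Y_2) \right] \le \max_{p(x)} 2 I(X; Y_1) = 2 C_1.$$

10.3 The two-look Gaussian channel. Consider the ordinary Shannon Gaussian channel with two correlated looks at $X$, i.e., $Y = (Y_1, Y_2)$, where
$$Y_1 = X + Z_1, \qquad Y_2 = X + Z_2,$$
with a power constraint $P$ on $X$, and $(Z_1, Z_2) \sim \mathcal{N}_2(0, K)$, where
$$K = \begin{bmatrix} N & N\rho \\ N\rho & N \end{bmatrix}.$$
Find the capacity $C$ for
(a) $\rho = 1$
(b) $\rho = 0$
(c) $\rho = -1$

Solution: It is clear that the input distribution that maximizes the capacity is $X \sim \mathcal{N}(0, P)$. Evaluating the mutual information for this distribution,
$$C_2 = \max I(X; Y_1, Y_2) = h(Y_1, Y_2) - h(Y_1, Y_2|X) = h(Y_1, Y_2) - h(Z_1, Z_2|X) = h(Y_1, Y_2) - h(Z_1, Z_2).$$
Now since $(Z_1, Z_2) \sim \mathcal{N}(0, K)$, we have
$$h(Z_1, Z_2) = \frac{1}{2} \log (2\pi e)^2 |K_Z| = \frac{1}{2} \log (2\pi e)^2 N^2 (1 - \rho^2).$$
Since $Y_1 = X + Z_1$ and $Y_2 = X + Z_2$, we have
$$(Y_1, Y_2) \sim \mathcal{N}\left(0, \begin{bmatrix} P + N & P + \rho N \\ P + \rho N & P + N \end{bmatrix}\right),$$
and
$$h(Y_1, Y_2) = \frac{1}{2} \log (2\pi e)^2 |K_Y| = \frac{1}{2} \log (2\pi e)^2 \left( N^2 (1 - \rho^2) + 2PN(1 - \rho) \right).$$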
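Hence
$$C_2 = h(Y_1, Y_2) - h(Z_1, Z_2) = \frac{1}{2} \log \left( 1 + \frac{2P}{N(1 + \rho)} \right).$$
(a) $\rho = 1$. In this case, $C = \frac{1}{2} \log \left( 1 + \frac{P}{N} \right)$, which is the capacity of a single-look channel.
(b) $\rho = 0$. In this case, $C = \frac{1}{2} \log \left( 1 + \frac{2P}{N} \right)$, which corresponds to using twice the power in a single look. The capacity is the same as the capacity of the channel $X \to (Y_1 + Y_2)$.
(c) $\rho = -1$. In this case, $C = \infty$, which is not surprising since if we add $Y_1$ and $Y_2$, we can recover $X$ exactly.

10.4 Parallel channels and waterfilling. Consider a pair of parallel Gaussian channels, i.e.,
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} + \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix},$$
where
$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}\right),$$
and there is a power constraint $E(X_1^2 + X_2^2) \le 2P$. Assume that $\sigma_1^2 > \sigma_2^2$. At what power does the channel stop behaving like a single channel with noise variance $\sigma_2^2$, and begin behaving like a pair of channels?

Solution: We will put all the signal power into the channel with less noise until the total power of noise + signal in that channel equals the noise power in the other channel. After that, we will split any additional power evenly between the two channels. Thus the combined channel begins to behave like a pair of parallel channels when the signal power is equal to the difference of the two noise powers, i.e., when $2P = \sigma_1^2 - \sigma_2^2$.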
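
The water-filling argument of Problem 10.4 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function name `water_fill`, its parameters `noise` and `total_power`, and the noise variances 4 and 1 are hypothetical and do not come from the text. It allocates power as $P_i = \max(\nu - N_i, 0)$, choosing the water level $\nu$ so that the powers sum to the total.

```python
def water_fill(noise, total_power):
    """Allocate total_power across parallel Gaussian channels by water-filling.

    noise: list of noise variances (hypothetical example inputs).
    Returns powers P_i = max(nu - N_i, 0) that sum to total_power.
    """
    ordered = sorted(noise)
    # Try filling the k quietest channels, starting with all of them.
    for k in range(len(ordered), 0, -1):
        nu = (total_power + sum(ordered[:k])) / k  # candidate water level
        if nu >= ordered[k - 1]:  # level reaches the noisiest channel in use
            break
    return [max(nu - n, 0.0) for n in noise]


# Assumed values: sigma_1^2 = 4 > sigma_2^2 = 1, so the threshold is 2P = 3.
print(water_fill([4.0, 1.0], 2.0))  # below threshold: only the quieter channel is used
print(water_fill([4.0, 1.0], 5.0))  # above threshold: the extra power is split evenly
```

With these assumed variances the first call returns `[0.0, 2.0]` and the second returns `[1.0, 4.0]`: the noisier channel receives power only once the total exceeds $\sigma_1^2 - \sigma_2^2 = 3$, and the 2 units beyond that threshold are shared equally.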
