Information projection

In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is

$p^{*}={\underset {p\in P}{\arg \min }}\operatorname {D} _{\mathrm {KL} }(p||q)$

where $\operatorname {D} _{\mathrm {KL} }$ is the Kullback–Leibler divergence from q to p. The Kullback–Leibler divergence is not a true metric, but if it is viewed loosely as a measure of distance, the I-projection $p^{*}$ is the "closest" distribution to q among all the distributions in P.
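The following Python code is a minimal sketch of the definition. It assumes a finite alphabet and a hypothetical convex set P chosen only for illustration: the segment of mixtures $(1-t)a+tb$ between two fixed distributions a and b. The helper names `kl`, `mix` and `i_projection_on_segment` are illustrative and do not come from any library. The search over t is a ternary search, which is valid here because $\operatorname {D} _{\mathrm {KL} }(p||q)$ is convex in p and therefore convex in t along the segment.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def mix(a, b, t):
    """The point (1 - t) * a + t * b on the segment between a and b."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]

def i_projection_on_segment(q, a, b, iters=200):
    """I-projection of q onto the hypothetical set P = {(1 - t) a + t b : 0 <= t <= 1}.

    D_KL(p || q) is convex in t along the segment, so ternary search
    narrows [lo, hi] down to a minimizer.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if kl(mix(a, b, m1), q) <= kl(mix(a, b, m2), q):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    t_star = 0.5 * (lo + hi)
    return t_star, mix(a, b, t_star)

# Illustrative inputs (assumptions, not from the article).
q = [0.5, 0.3, 0.2]
a = [0.1, 0.1, 0.8]
b = [0.8, 0.1, 0.1]
t_star, p_star = i_projection_on_segment(q, a, b)
print(t_star, p_star, kl(p_star, q))
```

For these inputs the minimizer should come out close to $t^{*}=38/49\approx 0.7755$. At that point the ratio $p^{*}(x)/q(x)$ is the same on the two outcomes that vary along the segment, which is the stationarity condition for this particular set.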

The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex:[1]

$\operatorname {D} _{\mathrm {KL} }(p||q)\geq \operatorname {D} _{\mathrm {KL} }(p||p^{*})+\operatorname {D} _{\mathrm {KL} }(p^{*}||q)$

This inequality can be interpreted as an information-geometric version of the Pythagorean theorem, in which KL divergence plays the role of squared distance in a Euclidean space.
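The inequality can be checked numerically. The sketch below assumes a finite alphabet and takes P to be a hypothetical convex set chosen for illustration: the segment between two distributions a and b. It locates $p^{*}$ by a plain grid search over the mixing weight and then evaluates the gap $\operatorname {D} _{\mathrm {KL} }(p||q)-\operatorname {D} _{\mathrm {KL} }(p||p^{*})-\operatorname {D} _{\mathrm {KL} }(p^{*}||q)$ at every grid point. The helper names and the numbers are illustrative assumptions, not taken from the cited sources.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def mix(a, b, t):
    """The point (1 - t) * a + t * b on the segment between a and b."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]

# Hypothetical q and convex set P = segment between a and b.
q = [0.5, 0.3, 0.2]
a = [0.1, 0.1, 0.8]
b = [0.4, 0.1, 0.5]

# Locate the I-projection p* by a plain grid search over t in [0, 1].
ts = [k / 10000 for k in range(10001)]
t_star = min(ts, key=lambda t: kl(mix(a, b, t), q))
p_star = mix(a, b, t_star)

# Pythagorean gap D(p||q) - [D(p||p*) + D(p*||q)] for every grid point p in P.
gaps = [
    kl(mix(a, b, t), q) - (kl(mix(a, b, t), p_star) + kl(p_star, q))
    for t in ts
]
print(t_star, min(gaps), gaps[0])
```

With these inputs the I-projection lies on the boundary of P ($t^{*}=1$, so $p^{*}=b$). Every gap should be non-negative, and at p = a the gap should equal $0.3\ln 3.125\approx 0.342$, so the inequality is strict there.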

Since $\operatorname {D} _{\mathrm {KL} }(p||q)\geq 0$ and is lower semicontinuous in p, if P is closed and non-empty (for distributions on a finite alphabet, a closed set of distributions is also compact), then at least one minimizer of the optimization problem above exists. Furthermore, if P is convex, the minimizer is unique, because $\operatorname {D} _{\mathrm {KL} }(p||q)$ is strictly convex in p.

The reverse I-projection, also known as the moment projection or M-projection, is

$p^{*}={\underset {p\in P}{\arg \min }}\operatorname {D} _{\mathrm {KL} }(q||p)$

Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection behave differently. For the I-projection, $p(x)$ will typically underestimate the support of $q(x)$ and lock onto one of its modes. This is because $p(x)$ must be 0 wherever $q(x)=0$ for the KL divergence to stay finite. For the M-projection, $p(x)$ will typically overestimate the support of $q(x)$. This is because $p(x)$ must be positive wherever $q(x)>0$ for the KL divergence to stay finite.
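The following sketch illustrates the contrast under assumptions chosen purely for illustration. The target q is a hypothetical equal-weight mixture of two narrow bumps at -3 and +3, discretized on a grid. The family P consists of single discretized Gaussians over a grid of means and standard deviations, and this family is not convex. Both projections are found by exhaustive search over that parameter grid. The helper names are hypothetical.

```python
import math

def kl(p, q):
    """D_KL(p || q) for discrete distributions on the same grid."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def gaussian_on_grid(xs, mu, sigma):
    """Gaussian bump evaluated on the grid xs, renormalized to sum to 1."""
    w = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs]
    s = sum(w)
    return [wi / s for wi in w]

# Hypothetical grid on [-10, 10] with spacing 0.1.
xs = [i * 0.1 for i in range(-100, 101)]

# Hypothetical bimodal target q: equal-weight mixture of bumps at -3 and +3.
left = gaussian_on_grid(xs, -3.0, 0.7)
right = gaussian_on_grid(xs, 3.0, 0.7)
q = [0.5 * (u + v) for u, v in zip(left, right)]

# Family P: single discretized Gaussians over a grid of (mu, sigma).
mus = [i * 0.25 for i in range(-20, 21)]     # -5.0 .. 5.0
sigmas = [0.2 + 0.1 * k for k in range(39)]  # 0.2 .. 4.0
family = [(mu, s, gaussian_on_grid(xs, mu, s)) for mu in mus for s in sigmas]

# I-projection: argmin over P of D(p || q); p is pushed to 0 where q is tiny.
mu_i, sigma_i, _ = min(family, key=lambda c: kl(c[2], q))
# M-projection: argmin over P of D(q || p); p must cover all mass of q.
mu_m, sigma_m, _ = min(family, key=lambda c: kl(q, c[2]))

print(f"I-projection: mu = {mu_i:+.2f}, sigma = {sigma_i:.2f}")
print(f"M-projection: mu = {mu_m:+.2f}, sigma = {sigma_m:.2f}")
```

The I-projection should settle on a single bump, with a mean near -3 or +3 and a standard deviation near 0.7. The M-projection should spread over both bumps, with a mean near 0 and a standard deviation near 3. Because this P is not convex, the I-projection is not unique here: by symmetry, the fits at -3 and +3 are equally good.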

The concept of information projection can be extended to arbitrary statistical f-divergences and other divergences.[2]

References

[1] Cover, Thomas M.; Thomas, Joy A. (2006). Elements of Information Theory (2nd ed.). Hoboken, New Jersey: Wiley-Interscience. p. 367 (Theorem 11.6.1).
[2] Nielsen, Frank (2018). "What is... an information projection?". Notices of the AMS. 65 (3): 321–324.

Murphy, Kevin P. (2012). Machine Learning: A Probabilistic Perspective. The MIT Press.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
