The Minimum Description Length framework is powerful but is often overlooked. I believe that 1 reason for this is that methods for attaining efficient encodings are subtle. In this paper, I discuss one of those techniques, stochastic encoding. When there are multiple nearly equally valuable choices of a parameter, it is more valuable to choose stochastically—according to a probability distribution— rather than selecting the single best choice. Why? Because information can be transmitted in which parameter is chosen. This is exactly the “bitsback” argument
shows how text classifiers can be traced back to compression algorithms