A refined sense of taste is key to learning anything well. Paraphrasing Jiro Ono, arguably the greatest sushi chef: "If your sense of taste is lower than that of the customers, how will you impress them?" For his chefs in training, he believes that it is the sensitivity of their palates that determines the heights they will eventually attain.
This makes sense. The more sensitive the artist is to differences, the more is learnt in each iteration. The more the writer is affected by the images conjured by a piece of text, the greater the number of crumpled drafts that end up in the dustbin, and the greater the resulting improvement. The more the composer's heart rises with a particular rhythm of notes, the stronger the impetus to innovate.
Learning continues for as long as discrimination is possible and promptly ends when it no longer is. It may be said that the limits of discrimination determine the limits of learning. It is particularly fascinating that this maxim, of breaking learning down into the act of testing between alternatives, has resulted in, perhaps, one of our greatest conceptual leaps.
To illustrate this, consider the following learning task. Suppose we have 𝑛 distinct integers arranged in some unknown order. Our task is to learn this arrangement by repeatedly querying which of the numbers at two chosen locations is larger.
Notice that there are 𝑛! possible ways in which these integers can be arranged. Also notice that a binary sequence of length 𝑘 has 2ᵏ possibilities. We may thus map each integer arrangement, injectively, to a binary sequence of length ⌈log₂(𝑛!)⌉. On the other hand, using any known sorting algorithm of complexity O(𝑛 log₂𝑛), O(𝑛 log₂𝑛) queries suffice to determine the order. Now log₂(𝑛!) = Θ(𝑛 log₂𝑛). Therefore each of the O(𝑛 log₂𝑛) queries can be viewed as resolving, on average, one bit of uncertainty in this binary sequence.
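This accounting can be checked directly. Below is a minimal sketch in Python, with merge sort standing in for "any known sorting algorithm": it counts the pairwise "which is larger?" queries spent in learning the arrangement and compares them against the ⌈log₂(𝑛!)⌉ bits needed to label every possible arrangement. The function names and the choice of 𝑛 = 64 are illustrative, not from the original text.

```python
import math
import random

def sort_with_query_count(arr):
    """Merge sort that counts the pairwise 'which is larger?' queries,
    i.e. the binary tests spent on learning the arrangement."""
    comparisons = 0

    def merge_sort(xs):
        nonlocal comparisons
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            comparisons += 1  # one query: left[i] vs right[j]
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return merge_sort(arr), comparisons

random.seed(0)
n = 64
arr = random.sample(range(1000), n)       # n distinct integers, unknown order
sorted_arr, used = sort_with_query_count(arr)

# Information-theoretic floor: bits needed to label all n! arrangements.
floor_bits = math.ceil(math.log2(math.factorial(n)))
print(f"queries used: {used}, lower bound: {floor_bits} bits")
```

On any run the query count stays within a small constant factor of the bit floor, reflecting log₂(𝑛!) = Θ(𝑛 log₂𝑛).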
In other words, learning here amounts to the cumulative reduction in uncertainty that the tests bring about. To generalize this example, let us retrace the steps we just took. Notice that once the learning task was framed, a space of alternatives was automatically created. The creation of this space is implicit in the definition of any such learning task. It thus makes sense to capture and characterize the complexity of a particular learning task via the size of this alternative space.
Further, as we just saw, testing is one natural way to identify the particular alternative in that space. Several such testing strategies might be possible. The minimum number of tests across strategies depends on the particular alternative space and the type of test used. For a given alternative space, the more informative each test is, the fewer tests are required. But once these are fixed, the minimum number of tests is directly related to the size of the alternative space.
The notion of a 'bit' allows us to quantify both these quantities. First, the size of a particular alternative space, in bits, is simply the length of the shortest binary strings to which the space can be mapped injectively. Second, a test is worth one bit if it resolves between two alternatives, that is, if its answer is either a 'yes' or a 'no'. With both the reserve of information and the amount exchanged in each transaction quantified in bits, information can be treated as a currency.
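A one-bit test in this sense is simply a yes/no question, and ⌈log₂ of the space size⌉ such questions suffice to single out one alternative. A minimal sketch, assuming the alternative space is {0, …, 𝑛−1} and each test asks "is the secret at most mid?" (the function names here are illustrative):

```python
import math

def bits_needed(space_size):
    # Size of the alternative space in bits: the shortest binary
    # string length that can label every alternative injectively.
    return math.ceil(math.log2(space_size))

def identify(secret, n):
    """Identify `secret` in {0, ..., n-1} using yes/no tests,
    each worth one bit."""
    lo, hi = 0, n - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1              # one binary test: 'is secret <= mid?'
        if secret <= mid:
            hi = mid
        else:
            lo = mid + 1
    return lo, tests

n = 1000
found, tests = identify(617, n)
print(found, tests, bits_needed(n))  # 617 10 10
```

The number of yes/no tests spent exactly matches the size of the alternative space in bits, which is the currency view in miniature.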
Now what makes a currency particularly useful is the abstraction that it affords. For example, treating money from a fruit vendor and from a venture capitalist in exactly the same manner allows banks to conduct commerce at scale. In the same vein, the universal nature of bits allows algorithms to process, store and transmit these alternatives independent of the learning task. For instance, the same algorithm used to learn the integer arrangement may be applied to learning an arrangement of letters. Thus it is the 'bit' that has allowed us to raise information to the status of a real physical entity, enabling the current 'Information Age'.
This duality between testing and learning has since been used in other domains too. In machine learning, for instance, it appears in the VC dimension, which quantifies the complexity of a hypothesis class and is related to the number of tests required to distinguish between hypotheses. Similarly, this duality is found in several reductions in statistics and theoretical computer science. And in several such cases the duality has been found to be tight: the bounds derived from testing match the task's intrinsic difficulty.
Testing paves the way for, and determines the limits of, learning. Understanding this relationship better has been one of the cornerstones of 20th-century research.