To illustrate, suppose we sort all growth observations by size, and define the top third of observations as “high growth”, coded as 1, and the bottom third as “low growth”, coded as 0. The sample is then randomly separated into a core sample and a smaller test sample used for robustness checks. For the core sample, the algorithm searches for sequential splits, each consisting of an explanatory variable and an associated threshold value, chosen to best discriminate between the two groups. In most cases, the fit will not be perfect. Suppose, for example, that investment is correlated with growth, and is thus a potentially useful discriminant. There will, however, be some countries that have high investment rates but nonetheless belong to the low-growth group (a type I error), and others that have low investment rates but belong to the high-growth group (a type II error). The algorithm searches over all observed values of the investment rate until it finds the threshold value which minimizes the sum of the type I and type II errors.
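The single-variable threshold search described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function name, data, and variable names are all hypothetical.

```python
# Sketch of the threshold search for one explanatory variable.
# Labels: 1 = high growth, 0 = low growth. Observations above the
# candidate threshold are predicted to be "high growth".

def best_threshold(values, labels):
    """Try every observed value as a threshold; return the one that
    minimizes the sum of type I and type II misclassifications."""
    best = (None, len(values))  # (threshold, error count)
    for t in sorted(set(values)):
        # Type I error: above the threshold but actually low growth.
        type1 = sum(1 for v, y in zip(values, labels) if v > t and y == 0)
        # Type II error: at or below the threshold but actually high growth.
        type2 = sum(1 for v, y in zip(values, labels) if v <= t and y == 1)
        if type1 + type2 < best[1]:
            best = (t, type1 + type2)
    return best

# Toy data: investment rates and growth labels (illustrative only).
inv = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
grow = [0, 0, 0, 1, 1, 1]
t, err = best_threshold(inv, grow)  # here the split is perfect: err == 0
```

Scanning all observed values keeps the search exhaustive but cheap, since only as many thresholds as distinct observations need to be checked.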
This minimum sum of errors provides a natural gauge of the ability of investment rates to predict fast versus slow GDP growth. The same procedure is applied sequentially to each of the J explanatory variables (e.g., human capital, trade openness, etc.). Sorting all explanatory variables by their minimum error then provides a ranking of their relative ability to discriminate between the two groups. To check robustness, the threshold for each variable (computed on the core sample) is then used to split the test sample, yielding a second sum of errors. Together, the core-sample and test-sample scores provide an overall measure of each variable's ability to discriminate. The variable with the smallest error (and its associated best threshold) is then used to form the first node: all sample observations exceeding the threshold are sorted into one sub-branch, and the remaining observations into the other.
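The ranking step and the formation of the first node can be sketched in the same spirit. Everything here is illustrative: the variable names, the toy data, and the helper function are assumptions, not the authors' code, and the robustness check on the test sample is omitted for brevity.

```python
# Sketch: rank explanatory variables by their minimum error on the core
# sample, then split the sample on the winning variable's threshold.

def best_threshold(values, labels):
    """Return the (threshold, error) pair minimizing type I + type II errors,
    where observations above the threshold are predicted 'high growth'."""
    best = (None, len(values))
    for t in sorted(set(values)):
        # A misclassification occurs when the prediction (v > t)
        # disagrees with the actual label (y == 1).
        errors = sum(1 for v, y in zip(values, labels) if (v > t) != (y == 1))
        if errors < best[1]:
            best = (t, errors)
    return best

labels = [0, 0, 0, 1, 1, 1]          # 1 = high growth, 0 = low growth
variables = {                        # hypothetical explanatory variables
    "investment": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35],  # cleanly separates
    "openness":   [0.5, 0.9, 0.4, 0.8, 0.3, 0.7],        # noisier
}

# Score every variable, then rank by minimum error (smallest first).
scored = {name: best_threshold(vals, labels) for name, vals in variables.items()}
ranking = sorted(scored, key=lambda name: scored[name][1])
winner = ranking[0]
thr = scored[winner][0]

# Form the first node: observations above the threshold go to one
# sub-branch, the remainder to the other (indices shown for clarity).
high_branch = [i for i, v in enumerate(variables[winner]) if v > thr]
low_branch = [i for i, v in enumerate(variables[winner]) if v <= thr]
```

In a full tree-building procedure the same search would then be repeated recursively within each sub-branch; the sketch stops at the first node, which is all the passage describes.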