For each sub-branch, the algorithm is then repeated. In principle, this process could continue until every observation had been placed into its own branch. This would be akin to including as many explanatory variables as observations in a regression and thus obtaining a “perfect,” if meaningless, fit. A termination rule is therefore required. The rule used resembles, loosely speaking, an adjusted R2 criterion. After each split, the improvement in the overall fit (which, just like the change in the raw R2 upon adding an explanatory variable, is always non-negative) is weighed against a penalty on the number of branches, which promotes parsimony. If the penalty exceeds the improvement, the branch is terminated at the prior node; if not, the algorithm continues.
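The mechanics of this stopping rule can be sketched as follows. The code below is an illustrative simplification, not the exact procedure used in the paper: it grows a one-variable regression tree, splitting only when the reduction in the sum of squared errors exceeds a fixed penalty per additional branch (the `penalty` parameter and the function names are our own).

```python
import numpy as np

def sse(y):
    # Sum of squared deviations from the node mean: the node's lack of fit.
    return float(np.sum((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    # Search interior thresholds of a single variable for the split that
    # most reduces the sum of squared errors. As noted in the text, this
    # reduction is always non-negative, like the change in the raw R2
    # when a regressor is added.
    best_gain, best_thr = 0.0, None
    for thr in np.unique(x)[:-1]:
        left, right = y[x <= thr], y[x > thr]
        gain = sse(y) - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr

def grow(x, y, penalty):
    # Termination rule: if the penalty on an extra branch exceeds the
    # improvement in fit, stop at this node; otherwise split and recurse.
    gain, thr = best_split(x, y)
    if thr is None or gain <= penalty:
        return {"mean": float(y.mean())}  # terminal node
    mask = x <= thr
    return {"thr": float(thr),
            "left": grow(x[mask], y[mask], penalty),
            "right": grow(x[~mask], y[~mask], penalty)}
```

With a small penalty the tree splits wherever fit improves appreciably; with a large enough penalty it collapses to a single node, which is the parsimony trade-off described above.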
Several aspects of the algorithm are noteworthy. First, it automatically establishes both a global (full-sample) priority ordering among the potential determinants and a set of local (sub-sample) orderings. It thus identifies both globally important variables and variables which, while not globally important, are nevertheless significant for a sizable subset of observations. Second, it allows a variable to become an important determinant only once a number of prior conditions on other variables have been met, and thus automatically accommodates context dependence. Third, the procedure is very robust to outliers, since splits occur at an interior threshold; this is of particular importance in our application (see Levine and Renelt). Fourth, the decision tree is invariant to any monotone transformation of the variables. This is especially useful in the empirical growth literature, where there is very little theory to provide guidance on the appropriate functional form.
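The invariance to monotone transformations can be verified directly: because a strictly increasing transform preserves the ordering of a variable, the candidate partitions of the observations, and hence the SSE-minimizing partition, are unchanged. The following self-contained check (our own illustration, with hypothetical variable names) confirms that the chosen split partition is identical for a variable and its logarithm.

```python
import numpy as np

def best_split_partition(x, y):
    # Return the boolean mask of observations sent left by the
    # SSE-minimizing split on x. The tree depends on this partition
    # of the sample, not on the numerical threshold value itself.
    def sse(v):
        return float(np.sum((v - v.mean()) ** 2)) if len(v) else 0.0
    best_gain, best_mask = 0.0, None
    for thr in np.unique(x)[:-1]:
        mask = x <= thr
        gain = sse(y) - sse(y[mask]) - sse(y[~mask])
        if gain > best_gain:
            best_gain, best_mask = gain, mask
    return best_mask

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=50)          # hypothetical determinant
y = np.where(x > 5.0, 1.0, 0.0) + rng.normal(0, 0.1, size=50)

# log is strictly increasing on x > 0, so it reorders nothing:
# the selected partition of observations is the same either way.
m_raw = best_split_partition(x, y)
m_log = best_split_partition(np.log(x), y)
```

This is why the lack of theoretical guidance on functional form is less of a concern here than in a linear regression, where choosing between a variable and its logarithm changes the estimates.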