The empirical growth literature seeks to identify factors which are important in determining output growth. In the familiar cross-country regression framework (and likewise in limited dependent variable regressions if the dependent variable is “high” growth versus “low” growth), a variable is considered “important” if, controlling for the other regressors, it can explain a large fraction of the variation in the dependent variable. To be sure, this is a relevant gauge of the universal importance of a variable. Yet, even if a factor is not robustly linked to growth for the entire sample, it may well be of key importance for a subgroup of observations. This is more difficult to unearth within a regression framework, which implicitly assumes that the same functional form applies to all countries.
For instance, human capital may be robustly associated with growth, but for the subgroup of countries suffering from military conflict, devoting more resources to education will arguably have a very limited effect. Standard regression analysis in effect computes an average of the effect over the two subsamples (war/no war), impeding identification of the link. If the type of non-linearity is known, controls can of course be included: in the above case, adding a dummy for civil conflict as well as the product of the dummy and the human capital accumulation variable would suffice to capture the effect. Yet such precise knowledge about the type of non-linearity is the exception rather than the rule. Without such priors, and in the presence of multiple explanatory variables, allowing for non-linearities in the standard regression framework soon becomes impractical, or requires imposing largely subjective ex-ante assumptions.
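To make the dummy-and-interaction remedy concrete, the specification for the conflict example could be written as follows. This is only an illustrative sketch: the symbols are assumptions introduced here, with g_i denoting output growth in country i, H_i the human capital accumulation variable, D_i a dummy equal to one for countries in conflict, and ε_i an error term.

```latex
g_i = \alpha + \beta H_i + \gamma D_i + \delta\,(D_i \times H_i) + \varepsilon_i,
\qquad
\frac{\partial g_i}{\partial H_i} =
\begin{cases}
\beta & \text{if } D_i = 0 \text{ (no conflict)},\\
\beta + \delta & \text{if } D_i = 1 \text{ (conflict)}.
\end{cases}
```

Under this specification, a limited effect of education in conflict countries would show up as δ < 0 with β + δ close to zero, whereas a regression that omits the dummy and the interaction term estimates a single coefficient that blends β and β + δ across the two subsamples.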
The presence of non-linearities can, however, be readily explored in the context of a sequential decision tree, using criteria based on the explanatory variables (has the country experienced war? is per capita income in the lowest quintile?) at each node to split the sample into sub-branches. A binary recursive tree provides a specific algorithm for implementing this type of sequential decision tree. Formally, it is a sequence of rules for predicting a binary dependent variable y on the basis of a vector of independent variables Xj, j = 1, …, J. At each branch of the tree, the sample is split into two sub-branches according to some threshold value of one of the explanatory variables Xj. The splitting is repeated along the various sub-branches until a terminal node is reached.
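The recursive splitting described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used in the paper. The text does not state a splitting criterion at this point, so the sketch assumes Gini impurity; the function names, the stopping rule and the toy data (a war dummy and years of schooling) are likewise hypothetical and chosen only to mirror the conflict example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (assumed splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y):
    """Return (weighted impurity, variable index, threshold), or None."""
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(X, y, min_size=2):
    """Split recursively until a node is pure, too small, or unsplittable."""
    majority = int(2 * sum(y) >= len(y))
    if gini(y) == 0.0 or len(y) < min_size:
        return {"predict": majority}
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return {"predict": majority}  # no split reduces impurity
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "var": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], min_size),
        "right": grow([X[i] for i in right], [y[i] for i in right], min_size),
    }


def predict(tree, x):
    """Follow the splitting rules down to a terminal node."""
    while "predict" not in tree:
        branch = "left" if x[tree["var"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["predict"]


# Hypothetical toy sample: columns are (war dummy, years of schooling);
# y = 1 marks "high" growth, which here occurs only without war.
X = [[0, 2.0], [0, 3.0], [0, 7.0], [0, 8.0],
     [1, 2.0], [1, 3.0], [1, 7.0], [1, 8.0]]
y = [0, 0, 1, 1, 0, 0, 0, 0]

tree = grow(X, y)
print(tree)
print(predict(tree, [0, 6.0]), predict(tree, [1, 6.0]))
```

On this toy sample the tree first splits on the war dummy and then splits on schooling only within the no-war branch, so schooling matters for the predicted outcome in one subgroup but not in the other. This is the kind of subgroup-specific link that a single pooled regression coefficient would average away.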