Pacheco-Cobos et al., A test of Boserup’s hypothesis in the milpa, SM p. 40

Supplemental Materials

Sample Preparation

Assembling comparative information on maize yields from ethnographic sources requires that we standardize heterogeneous reports of data. Sources estimated maize yield variously from farmer reports, direct observation, or measurement by weight or bulk. Consequently, the precision and reliability of the information vary from case to case. We deal with unreported variables of interest to us (Table S2) by reconstructing them as closely as possible using censuses and geographically based datasets that we can match with the study location. In most instances this resulted in coarsening of the variable, e.g., the use of regional biomes instead of the local ecoregion around a village. Potentially relevant but rarely or inconsistently reported information such as household size, maize varieties, labor input, and seeding density was not included in our analysis. We grouped the more consistently available variables into three categories: geographical, environmental and agricultural (Table S2).

Geographical Variables

Country, State or Department, and the Municipality within which each Community was located were coded for general geographical reference. Although most studies report maize yields at the Community level, we included some that report maize yields at the Municipality or State levels. Geographical coordinates (latitude and longitude in decimal degrees) and names of study sites as originally reported or as found in geographical and governmental sources (e.g., Google™ Earth 2011; INEGI 2011a) were used to determine geographic features like elevation and environmental features like soil quality and biome. Overall, the sample lies between 09.05° and 21.62° latitude (north) and between -99.1° and -82.52° longitude (west). Earth Point (2011) proved useful for converting coordinates between geographical formats. Elevation was coded in meters (m) above sea level.

When reported, we always used the original authors’ estimates for population density (persons/km²). When population size or density was not reported, we sought an estimate as close as possible to the year in which maize yields were assessed, using several tactics consistently: (a) where possible, we consulted in-country population censuses, seeking the most localized estimate available; (b) if population numbers but not area were available, we calculated territorial areas from local maps; (c) where possible, we confirmed reported local territorial areas with present-day records; and (d) where necessary, we used the density from the next largest geographical census unit that encompassed the location.

We found few reliable population statistics at the community level for studies early in the twentieth century. If community-level information was not available, we reported the population density of the corresponding Municipality, Department or State, using censuses available from governmental, statistical or educational institutions (SI n.d.; INE 1966; INE 2001; INE 2002; INE 2010; INEGI 1970; INEGI 1980; INEGI 1995; INEGI 2000; INEGI 2005; INEGI 2010; INEGI 2011a; INEGI 2011b; INAFED and SEGOB 2010; INEC 1995). When community area was not reported by governmental institutions, we calculated it either by using maps provided by authors in which the cultivated land was delimited (e.g., Reina 1967: 2) or by using detailed digital maps (INEGI 2013) on which we drew catchment polygons around the identified study sites. For instance, in the densely populated area of Los Altos de Chiapas, Mexico (Perales et al. 2005), polygon boundaries were drawn halfway to adjacent communities. A similar procedure was followed for Komchen (Askinasy 1936; Shuman 1974) and communities visited by Steggerda (1941) in the Maya lowlands. For some of Stadelman’s (1940) data, in which the reported Municipalities’ territories seemed too small (< 15 km²), we used the corresponding present-day area. Some of the lowest population densities in our dataset corresponded with areas known to be at an initial stage of colonization (Carter 1969; Redfield and Villa Rojas 1962; Urrutia 1967) or known to be protected (Carr 2008), while some of the highest population densities corresponded with areas that have been inhabited for centuries, such as Central and Southeastern Mexico (Altieri and Trujillo 1987; Perales et al. 2005). The full sample covers a range of 1 to 171 persons/km² (Table S2).

Environmental Variables

Rainfall specifies annual precipitation; our sample covers the range 750 to 3500 mm. When rainfall was not reported in the original source, we used the average annual precipitation reported by the closest meteorological station (CONAGUA 2011; INIFAP 2012; INSIVUMEH 2011; SMN 2011). For Huehuetenango Department, Municipality rainfall patterns were matched with the detailed classification provided by Castañeda (1998: 37).

We superimposed our sample locations on the Harmonized World Soil Database (HWSD) (FAO et al. 2012) and the Olson et al. (2001) ecological zonation database to find out which soil qualities and biomes are associated with each Community (Municipality or State). The HWSD defines soil quality based on seven categories (SQ1-SQ7), each assessed according to one of seven constraint levels. For analytical purposes we converted soil quality into a binary variable, treating constraint level 1 as a slight-constraints category and merging levels 2-5 into a moderate-to-severe-constraints category. Levels 6 and 7 are not represented in our dataset (see Table S2). Although not included as independent variables in our analysis, the soil types matching our locations were: acrisols, alisols, cambisols, gleysols, leptosols, luvisols, nitisols, phaeozems, regosols, and water bodies (descriptions in IUSS Working Group WRB 2006).
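
The binary recoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the category labels are hypothetical, and the sketch does not say which of the SQ1-SQ7 categories supplied the constraint level.

```python
def soil_constraint_binary(level: int) -> str:
    """Collapse an HWSD constraint level into the two categories named in the text."""
    if level == 1:
        return "slight"
    if 2 <= level <= 5:
        return "moderate-to-severe"
    # Levels 6 and 7 are not represented in the sample, so they are
    # rejected here rather than assigned to either category.
    raise ValueError(f"constraint level {level} is not represented in the sample")
```

For example, `soil_constraint_binary(3)` falls in the merged moderate-to-severe category.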

The eco-zonation database of Olson et al. (2001) divides the world into 867 ecoregions nested within 14 biomes. Our sample sites fell within the three tropical and subtropical biomes identified as moist forest, dry forest, and coniferous forest (labels shortened from Olson et al. 2001).[1]

The variable calendar year takes the integer value one for the earliest year of observation in our sample (1931) and sequentially increments through observations up to year 2011. We incorporate calendar year in order to determine if secular trends in fallow practices and/or yields are present in the sample.
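
A minimal sketch of this coding follows, assuming the index advances by one for every calendar year (not merely for every year that has an observation). The names are hypothetical.

```python
FIRST_YEAR = 1931  # earliest year of observation in the sample, per the text


def calendar_year_index(year: int) -> int:
    """Map a calendar year to the integer index that starts at 1 in 1931."""
    if year < FIRST_YEAR:
        raise ValueError("year precedes the earliest observation in the sample")
    return year - FIRST_YEAR + 1
```

Under this assumption the last year in the sample, 2011, receives the index 81.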

Agricultural Variables

Maize yields were coded as total or mean kilograms per hectare (kg/ha). Given the variety of units reported (e.g., quintales per manzana, pounds per cuerda, bushels per acre, carga per mecate, zontles per ha) we frequently resorted to conversions. If equivalences were not given by the original authors, we found Rowlett's (2005) tables useful. The precision with which yields were measured varied widely from one study to another. In some cases dissimilar production estimates were observed within a particular region. For example, in the Petén region of Guatemala, Cowgill (1962) estimated yields to be over 1500 kg/ha, while Reina (1967) estimated them to be below 400 kg/ha. In other cases, repeated values for yields among different municipalities (e.g., Stadelman 1940) suggest that reported production data came from fixed estimates, not independent observations. We used year of harvest to designate the year in which maize yields were measured or estimated. If year of harvest was not indicated, we assumed it to be the year preceding the source’s year of publication.
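
The unit conversions mentioned above follow a single pattern: multiply by the weight of the local unit in kilograms and divide by the area of the local unit in hectares. The sketch below illustrates this under stated assumptions. The function names are hypothetical, the 56 lb bushel is the US standard weight for shelled maize and may not match every source, and local units such as the quintal, manzana, cuerda or mecate vary regionally, so their factors must come from the original authors or from conversion tables.

```python
KG_PER_LB = 0.45359237         # international avoirdupois pound (exact)
HA_PER_ACRE = 0.40468564224    # international acre (exact)
LB_PER_BUSHEL_MAIZE = 56.0     # assumption: US standard bushel of shelled maize


def local_units_to_kg_per_ha(amount: float,
                             kg_per_weight_unit: float,
                             ha_per_area_unit: float) -> float:
    """Convert a yield in (weight unit per area unit) to kg/ha."""
    return amount * kg_per_weight_unit / ha_per_area_unit


def bushels_per_acre_to_kg_per_ha(bu_per_acre: float) -> float:
    """Convert bushels per acre to kg/ha using the factors defined above."""
    return local_units_to_kg_per_ha(
        bu_per_acre, LB_PER_BUSHEL_MAIZE * KG_PER_LB, HA_PER_ACRE
    )
```

With these factors one bushel per acre corresponds to roughly 62.8 kg/ha.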

Strategy specifies whether reported yields came from fields that farmers cultivated using traditional milpa practices, or a modified system in which maize is intercropped with mucuna (Mucuna spp.; velvet bean), which serves to fix nitrogen in soil and, under some conditions, increases productivity (Triomphe and Sain 2004). Maize yields from agricultural practices described and categorized in more detail in terms of weeding techniques, land preparation tools, planting densities, and intercropping with tree species were coarsened into the milpa category.

We coded fallow patterns, successive years of cultivation, and seasonal or successive cultivation within the annual cycle. Years cropped and years fallow indicated the number of years land was cultivated or left to rest, respectively. Summed, they determine the total cycle (years). The ratio of years cropped to years fallow provides a measure of agricultural intensity, as defined by Turner et al. (1977). Authors generally report the crop-to-fallow ratio of land use, either quantitatively or qualitatively. If the reports were qualitative, we derived quantitative equivalents by reference to practices known from neighboring areas or ethnically related groups. In a few instances we used Boserup’s (1965: 8-9) five types of land use intensity, as summarized by Turner (1976) or Johnston (2003), to derive a quantitative estimate of the crop-to-fallow ratio. In other cases, when maize yields were reported for the same community at different periods (e.g., Diemont et al. 2006; Nations and Nigh 1980), it was useful to know that the later of the two studies reports a reduction in fallow.
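
The two derived quantities in this paragraph can be sketched as below. The function names are hypothetical. The handling of zero fallow is an assumption of this sketch: the text does not say how continuous cultivation was coded, so the ratio is simply left undefined.

```python
def total_cycle(years_cropped: float, years_fallow: float) -> float:
    """Total cycle in years: years cropped plus years fallow."""
    return years_cropped + years_fallow


def crop_to_fallow_ratio(years_cropped: float, years_fallow: float) -> float:
    """Ratio of years cropped to years fallow, used as a measure of intensity."""
    if years_fallow == 0:
        # Assumption: treated as undefined here rather than given a value.
        raise ValueError("ratio is undefined when there is no fallow period")
    return years_cropped / years_fallow
```

For the 1:3 entry in Table S1 (one year cropped, three years fallow), the total cycle is 4 years and the ratio is one third.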

We excluded from the analysis cases or sources not reporting fallow in either quantitative or qualitative terms (listed in the Figure 2 caption). Given our interest in understanding milpa land use intensification, we found valuable materials summarizing cropping and fallow cycles at a regional (Schwartz and Corzo 2010) or continental (Kass and Somarriba 1999) scale, as these established context for our coded fallow values.

Year in cycle denotes the position of the measured yield in successive years of cultivation. If this was not reported, we assume that yield measurements correspond with the first year in the cropping cycle. This variable is particularly important since it can signal how continuous land use affects maize yields, and how this is influenced by different agricultural strategies such as milpa versus mucuna. A third year or more of continued cultivation is rare for the milpa strategy but not for mucuna. Fields sometimes are named according to their place in the cycle. For example, Reina (1967) reports that milpa de monte and milpa de cañada correspond to fields in their first and second successive year of cultivation, respectively.

Harvest season specifies whether the maize yields reported correspond to the first or to a second season of maize cultivation. In Central America the first maize season is grown during the summer or rainy months, while the second corresponds with maize grown during the winter or dry months. This cultivation pattern allows for two harvests, although they do not necessarily occur within the same calendar year. The first season maize harvest commonly is reported to be the most productive, but when intercropped with mucuna a second season maize harvest can achieve similar yields (Triomphe and Sain 2004). When not reported, we assume yields to correspond with the first harvest season. Traditional names for the first and second maize seasons are, respectively: milpa de año and el tornamil (Pool Novelo et al. 1998), maiz de temporal and maiz tapachole (Eilittä et al. 2004), or tzotzil col and yaax kinil col (Pacheco-Cobos, personal observation, Santa Cruz, Belize). The growth of a third maize crop (San José or payapak) within the agricultural year was reported by Schwartz (1990) and Cowgill (1961) in Petén, Guatemala, but only the latter reported maize yields for this crop.

In a few cases crops per year replaces information about harvest season in the original sources. This is of particular interest for calculating effective fallow, a derived variable reflecting the degree to which a particular piece of land is in continuous production and thus a measure of Boserupian intensification.

Statistical Modeling

Our dataset is assembled from multiple ethnographic sources spanning approximately 80 years; it consequently presents analytical challenges. First, the precision and reliability of yields and other measurements vary from source to source – a situation we can mitigate as described in “Sample Preparation,” but which we ultimately must accept. Second, the dataset contains clusters of observations in which yields were monitored in spatially proximate communities and/or over successive years. These clusters were identified by close reading of the primary sources and coded as a categorical variable. Within clusters, distinct yields tend to be paired with similar or identical covariate values, introducing dependence in prediction errors. The models must account for clustering effects if they are to be reliably interpreted. Third, the sizes of clusters vary considerably across the sample: of the n=297 cases included in the dataset, 54 form singleton clusters containing only one record while the remaining 243 cases belong to clusters containing between 2 and 48 records.

To address these challenges, we fit multi-level regression models containing random intercepts for clusters (Gelman and Hill 2007). Practically speaking, random intercepts allow each cluster to have a unique baseline yield, while requiring all members of the cluster to share the baseline. Cluster-specific baselines capture additional sources of heterogeneity in yields, which may result from unique growing conditions or practices unmeasured by our geographical, environmental and agricultural covariates. The shared baseline acknowledges that members of the same cluster may have correlated yields.
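
In generic notation (after Gelman and Hill 2007), a random-intercept model of this kind can be written as below. This is a sketch of the general form only: the normal error distributions, the symbols, and the untransformed response are assumptions made for illustration, not the specification reported in the main text.

```latex
% y_i   : yield for case i
% j[i]  : cluster to which case i belongs
% x_i   : geographical, environmental and agricultural covariates for case i
\begin{align}
  y_i      &= \alpha_{j[i]} + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
           & \varepsilon_i &\sim \mathcal{N}(0, \sigma_y^2), \\
  \alpha_j &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2),
           & j &= 1, \dots, J .
\end{align}
```

Here each cluster-specific intercept plays the role of the baseline yield, and cases in the same cluster share it, which induces the within-cluster correlation described above.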

Mediation Analysis

In a mediation analysis, the indirect effect of a change from population density D to population density D + δ is defined as the expected difference in unadjusted yield that would obtain if population density were held fixed at D, while changing effective fallow to the value it would take if population density were increased to D + δ (Pearl 2012). To estimate indirect effects as D and δ vary, we generate potential, that is “counterfactual”, yields using the parametric algorithm of Imai et al. (2010). In observational studies such as ours, potential outcomes can be generated probabilistically using coefficients of a submodel (Model M), along with estimates of model- and observation-level uncertainty. The parametric algorithm is general enough that estimation can be carried out even though population density has non-linear relationships with both the mediator (effective fallow) and the response (unadjusted yield).
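
The verbal definition of the indirect effect can be restated in potential-outcomes notation. The symbols M(d), Y(d, m) and IE are introduced here for illustration and are not taken from the original text.

```latex
% M(d)    : effective fallow that would be observed at population density d
% Y(d, m) : unadjusted yield that would be observed at density d and effective fallow m
\begin{equation}
  \mathrm{IE}(D, \delta)
    = \mathbb{E}\bigl[\, Y\bigl(D,\, M(D + \delta)\bigr) - Y\bigl(D,\, M(D)\bigr) \bigr]
\end{equation}
```

The first argument of Y stays at D in both terms, so any difference in expected yield arises only through the change in effective fallow from M(D) to M(D + δ).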

Supplemental Tables

Table S1. Milpa maize database sources, characteristics and notes.

# / Source (Country) / Community (Ethnicity) / Pers./km² / Crop to fallow / Strategy / Year of harvest / Notes
1 / (Alcorn 1989) (MEX) / Tamjajnec (Huastecos) / 1.2 / 1:3 / Milpa / 1987 / Unadjusted yield: Maize production was less than half of family’s needs (maize purchases necessary). Conversion: USD/ha to kg/ha. In year 1987 the price for maize = 234 pesos/tonne (FAOSTAT 2011). Household’s forest management in patches (te’lom) complements production from other farm subunits. Sale of sugar is the primary source of cash income. Coffee cultivated in some areas.