Proposal for the Evaluation of São Paulo School Employees Performance Pay Reform

Barbara Bruns, Claudio Ferraz, and Marcos Rangel[*]

October 2008

BACKGROUND

Context for the reform in Brazil. Despite spending significant resources on education and increasing school attendance at all levels during the late 1990s and early 2000s, Brazil’s education performance is significantly lower than that of countries with similar income per capita. In the 2006 Program for International Student Assessment (PISA) test of math, science and literacy among 15-year-old students, for example, Brazil ranked 54th among 57 countries in mathematics, scoring lower than Argentina, Indonesia, Mexico, Chile, Thailand and Uruguay. While reading performance was better, with Brazil ranking 49th out of 56 countries, Mexico, Thailand, Uruguay and Chile all scored significantly higher. These internationally benchmarked results, together with evidence from national and state-level learning assessments showing very low average proficiency levels, have led to broad agreement among policy makers that Brazil’s education challenge lies in improving the quality of public schools.

The large decentralization process that has taken place since 1988 and the introduction of several government programs aimed at increasing the resources going to public schools have not produced a corresponding improvement in students’ test scores. Moreover, the distribution of test scores, even within small regions, shows great disparity (controlling for student and family characteristics), suggesting that teacher quality and school management play an important role (Menezes-Filho, 2007).

Teachers face weak incentives in Brazil. Salaries are relatively low and there are no incentives linked to performance, as salaries are mostly determined by tenure (Holanda-Filho and Pessoa 2007). Low teacher motivation and high rates of absenteeism from the classroom directly affect students’ performance. About 30,000 teachers, or 12.8% of the total teaching force, are reported absent each day in the São Paulo state education system.

At the end of 2007, the São Paulo State Secretary of Education launched a program aimed at improving the quality of its 5,000 primary and secondary schools and 250,000 teachers. The program consists of several actions, among them the introduction of a new curriculum with clear guidelines on the material and competencies to be taught in each grade, a strong focus on universal literacy for young children, and the introduction of supervisors to help directors and teachers improve school management. A central feature of the educational reform in São Paulo is the introduction of an innovative “teacher bonus” to link pay more closely to performance for all state school employees.

Context for research interest in evaluating this program. Despite the central relevance of teacher contracting and pay policies for education system performance, the evidence base on “what works” is weak. In both developing and developed countries, teacher pay is overwhelmingly based on educational attainment, training and experience, rather than performance. Yet variations in teacher performance, even within a single grade in the same school, are substantial (Rivkin et al. 2001). Hanushek (2004) has estimated that the “good teacher effect” on student learning outcomes is roughly equivalent to the effect of a 50% decrease in average class size in the US, a much costlier reform. Studies also indicate a weak correlation between teachers’ actual effectiveness and the most common proxies for teacher quality, namely education and experience. Most of the evaluated experience with bonus or merit pay has been in the US. The early experience was not effective (Cohen and Murnane 1986), but these experiments may have been too limited in the magnitude of the reward and the character of the performance evaluation (Hanushek 1994).

The most carefully evaluated programs outside of the US are a cash bonus program for secondary school teachers in Israel (Lavy 2004), a program awarding prizes to teachers in grades 4-8 in rural Kenya (Glewwe, Ilias and Kremer, 2008), and a study in Andhra Pradesh (AP), India, that is currently in its second year. The Israeli results showed significant effects on student performance in the subject areas rewarded, which the researchers attributed to changes in teaching methods, after-school teaching, and increased responsiveness to students’ needs. The researchers concluded that the cash bonuses for individual teachers were more cost-effective than alternative programs which offered cash bonuses for schools as a group or added instructional time to all schools. The Kenya study found relatively modest effects on student learning in the treatment schools, and these gains disappeared after a year. There was little evidence of teacher effort aimed at increasing long-run learning: teacher attendance did not improve, homework assignments did not increase, and pedagogy did not change. The only observed change was that teachers conducted more test preparation. The AP study (Muralidharan and Sundararaman, 2007) is a larger-scale, longer-duration study, which after one year found significant impacts on student learning (0.19 SD in math, 0.12 SD in language) from both group-based incentives (the whole school rewarded for average learning gains) and individual incentives (teachers rewarded differentially based on the gains registered by their own class).

The proposed evaluation of São Paulo’s bonus program would be the first rigorous evaluation of such a reform in a middle-income developing country, in a program at scale. The study would have high marginal value as a complement to the existing research base in developing countries, which consists of evaluations of pilot programs in low-income settings that have produced somewhat inconsistent results. Our proposed study will evaluate how merit pay affects teachers’ effort, training uptake, skills and classroom practice, and student learning outcomes, and whether it promotes significant adverse behaviors (diverting curriculum time away from non-tested subjects or manipulating test results). Deeper understanding of these issues is needed for effective policy in this area.

THE INTERVENTION

The performance pay system designed for the São Paulo state schools is an annual bonus paid to schools based on how well they meet individual school-level targets. Thus, school progress is measured and rewarded on a value-added basis, and implicitly takes into account schools’ differing socioeconomic contexts and specific educational challenges. The incentive is a strong one: for schools that meet 100% of their target, all employees will receive a bonus equivalent to three monthly salaries. For schools that do not meet their targets, the bonus will be paid in proportion to the percentage of the target met (e.g. employees of schools that meet 50% of their target will receive a bonus equivalent to one and a half monthly salaries).

The target is calculated for each school based on two sets of indicators. First, 70 percent of the target is calculated based on SARESP (São Paulo state annual achievement test) test scores and average school-level promotion rates.[1] The remaining 30 percent is based on teacher attendance and school management indicators.

The SARESP is a standardized test applied annually to all schools in the state of São Paulo. All students in the 1st, 2nd, 4th, 6th and 8th grades of primary school (ensino fundamental) and the 3rd (final) year of high school (ensino médio) are tested on their knowledge of Mathematics and Portuguese. The scale of the exams varies between 0 and 500. Instead of defining school-level targets based on average scores, which could create incentives for schools to disregard students at the bottom of the learning distribution, the Secretary of Education decided to use information from the whole distribution of test scores. Four levels of proficiency were created in order to facilitate teachers’ interpretation of the scores: Below Basic (Not Meeting Learning Standards), Basic (Partially Meeting Learning Standards), Proficient (Meeting Learning Standards), and Advanced (Meeting Learning Standards with Distinction).[2] The cut-offs for each category are different for each grade. For Mathematics, for example, the 4th grade cut-offs are 175, 225 and 275. For 8th grade, they are 225, 300, and 350.
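The mapping from a score to a proficiency level can be illustrated with a short Python sketch. It uses only the Mathematics cut-offs quoted above; the function name and the treatment of a score that falls exactly on a cut-off (assigned here to the higher level) are assumptions made for illustration, not rules stated by the Secretary of Education.

```python
from bisect import bisect_right

LEVELS = ("Below Basic", "Basic", "Proficient", "Advanced")

# Mathematics cut-offs quoted in the text, by grade:
# lower bounds of Basic, Proficient and Advanced.
MATH_CUTOFFS = {4: (175, 225, 275), 8: (225, 300, 350)}


def proficiency_level(score, grade, cutoffs=MATH_CUTOFFS):
    """Return the proficiency level for a SARESP score (0-500 scale)."""
    if not 0 <= score <= 500:
        raise ValueError("SARESP scores range from 0 to 500")
    # bisect_right counts how many cut-offs the score has reached.
    # Assumption: a score exactly on a cut-off belongs to the higher level.
    return LEVELS[bisect_right(cutoffs[grade], score)]
```

For instance, a score of 230 would be classified as Proficient in 4th grade but only as Basic in 8th grade, which is why the cut-offs have to be grade-specific.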

In order to aggregate the percentage of students that belong to each category into an index, the Secretary of Education assigned values that penalize the schools linearly for students that are below the Advanced category. The index assigns a penalty of 3, 2, 1 and 0 to students in the four categories (3 to students in Below Basic, 2 to students in Basic, 1 to students in Proficient, and 0 to students in Advanced). The indicator of grade discrepancy, D, is then calculated as:

\[ D = 3\,p_{BB} + 2\,p_{B} + 1\,p_{P} + 0\,p_{A} \]

where \( p_{BB} \), \( p_{B} \), \( p_{P} \) and \( p_{A} \) are the proportions of the school’s students in the Below Basic, Basic, Proficient and Advanced categories. This indicator is then converted into an index, I, that varies between 0 and 10 using the following formula:

\[ I = 10 \left( 1 - \frac{D}{3} \right) \]
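The two-step aggregation can be sketched in Python. This is a minimal sketch, assuming that the discrepancy is the share-weighted average of the penalties 3, 2, 1 and 0 and that the index rescales it linearly so that a school with all students in Advanced scores 10 and a school with all students Below Basic scores 0. That assumption reproduces the value of 2.67 in the two-school example given later in this proposal. The function names are illustrative.

```python
PENALTY = {"Below Basic": 3, "Basic": 2, "Proficient": 1, "Advanced": 0}


def discrepancy(shares):
    """Share-weighted average penalty.

    `shares` maps a proficiency level to the proportion of students in it;
    levels that are omitted are treated as having no students.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(PENALTY[level] * share for level, share in shares.items())


def performance_index(shares):
    """Rescale the discrepancy (0 to 3) to an index between 0 and 10."""
    return 10 * (1 - discrepancy(shares) / 3)


# 60% Below Basic, 20% Basic, 20% Advanced, as in the two-school example.
example = {"Below Basic": 0.6, "Basic": 0.2, "Advanced": 0.2}
print(round(performance_index(example), 2))  # prints 2.67
```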

In addition to this index, the indicator used by the Secretary of Education takes into account approval rates. For each school, the primary school years are divided into two groups: the first runs from 1st to 4th grade and the second from 5th to 8th grade. The average time it takes students to complete a group of grades is then calculated using the fact that the inverse of the approval rate for a grade estimates the average time it takes students to complete that grade, so that the sum of these inverses across the grades in a group estimates the average time needed to complete the group. This flow measure, F, is normalized to vary between zero and one by dividing the number of years it should take students to complete a group of grades by the time it actually takes.
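The flow calculation can be sketched as follows. This is a minimal illustration, assuming that one approval rate per grade is available and that each grade should take exactly one year; the function name and the example rates are hypothetical.

```python
def flow_indicator(approval_rates):
    """Flow indicator F for one group of grades.

    `approval_rates` holds one approval (promotion) rate in (0, 1] per grade.
    The inverse of each rate estimates the average number of years spent in
    that grade, so the sum of inverses estimates the time to finish the group.
    """
    if not approval_rates or any(not 0 < r <= 1 for r in approval_rates):
        raise ValueError("approval rates must lie in (0, 1]")
    expected_years = sum(1 / rate for rate in approval_rates)
    # Years the group should take (one per grade) over years it actually takes.
    return len(approval_rates) / expected_years
```

With an approval rate of 80% in each of four grades, students need on average 1.25 years per grade, or 5 years for the group, so F = 4/5 = 0.8. With full approval in every grade, F equals 1.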

The performance indicator based on test scores, I, is then combined with the flow indicator, F, to create a measure of school quality for São Paulo, the IDESP[3]:

\[ \mathrm{IDESP} = I \times F \]

Because F varies between 0 and 1, it penalizes the schools for taking longer than expected to complete a series of grades and thus creates incentives in the direction of automatic promotion. But because performance also depends on test scores, there is a countervailing incentive which penalizes schools if students do not learn adequately.
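The countervailing incentive can be illustrated with a small numerical sketch. It assumes that the two indicators are combined multiplicatively, which is consistent with F acting as a penalty factor between 0 and 1; the numbers are hypothetical.

```python
def idesp(performance_index, flow):
    """School quality measure, assumed here to be the product I x F."""
    if not 0 <= performance_index <= 10 or not 0 <= flow <= 1:
        raise ValueError("expected 0 <= I <= 10 and 0 <= F <= 1")
    return performance_index * flow


# Hypothetical school: test-score index of 4.0 and flow indicator of 0.8.
before = idesp(4.0, 0.8)

# Automatic promotion pushes F to 1.0, but suppose learning suffers and the
# test-score index falls to 3.0: the overall measure declines.
after = idesp(3.0, 1.0)
```

In this illustration the measure falls from 3.2 to 3.0, so promoting students without ensuring that they learn does not pay off for the school.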

Using the IDESP as an indicator of quality for each school, the Secretary of Education applied the same methodology as the IDEB, implemented by the Ministry of Education. The methodology assumes that all schools will converge by 2030 to a maximum IDESP score of 9. A logistic function is then estimated, and the predicted value for each school and year from 2008 to 2030 provides the target that the school has to attain in that specific year.
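The shape of such a target path can be sketched in Python. The proposal does not give the functional form that was estimated, so everything specific here is an assumption made for illustration: a logistic curve with an upper asymptote of 10 (the top of the index scale), a baseline year of 2007, and a curve pinned down by two points, the school’s baseline score and the value 9 in 2030. The function name `target_path` is hypothetical.

```python
import math


def target_path(baseline, start_year=2007, end_year=2030,
                end_value=9.0, scale=10.0):
    """Yearly targets on a logistic curve through two points.

    Assumed form: the logit of (score / scale) is linear in time and passes
    through (start_year, baseline) and (end_year, end_value).
    """
    if not 0 < baseline < scale or not 0 < end_value < scale:
        raise ValueError("baseline and end_value must lie strictly inside (0, scale)")

    def logit(value):
        return math.log(value / (scale - value))

    slope = (logit(end_value) - logit(baseline)) / (end_year - start_year)
    return {
        year: scale / (1 + math.exp(-(logit(baseline) + slope * (year - start_year))))
        for year in range(start_year + 1, end_year + 1)
    }
```

Under these assumptions a school with a lower baseline faces a steeper path, which matches the observation that schools at the bottom of the distribution have to gain more relative to their initial scores.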

The bonus will be paid to all schools according to the percentage of the target achieved by the end of the year. All employees of schools that meet the target will receive a 100% bonus (equivalent to three monthly salaries), while employees of schools that do not meet the target will receive a bonus that is proportional to the percentage of the target attained (e.g. for a school that meets 80% of the target, all employees will receive a bonus of 0.8 × 3 = 2.4 monthly salaries).
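The payment rule can be written as a one-line calculation. This is a minimal sketch: the cap at 100% and the floor at zero are assumptions (the text describes only schools that meet or fall short of their target), and the function name is illustrative.

```python
FULL_BONUS_MONTHS = 3.0  # bonus for meeting 100% of the target, in monthly salaries


def bonus_months(share_of_target_met):
    """Bonus in monthly salaries for a given share of the target (1.0 = 100%)."""
    # Assumption: exceeding the target earns no more than the full bonus,
    # and a school that loses ground receives no bonus.
    share = min(max(share_of_target_met, 0.0), 1.0)
    return FULL_BONUS_MONTHS * share
```

A school that meets 80% of its target yields 2.4 monthly salaries per employee, and one that meets 50% yields 1.5.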

Primary Research Questions and Outcome Indicators

This evaluation aims to answer 10 research questions:

1) Does linking teachers’ pay to indicators of school performance via a bonus result in improved student learning?

2) Does linking teachers’ pay to indicators of school performance via a bonus reduce teacher absence?

3) Does linking teachers’ pay to student test scores via a bonus result in positive behaviors such as increased teacher effort (hours worked, quantity of homework assigned and graded), more effective teaching strategies, or reassignments in school personnel in favor of tested grades and subjects?

4) Does linking teachers’ pay to student test scores via a bonus result in undesirable behaviors such as manipulation of test results or reduced class time spent on non-tested subjects?[4]

5) Does giving schools information about the rules of the game for the bonus significantly improve their chances of earning it?

6) Are schools that are unsuccessful in the first year of the bonus program more or less likely to put effort into competing for the bonus in subsequent years?

7) What strategies do schools use to try to improve performance under the bonus program?

8) To what extent do the levels of trust, teamwork and cooperation within schools explain their success in accessing the bonus?

9) How does success -- or lack of success -- in the first year of the bonus program affect levels of trust, teamwork and cooperation within schools?

10) What strategies do schools employ to build trust, teamwork and cooperation?

Outcome indicators will include: student test scores (SARESP), student enrollment, promotion and completion rates, and teacher and other school personnel absence rates. For all schools, the Secretary of Education collects rich socioeconomic and other background data on school directors, teachers, supervisors and students via an annual school survey, and the state also surveys parents via an online survey. São Paulo also has good budget data, which we will use to estimate changes in school-level spending and the cost-effectiveness of the reform in producing student learning improvements.

For a sample of schools, we will also try to deepen the analysis in three areas. First, we will collect data on teachers’ instructional strategies and use of time and classroom resources, through direct observation using a standardized classroom observation instrument. Second, we will collect qualitative data from directors, teachers, supervisors, students and parents about perceptions of the bonus program and school-level changes, both prior to and after the first round of bonus payments. Finally, through the application of an innovative set of new instruments, we will develop direct measures of the levels of trust, teamwork and social capital within schools.

Evaluation Design / Identification Strategy

1. Estimating the impacts of the target-based scheme

The core evaluation question for any performance pay scheme is whether its introduction improves performance. In the context of education, performance is measured by the acquisition of cognitive skills. Thus, this evaluation aims at estimating whether teachers under a target-based scheme put more effort into teaching and consequently improve students’ cognitive skills. The effects of introducing such a scheme can be divided into two. First, the announcement that a bonus will be paid based on a school target scheme might induce teachers to increase their teaching effort and thereby affect students’ test scores. This can be evaluated using pre- and post-bonus data on test scores if there is variation across schools in the introduction of the bonus (ideally with schools chosen randomly, as in Muralidharan and Sundararaman, 2007). Second, the payment of the bonus might have an effect on subsequent teacher behavior. Teachers who receive a large bonus might be encouraged, while those who receive a small bonus might be discouraged. This evaluation proposal aims at measuring both the short- and the medium-term effects of the target-based bonus scheme.

São Paulo’s target scheme has three characteristics that make it unique and allow for a quasi-experimental evaluation strategy. First, schools at the bottom of the distribution of test scores will have to gain more relative to their initial test scores in order to attain the target. Second, small differences in the distribution of test scores can induce larger differences in school targets because there are test-score thresholds that divide the categories. Third, schools with the same target have differential incentives because one can have more students near a threshold while another might have students that are far away from the cut-off.

2. Measuring the short-term effects of the target-based bonus

The announcement of the new performance pay scheme is expected to create new incentives for school employees. One way to credibly estimate the effects of the bonus announcement would be to randomize the introduction of a bonus system for a sub-group of schools and keep another group of schools under the current scheme. This strategy is followed by Muralidharan and Sundararaman (2007) to study the effects of group versus individual bonus schemes in rural India.

For the current program, which is being implemented at scale in São Paulo, randomization is not possible. We propose a quasi-experimental approach that exploits discontinuities created in targets across schools based on the initial performance indicator, and differences in incentives created by the distribution of SARESP scores.

The first idea exploits the fact that similar schools might end up with significantly different targets because of small differences in the initial SARESP distribution. The second idea exploits the fact that schools with the same target face differential incentives to meet the target depending on how far away from the cut-off to cross levels their students are. For two schools that have to meet the same target, the school with more students near the thresholds faces stronger incentives.

3. Exploiting Differential Incentives

Although the bonus rule is the same for all schools, the effect of the bonus will depend on how easy it is for a school to meet its target. Schools that face the same target will have differential incentives to meet it, depending on how far their students are from the thresholds that define the quality categories. A simple example helps to illustrate this point. Suppose there are two schools with 5 students each. School A has 3 students with a SARESP math score of 110, 1 student with 151 and another with 280. School B has 3 students with scores of 149, 1 student with 151 and another with 280. Because both schools have 60% of their students below the basic level (score less than 150), 20% in the basic level and 20% in the advanced level, the two schools will have the same indicator of performance, equal to 2.67. If, for simplicity, they also have the same approval rates, their target for 2008 will be exactly the same. Nonetheless, the effort these two schools have to exert to meet their target is significantly different. While school B needs little effort to make its students cross the 150 threshold, school A’s effort will have to be significantly larger. Hence, the incentive for extra effort will be stronger for schools that have a larger share of students just below a threshold. Conditional on the target, the incentives faced by schools should decrease as the average distance of students’ scores from the cut-offs increases.
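The two-school example can be checked with a short Python sketch. Only the 150 threshold comes from the example; the cut-offs of 200 and 250 for the Proficient and Advanced levels are hypothetical values chosen so that 151 falls in Basic and 280 in Advanced. The index is assumed to rescale the average penalty linearly to a 0-10 range, and the average gap to the next cut-off is one possible (illustrative) measure of how hard it is for a school to move students up a level.

```python
# Only 150 is taken from the example; 200 and 250 are hypothetical cut-offs.
CUTOFFS = (150, 200, 250)
PENALTIES = (3, 2, 1, 0)  # Below Basic, Basic, Proficient, Advanced


def level(score):
    """Number of cut-offs reached: 0 = Below Basic, ..., 3 = Advanced."""
    return sum(score >= cutoff for cutoff in CUTOFFS)


def index(scores):
    """Performance index on a 0-10 scale (assumed linear rescaling)."""
    mean_penalty = sum(PENALTIES[level(s)] for s in scores) / len(scores)
    return 10 * (1 - mean_penalty / 3)


def mean_gap_to_next_cutoff(scores):
    """Average points needed to reach the next level, for students below Advanced."""
    gaps = [CUTOFFS[level(s)] - s for s in scores if level(s) < len(CUTOFFS)]
    return sum(gaps) / len(gaps)


school_a = [110, 110, 110, 151, 280]
school_b = [149, 149, 149, 151, 280]

for name, scores in (("A", school_a), ("B", school_b)):
    print(name, round(index(scores), 2), mean_gap_to_next_cutoff(scores))
```

Both schools obtain an index of 2.67, but the average gap to the next cut-off is 42.25 points in school A against 13.0 points in school B, which is the sense in which school B faces the stronger incentive.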