New York City is experimenting with a policy that provides struggling “Renewal schools” with additional resources and community services in an attempt to turn them around rather than simply closing them. In some cases, the city has had to take additional aggressive steps: Last week, it announced that after three years spent trying to turn around two Renewal high schools, all teachers and staff there will have to reapply for their jobs.
Two recent evaluations have found that the results from the Renewal Schools program thus far have been disappointing at best. In a report for the Manhattan Institute, I found evidence that being classified as a Renewal school led to improvements in a school’s average test scores through 2016, two years after the policy’s full implementation. Those gains were statistically significant — that is, estimated precisely enough to give confidence that the measured effect of the policy is not due entirely to random chance — but too small to justify the treatment’s exorbitant expense. (Adding the just-released 2017 test scores to the analysis does not meaningfully change the findings.)
My study compares the trajectory of average test scores within a school before and after being designated a Renewal school, holding constant all school attributes that do not change over time. Another recent analysis for Chalkbeat, by Teachers College professor Aaron Pallas, found that test score gains in Renewal schools are no different from those made by other schools with similar demographic profiles. Although I obviously prefer the method I chose for my paper, the method that Pallas applied is also rigorous.
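The logic of that within-school before/after comparison can be illustrated with a toy difference-in-differences calculation. This is a minimal sketch on invented numbers — the school labels and scores are hypothetical, and it is not the paper’s actual data or estimator — but it shows why attributes that do not change over time cancel out of the comparison.

```python
# Illustrative sketch only: a difference-in-differences on made-up numbers.
# Differencing each school's own before/after scores removes any fixed
# (time-invariant) school attribute from the comparison.

# Hypothetical average test scores: {school: (before, after)}
renewal = {"A": (60.0, 63.0), "B": (55.0, 57.0)}
comparison = {"C": (70.0, 71.0), "D": (65.0, 66.0)}

def mean_change(schools):
    """Average within-school change in scores (after minus before)."""
    changes = [after - before for before, after in schools.values()]
    return sum(changes) / len(changes)

# The estimated effect: how much more Renewal schools' scores grew
# than the comparison schools' scores grew over the same period.
effect = mean_change(renewal) - mean_change(comparison)
print(effect)  # 2.5 - 1.0 = 1.5 points
```

The key assumption, as noted below, is that nothing specific to Renewal schools both changed over time and drove score growth; if it did, the differenced estimate would absorb that change along with the policy’s effect.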
How could two rigorous evaluations of the same data produce seemingly different results? One reason is that Pallas and I make slightly different decisions about how to define treatment (he says that it started in 2015, when the policy was introduced; I say it really started in 2016, which is when all schools had actually received increased services) and which test scores we emphasize. But another potentially important reason is that the different methods we apply to measure the impact of Renewal schools rely on different assumptions.
You heard that right. All findings from empirical research rely on assumptions. Some research methods require more, and stronger, assumptions than others. But no single study definitively answers a research question.
The administration of Mayor Bill de Blasio doesn’t like the results of either my study or Pallas’s, and has challenged the assumptions underlying our approaches. Administration officials correctly point out that Pallas’s design doesn’t account for the fact that Renewal schools were chosen precisely because they were in trouble, and so they might not be expected to perform as well as other schools with similar demographics. In the case of my study — well, it isn’t clear to me why the administration doesn’t think my estimates are valid. They just don’t. To make their case for them: my method assumes that no factors specific to Renewal schools both change over time and are related to average school test score growth.
Fair enough. All assumptions can be challenged. We academics constantly question each other’s assumptions in hotel conference rooms.
That said, if the Renewal schools policy were as effective as the administration had hoped, we should see some sign of it in analyses like these. Put another way, the assumptions required to argue that each of these studies is missing a true large policy effect are simply implausible.
The administration’s apparent position — that no research design can inform us about the effects of Renewal schools — is especially troubling. It has taken a similar stance in previous cases when the results from empirical research didn’t square with its worldview. That’s OK. They know what works. Trust them.
Taxpayers looking at the program’s $400 million-and-growing price tag should find that position unacceptable. It was also entirely avoidable.
Studies like mine and Pallas’s use what researchers call quasi-experimental designs. These methods try to replicate as closely as possible the result that would have emerged had there been a randomized experiment. This sort of research is often convincing. But real experiments are always preferred.
The administration could have implemented the program as a randomized field trial (RFT). Rather than simply designating about 100 low-performing schools as Renewal schools, it could have randomly assigned Renewal status among the 200 or so schools it thought would benefit from the treatment, much as a medical trial assigns patients to treatment and control groups. Then any difference in the later outcomes of students in Renewal schools and the other schools could justifiably be attributed to the treatment. The assumptions needed to believe such estimates (there are always assumptions!) are few and weak.
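The random-assignment step itself is mechanically simple. Here is a hypothetical sketch: the school identifiers, pool size, and seed are all invented for illustration, not drawn from the actual program.

```python
# Hypothetical sketch of random assignment for a field trial:
# split a pool of ~200 candidate schools into treatment and control.
import random

# Invented school IDs standing in for the candidate pool.
candidates = [f"school_{i:03d}" for i in range(200)]

rng = random.Random(2015)   # fixed seed so the draw is reproducible
shuffled = candidates[:]
rng.shuffle(shuffled)

treatment = sorted(shuffled[:100])   # would receive Renewal-style services
control = sorted(shuffled[100:])     # would proceed as usual

print(len(treatment), len(control))  # 100 100
```

Because chance alone decides which group each school lands in, the two groups are comparable on average — in their demographics, their histories, and everything else — which is exactly what lets later outcome differences be attributed to the treatment.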
It’s too late for Renewal schools. But there will be more big policy interventions in the future. It’s past time that policymakers in New York City and elsewhere consider how a policy will be studied when developing its design and implementation. Applying an RFT is just about required in order to get federal funds to experiment with a new education program. It is also common procedure for pilot programs funded by private foundations. It should similarly be common practice for states and cities that use taxpayer dollars to adopt big, expensive, and potentially important policy interventions.
Some worry that RFTs unnecessarily and immorally withhold a beneficial intervention from those who need it. The first problem with that concern is that we don’t know whether an intervention is beneficial until we have convincingly studied its effects. But a more practical matter is that budget constraints often limit the availability of a treatment anyway. (I’m confident that the de Blasio administration would have further expanded the Renewal schools program had it the budget to do so.) In those cases, randomization not only allows for rigorous study of the policy’s effects but also is the fairest way to allocate a scarce resource.
Convincing evidence of program effects is essential for good policymaking. In the case of Renewal schools, the evidence isn’t perfect, but it clearly suggests that the policy has produced disappointing results. If we want more conclusive evidence about whether the next big policy intervention “works,” then policymakers need to implement it with research design in mind.
This piece originally appeared at The 74.