The Mission of the Manhattan Institute is to develop and disseminate new ideas that foster greater economic choice and individual responsibility.


Civic Report

No. 55 October 2008


Grading New York:

An Evaluation of New York City's Progress Report Program

Marcus A. Winters, Ph.D., Senior Fellow, Manhattan Institute

 

Executive Summary

In 2006–07, New York City, the largest school district in the United States, followed several other school systems in adopting a progress report program. Under its program, the city grades schools from A to F according to an accumulating point system based on the weighted average of measurements of school environment, students’ performance, and students’ academic progress.

The implementation of these progress reports has not been without controversy. While many argue that they inform parents about public school quality and encourage schools to improve, others contend that grades lower morale at low-performing schools. To date there has been too little empirical information about the program’s effectiveness to settle these questions.

This paper incorporates student-level data in a regression-discontinuity design to study the impact of a school’s receipt of a particular grade (A, B, C, D, or F) on student proficiency in math and English one year later.

The main findings of the paper are as follows:

  • Students in schools earning an F grade made greater gains in math the following year than students in schools earning higher grades, though these improvements occurred primarily among fifth-graders.
  • Students in F-graded schools performed neither better nor worse in English than students in schools that were not graded F.


About the Author

MARCUS A. WINTERS is a senior fellow at the Manhattan Institute. He has conducted studies of a variety of education policy issues including high-stakes testing, performance pay for teachers, and the effects of vouchers on the public school system. His research has been published in the journals Education Finance and Policy, Economics of Education Review, Teachers College Record, and Education Next. His op-ed articles have appeared in numerous newspapers, including the Wall Street Journal, Washington Post, and USA Today. He received a B.A. in political science from Ohio University in 2002 and a Ph.D. in economics from the University of Arkansas in 2008.


Introduction

Several public school systems in the U.S. have recently adopted policies intended to hold schools accountable for student outcomes, typically measured by standardized math and English exams. One type of accountability policy centers on what are often referred to as “progress reports.” Programs using them issue schools what are essentially public report cards, which grade them from A to F and often carry material rewards or sanctions.

The New York City public school system, the largest school district in the United States, adopted a progress report program, which first graded schools on the basis of their performance at the end of the 2006–07 school year. Under the program, schools accumulate points based on their students’ performance on standardized exams and a variety of other factors. Besides risking public disgrace, schools that repeatedly receive F or D grades are subject to review and ultimately face takeover by the city. Schools that earn high grades are eligible to receive rewards.

The goal of the city’s policy is twofold: to inform parents about school quality and to encourage schools to improve in response to incentives. Many argue, however, that the policy harms public schools by depressing the morale of teachers and others. For example, a November 11, 2007, editorial in the New York Times argued that “the practice of giving, say, an F, to an otherwise high-performing school that lags in student improvement for a single year stigmatizes the entire school and angers parents.”

Unfortunately, the often vigorous debate between those who argue that the grading policy is essential to improving New York City’s public schools and those who believe that it is detrimental to them has thus far occurred in a data vacuum. This paper seeks to inform this debate with empirical evidence on the program’s effectiveness.

In particular, we follow the strategy of previous work on Florida’s schools in order to evaluate the impact of grading New York City schools on their students’ achievement one year later. The design of New York’s program allows for the use of a “regression discontinuity” approach, which, under certain reasonable assumptions, allows for a causal interpretation of the impact of earning a particular grade—A, B, C, D, or F—on school productivity as measured by student academic performance.

Our findings are somewhat mixed. Using data on students in grades four through eight, we find that students in schools that received an F grade in 2007 made academic improvements in English that were on a par with those of students in schools that received better grades. However, we find that students in F- and D-graded schools made meaningful improvements in math relative to other schools, though this result appears to have been driven primarily by student progress in the fifth grade. In summary, our results suggest that schools may have responded to the F sanction by improving their performance and that there is no reason to believe that the sanction of a low grade harmed student achievement.

The paper continues in six parts. Section 2 provides a brief overview of previous research evaluating a similar progress report program in Florida. In Section 3, we discuss the design of New York City’s policy. In order to give our results on the relative improvement of D- and F-graded schools greater context, we present some information about overall school progress in New York City in Section 4. We then devote Section 5 to discussing our methodology and data. We report results from estimation in Section 6, including a replication of results obtained by Rockoff and Turner (2008), who employed a similar design to study New York’s policy but used aggregate data. Section 7 states our conclusions.


Previous Research

For a thorough review of research evaluating accountability policies overall, see the recent survey by Figlio and Ladd (2008). For our current purposes, we focus on previous evidence evaluating progress reports as a specific form of accountability. In particular, New York City’s progress report policy is quite similar in its design to Florida’s A+ Program, which has graded schools in that state since 1998. Though the programs provide different incentives for schools that earn certain grades—in particular, until recently, students in Florida public schools that received more than one failing grade in a four-year period became eligible for private school vouchers—both utilize a point-based system that determines whether schools receive certain letter grades that carry important consequences.

Florida’s A+ Program has been the subject of several studies. Greene (2001) and Greene and Winters (2004) used aggregate school-level data to directly compare the educational gains made by differently graded schools. They found that schools that received an F grade made substantial academic improvements relative to other schools. These results were confirmed by Chakrabarti (2005), who went on to find evidence that the results were not driven by regression to the mean.

Some recent studies of the A+ Program have used student-level data and have taken advantage of its known grade thresholds to pursue a regression-discontinuity approach. West and Peterson (2006) limited their sample to students in schools that earned point totals barely qualifying them for an F grade or barely missing the benchmark and thus earning them a D grade. Like other studies using aggregate data, West and Peterson’s study found evidence that the incentives of schools earning an F grade had a positive impact on student academic proficiency.

Using a more flexible regression-discontinuity design, Rouse and others (2007) also drew upon individual-level data, but they included students in all schools throughout Florida. Their model incorporated a control for a cubic function of the point total earned by a school, which, under certain reasonable assumptions, allows for causal interpretations of the impact on a school of the receipt of a particular grade. They found additional evidence that the incentives of the F-grade sanction led to increased school performance. These findings were replicated by Winters, Greene, and Trivitt (2008), who also utilized this procedure and found that the school-grading policy improved student proficiency in science, which is not part of the grading process.

A recent study by Rockoff and Turner (2008) follows the procedure of Rouse and others (2007) in evaluating the impact of progress reports in New York using aggregated data. Their paper found that schools that received an F or a D grade in 2006–07 had statistically significantly and substantially higher scores in math and reading in 2007–08. One value of the present paper is its replication of these earlier results, which provides confidence in both papers’ estimates.

Though there are other differences, the primary distinction between the Rockoff and Turner (2008) work and the present paper is that here we utilize student-level data. Student-level data allow for more precise estimation and lend themselves to a value-added approach that accounts for unobserved student heterogeneity, which is not possible with school-aggregated data.


New York City’s Progress Report Policy

New York City’s progress report policy rates schools on a variety of factors, according to an accumulating point system based on the weighted average of metrics intended to measure school environment, student performance, and student academic progress. It then assigns grades from A to F to schools, according to certain benchmarks. Progress reports were first issued at the beginning of the 2006–07 school year. The city claims that the reports are “designed to help principals and teachers accelerate academic achievement” and that the policy “enables students, parents, and the public to hold the DOE and its schools accountable for student outcomes and improvement” (New York City Department of Education, 2007).

The first factor in determining the school’s grade is school environment. This metric uses information from school and parent surveys about school safety and parental engagement. The environment index accounts for 15 percent of the total points that can be awarded.

The remainder of the points earned by a school are linked to performance on the state’s standardized math and English exams. The percentage of students whose test scores meet the proficient or advanced benchmark on these exams accounts for 30 percent of the total points that can be awarded. This measure rewards those schools in which students meet a particular academic level, although it may put schools whose students begin the year with lower proficiency at a disadvantage. To compensate, 55 percent of the school’s potential points are linked to the progress that students make on the standardized math and English tests during the year. This value-added measure takes into account the percentage of students making at least a year’s worth of academic progress as well as the average change in proficiency scores of students who began the year in the bottom third of the school’s proficiency distribution. Schools can earn additional bonus points if students deemed “high need” make exemplary gains on the state exams.

The resulting scores on each of these factors are then further adjusted to account for the school’s performance relative to the rest of the schools in the district and a grouping of schools with similar characteristics. The scores on each of these elements are weighted, as indicated above, in order to produce the school’s total points under the system.
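To make the weighting concrete, consider a minimal sketch of the point calculation. Only the 15/30/55 weights come from the description above; the 0–100 category scale, the category scores, and the bonus are invented purely for illustration, and the real calculation also includes the peer-group and citywide adjustments just described.

    # Hypothetical sketch of the weighted point system described above.
    # Only the 15/30/55 weights come from the text; the 0-100 category
    # scale, the scores, and the bonus are invented for illustration.
    weights = {"environment": 0.15, "performance": 0.30, "progress": 0.55}
    category_scores = {
        "environment": 60.0,  # survey-based measure of safety and engagement
        "performance": 45.0,  # share of students scoring proficient or advanced
        "progress": 70.0,     # year-over-year student gains
    }
    bonus = 2.0  # exemplary gains by "high need" students

    total_points = sum(weights[c] * category_scores[c] for c in weights) + bonus
    print(round(total_points, 1))  # 0.15*60 + 0.30*45 + 0.55*70 + 2.0 = 63.0

The illustrative point is simply that the progress component dominates the total, so a school's grade is driven more by student gains than by the level of proficiency or the environment surveys.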

Schools are then assigned grades from A to F on the basis of the number of total points earned. The range of overall points that yield particular grades is reported in Table 1. The table shows that there are slightly different point requirements for elementary, middle, and K–8 schools, for which we account in the analysis.

Many commentators in New York have argued that the grading system does not accurately measure school “quality.” For example, it has often been pointed out in the popular press that many of the same schools earning poor grades under the city’s system receive high marks under the different accountability system that the No Child Left Behind Act calls for, and vice versa.

This study is not particularly interested in the extent to which the progress report policy accurately measures school quality. Under no circumstances should this paper be interpreted as suggesting that the program is accurately or inaccurately identifying successful or “failing” schools. The identification of those factors that underlie a successful school involves value judgments that only communities, the school district, and elected representatives can make.

Nor is this paper concerned with whether progress reports have improved school effectiveness generally in New York. It may be that every school responds positively or negatively to the grading policy, regardless of whether it receives a high grade or a low grade at the outset. Our procedure does not lend itself to measuring general improvements throughout the school system.

Rather, the goal of this paper is to measure how schools that are officially and publicly deemed to be failing and thus face sanction respond to that designation. In particular, we evaluate whether student proficiency in such schools suffers or improves as a consequence of the failing designation. As we will see in the next section, the point system used to grade schools allows us to control for unobserved differences in school quality during estimation. We emphasize, however, that for our purposes, it is the grade and points themselves that were earned under the system that are important, not the particular factors that are responsible for a higher score or grade.

Schools that receive a poor grade under the program face unspecified sanctions and even restructuring or closure if they fail to improve.[1] However, the act of stigmatizing schools as “failing” could have a motivating effect on them. Several researchers have speculated that accountability policies could “shame” schools into better performance (Figlio and Rouse 2005, Ladd 2001, Carnoy 2001, Harris 2001). In this paper, we are not particularly concerned with the causes of improvements in student performance, though inquiry into the causes is a clear avenue for future research.


Overall Performance in New York City

As mentioned above, our research did not directly measure whether the progress report program led to general improvements or declines in the performance of public schools in New York City. However, reviewing some summary statistics about overall progress in the school system can help place any results about the relative performance of schools earning particular grades into a broader context.

Table 2 summarizes the performance of New York City schools on fourth- and eighth-grade math and English exams in 2006, 2007, and 2008, using data aggregated to the school level in our data set.[2] Between 2006 and 2008, schools made statistically significant progress in both grades and in both math and English. The gains were largest in eighth-grade math and smallest in fourth-grade English. In fact, scores in English were actually statistically lower in 2007 than in 2006, but schools made progress in the next year.

Thus, the overall story on the state tests is one of relative improvement from 2006 to 2008. Though these gains are statistically significant (that is, we can have high confidence that the true gain, once we take into account measurement error, is greater than zero), we are not able to say that these overall gains are directly related to the progress report program, nor is it the place of this paper to conclude that such improvements are substantial enough to warrant overall optimism about the city’s schools.


Data and Method

We utilize a student-level data set provided by the New York City Department of Education. The data set includes demographics and test scores on the state’s standardized math and English exams for the universe of New York City public school students enrolled in grades three through eight from the 2006–07 through the 2007–08 school years. We are also able to link students to the schools they attend and thus to the grades and points those schools earned under the policy at the end of the 2006–07 school year.

Table 3 presents descriptive information about the schools in our data set overall and disaggregated by the letter grade earned by the school at the end of 2006–07. These descriptive statistics are not identical to, but do closely match, those reported by Rockoff and Turner (2008), who used data aggregated by the Department of Education.

One difficulty with the data set is that the state does not claim that the results of its math and English exams are “vertically aligned” across grades. When results are vertically aligned, a particular score indicates a certain level of proficiency regardless of the grade for which the exam was prepared. So a fifth-grade student with a score of 600 would have the same level of reading proficiency, as measured by a fifth-grade test, as a third-grade student with a score of 600, as measured by a third-grade test, and so on. The lack of vertically aligned scores matters for measurement because it means that the relationship between a student’s previous-year score and current score could vary by grade level. This creates a difficulty in estimation because our method pools students across grades into a single regression equation. Specification checks (not reported here) suggested that there are slight but significant differences across grades in the relationship between previous and current student proficiency. To account for these differences, in addition to estimating models that include all grade levels, we also report models restricted to each tested grade level individually.

We follow the regression-discontinuity method first presented in this context by Rouse and others (2007) to study Florida’s similar school-grading program. This method takes advantage of the discrete cutoffs in the continuous point system used to assign schools particular letter grades. We slightly modify this procedure to better fit the particular design of the New York program.[3]

We use a cross-sectional regression model to measure the relationship between student and school characteristics and the student’s math or English score on the 2007–08 administration of the exam. In particular, we run regressions in which the student’s 2007–08 test score is the dependent variable and the independent variables include a cubic function of the student’s score on the exam in 2006–07,[4] observable characteristics of the student (race, ethnicity, special-education status, etc.), observable characteristics of the school (percentage of students who are of a particular race or ethnicity, etc.), and whether the school is listed as an elementary, K–8, or middle school. The model also controls for a cubic function of the number of points earned by the student’s school in each of the categories of the overall point system at the end of the 2006–07 school year and the letter grade earned by the school at the end of that year. Finally, we include an interaction between points earned on each of the input factors and the school type (elementary, K–8, or middle school). These interactions account for the fact that the cutoffs in the point system vary somewhat by school type, as shown in Table 1.
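For readers who think in code, the following is a minimal sketch of how a specification along these lines might be set up. The column names (score_2008, score_2007, env_points, and so on) are hypothetical, the student and school controls shown are only a subset, and the school-clustered standard errors are our assumption rather than something described in the paper.

    # Simplified sketch of the regression described above (statsmodels).
    # Column names are hypothetical; the actual specification also interacts
    # the point-category cubics with school type and uses fuller controls.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("nyc_student_panel.csv")  # hypothetical student-level file

    formula = (
        "score_2008 ~ score_2007 + I(score_2007**2) + I(score_2007**3)"  # cubic in lagged score
        " + C(school_grade_2007, Treatment(reference='C'))"              # letter grade, C as baseline
        " + env_points + I(env_points**2) + I(env_points**3)"            # cubic in each point category
        " + perf_points + I(perf_points**2) + I(perf_points**3)"
        " + prog_points + I(prog_points**2) + I(prog_points**3)"
        " + C(school_type)"                                              # elementary, K-8, or middle
        " + black + hispanic + special_ed"                               # subset of student controls
    )
    model = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["school_id"]}  # clustering is our assumption
    )
    print(model.summary())

Under a setup like this, the coefficients on the F and D grade indicators (relative to the C baseline) are the estimates of interest.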

The central assumption of our procedure is that there is no difference in school quality that is conveyed in the school’s grade that is not also accounted for in (a cubic function of) the number of points that a public school earned in each category under the formula. If this assumption holds, we can interpret the estimate as the impact of a school’s receipt of a particular grade on a student’s academic proficiency.

The basic idea behind this technique is to take advantage of the known cutoffs above or below which schools are assigned different letter grades according to the policy. The continuous point system provides a direct measure of the quality of each public school as determined by the school system. Though important for policy purposes, the cutoffs on the point scale at which a school earns an A, B, C, D, or F grade are set at somewhat arbitrary points and thus convey little to no additional information about the school’s performance that is not already represented in the point total. Schools with similar point totals are likely to be similar in their effectiveness, but whether their score falls on one side or the other of a cutoff will determine whether they receive the F-grade sanction.

For instance, a public elementary school earning a point total of 31.0 under the grading system is likely educating its students just as well as another public elementary school that earns 30.8 points. However, these schools face very different incentives, since the former would receive a D grade and the latter an F grade. By controlling for each school’s point total, we are thus able to measure the impact of earning a particular grade independent of the school’s previous productivity level.
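The discontinuity can be made concrete with a toy grade-assignment rule. The comparison above implies an elementary-school D/F cutoff of roughly 31 points; the exact thresholds appear in Table 1 and vary by school type, so the number below is illustrative only.

    # Toy illustration of the discontinuity exploited by the design.
    # The ~31-point D/F cutoff for elementary schools is inferred from the
    # example above; Table 1 reports the actual thresholds by school type.
    D_F_CUTOFF = 31.0

    def receives_f(total_points: float) -> bool:
        """True if a school's point total falls below the assumed D/F cutoff."""
        return total_points < D_F_CUTOFF

    print(receives_f(30.8))  # True:  F grade, faces the sanction
    print(receives_f(31.0))  # False: D grade, escapes the F sanction
    # Two nearly identical schools thus receive different treatments; the grade
    # indicator in the regression captures this jump, while the smooth cubic
    # point controls absorb the schools' underlying measured quality.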

We impose two sampling restrictions that are worth mentioning. First, we exclude students who were tested in the third grade in 2007–08. Since we utilize a lagged dependent variable, and testing begins in the third grade, these students must have been retained in the third grade and thus may categorically differ from students in other grades. Second, we exclude students whose school is listed as a high school on its progress report. The data set contains observations of students taking the seventh- and eighth-grade exams who are listed as attending a high school, though the progress report definition suggests that schools so identified would teach grades nine through twelve. The data appear to indicate that these are specialty schools (for the arts, etc.), so we chose to eliminate them from the data set. However, our results remain robust when these sampling restrictions are relaxed.

It is possible that a study like this one, which focuses on treatments that disproportionately affect students in low-performing schools, may be affected by regression to the mean. Schools and students at the bottom of the achievement distribution may have such low scores partly because of random error. If negative random error were more prevalent in these schools, improvements made on tests in later years could be an inflated measure of a child’s academic progress.[5] In their similar study in Florida, Rouse and others (2007) present a series of specification tests indicating that regression to the mean is not the driving force behind their results. Unfortunately, the timing of the beginning of wide-scale testing in New York City does not allow us to adopt similar tests here. Thus, it remains possible that our results are affected by regression to the mean.


Results

We first aggregate our data set in order to replicate the recent results reported by Rockoff and Turner (2008). The results of this test are reported in Table 4. Though not identical, the coefficient and standard-error estimates in Table 4 closely mirror those reported in Rockoff and Turner’s paper.

This replication suggests that the Rockoff and Turner paper’s finding that F- and D-graded schools made bigger improvements than higher-graded schools continues to hold. It also lends some confidence that the data utilized to estimate our models of primary interest using student-level data are accurate. This confirmation is particularly important in view of the fact that Rockoff and Turner (2008) rely on data that were reported at the school-aggregated level, while we utilize data aggregated from our individual-level data set.

Table 5 reports the results of estimation in math, overall and for each particular grade level. As in the aggregate data, the overall model that includes all grade levels continues to find a statistically significant and substantial positive effect after a school has received an F or a D grade. Here we find that students in F-graded schools made test-score improvements that were 3.5 scale points higher than those of students in C-graded schools. In standard-deviation terms, attending an F school has a positive one-year impact of about 0.18 standard deviations on math proficiency relative to students in schools that earned a C grade.
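As a rough conversion check using only the figures reported here, an effect of 3.5 scale points that corresponds to 0.18 standard deviations implies a standard deviation on the math scale of roughly 3.5 / 0.18, or about 19 scale points.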

The additional columns of Table 5 report results in math from regressions restricted to particular grade levels. We do find evidence that the result varies across grade levels. In particular, we find a strong positive impact for students attending an F- or D-graded school in the fifth grade. However, the results in other grades are statistically insignificant, and the coefficients of interest are negative in the fourth grade. The reasons for such different effects across grades are unclear. However, it is worth noting that the coefficient estimates on each of the cubic terms of the student’s lagged math score are similar across grades, though the small differences are statistically significant. This gives some confidence in our overall estimate (column 1) because it indicates that the lack of vertical alignment in test scores across grades is probably not having a large impact on estimation of the model.

Table 6 reports our results in reading, following a similar format. We find no significant difference between the reading performance of students in F- or D-graded schools and that of students in schools receiving better grades. The absence of a statistically significant difference holds both in the overall regression and in each of the grade-level regressions.


Conclusions

In this paper, we have evaluated the impact of schools earning particular grades under New York City’s progress report policy on student academic proficiency. The regression-discontinuity methodology, by taking advantage of the city’s continuous point system for assigning school grades, allows us to make causal interpretations of the impact of such school grades on student progress.

Our results can be construed as indicating a mixed-positive effect from receipt of an F or a D grade under the policy. We find that students in F-graded schools made significant and substantial improvements in math, though these results appear to be primarily the result of progress made by fifth-grade students. We find no evidence that a school’s grade has a significant impact on student proficiency in English.

Endnotes
  1. New York City Department of Education website, accessed September 22, 2008:
    http://schools.nyc.gov/Accountability/SchoolReports/ProgressReports/Consequences/default.htm.
  2. We had to begin with the 2006 school year because scale scores prior to 2006 are not comparable with those in the later years.
  3. For a more technical treatment of our procedure, see http://www.manhattan-institute.org/pdf/cr_55_tech_version.pdf.

  4. A cubic function simply means that we included a variable for the score, another variable for the score squared, and another for the score cubed. Use of the cubic function allows for a more flexible model because it relaxes the assumption of linearity in measuring the impact of school points on student proficiency. That is, only controlling for the student’s prior score makes the strong assumption that every point has the same impact on the student’s proficiency the next year. The cubic function allows us to account for any nonlinearities in this relationship. This same basic argument holds for our use of a cubic function for each of the components of the school’s overall points under the progress report system.
  5. Think of a child who took a test near a window and became distracted by a loudly barking dog. The child’s test score on the exam would be lower than his true proficiency due to the accident of his location. When he took the exam the next year, the child was not distracted and posted a score that better reflected his true proficiency. However, in the data set, it will appear that he made a larger proficiency gain than he truly did. Since F schools have students with relatively low scores, it is possible that a disproportionate number of students in these schools had scores that, for some reason, were lower than their true level. If such a result were due to random error, we would be worried about regression to the mean.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.

Carnoy, M. 2001. “Do School Vouchers Improve Student Performance?” American Prospect 12, no. 1:42–45.

Chakrabarti, R. 2005. “Do Public Schools Facing Vouchers Behave Strategically? Evidence from Florida.” Manuscript. Program on Education Policy and Governance.

Figlio, D. N., and H. F. Ladd. 2008. “School Accountability and Student Achievement.” In Handbook of Research in Education Finance and Policy, ed. Helen F. Ladd and Edward B. Fiske. New York: Routledge.

Figlio, D. N., and C. Rouse. 2005. “Do Accountability and Voucher Threats Improve Low-Performing Schools?” Journal of Public Economics 90: 239–55.

Greene, J. P. 2001. “An Evaluation of the Florida A-Plus Accountability and School Choice Program.” Manuscript. Manhattan Institute.

Greene, J. P., and M. A. Winters. 2004. “Competition Passes the Test.” Education Next 4, no. 3: 66–71.

Harris, D. 2001. “What Caused the Effects of the Florida A+ Program: Ratings or Vouchers?” In School Vouchers: Examining the Evidence, ed. Martin Carnoy. Economic Policy Institute.

Ladd, H. F. 2001. “School-Based Educational Accountability Systems: The Promise and the Pitfalls.” National Tax Journal 54, no. 2: 385–400.

New York City Department of Education. 2007. “Educator Guide: The New York City Progress Report.”

Rockoff, J. E., and L. J. Turner. 2008. “Short-Run Impacts of Accountability on School Quality.” Unpublished manuscript.

Rouse, C. E., J. Hannaway, D. Goldhaber, and D. N. Figlio. 2007. “Feeling the Florida Heat? How Low-Performing Schools Respond to Voucher and Accountability Pressure.” National Center for Analysis of Longitudinal Data in Education Research, Working Paper 13.

West, M. R., and P. E. Peterson. 2006. “The Efficacy of Choice Threats within School Accountability Systems: Results from Legislatively Induced Experiments.” Economic Journal 116, no. 510: C46–62.

Winters, M. A., J. P. Greene, and J. Trivitt. 2008. “Building on the Basics: The Impact of High-Stakes Testing on Student Proficiency in Low-Stakes Subjects.” Manuscript. Manhattan Institute.

 


 

