What To Think About the New York City Teacher Value-Added Scores

The highly controversial New York City teacher value-added scores released last week are being presented by the New York Times with substantial margins of error. And in the end, understanding and reacting to margins of error is the essential challenge of teacher evaluation. Teaching, learning, and the interaction between them are incredibly complicated. As such, there’s no way to measure teacher effectiveness with 100 percent accuracy. It’s not a question of whether effectiveness measures have margins of error, only how big they are and what to do about them.

It’s worth noting that this is also true for traditional measures of teacher quality. Sometimes there’s confusion on this point. After all, there’s no margin of error in counting the number of years a teacher has been employed, or in the dichotomous variable of having a master’s degree, is there? But that’s like saying there’s no margin of error in the number of questions a student answered correctly on a multiple-choice test, or in the dichotomous variable of whether a principal checked the “satisfactory” box on a one-page teacher evaluation. It’s technically accurate, but beside the point. A single test carries a lot of potential error as a representation of the thing we actually care about: the totality of student learning in a given domain. In the same way, the presence or absence of a master’s degree is a very poor, error-ridden method of determining what we actually care about: whether a teacher is more effective in helping students learn.

So while it’s perfectly reasonable to be concerned that the Times is publishing estimates of individual teacher effectiveness that may be wrong (and that, given the scale involved, some of the estimates are, statistically speaking, almost surely wrong), let’s not forget that we already do the same thing when we make available teachers’ levels of experience, educational credentials, and tenure status and pretend that these are reasonably accurate proxies for their effectiveness and value in the job market. They’re not.
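The claim that, at this scale, some estimates are almost surely wrong is simple arithmetic. The sketch below assumes, purely for illustration, a hypothetical 10,000 published estimates, each reported with a 95 percent confidence interval, and treats their errors as independent. None of these numbers come from the actual release, and real value-added errors may well be correlated.

```python
# Illustrative arithmetic only: the number of estimates and the confidence
# level below are hypothetical assumptions, not figures from the NYC release.


def expected_misses(num_estimates: int, coverage: float) -> float:
    """Expected number of intervals that miss the true value, assuming each
    interval independently covers the truth with probability `coverage`."""
    return num_estimates * (1.0 - coverage)


def prob_at_least_one_miss(num_estimates: int, coverage: float) -> float:
    """Probability that at least one interval misses, under the same
    independence assumption."""
    return 1.0 - coverage ** num_estimates


if __name__ == "__main__":
    n = 10_000     # hypothetical number of published teacher estimates
    level = 0.95   # hypothetical confidence level of each interval
    print(f"expected misses: {expected_misses(n, level):.0f}")
    print(f"P(at least one miss): {prob_at_least_one_miss(n, level):.6f}")
```

Under these assumptions, roughly one estimate in twenty misses, and the chance that every single one is right is effectively zero.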

The challenge, then, is to identify different ways of assessing the things we care about when it comes to teacher quality; to understand the strengths and weaknesses of those measures; to explore how they might be combined; to account for their cost in time and money; and to make decisions with the information they yield in a way that appropriately accounts for the risks and uncertainties inherent to margins of error. It would be crazy, for example, to fire a teacher based on a single value-added score in one subject and grade, which is why, of course, the New York City Department of Education hasn’t tried to do anything of the kind.

It’s also important to understand that the proper way to manage the risks inherent to imperfect measurement depends on the particular circumstances of public K-12 education. Aaron Pallas explores this issue in a recent post titled “Reasonable Doubt”:

For the employers, it’s all about efficiency. It’s in the public interest, they argue, to recruit, retain and reward the best teachers, in order to maximize the collective achievement of students. A teacher-evaluation system that fails to identify those teachers who are effective, and those who are ineffective, can neither weed out consistent low-performers nor target those who might best benefit from intensive help….For teachers, the key concern is fairness. Fairness is primarily a procedural issue: Teachers, and the unions that represent them, seek an evaluation process that is neither arbitrary nor capricious, relying on stable and valid criteria that they believe accurately characterize the quality of their work…The values of efficiency and fairness collide head-on in [New York State’s proposed new teacher evaluation system, which includes value-added among other measures]…

Pallas then locates this issue in a historical debate:

William Blackstone, an 18th-century English legal scholar, wrote “better that ten guilty persons escape than that one innocent suffer.” Benjamin Franklin, one of the founders of our country, later upped the ante to 100 to one. The principle captures squarely the trade-off between the value of efficiency and the value of fairness…It’s important to note that Blackstone and Franklin were concerned with the workings of government; fairness in the private sector was not a central concern, and efficiency was taken for granted as a consequence of market forces. Civil servants, as agents and employees of the state, arguably are subject to a different set of rights and responsibilities than those working in the private sector, and teachers are one of the largest groups of such public servants. What’s an acceptable tradeoff between efficiency and fairness in the mix of teachers’ rights and responsibilities? It’s a lot easier to speculate about percentages in the abstract than to confront the possibility that you, or someone close to you, might be out of a job because of an untested teacher-evaluation system that cuts corners on fairness.

Two points here. First, the efficiency/fairness tradeoff isn’t a zero-sum game: more accurate information can improve it. The invention of DNA fingerprinting, for example, improved the accuracy of criminal prosecution. The more certain we can be about guilt and innocence, the fewer guilty criminals we have to let loose upon society in order to keep the number of incarcerated innocent people at a morally bearable level. So, too, with more accurate methods of teacher evaluation.
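One way to see why better information improves the tradeoff is a textbook signal-detection toy model, swapped in here for illustration rather than taken from Pallas or from this post. It assumes evidence scores are normally distributed, with mean 0 for the innocent and mean `separation` for the guilty, and that the conviction threshold is set so a fixed, hypothetical 1 percent of innocent people are convicted. A more accurate method, like DNA evidence, corresponds to a larger separation.

```python
from statistics import NormalDist

# Toy model (an assumption for illustration): evidence scores are N(0, 1) for
# the innocent and N(separation, 1) for the guilty.
INNOCENT_SCORES = NormalDist(0.0, 1.0)


def guilty_escape_rate(separation: float, innocent_conviction_rate: float) -> float:
    """Share of guilty people acquitted when the threshold is set so that
    exactly `innocent_conviction_rate` of innocent people are convicted."""
    # Only the top `innocent_conviction_rate` of innocent scores exceed this.
    threshold = INNOCENT_SCORES.inv_cdf(1.0 - innocent_conviction_rate)
    # Guilty people whose scores fall below the threshold go free.
    return NormalDist(separation, 1.0).cdf(threshold)


if __name__ == "__main__":
    # Hypothetical "morally bearable" level: 1% of innocents convicted.
    for d in (1.0, 2.0, 3.0):
        rate = guilty_escape_rate(d, 0.01)
        print(f"separation {d}: {rate:.1%} of the guilty escape")
```

Holding the share of wrongly convicted innocents fixed, the share of guilty people who go free drops as separation grows. Read “convict” as “label a teacher ineffective” and the same logic applies to teacher evaluation measures.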

Second: Pallas, I think it’s fair to say, believes that fairness (so defined) should be given more weight in the case of public-sector teachers than for private-sector employees. But does that really make sense? If you own a cardboard box factory, you’ll want to produce boxes as efficiently as possible, but you’ll also have to work out some set of labor arrangements with workers who have an interest in fairness, e.g., not being fired for no good reason. That’s why we have labor unions and collective bargaining. But however the deal is struck, whether it unduly favors management or labor or finds the perfect balance between them, the consequences are limited to management and labor. If the deal is struck so badly that the company ultimately fails, customers can always buy their cardboard boxes from someone else.

Public schools aren’t like that. Children can’t choose to be born, can’t choose where to live, and they can’t choose whether to live in a world that requires an education in order to lead a decent life. They are legally required to attend school. They have an enormous interest in the quality of their education, whether they know it or not. They can’t just buy a better teacher from someone else, and even the most ambitious school choice plans aren’t going to take all the friction out of that dilemma. So let’s not forget that for every teacher at risk of being publicly identified with the wrong value-added rankings, there is a group of students at risk, too.

Finally, let there be no illusions that this, too, will pass. The test-based evaluation genie can’t be stuffed back in the bottle. Society will always have an interest in evaluating student learning in some kind of comparable way. Once that information exists, it will be legally available to the public. And the cost of converting it into teacher effectiveness measures is trivial in the grand scheme of things and getting cheaper all the time. I have no doubt that fill-in-the-bubble tests will become obsolete, probably sooner than people think. But whatever replaces them will be digital, analyzable, and usable for teacher evaluation. So get used to thinking about margins of error. They’re with us for the long haul.

Kevin Carey

Kevin Carey directs the Education Policy Program at New America.