Bullshit Teacher Ratings

Efforts to improve American primary and secondary education concentrate a great deal on teacher quality, specifically on getting rid of teachers who aren’t very good.

A lot of this has to do with evaluating teachers using the magic of “value added” modeling, which basically means looking at students’ end-of-year test scores and comparing them to their scores in previous years, as well as to those of other students in the same grade. This is supposed to show what the teacher has “added.”
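To make that concrete, here’s a deliberately toy sketch of the idea in Python. A real value-added model is a regression with many statistical controls; the teachers, rosters, and scores below are invented purely for illustration.

```python
# Hypothetical, highly simplified value-added calculation. Real VAM systems
# fit regression models with many covariates; everything here is invented.

from statistics import mean

# (prior-year score, current-year score) for each student on a teacher's roster
grade_scores = {
    "Teacher A": [(62, 74), (70, 78), (55, 63)],
    "Teacher B": [(80, 82), (75, 74), (68, 71)],
}

# The average gain across every student in the grade serves as the baseline.
all_gains = [post - pre for roster in grade_scores.values() for pre, post in roster]
baseline_gain = mean(all_gains)

# A teacher's "value added" is how far their students' average gain sits
# above or below the grade-wide baseline.
for teacher, roster in grade_scores.items():
    teacher_gain = mean(post - pre for pre, post in roster)
    print(f"{teacher}: value added = {teacher_gain - baseline_gain:+.1f}")
```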

This certainly sounds like a good idea, but the evaluation, it turns out, doesn’t work that well.

According to an opinion piece at Al Jazeera:

New York educators are pushing back forcefully against the state’s controversial teacher evaluation system. This spring, the Teachers Association of the cities of Rochester and Syracuse filed a lawsuit against the state, arguing that the ratings metrics unfairly penalize teachers of disadvantaged students. Now Sheri G. Lederman, a lifelong teacher from Long Island, is challenging her “ineffective” rating as arbitrary and capricious, based on an ill-conceived and misapplied statistical model of teaching quality.

Junk Science

The biggest problem here seems to be that evaluation systems are inappropriately applied.

Because tests are given only in certain subjects to certain age groups, 70 percent of educators in Florida last year received… rankings based on students or subjects they didn’t even teach. New York’s system determines whether a teacher is highly effective, effective, developing or ineffective, using a triad of measures: 20 percent based on value-added modeling of students’ state test scores, 20 percent on district level assessments and 60 percent on an array of other measures, such as classroom observations. Lederman’s value-added classification dropped two rungs in just one year despite having student test scores that were consistently more than double the state average for meeting standards.
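For concreteness, here’s a minimal sketch of how a 20/20/60 composite like New York’s might be tallied. The component scores and the rating cutoffs below are invented, since the actual scoring bands aren’t described in the article; only the weights come from the quoted passage.

```python
# Hypothetical illustration of New York's 20/20/60 composite weighting.
# The cutoffs and sample scores are invented for the example.

def composite_rating(vam_score, district_score, other_score):
    """Combine the three components (each on a 0-100 scale) using the stated weights."""
    composite = 0.20 * vam_score + 0.20 * district_score + 0.60 * other_score
    if composite >= 85:
        return composite, "highly effective"
    if composite >= 70:
        return composite, "effective"
    if composite >= 50:
        return composite, "developing"
    return composite, "ineffective"

# A teacher with a poor VAM score but strong marks elsewhere:
score, label = composite_rating(vam_score=40, district_score=75, other_score=80)
print(f"composite = {score:.1f} -> {label}")
```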

This isn’t a matter of nitpicking legalistic wrangling to avoid punishment for bad behavior. Value-added assessments really are unfair to teachers, because they just don’t consistently measure the sorts of things we might want to know about our teachers.

VAM [value-added modeling]-style evaluations might work well for internal diagnostics, in painting broad-brush district comparisons or in pinpointing areas for teacher training. Yet the shoddiness of specific VAM forecasts raises serious doubts about their use in determining an individual teacher’s worth. A 2010 report commissioned by the U.S. Department of Education (DOE) found that the error rate for value-added scores can be as high as 35 percent when using only one year of data.

Really, 35 percent. As the article puts it, “a system that could rate 1 in 3 teachers incorrectly is one that essentially plays pin the tail on the donkey with their livelihoods.”

Having high standards for teaching and learning is a good thing. Basing teachers’ livelihoods on junk science means having capricious and punitive standards for education. And there’s no way American children will be helped by that.

Daniel Luzer

Daniel Luzer is the news editor at Governing Magazine and former web editor of the Washington Monthly. Find him on Twitter: @Daniel_Luzer