K-12 education has been convulsed for years by the idea that good teaching is a trait, a tacit justification for all the versions of the loony idea that we can increase learning by just finding the ‘bad teachers’ and firing them. The latter scheme looks even better if “finding” employs a bureaucratic, mechanistic process of testing students (on things that can be measured “objectively”–bye-bye art, music, creativity, and courage). The alternative idea is that people with widely varying intrinsic qualities, or starting points, can all learn to be better teachers. Both are obviously correct to some degree; at the time they get control of the chalk, some people have better “teacher traits” than others, and it must also be the case that practice, training, and coaching can improve anyone’s performance at this job, like all others. But the relative weight placed on trait and learning theories of effectiveness matters a lot.

Administrators and politicians love what I call immaculate corrections, schemes like student testing for teacher promotion, that excuse managers from all the heavy lifting of retail attention to what subordinates and customers are actually doing and why they do it. If you can couple impersonal performance assessment with a theory of motivation that puts greed (for a money raise) and fear (of dismissal) in play, and delegate the implementation labor to people who aren’t on your payroll and can’t defend themselves against having their time wasted (the students), it’s a hat trick. The only defect of a scheme like this is that it doesn’t deliver much value in the classroom (or wherever), but that’s a feeble weapon with which to confront an internally consistent and theoretically beautiful construct that lets managers out of doing a lot of real work.

Alison Gopnik’s WSJ column has more on the costs of using the trait model, retailing this recent paper [paywall]: people in academia who believe traits count for a lot seem to (i) gather in particular disciplines and (ii) have a lot of trouble engaging women and African-Americans as peers, presumably because they also wrap up familiar stereotypes about what kinds of people are (intrinsically) smart. Gopnik:

Professors of philosophy, music, economics and math thought that “innate talent” was more important than did their peers in molecular biology, neuroscience and psychology. And they found this relationship: The more that people in a field believed success was due to intrinsic ability, the fewer women and African-Americans made it in that field.

This should be sort of a bombshell, but it’s been a busy few weeks. We’ve known for a while that the student evaluations of teaching we use at Cal–to the near-exclusion of anything else–for promotion and tenure decisions don’t have much to do with student learning. Indeed, our administrative higher-ups are reflecting deeply on the fell implication that maybe we should (i) do more observation and coaching with an eye to actually improving teaching before review time, when it could actually be useful, and (ii) evaluate teaching for promotion in some way that actually indicates whether students are learning. Of course, both of these involve actual work, while SETs produce numbers (which must be Data, right?) and don’t cost us (faculty) anything to obtain, so it’s a tough call.

This call has got a lot tougher with the appearance of the first study known to me [HT: Philip Stark] in which students could register their evaluations without knowing the actual sex of the instructor, using an on-line course in which the same teacher presented as a male and as a female, and hooboy:

Students in the two groups that perceived their assistant instructor to be male rated their instructor significantly higher than did the students in the two groups that perceived their assistant instructor to be female, regardless of the actual gender of the assistant instructor….For example, when the actual male and female instructors posted grades after two days as a male, this was considered by students to be a 4.35 out of 5 level of promptness, but when the same two instructors posted grades at the same time as a female, it was considered to be a 3.55 out of 5 level of promptness.

Hard to imagine anything more traity than sex, mmm. There’s more (a colleague reminded me of this about a minute after this post went up; click on the link at the top of the story) and stuff like this anyway needs to be considered against the background of the crap women put up with every day, at work, at school, and on the street.

So the same teaching practices will get a woman significantly lower student evaluation scores than a man. Could this be true for minorities…how could it not? I think this study–assuming of course that contrary findings don’t emerge from similar experiments–is a beacon to personal injury lawyers and every woman prof (at least; stay tuned for the experiment in which Phyleesha and Felice are the same person) henceforth denied a raise or tenure through a process in which student evaluations counted. Not to mention an ambitious federal prosecutor with a copy of Title IX in his pocket. Now we’re not just talking about leaving student learning on the table, but consent agreements and actual money: I wonder if this will be enough to make us stop delegating teaching assessment to unpaid, inexpert conscripts. There’s lots of useful stuff to learn from student evaluations, but not for pay and hiring.

[Cross-posted at The Reality-Based Community]