this is such an egregious oversimplification that i can’t resist responding. there are two important things going on with value added models for evaluating teacher effectiveness: 1) the design of the model that comes up with a number/score for teacher effectiveness and 2) the policies about how that information is used and the consequences it has for teachers. there are serious ways to go wrong in both areas, but saying that the very idea of measuring this is inherently sucky is so reductive as to be meaningless.

first, it’s important to consider what the current alternatives are for evaluating teacher effectiveness. right now, a lot of it is done by sending principals into a teacher’s classroom, once or twice a year, to write up a half-assed evaluation on whatever they happened to see during the one class period they sat in on. in some areas, that’s the entirety of a teacher’s evaluation, and it produces ratings that just say you are either “effective” or “ineffective” - no way to differentiate between teachers who are utterly exceptional and those who are barely adequate. understandably, principals are often reluctant to categorize a teacher as “ineffective” based on that minimal interaction, so teachers who may actually be ineffective are unlikely to be categorized as such, and will go on teaching kids. (there’s also a pretty strong argument that a teacher might behave differently when the principal is sitting in the classroom than on a typical day of teaching…)

so, to start my grand analogy, this is like evaluating whether someone is a good and safe driver by having a police car follow them on a third of their commute, one day a year. it’s not very much information about how they drive, it’s probably influenced by their knowledge that the police car is there, and it results in really general conclusions that don’t allow a lot of precision.

so - people have turned to value added models to give us more information. to grossly oversimplify how a value added model works - it takes a lot of statistical information about all the kids in a class that could be relevant to predicting their academic performance and makes a guess at how we would expect those kids to perform on the standardized test at the end of the year. it then compares those predicted scores to their actual scores. if they’re a lot higher, we attribute that improvement to the teacher. if they’re a lot lower, we attribute that gap to the teacher.
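to make the mechanics concrete, here’s a minimal sketch in python of the predict-then-compare step. everything in it is invented for illustration - the covariates, the fake students, the hypothetical teacher effects, the plain linear regression - real models use richer data and more careful statistics, and this is not the model any particular district uses.

```python
# toy sketch of a value-added calculation. all data and effects here are
# fabricated for illustration; real models use many more covariates and
# more careful statistics than a plain linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_students, n_teachers = 500, 20

# made-up student characteristics that might predict test performance
prior_score = rng.normal(70, 10, size=n_students)
low_income = rng.integers(0, 2, size=n_students)
teacher_id = rng.integers(0, n_teachers, size=n_students)

# simulate "true" end-of-year scores, including a hidden teacher effect
true_teacher_effect = rng.normal(0, 3, size=n_teachers)
actual = (0.9 * prior_score - 3 * low_income
          + true_teacher_effect[teacher_id]
          + rng.normal(0, 5, size=n_students))

# step 1: predict each kid's score from their characteristics alone
X = np.column_stack([prior_score, low_income])
predicted = LinearRegression().fit(X, actual).predict(X)

# step 2: attribute the average gap between actual and predicted scores
# in each teacher's class to that teacher
residual = actual - predicted
value_added = {t: residual[teacher_id == t].mean() for t in range(n_teachers)}
```

in this toy, the recovered value_added numbers track the fabricated true_teacher_effect only roughly - with ~25 kids per class and noisy scores, the estimate for any single teacher jumps around - which is exactly the imprecision the next paragraph is about.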

it’s immediately obvious that there is a lot of imprecision in this method. first of all, predicting the academic performance of an individual child is fraught with difficulty and error. second, attributing the difference between predicted scores and actual scores to the individual teacher is also a pretty big assumption - there could have been a whole lot of other things going on that made a difference. this is why the better value-added measures look at trends over a number of years - averaging the difference in test scores for kids in a teacher’s class over 5 years, rather than relying on a single year - to get a better sense of the typical effect the teacher has. for more about how to construct a measure that gives a more valid sense of the effect of the teacher, i recommend this paper. [i don’t know all the details of the NYC teacher eval method critiqued by the linked post, but i suspect they use a pretty sloppy model that doesn’t average over multiple years, etc.]
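continuing the toy numbers above: if a single year’s estimate is the true effect plus noise, averaging over several years shrinks that noise by roughly the square root of the number of years, which is the intuition behind preferring multi-year averages to single-year scores. the specific numbers below are made up.

```python
# toy illustration (invented numbers): averaging over 5 years cuts the noise
# in a teacher's estimated effect by roughly sqrt(5)
import numpy as np

rng = np.random.default_rng(1)
true_effect = 2.0   # hypothetical teacher's true value added, in test-score points
noise_sd = 4.0      # hypothetical noise in any single year's estimate

one_year = true_effect + rng.normal(0, noise_sd, size=10_000)
five_year = true_effect + rng.normal(0, noise_sd, size=(10_000, 5)).mean(axis=1)

print(round(one_year.std(), 1))    # about 4.0
print(round(five_year.std(), 1))   # about 1.8, i.e. roughly 4.0 / sqrt(5)
```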

this can be seen in our analogy, too. say we only care about whether a driver is speeding. there are a lot of different ways to measure speed - we could record when a driver passed point A and point B and estimate their speed, we could measure their speed from an aircraft, we could position a cop with a radar gun to measure their speed, we could have a cop car drive beside them and check their speedometer. each of those methods has benefits and drawbacks, and we could argue about which would best capture the true speed of the driver, which is what we care about.

the second part of the controversy is how whatever information is gleaned from a value added measurement should be used in evaluating teachers. i can say unequivocally that the measures currently in use are usually pretty imprecise and should not be used to publish the names of teachers in the newspaper with scores of how good they are. that is stupid and wrong and is what is going to happen in NYC and it is a shame. i’m not aware of anyone arguing that these methods are precise enough to constitute the entirety of a teacher’s evaluation - most propose combining them with existing classroom evaluation and feedback from parents and teachers as part of an overall evaluation package.

i find it very confusing that we supposedly can’t use information as part of an overall evaluation unless it is totally precise, since the evaluations we have now are based on information that is also extremely imprecise. imagining that we can use these value added scores to divide teachers into 10 piles, from most to least effective, is silly and overestimates the precision of these measures. but the idea that we could figure out who consistently falls in the top half of effectiveness and who consistently falls in the bottom half is very reasonable, and that seems like important information that it would be silly for us to ignore.
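a quick simulation (with made-up numbers) shows why coarse splits are more trustworthy than deciles when the underlying scores are noisy: which decile a teacher lands in bounces around a lot between two independent measurements, while which half they land in is much more stable. the noise levels here are assumptions chosen purely for illustration.

```python
# toy illustration of why a top-half/bottom-half split is more stable than
# assigning teachers to 10 piles when the scores are noisy. invented numbers.
import numpy as np

rng = np.random.default_rng(2)
n_teachers = 1_000
true_effect = rng.normal(0, 1, size=n_teachers)

# two independent noisy measurements of the same teachers
# (think of them as two separate multi-year windows)
noise_sd = 1.0
obs_a = true_effect + rng.normal(0, noise_sd, size=n_teachers)
obs_b = true_effect + rng.normal(0, noise_sd, size=n_teachers)

def decile(x):
    # 0..9 based on where each score falls among the 10th-90th percentile cuts
    return np.searchsorted(np.percentile(x, np.arange(10, 100, 10)), x)

same_decile = (decile(obs_a) == decile(obs_b)).mean()
same_half = ((obs_a > np.median(obs_a)) == (obs_b > np.median(obs_b))).mean()

print(f"same decile both times: {same_decile:.0%}")  # well under half, ~20% here
print(f"same half both times:   {same_half:.0%}")    # roughly two-thirds here
```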

so it’s a challenge for school districts to make good decisions about how to use the information, given its lack of precision and inappropriateness for super fine-grained distinctions. whether and how it is tied to bonuses or salary increases or discipline or firing or provision of supportive resources to increase teaching effectiveness are tricky decisions that should be made in consultation with stakeholders including teachers, parents, and students. but ignoring the information entirely is just as silly and does a disservice to those stakeholders who want to ensure that teachers are effective.

(Source: azspot)