Updated: Aug 11, 2020
I had a question from someone looking for research regarding the bias in state test scores. I liked the response I wrote and included it below.
What I generally say isn’t that state test scores themselves are biased, but rather they are improperly used as a judgment tool and that results in biased judgments. As a predictive test, a state test is designed to order kids according to narrow underlying constructs (e.g., some aspect of reading and math) relative to their peers as of a particular day. That ordering will be from the students furthest below average to the students furthest above average. It's helpful to realize that the test items on this type of test don’t really result in an accumulated score, like on a quiz or a test you create to check on how much of what was taught was learned. Rather, each item helps the test maker figure out how far from average each student is.
What the results will show are patterns that exist as of that testing event, and if bias is an underlying cause of those patterns that will become readily apparent. And if we stopped there and did further analyses we would be using the tool as designed. (Of course, we could do that with just a sample of a few thousand students at each grade across a state and have more than enough for such an analysis, but that would be bad for the publishers.)
Where the mistake occurs, and this is in complete violation of what this type of testing can do, is in assigning blunt judgments to students and schools, as if the quality of the school were reflected in the test score. Doing that renders a judgment prior to an investigation of what might have caused that pattern. It may be that the school with high test scores is full of students who would score high regardless of the school they attended, and so a judgment of school success based solely on those scores is dead wrong. It may be that a school with low test scores is full of students whose lives are being saved by the teachers. If so, a judgment of failure would be dead wrong. Where causes that deserve judgment can be discovered, fine: judge them, good or bad. But until they are discovered, any judgments are invalid.
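To make that point concrete, here is a small sketch in Python. Everything in it is made up for illustration: the starting levels, the school contributions, and especially the assumption that an observed score is simply a starting level plus whatever the school added. Real scores are not built this way, but the toy version shows how a ranking by average score can run opposite to a ranking by what the school actually contributed.

```python
from statistics import mean

# Hypothetical data, invented purely for illustration.
# "starting_levels" stands in for everything students bring with them
# (including socioeconomic circumstances); "school_contribution" stands in
# for what the school itself added. Neither is observable from a test score.
schools = {
    "A": {"starting_levels": [70, 75, 80, 85], "school_contribution": 2},
    "B": {"starting_levels": [40, 45, 50, 55], "school_contribution": 12},
}

def observed_scores(school):
    """Assumed toy model: observed score = starting level + school contribution."""
    return [level + school["school_contribution"] for level in school["starting_levels"]]

# What the state report would show: the average observed score per school.
average_score = {name: mean(observed_scores(s)) for name, s in schools.items()}

# Which school looks best on the report, and which one actually added the most.
top_by_score = max(average_score, key=average_score.get)
top_by_contribution = max(schools, key=lambda name: schools[name]["school_contribution"])

print(average_score)
print("Highest average score:", top_by_score)
print("Largest school contribution:", top_by_contribution)
```

In this invented case the school with the higher average is the one that added less. The score alone cannot tell the two stories apart, which is why the investigation has to come before the judgment.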
Using a state test score as the basis for invalid judgments creates judgments that are biased, because one cause for where a student is on a particular day in reading or math concerns socioeconomics. While seeing that pattern is a good thing, you can’t logically leap to an inference that says, “low scores signal failure on the part of the school and high scores signal success,” because you never bothered to look at causes, which are the only thing you can reasonably judge. But because the invalid judgments confirm a bias that exists broadly in society (that is, schools in poor communities are bad and schools in wealthy communities are good), those judgments are presumed, quite wrongly, to be valid.
The worst thing about all of this is that it perpetuates the bias. The one good thing I can say about state testing is that it gives us a real chance to see the bias and inequities in our society, which should lead us to work towards broader societal solutions. That's what should happen once we see the patterns and then perform our investigations. We perhaps will find things in the school that should be judged, but we would undoubtedly find that judgments regarding our history and the cause of the inequities are valid and useful, and most certainly worthy of a solution.
But the way the results are used allows the whole country to basically avoid the difficulty of solving real problems. It is much easier to judge the schools than to judge society, and since those who will be subject to the negative judgments are those with the least amount of power, those judgments are unlikely to have negative political consequences. Thus we act pleased with a system that solves nothing, helps no one, and hurts those who can least afford to be hurt further.