School Information System

September 11, 2011

Lies, damn lies and the myth of "standardized" tests

Marda Kirkwood:

[Note from Laurie Rogers: Recently, results from the 2011 state standardized test scores came out, and the general impression given to the public -- for example from the state education agency (OSPI) and from media in Seattle and in Spokane -- was that improvements had been made. It's all in the definitions: How do you define "improvement"? Did some of the numbers go up? Assuredly. Did that mean that real improvements in real academic knowledge had been made? It's best to remain skeptical.

Most students in Spokane are as weak in math skill this year as they were last year. Given a proper math test that assesses for basic skills, many high schoolers still test into 4th- or 5th-grade math. College remedial rates are still high. Parents are still frantic, and students are still stressed out about math. So ... what do those higher scores actually mean? I've been trying to find out. It's hard to say.

Posted by Jim Zellmer at September 11, 2011 4:44 AM
Comments

Marda Kirkwood is just another uninformed opinion-maker who is just as certain of her opinion as she is wrong on almost every count. If you want to know the truth about testing, you will only get dumber as you read her comments.

She defines a "real" standardized test in her absolutist opinion but she misses the mark on almost every criterion possible.

"Standardized" simply means consistent conditions, consistent interpretation, and consistent scoring. That is all.

No, it doesn't mean the test is scored by a machine (as she implies when she says that scoring by "fallible" humans makes it non-standardized); no, it doesn't mean that it is a timed test; no, it doesn't mean that it must be norm-referenced (as opposed to criterion-referenced).

In theory, at least, a norm-referenced test measures how much a student knows compared to other students, while a criterion-referenced test is supposed to measure how much a student knows compared to how much a test writer thinks they should know. Which do you think might be a more reliable measure of knowledge, in theory?

And, not surprisingly, she thinks that the bell curve holds some magic, like religion, without realizing that for both norm-referenced and criterion-referenced tests, the bell curve produced by the test is artificially created. That is, the writers of these tests also engage in magical thinking around the bell curve.

She says that a standardized test must reliably measure academic knowledge. But she doesn't get the inherent inconsistency: you cannot create a test that both forces a bell curve (with a forced mean and forced standard deviation) and reliably measures knowledge. Successful transmission of knowledge, to mean anything, must mean that we've basically eliminated the randomness of ignorance that a bell curve illustrates, by replacing it with common understanding of subject matter and skills; that is, the original bell curve must disappear.

You think otherwise? Would you take a drug whose treatment success rate fits the bell curve? Would you buy a car that had a useful life expectancy that fit the bell curve? How useful would a medical diagnostic test be if its accuracy fit the bell curve?

Then you have the real practical problem of measuring a student's academic knowledge. Even if there is no time limit on how long a student has to take a test, there is a limit to how many questions a student should be given. A math test of 10,000 questions can cover a lot more territory than a 100-question math test. But at the same pace of about 1.2 minutes per question, that is, say, 200 hours vs 2 hours answering questions.

A 100-question test that takes 2 hours to answer cannot measure everything a student is expected to learn in a given academic year. Most such ideas and topics will not be represented on any test, and for others, there might be one question.

Then there will necessarily be a whole set of questions required to force a bell curve -- and that will be most of the questions on the test. To force a bell curve, questions that too many students get right, or too many get wrong, will be left off the test, even if such a question is relevant to measuring how much a student knows. If learning/teaching is successful, you simply will never get a bell curve out of a 100-question test by allowing in questions that measure how much students know of the topics they have learned.
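To make that argument concrete, here is a rough Python sketch, not anything taken from an actual test publisher. It assumes each question is answered independently and is described only by its pass rate (the fraction of students who get it right). The 0.95 pass rate, the 0.3 to 0.7 "keep" band, and the function names are all hypothetical, chosen for illustration.

```python
import math

def score_summary(pass_rates):
    """Mean, standard deviation, and skewness of the total score,
    assuming every question is answered independently."""
    mean = sum(pass_rates)
    variance = sum(p * (1 - p) for p in pass_rates)
    # Third central moments of independent questions add up.
    third_moment = sum(p * (1 - p) * (1 - 2 * p) for p in pass_rates)
    sd = math.sqrt(variance)
    skew = third_moment / sd ** 3 if sd > 0 else 0.0
    return mean, sd, skew

def keep_discriminating(pass_rates, low=0.3, high=0.7):
    """Hypothetical item filter: drop questions that too many students
    get right or too many get wrong."""
    return [p for p in pass_rates if low <= p <= high]

# Successful teaching: 95% of students get each of 100 questions right.
mastered = [0.95] * 100
mean, sd, skew = score_summary(mastered)
print(f"mastered material: mean={mean:.1f}, sd={sd:.2f}, skew={skew:.2f}")

# The filter would throw every one of those questions away.
print("questions kept by the filter:", len(keep_discriminating(mastered)))

# Questions that only half the students get right give a symmetric spread.
coin_flip = [0.5] * 100
mean, sd, skew = score_summary(coin_flip)
print(f"50/50 questions:   mean={mean:.1f}, sd={sd:.2f}, skew={skew:.2f}")
```

Under these assumptions, the scores on well-learned material pile up near the top of the scale with a lopsided tail, and the only way to get back to a symmetric, centered curve is to swap in questions that roughly half the students miss.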

Other kinds of questions are placed on tests, which limit the coverage of the tests. These are questions that measure the internal reliability of the test itself, and others that determine the reliability of the student's knowledge. The latter questions exist to determine if the student's answers to some questions are a guess, so the same "question" is asked a different way.

All these magical, "faith-based" opinions about tests, what they measure and how they measure it, held without a modicum of knowledge, can never lead to anything approaching a successful, knowledgeable school system or population.

Posted by: Larry Winkler at September 11, 2011 11:47 AM