Validity: An Exploration

Designing an Alternative Matrix

Testing makes great use of statistical analyses. For example: do students who get high total scores tend to get a particular item right, and conversely, do students who rank at the bottom on total score tend to get that item wrong? We would logically expect an answer of 'yes' to both questions. If the opposite is true (high-scoring students get it wrong, and low-scoring students get it right), then the item is not really an item at all; it is a kind of "anti-item". Contact between an item and an anti-item may cause the entire test to explode. (Well, maybe not. But it does make for a bad test.)
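One common way to put a number on this question is the point-biserial correlation between a right/wrong item and the total score. The sketch below is only an illustration under stated assumptions: the response matrix is invented, the function names `point_biserial` and `flag_anti_items` are hypothetical, and the total score includes the item itself (an uncorrected item-total correlation). A positive value means high scorers tend to get the item right; a negative value marks an "anti-item".

```python
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores.

    r = (M_right - M_wrong) / SD_total * sqrt(p * (1 - p)),
    where p is the proportion of students answering the item correctly
    and SD_total is the population standard deviation of total scores.
    """
    if len(item_scores) != len(total_scores) or not item_scores:
        raise ValueError("need one item score per total score")
    p = mean(item_scores)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        raise ValueError("undefined: no variation in item or total scores")
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


def flag_anti_items(responses):
    """Return indices of items whose discrimination is negative.

    `responses` is a list of students, each a list of 0/1 item scores.
    """
    totals = [sum(student) for student in responses]
    n_items = len(responses[0])
    flagged = []
    for j in range(n_items):
        item = [student[j] for student in responses]
        if point_biserial(item, totals) < 0:
            flagged.append(j)
    return flagged


# Invented data: six students, four items. The last item is answered
# correctly mostly by the low scorers, so it behaves like an anti-item.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

totals = [sum(student) for student in responses]
r_first = point_biserial([s[0] for s in responses], totals)
r_last = point_biserial([s[3] for s in responses], totals)
anti_items = flag_anti_items(responses)
```

With this invented data the first item discriminates positively and only the last item is flagged. In practice, analysts often correlate each item with the total of the remaining items instead, so that the item does not inflate its own statistic.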

How did this get started? Early work on the statistics of testing was done by Cattell, Galton, and particularly Binet. Then World War One happened. Early psychologists – especially in the USA – felt a calling to contribute to the war effort, or perhaps they also saw the war as an opportunity to try out various ideas in the then-nascent field of modern experimental psychology.

Robert Yerkes and his colleagues proposed to the U.S. government a massive testing system for newly enlisted soldiers destined for the European war. This project became known as the Army Alpha testing. It used a modified form of Binet's test to measure the putative intelligence of 1.7 million recruits. The idea was to place them in various military service positions – at least in part – based upon their test scores. By the time that the Alpha testing program was up and running, the war was over.

Whether by intent, accident, or both, Yerkes and his colleagues went on to establish a far more powerful and far-reaching test industry: modern normative psychometrics on a large scale. The statistical techniques they used formed the basis of all modern test development.

We have a short thought-experiment about this, but before we get to that, we need to cover a bit more history: how World War One started. Archduke Franz Ferdinand was assassinated, which set in motion various alliances, most notably that of Austria-Hungary with Germany, and pre-set invasion contingencies (like the Schlieffen Plan). Much of Europe was on a hair-trigger, formed of long-standing treaties and the outcome of previous wars.

The assassination was itself the proximate cause of the Great War. There are various accounts of that event, but it is clear that to some extent it was a matter of chance. The Archduke's driver got lost and, while finding his way, happened upon one of the conspirators, Gavrilo Princip, who then fired the fatal shots.

  • Here is our thought experiment: What if Princip had missed his aim? The Archduke would have survived. (Perhaps) the hair-trigger would not have been sprung, and (perhaps) World War One would not have happened. Yerkes and colleagues would not have developed Army Alpha – and therefore, the world would not have seen that the small-scale work of Binet was replicable on a vaster social scale. Even post-War controversies that depend on testing would not have happened, such as the IQ testing controversy and the use of testing in eugenics. (And a book on language testing in 2006 would not be able to crack a joke about anti-items contacting items and causing a test to explode.)

Copyright © 2006 Taylor & Francis Group, an informa business