Original publication 'Toetsvragen schrijven' 1983 Utrecht: Het Spectrum, Aula 809, Onderwijskundige Reeks voor het Hoger Onderwijs ISBN 90-274-6674-0. The 2006 text is a revised text.

Item writing

Techniques for the design of items for teacher-made tests

1. Introduction


Ben Wilbrink

Contents examples pages

1   Introduction

1.1   Item design: art or skill?
1.2   First principles
1.3   Summary of content
1.4   An historical perspective

1.5   Literature

2   Item types, transparency, item forms and levels of abstraction

2.1   Open-ended questions
2.2   Multiple-choice (MC) questions
2.3   Essay type questions
2.4   Transparency
2.5   Item forms for many uses
2.6   Valid questioning
2.7   An historical perspective

2.8   Literature

3   Course content inventory

3.1   (Indirect) observable terms
3.2   Abstract terms and constructs
3.3   Theoretical terms
3.4   Relational networks of terms
3.5   Variants of 'definitions'
3.6   Literature

4   Questions on individual terms

4.1   Translation
4.2   Definition
4.3   Providing examples
4.4   Recognizing/naming examples
4.5   Recognizing/naming formal terms
4.6   Descriptions
4.7   Literature

5   Questions on relations between terms

5.1   Translating and picturing
5.2   Discriminating
5.3   Classifying
5.4   Algorithms, routines
5.5   Lawful relations
5.6   An historical perspective
5.7   Literature

6   Questions of text

6.1   Participation control
6.2   Themes and headlights
6.3   Analysis
6.4   Inference
6.5   Composition

6.6   Literature

7   Posing problems

7.1   About problems
7.2   Taking inventory
7.3   Heuristics
7.4   Literature

8   Quality control

8.1   Rules in examining
8.2   Check these points
8.3   Independent assessment of item quality
8.4   Checklists
8.5   An historical perspective
8.6   Literature

... an experience, a very humble experience, is capable of generating and carrying any amount of theory (or intellectual content), but a theory apart from an experience cannot be definitely grasped even as a theory.

John Dewey, in: Democracy and education.

The history of human progress is the story of the transformation of acts which, like the interactions of inanimate things, take place unknowingly to actions qualified by understanding of what they are about; from actions controlled by external conditions to actions having guidance through their intent: - their insight into their own consequences. Instruction, information, knowledge, is the only way in which this property of intelligence comes to qualify acts originally blind. (Quest for certainty, 1929, p. 245)

John Dewey citing himself in his (1939, p. 521).
Here is the point: once you have learned how to ask questions - relevant and appropriate and substantial questions - you have learned how to learn and no one can keep you from learning whatever you want or need to know.

Neil Postman and Charles Weingartner (1969, p. 34).

How many parts of speech are there? What?
What is a noun? How many attributes has a noun? What?
Give the inflection of the active verb.
James Bowen's (vol. 1 p. 211) transcription is from the Ars minor on teaching Grammar, written by Donatus. This book "was organized into a series of questions and answers designed to teach the fundamentals of Latin grammar by rote."
James Bowen (1972). A history of Western education. Volume one. The ancient world: Orient and Mediterranean. Methuen.
And the West? The 'Donaet' was still in use in Western European schools a thousand years later.

Item writing is not a science or a science-based technology. Yet the European West does have more than a thousand years of experience in asking questions and assessing the answers given.
The kind of question-and-answer instruction above may seem amazing, because it is dead serious in demanding that answers be literally correct. Nevertheless, something uncannily resembling this kind of literal reproduction was still rife in my own Latin school days.

"This study examines and evaluates two sources of evidence bearing on the validity of 31 MC item-writing guidelines intended for teachers and others who write test items to measure student learning. These two sources are measurement textbooks and research."

Thomas Haladyna, Steven M. Downing, and Michael C. Rodriguez (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15, 309-334. http://depts.washington.edu/currmang/Toolsforteaching/MCItemWritingGuidelinesJAME.pdf [Dead link? May 1, 2009]

Do not be surprised to find the above kind of statement as late as 2002: it expresses the lack of scientifically informed item writing techniques.

1 Introduction

Figure 1. Scheme of everything. For software to make something like this, see http://cmap.ihmc.us/

1.1 Item design: art or skill?

Item writing is essentially creative - it is an art. Just as there can be no set of formulas for producing a good story or a good painting, so there can be no set of rules that guarantees the production of good items. (Wesman, 1971 p. 81)

Every test begins with an idea in the mind of the item writer. The production and selection of ideas upon which test items may be based is one of the most difficult problems confronting him. (...)
There is no automatic process for the production of item ideas. They must be invented or discovered, and in these processes chance thoughts and inspirations are very important. (Wesman, 1971 p. 86)

"From the outset of this book, it has been emphasized that constructing test items is a complex task, requiring both technical skill and creativity." The book is about the technicalities, indeed. "Creativity, however, is an element of item construction that can only be identified; it cannot be explained. Item writers, as individuals, will bring their own sense of art to the task."
[Osterlind, 1997, p. 308]

The Wesman and Osterlind view - which is rather the general view in the field - flies in the face of everything that is even remotely construct valid. It posits rather shamelessly that students do not have a fair chance to adequately prepare themselves for tests written in this artful way. How could they? Should they be artists also? Here is the most important motivation to write this book on item design: to offer an alternative to the artists in the field.

Writing plausible distractors comes from hard work and is the most difficult part of MC item writing.

Thomas Haladyna, 1999 p. 97

The item writer who is serious about the content representativeness [content validity] of the questions does not begin by jotting down questions as they occur to him, nor by soberly devising one question per page of text, but by translating each of his instructional objectives into task descriptions. Usually the best result is obtained by subdividing global objectives rather finely. Within each of those subdivisions it is not sufficient merely to list the topics; for each topic it must be indicated which form of mastery of that topic the instruction aims at. (...)
It must be clear whether the student will be dealing with technical terms, with situations described in everyday terms, with pictures of situations, or with concrete things. Response formats matter as well. All too often the multiple-choice format is taken for granted. If the item writer reflects open-mindedly on his objectives, he will often decide that the task calls for answers constructed by the student: written answers to completion questions, or oral answers in order to keep hindrances to a minimum.

Lee J. Cronbach in Thorndike (1971, p. 458)

1.2 First principles

Atkin, Black and Coffey (2001). Classroom Assessment and the National Science Education Standards [available on NAP]

do students realize the goals? not: do they differ from each other?

"The central point is that, to be effective, feedback should cause thinking to take place. Implementation of such practice can change the attitudes of both teachers and pupils to written work: the assessment of pupils’ work will be seen less as a competitive and summative judgement and more as a distinctive step in the process of learning."

Paul Black (2004). Raising standards through formative assessment. In Carol Adams and Kathy Baker (Eds). Perspectives on pupil assessment. The GTC conference New Relationships: Teaching, Learning and Accountability London, 29 November 2004.

Paul Black is an advocate of assessment for learning, contrasting it with assessment of learning. Help the individual pupil to reach the goals, give her meaningful feedback. Reporting grades is not meaningful feedback, and will definitely damage the motivation of many pupils.

Achievement test questions tend to ask for definite answers. This need not be a problem in any given test, but it definitely is problematic if achievement testing forces complex and probabilistic phenomena into the format of questions having clear-cut answers. Fischbein (1975) presents research showing just one such effect. The age-of-the-captain problem is a straightforward illustration of what is meant here (see the examples for chapter 5).

feed-forward, backwash.
heuristics must be generally applicable.
statistical analysis
ultimate goals

1.3 Summary of content

1.4 An historical perspective

Famous 16th-century humanist course: no exams

(IV p. 507) " ..... there never existed a roll of attending students: the lectures were quite free, and, although highly formative, they did not lead to any test, nor to any degree: they were just intended to turn those who cared to avail themselves of them, into well-equipped searchers and ripe scholars."
(II p. 122) ... the nature of the Institute, which was meant to be free and generous in the dispensing of choice linguistic knowledge: whoever wished, could avail himself of what was offered in genuine benevolence. Its result did not stay out: a thorough acquaintance with the languages and literatures of Rome and Greece soon spread, and, although more slowly, yet more surely even showed its influence on intellectual activity itself: it reduced all study and research to reality and objectivity, breaking off with tradition and senseless repeating. That new impulse, which was as the very spirit of Busleyden's Institute, was exhibited in the forming of the students: instead of a series of automatic beings, drilled after the same pattern, it created a race of searchers, of able workers, longing for a new, for a better and intenser activity. That spirit of the teaching was at once recognized by youth, that age of generous attempts, dreaming of progress and ideal.

Henry de Vocht (1951-1955). History of the foundation and the rise of the Collegium Trilingue Lovaniense 1517-1550. 4 volumes. Louvain: Bibliothèque de l’Université, Bureau de Recueil.

If there is one place and time for the beginning of 16th-century Humanism, it is the College of the Three Languages (Latin, Greek and Hebrew), newly founded in 1517 at Leuven University. The one person, Erasmus, was closely involved. Many students from all over Europe flocked in to attend its lectures, and spread the new attitude to reading, writing and learning across the continent.

Explicit - formal - assessment of learning is not at all self-evident. The case of the Collegium Trilingue Lovaniense is merely a shining example of the kind of course that is an end in itself, instead of one that finds its end in an exam. Back to today's worries: do not make the mistake of organizing end-of-course tests for skills labs, or for other courses where pupils primarily do things or otherwise gain new experiences. It will still be necessary to design questions and problems that stimulate pupils, but these will not necessarily be used in exit tests.

1.5 Literature

John Dewey. Democracy and education.

John Dewey (1939). Experience, knowledge and value, in Paul Arthur Schilpp and Lewis Edwin Hahn: The philosophy of John Dewey. Open Court, third edition, 1989.

Wesman (1971) 'Writing the test item' in Thorndike Educational Measurement (p. 81-129).

Neil Postman and Charles Weingartner (1969) Teaching as a subversive activity. Penguin Education Specials.

more literature

See also the English literature mentioned in the Dutch chapter.

Isaac I. Bejar (2005). Toward a science of assessment. In Wayne J. Camara and Ernest W. Kimmel (Eds) (2005). Choosing students. Higher education admissions tools for the 21st century. London: Lawrence Erlbaum Associates.

Ginette Delandshere (2002). Assessment as inquiry. Teachers College Record, 104, 1461-1484.

Howard T. Everson (1995). Modeling the student in intelligent tutoring systems: The promise of a new psychometrics. Instructional Science, 23, 433-452.

E. Fischbein (1975). The intuitive sources of probabilistic thinking in children. Dordrecht: Reidel.

Adriaan D. de Groot (1946/1978). Thought and choice in chess. Den Haag: Mouton, 1978.

Thomas M. Haladyna (1999). Developing and validating multiple-choice test items. Second edition. Erlbaum. (Third edition: 2004.)

Stephen Klassen (2006). Contextual assessment in science education: Background, issues, and policy. Science Education, 1-32.

Robert L. Linn (Ed.) (1989). Educational measurement. National Council on Measurement in Education, and American Council on Education. Third edition (the second edition is Thorndike, 1971).

Robert J. Mislevy (1994). Test theory reconceived. National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Steven J. Osterlind (1997). Constructing test items: Multiple-Choice, Constructed-Response, Performance, and Other Formats. Kluwer.

James W. Pellegrino, Naomi Chudowsky and Robert Glaser (Eds) (2001). Knowing what Students Know. The Science and Design of Educational Assessment. Board on Testing and Assessment / Center for Education / Division of Behavioral and Social Sciences and Education / National Research Council: Committee on the Foundations of Assessment. Washington, DC: National Academy Press. [available for reading on NAP]

Lorrie A. Shepard (2000). The role of classroom assessment in teaching and learning. CSE Technical Report 517 http://www.cse.ucla.edu/Reports/TECH517.pdf [Dead link? May 1, 2009] Published in V. Richardson (Ed.) (2001), Handbook of research on teaching (4th ed). Washington, DC: American Educational Research Association.

Robert L. Thorndike (ed.) (1971). Educational measurement. Washington, DC: American Council on Education.

Stephen Toulmin (1958). The uses of argument. Cambridge University Press.

A. G. Wesman (1971). Writing the test item. In Robert L. Thorndike (ed.) (1971). Educational measurement. Washington, DC: American Council on Education.

Institute of Educational Assessors site

Office for Standards in Education (Ofsted) (2003). Good assessment in secondary schools.

