Original publication 'Toetsvragen schrijven' 1983 Utrecht: Het Spectrum, Aula 809, Onderwijskundige Reeks voor het Hoger Onderwijs ISBN 90-274-6674-0. The 2006 text is a revised text. The integral 1983 text is available at www.benwilbrink.nl/publicaties/83ToetsvragenAula.pdf.

Item writing

Techniques for the design of items for teacher-made tests

4. Test items on individual terms


Ben Wilbrink

This database of examples has yet to be constructed. Suggestions? Mail me.

The most basic concepts are the words of one's mother language. Paul Bloom (2000) has written down everything he knows about how children learn the meanings of words. Therefore his text contains many examples of how the correctness of meanings is tested in daily discourse and activities.

Word learning, and especially the learning of names for things, certainly seems like a simple process, at least to scholars who are not directly engaged in its study. [Bloom, p. 3]

As an item writer you are directly engaged in the intricacies of learning the meanings of words, for you will be asking for those meanings. Be assured you will time and again naively assume the learning of new words, terms or concepts to be a straightforward and simple thing. Try to remember that nothing is simple here, and that items testing for meaning need not be easy at all. Bloom uses an example from Quine (1960) to drive the point home: a linguist visiting a new tribe might note that a rabbit rushing by seems to be called gavagai. Can she be sure that gavagai is what we mean by rabbit? Quine: she can never be sure; there is an infinity of other possible meanings. This state of affairs has many implications, one of them being that it is not possible to design a test item that will establish beyond doubt that the test taker is using the term gavagai the way we use rabbit, or the concept of number the way the textbook intends it. It is possible to test for differences in meaning, though, in just the same way one may try to falsify hypotheses. In a physics course, for example, there is more to testing for correct understanding of the term force than directly asking for it: some naive misunderstanding might still exist, and had better be tested for directly.

Disciplinary languages are not particularly different from one's mother language. In order to function effectively as a baker, broker, lawyer or physicist, one must thoroughly master the meaning of the technical terms of one's trade or profession. An important part of education, therefore, is about the terms of the trade, or of the different subjects in the curriculum.

4.1 Translation


Figure 1. Translation (implicit context).

A schema on transfer, from N. Sanjay Rebello and Dean A. Zollman (2004?), A Model for Dynamic Transfer of Learning. pdf

Transfer is connected to the translation category, in the sense of discovering that the situation at hand is a translation of a known situation, or can be translated into a known situation, and so on. Transfer is a hot issue in educational research; see Mestre (2005).

Figure 4.1 "Assessment items that focus on a conceptual understanding of fractions." [Schoenfeld, 2006, p. 16]

As noted earlier, traditional assessments look for students' ability to compute with fractions (e.g., Find 2/5 + 3/4). More recent assessments, aligned with the broad characterizations of mathematical proficiency just listed, ask for more. See Figure 1 for examples of assessment tasks that have been used to judge various aspects of students' knowledge of fractions.
Tasks 1 and 2 check for students' conceptual understanding. They ask students to work with different ways of representing fractions, and they check to see whether students understand that a given fraction of the whole must be the same size in every occurrence. (Many students will answer '1/4' for Task 1 and '2/6' for Task 2.) Task 3 probes students' understandings of proportional and inverse relationships (when the denominator increases, the value of the fraction decreases) as well as their ability to explain what they understand. [Schoenfeld, 2006, p. 16 pdf]
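The inverse relationship probed by Task 3 - as the denominator increases, the value of the fraction decreases - can be checked mechanically with Python's fractions module. A minimal sketch; the specific fractions below are my own illustration, except for the 2/5 + 3/4 item quoted above:

```python
from fractions import Fraction

# Task 3's inverse relationship: for a fixed numerator, a larger
# denominator means a smaller fraction.
fourth = Fraction(1, 4)
fifth = Fraction(1, 5)
assert fifth < fourth  # 1/5 is smaller than 1/4

# The traditional computational item quoted from Schoenfeld: 2/5 + 3/4.
total = Fraction(2, 5) + Fraction(3, 4)
print(total)  # → 23/20
```

Exact rational arithmetic makes the point without rounding noise, which is why Fraction is used here rather than floats.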


Figure 2. Context or not, and if so, which one?

Express in [set-theoretic] symbols:

  1. b is an element of C
  2. C is a proper subset of D
  3. the union of A and C
  4. the set which consists of the elements d, e and g
  5. d is not an element of the intersection of A and B
  6. the complement of A is a proper subset of the union of B and C

Allwood et al. (1977), p. 13
The answers are, respectively: b ∈ C; C ⊂ D; A ∪ C; {d, e, g}; d ∉ A ∩ B; A′ ⊂ B ∪ C (with A′ the complement of A)
[HTML code for logical operators: here]

The boxed questions are most elementary stuff, items 5 and 6 being of a certain complexity, though. There is no point in doing endless exercises at this level, because somewhat more complex texts and problems will routinely contain symbolic expressions like these. In summative testing, then, the skill in handling symbols is not tested expressly, and should probably not be assessed incidentally either, on the basis of small errors made.
Manipulating these abstract sets and their elements is one thing; translating real-world events into set-theoretic relations might be quite another. The boxed problem assumes that this translation has already been made. My intuition, and probably yours, is that this translation is the really difficult part of using set-theoretic symbolism adequately. How do Allwood and his co-authors deal with this? In their problem 2 they go the reverse way: they ask for back-translating a symbolism into natural language.

Translate the following expressions into idiomatic English.

  1. {x |x is a boy and Mary has kissed x}
  2. {x |x is a Dane} ∩ {x |x is a philosopher}

Allwood et al. (1977), p. 13
'Danish philosophers' is the answer to the second one, 'the boys whom Mary has kissed' to the first.

Because using the symbolism adequately is so basic, one would expect a long series of exercises, covering a number of typical and possibly not so typical situations and examples. 'The boys Mary kissed' is a funny exercise, but it reminds one of the toy problems used in the typical physics course: surely they can illustrate important physical laws, but they will not drive home the notion that these laws are important. If this set-theoretic symbolism is important at all, and its place in the first chapter of the Allwood logic book suggests it is, then show us, and help us learn to use its power adequately. On the way to doing so, we might even be freed of some naive notions that in a number of situations would lead us astray. What we do find in the Allwood chapter, however, is nothing but yet more terminology. Why bother to learn it by heart? This chapter on sets should be moved to the appendices of the book, to be consulted if need be.

If it is important to get your students knowledgeable about set-theoretic techniques, then teach them. Do not be content with presenting the basic vocabulary and some oversimplified examples.

Set theory is very important in the foundations of many disciplines, mathematics to begin with. Asked to name a significant book, there just might be some students who mention the Principia Mathematica by Russell and Whitehead. Asked for an example of a problem associated with a Greek island, they might venture the Cretan liar: all Cretans are liars, and this Cretan says he is a liar too. Let your students discuss the liar paradox. Let them think of other problems of this kind; give them some examples from the history of science. Elaborate this to the point where your students can see clearly why it is important in the study of language to have an exact technique to represent or describe complex events. Use computer programming as a vehicle to demonstrate the most obvious point, if the students have some programming experience. Use their knowledge of fundamental questions or laws in physics and mathematics, and let them discover how these might be connected to set theory, or translated into set-theoretic terms. If you introduce Venn diagrams, let them look up what kind of problem Venn could solve by using this technique.
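Computer programming indeed makes the point concrete: Python's built-in set type mirrors the boxed Allwood expressions almost one for one. A minimal sketch; the example sets are my own invention, not from the book:

```python
# Hypothetical example sets; the names echo the boxed exercise.
A = {'a', 'b', 'c'}
B = {'b', 'd'}
C = {'b', 'c', 'd'}
D = {'a', 'b', 'c', 'd'}

print('b' in C)           # b is an element of C → True
print(C < D)              # C is a proper subset of D → True
print(A | C)              # the union of A and C
print('d' not in A & B)   # d is not an element of the intersection of A and B → True
print((D - A) < (B | C))  # the complement of A (within universe D) is a
                          # proper subset of the union of B and C → True
```

Note that a complement only makes sense relative to a universe of discourse; here the relative complement D - A stands in for it.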

Why all this trouble? Learning strange words isn't all that different, is it? It is different, because in learning a second language it is perfectly transparent what good it is to learn some vocabulary. It is no mystery how these words will eventually be used in translations or in speaking the language. The vocabulary of science is quite another matter. There are reasons why there is a special vocabulary to begin with, or why symbolic techniques are used that look somewhat infantile at first sight. And that is exactly the point, isn't it: we need to be ridiculously precise in laying the foundations of our scientific theories; we have to 'educate' these theories in a certain sense, for theories should behave properly.

I will probably return to set theory time and again, because it might turn out to be one of the most suitable disciplines to illustrate design techniques for achievement test items. Because of its fundamental character, surely, but also because its techniques are in one way or another used in almost every discipline taught in our schools.

Special situation (see also par. 6.6)

May 2006. It just might be the case that there are many training situations where the trainees have to master a lot of rather abstract course material, or a lot of rather disconnected facts, or both. There is no opportunity to break down the abstractions in such a way that trainees will be able to truly understand the material: critical maintenance tasks in a nuclear reactor will be handled by personnel not having a PhD in physics. Facts may be rather disconnected because they are empirically established quantities: the distance of the Moon to the Earth. Or the exact numbers have been set by fiat: speed limits, number of days in quarantine.

Assessing the mastery of trainees in courses containing a lot of material they may have learned by heart poses a peculiar problem. Remember that assessing the translation of words is a straightforward thing, and does not pose the risk of trainees learning the test items by heart instead of the course material: the two things are the same. The case is different for the speed limit. Using but one item to test for speed limit A in D, the risk is that trainees will learn the item by heart, not the speed limit. At the very least, a variety of test items will have to be developed to test for knowledge of speed limit A in D. The same holds for abstractions. Always asking specifically what the seven kinds of A are is an invitation for trainees to learn the list by heart. Surely, knowing the list will enable them to use the practical knowledge, but first rambling through a list is hardly what the course is meant to train for.
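Developing a variety of items on one and the same fact can be as mundane as keeping a small pool of question templates per fact. A hedged sketch; the road type, limit and templates below are invented for illustration only:

```python
# One fact, several question formats, so trainees cannot simply
# memorize a single item instead of the fact itself.
fact = {"road": "motorway", "limit_km_h": 120}

templates = [
    "What is the speed limit on a {road}?",
    "Is driving 130 km/h on a {road} allowed?",
    "True or false: the limit on a {road} is {limit_km_h} km/h.",
]

items = [t.format(**fact) for t in templates]
for item in items:
    print(item)
```

Drawing a different template per assessment occasion keeps the target knowledge, not the item wording, as the thing being tested.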

Looking for this variety of items asking for particular facts or names - abstractions - the question is whether multiple choice is a good choice. Answer that question for yourself in a roundabout way: first develop that variety of questions in the straightforward open-ended question format. Only then contemplate whether multiple choice is a natural development, or at least not hopelessly unnatural - the book will help you here. But first answer the question whether multiple choice is the only practical answer to your assessment problem: many trainees, multiple assessments, a rather large number of items on a limited body of course material.

The idea of the multiple choice format is linked to ease of scoring tests. However, there are many other ways to ease this burden of scoring. Optical character recognition is a technique that might enable open-ended questions to be scored automatically. Remember the particular case we are investigating here: course material containing many abstractions - names - and/or particular facts and/or particular numbers. These kinds of information are rather exact, just as the spelling of words in another language is. It should not be too difficult to list the one or two alternative answers to the open-ended question that will also count as correct.

Course goals and course content place restrictions on the design of a variety of ways to test for a particular item of knowledge. The challenge is not to invent clever ways to test for knowledge; that way out would be way too easy. The challenge is to design ways to test for knowledge that enhance the goals of the course, or that motivate trainees to go for a better integration of separate bits of knowledge into a meaningful whole.

But how can this be, this 'meaningful whole' of abstract knowledge? There is an intriguing memory method, used in the sixteenth and seventeenth centuries, that enables one to efficiently store and remember a lot of disconnected facts. Basically, the trick is to relate each item in one or two associative steps to a particular standard item available in the memory system. The memory system is a personal one, and could be the house of one's elders, each room containing particular objects. I am not suggesting here to use these old techniques. Nevertheless, organizing a lot of rather disconnected abstractions in meaningful ways will be distinctly helpful in instruction, and the efficient thing to do is to make use of such meaningful organization in testing for knowledge of the abstractions. To begin with, there will already be a meaningful structure in the course material, and/or in the context of the course material. Bring that structure to the fore, schematizing the meaningful relationships involved.

Now suppose you are stuck: you do not see that many meaningful relations between the abstract items in the course. What heuristics are available to enable you to discern more meaningful relationships, if there are any? At this point I do not have any heuristics available, other than those in the series comprising this book. Therefore, I will gather a number of different cases, and investigate in each particular case what the specific possibilities might be. In the course of this exercise I expect to find a number of heuristics that will be useful in most of the cases. Of course, you are invited to submit examples also.


- Precision. The number pi in 5 decimal places.
- Scale [order of magnitude]. Is the distance from the Moon to the Earth 700 thousand or 7 million km? html
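Both note fragments can be turned into checkable computations. A minimal sketch; the mean Earth-Moon distance of about 384,400 km is a well-established figure, the helper function is my own:

```python
import math

# Precision: pi in 5 decimal places.
print(round(math.pi, 5))  # → 3.14159

# Scale: the order of magnitude as the power of ten.
def order_of_magnitude(x: float) -> int:
    return math.floor(math.log10(x))

moon_km = 384_400  # mean Earth-Moon distance in km
print(order_of_magnitude(moon_km))    # 5: hundreds of thousands of km
print(order_of_magnitude(7_000_000))  # 6: 7 million km is an order too large
```

So of the two alternatives in the item, 700 thousand km is on the right order of magnitude and 7 million km is not.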

4.2 Definitions

4.3 Providing examples

4.4 Recognizing/naming examples


Is 7 a prime number? _______


Why is 7 an example of a prime number?

The motivation given for this type of classroom question (p. 8):
"It helps pupils recall their knowledge of the properties of prime numbers and the properties of 7 and compare them. They then decide whether 7 is an example of a prime number."
"It requires pupils to explain their understanding of prime numbers and use this to justify their reasoning."
"It provides an opportunity to make an assessment without necessarily asking supplementary questions."

Assessment for learning. Using assessment to raise achievement in mathematics. www.qca.org.uk http://www.qca.org.uk/downloads/6311_using_assess_raise_acievement_maths.pdf [dead link? 1-2009]. Elaborates on effective questioning techniques in cases such as 'Why is 7 prime?'
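The classification the item asks for - compare the properties of 7 with the definition of a prime - can itself be written out as a short program; a sketch using trial division:

```python
def is_prime(n: int) -> bool:
    """A number is prime if it is greater than 1 and has no divisor
    other than 1 and itself; trial division up to sqrt(n) suffices."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(7))  # True: 7 has no divisor between 2 and itself
print(is_prime(9))  # False: 9 = 3 × 3
```

The function makes explicit the two checks the pupils are supposed to perform: that 7 exceeds 1, and that no smaller number divides it.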

4.4 More literature

Richard C. Anderson, Raymond W. Kulhavy (1972). Learning Concepts from Definitions. American Educational Research Journal, 9, 385-390.

4.5 Recognizing and naming using formally defined terms

4.6 Declarative knowledge

4.7 Literature

Paul Bloom (2000). How children learn the meanings of words. MIT Press.

J. P. Mestre (2005). Transfer of learning: from a modern multidisciplinary perspective. San Francisco: Sage. comment and summary

Willard V. Quine (1960). Word and object. MIT Press. [I have not yet seen this book; Paul Bloom refers to it; library WassWeg: PSYCHO C8.-104]

Alan H. Schoenfeld (2006). What doesn't work: The challenge and failure of the What Works Clearinghouse to conduct meaningful reviews of studies of mathematics curricula. Educational Researcher, 35, March, 13-21. pdf

more literature

Jens Allwood, Lars-Gunnar Andersson and Östen Dahl (1977). Logic in linguistics. Cambridge University Press.

Benjamin S. Bloom et al. (Eds.) (1956). Taxonomy of educational objectives. The classification of educational goals. Book 1: Cognitive domain. David McKay.

Benjamin S. Bloom, J. Thomas Hastings and George F. Madaus (Eds) (1971). Handbook on formative and summative evaluation of student learning. London: McGraw-Hill.

C. Alan Boneau (1990). Psychological literacy. A first approximation. American Psychologist, 45, 891-900.

Hans Freudenthal (1978). Weeding and sowing. Preface to a science of mathematical education. Dordrecht: Reidel.

M. D. Merrill and R. D. Tennyson (1977). Teaching concepts: an instructional design guide. Englewood Cliffs, N.J.: Educational Technology Publications.

Richard B. Millward (1980). Models of concept formation. In Richard E. Snow, Pat-Anthony Federico and William E. Montague (Eds.), Aptitude, learning and instruction. Volume 2: Cognitive process analyses of learning and problem solving (pp. 245-276). Erlbaum.

Mark K. Singley and John R. Anderson (1989). The transfer of cognitive skill. London: Harvard University Press.

E. E. Smith and D. L. Medin (1981). Categories and concepts. Cambridge, Mass.: Harvard University Press.

P. W. Tiemann and S. M. Markle (1978). Analyzing instructional content: a guide to instruction and evaluation. Champaign, Illinois: Stipes.


There are some sites - or even a lot of them - displaying example tests and sets of items, many also presenting construction principles. Naming a site here does not mean I endorse all of the techniques and examples presented there. However, if a site deserves disapproval for strong reasons, I will argue so.