Original publication 'Toetsvragen schrijven' 1983 Utrecht: Het Spectrum, Aula 809, Onderwijskundige Reeks voor het Hoger Onderwijs ISBN 90-274-6674-0. The 2006 text is a revised text. The integral 1983 text is available at www.benwilbrink.nl/publicaties/83ToetsvragenAula.pdf.

Item writing

Techniques for the design of items for teacher-made tests

4. Test items on individual terms

Examples

Ben Wilbrink

This database of examples has yet to be constructed. Suggestions? Mail me.

The most basic concepts are the words of one's mother language. Paul Bloom (2000) has written down everything he knows about how children learn the meanings of words. Therefore his text contains many examples of how the correctness of meanings is tested in daily discourse and activities.

Word learning, and especially the learning of names for things, certainly seems like a simple process, at least to scholars who are not directly engaged in its study. [Bloom, p. 3]

As an item writer, you are directly engaged in the intricacies of the learning of (the meaning of) words, for you will be asking for those meanings. Be assured that you will time and again naively assume the learning of new words, terms or concepts to be a straightforward and simple thing. Try to remember that nothing is simple here, and that items testing for meaning need not be easy at all. Bloom uses an example from Quine (1960) to drive the point home: a linguist visiting a new tribe might note that a rabbit rushing by seems to be called gavagai. Can she be sure that gavagai is what we mean by rabbit? Quine: she can never be sure, there is an infinity of other possible meanings. This state of affairs has many implications, one of them being that it is not possible to design a test item that will establish beyond doubt that the test taker is using the term gavagai in the same way we do rabbit, or the concept of number in the way the textbook intends it. It is possible to test for differences in meaning, though, in just the same way one may try to falsify hypotheses. In the physics course, there is more to testing for the correct understanding of the term force than directly asking for it: some naive misunderstanding might still exist, and had better be tested for more directly.

Disciplinary languages are not particularly different from one's mother language. In order to function effectively as a baker, broker, lawyer or physicist one must thoroughly master the meaning of the technical terms in one's trade or profession. An important part of education, therefore, is about the terms of the trade, or the different subjects in the curriculum.

4.1 Translation

Figure 1. Translation

implicit context.

A schema on transfer. From N. Sanjay Rebello and Dean A. Zollman (2004?). A Model for Dynamic Transfer of Learning.

Transfer is connected to the translation category, in the sense of discovering that the situation at hand is a translation of a known situation, or can be translated into a known situation, and so on. Transfer is a hot issue in educational research; see Mestre (2005).

Figure 4.1 "Assessment items that focus on a conceptual understanding of fractions." [Schoenfeld, 2006, p. 16]

As noted earlier, traditional assessments look for students' ability to compute with fractions (e.g., Find 2/5 + 3/4). More recent assessments, aligned with the broad characterizations of mathematical proficiency just listed, ask for more. See Figure 1 for examples of assessment tasks that have been used to judge various aspects of students' knowledge of fractions.
Tasks 1 and 2 check for students' conceptual understanding. They ask students to work with different ways of representing fractions, and they check to see whether students understand that a given fraction of the whole must be the same size in every occurrence. (Many students will answer '1/4' for Task 1 and '2/6' for Task 2.) Task 3 probes students' understandings of proportional and inverse relationships (when the denominator increases, the value of the fraction decreases) as well as their ability to explain what they understand. [Schoenfeld, 2006, p. 16]

Figure 2. Context or not, and if so, which one?

Express in [set-theoretic] symbols:

a. b is an element of C
b. C is a proper subset of D
c. the union of A and C
d. the set which consists of the elements d, e and g
e. d is not an element of the intersection of A and B
f. the complement of A is a proper subset of the union of B and C

Allwood et al. (1977), p. 13

The answers are, respectively, b ∈ C, C ⊂ D, A ∪ C, {d, e, g}, d ∉ A ∩ B, Ā ⊂ B ∪ C.
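
For readers with some programming experience, the six expressions can be tried out directly, because Python's built-in set type follows the notation closely. What follows is only a minimal sketch: the universe U and the contents of A, B, C and D are hypothetical, chosen so that every expression can be evaluated.

```python
# Hypothetical universe and sets, invented purely for illustration.
U = set("abcdefgh")
A = {"a", "b"}
B = {"b", "c"}
C = {"b", "c", "d"}
D = {"a", "b", "c", "d", "e"}


def complement(s, universe=U):
    """Complement of s relative to the (assumed) universe."""
    return universe - s


answers = {
    "a": "b" in C,                  # b ∈ C
    "b": C < D,                     # C ⊂ D  (proper subset)
    "c": A | C,                     # A ∪ C
    "d": {"d", "e", "g"},           # {d, e, g}
    "e": "d" not in (A & B),        # d ∉ A ∩ B
    "f": complement(A) < (B | C),   # Ā ⊂ B ∪ C
}

for label, value in answers.items():
    print(label, value)
```

With these particular sets, statement f. evaluates to False. The symbolic form only expresses a claim; whether the claim holds depends on the sets one started with.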

The boxed question is very elementary material, though e. and f. are of a certain complexity. There is no point in doing endless exercises at this level, because somewhat more complex texts and problems will routinely contain symbolic expressions like these. In summative testing, then, the skill in handling symbols is not tested expressly, and should probably not be assessed incidentally, on the basis of small errors made, either.
Manipulating these abstract sets and their elements is one thing; translating real-world events into set-theoretic relations might be quite another. The boxed problem assumes that this translation has already been made. My intuition, and probably yours, is that this translation is the really difficult part of using set-theoretic symbolism adequately. How do Allwood and his co-authors deal with this? In their problem 2 they go the reverse way: they ask for a back-translation of a symbolism into natural language.

Translate the following expressions into idiomatic English.

{x |x is a boy and Mary has kissed x}
{x |x is a Dane} ∩ {x |x is a philosopher}

Allwood et al. (1977), p. 13

'Danish philosophers' is the answer to the second one, 'the boys whom Mary has kissed' to the first.
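
The set-builder notation {x | ...} has a close counterpart in the set comprehensions of a programming language, which may help students see what the bar and the condition are doing. The sketch below is an illustration only: the Person record, the names and the kissed_by_mary set are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    nationality: str
    profession: str
    is_boy: bool


# Hypothetical data, invented for this example.
people = {
    Person("Anders", "Danish", "philosopher", False),
    Person("Birgit", "Danish", "baker", False),
    Person("Carl", "Swedish", "philosopher", False),
    Person("Tom", "English", "pupil", True),
    Person("Jim", "English", "pupil", True),
}
kissed_by_mary = {"Tom", "Anders"}

# {x | x is a boy and Mary has kissed x}
boys_mary_kissed = {x.name for x in people
                    if x.is_boy and x.name in kissed_by_mary}

# {x | x is a Dane} ∩ {x | x is a philosopher}
danes = {x.name for x in people if x.nationality == "Danish"}
philosophers = {x.name for x in people if x.profession == "philosopher"}
danish_philosophers = danes & philosophers

print(boys_mary_kissed)
print(danish_philosophers)
```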

Because using the symbolism adequately is so basic, one would expect a long series of exercises, covering a number of typical and possibly not so typical situations and examples. 'The boys Mary kissed' is a funny exercise, but it reminds one of the toy problems used in the typical physics course: surely they can illustrate important physical laws, but they will not drive home the notion that these laws are important. If this set-theoretic symbolism is important at all, and it seems to be so because of its place in the first chapter of the Allwood logic book, then show us, and help us learn to use its power adequately. On the way to doing so, we might even be able to free ourselves of some naive notions that in a number of situations would lead us astray. What we do find in the Allwood chapter, however, is nothing but yet more terminology. Why bother to learn it by heart? This chapter on sets should be moved to the appendices of the book, to be consulted if need be.

If it is important to get your students knowledgeable about set theoretic techniques, then teach them. Do not be content with presenting the basic vocabulary and some oversimplistic examples.

Set theory is very important in the fundamentals of many disciplines, mathematics to begin with. Asked to name a significant book, there just might be some students who mention the 'Principia Mathematica' by Russell and Whitehead. Asked for an example of a problem in set theory that is associated with a Greek island, they might venture the Cretan: all Cretans are liars, and this Cretan says he is a liar too. Let your students discuss the Cretan problem. Let them think of other problems of this kind, give them some examples from the history of science. Elaborate this to the point where your students can see clearly why it is important in the study of language to have an exact technique to represent or describe complex events. Use computer programming as a vehicle to demonstrate the most obvious point, if the students have some programming experience. Use their knowledge of fundamental questions or laws in physics and mathematics, and let them discover how they might be connected to set theory, or translated into set-theoretic terms. If you introduce Venn diagrams, let them look up what kind of problem Venn could solve by using this technique.
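
As one possible way of using programming as a vehicle, students with some experience might check mechanically whether a speaker's statement about himself can be consistent. The sketch below rests on a deliberately simplified reading of the Cretan problem, which is my assumption and not a full treatment: a 'liar' utters only false statements, a 'truth-teller' only true ones, and the statement is about the speaker himself. The function and variable names are invented for the example.

```python
def consistent_speaker_types(statement):
    """Return the speaker types under which the statement is consistent.

    Assumption: a liar says only false things, a truth-teller only true
    things. `statement` maps is_liar (bool) to the truth value of what
    the speaker says about himself.
    """
    result = []
    for is_liar in (False, True):
        said_is_true = statement(is_liar)
        # A truth-teller must say something true, a liar something false.
        if said_is_true == (not is_liar):
            result.append("liar" if is_liar else "truth-teller")
    return result


def i_am_a_liar(is_liar):
    return is_liar


def i_am_truthful(is_liar):
    return not is_liar


print(consistent_speaker_types(i_am_a_liar))
print(consistent_speaker_types(i_am_truthful))
```

Under these assumptions 'I am a liar' is consistent with neither kind of speaker, while 'I am truthful' is consistent with both, so that the statement tells the listener nothing. Either outcome is a good starting point for a classroom discussion.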

Why all this trouble? Learning strange words isn't all that different, is it? It is different, because it is perfectly transparent what good it is, in learning a second language, to learn some vocabulary. It is not a mystery how these words eventually will be used in translations or in speaking the language. The vocabulary of science is quite another matter. There are reasons why there is a special vocabulary to begin with, or why symbolic techniques are used that look somewhat infantile to begin with. And that is exactly the point, isn't it: we need to be ridiculously precise in laying the foundations of our scientific theories, we have to 'educate' these theories in a certain sense, theories should behave properly.

I will probably time and again return to set theory because it might turn out to be one of the most suitable disciplines to illustrate design techniques for achievement test items. Because of its fundamental character, surely, but also because these techniques in one way or another are used in almost every discipline taught in our schools.

Special situation (see also par. 6.6)

May 2006. It just might be the case that there are many training situations where the trainees have to master a lot of rather abstract course material, or a lot of rather disconnected facts, or both. There is no opportunity to break down the abstractions in such a way that trainees will be able to truly understand the material: critical maintenance tasks in a nuclear reactor will be handled by personnel not having a PhD in physics. Facts may be rather disconnected because they are empirically established quantities: the distance of the Moon to the Earth. Or the exact numbers have been set by fiat: speed limits, number of days in quarantine.

Assessing the mastery of trainees in courses containing a lot of material they may have learned by heart poses a peculiar problem. Remember that assessing the translation of words is a straightforward thing, and does not pose the risk of trainees learning the test items by heart, instead of the course material. The two things are the same. The case is different for the speed limit. Using but one item to test for speed limit A in D, the risk is that trainees will learn the item by heart, not the speed limit. At the very least, a variety of test items will have to be developed to test for knowledge of speed limit A in D. The same goes for abstractions. Always asking what the seven kinds of A are specifically is an invitation for trainees to learn the list by heart. Surely, knowing the list will enable them to use the practical knowledge, but first having to ramble through a list is hardly what the course is meant to train for.
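
What 'a variety of test items' on one and the same fact might look like can be sketched with a small generator. Everything in it is hypothetical: the function name, the four phrasings, and the example of an 80 km/h limit on a ring road are invented for illustration and are no recommendation of particular wordings.

```python
def item_variants(road, limit_kmh):
    """Build several differently phrased items on one and the same fact.

    Returns a list of (question, expected answer) pairs.
    """
    over = limit_kmh + 15    # a speed above the limit
    under = limit_kmh - 10   # a speed below the limit
    return [
        (f"What is the speed limit on a {road}?",
         f"{limit_kmh} km/h"),
        (f"You drive {over} km/h on a {road}. "
         f"By how much do you exceed the limit?",
         f"{over - limit_kmh} km/h"),
        (f"Is driving {under} km/h on a {road} within the limit?",
         "yes"),
        (f"On which type of road is the limit {limit_kmh} km/h?",
         road),
    ]


# Hypothetical fact: a limit of 80 km/h on a ring road.
variants = item_variants("ring road", 80)
for question, answer in variants:
    print(question, "->", answer)
```

The last variant presupposes that only one road type has this limit; where that is not so, the answer key needs more than one entry.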

Looking for this variety of items asking for particular facts or names (abstractions), the question is whether multiple choice is a good choice. Answer that question for yourself in a roundabout way: first develop that variety of questions in the straightforward open-ended question format. Only then contemplate whether multiple choice is a natural development, or at least not hopelessly unnatural; the book will help you here. But first answer the question whether multiple choice is the only practical answer to your assessment problem: many trainees, multiple assessments, a rather large number of items on a limited body of course material.

The idea of the multiple choice format is linked to ease of scoring tests. However, there are many other ways to ease this burden of scoring. Using optical character readers is a technique that might enable open-ended questions to be scored automatically. Remember the particular case we are investigating here: course material containing many abstractions (names) and/or particular facts and/or particular numbers. These kinds of information are rather exact, just as the spelling of words in another language is. It should not be too difficult to list the one or two alternative answers to the open-ended question that will also count as correct.
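
Scoring an open-ended answer against a short list of accepted alternatives could be sketched as follows. This is an illustration under assumptions of my own: the normalization rules (ignore case, accents and extra spaces) and the answer key for a speed limit item are hypothetical, and a real course would have to decide for itself which variations to tolerate.

```python
import unicodedata


def normalize(answer):
    """Lower-case, strip accents and collapse whitespace."""
    text = unicodedata.normalize("NFKD", answer)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.lower().split())


def is_correct(answer, accepted):
    """True if the answer matches one of the accepted alternatives."""
    return normalize(answer) in {normalize(a) for a in accepted}


# Hypothetical answer key: the intended answer plus two alternatives.
accepted_speed_limit = ["80 km/h", "80", "eighty"]

print(is_correct("  80 KM/H ", accepted_speed_limit))
print(is_correct("90 km/h", accepted_speed_limit))
```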

Course goals and course content place restrictions on the design of a variety of ways to test for a particular item of knowledge. The challenge is not to invent clever ways to test for knowledge; that way out would be way too easy. The challenge is to design ways to test for knowledge that enhance the goals of the course or that motivate trainees to go for a better integration of separate bits of knowledge into a meaningful whole.

But how can this be, this 'meaningful whole' of abstract knowledge? There is an intriguing memory method, used in the sixteenth and seventeenth centuries, that enables one to efficiently store and remember a lot of disconnected facts. Basically, the trick is to relate each item in one or two associative steps to a particular standard item available in the memory system. The memory system is a personal one, and could be the house of one's parents, each room containing particular objects. I am not suggesting here to use these old techniques. Nevertheless, organizing a lot of rather disconnected abstractions in meaningful ways will be distinctly helpful in instruction, and the efficient thing to do is to make use of such meaningful organization in testing for knowledge of the abstractions. To begin with, there will already be a meaningful structure in the course material, and/or in the context of the course material. Bring that structure to the fore, schematizing the meaningful relationships involved.

Now suppose you are stuck: you do not see that many meaningful relations between the abstract items in the course. What heuristics are available to enable you to discern more meaningful relationships, if there are any? At this point I do not have any heuristics available, other than the series comprising the book. Therefore, I will gather a number of different cases, and investigate in each particular case what the specific possibilities might be. In the course of this exercise I expect to find a number of heuristics that will be useful in most of the cases. Of course, you are invited to submit examples also.

quantities

- precision. The number pi to 5 decimal places.
- scale [order of magnitude]. Is the distance from the Moon to the Earth 700 thousand or 7 million km?
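
The two kinds of quantity item ask for different scoring rules, which a short sketch may make concrete. The function names are invented; the mean Earth-Moon distance of roughly 384,400 km is used as the reference value, and 'same scale' is taken to mean 'same power of ten', which is an assumption about how one would want to score such an item.

```python
import math


def correct_to_decimals(answer, true_value, decimals):
    """Precision: does the answer agree with the true value
    when both are rounded to the given number of decimal places?"""
    return round(answer, decimals) == round(true_value, decimals)


def order_of_magnitude(x):
    """Scale: the power of ten of a non-zero quantity."""
    return math.floor(math.log10(abs(x)))


MOON_DISTANCE_KM = 384_400  # approximate mean distance, reference value

print(correct_to_decimals(3.14159, math.pi, 5))
print(correct_to_decimals(3.1416, math.pi, 5))
for option in (700_000, 7_000_000):
    same_scale = order_of_magnitude(option) == order_of_magnitude(MOON_DISTANCE_KM)
    print(option, same_scale)
```

Scored on scale alone, 700 thousand km counts as the right alternative even though it is nearly twice the actual distance; that is precisely the difference between asking for scale and asking for precision.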

4.2 Definitions

4.3 Providing examples

4.4 Recognizing/naming examples

NOT SO GOOD

Is 7 a prime number? _______

GOOD

Why is 7 an example of a prime number?

The motivation given for this type of classroom question (p. 8):
"It helps pupils recall their knowledge of the properties of prime numbers and the properties of 7 and compare them. They then decide whether 7 is an example of a prime number."
"It requires pupils to explain their understanding of prime numbers and use this to justify their reasoning."
"It provides an opportunity to make an assessment without necessarily asking supplementary questions."

Assessment for learning. Using assessment to raise achievement in mathematics. www.qca.org.uk http://www.qca.org.uk/downloads/6311_using_assess_raise_acievement_maths.pdf [dead link? 1-2009]. Elaborates on effective questioning techniques in cases such as 'Why is 7 prime?'
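
The comparison the pupils are asked to make, between the defining property of prime numbers and the properties of 7, can also be spelled out step by step. The sketch below is only an illustration with invented function names; it uses the usual definition of a prime as a whole number greater than 1 whose only divisors are 1 and itself.

```python
def divisors(n):
    """All positive whole numbers that divide n without remainder."""
    return [d for d in range(1, n + 1) if n % d == 0]


def is_prime(n):
    """A prime is greater than 1 and has exactly the divisors 1 and n."""
    return n > 1 and divisors(n) == [1, n]


# 'Why is 7 prime?': the justification is the list of divisors itself.
print(7, divisors(7), is_prime(7))
# A contrasting non-example.
print(9, divisors(9), is_prime(9))
```

The yes/no item only asks for the last value printed; the 'why' item asks for the list of divisors that leads to it.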

4.4 more literature

Richard C. Anderson, Raymond W. Kulhavy (1972). Learning Concepts from Definitions. American Educational Research Journal, 9, 385-390.

Abstract: College undergraduates were exposed to one-sentence definitions of unfamiliar concepts and then answered multiple-choice test questions, each of which required them to select an instance of a concept from among four possibilities. Overall performance was very high. This indicates that people can easily learn concepts from definitions. People who used each defined word in a sentence as the definitions were presented learned substantially more than those who read each definition aloud three times, which is further evidence that procedures that make meaningful processing more likely facilitate learning.
Looks simple, doesn't it? I will have to study this one. The article seems to have been forgotten.

4.5 Recognizing and naming using formally defined terms

4.6 Declarative knowledge

4.7 Literature

Paul Bloom (2000). How children learn the meanings of words. MIT Press.

precis short abstract: Normal children learn tens of thousands of words, and do so quickly and efficiently, often in highly impoverished environments. In How children learn the meanings of words, I argue that word learning is the product of a set of cognitive and linguistic abilities that include the ability to acquire concepts, an appreciation of syntactic cues to meaning, and a rich understanding of the mental states of other people. These capacities are powerful, early emerging, and to some extent uniquely human.

J. P. Mestre (2005). Transfer of learning: from a modern multidisciplinary perspective. San Francisco: Sage. comment and summary

Among others:
- Efficiency and Innovation in Transfer. Daniel L. Schwartz, John D. Bransford, and David Sears
- Resources, Framing, and Transfer. David Hammer, Andrew Elby, Rachel E. Scherr, and Edward F. Redish, p. 89
- Dynamic Transfer: A Perspective from Physics Education Research. N. Sanjay Rebello, Dean A. Zollman, Alicia R. Allbaugh, Paula V. Engelhardt, Kara E. Gray, Zdeslav Hrepic, and Salomon F. Itza-Ortiz, p. 217
- Theory, Level, and Function: Three Dimensions for Understanding Transfer and Student Assessment. Daniel T. Hickey and James W. Pellegrino, p. 251. http://lpsl.coe.uga.edu/people/hickey/Publications/Hickey-Pellegrino.pdf [dead link? 1-2009]

Willard V. Quine (1960). Word and object. MIT Press. [I have not yet seen this book; Paul Bloom refers to it. Library WassWeg: PSYCHO C8.-104]

Alan H. Schoenfeld (2006). What doesn't work: The challenge and failure of the What Works Clearinghouse to conduct meaningful reviews of studies of mathematics curricula. Educational Researcher, 35, March, 13-21.

more literature

Jens Allwood, Lars-Gunnar Andersson and Östen Dahl (1977). Logic in linguistics. Cambridge University Press.

An introductory text, exercises to every chapter, answers provided also, eminently suited as illustrative materials to the design of test items. I intend to use a number of these exercises as examples.

Benjamin S. Bloom et al. (Eds.) (1956). Taxonomy of educational objectives. The classification of educational goals. Book 1 Cognitive domain. David McKay.

See also html.

Benjamin S. Bloom, J. Thomas Hastings and George F. Madaus (Eds) (1971). Handbook on formative and summative evaluation of student learning. London: McGraw-Hill.

C. Alan Boneau (1990). Psychological literacy. A first approximation. American Psychologist, 45, 891-900.

Abstract: Utilizes a survey of authors of psychology textbooks to compile a list of the terms and concepts in psychology's 10 subfields that should be general knowledge to psychology students. Reports the "Top 100" concepts and ranks lists of the 100 highest-rated terms for each of the 10 subfields. (FMW [ERIC])
From the overall top 100 the following might be important for readers of 'Test item design': forgetting curve - intelligence - long-term memory - meaning - normal distribution - rehearsal - reinforcement - sample - semantic memory - short-term memory. In the subfield of cognitive psychology additionally: artificial intelligence - free recall - information-processing approach - immediate memory span - memory span - recall vs recognition - semantic memory - magical number seven - internal representation - episodic memory - mnemonic techniques - schema theory - working memory - chunk hypothesis - semantic network - automatization - limited capacity model - retrieval process - spreading activation - heuristics and biases - maintenance rehearsal - procedural knowledge - problem-solving set - spreading activation [sic] - declarative knowledge - learning strategies - metamemory - metaphor - cognitive skills - concept identification/formation - categorical perception - confidence judgement - rote memorization - implicit learning - natural concepts - paired-associate learning.
Do you know some of these terms? I do not know them all, and I will have mentioned one or two in error. Note that the lists in the article really are lists; no concept has been related to another. Well, one has to start somewhere to make an inventory of a discipline's concepts! Even the author confesses not to know what to use these lists for.

Hans Freudenthal (1978). Weeding and sowing. Preface to a science of mathematical education. Dordrecht: Reidel.

M. D. Merrill and R. D. Tennyson (1977). Teaching concepts: an instructional design guide. Englewood Cliffs, N.J.: Educational Technology Publications.

Richard B. Millward: Models of concept formation. In Richard E. Snow, Pat-Anthony Federico and William E. Montague (Eds.) (1980). Aptitude, learning and instruction. Volume 2: cognitive process analyses of learning and problem solving (p. 245-276). Erlbaum.

Mark K. Singley and John R. Anderson (1989). The transfer of cognitive skill. London: Harvard University Press.

Barry Krusch (1994). The Role of Frame Analysis in Enhancing the Transfer of Knowledge. pdf
N. Sanjay Rebello and Dean A. Zollman (2004?). A Model for Dynamic Transfer of Learning. pdf.

E. E. Smith and D. L. Medin (1981). Categories and concepts. Cambridge, Mass.: Harvard University Press.

P. W. Tiemann and S. M. Markle (1978). Analyzing instructional content: a guide to instruction and evaluation. Champaign, Illinois: Stipes.

Sites

There are some - or even a lot - of sites displaying example tests and sets of items, many also presenting construction principles. Naming a site here does not mean I endorse all of the techniques and examples presented there. However, if a site for strong reasons deserves disapproval, I will argue so.

Qualifications and Curriculum Authority QCA site
- This is a U.K. Government sponsored public body
- "QCA maintains and develops the national curriculum and associated assessments, tests and examinations; and accredits and monitors qualifications in colleges and at work."
- The QCA endorses the assessment for learning approach for classroom questioning (also for the high-stakes national tests?). The following page contains lots of resources you may download: http://www.qca.org.uk/7659.html [dead link? 1-2009].
October 7, 2006 \ contact ben at at at benwilbrink.nl http://www.benwilbrink.nl/projecten/06examples4.htm