Original publication 'Toetsvragen schrijven' 1983 Utrecht: Het Spectrum, Aula 809, Onderwijskundige Reeks voor het Hoger Onderwijs ISBN 90-274-6674-0. The 2006 text is a revised text.

Item writing

Techniques for the design of items for teacher-made tests

6. Writing items on text

Ben Wilbrink

This database of examples has yet to be constructed. Suggestions? Mail me.

What students think they are doing is much more important than what we think we want to teach them. With sustained engagement, they will come to believe that inquiry and argument offer the most promising path to resolving conflicts, solving problems, and achieving goals. They will become convinced that there are things to find out, that analysis is worthwhile, that unexamined beliefs are not worth having. These are intellectual values, complementary to but distinct from intellectual skills. Engagement, valuing, and understanding bootstrap one another. Yet values, in the end, may be the most critical of the three because students themselves will ultimately decide what is worth knowing. They must find the reasons to become educated. If they do not, schools will experience mediocre success at best.

Deanna Kuhn, 2005, concluding paragraph.


Figure 1. What about text? The scheme shows three things the pupil can do with text, as well as compose it. The 'personal theory' is the intellectual baggage of the student, her mental model of (this little bit of) the world.

The base level of questions about text is to ask for reproduction or recognition. The next higher level is analyzing separate parts of the text in relation to each other. Having done so, the pupil is fully prepared to combine other knowledge of the world with the information given in the text, and draw conclusions going outside the boundaries of the text itself. In all stages, but especially in the last one, the personal theory about the world - the pupil's mental model - plays a significant role. The kind of scheme in figure 1 summarizes cognitive theory; see for example Kuhn (2005). Acknowledging the role of personal theory introduces the meta-cognitive level: for the student it is important to know what she knows, what that information is worth - how sure it is - and how it combines with the information in the new text - or how it fails to do so - and what that implies for either the mental model or the perceived value of the new information.

Thinking in terms of meta-cognition and mental models may not come easy to the teacher or professor, yet they have always done so: testing students for their knowledge and insights is a meta-cognitive exercise itself. Let this be a reassurance that reading the text of this chapter will be worthwhile.

For the time being, examples illustrating the interplay between text, operations on the text, and mental models are available in Kuhn (2005). In time, more examples will be collected from the literature, or constructed.

"Items that attempt to assess the test taker's ability to derive meaning from a passage and to make inferences are often limited to questions such as the following: What is the main idea in this story? What is this story mostly about? What is the best title for this story? How did the character probably feel? These are not bad questions. However, a close inspection often reveals that such questions can be answered using information that is explicitly stated in the text."

Robert L. Linn (1988). Dimensions of thinking: Implications for testing. CSE Technical Report 282. http://www.cresst.org/Reports/r282.pdf [link broken? 1-2009]. Published in Beau Fly Jones and Lorna Idol (Eds) (1991). Dimensions of thinking and cognitive instruction. Erlbaum.

It is highly tempting to write test items that in fact ask only for information given literally in the course text. This is highly problematic, for students will interpret it as an admonition to study the text only superficially.

6.1 Participation: have you read it?

6.2 Themes and main issues

Carefully explain what Nozick and Lewis would say about the following skeptical argument:

P1. I don't know that I am not a brain in a vat.
P2. If I don't know that I am not a brain in a vat, then I don't know that I have hands.
C. I don't know that I have hands.

From the MIT OpenCourseWare 'Theory of knowledge' fall 2003 exam (html). Nozick and Lewis are two of the authors studied.

What is the Gettier problem?

The same exam. The Gettier problem is a classic case in the theory of knowledge.

6.3 Analysis


Figure 1. Film making in Sammy's Science House (Apple): analyse the time sequence, drag the images. Difficulty levels: 3 or 4 images. This one may prove tricky for adults as well. Images instead of text, showing the chapter's title to be unnecessarily restrictive.

What more can one wish for, for kids three or four years of age? A bag full of films, immediate feedback on the results of your analysis, a 'real' film being shown as a reward for the correct sequencing. This surely is a prototype of a good test of analytical thinking. Or a good course for analytical thinking. What comes first, and what later.

Analysis of text has been described beautifully and very much to the point by Deanna Kuhn in the first part of chapter 7, 'The skills of argument', of her 2005 book. This part of chapter 7 is original to the book; it is not based on research published elsewhere. I will take this exposition as my frame of reference for the section on analysis of text. The particular case discussed by Kuhn is an admissions test used by, amongst others, The City University of New York (CUNY), testing for argumentative skill. Regrettably, the test is used as an intelligence test, not as one testing for achievement in the skill of reasoning. That is no fault of the test itself; it is an omission in the curriculum followed by the students who have to sit this test.

p. 148: "I began this chapter by looking at a test of argumentative thinking that is employed as a "high-stakes" admission gate to a college degree. Few would probably quarrel with the assertion that students aspiring to a college degree should be able to meet the challenges this test poses. Yet a significant number of students seeking a higher degree fail the test, at some institutions at a rate exceeding 50 percent. Should these students take their failure as proof that they are not suited for higher education? Many no doubt have. But at least a few have reacted with dismay and questioned the message. "Which courses are these skills taught in?" they want to know. "Why haven't we learned them? What can I do now to learn what I need to?" These students have a legitimate complaint, certainly. If these skills are so highly valued, why are courses not available that teach them?"

Kuhn meticulously treats one concrete analysis example from this test, showing the complexity of what is asked of the students sitting this test. And then this is only a test of reasoning with information given to the student, not one of reasoning with course material studied. Kuhn's work demonstrates that analytical questions on text can be very complex indeed, even in cases where content itself is kept perfectly transparent. The citation illustrates that there might be serious problems in the relation between the analytical questions asked of the student in the exam, and the quality and content of the instruction or course preparing the student for that very examination. Be careful not to err here.

Figure from Kuhn, D. (2001). How Do People Know? Psychological Science [the image lives on Kuhn's site]

6.4 Inference

James L. Wardrop, Thomas H. Anderson, Wells Hively, C. Nicholas Hastings, Richard I. Anderson, Keith E. Muller (1982). A framework for analyzing the inference structure of educational achievement tests. Journal of Educational Measurement, 19, 1-18. pdf

abstract A structure for describing different approaches to testing is generated by identifying five dimensions along which tests differ: test uses, item generation, item revision, assessment of precision, and validation. These dimensions are used to profile tests of reading comprehension. Only norm-referenced achievement tests had an inference system consistent with those intended uses. (Author/GK [ERIC])
So, this article is not about what I thought it would be about. The abstract above is what it is about, though.

CRITICAL THINKING

In her book on the Vietnam War, Frances Fitzgerald argues that the U.S. should not have sent soldiers to fight in Vietnam because America could not hope to win the war. From her view, it was useless to support the corrupt South Vietnamese government against guerrillas who won the respect, loyalty, and support of the peasantry. But Neil Sheehan argues that America could have won the war if it had insisted on replacing incompetent South Vietnamese government officials. Success, according to Sheehan, required leaders who would focus on rural development policies, raise living standards, protect peasants, and reduce the ability of the Communists to recruit soldiers from among the peasants. Assume that Fitzgerald's and Sheehan's views are accurately represented here. Whose arguments are stronger?

(a) Fitzgerald's, because the U.S. should not risk the lives of American soldiers to fight a foreign war.
(b) Fitzgerald's, because the support of the South Vietnamese peasants was essential if American soldiers were to be effective.
(c) Sheehan's, to the extent that the U.S. had the power to install effective leaders, implement rural development, and protect the peasantry.
(d) Sheehan's, because World War I and World War II proved that American military intervention can be successful.
(e) Both (b) and (c). If it were possible to accomplish what Sheehan suggests, the Communists would have lost their base of support among the peasantry. But if Sheehan's suggestions were not feasible, the U.S. should have avoided military intervention.

Yeh (2002, p. 15) pdf. See the article for an explanation of this item and its theoretical background (Deanna Kuhn's work, among others).

6.5 Composition

6.6 The naive or novice learner

There are many situations in life as well as in institutions where complex text has to be 'learned' somehow, while the learner does not have the necessary background information to fully interpret and understand the textual material. Think of the reader of Scientific American articles (the personal domain), or employees being trained for complex tasks (the institutional domain).
The case material I will use to develop adequate questioning techniques consists of two articles on the use, effects, and costs of medicines: Bart Meijer van Putten (8 July 2006). De pijn blijft. NRC Handelsblad, p. 41. Broer Scholtens (8 July 2006). Zoek het geheim van de dure pillen. De Volkskrant, Kennis p. 5. This choice is motivated by the availability of these articles, the highly technical character of the information they contain, the audience that is highly involved while not trained in medicine, pharmacy or research methodology, and - last but not least - the availability of many examples of well-designed test items on medical course content, albeit directed at well-educated students or assistant doctors (e.g. Case and Swanson 2001 http://www.nbme.org/PDF/2001iwg.pdf [dead link? 1-2009]).

EXPOSITORY ORGANIZERS

In all cases I define advance organizers as introductory material at a higher level of abstraction, generality, and inclusiveness than the learning passage itself (...).
Further, advance organizers also differ from overviews in being relatable to presumed ideational content in the learner's current cognitive structure (...).
Expository organizers are used when the new learning material is completely unfamiliar, as determined by pretests, and attempts merely to provide inclusive subsumers that are both related to existing ideas in cognitive structure and to the more detailed material in the learning passage. (...).

David P. Ausubel (1978). In defense of advance organizers: A reply to critics. Review of Educational Research, 48, 251-257. jstor

It will be of interest to trace how exactly Ausubel constructs these 'expository organizers,' because it might be one way—or the way—for trainee/novice learners to give meaning to the material they eventually will have to master.

John Eshleman tried the tracing, going back to the original article in the Journal of Educational Psychology, but he could not understand what it was Ausubel was researching there.

INSTRUCTIONAL DESIGN: OOP

In courses other than computer science, students regularly work with texts and other artifacts far larger and more sophisticated than they could produce themselves. A literature or history course is a good example. The same occurs in sociology courses. These artifacts teach the student what is best in the field and should be emulated.

Give the students access to large programs and designs well before they have the ability to produce them. These artifacts can be used as the basis of exercises. Students can make small modifications to large programs and they can extend them in simple ways early on.

Joseph Bergin (online July 2006). Pedagogical patterns. html. As you will have understood, the work of Bergin is in the field of teaching object-oriented programming (OOP).

The case of the naive learner may stand for the more general class of special pedagogical situations, or pedagogical patterns as Joseph Bergin (html) calls them. Bergin, in the box above, does not call the reader 'naive.' Instead he calls the text 'more sophisticated' than the reader, in this way generalizing the topic of this section 6.6. The patterns here are patterns of instructional design in the field of object-oriented programming, going by the fancy names of Fixer Upper, Spiral, Mistake, Early Bird, Toy Box, Tool Box, Lay of the Land, Test Tube, Larger than Life, Fill in the Blanks. 'Larger than Life,' see the above box, takes the naive learning from text as the problem, and proposes to make the problem the solution. Doing so implies that questions - test items included - are to be aligned with the instructional design, nothing less, nothing more. Beautiful. The other patterns share this one characteristic: they deviate - each in its own way - from the traditional way of sequencing curricular material. Computer programming is a discipline that lends itself easily to experimenting with these patterns, because the computer environment itself offers the ultimate testing environment. Nevertheless, the patterns are useful in other disciplines also, for example in medicine, starting on day one with realistic cases (Early Bird pattern). Bergin is one of the people involved in the Pedagogical Patterns Project, all in the field of object-oriented programming, et cetera (site).

6.6 more literature

Scott O. Lilienfeld (2002). When Worlds Collide: Social Science, Politics, and the Rind et al. (1998) Child Sexual Abuse Meta-Analysis [Controversy And Scholarly Publishing]. The American Psychologist, 57, 176-188. html

This article is a quite forceful demonstration of the many ways in which naive readers of scientific articles can go astray in their inferences from the text read. In this particular case even the US Congress almost unanimously let itself be fooled by a host of misinterpreters of the Rind et al. study (pdf), published in one of psychology's most respected journals, the Psychological Bulletin.
The import of the Lilienfeld article for the topic of naive readership of texts is that it highlights the most important ways in which we, naive readers, can fool ourselves by sticking to strongly held beliefs in the face of facts disproving these very beliefs. Of course, it is the same mental model problem known from, among other disciplines, physics.

John M. Carroll and Judith Reitman Olson (1987). Mental models in human-computer interaction: Research issues about what the user of software knows. Washington, DC: National Academy Press. http://darwin.nap.edu/books/POD266/html/R1.html [dead link? 1-2009]

This might be an appropriate domain in which to research the issues of trainees who are naive learners. I will have to study this one (some 30 pages). The closing section of the paper sums up a series of research recommendations.

Patricia A. Alexander and Judith E. Judy (1988). The interaction of domain-specific and strategic knowledge in academic performance. Review of Educational Research, 58, 375-404.

No free online version available. Try JSTOR.
Patricia A. Alexander (2003). Expertise and academic development: A new perspective on a classic theme. http://www.education.umd.edu/EDHD/faculty2/Alexander/ARL/Alexander_EARLI_Keynote_2003.pdf [dead link? 1-2009]
Patricia A. Alexander (2003). Can we get there from here? Educational Researcher, 32 Theme issue: Expertise. http://www.aera.net/uploadedFiles/Journals_and_Publications/Journals/Educational_Researcher/3208/3208_ThemeAlexander.pdf [dead link? 1-2009] The articles in this issue are available as pdf documents at html
Patricia A. Alexander (2003). The Development of Expertise: The Journey From Acclimation to Proficiency. Educational Researcher, 32
- abstract The Model of Domain Learning (MDL) is an alternative perspective on expertise that arose from studies of student learning in academic domains, such as reading, history, physics, and biology. A comparison of the MDL and traditional models of expertise is made. The key components and stages of the MDL are then overviewed. Discussion concludes with a consideration of evidence-based implications of this model for educational practice.
Patricia A. Alexander (2000). Toward a Model of Academic Development: Schooling and the Acquisition of Knowledge. Educational Researcher, 29, nr 2, pp. 28-34.
Tamara L. Jetton and Patricia A. Alexander (2000). Learning from text: A multidimensional and developmental perspective. In Kamil, Mosenthal, Pearson and Barr (Eds), Handbook of Reading Research, Volume III. html
- concluding remarks In this article, we explored the multidimensional nature of learning from text through a discussion of the critical variables of students' knowledge, interest, and use of strategies. We also examined the developmental nature of learning from text as students journey through school from acclimation to competence, and finally to expertise in a subject area. We anticipate that future explorations of learning from text will focus on how individuals learn from text over time, how they learn within nonlinear hypertext environments, and how their beliefs affect this process. We await the future of reading research and instruction to provide us with additional insights into the complex process of learning from text.
Gilat Brill and Anat Yarden (2003). Learning Biology through Research Papers: A Stimulus for Question-Asking by High-School Students. Cell Biology Education 266-274. html
- from the abstract Question-asking is a basic skill, required for the development of scientific thinking. However, the way in which science lessons are conducted does not usually stimulate question-asking by students. To make students more familiar with the scientific inquiry process, we developed a curriculum in developmental biology based on research papers suitable for high-school students. (...) We suggest that learning through research papers may be one way to provide a stimulus for question-asking by high-school students and results in higher thinking levels and uniqueness.

6.7 Literature

Many items are as yet on my 'to do' list: they are mentioned here, but not used in the above text yet.

Deanna Kuhn (2005). Education for thinking. Harvard University Press. excerpt

Deanna Kuhn and D. Dean (2004). Connecting scientific reasoning and causal inference. Journal of Cognition and Development, 5, 261-288.
- abstract Literature on multivariable causal inference (MCI) and literature on scientific reasoning (SR) have proceeded almost entirely independently, although they in large part address the same phenomena. An effort is made to bring these paradigms into close enough alignment with one another to compare implications of the two lines of work and examine how they might illuminate one another. The conclusion is that SR research stands to benefit from recognition that it addresses a broader set of cognitive phenomena than reasoning in contexts that are explicitly scientific, whereas MCI research stands to benefit from recognizing inter- and, especially, intra-individual variability that its methods may have masked. Data reported here based on a merging of the two methodological paradigms support a model in which individuals have available a repertory of different inference strategies or rules (reflecting different criteria for inferences of causality) from which they select variably across occasions, in a dynamic process of theory-evidence coordination.
Jacqueline P. Leighton, Rebecca J. Gokiert and Ying Cui (2005). Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments: Causal and Categorical Reasoning in Science. Paper presented at the Annual Meeting of the American Educational Research Association (AERA), Montreal, Quebec, Canada (April 2005). pdf
- from the abstract The results of the present study indicate that science assessments involve at least two substantive dimensions to which students react - causal reasoning and categorical reasoning - described in the scientific reasoning literature (Kuhn and Dean, 2004).
S.T. Levy, D. Mioduser and V. Talis, (in preparation). Episodes to Scripts to Rules: Concrete-abstractions in kindergarten children's construction of robotic control rules. pdf
Robert J. Sternberg (2003). What Is an 'Expert Student?' Educational Researcher, 32 pdf
- abstract This article suggests that conventional methods of teaching may, at best, create pseudo-experts - students whose expertise, to the extent they have it, does not mirror the expertise needed for real-world thinking inside or outside of the academic disciplines schools normally teach. It is suggested that teaching for "successful intelligence" may help in the creation of future experts. It is further suggested that we may wish to start teaching students to think wisely, not just well.

Stuart S. Yeh (2002). Tests Worth Teaching To: Constructing State-Mandated Tests That Emphasize Critical Thinking. Educational Researcher, 30, # 9, 12-17. pdf

more literature

Dutch

Jos Kessels, Ad van der Kam and Jan Tollenaar (1989). De zaak Arlet; inleiding in de kennistheorie + Handleiding voor de docent. Meppel: Boom.

M. Gall, B. Dunning and R. Weathersby (1971). Minikursus Denkvragen stellen. Groningen: Wolters-Noordhoff. Dutch adaptation: P. L. v. d. Plas and W. J. M. de Roos, 1977.

W. R. Borg, M. L. Kelley and P. Langer (1970). Minikursus effektief vragen stellen. Dutch adaptation: J. Heeringa and S. A. M. Veenman, 1977. Wolters-Noordhoff.

English

Ron Oostdam and Gert Rijlaarsdam (). Towards strategic language learning. Amsterdam University Press (USA: The University of Chicago Press).

Huub van den Bergh (1990). On the construct validity of multiple-choice items for reading comprehension. Applied Psychological Measurement, 14, 1-14.

Benjamin S. Bloom et al. (Eds) (1956). Taxonomy of educational objectives. The classification of educational goals. Book 1 Cognitive domain. David McKay.

Joyce Chapman (2005). The Development of the Assessment of Thinking Skills. University of Cambridge Local Examinations Syndicate. http://www.cambridgeassessment.org.uk/research/confproceedingsetc/publication.2005-10-13.7538460012/file/ [dead link? 1-2009]

Robert Cummins: Cross-domain inference and problem embedding. In Robert Cummins and John Pollock (Eds) (1991). Philosophy and AI. (p. 23-38) MIT.

Entwistle, N. (1995). Frameworks for understanding as experienced in essay writing and in preparing for examinations. Educational Psychologist, 30, 47-54. abstract

This article has been mentioned already in paragraph 3.4.
The kind of understanding in Entwistle might just be a level deeper than that usually tested for in items on text. At least, that is my hunch. Such will surely be the case where learning is somewhat superficial, as treated in paragraph 6.5. It might be a useful suggestion to define the topic in 6.5 in just this way: a course implemented in such a way that it does not easily allow deep understanding of the material presented (in the time available for its digestion).

Frank Friedman and John P. Rickards (1981). Effect of Level, Review, and Sequence of Inserted Questions on Text Processing. Journal of Educational Psychology, 73, 427-436.

Donald Laming (2003). Marking university examinations: some lessons from psychophysics. Psychology Learning and Teaching, 3(2), 89-96. pdf

abstract This paper looks at four simple psychophysical experiments and spells out the implications that their results have for the marking of examinations on the basis that: (i) the process of marking examination scripts is dominated by the psychology of the assessor, not by the material that is being marked; (ii) the examiner marking a psychology essay is, psychologically speaking, the same assessor as the participant who participates in a psychophysical experiment; and (iii) the psychology of assessment (in general) can be inferred from a psychophysical experiment to an extent that is impossible with examination scripts. The difference is that psychophysical stimuli admit physical measurement, from which accuracy of assessment can be calculated, while examination scripts do not. The paper finishes with some suggestions how the reliability (not necessarily the validity) of examination marking might be improved.

Elliot G. Mishler (1986). Research interviewing. Context and narrative. Cambridge, Massachusetts: Harvard University Press.

narrative analysis

Don Nix (1985). Notes on the efficacy of questioning. In Arthur C. Graesser and John B. Black (Eds) (1985). The psychology of questions. Hillsdale, New Jersey: Lawrence Erlbaum.

"This chapter focuses on the use of questioning techniques for the purpose of directly teaching inferential reading comprehension and meta-comprehension skills to children in classroom settings. (...) It is assumed that inferential comprehension is a complex process: the child must actively transform what, on the page, is a string of symbols into an inferentially integrated network of meaning. The nature of classroom questioning is viewed in terms of what impact it can have on a child's ability to perform this complex process."

J. P. Rickards (1979). Adjunct questions in text: a critical review of methods and processes. Review of Educational Research, 49, 181-196.

Claire E. Weinstein, Ernest T. Goetz and Patricia A. Alexander (Eds) (1988). Learning and study strategies. Issues in assessment, instruction, and evaluation. London: Academic Press.

Among others: Diane Lemonnier Schallert, Patricia A. Alexander and Ernest T. Goetz: Implicit instruction of strategies for learning from text. Beau Fly Jones: Text learning strategy instruction: Guidelines from theory and practice. Part IV: Evaluation of research and practice in learning and study strategies, 263-347.

Hsin-Kai Wu and Chou-En Hsieh (in press as of June 2006). Developing Sixth Graders' Inquiry Skills to Construct Explanations in Inquiry-based Learning Environments. International Journal of Science Education. pdf

abstract The purpose of this study is to investigate how sixth graders develop inquiry skills to construct explanations in an inquiry-based learning environment. We designed a series of inquiry-based learning activities and identified four inquiry skills that are relevant to students' construction of explanation. These skills include skills to identify causal relationships, to describe the reasoning process, to use data as evidence, and to evaluate explanations. Multiple sources of data (e.g., video recordings of learning activities, interviews, students' artifacts and pre/post tests) were collected from two science classes with 58 sixth graders. The statistical results show that overall the students' inquiry skills were significantly improved after they participated in the series of the learning activities. Yet the level of competency in these skills varied. While students made significant progress in identifying causal relationships, describing the reasoning process, and using data as evidence, they showed slight improvement in evaluating explanations. Additionally, the analyses suggest that phases of inquiry provide different kinds of learning opportunities and interact with students' development of inquiry skills.

Corinne Zimmerman (2005). The Development of Scientific Reasoning Skills: What Psychologists Contribute to an Understanding of Elementary Science Learning. Final Draft of a Report to the National Research Council Committee on Science Learning Kindergarten through Eighth Grade. pdf

abstract The goal of this article is to provide an integrative review of research that has been conducted on the development of children's scientific reasoning. Scientific reasoning (SR), broadly defined, includes the thinking skills involved in inquiry, experimentation, evidence evaluation, inference and argumentation that are done in the service of conceptual change or scientific understanding. Therefore, the focus is on the thinking and reasoning skills that support the formation and modification of concepts and theories about the natural and social world. Major empirical findings are discussed using the SDDS model (Klahr, 2000) as an organizing framework. Recent trends in SR research include a focus on definitional, methodological and conceptual issues regarding what is normative and authentic in the context of the science lab and the science classroom, an increased focus on metacognitive and metastrategic skills, explorations of different types of instructional and practice opportunities that are required for the development, consolidation and subsequent transfer of such skills. Rather than focusing on what children can or cannot do, researchers have been in a phase of research characterized by an "under what conditions" approach, in which the boundary conditions of individuals' performance is explored. Such an approach will be fruitful for the dual purposes of understanding cognitive development and the subsequent application of findings to formal and informal educational settings.

September 8, 2007. Contact: ben at benwilbrink.nl

http://www.benwilbrink.nl/projecten/06examples6.htm