Individual decisions and achievement testing

Ben Wilbrink

text under construction

A first version of the article will be written in Dutch, because it needs to be written very fast and as concisely as possible. Nevertheless, I will try my hand at an English text, independently of the Dutch one.

Taking Thorndike's (1904) book as the beginning of 'educational measurement,' a century of research on achievement testing lies behind us. Only a tiny fraction of that tremendous volume of publications is about student strategies. Models of students' strategic preparation for tests are almost completely lacking; the work by Van Naerssen and my own research seem to be the only exceptions. At the level of groups of students there is the highly original model developed by James Coleman, which I have applied to a dataset (Wilbrink, 1992a, b). This article, however, is restricted to the individual student model. Strategic preparation implies that the student is taken to be a decision maker trying to optimize her benefits against the expenditure of scarce resources, especially time. The term 'resources' is taken from Coleman's theoretical framework.

Key concepts/points

The article will articulate the following positions that are crucial to the possibility of a general model of strategic preparation for achievement tests.

1) The individual test is taken to be part of a series of tests that together define an examination or a course. The model will explicitly consider the way results on individual tests are combined into the overall result on the examination. Because the combination rules have typically been formalised in institutional regulations, it is possible to translate them objectively into functions on the dimension of test outcomes, be they points, grades, or whatever. If one wants to, these functions may be regarded as utility functions. They are objective utility functions, just as financial outcomes of transactions may be regarded as objective utilities. Of course, objective utilities might be made subjective by incorporating the risk aversion etc. of the student; the article will not venture into this difficult terrain (see for example Schlaifer, 1959, for an introduction).

2) The formal regulations generally will allow some compensation between the results on different tests in the course. It follows immediately that the strategic prospects of different students will be the same only on the very first test. On any other test different students might have different histories in terms of scores obtained on earlier tests, and therefore they face strategically different situations. Even though the formal regulation might treat every test in the same way, in actual fact there will be large differences between students in the strategic situation they face in preparation for test X.

3) The complex relations between the tests comprising the course or examination make it difficult or impossible to find a closed solution for the strategic options confronting the student at any particular test, except for the last one. Once all tests but the last have been sat, the strategic situation allows a definite optimizing strategy; in fact this strategy can be obtained using a Van Naerssen type of model.

4) Because it is possible to evaluate optimal strategies for the last test, it should also be possible to model optimal strategies for the next-to-last test, by enumerating all possible outcomes on the next-to-last test itself.

5) The trick cannot, however, be repeated for all other tests in the series, because the sheer number of possible combinations of outcomes on all the tests considered makes this infeasible, even on a very fast computer. A satisficing solution is to treat each of the remaining tests as if it were the next-to-last test.

6) In order to obtain concrete solutions for optimal strategies it will be necessary to assume that a particular learning model applies. A number of different models might be used to obtain solutions, in order to determine how sensitive the optima are to the learning model assumptions. Because actual decisions more often than not will be of the 'go' or 'stop' character, the choice of learning model is not a very critical feature of the model.

7) An unexpected result of the model is that it becomes possible to replace the initial formal utility function on test results with a secondary, realistic utility function that is determined by the expected amount of time necessary to complete the course or examination. This is an important result, because it allows a confrontation between the model predictions and the way utility functions have been used in the literature on criterion-referenced measurement.

8) The description given thus far is rather abstract. It is possible to actually construct the model; in fact two different constructions have been carried out: an analytical one, and one by simulation. Each construction has been used as a check on the correctness of the other. The model-by-simulation makes it possible to explain the model without using statistics and mathematics, so that students as well as their teachers can understand the inner workings of the model. The concrete form taken in this constructive effort is that of Java applets, allowing the model to be used everywhere on any platform.

9) To make the model work, a lot of assumptions are necessary, and a lot of technical problems need to be solved. Important assumptions are a) that every test looks to the student as if it is randomly sampled from a domain of possible test questions on this course content, and b) that the result on a preliminary test sampled from the same domain is available as information from which to derive a predictive distribution for the test score that will be obtained.

10) Expected utility functions have not been mentioned yet. They might be constructed using the secondary utility functions mentioned above, a chosen learning model, and the prediction model. Because these expected utilities are in fact expected amounts of time necessary to pass the course or examination, the time still to be invested in preparation for the test to come may be subtracted. The resulting function will show an optimal point; if it is a real optimum (its tangent being horizontal), it indicates the optimal investment; otherwise the student has already passed the optimum and may stop her preparations now. The optimum is the same as that resulting from the model without using the concept of the secondary utility function.

11) The model as constructed is in fact a chain of consecutive modules; the first one is the binomial model, the very last one is the expected utility module. The model construction itself was started by implementing the betabinomial model. Because of the cumulative character of the model, with later modules also using the techniques of earlier ones, it proved to be a difficult journey to develop the later modules in the series and to get the necessary 'closure' of the model, i.e. optimal strategies. For a long time it was not even certain whether such closure was possible at all in the general case. As early as 1995 the technique of indifference curves was used to find optimal solutions for the special class of situations where a given time budget had to be allocated optimally to the preparation for two tests at the same time. That solution was not a general one, however, and there was no hope of making it general. Only after a number of puzzles in the combination of test results had been solved did it become clear that a closed solution for the last-test situation would allow a definite model for the next-to-last test situation as well.
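The last-test case described in points 3, 9 and 10 can be illustrated with a minimal sketch. It is not the author's implementation: the exponential learning curve, the 40-item test, the cutoff of 28 and the fixed time cost of failing (`resit_cost`) are all hypothetical choices made only for illustration. The sketch assumes a binomial test model and searches for the number of extra preparation hours that minimizes expected total time.

```python
from math import comb, exp


def pass_probability(mastery, n_items, cutoff):
    """P(score >= cutoff) on a test of n_items randomly sampled items (binomial model)."""
    return sum(
        comb(n_items, k) * mastery**k * (1 - mastery) ** (n_items - k)
        for k in range(cutoff, n_items + 1)
    )


def mastery_after(hours, start=0.5, rate=0.05):
    """Hypothetical learning model: the unmastered fraction decays exponentially."""
    return 1 - (1 - start) * exp(-rate * hours)


def expected_total_time(hours, n_items=40, cutoff=28, resit_cost=60.0):
    """Extra preparation time plus the expected time lost by failing the last test.

    resit_cost is an assumed, fixed estimate of the hours a failure would cost.
    """
    p_pass = pass_probability(mastery_after(hours), n_items, cutoff)
    return hours + (1 - p_pass) * resit_cost


def optimal_hours(max_hours=100):
    """Whole number of extra hours that minimizes expected total time."""
    return min(range(max_hours + 1), key=expected_total_time)


if __name__ == "__main__":
    best = optimal_hours()
    print(best, round(expected_total_time(best), 2))
```

If the minimum falls at zero extra hours, the student has already passed the optimum and may stop preparing; swapping in another learning curve for `mastery_after` shows how sensitive the optimum is to that assumption.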

The student's best interest in achievement testing

Students supposedly prepare themselves for sitting achievement tests. This preparation can be of many kinds: adequate to the task, better or worse than that, misdirected, taking calculated risks, doing the best one can given scarce resources in abilities or available time or both; even cheating belongs to the available strategies. It should be evident to all stakeholders involved that the results on achievement tests, and therefore educational results at large, depend in large measure on the time and energy students spend in test preparation. This preparation, of course, extends over the whole of the course; it is not primarily a thing of the last day or last minute. Students have many opportunities to choose strategies that differ in how adequately they serve the cause of achieving good results.

What theories are available that adequately describe or prescribe these student strategies? Theories of motivation surely describe important aspects of these student strategies, as might research on study method or learning style. But these are not the kind of theory adequate to the task of describing strategies of test preparation. Theories of procrastination do come close, however, because they might describe the decision-making of students on the way they spend their time. They are not wholly adequate because they try to model procrastinating behavior independently of the characteristics of the test or examination.

What about psychometrics or educational measurement? They are indeed about the characteristics of the tests. The problem with psychometrics and its educational measurement branch is that they typically do not consider student strategies at all. It is even tacitly assumed that students do not prepare themselves for the test, just as should be the case if they were to sit psychological tests proper, such as tests of intellectual abilities or personality. In terms of the distinction made by Cronbach and Gleser (1957) between institutional and individual decision making, psychometrics is a purely institutional model. Most of the decision-making models within psychometrics are institutional models, assuming student strategies to be non-existent or at least independent of whatever changes are made in pass-fail decision procedures or in the tests themselves. The line of research followed by Wim van der Linden and his colleagues is quite typical of this situation.

What is missing in the literature is a theory linking strategic decision making in test preparation to characteristics of the test or examination and the decisions contingent on its possible outcomes. A prototype of such a theory was presented in the seventies by Robert van Naerssen; it was limited to the situation of pass-fail test scoring. It so happens that the prototype will nevertheless be a cornerstone of a more general theory covering the situation of consecutive tests making up a course or examination, because it describes the special situation of the last test in the series. The reason is that the results on the foregoing tests are already in, leaving no degrees of freedom on the last test, even in situations where the total score is the mean of the individual test scores obtained.

Literature

James S. Coleman (1990). Foundations of social theory. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Robert Schlaifer (1959). Probability and statistics for business decisions. New York: McGraw-Hill.

Edward L. Thorndike (1904). Theory of mental and social measurements. New York: The Science Press.

R. F. van Naerssen (1970). Over optimaal studeren en tentamens combineren. Rede. [in Dutch]

R. F. van Naerssen (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 431-446.

Ben Wilbrink (1992). Modelling the connection between individual behaviour and macro-level outputs. Understanding grade retention, drop-out and study-delays as system rigidities. Also: The first year examination as negotiation; an application of Coleman's social system theory to law education data. In Tj. Plomp, J. M. Pieters & A. Feteris (Eds.), European Conference on Educational Research (pp. 701-704; 1149-1152). Enschede: University of Twente. Papers: author.

Tentamen model (examination model): upgrade

Ben Wilbrink

28 September 2006

In 1957 (or perhaps only in their second edition) Cronbach and Gleser emphasized that, next to institutional models, there are also individual models for decisions of all kinds. Since then hardly any serious attempt has been made to construct such models for tests in education. The exception is Van Naerssen's (1970, 1978) work on the tentamen model, and my own work inspired by it, which is the subject of this article. More generally, Hofstee (1978) has drawn attention to the many ways in which ordinary people deal strategically with the questions that professionals may put to them in all kinds of situations.

Everything that happens in education by way of tests, examinations and selection therefore still rests on unspoken and, at the very least, naive notions about the way pupils and students direct their strategic behavior (Cronbach and Gleser, Hofstee) at those tests. Perhaps even more serious is the unspoken presupposition that students do not react at all to policy changes, for example in criterion-referenced testing. In the school of Wim van der Linden the student as strategist does not appear at all: all strategy is reserved for the institution or its representatives. That is of course an untenable position. A fully worked-out tentamen model shows how serious this omission is. Incidentally, it holds for almost all psychometric activities that the tested citizen is a negligible factor in them, a quantité négligeable. For applications in the medical sector that is a justified methodological requirement for pure measurement, but in education it goes against the heart of the matter: that people work for those examinations. Strategic preparation for tests and examinations: that is called studying.

- examination: a set of examination components.
- examination regulation: how results are combined into the final outcome. The combination rules are always compensatory; in extreme cases the compensation is zero. The last test to be taken is always such an extreme case: its result is decisive. There are two important insights here: 1) the formal rules for combining results are compensatory, which is a matter of degree; 2) the last component to be taken is by definition a situation in which one either completes the examination or does not. Note that formally compensation is also permitted on the last test, to the extent laid down in the regulation, but de facto the room for compensation has been completely removed. This can be qualified somewhat: results are possible that make another component the actual last examination component after all. A third important insight is then: the individual strategic situation differs from the formal regulation. What holds in the extreme for the last component also holds for all other components, except the first. For the first component to be taken the strategic positions of all participants are equal; after that they no longer are. Formal regulations easily give the impression that the permitted compensations are always at issue for everyone, but de facto that is not the case. In reality strategic positions differ considerably, which makes it almost always impossible, because of the runaway number of permutations of all possibilities still open, to construct an exact tentamen model, except for the last and the next-to-last examination components. A surprising outcome of years of work on the development of the model is that modeling this last component provides the key, and that precisely this situation lends itself to a tentamen model such as Van Naerssen presented in 1970 for a rather extreme situation: tests that the student must pass, if need be after as many resits as are required.
Unlimited resitting is not a realistic model, but it is easily replaced by an estimate of the time involved in whatever is needed when the necessary points are not obtained on that last test.
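A small sketch may make this point concrete: a formally compensatory rule turns into a personal threshold on the last test. It assumes, purely for illustration, that the regulation requires a minimum mean grade over equally weighted tests on a 1 to 10 scale; the function names and example grades are hypothetical.

```python
def required_last_score(previous_scores, required_mean):
    """Score needed on the last test so that the mean over all tests reaches required_mean."""
    n_tests = len(previous_scores) + 1
    return required_mean * n_tests - sum(previous_scores)


def last_test_situation(previous_scores, required_mean, min_score=1.0, max_score=10.0):
    """Describe the de facto strategic situation on the last test (assumed 1-10 scale)."""
    needed = required_last_score(previous_scores, required_mean)
    if needed <= min_score:
        return "any score suffices"
    if needed > max_score:
        return "cannot pass on this test"
    return f"needs at least {needed:g}"


if __name__ == "__main__":
    # Same regulation, different test histories, different thresholds.
    for history in ([7, 7, 7], [5, 5, 6], [9, 9, 9]):
        print(history, last_test_situation(history, required_mean=6.0))
```

Under the same formal rule, the three histories in the example lead to three different de facto situations, which is the sense in which the strategic position is individual and not given by the regulation alone.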

=========================================== 26-9-2006

In preparing for tests the individual student continually takes tactical decisions that are perhaps best described as aimed at obtaining the best possible result for the most economical expenditure of time. It is just like life itself. Cronbach and Gleser laid the first stone for modeling all this. Very little was subsequently done with it, until Van Naerssen, partly on the basis of his decision-theoretic study of the selection of drivers, worked it out for a specific situation (pass-fail scoring and unlimited opportunity to repeat failed tests) under the name 'tentamenmodel.' Despite a series of follow-up publications, he did not succeed in definitively bringing the model to life. A possible explanation lies in the nature of the model to be developed: not only does it immediately become quite complicated for even somewhat realistic situations, but each of the model's constituent parts must also have been worked out successfully before the model can make a first test flight.

Consider. Examination regulations usually allow compensation between components, so that the model must be directed at the individual test and yet keep the examination situation in view.

The student then minimizes the time needed to pass the examination; at least, let us assume that something like this is the student's underlying strategy. What does that time consist of? That is quite complicated, because along the way expectations are continually in play about how much time will still be needed to secure the remaining part.

Those expectations are in fact predictions. What is needed, for example, to make an adequate prediction of the score to be obtained on the next test? That requires, one way or another, an estimate of one's own mastery of the examination material. On the basis of what information can such an estimate be made? What is the adequate statistical technique for it?

A prediction requires knowing something about the questions that may be asked. Is it reasonable to suppose that those questions come at random from a large collection of possible test questions? Even though the teacher constructs the test in a quite different way? If the answer is yes, the binomial model can be used as a crowbar on that prediction. For then mastery is defined as the percentage of that conceivable collection of questions that would be answered correctly if presented.

To get an impression of one's own mastery, a practice test can be taken, also drawn at random from that same collection. Or other information can be translated into terms of a conceivable practice test, now that we are operating virtually anyway. Such a practice-test result is an empirical test of one's own mastery; a likelihood distribution can be constructed for that mastery. Once that likelihood distribution is known, a prediction can be constructed on the basis of it and of the binomial model we assume. Its form can be a betabinomial distribution, not unknown in psychometrics, but it need not be, and we strive for generality. The betabinomial is nice, but that specific model is not really needed.
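The prediction step can be sketched for the betabinomial special case. The assumption, not stated in the text, is that the likelihood from the practice test is normalized as if under a uniform prior, so that mastery follows a Beta(x + 1, n - x + 1) distribution after x correct answers out of n. The function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinomial_pmf(k, n, a, b):
    """P(k correct out of n items) when mastery follows a Beta(a, b) distribution."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def predictive_distribution(practice_correct, practice_items, n_items):
    """Predicted score distribution on an n_items test, given a practice-test result.

    Assumes a uniform prior on mastery, so the practice result gives
    Beta(practice_correct + 1, practice_items - practice_correct + 1).
    """
    a = practice_correct + 1
    b = practice_items - practice_correct + 1
    return [betabinomial_pmf(k, n_items, a, b) for k in range(n_items + 1)]


if __name__ == "__main__":
    dist = predictive_distribution(practice_correct=15, practice_items=20, n_items=40)
    mean = sum(k * p for k, p in enumerate(dist))
    print(round(mean, 2), round(sum(dist[28:]), 3))  # expected score, P(score >= 28)
```

Any other distribution for mastery could replace the beta; only `betabinomial_pmf` would change, which fits the remark that the specific model is not essential.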

[This is going well. Still to come: utility, learning, expected utility, last test, next-to-last test, second-generation utility, earlier decision-theoretic research, implementation, implications, conclusion, literature. I should be able to write all of that out in about an hour as well. That still does not give me a finished introduction, which needs somewhat more, and later I can condense the details of the separate steps.]

[Why would one want such an individual model, when we already have good institutional models? That is precisely the crux: without knowledge of the student's strategic concerns, those institutional models hang completely in the air. What is optimal for an institution, for example how to design the examination regulation, is by definition derived from how students deal strategically with the regulation so designed.]

[The first topic must be that there is a series of tests making up the examination, the program or the course. That type of problem has been treated before (Dahllöf; Van der Linden and Vos). The point is that the student always stands at a particular point in that series, where preparation for the next test is at issue. The data of the series are thus concentrated in that specific situation.]

[The second topic can then be that in the general situation of compensatory combinations, the last test (LT) can de facto amount to a pass-fail situation. Not always, but the exception is more of a luxury situation in which every result is good, though some are better than others. That is not particularly hard to work out. The interesting thing is that the LT situation is the situation modeled by Van Naerssen.

The idea now is in effect to begin the treatment of the model with that LT, so not with module 1, 2, etcetera. A kind of short circuit, then, which can be intensified by simply assuming a betabinomial model.

I thereby abandon a presentation of the model that follows its actual construction. To get things across well rhetorically and didactically, it is also far more convenient not to try the reader's patience, and to get started at once. Wow. This is going well. I then only need to bring in a simple class of learning models, and the job is done. It is possible that an LT result is not the last one, because another test must still be retaken. No problem, then that other test has become the LT.]

[The LT problem shows very clearly that the actual situation for the student really is totally different from what the examination regulation suggests: the regulation may cheerfully state that full compensation applies on the LT, while the de facto strategic situation is that of threshold utility. This holds in general for the other tests in the examination too: the de facto situation for the student is determined by personal circumstances (above all the test history), and so deviates seriously from the formal situation as described in the examination regulation. The implication is (but haven't we really always known this?) that the group of students about to take test C is not in a strategic situation that can be regarded as homogeneous.]

November 9, 2006 \ contact ben at benwilbrink.nl

http://www.benwilbrink.nl/projecten/spa_article.htm