True utility: What the result is worth (the student's calculation)

Module nine of the SPA model: Utility functions (second generation)

Ben Wilbrink

some highlights of this module

Figure 9.1 illustrates the main points of the 'true' - second generation - utility function.

Figure 9.1.1 Typical form of (second generation) utility functions (blue: replacement learning; red: accumulation learning) where some compensation between low and high scores is allowed. In the depicted case the first generation (green) function represents the ruling that compensation of 5 points above as well as below the reference is allowed in this individual student's case.

Click Figure 9.1.1 or 9.1.2 for a view of the menu as well. To use the applet itself, go to the applets page 9.

How test scores are valued

In the rather complex case of educational assessment, researchers have perhaps been too eager to solve the problem of specifying the utility of scores on a particular test. The expert's approach should have been to ask whether this really is the right formulation of the problem. Taking a step back, one may observe that students find themselves in an economy of compensation points. Just as in an economy of money, utility then is utility of compensation points, and it is specified on the range of possible compensation points. Subjective utility will normally be risk averse in the positive range and risk seeking in the negative range: adding more positive points to an already comfortable amount gathered earlier becomes less attractive.
Again it is possible to make this notion exact by restricting attention to the combination of the Next To Last Test (NTLT) and the Last Test (LT). The score on the NTLT will alter the budget of compensation points (or leave it as it is), and therefore the optimal investment on the LT. A positive number of compensation points means that less time is needed to obtain a pass on the LT; a negative number means that more time is needed. Equating the value of extra compensation points to these differences in investment time allows the construction of a utility function. Call this a second generation utility function, to distinguish it from the type of utility function treated in chapter four, which is used here to generate the second type.
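The construction just described can be written compactly. The notation is not the author's and is introduced here only for illustration: r is the reference (cutoff) score on the LT, c the number of compensation points carried over from the NTLT, and T*(x) the expected time the optimal strategy needs to pass the LT when its cutoff score is x. It is assumed that one compensation point shifts the LT cutoff by one score point.

```latex
U_2(c) = T^{*}(r) - T^{*}(r - c)
```

Under these assumptions U_2(0) = 0, positive compensation points (c > 0) count as time saved, and negative points (c < 0) count as extra time needed.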

Figure 9.1.2 Case identical to that in figure 9.1.1, but without the advantage of retesting the NTLT after passing the LT on its reference score without reaching the score that compensates for the negative compensation points. Utilities left of the reference now are lower.

In the case of negative compensation points at least two situations are possible that complicate matters. To begin with, it is possible to pass the LT on its original reference score while not scoring enough extra points to compensate for the budget of negative compensation points. In this case, the student will be allowed to retest the NTLT. The extra investment of time needed for retesting the NTLT generally will be less than that needed to retest the LT on the high score that will compensate for the negative compensation points. The model will allow the evaluation of the chances for this to happen. The utilities for negative compensation scores therefore will get somewhat higher.

Figure 9.1.3 Case identical to that in figure 9.1.2, but now also without the option to retest the NTLT. The plot shown is the thumbnail utility function plot.

The second complicating situation is that the optimal strategy for obtaining a score on the LT that compensates for the negative compensation points may take more time than the sum of the optimal strategy to pass the LT on its reference score and the optimal strategy to pass the retest of the NTLT on its reference score. Remember: the reference score on the NTLT depends on the compensation points the student has earned already. Using the applet's options 911, 912 and 913 it is possible to turn one of the corrections, or both, off. Figure 9.1.2 illustrates the (small) effect of the first type of correction by leaving it out; otherwise the case is identical to that of figure 9.1.1. Figure 9.1.3 illustrates the effect of leaving out the second type of correction also.
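One possible reading of the second correction is that the student takes whichever route is cheaper in time. The sketch below is not the applet's code: the function name, the parameter names and all time values are hypothetical, and the probabilistic first correction (the chance of passing the LT without reaching the compensating score) is left out.

```python
def time_cost_with_deficit(t_lt_compensating, t_lt_reference, t_ntlt_retest,
                           allow_retest=True):
    """Time needed to clear both tests when carrying negative compensation points.

    t_lt_compensating: time to reach the higher, compensating score on the LT
    t_lt_reference:    time to pass the LT on its own reference score
    t_ntlt_retest:     time to pass a retest of the NTLT
    All three are hypothetical inputs; in the SPA model they would come
    from the optimal strategies.
    """
    if not allow_retest:
        return t_lt_compensating
    # Assumed correction: take the cheaper of the two routes.
    return min(t_lt_compensating, t_lt_reference + t_ntlt_retest)


# Hypothetical time values, in arbitrary units.
with_retest = time_cost_with_deficit(0.83, 0.35, 0.30)
without_retest = time_cost_with_deficit(0.83, 0.35, 0.30, allow_retest=False)
print(with_retest, without_retest)
```

With these made-up numbers the retest route is cheaper, so switching the correction off raises the time cost and lowers the utility left of the reference, which is the direction of the effect described for figure 3.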

Scientific position

characteristics of the approach chosen

Special points

Generalness

ultimate goals, ultimate utilities?
The student's world of grade points, and the attempt to map its crucial aspects into objective utility functions, do have a flavour of artificiality. Ultimately, however, the student's future position in society will depend on the kind and quality of her educational results. To be useful, exercises in utility assignment need not be supplemented by research on the relation between education received and labour market positions. Nor is it necessary to translate utilities into terms of human capital in order to estimate expected future earnings for the individual or society at large. If you would like to have a look at the relevant literature on these broader themes, consult the international literature list in Wilbrink and Dronkers (1993) Dilemma's.

Empirical support

Application

Project history

The second generation utility function is a late discovery - August 2005 - that uses the model's optimal strategies (chapter 8) to construct utility curves based on future savings in resources. In this way outcome utilities can be equated with future investment of resources. Shifting the cutoff score on the Last Test produces a series of different optimal strategies; it is just this series that allows the construction of the second generation utility function in the case of the Next To Last Test, and therefore also in the case of any test (except the Last Test).
Think of savings in future investment of resources in terms of euros, and it will be clear that utility functions on the range of euros are implied here. The same holds where the resources consist of preparation time for the Last Test, which follows the test being prepared for now. Of the many possible utility functions on money or time, one is especially interesting: the neutral or objective function that is linear in time or money units. Subjective functions are departures from the objective one, expressing attitudes such as risk avoidance for positive amounts and risk seeking for negative ones.
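The difference between the objective and a subjective function can be made concrete with a small sketch. The power-function form and the curvature value 0.5 are assumptions chosen for illustration; they are not taken from the SPA model.

```python
def objective_utility(x):
    """Neutral (objective) utility: linear in time or money units."""
    return x


def subjective_utility(x, curvature=0.5):
    """Hypothetical subjective utility: concave for gains, convex for losses."""
    if x >= 0:
        return x ** curvature
    return -((-x) ** curvature)


# A sure gain of 1 unit versus a 50/50 gamble between 0 and 2 units.
sure_gain = subjective_utility(1.0)
gamble_gain = 0.5 * (subjective_utility(0.0) + subjective_utility(2.0))

# A sure loss of 1 unit versus a 50/50 gamble between 0 and -2 units.
sure_loss = subjective_utility(-1.0)
gamble_loss = 0.5 * (subjective_utility(0.0) + subjective_utility(-2.0))

print(sure_gain > gamble_gain)   # the sure gain is preferred: risk avoidance
print(sure_loss < gamble_loss)   # the gamble is preferred: risk seeking
```

Under the linear objective function both comparisons would be ties, which is what makes it the neutral reference.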

September 22. The construction of the second generation utility functions. First, construct for the Last Test the function that shows, for a range of possible cutting scores around the reference score, the corresponding optimal time needed to succeed (not the immediate preparation time corresponding to that optimum). The second step is to use this function to construct the expected savings, or dissavings, in future time corresponding to compensating scores on the Next To Last Test. Plotting these results gives the utility function on scores on the Next To Last Test. There now exist two utility functions: the first represents the factual situation the student finds herself in according to the ruling in force, the second represents what this factual situation means in terms of possible future savings in the necessary investment of preparation time. The first kind of utility curve has not become obsolete; on the contrary, in an important way it represents the thinking of the teachers (the institution) on what is proper in assessing students. The newly discovered catch is that the formal rules may be more or less removed from the real world of the students. Plotting the first and second generation utility functions will immediately make clear how much the two depart from each other in any particular case. As of September 22 it is possible to inspect this in the thumbnail utility function plot produced by the strategy applets #8. It will be seen that the flatter accumulation learning produces bigger discrepancies than replacement learning does. The discrepancies are also bigger the more negative compensation points are allowed. The reason for the latter kind of discrepancy is that learning is not a linear function of time invested, while the usual kind of ruling for the combination of scores or grades assumes utility to be linear in the scores or grades obtained.
The SPA model then allows investigation of the parameters that widen the gap, or might help to close it. Probably the best way to bring first and second generation utility curves closer together is to weight compensation points according to the corresponding extra investment of preparation time (this has yet to be implemented, but should not prove complex). Why is it important to align first and second generation utility functions? The ruling on the way scores or grades are combined signals to students the importance of obtaining those scores; this signal had better be in line with what it empirically takes to achieve these scores, to prevent students from miscalculating when choosing their preparation strategies.
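Why a linear ruling departs from a time-based utility can be seen in a toy calculation. The exponential learning curve, its rate, and the deterministic link between mastery and cutoff score are all assumptions made for this sketch; the SPA model's own learning models and stochastic test model are not reproduced here.

```python
import math


def time_to_reach(mastery, rate=1.0):
    """Time needed to reach a mastery level under the hypothetical
    learning curve m(t) = 1 - exp(-rate * t)."""
    return -math.log(1.0 - mastery) / rate


n_items = 20
cutoffs = [13, 14, 15, 16, 17]

# Toy assumption: passing on cutoff c requires mastery c / n_items.
times = [time_to_reach(c / n_items) for c in cutoffs]

# Extra time needed for each one-point rise of the cutoff.
increments = [later - earlier for earlier, later in zip(times, times[1:])]

for cutoff, step in zip(cutoffs[1:], increments):
    print(f"raising the cutoff to {cutoff} costs {step:.3f} extra time units")
```

Each further point costs more time than the previous one, whereas a linear ruling values every point the same. The gap therefore grows as more negative compensation points are allowed.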

The new utility function will probably allow the construction of expected utility curves that immediately represent strategic expectations. All in all, it will take some time to program all of this and to report everything in the chapters on the Last Test, on the Next To Last Test, on Utility curves (the Ruling, a name that will no longer be appropriate), on Expected Utility curves, and possibly again in the chapter on the Next To Last Test. It is a big thing. I know of nothing of the sort in the psychometric literature, but I have to check the economic literature on decision making.

Ultimately this new development probably will lead to a reshuffling of the cumulative sequence of modules. The new sequence might well become: 1. The Generator, 2. The Mastery-Envelope, 3. The Predictor, 4. The Learning, 5. The Last Test, 6. The Next To Last Test (Compensation), 7. The Utility of Outcomes, 8. The Expected Utility of Time Investments.

October 23. Programming the second generation utility function involved many changes in many places in the program. Mistakes are unavoidable in such an operation. Because it was not possible to test the program after every individual addition, the resulting chaos made it very difficult to locate the logical and other errors that had been introduced inadvertently. The work on the SPA model was slowed down for more than three weeks.

August 27, 2005

I have a breakthrough in the construction of the utility function. I want to record and date the idea here, so that I can then work it out on my public website.

1. As you know, I construct utility over test scores in an objective way, given the ruling for combining test scores. That ruling comes down to higher scores being able to compensate for lower scores elsewhere, and vice versa.

1a. For that objective construction, incidentally, a more principled construction is possible, over compensation points in the abstract so to speak, and just as objective, of which the utility function for a specific case is a derivative. I have already indicated this on the site, but not worked it out. It should yield the same function, of course, but I still have to demonstrate that.

2. For the last test in a series we are dealing with a typical Van Naerssen case. The optimal strategy can be determined unambiguously, given the score that must at least be obtained.

2a. If you vary the score to be obtained systematically, you can plot the corresponding optima, or better: the gain or loss relative to the optimum belonging to the initial, actual cutoff score.

2b. On the next to last test in the series compensation points can be scored; that indicates, as it were, the range of the function of 2a that actually matters.

2c. In fact, the function constructed in this way is the utility function for scores on the next to last test, for this student, with these practice test scores.

Voilà.

Once all this has been worked out, I will burn a CD-ROM on which the whole model is available, as far as it is ready.

===========================================

August 30, 2005

On Saturday morning the idea was only a day old. Working it out will keep me busy for a while yet. What it is now beginning to look like is the following.

This concerns the construction of the utility function; the optimal strategy for the next to last test will not change because of it.

The objective utility function as a representation (map) of the examination ruling, taking into account the strategic position of the individual student, is in fact a stopgap, because there is no other way to determine utility objectively. For that, it is precisely the model that is needed.

In working out the optimal strategy for the next to last test, the utility function constructed in this way turns out to play no role, at least not as a utility function.

Intuitively one senses that true utility depends on the learning curve and various other parameters. All of that is found precisely in the examination model.

The data needed to determine the optimal strategy for the next to last test make it possible to construct that alternative utility function; it comes for free, so to speak.

That new utility function is utility in terms of what is saved on future investments (time), or what is needed in addition.

The old utility function does not conflict with it, nor is it wrong, but it is a different conception. So it now looks very much as if the construction of utility functions as currently described in the chapter concerned can be dropped in its entirety. The information represented in it, the room for compensation, nevertheless keeps playing the same role in determining the strategy for the next to last test, and can therefore be discussed there.

In short: the only real utility function needed to determine optimal strategies is that of threshold utility for the last test or, taking the expectation, the probability of passing that test.

Only after the strategy for the next to last test (or any other test in the series except the last) has been dealt with can the utility function be treated, in relation to the literature, and after that the function of expected utility over the possible preparation trajectory. What the latter will look like I have no notion yet; I actually expect it to be a function with a realistic maximum that coincides with that of the optimal strategy. If it turns out otherwise, I have made a modelling error.

It would be handy to illustrate this brief exposition with some pictures, but I am busy programming that and am not having much luck with it at the moment. I can, however, construct something offhand.

Case: last test, reference 15 out of 20 items. In a particular case the optima, were the cutoff score to lie at 13, 14, 15, 16 or 17 respectively, are:

cutoff score on the last test                           13    14    15    16    17
expected time needed to pass                           .02   .17   .35   .55   .83
difference from reference point 15                    -.33  -.18   0.0   .20   .48
utility of compensable scores on the NEXT TO LAST test .33   .18   0.0  -.20  -.48

The values in the last row can be rescaled, so that the reference score gets utility 1, or in whatever other way you prefer.
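The arithmetic of this case can be retraced in a few lines. The time values are the ones given in the case above; the variable names, the mapping of an LT cutoff to compensation points (reference minus cutoff), and the choice to rescale by adding 1 are assumptions of this sketch.

```python
cutoffs = [13, 14, 15, 16, 17]                  # possible cutoff scores on the last test
expected_time = [0.02, 0.17, 0.35, 0.55, 0.83]  # expected time needed to pass
reference = 15

t_ref = expected_time[cutoffs.index(reference)]

# Assumed mapping: a cutoff of 13 corresponds to +2 compensation points
# earned on the next to last test, a cutoff of 17 to -2 points.
# Utility is the time saved relative to the reference cutoff.
utility = {
    reference - cutoff: round(t_ref - t, 2)
    for cutoff, t in zip(cutoffs, expected_time)
}
print(utility)   # {2: 0.33, 1: 0.18, 0: 0.0, -1: -0.2, -2: -0.48}

# One possible rescaling: shift so that the reference score gets utility 1.
rescaled = {points: round(u + 1.0, 2) for points, u in utility.items()}
print(rescaled)  # {2: 1.33, 1: 1.18, 0: 1.0, -1: 0.8, -2: 0.52}
```

The steps below the reference (0.20 and 0.28) are larger than those above it (0.18 and 0.15), so the function is not linear in compensation points.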

Simple, isn't it? Child's play. But it takes an entire examination model to make this possible.

===========================================

September 1, 2005

The following is to be added to the exposition.

The construction of the utility function then becomes the combination of threshold utility with, around it, what compensation points subtract from or add to it. In a sense that is a combination of two utility functions.

More important is that this new function is of the same kind as the original utility function, under the interpretation that the latter depicts what compensation points are approximately 'worth' in terms of time to be spent. Formal rulings are mostly linear, because we have long been used to averaging grades (adding up errors).

It should then be possible to compare the two utility functions directly with each other, and above all it should be possible to bring the two utility functions closer together by changing the formal ruling (both change as a result).

This yields interesting possibilities for policy and for research. After all, it seems on the face of it a good thing not to let the formal ruling deviate too much from what for most students are in fact the possibilities for gain and loss. A matter of transparency, one would say.

But it is not simple. The qualification 'for most students' is needed because the new utility function depends not only on the compensation points this student already holds in her portfolio, but also on the chosen strategy, that is, the direct investment the student is prepared to make in further preparation for this test.

What is to be investigated, then, is what happens when one succeeds in making the formal compensation ruling equivalent to the situation for the student who can follow her optimal strategy. Is there a unique solution for that? Without model calculations that cannot be said for certain, but most probably there is. A complication is that, for example, test characteristics may also matter.

In short, the construction of the utility function proceeds, as it were, in tranches:

1. without a learning model and without a stochastic model:
1a. as a translation of the formal compensation ruling (formal)
1b. taking into account the compensation points this student already holds (individual)

2. after deploying the SPA model a new utility function can be constructed
2a. as a translation of the situation for the student with an optimal strategy (time involved)
2b. ditto, but under a formal ruling chosen such that the associated functions 1b and 2a lie close together (transparency)

The new utility function (2a or 2b) can be mapped to a clear function of expected utility; the latter, incidentally, has no optimum of its own; for that, the time to be invested directly has to be added in. After all, utility is now stated in terms of time, and so is expected utility, so time can meaningfully be added to it. If I am not mistaken, the result is a kind of curve resembling what I constructed in the 1994 Cito presentation, but that is about where the resemblance ends.
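Why expected utility needs the directly invested time before it shows an optimum can be illustrated with a toy curve. The saturating form of the expected savings, its ceiling and rate, and the grid of candidate preparation times are all hypothetical; the direct preparation time enters the sum as a cost.

```python
import math


def expected_future_savings(t, ceiling=1.0, rate=3.0):
    """Hypothetical expected utility, in time units, of preparing for t units:
    it keeps rising with t but levels off."""
    return ceiling * (1.0 - math.exp(-rate * t))


def net_expected_utility(t):
    """Expected future savings with the directly invested time counted as a cost."""
    return expected_future_savings(t) - t


# Candidate preparation times from 0.00 to 2.00 in steps of 0.01.
grid = [i / 100 for i in range(0, 201)]

best_gross = max(grid, key=expected_future_savings)
best_net = max(grid, key=net_expected_utility)

print("expected utility alone peaks at the edge of the grid:", best_gross)
print("net of direct time the optimum is interior:", best_net)
```

For this particular curve the interior optimum lies near ln(3)/3, about 0.37 time units, while expected utility on its own simply keeps rising with preparation time.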

It all seems somewhat complicated, but the basic line is simplicity itself. Programming it all is quite a challenge again; I now have to start on that, and it will take some time. When that is more or less completed, I will let you know (progress can be followed to some extent on the website).

What about Van Naerssen's examination model? His 1970 proposal turns out to be the core of this more elaborated model; the story about utility is in a sense an extra that turns out to follow from it. Chapeau to Bob.

Java code

Testing the applet

Literature

For the literature on utility see chapter four.

Wilbrink, Ben, and Jaap Dronkers (1993). Dilemma's bij de groei van de deelname aan hoger onderwijs. Zoetermeer: reeks Achtergrondstudies van het Ministerie van Onderwijs en Wetenschappen. ('s-Gravenhage: DOP)

Bishop, John H. (2004). Drinking from the Fountain of Knowledge: Student Incentive to Study and Learn - Externalities, Information Problems and Peer Pressure. Cornell, Center for Advanced Human Resource Studies, Working paper 04-15.

from the abstract: This paper reviews an emerging economic literature on the effects of and determinants of student effort and cooperativeness and how putting student motivation and behavior at center of one's theoretical framework changes one's view of how schools operate and how they might be made more effective. (...) Student effort, engagement and discipline vary a lot within schools, across schools and across nations and have significant effects on learning. Higher extrinsic rewards for learning are associated the taking of more rigorous courses, teachers setting higher standards and more time devoted to homework. Taking more rigorous courses and studying harder increase student achievement. Post World War II trends in study effort and course rigor are positively correlated with achievement trends.
Even though, greater rigor improves learning, parents and students prefer easy teachers. They pressure tough teachers to lower standards and sign up for courses taught by easy graders. Curriculum-based external exit examinations improve the signaling of academic achievement to colleges and the labor market and this increases extrinsic rewards for learning. Cross section studies suggest that CBEEES result in greater focus on academics, more tutoring of lagging students, more homework and higher levels of achievement. Minimum competency examinations do not have significant effects on learning or dropout rates but they do appear to have positive effects on the reputation of high school graduates. As a result, students from MCE states earn significantly more than students from non-MCE states and the effect lasts at least eight years.

Advanced applet

For the advanced applet see the applets page applet 8a.

Mail your opinion, suggestions, critique, experience on/with the SPA

October 26, 2005 \ contact ben at at at benwilbrink.nl

http://www.benwilbrink.nl/projecten/spa_utility.htm