The Last Test: Optimal Strategy

Module Seven of the Strategic Preparation for Assessment model


Ben Wilbrink




some highlights of this module


Figure 7.1 illustrates the main points of the strategy curves in the special case of the last test LT in the series of tests in a course or an examination. Strategy curves plot, as a function of the time invested in preparation, the total time expected to be needed to pass the LT.

The reason to single out the last test is that the strategic position of the student is by now completely transparent. The results on earlier tests are known. In order to succeed in the course or examination, the student must now reach at least a certain cutoff score on the LT. That cutoff score need not be the same as the original reference score, because compensation for higher or lower scores obtained earlier may result in a factual cutoff that is lower or higher, respectively. In the applet dedicated to the LT the user should determine the factual cutoff and enter that as the reference score.


Figure 7.1 (lasttest1.gif). In essence this is the situation modelled by Van Naerssen in his 1970 tentamen model.


To begin with, the strategy curves in this case exhibit definite optimal points in the region of interest: the lowest points of the curves. The curves include all investments needed to succeed, consisting of the immediate extra investment in preparation for the test as well as the time costs incurred as a consequence of failing the last test. The latter costs might either be fixed, or consist of preparation for a new test until a satisfactory result has been scored. In the latter case, the series of new tests is in theory infinite; it is assumed that the strategy in preparing for retests in all respects equals that for the original Last Test. This assumption allows evaluation of the total expected costs needed to succeed.

In the particular case illustrated the accumulation model has its optimum in the immediate past: the student has already passed her optimal strategy; no harm has been done, however, because the extra investment will partly repay itself in terms of lesser time expected to be needed to succeed.

Earlier investments, i.e. time invested before the preliminary test was taken, are defined to be one episode. That one (constant) episode is also included in the investment curves. It is what economists call a sunk cost, and if you like you can subtract it from the curves simply by subtracting one from the vertical scale origin.

Now what has become of the compensation allowed? If positive compensation points have been earned, they are used to lower the original cutoff score (reference score) for the last test. Also, if negative points have yet to be compensated, they might be used to raise the reference for the last test, depending on specific regulations. Formally the last test might allow compensation just as other tests do; materially, however, the results on earlier tests are known and therefore a definite cutoff score on the last test has been established.

The curves look simple enough; the real complexity of the strategic model, however, is the sheer number of parameters involved, every one of which may be changed and the analysis repeated on the new set of parameters. The two learning models, all other parameter values being equal, show that optimal strategies are quite sensitive to the learning model that might be applicable. The applet allows one to opt for the use of special utility curves as treated in the chapter 'The ruling,' if one should have a good reason or interpretation for such a curve.

Clicking the figure will show the full picture of this case, including the parameter values chosen in the menu. To use the applet itself, go to the applets page, applet 7.






For the applet itself, see spa_applets.htm#7.



Optimal strategies


Expected utility will rise with time invested and therefore offers of itself no clue as to when and why the student had better stop investing yet more preparation time. An external criterion is needed to construct a stopping rule that will help to extract the optimal investment, the optimal preparation strategy, from the model.


on the strategic situation in general


Note that in the model preparation time and utilities have been kept separate. There is an important reason for doing so: time is a resource that can be invested in preparation for the test (for many students procrastination is a problem here, see Schouwenburg, 1993), while utility is a valuation of possible outcomes (= test scores) of the test.

The utility structure for an examination or curriculum can be constructed objectively using the examination rules. Ultimately, however, passing or failing the examination itself has a certain import to the student. Here the external criterion needed for the optimal stopping rule can be found. Later on it might be possible to develop alternative criteria that are easier to apply, to be benchmarked eventually on the real external criteria. The consequence of failing the examination or curriculum might be that the student has to repeat the whole curriculum or examination, or only a small part of it, depending on the rules established in particular situations. Research on the effectiveness of repeating courses and flunking grades, by the way, has consistently proven its proponents wrong; therefore it would be better to seek alternatives to simple repetition of (parts of) courses. Narrow the definition of the external criterion to the time that it will take the student to comply with whatever the rules stipulate, and let time be measured in the convenient unit of the episode in the model. The episode is a unit of time spent; it is a personal parameter that will vary in real time value depending on personal qualities and contingencies.

The general solution for the optimal stopping rule or optimal strategy is that investing a certain amount of preparation time is profitable as long as it results in a greater reduction in the time expected to be needed to pass the course or examination. Extra time may be needed for assignments following possible failure of the examination or curriculum as a whole. As soon as extra investment equals the reduction in expected costs, it stops being profitable to invest yet more time. There may be other reasons, of course, to continue preparation for the test, such as intrinsic motivation, cooperating with fellow students, or competition with fellow students. Those considerations however do not belong to the spa model itself.
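The stopping rule can be made concrete with a small sketch in Java, the language of the applets. It is an illustration only: the class and method names are hypothetical and not taken from the spa code, and the expected costs in the example are made-up numbers. The rule keeps investing as long as one more step of preparation reduces the expected cost by more than the step itself costs.

```java
/** Sketch of the stopping rule; all names are hypothetical, not taken from the spa applets. */
class StoppingRule {

    /**
     * Returns the index of the first investment step at which one more step
     * no longer reduces the expected cost by more than the step itself costs.
     *
     * @param expectedCost expected time still needed after the test, per step (in episodes)
     * @param step         extra preparation time per step (in episodes)
     */
    static int optimalStep(double[] expectedCost, double step) {
        int epi = 0;
        while (epi + 1 < expectedCost.length) {
            double reduction = expectedCost[epi] - expectedCost[epi + 1];
            if (reduction <= step) {
                break; // the gain no longer exceeds the extra investment: stop here
            }
            epi++;
        }
        return epi;
    }

    public static void main(String[] args) {
        // Made-up expected costs for investments of 0, 0.5, 1.0, ... episodes.
        double[] expectedCost = {4.0, 2.5, 1.6, 1.2, 1.0, 0.9};
        int best = optimalStep(expectedCost, 0.5);
        System.out.println("stop after step " + best);
    }
}
```

This is a local rule: it stops at the first point where the marginal gain falls to the marginal investment. If a cost curve is not convex, a search for the lowest point over the whole curve is the safer choice.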



the case of the Last Test


In the general case compensation of scores will be allowed in some specified range, making it difficult if not impossible to find immediate criteria for optimal preparation strategies. In the special case of the last test, however, it should be possible to specify immediate criteria. The reason is that all opportunity for compensation has been lost by then, resulting in a simple threshold utility function on the last test. The threshold, of course, will depend on previous test results, being lower the more positive compensation points, and higher the more negative compensation points have been collected.
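How the threshold shifts with compensation points can be sketched as below. The one-for-one exchange of compensation points against score points, the clamping to the score range, and all names are assumptions made for the illustration; actual regulations may differ, and the applet itself expects the user to enter the factual cutoff directly.

```java
/** Sketch only: assumes one compensation point shifts the cutoff by one score point. */
class FactualCutoff {

    /**
     * @param referenceScore     original cutoff score on the last test
     * @param compensationPoints points carried over from earlier tests (negative = deficit)
     * @param maxScore           highest obtainable score on the last test
     * @return the factual cutoff, kept within the range 0..maxScore
     */
    static int factualCutoff(int referenceScore, int compensationPoints, int maxScore) {
        int cutoff = referenceScore - compensationPoints;
        return Math.max(0, Math.min(maxScore, cutoff));
    }

    public static void main(String[] args) {
        // Hypothetical case: reference score 12 on a 20-item test.
        for (int points = -5; points <= 5; points++) {
            System.out.println(points + " compensation points -> cutoff "
                    + factualCutoff(12, points, 20));
        }
    }
}
```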

The consequences of failing the last test may be one of two possible kinds:

In the menu there is a choice between the two classes of results: specifying a 'constant cost' (in units of episodes), or specifying the cost of preparing for another test as a proportion of the optimal preparation time for the actual last test.


The first class of consequences translates into an expected cost by multiplying the constant cost by one minus the expected threshold utility, i.e. by the chance to fail the last test. The unit of cost again is the episode.



The second class of consequences includes the special case modelled by Van Naerssen in his tentamen model: again sitting the same test, i.e. a new test on the same course material, until a pass is obtained. Van Naerssen would not have objected to the new test being on new material also, and the model (the program) will quietly oblige.

The model assumes the last test situation to be repeated indefinitely until a pass is obtained. The expected time needed to get a pass is the sum 1 / E(u) of a geometric sequence times the investment needed in preparing for the LT a second, third etc. time, plus the investment for the first time, plus a correction factor if preparation time for the second and later tests is a proportion of that for the first test.
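The geometric series can be checked numerically. The sketch below assumes that the chance p to pass is the same on every attempt, that the first attempt costs t episodes, and that every retest costs the proportion c of t; under those assumptions the expected total time is t + c * t * (1 / p - 1). The names are hypothetical, and the second method only serves to verify the closed form by summing the series term by term.

```java
/** Sketch only: expected time to pass when the last test is repeated until a pass. */
class RetestCost {

    /** Closed form t + c * t * (1 / p - 1); requires 0 < p <= 1. */
    static double expectedTotalTime(double t, double c, double p) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        return t + c * t * (1.0 / p - 1.0);
    }

    /** The same quantity by summing the series, truncated after maxRetests retests. */
    static double expectedTotalTimeBySeries(double t, double c, double p, int maxRetests) {
        double total = t;                  // the first attempt is always made
        double probRetestNeeded = 1.0 - p; // chance that retest k is needed, starting at k = 1
        for (int k = 1; k <= maxRetests; k++) {
            total += probRetestNeeded * c * t;
            probRetestNeeded *= (1.0 - p);
        }
        return total;
    }

    public static void main(String[] args) {
        double closed = expectedTotalTime(2.0, 0.5, 0.8);
        double summed = expectedTotalTimeBySeries(2.0, 0.5, 0.8, 200);
        System.out.println("closed form: " + closed + ", series: " + summed);
    }
}
```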







the Next-To-Last Test or any test


Having solved for the optimal strategy in the Last Test situation, it is possible to solve for the optimal strategy for the Next-To-Last Test. That solution will be presented in the next chapter. It will serve as an approximation to the optimal strategy on any test, the situation for any strategy being too intricate regarding the number of possible combinations to lend itself to exact solutions in a reasonable amount of computer time.

The introduction of the second generation utility curve - real utilities - might make it possible to evaluate optimal strategies for the NTLT using only the expected utility function. If that should prove to be so, chapter 8 on the NTLT will be scrapped and replaced by the current chapter 9 on the second generation utility curve. The new chapter 9 will use expected utility functions - second generation utility - to find optimal strategies on the NTLT. Because the second generation utility function is based on preparation time, it should be easy to add the corresponding investment to its expected utility, both being quantified in episodes. The resulting strategy functions will rise from zero to a maximum, descend to a minimum, and rise again to infinity: in the limit preparation time for the first opportunity will be all the time needed to succeed. Infinity is heaven, though, it doesn't exist either. Because it is possible to fail the NTLT, the time expected to be needed to pass the NTLT has yet to be added to the said strategy function, giving it the same shape we know from the regular strategy functions for the LT or NTLT cases.


expected cost


The optimal strategy does not directly involve expected utility curves, but expected cost curves. The question is, then, what is the difference between the concepts of 'utility' and 'cost'? Utility is defined by the formal rules for the combination of test scores to derive the result on curriculum or examination; that is, formal or first generation utility is defined this way. Utility is assumed to be scaled linearly, allowing means or totals to be taken. It is evident, however, that realizing 0.1 extra points of utility will cost more or less time depending on where on the learning curve the student's mastery currently lies. Real or second generation utility will be true to these costs, however, and is therefore a negative function of costs.

The Last Test is a case of threshold utility; remember that under threshold utility, expected utility simply is the probability to immediately pass the test.
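A minimal sketch of this point, assuming the predicted score distribution is available as an array indexed by test score and that a pass means a score at or above the cutoff. The names and the example distribution are hypothetical.

```java
/** Sketch only: expected utility under threshold utility equals the chance to pass. */
class ThresholdUtility {

    /**
     * @param predictive predictive[s] is the probability of obtaining test score s
     * @param cutoff     lowest passing score (assumption: pass means score >= cutoff)
     * @return expected utility, with utility 1 for a pass and 0 for a fail
     */
    static double expectedUtility(double[] predictive, int cutoff) {
        double pass = 0.0;
        for (int s = Math.max(cutoff, 0); s < predictive.length; s++) {
            pass += predictive[s];
        }
        return pass;
    }

    public static void main(String[] args) {
        double[] predictive = {0.1, 0.2, 0.3, 0.4}; // made-up distribution over scores 0..3
        System.out.println("expected utility at cutoff 2: " + expectedUtility(predictive, 2));
    }
}
```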
Measuring cost in the model's unit of time makes it directly comparable to the extra investment of time the student would want to optimize. That's all there is to it, as far as the individual student's situation is concerned. On an aggregate level, of course, group behavior will be a disturbing factor, as will the reactions of faculty to the way their students behave. On the last point see my 1992 paper (Wilbrink, 1992), applying James Coleman's social system theory to a data set that shows the gamesmanship of both parties in this contest.



series of optima corresponding to cutting scores


Figure 1 lasttest1.2.gif


If for whatever reason the cutting score on the last test should be one point higher or lower, then the optimum strategy will be correspondingly lower or higher. In figure 7.2 (left) the three optima have been plotted, in the form of a line diagram. The actual cutting score generally will depend on the compensation points earned (or lost) by the student. If the range is from minus five to plus five points, then the other black curve in figure 7.2 emerges.
At this stage in the construction of the model, these curves simply plot results of the analysis or simulation in yet another way. In the sequel, however, this curve will prove to be the main building block of a second kind of utility function, representing the 'real' score utilities for the student at this point in the course, instead of the 'formal' utility following objectively from the ruling on the way test scores will be combined. In general the two kinds of utility functions will differ from each other. The question of interest is in the conditions typically making this difference smaller or bigger in a given case, or for groups of students.


threshold 'plus'


At the end of the day to pass or fail the Last Test is the event that counts. That is why the Last Test situation primarily is one of threshold utility (see the chapter The Ruling) on - formal, objective - utility functions. In many cases it will nevertheless matter to the student whether the passing score on the LT is a higher or lower one. In the frequent case that the student has amassed many positive compensation points, passing the LT might not even be the first concern of this student. I have yet to work out the technique and program it into the spa model, but it will not involve any new concepts other than that of the utility function in its second generation form, and the corresponding function of expected utility.




Scientific position


Van Naerssen observed already in 1970 that students will want to minimize their investment of time. In the psychometric literature on decision making, however, the position taken is that maximizing expected utility is sufficient for optimal decision making.

Van Naerssen's approach and the received view in psychometrics have in common the traditional decision-theoretic assumption of rational man. In the received view this 'man' is the teacher or the institution. Van Naerssen's position is somewhat involved; the strategic position of the student is modelled, and the teacher or institution is supposed to rationally use the results from model exercises to improve the rules for the combination of results of individual tests. Developments in decision-making theory in the last quarter of the last century have severely mitigated the rational man assumption. The question then is, does the spa model hinge on the rational man assumption or not, and if not, which group among the many stakeholders in educational assessment could use the spa model as one of the instruments in its decision-making toolbox, and in what way?


decision-making theory


The spa model does not use decision-making theory, yet there are some intriguing items in that theory that might apply to the kind of strategies the spa model tries to accommodate. Surely the planning fallacy is one of them; it is the all too human tendency to underestimate the time needed to finish a task. Can the spa model assist in countering this natural tendency? By the way, the planning fallacy is quite a different phenomenon from procrastination (see Schouwenburg, 1993, on that one), but it might explain the same data. Lagnado and Sloman (2004), following Kahneman and Tversky (1982), suggest that the mental simulation heuristic can explain the planning fallacy: if one can imagine the process of finishing the task, then the estimation of the probability of finishing it on time will be based on that mental simulation. The bias in this procedure is that unforeseen circumstances will not have been part of the mental simulation. This 'inside judgment' therefore will underestimate the time needed to accomplish the task, to bring one's mastery up to the desired level. The alternative would be to base the estimation on previous experiences of unexpectedly running out of time. For the untrained decision maker that 'outside judgment' is however all too easily overridden by the 'inside judgment.'

The spa model does not seem to be either 'inside' or 'outside', for different and obvious reasons. It is not a model of the imagination, and it does not model previous experience. It could be used as a 'third way' method of estimating the time needed to bring one's mastery up to a level that is optimal in an investment costs sense.




Special points


Strategic preparation for the Last Test in fact is the special case of preparing for tests having threshold utility. Therefore any test having threshold utility, either in a formal sense, or in the factual sense applying to the Last Test, strategically is equivalent to the Last Test case.




Generalness





Empirical support





Application





Project history


The crucial idea here is that the very last test of the exam or course has a very special position, a position allowing the exact modelling of its tactical or strategic aspects. In itself, this discovery is of a very recent date, somewhere about March 2005, just before my visit to Van Naerssen. I mention this visit here, because the 'Last Test discovery' brought the spa model full swing back to the original position taken by Van Naerssen (1970).

The use of the LT as an anchor point in the model in its turn allows the definite modelling of the strategic situation in preparing for the Next-To-Last Test. The NTLT model in itself is a useful proxy for the modelling of any test in the series belonging to the exam or course, except the LT and NTLT themselves.


In the project history there has been a series of different attempts to get a handle on the modelling of a series of tests - a complete examination or course. The original idea, of course, was to use expected utility in itself in one way or another; this was the situation in the 1994 Cito presentation. The problem in maximizing expected utility is that direct investments must be made comparable - somehow - to formal utility, a problem not solved satisfactorily until the end of 2005. The following year - at the ORD Groningen 1995 - the indifference curves approach was presented, allowing strategic choices in the simultaneous preparation for two tests. The method in itself is fine, but too restricted to allow true modelling of a series of tests. Since then, a few attempts to relate expected utility to investments in preparation seemed hopeful, but eventually were discarded because in effect they were attempts to capitalize on costs made in the past - in the economic literature known as 'sunk costs.' If there is one thing that rational man should NOT do, it is to let decisions be influenced by sunk costs. Maybe there are methods in economics - e.g., in prospect theory - useful for modelling the strategic aspects of a series of tests; now that the LT - NTLT modelling route has been found, the urgency for a search in the economic literature has lessened considerably.




Java code


The main methods here are the methods presented in the foregoing applets and chapters. New is the complex interplay in the use of all the available methods. There is a lot of opportunity for logical, conceptual, and other errors here, but the bottom line is that the code only involves the use of earlier methods and the manipulation of their results.


The applet was first published in a preliminary version on March 16, 2005. Undoubtedly the program will be in error in a few respects. Some results might look in error but will nevertheless be all right. See also the paragraph on the testing of the code, below. Minor problems are the layout of the menu, unwanted refreshing of the plot caused by any resizing or scrolling of the browser window, and the reporting of important statistics. Any suggestions are welcome, as are reports on bugs in the applet.


Figure 7.8.1 formulas


last test LT


For every point (epi) on the preparation time dimension

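if ( fixedCost != 0.0 )
     // constant cost of failing, weighted by the chance to fail
     cost[ epi ] = ( 1 - expU[ epi ] ) * fixedCost;
else {
     // the cast guards against integer division in case both operands are ints
     double t = (double) epi / barsperepisode;
          // t is time invested for the first testing opportunity
     if ( expU[ epi ] > 0 )
          // expected number of retests ( 1 / expU - 1 ), each costing c * t
         cost[ epi ] = t * c * ( 1 / expU[ epi ] - 1 );
}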

The expected cost function (cost[ epi ]) will have a minimum; if that minimum is a real one, the corresponding investment is the optimum investment in preparation for the LT, given the information available.
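Locating that minimum can be sketched as a self-contained program. This is an illustration under stated assumptions, not the applet's code: the class and method names are hypothetical, the pass probabilities are made up, and the first investment t is simply added to the expected cost of failing. How the applet itself combines the two (for instance the constant first episode) is not shown in this section.

```java
/** Sketch only: names and the example numbers are assumptions, not the applet's code. */
class LastTestOptimum {

    /** Total expected time per grid point: first investment t plus expected cost of failing. */
    static double[] totalCurve(double[] expU, int barsPerEpisode, double fixedCost, double c) {
        double[] total = new double[expU.length];
        for (int epi = 0; epi < expU.length; epi++) {
            double t = (double) epi / barsPerEpisode; // time invested for the first opportunity
            double cost;
            if (fixedCost != 0.0) {
                cost = (1.0 - expU[epi]) * fixedCost;   // constant cost times chance to fail
            } else if (expU[epi] > 0.0) {
                cost = t * c * (1.0 / expU[epi] - 1.0); // expected cost of retests
            } else {
                cost = Double.POSITIVE_INFINITY;        // assumption: no chance to pass at all
            }
            total[epi] = t + cost;
        }
        return total;
    }

    /** Index of the lowest point of the curve (the first one in case of ties). */
    static int argMin(double[] curve) {
        int best = 0;
        for (int i = 1; i < curve.length; i++) {
            if (curve[i] < curve[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] expU = {0.10, 0.30, 0.55, 0.75, 0.88, 0.95, 0.98}; // made-up pass probabilities
        double[] total = totalCurve(expU, 4, 2.0, 0.0);             // fixed cost of 2 episodes
        int best = argMin(total);
        System.out.println("optimum at grid point " + best + ", total " + total[best]);
    }
}
```

In the retest case this sketch gives a total of zero at t = 0 whenever the pass probability there is positive, because retests are then assumed to cost nothing. A minimum found at the very start of the grid should therefore be inspected before it is accepted as a real one.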
Of course, the minimum might be a pseudo-minimum when either





Testing the code


gif/spa_ProjPredTest401280.gif


simulation versus evaluation
Plotting simulation results versus the results of analysis is not really a test for this module 7. Assuming that modules 1 - 6 function well, the routines remaining to be tested are shared among the simulation and analysis options. For practical reasons (running times getting longer and longer ...) the simulation will be offered in modules 7 and following as an option only: option = a negative number specifying the number of observations. Nevertheless, if curious things are seen to happen, something is wrong or something is not understood well. It will be evident that the complex manipulation of the methods from modules 1 to 6 makes the results sensitive to any minor problems that still might exist in any of these modules.



I would like to have a special applet for plotting the projected likelihood or the predicted score distribution after investing x extra periods, to be able to inspect the analytical and simulated predictive distributions. For the projected predictive distribution that applet is available as applet 6.1 on the applets page, or in its advanced form. The picture shows simulation and analytical results using that applet (100,000 observations).


gif/lasttest9.2.gif

compensation points
To test the routine for the negative compensation point case, run the strategy module, module 8, and use option = 804 to print the values in the vector optLT containing the optimal strategies on the LT corresponding to all possible compensation situations. Figure 2 shows such a result. The left column gives the episode, the second column the probability of a pass. Clicking the figure shows the full parameterization of this case. The blue thumbnail function is the second generation utility curve that is a simple function of the optLT values.






Literature


Buehler, R., D. Griffin and M. Ross (1994). Exploring the 'planning fallacy': Why people underestimate their task completion times. Journal of Personality and Social Psychology, 67, 366-381.

Cosmides, Lea, and John Tooby (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.

Dixit, Avinash K., and Robert S. Pindyck (1994). Investment under uncertainty. Princeton, Princeton University Press.

Jones, H. (1975). An introduction to modern theories of economic growth. Walton-on-Thames: Nelson.

Kahneman, D., and A. Tversky (1982). The simulation heuristic. In Kahneman, D., P. Slovic, & A. Tversky (Eds) (1982). Judgment under uncertainty: heuristics and biases. London: Cambridge University Press.

Lagnado, David A., and Steven A. Sloman (2004). Inside and outside probability judgment. In Koehler and Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 157-176). Blackwell Publishing.

Larrick, R. P., R. E. Nisbett and J. N. Morgan (1994). Who uses cost-benefit rules of choice? Organizational Behavior and Human Decision Processes, 56, 331-347.

Linden, W. J. van der, and H. J. Vos (1996). A compensatory approach to optimal selection with mastery scores. Psychometrika, 61, 155-172.

Luenberger, D. G. (1998). Investment science. Oxford: Oxford University Press.

Mellenbergh, G. J., and W. J. van der Linden (1979). The internal and external optimality of decisions based on tests. Applied Psychological Measurement, 3, 257-273.

Naerssen, R. F. van (1970). Over optimaal studeren en tentamens combineren. Openbare les. Amsterdam: Swets & Zeitlinger. [html, also lists his other relevant publications and their abstracts]

Pope, R. (1983). The pre-outcome period and the utility of gambling. In B. P. Stigum and F. Wenstop, Foundations of utility and risk theory with applications. Dordrecht: Reidel. 137-177.

Schouwenburg, H. (1993). Uitstelgedrag bij studenten. Proefschrift Rijksuniversiteit Groningen.

Wilbrink, Ben (1992). The first year examination as negotiation; an application of Coleman's social system theory to law education data. In Tj. Plomp, J. M. Pieters and A. Feteris (Eds.), European Conference on Educational Research (pp. 1149-1152). Enschede: University of Twente. Paper: author.

Wilbrink, Ben (1995). A consumer theory of assessment in higher education; modelling student choice in test preparation. 6th European Conference for Research on Learning and Instruction, Nijmegen. Paper: author.




Advanced applet


For the advanced applet see the applets page; applet 8a offers the advanced features for the LT as well as the NTLT strategies.


new menu items





Mail your opinion, suggestions, critique, experience on/with the SPA



January 27, 2006 \ contact ben at at at benwilbrink.nl



http://www.benwilbrink.nl/projecten/spa_lasttest.htm