The Strategist: Optimal stopping

Module Eight of the Strategic Preparation for Assessment model

  Ben Wilbrink

still under revision Feb '06
The module does not function quite correctly yet; I am devising testing procedures to 'prove' what does function correctly, or what doesn't. The problem is that the module makes use of the routines of all previous modules in a complex mix, and therefore is inherently somewhat mysterious. Because this is also a problem in the presentation of the model, special attention will be given to the presentation of the Java code involved, as well as to appropriate example cases to illustrate the module's main points.

Figure 1 strat8.1all.gif

some highlights of this module - I

Figure 8.0.1 Curves of expected investments needed to succeed for the Next-To-Last Test, for the replacement (blue) and accumulation (red) learning models. The second test allows positive compensation (cyan and magenta, respectively). Vertical scaling equals the horizontal one.

Figure 1 strat8.2.gif

some highlights of this module - II

Figure 8.0.2 Curves of expected investments needed to succeed for the Next-To-Last Test: the case of negative compensation points.

Figure 1 strat8.3.gif

Figure 8.0.2.thumb. Thumbnail plot of first and second generation utility functions belonging to the Figure 8.0.2 case.

For the applet itself, see spa_applets.htm#8.

Optimal strategies for the Next-To-Last Test

Now that the optimal strategy can be located in the Last Test (LT) situation, as treated in the last chapter, it is possible to find the optimal strategy to use in preparing for the Next-To-Last Test (NTLT). The expected costs for the NTLT will be taken to include the expected cost to succeed for the Last Test insofar as the latter cost changes as a result of the outcome on the NTLT. Assumptions have to be made in order to make this a viable option.

Assume, then, the Last Test to be in all respects, except content, strategically equal to the NTLT. [Technically it is feasible to specify a set of parameter values pertaining to the LT only; such would however clutter the already overloaded interface of the applets.]

here also the LT technique is the foundation

In the general case the utility function on the NTLT will not offer full compensation; it is therefore possible, if for no other reason, to fail the NTLT. Failing the NTLT will incur a constant cost or the expected costs of having to resit the test. Failing the NTLT entails serious consequences of the same kind as failing the LT.

The situation agrees with the earlier statement in the chapter on utility functions: "Quite generally almost every testing situation in education is a case of threshold utility in combination with a certain range about the cutoff score, neutrally called the reference point in the SPA-model, where higher results on one test may compensate lower ones on another."

retroactive compensation

In the Last Test case the compensation allowed is - for obvious reasons - for credit or debt built up in previous tests only. That is the kind of situation the term 'compensation' is used for appropriately: in a retroactive sense. It is an experience or a result of the past that one is allowed to compensate for.

proactive compensation

Knowing, however, that an immediate result might qualify to be compensated for in a future test, will change the signs. Loosely calling this 'compensation' also, it might be termed proactive compensation, taking its effect only in the future and nevertheless allowing a change of preparation strategy in the immediate present.

It is necessary to distinguish the two kinds of compensation allowed in cases other than that of the LT. The NTLT is the best case to help explain. If the debt of negative compensation points (retroactive) already equals what maximally can be compensated for by the NTLT as well as the LT, then in actual fact the NTLT does not offer this particular student the opportunity to add more negative compensation points (proactive), even though formally - abstractly - it would be allowed. The freedom that is allowed formally, might in fact have been reduced or consumed by results obtained earlier. The possibilities still open after the present test might have been reduced likewise.

earning positive compensation points

In the interplay between the NTLT and LT, negative compensation points earned on the NTLT (proactively) turn out to be a concept that differs somewhat from the negative compensation the LT allows for (retroactively). The latter might act as a powerful motivation for the student to step up her strategic investment in preparing for the NTLT; the former does not ostensibly have that effect.

Earning positive compensation points will shift strategy curves on future tests down and to the right, thus easing the required - the at-least-optimal - strategy in preparing for them. The 'easing' however is a mix of a somewhat higher investment of preparation time and the resulting rather higher reduction in expected time needed to obtain a pass.

The situation here does seem to be rather straightforward: the student is motivated to step up her investment, thereby reducing future time investments by a larger amount. The catch is that the first investment is an immediate one, and the reward is a future one. There is some opportunity for procrastination here. The more serious catch, however, is that students might not be able to invest the amount of time that would be optimal to them, because the time budget still available to them is limited. In fact, some students might habitually find themselves in exactly this kind of predicament. The mechanisms involved, in relation to individual differences in capacities and capabilities between students, have been studied by, among others, Covington (1992) in his 'Making the grade.' For a description in Dutch see my 1980.

There is a multiplier effect involved, in the sense that having built up a certain credit will make it even easier to add more positive points. And yes, regrettably the reverse might become true also; having amassed a certain debt, the student might thereby be forced into a position of having to take yet more negative points. The SPA model will assist in studying the kinds of effects different examination designs might have, or in finding solutions by individualizing instruction and assessment in certain ways.

earning negative compensation points

While earning positive compensation points results in a transparent strategic position in preparing for the LT, now having a lower cutoff score to pass, the earning of negative compensation points results in a somewhat complex strategic situation. The reason is that it is possible to pass the LT itself while not clearing the negative compensation points. In such a case the better option for the student is to resit the NTLT, instead of trying to pass the LT on the compensating cutoff score. This will be immediately clear if one imagines the case where five negative points have to be compensated for on a short test of only 20 items.
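
The 20-item example can be made concrete with a small sketch. It is an illustration only, not the SPA routine: it assumes a plain binomial score model with a fixed mastery level, and the reference score of 12 and the mastery of 0.7 are hypothetical values chosen here; only the test length of 20 and the debt of five points come from the text. The class and method names are likewise invented for this sketch.

```java
// Illustration only: binomial model assumed, not the SPA model's own routines.
final class CompensatingCutoffSketch {

    /** Probability of exactly k correct out of n items at mastery p. */
    static double binomialPmf(int n, int k, double p) {
        double coeff = 1.0;
        for (int i = 1; i <= k; i++) {
            coeff = coeff * (n - k + i) / i; // builds the binomial coefficient stepwise
        }
        return coeff * Math.pow(p, k) * Math.pow(1.0 - p, n - k);
    }

    /** Probability of scoring at least the cutoff on an n-item test. */
    static double passProbability(int n, double p, int cutoff) {
        double total = 0.0;
        for (int k = Math.max(cutoff, 0); k <= n; k++) {
            total += binomialPmf(n, k, p);
        }
        return total; // zero when the cutoff exceeds the test length
    }

    public static void main(String[] args) {
        int items = 20;        // from the text
        int debt = 5;          // negative compensation points, from the text
        int reference = 12;    // hypothetical reference score
        double mastery = 0.7;  // hypothetical mastery level

        double atReference = passProbability(items, mastery, reference);
        double atCompensating = passProbability(items, mastery, reference + debt);
        System.out.printf("P(score >= %d) = %.3f%n", reference, atReference);
        System.out.printf("P(score >= %d) = %.3f%n", reference + debt, atCompensating);
    }
}
```

Under these assumptions the compensating cutoff is reference + debt, so the chance of clearing the debt on the LT is much smaller than the chance of simply passing it; with a reference above 15 the compensating cutoff would exceed the test length and clearing the debt would be impossible.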

developing the formula of expected costs

In order to find the optimal strategy on the NTLT, the formula for the expected costs will have to be developed, given that the initial investment is t and the investment for a resit is ct.

Resitting the NTLT will be necessary if 1) the student fails to obtain at least a below-the-reference score that may be compensated for later, or 2) the student later decides to resit the NTLT because at that moment such is the better strategy. On the NTLT, therefore, the real cutoff score need not be the reference score. If negative compensation points are allowed, the real cutoff score will be lower than the reference.
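
As a rough illustration of the investment part of such a formula, the sketch below computes the expected total investment when every attempt has the same probability of reaching the real cutoff. That constant probability is an assumption made here for simplicity (in the model the probability depends on the investment and on the learning model), and the expected profit on the LT is left out. The names are hypothetical.

```java
// Sketch under a simplifying assumption: the same pass probability on every attempt.
final class ResitCostSketch {

    /**
     * Expected total investment: t for the first attempt plus c*t for each resit.
     * With a constant pass probability the number of resits is geometric,
     * so its expectation is (1 - pPass) / pPass.
     */
    static double expectedInvestment(double t, double c, double pPass) {
        if (pPass <= 0.0 || pPass > 1.0) {
            throw new IllegalArgumentException("pPass must be in (0, 1]");
        }
        double expectedResits = (1.0 - pPass) / pPass;
        return t + expectedResits * c * t;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10 episodes initially, resits cost half of that.
        System.out.println(expectedInvestment(10.0, 0.5, 0.5));
    }
}
```

Here pPass stands for the probability of reaching the real cutoff on the NTLT, which is lower than the reference when negative compensation points are allowed.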

The NTLT and the LT will be assumed to be strategically equal in all respects. Later versions of the SPA model will accommodate certain differences, such as in length, reference, and optimal preparation time.

It will be assumed here that differences in the optimal strategy on the LT, resulting from compensating results on the NTLT - either negatively or positively, belong to the costs (profits). In other words: if stepping up one's investment on the NTLT might result in a more profitable position in preparing for the LT, it belongs to the expected costs of that particular strategy on the NTLT.

The formula to be developed now contains the expected profit on the LT in abstract, i.e. algebraic, form only. Actual evaluation of these expected profits is possible by determining the optimal strategies on the LT corresponding to different amounts of compensation points earned on the NTLT; the technique and formulas have already been presented in the chapter on the Last Test.

any test

It is possible to quantify the strategy curve - have I defined it yet? - in the NTLT case because the number of combinations of scores on the NTLT and the LT allows it. That situation changes rapidly for tests earlier in the course or examination. The idea now is to approximate the strategy curve for any test earlier in the series by substituting the one that is valid for the NTLT. One thing we can be sure of is that the approximation never will be perfect. The question therefore is, will it be good enough?

I conjecture that in many situations the strategy for any test in a series will be approximately equal to the one obtained by assuming it is the Next-To-Last Test. In due time I will make this statement a probable one by producing adequate examples.

Scientific position

allowing compensation is a must

The basic questions about test preparation strategies have been treated in the last chapter. This chapter has to deal with the complications arising by departing from testing under the pure threshold utility regime. In an important way, the topic is known territory, see for example the publications of Frederick Lord in the early sixties of the last century; in using a test battery it is unwise - suboptimal, uneconomical - to use sharp cutoffs on each of the composing tests. The scoring should allow higher results on some tests to compensate for lower ones on others, at least to a certain extent.

robust weights

Granted that some combination rule for test results is needed, the next question is how to weigh different results. The position chosen in the SPA model is, for the time being, to assume equal weights. An exception has been made for the individual test being composed by stratified random sampling of test questions from two subdomains. Indeed the construction of a test by choosing its items from subdomains is also a combination rule problem. Weighing test results differentially is a sensible option only if differences are appreciable. The corresponding weights then need not be finely graded; they are better graded crudely, for transparency reasons (see below). Representative for the literature on this issue is Dawes' (1979) 'The robust beauty of improper linear models in decision making.'

the ethics of it all

Even allowing lots of compensation points formally, many students will eventually find themselves in a situation of threshold loss for the Last Test in the course or examination. For some students their situation is serious in the sense that they must take an appreciable risk of failing the Last Test. The position taken in regard to the ethics of this situation is the following. Decisions to fail students scoring a small number of points short of the cutoff more often than not cannot be underpinned by valid arguments concerning the content of the test, no matter how valid the test may be known to be. Instead, the line of argument should be that this is the way instruction and assessment necessarily are designed; students should understand this, and they will have to bear the risk of failing their tests themselves. To empower them to do so, teachers - the institution - should offer complete transparency about every upcoming test, the kind of questions to be expected, the way they will be scored and the results will be valued. The transparency principle was explicitly formulated by Adriaan de Groot (1970) and elaborated by Job Cohen (1981) (now the mayor of Amsterdam). Dawes (o.c.) also is very perceptive about the ethics of deciding on the basis of numbers instead of 'clinical' information gathered in interviews.

Special points


In the case of the Next-To-Last Test an important assumption about the Last Test to follow is that in all significant respects, except its content, it is equal to the NTLT. In particular the 'starting position' on the Last Test is assumed to be the same number correct out of the same number of preliminary test items, after having studied one episode. Note that in this particular instance the one episode for the NTLT is assumed to be physically the same length as the one episode for the LT. The assumption is in no way really restrictive on the model; it is a question of pragmatics in order to get definite results.

In due time the program will offer options for the LT to differ in significant characteristics from the NTLT.


strategic preparation or strategic instruction

The SPA model is constructed in such a way that it gives the impression that it applies only to situations where real tests are used, even though they may be teacher-made. There is, however, no reason why the model could not be used in situations of more informal assessment of students. To see this, note that the model allows the analysis of situations where some tests are split into a number of partial tests. In fact, a series of tests comprising a particular course is just such a case. Now if it is possible to do this once, it can be done again and again, ultimately arriving at the assessments made 'in real time' in instructional situations. This point is of tremendous import, because it allows assessment to be brought back to where it belongs: in the instructional process itself. Assessment should be instructionally valid, which is not the same as being equitable, as the layman would say, or reliable, in the psychometrician's jargon. Placing the emphasis on equity creates a drain of scarce resources now being used on testing or scoring essays instead of giving students proper feedback in instruction, in 'real time.'

Empirical support


The NTLT module effectively completes the strategic model as far as individual students are concerned. The model will allow evaluation of the effectiveness of different rules for the combination of test scores: how much compensation will be allowed, will negative compensation be allowed or positive compensation only, etcetera.

More complex models will be built, incorporating competition between students as well as (implicit) negotiation between (the body of) students and faculty (see my 1992 papers). Another interesting question is how an individual student's results on the series of tests comprising a course or an examination can be characterised, depicted, or modelled, and how these individual results can be aggregated to characterize group results.

Project history

Basically, the problem situation is what the student's strategic position is in the face of the way separate tests in an examination will be combined to determine the exam outcome. An early study on the 'reliability' of examinations is that of Valentine (1932). His choice of terminology shows he is using psychological measurement as his framework. His basic attitude is that examinations might not be fair to students, and he uses empirical data on entrance examinations, school certificate examinations, and scholarship awards to show them to be unfair. It is implied in this work that students in many situations do not have fair strategies available to influence their exam result. However, Valentine does not attempt to formalize or model the strategic position of students having to sit exams. It is the work of Cronbach and Gleser (1957) that opens up possibilities to model the strategic positions of examinees; Van Naerssen (1970) is one of the very first to try his hand at the problem. Nevertheless, Van Naerssen was not able to get a handle on the combination problem, other than using traditional psychometric analyses. In 1978 I tried my hand at the combination problem, and it proved too early to get a solution: the second volume on student strategies never materialized.

This closing part of the SPA model is still under development. In the project's history it is a rather late (2005) addition, and it has not been published yet. In a rather crude form an optimization technique was developed in 1998 and tried on an extensive data set obtained from first year law students. The early operationalization was not quite satisfactory, however, because it makes use of the amount of preparation time that already has been invested. In other words: decisions to invest yet more preparation time come to depend on the amount of time that has been invested earlier; the more that has been invested earlier, the later the optimal stopping moment will be placed. Using this formula a firm might go bankrupt early, but a student's undertaking is not a firm, and conservative strategies will do little or no harm in this kind of educational setting. Yet the quest is after techniques or algorithms that could be used alternatively.
The current solution in a preliminary form was developed - discovered is maybe a better term - in March 2005. In August 2005 the model was revised on this point, and the LT and NTLT cases were clearly separated. Of course the strategy module is crucial to the SPA-model, so I will be somewhat candid about the construction and algorithm for the optimal strategies, until publication.

Already in 1995 a partial solution to the optimization problem was found in the indifference curve technique (see my 1995), making it possible to find the optimal distribution of available time in the preparation for two tests the student has to sit the same day. The method does not, however, help to find optimal strategies in the case of preparation for one test only. A logical error in the model itself in August 2005 wrongly suggested that negative and positive compensation worked out quite differently as regards the optimal strategies that are available to the student. Further research triggered by this faulty finding eventually led to the development, early in September, of the second generation of utility functions, and subsequent correction of the logical error mentioned earlier. The problem in the development of the SPA model is that it is a journey in a completely new territory. Faulty results do not stand out as such, because every result first looks somewhat strange and unexpected. The second generation utility curve, however, brings us back to the world of mainstream decision-making models, the special point in the SPA model now being the way the second generation utility functions can be constructed.

Java code

[Strategy class] [Call method getNTLTeC_OptLT to] Get the vector of the optimal strategies as well as that of the optima on the LT corresponding to all possible results on the NTLT. [Stratified sampling has not yet (Mar 2006) been fully implemented here]

[Expectations class] [method getNTLTeC_OptLT]

  1. [Call get_OptLTExtended] The first thing here is to get the optLT array of optima on the LT. This array and the code to produce it have been treated in some detail in the foregoing chapter on the LT. Now for every possible result on the NTLT the corresponding optimal strategy on the LT as well as its expected cost are available.

  2. [Call getExpUtilityF] Get the expected utility function using real utilities.

  3. else (option 820) construct the array eUU containing expected utility curves for every possible compensation score on the NTLT regarded as cutoff score, i.e. the probabilities to pass on that cutoff score

    1. given the particular compensating score

    2. substitute it in the parameter vector pn where both compensation parameters have been set equal to zero

    3. get the expected utility function, using the formal utility curve; these expected utilities are probabilities to pass the test on the cutting score ( = this compensating score)

    4. substitute the expected utility function in the array eUU

  4. For all (fractional) episodes (and some more, to prevent pseudo-optima on the LT) get the value of the optimal strategy given this investment eC[ time ], using the real utility function routine

  5. or using the option 820 straightforward routine, the sum of the probabilities of expected profits on the LT is split up in a series of different steps

    1. get the probability eUs_Last of obtaining the highest compensation
      eUs_Last = eUU[ compNeg + compPos ][ time ]

    2. add the product of this probability with the expected profit on the LT in the sum ew
      ew += eUs_Last * ( optLT[ 0 ][ compPos ] - optLT[ 0 ][ 0 ] )
      The expected profit on the LT equals the difference between the expected total investment under the optimal strategy and the original reference on the LT (the first term) and the same under the lower cutoff corresponding to maximum compensation (the second term). The order in optLT: the optimum corresponding to the highest compensating score on position 0. Remember: the order in eUU: lowest compensating score, i.e. the highest negative one if there is such a score, on position 0.

    3. for the other positive compensation points, for each get its probability and multiply it by the expected profit on the LT, add in ew

    4. get the probability eUs_Last of obtaining at least the reference score on the NTLT
      eUs_Last = eUU[ compNeg ][ time ]

    5. for the negative compensation points, for each get its probability and multiply it by the expected profit on the LT, add in ew
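
The option 820 steps above can be sketched as follows. This is one reading of the listed steps, not the applet's actual code: the class and method names are hypothetical, the arrays are reduced to one dimension (a single fixed investment, so passProb[k] stands in for eUU[k][time] and optLT[j] for optLT[0][j]), and the differencing of successive 'at least' probabilities to obtain the probability of exactly each compensation score is an assumption suggested by the use of eUs_Last. The index orders follow the remarks in the text.

```java
// Sketch of the expected-profit sum ew; an interpretation, not the applet source.
final class NtltExpectedProfitSketch {

    /**
     * @param passProb passProb[k] = probability of reaching at least compensating score k
     *                 on the NTLT; position 0 = lowest score (largest debt),
     *                 position compNeg = reference, position compNeg + compPos = highest.
     * @param optLT    optLT[j] = expected cost under the optimal LT strategy;
     *                 position 0 = maximum positive compensation,
     *                 position compPos = no compensation,
     *                 position compPos + compNeg = maximum debt.
     */
    static double expectedProfit(double[] passProb, double[] optLT, int compNeg, int compPos) {
        double costAtReference = optLT[compPos];

        // Steps 1 and 2: the highest compensation.
        double last = passProb[compNeg + compPos];
        double ew = last * (costAtReference - optLT[0]);

        // Step 3: the other positive compensation points, from high to low.
        for (int j = compPos - 1; j >= 1; j--) {
            double atLeast = passProb[compNeg + j];
            ew += (atLeast - last) * (costAtReference - optLT[compPos - j]);
            last = atLeast;
        }

        // Step 4: at least the reference score; scoring exactly the reference earns no profit.
        last = passProb[compNeg];

        // Step 5: negative compensation points; the 'profit' is a loss here.
        for (int m = 1; m <= compNeg; m++) {
            double atLeast = passProb[compNeg - m];
            ew += (atLeast - last) * (costAtReference - optLT[compPos + m]);
            last = atLeast;
        }
        return ew;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: one negative and one positive compensation point.
        double[] passProb = {0.9, 0.6, 0.2};
        double[] optLT = {8.0, 10.0, 13.0};
        System.out.println(expectedProfit(passProb, optLT, 1, 1));
    }
}
```

Failing to reach even the lowest compensating score is not part of this sum; in the model that outcome leads to a resit and belongs to the cost side.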

[Strategy class] [For the record] Thumbnail plot of utility functions, labels, values and other auxiliary items, function values if such an option has been chosen, etcetera. Declarations of variables etcetera.

The following text in this paragraph is an old (2005) version, it has yet to be updated:

the vector optLT containing optima on the LT

The NTLT without compensation opportunity presents the student with a strategic situation equal to that for the LT. It is only when some compensation is allowed that the NTLT strategy becomes special. The first observation about the strategic situation is that different numbers of compensation points result in different optimal strategies on the LT. These optima are evaluated and collected in the array optLT, containing the optima themselves as well as the initial strategies corresponding to them. The method doing so is get_OptLTExtended; the crucial part of that method is shown in Figure 1. It is deceptively simple, however, because crucial evaluations are hidden in the method get_eCForThisComp.

Figure 2 therefore presents the way the strategy curve on the LT is determined in the presence of compensation points, especially negative ones.

next-to-last test NTLT

Testing the code

This applet embodies the complete model and is therefore rather complex. Its complexity makes it difficult to test the appropriate functioning of the program. Assume modules 1 through 6 to be correct.

simulation versus evaluation
Plotting simulation results versus the results of analysis is not really a test for this module 7. Assuming modules 1 through 6 to function well, the routines remaining to be tested are shared among the simulation and analysis options. Nevertheless, if curious things are seen to happen, something is wrong or something is not understood well. It will be evident that the complex manipulation of the methods from modules 1 through 6 makes the results sensitive to any minor problems that still might exist in any of these modules.
I would like to have a special applet for plotting the projected likelihood or the predicted score distribution after investing x extra periods, to be able to inspect the analytical and simulated predictive distributions. For the projected predictive distribution that applet is available in the applets page, applet 6.1 or advanced 6.1a. The picture shows simulation and analytical results using that applet (100,000 observations).

One problem inherent in the method itself should be well understood.

If compensation positive = 0 and negative = 0 then the applet does function as it should, as regards both point 1) and 2) mentioned above.

Advanced applet

For the advanced applet see the applets page; applet 8a offers the advanced features for the LT as well as the NTLT strategies.


Martin J. Beckmann (1972). Decisions over time. In C. B. McGuire and R. Radner (Eds). Decision and organization. A volume in honor of Jacob Marschak. Amsterdam: North-Holland.
The strategy over a series of tests unfolds itself in time. It is probably possible to develop models for longer series of tests than the series of two treated in the SPA-model. The question is whether solving the complexities involved will bring sufficient results. The chapter by Beckmann allows a first estimate of the success more complex models might have.

Buehler, R., D. Griffin and M. Ross (1994). Exploring the 'planning fallacy': Why people underestimate their task completion times. Journal of Personality and Social Psychology, 67, 366-381.

Gretchen B. Chapman: Sooner or later: The psychology of intertemporal choice. In Douglas L. Medin (Ed.) (1998). The psychology of learning and motivation. Advances in research and theory (p. 83-113). Academic Press.

Cohen, M. J. (1981). Studierechten in het wetenschappelijk onderwijs. Zwolle: Tjeenk Willink. [Student's rights in education]

Cosmides, Lea, and John Tooby (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.

Covington, Martin V. (1992). Making the grade: a self-worth perspective on motivation and school reform. Cambridge: Cambridge University Press.

Dawes, Robyn M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.

Dawes, Robyn M. and Bernard Corrigan (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.

Dixit, Avinash K., and Robert S. Pindyck (1994). Investment under uncertainty. Princeton, Princeton University Press.

Groot, A. D. de (1970). Some badly needed non-statistical concepts in applied psychometrics. Nederlands Tijdschrift voor de Psychologie, 25, 360-376.

Kamel Jedidi and Rajeev Kohli (2003). Probabilistic subset-conjunctive models for heterogeneous consumers. Journal of Marketing Research (forthcoming).

Jones, H. (1975). An introduction to modern theories of economic growth. Walton-on-Thames: Nelson.

Kahneman, D and A. Tversky (1982). The simulation heuristic. In Kahneman, D., P. Slovic, & A. Tversky (Eds) (1982). Judgment under uncertainty: heuristics and biases. London: Cambridge University Press.

David A. Lagnado and Steven A. Sloman (2004). Inside and outside probability judgment. In Koehler and Harvey: Blackwell handbook of judgment and decision making. (p.157-176) Blackwell Publishing.

Larrick, R. P., R. E. Nisbett and J. N. Morgan (1994). Who uses cost-benefit rules of choice? Organizational Behavior and Human Decision Processes, 56, 331-347.

Linden, W. J. van der, and H. J. Vos (1996). A compensatory approach to optimal selection with mastery scores. Psychometrika, 61, 155-172.

Lord, F. M. (1962). Cutting scores and errors of measurement. Psychometrika, 27, 19-30.

Luenberger, D. G. (1998). Investment science. Oxford: Oxford University Press.

Mellenbergh, G. J., and W. J. van der Linden (1979). The internal and external optimality of decisions based on tests. Applied Psychological Measurement, 3, 257-273.

Naerssen, R. F. van (1970). Over optimaal studeren en tentamens combineren. Openbare les. Amsterdam: Swets & Zeitlinger.

Pope, R. (1983). The pre-outcome period and the utility of gambling. In B. P. Stigum and F. Wenstop, Foundations of utility and risk theory with applications. Dordrecht: Reidel. 137-177.

Schouwenburg, H. (1993): Uitstelgedrag bij studenten. Proefschrift Rijksuniversiteit Groningen.

C. W. Valentine (1932). The Reliability of Examinations. An Enquiry. London: University of London Press.

Ben Wilbrink (1978). Studiestrategieën. Examenregeling deel A. Amsterdam: COWO (docentenkursusboek 9).

Ben Wilbrink (1980). Uitval en vertraging in het w.o.: een overschat probleem. Onderzoek van Onderwijs, 9 nr 4, 14-18.

Ben Wilbrink (1992). The first year examination as negotiation; an application of Coleman's social system theory to law education data. In Tj. Plomp, J. M. Pieters and A. Feteris (Eds.), European Conference on Educational Research (pp. 1149-1152). Enschede: University of Twente. Paper: author.

Ben Wilbrink (1995). A consumer theory of assessment in higher education; modelling student choice in test preparation. 6th European Conference for Research on Learning and Instruction, Nijmegen. Paper: author.

Michael Yee, Ely Dahan, John R. Hauser and James Orlin (2005). Greedoid-based non-compensatory consideration-then-choice inference.

more literature

Robert L. Bangert-Drowns, James A. Kulik and Chen-Lin C. Kulik (1992). Effects of frequent classroom testing. Journal of Educational Research, 85, 89-99.

Robert L. Bangert-Drowns, James A. Kulik and Chen-Lin C. Kulik (1983). Effects of coaching programs on achievement test performance. Review of Educational Research, 53, 571-585.

William E. Becker and Sherwin Rosen (1992). The learning effect of assessment and evaluation in high school. Economics of Education Review, 11, 107-118.

Mail your opinion, suggestions, critique, experience on/with the SPA

July 10, 2006 \ contact ben at at at
