Ben Wilbrink

still under revision Feb '06

The module does not function quite correctly yet; I am devising testing procedures to 'prove' what does function correctly, or what doesn't. The problem is that the module makes use of the routines of all previous modules in a complex mix, and therefore is inherently somewhat mysterious. Because this is also a problem in the presentation of the model, special attention will be given to the presentation of the Java code involved, as well as to appropriate example cases to illustrate the module's main points.

Figure 8.1 pictures blue and red strategy curves that look remarkably similar to the ones in the last chapter. Even if compensation is allowed, basically the student will have to pass the Next-To-Last Test (NTLT), if need be by scoring negative compensation points; in pretty much the same way the Last Test (LT) must be passed. The blue and red curves depict the case where no compensation is allowed, reducing the NTLT to an LT as far as test preparation strategy is concerned.

The compensation case differs according to whether positive or negative compensation (Figure 8.2) is allowed. Figure 8.1 pictures the strategic possibilities of the full positive compensation case: the optimal strategies demand larger initial investments, but they reward the larger initial investment with much lower total expected investment. Total investment is related to the NTLT as well as the LT; because compensation for higher points earned on the NTLT lowers the factual reference (cutoff) on the LT, (much) lower investments on the LT will suffice to succeed. The investment 'profit' is subtracted in the strategy curves as depicted in Figure 8.1. Remember that the strategy curve includes initial investment.

Allowing full positive compensation reduces the expected cost by 1.3027 - 1.0356 = 0.2671 episodes, and that *includes* the extra initial investment of approximately 1.2 - 1.1 = 0.1 episodes. The extra investment pays back some 250 percent! The resulting higher mastery is the intrinsic reward of allowing compensation, but the lower mastery now needed to pass the LT should be discounted here.

The model assumption here is that the student will follow her optimal strategy on the Last Test. Another possibility is to assume the student to follow an LT strategy that is equivalent to the one followed for the NTLT; follow a suboptimal strategy on the NTLT, then do the same on the LT. The advanced applet will offer a choice between them (has yet to be implemented as of 8-17-2005).

Clicking the figure will show the full picture of this case, including the parameter values chosen in the menu. Or you may use the applet itself.

Figure 8.2 pictures the negative compensation case: higher scores on the LT may compensate for below-reference scores on the NTLT.

On the NTLT the factual negative compensation allowed is determined by the net result of negative and positive points earned already, and the formal compensation allowed on NTLT and LT (assumed to be equal in this respect). Factually being allowed a number of negative compensation points on the NTLT means that obtaining higher scores on the LT must compensate for negative points obtained on the NTLT.

The problem is - and this is different from the positive compensation case - that allowing negative compensation on the NTLT at first sight appears to be somewhat disadvantageous to the student. It follows from the nature of the learning involved that, in the mastery range of interest, the steepness of the learning curve will be diminishing. More often than not, the compensating points will have to be earned in a higher region of mastery than where the negative compensation points have been lost. The extra time needed on the LT will, however, more than be made good by the time saved by not having to resit the NTLT, at least not immediately.

The cyan staircase function is the objective utility function in the replacement learning case, corresponding to five negative compensation points allowed, and no positive compensation points allowed. The magenta one, belonging to the accumulation model of learning, is degenerate, but nevertheless still has one meaningful step. The cyan and magenta functions are the second generation utility functions belonging to the replacement model (cyan) and the accumulation model (magenta). The second generation staircases in this case would show negative utilities, if they hadn't been clipped to be just zero. This clipping is justified because here the student has the choice between two strategies, the worst one being to go for the higher cutoff score on the LT, the better one going for the reference cutoff on the LT while at the same time accepting failure on the NTLT, and therefore accepting its consequences, that is: preparing for a resit on - hopefully - new course material.

The spa_model may be used to evaluate all kinds of more or less weird rulings, but even the advanced applet will not offer all the necessary parameters to do so; in these cases just contact me, so I can produce a special edition of the model.

Clicking the figure will show the full picture of this case, including the parameter values chosen in the menu. To use the applet itself, go here.

For the applet itself, click spa_applets.htm#8.

Being able to locate the optimal strategy in the Last Test situation as treated in the last chapter, it is now possible to find the optimal strategy to use in preparing for the Next-To-Last Test. The expected costs for the NTLT will be taken to include the expected cost to succeed on the Last Test insofar as the latter cost *changes* as a result of the outcome on the NTLT. Assumptions have to be made in order to make this a viable option.

Assume, then, the Last Test to be in all respects, except content, strategically equal to the NTLT. [Technically it is feasible to specify a set of parameter values pertaining to the LT only; such would however clutter the already overloaded interface of the applets.]

In the general case the utility function on the NTLT will not offer full compensation; it is therefore - if for no other reason - possible to fail the NTLT. Failing the NTLT will incur either a constant cost or the expected costs of having to resit the test. Failing the NTLT entails serious consequences of the same kind as in failing the LT.

The situation agrees with the earlier statement in the chapter on utility functions: "Quite generally almost every testing situation in education is a case of threshold utility in combination with a certain range about the cutoff score, neutrally called the *reference point* in the SPA-model, where higher results on one test may compensate lower ones on another."

In the Last Test case the compensation allowed is - for obvious reasons - for credit or debt built up in previous tests only. That is the kind of situation the term 'compensation' is used for appropriately: in a retroactive sense. It is an experience or a result of the past that one is allowed to compensate for.

Knowing, however, that an immediate result might qualify to be compensated for in a future test, will change the signs. Loosely calling this 'compensation' also, it might be termed *proactive* compensation, taking its effect only in the future and nevertheless allowing a change of preparation strategy in the immediate present.

It is necessary to distinguish the two kinds of compensation allowed in cases other than that of the LT. The NTLT is the best case to help explain. If the debt of negative compensation points (retroactive) already equals what maximally can be compensated for by the NTLT as well as the LT, then in actual fact the NTLT does not offer this particular student the opportunity to add more negative compensation points (proactive), even though formally - abstractly - it would be allowed. The freedom that is allowed formally, might in fact have been reduced or consumed by results obtained earlier. The possibilities still open after the present test might have been reduced likewise.

In the interplay between the NTLT and LT, negative compensation points earned on the NTLT (proactively) turn out to be a concept that somewhat differs from the negative compensation the LT allows for (retroactively). The last one might act as a powerful motivation for the student to step up her strategic investment in preparing for the NTLT, the first one does not ostensibly have that effect.

Earning positive compensation points will shift strategy curves on future tests down and to the right, thus easing the required - the at-least-optimal - strategy in preparing for them. The 'easing' however is a mix of a somewhat higher investment of preparation time and the resulting rather higher reduction in expected time needed to obtain a pass.

The situation here does seem to be rather straightforward; the student is motivated to step up her investment, thereby reducing future time investments with a bigger amount. The catch is that the first investment is an immediate one, and the reward is a future one. There is some opportunity for procrastination here. The more serious catch however is that students might not be able to invest the amount of time that would be optimal to them, because the time budget yet available to them is limited. In fact, some students might habitually find themselves in exactly this kind of predicament. The mechanisms involved, in relation to individual differences in capacities and capabilities between students, have been studied by, among others, Covington (1992) in his 'Making the grade.' For a description in Dutch see my 1980.

There is a multiplier effect involved, in the sense that having built up a certain credit will make it even easier to add more positive points. And yes, regrettably the reverse might become true also; having amassed a certain debt, the student might thereby be forced into a position of having to take yet more negative points. The SPA model will assist in studying the kinds of effects different examination designs might have, or in finding solutions by individualizing instruction and assessment in certain ways.

While earning positive compensation points results in a transparent strategic position in preparing for the LT, now having a lower cutoff score to pass, the earning of *negative* compensation points results in a somewhat complex strategic situation. The reason for this to be so, is that it is possible to pass the LT itself, while not absolving the negative compensation points. In such a case the better option for the student is to resit the NTLT, instead of trying to pass the LT on the compensating cutoff score. This will be immediately clear if one imagines the case where five negative points have to be compensated for, on a short test of only 20 items.

In order to find the optimal strategy on the NTLT, the formula for the expected costs will have to be developed, given the initial investment is *t* and for a resit is *ct*.

Resitting the NTLT will be necessary if 1) the student fails to obtain at least a below-the-reference score that may be compensated for later, or 2) the student later decides to resit the NTLT because at that moment such is the better strategy. On the NTLT, therefore, the real cutoff score need not be the reference score. If negative compensation points are allowed, the real cutoff score will be lower than the reference.

The NTLT and the LT will be assumed to be strategically equal in all respects. Later versions of the spa model will accommodate certain differences, such as in length, reference, and optimal preparation time.

It will be assumed here that *differences* in the optimal strategy on the LT, resulting from compensating results on the NTLT - either negatively or positively, belong to the costs (profits). In other words: if stepping up one's investment on the NTLT might result in a more profitable position in preparing for the LT, it belongs to the expected costs of that particular strategy on the NTLT.

The formula to be developed now contains the expected profit on the LT in abstract, i.e. algebraic, form only. Actual evaluation of these expected profits is possible by determining the optimal strategies on the LT corresponding to different amounts of compensation points earned on the NTLT; the technique and formulas have already been presented in the chapter on the Last Test.

**Expected cost on the NTLT**

Assume the first sit of the NTLT takes *t* episodes - the direct investment.

The first sit results in three kinds of outcomes: 1) a pass without compensation points, 2) a pass with one or more compensation points, 3) no pass. Assume a resit of the NTLT to cost *ct*, following the same optimal strategy that has been used for the first sit. Let *q* be the probability to fail the NTLT on the cutoff score, i.e. the probability to obtain result 3) in this sit. Evaluating the expected utility function, using a threshold function on that cutoff score, will give the probability of a pass *p* = 1 - *q*. Let strategy *t* be the preparation for the first sit, and *a* = *ct* that for a second or later sit of the NTLT.

Outcome 1) will leave the strategic position on the LT unchanged.

Outcome 2), however, will result in a profit *w* on the LT, corresponding to whatever number of compensation points might obtain. Let *r* be the probability to obtain that particular compensation or profit, and therefore *rw* its expected profit. For the whole range of negative and positive compensation points the expected profit is the sum Σ*rw*, the probability to obtain one of these possibilities is *p* - otherwise the student fails the NTLT immediately.

Sitting the test for the first time therefore results in the following expected cost, the first investment *t* minus the expected profit - being a negative cost - :

*t* - Σ*rw* = *t* - *a* + (*a* - Σ*rw*).

The **constant cost** case in failing the NTLT then simply is, letting *k* be this constant cost

*qk*.

Therefore, in the constant cost case the expected cost, given strategy *t*, is

*t* - Σ*rw* + *qk*.
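As a minimal sketch of the constant cost formula, the following Java fragment evaluates *t* - Σ*rw* + *qk* for a short list of passing outcomes. The class name, the method names, the arrays `r` and `w`, and all numbers are hypothetical illustrations; they are not taken from the spa_model code. The sketch assumes that the probabilities *r* sum to the pass probability *p*, so that *q* = 1 - Σ*r*.

```java
// Illustrative sketch only; names and numbers are assumptions, not the spa_model source.
final class NtltConstantCost {

    // Sum of r[i] * w[i]: the expected profit on the LT over all passing outcomes.
    static double expectedProfit(double[] r, double[] w) {
        if (r.length != w.length) {
            throw new IllegalArgumentException("r and w must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < r.length; i++) {
            sum += r[i] * w[i];
        }
        return sum;
    }

    // Expected cost t - sum(rw) + q * k, where q = 1 - sum(r) is the probability to fail.
    static double expectedCost(double t, double[] r, double[] w, double k) {
        double p = 0.0;
        for (double ri : r) {
            p += ri;
        }
        double q = 1.0 - p;
        return t - expectedProfit(r, w) + q * k;
    }

    public static void main(String[] args) {
        double[] r = {0.5, 0.25}; // probabilities of two passing outcomes (hypothetical)
        double[] w = {0.0, 0.5};  // profit on the LT for each outcome, in episodes
        System.out.println(expectedCost(1.0, r, w, 2.0));
    }
}
```

With these made-up numbers the pass probability is 0.75, the expected profit is 0.125 episodes, and the constant cost *k* = 2 is incurred with probability 0.25.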

If failing the test means the student has to resit the test, then the following obtains. The probability to have to resit the NTLT immediately is *q*, the associated cost therefore is, the expected profit and its probability being the same because the strategy is still *t*, but the direct investment *a* = *ct*:

*q* (*a* - Σ*rw*).

In the same way a possible second and third resit entail expected costs:

*q*² (*a* - Σ*rw*), and

*q*³ (*a* - Σ*rw*), respectively.

And so on. Recognizing the geometric sum here, the total expected cost is:

*t* - *a* + (*a* - Σ*rw*) / *p*.
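The step from the series of resit costs to the closed form can be checked numerically. The Java sketch below compares *t* - *a* + (*a* - Σ*rw*) / *p* with the sum built term by term: the first sit, plus each resit weighted by *q* raised to its rank. The class and method names and the numbers are hypothetical and serve only to illustrate the geometric sum; Σ*rw* is passed in as a single value `sumRw`.

```java
// Illustrative sketch only; names and numbers are assumptions, not the spa_model source.
final class NtltResitCost {

    // Closed form: t - a + (a - sumRw) / p, with p the probability of a pass.
    static double closedForm(double t, double a, double sumRw, double p) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must lie in (0, 1]");
        }
        return t - a + (a - sumRw) / p;
    }

    // The same quantity built term by term: first sit plus q^n-weighted resits.
    static double truncatedSeries(double t, double a, double sumRw, double p, int resits) {
        double q = 1.0 - p;
        double total = t - sumRw; // expected cost of the first sit
        double weight = q;        // probability of having to take the n-th resit
        for (int n = 1; n <= resits; n++) {
            total += weight * (a - sumRw);
            weight *= q;
        }
        return total;
    }

    public static void main(String[] args) {
        double t = 1.2;      // first-sit investment in episodes (hypothetical)
        double c = 0.5;      // resit investment as a fraction of t (hypothetical)
        double a = c * t;    // resit investment a = ct
        double sumRw = 0.1;  // expected profit on the LT (hypothetical)
        double p = 0.8;      // probability of a pass (hypothetical)
        System.out.println(closedForm(t, a, sumRw, p));
        System.out.println(truncatedSeries(t, a, sumRw, p, 60));
    }
}
```

The two printed values agree up to rounding, because the neglected tail of the series shrinks with the factor *q* at every further resit.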

**What about Σ*rw*?**

The evaluation of Σ*rw* can be straightforward; an option in the program will use the straightforward method. The special thing about this sum, however, is that it looks like the summing involved in expected utilities. Here we discover that it is possible to construct a realistic, not just a formal, utility function for the NTLT, and use it to substitute expected utility in the formula just derived. There is much more to say about this - in fact this state of affairs is one of the special characteristics of the spa model - in other places and chapters.

How to handle this Σ*rw*? The *w* are expected profits, expressed in terms of episodes, and therefore on the same scale as investment in preparation of the NTLT; they have been fully treated already in module 7 on the LT, paragraph 1, see also the paragraphs 8 and 9 (the code, and testing the code). They are not utilities. However, it takes but one or two steps to make them into utilities, and therefore the sum into an expected utility: add the constant optimal - i.e., minimal - expected cost *o* to pass the NTLT on its reference score - i.e. not bothering about earning compensation points, either negative or positive ones - with failing scores having utility zero:

E(*u*) = Σ*r* (*w* + *o*) = Σ*rw* + Σ*ro* = Σ*rw* + *o* Σ*r* = Σ*rw* + *op*,

because the sum of the probabilities to score particular compensation points on the NTLT - including the zero compensation points case - equals the probability of a pass, *p*:

Σ*r* = *p*.
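The identity E(*u*) = Σ*rw* + *op* can be illustrated with a few lines of Java. The sketch below builds the expected utility from real utilities *w* + *o* for the passing outcomes (failing outcomes contribute zero) and then recovers Σ*rw* as E(*u*) - *op*. The class and method names, the arrays and the numbers are hypothetical; a negative *w* stands for the loss on the LT caused by a negative compensation point.

```java
// Illustrative sketch only; names and numbers are assumptions, not the spa_model source.
final class RealUtilityCheck {

    // Expected utility over the passing outcomes, each with real utility w[i] + o.
    static double expectedUtility(double[] r, double[] w, double o) {
        double eu = 0.0;
        for (int i = 0; i < r.length; i++) {
            eu += r[i] * (w[i] + o);
        }
        return eu;
    }

    // Pass probability p as the sum of the outcome probabilities r[i].
    static double passProbability(double[] r) {
        double p = 0.0;
        for (double ri : r) {
            p += ri;
        }
        return p;
    }

    // Recover sum(rw) from the expected utility: sum(rw) = E(u) - o * p.
    static double sumRwFromUtility(double eu, double o, double p) {
        return eu - o * p;
    }

    public static void main(String[] args) {
        double[] r = {0.125, 0.5, 0.25}; // one negative, zero, one positive point (hypothetical)
        double[] w = {-0.25, 0.0, 0.5};  // profits on the LT in episodes (hypothetical)
        double o = 1.0;                  // optimal expected cost on the reference score
        double eu = expectedUtility(r, w, o);
        System.out.println(eu);
        System.out.println(sumRwFromUtility(eu, o, passProbability(r)));
    }
}
```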

This Σ*rw* + *op* can be scaled in the usual way for utilities - dividing by *o*, an operation that is not strictly necessary in the search for the optimal strategy on the NTLT in compensation cases. The thumbnail, by the way, gives a plot of the scaled real utility function.

Unscaled the formula now becomes, in the **resit case**:

*t* - *a* + (*a* - Σ*rw*) / *p*

= *t* - *a* + (*a* - (E(*u*) - *op*)) / *p*

Its scaled form:

*t*/*o* - *a*/*o* + (*a* - (E(*u*) - *op*)) / (*op*)
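Written out as a worked equation, the scaling amounts to dividing every term of the resit formula by *o*. The following LaTeX block only restates the formulas of this paragraph in the same symbols; the last line is a simplification that follows by splitting the fraction.

```latex
\begin{aligned}
\frac{1}{o}\left[\, t - a + \frac{a - \bigl(E(u) - op\bigr)}{p} \,\right]
  &= \frac{t}{o} - \frac{a}{o} + \frac{a - \bigl(E(u) - op\bigr)}{op} \\
  &= \frac{t}{o} - \frac{a}{o} + \frac{a - E(u)}{op} + 1 .
\end{aligned}
```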

In the **constant cost** case the formula is

*t* - Σ*rw* + *qk*

= *t* - E(*u*) + *op* + *qk*.

Be aware, however, that the constant *o* in general will not have the same value in the constant cost case as in the resit case.

It is possible to quantify the strategy curve - have I defined it yet? - in the NTLT case because the number of combinations of scores on the NTLT and the LT allows it. That situation changes rapidly for tests earlier in the course or examination. The idea now is to approximate the strategy curve for *any* test earlier in the series by substituting the one that is valid for the NTLT. One thing we can be sure of is that the approximation never will be perfect. The question therefore is, will it be good enough?

I conjecture that in many situations the strategy for *any* test in a series will be approximately equal to the one obtained by assuming it is the Next-To-Last Test. In due time I will make this statement a probable one by producing adequate examples.

The basic questions about test preparation strategies have been treated in the last chapter. This chapter has to deal with the complications arising by departing from testing under the pure threshold utility regime. In an important way, the topic is known territory, see for example the publications of Frederick Lord in the early sixties of the last century; in using a test battery it is unwise - suboptimal, uneconomical - to use sharp cutoffs on each of the composing tests. The scoring should allow higher results on some tests to compensate for lower ones on others, at least to a certain extent.

Granted that some combination rule for test results is needed, the next question is how to weigh different results. The position chosen in the SPA model is, for the time being, to assume equal weights. An exception has been made for the individual test being composed by stratified random sampling of test questions from two subdomains. Indeed the construction of a test by choosing its items from subdomains is also a combination rule problem. Weighing test results differentially is a sensible option only if differences are appreciable, corresponding weights then need not be finely graded; they better be crudely graded for transparency reasons (see below). Representative for the literature on this issue is Dawes' (1977) 'The robust beauty of improper linear models in decision making.'

Even allowing lots of compensation points formally, many students will eventually find themselves in a situation of threshold loss for the Last Test in the course or examination. For some students their situation is serious in the sense that they must take an appreciable risk to fail the Last Test. The position taken in regard to the ethics of this situation is the following. Decisions to fail students scoring a small number of points short of the cutoff more often than not cannot be underpinned by valid arguments concerning the content of the test, no matter how valid the test may be known to be. Instead, the line of argument should be that this is the way instruction and assessment necessarily are designed, students should understand so and they will have to bear the risk of failing their tests themselves. To empower them to do so, teachers - the institution - should offer complete transparency about every upcoming test, the kind of questions to be expected, the way they will be scored and the results will be valued. The transparency principle was explicitly formulated by Adriaan de Groot (1970) and elaborated by Job Cohen (1982) (now the mayor of Amsterdam). Dawes (o.c.) also is very perceptive about the ethics of deciding on the basis of numbers instead of 'clinical' information gathered in interviews.

In the case of the Next-To-Last Test an important assumption about the Last Test to follow is that in all significant respects, except its content, it is equal to the NTLT. In particular the 'starting position' on the Last Test is assumed to be the same number correct out of the same number of preliminary test items, after having studied one episode. Note that in this particular instance the one episode for the NTLT is assumed to be physically the same length as the one episode for the LT. The assumption is in no way really restrictive on the model; it is a question of pragmatics in order to get definite results.

In due time the program will offer options for the LT to differ in significant characteristics from the NTLT.

The SPA model is constructed in such a way that it gives the impression that it applies only to situations where real tests are used, even though they may be teacher-made. There is, however, no reason why the model could not be used in situations of more informal assessment of students. To see this, note that the model allows the analysis of situations where some tests are split into a number of partial tests. In fact, a series of tests comprising a particular course is just such a case. Now if it is possible to do this once, it can be done again and again, ultimately arriving at the assessments made 'in real time' in instructional situations. This point is of tremendous import, because it allows assessment to be brought back to where it belongs: in the instructional process itself. Assessment should be instructionally valid, which is not the same as being equitable (as the layman would say) or reliable (in the psychometrician's jargon). Placing the emphasis on equity creates a drain of scarce resources now being used on testing or scoring essays instead of giving students proper feedback in instruction, in 'real time.'

The NTLT module effectively completes the strategic model as far as individual students are concerned. The model will allow evaluation of the effectiveness of different rules for the combination of test scores: how much compensation will be allowed, will negative compensation be allowed or positive compensation only, etcetera.

More complex models will be built, incorporating competition between students as well as (implicit) negotiation between (the body of) students and faculty (see my 1992 papers). Another interesting question is how individual students' results on the series of tests comprising a course or an examination can be characterised, depicted, or modelled, and how these individual results can be aggregated to characterize group results.

Basically, the problem situation is what the student's strategic position is in the face of the way separate tests in an examination will be combined to determine the exam outcome. An early study on the 'reliability' of examinations is that of Valentine (1932). His choice of terminology shows he is using psychological *measurement* as his framework. His basic attitude is that examinations might not be fair to students, and he uses empirical data on entrance examinations, school certificate examinations, and scholarship awards to show them to be unfair. It is implied in this work that students in many situations do not have fair strategies available to influence their exam result. However, Valentine does not attempt to formalize or model the strategic position of students having to sit exams. It is the work of Cronbach and Gleser (1957) that opens up possibilities to model the strategic positions of examinees; Van Naerssen (1970) is one of the very first to try his hand at the problem. Nevertheless, Van Naerssen was not able to get a handle on the combination problem, other than using traditional psychometric analyses. In 1978 I tried my hand at the combination problem, and it proved too early to get a solution: the second volume on student strategies never materialized.

This closing part of the SPA model is still under development. In the project's history it is a rather late (2005) addition, and it has not been published yet. In a rather crude form an optimization technique was developed in 1998 and tried on an extensive data set obtained from first year law students. The early operationalization was not quite satisfactory, however, because it makes use of the amount of preparation time that already has been invested. In other words: decisions to invest yet more preparation time come to depend on the amount of time that has been invested earlier; the more that has been invested earlier, the later the optimal stopping moment will be placed. Using this formula a firm might go bankrupt early, but a student's undertaking is not a firm, and conservative strategies will do little or no harm in this kind of educational setting. Yet the quest is after techniques or algorithms that could be used alternatively.

The current solution in a preliminary form was developed - discovered is maybe a better term - in March 2005. In August 2005 the model was revised on this point, and the LT and NTLT cases were clearly separated. Of course the strategy module is crucial to the SPA-model, so I will be somewhat candid about the construction and algorithm for the optimal strategies, until publication.

Already in 1995 a partial solution to the optimization problem was found in the indifference curve technique (see my 1995), making it possible to find the optimal distribution of available time in the preparation for *two* tests the student has to sit the same day. The method does not, however, help to find optimal strategies in the case of preparation for one test only.

A logical error in the model itself in August 2005 wrongly suggested that negative and positive compensation worked out quite differently as regards the optimal strategies that are available to the student. Further research triggered by this faulty finding eventually led to the development early in September of the second generation of utility functions, and subsequent correction of the logical error mentioned earlier. The problem in the development of the SPA model is that it is a journey in a completely new territory. Faulty results do not stand out as such, because every result first looks somewhat strange and unexpected. The second generation utility curve, however, brings us back to the world of mainstream decision-making models, the special point in the SPA model now being the way the second generation utility functions can be constructed.

*schematic*

- [Strategy class] [Call method getNTLTeC_OptLT to] Get the vector of the optimal strategies as well as that of the optima on the LT corresponding to all possible results on the NTLT. [*Stratified sampling* has not yet (Mar 2006) been fully implemented here]
- [Expectations class] [method getNTLTeC_OptLT]
  - [Call get_OptLTExtended] The first thing here is to get the **optLT** array of optima on the LT. This array and the code to produce it have been treated in some detail in the foregoing chapter on the LT. Now for every possible result on the NTLT the corresponding optimal strategy on the LT as well as its expected cost are available.
  - [Call getExpUtilityF] Get the expected utility function using *real utilities*.
  - *else* (option 820) construct the array **eUU** containing expected utility curves for every possible compensation score on the NTLT regarded as cutoff score, i.e. the probabilities to pass on that cutoff score
    - given the particular compensating score
      - substitute it in the parameter vector **pn** where both compensation parameters have been set equal to zero
      - get the expected utility function, using the formal utility curve; these expected utilities are probabilities to pass the test on the cutting score (= this compensating score)
      - substitute the expected utility function in the array eUU
  - For all (fractional) episodes (and some more, to prevent pseudo-optima on the LT) get the value of the optimal strategy given this investment **eC[ time ]**,
    - using the real utility function routine
    - *or* using the option 820 straightforward routine, in which the sum of the probabilities of expected profits on the LT is split up in a series of different steps
      - get the probability *eUs_Last* of obtaining the highest compensation: **eUs_Last = eUU[ compNeg + compPos ][ time ]**
      - add the product of this probability with the expected profit on the LT in the sum *ew*: **ew += eUs_Last * ( optLT[ 0 ][ compPos ] - optLT[ 0 ][ 0 ] )**. The expected profit on the LT equals the difference between the expected total investment under the optimal strategy and the original reference on the LT (the first term) and the same under the lower cutoff corresponding to maximum compensation (the second term). The order in optLT: the optimum corresponding to the *highest* compensating score on position 0. Remember: the order in eUU: *lowest* compensating score, i.e. the highest negative one if there is such a score, on position 0.
      - for the other *positive* compensation points, for each get its probability and multiply it by the expected profit on the LT, add in ew
      - get the probability *eUs_Last* of obtaining at least the reference score on the NTLT: **eUs_Last = eUU[ compNeg ][ time ]**
      - for the *negative* compensation points, for each get its probability and multiply it by the expected profit on the LT, add in ew
- [Strategy class] [For the record] Thumbnail plot of utility functions, labels, values and other auxiliary items, function values if such an option has been chosen, etcetera. Declarations of variables etcetera.
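The straightforward (option 820) summation in the schematic above can be sketched in Java as follows. This is not the spa_model source: the class name, the method signature and the numbers are hypothetical, and `optLT0` stands for the row `optLT[ 0 ]` of the schematic. The sketch also rests on one explicit assumption, namely that each column of `eUU` is cumulative (the probability to score at least the given compensating score), so that the probability of one particular outcome is the difference between two adjacent entries.

```java
// Illustrative sketch only; names, signature and numbers are assumptions, not the spa_model source.
final class ExpectedProfitSketch {

    /*
     * eUU[k][time]: probability to score at least compensating score k on the NTLT;
     *               k = 0 is the lowest (most negative) score, k = compNeg the reference,
     *               k = compNeg + compPos the highest score.
     * optLT0[j]:    optimal expected cost on the LT; j = 0 belongs to the highest
     *               compensating score, j = compPos to the reference (no compensation).
     */
    static double expectedProfit(double[][] eUU, double[] optLT0,
                                 int compNeg, int compPos, int time) {
        int top = compNeg + compPos;
        double ew = 0.0;
        double pAbove = 0.0; // probability of a strictly higher compensating score
        for (int k = top; k >= 0; k--) {
            double pAtLeast = eUU[k][time];
            double pThisOutcome = pAtLeast - pAbove;           // assumes eUU is cumulative
            double profit = optLT0[compPos] - optLT0[top - k]; // zero for the reference score
            ew += pThisOutcome * profit;
            pAbove = pAtLeast;
        }
        return ew;
    }

    public static void main(String[] args) {
        // One negative and one positive compensation point, a single time index (hypothetical).
        double[][] eUU = {{0.875}, {0.75}, {0.25}};
        double[] optLT0 = {1.0, 1.25, 1.5};
        System.out.println(expectedProfit(eUU, optLT0, 1, 1, 0));
    }
}
```

For the highest compensating score the loop reproduces the step given in the schematic, the probability `eUU[ compNeg + compPos ][ time ]` times `optLT[ 0 ][ compPos ] - optLT[ 0 ][ 0 ]`; the negative compensation point contributes a negative profit.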

The following text in this paragraph is an old (2005) version, it has yet to be updated:

The NTLT without compensation opportunity presents the student with a strategic situation equal to that for the LT. It is only when some compensation is allowed that the NTLT strategy becomes special. The first observation about the strategic situation is that different numbers of compensation points result in different optimal strategies on the LT. These optima are evaluated and collected in the array optLT, containing the optima themselves as well as the initial strategies corresponding to them. The method doing so is get_OptLTExtended; the crucial part of that method is shown in Figure 1. It is deceptively simple, however, because crucial evaluations are hidden in the method get_eCForThisComp.

Figure 2 therefore presents the way the strategy curve on the LT is determined in the presence of compensation points, especially negative ones.

This applet embodies the complete model and is therefore rather complex. Its complexity makes it difficult to test the appropriate functioning of the program. Assume modules 1 through 6 to be correct.

*simulation versus evaluation*

Plotting simulation results versus the results of analysis is not really a test for this module 7. Assuming the modules 1 - 6 to function well, the routines remaining to be tested are shared among the simulation and analysis options. Nevertheless, if curious things are seen to happen, something is wrong or something is not understood well. It will be evident that the complex manipulation of the methods from modules 1 until 6 makes the results sensitive to any minor problems that still might exist in any of these modules.

I would like to have a special applet for plotting the projected likelihood or the predicted score distribution after investing x extra periods, to be able to inspect the analytical and simulated predictive distributions. For the *projected* predictive distribution that applet is available in the applets page applet 6.1 or advanced 6.1a. The picture shows simulation and analytical results using that applet (100.000 observations)

One problem inherent in the method itself should be well understood.

The analytical optimum might change as the number of episodes declared becomes smaller, which it should not do. The problem here is that the optimum on the Last Test might be a pseudo-optimum that depends on the number of episodes declared. Losing a number of compensation points on the Next-To-Last Test will force the optimum on the Last Test to the right, possibly beyond the bounds of the number of episodes declared. I have addressed the problem by evaluating the optima on the LT, in the computer program, over two times the number of episodes declared. In extreme cases this might not be sufficient to prevent wrong results.
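The pseudo-optimum can be made visible with a small Java sketch. It uses a made-up expected cost function (the same kind of toy as one might use to mimic an LT strategy curve; it is an assumption, not the model's evaluation) and searches a grid of investments for the minimum. When the minimum sits on the last grid point, the search range has probably cut off the true optimum. All names here are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Toy illustration only; the cost function is an assumption, not the model. */
class PseudoOptimumSketch {

    /** Hypothetical expected cost for investment x and comp compensation points. */
    static double toyExpectedCost(double x, double comp) {
        double pFail = Math.min(1.0, Math.exp(-2.0 * x - 0.5 * comp));
        return x + 4.0 * pFail;
    }

    /** Grid index of the minimum of cost over 0, step, ..., nGrid * step. */
    static int argminIndex(DoubleUnaryOperator cost, int nGrid, double step) {
        int bestIndex = 0;
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i <= nGrid; i++) {
            double c = cost.applyAsDouble(i * step);
            if (c < best) {
                best = c;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    /** A minimum on the last grid point is suspect: the range may be too short. */
    static boolean isAtUpperBound(int index, int nGrid) {
        return index == nGrid;
    }

    public static void main(String[] args) {
        double step = 0.01;
        int declared = 100; // grid points for the declared number of episodes (1.0)
        for (int comp : new int[] {-2, -6}) {
            final int c = comp;
            DoubleUnaryOperator cost = x -> toyExpectedCost(x, c);
            int single = argminIndex(cost, declared, step);
            int doubled = argminIndex(cost, 2 * declared, step);
            System.out.printf("comp=%d  declared range: x=%.2f (at bound: %b)  "
                            + "doubled range: x=%.2f (at bound: %b)%n",
                    comp, single * step, isAtUpperBound(single, declared),
                    doubled * step, isAtUpperBound(doubled, 2 * declared));
        }
    }
}
```

With two compensation points lost, the toy optimum falls outside the declared range but inside the doubled range. With six points lost it falls outside the doubled range as well, which mirrors the extreme case mentioned above. A boundary check of this kind could at least flag such results as unreliable.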

Martin J. Beckman (1972). Decisions over time. In C. B. McGuire and R. Radner (Eds). **Decision and organization. A volume in honor of Jacob Marschak.** Amsterdam: North-Holland.

The strategy over a series of tests unfolds itself in time. It is probably possible to develop models for longer series of tests than the series of two treated in the SPA-model. The question is whether solving the complexities involved will bring sufficient results. The chapter by Beckman allows a first estimate of the success more complex models might have.

Buehler, R., D. Griffin and M. Ross (1994). Exploring the 'planning fallacy': Why people underestimate their task completion times. *Journal of Personality and Social Psychology*, *67*, 366-381.

Gretchen B. Chapman: Sooner or later: The psychology of intertemporal choice. In Douglas L. Medin (Ed.) (1998). **The psychology of learning and motivation. Advances in research and theory** (p. 83-113). Academic Press.

- Is this relevant to the SPA-model? The very first sentence seems to affirm the question. "Exercising, studying for an exam, and investing for retirement are everyday activities that involve a trade-off between short-term and long-term consequences."
- A good list of references.
- This publication is referred to in Anne-Sophie Melenhorst (2002). Adopting communication technology in later life: The decisive role of benefits. Dissertation Technical University of Eindhoven. pdf, chapter 5: Postponement perceived as a factor of uncertainty.

Cohen, M. J. (1981). **Studierechten in het wetenschappelijk onderwijs.** Zwolle: Tjeenk Willink. [Student's rights in education]

Cosmides, Lea, and John Tooby (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. *Cognition*, *58*, 1-73. pdf

Covington, Martin V. (1992). **Making the grade: a self-worth perspective on motivation and school reform.** Cambridge: Cambridge University Press.

Dawes, Robyn M. (1979). The robust beauty of improper linear models in decision making. *American Psychologist*, *34*, 571-582.

**abstract** Proper linear models are those in which predictor variables are given weights in such a way that the resulting linear composite optimally predicts some criterion of interest; examples of proper linear models are standard regression analysis, discriminant function analysis, and ridge regression analysis. Research summarized in Paul Meehl's book on clinical versus statistical prediction - and a plethora of research stimulated in part by that book - all indicates that when a numerical criterion variable (e.g., graduate grade point average) is to be predicted from numerical predictor variables, proper linear models outperform clinical intuition. Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors. In fact, unit (i.e., equal) weighting is quite robust for making such predictions. The article discusses, in some detail, the application of unit weights to decide what bullet the Denver Police Department should use. Finally, the article considers commonly raised technical, psychological, and ethical resistances to using linear models to make important social decisions and presents arguments that could weaken these resistances.

Dawes, Robyn M. and Bernard Corrigan (1974). Linear models in decision making. *Psychological Bulletin, 81*, 95-106.

Dixit, Avinash K., and Robert S. Pindyck (1994). **Investment under uncertainty.** Princeton, Princeton University Press.

Groot, A. D. de (1970). Some badly needed non-statistical concepts in applied psychometrics. *Nederlands Tijdschrift voor de Psychologie, 25*, 360-376.

Kamel Jedidi and Rajeev Kohli (2003). Probabilistic subset-conjunctive models for heterogeneous consumers. *Journal of Marketing Research* (forthcoming) pdf or pdf info

Jones, H. (1975). **An introduction to modern theories of economic growth**. Walton-on-Thames: Nelson.

Kahneman, D and A. Tversky (1982). The simulation heuristic. In Kahneman, D., P. Slovic, & A. Tversky (Eds) (1982). **Judgment under uncertainty: heuristics and biases.** London: Cambridge University Press.

David A. Lagnado and Steven A. Sloman (2004). Inside and outside probability judgment. In Koehler and Harvey: **Blackwell handbook of judgment and decision making.** (p.157-176) Blackwell Publishing.

Larrick, R. P., R. E. Nisbett and J. N. Morgan (1994). Who uses cost-benefit rules of choice? *Organizational Behavior and Human Decision Processes*, *56*, 331-347.

Linden, W. J. van der, and H. J. Vos (1996). A compensatory approach to optimal selection with mastery scores. *Psychometrika*, *61*, 155-172.

Lord, F. M. (1962). Cutting scores and errors of measurement. *Psychometrika*, *27*, 19-30.

Luenberger, D. G. (1998). **Investment science**. Oxford: Oxford University Press.

Mellenbergh, G. J., and W. J. van der Linden (1979). The internal and external optimality of decisions based on tests. *Applied Psychological Measurement*, *3*, 257-273.

Naerssen, R. F. van (1970). Over optimaal studeren en tentamens combineren. Openbare les. Amsterdam: Swets & Zeitlinger. html

Pope, R. (1983). The pre-outcome period and the utility of gambling. In B. P. Stigum and F. Wenstop, **Foundations of utility and risk theory with applications**. Dordrecht: Reidel. 137-177.

Schouwenburg, H. (1993): **Uitstelgedrag bij studenten**. Proefschrift Rijksuniversiteit Groningen.

C. W. Valentine (1932). **The Reliability of Examinations. An Enquiry.** London: University of London Press.

- This truly is a study on examinations as combinations of separate tests and papers. It is not about the reliability of grading individual papers.

Ben Wilbrink (1978). **Studiestrategieën**. Examenregeling deel A. Amsterdam: COWO (docentenkursusboek 9). 800k pdf

Ben Wilbrink (1980). **Uitval en vertraging in het w.o.: een overschat probleem.** Onderzoek van Onderwijs, 9 nr 4, 14-18. html

Ben Wilbrink (1992). **The first year examination as negotiation; an application of Coleman's social system theory to law education data**. In Tj. Plomp, J. M. Pieters and A. Feteris (Eds.), **European Conference on Educational Research** (pp. 1149-1152). Enschede: University of Twente. Paper: author. doc

Ben Wilbrink (1995). **A consumer theory of assessment in higher education; modelling student choice in test preparation**. 6th European Conference for Research on Learning and Instruction, Nijmegen. Paper: author. html

Michael Yee, Ely Dahan, John R. Hauser and James Orlin (2005). Greedoid-based non-compensatory consideration-then-choice inference. pdf

Robert L. Bangert-Drowns, James A. Kulik and Chen-Lin C. Kulik (1992). Effects of frequent classroom testing. *Journal of Educational Research, 85*, 89-99.

- meta-analysis, typically American situations, remediation taken after testing is a crucial factor here.
- Susan M. Brookhart (2004). Classroom Assessment: Tensions and Intersections in Theory and Practice. *Teachers College Record, 106*, Page 429 abstract
- Khalaf, A. S. S. & Hanna, G. S. (1992). The impact of classroom testing frequency on high-school-students achievement. *Contemporary Educational Psychology, 17*(1), pp. 71-77. [I have not seen this one]
- Kika, F. M., McLaughlin, T. F., & Dixon, J. (1992). Effects of frequent testing of secondary algebra students. *Journal of Educational Research, 85*, 159-162. [I have not seen this one]
- Bernhard Jacobs (www 2003) Lerneffekte häufigen Testens in pädagogischen Umwelten html
- D. William Deck (1998). **The effects of frequency of testing on college students in a principles of marketing course.** Dissertation. pdf
- W. J. Haynie, III (1992). Effects of multiple-choice and short-answer tests on delayed retention thinking. JTE html

Robert L. Bangert-Drowns, James A. Kulik and Chen-Lin C. Kulik (1983). Effects of coaching programs on achievement test performance. *Review of Educational Research, 53*, 571-585.

- "In the typical study, the effect of coaching was to raise achievement test scores by .25 standard deviations."
- Powers and Camara (1999). Coaching and the SAT I. Research notes, Office of Research and Development, RN-06 pdf

William E. Becker and Sherwin Rosen (1992). The learning effect of assessment and evaluation in high school. *Economics of Education Review, 11*, 107-118.

**from the abstract** We show that competition between students does stimulate academic effort provided students are appropriately rewarded for achieving.

Mail your opinion, suggestions, critique, experience on/with the SPA

July 10, 2006
contact ben at at at benwilbrink.nl

http://www.benwilbrink.nl/projecten/spa_strategist.htm