Applet 4, The Ruling, will be extended with the special (formal) utility curve of the Last Test (LT) for the case where some negative points have yet to be compensated for. This is planned for the last week of December, 2005. This extension will have no consequences for the strategy curves in applets 8 and 9, because they are evaluated using pure threshold curves at different score levels on the LT.
(Advanced) applets 8 and 9, as well as the chapters, are currently (Dec. '05) under revision; the current routines have not yet been tested adequately.
The guessing parameter is not yet available in applets 3 and higher.
The proper working of stratified sampling in the higher applets has not been checked for a long time now; do not be surprised to get rubbish as results.
If you come across an applet that is not functioning properly, please mail me. It is not always possible to check all applets for unintended consequences of changes in classes. As this is a project in progress, such changes are made on a routine basis.
Applets are known to work correctly under:
Internet Explorer under Windows XP
Firefox 1.0.7 under Windows XP
Safari 1.2 under MacOSX 10.3.9
FireFox 184.108.40.206 under MacOSX 10.3.9
It might be the case that the applets do not open properly in browsers under Windows, or in browsers other than Safari under MacOS X: the applet field remains gray or blank. Be sure that Java is allowed. It might also be necessary to allow pop-ups! (see the preferences menu of your browser)
In module chapters original applets have been replaced by screenshots; therefore applet problems should not hinder readers of the SPA project. Readers not able to use the applets in their browser, and yet willing to do so, may contact me, if preferences of the browser pertaining to Java do not seem to be the problem.
Information about Java, and applets in particular:
MacOS X: There is a problem with Java version 1.4 in browsers other than Safari. See http://javaplugin.sourceforge.net/Readme.html; http://developer.apple.com/documentation/Java/Conceptual/Java131Development/deploying/chapter_3_section_5.html; simile.mit.edu/repository/misc/java_embedding_plugin/readme.rtf
MacOS X: Opera, version 8.5, produces 'java.lang.UnsupportedClassVersionError: Spa_BinomialApplet (Unsupported major.minor version 48.0)'
MacOS X: Internet Explorer 5.2 for Mac, [preferences: enable Java on; cookies: never ask; web content: enable plug-ins on] produces 'java.lang.UnsupportedClassVersionError: Spa_BinomialApplet (Unsupported major.minor version 48.0)'
Windows: The Java Applet plug-in makes it possible for your computer (including Windows® XP, Me, NT, 2000, 98, or 95) to run applets in your browser. http://www.mcdonalds.com/search/help/plug_play/sunmicro.html
Clicking the start button will start a new simulation and/or analysis using the values specified in the menu.
The simulation simulates item scores for the specified number of observations or cases; the resulting distribution of testscores will be plotted as a solid green bar diagram.
The analysis evaluates a probability distribution function called the binomial distribution; the distribution plotted is what this mathematical distribution will look like if the number of students equals the number of observations declared in the menu.
Mastery may be chosen between 0 and 1; mastery as a concept is defined on the set of test items that every concrete test is supposed to have been sampled from.
The number of test items is the length of the test; this number is not limited.
The number of observations or runs is the number of runs for the simulation, and/or the number of cases that is represented in the analytical distribution.
The reference point is a particular test score that is of interest to the user of the applet. It might be the cutoff score in pass-fail testing, the minimum score that is still considered to be a level of sufficient mastery, etcetera. It is marked by a solid vertical in the plot.
The horizontal scale is that of test scores obtained, the lowest score being 0. The vertical scale will be chosen automatically and depends on the highest value of the analytical distribution; this point is tick-marked.
The statistics reported are the mean and standard deviation for every distribution. Because the theoretical value of the binomial statistics assumes the number of observations to be infinite, there will be small differences between the actual statistics and the statistics obtained by evaluating their formulas. Simulation and analytical statistics will differ from each other, of course; the differences will be smaller the larger the number of observations chosen.
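The relation between the simulated and the analytical distribution can be sketched as follows. This is only an illustration in Python, not the applet's own routine; the function names are hypothetical, and the simulation shown is the plain item-by-item variant.

```python
import math
import random

def binomial_pmf(k, n_items, mastery):
    # Probability of exactly k items correct on a test of n_items.
    return math.comb(n_items, k) * mastery**k * (1 - mastery)**(n_items - k)

def analytical_counts(n_items, mastery, n_obs):
    # Binomial distribution scaled to the declared number of observations.
    return [n_obs * binomial_pmf(k, n_items, mastery) for k in range(n_items + 1)]

def simulate_counts(n_items, mastery, n_obs, rng):
    # Simulate item scores for every case and tally the test scores.
    counts = [0] * (n_items + 1)
    for _ in range(n_obs):
        score = sum(1 for _ in range(n_items) if rng.random() < mastery)
        counts[score] += 1
    return counts

def mean_sd(counts):
    # Mean and standard deviation of a score distribution given as counts.
    total = sum(counts)
    mean = sum(k * c for k, c in enumerate(counts)) / total
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(counts)) / total
    return mean, math.sqrt(var)

analytic = analytical_counts(20, 0.6, 1000)
simulated = simulate_counts(20, 0.6, 1000, random.Random(1))
print(mean_sd(analytic), mean_sd(simulated))
```

Raising the number of observations should bring the simulated mean and standard deviation closer to the analytical ones.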
In the following it will be assumed that menu items from earlier modules are known. Look back if you are not sure about the meaning of one or another menu item not mentioned.
The information available to the student is represented as the number correct on a preliminary test of chosen length. The assumption here is that the results on a preliminary test are known immediately after sitting the test. The number correct is the score obtained on the preliminary test or pretest; the next menu item is the number of items in the preliminary test. The preliminary test is assumed to be a random sample from the same knowledge domain that the summative test will be sampled from.
The preliminary test score may be used as a factual device to inform the student. Alternatively, the student may express the information available to her as the number correct on a virtual preliminary test of a certain length. Test length represents the strength of the information available.
If the information includes guessing, the assumed probability to guess the right answer on items not known can be specified. The likelihood evaluated or simulated then will be the likelihood of mastery itself. It is possible, of course, to keep the guessing probability at zero, knowing that it is in fact substantial; the likelihood then is the likelihood of the combination of mastery and guessing equalling the horizontal value that still will be called 'mastery.'
The number of mastery grid bars determines the 'grain' of the likelihood: the higher the number, the smoother the analytical plot will look, and the longer the simulation will take.
The horizontal scale is that of mastery, running from almost zero to almost one. The scale is divided into bars of equal width; their number is the mastery grid. In the pictured case the grid is 100, so the rightmost bar represents mastery 0.995.
The options option is offered here for convenience. See the 'new menu items' rubric under the advanced applet for the possibilities offered.
The vertical scale by definition of the concept of a likelihood runs from one to zero. In many cases the simulation will seem to lie below the analytical distribution, because for the simulation also the highest likelihood will be set equal to one.
The mean and standard deviation of the likelihoods are reported, as well as the mastery value that has the maximum likelihood, and the probability that given this mastery the test score will equal the number correct as declared by the user.
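A minimal sketch of the analytical likelihood on a mastery grid, assuming that each bar is represented by its midpoint (which gives 0.995 for the rightmost of 100 bars) and that guessing enters as m + (1 - m) * r. The function names are hypothetical; the applet's own code may differ.

```python
import math

def mastery_grid(n_bars):
    # Midpoints of n_bars equal-width bars on the interval (0, 1).
    return [(i + 0.5) / n_bars for i in range(n_bars)]

def likelihood(n_correct, n_items, n_bars=100, guess=0.0):
    # Binomial probability of the pretest score for every grid value of mastery.
    grid = mastery_grid(n_bars)
    raw = []
    for m in grid:
        p = m + (1 - m) * guess  # assumed guessing model
        raw.append(math.comb(n_items, n_correct)
                   * p**n_correct * (1 - p)**(n_items - n_correct))
    peak = max(raw)
    # Scale so that the highest likelihood equals one.
    scaled = [v / peak for v in raw]
    best_mastery = grid[raw.index(peak)]
    return grid, scaled, best_mastery, peak

grid, scaled, best_mastery, peak = likelihood(12, 20)
print(best_mastery, peak)
```

Here `best_mastery` is the grid value with maximum likelihood, and `peak` is the probability that, given this mastery, the test score equals the declared number correct.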
The number of test items is the number of items in the test that stands to be predicted.
The reference is a score level that in some sense is critical. With pass-fail scoring it is the cutoff score. The other new menu items will be explained in the next module on utility functions. The values chosen here make the situation one of pass-fail scoring, in which case the expected utility equals the probability of passing the test on the first occasion: the probabilities are reported as expected utilities (expU) below the figure.
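For the pass-fail case, the prediction can be sketched as below. The sketch assumes that the predictive distribution is the mixture of binomial distributions weighted by the normalized likelihood on the mastery grid, and that there is no guessing; these are assumptions made for illustration, and all names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def predictive_distribution(pre_correct, pre_items, test_items, n_bars=100):
    # Likelihood of every grid mastery given the preliminary test result.
    grid = [(i + 0.5) / n_bars for i in range(n_bars)]
    weights = [binom_pmf(pre_correct, pre_items, m) for m in grid]
    total = sum(weights)
    weights = [w / total for w in weights]  # assumed normalization to weights
    # Mix the binomial score distributions over the mastery grid.
    return [sum(w * binom_pmf(k, test_items, m) for w, m in zip(weights, grid))
            for k in range(test_items + 1)]

def pass_probability(predictive, reference):
    # Threshold utility: one at or above the reference, zero below it.
    return sum(predictive[reference:])

pred = predictive_distribution(pre_correct=12, pre_items=20, test_items=40)
print(pass_probability(pred, reference=22))
```

With threshold utility the value printed is both the probability of passing and the expected utility.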
Threshold utility - in pass/fail scoring - is the extreme case where no compensation is allowed.
Full compensation is the case where all points earned will weigh equally. The reference point in the full compensation case is forced to be half the maximum score. Typically full compensation is formally valid only, except on the very first test to be taken, because points already earned or lost leave the factual room for compensation smaller than the formal rule suggests. Of course, in strategic modelling it is the factual utility that should be used, not the formal one.
The reference point is the critical score level on the test. In pass-fail scoring it would be the cutoff score.
If compensation is allowed, the unit might be any number of items. The item group is the number of items making a difference of one compensating point. The working is not symmetrical: if the group is two, scoring one item below the reference is sufficient to earn a negative point, whereas to earn a positive compensation point one must get at least two additional items correct. Grading is a common way of grouping scores.
Compensate negative is the number of points - not items! - short of the reference score that the student is allowed to compensate by higher points on later tests. This factual level of compensation allowed may differ - will most of the time differ - from the level of negative compensation that formally is available on this test. If the test is the last test (LT) to be taken, positive compensation earned earlier will be used to lower the reference score on the last test as far as its formal level of negative compensation allows, making it a lot easier to pass the LT.
Compensate positive is the number of points - not items! - above the reference point that the student is allowed to compensate by equally lower points on later tests. The factual number of points to be earned by this student may be lower than the number formally available on this test.
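The asymmetrical grouping of items into compensation points might be sketched as follows. The rounding rule follows the group-of-two example given above; the treatment of the two compensation limits (capping positive points, and returning None when the shortfall exceeds what may be compensated) is an assumption for illustration, and the function names are hypothetical.

```python
import math

def compensation_points(score, reference, group):
    # Positive points need a full group of extra items (round down);
    # a negative point is incurred as soon as one item is missing (round up).
    diff = score - reference
    if diff >= 0:
        return diff // group
    return -math.ceil(-diff / group)

def usable_points(score, reference, group, compensate_negative, compensate_positive):
    # Assumed handling of the limits: positive points are capped, and a
    # shortfall larger than the negative allowance cannot be compensated.
    points = compensation_points(score, reference, group)
    if points < -compensate_negative:
        return None
    return min(points, compensate_positive)

for score in (7, 8, 9, 10, 11, 12):
    print(score, compensation_points(score, reference=10, group=2))
```

With the reference at 10 and groups of two items, scores 8 and 9 both cost one point, while a score of 12 is needed to earn one.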
The vertical scale should be used consistently. The convention used in the model is to assign the reference point utility one. Other conventions are possible; the advanced applet might offer some. Positive compensation points therefore result in the vertical scale being longer than one.
The parameters in the menu should allow the modelling of almost all situations one might encounter in educational assessment. In the literature, however, other utility functions have been used; the advanced applet will allow them to be used and researched.
The mastery is the mastery supposed known after learning for one period, or a particular mastery chosen from the likelihood (module two).
There is a choice of two learning models offered, the accumulation model and the replacement model, described in Mazur and Hastie, 1978. Together they cover most of the learning of basic elements of knowledge. In actual practice not much is known about what learning models could be valid; therefore use the available models to study how sensitive strategic choices are to the kind of model that really applies (they are!).
Learning is defined on the basic elements of the knowledge domain. Test items typically involve two or more of those elements at once; the precise number is the complexity of the items in the item domain.
The vertical scale is zero to one. Learning is supposed to begin at zero mastery, its ceiling is perfect mastery, i.e. one.
The horizontal scale is fixed by the one period assumed to lie between the beginning and the mastery specified. Otherwise it may be stretched at will into the future, but in applications only the second episode will be of any import. To refine results, refine the grid by specifying a larger number of bars per episode.
Learning curves of any form might be used in the SPA model; no such options have been implemented however.
Episodes is the number of episodes (units of time) invested or to be invested in preparation for the test. The unit episode is the time invested at the time of preliminary testing. Choosing a larger number of episodes extends the prediction further into the future.
Bars/episode is the grid that the unit episodes may be divided into. Declaring a small number of episodes would otherwise result in a rather clumsy 'curve'. The grid allows the number of episodes to be real; for example, 1.1 will do for an analysis on a rather short term if the number of bars is chosen to be ten.
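The two learning models can be sketched on the episode grid as follows. The sketch assumes the hyperbolic form commonly associated with the accumulation model and the exponential form commonly associated with the replacement model, each calibrated so that mastery after one episode equals the mastery specified; the exact parametrization used in the applets is not given here, and the function names are hypothetical.

```python
def accumulation(t, mastery_one):
    # Hyperbolic curve t / (t + a), with a chosen so that the curve
    # passes through mastery_one at t = 1. Assumes 0 < mastery_one < 1.
    a = (1 - mastery_one) / mastery_one
    return t / (t + a)

def replacement(t, mastery_one):
    # Exponential curve 1 - (1 - mastery_one) ** t, also equal to
    # mastery_one at t = 1.
    return 1 - (1 - mastery_one) ** t

def learning_curve(model, mastery_one, episodes, bars_per_episode):
    # Evaluate the model at every grid point from t = 0 up to `episodes`.
    n_points = round(episodes * bars_per_episode)
    return [model(i / bars_per_episode, mastery_one) for i in range(n_points + 1)]

short_term = learning_curve(accumulation, 0.6, episodes=1.1, bars_per_episode=10)
print(short_term[-2], short_term[-1])  # mastery at t = 1.0 and at t = 1.1
```

Both curves start at zero mastery and approach one, so they differ only in how fast mastery grows beyond the first episode.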
Statistics. It would be instructive to have the means and standard deviations of the distributions plotted. This has not been implemented yet, however.
The vertical scale origin is free to choose in order to be able to adequately depict curves having optima that otherwise would plot awkwardly.
Failing the last test (LT) does not necessarily equal failing the course or examination. In fact, such is seldom the case. But failing the test does have consequences. One such consequence might be that an equivalent course unit and summative test will have to be done. The time cost of one such extra unit is the retrial prep time; an expected time equalling that needed for the LT is "1.0." Smaller values may be chosen, or larger ones, according to what the examination rules specify as consequential actions on failing the LT. [Asking for a retry on the same test - another sample of items of course - 1) most of the time is not sound educational practice, 2) costs much less retrial prep time, and 3) is therefore an invitation to sloppy test preparation. Use the model to estimate the cost of such bad practice.]
Another possible consequence of failing the LT might be to do some extra assignment of specified duration. This fixed cost must be specified in the number of periods involved, i.e. physical duration must be translated into the 'personal time' of the student. Fixed cost should be put at "0.0" if the retrial prep time is to be used. Technically fixed cost and retrial prep time can be used in combination.
In the upper right corner the utility function as specified is plotted; this makes it easier to check that the values have been specified correctly. The advanced applet will offer the opportunity to plot the learning and expected utility curves as well.
Remember that the case handled in module 8 is that of the Next To Last Test (NTLT). The strategy followed will result in more or less costs on the Last Test also; the 'more' or the 'less' are included in the expected cost curve for the NTLT.
Here the advanced option to plot on a second set of values is offered to be able to highlight differences between compensation scenarios.
Another advanced menu item is the option box. It may be used to shift the strategy on the LT a specified number of bars above or below the evaluated optimum strategy. Use the number 1100 and add or subtract the number of bars (maximum 99). The option serves to investigate the effects in the case of negative compensation on the NTLT; they are nearly negligible.
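As a small illustration of the option value, the helper below builds the number to enter for a given shift. It assumes that adding bars shifts the LT strategy above the optimum and subtracting shifts it below; the function name is hypothetical.

```python
def strategy_shift_option(bars):
    # Option value for shifting the LT strategy by `bars` grid bars:
    # positive = above the optimum, negative = below (assumed direction).
    if abs(bars) > 99:
        raise ValueError("the shift is limited to 99 bars")
    return 1100 + bars

print(strategy_shift_option(5))    # 1105
print(strategy_shift_option(-12))  # 1088
```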
The advanced applet will offer the opportunity to plot the LT curves as well, enabling direct comparison of LT and NTLT strategies.
Option 204 prints out function values.
Option 205 uses the beta density in the analytical case. Mean and standard deviation, however, are evaluated using the vector values (see option 206 if you want the mean and standard deviation evaluated directly from the formulas).
Option 206 prints the standard deviation evaluated according to the beta density formula, as a check on the regular results evaluated on the basis of the actual likelihood function.
Option 207 makes the binomial parameter m + ( 1 - m ) * ( r + r * m ), in other words, the guessing probability gets higher with higher mastery. It is best used with r representing the guessing probability in case the student does not yet know anything of the course content.
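The option 207 formula can be compared with a constant guessing probability in a few lines. The constant-guessing form m + (1 - m) * r is an assumption about the regular model; the function name is hypothetical.

```python
def success_probability(m, r, option_207=False):
    # Probability of answering an item correctly at mastery m
    # with guessing parameter r.
    if option_207:
        # Guessing probability r + r * m grows with mastery.
        return m + (1 - m) * (r + r * m)
    # Assumed regular model: constant guessing probability r.
    return m + (1 - m) * r

print(success_probability(0.5, 0.25))                   # 0.625
print(success_probability(0.5, 0.25, option_207=True))  # 0.6875
```

At zero mastery both forms give r, which is why r is best read as the guessing probability of a student who does not yet know anything of the course content.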
Option 208 is a fast way to produce results without guessing, even when the guessing parameter is declared to be positive.
Option 209 evaluates or simulates the likelihood using the test-level model. Because the test-level model is equivalent to the item-level model, the results will be the same as those of the item-level model. To mark the results as being produced by the test-level method, the analytic plot will be shown in blue instead of red.
Option 210 will use the full simulation method instead of the fast simulation one. It might be used when the number of runs is chosen to be rather low; in such a case the fast method might give visibly 'clustered' results, not looking very 'randomly' produced. Needless to say, the full simulation method is very much slower than the fast method.
In fact, this is the advanced applet belonging to the Expectations module 6. It contains utility and learning parameters. To use it as the advanced applet for immediate predictions, put the value(s) for the number of episodes at one.
Stratified sampling from two subdomains uses the specifications of test one and two as the specifications for domain one and two, respectively, excepting the references and the compensations. The utility function over the combined test result will use as the reference point the SUM of the two references specified. The grouping and compensation parameters of the FIRST test ONLY will define the grouping and compensation on the utility function over the combined test result.
Mail your opinion, suggestions, critique, experience on/with the SPA