In Tj. Plomp, J. M. Pieters & A. Feteris (Eds.), European Conference on Educational Research (pp. 701-704). Enschede: University of Twente.


1992 European Conference on Educational Research (ORD) Enschede abstract + summary + paper + presentation + sheets

modelling the connection between individual behaviour and macro-level outputs

Ben Wilbrink

Center for Educational Research, University of Amsterdam.

summary

Understanding grade retention, drop-out and study-delay as system rigidities.

abstract
Certain anomalies in education are extremely resistant to change, as is certainly the case for grade retention in secondary education and for attrition and study-delay in higher education. Present-day research methodology and data analysis in the educational field are not equal to the task of elucidating macro-educational phenomena of this kind. Recently Coleman (1990, Foundations of social theory. Cambridge, Massachusetts) presented a theory of social systems that connects the behavior of actors (for example students and teachers) at the micro-level with phenomena occurring at the macro-level of the social system involved. The theory of Coleman has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social (and educational) phenomena. The paper explores the possibility of applying this theory to these problems in education, using empirical data (grades and time expenditure) from the first year examinations 1985-1989 of the study of law at the University of Amsterdam.


Introduction and problem definition

Major educational problems like retention and attrition rates seem to be immune to policy measures and innovations meant to reduce these rates. Traditional methodology in the field of educational research does not yield meaningful insights into the nature of, and the causal relations behind, retention and attrition. In a famous Dutch article Posthumus (1940) presented an incisive analysis of the detrimental character of the way teachers assess and evaluate their pupils, resulting in what later came to be called “Posthumus’ Law”: no matter the changes in the educational system, teachers will label the achievements of one quarter of their pupils as not up to standard or unsatisfactory (‘onvoldoende’). In the Dutch system of secondary education the consequences are extremely serious, because only a small minority of students is not retained in at least one grade (year). This particular problem, however, is not peculiarly Dutch, as data in the Unesco study (1980, cited in Bos, 1984, p. 6-17) demonstrate. Posthumus’ analysis is as valid today, half a century later, as it was in 1940. Educational research evidently has not been 'up to standard,' failing to make any significant inroad into this problem area.

Retention and attrition at the macro-level will not change by well-meant admonition of teachers (Posthumus), nor by major educational innovations (De Groot, 1966; Bos, 1984). It is of course possible for the individual student to steer clear of these cliffs by stepping up his or her investment of time and attention. The catch is that what is possible at the individual level is not possible at the group or system level. That is what history teaches us, whether we understand it or not. Doornbos (1985) points out that teachers in their behavior are restrained by the characteristics of the system they are working in: the problem of retention is a problem at the level of the educational system as system. Posthumus’ Law describes the behaviour of this system: not only teacher behaviour, but also student behaviour (Wilbrink, 1985).

The general problem in the educational field is how individual behaviours connect to macro-educational characteristics, with the direction of influence obviously going both ways. A promising model is Coleman’s (1990) social system theory.

Coleman’s Social System Theory

In the field of economics the relation between behaviour of individual actors (at the micro-level) and phenomena at the macro-level is described by micro-economic (equilibrium) models. The market price of a certain good, say apples, is the resultant of many individual bargaining situations (between customers and merchants, or at the fruit auction). Market price is a macro-level concept. Individual customers are not able to change the market price of apples. Coleman’s observation is that social systems can be modelled as systems or markets where actors exchange resources, the rates of exchange being determined at the system-level.

The paper will give a summary of the mathematical development of the basic model (Coleman’s chapters 25 and 26). A small set of basic assumptions is translated into restrictions on the data, enabling solution for the values of the resources involved, the individual power of actors in the system, and their interests (utilities) in these resources.

The model is a system model because exchange of resources takes place within a specified group of actors, and every exchange is dependent on the other exchanges taking place in this system. Individual exchanges within the system are competitive. Now 'competitive' is a label that excellently fits most educational systems, and certainly the analyses of Posthumus, De Groot and Doornbos all point in this direction.

The model applied to higher education data

Educational systems can be modelled by Coleman’s theory. As an example, data for the first year examination in the Department of Law at the University of Amsterdam will be analysed. The resources being exchanged here are marks (from teachers to students) and time spent (from students to teachers). At the system level it is evident from the raw data that students are investing in their study only half of the 1600 hours they are expected to invest (by law), and only a minority have passed the first year examination ‘in time.’ The situation is ‘stable,’ in the sense that over the (observed) years no appreciable changes in the statistics have taken place. The amazing thing is that the individual student is almost always able to pass the examination in time by a relatively small extra investment, should he or she wish to do so. However, the number of students wishing to do so must be small, witness the data. (The paper will present the basic statistics here.) Available data allow determination of the exchange rate of marks against time, a macro-level characteristic of this educational system, and of the interest (or utility) of every student (and teacher) in marks and in time (i.e., in time not to be spent in study). The analysis presupposes the system to be in equilibrium (after marks have been assigned).

This kind of analysis shows why everybody in this system acts as he or she acts, given the characteristics of the system, i.e. in this case given the exchange rate of marks and time. There are many details that might be discussed, but the main point is that this kind of analysis shows every actor within the system acting under the influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Discussion of implications and possibilities

Every policy maker interested in improvement of attrition, in higher motivation of students, or in better marks, must have insight into the ways in which the educational system influences individual behaviors and in turn is shaped by collective individual behaviour. Government policies especially are directed at effects at the system level. Failure to evaluate the effects of education as obtained under the particular constraints of the system as (traditionally) implemented will lead to failure of law-induced educational innovation.

Social system analysis using marks and time expenditure will expose the weaknesses of current methods of educational assessment. The current grading system is taken for granted by almost every author on the subject. Certainly De Groot (1966) severely criticized the Dutch grading system, but he failed to indicate the historical roots of this particular grading system. Also in the handbooks of Thorndike (1971) and Linn (1989) no mention is made of the historical underpinnings of the current marking system in the U.S.A. Chapman (1988) does present a historical survey of the American grading system, showing intelligence testing to be an offspring of the movement of standardized testing in education. A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. It is possible that poor predictability of marks is the main reason for the observed exchange rate for marks against time in the law education case. Certainly the social system analysis in this case indicates there is plenty of room for improvement using modelling techniques for educational assessment as developed by Van Naerssen (1974) and Wilbrink (1978). Also in secondary education a social system analysis approach could point out the classical grading system as the main culprit in the case against (Dutch) retention rates of 15 to 25 percent every year. Remarkably, in English-speaking countries there is an 'educational reform' movement in the direction of stricter grading policies (Shepard & Smith, 1989), in the direction also of more sorrow for more students.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.

Resemblances and differences to alternative methods (for example: van der Heijden & van der Kamp, 1991; Jackson, Brett, Sessa, Cooper, Julin & Peyronnin, 1991), if useful, will be indicated.

Literature

Bos, D.J. (1984). Blijven zitten met zittenblijven? Den Haag: Stichting voor Onderzoek van het Onderwijs.

Chapman, P.D. (1988). Schools as sorters. Lewis M. Terman, applied psychology, and the intelligence testing movement, 1890-1930. New York: New York University Press.

Coleman, J. (1990). Foundations of social theory. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Doornbos, K. (1985). Naar een macro-educatieve analyse van het zittenblijven en verwante expulsiefenomenen in het Nederlandse schoolwezen. In Wald, A. (1985). Een jaartje overdoen. Verslag van het SVO-symposium over zittenblijven in het voortgezet onderwijs. Den Haag: SVO. Lisse: Swets & Zeitlinger.

Heijden, P.G.M. van der, & van der Kamp, L.J.Th. (1991). Latente budget analyse en onderzoek naar schoolloopbanen. TOR, 16, 297-314.

Jackson, S.E., Brett, J.F., Sessa, V.I., Cooper, D.M., Julin, J.A., & Peyronnin, K. (1991). Some differences make a difference: individual dissimilarity and group heterogeneity as correlates of recruitment, promotions, and turnover. American Psychologist, 46, 675-689.

Linn, R.L. (Editor) (1989). Educational measurement; third edition. New York: American Council on Education / Macmillan.

Naerssen, R.F. van (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 431-445.

Posthumus, K. (1940). Middelbaar onderwijs en schifting. De Gids, jaargang 104, deel 2, 24-42.

Shepard, L.A., & Smith, M.L. (Editors) (1989). Flunking grades. Research and policies on retention. London: The Falmer Press.

Thorndike, R.L. (Editor) (1971). Educational measurement; second edition. Washington, D.C.: American Council on Education.

Unesco (1980). Wastage in primary and general secondary education: a statistical study of trends and patterns in repetition and drop-out. Paris: Unesco Office of Statistics (CSR-E-37).

Wilbrink, B. (1978). Studiestrategieën. Amsterdam: Stichting Centrum voor Onderwijsonderzoek.

Wilbrink, B. (1986). Toetsen en testen in het onderwijs. SVO Jaarboek 1985, 275-288. Den Haag: Stichting voor Onderzoek van het Onderwijs.



1992 European Conference on Educational Research (ORD)

modelling the connection between individual behaviour and macro-level outputs.
Understanding grade retention, drop-out and study-delay as system rigidities.

Ben Wilbrink

Center for Educational Research, University of Amsterdam.
Grote Bickersstraat 72, 1013 KS Amsterdam.

motivation

Certain unwanted states of affairs in education are extremely resistant to change, as certainly is the case for grade retention in secondary education and for attrition and study-delay in higher education. The individual student might prevent flunking grades or failing the examination by stepping up his or her investment of time and attention. The catch, of course, is that what is possible at the individual level is not possible at the group or system level. This is what history teaches us, whether we understand it or not. Doornbos (1985) points out that teachers in their behavior are restrained by the characteristics of the system they are working in: the problem of retention is a problem at the level of the educational system as system. Evaluation of education should reflect this systemic character of education, and of most or all of its characteristics that currently are proposed as performance indicators. The problem then is that there are no known methods of data analysis or educational measurement that are equal to the task of elucidating systemic phenomena of this kind. Standard statistical methods assume independent observations, also in causal relations, in fact assuming that causal relations existing at the individual level are valid also for massive behavior changes. Because there are no adequate evaluation instruments, it is not possible for the educational researcher to give policy makers a handle on this kind of systemic problem. The general problem in the educational field is how individual behaviours connect to macro-educational characteristics, with the direction of influence obviously going both ways. Recently Coleman (1990) presented a theory of social systems that connects the behavior of actors (for example students and teachers) at the micro-level with phenomena occurring at the macro-level of the social system involved.
The theory of Coleman has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social (and educational) phenomena. The paper explores the possibility of applying this theory to data from the Department of Law of the University of Amsterdam. In a companion paper in the section Higher Education this application and its implications for educational policy are reported on more fully.

Coleman’s modelling approach explained

I will explain Coleman’s approach in the terms of the particular example used, the first year of the study in Law. The basic idea is that students within a social system exchange time spent in study for marks received in ways that maximize satisfaction with the total number of marks received and the time kept for private use. Coleman uses the basic premise of micro-economic theory about human behavior: the higher the marks a student has already obtained, while remaining at the same level of satisfaction (because of having less of something else, private time in this case), the less of the still available time he will be giving up to get still higher marks. Coleman uses this and other strong assumptions from micro-economic theory to model the transactions taking place in the educational system, and to present a mathematical structure isomorphic to the structure of transactions. Given empirical data like marks obtained and time spent, the model allows estimation of a conceptually very different set of variables: the interests in marks obtained and the interests in alternative uses of time are characteristics of individual students and teachers; the exchange rate of marks and time is a characteristic of the educational system.

perfect market
The assumption is made that there is a competitive market, and therefore a common rate of exchange for all students and teachers. The assumption is necessary to be able to estimate the rate of exchange, given data on time spent and marks obtained. The relative values for the two resources in our market, time and marks, might be .6 and .4, under the constraint that their sum is 1. Before exchange the teachers together control the whole budget of available marks, and the students still control all of their time budget. I will say that the power of the teachers before exchange is .4, i.e. equal to the budget of marks evaluated against their market price. The power of the students is .6. In a competitive market the power of the actors before and after exchange will remain the same. Because everyone is trading at the same market prices, and because these market prices are competitive, nobody wins or loses. Of course in reality some actors do win or lose. The values for time and marks may be determined by minimizing the violations of the assumption of a competitive market. Coleman (chapters 25 and 26) presents the mathematics for minimizing the sum of squared errors, an error being the difference between the power of an actor before and after exchange.
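The least-squares idea can be illustrated with a small sketch. It is not Coleman’s derivation from chapters 25 and 26; it only assumes two resources whose values sum to 1, so that a single unknown remains and the minimum has a closed form. The function name `estimate_mark_value` and the toy system of three students and one teacher are hypothetical, chosen so that every exchange takes place exactly at the values .4 (marks) and .6 (time) used in the example above.

```python
def estimate_mark_value(before, after):
    """Least-squares value of marks in a two-resource system.

    before, after: one (marks_share, time_share) pair per actor, i.e. the
    proportion of each total budget the actor controls before and after
    exchange. The value of time is 1 minus the returned value of marks.
    """
    num = 0.0
    den = 0.0
    for (m0, t0), (m1, t1) in zip(before, after):
        d_marks = m0 - m1  # control over marks given up (negative if gained)
        d_time = t0 - t1   # control over time given up (negative if gained)
        # error_i = v_marks * d_marks + (1 - v_marks) * d_time
        #         = d_time + v_marks * (d_marks - d_time)
        slope = d_marks - d_time
        num -= d_time * slope
        den += slope * slope
    return num / den


def power(shares, v_marks):
    """Power of an actor: controlled shares evaluated at the resource values."""
    marks_share, time_share = shares
    return v_marks * marks_share + (1.0 - v_marks) * time_share


# Hypothetical system: each student starts with one third of the time budget,
# the teacher starts with all marks. Every student trades time for marks at
# the rate 1.5 marks-share per time-share, i.e. at values .4 and .6.
time_given = [0.10, 0.15, 0.20]
before = [(0.0, 1 / 3) for _ in time_given] + [(1.0, 0.0)]
after = [(1.5 * t, 1 / 3 - t) for t in time_given]
after.append((1.0 - sum(m for m, _ in after), sum(time_given)))

v_marks = estimate_mark_value(before, after)
errors = [power(b, v_marks) - power(a, v_marks) for b, a in zip(before, after)]
print(round(v_marks, 3), [round(e, 6) for e in errors])
```

In this constructed case the errors vanish, because all actors trade at the same rate. With real data they do not, and the estimated values are those that make the squared differences in power as small as possible.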

The basic idea is that marks and time have a price label attached to them, the prices or values having established themselves over many years, and that the prices or the exchange rate can be estimated given an appropriate set of data.

satisfaction
If nobody wins or loses on this market, unless by error or accident, why is it that students and teachers are still in business? The answer is that people maximize their satisfaction, or utility if you want to call it that. Students can gain in satisfaction by exchanging an appropriate amount of their time against an appropriate mark earned on a particular test.

It would be nice to have a mathematical expression for this satisfaction. Coleman’s choice is an analogue to the Cobb-Douglas production function:

        $$U_i = \prod_j c_{ij}^{x_{ji}}$$

where $c_{ij}$ is the control person i has over good j, and $x_{ji}$ is the interest person i has in good j; controls as well as interests are constrained to sum to 1. The Cobb-Douglas production function as such may be applied in the educational setting (see Polachek, Kniesner, & Harwood, 1978); Coleman uses it as a utility function. Now this utility function is remarkably different from the usual type of utility function in test-based decision making (van der Linden, 1991, reviews the latter). For one thing, it is possible to determine this function using empirical data on time spent and marks received. For every student the relative interest in time and in marks can be derived from available data; the controls of course are the number of hours the student has managed to keep for himself, and the marks he has obtained, expressed as proportions of the total budget available. These interests are characteristics of the individual person, and knowledge of them might be of interest to educational policy makers (Coleman, p. 718).
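How the interests follow from observed controls can be made explicit with the standard Cobb-Douglas argument, sketched here under the assumption that every actor maximizes his utility subject to his power as a budget. The symbols $v_j$ (value of good j), $r_i$ (power of person i) and $\lambda$ (Lagrange multiplier) are notation introduced for this sketch.

```latex
% Maximize the log of the Cobb-Douglas utility subject to the budget r_i:
\max_{c_{i1},\dots,c_{im}} \; \sum_j x_{ji} \ln c_{ij}
\quad \text{subject to} \quad \sum_j v_j c_{ij} = r_i .

% First-order conditions with multiplier \lambda:
\frac{x_{ji}}{c_{ij}} = \lambda v_j
\;\;\Longrightarrow\;\;
v_j c_{ij} = \frac{x_{ji}}{\lambda} .

% Summing over j and using \sum_j x_{ji} = 1 gives r_i = 1/\lambda, hence
c_{ij} = \frac{x_{ji}\, r_i}{v_j}
\qquad\Longleftrightarrow\qquad
x_{ji} = \frac{v_j\, c_{ij}}{r_i} .
```

Read from right to left, the last expression says that a person's interest in a good is the share of his power that he holds in that good after exchange. Once the values are known, the interests therefore follow directly from the data on time kept and marks obtained.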

application to higher education data

This kind of analysis shows every actor within the system acting under influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Table 1 gives the results of a Coleman-type system analysis on the six tests of the first year examination. Here the six tests together are taken to be the system. Every test is represented with seven parallel forms because the data are assembled in the seven courses from 1983 to 1988. The 179 students are not a representative sample: they are chosen because they participated in all six tests and completely filled out the six questionnaires. The results for this group of students are not valid for first year law students in general. The mean mark received by this select group is 6.63, an outstandingly high value, I can assure you.

Table 1
Analysis of a six-test educational system: the first year Law examination 1983 - 1988

__________________________________________________________
                   mean      | sum                       n
                 ---------   | ----------------------
                             | control
                             | ----------
actors           mark time   | marks time power error²
----------------------------------------------------------
students         6.63  783   | .626  .544 .579     44  179
teachers         3.37  657   | .374  .456 .421    802    6
__________________________________________________________


Note. The values for marks and time are .429 and .571 respectively. Error² is multiplied by 10^6.

Students control .626 of the total budget of marks. This proportion is not equal to the mean mark received because of a peculiarity of many marking systems. The scale of marks starts at 1, not at zero. Only nine points can be earned. The control accordingly is ( 6.63 - 1 ) / 9 = .626. Teachers control the complement of this, the marks they have not handed out. The students are able to keep a high proportion of the time budget to themselves. In this market the power of the students before exchange is equal to the value of time, because they control the whole budget of, in this case, 1440 hours; after exchange the power of the students is .579, very close to the initial power. Remember that a student’s power is the sum of the marks and time he controls evaluated at market prices, in this case the exchange rate of .429 versus .571. Does the model fit the data? The assumption is that for every actor his power before and after exchange will be the same. Table 1 gives the sum of squared errors that is minimized by the values for marks and time of .429 and .571 respectively. Table 2 presents a breakdown of the errors for the six tests or teachers.
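The arithmetic behind the control and power entries of Table 1 can be retraced in a few lines. The sketch takes the values .429 and .571, the mean mark of 6.63, the 783 hours kept and the budget of 1440 hours from the text and Table 1; the function names are mine, and the reported values are rounded, so agreement is to about three decimals.

```python
V_MARKS, V_TIME = 0.429, 0.571   # values of marks and time (note to Table 1)
TIME_BUDGET = 1440               # hours, as stated in the text


def mark_control(mean_mark, lowest=1.0, highest=10.0):
    """Share of the marks budget earned: the scale runs from 1 to 10,
    so only nine points can be earned."""
    return (mean_mark - lowest) / (highest - lowest)


def power(marks_share, time_share):
    """Controlled shares of both resources, evaluated at their values."""
    return V_MARKS * marks_share + V_TIME * time_share


students_marks = mark_control(6.63)   # (6.63 - 1) / 9
students_time = 783 / TIME_BUDGET     # hours the students kept
students_power = power(students_marks, students_time)
# Teachers hold the complement of both budgets.
teachers_power = power(1 - students_marks, 1 - students_time)

print(f"students: marks {students_marks:.3f}, time {students_time:.3f}, "
      f"power {students_power:.3f}")
print(f"teachers: power {teachers_power:.3f}")
```

Before exchange the students' power equals the value of time, .571. The difference from the .579 computed here is the aggregate error that the estimation tries to keep small.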

Nothing new is revealed yet, but then in Table 1 only the assumption of a perfect market is used, that is, the assumption of equal power before and after exchange. Assuming every actor has maximized his satisfaction, his relative interests in marks and time can be estimated. For the six teachers the interests and satisfaction are tabulated in Table 2. There is no equity in the satisfaction obtained by the six teachers: 'general introduction' obtains a much higher satisfaction than 'penitentiary law,' and the ratios between the satisfactions are nearly the same as the ratios between the powers. Is there an explanation? Students have a time budget of 934 hours; they are not constrained to spend a certain proportion of it on the first or on any other test, so the way they distribute their time over the six tests could be highly revealing. The available time as programmed by the Department of Law is given in the last column of Table 2. Now it is clear that the big loser is penitentiary law; it is the last test, and many students might not be interested in investing the time to obtain yet another high mark they do not need to pass the examination. On the whole, however, the power of the six teachers is highly correlated with the number of hours programmed for the course. The explanation might be that in the regulation for this examination the marks for the tests are weighted approximately in the same proportion as the time budgets.

Table 2
The six-test educational system: results for the tests

_______________________________________________________________________
                      mean     |  interest                satis-  time
                    ---------- | -----------              faction pro-
                               |                                  gram-
test / teacher      mark  time | marks time power error²          med
-----------------------------------------------------------------------
general introduct.  3.15  161  |  .281 .719  .089    298  .093     320
constitutional law  3.32  121  |  .354 .646  .075      9  .075     240
private law         3.55  126  |  .360 .640  .078     45  .079     240
history of law      3.66   90  |  .448 .552  .065     43  .065     200
sociology of law    3.55   70  |  .503 .497  .056    235  .057     200
penitentiary law    2.97   88  |  .403 .597  .058    171  .058     240
_______________________________________________________________________

Note. The values for marks and time are .429 and .571 respectively. Error² is multiplied by 10^6.
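The power, interest and satisfaction columns of Table 2 can be approximately reconstructed from the mean marks and hours alone. The sketch below rests on three assumptions that are mine rather than stated in the table: the marks budget of the six-test system is 6 x 9 = 54 points, the time budget is 1440 hours, and the interest of an actor in a resource equals the share of his power held in that resource, which is the usual Cobb-Douglas result. Because the values .429 and .571 are themselves rounded, agreement is to within about .002.

```python
V_MARKS, V_TIME = 0.429, 0.571   # values of marks and time (note to Table 2)
MARKS_BUDGET = 6 * 9             # assumed: six tests, nine points each
TIME_BUDGET = 1440               # assumed: total hours, as in the text

# test: (mean mark kept by the teacher, mean hours received from students)
teachers = {
    "general introduction": (3.15, 161),
    "constitutional law": (3.32, 121),
    "private law": (3.55, 126),
    "history of law": (3.66, 90),
    "sociology of law": (3.55, 70),
    "penitentiary law": (2.97, 88),
}


def analyse(mark, hours):
    """Return (power, interest in marks, satisfaction) for one teacher."""
    c_marks = mark / MARKS_BUDGET    # control over marks after exchange
    c_time = hours / TIME_BUDGET     # control over time after exchange
    power = V_MARKS * c_marks + V_TIME * c_time
    x_marks = V_MARKS * c_marks / power   # share of power held in marks
    x_time = 1.0 - x_marks
    satisfaction = c_marks ** x_marks * c_time ** x_time  # Cobb-Douglas form
    return power, x_marks, satisfaction


results = {test: analyse(*data) for test, data in teachers.items()}
for test, (power, x_marks, satisfaction) in results.items():
    print(f"{test:22s} power {power:.3f}  interest in marks {x_marks:.3f}  "
          f"satisfaction {satisfaction:.3f}")
```

That the tabulated values are recovered this closely suggests, though it does not prove, that the table was computed along these lines.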

validity of the new concepts 'interest' and 'satisfaction'

In Table 2 the interests and satisfactions of teachers were presented, and were shown to be related to particular characteristics of the tests that were not incorporated in the model, as for example penitentiary law being the last test in a series. The interests and satisfactions of the students are constructed variables that have discriminant validity, see the multitrait-multimethod lay-out (Campbell & Fiske, 1959) in Table 3. Table 3 is based on the results of six separate analyses, each of the tests being analysed as a separate educational system. In this way there are six estimates of the interest of every student in marks obtained. In the two-variable system analysis the interest in time is simply the complement of the interest in marks, and is not tabulated.

There is a new empirical variable in Table 3: the expected mark, as reported by every student just before taking the particular test. It is not the expected mark itself, but a corrected expectation that is used in the analysis, because there is a somewhat limited variability in the expected marks as reported by the students. The correction is the mean difference between expected and obtained marks for the particular student. The motivation for the correction is threefold. (1) In a technical sense this correction does not influence the correlation of expected and obtained marks of the individual student over the series of six tests; this correlation is an important indicator of the ability of the student to predict his test results. (2) The second reason is that some students knowingly or unknowingly tend to systematically under- or overestimate their expected marks, legitimating a correction for this personal tendency. (3) The third and most relevant reason is that the system analysis might be done on the basis of expected marks instead of obtained marks, and then the analysis based on corrected expected marks results in higher validities (Table 4).
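A minimal sketch of this correction, with invented marks for one hypothetical student, illustrates point (1): adding a constant per student leaves that student's correlation between expected and obtained marks unchanged, while it removes his systematic under- or overestimation. The sign convention (the mean of obtained minus expected is added to every expectation) is my reading of the description.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def corrected_expectations(expected, obtained):
    """Shift a student's expected marks by his mean (obtained - expected)."""
    shift = sum(o - e for e, o in zip(expected, obtained)) / len(expected)
    return [e + shift for e in expected]


# Invented data: one student, six tests, marks on the 1-10 scale.
expected = [6, 7, 6, 7, 6, 7]
obtained = [7, 8, 6, 8, 7, 7]

corrected = corrected_expectations(expected, obtained)
r_raw = pearson(expected, obtained)
r_corrected = pearson(corrected, obtained)
print([round(c, 2) for c in corrected], round(r_raw, 3), round(r_corrected, 3))
```

After the correction the student's mean expectation equals his mean obtained mark, so what remains is only his ability to predict the ups and downs over the six tests.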

Expected and obtained marks are two manifestations of the same construct. The obtained marks are the pure manifestations of this construct because of their legal meaning. The rather low correlations for obtained marks in the validity diagonals, contrasted with the high correlations for expected marks, should have the attention of the examination committee of the Department of Law. The total time is the sum of answers to five questions in the survey: time spent in direct preparation for the test, in attending lectures, in preparing for lectures, in attending seminars, and in preparing for seminars. Although there are substantial correlations between time spent on the different tests, the correlations with obtained marks as well as with expected marks are characteristically low, indicating that there must be an interaction with intellectual capacity.

Satisfaction should correlate positively with marks obtained and negatively with time spent in study, and so it does (see Table 3, correlations within tests). There is a pattern of satisfaction being less determined by marks obtained as the year progresses.

Interest in marks obtained is correlated .7 with these marks, and is correlated .5 with (corrected) marks expected.

Campbell and Fiske's criteria for construct validity are met: (1) the bold printed entries in the validity diagonals are significantly different from zero (convergent validity), (2) with the already commented upon exception of the values for marks obtained, they are higher than the values lying in their column and row in the 'heterotrait-heteromethod' triangles adjacent to the validity diagonals, (3) with a number of exceptions they are higher than the corresponding correlations in the heterotrait-monomethod triangles, and (4) the same pattern of interrelationship is shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The exceptions regarding criterion 3 are an artefact of the method of estimating interests and satisfactions, this method necessarily resulting in high correlations with marks obtained.

Table 3
Relations between empirical variables and the constructs 'interest' and 'satisfaction'

__________________________________________________________________________________________________________________________________________
test/   m.   s.d.    general introduction constitutional law       private law        history of law      sociology of law    penitentiary law           
vari-               -------------------  -------------------  -------------------  -------------------  -------------------  ---------------
able                  1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   
_____________________________________________________________________________________________________________________________________________

general introduction 
 1   6.85    1.39  
 2   6.80    1.26     67
 3 161      47        21  23
 4   0.507   0.118    69  55  82
 5   0.0032  0.0005   69  44 -46  11
constitutional law 
 1   6.68    1.47     49  71  27  46  25
 2   6.63    1.27     57  80  23  49  34   72
 3 121      42        15  21  72  58 -30   26  21
 4   0.507   0.141    36  52  67  69 -06   66  50  86
 5   0.0031  0.0006   34  46 -22  09  51   61  46 -47  01
private law        
 1   6.45    1.53     43  70  14  31  26   54  66  24  39  22
 2   6.62    1.31     66  86  23  53  41   73  82  21  50  46   76
 3 126      46        10  21  70  54 -36   18  18  75  61 -38   25  21
 4   0.520   0.150    29  50  61  60 -11   38  45  69  67 -14   63  50  88
 5   0.0030  0.0006   27  40 -43 -13  56   29  40 -40 -13  57   53  41 -61 -21
history of law     
 1   6.34    1.61     49  73  16  41  31   54  73  10  33  40   53  76  12  34  34
 2   6.44    1.27     60  81  27  53  33   68  74  23  49  40   65  83  21  46  36   80
 3  90      37        15  22  67  55 -29   20  19  69  59 -29   21  23  77  68 -43   24  27
 4   0.439   0.143    39  57  55  64  01   44  55  51  60  08   41  57  57  64 -08   71  62  81
 5   0.0032  0.0006   20  32 -42 -14  50   21  34 -49 -23  59   20  34 -58 -33  67   47  32 -68 -16
sociology of law   
 1   6.45    1.37     43  70  11  34  34   52  68  13  37  38   48  70  09  31  34   55  66  16  42  27
 2   6.44    1.24     58  72  16  44  41   67  64  20  46  41   64  75  13  40  41   66  71  25  53  26   73
 3  70      34        05  19  60  46 -29   17  15  69  59 -30   04  13  74  61 -51   14  19  72  57 -51   16  14
 4   0.344   0.104    29  53  49  54  02   42  50  56  65  06   30  48  56  61 -13   40  51  59  65 -18   69  51  78
 5   0.0035  0.0006   16  15 -48 -24  45   08  17 -57 -36  50   20  21 -65 -40  68   12  13 -60 -33  63   31  20 -87 -39
penitentiary law   
 1   7.03    1.67     31  65  22  35  17   48  65  18  37  31   45  68  15  32  26   51  65  18  44  23   52  59  21  44  04
 2   6.88    1.28     60  73  19  47  41   70  75  17  46  47   66  82  12  40  43   66  72  20  51  31   65  74  11  45  21   73
 3  88      39        08  17  55  41 -27   17  12  68  55 -34   14  14  72  61 -45   07  16  69  49 -52   01  10  72  52 -64   16  06
 4   0.378   0.112    29  54  47  51  01   43  51  53  61  05   36  52  53  58 -07   38  53  55  62 -15   36  45  60  67 -35   73  52  74
 5   0.0036  0.0006   17  28 -32 -08  41   17  34 -46 -20  56   16  32 -52 -30  60   29  28 -46 -09  65   37  31 -46 -09  61   48  43 -74 -13
_____________________________________________________________________________________________________________________________________________

Note. 1 = mark obtained on scale 1-10, 2 = corrected mark expected on the same scale, 3 = total of hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, m. = mean, s.d. = standard deviation; correlations are written without decimals; expected marks are corrected by the mean difference between marks obtained and marks expected on the six tests.

Table 4
Relations of the empirical variables and the constructs 'interest' and 'satisfaction' estimated on the basis of (corrected) marks expected

_____________________________________________________________________________________________________________________________________________
test/   m.   s.d.    general introduction constitutional law       private law        history of law      sociology of law    penitentiary law           
vari-               -------------------  -------------------  -------------------  -------------------  -------------------  ----------------
able                  1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   
_____________________________________________________________________________________________________________________________________________
general introduction 
 1   6.85    1.39  
 2   6.80    1.26     67
 3 161      47        21  23
 4   0.508   0.115    51  70  82
 5   0.0031  0.0005   39  66 -46  09
constitutional law 
 1   6.68    1.47     49  71  27  56  39
 2   6.63    1.27     57  80  23  59  50   72
 3 121      42        15  21  72  61 -27   26  21
 4   0.510   0.134    39  54  67  79  06   51  61  87
 5   0.0031  0.0006   34  47 -29  10  68   34  58 -55 -10
private law        
 1   6.45    1.53     43  70  14  43  45   54  66  24  43  27
 2   6.62    1.31     66  86  23  61  54   73  82  21  52  46   76
 3 126      46        10  21  70  59 -28   18  18  75  64 -42   25  21
 4   0.532   0.139    37  53  66  75  04   44  48  69  75 -10   46  57  89
 5   0.0030  0.0006   42  48 -31  07  70   42  48 -38 -04  76   31  55 -61 -23
history of law     
 1   6.34    1.61     49  73  16  51  49   54  73  10  40  47   53  76  12  41  47
 2   6.44    1.27     60  81  27  61  47   68  74  23  50  39   65  83  21  50  45   80
 3  90      37        15  22  67  58 -24   20  19  69  61 -32   21  23  77  71 -40   24  27
 4   0.450   0.131    39  54  64  75  08   46  49  61  71  00   43  54  66  77 -05   53  65  87
 5   0.0032  0.0006   25  33 -40 -08  59   26  32 -45 -19  63   22  33 -58 -31  74   28  38 -73 -32
sociology of law   
 1   6.45    1.37     43  70  11  46  54   52  68  13  42  45   48  70  09  38  48   55  66  16  42  30
 2   6.44    1.24     58  72  16  50  50   67  64  20  44  34   64  75  13  41  47   66  71  25  50  23   73
 3  70      34        05  19  60  53 -20   17  15  69  61 -35   04  13  74  67 -41   14  19  72  63 -51   16  14
 4   0.347   0.098    35  53  56  70  15   48  46  64  73 -03   35  49  62  73 -02   44  52  67  76 -23   51  62  83
 5   0.0035  0.0006   20  14 -48 -26  43   14  14 -56 -37  52   24  21 -65 -46  64   15  13 -58 -37  62   17  30 -89 -49
penitentiary law   
 1   7.03    1.67     31  65  22  52  42   48  65  18  43  38   45  68  15  40  40   51  65  18  45  28   52  59  21  47  06
 2   6.88    1.28     60  73  19  53  48   70  75  17  46  44   66  82  12  43  52   66  72  20  47  29   65  74  11  48  23   73
 3  88      39        08  17  55  46 -22   17  12  68  55 -41   14  14  72  63 -42   07  16  69  58 -50   01  10  72  61 -62   16  06
 4   0.377   0.097    42  55  51  66  16   52  53  59  70  02   45  55  58  72  03   42  52  62  72 -17   39  48  62  78 -34   52  60  79
 5   0.0036  0.0006   27  24 -37 -10  49   22  30 -50 -21  66   21  31 -57 -31  71   30  24 -49 -23  64   36  31 -54 -23  66   23  47 -82 -30
_____________________________________________________________________________________________________________________________________________

Note. See the note under Table 3.

Clearly the theoretical constructs ‘interest in marks obtained’ and ‘satisfaction’ have construct validity. More importantly, they have discriminant validity in relation to the three empirical variables. As such they carry the promise of being a significant new instrument for the educational researcher.

model fit using expected instead of obtained marks

Students exchange their time against marks, but obviously in this continuous process of exchange the only thing they can get in direct return is a higher expectation for the mark to be received for the test. The model should therefore fit better using the expected marks than using the marks that in fact are obtained. Because empirical data on expected marks are available, the prediction can be tested. Two criteria for fit of the model are: (1) the validity of the new constructs ‘interests’ and ‘satisfactions,’ and (2) the sum of squared errors. Table 4 presents the data on validity, revealing distinctly higher validities for interests and satisfactions, and fewer violations of Campbell and Fiske’s third criterion.

Table 5
Error terms using obtained or expected marks

_________________________________________________________________
system / actors     sum error^2 for analysis based on           n
                        -----------------------------------------
	                  marks obtained    marks expected
                                       ----------------------
                                       corrected uncorrected
-----------------------------------------------------------------
6-test system             85           83         75          185
        students                4             4          4    179
        teachers (tests)       80            78         71      6
-----------------------------------------------------------------
every test a system
-------------------
  general introduction     5            5          4          180
  constitutional law      14           13         13          180
  private law             47           14         14          180
  history of law          62           33         36          180
  sociology of law        32           27         29          180
  penitentiary law        12            8          8          180
_________________________________________________________________ 

Note. All sums are multiplied by 10^5. Under ‘every test a system’ the sum of squared errors for students as well as teachers is given.

The summary statistics for the analysis with (corrected) expected marks are very close to the results of the previous analysis with obtained marks: the exchange rate of expected marks and time is .421 / .579. The sum of squared errors is smaller now, as predicted; Table 5 summarizes the results of both analyses. Remember that error is the difference between power before and after exchange, the assumption being that in a perfect market the individual assets evaluated at market prices remain the same. The correction of expected marks, being based on the mean of the marks obtained over six tests, introduces the obtained marks again through the back door; therefore the analyses were repeated using the uncorrected expected marks. The reduction in errors is very marked for corrected as well as uncorrected expected marks. The mtmm-matrix for the uncorrected expected marks, not presented here, shows validities that are lower than the validities presented in Table 4 for the analysis based on corrected expected marks.

The test as clearing house and the quality of the reproduction of interests

What is the quality of the estimation of interests using the Coleman model? To answer this question a simulation study was done to see how well the Coleman model reproduces known interests. When the interests are known, for example by specifying a certain distribution for these interests, it is possible to determine the exchange rate of marks and time as well as the combination of expected mark and time spent in study that maximizes the satisfaction for the particular student (Coleman, p. 684 and 676 respectively, presents the mathematics). I will use the values .39 and .61 respectively for obtained marks and time spent in extra-curricular activities. Interests for time are generated by pseudo-random sampling from the beta distribution with parameters 12 and 8, having mean .6 and variance 2/175 (the pseudo-random number generator ran3 and the procedures used for sampling from the beta distribution are given by Press, Flannery, Teukolsky, & Vetterling, 1989). All individual differences between students are absorbed in the differences in interests; think of intellectual capabilities, study habits, social economic background, educational background, and aspiration level.
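The sampling step can be sketched as follows. This is a minimal illustration in Python using the standard library generator, not the ran3 and beta routines of Press et al. that were used for the paper; the function names and the seed are assumptions made for the example.

```python
import random

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def sample_time_interests(n, a=12, b=8, seed=1992):
    """Draw n interests in time from Beta(a, b); the seed is arbitrary."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]

# Theoretical moments: .6 and 2/175, as stated in the text.
mean, var = beta_mean_var(12, 8)

# One simulated run of 100 students; the interest in marks is the complement.
interests_time = sample_time_interests(100)
interests_marks = [1 - x for x in interests_time]
```

The sample mean of one run of 100 students will scatter around .6; only the theoretical moments are fixed.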

The next step is constructive: given the expected mark, a process must be specified to generate an obtained mark. A crucial characteristic of the educational system is that the test functions as a clearing house for the exchange of marks and time. This clearing house can be modelled using a binomial model (Wilbrink, 1978; van den Brink, 1982) for the generation of test scores given the true mastery on the domain of questions that is sampled by the test. The crucial step then is from the expected mark to the true mastery. The trick here is to suppose the expected mark is based on the score on a pretest sampled from the same domain of questions the test is sampled from. A beta density Beta(a,b) for the true mastery can then be specified, and sampled for the true mastery that will generate the score on the test (a + b = number of items in pretest, a / (a + b) = score on pretest; for the details see Wilbrink, 1978).
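A minimal sketch of this clearing-house step is given below, in Python. It takes the number of pretest items answered correctly as given; how an expected mark on the 1-10 scale is converted into such a pretest score is not specified here and is left outside the sketch. The function name and its arguments are hypothetical.

```python
import random

def simulate_test_score(pretest_correct, pretest_items, test_items, rng):
    """One pass through the 'clearing house'.

    a = pretest_correct and b = pretest_items - pretest_correct, so that
    a + b = number of pretest items and a / (a + b) = pretest score.
    Both a and b must be positive.
    """
    a = pretest_correct
    b = pretest_items - pretest_correct
    # Sample the true mastery from the Beta(a, b) density.
    true_mastery = rng.betavariate(a, b)
    # Binomial model: every test item is answered correctly with
    # probability equal to the true mastery.
    score = sum(rng.random() < true_mastery for _ in range(test_items))
    return true_mastery, score

rng = random.Random(0)  # arbitrary seed
mastery, score = simulate_test_score(15, 25, 25, rng)
```

With longer tests and pretests both sampling steps become less noisy, which is the mechanism behind a higher correlation between expected and obtained marks.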

Table 6
Simulation results 100 runs, 100 students per run, both test and pretest 25 or 50 items

______________________________________________________________________________
vari- mean  s.d.
able                    1            2             3             4
______________________________________________________________________________
test and pretest 25 items
------------------------------------------------------------------------------
1   6.612   .206
2   6.571   .166   .672 (.0877)
3  79.108  2.308   .740 (.0462)  .871 (.1434)
4    .398   .0121  .970 (.0073)  .766 (.1145)  .869 (.0278)
5    .0062  .0001  .518 (.1057) -.064 (.1005) -.066 (.1036)  .325 (.1125)
==============================================================================
test and pretest 50 items
------------------------------------------------------------------------------
1   6.646   .186
2   6.590   .162   .751 (.0953)
3  79.634  2.300   .844 (.0309)  .859 (.1385)
4    .404   .0117  .980 (.0048)  .806 (.1183)  .928 (.0163)
5    .0062  .0001  .451 (.1130) -.012 (.1223)  .013 (.1236)  .298 (.1181)
______________________________________________________________________________

Note. 1 = obtained marks, 2 = expected marks, 3 = hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, mean = mean of 100 samples, s.d. = standard deviation of the sample means, cross-tabulated are the means of the product moment correlations and their standard deviations. Start values: mean interest in marks .4, in time .6, variance .0115; computed: value for marks .39, time .61.


The results of the simulation study presented in Table 6 demonstrate that interests are estimated quite well, at least under the conditions established for the simulation. The mean of the interests is replicated, with a small standard deviation for samples of 100. The correlation between expected and obtained marks increases from .67 to .75 when test and pretest length are doubled from 25 to 50 items.

discussion

The results presented establish Coleman’s social system theory as a potentially useful tool for the educational researcher concerned with the kind of problems mentioned in the subtitle of the paper: grade retention, drop-out and study-delays. Using Coleman’s model it will be possible to use micro-economic theory to derive predictions of which measures might have effect on the relative interests of students for marks and for time. Economic theory predicts that a 10% increase in the time budget will be distributed over time spent in study and time spent in extracurricular activities in the same proportion as applies to the old budget (income elasticity is 1). Teachers could decide to grade less leniently; in this case too students will correct their aspiration level for marks in such a way that they have to spend the same proportion of their time budget as before. It is evident that much more drastic measures have to be taken to effect any substantial change in the behavior of students as well as teachers, measures that directly influence the interests themselves.
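The prediction about the time budget can be illustrated with a small sketch. It assumes a student whose only endowment is time and whose satisfaction has the Cobb-Douglas form used in the model, in which case the share of the budget kept for private use equals the interest in time. The budget of 1440 hours is the one used in this paper; the interest of .6 and the function name are illustrative assumptions.

```python
def split_time_budget(time_budget_hours, interest_in_time):
    """Cobb-Douglas split of a time budget (sketch).

    With time as the only endowment, the share of the budget kept for
    private use equals the interest in time; the remainder is exchanged
    for marks, i.e. spent in study.
    """
    kept = interest_in_time * time_budget_hours
    studied = time_budget_hours - kept
    return kept, studied

old_kept, old_studied = split_time_budget(1440, 0.6)
new_kept, new_studied = split_time_budget(1440 * 1.10, 0.6)

# Both uses of time grow by the same factor as the budget itself.
growth_kept = new_kept / old_kept
growth_studied = new_studied / old_studied
```

Because the interest is unchanged, both growth factors equal 1.10: the extra time does not shift the proportion between study and other activities.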

A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. Educational testing itself functions as a kind of 'clearing house', to make possible the exchange of time and marks, marks being the valued scores of the tests used. Regrettably educational measurement experts, with few exceptions, as for example Van Naerssen’s (1974) work on models of strategic study behavior of students, are not concerned with this ‘clearing house’ function of tests. The whole of item response theory is irrelevant to this ‘clearing house’ function of educational testing. Poor predictability of marks (predictability by the student is meant here) quite possibly is one of the reasons for the observed exchange rate for marks against time in the law education case, where students are able to keep a very substantial part of their time budget to spend it on extra-curricular activities.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.

ECER, June 24, 1992

Literature

Aiken, L. R., Jr. (1963). The grading behavior of a college faculty. Educational and Psychological Measurement, 23, 319-322.

Brink, W. P. van den (1982). Binomiale modellen in de testleer. Proefschrift Universiteit van Amsterdam.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Coleman, J. (1990). Foundations of social theory. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Doornbos, K. (1985). Naar een macro-educatieve analyse van het zittenblijven en verwante expulsiefenomenen in het Nederlandse schoolwezen. In Wald, A., Een jaartje overdoen. Verslag van het SVO-symposium over zittenblijven in het voortgezet onderwijs (pp. 35-67). Lisse: Swets & Zeitlinger.

Kula, W. (1986). Measures and men. Princeton, New Jersey: Princeton University Press.

Linden, W. J. van der (1991). Applications of decision theory to test-based decision making. In Hambleton, R. K., & Zaal, J. N., Advances in educational and psychological testing: theory and applications (pp. 129-156). Dordrecht: Kluwer.

Naerssen, R. F. van (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 431-445.

Polachek, S. W., Kniesner, T. J., & Harwood, H. J. (1978). Educational production functions. Journal of Educational Statistics, 3, 209-231.

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in Pascal; the art of scientific computing. London: Cambridge University Press.

Wilbrink, B. (1978). Studiestrategieën. Amsterdam: Center for Educational Research.


Table 1 (Test prep time version)
Analysis of a six-test educational system: the first year Law examination 1983 - 1988

_____________________________________________________________________
                    mean      |  sum                      n
                   ---------- | ------------------------     
                              |  control
                              | -----------
actors              mark  time|  marks time power error2 
--------------------------------------------------------------
students            6.63  564 |  .626 .604  .613   5065  179
teachers            3.37  370 |  .374 .396  .387  38080    6
_____________________________________________________________________
                              |  interest               satisf.  time
                              | -----------                      pro-
test / teacher                |  marks time                   grammed
---------------------------------------------------------------------
general introduct.  3.15   79 |  .307 .693  .074   8210  .0753    180
constitutional law  3.32   63 |  .371 .629  .065      8  .0649    170
private law         3.55   72 |  .352 .648  .073   6007  .0731    147
history of law      3.66   61 |  .398 .602  .067    198  .0665    160
sociology of law    3.55   48 |  .450 .550  .057   6483  .0575    171
penitentiary law    2.97   47 |  .413 .587  .052  17173  .0521    192
_____________________________________________________________________

Note. The values for marks and time are respectively .391 and .609. Error^2 is multiplied by 10^8. (* Contributions to the time budget are supposed to be (net values) 180, 170, 147, 160, 171 and 192 hours, totalling 934 hours.*)

Table 2 (test prep time version)
Relations between empirical variables and the constructs 'interest' and 'satisfaction'

_____________________________________________________________________________________________________________________________________________
test/   m.   s.d.  general introduction constitutional law       private law        history of law      sociology of law    penitentiary law           
vari-              -------------------  -------------------  -------------------  -------------------  -------------------  ------------------
able                1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5    1   2   3   4   5 
_____________________________________________________________________________________________________________________________________________
general introduction       
1  6.85    1.39  
2  6.80    1.26     67
3 79      36        15  18
4  0.411   0.129    58  46  84
5  0.0035  0.0007   46  28 -70 -23
constitutional law        
1  6.68    1.47     49  71  23  40  11
2  6.63    1.27     57  80  19  42  19   72
3 63      32        15  19  63  56 -35   24  21
4  0.419   0.161    34  45  60  67 -17   57  44  87
5  0.0034  0.0007   24  32 -26  01  45   40  29 -60 -17
private law       
1  6.45    1.53     43  70  12  26  13   54  66  21  32  08
2  6.62    1.31     66  86  17  43  25   73  82  20  44  30   76
3 72      33        10  21  54  46 -37   18  21  63  51 -42   30  22
4  0.528   0.189    22  38  51  50 -25   31  36  62  57 -29   52  40  93
5  0.0030  0.0007   24  37 -24 -05  41   25  33 -21  00  44   41  34 -52 -23
history of law 
1  6.34    1.61     49  73  13  33  17   54  73  11  28  25   53  76  15  26  26
2  6.44    1.27     60  81  21  43  18   68  74  22  43  24   65  83  22  36  30   80
3 61      28        14  21  58  50 -37   20  23  61  53 -35   21  23  68  61 -34   25  25
4  0.426   0.161    34  49  49  56 -16   41  51  46  52 -05   37  51  53  54 -07   64  52  85
5  0.0032  0.0007   20  30 -35 -13  47   21  28 -44 -21  58   16  30 -52 -36  61   40  26 -65 -20
sociology of law   
1  6.45    1.37     43  70  07  28  23   52  68  12  33  29   48  70  11  25  32   55  66  18  39  24
2  6.44    1.24     58  72  14  38  24   67  64  20  41  26   64  75  14  30  35   66  71  27  52  26   73
3 48      26        04  16  48  39 -32   14  14  63  55 -37   04  11  65  58 -37   10  16  65  49 -51   16  12
4  0.341   0.129    23  43  36  41 -09   35  42  54  62 -08   24  38  49  52 -05   30  41  55  55 -23   57  39  82
5  0.0036  0.0007   21  19 -36 -15  43   13  22 -43 -22  55   20  24 -53 -37  59   15  17 -47 -22  62   33  21 -76 -29
penitentiary law         
1  7.03    1.67     31  65  21  34  06   48  65  20  36  17   45  68  15  25  24   51  65  21  41  18   52  59  19  36  07
2  6.88    1.28     60  73  20  43  20   70  75  21  43  27   66  82  17  33  33   66  72  25  51  27   65  74  12  39  22   73
3 47      25        07  17  36  30 -23   18  15  57  49 -35   11  14  58  52 -25   12  17  53  42 -34   05  12  70  59 -46   19  13
4  0.307   0.116    27  50  36  43 -05   39  49  49  57 -04   33  48  45  49  09   37  48  47  54 -06   37  42  59  69 -15   66  50  78
5  0.0039  0.0006   20  26 -12  05  29   14  30 -34 -14  50   18  31 -39 -25  51   21  25 -28 -03  51   34  28 -43 -14  58   38  33 -66 -10
_____________________________________________________________________________________________________________________________________________


Note. 1 = mark obtained on scale 1-10, 2 = corrected mark expected on the same scale, 3 = hours spent in preparation, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, m. = mean, s.d. = standard deviation; correlations are written without decimals; expected marks are corrected by the mean difference between marks obtained and marks expected on the six tests.


Table 5 (extended)
Error terms using obtained or expected marks

_____________________________________________________________________
system / actors        sum error^2 for analysis based on           n
                        -------------------------------------
	                  marks obtained    marks expected
                                       ----------------------
                                       corrected uncorrected
---------------------------------------------------------------------
6-test system             8455         8254       7476          185
        students                437          437        365     179
        teachers (tests)       8018          7817       7111      6
---------------------------------------------------------------------
every test a system
-------------------
    general introduction   502          458        388          180
        students                502          458        388     179
        teachers (tests)          0            0          0       1

    constitutional law     1371        1265       1252          180
        students                630          617        537     179
        teachers (tests)        741          648        715       1

    private law            4673        1422       1384          180
        students                740          696        584     179
        teachers (tests)       3933          726        800       1

    history of law         6196        3339       3579          180
        students                786          678        617     179
        teachers (tests)       5410         2661       2962       1

    sociology of law       3243        2685       2877          180
        students                757          741        688     179
        teachers (tests)       2486         1944       2189       1

    penitentiary law       1232         814        766          180
        students                722          701        638     179
        teachers (tests)        510          113        128       1
_____________________________________________________________________


Note. All sums are multiplied by 10^7. Under 'every test a system' the entries essentially are the sum of squared errors for students, the errors for the teachers being negligible.


modelling the connection between individual behaviour and macro-level outputs
Understanding grade retention, drop-out and study-delay as system rigidities

Ben Wilbrink

[To read]

The general problem in the educational field is how individual behaviours of students and teachers connect to characteristics of the educational system such as the attrition rate, with the direction of influence obviously going both ways. Recently Coleman (1990) presented a theory of social systems that connects behavior of individual actors with phenomena occurring at the system-level of the social system involved. The theory of Coleman has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social and educational phenomena. The competitive market in micro-economics is one of the few models where behaviours at different levels of aggregation are meaningfully connected with each other, so why not try to model educational systems as market places where students and teachers exchange their resources, the resources of course being time and marks respectively? The paper explores the possibility of applying this model to educational data, in this case data from the Department of Law of the University of Amsterdam. In a companion paper in the section Higher Education this application and its implications for educational policy are reported on more fully.

Coleman’s modelling approach explained

I will explain Coleman’s approach in the terms of the particular example used, the first year of the study in Law. The basic idea is that students within a social system exchange their time for marks in ways that maximize satisfaction with the total number of marks received and the time kept for private use. Coleman uses the basic premise of micro-economic theory about human behavior: the higher the marks obtained, the less of the still available time the student will be giving up to get still higher marks. Coleman uses this and other strong assumptions from micro-economic theory to model the transactions taking place in the educational system, and to present a mathematical structure isomorphic to the structure of transactions. Given empirical data like marks obtained and time spent, the model allows estimation of a conceptually very different set of variables: the interests in marks obtained and interests in alternative uses of their time are characteristics of individual students and teachers, the exchange rate of marks and time is a characteristic of the educational system.

perfect market
The assumption is made that there is a competitive market, and therefore a common rate of exchange for all students and teachers. The assumption is necessary to be able to estimate the rate of exchange, given data on time spent and marks obtained. The relative values for the two resources in our market, time and marks, might be .6 and .4, under the constraint that their sum is 1. Before exchange the teachers together control the whole budget of available marks, the students still control all of their time budget. I will say that the power of the teachers before exchange is .4, i.e. equal to the budget of marks evaluated against their market price. The power of the students is .6. The assumption of the competitive market implies that the power of students and teachers after exchange still is the same, .6 and .4 respectively. Because everyone is trading at the same market prices, and because these market prices are competitive, nobody wins or loses. Of course in reality some actors do win or lose. The values for time and marks may be determined by minimizing the violations of the assumption of a competitive market. Coleman (chapters 25 and 26) presents the mathematics for minimizing the sum of squared errors, an error being the difference of the power of a student before and after exchange.

The basic idea is that marks and time have a price label attached to them, the prices or values having established themselves over many years, and that the prices or the exchange rate can be estimated given an appropriate set of data.


satisfaction
If nobody wins or loses on this market, unless by error or accident, why is it that students and teachers still are in business? The answer is that people maximize their satisfaction, or utility if you want to call it that. Students can gain in satisfaction by exchanging an appropriate amount of their time against an appropriate mark earned on a particular test.

It would be nice to have a mathematical expression for this satisfaction. Coleman’s choice is an analogue to the Cobb-Douglas production function

        U_i = \prod_j c_{ij}^{x_{ji}}

where c_{ij} is the control person i has over good j, and x_{ji} is the interest person i has in good j; controls as well as interests are constrained to sum to 1. Now this utility function is remarkably different from the usual type of utility function in test-based decision making (van der Linden, 1991, reviews the latter). For one thing, it is possible to determine this function using empirical data on time spent and marks received. For every student the relative interest in time and in marks can be derived from available data; the controls are the number of hours the student has managed to keep for himself, and the marks he has obtained, expressed as proportions of the total budget available. These interests are characteristics of the individual person; knowledge of these interests might be of interest to educational policy makers (Coleman, p. 718).
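How interests follow from observed controls can be sketched with the standard maximization of a Cobb-Douglas function under a budget constraint. The symbols v_j (market value of good j) and r_i (power of person i, that is, his controls evaluated at market values) are introduced here for the sketch; x_{ji} denotes the interest of person i in good j. The derivation assumes that every person has in fact maximized his satisfaction.

```latex
\max_{c_{i1},\dots,c_{im}} \; U_i = \prod_j c_{ij}^{x_{ji}}
\quad \text{subject to} \quad
\sum_j v_j \, c_{ij} = r_i , \qquad \sum_j x_{ji} = 1

\Longrightarrow \quad
c_{ij} = \frac{x_{ji} \, r_i}{v_j}
\quad \Longleftrightarrow \quad
x_{ji} = \frac{v_j \, c_{ij}}{r_i}
```

Read from right to left, the last expression says that the interest in a good is the share of a person's power that he holds in that good after exchange, which is why interests can be estimated once the values of marks and time are known.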

application to higher education data

This kind of analysis shows every actor within the system acting under influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Table 1 gives the results of a Coleman-type of system analysis on the six tests of the first year examination. Here the six tests together are taken to be the system. Every test is represented with seven parallel forms because the data are assembled in the seven courses from 1983 to 1988. The 179 students are not a representative sample: they are chosen because they participated in all six tests and completely filled out the six questionnaires. The results for this group of students are not valid for first year law students in general. The mean mark received by this select group is 6.63, an outstandingly high value, I can assure you.

Table 1
Analysis of a six-test educational system: the first year Law examination 1983 - 1988

_________________________________________________________________
                    mean      |  sum                      n
                   ---------- | ------------------------     
                              |  control
                              | -----------
actors              mark  time|  marks time power error2 
--------------------------------------------------------------
students            6.63  783 |  .626 .544  .579     44  179
teachers            3.37  657 |  .374 .456  .421    802    6
_________________________________________________________________


Note. The values for marks and time are respectively .429 and .571. Error^2 is multiplied by 10^6.

Students control .626 of the total budget of marks. This proportion is not equal to the mean mark received because of a peculiar characteristic of the marking system. The marking scale runs from 1 to 10, suggesting 10 points can be earned while in reality it is only 9 points. Teachers control the complement of this, the marks they have not handed out. The students are able to keep a high proportion of the time budget to themselves. In this market the power of the students before exchange is equal to the value of time .571, because they control the whole budget of, in this case, 1440 hours; after exchanges the power of the students is .579, very close to the initial power. Remember that a student’s power is the sum of the marks and time he controls, evaluated at market prices, in this case the exchange rate of .429 versus .571. Does the model fit the data? The assumption is that for every actor his power before and after exchange will be the same. Table 1 gives the sum of squared errors that is minimized by the values for marks and time respectively of .429 and .571. Table 2 presents a breakdown of the errors for the six tests or teachers.
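The arithmetic behind these powers can be checked with a short sketch. It uses only the two aggregate rows of Table 1, so it is an illustration of the principle and not the analysis of the paper, which minimizes the squared errors over the 185 individual actors; the closed-form minimizer below is an assumption of this sketch and not necessarily the procedure of Coleman's chapters 25 and 26.

```python
# Proportions of the two budgets controlled after exchange (Table 1).
CONTROL_AFTER = {
    "students": {"marks": 0.626, "time": 0.544},
    "teachers": {"marks": 0.374, "time": 0.456},
}
# Before exchange teachers hold all marks, students hold all time.
CONTROL_BEFORE = {
    "students": {"marks": 0.0, "time": 1.0},
    "teachers": {"marks": 1.0, "time": 0.0},
}

def power(control, value_marks):
    """Control evaluated at market prices; the two values sum to 1."""
    return value_marks * control["marks"] + (1 - value_marks) * control["time"]

def squared_error_sum(value_marks):
    """Sum over actors of (power after - power before) squared."""
    return sum(
        (power(CONTROL_AFTER[a], value_marks)
         - power(CONTROL_BEFORE[a], value_marks)) ** 2
        for a in CONTROL_AFTER
    )

def best_value_marks():
    """Closed-form minimizer of the quadratic squared_error_sum."""
    num = den = 0.0
    for a in CONTROL_AFTER:
        d_marks = CONTROL_AFTER[a]["marks"] - CONTROL_BEFORE[a]["marks"]
        d_time = CONTROL_AFTER[a]["time"] - CONTROL_BEFORE[a]["time"]
        num -= d_time * (d_marks - d_time)
        den += (d_marks - d_time) ** 2
    return num / den

students_after = power(CONTROL_AFTER["students"], 0.429)  # about .579
teachers_after = power(CONTROL_AFTER["teachers"], 0.429)  # about .421
aggregate_value_marks = best_value_marks()                # about .42
```

At the values .429 and .571 the sketch reproduces the powers .579 and .421 of Table 1. On the two aggregate rows alone the minimizing value for marks is about .42 rather than .429; the difference reflects that the reported estimate is based on individual actors.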

Table 2
The six-test educational system: results for the tests

______________________________________________________________________________________
                          mean     |   interest                     satis-        time
                      ------------ | ------------                  faction  programmed
test / teacher         mark   time |  marks  time   power  error²
--------------------------------------------------------------------------------------
general introduction   3.15    161 |   .281  .719    .089     298     .093         320
constitutional law     3.32    121 |   .354  .646    .075       9     .075         240
private law            3.55    126 |   .360  .640    .078      45     .079         240
history of law         3.66     90 |   .448  .552    .065      43     .065         200
sociology of law       3.55     70 |   .503  .497    .056     235     .057         200
penitentiary law       2.97     88 |   .403  .597    .058     171     .058         240
______________________________________________________________________________________


Note. The values for marks and time are .429 and .571, respectively. Error² is multiplied by 10⁶.



Nothing new is revealed yet, but then in Table 1 only the assumption of a perfect market is used, that is, the assumption of equal power before and after exchange. Assuming every actor has maximized his satisfaction, his relative interests in marks and time can be estimated. For the six teachers the interests and satisfaction are tabulated in Table 2. There is no equity in the satisfaction obtained by the six teachers: 'general introduction' obtains a much higher satisfaction than 'penitentiary law,' and the ratios between the satisfactions are nearly the same as the ratios between the powers. Is there an explanation? Students have a time budget of 1440 hours; they are not constrained to spend a certain proportion of it on the first or on any other test, so the way they distribute their time over the six tests could be highly revealing. The available time as programmed by the Department of Law is given in the last column of Table 2. Now it is clear that the big loser is penitentiary law: it is the last test, and many students might not be interested in investing the time to obtain yet another high mark they do not need to pass the examination. On the whole, however, the power of the six teachers is highly correlated with the number of hours programmed for the course.
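The teacher rows of Table 2 can be approximately retraced from the mean marks and times alone. The Python sketch below rests on assumptions that are not spelled out in the text: a Cobb-Douglas satisfaction function as in Coleman's linear system of action, a marks budget of 6 x 9 = 54 points per student of which every teacher initially holds one sixth, and no time held by teachers before exchange. All names are hypothetical, and small deviations from the table are to be expected because the inputs are rounded.

```python
VALUE_MARKS, VALUE_TIME = 0.429, 0.571   # values from the note to Table 2
N_TESTS = 6
POINTS_PER_TEST = 9                      # assumption: 9 points can be earned per test
TIME_BUDGET = 1440                       # hours per student

# Mean mark not handed out and mean hours received, per teacher (Table 2).
TEACHERS = {
    "general introduction": (3.15, 161),
    "constitutional law": (3.32, 121),
    "private law": (3.55, 126),
    "history of law": (3.66, 90),
    "sociology of law": (3.55, 70),
    "penitentiary law": (2.97, 88),
}


def analyse_teacher(mean_mark, mean_time):
    """Interests, power, satisfaction and squared error for one teacher."""
    control_marks = mean_mark / (N_TESTS * POINTS_PER_TEST)
    control_time = mean_time / TIME_BUDGET
    power_after = VALUE_MARKS * control_marks + VALUE_TIME * control_time
    # At a Cobb-Douglas optimum the share of power spent on a resource
    # equals the interest in that resource.
    interest_marks = VALUE_MARKS * control_marks / power_after
    interest_time = VALUE_TIME * control_time / power_after
    satisfaction = control_marks ** interest_marks * control_time ** interest_time
    power_before = VALUE_MARKS / N_TESTS   # assumption: one sixth of the marks, no time
    return {
        "interest_marks": interest_marks,
        "interest_time": interest_time,
        "power": power_after,
        "satisfaction": satisfaction,
        "error2": (power_after - power_before) ** 2,
    }


RESULTS = {name: analyse_teacher(*row) for name, row in TEACHERS.items()}

for name, r in RESULTS.items():
    print(f"{name:22s} {r['interest_marks']:.3f} {r['interest_time']:.3f} "
          f"{r['power']:.3f} {r['error2'] * 1e6:6.0f} {r['satisfaction']:.3f}")
```

Under these assumptions the squared error for 'general introduction' comes out at roughly 300 x 10⁻⁶, close to the tabulated 298.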

validity of the new concepts 'interest' and 'satisfaction'

The interests and satisfactions are constructed variables that have discriminant validity, see the multitrait-multimethod lay-out (Campbell & Fiske, 1959) in Table 3 that gives the results for the students. Table 3 is based on the results of six separate analyses, each of the tests being analysed as a separate educational system. In this way there are six estimates of the interest of every student in marks obtained. In the two-variable system analysis the interest in time is simply the complement of the interest in marks, and is not tabulated.

There is a new empirical variable in Table 3: the expected mark, as reported by every student just before taking the particular test. It is not the expected mark itself that is used in the analysis, but a corrected expectation, because there is a somewhat limited variability in the expected marks as reported by the students.

(* The correction is the mean difference between expected and obtained marks for the particular student. The motivation for the correction is threefold. (1) In a technical sense this correction does not influence the correlation of expected and obtained marks of the individual student over the series of six tests; this correlation is an important indicator of the ability of the student to predict his test results. (2) The second reason is that some students knowingly or unknowingly tend to systematically under- or overestimate their expected marks, legitimating a correction for this personal tendency. (3) The third and most relevant reason is that the system analysis might be done on the basis of expected marks instead of obtained marks, and then the analysis based on corrected expected marks results in higher validities (Table 4). *)
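The correction and its first property can be illustrated with a small Python sketch. The six expected and obtained marks below are invented for the illustration and are not data from the study; the helper `pearson` is a hypothetical name.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical marks of one student on the six tests.
expected = [6.0, 7.0, 6.0, 7.0, 6.0, 7.0]
obtained = [5.0, 7.0, 6.0, 8.0, 4.0, 6.0]

# The student's personal tendency to over- or underestimate.
bias = mean(e - o for e, o in zip(expected, obtained))

# Corrected expectation: the same profile, shifted by the personal bias.
corrected = [e - bias for e in expected]

print(bias, corrected)
print(pearson(expected, obtained), pearson(corrected, obtained))
```

Subtracting a constant per student leaves that student's correlation over the six tests unchanged, while the mean of the corrected expectations equals the mean of the obtained marks.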

(* Expected and obtained marks are two manifestations of the same construct. The obtained marks are the pure manifestations of this construct because of their legal meaning. The rather low correlations for obtained marks in the validity diagonals, contrasted with the high correlations for expected marks, should have the attention of the examination committee of the Department of Law. The total time is the sum of answers to five questions in the survey: time spent in direct preparation for the test, in attending lectures, in preparing for lectures, in attending seminars, and in preparing for seminars. Although there are substantial correlations between time spent on the different tests, the correlations with obtained marks as well as with expected marks are characteristically low, indicating that there must be an interaction with intellectual capacity. *)

Satisfaction should correlate positively with marks obtained and negatively with time spent in study, and so it does (see Table 3, correlations within tests). There is a pattern of satisfaction being less determined by marks obtained as the year progresses.

Interest in marks obtained is correlated .7 with these marks, and is correlated .5 with (corrected) marks expected.

Campbell and Fiske's criteria for construct validity are met.

(* (1) The bold printed entries in the validity diagonals are significantly different from zero (convergent validity), (2) with the already commented upon exception of the values for marks obtained, they are higher than the values lying in their column and row in the 'heterotrait-heteromethod' triangles adjacent to the validity diagonals, (3) with a number of exceptions they are higher than the corresponding correlations in the heterotrait-monomethod triangles, and (4) the same pattern of interrelationship is shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The exceptions regarding criterion 3 are an artefact of the method of estimating interests and satisfactions, this method necessarily resulting in high correlations with marks obtained. *)

Clearly the theoretical constructs 'interest in marks obtained' and 'satisfaction' have construct validity. More importantly, they have discriminant validity in relation to the three empirical variables. As such they carry the promise of being a significant new instrument for the educational researcher.

model fit using expected instead of obtained marks

Students exchange their time against marks, but obviously in this continuous process of exchange the only thing they can get in direct return is a higher expectation for the mark to be received for the test. The model should therefore fit better using the expected marks than using the marks that are in fact obtained. Because empirical data on expected marks are available, the prediction can be tested. Two criteria for fit of the model are: (1) the validity of the new constructs 'interests' and 'satisfactions', and (2) the sum of squared errors. Table 4 presents the data on validity, revealing distinctly higher validities for interests and satisfactions.

Table 5 summarizes the results on the fit of both analyses. Remember that error is the difference between power before and after exchange, the assumption being that in a perfect market the individual assets evaluated at market prices remain the same. The correction of expected marks, being based on the mean of the marks obtained over six tests, introduces the obtained marks again through the back door; therefore the analyses were repeated using the uncorrected expected marks. The reduction in errors is very marked for corrected as well as uncorrected expected marks.

Table 5
Error terms using obtained or expected marks

______________________________________________________________________
system / actors             ∑ error² for analysis based on           n
                          --------------------------------------
                           marks         marks expected
                           obtained   ------------------------
                                      corrected    uncorrected
----------------------------------------------------------------------
6-test system                 85          83            75         185
    students                   4           4             4         179
    teachers (tests)          80          78            71           6
----------------------------------------------------------------------
every test a system
-------------------
    general introduction       5           5             4         180
    constitutional law        14          13            13         180
    private law               47          14            14         180
    history of law            62          33            36         180
    sociology of law          32          27            29         180
    penitentiary law          12           8             8         180
______________________________________________________________________

Note. All sums are multiplied by 10⁵. Under 'every test a system' the sum of squared errors for students as well as teachers is given.

The test as clearing house and the quality of the reproduction of interests

What is the quality of the estimation of interests using the Coleman model? To answer this question a simulation study was done to see how good the Coleman model is at reproducing known interests. When the interests are known, for example by specifying a certain distribution for these interests, it is possible to determine the exchange rate of marks and time as well as the combination of expected mark and time spent in study that maximizes the satisfaction for the particular student (Coleman, p. 684 and 676 respectively, presents the mathematics). I will use the values .39 and .61 respectively for obtained marks and time spent in extra-curricular activities. Interests for time are generated by pseudo-random sampling (* from the beta distribution with parameters 12 and 8, having mean .6 and variance 2/175; the pseudo-random number generator ran3 and the procedures used for sampling from the beta distribution are given by Press, Flannery, Teukolsky, & Vetterling (1989). All individual differences between students are absorbed in the differences in interests; think of intellectual capabilities, study habits, socio-economic background, educational background, and aspiration level. *)
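The stated moments of the beta distribution are easily checked, and the sampling step is easily imitated. The Python sketch below uses the standard library generator instead of the ran3 routine of Press et al., so it will not reproduce the original pseudo-random sequence; the seed and the variable names are arbitrary choices made for the illustration.

```python
import random
from fractions import Fraction
from statistics import mean

A, B = 12, 8   # parameters of the beta distribution for the interest in time

# Exact moments of Beta(A, B).
beta_mean = Fraction(A, A + B)                          # 3/5
beta_var = Fraction(A * B, (A + B) ** 2 * (A + B + 1))  # 2/175

# One simulated run of 100 students (arbitrary seed, not the original ran3).
rng = random.Random(1992)
interest_time = [rng.betavariate(A, B) for _ in range(100)]
interest_marks = [1.0 - x for x in interest_time]       # complement in a two-resource system

print(beta_mean, beta_var, float(beta_var))
print(round(mean(interest_time), 3), round(mean(interest_marks), 3))
```

The exact variance 2/175 is about .0114, in line with the start value .0115 given in the note to Table 6.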

Table 6
Simulation results 100 runs, 100 students per run, both test and pretest 25 or 50 items

_____________________________________________________________________________________
variable  mean     s.d.          1             2             3             4
_____________________________________________________________________________________
test and pretest 25 items
-------------------------------------------------------------------------------------
1        6.612    .206
2        6.571    .166     .672 (.0877)
3       79.108   2.308     .740 (.0462)  .871 (.1434)
4         .398    .0121    .970 (.0073)  .766 (.1145)  .869 (.0278)
5         .0062   .0001    .518 (.1057) -.064 (.1005) -.066 (.1036)  .325 (.1125)
=====================================================================================
test and pretest 50 items
-------------------------------------------------------------------------------------
1        6.646    .186
2        6.590    .162     .751 (.0953)
3       79.634   2.300     .844 (.0309)  .859 (.1385)
4         .404    .0117    .980 (.0048)  .806 (.1183)  .928 (.0163)
5         .0062   .0001    .451 (.1130) -.012 (.1223)  .013 (.1236)  .298 (.1181)
_____________________________________________________________________________________

Note. 1 = obtained marks, 2 = expected marks, 3 = hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, mean = mean of 100 samples, s.d. = standard deviation of the sample means, cross-tabulated are the means of the product moment correlations and their standard deviations. Start values: mean interest in marks .4, in time .6, variance .0115; computed: value for marks .39, time .61.

The next step is constructive: given the expected mark, a process must be specified to generate an obtained mark. A crucial characteristic of the educational system is that the test functions as a clearing house for the exchange of marks and time. This clearing house can be modelled using a binomial model (Wilbrink, 1978; van den Brink, 1982) for the generation of test scores given the true mastery of the domain of questions that is sampled by the test. The crucial step then is from the expected mark to the true mastery. The trick here is to suppose that the expected mark is based on the score on a pretest sampled from the same domain of questions the test is sampled from. A beta density Beta(a,b) for the true mastery can then be specified, and sampled for the true mastery that will generate the score on the test (a + b = number of items in pretest, a / (a + b) = score on pretest; for the details see Wilbrink, 1978).
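The two sampling steps of this clearing house can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the original Pascal program: the function name is hypothetical, the pretest result is passed in as a proportion correct (how a mark on the 1-10 scale is converted into that proportion is not specified here), and proportions of exactly 0 or 1 are rejected because the beta density is then undefined.

```python
import random


def simulate_test_score(pretest_proportion, n_pretest, n_test, rng):
    """Return (true mastery, number of test items correct) for one student."""
    a = pretest_proportion * n_pretest   # a / (a + b) = score on pretest
    b = n_pretest - a                    # a + b = number of items in pretest
    if a <= 0 or b <= 0:
        raise ValueError("pretest proportion must lie strictly between 0 and 1")
    # Step 1: sample the true mastery from Beta(a, b).
    mastery = rng.betavariate(a, b)
    # Step 2: binomial model, every test item is answered correctly
    # with probability equal to the true mastery.
    n_correct = sum(rng.random() < mastery for _ in range(n_test))
    return mastery, n_correct


rng = random.Random(0)   # arbitrary seed
for n_items in (25, 50):
    mastery, n_correct = simulate_test_score(0.6, n_items, n_items, rng)
    print(n_items, round(mastery, 3), n_correct)
```

A longer pretest gives a narrower beta density, and a longer test gives a more precise binomial score, which is why expected and obtained marks agree more closely with 50 items than with 25.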

The results of the simulation study presented in Table 6 demonstrate that interests are estimated quite well, at least under the conditions established for the simulation. The mean of the interests is replicated, with a small standard deviation for samples of 100. The correlation between expected and obtained marks increases from .67 to .75 when test and pretest length are doubled from 25 to 50 items.

discussion

The results presented establish Coleman's social system theory as a potentially useful tool for the educational researcher concerned with the kind of problems mentioned in the subtitle of the paper: grade retention, drop-out and study-delay.

(* Using Coleman’s model it will be possible to use micro-economic theory to derive predictions of which measures might have an effect on the relative interests of students in marks and in time. Economic theory predicts that a 10% increase in the time budget will be distributed over time spent in study and time spent in extracurricular activities in the same proportion as applies to the old budget (income elasticity is 1). Teachers could decide to grade less leniently; also in this case students will correct their aspiration level for marks in such a way that they have to spend the same proportion of their time budget as before. It is evident that much more drastic measures have to be taken to effect any substantial change in the behavior of students as well as teachers, measures that directly influence the interests themselves. *)
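The unit income elasticity can be made concrete with a small Python sketch. It assumes a Cobb-Douglas satisfaction function, under which the share of the budget spent on a resource equals the interest in that resource whatever the size of the budget; the interest of .4 in marks and the function name are hypothetical choices for the illustration.

```python
def split_time_budget(hours, interest_marks):
    """Hours exchanged for marks (study) and hours kept, at a Cobb-Douglas optimum."""
    study = interest_marks * hours
    kept = (1.0 - interest_marks) * hours
    return study, kept


INTEREST_MARKS = 0.4                     # hypothetical student

old_study, old_kept = split_time_budget(1440, INTEREST_MARKS)
new_study, new_kept = split_time_budget(1440 * 1.10, INTEREST_MARKS)  # budget up by 10%

print(old_study, old_kept)
print(new_study / old_study, new_kept / old_kept)   # both ratios are about 1.10
```

Both uses of time grow by the same 10%, so the proportion spent in study does not change; only a change in the interests themselves shifts that proportion.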

A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. Educational testing itself functions as a kind of 'clearing house' that makes the exchange of time and marks possible, marks being the valued scores on the tests used. Regrettably, educational measurement experts, with few exceptions, for example Van Naerssen’s (1974) work on models of strategic study behavior of students, are not concerned with this 'clearing house' function of tests. Poor predictability of marks (predictability by the student is meant here) quite possibly is one of the reasons for the observed exchange rate for marks against time in the law education case, where students are able to keep a very substantial part of their time budget to spend on extra-curricular activities.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.




companion paper see
http://www.benwilbrink.nl/publicaties/92ColemanApplicationECER.htm

Pascal program: contact me


Other or recent literature


Hubert M. Blalock, Jr., and Paul H. Wilken (1979). Intergroup processes. A micro-macro perspective. The Free Press.


Ulrich Trautwein and Oliver Lüdtke (2007). Students’ self-reported effort and time on homework in six school subjects: between-students differences and within-student variation. Journal of Educational Psychology, 99, 432-444.



J. Sidney Shrauger & Timothy M. Osberg (1981). The Relative Accuracy of Self-Predictions and Judgments by Others in Psychological Assessment. Psychological Bulletin, 90, 322-351.

abstract The validity of individuals' self-assessments is compared with other assessment procedures commonly used in psychological evaluation. Comparisons are made in the prediction of all criteria that have been investigated: intellectual achievement, vocational choice, job performance, therapy outcome, adjustment following hospitalization, and peer ratings. Self-assessments are at least as predictive of these criteria as are other assessment methods against which they have been pitted. Limitations of this conclusion and its implications for current psychological evaluation procedures are examined. It is argued that greater attention should be given to self-assessments and to the evaluation procedures that may enhance their predictive validity. Steps are outlined for deciding when self-assessment should be used, and suggestions are offered as to how the validity of self-judgments might be maximized.



Philip L. Ackerman & Stacey D. Wolman (2007). Determinants and validity of self-estimates of abilities and self-concept measures. Journal of Experimental Psychology: Applied, 13, 57-78.



May 20, 2013 \ contact ben (at) benwilbrink.nl

http://www.benwilbrink.nl/publicaties/92ColemanModelingECER.htm http://goo.gl/Id1AU