1992 European Conference on Educational Research (ORD) Enschede abstract + summary + paper + presentation + sheets

Understanding grade retention, drop-out and study-delay as system rigidities.

Certain anomalies in education are extremely resistant to change, as is certainly the case for grade retention in secondary education and for attrition and study-delay in higher education. Present-day research methodology and data analysis in the educational field are not up to the task of elucidating this kind of macro-educational phenomenon. Recently Coleman (1990, Foundations of social theory. Cambridge, Massachusetts) presented a theory of social systems that connects the behavior of actors (for example students and teachers) at the micro-level with phenomena occurring at the macro-level of the social system involved. Coleman’s theory has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social (and educational) phenomena. The paper explores the possibility of applying this theory to the problems mentioned, using empirical data (grades and time expenditure) from the first year examinations 1985-1989 of the study of law at the University of Amsterdam.

Major educational problems like retention and attrition rates seem to be immune to policy measures and innovations meant to reduce these rates. Traditional methodology in the field of educational research does not yield meaningful insights into the nature of, and causal relations behind, retention and attrition. In a famous Dutch article Posthumus (1940) presented an incisive analysis of the detrimental character of the way teachers assess and evaluate their pupils, resulting in what later came to be called “Posthumus’ Law”: no matter the changes in the educational system, teachers will label the achievements of one quarter of their pupils as not up to standard or unsatisfactory (‘onvoldoende’). In the Dutch system of secondary education the consequences are extremely serious, because only a small minority of students is not retained in at least one grade (year). This particular problem, however, is not peculiarly Dutch, as data in the Unesco study (1980, cited in Bos, 1984, p. 6-17) demonstrate. Posthumus’ analysis is as valid today, half a century later, as it was in 1940. Educational research evidently has not been 'up to standard,' failing to make any significant inroad into this problem area.

Retention and attrition at the macro-level will not change by well-meant admonition of teachers (Posthumus), nor by major educational innovations (De Groot, 1966; Bos, 1984). It is of course possible for the individual student to steer clear of these cliffs by stepping up his or her investment of time and attention. The catch is, of course, that what is possible at the individual level is not possible at the group- or system level. That is what history teaches us, whether we understand it or not. Doornbos (1985) points out that teachers in their behavior are restrained by the characteristics of the system they are working in: the problem of retention is a problem at the level of the educational system as system. Posthumus’ Law describes the behaviour of this system: not only teacher behaviour, but also student behaviour (Wilbrink, 1985).

The general problem in the educational field is how individual behaviours connect to macro-educational characteristics, with the direction of influence obviously going both ways. A promising model is Coleman’s (1990) social system theory.

**Coleman’s Social System Theory**

In the field of economics the relation between behaviour of individual actors (at the micro-level) and phenomena at the macro-level is described by micro-economic (equilibrium) models. The market price of a certain good, say apples, is the resultant of many individual bargaining situations (between customers and merchants, or at the fruit auction). Market price is a macro-level concept. Individual customers are not able to change the market price of apples. Coleman’s observation is that social systems can be modelled as systems or markets where actors exchange resources, the rates of exchange being determined at the system-level.

The paper will give a summary of the mathematical development of the basic model (Coleman’s chapters 25 and 26). A small set of basic assumptions is translated into restrictions on the data, enabling solution for the values of the resources involved, the individual power of actors in the system, and their interests (utilities) in these resources.

The model is a system model because exchange of resources takes place within a specified group of actors, and every exchange is dependent on the other exchanges taking place in this system. Individual exchanges within the system are competitive. Now 'competitive' is a label that excellently fits most educational systems, and certainly the analyses of Posthumus, De Groot and Doornbos all point in this direction.

**The Model Applied to Higher Education Data**

Educational systems can be modelled by Coleman’s theory. As an example, data for the first year examination in the department of Law at the University of Amsterdam will be analysed. The resources being exchanged here are marks (from teachers to students) and time spent (from students to teachers). At the system level it is evident from the raw data that students are investing in their study only half of the 1600 hours they are expected to (by law), and only a minority have passed the first year examination ‘in time.’ The situation is ‘stable,’ in the sense that over the (observed) years no appreciable changes in the statistics have taken place. The amazing thing is that the individual student is almost always able to pass the examination in time by a relatively small extra investment, should he or she wish to do so. However, the number of students wishing to do so must be small, witness the data. (The paper will present the basic statistics here.) Available data allow determination of the exchange rate of marks against time, a macro-level characteristic of this educational system, and of the interest (or utility) of every student (and teacher) in marks and in time (i.e. in time not to be spent in study). The analysis presupposes the system to be in equilibrium (after marks have been assigned).

This kind of analysis shows why everybody in this system acts as he or she acts, given the characteristics of the system, i.e. in this case given the exchange rate of marks and time. There are many details that might be discussed, but the main point is that this kind of analysis shows every actor within the system acting under influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Every policy maker interested in improvement of attrition, in higher motivation of students, or in better marks, must have insight into the ways in which the educational system influences individual behaviors and in turn is shaped by collective individual behaviour. Government policies especially are directed at effects at the system level. Failure to evaluate the effects of education as obtained under the particular constraints of the system as (traditionally) implemented will lead to failure of law-induced educational innovation.

Social system analysis using marks and time expenditure will expose the weaknesses of current methods of educational assessment. The current grading system is taken for granted by almost every author on the subject. Certainly De Groot (1966) severely criticized the Dutch grading system, but he failed to indicate the historical roots of this particular grading system. Also in the handbooks of Thorndike (1971) and Linn (1989) no mention is made of the historical underpinnings of the current marking system in the U.S.A. Chapman (1988) does present a historical survey of the American grading system, showing intelligence testing to be an offspring of the movement of standardized testing in education. A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. It is possible that poor predictability of marks is the main reason for the observed exchange rate for marks against time in the law education case. Certainly the social system analysis in this case indicates there is plenty of room for improvement using modelling techniques for educational assessment as developed by Van Naerssen (1974) and Wilbrink (1978). Also in secondary education a social system analysis approach could point out the classical grading system as the main culprit in the case against (Dutch) retention rates of 15 to 25 percent every year. Remarkably, in English speaking countries there is an 'educational reform' movement in the direction of stricter grading policies (Shepard & Smith, 1989), in the direction also of more sorrow for more students.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.

Resemblances and differences to alternative methods (for example: van der Heijden & van der Kamp, 1991; Jackson, Brett, Sessa, Cooper, Julin & Peyronnin, 1991), if useful, will be indicated.

Bos, D.J. (1984). Blijven zitten met zittenblijven? Den Haag: Stichting voor Onderzoek van het Onderwijs.

Chapman, P.D. (1988). Schools as sorters. Lewis M. Terman, applied psychology, and the intelligence testing movement, 1890-1930. New York: New York University Press.

Coleman, J. (1990). Foundations of social theory. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Doornbos, K. (1985). Naar een macro-educatieve analyse van het zittenblijven en verwante expulsiefenomenen in het Nederlandse schoolwezen. In Wald, A. (1985). Een jaartje overdoen. Verslag van het SVO-symposium over zittenblijven in het voortgezet onderwijs. Den Haag: SVO. Lisse: Swets & Zeitlinger.

Heijden, P.G.M. van der, & van der Kamp, L.J.Th. (1991). Latente budget analyse en onderzoek naar schoolloopbanen. TOR, 16, 297-314.

Jackson, S.E., Brett, J.F., Sessa, V.I., Cooper, D.M., Julin, J.A., & Peyronnin, K. (1991). Some differences make a difference: individual dissimilarity and group heterogeneity as correlates of recruitment, promotions, and turnover. American Psychologist, 46, 675-689.

Linn, R.L. (Editor)(1989). Educational measurement; second edition. Washington, D.C.: American Council on Education.

Naerssen, R.F. van (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 431-445.

Posthumus, K. (1940). Middelbaar onderwijs en schifting. De Gids, volume 104, part 2, 24-42. Full text at dbnl.nl.

Shepard, L.A., & Smith, M.L. (Editors)(1989). Flunking grades. Research and policies on retention. London: The Falmer Press.

Thorndike, R.L. (Editor)(1971). Educational measurement; third edition. New York: American Council on Education / Macmillan.

Unesco (1980). Wastage in primary and general secondary education: a statistical study of trends and patterns in repetition and drop-out. Paris: Unesco Office of statistics, (CSR-E-37).

Wilbrink (1978). Studiestrategieën. Amsterdam: Stichting Centrum voor Onderwijsonderzoek.

Wilbrink (1986). Toetsen en testen in het onderwijs. SVO Jaarboek 1985, 275-288. Den Haag: Stichting voor Onderzoek van het Onderwijs.

1992 European Conference on Educational Research (ORD)

Understanding grade retention, drop-out and study-delay as system rigidities.

Grote Bickersstraat 72, 1013 KS Amsterdam.

Certain unwanted states of affairs in education are extremely resistant to change, as certainly is the case for grade retention in secondary education and for attrition and study-delay in higher education. The individual student might prevent flunking of grades or failing the examination by stepping up his or her investment of time and attention. The catch, of course, is that what is possible at the individual level is not possible at the group- or system level. This is what history teaches us, whether we understand it or not. Doornbos (1985) points out that teachers in their behavior are restrained by the characteristics of the system they are working in: the problem of retention is a problem at the level of the educational system as system. Evaluation of education should reflect this systemic character of education, and of most or all of its characteristics that currently are proposed as performance indicators. The problem then is that there are no known methods of data analysis or educational measurement that are up to the task of elucidating this kind of systemic phenomenon. Standard statistical methods assume independent observations, also in causal relations, in fact assuming that causal relations existing at the individual level are valid also for massive behavior changes. Because there are no adequate evaluation instruments, it is not possible for the educational researcher to give policy makers a handle on this kind of systemic problem.
The general problem in the educational field is how individual behaviours connect to macro-educational characteristics, with the direction of influence obviously going both ways. Recently Coleman (1990) presented a theory of social systems that connects behavior of actors (for example students and teachers) at the micro-level with phenomena occurring at the macro-level of the social system involved. The theory of Coleman has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social (and educational) phenomena. The paper explores the possibility to apply this theory to data from the Department of Law of the University of Amsterdam. In a companion paper in the section Higher Education this application and its implications for educational policy are reported on more fully.

I will explain Coleman’s approach in the terms of the particular example used, the first year of the study in Law. The basic idea is that students within a social system exchange time spent in study for marks received in ways that maximize satisfaction with the total number of marks received and the time kept for private use. Coleman uses the basic premise of micro-economic theory about human behavior: the higher the marks obtained, while remaining at the same level of satisfaction (because of having less of something else, private time in this case), the less of the still available time he will be giving up to get still higher marks. Coleman uses this and other strong assumptions from micro-economic theory to model the transactions taking place in the educational system, and to present a mathematical structure isomorphic to the structure of transactions. Given empirical data like marks obtained and time spent, the model allows estimation of a conceptually very different set of variables: the interests in marks obtained and interests in alternative uses of their time are characteristics of individual students and teachers, the exchange rate of marks and time is a characteristic of the educational system.

*perfect market*

The assumption is made that there is a competitive market, and therefore a common rate of exchange for all students and teachers. The assumption is necessary to be able to estimate the rate of exchange, given data on time spent and marks obtained. The relative values for the two resources in our market, time and marks, might be .6 and .4, under the constraint that their sum is 1. Before exchange the teachers together control the whole budget of available marks, while the students still control all of their time budget. I will say that the power of the teachers before exchange is .4, i.e. equal to the budget of marks evaluated against their market price. The power of the students is .6.
In a competitive market the power of the actors before and after exchange will remain the same. Because everyone is trading at the same market prices, and because these market prices are competitive, nobody wins or loses. Of course in reality some actors do win or lose. The values for time and marks may be determined by minimizing the violations of the assumption of a competitive market. Coleman (chapters 25 and 26) presents the mathematics for minimizing the sum of squared errors, an error being the difference between the power of an actor before and after exchange.
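As a rough illustration of this fitting idea, the sketch below estimates the value of marks by least squares for a small invented system of two students and one teacher. The control shares, the function names and the closed-form solution are assumptions made for this example only; the estimation procedure in Coleman's chapters 25 and 26 may differ in detail.

```python
# Hypothetical control shares (marks, time) before and after exchange.
# Each resource sums to 1 over the actors; all numbers are invented.
BEFORE = {"student A": (0.0, 0.5), "student B": (0.0, 0.5), "teacher": (1.0, 0.0)}
AFTER = {"student A": (0.35, 0.25), "student B": (0.25, 0.35), "teacher": (0.40, 0.40)}


def power_errors(value_marks, before, after):
    """Power after exchange minus power before exchange, per actor."""
    value_time = 1.0 - value_marks  # the two values are constrained to sum to 1
    errors = {}
    for actor in before:
        d_marks = after[actor][0] - before[actor][0]
        d_time = after[actor][1] - before[actor][1]
        errors[actor] = value_marks * d_marks + value_time * d_time
    return errors


def fit_value_of_marks(before, after):
    """Value of marks that minimizes the sum of squared power errors.

    Each error is linear in the value v: error = slope * v + d_time,
    so the minimum follows from ordinary least squares in one unknown.
    """
    numerator = 0.0
    denominator = 0.0
    for actor in before:
        d_marks = after[actor][0] - before[actor][0]
        d_time = after[actor][1] - before[actor][1]
        slope = d_marks - d_time
        numerator -= slope * d_time
        denominator += slope * slope
    return numerator / denominator


value_marks = fit_value_of_marks(BEFORE, AFTER)
errors = power_errors(value_marks, BEFORE, AFTER)
sum_squared = sum(e * e for e in errors.values())
print(f"value of marks {value_marks:.3f}, value of time {1 - value_marks:.3f}")
print(f"sum of squared errors {sum_squared:.6f}")
```

With real data each student and each teacher would contribute one error term, which is what makes the fitted exchange rate a property of the system rather than of any single actor.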

The basic idea is that marks and time have a price label attached to them, the prices or values having established themselves over many years, and that the prices or the exchange rate can be estimated given an appropriate set of data.

*satisfaction*

If nobody wins or loses on this market, unless by error or accident, why is it that students and teachers still are in business? The answer is that people maximize their satisfaction, or utility if you want to call it that. Students can gain in satisfaction by exchanging an appropriate amount of their time against an appropriate mark earned on a particular test.

It would be nice to have a mathematical expression for this satisfaction. Coleman’s choice is an analogue of the Cobb-Douglas production function

$$U_i = \prod_j c_{ij}^{x_{ji}}$$

where $c_{ij}$ is the *control* person i has over good j, and $x_{ji}$ is the *interest* person i has in good j; controls as well as interests are constrained to sum to 1. The Cobb-Douglas production function as such may be applied in the educational setting (see Polachek, Kniesner, & Harwood, 1978); Coleman uses it as a utility function. Now this utility function is remarkably different from the usual type of utility function in test-based decision making (van der Linden, 1991, reviews the latter). For one thing, it is possible to determine this function using empirical data on time spent and marks received. For every student the relative interest in time and in marks can be derived from available data; the controls of course are the number of hours the student has managed to keep for himself, and the marks he has obtained, expressed as proportions of the total budget available. These interests are characteristics of the individual person, and knowledge of them might be of interest to educational policy makers (Coleman, p. 718).
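One way to see why interests can be recovered from observed controls is the textbook maximization of a Cobb-Douglas function under a budget constraint. The sketch below assumes market values $v_j$ for the goods and a budget $r_i$ equal to the power of person i; the symbols $v_j$, $r_i$ and $\lambda$ are introduced here for illustration, and the steps are the standard derivation rather than a reproduction of Coleman's chapters.

```latex
\begin{align*}
\max_{c_{i1},\dots,c_{im}} \; \ln U_i &= \sum_j x_{ji}\,\ln c_{ij}
  \quad\text{subject to}\quad \sum_j v_j\,c_{ij} = r_i, \\
\frac{x_{ji}}{c_{ij}} &= \lambda\,v_j
  \;\Longrightarrow\; v_j\,c_{ij} = \frac{x_{ji}}{\lambda}, \\
\sum_j v_j\,c_{ij} = r_i \ \text{and}\ \sum_j x_{ji} = 1
  &\;\Longrightarrow\; \lambda = \frac{1}{r_i}, \\
x_{ji} &= \frac{v_j\,c_{ij}}{r_i}.
\end{align*}
```

Under these assumptions the interest in a good equals the share of the person's power that is held in that good, which is why it can be computed once the values of marks and time are known.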

This kind of analysis shows every actor within the system acting under influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Table 1 gives the results of a Coleman-type system analysis on the six tests of the first year examination. Here the six tests together are taken to be the system. Every test is represented with seven parallel forms because the data are assembled in the seven courses from 1983 to 1988. The 179 students are not a representative sample: they are chosen because they participated in all six tests and completely filled out the six questionnaires. The results for this group of students are not valid for first year law students in general. The mean mark received by this select group is 6.63, an outstandingly high value, I can assure you.

**Table 1**

Analysis of a six-test educational system: the first year Law examination 1983 - 1988

| actors | mean mark | mean time | sum of control: marks | sum of control: time | sum of power | sum of error2 | n |
|---|---|---|---|---|---|---|---|
| students | 6.63 | 783 | .626 | .544 | .579 | 44 | 179 |
| teachers | 3.37 | 657 | .374 | .456 | .421 | 802 | 6 |

Note. The values for marks and time are .429 and .571 respectively. Error2 values are multiplied by 10^{6}.

Students control .626 of the total budget of marks. This proportion is not equal to the mean mark received because of a peculiarity of many marking systems: the scale of marks starts at 1, not at zero, so only nine points can be earned. The control accordingly is ( 6.63 - 1 ) / 9 = .626. Teachers control the complement of this, the marks they have not handed out. The students are able to keep a high proportion of the time budget to themselves. In this market the power of the students before exchange is equal to the value of time, because they control the whole budget of, in this case, 1440 hours; after exchange the power of the students is .579, very close to the initial power. Remember that a student’s power is the sum of the marks and time he controls, evaluated at market prices, in this case the exchange rate of .429 versus .571. Does the model fit the data? The assumption is that for every actor his power before and after exchange will be the same. Table 1 gives the sum of squared errors that is minimized by the values for marks and time of .429 and .571 respectively. Table 2 presents a breakdown of the errors for the six tests or teachers.
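The arithmetic in this paragraph can be retraced in a few lines. The sketch below uses only figures from Table 1 and its note; the constant and function names are illustrative choices, not part of the original analysis.

```python
# Figures from Table 1 and its note; the names are illustrative.
MARK_MIN, MARK_MAX = 1, 10        # marks run from 1 to 10, so nine points can be earned
TIME_BUDGET = 1440                # hours, the whole time budget of the students
VALUE_MARKS, VALUE_TIME = 0.429, 0.571


def control_of_marks(mean_mark):
    """Share of the marks budget held, given a mean mark on the 1-10 scale."""
    return (mean_mark - MARK_MIN) / (MARK_MAX - MARK_MIN)


def power(control_marks, control_time):
    """Controls evaluated at the market values of marks and time."""
    return VALUE_MARKS * control_marks + VALUE_TIME * control_time


student_marks = control_of_marks(6.63)      # share of marks obtained
student_time = 783 / TIME_BUDGET            # share of time kept
student_power = power(student_marks, student_time)
teacher_power = power(1 - student_marks, 1 - student_time)

print(f"students: control of marks {student_marks:.3f}, "
      f"control of time {student_time:.3f}, power {student_power:.3f}")
print(f"teachers: power {teacher_power:.3f}")
```

The students' power after exchange stays close to the value of time (.571), which is the sense in which the competitive-market assumption roughly holds at the aggregate level.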

Nothing new is revealed yet, but then in Table 1 only the assumption of a perfect market is used, that is the assumption of equal power before and after exchange. Assuming every actor has maximized his satisfaction, his relative interests in marks and time can be estimated. For the six teachers the interests and satisfaction are tabulated in Table 2. There is no equity in the satisfaction obtained by the six teachers, ‘general introduction’ obtaining a much higher satisfaction than ‘penitentiary law’; the ratios between them are nearly the same as the ratios between their powers. Is there an explanation? Students have a time budget of 934 hours, and they are not constrained to spend a certain proportion of it on the first or on any other test, so the way they distribute their time over the six tests could be highly revealing. The available time as programmed by the Department of Law is given in the last column of Table 2. Now it is clear the big loser is penitentiary law; it is the last test, and many students might not be interested in investing the time to obtain yet another high mark they do not need to pass the examination. On the whole, however, the power of the six teachers is highly correlated with the number of hours programmed for the course. The explanation might be that in the regulation for this examination the marks for the tests are weighted approximately in the same proportion as the time budgets.

**Table 2**

The six-test educational system: results for the tests

| test / teacher | mean mark | mean time | interest in marks | interest in time | power | error2 | satisfaction | time programmed |
|---|---|---|---|---|---|---|---|---|
| general introduction | 3.15 | 161 | .281 | .719 | .089 | 298 | .093 | 320 |
| constitutional law | 3.32 | 121 | .354 | .646 | .075 | 9 | .075 | 240 |
| private law | 3.55 | 126 | .360 | .640 | .078 | 45 | .079 | 240 |
| history of law | 3.66 | 90 | .448 | .552 | .065 | 43 | .065 | 200 |
| sociology of law | 3.55 | 70 | .503 | .497 | .056 | 235 | .057 | 200 |
| penitentiary law | 2.97 | 88 | .403 | .597 | .058 | 171 | .058 | 240 |

Note. The values for marks and time are .429 and .571 respectively. Error2 values are multiplied by 10^{6}.
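The teacher rows of Table 2 can be approximated from the mean marks and hours alone. The sketch below rests on assumptions that are not spelled out in the text: that the marks budget of the six-test system is 6 x 9 = 54 points per student, that the time budget is the 1440 programmed hours, and that an interest equals the share of an actor's power held in that resource. Under these assumptions the computed values agree with the tabulated ones to within rounding.

```python
# Assumed budgets for the six-test system (see the lead-in above).
VALUE_MARKS, VALUE_TIME = 0.429, 0.571
MARKS_BUDGET = 6 * 9      # six tests, nine points to be earned on each
TIME_BUDGET = 1440        # hours programmed for the six courses

# Mean mark retained and mean hours received per teacher, from Table 2.
TEACHERS = {
    "general introduction": (3.15, 161),
    "constitutional law": (3.32, 121),
    "private law": (3.55, 126),
    "history of law": (3.66, 90),
    "sociology of law": (3.55, 70),
    "penitentiary law": (2.97, 88),
}


def analyse(mean_mark, mean_hours):
    """Power, interests and Cobb-Douglas satisfaction for one teacher."""
    control_marks = mean_mark / MARKS_BUDGET
    control_time = mean_hours / TIME_BUDGET
    worth_marks = VALUE_MARKS * control_marks
    worth_time = VALUE_TIME * control_time
    power = worth_marks + worth_time
    interest_marks = worth_marks / power      # assumed: interest = share of power
    interest_time = worth_time / power
    satisfaction = control_marks ** interest_marks * control_time ** interest_time
    return {
        "power": power,
        "interest_marks": interest_marks,
        "interest_time": interest_time,
        "satisfaction": satisfaction,
    }


results = {name: analyse(*row) for name, row in TEACHERS.items()}
for name, r in results.items():
    print(f"{name:22s} interest in marks {r['interest_marks']:.3f}  "
          f"power {r['power']:.3f}  satisfaction {r['satisfaction']:.3f}")
```

Small deviations from the table in the third decimal are to be expected, because the inputs here are the rounded means rather than the individual observations.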

In Table 2 the interests and satisfactions of teachers were presented, and were shown to be related to particular characteristics of the tests that were not incorporated in the model, as for example penitentiary law being the last test in a series. The interests and satisfactions of the students are constructed variables that have discriminant validity, see the multitrait-multimethod lay-out (Campbell & Fiske, 1959) in Table 3. Table 3 is based on the results of six separate analyses, each of the tests being analysed as a separate educational system. In this way there are six estimates of the interest of every student in marks obtained. In the two-variable system analysis the interest in time is simply the complement of the interest in marks, and is not tabulated.

There is a new empirical variable in Table 3: the expected mark, as reported by every student just before taking the particular test. It is not the expected mark itself, but a corrected expectation that is used in the analysis, because there is a somewhat limited variability in the expected marks as reported by the students. The correction is the mean difference between expected and obtained marks for the particular student. The motivation for the correction is threefold. (1) In a technical sense this correction does not influence the correlation of expected and obtained marks of the individual student over the series of six tests; this correlation is an important indicator of the ability of the student to predict his test results. (2) The second reason is that some students knowingly or unknowingly tend to systematically under- or overestimate their expected marks, legitimating a correction for this personal tendency. (3) The third and most relevant reason is that the system analysis might be done on the basis of expected marks instead of obtained marks, and then the analysis based on corrected expected marks results in higher validities (Table 4).
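A minimal sketch of this correction, using invented marks for one hypothetical student: every expected mark is shifted by the student's mean difference between obtained and expected marks over the six tests. The function names and the data are assumptions made for illustration.

```python
import math
from statistics import mean


def corrected_expectations(expected, obtained):
    """Shift expected marks by the student's mean (obtained - expected)."""
    shift = mean(o - e for e, o in zip(expected, obtained))
    return [e + shift for e in expected]


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented marks of one student on the six tests.
expected = [7, 7, 6, 7, 6, 8]
obtained = [6, 7, 5, 6, 6, 7]

corrected = corrected_expectations(expected, obtained)
print("mean obtained :", round(mean(obtained), 3))
print("mean corrected:", round(mean(corrected), 3))
print("correlation unchanged:",
      math.isclose(pearson(expected, obtained), pearson(corrected, obtained)))
```

Because the correction is a constant shift per student, it removes a personal tendency to over- or underestimate while leaving the within-student correlation of expected and obtained marks untouched, which is point (1) above.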

Expected and obtained marks are two manifestations of the same construct. The obtained marks are the pure manifestations of this construct because of their legal meaning. The rather low correlations for obtained marks in the validity diagonals, contrasted with the high correlations for expected marks, should have the attention of the examination committee of the Department of Law. The total time is the sum of answers to five questions in the survey: time spent in direct preparation for the test, in attending lectures, in preparing for lectures, in attending seminars, and in preparing for seminars. Although there are substantial correlations between time spent on the different tests, the correlations with obtained marks as well as with expected marks are characteristically low, indicating that there must be an interaction with intellectual capacity.

Satisfaction should correlate positively with marks obtained and negatively with time spent in study, and so it does (see Table 3, correlations within tests). There is a pattern of satisfaction being less determined by marks obtained as the year progresses.

Interest in marks obtained is correlated .7 with these marks, and is correlated .5 with (corrected) marks expected.

Campbell and Fiske's criteria for construct validity are met: (1) the bold printed entries in the validity diagonals are significantly different from zero (convergent validity), (2) with the already commented upon exception of the values for marks obtained they are higher than the values lying in their column and row in the 'heterotrait-heteromethod' triangles adjacent to the validity diagonals, (3) with a number of exceptions they are higher than the corresponding correlations in the heterotrait-monomethod triangles, and (4) the same pattern of interrelationship is shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The exceptions regarding criterion 3 are an artefact of the method of estimating interests and satisfactions, this method necessarily resulting in high correlations with marks obtained.

**Table 3**

Relations between empirical variables and the constructs 'interest' and 'satisfaction'

| test | variable | m. | s.d. | GI 1 | GI 2 | GI 3 | GI 4 | GI 5 | CL 1 | CL 2 | CL 3 | CL 4 | CL 5 | PL 1 | PL 2 | PL 3 | PL 4 | PL 5 | HL 1 | HL 2 | HL 3 | HL 4 | HL 5 | SL 1 | SL 2 | SL 3 | SL 4 | SL 5 | PnL 1 | PnL 2 | PnL 3 | PnL 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| general introduction | 1 | 6.85 | 1.39 |
| | 2 | 6.80 | 1.26 | 67 |
| | 3 | 161 | 47 | 21 | 23 |
| | 4 | 0.507 | 0.118 | 69 | 55 | 82 |
| | 5 | 0.0032 | 0.0005 | 69 | 44 | -46 | 11 |
| constitutional law | 1 | 6.68 | 1.47 | 49 | 71 | 27 | 46 | 25 |
| | 2 | 6.63 | 1.27 | 57 | 80 | 23 | 49 | 34 | 72 |
| | 3 | 121 | 42 | 15 | 21 | 72 | 58 | -30 | 26 | 21 |
| | 4 | 0.507 | 0.141 | 36 | 52 | 67 | 69 | -06 | 66 | 50 | 86 |
| | 5 | 0.0031 | 0.0006 | 34 | 46 | -22 | 09 | 51 | 61 | 46 | -47 | 01 |
| private law | 1 | 6.45 | 1.53 | 43 | 70 | 14 | 31 | 26 | 54 | 66 | 24 | 39 | 22 |
| | 2 | 6.62 | 1.31 | 66 | 86 | 23 | 53 | 41 | 73 | 82 | 21 | 50 | 46 | 76 |
| | 3 | 126 | 46 | 10 | 21 | 70 | 54 | -36 | 18 | 18 | 75 | 61 | -38 | 25 | 21 |
| | 4 | 0.520 | 0.150 | 29 | 50 | 61 | 60 | -11 | 38 | 45 | 69 | 67 | -14 | 63 | 50 | 88 |
| | 5 | 0.0030 | 0.0006 | 27 | 40 | -43 | -13 | 56 | 29 | 40 | -40 | -13 | 57 | 53 | 41 | -61 | -21 |
| history of law | 1 | 6.34 | 1.61 | 49 | 73 | 16 | 41 | 31 | 54 | 73 | 10 | 33 | 40 | 53 | 76 | 12 | 34 | 34 |
| | 2 | 6.44 | 1.27 | 60 | 81 | 27 | 53 | 33 | 68 | 74 | 23 | 49 | 40 | 65 | 83 | 21 | 46 | 36 | 80 |
| | 3 | 90 | 37 | 15 | 22 | 67 | 55 | -29 | 20 | 19 | 69 | 59 | -29 | 21 | 23 | 77 | 68 | -43 | 24 | 27 |
| | 4 | 0.439 | 0.143 | 39 | 57 | 55 | 64 | 01 | 44 | 55 | 51 | 60 | 08 | 41 | 57 | 57 | 64 | -08 | 71 | 62 | 81 |
| | 5 | 0.0032 | 0.0006 | 20 | 32 | -42 | -14 | 50 | 21 | 34 | -49 | -23 | 59 | 20 | 34 | -58 | -33 | 67 | 47 | 32 | -68 | -16 |
| sociology of law | 1 | 6.45 | 1.37 | 43 | 70 | 11 | 34 | 34 | 52 | 68 | 13 | 37 | 38 | 48 | 70 | 09 | 31 | 34 | 55 | 66 | 16 | 42 | 27 |
| | 2 | 6.44 | 1.24 | 58 | 72 | 16 | 44 | 41 | 67 | 64 | 20 | 46 | 41 | 64 | 75 | 13 | 40 | 41 | 66 | 71 | 25 | 53 | 26 | 73 |
| | 3 | 70 | 34 | 05 | 19 | 60 | 46 | -29 | 17 | 15 | 69 | 59 | -30 | 04 | 13 | 74 | 61 | -51 | 14 | 19 | 72 | 57 | -51 | 16 | 14 |
| | 4 | 0.344 | 0.104 | 29 | 53 | 49 | 54 | 02 | 42 | 50 | 56 | 65 | 06 | 30 | 48 | 56 | 61 | -13 | 40 | 51 | 59 | 65 | -18 | 69 | 51 | 78 |
| | 5 | 0.0035 | 0.0006 | 16 | 15 | -48 | -24 | 45 | 08 | 17 | -57 | -36 | 50 | 20 | 21 | -65 | -40 | 68 | 12 | 13 | -60 | -33 | 63 | 31 | 20 | -87 | -39 |
| penitentiary law | 1 | 7.03 | 1.67 | 31 | 65 | 22 | 35 | 17 | 48 | 65 | 18 | 37 | 31 | 45 | 68 | 15 | 32 | 26 | 51 | 65 | 18 | 44 | 23 | 52 | 59 | 21 | 44 | 04 |
| | 2 | 6.88 | 1.28 | 60 | 73 | 19 | 47 | 41 | 70 | 75 | 17 | 46 | 47 | 66 | 82 | 12 | 40 | 43 | 66 | 72 | 20 | 51 | 31 | 65 | 74 | 11 | 45 | 21 | 73 |
| | 3 | 88 | 39 | 08 | 17 | 55 | 41 | -27 | 17 | 12 | 68 | 55 | -34 | 14 | 14 | 72 | 61 | -45 | 07 | 16 | 69 | 49 | -52 | 01 | 10 | 72 | 52 | -64 | 16 | 06 |
| | 4 | 0.378 | 0.112 | 29 | 54 | 47 | 51 | 01 | 43 | 51 | 53 | 61 | 05 | 36 | 52 | 53 | 58 | -07 | 38 | 53 | 55 | 62 | -15 | 36 | 45 | 60 | 67 | -35 | 73 | 52 | 74 |
| | 5 | 0.0036 | 0.0006 | 17 | 28 | -32 | -08 | 41 | 17 | 34 | -46 | -20 | 56 | 16 | 32 | -52 | -30 | 60 | 29 | 28 | -46 | -09 | 65 | 37 | 31 | -46 | -09 | 61 | 48 | 43 | -74 | -13 |

Column labels: GI = general introduction, CL = constitutional law, PL = private law, HL = history of law, SL = sociology of law, PnL = penitentiary law; the digit is the variable number.

Note. 1 = mark obtained on scale 1-10, 2 = corrected mark expected on the same scale, 3 = total of hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, m. = mean, s.d. = standard deviation; correlations are written without decimals; expected marks are corrected by the mean difference between marks obtained and marks expected on the six tests.

**Table 4**

Relations of the empirical variables and the constructs 'interest' and 'satisfaction' estimated on the basis of (corrected) marks expected

| test | variable | m. | s.d. | GI 1 | GI 2 | GI 3 | GI 4 | GI 5 | CL 1 | CL 2 | CL 3 | CL 4 | CL 5 | PL 1 | PL 2 | PL 3 | PL 4 | PL 5 | HL 1 | HL 2 | HL 3 | HL 4 | HL 5 | SL 1 | SL 2 | SL 3 | SL 4 | SL 5 | PnL 1 | PnL 2 | PnL 3 | PnL 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| general introduction | 1 | 6.85 | 1.39 |
| | 2 | 6.80 | 1.26 | 67 |
| | 3 | 161 | 47 | 21 | 23 |
| | 4 | 0.508 | 0.115 | 51 | 70 | 82 |
| | 5 | 0.0031 | 0.0005 | 39 | 66 | -46 | 09 |
| constitutional law | 1 | 6.68 | 1.47 | 49 | 71 | 27 | 56 | 39 |
| | 2 | 6.63 | 1.27 | 57 | 80 | 23 | 59 | 50 | 72 |
| | 3 | 121 | 42 | 15 | 21 | 72 | 61 | -27 | 26 | 21 |
| | 4 | 0.510 | 0.134 | 39 | 54 | 67 | 79 | 06 | 51 | 61 | 87 |
| | 5 | 0.0031 | 0.0006 | 34 | 47 | -29 | 10 | 68 | 34 | 58 | -55 | -10 |
| private law | 1 | 6.45 | 1.53 | 43 | 70 | 14 | 43 | 45 | 54 | 66 | 24 | 43 | 27 |
| | 2 | 6.62 | 1.31 | 66 | 86 | 23 | 61 | 54 | 73 | 82 | 21 | 52 | 46 | 76 |
| | 3 | 126 | 46 | 10 | 21 | 70 | 59 | -28 | 18 | 18 | 75 | 64 | -42 | 25 | 21 |
| | 4 | 0.532 | 0.139 | 37 | 53 | 66 | 75 | 04 | 44 | 48 | 69 | 75 | -10 | 46 | 57 | 89 |
| | 5 | 0.0030 | 0.0006 | 42 | 48 | -31 | 07 | 70 | 42 | 48 | -38 | -04 | 76 | 31 | 55 | -61 | -23 |
| history of law | 1 | 6.34 | 1.61 | 49 | 73 | 16 | 51 | 49 | 54 | 73 | 10 | 40 | 47 | 53 | 76 | 12 | 41 | 47 |
| | 2 | 6.44 | 1.27 | 60 | 81 | 27 | 61 | 47 | 68 | 74 | 23 | 50 | 39 | 65 | 83 | 21 | 50 | 45 | 80 |
| | 3 | 90 | 37 | 15 | 22 | 67 | 58 | -24 | 20 | 19 | 69 | 61 | -32 | 21 | 23 | 77 | 71 | -40 | 24 | 27 |
| | 4 | 0.450 | 0.131 | 39 | 54 | 64 | 75 | 08 | 46 | 49 | 61 | 71 | 00 | 43 | 54 | 66 | 77 | -05 | 53 | 65 | 87 |
| | 5 | 0.0032 | 0.0006 | 25 | 33 | -40 | -08 | 59 | 26 | 32 | -45 | -19 | 63 | 22 | 33 | -58 | -31 | 74 | 28 | 38 | -73 | -32 |
| sociology of law | 1 | 6.45 | 1.37 | 43 | 70 | 11 | 46 | 54 | 52 | 68 | 13 | 42 | 45 | 48 | 70 | 09 | 38 | 48 | 55 | 66 | 16 | 42 | 30 |
| | 2 | 6.44 | 1.24 | 58 | 72 | 16 | 50 | 50 | 67 | 64 | 20 | 44 | 34 | 64 | 75 | 13 | 41 | 47 | 66 | 71 | 25 | 50 | 23 | 73 |
| | 3 | 70 | 34 | 05 | 19 | 60 | 53 | -20 | 17 | 15 | 69 | 61 | -35 | 04 | 13 | 74 | 67 | -41 | 14 | 19 | 72 | 63 | -51 | 16 | 14 |
| | 4 | 0.347 | 0.098 | 35 | 53 | 56 | 70 | 15 | 48 | 46 | 64 | 73 | -03 | 35 | 49 | 62 | 73 | -02 | 44 | 52 | 67 | 76 | -23 | 51 | 62 | 83 |
| | 5 | 0.0035 | 0.0006 | 20 | 14 | -48 | -26 | 43 | 14 | 14 | -56 | -37 | 52 | 24 | 21 | -65 | -46 | 64 | 15 | 13 | -58 | -37 | 62 | 17 | 30 | -89 | -49 |
| penitentiary law | 1 | 7.03 | 1.67 | 31 | 65 | 22 | 52 | 42 | 48 | 65 | 18 | 43 | 38 | 45 | 68 | 15 | 40 | 40 | 51 | 65 | 18 | 45 | 28 | 52 | 59 | 21 | 47 | 06 |
| | 2 | 6.88 | 1.28 | 60 | 73 | 19 | 53 | 48 | 70 | 75 | 17 | 46 | 44 | 66 | 82 | 12 | 43 | 52 | 66 | 72 | 20 | 47 | 29 | 65 | 74 | 11 | 48 | 23 | 73 |
| | 3 | 88 | 39 | 08 | 17 | 55 | 46 | -22 | 17 | 12 | 68 | 55 | -41 | 14 | 14 | 72 | 63 | -42 | 07 | 16 | 69 | 58 | -50 | 01 | 10 | 72 | 61 | -62 | 16 | 06 |
| | 4 | 0.377 | 0.097 | 42 | 55 | 51 | 66 | 16 | 52 | 53 | 59 | 70 | 02 | 45 | 55 | 58 | 72 | 03 | 42 | 52 | 62 | 72 | -17 | 39 | 48 | 62 | 78 | -34 | 52 | 60 | 79 |
| | 5 | 0.0036 | 0.0006 | 27 | 24 | -37 | -10 | 49 | 22 | 30 | -50 | -21 | 66 | 21 | 31 | -57 | -31 | 71 | 30 | 24 | -49 | -23 | 64 | 36 | 31 | -54 | -23 | 66 | 23 | 47 | -82 | -30 |

Column labels: GI = general introduction, CL = constitutional law, PL = private law, HL = history of law, SL = sociology of law, PnL = penitentiary law; the digit is the variable number.

Note. See the note under Table 3.

Clearly the theoretical constructs ‘interest in marks obtained’ and ‘satisfaction’ have construct validity. More importantly, they have discriminant validity in relation to the three empirical variables. As such they carry the promise of being a significant new instrument for the educational researcher.

Students exchange their time against marks, but obviously in this continuous process of exchange the only thing they can get in direct return is a higher *expectation* for the mark to be received for the test. The model should therefore fit better using the expected marks than using the marks that are in fact obtained. Because empirical data on expected marks are available, the prediction can be tested. Two criteria for fit of the model are: (1) the validity of the new constructs ‘interests’ and ‘satisfactions,’ and (2) the sum of squared errors. Table 4 presents the data on validity, revealing distinctly higher validities for interests and satisfactions, and fewer violations of Campbell and Fiske’s third criterion.

**Table 5**

Error terms using obtained or expected marks

_________________________________________________________________
system / actors          sum error^2 for analysis based on      n
                         ---------------------------------
                         marks       marks expected
                         obtained    ----------------------
                                     corrected  uncorrected
-----------------------------------------------------------------
6-test system                85          83          75       185
  students                    4           4           4       179
  teachers (tests)           80          78          71         6
-----------------------------------------------------------------
every test a system
-------------------
general introduction          5           5           4       180
constitutional law           14          13          13       180
private law                  47          14          14       180
history of law               62          33          36       180
sociology of law             32          27          29       180
penitentiary law             12           8           8       180
_________________________________________________________________

Note. All sums are multiplied by 10^{5}. Under ‘every test a system’ the sum of squared errors for students as well as teachers is given.

The summary statistics for the analysis with (corrected) expected marks are very close to the results of the previous analysis with obtained marks: the exchange rate of expected marks and time is .421 / .579. The sum of squared errors is smaller now, as predicted; Table 5 summarizes the results of both analyses. Remember that error is the difference between power before and after exchange, the assumption being that in a perfect market the individual assets evaluated at market prices remain the same. The correction of expected marks, being based on the mean of the marks obtained over six tests, through the back door introduces the obtained marks again; therefore the analyses were repeated using the uncorrected expected marks. The reduction in errors is very marked for corrected as well as uncorrected expected marks. The mtmm-matrix, not presented here, shows validities that are lower than the validities presented in Table 4 for the analysis based on corrected expected marks.

What is the quality of the estimation of interests using the Coleman model? To answer this question a simulation study was done to see how well the Coleman model reproduces known interests. When the interests are known, for example by specifying a certain distribution for these interests, it is possible to determine the exchange rate of marks and time as well as the combination of expected mark and time spent in study that maximizes the satisfaction for the particular student (Coleman, p. 684 and 676 respectively, presents the mathematics). I will use the values .39 and .61 respectively for obtained marks and time spent in extra-curricular activities. Interests for time are generated by pseudo-random sampling from the beta distribution with parameters 12 and 8, having mean .6 and variance 2/175 (the pseudo-random number generator ran3 and the procedures for sampling from the beta distribution are given by Press, Flannery, Teukolsky, & Vetterling, 1989). All individual differences between students are absorbed in the differences in interests; think of intellectual capabilities, study habits, socio-economic background, educational background, and aspiration level.
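The sampling step can be sketched as follows. This is a minimal illustration, not the original procedure: it uses Python's standard `random.betavariate` instead of the ran3 generator from Press et al., and the function name `sample_time_interests` is hypothetical. It only checks that Beta(12, 8) indeed has mean .6 and variance 2/175.

```python
import random
from statistics import fmean, pvariance

A, B = 12, 8  # beta parameters quoted in the text

# Theoretical moments of a Beta(a, b) distribution
mean_theory = A / (A + B)
var_theory = A * B / ((A + B) ** 2 * (A + B + 1))


def sample_time_interests(n, seed=0):
    """Draw n interests in time from Beta(12, 8) (hypothetical helper)."""
    rng = random.Random(seed)  # Mersenne Twister, not ran3
    return [rng.betavariate(A, B) for _ in range(n)]


interests_in_time = sample_time_interests(20_000)
print("theory :", mean_theory, var_theory, 2 / 175)
print("sample :", fmean(interests_in_time), pvariance(interests_in_time))
```

The interest in marks of each simulated student is then simply the complement of the sampled interest in time.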

The next step is constructive: given the expected mark, a process must be specified to generate an obtained mark. A crucial characteristic of the educational system is that the test functions as a clearing house for the exchange of marks and time. This clearing house can be modelled using a binomial model (Wilbrink, 1978; van den Brink, 1982) for the generation of test scores given the true mastery on the domain of questions that is sampled by the test. The crucial step then is from the expected mark to the true mastery. The trick here is to suppose that the expected mark is based on the score on a pretest sampled from the same domain of questions the test is sampled from. A beta density Beta(a,b) for the true mastery can then be specified, and sampled for the true mastery that will generate the score on the test (a + b = number of items in pretest, a / (a + b) = score on pretest; for the details see Wilbrink, 1978).
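A minimal sketch of this two-stage process, under stated assumptions: the pretest score is passed in directly as a proportion correct (how an expected mark on the 1-10 scale is converted to that proportion is not specified here and is left out), and the function name `simulate_test_score` is hypothetical.

```python
import random


def simulate_test_score(pretest_score, n_pretest, n_test, rng):
    """Generate (true mastery, number correct) for one simulated student.

    pretest_score: proportion correct on the pretest, strictly between 0 and 1.
    """
    a = pretest_score * n_pretest      # a / (a + b) = score on pretest
    b = n_pretest - a                  # a + b = number of items in pretest
    mastery = rng.betavariate(a, b)    # true mastery sampled from Beta(a, b)
    # Binomial model: each of the n_test items is answered correctly
    # with probability equal to the true mastery.
    correct = sum(rng.random() < mastery for _ in range(n_test))
    return mastery, correct


rng = random.Random(1)
mastery, correct = simulate_test_score(0.7, n_pretest=25, n_test=25, rng=rng)
print(f"true mastery {mastery:.3f}, items correct {correct} of 25")
```

Doubling `n_pretest` narrows the beta density around the pretest score, and doubling `n_test` reduces binomial noise; both work in the direction of a higher correlation between expected and obtained marks.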

**Table 6**

Simulation results 100 runs, 100 students per run, both test and pretest 25 or 50 items

______________________________________________________________________________________
variable     mean     s.d.         1               2               3               4
______________________________________________________________________________________
test and pretest 25 items
--------------------------------------------------------------------------------------
1           6.612     .206
2           6.571     .166    .672 (.0877)
3          79.108    2.308    .740 (.0462)    .871 (.1434)
4            .398    .0121    .970 (.0073)    .766 (.1145)    .869 (.0278)
5           .0062    .0001    .518 (.1057)   -.064 (.1005)   -.066 (.1036)    .325 (.1125)
======================================================================================
test and pretest 50 items
--------------------------------------------------------------------------------------
1           6.646     .186
2           6.590     .162    .751 (.0953)
3          79.634    2.300    .844 (.0309)    .859 (.1385)
4            .404    .0117    .980 (.0048)    .806 (.1183)    .928 (.0163)
5           .0062    .0001    .451 (.1130)   -.012 (.1223)    .013 (.1236)    .298 (.1181)
______________________________________________________________________________________

Note. 1 = obtained marks, 2 = expected marks, 3 = hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, mean = mean of 100 samples, s.d. = standard deviation of the sample means, cross-tabulated are the means of the product moment correlations and their standard deviations. Start values: mean interest in marks .4, in time .6, variance .0115; computed: value for marks .39, time .61.

The results of the simulation study presented in Table 6 demonstrate that interests are estimated quite well, at least under the conditions established for the simulation. The mean of the interests is replicated, with a small standard deviation for samples of 100. The correlation between expected and obtained marks increases from .67 to .75 when test and pretest length are doubled from 25 to 50 items.

The results presented establish Coleman's social system theory as a potentially useful tool for the educational researcher concerned with the kind of problems mentioned in the subtitle of the paper: grade retention, drop-out and study-delays. Using Coleman’s model it will be possible to use micro-economic theory to derive predictions of which measures might have an effect on the relative interests of students for marks and for time. Economic theory predicts that a 10% increase in the time budget will be distributed over time spent in study and time spent in extracurricular activities in the same proportion as applies to the old budget (income elasticity is 1). Teachers could decide to grade less leniently; in this case, too, students will correct their aspiration level for marks in such a way that they have to spend the same proportion of their time budget as before. It is evident that much more drastic measures have to be taken to effect any substantial change in the behavior of students as well as teachers, measures that directly influence the interests themselves.
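The unit income elasticity can be illustrated with a small sketch. It assumes the Cobb-Douglas type of satisfaction used in the model, for which the satisfaction-maximizing control over a good is interest × power / value; the function name `optimal_controls` and the normalisation of the time budget to 1 are assumptions made for the illustration, and the interests and values are those of the simulation study.

```python
def optimal_controls(interests, values, power):
    """Controls maximizing prod(c_j ** x_j) subject to sum(v_j * c_j) == power."""
    return {good: interests[good] * power / values[good] for good in interests}


interests = {"marks": 0.4, "time": 0.6}   # start values of the simulation
values = {"marks": 0.39, "time": 0.61}    # market values of the simulation

results = {}
for budget in (1.0, 1.1):                 # old time budget and a 10% larger one
    power = values["time"] * budget       # the student starts with time only
    kept = optimal_controls(interests, values, power)["time"]
    results[budget] = {"kept": kept, "studied": budget - kept}
    print(budget, results[budget], "share kept:", kept / budget)
```

Under these assumptions the student keeps the proportion .6 of the time budget whatever its size, so both study time and free time grow by the same 10%.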

A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. Educational testing itself functions as a kind of 'clearing house', to make possible the exchange of time and marks, marks being the valued scores of the tests used. Regrettably educational measurement experts, with few exceptions, as for example Van Naerssen’s (1974) work on models of strategic study behavior of students, are not concerned with this ‘clearing house’ function of tests. The whole of item response theory is irrelevant to this ‘clearing house’ function of educational testing. Poor predictability of marks (predictability by the student is meant here) quite possibly is one of the reasons for the observed exchange rate for marks against time in the law education case, where students are able to keep a very substantial part of their time budget to spend it on extra-curricular activities.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.

ECER, June 24, 1992

Aiken, L. R., Jr. (1963). The grading behavior of a college faculty. Educational and Psychological Measurement, 23, 319-322.

Brink, W. P. van den (1982). Binomiale modellen in de testleer. Proefschrift Universiteit van Amsterdam.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Coleman, J. (1990). Foundations of social theory. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Doornbos, K. (1985). Naar een macro-educatieve analyse van het zittenblijven en verwante expulsiefenomenen in het Nederlandse schoolwezen. In Wald, A., Een jaartje overdoen. Verslag van het SVO-symposium over zittenblijven in het voortgezet onderwijs (pp. 35-67). Lisse: Swets & Zeitlinger.

Kula, W. (1986). Measures and men. Princeton, New Jersey: Princeton University Press.

Linden, W. J. van der (1991). Applications of decision theory to testbased decision making. In Hambleton, R.K., & Zaal, J. N. Advances in educational and psychological testing: theory and applications. Dordrecht: Kluwer, 129-156.

Naerssen, R. F. van (1974). A mathematical model for the optimal use of criterion referenced tests. Nederlands Tijdschrift voor de Psychologie, 29, 431-445.

Polachek, S. W., Kniesner, T. J., & Harwood, H. J. (1978). Educational production functions. Journal of Educational Statistics, 3, 209-231.

Press, W. H., Flannery, B. P., Teukolsky, S. A., & Vetterling, W. T. (1989). Numerical recipes in Pascal; the art of scientific computing. London: Cambridge University Press.

Wilbrink, B. (1978). Studiestrategieën. Amsterdam: Center for Educational Research.

Table 1 (Test prep time version)

Analysis of a six-test educational system: the first year Law examination 1983 - 1988

_________________________________________________________________________________
                        mean      |  control                   sum              n
                     ------------ |  ------------
actors               mark   time  |  marks   time    power   error2
---------------------------------------------------------------------------------
students             6.63    564  |  .626    .604    .613      5065           179
teachers             3.37    370  |  .374    .396    .387     38080             6
_________________________________________________________________________________
                        mean      |  interest                         satisf.  time
                     ------------ |  ------------                              pro-
test / teacher       mark   time  |  marks   time    power   error2            grammed
---------------------------------------------------------------------------------
general introduct.   3.15     79  |  .307    .693    .074      8210   .0753    180
constitutional law   3.32     63  |  .371    .629    .065         8   .0649    170
private law          3.55     72  |  .352    .648    .073      6007   .0731    147
history of law       3.66     61  |  .398    .602    .067       198   .0665    160
sociology of law     3.55     48  |  .450    .550    .057      6483   .0575    171
penitentiary law     2.97     47  |  .413    .587    .052     17173   .0521    192
_________________________________________________________________________________

Note. The values for marks and time are respectively .391 and .609. Error2 is times 10^{8}. (* Contributions to the time budget are supposed to be (net values) 180, 170, 147, 160, 171 and 192 hours, totalling 934 hours.*)

Table 2 (test prep time version)

Relations between empirical variables and the constructs 'interest' and 'satisfaction'

______________________________________________________________________________________________________________________________________________________
test/   m.    s.d.  general introduction  constitutional law    private law           history of law        sociology of law      penitentiary law
                    --------------------  --------------------  --------------------  --------------------  --------------------  --------------------
variable               1   2   3   4   5     1   2   3   4   5     1   2   3   4   5     1   2   3   4   5     1   2   3   4   5     1   2   3   4   5
______________________________________________________________________________________________________________________________________________________
general introduction
 1    6.85    1.39
 2    6.80    1.26    67
 3      79      36    15  18
 4   0.411   0.129    58  46  84
 5  0.0035  0.0007    46  28 -70 -23
constitutional law
 1    6.68    1.47    49  71  23  40  11
 2    6.63    1.27    57  80  19  42  19    72
 3      63      32    15  19  63  56 -35    24  21
 4   0.419   0.161    34  45  60  67 -17    57  44  87
 5  0.0034  0.0007    24  32 -26  01  45    40  29 -60 -17
private law
 1    6.45    1.53    43  70  12  26  13    54  66  21  32  08
 2    6.62    1.31    66  86  17  43  25    73  82  20  44  30    76
 3      72      33    10  21  54  46 -37    18  21  63  51 -42    30  22
 4   0.528   0.189    22  38  51  50 -25    31  36  62  57 -29    52  40  93
 5  0.0030  0.0007    24  37 -24 -05  41    25  33 -21  00  44    41  34 -52 -23
history of law
 1    6.34    1.61    49  73  13  33  17    54  73  11  28  25    53  76  15  26  26
 2    6.44    1.27    60  81  21  43  18    68  74  22  43  24    65  83  22  36  30    80
 3      61      28    14  21  58  50 -37    20  23  61  53 -35    21  23  68  61 -34    25  25
 4   0.426   0.161    34  49  49  56 -16    41  51  46  52 -05    37  51  53  54 -07    64  52  85
 5  0.0032  0.0007    20  30 -35 -13  47    21  28 -44 -21  58    16  30 -52 -36  61    40  26 -65 -20
sociology of law
 1    6.45    1.37    43  70  07  28  23    52  68  12  33  29    48  70  11  25  32    55  66  18  39  24
 2    6.44    1.24    58  72  14  38  24    67  64  20  41  26    64  75  14  30  35    66  71  27  52  26    73
 3      48      26    04  16  48  39 -32    14  14  63  55 -37    04  11  65  58 -37    10  16  65  49 -51    16  12
 4   0.341   0.129    23  43  36  41 -09    35  42  54  62 -08    24  38  49  52 -05    30  41  55  55 -23    57  39  82
 5  0.0036  0.0007    21  19 -36 -15  43    13  22 -43 -22  55    20  24 -53 -37  59    15  17 -47 -22  62    33  21 -76 -29
penitentiary law
 1    7.03    1.67    31  65  21  34  06    48  65  20  36  17    45  68  15  25  24    51  65  21  41  18    52  59  19  36  07
 2    6.88    1.28    60  73  20  43  20    70  75  21  43  27    66  82  17  33  33    66  72  25  51  27    65  74  12  39  22    73
 3      47      25    07  17  36  30 -23    18  15  57  49 -35    11  14  58  52 -25    12  17  53  42 -34    05  12  70  59 -46    19  13
 4   0.307   0.116    27  50  36  43 -05    39  49  49  57 -04    33  48  45  49  09    37  48  47  54 -06    37  42  59  69 -15    66  50  78
 5  0.0039  0.0006    20  26 -12  05  29    14  30 -34 -14  50    18  31 -39 -25  51    21  25 -28 -03  51    34  28 -43 -14  58    38  33 -66 -10
______________________________________________________________________________________________________________________________________________________

Note. 1 = mark obtained on scale 1-10, 2 = corrected mark expected on the same scale, 3 = hours spent in preparation, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, m. = mean, s.d. = standard deviation; correlations are written without decimals; expected marks are corrected by the mean difference between marks obtained and marks expected on the six tests.

Table 5 (extended)

Error terms using obtained or expected marks

_____________________________________________________________________
system / actors          sum error^2 for analysis based on          n
                         ---------------------------------
                         marks       marks expected
                         obtained    ----------------------
                                     corrected  uncorrected
---------------------------------------------------------------------
6-test system              8455        8254        7476           185
  students                  437         437         365           179
  teachers (tests)         8018        7817        7111             6
---------------------------------------------------------------------
every test a system
-------------------
general introduction        502         458         388           180
  students                  502         458         388           179
  teachers (tests)            0           0           0             1
constitutional law         1371        1265        1252           180
  students                  630         617         537           179
  teachers (tests)          741         648         715             1
private law                4673        1422        1384           180
  students                  740         696         584           179
  teachers (tests)         3933         726         800             1
history of law             6196        3339        3579           180
  students                  786         678         617           179
  teachers (tests)         5410        2661        2962             1
sociology of law           3243        2685        2877           180
  students                  757         741         688           179
  teachers (tests)         2486        1944        2189             1
penitentiary law           1232         814         766           180
  students                  722         701         638           179
  teachers (tests)          510         113         128             1
_____________________________________________________________________

Note. All sums are multiplied by 10^{7}. Under 'every test a system' the entries essentially are the sum of squared errors for students, the errors for the teachers being negligible.

Understanding grade retention, drop-out and study-delay as system rigidities

The general problem in the educational field is how individual behaviours of students and teachers connect to characteristics of the educational system such as the attrition rate, with the direction of influence obviously going both ways. Recently Coleman (1990) presented a theory of social systems that connects behavior of individual actors with phenomena occurring at the system-level of the social system involved. The theory of Coleman has its roots in micro-economics, and is conceptually very different from traditional methodological approaches to social and educational phenomena. The competitive market in micro-economics is one of the few models where behaviours at different levels of aggregation are meaningfully connected with each other, so why not try to model educational systems as market places where students and teachers exchange their resources, the resources of course being time and marks respectively? The paper explores the possibility of applying this model to educational data, in this case data from the Department of Law of the University of Amsterdam. In a companion paper in the section Higher Education this application and its implications for educational policy are reported on more fully.

I will explain Coleman’s approach in the terms of the particular example used, the first year of the study in Law. The basic idea is that students within a social system exchange their time for marks in ways that maximize satisfaction with the total number of marks received and the time kept for private use. Coleman uses the basic premise of micro-economic theory about human behavior: the higher the marks obtained, the less of the still available time the student will be giving up to get still higher marks. Coleman uses this and other strong assumptions from micro-economic theory to model the transactions taking place in the educational system, and to present a mathematical structure isomorphic to the structure of transactions. Given empirical data like marks obtained and time spent, the model allows estimation of a conceptually very different set of variables: the interests in marks obtained and interests in alternative uses of their time are characteristics of individual students and teachers, the exchange rate of marks and time is a characteristic of the educational system.

*perfect market*

The assumption is made that there is a competitive market, and therefore a common rate of exchange for all students and teachers. The assumption is necessary to be able to estimate the rate of exchange, given data on time spent and marks obtained. The relative values for the two resources in our market, time and marks, might be .6 and .4, under the constraint that their sum is 1. Before exchange the teachers together control the whole budget of available marks, the students still control all of their time budget. I will say that the power of the teachers before exchange is .4, i.e. equal to the budget of marks evaluated against their market price. The power of the students is .6. The assumption of the competitive market implies that the power of students and teachers after exchange still is the same, .6 and .4 respectively. Because everyone is trading at the same market prices, and because these market prices are competitive, nobody wins or loses. Of course in reality some actors do win or lose. The values for time and marks may be determined by minimizing the violations of the assumption of a competitive market. Coleman (chapters 25 and 26) presents the mathematics for minimizing the sum of squared errors, an error being the difference of the power of a student before and after exchange.

The basic idea is that marks and time have a price label attached to them, the prices or values having established themselves over many years, and that the prices or the exchange rate can be estimated given an appropriate set of data.

*satisfaction*

If nobody wins or loses on this market, unless it is by error or accident, why is it that students and teachers are still in business? The answer is that people maximize their satisfaction, or utility if you want to call it that. Students can gain in satisfaction by exchanging an appropriate amount of their time against an appropriate mark earned on a particular test.

It would be nice to have a mathematical expression for this satisfaction. Coleman’s choice is an analogue of the Cobb-Douglas production function

U_{i} = Π_{j} c_{ij}^{x_{ji}}

where c_{ij} is the control person i has over good j, and x_{ji} is the interest person i has in good j; controls as well as interests are constrained to sum to 1. Now this utility function is remarkably different from the usual type of utility function in test-based decision making (van der Linden, 1991, reviews the latter). For one thing, it is possible to determine this function using empirical data on time spent and marks received. For every student the relative interest in time and in marks can be derived from available data; the controls are the number of hours the student has managed to keep for himself, and the marks he has obtained, expressed as proportions of the total budget available. These interests are characteristics of the individual person; knowledge of these interests might be of interest to educational policy makers (Coleman, p. 718).
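How interests follow from observed controls can be made explicit with the standard first-order conditions for a utility function of this form. The derivation below is a sketch under the assumption that person i maximizes satisfaction subject to a budget constraint in which v_j denotes the market value of good j and r_i the power of person i (the controls evaluated at market prices); these two symbols are introduced here for the illustration.

```latex
\begin{align}
\max_{c_{i1},\dots,c_{im}} \; \ln U_i &= \sum_j x_{ji} \ln c_{ij}
   \quad \text{subject to} \quad \sum_j v_j\, c_{ij} = r_i , \\
\frac{x_{ji}}{c_{ij}} &= \lambda\, v_j
   \;\Longrightarrow\; v_j\, c_{ij} = \frac{x_{ji}}{\lambda} , \\
\sum_j x_{ji} = 1 &\;\Longrightarrow\; \lambda = \frac{1}{r_i} ,
   \qquad x_{ji} = \frac{v_j\, c_{ij}}{r_i} .
\end{align}
```

Read in this way, the interest in a good is the share of a person's power that is held in that good after exchange, which is why it can be computed once the controls and the market values are known.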

This kind of analysis shows every actor within the system acting under influence of important system characteristics. Traditional techniques of data analysis (experimental and correlational methods) treat every individual observation as independent of other observations, denying from the very start the possibility of dependent, competitive, 'systemic' behaviors.

Table 1 gives the results of a Coleman-type of system analysis on the six tests of the first year examination. Here the six tests together are taken to be the system. Every test is represented with seven parallel forms because the data are assembled in the seven courses from 1983 to 1988. The 179 students are not a representative sample: they are chosen because they participated in all six tests and completely filled out the six questionnaires. The results for this group of students are not valid for first year law students in general. The mean mark received by this select group is 6.63, an outstandingly high value, I can assure you.

Table 1

Analysis of a six-test educational system: the first year Law examination 1983 - 1988

_________________________________________________________________
                  mean      |  control                sum       n
               ------------ |  ------------
actors         mark   time  |  marks   time   power   error2
-----------------------------------------------------------------
students       6.63    783  |  .626    .544   .579       44   179
teachers       3.37    657  |  .374    .456   .421      802     6
_________________________________________________________________

Note. The values for marks and time are respectively .429 and .571. Error2 is times 10^{6}.

Students control .626 of the total budget of marks. This proportion is not equal to the mean mark received because of a peculiar characteristic of the marking system. The marking scale runs from 1 to 10, suggesting 10 points can be earned while in reality it is only 9 points. Teachers control the complement of this, the marks they have not handed out. The students are able to keep a high proportion of the time budget to themselves. In this market the power of the students before exchange is equal to the value of time .571, because they control the whole budget of, in this case, 1440 hours; after exchanges the power of the students is .579, very close to the initial power. Remember that the student's power is the sum of the marks and time he controls evaluated at market prices, in this case the exchange rate of .429 versus .571. Does the model fit the data? The assumption is that for every actor his power before and after exchange will be the same. Table 1 gives the sum of squared errors that is minimized by the values for marks and time respectively of .429 and .571. Table 2 presents a breakdown of the errors for the six tests or teachers.
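The arithmetic of this paragraph can be retraced in a few lines. The sketch only uses the figures of Table 1 and its note; the function name `power` is a hypothetical helper, and the check is done at the level of the two groups of actors, not for individual students.

```python
VALUE_MARKS, VALUE_TIME = 0.429, 0.571   # market values, note to Table 1


def power(control_marks, control_time):
    """Controls evaluated at market prices."""
    return VALUE_MARKS * control_marks + VALUE_TIME * control_time


# Marks run from 1 to 10, so only 9 points can be earned on a test.
student_mark_share = (6.63 - 1) / 9

students_before = power(0.0, 1.0)        # all of the time, none of the marks
students_after = power(0.626, 0.544)     # controls after exchange (Table 1)
teachers_before = power(1.0, 0.0)        # all of the marks, none of the time
teachers_after = power(0.374, 0.456)

print(round(student_mark_share, 3))                         # 0.626
print(round(students_before, 3), round(students_after, 3))  # 0.571 0.579
print(round(teachers_before, 3), round(teachers_after, 3))  # 0.429 0.421
```

The small differences between power before and after exchange (.008 for both groups) are the kind of errors whose squares are summed in Table 1.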

Table 2

The six-test educational system: results for the tests

_______________________________________________________________________________
                        mean      |  interest                     satis-   time
                     ------------ |  ------------                 faction  pro-
test / teacher       mark   time  |  marks   time   power  error2          grammed
-------------------------------------------------------------------------------
general introduct.   3.15    161  |  .281    .719   .089     298   .093    320
constitutional law   3.32    121  |  .354    .646   .075       9   .075    240
private law          3.55    126  |  .360    .640   .078      45   .079    240
history of law       3.66     90  |  .448    .552   .065      43   .065    200
sociology of law     3.55     70  |  .503    .497   .056     235   .057    200
penitentiary law     2.97     88  |  .403    .597   .058     171   .058    240
_______________________________________________________________________________

Note. The values for marks and time are respectively .429 and .571. Error2 is times 10^{6}.

Nothing new is revealed yet, but then in Table 1 only the assumption of a perfect market is used, that is, the assumption of equal power before and after exchange. Assuming every actor has maximized his satisfaction, his relative interests in marks and time can be estimated. For the six teachers the interests and satisfaction are tabulated in Table 2. There is no equity in the satisfaction obtained by the six teachers, 'general introduction' obtaining a much higher satisfaction than 'penitentiary law'; the ratios between them are nearly the same as the ratios between their powers. Is there an explanation? Students have a time budget of 1440 hours; they are not constrained to spend a certain proportion of it on the first or on any other test, so the way they distribute their time over the six tests could be highly revealing. The available time as programmed by the Department of Law is given in the last column of Table 2. Now it is clear that the big loser is penitentiary law: it is the last test, and many students might not be interested in investing the time to obtain yet another high mark they do not need to pass the examination. On the whole, however, the power of the six teachers is highly correlated with the number of hours programmed for the course.
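The entries of Table 2 can be approximately retraced from the mean marks and hours alone. The sketch below rests on assumptions that are mine and not spelled out in the text: the mark budget is taken to be 6 tests × 9 points = 54, the time budget 1440 hours, the interest in a good is taken as the share of an actor's power held in that good, and satisfaction as the Cobb-Douglas product of controls raised to interests. The function name `analyse_teacher` is hypothetical.

```python
VALUE_MARKS, VALUE_TIME = 0.429, 0.571   # market values, note to Table 2
MARK_BUDGET = 6 * 9                      # assumption: six tests, 9 points each
TIME_BUDGET = 1440                       # hours, as given in the text

# test: (mean mark not handed out, mean hours received), from Table 2
TEACHERS = {
    "general introduction": (3.15, 161),
    "constitutional law": (3.32, 121),
    "private law": (3.55, 126),
    "history of law": (3.66, 90),
    "sociology of law": (3.55, 70),
    "penitentiary law": (2.97, 88),
}


def analyse_teacher(mark, hours):
    """Return (power, interest in marks, satisfaction) for one teacher."""
    c_marks = mark / MARK_BUDGET         # control over marks
    c_time = hours / TIME_BUDGET         # control over time
    worth_marks = VALUE_MARKS * c_marks
    worth_time = VALUE_TIME * c_time
    power = worth_marks + worth_time
    interest_marks = worth_marks / power
    satisfaction = c_marks ** interest_marks * c_time ** (1 - interest_marks)
    return power, interest_marks, satisfaction


estimates = {test: analyse_teacher(*data) for test, data in TEACHERS.items()}
for test, (p, x, u) in estimates.items():
    print(f"{test:22s} power {p:.3f}  interest in marks {x:.3f}  satisfaction {u:.3f}")
```

With these assumptions the computed values agree with the tabulated ones to within rounding of the inputs (about .001), which suggests, but does not prove, that this is the computation behind Table 2.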

The interests and satisfactions are constructed variables that have discriminant validity, see the multitrait-multimethod lay-out (Campbell & Fiske, 1959) in Table 3 that gives the results for the students. Table 3 is based on the results of six separate analyses, each of the tests being analysed as a separate educational system. In this way there are six estimates of the interest of every student in marks obtained. In the two-variable system analysis the interest in time is simply the complement of the interest in marks, and is not tabulated.

There is a new empirical variable in Table 3: the expected mark, as reported by every student just before taking the particular test. It is not the expected mark itself, but a corrected expectation that is used in the analysis, because there is a somewhat limited variability in the expected marks as reported by the students.

(* The correction is the mean difference between expected and obtained marks for the particular student. The motivation for the correction is threefold. (1) In a technical sense this correction would not influence the correlation of expected and obtained marks of the individual student over the series of six tests; this correlation is an important indicator of the ability of the student to predict his test results. (2) The second reason is that some students knowingly or unknowingly tend to systematically under- or overestimate their expected marks, legitimating a correction for this personal tendency. (3) The third and most relevant reason is that the system analysis might be done on the basis of expected marks instead of obtained marks, and then the analysis based on corrected expected marks results in higher validities (Table 4). *)
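A small sketch of the correction for a single student, with hypothetical marks (the six expected and obtained marks below are invented for the illustration, and the helper names are mine). It shows point (1): shifting all expected marks by the student's mean difference leaves the correlation with the obtained marks unchanged.

```python
from statistics import fmean


def correct_expected(expected, obtained):
    """Shift expected marks by the mean of (obtained - expected) for one student."""
    shift = fmean(o - e for o, e in zip(obtained, expected))
    return [e + shift for e in expected]


def pearson(x, y):
    """Product moment correlation of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


expected = [7, 7, 6, 7, 6, 8]   # hypothetical marks expected on the six tests
obtained = [6, 7, 5, 6, 6, 7]   # hypothetical marks obtained on the six tests
corrected = correct_expected(expected, obtained)

print(fmean(obtained), fmean(corrected))
print(pearson(expected, obtained), pearson(corrected, obtained))
```

After correction the mean of the expected marks equals the mean of the obtained marks, which is exactly how the obtained marks enter the corrected expectations.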

(* Expected and obtained marks are two manifestations of the same construct. The obtained marks are the pure manifestations of this construct because of their legal meaning. The rather low correlations for obtained marks in the validity diagonals, contrasted with the high correlations for expected marks, should have the attention of the examination committee of the Department of Law. The total time is the sum of answers to five questions in the survey: time spent in direct preparation for the test, in attending lectures, in preparing for lectures, in attending seminars, and in preparing for seminars. Although there are substantial correlations between time spent on the different tests, the correlations with obtained marks as well as with expected marks are characteristically low, indicating that there must be an interaction with intellectual capacity. *)

Satisfaction should correlate positively with marks obtained and negatively with time spent in study, and so it does (see Table 3, correlations within tests). There is a pattern of satisfaction being less determined by marks obtained as the year progresses.

Interest in marks obtained is correlated .7 with these marks, and is correlated .5 with (corrected) marks expected.

Campbell and Fiske's criteria for construct validity are met.

(* (1) the bold printed entries in the validity diagonals are significantly different from zero (convergent validity), (2) with the already commented upon exception of the values for marks obtained they are higher than the values lying in its column and row in the 'heterotrait-heteromethod' triangles adjacent to the validity diagonals, (3) with a number of exceptions they are higher than the corresponding correlations in the heterotrait-monomethod triangles, and (4) the same pattern of interrelationship is shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The exceptions regarding criterion 3 are an artefact of the method of estimating interests and satisfactions, this method necessarily resulting in high correlations with marks obtained.*)

Clearly the theoretical constructs 'interest in marks obtained' and 'satisfaction' have construct validity. More importantly, they have discriminant validity in relation to the three empirical variables. As such they carry the promise of being a significant new instrument for the educational researcher.

Students exchange their time against marks, but obviously in this continuous process of exchange the only thing they can get in direct return is a higher expectation for the mark to be received for the test. The model should therefore fit better using the expected marks than using the marks that in fact are obtained. Because empirical data on expected marks are available, the prediction can be tested. Two criteria for fit of the model are: (1) the validity of the new constructs 'interests' and 'satisfactions', and (2) the sum of squared errors. Table 4 presents the data on validity, revealing distinctly higher validities for interests and satisfactions.

Table 5 summarizes the results on the fit of both analyses. Remember that error is the difference between power before and after exchange, the assumption being that in a perfect market the individual assets evaluated at market prices remain the same. The correction of expected marks, being based on the mean of the marks obtained over six tests, through the back door introduces the obtained marks again; therefore the analyses were repeated using the uncorrected expected marks. The reduction in errors is very marked for corrected as well as uncorrected expected marks.

Table 5

Error terms using obtained or expected marks

| system / actors | ∑ error², marks obtained | ∑ error², marks expected (corrected) | ∑ error², marks expected (uncorrected) | n |
|---|---|---|---|---|
| 6-test system | 85 | 83 | 75 | 185 |
| students | 4 | 4 | 4 | 179 |
| teachers (tests) | 80 | 78 | 71 | 6 |
| every test a system: | | | | |
| general introduction | 5 | 5 | 4 | 180 |
| constitutional law | 14 | 13 | 13 | 180 |
| private law | 47 | 14 | 14 | 180 |
| history of law | 62 | 33 | 36 | 180 |
| sociology of law | 32 | 27 | 29 | 180 |
| penitentiary law | 12 | 8 | 8 | 180 |

Note. All sums are multiplied by 10⁵. Under 'every test a system' the sum of squared errors for students as well as teachers is given.

What is the quality of the estimation of interests using the Coleman model? To answer this question a simulation study is done to see how good the Coleman model is at reproducing known interests. When the interests are known, for example by specifying a certain distribution for these interests, it is possible to determine the exchange rate of marks and time as well as the combination of expected mark and time spent in study that maximizes the satisfaction for the particular student (Coleman, p. 684 and 676 respectively, presents the mathematics). I will use the values .39 and .61 respectively for obtained marks and time spent in extra-curricular activities. Interests for time are generated by pseudo-random sampling (* from the beta distribution with parameters 12 and 8, having mean .6 and variance 2/175 (the pseudo-random number generator ran3 and the procedures used for sampling from the beta distribution are given by Press, Flannery, Teukolsky, & Vetterling (1989)). All individual differences between students are absorbed in the differences in interests; think of intellectual capabilities, study habits, socio-economic background, educational background, and aspiration level. *)
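The sampling step can be illustrated with a minimal sketch. It is not the original Pascal program: it assumes Python's standard `random.betavariate` in place of the ran3 generator and the Press et al. procedures, and the function names are hypothetical. It draws interests in time from Beta(12, 8) and checks the stated mean and variance against the closed-form moments of the beta distribution.

```python
import random

A, B = 12, 8  # beta parameters stated in the text


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def sample_time_interests(n_students, seed=0):
    """Draw one interest in time per student (hypothetical helper).

    Assumption: random.betavariate stands in for the generator
    used in the original simulation.
    """
    rng = random.Random(seed)
    return [rng.betavariate(A, B) for _ in range(n_students)]


# One run of 100 students, as in the simulation design.
interests_time = sample_time_interests(100)
# The interest in marks is the complement of the interest in time.
interests_marks = [1 - x for x in interests_time]

print(beta_mean(A, B))      # theoretical mean, 12 / 20
print(beta_variance(A, B))  # theoretical variance, equal to 2/175
```

The theoretical variance 96/8400 reduces to 2/175, about .0114, which agrees with the start value of .0115 reported in the note to Table 6.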

Table 6

Simulation results 100 runs, 100 students per run, both test and pretest 25 or 50 items

| variable | mean | s.d. | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|
| test and pretest 25 items | | | | | | |
| 1 | 6.612 | .206 | | | | |
| 2 | 6.571 | .166 | .672 (.0877) | | | |
| 3 | 79.108 | 2.308 | .740 (.0462) | .871 (.1434) | | |
| 4 | .398 | .0121 | .970 (.0073) | .766 (.1145) | .869 (.0278) | |
| 5 | .0062 | .0001 | .518 (.1057) | -.064 (.1005) | -.066 (.1036) | .325 (.1125) |
| test and pretest 50 items | | | | | | |
| 1 | 6.646 | .186 | | | | |
| 2 | 6.590 | .162 | .751 (.0953) | | | |
| 3 | 79.634 | 2.300 | .844 (.0309) | .859 (.1385) | | |
| 4 | .404 | .0117 | .980 (.0048) | .806 (.1183) | .928 (.0163) | |
| 5 | .0062 | .0001 | .451 (.1130) | -.012 (.1223) | .013 (.1236) | .298 (.1181) |

Note. 1 = obtained marks, 2 = expected marks, 3 = hours spent, 4 = interest in obtained mark (the complement is the interest in time), 5 = satisfaction, mean = mean of 100 samples, s.d. = standard deviation of the sample means, cross-tabulated are the means of the product moment correlations and their standard deviations. Start values: mean interest in marks .4, in time .6, variance .0115; computed: value for marks .39, time .61.

The next step is constructive: given the expected mark, a process must be specified to generate an obtained mark. A crucial characteristic of the educational system is that the test functions as a clearing house for the exchange of marks and time. This clearing house can be modelled using a binomial model (Wilbrink, 1978, van den Brink, 1982) for the generation of test scores given the true mastery on the domain of questions that is sampled by the test. The crucial step then is from the expected mark to the true mastery. The trick here is to suppose the expected mark is based on the score on a pretest sampled from the same domain of questions the test is sampled from. A beta density Beta(a,b) for the true mastery can then be specified, and sampled for the true mastery that will generate the score on the test (a + b = number of items in pretest, a / (a + b) = score on pretest; for the details see Wilbrink, 1978).
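The two sampling steps just described can be sketched as follows. This is an illustration under stated assumptions, not the original program: the function name `simulate_test_score` is hypothetical, a is taken to be the number of pretest items answered correctly and b the number answered incorrectly (so that a + b is the pretest length and a / (a + b) the pretest score), and the conversion between marks and item scores is left out because it is not specified here.

```python
import random


def simulate_test_score(pretest_correct, pretest_items, test_items, rng):
    """Generate a test score from a pretest score (hypothetical helper).

    Step 1: sample the true mastery from Beta(a, b), with
            a = pretest items correct, b = pretest items incorrect.
    Step 2: sample the test score from a binomial model with that
            mastery as the probability of answering an item correctly.
    """
    a = pretest_correct
    b = pretest_items - pretest_correct
    if a <= 0 or b <= 0:
        # Beta(a, b) needs strictly positive parameters.
        raise ValueError("pretest score must lie strictly between 0 and 1")
    mastery = rng.betavariate(a, b)
    # Binomial draw as a sum of independent item outcomes.
    score = sum(rng.random() < mastery for _ in range(test_items))
    return mastery, score


rng = random.Random(1992)
# A student with 17 of 25 pretest items correct, taking a 25-item test.
mastery, score = simulate_test_score(17, 25, 25, rng)
print(round(mastery, 3), score)
```

Doubling the pretest length while keeping the pretest proportion fixed doubles a and b, which narrows the beta density around the pretest score; this is one way to see why longer tests tie obtained marks more closely to expected marks.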

The results of the simulation study presented in Table 6 demonstrate that interests are estimated quite well, at least under the conditions established for the simulation. The mean of the interests is replicated, with a small standard deviation for samples of 100. The correlation between expected and obtained marks increases from .67 to .75 when test and pretest length are doubled from 25 to 50 items.

discussion

The results presented establish Coleman's social system theory as a potentially useful tool for the educational researcher concerned with the kind of problems mentioned in the subtitle of the paper: grade retention, drop-out and study-delays.

(* Using Coleman’s model it will be possible to use micro-economic theory to derive predictions of which measures might have effect on the relative interests of students for marks and for time. Economic theory predicts that a 10% increase in the time budget will be distributed over time spent in study and time spent in extracurricular activities in the same proportion as applies to the old budget (income elasticity is 1). Teachers could decide to grade less leniently; in this case too, students will correct their aspiration level for marks in such a way that they have to spend the same proportion of their time budget as before. It is evident that much more drastic measures have to be taken to effect any substantial change in the behavior of students as well as teachers, measures that directly influence the interests themselves. *)
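The proportionality prediction can be checked numerically in a small sketch. The assumptions are mine and are not taken from the paper's program: satisfaction is modelled as a Cobb-Douglas function of marks and free time with the interests as exponents, marks are taken to be proportional to study time at a fixed rate, and the names `satisfaction` and `best_study_time` as well as the budgets of 40 and 44 hours are hypothetical.

```python
import math


def satisfaction(study, free, interest_in_marks, marks_per_hour=1.0):
    """Log of an assumed Cobb-Douglas satisfaction function."""
    marks = marks_per_hour * study  # assumed linear exchange of time for marks
    return (interest_in_marks * math.log(marks)
            + (1 - interest_in_marks) * math.log(free))


def best_study_time(budget, interest_in_marks, steps=1000):
    """Grid search for the study time that maximizes satisfaction."""
    best_k = max(
        range(1, steps),  # exclude the end points, where log(0) is undefined
        key=lambda k: satisfaction(budget * k / steps,
                                   budget * (steps - k) / steps,
                                   interest_in_marks),
    )
    return budget * best_k / steps


old_budget = 40.0               # hypothetical weekly time budget in hours
new_budget = 1.10 * old_budget  # the 10% increase discussed in the text
for budget in (old_budget, new_budget):
    study = best_study_time(budget, interest_in_marks=0.4)
    print(budget, study, study / budget)
```

Under these assumptions the share of the budget spent in study equals the interest in marks whatever the size of the budget, and it also does not depend on `marks_per_hour`, which is the sense in which stricter grading leaves the proportion of time spent in study unchanged.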

A social system where marks are exchanged against time is extremely vulnerable to the weaknesses of assessment and grading. The scale of marks is extremely coarse, not allowing any fine-tuning between amount of study time and mark received. Educational testing itself functions as a kind of 'clearing house', to make possible the exchange of time and marks, marks being the valued scores of the tests used. Regrettably educational measurement experts, with few exceptions, as for example Van Naerssen’s (1974) work on models of strategic study behavior of students, are not concerned with this 'clearing house' function of tests. Poor predictability of marks (predictability by the student is meant here) quite possibly is one of the reasons for the observed exchange rate for marks against time in the law education case, where students are able to keep a very substantial part of their time budget to spend it on extra-curricular activities.

Social system analysis re-establishes a fact that seems to be known to everyone except specialists in educational assessment: the difference between a psychological test and an educational test is that for the first the student prepares by sleeping well, and for the second by studying until a personally satisfactory level of mastery is reached. The characteristics of the educational system determine what the student thinks is a satisfactory level. Change this characteristic of the system, and the effectiveness of schooling will change.

companion paper see

http://www.benwilbrink.nl/publicaties/92ColemanApplicationECER.htm

Pascal program: contact me

Hubert M. Blalock, Jr., and Paul H. Wilken (1979). **Intergroup processes. A micro-macro perspective.** The Free Press.

- It's a pity I missed this one in 1992

Ulrich Trautwein and Oliver Lüdtke (2007). Students’ self-reported effort and time on homework in six school subjects: between-students differences and within-student variation. *Journal of Educational Psychology, 99*, 432-444.

- The theoretical framework covers recent theory on the topic of time-on-task, its assessment, and its relations to achievement.

J. Sidney Shrauger & Timothy M. Osberg (1981). The Relative Accuracy of Self-Predictions and Judgments by Others in Psychological Assessment. *Psychological Bulletin, 90*, 322-351.

Philip L. Ackerman & Stacey D. Wolman (2007). Determinants and validity of self-estimates of abilities and self-concept measures. *Journal of Experimental Psychology: Applied, 13*, 57-78.

May 20, 2013 \ contact ben apenstaartje benwilbrink.nl

http://www.benwilbrink.nl/publicaties/92ColemanModelingECER.htm http://goo.gl/Id1AU