One of the first teachers in Western Europe was Alcuin. He was invited by Charles the Great to found an educational system. Alcuin's method of teaching was that of questions and answers, the questions and answers to be learned by heart, of course. In his time this probably was a quite sensible method. The scarce manuscripts that were available were almost unreadable, so one first had to learn the text before the book could be 'read.' In the middle ages 'to know' was 'to know by heart' (Bolgar, 1954; Hindebrandt, 1992). This type of question-and-answering, the catechetical method, was still in use in university-level examinations in the nineteenth century (Foden, 1989). In present-day standardized tests (USA) and university entrance examinations (Japan, see Rohlen, 1983) its remnants are still discernible, where testees must know definitions etc. by heart.
Easily the most fascinating form of assessment known is the medieval disputatio, already made famous by the alleged founder of the university of Paris, Abelard, through his disputational fights with William of Champeaux. It is a form of organised argument, a serious dispute with winners and losers. One had to prove one's intellectual prowess in the presence of dignitaries of the university, the church, and the town. The propositions to be defended or attacked were new propositions with no known answers. The disputational method also was the scientific method of the day; logic was the instrument to be used (Kretzmann & Stump, 1988). University examinations were disputations; being admitted to the examination by one's master in practice was a guarantee that one would get the licentiate. In a non-trivial sense the defending of a dissertation, although this certainly does not have the form of the disputatio, is a present-day equivalent (Weijers, 1987; Lawn, 1993; Ahsmann, 1990). In the Muslim world of about A.D. 1000 disputations in law could make or break the reputations of their participants, who had a very high status in Muslim society (Makdisi, 1981).
The Hanze city Zwolle at the end of the 14th century had a famous schoolmaster, Joan Cele, who attracted pupils from far away. Having only two assistants, Cele had to organize a school with 900 pupils. So he invented the educational system of classes, examinations, and grouping on the basis of level of mastery (not on the basis of age). His system influenced the Parisian method of education, in its turn the basis of the influential Jesuit Ratio Studiorum. The historical ancestry of the dominant western educational method was discovered only in the sixties (Codina Mir, 1968; see also Scaglione, 1986). This conception of the educational curriculum was conditioned on the lack of teaching manpower, and reduced the teaching load using peer teaching ('Helfersystem'). Imagine Cele visiting a school or university in 1995! Is it possible that we could learn something from Joan?
Even in medieval times there were traces of meritocratic assessment in the universities, but first and foremost the order in the examinations (locatus) was determined by birth, not by achievement. In the daily practice in his own house the master used some incentives: a prize being given to the student with the best, the asinus to the student with the worst performance. Later the humanists banned punishment and made much of the system of prizes to stimulate intellectual achievement. Late in the nineteenth century systems of ranking by order of merit (in achievement and behavior) gradually were replaced by systems using marks or grades. Marking systems were seen as 'modern'; as far as I know there were no compelling reasons given to replace the simple and transparent ranking system with a pseudo-scientific system that still essentially was a ranking system. The change probably was very much in the spirit of the 19th century. Present marking systems do not seem to have a respectable ancestry; many educational researchers wonder why marking systems are used at all, given the lack of absolute norms in education (for example Hartog & Rhodes, 1936).
In the 18th and 19th centuries there is a fascinating development of civil service examinations in the disguise of university examinations and even entrance examinations (the Prussian Abitur), especially in Germany and later also in England. In England Oxford and Cambridge had paved the way by instituting competitive examinations (the mathematical tripos). In France admission to and graduation from the School of Roads and Bridges was the key to positions of power. There is some speculation whether knowledge of the Chinese civil service examinations influenced this development; there was a China-mania in Europe in the 18th century, with even philosophers like Leibniz and Voltaire participating in it (Guy, 1963). There is a definite link with state-formation in Europe: the need to have means to select more and more new civil service members, and more and more to do so on the basis of merit, not (only) rank. Typically the outcomes of examinations were over-interpreted; in general one was not aware of the limited reliability of these examinations (Edgeworth, 1988, could not change this). Validity was no issue at all; in this and some other aspects European examinations were a match for the Chinese examinations (Ringer, 1969).
Already in the nineteenth century participation in higher education was growing. In the 20th century this growth continued, spectacularly so after WW II; it changed the character of meritocratic assessment (Wilbrink & Dronkers, 1993). Diplomas gradually became very important as tickets to attractive positions in society. The fallout of this development is that children of high-ranking families also have to deliver; they have to compete now with 'outsiders' (Horowitz, 1985). Even more important: assessment is now used to legitimate the ranking and ordering decisions of the educational system. Present-day assessment is still basically humanistic, i.e. a 15th century method, rewarding achievement, and neglecting students with lesser achievements. The pressure to legitimate the sorting and selecting that is going on has resulted in the prominence of psychometric techniques that emphasize individual differences between students (Chapman, 1980), instead of the intellectual growth of the individual student (Astin, 1990; Records of Achievement).
Earli 1995 Nijmegen Session 61 Paper
Assessment of student performance is a hotly debated topic these days, not only in scientific journals and by special committees, but also in politics ('high stakes' assessment). In the U.S.A. the Ford Foundation established a 'National Commission on Testing and Public Policy' to investigate the impact of standardized testing on society; the commission issued a series of volumes, thematically organized, edited by Bernard Gifford (e.g. Gifford & O'Connor, 1992). In England there are positive (Records of Achievement: Hall, 1989; Jessup, 1991) as well as negative developments, the latter even leading to a teacher boycott of government-issued testing materials (Harrison, 1995). In the Netherlands mandated testing has been introduced in secondary education this year. In the U.S.A. and the U.K. 'authentic testing' is a movement in reaction to standardized testing that is winning many supporters (Berlak et al., 1992; Wiggins, 1993).
Even though there is a lot of reflection on assessment going on, the tendency is to focus attention on the methods of assessment, not on the 'why' of assessment, or on the question of how assessment has come to get the prominent position in modern society that it evidently has. In a devastating critique on marking systems, De Groot (1966) did not even mention the historical question, robbing himself of an opportunity to show how modern assessment practice contains many possibly atavistic elements. Hanson (1993), from the perspective of the anthropologist, and not limiting himself to assessment of student performance, critically treats the place of testing in modern (especially American) society. There are two powerful approaches to stimulate reflection on assessment: comparative studies and historical studies.
Comparative studies on assessment are scarce. On the interface between secondary and tertiary education there are the volumes edited by Clark (1985), Broadfoot, Murphy, & Torrance (1990), and Eckstein & Noah (1992), and the monograph by Eckstein & Noah (1993). These studies, by showing unexpected differences in assessment practice in different countries, certainly generate questions regarding the legitimacy of assessment practice in one's own country, but the answers to those questions demand analysis of the historical and cultural circumstances surrounding assessment practices elsewhere.
Historical studies have the power to directly illuminate contemporary assessment practice, for they generate hypotheses on the reasons why contemporary practice has particular characteristics. History is a powerful instigator of reflection on assessment. Of course many authors do pay lip service to this importance of history; so did Gifford by soliciting an historical essay by Webber (1989). The verdict, however, is that there is no systematic treatment of the subject available. Some authors come close, e.g. Smallwood's (1936) study on the history of examinations in the U.S.A., or Prahl's (1974) study on the history of examinations in Western Europe's universities and nation states. Essentially the story has to be collected from many different monographs, school histories, and studies on this or that aspect or period of assessment practice.
The purpose of this paper is to sketch the major themes that belong to this yet-to-be-written history of assessment. Because of the dominant role of the universities in the history of assessment, it comes naturally to speak here of a historical sketch of assessment in the university.
The university is almost the oldest institution in the western world. In its history of more than 8 centuries it has given the 20th century a number of traditions, among them traditions in the field of assessment, that gave contemporary western society its essential characteristics (Levine & White, 1986:1). So there is a reason for trying to understand the historical roots of assessment practices that we do not know the exact reasons for, such as marking systems, particular ways of putting questions and expecting answers, and the way professions are defined by university examinations.
The sheer age of some assessment traditions suggests that they are relatively immune to changes in the cultural environment, and that they will be resistant to planned change in our own time. School organization in forms (the graded school), with its perennial problem of retention, is a phenomenon that is not supported by educational theory and research, quite the contrary (Shepard & Wilson, 1989); nevertheless, in the last half century there have been no successes in the battle against retention.
University examinations and the idea of the school form are obvious subjects for historical analysis. University examinations are as old as the universities themselves. The concept of different forms, and so the concept of a curriculum, is only six centuries old, as will be shown later. Now the examinations in the 12th century may share some aspects of form with contemporary examinations, but it is not to be assumed that they had the same function and meaning to the actors involved in the 12th as they have in the 20th century. Assessment practice must be studied in its historical context, in order to understand how a particular practice was a solution to problems and tasks as perceived by historical actors.
The reverse problem is just as interesting: the possibility that the solutions of the past have survived in contemporary traditions, while the educational environment no longer poses the kind of problem that tradition was a solution for. It just could be that our ineradicable habit of ordering and ranking students is such a solution to a problem that no longer exists, or no longer is a legitimate solution to the original and still existing problem.
The concept of 'assessment' will intentionally be left vague, leaving it to the historical analysis to give it shape and content. The resulting concept might be different from textbook definitions of assessment, and surely will be different from the American concept of 'educational measurement' (as given an operational definition by referring to Linn's (1989) Educational Measurement). That it would be futile to attempt to define assessment on the basis of contemporary practice may be illustrated by the fact that in the medieval university the disputation was a prominent part of the examination, an activity that has no parallel or analogue in contemporary education.
Education in the middle ages may be characterized as 'teaching' students to learn sacred and other texts by heart. To know something was to know it by heart (Riché, 1989, p. 218). In the early middle ages the texts to be learned were religious texts, and most of the learning took place in monasteries and convents. There was an urgent motivation to learn the Holy Scripture and other religious texts, because doing so made it more likely that one would be admitted to heaven after death. The scarcity of manuscripts was not the only thing that forced the monks to learn the texts by heart: medieval manuscripts were also difficult to read, so that one already had to know the text in order to be able to read it correctly (Lowe, 1926; Bolgar, 1954:111). In the Muslim world manuscripts were ambiguous because they consisted only of consonants. The Muslim student had to give proof by recitation to his master that he 'knew' the text; that being the case, his master would authorize him to teach the text (Berkey, 1992:29 ff). One was educated to preserve the cultural heritage, and to be able to pass it on (Berkey, 1992:23, 24, 27). Because the cultural heritage consisted of religious texts, studying was a religious activity (Berkey, 1992:55).
Meditation was an important activity for the monk, meditation in this age consisting of the recitation of religious texts. Holy texts, of course, were written in Latin. So one had to study Latin grammar in order to learn to understand and to speak Latin (Coleman, 1992, p. 144). The study of grammar consisted in the learning by heart of famous grammars dating from the Roman Empire, or simpler textbooks used for beginners (Mostert, 1995:110; for the late middle ages see Post, 1954; Frederiks, 1960:74). These grammars were written in question-and-answer style, which in antiquity was a familiar style, also in the Bible, e.g. the questioning of Adam and Eve by the Lord (Viola, 1982:12). Memory could use some support, so many manuscripts had illustrations that served as mnemonics. The 'art of memory' (Yates, 1966) was practiced widely; the Jesuit Matteo Ricci even tried to convince the Chinese of its use in preparation for their exams (he got praise, but no following). There have always been teachers who emphasized understanding next to the learning by heart, and especially humanists like Erasmus (Bot, 1955, p. 54).
Assessment of learning of necessity took the form of recitations, and of answering the questions as posed in the grammar that one used (Bolgar 1954:111). The arts examinations at the medieval universities consisted mainly of very simple questions and answers (Lewry, 1982:116). Students also would question each other, using the questions from the book, the questions being also learned by heart. Questioning and answering was the dominant didactic form in teaching and learning. Knowing the right answers to questions about religious texts was of course extremely important. Out of this kind of questioning grew the catechismus, and in its wake the catechetical method. At Leyden in the early 17th century, Arminius had to undergo an examination of two days before he was appointed as professor in theology: he had said nothing during that examination that could be interpreted as being inconsistent with religious belief or the catechismus (Schotel, 1875:76).
These archetypes of assessment were still dominant in education, also in higher education, as late as the 19th century (Foden, 1989:12). At Harvard ca. 1650 mornings were used for recitations (Rudolph, 1977:31). Only in the second half of the 19th century did the American colleges replace the recitation method by lectures or 'group discussions' (Rudolph, 1977:144). The recitation method was a combination of learning and examining. In the American colonies the examining part was in fact non-existent. According to Rudolph: 'The colonial college student was essentially ungraded and unexamined. (...) public oral examinations were gestures in public relations and therefore not designed to show up student deficiencies.' Only in the 1830s were written examinations introduced at Yale and Harvard (Rudolph, 1977:146), somewhat earlier in Oxbridge (Rothblatt, 1974:290, 294).
The first written examinations in Oxbridge in a sense followed the catechetical method, because no questions were put that allowed different interpretations (Rothblatt, 1974:292-94): 'the way to achieve a more accurate and certain means of evaluating a student's work was to narrow the range of likely disagreement and carefully define the area of knowledge students were expected to know.' A century and a half later the Japanese are masters of the art of trivial questions of places, events and names, even in the university entrance examinations which in Japan are decisive for one's future career (Rohlen, 1983:61; Bowman, 1981:305; Takeuchi, 1991:109).
It is the experience of almost every living adult in developed countries that even today a substantial part of all questioning and assessment in education essentially is recitation and giving the 'right' answers to known (types of) questions. Most standardized tests only count the proportion of 'right' answers. The difference with the middle ages is that the salvation of one's soul no longer seems to depend on knowing the right answers, so the proportion of right answers typically is a bit lower; educational measurement specialists intentionally keep it low, their professional peace of mind being disturbed by anything that approaches 100% correct. The reflective question is: what is so holy about our learning material that it legitimates testing its verbal reproduction?
Important principles of curricular and school organization were developed by Joan Cele, rector of the Latin school of Zwolle, a Hanze town in the Low Countries, in the period ca. 1375 to 1415. Cele, a famous teacher, had to run a school with 800 to 1000 students (Post, 1954, p. 99) in a town with only ca. 5000 inhabitants (Frederiks, 1960, p. 16). Many of these students were 'externs,' coming from Utrecht, Luik, Flanders, and the German countries. Cele hired two Parisian masters in the arts to teach philosophy in the highest two forms. However, most of the students were in the lower forms, learning Latin and its grammar. Cele solved the organizational problems posed by the sheer number of his students by imposing a strict division of students and curriculum in eight different forms, an important innovation. Post gives details on this graded structure of the many Latin schools in the late middle ages in the Low Countries. At Paris the curriculum of the faculty of arts in the first centuries of its existence was essentially unordered, there being no prescribed order in which the lectures on the examination books were to be 'heard'; an indication is that examination compendia circulated that tried to order the material for the student about to be examined (Lewry, 1982:102).
Cele's school forms implied examinations (twice a year) for promotion to a higher form (Codina Mir, 1968:172-3). For the lower forms the exam was a recitation to check on the achievement of the task posed in that form; in the higher forms Cele also looked for insight (Frederiks, 1960:86). Still being confronted with forms of up to a hundred students, Cele introduced a further grouping in decuriae, groups of circa ten students (Frederiks, 1960:66; Codina Mir, 1968:60). Each group had a leader, responsible for learning and discipline, who was changed every week.
Why are Cele's innovations important? Cele's students introduced these innovations in schools all over Europe, and the university of Paris also adopted Cele's didactic principles. The Jesuits, whose Ratio Studiorum was inspired by this modus parisiensis, definitely established this pedagogy in Europe's schools and universities, not only the Jesuit ones (e.g. Codina Mir, 1968:319 ff). Joan Cele single-handedly established the European model of the graded school, of examinations for promotion, and of ranking of students on the basis of merit (Frederiks, 1960:66; Codina Mir, 1968: Ch. III, IV; Scaglione, 1986:12; Frijhoff, 1992b:10). The historical importance of Cele's innovation has only recently been revealed by the work of Post, Frederiks, and Codina Mir. For example, Philippe Ariès (1960) presented a meticulous study on the evolution of the graded school, unaware of the source of the innovations in Zwolle in the fourteenth century.
The medieval class contained pupils of different ages, who stayed for different durations. Contemporary school organization certainly is based on the ideas of Cele, but now classes (or forms) are constituted bureaucratically according to age and duration of stay (see the sharp attack by Paulsen, 1885:621, on grade retention as a consequence of bureaucratic rules, where earlier students were promoted on the basis of their learning potential; Ingenkamp, 1972:24, 42). The educational philosophy legitimating the modern school organisation after the model of the standing armies of the newly formed states was already formulated in the 17th century by Comenius (Ingenkamp, 1972:16).
The university of Paris in the middle ages was an organisation of masters, in contrast to the university of Bologna, which was an organisation of its wealthy students. Most of the Parisian masters, however, were masters of arts, and at the same time were students in one of the 'superior' faculties of law or theology. Nobody could be a student in Paris without having a master (Thorndike, 1944:30). So the first thing the newly arrived would-be student had to do was to seek himself a good master. The master was responsible for his students, saw to it that they spent their time in study and not in idleness, set daily exercises and heard their recitations (Schwinges, 1992:223); he put students in competition with each other by explicit praise for the student with the best achievement of the day, and blame for the student that blundered worst: the last one got the asinus, that is the cap with earflaps. Assessment was part and parcel of the daily life of the medieval student (Cobban, 1975, p. 209).
A major responsibility of the master was to nominate his students for examinations, but only if he deemed them ready to take the exam. Examinations were formal and public events: failing a candidate was an extremely rare event (Schwinges, 1992, p. 235), and even then the reason would be the moral behavior of the candidate. The candidate was questioned on his knowledge of the books that were prescribed, had to deliver a lecture on a text that was assigned to him only hours before (Weijers, 1987:390), and had to take part in a public disputation. The examination was true to life (authentic): the candidate had to give a proof of what the examination would qualify him to do: to lecture. In the early German universities the 'propedeutic' arts examination tested students on questions and answers that were extensively practised in the years before (Schwinges, 1986:356). Also the level of the arts examination was surpassed by that of many schools, such as the schools of the Brethren of the Common Life (Schwinges, 1986:336). Already in the 13th century 'examination compendia' were available for the candidate for the arts examination (Lewry, 1981). These compendia fulfill a universal need of students: also in China there was a publishing industry of books with questions and answers and model-poems from the civil service examinations (Hu, 1984:13).
A master tutoring the group of students entrusted to him is still the same form that centuries before was found in the monasteries. What really is new, and characteristic for the university as a wholly new institution, is the examination by a committee of other masters, acting on behalf of the representative of the pope, in this case the chancellor of Paris. In the words of Verger (1992a:43): 'More particularly, they were the only institutions - and this was one of the great innovations of the medieval university system - to link teaching and examinations closely together.' The prize is the 'licence to teach,' a certificate that was valuable because it enabled the licentiate to teach anywhere in the Christian world, and to attract students (Bannenberg, 1953:12). Until the rise of the university, the authority to teach was self-declared, or based on a written statement from one's own master, and the license to teach was given temporarily by the local representative of the church (Verger, 1992b, p. 145). The genesis of the examination coincides with the loss of the absolute autonomy of the individual master, around 1200 (Rüegg, 1992, p. 23). The individual master became dependent on his examining colleagues: only they could recommend his pupil for the 'licence to teach.' Another way to describe the introduction of the examination is to say that the chancellor of Paris lost his autonomy in the appointment of university teachers: there had to be an examination of the candidate by (a committee of) the professors of the university. In the case of posts in theology there was even outright comparative selection, because in the beginning of the 13th century the pope had limited the number of positions to only eight (Rashdall 1895/1987:i 466). Gradually there grew a distinction between the examination and the appointment as a master (Moraw, 1992, p. 247, 254; Rüegg, 1992:20).
Still later the examinations qualify for a certain profession, but do not give entry to that profession: an academic grade is only one of many qualifications, descent and wealth being the more important ones (Moraw, 1992, p. 246). The new institution and its examinations for the first time in Western history defined what knowledge was, thereby also encouraging professionalization (Bullough, 1978:508). How characteristic this has become of Western society is documented by Stehr (1994) in his Knowledge Societies.
The university examination was a new institution, having no model in the past, nor in any other country. Webber (1989:36) suggested that the sudden appearance of examinations was influenced by contacts with the Chinese. There are two problems with this hypothesis: at the time there were no direct contacts with the Chinese, and Chinese examinations have no resemblance whatsoever to the new university examinations, except assessment by a committee independent of the teacher of the candidate. The traffic of scientific ideas and technology between China and Europe is the theme of the work of Joseph Needham; technological ideas did find their way from China to the West, scientific ones did not (Needham, 1954 volume 1, p. 222). The Muslims kept in contact with China during the period the West had lost that contact, but there is no impact whatsoever of the then existing Chinese examinations on higher education in the Muslim world (Needham, volume 5, part 4, p. 388-509, esp. p. 416 ff). Another possibility is influence from practices in higher education in the Muslim world, but there the individual masters were strictly autonomous in licensing their disciples; the conclusion of Makdisi (1981) is that the organizational forms of the universities were real innovations. Webber is not the only author who is misled by misinformation on the medieval examinations: Wiggins (1993) repeatedly speaks of 'secret' examinations, in sharp contrast with the historical reality of examinations that were held in public, with pomp and circumstance.
The methods of lecturing and studying made it necessary to 'hear' the lecture series on a particular book more than once, before one had a reasonably sure knowledge of the text and its commentaries, so the regulations of the university stipulated the minimum number of times to hear the lecture series on every book in the examination (Thorndike 1944:227), making repetition a natural characteristic of education, in the universities as well as in the schools, and contributing to the very long duration of studies (Verger, 1992a, p. 43). In the middle ages there was no such thing as 'retention' in a certain form, one stayed as long as was necessary. In contemporary education, however, the notion that retention is nothing to really bother about is contradicted by the research evidence, as already mentioned in the context of Cele above.
Order of merit in the middle ages was based on one's position in society. The right order was extremely important, even in sitting positions at daily lectures; rich students could buy themselves a place in the 'noble bench' (e.g. Schwinges, 1992:203, 205 ff). Also the order of merit at examinations, the locatus, is first of all an order of social merit; otherwise it was determined by the objective criterion of length of study (the longer the stay, the higher the place) (Rashdall, 1895 i:459; Schwinges, 1986:355; 1992:234). Of course, many students did not even have the intention to go for the arts examination; see the description of five types of medieval student by Schwinges (1992, p. 196). The conclusion is: yes, there was an honours list for every examination, but the place on the list had little or nothing to do with academic merit. In the medieval university merit in the modern sense of academic achievement was important in daily practice, but was not explicitly recognized in the examinations.
The disputation is the high point of medieval education, as well as of medieval theology and philosophy. Famous are the disputations between Abelard and William of Champeaux; Abelard describes in his autobiography the flavour of the times, and the details of his contests with William (Thorndike, 1944, p. 3-6). Abelard, of noble descent, exchanged the martial arts for the art of disputation, in a sense a martial art of the mind. These disputations attracted large numbers of 'students,' and marked the beginnings of what would become the university of Paris, in the perception of its early members the university in Paris (Ferruolo, 1988, p. 24). Of course there are many different forms of disputation, and over the centuries there have been important developments in techniques and in traditions (Bazan, 1982:31). A description of the typical dispute in the university is given by e.g. De Rijk (1977:129) and Lansink (1967:174, 188). A disputation was a major event: all other activities in the university were cancelled so as to give everybody the opportunity to attend. The pièce de résistance of the disputation was a theorem or problem posed by the master who chaired the disputation. The position of the master was to be defended by one of his students (the respondens), and could be opposed by other masters and students. The disputation could last the better part of the day, or even the whole day. The next day the master would give a summary of the arguments pro and contra, and indicate why the opposition failed and what the conclusion or solution (determinatio) of the problem should be. For the respondens participation in the disputation was part of his examination requirements.
Most of the time, the disputations were exercises intended to sharpen the wits of the participants, and as such they were related to the didactic form of questions and answers (Hadot). In rare cases the problem posed was a sincere problem eagerly waiting for a solution; here the disputation was a method of finding new secure knowledge. In the middle ages the disputation was the only method to develop new knowledge, and to critically analyse newly translated or discovered theories (Piltz, 1981:274). A contemporary example is the practice of law in the Courts, resulting in jurisprudence (= new knowledge on what the law means in particular circumstances). Also in the Muslim world in the 11th century the disputation was an important instrument in the development of Muslim law, and for that reason an important method in higher education (Makdisi, 1981); Makdisi, in good disputational style, posits the primacy of the Muslim disputational form over that of the later European universities. In the development of logic the disputational method was crucial, as described by Kretzmann & Stump (1988, p. 6).
There is an extensive body of literature on the disputation. Many reports have been preserved in the particular literary form of the report as authorized by the master, most of them in manuscript form, only some in printed form. McDermott (1993) presents in his anthology of the works of Thomas Aquinas a number of quaestiones disputatae. The anthology also contains a lecture by Thomas, in which one can find the same elements as used in the disputation: arguments and counterarguments, conclusions and refutations. In the field of logic a number of disputations, and an introduction to the genre, are to be found in Kretzmann, Kenny & Pinborg (1982). Lawn (1993) treats the disputation in medicine and science, shows its essential place in the development of science, and gives some examples (in Latin). References to the literature are to be found in Weijers (1987) and Les genres littéraires dans les sources théologiques et philosophiques médiévales (Louvain-la-Neuve, 1982). The authority of the master giving the determinatio was an important aspect of the disputation (Makdisi 1974:660), so the disputation was not always a perfectly rational discussion.
Little is known about the role of the disputation in the instructional process, 'about how students were taught' (Perreiah, 1984:85). Perreiah gives details about how, around 1400, 'trial disputations' were conducted: under very strict rules, more resembling a game of chess than a contemporary court session. The rules were those belonging to the particular type of trial disputation, the obligation or the insoluble, and of course the rules of logic. In the context of the trial disputation Perreiah, following Aristotle (Topics 159a 250), explicitly speaks of an instrument to test the knowledge of the participants.
In the Jesuit schools the disputation was an instrument to rank students according to merit (Compère, 1985:83): the lower ranked student could 'win' the rank of his adversary, and vice versa. Winning or losing was determined by the number of errors made by each contestant. This was also the practice in the Latin school of Sturm (Codina Mir 1968:173). This kind of ranking by competitive disputations was also known in late Antiquity (Lim, 1995), and in the Muslim world around 1000 (Makdisi, 1981).
Disputations kept a prominent place in the university curricula and examinations until the 18th century. In the 18th century they became more farcical, and in the 19th century they gave way to modern forms of examination. The Harvard curriculum in the mid-seventeenth century had disputations scheduled for the afternoon classes (Rudolph, 1977:31). In Leyden early in the seventeenth century disputations took place once every two weeks, the very rude discussions regularly resulting in a serious scuffle (Schotel, 1875:332). In Oxbridge in the eighteenth century students and faculty no longer took disputations seriously, although they disappeared altogether only halfway through the nineteenth century (Rothblatt, 1974:288, 290).
The disputation has not survived in the university, but maybe there is a modern educational equivalent that shares some essential characteristics with it. Disputations were public events, and because of that the participants must have been highly motivated to do a good job and to make a good public impression. Assessment in this case also was self-assessment and assessment by one's public. The disputation has been replaced by examinations in question-and-answer style. In disciplines other than law and theology there is a modern equivalent to the disputation: scientific research and all the preparation for it that goes into modern secondary and higher education. In this sense there is a continuity, for the disputation was the only scientific instrument in the middle ages, and many other techniques and methods of scientific research have been added since. To be able to do scientific research demands extensive preparation in mathematics, statistics, discipline-specific research methods, and in the peculiar scholastics that has developed around reporting and publishing research (for example in psychology: Madigan, Johnson, & Linton, 1995). The assessment characteristics are very much like those of the disputation: reporting is public, examination regulations specify that the student must have done research or participated in it, and standards for good practice are explicit and objective. There are good reasons to accord research activities a larger share in the examination requirements at the expense of the part testing plays.
A perennial problem in education is to keep the student's attention on the educational tasks. Punishment is one of the instruments that traditionally was used to this purpose, often taking the form of punishment for non-disciplinary behavior. The heads of medieval schools and universities were empowered to punish their students, even for crimes committed outside the school. Punishment was daily routine for the medieval student. Already in the 11th century Egbert, a teacher in Liège, criticized the harsh punishment in the schools of his day (Schoengen, 1911:316); in the 14th century Joan Cele was mild in his punishments (Frederiks, 1960:56). The humanists propagated competition and reward instead of punishment as a motivating instrument (Bot, 1955:56 ff; Codina Mir 1968:174). Scaglione (1986:13) sees a connection between the emergence of these new ideas and practices and the innovations of Joan Cele. Scaglione also points out (1986:93) that in the Renaissance there was an extraordinary eagerness to learn, in contrast to the periods before and after. The influence of the humanists led to a system of prizes for the best students of the class that dominated Western education until well into the 19th century, and still has its remnants today in the many prizes that are awarded for the best achievement, dissertation, article by a junior researcher, etc.
In order to be able to reward the best student, one should know who he is, and preferably one should have some rules to rank order students for this purpose. The prize mechanism implied the bookkeeping of achievement throughout the academic (half) year; points could be earned by good behavior, or lost by making academic mistakes as well as by bad behavior. The prize system is a driving force behind the development of systems of notae and, in the 19th century, of marks. Already the schools of the Brethren in the late middle ages had an elaborate system of ordering students according to merit; the examinations were used to determine the ranking according to merit. Students could even challenge the rank given them, in which case a contest between the student and the next better ranked student was held (Codina Mir, 1968:173). Haskins (1923/1965:74) gives the example, from a 15th century student manual, of the daily disputation held by the master with his own pupils, where a prize as well as a symbolic punishment (asinus) was given for keeps until the following dispute; the same practice existed in Calvin's Academy of Geneva (1559) (Scaglione, 1986:46-7). Already then there was a practice of keeping a record of earned points or notae. Scaglione (1986:47): "Classes were divided into decuriae not by age or social rank but by merit and achievement. The decurio supervised all work, and punishment for intellectual sluggishness could take the typical form of the nota asini 'the ass's mark' or the nota sermonis soloecismi 'the mark of bad Latin.'" Centuries earlier in the Muslim world the same practice of ordering pupils according to merit was to be found (Makdisi 1981:81, 91).
In Jesuit schools competition and ranking by academic merit was the bread and butter of the pedagogical program. Scaglione, 1986:74: "The decuriae were also related to the matter of grading, since the ordering of students by merit within the decuriae was the closest thing to individual grading. The Jesuits, as educators general before modern times, did not formally grade students' homework or even tests, but by their results they listed the students publicly in order of merit." From the seventeenth century lists with grades have survived (Compère, 1985:83).
There have always been objections to the prize system. In the middle ages Italian parents objected to the leniency of the system: they preferred punishments. The general objection was that the many students who would never be able to earn a first or second prize, were in fact neglected by this system of rewards. Then there were the objections against certain moral problems in the wake of the competition for prizes: fraud, malicious delight, high stress, lying (Fortgens 1958:115).
In England during the latter part of the 18th and the first half of the 19th century the university climate grew competitive (Rothblatt, 1982:17, 18 note 56). Written examinations replaced the examinations in front of a public or examiners. However, candidates were ranked according to achievement, and these lists of 'honours' candidates were made public (Rothblatt 1974:295). Candidates could hide their shame by taking a 'pass'; in that case they were not ranked and their names were not made public. At Cambridge until 1910 the participants in the 'mathematical tripos' were ranked according to achievement; the best achievement was honoured with the title Senior Wrangler, the poorest with a title as well as an attribute: the Wooden Spoon (Stoker, 1927, opposite p. 34, presents a photograph of the last Wooden Spoon, with his man-sized attribute). The competitive examinations in Oxbridge in the early 19th century put the students under great pressure (Rothblatt 1982:6).
In the 19th century ranking of students is still the dominant practice in secondary education (in France: Caspard, 1992, p. 19-23). According to Compère (1985:83) in France there was no marking system in use before 1850. During the 19th and 20th centuries the system of ranking and of notae was changed everywhere into a grading system based on marks. In Germany in the beginning of the 20th century class ranking was still in general use: Stern (1920) compared the scores on his intelligence test with rank in class, not with marks. Even so, high marks remain almost as scarce a good as the first or second place in the class order of merit (Deutsch 1979:393-94). In the Netherlands the gymnasium of Groningen was the last school to exchange the use of notae and ranking lists for a marking system, doing so only in 1901 (Van Herwerden, 1947:41). For the United States the history of grading systems is well known (Smallwood, 1936; Rudolph, 1977, p. 15-16, 147; Geisinger, 1982). In England the first example of marking examination papers is found in the mathematical tripos of 1836: 'Earlier examiners and moderators tended to rely on impression' (Rothblatt, 1982:14). A short description of the emergence of the marking system in England is given by Rothblatt (1993, p. 44); the competitive examinations in Oxbridge demanded objective assessment, and credible objectivity demanded that the curriculum be narrowed so that it could be assessed by using marks.
Marking systems differ from country to country (Lienert, 1987:40), while the basic (historical) ideas underlying them are the same everywhere in the Western world: the system of ranking stripped of its prizes, and pseudo-objectified by directly evaluating achievement on a marking scale. The modern distinction between scoring and marking helps to see this. In the ranking system rank was determined by the summed scores (= notae) of all the students in the form. For that purpose notebooks were kept; in Groningen, for example, every student had a notebook in which all notae were jotted down, not only his own, but also those of all other students (Van Herwerden, 1947:41; Rudolph, 1977, p. 147 for a parallel at Harvard). The notebooks in Western education resemble the Books of Merit and Demerit in China in the 16th and 17th centuries (Brokaw 1991); there probably is no link between the two systems. In the marking system the errors made in tests are to be evaluated in the form of a mark. With hindsight, the problem in that procedure is the lack of rules or standards that could make the translation from the number of errors to a mark objective in any real sense of the term (this is an enigma that researchers now try to solve using decision making theory). The conclusion must be that marking systems suggest the presence of norms that really are absent. Ranking systems do not carry that suggestion, so marking systems can be said to represent a regressive development. Educational measurement, based on the psychometrics of individual differences, can be said to be a restoration of the old ranking system in an aggravated form because individual differences here are made a goal in themselves. The use of pseudo-objective marking systems made it possible for politicians to demand 'results' from the schools. Especially in the U.S.A. this has led to the growth of an educational testing culture that has disastrous consequences in terms of efficiency of learning processes and quality of curricula (Resnick & Resnick, 1985). Is there a contradiction between these statements about educational testing? Not really, because 'standardized' tests get their 'standards' from nationally representative samples of students. If this explanation is not convincing, it is because of the reasons summed up by Resnick & Resnick (1985). It seems we have not understood the impact of the many centuries of humanist ranking in education, and instead have hurried from one modern alternative to another.
Time-on-task must have been the crucial goal for the educators in the past, as it is today. Punishment and reward are given with the intention to stimulate the student to invest still more time in her study, or lose still less in idleness. In the middle ages the way students spent their time surely was a problem to the responsible masters and school teachers: a major reason that every student should have a master in the first place! After all, students were generously exempted from productive work (Rüegg 1992:30; Schwinges, 1992, on student life in the middle ages). Yet from the 17th to the 19th century many students took all available time to themselves (Rothblatt, 1974:303; 1982:18 note 56), as did their teachers, for that matter. With the arrival of competitive examinations one's time once again became a scarce good. The competitive spirit, and scarcity of time, dominate campus climate in the U.S.A. in the 1960s: Becker, Geer & Hughes (1968) describe the connections between grade point averages, competition, and the use of one's time. Studies like that of Becker et al. reveal the mechanisms of the grading games played by students and their teachers, games that originated together with the universities.
Our modern examinations, and educational assessment in general, were formed in the critical period of the late 18th and early 19th century. The genesis of the modern examination system had much to do with the rise of the modern states in Europe. In fact, it was state influence that was the crucial factor in most countries, England being a special case because of the autonomous nascence of the Oxbridge competitive examinations, and the U.S.A. not yet participating in this process of state formation. Any history of assessment worthy of the name should deal with this period, with the connections between the (state) universities and recruitment for the civil service and for the professions.
University enrolments in the 17th and 18th centuries were low (Rothblatt 1982:3), and in many countries examinations in fact no longer existed, or what was called an examination was farcical (the American Colonies: Rudolph 1977:145; Germany: Prahl 1974:298; England: Engel 1974:307, Rothblatt 1974:247; the Netherlands: Frijhoff 1981, Wachelder 1991:70).
In continental Europe the general trend in the 17th and especially the 18th century was that the state tried to get a hold on the universities and their examinations, entrance examinations included, in order to control the numbers and qualities of its civil servants (Wuthnow 1989:239; Frijhoff 1992; Lindroth 1976:101 for the case of Sweden). Where earlier one's family, wealth and relations were decisive in getting lucrative government positions, now merit is becoming the prime criterion. This does not mean that other factors have now become unimportant, or that elite positions are threatened by newcomers (Fischer & Lundgreen, 1975). The importance of merit surely does not mean that positions are now open to all the talented: the costs involved in reaching competitive positions in education are so high that only the established elites and wealthy merchants can bear them, as was also the case in the middle ages (Schwinges 1986:5, 343). Only the 20th century will see the combination of merit and more equal opportunity.
The development of 'modern' examinations in England begins already in the first half of the 18th century with the institution of the Senate House examination at Cambridge, later to become the 'mathematical tripos' (Roach, 1971:13). The why and how of this development is unknown. Rothblatt (1974) presents a flood of relevant facts and interesting speculations. Prahl (1974:252) points out that in England the universities took the initiative, while on the continent the governments did so. Fischer & Lundgreen (1975:459) state that Britain was relatively late in developing a civil service. Roach (1971:12) affirms the decisive role the English university examinations played as a model for the civil service examinations that were established in the middle of the 19th century. The pervasive influence of the university examinations is described by Rothblatt (1982:15): The Oxbridge model was followed in the schools, in military academies, in the system of local examinations and in the various branches of the civil services, excepting the Department of Education and the Foreign Office. Different career phases became linked together by the same examinations (...).
Present-day France is pre-eminently the country of the educational contest, the concours, for entrance to prestigious institutions and colleges. This tradition has its origin in a bequest by Louis Legrand that started a yearly contest between 10 Parisian colleges in 1747 (Palmer, 1985:24). A concours nowadays is an ordinary examination that is used to select candidates for a limited number of open places (numerus clausus). In the later 18th century more examinations began to be used, and in a more stringent manner, for recruitment to technical institutions for the army (école du génie) and the government (école des ponts et chaussées), and after the revolution the école polytechnique, an example widely followed by other European countries (Frijhoff, 1992b:1256 gives a sketch of French higher education in the 18th and 19th centuries). The whole point of the concours is that admission to a grande école, for example, will practically guarantee a prestigious job. In contrast, free admission to a university does not in itself guarantee anything. In France it was the government that made examinations, for the first time in French history, decisive for many a state career; for this purpose it instituted examinations that did not exist before in this form.
The Prussian rulers in the 18th century built the most efficient bureaucracy of Europe. They instituted the earliest civil service examinations, also with the intention to break the monopoly of the aristocracy in high government positions (Prahl 1974:300). In the 18th century, next to the traditional faculties of theology, law, and medicine, a study preparing for government jobs was instituted. To regulate numbers there came restrictions, also for the other faculties, taking the form of a final examination of the Gymnasium: the Abitur. In the 19th century the students in the government tracks and of limited means, the Brotstudenten, were cramming for their state examinations; this group was not sold on the Humboldtian ideal of Wissenschaft (McClelland 1980:200). Growing numbers of students in this century led to bureaucratization of the state examinations themselves as well, strengthening the natural tendency of the Brotstudenten to cram for their exams (McClelland 1980:277ff). In these strong developments in the 18th and 19th centuries the form and function of assessment in Germany were definitely set.
In England, as described above, in the 19th century assessment was in fact a mix of educational testing and personnel selection, as it still is today in the typical French concours for restricted entrance to prestigious institutions like the écoles normales supérieures. In modern terms, education was an investment in human capital as well as being a screening device. With the expansion of education, especially after the Second World War, assessment of student achievement has seemingly lost its personnel selection functions, even to the point that wholly different professional groups have emerged: educational measurement specialists doing the educational assessments, personnel selection psychologists the job assessments. In the process assessment is insulated from the use of validity criteria that in the field of personnel selection are self-evident and explicitly formulated in standards of professional bodies, such as those of psychologists or personnel managers, if not in state legislation.
There have always been suspicions that examinations are far from acceptably reliable and valid. The first extensive analysis of the reliability of examinations was done by Edgeworth (1888; see also Roach 1971:283); he observed that examinations in fact worked out as weighted lotteries, one's chances of success being correlated with one's abilities relative to those of others (p. 626). The weighted lottery for admission to university studies with a numerus clausus (medicine, dentistry and veterinary medicine, but sometimes also other disciplines) is practiced in the Netherlands, and is based on the kind of technical argument that Edgeworth would immediately understand. The Dutch parliament voted for a mix of the meritocratic principle (examination marks) and the equality principle (lottery among those qualified) (Frijhoff, 1992a).
The characteristic evolution in the 18th and 19th centuries is that assessment has become a serious matter. No longer is it only a question of honour to win the prize; now one's future career depends on it. No wonder the competitive examinations are going to dominate the educational scene: assessment now serves many other lords and interests besides those of the transmission of the cultural heritage. Assessment no longer serves didactic purposes; instead it dictates didactic purposes in the form of cramming for narrowly defined examinations. Rothblatt (1982) studied the stress that Oxbridge students experienced in their years of study early in the 19th century. From now on, for most students only what will ultimately be tested counts. That bias has stayed with us, even in the movement for 'authentic testing' (Berlak et al., 1992), which still is a testing movement, taking testing as a natural thing in education. Students are now treated in the same way the labourers in the middle ages were treated by landlords with unfair or at least unreliable instruments to measure the fruit of their day's labour (Kula, 1986), with this difference that for the student the labour of many months or even years is evaluated, and his future is at stake. The amazing thing is that essentially no progress has been made in the last one or two centuries in finding standardized measures that give students a fair treatment, in the way medieval measuring instruments and juridical procedures have been stripped of their shortcomings.
Because so much now depends on the outcome of examinations, the pressure is in the direction of kinds of questions that do not divide assessors, and on procedures of counting errors or assigning marks that give the impression of exactness. Assessors now stand on the side of the interests of the state or of the professional association, not any longer like the medieval master on the side of the student. Merit assessment has its price: an objectifying distance between the assessors and the assessed. Yet the same meritocratic procedures, once in place, made it possible in the 20th century to really offer educational and career possibilities to the talented from all classes in modern society, even though in the eyes of some this may have been a mixed blessing (Ringer, 1979, voicing this feeling).
In contemporary literature on educational measurement the Chinese imperial examinations are frequently mentioned as the first known written examinations (Webber, 1989; DuBois, 1965:8). These examinations gave entry to the civil service, and they were extremely selective. They were held once every few years, in dedicated examination halls. The examinations were thoroughly meritocratic, reflecting the Confucian philosophy of the place of merit in an essentially hierarchic society (Ho, 1968:6). Though more than two millennia old, the examination in different periods knew different forms and functions. For the examinations of the Tang dynasty, ca. 800, see Des Rotours (1932), Waley (1949) and Herbst (1988); for the examinations of the Sung dynasty, ca. 1100, Lee (1985); for the examinations of the Ming and Tsing dynasties, 14th century until 1905, Ho (1968), Miyazaki (1985). Miyazaki's title, 'China's examination hell,' adequately depicts the character of these examinations.
The differences in examination 'culture' in Europe between the early 18th and late 19th century were manifold, and in line with the chief characteristics of the Chinese examinations and Chinese bureaucracy: from oral to written examinations, from inconsequential examinations to explicit selection for the civil service, from only formal to intentionally competitive examinations, from small numbers to numbers of participants many times higher than numbers of available places. The resemblance of the European developments during the Enlightenment to the existing situation in the Chinese empire, in the 18th century adored by many intellectuals in Europe, points to some influence of the Chinese model. During the 18th and 19th centuries many factors influenced the development in Europe towards competitive examinations, among them the achievement of free trade, a principle that also could be of use in government and education (Roach, 1971:16). Among the numerous factors mentioned in the literature, the availability of the model of the Chinese civil service examinations deserves special mention. It was widely known in Europe, and examinations modeled after this Chinese examination format were propagated by, for example, Adam Smith in his Wealth of Nations; see Teng, 1943, and Guy, 1963, for details on the way the Chinese model influenced European thinking on examinations and their societal role. Japan in the 19th century instituted meritocratic civil service examinations after the Chinese model, with some Prussian influence because a Prussian advisor was hired by the Japanese (Spaulding, 1967; Rohlen, 1983:61).
The possible Chinese connection should strengthen our reflective mood regarding the dominant presence in our daily life of examinations. The Chinese civil service examinations were just what the name suggests: a means for selection of civil service personnel, not an educational system. Imperial China never developed an adequate educational system, although in the Sung period a serious effort was made. The suggestion is that a strong examination system threatens the quality and even the existence of the educational system. Selection is not a productive process per se; a society that takes educational production seriously, should carefully monitor its selection processes.
Note. The research for this paper was partly subsidized by the Netherlands Foundation for Educational Research (SVO) in The Hague, grant number 94 707.
Aquinas, Thomas (1993). Selected philosophical writings. Selected and translated by T. McDermott. Oxford: Oxford University Press.
Ariès, Ph. (1960). L' enfant et la vie familiale sous l'ancien régime. Paris: Plon.
Bannenberg, G.P.J. (1953). Organisatie en bestuur van de middeleeuwse universiteit. Nijmegen: Katholieke Universiteit.
Bazan, B. C. (1982). La 'quaestio disputata', in Les genres littéraires dans les sources théologiques et philosophiques médiévales. Louvain-la-Neuve. 31-50.
Becker, H., Geer, B., & Hughes, E.C. (1968). Making the grade: the academic side of college life. Wiley.
Berkey, J. (1992). The transmission of knowledge in medieval Cairo. A social history of islamic education. Princeton: Princeton University Press.
Berlak, H., Newman, F. M., Adams, E., Archbald, D. A., Burgess, T., Raven, J., & Romberg, T. A. (1992). Toward a new science of educational testing and assessment. Albany, NY: SUNY.
Borst, A. (1993). The ordering of time. From the ancient computus to the modern computer. Cambridge: Polity Press.
Bot, P. N. M. (1955). Humanisme en onderwijs in Nederland. Utrecht/Antwerpen.
Bowman, M.J., with Ikeda, H., & Tomoda, Y. (1981). Educational choice and labor markets in Japan. Chicago: The University of Chicago Press.
Broadfoot, P., Murphy, R., & Torrance, H. (eds) (1990). Changing educational assessment: international perspectives and trends. London: Routledge.
Brokaw, C. J. (1991). The ledgers of merit and demerit. Social change and moral order in late imperial China. Princeton, New Jersey: Princeton University Press.
Bullough, V.L. (1978). Achievement, professionalization, and the university. In J. IJsewijn, & J. Paquet (Eds.) The universities in the late middle ages. Leuven, at the University Press. 497-510.
Carruthers, M. (1990/92). The book of memory. A study of memory in medieval culture. Cambridge: Cambridge UP.
Clark, B.R. (Editor, 1985). The school and the university. An international perspective. London: University of California Press
Codina Mir, G. (1968). Aux sources de la pédagogie des Jésuites; le 'Modus Parisiensis.' Roma: Institutum Historicum S.I.
Coleman, Janet (1992). Ancient and medieval memories. Studies in the reconstruction of the past. Cambridge: Cambridge University Press.
Compère, M-M (1985). Du collège au lycée (1500-1850). Généalogie de l'enseignement secondaire français. Paris: Gallimard/Julliard.
Deutsch, M. (1979). Education and distributive justice: some reflections on grading systems. American Psychologist, 34, 379-401.
DuBois, P.H. (1965). A test-dominated society: China 1115 B.C. - 1905 A.D. Proceedings of the 1964 Invitational Conference on Testing Problems. Princeton: Educational Testing Service.
Eckstein, M. A., & Noah, H. J. (eds) (1992). Examinations: comparative and international studies. Oxford: Pergamon Press.
Eckstein, M. A., & Noah, H. J. (1993). Secondary school examinations. International perspectives on policies and practice. New Haven: Yale University Press.
Edgeworth, F. V. (1888). The statistics of examinations. Journal of the Royal Statistical Society, 51, 599-635.
Engel, A. (1974). Emerging concepts of the academic profession at Oxford 1800-1854. In Stone, L. The university in society. Vol I Oxford and Cambridge from the 14th to the early 19th century. Princeton University Press. p. 305-351.
Ferruolo, S.C. (1988). Parisius-Paradisus: The city, its schools and the origins of the university of Paris. In Bender, T.: The university and the city, from medieval origins to the present. Oxford: Oxford University Press. 22-46
Fischer, W., & Lundgreen, P. (1975). The recruitment of administrative personnel. In Tilly, C. (Ed.). The formation of national states in western Europe. Princeton: Princeton University Press. 456-561.
Foden, F. (1989). The examiner. James Booth and the origins of common examinations. Leeds: School of Continuing Education.
Fortgens, H. W. (1958). Schola latina. Uit het verleden van ons voorbereidend hoger onderwijs. Zwolle: Tjeenk Willink.
Frederiks, J. (1960). Ontstaan en ontwikkeling van het Zwolse schoolwezen tot omstreeks 1700. Een historische studie. Proefschrift VU. Zwolle: Tijl.
Frijhoff, W. (1981). La société néerlandaise et ses gradués, 1575-1814. Amsterdam: APA Holland University Press.
Frijhoff, W. T. M. (1990). Latijnse school en gymnasium als schooltype tot in de negentiende eeuw. In Frijhoff, W. T. M. et al.: Tempel van hovaardij; zes eeuwen Stedelijk Gymnasium Haarlem. Haarlem: De Vrieseborch. 7-24.
Frijhoff, W. (1992a). The Netherlands. In Clark, B. R., & Neave, G. R. (Eds.). The encyclopedia of higher education. Oxford: Pergamon Press. I, 491-504
Frijhoff, W. (1992b). Universities: 1500-1900. In Clark, B. R., & Neave, G. R. (Eds.). The encyclopedia of higher education. Oxford: Pergamon Press. II 1251-1259.
Geisinger, K. F. (1982). Marking systems. In Mitzel, H. E. (Ed.). Encyclopaedia of educational research. New York: The Free Press, 1139-1149.
Gifford, B.R., & O'Connor, M.C. (Eds.) (1992). Changing assessments. Alternative views of aptitude, achievement and instruction. Dordrecht: Kluwer.
Groot, A.D. de, & Wijnen, W.H.F.W. (1966/1983). Vijven en zessen. Groningen: Wolters-Noordhoff.
Guy, Basil (1963). The Chinese examination system and France, 1569-1847. In Besterman, T., Studies on Voltaire and the eighteenth century, vol. 25, 741-778. Geneva: Institut et Musée Voltaire.
Hadot, P. (1982). La préhistoire des genres littéraires philosophiques médiévaux dans l'Antiquité. In Les genres littéraires dans les sources théologiques et philosophiques médiévales. Louvain-la-Neuve. 1-10.
Hall, G. (1989). Records of Achievement. Issues and practice. London: Kogan Page.
Hanson, F. A. (1993). Testing testing. Social consequences of the examined life. Berkeley: University of California Press.
Harrison, C. (1995). Youth and White paper: The politics of literacy assessment in the United Kingdom. English Journal, feb 1995, 115-119.
Herbst, P. A. (1988). Examine the honest, praise the able. Canberra: Australian National University, Faculty of Asian Studies.
Herwerden, P.J. van (1947). Gedenkboek van het Stedelijk Gymnasium te Groningen. Groningen: Wolters.
Ho, Ping Ti (1962). The ladder of success in imperial China. Aspects of social mobility, 1368-1911. New York: Columbia University Press.
Hu, C. T. (1984). The historical background: examinations and control in pre-modern China. Comparative Education, 20, 7-26.
Ingenkamp, K. (1972). Zur Problematik der Jahrgangsklasse. Weinheim: Beltz
Jessup, G. (1991). Outcomes. NVQs and the emerging model of education and training. London: Falmer.
Julia, D. (1990). Gaspard Monge, examinateur. Histoire de l'éducation, no. 46, 111-133.
Kretzmann, N., Kenny, A., & Pinborg, J. (Eds.) (1982). The Cambridge History of Later Medieval Philosophy. Cambridge: Cambridge University Press.
Kretzmann, N., & Stump, E. (Eds.) (1988). The Cambridge translations of medieval philosophical texts. Volume one: logic and the philosophy of language. Cambridge: Cambridge University Press.
Kula, W. (1986). Measures and men. Princeton, New Jersey: Princeton University Press.
Laudan, L. (1977). Progress and its problems. Towards a theory of scientific growth. Berkeley: University of California Press.
Lawn, B. (1993). The rise & decline of the scholastic 'quaestio disputata' with special emphasis on its use in the teaching of medicine and science. Leiden: Brill.
Lee, T. H. C. (1985). Government education and examinations in Sung China. Hong Kong: The Chinese University Press.
Levine, R. A., & White, M. I. (1986). Human conditions. The cultural basis of educational developments. London: Routledge & Kegan Paul.
Lewry, O. (1982). Thirteenth-century examination compendia from the faculty of arts. In Les genres littéraires dans les sources théologiques et philosophiques médiévales. Louvain-la-Neuve. 101-116.
Lienert, G.A. (1987). Schulnotenevaluation. Frankfurt a.M.: Athenäum.
Lim, R. (1995). Public disputation, power, and social order in late antiquity.
Lindroth, S. (1976). A history of Uppsala University 1477-1977. Uppsala: Uppsala University (Stockholm: Almqvist & Wiksell).
Linn, R. L. (Ed.) (1989). Educational measurement. New York: American Council on Education; Macmillan.
Lowe, E. A. (1926). Handwriting. In Crump, C. G., & Jacob, E. F. (Eds.) (1926). The legacy of the middle ages. Oxford: Oxford University Press. 197-226.
Madigan, R., Johnson, S., & Linton, P. (1995). The language of psychology: APA style as epistemology. American Psychologist, 50, 428-436.
Makdisi, G. (1974). The scholastic method in medieval education: an inquiry into its origins in law and theology. Speculum, 49, 640-661.
Makdisi, G. (1981). The rise of colleges: institutions of learning in Islam and the west. Edinburgh University Press.
McClelland, Ch. E. (1980). State, society, and universities in Germany 1700-1914. Cambridge: Cambridge University Press.
McDaniel, M. A., & Pressley, M. (Eds.) (1987). Imagery and related mnemonic processes. Theories, individual differences, and applications. New York: Springer.
Miyazaki, I. (1976). China's examination hell. New York: Weatherhill.
Mostert, M. (1995). Kennisoverdracht in het klooster: over de plaats van lezen en schrijven in de vroegmiddeleeuwse monastieke opvoeding. In Stuip, R. E. V., & Vellekoop, C. (red.). Scholing in de middeleeuwen. Hilversum: Verloren.
Palmer, R. R. (1985). The improvement of humanity. Education and the French revolution. Princeton: Princeton University Press.
Paulsen, F. (1885/1921). Geschichte des gelehrten Unterrichts auf den deutschen Schulen und Universitäten vom Ausgang des Mittelalters bis zur Gegenwart. Berlin/Leipzig.
Perreiah, Alan R. (1984). Logic examinations in Padua circa 1400. History of Education, 13, 85-103.
Post, R.R. (1954). Scholen en onderwijs in Nederland gedurende de middeleeuwen. Utrecht: Het Spectrum.
Prahl, H. W. (1974). Abschlussprüfungen und Graden. Sozialhistorische und ideologiekritische Untersuchungen zur akademischen Initiationskultur. Dissertation Kiel.
Resnick, D. P., and Resnick, L. B. (1985). Standards, curriculum, and performance: a historical and comparative perspective. The Educational Researcher, 14(4), 5-20.
Riché, Pierre. (1989). Écoles et enseignement dans le Haut Moyen Age. Fin du Ve siècle - milieu du XIe siècle. Paris: Picard.
Ringer, Fritz (1979). Education and society in modern Europe. Bloomington: Indiana University Press.
Roach, J. (1971). Public examinations in England 1850-1900. Cambridge: Cambridge University Press.
Rohlen, T. P. (1983). Japan's high schools. Berkeley: University of California Press.
Rothblatt, S. (1982). Failure in early nineteenth century Oxford and Cambridge. History of Education, 11 (1), 1-21.
Rothblatt, S. (1993). The limbs of Osiris: liberal education in the English-speaking world. In Rothblatt, S., & Wittrock, B. The European and American university since 1800. Historical and sociological essays. Cambridge: Cambridge University Press. 19-73.
Rotours, R. des (1932). Le traité des examens, traduit de la Nouvelle Histoire des T'ang (Chap. XLIV, XLV). Paris, Librairie Ernest Leroux. Bibliothèque de l'Institut des Hautes Études Chinoises, volume II.
Rudolph, F. (1977). Curriculum. A history of the American undergraduate course of study since 1636. San Francisco: Jossey Bass.
Rijk, L. M. de (1977). Middeleeuwse wijsbegeerte. Traditie en vernieuwing. Assen: Van Gorcum. In French translation: La philosophie au Moyen Âge. Leiden: Brill.
Schoengen, M. (1898). Die Schule von Zwolle von ihren Anfängen bis zur Einführung der Reformation (1582). I. Von den Anfängen bis zu dem Auftreten des Humanismus. Freiburg (Schweiz).
Schoengen, M. (1911). Geschiedenis van het onderwijs in Nederland. Amsterdam.
Schotel, G. D. J. (1875). De academie te Leiden in de 16e, 17e en 18e eeuw. Haarlem: Kruseman & Tjeenk Willink.
Schwinges, R. C. (1986). Deutsche Universitätsbesucher im 14. und 15. Jahrhundert: Studien zur Sozialgeschichte des alten Reiches. Stuttgart: Steiner.
Schwinges, R. C. (1992). Student education, student life. In de Ridder-Symoens, H. A history of the university in Europe. Volume I, Universities in the middle ages. Cambridge: Cambridge University Press. 195-243.
Shepard, L. A., & Smith, M. L. (Eds.) (1989). Flunking grades: research and policies on retention. London: Falmer.
Smallwood, M. L. (1935). An historical study of examinations and grading systems in early American universities. Cambridge: Harvard University Press.
Spaulding, R.M. (1967). Imperial Japan's higher civil service examinations. Princeton: Princeton University Press.
Stehr, N. (1994). Knowledge societies. London: SAGE.
Stern, W. (1920). Die Intelligenz der Kinder und Jugendlichen und die Methoden ihrer Untersuchung. Leipzig: Barth.
Stokes, Rev. H. P. (1927). Ceremonies of the university of Cambridge. Cambridge, at the University Press.
Sutherland, G. (1984). Ability, merit and measurement, mental testing and English education, 1880-1940. Oxford: Clarendon Press.
Takeuchi, Y. (1991). Myth and reality in the Japanese educational selection system. Comparative Education, 27, 101-112.
Têng Ssu-yü (1943). Chinese influence on the western examination system. Harvard Journal of Asiatic Studies, 7, 267-312.
Verger, J. (1992a). Patterns. In H. de Ridder-Symoens. A history of the university in Europe. Volume I, Universities in the middle ages. Cambridge: Cambridge University Press. 35-74.
Verger, J. (1992b). Teachers. In H. de Ridder-Symoens. A history of the university in Europe. Volume I, Universities in the middle ages. Cambridge: Cambridge University Press. 144-169.
Viola, C. (1982). Manières personnelles et impersonnelles d'aborder un problème: saint Augustin et le XIIe siècle. Contribution à l'histoire de la 'quaestio', in Les genres littéraires dans les sources théologiques et philosophiques médiévales. Louvain-la-Neuve.
Wachelder, J. C. M. (1992). Universiteit tussen vorming en opleiding. De modernisering van de Nederlandse universiteiten in de negentiende eeuw. Hilversum: Verloren.
Waley, A. (1949). The life and times of Po Chü-I, 772-846 A. D. London: Allen & Unwin.
Webber, C. (1989). The mandarin mentality: civil service and university admissions testing in Europe and Asia. In Gifford, B. R. (Ed.): Test policy and the politics of opportunity allocation: the workplace and the law. Dordrecht: Kluwer. 33-60.
Weijers, O. (1987). Terminologie des Universités au XIIIe siècle. Roma: Edizione dell' Ateneo.
Wiggins, G. P. (1993). Assessing student performance. Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.
Wolferen, K. van (1995). De Japanse verlamming. Waarschuwing tot de toekomstige bureaucratische elite aan de Todai-universiteit. [Japanese paralysis. Admonition to the future bureaucratic elite at the Todai university]. NRC Handelsblad, 1 juli, p. 5. Translation by J. Engelsman of a lecture given at the department of law of the Todai-university. Engelsman: his latest book, 'The system that makes the Japanese unhappy,' is a bestseller in Japan.
Wuthnow, R. (1989). Communities of discourse. Ideology and social structure in the reformation, the enlightenment, and European socialism. Cambridge, Mass.: Harvard University Press
Yates, F. A. (1966). The art of memory. London: Routledge & Kegan Paul.
The university is one of the oldest institutions in western society. The same mechanisms behind this longevity of the university might have resulted in our copying old and even medieval forms of instruction and assessment unwittingly. That would not be a problem if those old forms were still at least as functional as they were originally, but that is not a very likely proposition.
The rules of the English nation of the university of Paris in 1252 stipulated that a bachelor coming up for the licentiate in arts 'should have heard the books of Aristotle on the Old Logic (...) at least twice in ordinary lectures and once cursorily, (...) the Topics of Aristotle and Elenci twice in ordinary lectures and once at least cursorily ....' etcetera (Thorndike, 1944, p. 227). In the early university repetition of particular courses was a strict rule, and understandably so, given the instructional method used. Today, retention is accepted practice in education, more so in continental European (with the exception of Scandinavia) than in Anglo-American countries. It has been demonstrated time and again by educational researchers (Shepard & Smith, 1989) that this practice is counter-productive and detrimental to the mental well-being of the students involved. It just might be that this practice inherits its respectability from the medieval practice of repetition of courses.
In general it will not be possible to demonstrate a definite link between assessment procedures used in, for example, renaissance higher education and possibly slightly atavistic procedures of today. An example is the work of Makdisi (1981), trying to demonstrate that the European medieval disputatio has as its ancestor the disputation as known in Muslim higher education some two centuries earlier; his hypothesis, but not his work, is controversial.
History may inform us about possibly strained relations between tradition and rational consideration in assessment. The intention of this study is to explore possible historical links for the insight they might give into the ways in which modern assessment procedures are based on tradition rather than on rational consideration (that is, whether they are baseless or not). An example of rational consideration in the context of retention would be the following (imaginary) case: a psychologist committed to the Standards for educational and psychological tests of the American Psychological Association will, in the majority of cases, not be able to recommend retention without risking legal action, given the evaluation research as summarized in, for example, Shepard & Smith (o.c.).
Educational assessment as practiced today is not what reasonably can be called a professional activity. There is a serious problem with our current assessment procedures: teachers and professors cannot consistently explain why they use the procedures they do in the way they do, nor what the relevant facts are in the historical development of these procedures. Ask teachers why they support retention, or why they think retention is bad educational practice, and you will get almost as many different answers as there are teachers you asked (Wald, 1986, asked 150 school leaders this question). Ask them how the practice of retention developed historically, and honest teachers will tell you they do not know, simply because nobody knows this. I do not know of any historical study that deals directly with this question. The conclusion must be that the practice of retention is not based on a solid body of knowledge or expertise; the corollary is that the practice of retention cannot possibly be called professional. Historical studies may shed light on why this is the state of affairs, and may give a lead as to what could be done about it.
This study is exploratory because it would not be fruitful to only let the historical data speak for themselves, without the guidance of a theoretical framework; it would be unclear what to look for in which source materials, or how to choose secondary sources. Also it cannot possibly be rational to use modern assessment theories as a guide for historical research; that would be begging the question, and it would be an invitation to trouble because of the dangers of anachronism it entails. In the absence of comprehensive studies that could show the way, the only possibility is exploration where historical search and the building of a theoretical framework go hand in hand, informing each other. Reading about the repetitive element in the curriculum of the early university of Paris brought me an 'Aha-experience;' here was an instance where repetition of a course still was a sensible thing to do, where research on retention and repetition in the present curriculum has shown it not to be sensible anymore. It must be said to the credit of the historians that they have made available useful primary (Thorndike), secondary and tertiary sources and studies. Present assessment theory cannot possibly match that, because of its heterogeneity, the movement in the direction of authentic measurement (USA) and Records of Achievement or ROAs (GB) only adding to that heterogeneity, and its being dominated by the psychometric branch of the trade (Linn, 1989).
That comprehensive historical studies on assessment in education are lacking is quite amazing. Textbooks on educational measurement either have nothing to say on the history of assessment, or narrow it down to the history of standardized testing. There are however many studies on special topics, such as the development of examinations in England, or the civil service examinations in the last two millennia of imperial China. The problem is, as Webber (1989) also indicated, to trace the possible connections between early developments in for example imperial China, and modern methods of assessment and personnel testing.
A number of interesting cases, or if you like 'hypotheses', will be presented, the emphasis being placed on historical materials, not on assessment theory. The point is not that these cases are important or interesting. They are. The message is that cases like these, taken together, suggest that we have inherited from the past more than we knew, and more than we wish. Consciousness of our assessment heritage makes one sensitive to the weak and inefficient spots in present-day assessment in higher education.
question-answering or catechetic questioning
One of the first teachers in Western Europe was Alcuin. He was invited by Charles the Great to found an educational system. Alcuin's method of teaching was that of questions and answers, the questions and answers to be learned by heart, of course. In his time this probably was a quite sensible method. The scarce manuscripts that were available were almost unreadable, so one first had to learn the text before the book could be 'read.' In the middle ages 'to know' was 'to know by heart' (Bolgar, 1954; Hildebrandt, 1992). This type of question-and-answering, the catechetical method, was still in use in university-level examinations in the nineteenth century (Foden, 1989). In present-day standardized tests (USA) and university entrance examinations (Japan, see Rohlen, 1983) its remnants are still discernible, where testees must know definitions etc. by heart.
the disputatio as major educational form as well as form of assessment
Easily the most fascinating form of assessment known is the medieval disputatio, already made famous by the alleged founder of the university of Paris, Abelard, through his disputational fights with William of Champeaux. It is a form of organised argument, a serious dispute with winners and losers. One had to prove one's intellectual prowess in the presence of dignitaries of the university, the church, and the town. The propositions to be defended or attacked were new propositions with no known answers. The disputational method also was the scientific method of the day; logic was the instrument to be used (Kretzmann & Stump, 1988). University examinations were disputations; being admitted to the examination by one's master in practice was a guarantee that one would get the licentiate. In a non-trivial sense the defending of a dissertation, although this certainly does not have the form of the disputatio, is a present-day equivalent (Weijers, 1987; Lawn, 1993; Ahsmann, 1990). In the Muslim world of about A.D. 1000 disputations in law could make and break the reputations of their participants, who had a very high status in Muslim society (Makdisi, 1981).
curricular organisation and the grading of pupils: Joan Cele of Zwolle
The Hanseatic city of Zwolle at the end of the 14th century had a famous schoolmaster, Joan Cele, who attracted pupils from far away. Having only two assistants, Cele had to organize a school with 900 pupils. So he invented the educational system of classes, examinations, and grouping on the basis of level of mastery (not on the basis of age). His system influenced the Parisian method of education, in its turn the basis of the influential Jesuit Ratio Studiorum. The historical ancestry of the dominant western educational method was discovered only in the sixties (Codina Mir, 1968; see also Scaglione, 1986). This conception of the educational curriculum was conditioned on the lack of teaching manpower, and reduced the teaching load using peer teaching ('Helfersystem'). Imagine Cele visiting a school or university in 1995! Is it possible that we could learn something from Joan?
from order of rank to order of merit to marking systems
Even in medieval times there were traces of meritocratic assessment in the universities, but first and foremost the order in the examinations (locatus) was determined by birth, not by achievement. In the daily practice in his own house the master used some incentives: a prize being given to the student with the best, the asinus to the student with the worst performance. Later the humanists banned punishment and made much of the system of prizes to stimulate intellectual achievement. Late in the nineteenth century systems of ranking by order of merit (in achievement and behavior) gradually were replaced by systems using marks or grades. Marking systems were seen as 'modern;' as far as I know there were no compelling reasons given to replace the simple and transparent ranking system with a pseudo-scientific system that still essentially was a ranking system. The change probably was very much in the spirit of the 19th century. Present marking systems do not seem to have a respectable ancestry; many educational researchers wonder why marking systems are used at all, given the lack of absolute norms in education (for example Hartog & Rhodes, 1936).
the introduction of meritocratic and competitive examinations in Europe
In the 18th and 19th centuries there was a fascinating development of civil service examinations in the guise of university examinations and even entrance examinations (the Prussian Abitur), especially in Germany and later also in England. In England Oxford and Cambridge had paved the way by instituting competitive examinations (the mathematical tripos). In France admission to and graduation from the School of Roads and Bridges was the key to positions of power. There is some speculation whether knowledge of the Chinese civil service examinations influenced this development; there was a China-mania in Europe in the 18th century, even philosophers like Leibniz and Voltaire participating in it (Guy, 1963). There is a definite link with state-formation in Europe: the need to have means to select more and more new civil service members, and more and more to do so on the basis of merit, not (only) rank. Typically the outcomes of examinations were over-interpreted; in general one was not aware of the limited reliability of these examinations (Edgeworth, 1888, could not change this). Validity was no issue at all; in this and some other aspects European examinations were a match to the Chinese examinations (Ringer, 1969).
growth of participation in (higher) education
Already in the nineteenth century participation in higher education was growing. In the 20th century this growth continued, spectacularly so after WW II; it changed the character of meritocratic assessment (Wilbrink & Dronkers, 1993). Diplomas gradually became very important as tickets to attractive positions in society. The fallout of this development is that children of high-ranking families also have to deliver; they now have to compete with 'outsiders' (Horowitz, 1987). Even more important: assessment is now used to legitimate the ranking and ordering decisions of the educational system. Present-day assessment is still basically humanistic, i.e. a 15th century method, rewarding achievement, and neglecting students with lesser achievements. The pressure to legitimate the sorting and selecting that is going on has resulted in the prominence of psychometric techniques that emphasize individual differences between students (Chapman, 1988), instead of the intellectual growth of the individual student (Astin, 1985; Records of Achievement).
Ahsmann, M. (1990). Collegia en colleges. Juridisch onderwijs aan de Leidse universiteit 1575-1630 in het bijzonder het disputeren. [Disputations in the faculty of Law.] Groningen: Wolters-Noordhoff / Egbert Forsten. In Dutch.
American Psychological Association (1985). Standards for educational and psychological tests. Washington, D.C.: Author.
Astin, A.W. (1985). Achieving educational excellence. San Francisco: Jossey-Bass.
Bolgar, R. R. (1954). The classical heritage & its beneficiaries. Cambridge, at the University Press.
Chapman, P. D. (1988). Schools as sorters. Lewis M. Terman, Applied Psychology, and the Intelligence Testing Movement, 1890-1930. New York: New York UP.
Codina Mir, G. (1968). Aux sources de la pédagogie des Jésuites; le 'Modus Parisiensis.' Roma: Institutum Historicum S. I.
Edgeworth, F.V. (1888). The statistics of examinations. Journal of the Royal Statistical Society, 51, 599-635.
Foden, F. (1989). The examiner. James Booth and the origins of common examinations. Leeds studies in adult and continuing education.
Guy, B. (1963) The Chinese examination system and France, 1569-1847. In Besterman, T., Studies on Voltaire and the eighteenth century, vol. 25, 741-778. Geneva: Institut et Musée Voltaire.
Hartog, Ph., & Rhodes, E.C. (1936) The marks of examiners. London.
Hildebrandt, M. M. (1992). The external school in Carolingian society. Leiden: Brill.
Horowitz, H. (1987). Campus life: undergraduate cultures from the end of the eighteenth century to the present. New York: Knopf.
Kretzmann, N., & Stump, E. (Eds) (1988). The Cambridge translations of Medieval Philosophical texts. Volume I. Logic and the philosophy of language. Cambridge: Cambridge University Press.
Lawn, B. (1993). The rise & decline of the scholastic 'quaestio disputata' with special emphasis on its use in the teaching of medicine and science. Leiden: Brill.
Linn, R. L. (Ed.) (1989). Educational Measurement. London: Collier Macmillan.
Makdisi, G. (1981). The rise of colleges: institutions of learning in Islam and the west. Edinburgh University Press.
Ringer, Fritz (1969). The decline of the German mandarins. Cambridge, MA: Harvard University Press.
Rohlen, T. P. (1983). Japan's high schools. Berkeley: University of California Press.
Scaglione, A. (1986). The liberal arts and the jesuit college system. Amsterdam: Benjamins.
Shepard, L. A., & Smith, M. L. (Eds.) (1989). Flunking grades: research and policies on retention. London: Falmer.
Thorndike, L. (1944). University records and life in the middle ages. New York: Columbia University Press.
Wald, A. (1985). Een jaartje over doen [Flunking the grade]. The Hague: Foundation for Educational Research in the Netherlands (SVO). In Dutch.
Webber, C. (1989). The mandarin mentality: civil service and university admissions testing in Europe and Asia. In Gifford, B. R. (Ed.) (1989). Test policy and the politics of opportunity allocation: the workplace and the law. Dordrecht: Kluwer. 33-60.
Weijers, Olga (1987). Terminologie des Universités au XIIIe siècle. Roma: Edizione dell' Ateneo.
Wilbrink, B., & Dronkers, J. (1993). Dilemma's bij de groei van de deelname aan hoger onderwijs. [Participation growth in higher education; dilemmas, no solutions.] Zoetermeer: Education and Science Department / Leiden: DOP. In Dutch.
short presentation
A history of assessment of student learning has not been written yet. A remarkable fact is that in books on educational measurement its history is either absent, or paid only scant attention by limiting it to the history of testing formats that starts with the innovative work of Alfred Binet late in the 19th century. Asking several Dutch researchers whether they knew of any publication on the history of systems of marking, I met astonishment at the question as well as at their inability to produce a positive answer. The typical reaction was, 'Wow, that's a terribly good question. No, I've never come across any publication on the subject.'
Of course, in journals dedicated to the history of education one may find the occasional article on the use of a marking system by the Jesuits in 17th century France. And university libraries shelve special studies like the one by Mary Lovett Smallwood on the history of examinations in American colleges and universities. But stray publications do not constitute a history. Why bother? Aren't we nowadays using sophisticated assessment instruments and techniques that owe nothing to history? No, we are not. Most assessments are done by teachers in traditional ways. What is worse: sophisticated techniques are only variants on methods that have come to us by tradition. So history is important, were it only to dispel the misleading idea that whatever these historical roots they surely have had no detrimental influence on assessment practices today.
My paper gives an outline of what a history of assessment could be. It won't be a surprise that Imperial Chinese examinations find a prominent place in this history: it is the one topic that at least regularly is mentioned in the literature on educational measurement. What will be a surprise is that this historical outline does not begin at the end of the 19th century, but ends there; that a 14th century Dutch schoolmaster is a main character in its plot; that much attention is given to an assessment format that was able to steal the show in antiquity as well as in the renaissance but does not exist any more; that Collins' thesis of the credential society comes close to characterizing the 18th and 19th century roots of our end of the 20th century assessment practices.
A simple consideration motivates this historical exercise: what yesterday was an efficient assessment procedure need not be one today. The method of questions and answers, for example, is at least as old as the Book of Genesis; today it seems to be the prime method of the educational testing industry. Learning texts by heart once was a meaningful act, the text in question probably being a holy one, and the manuscript being kept in the next monastery. The humanist prize system was to replace the habit of punishment as motivational instrument in education, but in the 20th century the mildly competitive forms of the humanists have hardened into the silly metrical competitive marking systems that poison education. The extremely competitive imperial Chinese examinations were used as a model by monarchs and states seeking to use education and examinations to their own purposes, especially to build a dependable and able civil service. The Chinese examinations, by the way, ultimately did China in: they were not so good a model to follow as many in the late 18th and early 19th century thought them to be.
Knowledge of historical facts and developments can stimulate critical reflection on today's assessment habits that all too often are taken for granted. That reflection is a necessary condition for assessment theories from outside the mainstream, like the one I presented yesterday, to get a chance to be heard and tried.
short presentation
'Educational measurement' stands for control, sorting, and selection; it is a more an instrument of oppression, surely in its high stakes forms of implementation, than part of a science of education. Historical reasons partly explain this rather sad state of affairs; I have a poster on the historical roots of our assessment practices in the session tomorrow morning. Stating that educational measurement primarily is a science of control implies that important eduational principles are absent in educational measurement. 'Knowledge of results' or 'feedback' is of vital importance in the daily activities of learners and teachers; one won't find principles like these mentioned frequently in the educational measurement literature. Oh yes, there are so-called alternatives under trendy names like 'authentic assessment,' but these still accept most of the premises that make mainstream educational measurement a threat to the quality of education. 'Feedforward' or 'backwash' is another concept that is not to be found in the vocabulary of educational measurement specialists; yet every student, teacher and politician knows that high stakes assessments induce strategic behaviours on the part of all actors involved.
There is a class of study strategies in preparation for examinations that one might define as being conditioned on the form, the content, and the circumstances of the examinations. In the literature one will find only some sociological or historical studies spelling out the relations between certain strategic behaviours of students in test preparation and the institutional characteristics of the assessment system. There is, however, at least one exception: in the seventies Bob van Naerssen specified a model for the optimal strategies of students under given examination conditions. The optimal use of time, a scarce good for students, is the crux of this model, which uses a decision-theoretic framework. I have developed this model into a general assessment model that is true to the strategic choices students have to make on a daily basis in the use of their time. The student is the 'consumer' in this theory.
The paper is about how to explain to teachers what the backwash of their testing probably is, and how, by changing the characteristics of their tests, they can influence students' strategic behaviours in preferred directions. A computer simulation of the assessment model is used to help perform this trick, making it possible to bypass the mathematical formulas of the model and approach complex situations by simulation techniques using a computerized die. The poster shows sample output of the computer program, which will run on any Macintosh but not on PCs under Windows 95, I'm afraid.
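The idea of replacing formulas by a 'computerized die' can be illustrated with a minimal sketch. It is not the program shown on the poster, and the formulas of the model are not given here. Purely for illustration, the sketch assumes that a student answers each test item correctly with a fixed probability (called mastery below) and passes at or above a cut score. The function name, the parameters and all numbers are hypothetical.

```python
import random


def simulate_pass_rate(mastery, n_items, cut_score, n_trials=10_000, seed=0):
    """Estimate the chance of passing by 'rolling a die' for every item.

    Assumption (not from the source): each item is answered correctly
    with probability `mastery`, independently of the other items.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    passes = 0
    for _ in range(n_trials):
        # One simulated test: count the items answered correctly.
        correct = sum(rng.random() < mastery for _ in range(n_items))
        if correct >= cut_score:
            passes += 1
    return passes / n_trials


if __name__ == "__main__":
    # Hypothetical test: 20 items, at least 12 correct needed to pass.
    for mastery in (0.5, 0.6, 0.7, 0.8):
        rate = simulate_pass_rate(mastery, n_items=20, cut_score=12)
        print(f"mastery {mastery:.1f}: estimated pass rate {rate:.2f}")
```

Repeating such runs for other test lengths or cut scores gives a rough impression of how a change in test characteristics might alter the return on extra preparation time, which is the kind of question the paper wants teachers to be able to ask.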
This model takes the individual student and his or her goals seriously, as well as the meta-goals of teachers. As such the model is apt to be used in the development of instruction-driven assessments, or to monitor the quality of assessment-driven instruction, the topic of the panel discussion in session 77 tomorrow. The model most definitely is not a psychometric model: it does not presuppose that individual differences are important, or that examinations are competitive.
The model is about the efficiency of examination preparation behaviour, and so also about the efficiency of the curriculum. It forces teachers to consider whether their testing is transparent to students, that is, whether students have all the information they need to prepare efficiently for examinations. The concept of transparency was introduced by De Groot in 1970, as an essential characteristic of educational tests next to their reliability and validity. My model or theory is the operationalization of the concept of transparency.
Two simple considerations inform this historical exercise. What was an efficient assessment procedure yesterday need not be one today. And: many of our assessment practices are manifestly irrational because the community of teachers is not able to present consistent reasons for using them; I am thinking here especially of the use of marking systems, and of grade retention.
1. How to assess assessment?
Assessment practices seem highly self-evident.
Faculty and students could use some reflection on these practices.
International comparison is a powerful approach in stimulating reflection, but systematic studies are scarce.
Historical cases are another possibility to provoke reflection;
their explanatory power is better than that of comparative cases.
2. Why history?
The university is an institution over 8 centuries old.
This institution has strong traditions,
among them traditions of assessment.
It follows that the history of assessment is a prerequisite
for a better understanding of today's assessment practices.
The long tradition indicates that these practices
will be resistant to change:
they are embedded in Western intellectual culture.
3. What does it mean to know something?
In the medieval monastery and convent, meditation consisted of the recitation by heart of religious texts.
Indeed in the middle ages learning texts by heart was the means of preserving this cultural heritage.
Learning grammar at the court of Charles the Great consisted of the recitation of the questions and answers from grammar books.
The 'catechetical method' of questions and answers was a general examination form until late in the nineteenth century.
4. Joan Cele, 14th century originator of Western style education
Joan Cele, friend of Gerard Groote, had to run a school with 900 pupils, with the assistance of only two Parisian masters.
So he invented the school organisation in forms.
His model was followed by the university of Paris,
by the Jesuits in their Ratio Studiorum,
and ultimately by the Western world
and even the world as a whole.
Homogeneous classes implied promotion decisions at regular intervals of six months, for example.
5. The medieval university of Paris' examinations
Every student had to have a master. The master nominated his students for the examination, but only if adequately prepared.
Proof of mastery consisted in presenting a lecture,
after only a few hours of preparation time.
Examinations typically required repetition of courses: two or three times having 'heard' (the lecture series on) book so-and-so.
The examination resulted in an order of merit, the locatus,
merit predominantly being determined by social rank.
6. The disputation: a lost examination format
The disputation characterized medieval university teaching and examining. Today the exchange of arguments in courts of law is the closest thing to this long extinct medieval public event.
The logical techniques necessary for the disputation
were learned in the baccalaureate.
Having participated in disputations was a
major requirement for examinations.
Knowledge of the classics and the commentators was essential for the contestants in disputations in law or theology.
7. Punishment or reward?
Ranking and marking systems
In Antiquity and later students were punished for making mistakes.
Humanists like Erasmus championed rewarding achievement; the best students were rewarded with 'prize books.'
The prize mechanism still existed in schools around 1900, until it was replaced by 'modern' marking systems.
For centuries students were ordered according to merit.
'Modern' marking systems do the same, but in a veiled way.
8. Competition and the state
A new development in the 18th and 19th century is the explicit competition for places in the state administration.
In France it is the concours for a place in the elite
École des Ponts et Chaussées.
In Germany measures were taken for a controlled intake of new personnel from outside the landed gentry, resulting in the Abitur of the gymnasium as a ticket of entrance for a state career.
In England the order of merit in the mathematical tripos was important for a possible state career, later to be supplanted by selective examinations for the Civil Service.
9. The global connection:
Imperial Chinese examinations as model
Competitive examinations were not an outgrowth of the prize system, but resulted possibly from the combination of the need for more civil service personnel, and the promise of the model of the Chinese examination system to fulfill this need in ways directly controlled by the sovereign or the state.
The Chinese examination system ruined the country, and went out of existence in 1905. Ironically, decades earlier the Japanese, using German advisors, introduced examinations 'Chinese style,' with some German influences. This examination system might ruin Japan some day.
http://www.benwilbrink.nl/publicaties/95HistoryAssessmentEARLI.htm