Ben Wilbrink: literature on grading (grading and marking systems)

Literature on grading

Ben Wilbrink

What is it about grading, that ritual which keeps covering up the ugly features of our harsh society? At least, for those who do not look closely. It is the feast of winner takes all. Who gets to take part in this society, and who does not? The great handing-out of grades is the distribution mechanism by which many of those doing the handing out believe they can keep their hands clean. If they think about it at all. Whoever labels others with ‘zesjescultuur’ (a ‘culture of sixes’, of settling for the bare pass mark of 6 out of 10) has at least given it a moment's thought. The true regent need not think about it for a second; his verdict is always ready. The regent therefore belongs to the winners. And you are only truly a winner once many others have been given the status of ‘worthlessness’ (Lampert, 2013) and no longer pose any threat to the winners.

For anyone who does not want to be used, but cannot withdraw from a system of constant comparative assessment, I would like to offer the opportunity to fathom the soullessness of grading. Hence this collection of relevant literature on this web page. Browsing through it should be a voyage of discovery. Those who have no time to travel can read the blog in which I try to sketch the essential character of grading, hopefully so clearly that it stays on the reader's retina once and for all. Well, I exaggerate, but the retina is apparently a part of the brain. That blog appears elsewhere, but a somewhat extended version (including literature references) is also on my own website: Zesjescultuur? Wie zeggen dat? (‘A culture of sixes? Says who?’).

A related topic is of course standard setting (cesuurbepaling: fixing the pass/fail cut-off), but that is different enough to warrant collecting its literature separately. I have written a great deal on both topics, so most of the older literature will already have been mentioned there (on grading and on standard setting). See also this page: niveautrends.htm.

The box below is an attempt to connect a good number of themes around grading, let us say under the banner of ‘zesjescultuur’ (an example of rabble-rousing under this banner: http://www.youtube.com/watch?v=yyXiNKgpRQc). It should become a well-rounded statement, mainly about grading (including rankings [Langville & Meyer, 2012 here] and accountability). A year ago there was a discussion on this theme on LinkedIn to which I contributed rather extensively (I did not start it). A piece that hits the mark is Ionica Smeets, De Wiskundemeisjes leggen het nog één keer uit: ‘Zesjesmentaliteit’. De Volkskrant, 18 May 2013, Wetenschap V7 site.

Cijferverdelingen in het voortgezet onderwijs. Een historisch perspectief en recente ontwikkelingen (Grade distributions in secondary education: a historical perspective and recent developments). By Paul van der Molen and Jos Keuning (undated). pdf Staatsblad 1870: https://www.delpher.nl/nl/tijdschriften/view?coll=dts&page=1&identifier=MMKB10:001079002:00068&sortfield=date&facets%5BalternativeFacet%5D%5B%5D=Staatsblad+van+het+Koningrijk+der+Nederlanden&facets%5Bperiode%5D%5B%5D=1%7C19e_eeuw%7C1870-1879%7C&objectsearch=examen

The report gives a brief overview of (changes in) the formal meaning of grades (Dutch legislation), with only general references to where they were published. The reader is left to find out where exactly in Staatsblad 1929 the regulation in question can be found. A chore I may have to do myself, or does Mandemakers have the exact details? [Kees Mandemakers (1996). HBS en gymnasium. Ontwikkeling, structuur, sociale achtergrond en schoolprestaties, Nederland ca. 1800-1968. Amsterdam: Stichting beheer IISG]

Zesjescultuur? Wie zeggen dat? (A culture of sixes? Says who?)

It is those placed above us who use the word ‘zesjescultuur’. Yet it is often precisely they who created the conditions for that culture, through perverse incentives or out of loveless indifference. Or out of conviction: whoever cannot or will not keep up is worth-less (Lampert, 2013). I could stop here, but I also want to tell another story. Might this zesjescultuur have to do with a power game between the education field and politics, between pupils and teachers, between schools and the inspectorate? And could we perhaps also view it playfully?

But wait, what is meant by zesjescultuur? Let me describe it this way: it refers to a supposed excess of performances that are only just sufficient (hence the sixes, 6 out of 10 being the lowest passing grade) or otherwise mediocre. It is therefore quite a generalization, and one that certainly does no justice to the efforts of many.

It is good to keep in mind that grades stand for judicia (verdicts) such as ‘doubtful’, ‘good’ and ‘cum laude’. That immediately raises the question: what is ‘good’? What is ‘doubtful’? I am not referring here to the education minister who, before the war, changed the meaning of the ‘5’ from ‘just sufficient’ to ‘almost sufficient’ (Bartels 1963). The appraisal can be of the result the pupil achieves, taking his abilities into account: value added, we might say. Or it can refer to an absolute standard laid down somewhere, which is what the Meijerink committee tried to sell us. The odd thing is that for a very long time we have used an entirely different way of appraising: a meritocratic one (Young, 1958), in which what counts above all is whether performances are ‘better’ than those of others.

‘Zesjescultuur’: that is about giving grades and getting grades. Does everyone know what we are talking about, then? Do the graders themselves know? Many know the book par excellence on the subject: ‘Vijven en zessen’ (Fives and sixes) by A. D. de Groot (1966). He describes how groups of teachers within a school try to outdo each other in the attention and time they get from pupils, and they do so through grading. De Groot had to admit, when I asked him directly, that it had not occurred to him to look for the roots of grading. That is surprising. It also indicates that hardly any grader can explain what exactly they are doing.

Our grading probably arose in the 19th century out of the then prevailing system of rank ordering. In Groningen it went like this: each pupil kept a booklet recording the mistakes made, not only his own but also those of his classmates; at the end of the year the pupil with the fewest mistakes was crowned with laurels and rewarded with a prize book. In the history of the French concours d’agrégation, Chervel (1993, p. 136 ff.) shows how the rank order from worst to best was first standardized to a rank order on the limited range from 1 to 10 instead of from 1 to the number of participants, and how the extreme scores were then left unused whenever the impression arose that the worst candidates did not really deserve those low scores, or the best the highest ones. Once in use, that modern system could be adopted elsewhere. In the Netherlands, the Stedelijk Gymnasium of Groningen was the last to replace rank ordering with grading (in 1903).
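
The standardization Chervel describes, from a rank among n candidates to a position on a scale from 1 to 10, can be illustrated with a short sketch. The linear mapping and the function name `rank_to_scale` are assumptions made for illustration; the source does not say which rule, if any, the French juries applied.

```python
def rank_to_scale(rank: int, n: int, low: float = 1.0, high: float = 10.0) -> float:
    """Map a rank (1 = worst, n = best) linearly onto the range [low, high].

    Hypothetical illustration: the linear rule is an assumption.
    """
    if n < 2:
        raise ValueError("need at least two candidates to form a rank order")
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    return low + (rank - 1) * (high - low) / (n - 1)


# 25 candidates: the worst, the middle and the best rank
marks = [rank_to_scale(r, 25) for r in (1, 13, 25)]
```

On such a scale the mark still says nothing about how much a candidate knows, only where he stands among the others; leaving the extremes unused is the first step away from pure ranking.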

Rank ordering is a pseudo-objective system for motivating pupils to do their utmost. But for centuries it did not work that way, because only numbers one and two, and sometimes three, received a reward in the form of a costly prize book. Educators have always been uneasy about this, because they too saw that the system is discouraging for almost all other pupils. That in turn has to do with whole-class teaching, which the state standardized to death in the 19th century but which once arose in Zwolle as a brilliant innovation. Joan Cele, a friend of Geert Groote, developed a curriculum based on ability groups in the 14th century. He had to: he had up to 900 pupils from this part of Europe. But his groups were heterogeneous in age, his lessons consisted of a single subject (Latin), and pupils could move on to the next group every six months once they knew the material (a kind of mastery learning). With Cele, every teaching group is a fairly homogeneous group, in which it may well be fair to work with a prize system based on rank ordering by mistakes made, or whatever. Ergo, anyone who wants to use competition in today's whole-class system should not do so at the individual level, but should divide the class into two evenly matched groups that compete in a sporting contest. Indeed, that is how it was done at the University of Leuven in the century of Erasmus: a contest between the four pedagogies, het Varken (the Pig), de Lelie (the Lily), de Burcht (the Castle) and de Valk (the Falcon).

Cele himself examined every six months which pupils could move on to the next group, and I imagine that he mainly applied an absolute standard. The average stay in one of Cele's classes was probably a little more than two half-year terms, just as at universities a lecture cycle was as a rule attended more than once.

The interesting thing about rank ordering pupils is that, apart from judging what is right or wrong, the teacher has little room for a subjective judgment of his own. That changes radically with the transition from rank ordering to grading, even though the judgment of right or wrong remains the basis. The teacher now has great freedom to play with the grading scale. In our own country we see this happen at once in Thorbecke’s Hogere Burger School (HBS): teachers always condemn almost a quarter of the pupils as unfit, whether for admission, for promotion to the next year, or in the final examination (Posthumus, De Gids 1940). Come frost or thaw, world war or not, industrialization, depression: almost a quarter is always condemned as unfit. The odd thing is that with this rigid habit the HBS teachers collectively robbed themselves of the possibility of pursuing a decent policy by way of grading. Zesjescultuur in optima forma.

Is that possible: using grading to steer? Yes, James Coleman showed in his Foundations of Social Theory how this can be investigated. I carried out that investigation for first-year (propedeuse) students in Amsterdam, see here. What we find is a collective bargaining situation. And in fact we already know that: when all students start slacking, it is hard for teachers to respond well. A different matter is whether teachers realize how this game is played. Do they actually make good use of the situation? No, they do not.

As it happens, Robert van Naerssen developed an examination model much earlier (1970) that starts from the simple observation that students prepare strategically for their examination, sometimes or often by aiming for a ‘six’, and from the equally simple conclusion that it must therefore be possible for teachers to influence that strategic behaviour through the design of their tests and above all of the examination regulations. And thereby also study results and completion rates. In short: the holy grail that everyone is still looking for. And sometimes it is found, as here. In this judo with the students' ‘zesjescultuur’, the art is to arrive at a win-win situation for everyone. And that turns out to be possible.

And then there is zesjescultuur as a dance just above the boundary of what is still judged sufficient. That dance is very risky: a test is usually only a meagre sample, and the candidate does not know exactly how well the material has been mastered, so the expected or hoped-for six can just as easily turn out to be a four. In itself this is morally neutral: both the hard-working marginal student and the idling talent have to deal with it. Nor does the limited accuracy of tests have to be a moral problem; Edgeworth already pointed out in several important articles in the late 19th century that candidates can, after all, increase their own chances of success through extra effort. A. D. de Groot pointed out in 1970 that the teacher, the institution or Cito then has the moral duty to ensure that candidates can indeed prepare effectively for the test. De Groot therefore has no problem with aiming for sixes: that is the candidates' own responsibility, and up to a point it is the teacher's duty to make it possible. For De Groot, ‘up to a point’ means that candidates must themselves be able to bear the risk of failing despite adequate preparation.
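
The risk of this dance can be made concrete with a simple binomial model, in which the test is a random sample of items and the candidate answers each item correctly with a probability equal to his true mastery. This is only a sketch in the spirit of the argument, not Edgeworth's or Van Naerssen's actual model; the test length of 40 items and the cut-off of 24 correct answers are illustrative assumptions.

```python
from math import comb


def pass_probability(mastery: float, n_items: int, cutoff: int) -> float:
    """P(at least `cutoff` of `n_items` correct) under a binomial model."""
    return sum(
        comb(n_items, k) * mastery**k * (1 - mastery) ** (n_items - k)
        for k in range(cutoff, n_items + 1)
    )


# A candidate whose true mastery sits exactly at the 60% cut-off ...
at_the_edge = pass_probability(0.60, n_items=40, cutoff=24)
# ... versus one who prepared for a comfortable margin (assumed 75% mastery).
with_margin = pass_probability(0.75, n_items=40, cutoff=24)
```

Under these assumptions the candidate who aims exactly at the cut-off passes only somewhat more often than not, while extra preparation raises the chance of passing considerably, which is Edgeworth's point.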

So it is all a game, with rules for both parties. But it is deadly serious for the relatively weak students (who also exist in top programmes such as mathematics or aerospace engineering): can they really save their skins by obtaining those sixes with great effort? Or do they cumulatively fall further and further behind? In subjects with a cumulative build-up of knowledge that may well be the case, and there it is precisely the weaker students who should aim for the higher grades.

Let me not make it more cheerful than it is: by ‘game’ I mean that there are clear rules: for a sufficient result you must score at least a ‘6’ (whether or not as an average). Is that fair? Within meritocratic thinking: yes. Edgeworth, a founder of mathematical statistics, already explained this at the end of the nineteenth century: the pupil can limit the chance of failing by preparing better. Fair enough? Not quite: a condition is that the pupil must also be able to do so effectively, as A. D. de Groot argued in 1970. And ceteris paribus: assuming that everything else stays the same or must be taken as given; but why would we do that? Change the examination regulations and testing habits. Change the education system.

So there is no such thing as a universal drive to obtain, or respectively to give, the highest possible grades

what is more: that would be truly irrational (examples? Rent seeking. Japan, Korea)

question: can I say something about the flawed philosophy behind those accusations of zesjescultuur?

yes, because often something else will be meant: too little time spent, or a lack of motivation to reach a socially attractive position through education

yes, I will still have to look for that in the philosophical corner (Michael Young? But he is a sociologist) (Michel Foucault then?) (Bourdieu and Passeron, sociology again?) (Lucas Swaine (2012). The false right to autonomy in education. Educational Theory, 62, 107-124. abstract)
yes, Brian Barry (1989). Theories of Justice. A treatise on social justice volume I. University of California Press. The following quotation struck me while I was leafing through the book at the Leiden market day, because the point is of course that an unbridled chase after grades (no constraints on the pursuit of self-interest) runs counter to important aims that we, in a civilized country like ours, have for the education of the young.
. . . justice arises from a sense of the advantage to everyone of having constraints on the pursuit of self-interest.
blurb
The relationship between teachers and pupils in grading is analogous to that between the Inspectorate and schools: Annette Roeters finds that schools are coasting (are too mediocre).
I am having unexpected difficulty working out this essay: I know too much about grading in all its contexts. But I will manage.
Ethical and philosophical issues should not be blown up here. Of course grading can easily be called authoritarian and oppressive, but keep the bigger picture in view: the enormous investment involved in preparing new generations for their possible roles in society. Assessment and grading are only instruments.
I can demonstrate this with a problem that A. D. de Groot could not solve (but Edgeworth did a century earlier, and actually De Groot himself too, in his 1970 paper): giving a justification for why Marietje just passes with a 6-minus and Piet fails with a 5.5 (which we do not round up). The substantive justification must be the validity of the test; the pass/fail boundary is of a different order. That other order is an agreement between parties, a rule of the game. There is much more to say about it, but the most important thing is its ethical side: because this is education, the pupil must be able to bear the risk of the uncertainty surrounding the sixes. Behold the zesjescultuur. The nice thing is that De Groot (1970) himself wrote down what it takes for teachers to give their pupils that capacity to bear risk: http://goo.gl/7ZgrN
McKinney, Arlise P., Kevin D. Carlson, Ross L. Mecham III, Nicholas C. d'Angelo, Mary L. Connerley (2003). Recruiters' use of GPA in initial screening decisions: higher GPAs don't always make the cut. Personnel Psychology, 56, 823-845.abstract
- This study demonstrates that there appears to be little consistency in the use of GPA as a screening tool in college recruiting. Although many decision sets support the general perception that recruiters use a minimum GPA cut score in screening—a view that is consistent with research indicating GPA is a valid predictor of job performance—more than half of the decision sets we examined appear to suggest decision rules that do not use GPA or that select against high GPA levels. Although selecting against high GPA would result in lower validity selection practices, and as a result lower utility in selection procedures, there are reasons to believe that such selection practices may be rational when other components of the staffing cycle—capacity to attract high quality applicants, the capacity to get high quality applicants to accept offers or to retain those individuals once they are on the job—are considered. Determining the rationality of these decisions, though, will require more comprehensive evaluations of staffing decisions.
  Conclusion, p. 844
- in essence still rank ordering, but with a twist: pseudo-absolute
- this is also apparent from Posthumus's law: there is always a new quarter that is judged insufficient (that must be down to the teachers, as they are the constant factor)
- with Posthumus, the teachers seem to hold the pupils in a stranglehold: after all, nothing moves? Is that really so?
- law at the UvA: time spent and expectations were also surveyed; there is an implicit negotiation.
- which is not to say that each party negotiates optimally: teachers may move in the wrong direction (making things easier for students, instead of harder, after ‘bad’ examination results in a previous cohort)
- all very nice, such a Coleman model, but what are the possible exogenous factors (the perverse incentives, so to speak): are higher grades ‘useful’? (on the labour market probably not)
- let me put it differently, stepping back from the grades: is it useful to achieve better mastery of the material than the minimum? (my old ‘right-optimal’ strategy, or what did I call it?) (resits, underlying mechanisms, 1980) (is there a build-up in the curriculum's subject matter, or does it stop at verbal competences and other time-filling uselessness?)
- Are extra efforts really rewarded? Someone who is more creative thanks to better preparation? Someone who understands the material better because he has also studied other literature? Special insight: does that fit the format of the four-option multiple-choice question at all?
- Nevertheless grading remains a form of institutional violence. That is a thesis that can certainly be defended a long way.
- But the institution is valuable, so let us not make a drama of grading.
- And so we have arrived at a more subtle meaning of the game of fives and sixes.
- Anyone who really wants to get a grip on it can take the above and turn it into policy by also taking into account that decent pass rates require sufficient time to be spent. What is sufficient time: at least the time available in the programme, say some 1600 hours a year, for pupils for whom the education is actually intended. Research into that time use is therefore needed (in higher education better known as the problem of prescribed versus realized study load, with which some programmes turned out to have considerable trouble in recent years). (Inspectorate report)

Culture? But then there are other cultures too? Such as the examination hell of the imperial examinations in China [I. Miyazaki (1976). China's examination hell. Weatherhill.], or the jukensenso (examination war) in Japan (e.g. Zeng, 1995). South Korea too (Asia Times). Radically different. But better?

Exam fever reigns. This year it is the fear of fives in more than one core subject that drives the temperature up further. It is not a crazy idea to claim that it is precisely politics which, under the slogan ‘raise the bar’, has given the notion of ‘zesjescultuur’ extra substance. After all, it is a metaphor from a sport in which it does not count how far you clear the bar, only whether you just make it over the bar at this height. Apparently the numbers game in education and examinations easily invites odd imagery.

It had not dawned on me at first that the zesjescultuur being talked about in recent weeks is that of the schools (not of the pupils). At least, that is the message Annette Roeters is spreading in response to the Inspectorate's annual education report. That is a peculiar use of language: apparently it is about the Inspectorate handing out grades to schools. I will write something about it, but the coherent picture I am going to sketch of this supposed zesjescultuur really does start from grading within schools.

The actual situation is that individual pupils can indeed obtain much better grades by working harder, but that this does not hold for groups of pupils. The ‘norms’ for grading are decidedly elastic and adapt smoothly to changed pupil behaviour. That holds even for the centrally set written final examinations.
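
This elasticity can be sketched with a toy model of norm-referenced grading. It assumes, purely for illustration, that the cut-off is always placed so that a fixed quota of the cohort falls below it (here a quarter, echoing Posthumus); the scores and the function name `grade_cohort` are made up.

```python
def grade_cohort(scores: list[float], fail_quota: float = 0.25) -> tuple[float, float]:
    """Place the cut-off so that `fail_quota` of the cohort fails.

    Returns (cut-off score, realized share of failing pupils).
    """
    ranked = sorted(scores)
    n_fail = int(len(ranked) * fail_quota)
    cutoff = ranked[n_fail]  # lowest score that still passes
    share = sum(s < cutoff for s in scores) / len(scores)
    return cutoff, share


cohort = [float(s) for s in range(40, 80, 2)]  # 20 distinct, made-up scores
harder_working = [s + 10 for s in cohort]      # every pupil improves by 10 points

before = grade_cohort(cohort)
after = grade_cohort(harder_working)
```

In this sketch the cut-off simply moves up with the cohort, so the failing share stays at a quarter even though every pupil scores higher. Only a pupil who improves relative to the others changes his own outcome, and then at the expense of someone else.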

Whoever knows these mechanisms can turn them to account in policy. Attempts are now under way in several universities to do so, at least for the first year of study (among others in Delft).

- ratings (obvious enough for chess, with its Elo rating, but why for education?)
- ergo: grading is a relative affair, there are no absolute norms, it is all elastic.
- But that is not the whole story, for there is definitely something to hold on to: (1) time spent, (2) if the elastic stays the same, value added can be estimated.
- Which leads to the observation that it is not final attainment levels as such that matter, but what the education has added and what the pupil has added (and then reward the pupil accordingly; that is, assess ipsatively rather than competitively).
- in pre-university education (VWO) the teachers win it, going by what Posthumus reported in 1940: a quarter-century of grading in the HBS shows a crushing steadfastness on the part of the teachers: almost a quarter always drops out, at admission, at every promotion, at the final examination. A world war, industrialization, sweeping social developments: none of it makes any difference.
- So there is also a cultural or, if you like, a psychological problem. In a meritocratic context, whatever.
- Roeters (Inspectorate: education report). The Inspectorate is not satisfied with performance below the average (are you?). We still see that crude illogic now and then among naive journalists, but now apparently also at the top of the Inspectorate.
- study strategies: always scraping through leads to problems in the long run. But that applies far less to fragmented programmes than to programmes with a clear build-up. And far less to programmes that emphasize learning to think and problem solving than to programmes in which subject expertise is built up.
Not everyone can excel, for then the notion of excelling takes on a different meaning. Realizing that, we can do something about forever comparing pupils WITH EACH OTHER instead of WITH THEMSELVES. Recognize in this a plea for steering on value added, although that is still quite hard to realize in practice.
Fons van Wieringen, in his farewell address as chairman of the Onderwijsraad (Education Council):
“Of all students, 87% have a part-time job. For law students and students in teacher training this is 95%; economics students take it somewhat easier at 80%. The number of hours per week that students spend on their part-time job is not negligible either. Half of the students work between 10 and 20 hours a week. A student is expected to study 42 weeks of 40 hours, which is 1,680 hours a year, but that is seldom achieved. The average in the Netherlands is 1,300 hours; in Flanders a considerably higher figure of 1,746 hours is reported. Over a three-year programme Flemish students therefore study well over one year more; it is hard to imagine that this does not lead to differences in level.”
The problem that A. D. de Groot could not solve: there is no valid substantive reason why Marietje, with just enough points, is promoted while Pietje, with just too few points, has to repeat the year. De Groot could have thought ‘out of the box’ for a moment: consider that there may be other good reasons for taking this kind of decision, even if in specific cases they cannot be defended on substantive grounds. Define the situation as a game, with rules. Given the examination regulations, you must scrape together enough points to be promoted. If you fail to do so, you may have had bad luck, or this programme is a touch too demanding for you. The brilliance of A. D. de Groot's demand for transparency is that this is precisely the condition under which the game can be played in this way. It is the teachers' task to safeguard that transparency.
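
The arithmetic in this quotation can be checked in a few lines. The figures are the ones quoted; expressing the three-year difference in Dutch study years of 1,300 hours is an assumption about how "well over one year" was reckoned.

```python
NOMINAL_HOURS = 42 * 40   # expected study load per year, in hours
NL_HOURS = 1300           # reported Dutch average per year
FLANDERS_HOURS = 1746     # reported Flemish average per year

# Extra hours studied in Flanders over a three-year programme
extra_hours_3y = 3 * (FLANDERS_HOURS - NL_HOURS)

# Assumption: the surplus is expressed in Dutch study years of 1,300 hours
extra_dutch_years = extra_hours_3y / NL_HOURS
```

The surplus comes to 1,338 hours, slightly more than one realized Dutch study year; measured against the nominal 1,680 hours it would be less than a full year.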

A. D. de Groot (1970). Some badly needed non-statistical concepts in applied psychometrics. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 26, 360-376. Full text: http://www.benwilbrink.nl/publicaties/70degroot.htm
(Novum) - A very large group of Dutch schools only just meets the minimum requirements, and very few schools perform well. Dutch education excels in mediocrity, the Inspectie van het Onderwijs (Inspectorate of Education) concludes in its annual report.
There are not many weak schools left, but according to the inspectorate that is all that can be said. Inspector-general Annette Roeters writes that she sees stagnation. "In recent years there was a clear positive development at the bottom end, so that there are now hardly any weak schools. But that is where it stops."
According to Roeters, the Netherlands has 'very, very few' excelling pupils and schools. "The number of well-performing pupils even seems to have been declining in recent years."
Schools are too inclined to think they no longer need to do their best once they meet the minimum standards, Roeters fears. "That is a worrying development."
From the current text of Toetsvragen ontwerpen (Designing test items), section 2.6 on validity html
A special and at the same time universal phenomenon is the tacit negotiation between teachers and students about the strictness of assessment. Ah, Posthumus's law, you will think. Certainly, but the point now is the underlying mechanism, the strategic behaviour of teachers (beautifully described by Adriaan de Groot in his Vijven en zessen of 1966), against which students as a group set their own strategy: a form of slacking, called ‘zesjescultuur’ ever since a crude remark by our prime minister Balkenende. Does this sound familiar, or rather vague? Well, the phenomenon can be investigated perfectly well (Coleman (unpublished), Wilbrink 1992a, 1992b); Becker, Geer and Hughes (1968) described a sociological case, and James Coleman (1990) provided a methodological apparatus for it. In theory this tacit negotiation need not affect the way examination questions are designed; in practice it is an illusion to think that validity is not threatened by it.
H. Becker, B. Geer & E. C. Hughes (1968). Making the grade: the academic side of college life. New York: Wiley. html and html download: https://annas-archive.org/search?q=Making+the+grade%3A+the+academic+side+of+college+life
James S. Coleman (1990). Foundations of social theory. Harvard University Press.
James S. Coleman (1994 unpublished). What goes on in school: A student’s perspective. html
A. D. de Groot (1966). Vijven en zessen. Wolters-Noordhoff.
Ben Wilbrink (1992). Modelling the connection between individual behaviour and macro-level outputs. Understanding grade retention, drop-out and study-delays as system rigidities. html
Ben Wilbrink (1992). The first year examination as negotiation; an application of Coleman’s social system theory to law education data. html
At first sight less interesting, but potentially a threat to educational quality, is the making of compromises wherever it is difficult or impossible to ask directly about specific knowledge. Almost every formal assessment situation is an artificial one, so that in this sense test items by definition cannot be a perfect match with the knowledge the course is about.
blog: Eindexamens hbo: model CSE? (Final examinations in higher professional education: modelled on the central written examination?)
The ‘minimal effort’ in my blog does not refer to a supposed zesjescultuur. On the latter: the Dutch system fixes minimum final attainment levels, the consequence of which is of course that the duration of study must then be flexible. The English system does it the other way round. The latter is of course much more convenient logistically. A standardized final level is useful only to those who believe that such a thing offers guarantees for life, which it does not.
My ‘minimal effort’ refers to the effort the programme has to make. Being able to select brilliant students at the gate means that you need do little more after that. Like a hospital that admits only patients with a common cold. That is how top institutions such as Harvard do it, which is a social scandal (Harvard has, incidentally, done something about its own lamentable effort in recent years; hopefully that has succeeded).
Investeren in vermogen. Sociaal en Cultureel Rapport 2006. download report as pdf

“Many of the initiatives focus mainly on developing talent in science and technology. Special programmes for talented students have in common that they involve selection, small scale and intensive contact between (partly foreign) students and highly qualified teachers. Egalitarian thinking and the zesjescultuur clearly seem to be on the wane.
Judging by the popularity of exam-preparation courses for HAVO and VWO pupils, the zesjescultuur seems to be somewhat on the wane, at least among some pupils in secondary education. These courses, run by universities, enjoy enormous popularity. The exam course of Leiden University started ten years ago with twenty participants and had more than a thousand this year. By no means all of them are pupils who need a pass mark in order to graduate; 40% take the course to obtain a higher grade or to be admitted to a programme with a numerus fixus (Van Zweden 2006). It is unclear to what extent this development is related to the increase in the number of programmes with selective admission. Such a connection is quite possible, as experience in countries with more selective systems of higher education shows. Greater differentiation by level in higher education presumably casts its shadow ahead onto secondary education.”
Van Latent naar Talent: een inventariserend onderzoek naar talentontwikkeling in het voortgezet onderwijs. Marianne Rensema, Inspectie Van Het Onderwijs Nederland, Hariet Pinkster. pdf
Introduction:
The new cabinet wishes to rank among the top five knowledge economies. To realise this wish it is important, among other things, to make use of all talents in education and to move beyond the zesjescultuur. ( . . . ) Ultimately we hope to be able to include promising indicators, which objectively reflect how much schools contribute to talent development, in the assessment framework (waarderingskader), our instrument for assessing the quality of education.
http://www.youtube.com/watch?v=yyXiNKgpRQc
“Published on 14 Nov 2012

A film about the Dutch zesjescultuur. The zesjescultuur that is widely acknowledged to be a persistent problem. This film is about the background of this culture. But above all it addresses the need and the possibilities to change it. Because that can be done!
In the film we hear opinion leaders such as Alexander Rinnooy Kan, Paul Schnabel, Robbert Dijkgraaf and Hans Wijers. In addition, we see and hear youth experts from academia and from educational practice.
The film, made on behalf of the Platform Beta Techniek, appears this month and goes to all schools in the country. The aim is to help schools form their own vision and get to work actively on the subject.
Because it is in the schools and with the teachers that it has to happen. They can make the difference.”

abstract

Ben Wilbrink (1992). Modelling the connection between individual behaviour and macro-level outputs. Understanding grade retention, drop-out and study-delays as system rigidities. In Tj. Plomp, J. M. Pieters & A. Feteris (Eds.), European Conference on Educational Research (pp. 701-704). Enschede: University of Twente. Paper: author. html

Ben Wilbrink (1992). The first year examination as negotiation; an application of Coleman's social system theory to law education data. In Tj. Plomp, J. M. Pieters & A. Feteris (Eds.), European Conference on Educational Research (pp. 1149-1152). Enschede: University of Twente. Paper: author. html

Ben Wilbrink (1995). What its historical roots tell us about assessment in higher education today. 6th European Conference for Research on Learning and Instruction, Nijmegen. Paper: author. html

First attempt to present a coherent account of this subject. It was not the first public presentation; that was the one for the staff of Cito, in 1993.

Ben Wilbrink (1995). Leren waarderen. html draft

SVO project

Ben Wilbrink (1997). Assessment in historical perspective. Studies in Educational Evaluation, 23, 31-48. html

This web page also keeps track of the later literature (and, for that matter, earlier literature as well :-)

George F. Madaus (1994). Book review of F. Allan Hanson, Testing Testing. American Journal of Education, 102, 222-234

p. 230
"( . . ) the invention of the quantitative mark by William Farish in 1790, a key development in testing’s history because of the bureaucratic potential of the quantified mark. Farish’s invention made it possible to accumulate and aggregate student marks, organize them, rank them, classify them, form categories, determine averages, fix norms, describe groups, compare results across units of aggregation, and fix individuals and groups in a population distribution. Farish’s invention eventually opened up a whole new technique for program and school-level accountability (Madaus and Kellaghan 1992)."
F. Allan Hanson (1993). Testing testing. Social consequences of the examined life. University of California Press. online

Madaus, G. F., & Kellaghan, T. (1992). Curriculum evaluation and assessment. In P. W. Jackson: Handbook of research on curriculum. New York: Macmillan (119-154). [POW NASLAG 81.62]

To my surprise this contains an extensive section, Major historical developments in the evaluation component of the curriculum, 121-126. What I am looking for is more information about this Farish; for that, see Hoskins, K. (1968). The examination, disciplinary power and rational schooling. History of Education, 8, 135-146. (on Farish) p. 121:
Certainly, the 16th century had many examination enthusiasts. For example, Philip Melanchton, the great Protestant German teacher, is quoted as saying in his De Studiis Adolescentum that ‘no academical exercise can be more useful than that of the examination. It whets the desire for learning, it enhances the solicitude of study while it animates the attention of whatever is taught.’ (quoted in Hamilton 1853, 769).
Hamilton, W. (1853). Discussions in philosophy and literature, education and university reform. London: Longman (2nd edition)
p. 121, an incomprehensible generalization:
The Jesuits can probably take credit for the widespread acceptance of examinations as a means of raising educational standards in European universities.
Referring to Aries 1962, on a contract between the city fathers of Treviso and their schoolmaster in 1444: payment by results. Madaus and Kellaghan read far too much into this single case, and far more than the more cautious Aries does.
p. 121, payment by results: a series of publications ranging from 18th-century Ireland to Jamaica.
p. 122:
There were many contemporary critics of the Payment by Results scheme. Matthew Arnold, a school inspector in England at the time, rendered a classic indictment of the scheme when he described it as: ‘a game of mechanical contrivance in which teachers will and must learn how to beat us. It is found possible by ingenious preparation to get children through the . . . examination in reading, writing and ciphering, without their really knowing how to read, write and cipher.’ (quoted in Sutherland 1973, 52)
Sutherland, Gillian (1973). Elementary education in the nineteenth century. London Historical Association.
Arnold’s observations illustrate how high-stakes tests are believed to influence both teacher and pupil behavior. Examinations, through the associated rewards or sanctions, are perceived to exercise prescriptive authority on the curriculum and teaching, delimiting and eventually defining what and how things are taught and learned.
The measurement of educational products. 1918 NSSE yearbook.

Kurt F. Geisinger (1982). Marking systems. In Mitzel, H. E. (Ed.). Encyclopaedia of educational research. The Free Press, 1139-1149. abstract

Provides useful information on all sorts of things, e.g. how pass/fail grading works. Found a few useful references. Worked through in full.

"Smith and Dobbin (1960) and Cureton (1971) reported that percentage grading was the most popular system during the latter half of the nineteenth century and early part of the twentieth. In this system, a teacher assigns each student a number between 0 and 100; often this number is supposed to correspond to the percentage of the material that the student has learned. This procedure implies an analogy between educational and physical measurements that is more apparent than real. Perhaps a more serious problem was the finding of Starch (1913) that teachers were generally unable to make distinctions of less than 4 to 7 points out of the total of 100. Therefore, they suggested that only scores that are multiples of five be used. Furthermore, the entire scale in percentage grading is rarely used, because scores below 50 are infrequently given. Equipped with the above information, most educational institutions switched from numerical to letter grades during the 1930s and 1940s, and most recently only about 16 percent of American high schools use percentage grading (Hills, 1976)."

Cureton, L. W. (1971). The history of grading practices. National Council on Measurement in Education: Measurement News, 2 (Whole No. 4).
Starch, D. (1913). Reliability and distribution of grades. Science, vol. 38. Leiden: Museum Natuurlijke Historie.
Smith, A. Z., & Dobbin, J. E. (1960). Marks and marking systems. In C. W. Harris (Ed.), Encyclopedia of Educational Research. New York: Macmillan, 783-791.

"Prescription For Learning"

"An A is not an A is not an A: A History of Grading"- Dr. Mark Durm May 05, 1999
Last week I reread an article that Dr. Mark Durm wrote for The Educational Forum in the spring of 1993. The article is very interesting in that it deals with the history of grading in college systems. I hope you will enjoy this summary of "An A is not an A is not an A: A History of Grading", written by Dr. Durm, Professor of Psychology at Athens State University.
The first question asked is "why do most schools use the A, B, C, D, and F as a grading system? Why are there divisions of grades?" Durm (1993, p. 294) writes that Finkelstein, in 1913, was also concerned about this matter, stating that "great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment . . . What faults appear in the marking systems that we are now using?"
This is definitely true today. Students stress out totally if they think they are not getting an A, and as Durm suggests, they are more concerned with the grade than with the learning of the subject. There seems to be (among some students) no intrinsic motivation to learn. The A is the force that drives them!
Durm (1993) writes that grading was first used at Yale in 1785, and was based on a list of descriptive adjectives. These may have been the first academic grades given in the United States. Yale later created a scale in the early 1800s based on a scale of four. This might have been the origin of the 4.0 system. There was still no letter grade attached.
After 1813, there were other attempts to evaluate students. In 1830, Harvard initiated a numerical grading system based on a scale of 20! "In 1837, mathematical and philosophical professors at Harvard used a scale of 100. Yale, which had used the 4.0 scale in 1813, apparently changed to a 9.0 scale. In 1832 the faculty returned to a 4.0 system" (Durm, 1993, p. 296).
Other colleges were also working with a grading system. William and Mary used expressive adjectives, and after 1850 went to a numerical scale. The University of Michigan used a numerical system, and then in 1851 went to a pass-no pass system. Later, in 1860 they returned to a 100 scale numerical system. However, in 1867 they adopted a letter system. (Durm, 1993) It appears that colleges were trying to find their way through the grading maze.
Durm (1993) writes that "evidently such experimentation with grading systems at universities was the norm" (p. 296). Harvard used numerical scales (1877), letter grades (1883), classifications of groups of students (1884), classes (1886), and classifications for merit (1895). In 1897, Mount Holyoke used letters for grading students. The following was used:
- A. Excellent, equivalent to percents 95-100
- B. Good, equivalent to percents 85-94
- C. Fair, equivalent to percents 76-84
- D. Passed, equivalent to percent 75
- E. Failed (below 75) (p. 296)
Durm states that this was the beginning of the college grading system. There have been some alterations and additions, including that of including the letter grade in conjunction with the point scale. This history also shows that colleges have indeed struggled with the grading of students.
It is a struggle today. Durm states the following at the beginning of his article. "It seems, for some, that securing a higher grade point average takes precedence over knowledge, learning career-related skills, and other aspects needed to compete in today's world. This fact, coupled with the realization that many college students will, if given a choice, opt for the "easy teacher" rather than one from whom they may learn more, should make teachers reexamine the current system of grading" (p. 294).
E-mail Dr. Diane Hudson

--------------------------------------------------------------------------------------------------
Education historian Mark Durm (1993) provides a detailed description of the history of grading practices in American universities. Briefly, marking or grading in American education first began at Yale University in the 1780s when a four-point scale was used. "In all probability," Durm notes, "this was the origin of the 4.0 system used by so many colleges and universities today" (p. 295). There is no record of William and Mary using a numerical scale until 1850. Harvard University's first numerical scale was initiated in 1830; however, this was a 20-point scale instead of a 4-point scale. As a precursor to letter grades, Harvard began classifying students into "Divisions" in 1877:
- Division 1: 90 or more on a scale of 100
- Division 2: 75-89
- Division 3: 60-74
- Division 4: 50-59
- Division 5: 40-49
- Division 6: below 40
In 1897, Mount Holyoke College began using the letter grade system, which is so widely used in education today:
- A: Excellent - equivalent to percents 95-100
- B: Good - equivalent to percents 85-94
- C: Fair - equivalent to percents 76-84
- D: Passed (barely) - equivalent to percent 75
- F: Failed - below 75
Perhaps the most interesting point about grading practices at these institutions is that prior to utilizing a numeric scale, all three institutions relied solely on written descriptions of students' performance—what today are commonly called anecdotal reports. [from A Comprehensive Guide to Designing Standards-Based Districts, Schools, and Classrooms by Robert J. Marzano and John S. Kendall http://www.ascd.org/portal/site/ascd/template.chapter/menuitem.b71d101a2f7c208cdeb3ffdb62108a0c/?chapterMgmtId=8fd0a2948ecaff00VgnVCM1000003d01a8c0RCRD]

John A. Laska & Tina Juarez (Eds.) (1992). Grading and marking in American schools. Two centuries of debate. Thomas. abstract

- S. G. B. [only initials known] (1840). Weekly reports in school. The Common School Journal, 2, 185-187 [11-14];
- Winfield Scott Hall (1906). A guide to equitable grading of students. School Science and Mathematics, 6, 501-510 [15-19];
- Meyer, M. (1908). The grading of students. Science, 28, 243-250 [20-27];
- Daniel Starch and Edward C. Elliott (1913). Reliability of grading work in history. School Review, 21, 676-681 [28-32];
- S. L. Pressey (1925). Fundamental misconceptions involved in current marking systems. School and Society, 21, 736-738 [33-36];
- Robert Oliphant (1986). Letter to a B student. Liberal Education, 72, 183-187 [37-42];
- Henry C. Morrison (1926). The practice of teaching in the secondary school. Chicago: University of Chicago Press (exc. pp. 35-40; 74-75; 79-81) [45-52];
- Carleton Washburne (1932). Adjusting the school to the child. Yonkers-on-Hudson, New York: World Book Co. (exc.: 1-9, 159-167) [53-62];
- Jon J. Denton and Kenneth T. Henson (1979). Mastery learning and grade inflation. Educational Leadership, 37, 150-152 [63-66];
- William Spady (1987). On grades, grading and school reform. Outcomes, 7-12 [67-79];
- Horace Mann (1848). Lectures on education. Boston: Fowle (exc. 104-105) [80-81];
- Francis W. Parker (1894). Talks on pedagogics. Chicago: Kellogg (exc. 363, 366-371) [82-84];
- Stephen S. Colvin (1912). Marks and the marking system as an incentive to study. Education, 32, 560-568 [85-92];
- William L. Wrinkle (1935). School marks - why, what and how? Educational Administration and Supervision, 21, 218-225 [93-100];
- William Glasser (1971). Reaching the unmotivated. The Science Teacher, 38, 18-22 [101-104];
- Robert L. Ebel (1980). Failure of schools without failure. Phi Delta Kappan, 61, 386-388 [105-112];
- Agnes M. Lathe (1889). Written examinations - their abuse, and their use. Education, 9, 452-456 [117-120];
- Ralph W. Tyler (1935). Evaluation: a challenge to progressive education. Educational Research Bulletin, 14, 9-16 [121-128];
- Lemuel R. Johnston (1950). Are there better ways of evaluating, recording, and reporting pupil progress in the junior and senior high schools? NASSP Bulletin, 34, 73-89 [129-135];
- Lucille Morris (1952). Evaluating and reporting pupil progress. The Elementary School Journal, 53, 144-149 [136-142];
- Doug A. Archibald and Fred M. Newmann (1988). Beyond standardized testing: assessing authentic academic achievement in the secondary school. Reston, Va.: National Association of Secondary School Principals (exc. 1-4, 25-32) [143-148].

Michael Young (1958). The rise of the meritocracy 1870 - 2033. An essay on education and equality. London: Thames and Hudson.

Recent literature on the theme:
Ben Wilbrink (1997). Terugblik op toegankelijkheid: Meritocratie in perspectief. In Marian van Dyck: Toegankelijkheid van het Nederlandse onderwijs. Studies (p. 341-384). Den Haag: Onderwijsraad. 97MeritocratieORaad.htm [That file also keeps track of more recent literature]
Pedagogische Studiën published a special issue on meritocracy in 2004 (81-2). Unfortunately not freely available online.
Kenneth Arrow, Samuel Bowles & Steven Durlauf (Eds) (2000). Meritocracy and Economic Inequality (p. 5-16). New Delhi, Oxford University Press. isbn 0691004684 site

W. A. Mehrens & B. G. Rogers (1970). Relations between grade point averages and collegiate course grade distributions. The Journal of Educational Research, 64, #4. abstract

An important article, if only because it states with exceptional clarity what the typical American policy on grading is. For example:

"Measurement specialists, in general, advocate the use of relative marking systems in preference to absolute systems. But since relative systems typically permit the instructor to control the percentages of A's, B's, etc. to be awarded, it is possible for these proportions to vary widely between classes, even for groups of similar abilities. Accordingly, many institutions seek to establish a uniform grade distribution policy and encourage or require adherence to it by the faculty."

Roy D. Goldman (1974). Grading practices in different major fields. AERJ 11, 343-357 DOI:10.3102/00028312011004343 abstract & scihub pdf

Roy D. Goldman & Mel H. Widawski (1976). A within-subjects technique for comparing college grading standards: Implications in the validity of the evaluation of college achievement. Educational and Psychological Measurement, 36, 381-390. abstract

Grading standards do not differ capriciously among different fields. Instead, those fields with the “best qualified” students (i.e., highest HSGPA and SAT scores) have the most stringent grading standards. This finding is well in accord with Helson’s (1947) adaptation-level theory. Apparently the assessment of human performance is rarely absolute - people are judged in comparison to their peers. If their peers are very highly qualified, then judgment is rigorous.
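
The point that people are judged in comparison to their peers can be made concrete with a small sketch. It contrasts absolute marking (fixed cut-offs) with relative marking (a fixed share of each letter per class). Everything specific in it is an assumption for illustration only: the cut-off scores, the 20/30/30/20 quota and the two example classes are hypothetical and are not taken from Goldman & Widawski or from Mehrens & Rogers.

```python
from collections import Counter

# Hypothetical absolute cut-offs: minimum score required for each letter.
ABSOLUTE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (0, "D")]

# Hypothetical relative quota: share of the class that receives each letter.
RELATIVE_QUOTA = [("A", 0.2), ("B", 0.3), ("C", 0.3), ("D", 0.2)]


def absolute_grades(scores):
    """Grade every score against fixed cut-offs, independent of the class."""
    grades = []
    for score in scores:
        for minimum, letter in ABSOLUTE_CUTOFFS:
            if score >= minimum:
                grades.append(letter)
                break
    return grades


def relative_grades(scores):
    """Grade by rank within the class; the quota fixes the distribution."""
    # Indices of the students, best score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    position = 0
    cumulative = 0.0
    for letter, share in RELATIVE_QUOTA:
        cumulative += share
        upper = round(cumulative * len(scores))
        for i in order[position:upper]:
            grades[i] = letter
        position = upper
    return grades


# Two hypothetical classes of ten students: one strong, one weak.
strong_class = [95, 93, 91, 90, 88, 87, 85, 84, 82, 81]
weak_class = [75, 73, 71, 70, 68, 67, 65, 64, 62, 61]

for name, scores in (("strong", strong_class), ("weak", weak_class)):
    print(name, "absolute:", dict(Counter(absolute_grades(scores))))
    print(name, "relative:", dict(Counter(relative_grades(scores))))
```

Under the relative quota both classes end up with the same grade distribution, so a score of 81 in the strong class receives the lowest letter while a score of 75 in the weak class receives the highest. That is the sense in which judgment becomes more rigorous when the peers are highly qualified.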

A. Christopher Strenta & Rogers Elliott (1987). Differential grading standards revisited. Journal of Educational Measurement, 24, 281-291.

Dorothy C. Holland & Margaret A. Eisenhart (1990). Educated in romance. Women, achievement, and college culture. University of Chicago Press. Relevant passages: p. IX, 6 lines from the bottom; 237, top; 259, Horowitz; 165, 2nd, 3rd, 4th paragraphs; 171, 3rd par., last par.!; 172, 4th par. ‘teachers . . . ’; 173, 5th, 6th, 7th, 8th; 179, end of 2nd, 3rd & last par.; 178, 2nd, 3rd, 4th par.; 179-180; 194, 3rd.

Claude Montmarquette & Sophie Mahseredhan (1989). Could teacher grading practices account for unexplained variation in school achievements? Economics of Education Review, 8, 335-343. abstract

Jerome E. Singer (1964). The use of manipulative strategies: Machiavellianism and attractiveness. Sociometry, 27, 128-150. preview

Negotiating:
"In an explorative study of the utility and efficacy of manipulative strategies of behavior, positive relationships were found between machiavellianism and students’ grades with abilities held constant. Further studies demonstrated that there were birth order effects: later-born males are more successful as manipulators than first-born. Evidence was presented that women also use manipulative strategies, those of physique. Again, there were birth order effects: there was a significant partial correlation between attractiveness and grades for first-born girls but not later-borns. It was then found that first-born girls are more concerned about their physique and are more apt to make themselves noticed."

Howard S. Becker, Blanche Geer & Everett C. Hughes (1968). Making the grade: the academic side of college life. Wiley. Reprinted 1995 by Transaction site

This is a fantastically good book on the day-to-day reality of the GPA, described mainly from the students' point of view. It is the only good description of the American grading system: the authors take the trouble to state precisely what the rules etc. are, which is very rare because nearly all authors assume the system and its rules to be known. Very explicitly (e.g. pp. 66 and 68) the authors indicate that the student in his class negotiates with the teacher about the grades (exactly the verbal Coleman model!), and to that end is constantly trying to find out what is needed to score a decent grade (transparency, in other words, in a 1960 situation in which students were kept very much in the dark about which performances would be judged how, and how the various judgments would eventually result in the grade for the course in question). A far, far more interesting book than that of Pollio et al. 1986. In a sense this book provides, much better, the kind of information I would myself like to see collected about the grading system on a visit to the USA (an old suggestion of Hofstee's: go and see how those GPA systems function, if you really want to make something of that compensatory examination regulation). It is a harsh story, and it shows once more, even without quantifying it, that a tough system does not result in virtually everyone staying on board. Bear that in mind when there are pleas to tighten the rules, also in the interest of students: the USA situation shows that even then there are heaps of drop-outs. These authors make no attempt to find out why that is so, although they do show, and present very clearly, that students often think it is due to their own failure, and that teachers think it is due to a lack of ability.

Relevance to the Coleman model:
"( . . . ) faculty fail to give sufficient weight to the pull of other interests. They do not see, for instance, that the student may not be able to afford any more interest in their course because he needs to devote time and effort to another course that is giving him more trouble. They see even less that the student feels he may not be able to afford any further interest because he thinks that other rewards available in organizational activity and personal relationships are equally important and that academic rewards must be balanced against that competition. They do not understand, in short, that from the student's point of view true maturity consists in striking that balance in a reasonable way. It is probably incorrect to say, as we just have, that faculty do not know these things. We could put it more precisely by saying that what they do not see is the legitimacy students accord to this competition to the interest their course should generate, the legitimacy that arises from its grounding in the students' view of maturity. In this sense, they do not understand that able students do not feel free to strike the kind of bargain faculty members propose, for to do so would be immature and unbalanced. It is likely that students willing to make such a bargain are, from the student point of view, unbalanced, for they would be students who had no other interests, who were insensitive to the attraction of other worthwhile activities possible on campus. ( . . . ) Even though we are primarily concerned with students' definitions of the classroom situation, we rely, in the analysis that follows, largely on our own observations of classroom interaction (..): students do not typically describe the situation but rather talk about their difficulties with it. Thus students do not often express the notion that the classroom is a place where grades are exchanged for academic performance. But they do talk about the difficulties involved in holding up their end of the exchange, in a way that implies that definition of the situation."

C. R. Snyder & Mark Clair (1976). Effects of expected and obtained grades on teacher evaluation and attribution of performance. Journal of Educational Psychology, 68, 75-82. abstract

The present evidence, then, supports a notion that a teacher can get a "good" rating simply by assigning "good" grades. The effect of obtained grades may bias the students' evaluation of the instructor and therefore challenges the validity of the ratings used on many college and university campuses.

Richard M. Warren (1985). Criterion shift rule and perceptual homeostasis. Psychological Review, 92, 574-584. abstract

A criterion shift rule is proposed, which considers that the bases employed for evaluative judgments are displaced in the direction of a preceding or simultaneously encountered value.

Robert F. van Naerssen (1982). Over punten en judicia en ‘mastery’ bij het hoger onderwijs. Tijdschrift voor Onderwijsresearch, 7, Notities en Commentaren, 223-225. Tijdschrift voor Onderwijsresearch scans in this list

An interesting proposal by Bob van Naerssen to give students the opportunity to upgrade a weak mark for a course in exchange for handing in a relatively substantial number of 'points' for that course. A compensatory examination regulation, then, but in an unexpected form.

Richard Winter (1993). Education or grading? Arguments for a non-subdivided honours degree. Studies in Higher Education, 18, 363-378.

Innovations in English higher education bring with them a shift from traditional comparative assessment to more criterion-oriented assessment. I do not know whether this piece is of any importance; in any case I have it available.

John W. Young (1993). Grade adjustment methods. Review of Educational Research, 63, 151-163. preview

Important because it is completely undisputed ‘proof’ of the proposition that grading is relative. Also interesting because what is missing from all those accounts of grade adjustment is that grades are the combined result of investment and ability. See also Lei, Bassiri and Schulz (2001).

Pui-Wa Lei, Dina Bassiri and E. Matthew Schulz (2001). Alternatives to the Grade Point Average as Measures of Academic Achievement in College. ACT Research Reports 2001-4 pdf

David J. Woodruff, Robert L. Ziomek (2004). Differential Grading Standards Among High Schools. ACT Research Reports 2004-2 pdf

David J. Woodruff, Robert L. Ziomek. (2004). High School Grade Inflation From 1991 to 2003. (ACT Research Report 2004-43 pdf).


The results of this study not only provide evidence supporting the grade inflation hypothesis, but also that the phenomenon appears to be especially substantial at the higher end of the grade point scale.

John S. Brubacher (1947). A history of the problems of education. McGraw-Hill. archive.org

Interesting material; go through those pages on archive.org some time.

Mark W. Durm (1993). An A is not an A is not an A: a history of grading. The Educational Forum, 57, 294-297. pdf

"For example, in the early years of Harvard, students were not arranged alphabetically but were listed according to the social position of their families (Eliot, C. W. (1935). Harvard memories. Cambridge: Harvard University Press.). In addition, there was apparently no standard process for the selection of the valedictorian [the person who gives the farewell address]. Ezra Stiles, the president of Yale in the late eighteenth century, had an interesting statement in his diary concerning the valedictory oration in Latin for July of 1781. The valedictorian was elected by the class. Stiles wrote: ‘The Seniors presented me their Election of Gridly for Valedictory Orator, whom I approved . . . ’ (Stiles, E. (1901). The literary diary of Ezra Stiles. New York: Charles Scribner's Sons)."
p. 295
"The history of grading in American colleges was eloquently detailed by Mary Lovett Smallwood (1935). She related that marking, or grading, to differentiate students was first used at Yale. The scale was made up of descriptive adjectives and was included as a footnote to Stiles' 1785 diary. President Stiles wrote that 58 students were present at an examination, and they were graded as follows: ‘Twenty Optimi, sixteen second Optimi, 12 Inferiores (Boni), ten Pejores’ (Stiles, 1901)."
p. 295
"As Smallwood wrote, ‘Before 1850 descriptive adjectives and various numerical systems of evaluation had been tried. Through the next fifty years, several new scales of merit and demerit were devised.’"
p. 296
"When we consider the practically universal use in all educational institutions of a system of marks, whether numbers or letters, to indicate scholastic attainment of the pupils or students in these institutions, and when we remember how very great stress is laid by teachers and pupils alike upon these marks as real measures or indicators of attainment, we can but be astonished at the blind faith that has been felt in the reliability of the marking system. School administrators have been using with confidence an absolutely uncalibrated instrument. . . . . [V]ariability in the marks given for the same subject and to the same pupils by different instructors is so great as frequently to work real injustice to the students . . . . Nor may anyone seek refuge in the assertion that the marks of the students are of little real importance. The evidence is clear that marks constitute a very real and a very strong inducement to work, that they are accepted as real and fairly exact measurements of ability or of performance. Moreover, they not infrequently are determiners of the student’s career."
Finkelstein, I. E. (1913). The marking system in theory and practice. Educational Psychology Monographs 10. Quoted in Durm, p. 294

Mary Lovett Smallwood (1935). An historical study of examinations and grading systems in early American universities. Harvard University Press. [only in the UvA University Library! Order under number 1395 B 23]

David F. Labaree (2011). The lure of statistics for educational researchers. Educational Theory, 61, 621-632. abstract or scan

For grading this is a far from unimportant theme, because researchers very readily reach for numbers to give their studies a quantitative twist. That in turn leaves its mark on the views of politicians and administrators, who then start steering by the numbers, and so on and so forth.

Randall R. Curren (1995). Coercion and the ethics of grading and testing. , 425 abstract

I will begin, in what follows, with the complaint that grading is intrinsically coercive, a complaint I shall refer to as the ‘Coercion Argument.’ I will then review some conventional answers to this argument, and will conclude that they suffice to show that the normal uses of testing and grading are not “strongly coercive,” that is, not wrongful violations of students’ rights. Yet I will also conclude that these answers are not wholly satisfactory, because even “weak” intellectual coercion that involves an infringement justified by other interests of the child is a matter of serious concern. I will then present a response to this problem of “weak” coercion, and in doing so will rely on the moral framework deployed by Allen Buchanan and Dan Brock in their book, Deciding For Others. The object of their inquiry is responsible surrogate decisionmaking in a health care context, and I might well have chosen to offer an analysis similar to theirs without reference to anything beyond the educational domain; but the structural similarities between the two domains make their framework a convenient and illuminating one to use.

K. Posthumus (1940). Middelbaar onderwijs en schifting. De Gids, 104 deel 2, 24-42. full text at dbnl.nl

Year     Total     Promoted  Not promoted  % promoted  % not promoted
1930     31422     24016     7406          76.4        23.6
1931     32371     24936     7435          77.0        23.0
1932     33531     25639     7892          76.5        23.5
1933     35848     26886     8962          75.0        25.0
1934     38158     28874     9284          75.7        24.3
1935     39585     29897     9688          75.5        24.5
1936     41135     31358     9777          76.2        23.8

data uit publicaties van of verstrekt door het CBS

p. 37:
"The gymnasia and the hogere burgerscholen have changed completely since 1875. Subjects, curricula and timetables have come and gone. The number of pupils has multiplied. Girls have made their entrance. The social composition of the school population has changed entirely; the world war has reshaped man and society; family life, sport, entertainment, everything has become different. But the incomplete statistics that exist show that the share of dropouts from the secondary schools remained the same."
p. 41:
"We must conclude that the customary system of simultaneous and collective assessment and promotion is unacceptable. Whoever has thought all this through will never again feel at ease when taking these decisions. Real renewal of secondary education will be attainable only if this system is abandoned entirely." [That is, the year-class system, b.w.]
What, then, does 'better' education involve?
p. 41-42:
"Let it consign school marks, along with the rod and the ferule, to the educational chamber of horrors. Let it, however, no longer pitch its subject matter and demands at the utmost capacity of the 'average' pupil, nor place those who fall below that level, day in, day out, before insurmountable difficulties. Let it determine its subject matter solely from the capabilities of the children and the desirability of the educational means and ends. Let it no longer keep one eye on selection while doing so."
p. 24:
"and when he had stared at it for a long time it turned into figures. Everything fell apart into figures, pages full of figures. Doctor Cijfer found that delightful, and he said that it grew light for him when the figures came, but for Johannes it was darkness."
F. van Eeden.

K. Posthumus (1958). Rendement en beoordelingsgewoonten. Universiteit & Hogeschool, 4, 156-161.

p. 158:
"In all classes of all secondary schools, 25% of the pupils are assessed with an average report mark below six and fail for the next class or for the final examination. After 1, 2, 3, 4 or 5 years, 75%, 56%, 42%, 32% and 24% of the starting cohort remain. The fit between successive classes of the secondary school is just as poor as that between primary and secondary education. Whether or not an entrance examination is held has no influence on the yield, nor do changes in the teaching or examination programmes or in the school population."
p. 158:
"The assessors thus derive their standards not from the programmes but from the work of the group submitted for assessment. They identify the concept 'average' with the concept 'middle half', assess that average with a mean report mark between 6 and 7, and regard the lowest quarter as unsatisfactory. The statement 'one quarter of the pupils does not meet the requirements' is therefore a definition of the requirements. The quantitative yield is not the outcome but the design of the selection; it describes not a property of the group assessed but a habit of the group assessing." [my emphasis, b.w.]

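The retention figures Posthumus (1958, p. 158) reports for successive years follow from a constant yearly failure rate of 25% applied to whoever is still left. A minimal sketch of that arithmetic, assuming the rate is identical every year and that nobody repeats a class (both simplifications; the function name is hypothetical):

```python
def remaining_fraction(years, fail_rate=0.25):
    """Fraction of the starting cohort left after `years` rounds of selection,
    assuming the same failure rate each year and no repeaters (assumption)."""
    return (1.0 - fail_rate) ** years


# Percentages of the starting cohort remaining after 1..5 years.
percentages = [round(100 * remaining_fraction(n)) for n in range(1, 6)]
print(percentages)
```

Rounded to whole percentages, this reproduces the series 75, 56, 42, 32 and 24 quoted above.
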
Prinz von Hohenzollern, J. G. & Liedke, M. (1991). Schülerbeurteilungen und Schulzeugnisse. Historische und systematische Aspekte. Bad Heilbrunn: Julius Klinkhardt.

Liedke, M. Ist das Zeugnis das Armutszeugnis der Schule? 25-38.
Fischer-Elfert, H-W: "Das Ohr eines Knaben sitzt auf seinem Rücken, er hört nur, wenn man ihn schlägt." Schülerbeurteilungen im Alten Ägypten. 39-48.
Rösger, A. Zur Schülerbeurteilung in der Antike - Hellenistische Schulwettbewerbe. 49-60.
Ebneth, B. Schulprüfungen in Spätmittelalter und Frühneuzeit an einem Beispiel: Die Beurteilung der chorales am Neuen Spital in Nürnberg. 61-68.
Keck, R. W. Zensieren und Zertieren: Zur Kontroll- und Gratifikationspraxis der katholischen Pädagogik im jesuitischen Einflussbereich. 69-88.
Doerfel, M. Schülerbeurteilungen in der ‘Pietistenschule’ Neustadt/Aisch im 18. Jahrhundert. 89-94.
Buchinger, H. Zur Geschichte von Zensuren und Zeugnissen in der bayerischen Realschule. 95-110.
Hartleb, W. Das Beurteilungssystem in der Reichsgrafschaft Ortenburg. 111-131.
p. 97:
Since from 1 September 1777 in the Realschulen ‘the classes were no longer subordinated to one another, such that one advanced annually from one class to the next, but coordinated by subject’ (Mayr, G. K. 2, 1784, 934), there were no longer any annual reports either. If a pupil believed he had attained ‘the proper degree of knowledge’ (ibid. 935) in a subject, he could, after an appropriate examination and a decision by the teachers' assembly, turn to another subject. If, however, he wished to leave the school, ‘an attestation appropriate to his ability and the progress made, drawn up by the rector and the teacher . . . was issued to him’ (ibid. 935). The entitlement value that such certificates had meanwhile acquired outside the school is made clear by § 63 of the school ordinance of 1777, which states: ‘The customary attestations and testimonials shall be issued with the strictest impartiality, and in doing so the rector as well as the professor shall not lose sight of the great duty towards the state, which is so greatly harmed by the recommendation of unworthy servants’ (ibid. 940).
Breitschuh, G. Der Frankfurter Wachensturm von 1833 und seine Bedeutung für das Reifezeugnis in Deutschland. 132-147.
Apel, H-J. "Der Leitung des Jungen fehlte die starke Hand des Vaters." Beurteilungsvorschriften und Beurteilungspraxis: Gutachtliche Bewertungen zum Abitur zwischen 1925 und 1936 in Rheinpreussen. 148-159.
Rump, H-U. Über-Ich und Untertan. Das Lehrer-Schüler-Verhältnis im Spiegel ausgewählter Beispiele der deutschen Literatur des 20. Jahrhunderts. 160- 93.
Grunder, H-U. Regionale Besonderheiten des Schulzeugnisse in der Schweiz. Ansätze zu einer historisch-systematischen Skizze. 175-193.
Ciperle, J. Das Schulzeugnis in Jugoslawien - Historische Entwicklung und gegenwärtige Problematik. 194-207.

Anna Südkamp, Johanna Kaiser & Jens Möller (2012, March 26). Accuracy of Teachers' Judgments of Students' Academic Achievement: A Meta-Analysis. Journal of Educational Psychology, 104, 743-762 abstract

See this also in relation to the article by Bowers, below.

Alex J. Bowers (2011): What's in a grade? The multidimensional nature of what teacher-assigned grades assess in high school. Educational Research and Evaluation: An International Journal on Theory and Practice, 17, 141-159. abstract

Herbert Hoijtink and Klaas Sijtsma (2009). Meten Onder Druk. Advies aan de CEVO Inzake de Normering van Eindexamens Voortgezet Onderwijs. pdf

On this, see among others WiskundE-brief 560: N-termen wetenschappelijk gewikt en gewogen.

Iasonas Lamprianou (2009). Comparability of examination standards between subjects: an international perspective. Oxford Review of Education, 35, 205 - 226 abstract

Sarah Warshauer Freedman (1979). Why do teachers give the grades they do? College Composition and Communication, 30, 161-164. pdf

essay grading; experiment with essays manipulated in four respects: content, organization, sentence structure, and spelling and the like.

P. van Rijn, A. Béguin & H. Verstralen (2009). Zakken of slagen? De nauwkeurigheid van examenuitslagen in het voortgezet onderwijs. Pedagogische Studiën, 86, 185-195. abstract .doc

As a measure of the 'accuracy' of examination results, the authors use the fiction that pass/fail decisions can be 'justified' or 'unjustified'. They devote a brief reflection to it, in which they come close to observing that this is of course a nonsensical idea, because examination candidates whose 'true' mastery lies exactly on the pass/fail boundary have a 50% chance of failing. What is 'justified' about that? Or indeed 'unjustified'? So leave the nonsense out and analyse purely in terms of those pass probabilities: they improve as the 'true mastery' is higher, and hence also as preparation for the examination is better. I make these somewhat caustic remarks because it has been perfectly clear since the beginnings of mathematical statistics that a discourse about 'justified' or 'unjustified' failing or passing is nonsensical (Edgeworth, late 19th century).
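The point about pass probabilities can be made concrete with a small sketch. It assumes, purely for illustration, that an observed examination mark equals the 'true' mark plus normally distributed measurement error; this error model, the cut score of 5.5 and the standard error of 0.5 are assumptions of the sketch, not values taken from van Rijn, Béguin & Verstralen.

```python
import math


def pass_probability(true_score, cut_score, sem):
    """P(observed >= cut) when observed = true + N(0, sem^2) error (assumed model)."""
    z = (true_score - cut_score) / sem
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


CUT = 5.5  # hypothetical pass mark
SEM = 0.5  # hypothetical standard error of measurement

for true_score in (5.0, 5.5, 6.0, 6.5):
    print(true_score, round(pass_probability(true_score, CUT, SEM), 3))
```

Under this model a candidate whose true mark sits exactly on the cut score passes with probability one half, and the probability rises smoothly with true mastery, so there is no point at which an individual outcome becomes 'justified' or 'unjustified'.
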

Willem K. B. Hofstee (2009). Promoting intersubjectivity: a recursive-betting model of evaluative judgments. Netherlands Journal of Psychology, 65. abstract

J. H. Stein: Inrichting der examens voor onderwijzers. In M. J. Koenen and J. H. Stein (Eds.) (1882). School en Studie, Maandschrift voor Opvoeding en Onderwijs, 7-10. Fourth volume. Tiel: D. Mijs. [apart from the first volume, held complete in the KB, not digitised] abstract

"Whereas the examinees were previously mostly examined in groups, the examination is now individual; only for a few minor subjects may groups of no more than three candidates be examined at once." The valuation of performance changes from awarding points to giving marks: "According to the latest regulations, numbers of points in proportion to their weight are no longer given for the various subjects. After the examination the candidate obtains for each subject one of the predicates: EXCELLENT, VERY GOOD, GOOD, AMPLY SUFFICIENT, JUST SUFFICIENT, INSUFFICIENT etc., represented by the marks 10, 9, 8 etc. Below INSUFFICIENT there are thus three more grades. The same marks have been adopted for the final examinations of the Hoogere Burgerscholen and for various other examinations." "The compensation system has been abandoned. Whoever gets a 4 for either Dutch language or arithmetic is irrevocably rejected, as is anyone who obtains this mark for two other subjects. The candidate who now receives too low a mark for singing and writing is lost, however competent he may otherwise be." Stein does note that the examination committee votes on each candidate separately, and so in theory can reject someone who threatens to pass with the lowest possible marks, and vice versa. I wonder whether committees have mustered the courage to do so in practice.

Liying Cheng (1999). Changing assessment: washback on teacher perceptions and actions. Teaching and teacher education, 15, 253-271. pdf

P. L. Roth, C. A. BeVier, F. S. Switzer & J. Schippmann (1996). Meta-analyzing the relationship between grades and job performance. Journal of Applied Psychology, 81, 548-556. abstract

It need not surprise us that in the US there is a relationship between grades and performance, whereas in the Netherlands it is absent or so uninteresting that no research is done on it at all. Anyway, these authors arrive at .3 or slightly more.
Several factors were found to moderate the relationship. The most powerful factors were the year of research publication and the time between graduation and performance measurement. p. 553: Grades reported before 1961 appeared to be more valid. In addition, validities were higher after 1 year on the job.
The problem here is that the authors do not consider the argument of Dawes (1975), Graduate admission variables and future success.

Nathan R. Kuncel, Marcus Credé & Lisa L. Thomas (2005). The validity of self-reported grade point averages, class ranks, and test scores: A meta-analysis and review of the literature. Review of Educational Research, 75, 63-82. abstract

P. J. Hartog (1918). Examinations and their relation to culture and efficiency. London: Constable. pdf

Full text of "The Case For Examinations An Account Of Their Place In Education With Some Proposals For Their Reform" http://www.archive.org/stream/caseforexaminati011620mbp/caseforexaminati011620mbp_djvu.txt Saved as brereton.1943.rtf

C. W. Valentine (1932). The Reliability of Examinations. An Enquiry. London: University of London Press. [not available online, 2013]

H. van den Bergh, E. Rohde and M. Zwarts (2003). Is het ene examen het andere? Over de stabiliteit van schoolonderzoek en centraal examen. Pedagogische Studiën, 80, 176-191 open access at http://www.open.ou.nl//vor/3_Pedagogische_Studiën/80.htm

Teachers' Marks; Their Variability and Standardization by Frederick James Kelly 1913 Columbia University. full text online

C. T. Gray (1913). Variations in the grades of high school pupils. Warwick and York. full text online

Jay Parekh (2002). Do Median Grades Vary Across Departments? Cornell Higher Education Research Institute, Working paper WP 30. pdf

Short, thesis-like study. Nice enough as a case of American grading, but otherwise not in-depth.

Becker, H., Geer, B., & Hughes, E. C. (1968). Making the grade: the academic side of college life. New York: Wiley. http://home.earthlink.net/~hsbecker/ http://home.earthlink.net/~hsbecker/grades.html

Baird, L., & Feister, W. J. (1972). Grading standards: the relation of changes in average student ability to the average grades awarded. American Educational Research Journal, 9, 431-441. abstract

(Posthumus's law) fc 440; This study confirms the earlier research of Aiken (1963), Hills (1964), Hills and Gladney (1968), Webb (1963), Wilson (1970), and others which indicated that faculty members, at least collectively, prefer or are committed to a certain distribution of grades. Thus, faculties show an ‘adaptation level’ by awarding, on the average, about the same average and distribution of grades, whether their current students are brighter or duller than last year’s.

Keith Chapman (1996). Entry qualifications, degree results and value-added in UK universities. Oxford Review of Education, 22, 251-264. abstract

There are fields of study that admit a high proportion of candidates with high grades, yet themselves award relatively low grades! The problem for Chapman is that he would like to determine 'value-added' from the difference between grades. I therefore see this mainly as a very nice study in grading, Posthumus effects, and the like. To be studied more carefully in due course.

Harris (1940). Factors affecting college grades: a review of the literature, 1930-1937. Psychological Bulletin, 37, 125-166. abstract

An amusing but also very complete survey of research into everything that might correlate with grades. No, not a conceptually interesting analysis. The attentive reader will notice that all sorts of variables relating to the students have been investigated, but not a single one concerning the grading behaviour of teachers, neither individually nor as a group (cf. the Coleman model, as in my 1992).

Larson & Scontrino (1976). The consistency of high school GPA and the verbal and mathematical portion of the SAT of the CEEB as predictors of college performance: an eight year study. Educational and Psychological Measurement, 36, 439-443. abstract

Lewis, W.A., Dexter, H.G., & Smith, W.C. (1978). Grading procedures and test validation: a proposed new approach. Journal of Educational Measurement, 15, p. 219-

David Pennycuick and Roger Murphy (1988). The impact of graded tests. London: The Falmer Press. isbn 1850002789

Please (1971). Estimation of the proportion of examination candidates who are wrongly graded. BrJMStPs, 24, 230-238. (fc)

Simon, S.B., & Bellanca, J.A. (Eds.) (1976). Degrading the grading myths. Including: Evans, F.B. What research says about grading. (30-50). Bellanca, J.A., & Kirschenbaum, H. An overview of grading alternatives. (51-62).

Slavin, R.E. (1977). Classroom reward structure: an analytical and practical view. Review of Educational Research, 47, 633-650. competition

Weeren, J. van. Cijfers geven. De groep als norm bij proefwerken en schoolonderzoek. Arnhem: Cito; 1990. 19 pp.; [not available to me]

Willmott, A. S., & Nuttall, D. L. (1975). The reliability of examinations at 16+. London: Macmillan Education. '95

Incredible but true: the book is indeed entirely about the reliability of those examinations! But even that modest approach is disillusioning for the examinations concerned (Abstract:) . . . . all that can properly be said about a candidate awarded a grade 3 is that his ‘true’ grade could be as high as a grade 2 or as low as a grade 4 (that is, lies in the range grade 2 - grade 4). And that on a five-point scale!

F. J. Vaes (1930). Statistiek betreffende de 1e Hoogere Burgerschool met vijf-jarigen cursus te Rotterdam. Second edition 1865-1930. Not for sale, July 1930. Incredible: a list of all pupils, their school careers and final examinations, and what later became of them. The explanation of the symbols starts on p. 9. '93

Tluanga (1974). A scaling formula for bounded mark intervals. BrJMStPs, 27, 53-61. (fc)

Thorndike, R.L. (1969). Marks and marking systems. In Ebel, R.L. Encyclopedia of educational research. London: MacMillan.

Baet, A., Moret, L, Schoonen, R., & Sjoer, E. (1993). Zo haal je een hoog cijfer voor je examenopstel: adviezen van en voor leerlingen. De perceptie van de doelstellingen van het opstelonderwijs in de bovenbouw van havo-vwo. Tijdschrift voor Taalbeheersing, 15, 173-192. seen

Bendig, A. W. (1953). The reliability of letter grades. Educational and Psychological Measurement, 13, 311-321.

Berkel, K. van (1996). Dijksterhuis, een biografie. Amsterdam: Bert Bakker. Contains a nice case about grading: a conflict between two mathematics teachers, Dijksterhuis and Kerremans, in the 1920s and 1930s, ending with Kerremans's dismissal. About incorrect and overly low grading.

Bridgeman, Brent, & Lewis, Charles (1994). The relationship of essay and multiple-choice scores with grades in college courses. Journal of Educational Measurement, 31, 37-50.

Brookhart, S. M. (1993). Teachers' grading practices: meaning and values. Journal of Educational Measurement, 30, 123-142.

p. 241:
The grading process, as currently practiced, leaves teachers to work out the compromises they must make in their dual role as both judge and advocate for their students. Recommended grading practices, suggesting no compromises, are of limited help to teachers on this issue. This study's results suggest that teachers mix the roles of judge and advocate differently for students of different ability, and this in itself is a value-laden act.

Siero, F., & van Oudenhoven, J. P. (1993). De invloed van contingente feedback op attributies en prestaties in de klas. TOR, 18, 343-354. From what I saw at a quick glance: a naive conception of the effort paradigm, ‘if you just do your best, you will be valued accordingly.’ That ignores the inherently competitive character of assessment in education. The article is therefore nice as an illustration of the thoroughgoing naivety that can prevail in this field.

Stricker, L. J., Rock, D. A., Burton, N. W., Muraki, E., & Jirele, T. J. (1994). Adjusting college grade point average criteria for variations in grading standards: a comparison of methods. Journal of Applied Psychology, 79, 178-183. fc

Werts, C., Linn, R. L., & Jöreskog, K. G. (1978). Reliability of college grades from longitudinal data. Educational and Psychological Measurement, 38, 89-96.

Caspard, P. et al. (ed.) (1992). Travaux d'élèves; pour une histoire des performances scolaires et de leur évaluation, XIX-XX siècles. Paris: I.N.R.P. ISBN 2734203316. (Special issue of Histoire de l'éducation, mai 1992 nr. 54). IJSB: PEDA.

Intelligenz und Schulleistung. Kapitel X: Stern, W. (1920). Die Intelligenz der Kinder und Jugendlichen und die Methoden ihrer Untersuchung. An stelle einer dritten Auflage des Buches: Die Intelligenzprüfung an Kindern und Jugendlichen. Leipzig: Verlag von Johann Ambrosius Barth. 194-225. Interesting because apparently no 'marks' were available to compare test scores with, but class rank was!

Spoelder, J. (1978). Over prijzen en promotie op de Latijnsche Erasmiaansche Scholen. In Blom, N. van der (1978). Grepen uit de geschiedenis van het Erasmiaans Gymnasium 1328-1978. Rotterdam: Backhuys. 106-128. (on notae, the forerunner of marking systems). t

J. Spoelder (2000). Prijsboeken op de Latijnse school: een studie naar het verschijnsel prijsuitreiking en prijsboek op de Latijnse scholen in de Noordelijke Nederlanden, ca. 1585-1876, met een repertorium van wapenstempels. Dissertation. open access: https://repository.ubn.ru.nl/handle/2066/147057 Note: 100 MB

Coleman, J.S. (1959). Academic achievement and the structure of competition. HER, 29, 330-351. Reprinted in Halsey, A.H., Floud, J., & Anderson, C.A. (Editors) (1961). Education, economy, and society. A reader in the sociology of education. London: Collier-Macmillan. 367-389

Naerssen, R. F. van (1982). Over punten en judicia en ‘mastery’ bij het hoger onderwijs. Tijdschrift voor Onderwijsresearch, 7, 223-225. to combine. An interesting proposal by Bob van Naerssen to give students the opportunity to upgrade a weak mark for a course in exchange for handing in a fairly substantial number of 'points' for that course.

Simon French (1985). The Weighting of Examination Components. The Statistician, Vol. 34, No. 3. (1985), pp. 265-280. Stable URL: http://links.jstor.org/sici?sici=0039-0526%281985%2934%3A3%3C265%3ATWOEC%3E2.0.CO%3B2-9

Abstract All examinations are divided into a number of components (e.g. papers, sections, questions, etc.), each designed to assess some aspect of the candidate's intelligence, knowledge, ability and achievement. To provide an overall assessment of a candidate, his or her performance on these components must be combined into a single mark, or, more generally, a single grade. It has long been realised that simply summing marks has unsatisfactory properties; in particular, it does not reflect the relative importance of components. This paper discusses various alternatives to summation of marks. It is suggested that previous approaches to the combination of performance on components have misinterpreted the task before them. Parallels between the structure of examination assessment and the theory of prescriptive choice are indicated, and it is suggested that multi-attribute value theory can provide a framework for tackling the problem.

Simon French and Marilena Vassiloglou (1986). Strength of performance and examination assessment. British Journal of Mathematical and Statistical Psychology, 39, 1-14.

About relative and absolute strength of performance, and hence also the measurement problems inherent in it (additive conjoint).

Marilena Vassiloglou and Simon French (1982). Arrow’s theorem and examination assessment. British Journal of Mathematical and Statistical Psychology, 35, 183-192. copy in the 'ex regeling' box. Abstract: Usually in examinations an overall assessment of a candidate’s performance is made by means of a weighted sum of the marks attained on the various components. However, recently it has been suggested that the combination should be based on the candidate’s rankings on the components alone, and not on the actual marks. This paper discusses whether such an approach can lead to a fair and consistent system of assessment. I wrote a short piece on it (2-2008) and added it to '97 Assessment in historical perspective. This is quite something. Perhaps a nice case for demonstrating a number of taken-for-granted presuppositions in the literature, such as ignoring backwash, the limited definition of what is fair and what is consistent, and so on. The suggestion of rank-ordering was made by Wood & Wilson (1980), in vd Kamp, Langerak and De Gruijter 1980; fc in the 'ex regeling' box

Cremers, P.G.J., Konstruktie van een schaal voor bereikt niveau van voortgezet onderwijs. TOR 1980, 5, 80- .

Crijns, J. H. J. (1969). Een school in cijfers. Een cijfermatige analyse van een Nederlandse H.B.S. 's-Hertogenbosch: Malmberg. to fetch

Cross, L.H. e.a., Establishing minimum standards for essays: blind versus informed reviews. JEM 1985, 22, 137-146

Davies, J., & Skinner, V. (1992). Parental responses to records of achievement: a primary school case study. Ed Res 34, 117-132.

Andrew Davis (1998) The Limits of Educational Assessment. Oxford: Blackwell. isbn 0631210202. Special Issue: The limits of educational assessment. Journal of Philosophy of Education, 32(1), 1-155. full contents

Keith, T. Z., & Benson, M. J. (1992). Effects of manipulable influences on high school grades across five ethnic groups. Journal of Educational Research, 86, 85-93. preview

This concerns a model of academic results with grades as the dependent variable, which is rather unusual. Also unusual is that time use was investigated, both as coursework and as homework. The analysis technique is LISREL, which seems hopelessly inadequate to me (since any linear model is inadequate with dependent variables that are relative in nature). It is also interesting that five ethnic groups are distinguished, as well as sex. A nationally representative sample of high school students (High School and Beyond) was used. The authors give an overview of the literature on ‘instructional time’: Carroll (1963, 1989), Bennett 1978; Cooley & Leinhardt 1975; Bloom 1976; Harnischfeger & Wiley, 1976 as regards theory. Research reviews: Hawley & Rosenholtz, 1984; Karweit & Slavin, 1982; Gamoran, 1987; Jencks & Brown, 1975; Lee & Bryk, 1989; Natriello, Pallas & Alexander, 1989; Keith, in press; Paschal, Weinstein & Walberg, 1984. Table 1 gives correlations, means and variances for ethnicity, social background, sex, intelligence, quality of education, motivation, coursework, homework, and grades.
These analyses also supported the school learning model across the five major ethnic groups. This finding was consistent with findings from a previous study that tested a model across groups using achievement test scores as the outcome (Keith, in press). With grades as the learning criterion, however, there were greater differences among the groups.

Domino, G. (1992). Cooperation and competition in Chinese and American children. Journal of Cross-Cultural Psychology, 23, 456-467. fc (not about grading, but about competitive versus cooperative behaviour. Do the Chinese have a grading system comparable to the systems common in the West? Good question.)

Ebel, (1969). The relation of scale fineness to grade accuracy. Journal of Educational Measurement, 6, 217-221. (fc)

Etaugh et al. (1972). Reliability of college grades and GPA's: some implications for prediction of academic performance. EPM 32, 1045-1050.

Simon French (1981). Measurement theory and examinations. British Journal of Mathematical and Statistical Psychology, 34, 38-49. Summing marks (and thus compensating) is said not to be a good method. Proposes an alternative.

Isidor Edward Finkelstein (1913). The marking system in theory and practice. Baltimore, Md. Warwick & York, Inc. Educational psychology monographs . . . , no. 10, "Studies from the Cornell educational laboratory, no. 14." full text online

- Introduction 5
- Chapter II.
- Theoretical Considerations 9
- 1. Should marks indicate performance or ability or accomplishment? 9
- 2. What is the theoretical distribution of the qualities or traits that marks are to indicate? 11
- 3. What is the best method of translating the distribution into a scale of symbols? 16
- Chapter III.
- The Distribution of Marks at Cornell University: Combined Results for Numerous Courses 21
- 1. Marks given in 1902 25
- 2. Marks given in 1903 25
- 3. Marks given in 1911 26
- 4. Combined Curve of Marks in 1902, 1903 and 1911 28
- Chapter IV.
- The Distribution of Marks at Cornell University: Results for Individual Courses 37
- 1. Variation produced by changes of instructors 39
- 2. Typical distributions of "high markers" 42
- 3. Typical distributions of "low markers" 49
- 4. Peculiarities of distribution in other courses 60
- 5. Marking system of the College of Law 72
- Chapter V.
- Summary and Conclusions 79

Gesualdi, M. (1967). Die rotschool van u. Brief aan een onderwijzeres. door de kinderen van Barbiana gevolgd door een nabeschouwing van Oscar de Wit, Sibe Soutendijk en Co van Calcar. Utrecht: Bruna. (Very sharp statements, also reasonably well supported empirically, about the assessment of pupils) bo

A. D. de Groot (1966). Vijven en zessen. Groningen: Wolters-Noordhoff.

A. D. de Groot and W. H. F. W. Wijnen (1966/1983). Vijven en zessen. Cijfers en beslissingen: het selectieproces in ons onderwijs. Groningen: Wolters-Noordhoff. isbn 9001355501 (NB: Wijnen was of course not yet involved in 1966!)

Hewitt, B.N., & Jacobs, R. (1978). Student perceptions of grading practices in different major fields. Journal of Educational Measurement, 15, 213-217. (adaptation-level hypothesis) fc

Hoskin, K. (1979). The examination, disciplinary power and rational schooling. History of Education, 8, 135-146. fc Nicely written, but superficial and inaccurate. Gushes over Foucault’s Discipline and Punish. p. 144: in about 1792, William Farish, one of the moderators, suggested that marks should be assigned for individual questions. Hoskin assumes that this proposal was indeed adopted, but I have never seen that confirmed anywhere. It is also interesting that he assumes the candidates' written work was the same, but that was not true; see Wilson 1982 p. 337.

Hoyt, D. P. The relationship between college grades and adult achievement. A review of the literature. (ACT Research Report 7) Iowa City, Iowa: The American College Testing Program Publications Office. 1965.

Ingenkamp, K. (Hrsg.) (1971). Die Fragwürdigkeit der Zensurengebung. Texte und Untersuchungsberichte. Weinheim und Basel: Beltz Verlag. (including: Dohse: Die geschichtliche Entwicklung des Schulzeugnisses; Starch & Elliott; Hartog & Rhodes: Prüfungszensuren in Geschichte und Englisch; Weiss; Finlayson; Eells; Carter; Hadley. Very extensive, mainly German bibliography)

D. W. Johnson, G. Maruyama, R. Johnson, D. Nelson & L. Skon (1981). Effects of cooperative, competitive, and individualistic goal structures on achievement: a meta-analysis. Psychological Bulletin, 89, 47-62. pdf

Johnson, D. W., & Johnson, R. T. Instructional goal structure: cooperative, competitive or individualistic. Review of Educational Research, 1974, 44, 213-240.

Karlins, J. et al (1969). Academic attitudes and performance functions of differential grading systems: an evaluation of Princeton's P/F system. JExE, 37 (3), 38-50. fc

Kelley (1950). The use of literal grades. JEP. (fc)

Karl Josef Klauer (1984). On criterion-referenced grading models. JESt, 9, 237-251. preview

Koenraads, W. H. A. (1957). Cijfergeving als probleem. Inaugural public lecture. Groningen: Noordhoff. [Nothing of any importance]

M. J. Langeveld (1961, 7th ed.). Inleiding tot de studie der paedagogische psychologie van de middelbare-schoolleeftijd. Wolters.

A very fierce passage on pp. 467-8; refers to work by J.H. Gunning, Over rapporten en cijfers, Verz. Paed. Opstellen II; A. de Vletter, De donkere dagen voor kerstmis, Aerdenhout 1929, and in Gezin en School, October 1932; Hartog & Rhodes: An examination of examiners (Macmillan)

Le Monde de l’éducation, octobre 1994, p. 19 Les notes, une science inexacte. Prompted by a study that might be worth acquiring: Merle, Pierre (1994?). La compétence en question. École, insertion, travail. Presses universitaires de Rennes, 209 p., 94 F.

Lenders, J. (1988). De burger en de volksschool. Culturele en mentale achtergronden van een onderwijshervorming. Nederland 1780-1850. Nijmegen: SUN. isbn 9061682886

Chapter 6: Straffen en belonen in de nieuwe school. 168-210. What is interesting in the publications of that period is the emphasis on the drawbacks and limitations of comparing pupils. Lenders quotes many splendid passages.

Michaels, J.W. (1977). Classroom reward structures and academic performance. RER, 47, 87-98.

Naerssen, R.F. van (1972). Het schalen van testscores. NTvdPs, 27, 471-485. (A procedure to give a somewhat better justification, when averaging grades and the like, of how many points were obtained: if a higher grade is needed, then fewer points, etc.)

Stewart & White (1976). Teacher comments, letter grades, and student performance: what do we really know? JEP, 68, 488-500.

Stiggins, R.J., D.A. Frisbie, & Ph.A. Griswold (1989). Inside high school grading practices: building a research agenda. EdMeas, 8 #2, 5-14.

Do teachers actually follow the grading practices recommended by researchers and textbook authors? Where are the discrepancies? What are possible explanations for such discrepancies? What is an agenda for further research on actual classroom grading practices of teachers?

Daniel Starch (1913). Reliability and distribution of grades. Science, vol. 38, 630-636. Leiden: Museum Natuurlijke Historie.

Kandel, p. 63-4:
( . . . ) as Starch found in an experiment conducted at the University of Wisconsin, the same teachers gave different marks when they regraded their own papers without knowledge of their former marks.

Smallwood, M.L. (1935). An historical study of examinations and grading systems in early American universities. Harvard University Press. Borrowed; I made a copy, if I remember correctly.

Schoenfeldt, L.F., & Brush, D.H. (1975). Patterns of college grades across curricular areas: some implications for GPA as a criterion. AERJ, 12, 313-321.

Deutsch, M. (1979). Education and distributive justice: some reflections on grading systems. American Psychologist, 34, 391-401. 10.1037/0003-066X.34.5.391

Geisinger (1982, p. 1147):
"His views flow from a comparative approach to grading. He describes grades as artificially scarce rewards, which are allocated on the basis of ability, drive, and character to fill the societal purposes of motivating and socializing children. He argues that comparative grading probably developed to foster a belief in the competitive, meritocratic ideology needed to legitimize socioeconomic inequalities. In contrast, he discusses other value systems through which grades could be awarded and suggests that grades should serve society by helping students gradually make the transition from the family to the world of work."
393: Now, in the context of merit, there is a strange thing about the distribution of grades in most American school systems: There is an artificially created shortage of good grades to be distributed. High grades are typically limited by grading curves or norms which, in effect, restrict the total number of high grades to be distributed within a group of students.
394:
What function is served by the artificially created scarcity of high marks? On the face of it, such an artificial shortage flouts what we know about the cultivation of ability, drive, and character; namely, if these are manifested, recognizing and rewarding them well are apt to foster their development. Disappointing rewards, induced by an artificial scarcity, are likely to hamper the development of educational merit and the sense of one’s own value. A strange thing, this artificially induced scarcity of rewards: its effects are probably quite opposite to its ostensible purpose, discouraging rather than encouraging the growth of educational merit.
Deutsch then continues with a discussion of this grading system as educating toward the values of the meritocratic society, and hence as legitimizing the differences in that society, following on from a reference to Bowles and Gintis (1976).

Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, 51, 599-635; also 'The element of chance in competitive examinations', ibid, vol. LIII (1890), pp. 460-75 and 644-63.

A forerunner of Posthumus's regularity: note 15, p. 606:
" . . . there is no doubt that at many of our public examinations the fluctuations in the ‘scale’ are enormous. Mr. Eve, in his wise and witty lecture on marking, alludes to this average as the most independent variable which he ever encountered. It is said to be a frequent occurrence at some of our civil service examinations that a candidate, when examined for the second time, after a year’s hopeful study of a subject, obtains fewer marks in it than he had obtained at the first examination. I have heard of an examiner, at one of the older universities, marking on a scale lower by half than that employed by his colleagues. Mr. Eve mentions similar occurrences at school examinations. An experienced examiner, who is also one of the highest living authorities on statistics, Dr. Venn, writes to me: ‘I have frequently raised or depressed my own marks (or my colleagues’) by as much as 25 per cent. all through in order to bring them into general harmony.’"

Crijns, J. H. J. (1969). Een school in cijfers. Een cijfermatige analyse van een Nederlandse H.B.S. Malmberg

p. 8:
Who invented the use of grades, says Gunning (Gunning, Wz., J.H. (1901). Rapporten en cijfers. In Verzamelde paedagogische opstellen, tweede bundel, 2nd ed., 1917, p. 82), ‘I do not know; I suspect, however, that they were first used by large examination boards’, and in a footnote we read: ‘Very probably they stem from . . . the Chinese!’
p. 8:
"Dohse (Dohse, W. (1963). Das Schulzeugnis. Sein Wesen und seine Problematik. Publ. no. 10 in the series Pädagogische Studien. Weinheim, p. 44) identifies as the origin of the grade the ‘fünffache Stufung’ [fivefold grading] of taxpayers by the Censor in ancient Rome. ( . . . ) Dohse also mentions the practice, prevailing at Jesuit schools as early as 1586, of underlining the best performances by ‘die schmachvolle Demütigung der ‘Besiegten’ im Anschluss an die Preisverteilung’ [the ignominious humiliation of the ‘vanquished’ following the distribution of prizes] (op. cit., p. 48). The sense of honour and shame is clearly being played upon here. This also happens in the Gymnasium of the Benedictines in Etzingen, where a regulation of 1781 prescribes the acquisition of ‘für jede Klasse ein Buch der Ehre und ein Buch der Schande sowie eine Ehrenbank und eine (schwarz angestrichene, abseits stehende) Schand- und Strafbank’ [for each class a book of honour and a book of shame, as well as a bench of honour and a (black-painted, set-apart) bench of shame and punishment]. On the basis of the entries in these books ‘am Schluss des Semesters bei der Klassifikation der Schüler der Rang’ [at the end of the semester, in the classification of the pupils, the rank] must be determined (op. cit., p. 48)."
p. 9:
"That the development of the grading system began early is also apparent from the early appearance of ‘Klassenkataloge’ [class registers]. Dohse says of them: ‘Ein solcher Klassenkatalog mit Zensurenskala ist bereits in der ratio studiorum der Societas Jesu vom Jahre 1599 nachweisbar. In den ‘Regulae communes Professoribus classium inferiorum’ wird unter Ziffer 38 die Einrichtung eines alphabetischen Schülerkatalogs angeordnet, in dem die Klassifikationen (‘gradus’) ‘optimus’, ‘bonus’, ‘mediocris’, ‘dubius’, ‘retinendus’, ‘rejiciendus’ anzuwenden sind. Diese ‘notae’ können auch durch Zahlen bezeichnet werden: 1, 2, 3, 4, 5, 6’ [Such a class register with a grading scale can already be found in the ratio studiorum of the Society of Jesus of 1599. In the ‘Regulae communes Professoribus classium inferiorum’, item 38 orders the setting up of an alphabetical register of pupils in which the classifications (‘gradus’) ‘optimus’, ‘bonus’, ‘mediocris’, ‘dubius’, ‘retinendus’, ‘rejiciendus’ are to be applied. These ‘notae’ can also be denoted by numbers: 1, 2, 3, 4, 5, 6] (op. cit., p. 49)."
p. 9:
"In the previous century a development of the grading scale then comes about, and grades are used instead of qualifications. A detailed account of this can be found, for example, in Dohse ( , p. 49 ff.) and Göller ( , p. 49 ff.). Posthumus (Posthumus, K. (n.d.). Levensgeheel en school. Bezinning vóór vernieuwing van voortgezet onderwijs in Nederland en in Indonesië. , p. 34) says of it: ‘The school ‘grades’, originally defined as abbreviations for judgements (poor, sufficient, good, excellent), for which abbreviations the same little figures were chosen as for the numbers of arithmetic, are treated as if they were those numbers. After the division and separation of reality into ‘subjects’, the child's attitude toward each of those ‘subjects’ is fixed in a grade. By adding up and ‘averaging’ those grades, as if they were numbers, a result arises that is considered decisive for the judgement of the child's attitude toward the whole, and that plays a large role in the decision about his future. The way in which the grading system is used is the straight-line outcome of rationalistic, analytic, atomistic thinking in its urge to replace qualities by quantities, and stands or falls with it.’"
(Italics in Crijns; he does not explicitly indicate whether it is also italicized in Posthumus.)
Crijns then has the grade begin its triumphal march in the previous century, namely with Bartels (p. 112) on the Royal Decree of 10 March 1870 (regulations for the HBS final examinations).
p. 9:
"Reinsma (Reinsma, R. (n.d.) Scholen en schoolmeesters onder Willem I en II. , p. 236) quotes part of the report of chief inspector Wijnbeek on the Latin school in Den Bosch, which the latter visited in the period 1832-1849. In it we read Wijnbeek's recommendation ‘at set times, e.g. every three months, to send the parents or guardians of the pupils a note of each one's good and bad marks, so that they take all the more interest in the conduct and the progress of their charges.’"
p. 10:
"The present meaning of the grades used in the Netherlands is, for example according to the Royal Decree of 17 May 1962 (S. 188, art. 16): 1 = zeer slecht (very poor); 2 = slecht (poor); 3 = gering (slight); 4 = onvoldoende (insufficient); 5 = bijna voldoende (almost sufficient); 6 = voldoende (sufficient); 7 = ruim voldoende (amply sufficient); 8 = goed (good); 9 = zeer goed (very good); 10 = uitmuntend (excellent)."
Interestingly, France for a very long time had the same ten-point system that the Netherlands had until 1930.
p. 11:
"France long remained attached to the ten-point system. The only difference from ours was that in France the grade 5 still has the meaning ‘passable’, according to Gielen (Gielen, J. J. (1965). Het sociale in opvoeding en opvoedkunde. 2nd ed. ’s-Hertogenbosch: Nijmeegse bijdragen tot de opvoedkunde en haar grensgebieden). At promotion or final examinations a candidate counts as having passed if he attains at least 50% of the maximum. According to Dohse ( , p. 61) a points system of 0-20 is now used."
p. 13:
Finally, Langeveld (Langeveld, M. J. 1950, Inleiding tot de studie der paedagogische psychologie van de middelbare-schoolleeftijd, 4th ed., p. 61) points to yet another obvious difficulty in practice: ‘How many teachers really give their grades for the work alone? They give a 1 when they catch a child cribbing or when it failed to hand in its homework. They give ‘warning grades’. They reward and punish with grades, deter with them and spur on with them. In short: they arouse with them tensions, underestimated by themselves, which are highly undesirable, especially in puberty.’
p. 12:
Together with the qualification or the grade, the report card also came into being. Originally (in the 16th century) the report, as ‘Benefizienzeugnis’ or ‘Stipendiatenzeugnis’ (cf. Dohse, pp. 11/12), is an exception; later it develops into the ‘Reifezeugnis’, in order to subject ‘die armen, auf Freitische und Stipendien angewiesene Schüler’ [the poor pupils dependent on free meals and scholarships] to strict control on their transition to the university (op. cit., p. 14).

Amsterdam. Het Gymnasium te Amsterdam. Verslag cursus 1853-1854. (Held by the POW library: IJsbaanpad.)

It contains an interesting extract from the Huishoudelijk Reglement (house rules) for the Stedelijk Gymnasium te Amsterdam: on the awarding of penalty points and of prizes, on what is evidently a drawing of lots between prize candidates who finish equal, and (art. 25) on the form of the report (not to the pupil, but from the teachers to the Rector).
"Art. 8. The pupil who comes in after the gate has been closed shall be punished with a nota negligentiae. On continual repetition this punishment may be made heavier by the Rector.
Art. 9. The conduct and diligence of the pupils shall be judged by notae and faults, of which a separate record shall be kept. The notae are of two kinds: a nota negligentiae, equivalent to one fault; a nota malitiae, equivalent to ten faults.
Art. 10. For the encouragement of the pupils two kinds of prizes shall be given, namely one for good conduct and one for progress made.
Art. 11. To every pupil who has not been recorded with a nota by any of his teachers, a prize for good conduct shall be given, even over and above the one he might receive for progress made. Certificates for good conduct shall also be awarded to those who come closest to him.
Art. 12. In each class a prize or one or more certificates shall be given in each subject of study, but in the five highest classes of the first Division two prizes for progress made shall be given in Latin.
Art. 13. In special cases extraordinary prizes and certificates may be given. Such a certificate is always granted to him who has lost the prize by the drawing of lots.
Art. 14. No one shall receive more than one prize for progress made, but the inserted certificate shall state in which subjects he has earned it.
[Two articles follow on reciting a gratiarum actio, or delivering an oration.]
Art. 17. The teachers shall maintain order in the class in which they find themselves, and shall punish the pupil who behaves improperly as circumstances require.
Art. 18. The punishments consist, according to circumstances, of: a. Punishment work; b. Notae negligentiae; c. Notae malitiae; d. Temporary dismissal; e. Complete denial of instruction.
Art. 19. The punishment work shall, as far as possible, serve a useful purpose and shall not be set in too great a quantity. So that he has a more general overview of it, the Rector shall be informed of it.
Art. 20. When the pupil fails to hand in the punishment work at the appointed time, he shall for the first and the second time be punished with a nota negligentiae, and on further negligence be reported to the Rector.
Art. 21. A nota negligentiae is given for omission, sloppiness, disturbance of order, and the like. The pupil who has neglected to learn a lesson or to do some piece of work shall be punished with two notae negligentiae, and with one when the lesson has been recited badly or the work handled sloppily. Work copied from another is regarded as not done.
Art. 22. A nota malitiae is given for wilful absence, deliberate omission, disobedience or impertinence toward one of the teachers, and the Rector is informed of it immediately.
Art. 25. The several teachers shall, before or on the last day of each quarter, send in to the Rector an accurate statement of the Pupils of their Classes, mentioning the faults, notae and further particulars, as well as a report of the work done and samples of their progress."

Covington, M. V. (1992). Making the grade. A self-worth perspective on motivation and school reform. Cambridge University Press. ISBN 052134803X

Expectancy x Value theory parallels expected utility theory, one would think.
p. 42.
Another limitation of Atkinson’s theory concerns a preoccupation with achievement as a competitive phenomenon. Although competition can, and at times does, mean striving to overcome one’s own inexperience and ignorance, more often than not the experimental pardigms [sic] employed by researchers in the need achievement tradition have pitted one subject against another as the preferred way to arouse achievement striving. Also, as Nicholls (1989) observes, attempts to define task difficulty in this literature reflect a confusion as to what kinds of excellence are worth pursuing. Some researchers (e.g., Hamilton, 1974; Moulton, 1965) measure task difficulty against a yardstick of personal probability of success, that is ‘hard or easy for me,’ a self-reference that, according to Nicholls, implies the opportunity for an individual to exercise and extend his or her competency, irrespective of how well or poorly others are doing. However, other researchers (e.g., Meyer, Folkes, & Weiner, 1976; Trope & Brickman, 1975) have employed a normative definition of task difficulty - ‘hard for me compared to others’ - that is an open invitation to become preoccupied with one’s ability status. Here students are faced with the prospect that their best performance may still leave them feeling dissatisfied and incompetent. Although it is true that much achievement, especially in our society, involves winners and losers, not all accomplishments are driven by the competitive spirit of Hermes. Even McClelland occasionally questioned the universality of his own competitive metaphor. Further doubts were eventually raised by researchers who more recently have distinguished between task-involved and ego-involved motivation (Deci, 1975; Deci & Ryan, 1980; Nicholls, 1989). The grasping, self-absorbed image of Hermes as the archetypal achiever seems quite consistent with the concept of ego-involvement with its emphasis on immediate payoffs for work done and achievement for personal gain.
For this mentality, learning becomes a way to enhance one’s status, often at the expense of others, rather than a way to satisfy curiosity or to create personal meaning. These latter intrinsic goals are largely missing from the need achievement tradition. The larger point is that much of what can go wrong with achievement — as reflected by atypical shifts, irrational goal setting, and overweening anxiety — is the product of ego-involvement brought on by normative comparisons. This is why those who doubt their ability but still hope to salvage a reputation for competency, like John, prefer difficult tasks because most others will fail at them, too. This is also why those who are convinced of their incompetency, like Ralph, are likely to content themselves with the easy, low-risk assignment. Obviously, neither of these behaviors is desirable from the standpoint of making the most of one’s talent; fortunately, as we shall see, neither of them is inevitable.

CBS (1978). Eindexamencijfers vwo 1977. Mededelingen no. 7681, juli 1978.

Table 1. Graduates with a vwo day-school diploma, 1977, by mean final examination grade, sex and age

group            n    ≥ 8   ≥ 7.5    ≥ 7   ≥ 6.5    ≥ 6    < 6

total         1046     6%     10     22     29     32      1
men            573     7      11     23     31     27      1
women          473     4       8     21     27     38      1
≤ 18 years     480     9      13     24     26     27      0
≥ 19 years     566     3       7     20     33     36      2

Table 2. Graduates with a vwo day-school diploma, 1977, by mean final examination grade for the school examination (SO) and the written examination (S)

group     n    ≥ 9   8.5     8  7.75   7.5  7.25     7  6.75   6.5  6.25     6   < 6

SO     1046     0%     1     5     3     8     9    17    13    20    12     9     3
S      1046     0      2     3     3     6     6    11    10    18    12    15    15

Table 3. Graduates with a vwo day-school diploma, 1977, by mean final examination grade for the school examination (SO) and the written examination (S)

group       n    ≥ 8   ≥ 7.5    ≥ 7   ≥ 6.5    ≥ 6    < 6

SO       1046     7%     11     26     33     21      3
men       573     8      11     26     33     23      3
women     473     5      11     26     33     23      3
S        1046     5       9     17     28     27     15
men       573     6      10     19     30     25      9
women     473     3       7     15     25     29     23

Table 6. Male graduates of vhmo (voorbereidend hoger en middelbaar onderwijs) 1954-1955 and of vwo 1977, by mean final examination grade

group            n    ≥ 8   ≥ 7.5    ≥ 7   ≥ 6.5  < 6.5

vhmo 1956    12055     5%      9     20     31     34
vwo 1977       573     7      11     23     31     28

Fortgens (1958). Schola Latina.

For the 17th, 18th and 19th centuries Fortgens (1958) gives much information about promotion examinations, examination themata and promotions at the Latin schools. The promotion examinations mostly take place half-yearly, and in the 17th century they are indeed examinations taken before the rector and the school board. The promotion of those who move up to the next class is often a very solemn event for the town. In the 18th century the examinations become more of a formality, because promotion is decided on the basis of the faults made by the pupils and of their conduct.

There is a point system of 'good' and 'bad' points; to be promoted the pupil had to have a certain surplus of good over bad points. Fortgens on how this works in the 19th century (p. 179):
"Of all these notae the teachers keep a precise record. The notae are decisive for promotion, since promotion is possible only with a certain surplus of notae bonae over notae malae. The assessment of the pupils is now greatly simplified, since all performances, thema faults and behaviours have been brought under one denominator."
Although Fortgens gives no insight into the ratio of promoted to non-promoted pupils, it may be inferred from the ceremonies and the regulations that promotion was not in store for everyone. (See also Coebergh van den Braak (1988, p. 81).)
p. 115-116.
The system of prizes and themata, with its excessive stimulation of ambition, was pedagogically not beyond objection. Remarkable examples can be found in the literature. D. J. van Lennep (1774-1853), who in 1785 took his place in the lowest division of the Latin school in Amsterdam, tells in his autobiography how hard he sometimes had it.
When David Jacob van Lennep himself has a son at the Latin school, the game repeats itself. Jacob strikingly portrays the anxiety that could be read on his father's face when he asked about the result of the thema, the wavering tone of his voice, the gleam of pleasure or the sombre look that appeared depending on the result. Jacob too suffered terribly when he went home with the thought that he had a defeat to report; he too lied to his father . . . .

P. J. van Herwerden (1947). Gedenkboek van het Stedelijk Gymnasium te Groningen. Wolters.

p. 14:
The governance of the Latin school in Groningen lay in 1845 in the hands of three curators, appointed for life by the municipal council ( . . ). They supervised the promotion examinations and had the final decision in them; the same held for the final examination, so that it was ultimately they who determined which pupils of the school would be promoted to the university.
(van Herwerden, 1947, p. 17).
The whole course lasted four years, but each class had two ‘orders’ and every half year a promotion took place, at which the best pupils were presented with fine books, specially bound in leather and decorated with the arms of the city; how fine they were can be inferred from their cost, which around 1840 was fl. 700 per year.
On percentages of pupils repeating a year, little is to be found in Herwerden. For example on p. 43, about the public promotions:
Initially all pupils were summoned to the Concerthuis. For many of them the rector's oration (people in those days were, after all, long-winded) was a penance, because they did not know whether they had passed or failed or would have to submit to a re-examination. They sat waiting with pounding hearts until, amid all that solemnity and stateliness, their verdict would be pronounced. In 1882 the public parading of those who had failed was brought to an end, and they were told their fate a day beforehand.
and on p. 128:
The final examinations had, as before, a good result. At promotion, however, far more pupils had for a long time been failing than in the nineties of the previous century. Then, as we know, it had been 15% of the whole. In the years 1927 to 1936 it averaged almost 24%; a seriously regrettable increase. This is probably more the result of a decline in the level of the pupils than of the setting of higher demands.
Van Herwerden does give a list of all admitted pupils from 1847 up to and including 1946, but no promotion or final examination statistics.
p. 56:
Re-examinations were mentioned just now; these, at the time called after-examinations (na-examens), had been instituted in 1850 on the proposal of rector Schneither, namely for pupils who at the summer holidays showed insufficient knowledge in one of the main subjects and who previously, in such cases, had been kept back. Sometimes a pupil was given two re-examinations. Initially after-examinations were imposed to a very limited extent; later their number rose. The re-examinations were usually decided on by a committee consisting of curators, rector and those teachers who had set them, i.e. by a committee similar to that for the entrance examinations. In case of disagreement the curators always had the casting vote. The minutes of the meetings of these committees have been preserved since 1861, but they are of little importance. The after-examinations were real examinations, and although by far the most of those summoned passed, quite a few pupils were nevertheless rejected over the years. Some of the latter left the school disappointed.
p. 93:
During the further course of the meeting [the teachers' meeting of 19 October 1901] it was unanimously decided, on the proposal of the rector, to propose to the curators the abolition of 1. the so-called winter examination, 2. the public promotion and 3. the prizes at promotion. It should be noted here that the winter examination had in fact been a sham ever since the abolition in 1847 of the half-yearly promotion; for that matter, the summer examination was not much different and was also abolished some years later.
Chapter IV has more on the struggle shortly after 1900 over the grading system and re-examinations.
p. 41:
Every pupil kept the points list of his whole class. The special negligence or ignorance booklets used for this were bought from the custos. As soon as a teacher said that one or more faults had to be recorded, all pupils flipped open their desk compartments, took out the booklets and filled in what they were told, of which the teacher himself of course also kept a record. The downside of this system, which could so easily evoke wrong feelings and practices, is clear. In most classes the two or more rivals for the position of primus had their following, with all the bad consequences of that. There were pupils who noted the faults of others with a certain malicious pleasure, coupled with self-exaltation. An undesirable spirit of competition could disturb mutual relations, and not all boys were equally well able to withstand the inner tension aroused by this hunt for faults; others, however, made fun of it, as did the occasional teacher.
Van Herwerden (1947, p. 54) includes an illustration of a school report from 1893 in which the number of notae is stated for the separate subjects, with a total of all notae across all subjects. Here (in Groningen) there is thus no question yet of grades.
p. 53:
"From the period we are now discussing, i.e. from 1865, dates a list at the Groningen gymnasium that could be called the forerunner of the school report: on the first of April, and this would henceforth happen on that date and on 1 December, the rector sent for the first time to all parents a list recording for their son the rank number that he had in his class for each subject. Not long afterwards this was changed to the extent that twice a year the parents were told, besides his rank number, also his number of faults and notae and those of number one and of number last in the class."
p. 40, repeating a year:
"The lists on which all this was recorded were not only decisive for the awarding of the prizes but had, alongside and more than the summer examination, influence on whether or not pupils were promoted."
In 1901 the new rector van Geer came with the proposal to change the system (p. 94 ff.).
"In place of the lists with rank numbers customary until then, others were to come with grades ( . . ). These reports were to be sent to the parents a few times a year and signed by them. ( . . ) Thus at the Groningen gymnasium the notae system, which hardly existed anywhere in the Netherlands any more and was despised by nearly all pupils at that time, was abolished."
Van Herwerden gives no substantive arguments that might have been exchanged at the transition to the present grading system. The transition was immediately to a ten-point system, of which some examples are given on pp. 99-100. It would not surprise me if closer study of such school discussions were to show that the resistance to the system of rank ordering was above all resistance to its ostentatiously competitive character: everyone who moves up pushes at least one other down. The drama is then of course that the grading system only removes the ostentation from the competitive system, and that this must have gone entirely unrealized at the time, just as today it remains concealed from most teachers how competitive, how relative, grading is. Might it make a nice thesis project to dig up a number of those discussions from the archives of one school or another and analyse them with respect to this proposition?

Ball, W. W. R. (1921). Cambridge notes, chiefly concerning Trinity College and the University. Cambridge: Heffer & Sons.

This is a key book, and what matters is chapter XXI, The mathematical tripos, 259-311. NB: This examination could be taken only once, a fact that remains rather implicit in this chapter, evidently because Ball took it for granted. Extensive passages are important; for those, see the copy. Precisely because so many key points from the history of assessment come up in this history, I have given it extensive attention. That attention is also justified because this examination served as a model for the design of examinations elsewhere in society, in which former Cambridge students were often involved. What about Oxford? In developing its examinations it followed what happened in Cambridge, to summarize it that way for now; Ball does not go into it at all.

A number of points:
:264 In 1710, in connection with building work, all candidates were summoned for questioning on the same day, and the questioning took place in English and in public, on philosophy and mathematics. The ‘proctors’ had of old the right to put questions to candidates after a disputation, and this form rested on that.
:266 With this new form of examination, which continued after the shortage of space had been remedied because examining in this way was really very convenient, it became possible to establish serious rank orders among candidates, which were printed from 1748 onward. Little is known about the preceding period. From the beginning much attention was paid to accurate rank ordering, and much time was also set aside for consultation about it among examiners. In the beginning candidates from the same college were examined together, all orally of course.
:268 In 1763 candidates were grouped according to performance in preceding disputations, and the examination took place with groups of candidates of roughly equal ability.
:270 In that year it was also decided that the relative position of the senior and second wrangler would be determined by the examination, not by the disputations. (Neither Ball nor Rothblatt goes into this manner of grouping, or into the role of college membership in the competition. Yet that is an interesting theme, because with rank ordering evidently not only personal honour but also that of one's own college is at stake, about which Ball does make an occasional remark. In the search for the engine of the development toward a strongly competitive examination, the existence of those colleges probably plays an important role.) After 1770 the questions were still dictated, but the answers had to be written out.
:272 (quotes from The works of J. Jebb, London, 1787, vol. II, pp. 290-297), and this concerns the examination in 1772:
When the division under examination is one of the highest classes, problems are also proposed, with which the student retires to a distant part of the senate-house, and returns, with his solution upon paper, to the moderator, who, at his leisure, compares it with the solutions of other students, to whom the same problems have been proposed.
(It is indeed about comparing, because the result of the examination is a rank order. How exactly the comparison was made remains unclear; in any case there was no marking of each completed problem, and it is equally unlikely that rank lists were drawn up per problem and then summed over a series of problems into a final rank list. The whole must have proceeded very globally and impressionistically, which is also why half a day, later even a whole day, was set aside for consultation about the correct order. bw)
:273 The competition between colleges shows in the figure of the college fathers, who have the privilege of thoroughly examining candidates from other colleges! (Jebb reports this.)
:274
It may be added that it was now (ca 1772, b.w.) frankly recognized that the examination was competitive. Also that though it was open to any member of the senate to take part in it, yet the determination of the relative merit of the students was entirely in the hands of the moderators. Although the examination did not occupy more than three days it must have been a severe physical trial for anyone who was delicate. It was held in winter and in the senate-house: that building was then noted for its draughts, and was not warmed in any way; and, according to tradition, on one occasion the candidates on entering in the morning found the ink frozen in the pots on their desks.
(This recalls the conditions of the Chinese examinations!, bw).
:283-4 (examinations ca. 1802)
At eight o’clock on Thursday morning a first list was published with all candidates of about equal merits bracketed. Until nine o’clock a candidate had the right to challenge anyone above him to an examination to see which was the better. At nine a second list came out, and a candidate’s right of challenge was then confined to the bracket immediately above his own. If he proved himself the equal of, or better than, the man so challenged, his name was transferred to the upper bracket. To challenge and then to fail to substantiate the claim to removal to a higher bracket was considered rather ridiculous. Revised lists were published at eleven, three, and five, according to the results of the examination during that day. At five the whole examination ended. The proctors, moderators, and examiners then retired to a room under the public library to prepare the list of honours, which was sometimes settled in a few hours, but sometimes not before two or three the next morning. The name of the senior wrangler was generally announced at midnight, and the rest of the list the next morning.
:284
It is clear from the above account that the competition fostered by the examination had developed so much as to threaten to impair its usefulness as guiding the studies of the men. On the other hand, there can be no doubt that the carefully devised arrangements for obtaining an accurate order of merit stimulated the best men to throw all their energies into the work for the examination. It is easy to point out the double-edged result of a strict order of merit. The problem before the university was to retain its advantages while checking any abuses to which it might lead.
:289
The tendency to cultivate mechanical rapidity was a grave evil, and lasted long after Whewell’s time. According to rumour the highest honours in 1845 were obtained by assiduous practice in writing.
(Restriction of the available time was thus an important factor, bw).
:297 (on marks)
Mr Earnshaw, the senior moderator in 1836, informed me that he believed that the tripos of that year was the earliest one in which all the papers were marked, and that in previous years the examiners had partly relied on their impression of the answers given.
(Possibly this last development, coupled with a firm faith in the comparability of the marks thus awarded, led to the abolition in 1839 of the system of a-priori grouping for the examinations.)
New regulations came into force in 1839. The examination now lasted for six days, and continued as before for five hours and a half each day: eight and a half hours were assigned to problems. Throughout the whole examination the same papers were set to all candidates, and no reference was made to any preliminary classes.
(The marking system was evidently a points system, given the following intriguing remark by Ball.)
:299
I may, in passing, mention a curious attempt which was made in 1853 and 1854 to assist candidates to estimate the relative difficulty of the questions asked. This was effected by giving the candidates, at the same time as the examination paper, a slip of paper on which the marks assigned for the book-work and rider for each question were printed. I mention the fact merely because these things are rapidly forgotten, and not because it is of any intrinsic value. I possess a complete set of slips which came to me from Todhunter.
:300 In 1873 the examination questions were divided into five divisions, and for each it was indicated what proportion of the total marks could be earned there.
The assignment of marks to groups of subjects was made under the impression that the best candidates would concentrate their abilities on a selection of subjects from the various divisions. But it was found that, unless the questions were made extremely difficult, more marks could be obtained by reading superficially all the subjects in the five divisions than by attaining real proficiency in a few of the higher ones: while the wide range of subjects rendered it practically impossible to cover all the ground thoroughly in the time allowed.
This problem led in later years to numerous proposals, compromises, and changes in the content and weighting of the examinations. Ultimately it meant the demise of the homogeneous examination (:302). In 1908, under yet another new regulation, ranking was abolished (:303).
:303 An exposition of the role of private tutors (coaches). That role went very far; in effect they provided the teaching that the regular professors should have given. For the students this therefore entailed high extra costs. On p. 305 Ball describes how a well-known coach had organized his teaching, and adds:
Under Hopkins and Routh there was no trace of what is called cramming.
:306
The scandal of the system consisted of the fact that a man was compelled to pay heavy fees to the University and his College for instruction, and yet found it advantageous at his own expense to go elsewhere to get it.
(This again closely resembles the actual educational situation in China, and suggests that it is a phenomenon accompanying meritocratic assessment.)
:307 an extensive exposition of the origin of the term tripos.

L. R. Aiken, Jr. (1963). The grading behavior of a college faculty. Educational and Psychological Measurement, 23, 319-322. abstract

Refers back to
- Helson, H. (1948). Adaptation level as a basis for a quantitative theory of frames of reference. Psychological Review, 55, 297-313. abstract
- Hollingworth, H. L. (1910). The central tendency in judgment. Journal of Philosophy, Psychology, and Scientific Methods, 7, 461-468. complete volume online
- Webb, S.C. (1959). Measured changes in college grading standards. College Board Review, 39, 27-30. [no online file found, 2013]
'It is the purpose of this paper to show that, whatever teachers may say, they usually grade with reference to the existing ability level of their students, i.e., intuitively or statistically, they 'curve' their grades. Although this may sometimes be the fairest procedure, the meaning and interpretation of such grades, when the ability level changes annually, may be a problem of some concern.'
'Webb (1959) has discussed the 'freezing' of grades at one college. Selection of students of greater ability was not followed by higher grades. Presently, the same situation exists at the Woman's College of the University of North Carolina, and probably elsewhere as well. In 1959, the Woman's College began selecting students on the basis of a multiple regression equation, which consisted of assigning numerical weights, based on the 1958 freshman class, to three predictor variables: Scholastic Aptitude Tests - Verbal (SAT-V), Scholastic Aptitude Tests - Mathematical (SAT-M), and a converted two-digit score of rank in high school graduating class (HSR). This equation, Predicted Grade (PG) = .037 SAT-V + .010 SAT-M + .328 HSR - 21.98, yielded an R of .70 and was used to predict freshman average grades. Due to the progressively decreasing selection ratio for the years 1959, 1960, and 1961, the mean scores on SAT-V, SAT-M, and HSR for students who were admitted progressively increased, but increases in the means of the predictor variables were not accompanied by an increase in the criterion mean.'
```
year          1959 (N=738)       1960 (N=894)       1961 (N=953)
variable      mean    st dev     mean    st dev     mean    st dev
SAT-V       453.90    87.27    480.71    83.05    492.66    77.16
SAT-M       453.94    77.92    469.89    76.26    485.26    74.59
HSR          60.61     7.09     61.37     6.54     62.60     6.30
PG           19.23     5.29     20.63     5.39     21.63     5.32
FAG          19.45     6.90     19.49     6.90     19.68     6.63
R              .64                .63                .60
```
Note: PG = Predicted Grade, FAG = Freshman Year Average Grade.
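
As a rough check on the quoted regression equation, the sketch below plugs the tabulated yearly means of the three predictors into it and compares the result with the observed FAG means. The names `predicted_grade`, `MEANS` and `gaps` are illustrative choices of mine, not Aiken's, and the SAT-M weight is taken to be .010, the value with which the equation reproduces the tabulated PG means.

```python
# Weights and intercept of the admission equation quoted above.
# Assumption: the SAT-M weight is .010 (the value that reproduces the PG row).
W_SAT_V, W_SAT_M, W_HSR, INTERCEPT = 0.037, 0.010, 0.328, -21.98


def predicted_grade(sat_v, sat_m, hsr):
    """Predicted freshman average grade (PG) from the three predictors."""
    return W_SAT_V * sat_v + W_SAT_M * sat_m + W_HSR * hsr + INTERCEPT


# Yearly means from the table: (SAT-V, SAT-M, HSR, observed FAG mean).
MEANS = {
    1959: (453.90, 453.94, 60.61, 19.45),
    1960: (480.71, 469.89, 61.37, 19.49),
    1961: (492.66, 485.26, 62.60, 19.68),
}


def gaps():
    """Per year: (PG at the mean predictors, observed FAG mean minus PG)."""
    out = {}
    for year, (sat_v, sat_m, hsr, fag) in MEANS.items():
        pg = predicted_grade(sat_v, sat_m, hsr)
        out[year] = (round(pg, 2), round(fag - pg, 2))
    return out


if __name__ == "__main__":
    for year, (pg, gap) in gaps().items():
        print(year, pg, gap)
```

Under these assumptions the predicted mean rises by about 2.4 points over the three years while the observed FAG mean stays near 19.5, which is the 'freezing' the quotation describes.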

Davis, J. (1964). Great aspirations. Aldine.

Mentioned in Milton, O., Pollio, H.R., & Eison, J.A. (1986). Making sense of college grades. Why the grading system does not work and what can be done about it. Jossey Bass. [POW XIX 6-41], p. 17:
The National Opinion Research Center (Davis, 1964) surveyed 1,637 students from 135 colleges and universities; all of the students had been National Merit Scholarship holders, finalists, or semifinalists. It was found that 70 percent of that select group had received high grades in lower-quality colleges and universities while only 36 percent had received high grades in higher-quality ones. But many of these intellectually superior young people who had less than a B+ average from high ranking schools chose not to pursue graduate study; many underestimated their ability and potential on the basis of feedback provided by GPA’s. Thistlethwaite (1965) obtained similar results in a survey of two thousand students attending 140 different institutions, suggesting the damaging personal effects of the GPA on academically talented students.
D. Thistlethwaite: Effects of college upon student achievement. Cooperative research project #D-098, USOE. Nashville: Vanderbilt University. [UB Leiden SOCIOL R4-176].

W. C. Eells (1930). Reliability of reported grading of examinations. Journal of Educational Psychology, 21. abstract

Cox (1969, p. 71-72) gives some results from this publication.
In 1930 W.C. Eells had sixty-one teachers re-mark two history and two geography essays at an interval of eleven weeks. The average correlation between the two markings was 0.365. On individual essays it was 0.25, 0.31 and 0.39. Three years later G.P. Williams (1933: The Northampton Study Composition Scale, London) reported that when almost a hundred teachers marked fifty mathematics essays, one of these received marks from 16 to 96 (out of 100) and another from 26 to 92.

Lienert, G.A. (1987). Schulnotenevaluation. Frankfurt a.M.: Athenäum. [UB Leiden? 3895 A 27].

An exceptionally disappointing book: fussy fiddling over scales, correlations, etc., with no attention whatsoever to the role that grading plays in the educational process and in learning processes.

On p. 40 a brief overview of grading scales in various countries, I suspect based on Schulze, see below.
Decidedly naive ideas appear in the section 'Desiderate der Zeugnisbenotung,' where differences in the 'importance' of subjects are related to 'strictness' of grading, legitimizing, as it were, what De Groot identifies as a pernicious excess.
On pp. 108-110 and 116-117 a factor analysis of correlations among 7 subjects, for 297 pupils. The 'Hauptachsenmethode' yields one strong general factor; varimax rotation yields four factors (with high loadings for history/geography; biology/physics; music; and German, respectively. Mathematics also loads on the first and fourth of these factors.)
The bibliography includes, among others:
- Becker, H., & v. Hentig, H. (Hrsg): Zensuren, Lüge, Notwendigkeit, Alternativen. Frankfurt/M: Klett-Cotta, 1983.
- Beckmann, H-K. (Hrsg.): Leistung in der Schule. Braunschweig: Westermann, 1978.
- Friedrich, L., & Köhler, K. (Hrsg.): Zeugnisnoten und Numerus Clausus. Kronberg: Scriptor, 1975.
- Groot, A.D. de: Fünfen und Sechsen. Zensurengebung, System oder Zufall. (aus dem Niederländischen von A. Piechorowski.). Weinheim: Beltz, 1971.
- Heller, K. (Hrsg.): Leistungsdiagnostik in der Schule. Bern: Huber, 1984.
- Höhn, Elfriede: Der schlechte Schüler. Sozialpsychologische Untersuchungen über das Bild des Schulversagers. München: Reihe Erziehung in Wiss. und Praxis, Band 2, 1970.
- Hofer, M. (Hrsg.): Informationsverarbeitung und Entscheidungsverhalten von Lehrern. Beiträge zu einer Handlungstheorie des Unterrichtens. München: Urban & Schwarzenberg, 1981.
- Rheinberg, F.: Leistungsbewertung und Lernmotivation. Göttingen: Hogrefe, 1980.
- Roeder, P.M., & Treumann, K.: Dimensionen der Schulleistung. Stuttgart: Klett, 1974, Bde 1-2.
- Sacher, W.: Praxis der Notengebung. Bad Heilbrunn/Obb: Klinkhardt, 1984.
- Tscherner, K.: Schulleistungsentwicklungen in Abhängigkeit von Beurteilungsprozessen. Meisenheim/Glan: Hain, 1979.
- Wendeler, J.: Schulsystem, Schülerleistungen und Schülerauslese. Weinheim: Beltz, 1974.
- Ingenkamp, K. (Hrsg.): Die Fragwürdigkeit der Zensurengebung. Weinheim: Beltz, 1977.
- Kutscher, J. (Hrsg.): Beurteilen oder verurteilen. München: Urban und Schwarzenberg, 1977.
- Schulze, W. (Hrsg.): Paedagogica Europaea. The European Yearbook of Educational Research, Vols. 1-10. Braunschweig: Westermann 1967-1975. mit Bd. 8 (Schülerberatung und Schülerbeurteilung im europäischen Erziehungssystem).

Bartels, A. (1963). Een eeuw middelbaar onderwijs 1863-1963. Wolters.

p. 98:
At last the inspectors had success: after a year of negotiation a Royal Decree of 13 August 1873 (S. 121) appeared, by which articles 21 and 22 of the Regulations for the state hogereburgerscholen and agricultural schools were amended as they wished. Henceforth no one would be admitted to any class of the school ‘other than after passing a public examination, conducted by the director and the teachers, showing that he possesses the knowledge required to attend the instruction in that class with profit. The same applies to promotion to a higher class’. A public examination was made compulsory for that as well. The oldest mention of a grading scale that I have been able to find so far, and it is at once the scale from 1 to 10, is in the 'Voorschrift betreffende het eindexamen der Hoogere Burgerscholen' of Minister Heemskerk, May 1868, later the general regulations for the final examinations of the hogereburgerscholen (Royal Decree of 10 March 1870, S. 49; the regulations are referred to in articles 55 and 57 of the secondary education act). The regulations established a certain uniformity in examination questions and assessment, after experience had shown that examining was decidedly not done in the same way in the various provinces. (See Bartels, p. 110 ff.)
p. 112:
For the decision on admitting candidates, the subjects covered by the examination were divided into five divisions: A. mathematics and mechanics; B. natural sciences; C. historical, political and commercial sciences; D. language and literature; E. freehand and geometrical drawing. The final judgement on the candidates' knowledge in each of these divisions was expressed by one of the marks from 1 to 10, with 5 being just sufficient. If a candidate had been awarded the mark 5 or higher for each of the five divisions, he was issued the certificate of a satisfactorily passed examination. If he had obtained the mark 4 or lower for one or more of those divisions, the committee deliberated on his admission and, if necessary, decided by majority vote.
(Bartels reproduces in full the introduction to the Algemeen Reglement of 1870, with the programme of requirements for the final examination, which speaks very emphatically of testing for insight, not of testing for possible gaps in knowledge of matters of minor importance.)
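
The decision rule quoted from p. 112 can be written down as a small sketch. The function name `hbs_outcome` and the two outcome labels are hypothetical; the rule itself (five divisions A to E, marks 1 to 10, a certificate when every division has 5 or higher, committee deliberation otherwise) is taken from the passage. What the committee then decided by majority vote is not modelled.

```python
# The five divisions named in the 1870 regulations, as quoted by Bartels.
DIVISIONS = ("A", "B", "C", "D", "E")


def hbs_outcome(marks):
    """Apply the p. 112 rule to a dict mapping division letter -> mark (1..10).

    Returns "certificate" when every division has 5 or higher, and
    "deliberation" when at least one division has 4 or lower.
    The labels are illustrative, not terms from the regulations.
    """
    if set(marks) != set(DIVISIONS):
        raise ValueError("expected exactly one mark for each division A-E")
    if any(not 1 <= mark <= 10 for mark in marks.values()):
        raise ValueError("marks run from 1 to 10")
    if all(mark >= 5 for mark in marks.values()):
        return "certificate"
    return "deliberation"


if __name__ == "__main__":
    print(hbs_outcome({"A": 5, "B": 6, "C": 7, "D": 5, "E": 10}))
    print(hbs_outcome({"A": 9, "B": 9, "C": 4, "D": 9, "E": 9}))
```

The rule is conjunctive: a single mark of 4 sends even an otherwise excellent candidate to deliberation, with no averaging across divisions.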
p. 126: By Royal Decree of 8 June 1929 (S. 310) new regulations and a new programme for the final examinations of the hogereburgerscholen were once again introduced. ( . . . ) The mark 5, which since the beginning of the hogereburgerschool final examination had denoted 'just sufficient', acquired the meaning 'almost sufficient'. A stream of articles in the professional press and beyond was devoted to this fact, and it was even raised in the Tweede Kamer, so that it truly seemed as if Minister Waszink's informant was right in holding that this change was the most important in the legislation on secondary education since 1863. ( . . )

Casimir, R. (1934). Het Nederlandsch Lyceum 1909-1934. Wolters.

p. 130:
( . . ) Each year the results are reviewed, not only by the number of pupils who passed, but also by the marks. There are pupils who sit very fine final examinations. For the twelve subjects of the HBS final examination one pupil obtains 109 points in 1924; in another year the highest total is 99, in yet another 103. For the gymnasium final examination two pupils pass in 1925 with nothing but fours and fives (on the scale where 5 is the highest); one of them had skipped the fifth class of the gymnasium.

Cools, J. (1984). Geschiedenis van het College te Herentals. Herentals: Oud-leerlingenbond van het Sint-Jozefscollege.

In Belgium, in the nineteenth century and well into the twentieth, a points system was used: out of a total number of obtainable points of, for example, 380, a pupil had to obtain at least half in order to move up ('descendre'). See Cools (1984) for some details (par. 3 De prijskampen, p. 288; par. 4 De staatsprijskamp, p. 290; par. 5 Succes en mislukking, p. 353). Unfortunately very little historical material has survived for the College at Herentals described there. For the award of prizes, the points system worked with criteria expressed as having obtained 6/10, 7/10 and 8/10 of the points.
Little is known about the criteria for promotion to a higher class. One had to have half of the points in total . . .
Cools, p.351.
It could therefore be that the grading scale from 1 to 10 derives from this, namely from having obtained that share of the available points (this is my supposition; Cools says nothing about it). Nor is there any indication of the history of this points system, but here it is conceivable that the points system grew out of the system of notae.
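
The arithmetic of this points system is simple enough to sketch. The function names `promoted` and `prize_level` are hypothetical, and treating each criterion as "at least this share of the total" is an assumption; the source gives only the fractions (one half for promotion; 6/10, 7/10 and 8/10 for prizes) and the example total of 380.

```python
from fractions import Fraction

# Prize criteria named in the text, highest first.
PRIZE_LEVELS = (Fraction(8, 10), Fraction(7, 10), Fraction(6, 10))


def promoted(points, total):
    """True when at least half of the obtainable points was earned."""
    return 2 * points >= total


def prize_level(points, total):
    """Highest prize criterion reached, as a fraction of the total, or None.

    Assumption: a criterion counts as met when the share of points earned
    is at least that fraction.
    """
    share = Fraction(points, total)
    for level in PRIZE_LEVELS:
        if share >= level:
            return level
    return None


if __name__ == "__main__":
    total = 380  # the example total mentioned in the text
    for points in (189, 190, 228, 266, 304):
        print(points, promoted(points, total), prize_level(points, total))
```

With a total of 380, promotion requires 190 points, and the three prize criteria correspond to 228, 266 and 304 points.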

Jordens, P.H. (1906). Wet van den 17 augustus 1878, S. 127, tot regeling van het lager onderwijs. Tjeenk Willink.

In primary education, and teacher training, the numerical assessment was likewise laid down by Royal Decree (24 October 1884, S. 219) for the examination for the certificate of competence for home and school teaching (Jordens, 1906, p. 93):
The judgement on the knowledge and competence of those examined in each subject is expressed by one of the marks from 1 to 6, to which the following meaning is to be attached: 6 very good, 5 good, 4 sufficient, 3 insufficient, 2 poor, 1 very poor. The award of the marks mentioned in the preceding article and the result of the examination are decided by a majority of the votes cast. A tie in the votes is deemed to be a decision in the sense least favourable to those examined.
A Royal Decree of 17 December 1890, on the examination for the certificate of competence as head teacher, male or female (see Jordens, 1906, p. 108), mentions the grading scale from 1 to 10, with 5 as 'doubtful'. Here too the decision is by majority vote, so evidently not by averaging independently given marks. Other places in Jordens (1906): pp. 114, 117, 128, 136 (scale 1 - 10).

Roelants to Van Gobbelschroy, 21 Aug. 1827. In Nooij, J. de (1939). Eenheid en vrijheid in het nationale onderwijs onder koning Willem I. Utrecht, doctoral thesis.

p. 176-179.
Your Honour will find herewith a statement of the notes kept by the Professors of the Institution regarding the progress made by the students in this academic year: these notes, as in the previous year, all bearing on the outcome of the examinations held; while here once again the same distinctions made by the Professors have been observed: excellent 22; very praiseworthy 43; very good 66; good 55; mediocre 13; weak 8; absent during the examinations 20
Amusing that 'mediocre' is so emphatically not in the middle! Striking how high the average judicium for these examinees turns out! Can I unearth more information in the various commemorative volumes?

Codina Mir, G. (1968). Aux sources de la pédagogie des Jésuites; le ‘Modus Parisiensis.’ Roma. https://archive.org/details/bhsi28

p. 168:
One of the most characteristic features of the schools of the Hieronymites is the distribution of pupils into decuries. Overwhelmed by the large number of pupils crowding into each class, the Brothers in fact subdivide their classes into groups of eight or ten pupils (decuriae), at the head of which they place a decurion (decurio, monitor). The decurion's task, as Sturm knew it at Liège, is to watch over the conduct of his classmates, to note any faults, and to report them to the Rector. The decurion is changed every week. If he is negligent in carrying out his duty, he is relieved of his post. It is Jean Cele who is said to have perfected the system of decuries at Zwolle, and from there it is said to have passed to the other schools. Sometimes the term is not ‘decuries’ but ‘octuries’, as at Deventer. Thus we know that Erasmus was in the same octurie as Timann Kemener, the future Rector of the school at Münster.
p.160:
Cele's great innovation at Zwolle was to be ( . . . ) the adoption of the system of dividing the pupils into several classes, each with its own master or person in charge at its head.
Codina Mir refers among others to Schoengen (1898, p. 107 [see the transcription on my website]); Hyma (1950, p. 93); Post, pp. 95, 96.
(p. 161):
( . . . ) The pupils drawn to Zwolle under Cele numbered in the hundreds, coming from Cologne, Trier, Liège, Westphalia, Holland, Saxony, Cleves, Geldern and Frisia. In order to teach and govern this whole crowd, Cele devised the solution that was to constitute one of the principal pedagogical discoveries in the whole history of education: the pupils would be distributed into eight distinct classes, each with its own special programme, its master or person in charge at its head, and its own room or place within the school. (‘Et quia octo locis separatis scolam suam distinxerat, singulis locis speciales lectiones distribuens . . . ’, J. Busch, Chron. Windesh., 206) Because of the place they occupied, these classes were at first known as loci, and their pupils as the locistae. Later we also find the name ordo.
p. 171:
To return to Zwolle and Jean Cele, we can affirm that, if he did not invent the system of decuries, he at least had the originality of having secularized this monastic custom and of having extended and applied it to the teaching of letters. It was the large number of pupils and the desire to ease the masters' work that must have led him to adopt the system. The pedagogical consequences will be no less significant. It is notably this procedure that will bring with it elements as characteristic of the Brothers' pedagogy as the sharing of responsibilities, teamwork, mutual teaching among pupils, and the appeal to emulation.
p. 172:
Promotion examinations at the level of literary studies, with a view to promotion to a higher class, constitute another novelty of the schools of the Hieronymites. It is not surprising that no trace of this practice is found before the Brothers, since it makes sense only given the existence of the division into classes, itself likewise introduced by the Brothers. ( . . . ) In the time of Hegius, the system of examinations and promotions is already well established at Deventer. One moves up to a higher class only after an examination. Two examination sessions take place each year, at the end of the semester, that is, one session at Easter and the second probably in October. The time to be spent in each class is not specified: passage to a higher class is in a sense made to measure, according to the abilities and level of knowledge of each pupil. The formula of semestrial examinations allows a pupil to be promoted as soon as he is ready, without staying longer than necessary in a lower class. The duration of studies is consequently quite elastic, depending on the individual. In the lower classes one can easily pass to the next class after six months, whereas in the higher classes it seems that one stays longer on average. ( . . . ) We also know that the semestrial examinations marked the time when pupils entered into or renewed their contract with their masters for a period of six months.
p. 173:
In certain exceptional cases it seems that one can ‘skip’ the initial stages. In Butzbach's memoirs there is mention of two prodigy pupils at Deventer, Paul de Kitzingen and Pierre de Spira, who after examination were placed directly in the 3rd class, ‘which rarely happened’. One of them, of whom we are told that he was first in his class (a most interesting detail, proving that there was a class ranking!), was even able to pass to the 2nd after six months, at the following examination session. (‘Quorum etiam post dimidii anni sessionem alter primus in ordine cum ingenti laude . . . ad secundum migravit locum.’ J. Butzbach, Hodoeporicon, 251.)
p. 173:
The examination system followed in Sturm's time appears far more developed than that of Deventer. Each year, on 1 October, the solemn ceremony of the promotiones, or passage to a higher class, takes place, once the examinations are completed on the basis of which the order of the pupils in each class has been drawn up. Pupils who are not content with the place that has fallen to them may at this moment lodge a kind of appeal, challenging the better-placed pupils to a duel on the ground of a style exercise, a set theme, or an improvisation that allows them to measure themselves against one another. If this contentio turns out in favour of the lower-placed pupil, the two classmates exchange places, after confirmation by the Rector. For their part, the two best-placed pupils of each promotion each receive a prize, the first prize being, of course, of greater value than the second, unless the contentio has revealed forces that are evenly matched. Thus, if the first receives the complete works of Virgil, the second must receive only the Georgics! The custom of awarding prizes to the best pupils was also in force at Liège at the time when Sturm was studying there. It is likely that it was also practised in the other schools of the Hieronymites, at least in those of the same period. At Liège, each month the pupils bring a little money to buy a book or a small reward for the one who, in the master's opinion, has most distinguished himself.
(Codina Mir refers to M. Fournier, Statuts, IV, 20. Full bibliography: Fournier, Marcel, & Charles (1890-1894). Les statuts et privilèges des universités françaises depuis leur fondation jusqu’à 1789. I-IV. Paris.)
p. 319:
As in Paris, the Messina programmes insist on the need to lay solid foundations in grammar and to be ‘sufficiently’ grounded in it before advancing further.
p. 320:
On entering the College, all pupils must be examined and placed in the class that corresponds to their level of knowledge. The basic principle, very Parisian, is that no one should follow a subject that is beyond his powers (supra captum). It is nevertheless very striking to note in this regard a liberty conceded to Italian usages. ( . . . ) In his De Universitate, Nadal specifies that the ideal would be that no one should follow courses supra captum in classes other than those that suit him. However, he says, the Society intends to oblige only its own students to this. As for the others, after a diligent examination of their capacity, they will be strongly urged not to waste their time in a class that would exceed their captus. If they persist, their parents or guardians will be informed. But, curiously, they will not for all that be excluded from the classes should they disregard the Fathers' advice. On this point again the Italian manner prevailed over the Parisian, offering at least the advantage of a possible self-correction after the event! Apart from the entrance examination we have just mentioned, no other mention is made in the Messina programmes of other examinations with a view to promotion to a higher class. Naturally, in the higher faculties the obtaining of the various degrees is conditional on different tests or examinations, minutely laid down, following the tradition common to all Universities. But we find no explicit mention of semestrial or annual promotion examinations at the level of letters, such as we have seen, for example, at Strasbourg and Lausanne. Which does not prove that there were no promotions. The mere distribution of pupils into classes, indeed, already required passage from one class to another.
We have also heard du Coudret say that in 1550-1551 the pupils of the 3rd class could not ‘move up’ to the 4th, because of the Greek that had been added to the programme. Now it seems evident that these ‘ascents’ could take place only on the authorization of a teacher, and very probably following an examination. This is also what we are led to suppose by one of the Rules promulgated by Nadal in Spain in 1553, which doubtless reflect the usages of Messina: ‘At the beginning of the year, when the lessons are renewed, the scholars will also be examined and will be transferred from the lower classes to the higher. This will be done by the Rector following the advice of the regents, taking special account for each class of the advice of its own regent.’ Nadal adds: ‘this transfer of scholars to the higher classes will also take place during the year, when it may be useful to the pupils, always following the same procedure.’
(Codina Mir, 1968, p. 322)
As for the range of exercises practised in each class, they could not be more varied. Already the programme of 1548 announced ‘repetitions, interrogations, concertations, compositions, declamations, and other similar exercises suited to each according to the manner and order in use at Paris’.
(Codina Mir, p. 322)

Lindquist, E. F. (1963). An evaluation of a technique for scaling high school grades to improve prediction of college success. Educational and Psychological Measurement, 23, 623-646.

Lindquist is not directly interested in differences in grading standards, only in the predictive value of grades. He uses correlational techniques, which is less interesting to me. But Lindquist does give tables of rescaled grades for schools and colleges, and these show at once that there are indeed substantial differences. In his Table 1 he reports, for a group of 10 colleges, that a GPA of 2 yields rescaled values between 2.34 and 2.94, and likewise for high schools between 2.05 and 2.98. The institutions in Lindquist's study are 'average' schools and colleges that do not stand out by selectivity or by a one-sided composition of their student groups. This is different in the earlier study by Bloom and Peters (1961), which involved precisely the selective high schools and selective colleges, and where the differences in grading standards found must have been much larger (I still have to check this, because Lindquist says nothing about those differences, though he does mention the enormous improvements in prediction that Bloom and Peters obtain after scaling the grades).
Claims to make grades from different schools comparable.
Lindquist has no eye for the possibility (certainty) that students behave strategically, not only within their school, but also with a view to the college they want to attend.
What has just been said about the mass of American public comprehensive high schools applies almost equally to the great mass of American colleges and universities. By far the great majority of institutions - the great state universities, the municipal universities and junior colleges (which are rapidly constituting a larger and larger proportion of the total number), the overwhelming majority of private and denominational colleges - are definitely non-selective in character, and are not characterized by large inter-institutional differences in level of student ability. It should not be too surprising, therefore, that the attempt at preliminary scaling of college grades had no overall effect on the within-school correlations for the ACT population.
p. 640
Bloom, B. S. and Peters, F. R. (1961). The Use of Academic Prediction Scales for Counseling and Selecting College Entrants. Free Press of Glencoe. [not in my possession]

Roach, J. (1971). Public examinations in England 1850-1900. Cambridge University Press. [UB UvA? 1923 E 36]

What I took from it is the following. In nineteenth-century England examinations expanded enormously, where before that time they did not even exist. They served chiefly as a means of selection for positions at Oxford and Cambridge (the only universities at that time), and later also for government posts. The English class system played a very important part in all this, in a way that is wholly incomparable with other European countries. That same class system left a strong mark on the (extremely slow) development of secondary education. An interesting aspect of the examination craze is the often far-reaching decoupling of examinations from teaching, with all the abuses (rote learning) that followed from it (the point being that control over examinations must go together with control over the teaching that leads up to those examinations. An interesting point). In the nineteenth century there was little insight into the limited value of marks and rank orders, while at the same time entire careers were made to depend on a difference of a few marks. The first study written from some distance is Latham, H. (1877). On the action of examinations considered as a means of selection. [full text online at archive.org] Roach discusses this work at length. Only a decade later did Edgeworth publish the first critical studies of the reliability etc. of examinations. Roach 1971:283, note 1:
F.Y. Edgeworth, 'The statistics of examinations', Journal of the Royal Statistical Society, vol. LI (1888), pp. 599-635 [JSTOR, read online free]; 'The element of chance in competitive examinations', ibid, vol. LIII (1890), pp. 460-75 [JSTOR, read online free], 644-63 [JSTOR, read online free]. These are summarized in P. J. Hartog, Examinations and their relation to culture and efficiency (1918) [pdf], Appendix E. I have used Hartog's summary here. Edgeworth wrote popular accounts of his work in Journal of Education, 'The statistics of examinations', vol. X (n.s., 1888), pp. 469-470; 'The uncertainty of examinations', vol. XII (n.s., 1890), pp. 95-6, 203, 469.
Roach (p. 285):
His results were taken up by Sir Philip Hartog in a book published in 1918, but in his own day he was a lone voice. His ideas are important rather for the twentieth century than for the end of the nineteenth. ( . . ) In the twentieth century examinations were to be discussed from the point of view of statistics, of psychology, and later of sociology. In the nineteenth century they had been part of politics, using that word in a very broad sense to cover the whole social structure of the time. Examinations had appeared as one aspect of the theory of open competition which was basic to the Victorian age.
According to Roach, Edgeworth was the first to make a systematic study of the properties of examinations and of the fairness of cut-off scores; apparently people before his time were hardly aware of these issues, although some important statements about them can be found, and it took until 1918 before the beginning of a sequel to his work became visible! Summing up: the background of English examination systems is so unique that English authors, including present-day ones, must be understood against that background. A general point, apart from Edgeworth's work, which after all also has general validity, is that our present marking system in secondary education (in the Netherlands) has so many features in common with the rather mechanical examining of nineteenth-century England: adding up marks without further ado and deciding on the basis of rank orders and the like, including very important and deeply consequential decisions.
On p. 193 there is a highly revealing quotation from Gladstone, from a letter of January 1854 to Russell (for the precise source see Roach p. 193, note 2).
"I do not hesitate to say that one of the great recommendations of the change (to open competition) in my eyes would be its tendency to strengthen and multiply the ties between the higher classes and the possession of administrative power. As a member of Oxford I look forward eagerly to its operation. There, happily we are not without some lights of experience to throw upon this part of the subject. The objection which I always hear there from persons who wish to retain restrictions upon elections is this: ‘If you leave them to examinations, Eton, Harrow, Rugby, and the other public schools will carry everything.’ I have a strong impression that the aristocracy of this country are even superior in natural gifts, on the average, to the mass; but it is plain that with acquired advantages their insensible education, irrespective of booklearning, they have an immense superiority. This applies in its degree to all those who may be called gentlemen by birth and training."
p. 3: Roach quotes E. E. Kellett's autobiography 'As I Remember' (1936), p. 276:
If, in fact, I were asked what, in my opinion, was an essential article of the Victorian faith, I should say it was "I believe in examinations."
p. 12:
The connection between more effective education and more effective administration was very close. Government appointments provided good careers for men of ability who lacked influence ( . . ). Lord Robert Cecil, the future Lord Salisbury and prime minister, had very good reasons for saying in 1856 that the proposal to open the Civil Service to competition “was neither more nor less, from beginning to end, than a schoolmasters’ scheme.” Cecil was not alone in making the point. Henry Latham of Trinity Hall, Cambridge, in a most useful book on the subject published in 1877 [On the action of examinations considered as a means of selection; archive.org has the full text online], pointed out that examinations had a double purpose. They had a strictly educational objective as an aid to study, and they were the means of selecting candidates for appointment, but the methods used in the two cases need not be the same. This connection was true for all the countries which used the system. Both the Prussian Abitur and the French Baccalauréat were essential qualifications for state service as well as school examinations, and mid-Victorian Englishmen were well aware that efficient systems of state education gave their neighbours great advantages both in testing the basis of school work and in laying down the bases for state employment. But in England the academic origins of the examination system are particularly important. There can be no doubt that the idea of examinations and of the competitive principle in English official life came originally from the prestige of the honours examinations in the two universities, in particular from the Senate House Examination (later Mathematical Tripos) at Cambridge. Latham ( . . . 
) says: “From the success at the universities of examinations as a means of awarding distinctions and emoluments with perfect impartiality, they were brought into use as a means of disposing of all kinds of appointments,” And again: “moreover this Examination [the Mathematical Tripos] acquired quite early in the present century a high reputation for the integrity and ability with which it was conducted . . . In consequence, when a difficulty arose about the bestowal of Government patronage, the public caught from [it] the idea of introducing competitive examinations.”
p. 13 on the Mathematical Tripos, the Senate House Examination, at Cambridge:
It developed in the first half of the eighteenth century out of the traditional system of disputations. There had always been in addition to the disputations some sort of examination of the candidates in the schools. This examination, which had originally been supplementary to the exercises, gradually became much more important than they; it was a written test in English, as opposed to the Latin of the disputations, and its content was primarily mathematical, though rather in the sense of providing a training in logical reasoning than in advanced mathematical thought. The examination results were printed after 1747, and after 1752 they assumed their permanent division into wranglers, senior optimes and junior optimes. ( . . ) In the early nineteenth century the Tripos was a genuine though narrow test of intellectual ability. Its standing in the university and the country was very high, and there was no question that the men who came out at the top reached their positions by real ability and hard work and without any question of patronage or favour. ( . . )
p. 14:
. . . in the first half of the nineteenth century more and more of the high places in public life were being taken by men who had enjoyed successful careers at the university — that is, successful in the great test of public examinations.
Roach describes the aim of his book on pp. 10-11. He does not study examinations from the perspective of political developments:
From this point of view the victory of competitive examinations provides an exemplification of the new ways of looking at politics and administration which developed in England between 1840 and 1870. One programme of legal and administrative reforms had been formulated by Jeremy Bentham and his followers. The new attitude, among Utilitarians and Non-Utilitarians alike, concentrated on more efficient ways of solving the problems of a complex industrialized society for which the methods of an older and more easy-going world were not effective enough. Government needed servants who possessed, as Nassau Senior wrote to Lord Melbourne, “diligence, impartiality, decision, discretion, knowledge of human nature . . . invention and resource.” and many came to believe that competitive examination was the best way to get them.
nor with the intention of describing the changes (for that, see Montgomery's book), but:
It approaches the subject of competitive examinations from the sides of the teacher and the learner. It is focused very much on the class-room, and it tries to see, through the particular vistas opened up by the examination system, how the practical business of educating the young went on in Victorian times.
Roach uses material from two articles: Middle-class education and examinations: some early Victorian problems, British Journal of Educational Studies, May 1962, & Examinations in nineteenth century England. State power versus private control, Paedagogica Europaea, 1965.
p. 4:
The Clarendon Commission in its report (1864) drew an interesting contrast between Shrewsbury, where much stress was placed upon examinations and upon promotion by merit, and Eton, which in many ways represented, among the great schools, the ideas of an earlier and more aristocratic age. In the latter, the commissioners remarked, ‘the spur of emulation’ was very little used: “In the system of promotion, the learning of lessons, even the awarding of prizes, there is comparatively little of direct competition, and the distinctions which are given are not conspicuous enough to make them objects of general ambition or respect.”
p. 9:
Open competitive tests may serve to bring forward able boys and girls from a lower social level and to give them opportunities of advancement through higher education. In this sense competition may be used as a means of remedying social inequalities and of broadening the basis of the educated class. Such was the basic philosophy behind the opening of the road from elementary to grammar school through the free place system in the early twentieth century. The Victorians sometimes argued on similar lines; for instance the Taunton Commission argued that the foundation of exhibitions to higher schools was a proper way of using educational endowments. They were however more inclined, as we shall see, to believe that open competition in the award of scholarships and exhibitions would favour those who had enjoyed the best - and therefore the most expensive - preliminary education rather than the poor. In that sense open competition could have, sometimes unintentionally, anti-democratic and anti-egalitarian results.
p. 16:
In economic affairs free trade had been achieved. A fair field and no favour was what the British business man, in control of the most powerful productive machine ever known by man, was demanding. This was the inspiration behind the movement to repeal the Corn Laws; this was the impetus which carried British manufacturers, British shipping, British commercial contacts all over the world. What was effective in business might also be expected to succeed in government service and in education. The triumph of free competition in all these fields took place between 1850 and 1870.
p. 17:
The popularity of games — and in higher and secondary education the cult of the athlete was at its height at the beginning of the twentieth century — represented the apotheosis of competition, and of competition endowed with a mystical, almost spiritual value. That this was so is an indication of how deeply the competitive idea runs through the values and ideas of the middle and late Victorian age.

F. Rudolph (1965). Essays on education in the early republic. The Belknap Press of Harvard University Press. [borrowed from UB Leiden]

In an essay by Robert Coram (1791) there is a nice description of the negative effects of awarding prizes. I have not reproduced that description, as it is rather anecdotal, but Coram demonstrates, as probably many others have done, that people did see that this kind of instrument may have more harmful than beneficial effects. In any case Coram indicates that those who win the prizes can also be damaged by them:
. . . the medal never failed to ruin the one who gained it and who was never worth a farthing afterwards . . .
In Coram's account the only advantage is that the school can attract a few more pupils with them:
[those medals] had produced but one good effect, which was [that] they had drawn a few more scholars to his school than he otherwise would have had . . .
Samuel Knox (1799). An essay on the best system of liberal education. Philadelphia. 271-372. p. 345:
Public examination should be held thrice a year. The first about the beginning of the new year; the second in May; and the third about the middle of August. ( . . ) Should one day be found insufficient to go through the business of those examinations, they ought to be continued for two or even three days, affording to every class in the academy an equal opportunity of exhibiting its progress. As in the primary school, so also in the academy, honorary prizes should be impartially conferred on such as excelled; and for this purpose regular catalogues should be kept by the masters of all the youth in the academy, having proper columns opposite their names, specifying the authors they were reading, or the progress they had made at the end of each examination, marking with an asterism such names as had obtained prizes or had given proofs of uncommon industry and application. These catalogues should be put up to public inspection at the next succeeding examination. The prizes conferred might either consist of suitable books provided for that purpose, or of a piece of green or blue ribbon to be worn on the breast, having stamped on them the name of the academy and having the words ‘Merui Laudem’ inscribed on them for the motto. They might also be numbered so as to exhibit different degrees of merit or industry. ( . . ) yet the general object of them should be understood by the students as a reward for that proficiency which arises from habits of perseverance and industry.
This is a plan, not a description of an existing situation. Knox was born in Ireland (1756) and graduated from the University of Glasgow; I suspect that he used a model from his own educational experience. Incidentally, does the origin of the school report also lie in this keeping of records of progress with a view to ranking? ‘He shared with Samuel Harrison Smith the prize offered by the American Philosophical Society in 1797 for the best essay on a national system of education.’

Sheldon Rothblatt (1968). The revolution of the dons: Cambridge and society in Victorian England. London: Faber. [UB Leiden? 6832 A 2]

p. 252, mathematical tripos ca. 1900: "If any discussion of educational reform in Cambridge were to raise the question of liberal education it would have to be the disputes in 1900 over the mathematical tripos, for in the past mathematics more than any other Cambridge subject had been least liable to the charge of careerism. Yet interestingly enough even here the issue could not be reduced to that of a conflict between professional and liberal ideals, no matter how hard defenders of the traditional tripos tried. It is true that the reformers indicated that their changes would improve a student's preparation for professional work; but just as equally and convincingly they argued that the two principal reforms, abolition of the order of merit and a new division of the tripos, would redeem Cambridge mathematical teaching by removing the one factor which more than any other had turned the tripos into a race for success, and was most responsible for the narrow spirit and manner in which Cambridge students pursued mathematical studies."

Alexander W. Astin (1985). Achieving educational excellence. Jossey-Bass.

p. ix:
All these experiences have convinced me of one thing: although a great deal of assessment activity goes on in America’s colleges and universities, much of it is of very little benefit to either students, faculty, administrators, or institutions. On the contrary, some of our assessment activities seem to conflict with our most basic educational mission.
p. 4:
I argue that the basic purpose of assessing students is to enhance their educational development. Another way of saying this is that the assessment of students, more than anything else, should advance the educational mission of our colleges and universities. In the same spirit, I argue that assessment of college and university faculty should enhance their performance as teachers and mentors of students and as contributors to the advancement of knowledge.
p. 5:
The resources conception is based on the idea that excellence depends primarily on having lots of resources: the more resources we have, the more excellent our institution. The resources that are supposed to make us excellent are of three different types: money, high-quality faculty, and high-quality students. ( . . . ) The reputational view of excellence is based on the idea that the most excellent institutions are the ones that enjoy the best academic reputations. In American higher education, there is a folklore that has evolved over the years that implicitly arranges our institutions into a kind of pyramid-shaped hierarchy, or pecking order. ( . . . ) I refer to the pecking order as folklore largely because it is part of our belief system rather than something that has been established independently through systematic study and analysis. ( . . . ) Reputation and resources, in short, tend to be mutually reinforcing. ( . . . ) Under the talent development view, excellence is determined by our ability to develop the talents of our students and faculty to the fullest extent possible. ( . . . ) As far as educational excellence is concerned, the most excellent institutions are, in this view, those that have the greatest impact — ‘add the most value’, as the economists would say — to the students’ knowledge and personal development.
p. 52, competition:
My first concern is with the way multiple-choice tests are scored. Typically, the number of right answers (or a weighted combination of rights minus wrongs) is converted into some type of normed score, either a percentile or a standard score (see the appendix). What do we really do when we make such a conversion? We discard the basic data about how many questions (and which ones) the student answered right or wrong, and replace this information with a score indicating only how well the student performs in relation to other students. Here we have the so-called norm-referenced test. By using tests that are scored normatively, we are putting students in competition with each other. The implied value underlying this type of test seems to be that the cognitive performance of any given student should be judged competitively: How much better or worse did the student do when compared to other students? This competitive scoring procedure is identical in spirit to traditional classroom grading, especially if the grading is done on the curve. I might add that these relativistic and competitively scored tests are difficult to use in assessing talent development because they make it virtually impossible to determine how much a student has actually changed or improved over time. All we can say is that the student’s performance has increased or decreased in relation to other students. There is another, perhaps even more subtle problem with normative assessment, whether it be through letter grades or standardized tests: when we choose to assess performance using a normed instrument, we create what economists would call a ‘scarce good.’ Only so many students can be at the top of their class and only so many students can score above the 90th percentile. No matter how hard students work and no matter how much they actually learn, there will always be only so many ‘excellent’ test scores or grades or students! 
Normative assessment, in other words, automatically constrains how much ‘excellence’ you can have. The important thing to realize is that this shortage is a completely artificial one rather than something inherent in the outcome being assessed. The shortage, in other words, is something created by the assessment method itself. As with any scarce good, the scarcity itself tends to exaggerate the importance of being at the top, so that below-average or even average performance is often viewed as failure. Normative scoring, in other words, guarantees that a substantial number of students, if not the majority, will view themselves as failures.
(italics by Astin).
Astin then briefly discusses, in a nice way, the absolute standards that criterion-referenced measurement makes possible. 'Absolute' of course rests on an agreement.
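Astin's point that normed scores hide real gains can be made concrete with a small sketch. Everything in it is an assumption for illustration and not taken from Astin: the five students, their raw scores, the uniform gain of five items, and the simple definition of percentile rank as the percentage of the group scoring strictly below a student.

```python
from bisect import bisect_left


def percentile_ranks(raw_scores):
    """Percentage of the group scoring strictly below each student
    (an assumed, simplified definition of percentile rank)."""
    ordered = sorted(raw_scores)
    n = len(raw_scores)
    return [100 * bisect_left(ordered, s) / n for s in raw_scores]


# Hypothetical raw scores (items answered correctly) for five students.
before = [12, 18, 20, 25, 31]
# Hypothetical later test: every student answers 5 more items correctly.
after = [s + 5 for s in before]

ranks_before = percentile_ranks(before)
ranks_after = percentile_ranks(after)

# Absolute (criterion-like) view: each student's real gain in raw score.
gains = [a - b for a, b in zip(after, before)]

# Normative view: how many students sit at or above the 80th percentile.
top_before = sum(r >= 80 for r in ranks_before)
top_after = sum(r >= 80 for r in ranks_after)

print("raw gains:       ", gains)
print("ranks before:    ", ranks_before)
print("ranks after:     ", ranks_after)
print("students at top: ", top_before, "->", top_after)
```

In this made-up group every student improves, yet the percentile ranks are identical before and after, and the number of students 'at the top' stays fixed. That is the artificial scarcity Astin describes: it comes from the scoring method, not from what the students learned.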
p. 42:
The resource view of excellence is fundamental to such beliefs, in that the excellence or quality of the institution is identified with the excellence or quality of the people it admits. [italics by Astin]
(p. 50)
Unfortunately, this belief receives little support from the few studies that have tested the center-of-excellence concept. Institutions with highly able students, large libraries, highly paid faculty, and large per-student expenditures do not seem to foster any greater degree of intellectual development than do institutions without such resources (Astin, 1968). Economist Howard Bowen, who has reviewed much of this literature (Bowen, 1977, 1980, 1981a), notes that the wide-ranging variation in the amounts of money institutions invest in their educational programs is not associated with any differences in educational pay-offs.
(p. 54)
Finally — and this is a subtle but very critical point — resource-based conceptions of excellence tend to focus institutional energies on the sheer accumulation or acquisition of resources rather than on the effective use of these resources to further the educational development of the student and to promote faculty development. Paradoxically, in the pursuit of resources, institutions expend resources without generating more resources, thereby depleting the total pool. [italics by Astin]
(p. 55)
If the emphasis on outcomes leads an institution to strengthen its educational programs, then the system’s excellence is enhanced. On the other hand, if the institution tries to improve outcomes merely by acquiring more resources (brighter students, more productive faculty members), the excellence of the system as a whole remains unchanged. Once again, we are engaged in a zero-sum game: High achieving faculty members and students are simply recruited from one institution to another.
Under the heading ‘Benefits of Higher Education’ (p. 18) Astin gives a brief overview of the three types of benefits for students that he wants to distinguish: educational benefits
( . . ) educational benefits refer to changes in the student — in his or her intellectual capacities and skills, values, attitudes, interests, habits, mental health, and so forth — that are attributable to the college experience.
fringe benefits (p 19).
( . . ) the fringe benefits of attending a given college include those post-college outcomes that are related not to the student’s personal attributes but to the institutional credential that the student receives. Some writers call this the ‘sheepskin effect’.
(p. 22)
The belief system that supports the institutional hierarchy in American higher education is inclined to assume that educational benefits are proportional to fringe benefits. That is, it is widely believed that students learn more and develop their intellectual capacities more fully in an elite or highly selective institution than in a nonselective or unknown institution. Longitudinal studies of student development, however, generally fail to support this belief. Thus, highly selective institutions do not appear to confer more educational benefits on their students than do moderately selective or even non-selective institutions
existential benefits (p. 21).
( . . ) existential benefits refer to the quality of the undergraduate experience itself, independent of any changes in competence (educational benefits) or any sheepskin effect (fringe benefits). Thus, they derive from the subjective satisfaction associated with peer contacts, extracurricular and academic involvement, recreational activities, and virtually any other experience connected with college attendance. Existential benefits are, in effect, the sum total of the student’s subjective experience while attending college. Such experiences may, of course, yield educational benefits (learning, changes in values, and so forth). But the main point here is that these experiences have value to students in and of themselves. Educators frequently overlook the fact that the four or more years involved in a college education represent a sizable portion of the student's total lifespan. For the student, then, existential outcomes are important in themselves, not merely for what they will mean later. Research on student development (Astin, 1977) suggests that existential benefits are more dependent than either fringe or educational benefits on the institutional environment. In other words, institutions can probably exert more direct control over the existential benefits for students than over the other two types of benefits.
Astin, A.W., Undergraduate achievement and institutional excellence. Science, 1968, 161, 661-668.
Astin, A.W., Four critical years: effects of college on beliefs, attitudes, and knowledge. San Francisco: Jossey-Bass, 1977.
p. 205:
While some critics have tried to argue that the talent development approach compromises and threatens academic standards, when we look at the educational system as a whole, there is no better way to promote academic standards than to maximize talent development. ( . . . ) In essence, a talent development approach seeks to add as much as possible to each student’s entering level of performance. If we are indeed able to maximize talent development among these ten students [Astin gives an example with a group of students entering at academic levels 2, 2, 3, 3, 3, 4, 4, 4, 4, and 5, the last near graduate level, the first only just above the illiterate level 1], we accomplish at least three important goals: 1. We maximize the number of students who reach minimal performance standards (level 6). 2. We maximize the ‘margin of safety’ by which students exceed this minimal level (that is, the number of 7s, 8s, 9s and 10s). 3. We minimize the number of students with borderline skills (that is, levels 2 and 3). ( . . . ) So, even if some of our ten students fail to reach level 6 and drop out of college without a degree, we have still made some contribution to their intellectual functioning and have thus added to their chances of eventually becoming productive members of society. In other words, a talent development approach is the surest way not only to maintain academic standards but also to maximize the amount of human capital available to society. ( . . . ) the role of assessment changes dramatically under a talent development perspective. Rather than being used to promote institutional resources and reputation, assessment is used to place students in appropriate courses of study and to determine how much talent development is actually occurring by repeated assessments over time. 
These latter assessment activities would serve two functions: to document the amount and type of talent development that is occurring, and to provide, in combination with environmental information, a basis for learning more about which particular kinds of educational policies and practices are likely to facilitate talent development. If testing in higher education were revised along the lines suggested here, it seems likely that proponents of expanding access and opportunity in higher education would come to see assessment as an ally rather than as a threat.
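Astin's three goals can be tallied for his ten-student example in a few lines. This is only a sketch: the entering levels and the thresholds (level 6 as minimal standard, levels 7 to 10 as margin of safety, levels 2 and 3 as borderline) come from the passage above, but the uniform gain of three levels per student is a purely hypothetical assumption, chosen to show how repeated assessment on an absolute scale documents talent development.

```python
# Entering levels of the ten students in Astin's example.
entering = [2, 2, 3, 3, 3, 4, 4, 4, 4, 5]

MINIMAL_STANDARD = 6   # level 6, per the quoted passage
BORDERLINE = (2, 3)    # levels 2 and 3, per the quoted passage
ASSUMED_GAIN = 3       # hypothetical: not a figure from Astin
TOP_LEVEL = 10


def summarize(levels):
    """Count students against Astin's three goals."""
    return {
        "meet_standard": sum(lv >= MINIMAL_STANDARD for lv in levels),
        "margin_of_safety": sum(lv > MINIMAL_STANDARD for lv in levels),
        "borderline": sum(lv in BORDERLINE for lv in levels),
    }


# Second assessment after the assumed uniform gain, capped at the top level.
leaving = [min(lv + ASSUMED_GAIN, TOP_LEVEL) for lv in entering]

summary_in = summarize(entering)
summary_out = summarize(leaving)
total_gain = sum(out - inn for inn, out in zip(entering, leaving))

print("entering:", summary_in)
print("leaving: ", summary_out)
print("total levels gained:", total_gain)
```

Under this assumed gain the two students who end at level 5 still miss the minimal standard, yet they account for 6 of the 30 levels gained. A selection perspective would record them only as failures; the talent development tally records their gain as well.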
p. 207:
When we operate from the narrow perspective of one institution or a single profession, we are concerned only with what happens to those students we admit; the rejected candidates are not of interest to us. On the other hand, when we view such decision problems from a larger system perspective, we concern ourselves with the fate of all candidates, winners and rejects alike. This distinction is precisely analogous to the distinction between the use of assessment for selection versus assessment for placement.
p. 206:
The real problem would seem to be placement of people in appropriate courses and programs within the total system. If a person with eighth-grade math skills wants to study engineering at the college level, there should be a means available to help develop that person’s mathematical talent to the level at which it would not cause a disproportionate drain on the resources of the engineering program. Furthermore, the person should be assured that if the math remediation is successful, there could be a place available in the engineering program. This same set of principles (appropriate placement with assurances of future opportunity) should be applied to all persons at all talent levels and to all fields of academic and professional study. An educational system designed and operated according to such principles would not only provide educational opportunities for all but would also encourage each person to view education in its proper light: as a place to develop one’s talents rather than as a place that merely screens and sorts or that limits opportunity. And, if students were permitted to avail themselves of educational opportunities as long as they continued to develop their talents, the public would also be getting the maximum ‘bang’ for its educational ‘buck.’

In short, when we view educational decisions — such as selection and placement — from a larger societal perspective, the goal of maximizing talent development makes more sense than any other educational philosophy. There is no better means by which we can maximize the human capital available in society.
Fantastic.

Eckstein, M. A., & Noah, H. J. (Eds.) (1992). Examinations: comparative and international studies. Oxford: Pergamon Press. [borrowed from the KB]

p. 83: Sweden has no examinations in secondary education, but it does have nationally standardized tests. These are normed so that marks on a five-point scale are ‘normally distributed’ over the whole country, e.g. 7% a 1, 24% a 2, 38% a 3, 24% a 4, 7% a 5. How do they come up with it.
The mark received by any individual student should express to what extent he or she has succeeded in relation to the total population of students in the country taking the same subject. By means of nationwide application of standardized achievement tests, it has proved possible to stabilize the marking system. Differences in achievement results between schools are very small, which means that it does not differ very much if a school is situated in a city or in the countryside, in the North or in the South (Marklund, 1988, Education in Sweden: assessment of student achievement and selection for higher education. In S. P. Heyneman & I. Fägerlind: University examinations and standardized testing. Washington, D.C.: World Bank).
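To make the mechanics of such a norm-referenced scale concrete, here is a minimal sketch in Python. It is my own illustration, not a description of how the Swedish tests were actually scored: the function names are made up, and the quotas are assumed to be 7, 24, 38, 24 and 7 percent, with the middle share taken as whatever is needed for the quotas to add up to 100.

```python
from bisect import bisect_right

# Assumed quotas for marks 1..5, in percent (illustrative; they must sum to 100).
QUOTAS = [7, 24, 38, 24, 7]

def cumulative_cutoffs(quotas):
    """Return the cumulative percentages that close marks 1..4 (mark 5 takes the rest)."""
    total, cutoffs = 0, []
    for share in quotas[:-1]:
        total += share
        cutoffs.append(total)
    return cutoffs

def mark_from_percentile(percentile, quotas=QUOTAS):
    """Map a national percentile rank (0 <= percentile < 100) to a mark from 1 to 5."""
    return bisect_right(cumulative_cutoffs(quotas), percentile) + 1

if __name__ == "__main__":
    for p in (3, 20, 50, 80, 97):
        print(p, mark_from_percentile(p))
```

The point of the sketch is that the mark depends only on a pupil's rank in the national population, never on what the pupil can actually do.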

K. Ingenkamp (1972). Zur Problematik der Jahrgangsklasse. Weinheim: Beltz. [POW B-7 INGE]

An important conclusion from his empirical studies (p. 291):
When pupils are exposed to the same ‘teaching situation’, intelligence is an achievement-determining factor for the test results of the individual pupils. When the pupils are in different classes and thus exposed to different ‘teaching situations’, the teaching situation determines the average test performance of the classes more strongly than the intelligence of the individual pupils does.
(p. 295):
Our study also demonstrated that marks in different classes are not comparable. Because the teacher orients himself to the level of his own class, and because the classes, contrary to the assumptions of the year-group class system, differ strongly in their average level of achievement, quite different marks correspond in different classes to what is in fact the same achievement. The class-internal frame of reference deprives our entire system of entitlements (Berechtigungswesen) of its factual justification. Whether a pupil is promoted and admitted to the Gymnasium means something quite different in different classes and schools. The Abitur of one school is hardly comparable with that of another.
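A small sketch may make the class-internal frame of reference tangible. Everything in it is an assumption of mine, not Ingenkamp's procedure: the marking rule (a German-style mark from 1, best, to 6, derived from the pupil's z-score within the class), the function name and the two sets of made-up test scores.

```python
from statistics import mean, pstdev

def class_relative_mark(score, class_scores):
    """Hypothetical rule: mark 1 (best) to 6 from the z-score within the class.

    Assumes the class scores are not all identical (non-zero spread).
    """
    z = (score - mean(class_scores)) / pstdev(class_scores)
    mark = round(3.5 - z)
    return max(1, min(6, mark))

# Two made-up classes with the same spread but different average levels.
strong_class = [60, 65, 70, 75, 80]
weak_class = [40, 45, 50, 55, 60]

if __name__ == "__main__":
    # The same test score of 60 is judged against two different class averages.
    print(class_relative_mark(60, strong_class))  # mark 5
    print(class_relative_mark(60, weak_class))    # mark 2
```

Under this rule the identical score of 60 earns a 5 in the strong class and a 2 in the weak one, which is exactly the kind of non-comparability the quotation describes.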

Becker, H., Geer, B., & Hughes, E. C. (1968). Making the grade: the academic side of college life. New York: Wiley. http://howardsbecker.com/ http://howardsbecker.com/articles/grades.html [Not in UB Amsterdam. UBL 2829 C 32; recently reprinted]

Describes the world of grades on the typical American campus, where everything, absolutely everything, revolves first and foremost around grades. Relevance for the GPA as it functions in US colleges: especially pp. 45 ff.
Individual course grades are less important than the combined grade point average (GPA). The GPA, calculated every semester and cumulatively over the time one has been in school as well, is part of the student's official record and furnishes the raw material for many published University statistics. It takes into account both the grade earned and the number of hours of credit for which one is taking a course. The student ordinarily receives one hour of credit for each hour the class meets a week (a class that has three hourly sessions a week gives three credit hours), although occasionally more credit will be given. To find the GPA for the semester one multiplies the number of credit hours for each course by three if the grade is A, by two if it is B, by one if it is C, by zero if it is D, and by minus one if it is F. The total of one's grade points is then divided by the number of credit hours taken. The resulting GPA varies from plus three to minus one; a student who receives all A's has an average of 3.0, a student who receives all F's has an average of -1.0, and so on. In some colleges different numerical values are assigned (an A might receive four points and an F none, for instance), but the method of calculation is generally the same. A substantial number of students find grades a problem. Table 4 shows the distribution of grades for all undergraduates in the fall semester of 1961. Only 25 percent of the students received a GPA of 2.0 or better (a B average). At the other extreme, 32 percent of the students received a GPA below 1.0 (a C average).
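[Table 4. Distribution of semester GPAs, all undergraduates, fall semester 1961:
GPA 2.00 and above: 1886 students (25%)
GPA 1.00-1.99: 3234 students (43%)
GPA 0.81-0.99: 352 students (5%)
GPA 0.01-0.80: 1209 students (16%)
GPA 0.00 and below: 852 students (11%)
Total: 7533 students (100%)]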
University regulations provide that students whose GPA in any semester is below zero will be dropped for poor scholarship. The Committee on Scholarship and Probation may, at its discretion, permit the student to enroll for another semester; nevertheless, a student whose GPA is zero or less is in imminent danger of being forced to leave. He can transfer to another school, but he cannot return to the University of Kansas without special permission. Freshmen and sophomores whose GPA is below .70 are put on probation, as are juniors and seniors whose GPA is below 1.00. A student on probation who fails to achieve the required GPA in his next semester can also be dropped from school for poor scholarship and, at the least, must have the permission of the Committee on Scholarship and Probation to enroll for the next semester. [These rules applied to the College of Liberal Arts and Sciences; other Schools in the University had somewhat different standards.] ( . . ) Many students spoke of their fear of failing or said simply that they were having trouble with their grades. In such cases, the unspoken premise in their argument referred to the rules we have just cited. The grade point average is calculated for the entire time one is in school, as well as for the current semester. This cumulative average affects those who manage to raise their previously lower grades to a C average, for they will not be able to graduate unless they make in future semesters a GPA sufficiently higher than a C average to offset their low grades in the past. Regulations for graduation from the College of Liberal Arts and Sciences specify that students must earn a total of 124 grade points to graduate and that they must earn a minimum of one grade point for every hour of credit in their major subject. Thus the student's past sins remain to haunt him for semesters to come ( . . ). Students with better averages may also have cause for alarm. 
Students who have won scholarships may have to achieve quite a high average to keep the scholarship for another year. Although the committees and administrators who oversee scholarship awards do not apply rules mechanically, they do in general follow certain guidelines in renewing awards. For some scholarships, an average as high as 2.5 may be desirable ( . . . ). Students who lose their scholarships may have to leave the university for financial reasons. Even if the student remains on campus, his grades affect the kind of life he can lead. The regulations affecting extracurricular students organizations penalize students with low grades. A number of regulations specify a minimum grade point average as a condition for participation in campus activities. ( . . )
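The GPA arithmetic that Becker, Geer and Hughes describe (A = 3 down to F = -1, weighted by credit hours) is easy to write out. What follows is only a sketch of that description; the function name and the example course load are made up by me.

```python
# Grade points per credit hour as described in the quotation (Kansas, early 1960s).
GRADE_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}

def semester_gpa(courses):
    """Compute a semester GPA from (credit_hours, letter_grade) pairs."""
    total_hours = sum(hours for hours, _ in courses)
    total_points = sum(hours * GRADE_POINTS[grade] for hours, grade in courses)
    return total_points / total_hours

# Made-up course load: 12 credit hours in total.
example = [(3, "A"), (3, "B"), (4, "C"), (2, "F")]

if __name__ == "__main__":
    # 9 + 6 + 4 - 2 = 17 grade points over 12 hours
    print(round(semester_gpa(example), 2))
```

Note how a single F costs more than a D: on this scale a failed two-hour course subtracts two grade points, which is why the student's 'past sins' weigh so heavily on the cumulative average.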
[p. 49:]
The student with low grades may incur a number of other disabilities, not because the faculty or administration make rules decreeing the disability, but because student groups make use of the GPA, both formally and informally, as a way of sorting people out for reward and preferment. Student organizations adopt formal rules tying membership and participation to grades. In addition, students use GPA as an informal way of choosing among other students, both for organizational positions and in less structured situations. Because of the interconnections between student activities, the disabilities incurred in one area of campus life produce further disabilities in other areas, as we will see later. Finally, the criteria by which students award prestige to living groups depend in part on their achievements in other areas affected by the GPA. Since the prestige of one’s living group is an important component of student identity, students suffer when their living group suffers. ( . . ) The regulations governing membership in fraternities and sororities make use of the GPA as a criterion.
[p. 51:]
Students are aware of the many interconnections between different areas of campus life and recognize that failure in one area will have consequences in other areas. Thus freshmen who belong to fraternities are more likely to participate in campus organizations and thus have experiences that will prepare them for higher campus offices. If low grades keep a student out of a fraternity, his chances of achieving office suffer. And holding office, students think, has consequences beyond the college years, when the reputation one has earned in campus activities affects the kinds of jobs one is offered or, possibly, one's career in local and state politics.

G. A. Lienert (1987). Schulnotenevaluation. Frankfurt a.M.: Athenäum. [UB Leiden: 3895 A 27]

Grading systems internationally
1.4. International school marking scales. In ancient Rome, assessment (Zensieren) by censors served the purpose of taxation. Giving marks (Zensuren) has served the assessment of school achievement since the Saxon school ordinance of 1530 (Ziegenspeck 1977, p. 34). The 5-step marking scale has been customary since 1913 in the Matura certificates of the Austro-Hungarian (K&K) monarchy and most of its successor states, with a ‘three’ functioning as anchor mark, analogous to the IQ anchor of 100, and representing average achievement within the reference system as a whole. The 6-step marking scale of the German Reich of 1938 replaced 4-step and 7-step scales in Bavaria and 9- and 10-step scales in Saxony; it has been in use again in the FRG and the GDR since 1954. In Austria the 5-step scale still applies, and in German-speaking Switzerland 10- to 20-step scales (point scales) apply that differ from canton to canton. Strictly defined as rank scales are the letter sequences A to F (fail) in use in England and in most US states, or percentile ranks based on school achievement tests. The same holds for the Swedish marks from A to C with the steps A (excellent), a (very good), AB (good), Ba (satisfactory), B (sufficient), Bc (deficient) and C (insufficient). Likewise rank-scaled, but using numerals and with strong differentiation in the lower range of achievement, is the 10-step marking system in the Netherlands, from 1 (very poor) to 10 (excellent). ‘Stretched at the bottom’ like the Dutch system, but 5-step like the Austrian one, is the Soviet marking system with the Roman (instead of Arabic) numerals I (poor), II (insufficient), III (sufficient), IV (good) and V (very good), which also applies in Yugoslavia. 
Quasi-metric claims for the assessment of school achievement are made by the selection-oriented marking system of the French, which comprises a point scale with supposedly equal intervals from 1 to 20, the interval 1-4 denoting deficient, 5-8 sufficient, 9-12 satisfactory, 13-16 good and 17-20 very good. Likewise quasi-metric, but working with unequal intervals, is the Danish marking system, which runs from -16 (poor) via 0 (deficient), then on via +8 (satisfactory), +12 (good), +14 (very good), to +15 (excellent). In Denmark, then, a ‘poor’ can be neutralized only by an ‘excellent’. Whether the teacher can meet the quasi-metric scale claims of school achievement assessment in these countries must remain an open question. More than the topological marking systems (of the Anglo-Americans), the metric system (of the French) implicitly elevates the mark to a ‘proof of achievement and a certificate of achievement’ (Zeilinsky, 1961). In the following sections on metric marking, accordingly, only the pedagogical function or the entitlement function of school marks and school reports (Ziegenspeck, 1977, p. 52) is taken into consideration. (Lienert, 1987, p. 40)

Coebergh van den Braak, A.M. (1988). Meer dan zes eeuwen Leids Gymnasium. Leiden: Leids Gymnasium.

How did examinations at the Latin school actually work? A passage in the commemorative book of the Leids Gymnasium (p. 81) implies that it was usual to work with pieces of text for translation that had been studied extensively beforehand and possibly learned by heart:
Bosse organized his half-yearly examinations entirely in the style that Wensinck had advocated earlier, as appears from his remark: In accordance with the custom you have approved, I have not prepared my pupils by drilling into them a few pieces that they can rattle off at the examination. My pupils will therefore often miss, but precisely in doing so show what they know and what they do not, and thus be able to satisfy the examination of the Gentlemen Curators.
I need to devote some further research to this, as to so many other subjects where it cannot simply be taken for granted that earlier practices correspond to present-day ones.

K. Ingenkamp (1972). Zur Problematik der Jahrgangsklasse. Weinheim: Beltz. [POW B-7 INGE]

Contains a sharp historical chapter on whole-class teaching, focused mainly on nineteenth-century Germany: the State decreeing the class system to suit its own needs. Beyond that the book is an empirical study that leaves little of the class-based system intact, a kind of German Vijven en zessen. On p. 21 Herbart is presented as the first to make a clear attempt at improvements within the year-class system without touching the system itself. [Herbart, J. F. (1818). Pädagogisches Gutachten über Schulklassen und deren Umwandlung.] In his historical chapter Ingenkamp makes a special point of the distinction between the year-class system and the subject-class system (Fachklassensystem): the former applies conjunctive norms across all subjects, whereas in the latter progress in the different subjects is decoupled. The Fachklassensystem was strongly influenced by Francke, who had introduced it in 1696 at the Pädagogium in Halle (see Ingenkamp, pp. 19-20):
In justification of this system Paulsen [1919, I p. 573] states: “As long as Latin was the only subject of instruction, dividing the pupils into fixed classes according to the extent of their knowledge of this language was the natural thing to do. It did not seem appropriate, however, to retain this division once such heterogeneous subjects as mathematics or French had been added: someone who came to the institution as a good Latinist might not yet know even the first rudiments of these.”
Ingenkamp finds Paulsen's statement rather too schematic, and continues with a more extensive discussion of the system used by Francke. p. 21:
This subject-class system was predominant in the higher schools of the 18th century, but was also taken into account to some extent by elementary schools (Volksschulen).
On pp. 16 and 17 he quotes Comenius’ Grosse Didaktik: Flitner, A. (translator) (1960). J. A. Comenius: Grosse Didaktik. Düsseldorf. Among other passages:
The teacher must proceed in every respect like an officer, who does not go through his drills with each recruit individually but leads them all to the parade ground together and shows them jointly the use and handling of weapons . . . For the teacher to be able to do this, 1. schools may begin only once a year . . . ; 2. everything that is to be done must be so ordered that every year, every month, every week, every day and even every hour has its own assignment (Pensum), whereby all are led to the goal simultaneously, without stumbling (ch. XIX, 38 f.)
Ingenkamp (p. 17):
According to Comenius it is not only possible “that one teacher (magister) leads a group of about 100 pupils, but even necessary, because this is by far the most agreeable for the teacher as well as for the learners.” (ch. XIX, 16)
On p. 19 Ingenkamp summarizes once more the main principles of Comenius' system
(The significance of this model lies less in its immediate realization than in its effect on the pedagogical theory and school organization of the 19th century.):
- 1. All human beings are to be led to the same goals and have the same nature.
- 2. Differences in mental aptitude are an anomaly, a defect of natural harmony, and can be evened out by the appropriate method.
- 3. The same method applies to all subjects and pupils.
- 4. The curriculum is assigned year by year to specific age levels and worked through in a detailed, precisely prescribed sequence.
- 5. Only one area is treated at any one time.
- 6. The teacher can lead a great many pupils jointly and simultaneously to the same goal, provided he does not attend to individual pupils individually.
Ingenkamp continues:
This organizational model can be understood only in its historical context. It arose in the age of the wars of religion and the Baroque. Religious longing, rationalist ideas of a secularized science, the establishment of absolute state power in the territorial states: much contributes to the colourful and contradictory picture of this epoch.
p. 23:
How strongly the organizational form of the year class was shaped by the Prussian government, intent on restoring the state of subjects (Untertanenstaat), is shown by a comparison with revolutionary France. There, in 1792, Condorcet had stressed in his draft decree on the organization of public instruction: “Equality of mental abilities and equality of instruction are chimeras. One must therefore try to put this unavoidable inequality to use.” He wants to divide instruction into courses, “some of which are linked with one another, the others separate.” “In each subject one can even dwell on this or that point and devote more or less time to it, so that these different combinations are suited to all kinds of talent.”
Condorcet, M.: Bericht und Entwurf einer Verordnung über die allgemeine Organisation des öffentlichen Unterrichtswesens. Mit einer Einleitung von H. H. Schepp. Weinheim 1966, Kleine päd. Texte Bd. 36, S. 43.
p. 24
The interest of the Prussian state in requiring uniform qualifications of its higher civil servants also affected the standardization of the curriculum. ( . . . ) With the introduction of enrolment by year group, of annual promotion according to the level of achievement in all subjects, of the binding canon of subjects, of the weekly hours per subject and of the detailed distribution of subject matter, the year-group class system had in principle been introduced in the higher schools of Prussia around this time [1837]. ( . . . ) It must be particularly emphasized that the year-class system in this perfected form was first enforced in the higher schools. The elementary school (Volksschule) was structured in less detail.
p. 31:
The governing principle in forming our classes is division by year group. Besides the equation of chronological age with developmental level, administrative considerations are chiefly responsible for this. In our view it has been too much overlooked that these plans were formulated when the institution of ‘standing armies’ arose and the absolutist bureaucracy pervaded the territorial states. Not for nothing does Comenius so often draw examples from the practice of standing armies. The system was then put into practice and stabilized once general conscription, which likewise takes in year groups, had been introduced. These parallels ought to be pursued further.
(see Martinez on the introduction of compulsory education; perhaps Ringer as well?)
p. 43
A survey of the history of the year-group class must also deal with the astonishing phenomenon that the system of our class organization, for all the pedagogical criticism, has been able to outlast so unchanged all the transformations of society, of educational aims, of pedagogical theories and of psychological knowledge. The inertia of the administration cannot be the only explanation. One has to admit that pedagogical reform efforts did not concentrate on changing the organizational model but rather left it out of consideration. Although hardly any of the assumptions made when the year-group class was established still hold, attempts have been made again and again to make the system workable through ‘internal’ improvements.
(atavism) (cf. my own formulation that organizational interventions may once have made sense, that their sense changes (diminishes) or is forgotten over time, and is eroded by new interventions when these take place within the organizational features once put in place.)
p. 45:
( . . . ) that this form of class organization was in its own time also shaped by quite specific ‘inner’ motives which simply no longer hold for our time and our state of knowledge.
Ingenkamp prefers ‘Jahrgangsklasse’ (year-group class) to ‘Jahresklasse’ (year class): p. 31:
Our class system is built in principle on grouping pupils by year group. Its ideology rests largely on this. If it cannot sustain this grouping in practice, that is a ‘blemish’ in the system, to which it adheres in principle. That is why, in our view, the term ‘Jahrgangsklasse’ is more apt than the designation ‘Jahresklasse’.
p. 36:
The assumption of uniform learning progress in all subjects is, in our view, the most consequential aspect of this system. A precise analysis in terms of intellectual history would have to examine how deeply this principle is rooted in the uniforming of the subject vis-à-vis the absolutist state and ruler, and in mistrust of above-average individuality and of nonconformity. Invoking a ‘community spirit’, a ‘class community’, people have again and again tried to justify the obvious defects of the system as unavoidable side effects; or they wanted to replace the binding curriculum by ‘flexible sequences of subject matter’ and sought the way out in ‘internal differentiation.’
with reference to Fischer, M. (1962). Die innere Differenzierung des Unterrichts in der Volksschule. Weinheim. p. 126.
p. 42:
Besides the motives already mentioned above, Flechsig cites a further one for the formation of the year-group class: “The need of the authoritarian state (Obrigkeitsstaat) to be able to control the schools better through conditions that are as uniform as possible.” The need for better control and the striving for uniform attitudes and education created the form of those institutions in which special community-building processes are today supposed to take place. The absolutist state of the 19th century created a form of organization that served its interests. We should examine more carefully than hitherto whether this form of class organization can really also do justice to the interests of a democratic society.

E. E. White (1888). Examinations and promotions. Education, 8, 519-522.

By 1888, the superintendent in Cincinnati complained that when these essay tests were used to determine the promotion and classification of children they perverted ‘the best efforts of teachers, and narrowed and grooved their instruction; they have occasioned and made well-nigh imperative the use of mechanical and rote methods of teaching; they have occasioned cramming and the most vicious habits of study; they have caused much of the overpressure charged upon schools, some of which is real; they have tempted both teachers and pupils to dishonesty; and last but not least, they have permitted a mechanical method of school supervision.’
According to the KB's Centrale Catalogus Periodieken this journal is not held in the Netherlands, except from 1978 onward (almost a century too late).
G. F. Madaus and T. Kellaghan (1992). Curriculum evaluation and assessment. In P. W. Jackson (Ed.), Handbook of research on curriculum. New York: Macmillan. p. 119-154.

Kandel (1936). Examinations and their substitutes in the United States. The Carnegie Foundation for the advancement of teaching. Bulletin number twenty-eight.

pp. 25 ff., on the introduction of written examinations for primary education in Boston around 1850. Because of the much larger numbers of children it soon became impracticable here to keep organizing examinations with committees of outsiders. Kandel's source here is O. W. Caldwell & S. A. Courtis (1924). Then & now in education 1845:1923. New York. That book also contains the commentary that Horace Mann gave in The Common School Journal vol. VII. Kandel:
"Horace Mann ( . . . ) analyzed the merits of the written examination with great penetration. (1) This method of examination is impartial, since the same questions are set to all pupils in the same class in all schools. [It is curious that the Sub-Committee had not hit upon the idea of setting the same examination at the same time, but went from school to school as rapidly as possible in order to prevent any leakage of the questions.] ‘Scholars in the same school, therefore, can be equitably compared with each other; and all the different schools are subjected to measurement by the same standard.’ Further, the questions in a written as contrasted with those in an oral examination are equal in ease or difficulty. (2) The new method is far more just than any other to the pupils themselves. In an oral examination of a whole class each pupil is questioned for at most two minutes; while in the written test he has a whole hour in which to arrange his ideas. (3) Accordingly the method under consideration is the most thorough, since pupils are not subjected to the chance of the few questions that can be given in the brief time of an oral examination but have a wider range suited to a greater range of attainment and ability. (4) The written examination does not, like the oral examination, give the teacher an opportunity to interrupt the procedure or offer suggestions to the pupils. (5) It removes the possibility of favoritism. (6) It determines, beyond appeal or gainsaying, whether pupils have been faithfully and competently taught, for while the oral question tends to call forth a factual answer, in the written examination the pupils are able to develop ideas and show the connections of facts. 
(7) Finally, in a written examination ‘a transcript, a sort of Daguerreotype likeness, as it were, of the state and condition of the pupils’ minds, is taken and carried away, for general inspection’; that is, a permanent record is available by which schools may be compared with each other or each school may measure its own progress. A recognized standard of comparison is thus available, as contrasted with the different standards of judging in the minds of different men, for ‘if every man’s foot is to be taken as twelve inches long, it becomes an important question by whose foot we shall measure.’"
The idea of the normal distribution as norm or ideal.
p. 33-34: Kandel discusses a paper by J. Rendell Harris, The right reform of examinations, 1890.
Professor Harris, while he acknowledged the values claimed for examinations, such as recapitulation and concentration of studies, suggested that ‘the first thing to be reformed is the examiner.’ Examinations would not be improved until the examiner had a clear notion of what to aim at. The attempt to make all of a group of students satisfy a certain task equally is a reductio ad absurdum; ‘methods which aim at democratic results have no place in examinations.’ Nor is the purpose of examinations one of passing some students and failing others but of discovering how the students stand in relation to each other. ‘A well conducted examination divides the students one from another like the opening out of a fan. I affirm that the first thing to be aimed at is to produce a dispersion among the group of persons presented for the examination.’ and the more dispersion a teacher produces, the better examiner he is. ‘Our purpose is to show how unequal students are to one another; and the right way to do this is not by setting up a standard of passing or failing, as if there were only two conceivable students A and not A (the elect and the non-elect), nor by the a priori assumption that there are four conceivable classes of students, say A, B, C, D, of which D stands for the non-elect; but by recognizing that there are in reality as many classes as there are students, and trying to make this fact as patent as possible by the process of examination, we can come to the question of ticketing or bracketing afterward.’ ( . . . )
An individual’s place on a normal curve of distribution, which is what Professor Harris means by ‘dispersion,’ is more important than an alphabetical or numerical mark.
pp. 48 ff. set out how the CEEB examinations for the various subjects showed rather considerable fluctuations over the years in the marks awarded (on a 100-point scale). For example, elementary algebra: in 1916 38% scored between 60 and 100, in 1917 63%, in 1918 75%, in 1919 39%, in 1920 74%! p. 50:
The criticism of the fluctuations in the results of the examinations, once stated, continued to disturb the Board for many years. Efforts were made to explain and to justify them. It was stated that the Board from the start had never intended to have a fixed passing mark but to leave to the colleges the decision on the acceptable minimum rating in each subject after the examinations had been held. Other suggested explanations were that with the constant increase in the number of candidates presenting themselves for the examinations there were many who under the new educational conditions were not accustomed to written examinations or were taking them for practice, or that fluctuations were due to variations in the quality of the candidates. After a time these explanations were discarded, and the ‘violent and regrettable fluctuations’ were attributed to ‘the unfortunate wording of one or more questions. The questions have occasionally been too long and complicated. Sometimes terms unfamiliar to the candidate have been employed. Sometimes the meaning of the troublesome question would have been obscure to anyone. Sometimes the question has been capable of two or more interpretations, one of which was not even suspected by the examiners.’ ( . . . ) The fluctuations, it was claimed, were at any rate not due to variations in the standards of the readers who have long experience as teachers and readers with well-established standards based upon agreement reached by the different subject groups before the actual reading of papers is undertaken. [Professor L. T. Hopkins in a study on The Marking System of the College Entrance Examination Board, p. 14 (Cambridge, Mass., 1921), concluded that the assignment of marks from 1902 to 1920 rarely approximated the normal, even when only those pupils recommended by their schools as fully prepared took the examinations, nor were the results due to the influx of unprepared candidates. 
The irregularities were, he concluded, a very natural result of the method of reading and scoring the papers, the lack of standardization of values and corrections in conformity with the curve of error. As the best basis for solving the difficulties Professor Hopkins suggested some approximation to the normal curve, since a certain uniformity in the different subjects should be expected.]
p. 52:
Since it was the opinion of the Board that fluctuations were due to the character and form of the questions set in the examinations, measures were proposed in 1924 to lessen the fluctuations by the following suggested methods: ‘(1) Material increase in the number of questions asked at an examination. (2) Better distribution of the questions over the whole field covered by the requirement. (3) Exclusive use of questions previously tried out by experiment in secondary schools. Another form of the last suggestion is the proposal that each group of examiners draw its examination questions from a reservoir consisting of several thousand questions all of which have been tested by experiments in the secondary schools. In a number of subjects undoubtedly the problem of lessening the fluctuations in the results of the examinations would be solved most simply and most effectively by adopting more detailed and more precise definitions of the requirements and by setting examinations strictly conforming thereto.’ [The Work of the College Entrance Examination Board, 1901-1925, p. 215].
Quotation from J. McKeen Cattell (1905). Examinations, grades and credits. Popular Science Monthly, 66, pp. 367-9:
In examinations and grades which attempt to determine individual differences and to select individuals for special purposes, it seems strange that no scientific study of any consequence has been made to determine the validity of our methods, to standardize and improve them. It is quite possible that the assignment of grades to school children and college students as a kind of reward is useless or worse; its value could and should be determined. But when students are excluded from college because they do not secure a certain grade in a written examination, or when candidates for positions in government service are selected as a result of a written examination, we assume a certain responsibility. The least that we can do is to make a scientific study of our methods and results.
An early study of the relation between entrance examination and later success was carried out by Thorndike, E. L. (1906). The future of the College Entrance Examination Board. Educational Review, 31, p. 470 ff. Another remarkable glimpse behind the scenes of the proponents of grading on the curve, p. 63:
Professor W. F. Dearborn found similar discrepancies [like those found by Max Meyer and published in 1908] in the grading of students at the University of Wisconsin by forty-five instructors in seven subjects [W. F. Dearborn, School and university grades. Bulletin of the University of Wisconsin, 1910, no. 368]; he concluded that ‘marks, representing as they do the teacher’s estimate of mental abilities of various sorts, may themselves be naturally distributed according to the same frequencies as are the abilities which they are designed to represent. In so far as the teacher’s judgment is correct and is made of a sufficiently large number of pupils, the frequency of the different marks given should be the same as in a ‘normal’ distribution curve,’
p. 63:
At Cornell University it was found by I. E. Finkelstein that the same students in a year course received entirely different grades from the two instructors who taught the course in each semester.
[I. E. Finkelstein (1913). The marking system in theory and practice. Baltimore, Md].

A. M. Coebergh van den Braak (1988). Meer dan zes eeuwen Leids Gymnasium. Leiden: Leids Gymnasium.

Fruin was an inspiring teacher who made high demands on himself, but also on his pupils. It turns out quite often that pupils who repeated a year or were conditionally promoted had to repeat, or sit a re-examination after the holidays, partly on Fruin's recommendation. In 1857 we counted, among the 24 conditionally promoted pupils, 13 who had to sit a re-examination in history. On the other hand, he regularly received a few gifted pupils at his home on Saturday evenings to study with them the works of Michelet, Walter Scott and other authors read at that time.
(Coebergh van den Braak, 1988, p. 125.)
A notable detail is that a small table on p. 116 shows that in 1857 the total number of pupils was only 33, and 49 in the preceding year.
As a novelty compared with the Latin school, the gymnasium (at least in Leiden, in 1838) brought a strict division into classes, in which ‘class’ acquired its present meaning of year group, and in which for the first time a class no longer had a single teacher but was taught by several. The following quotation unfortunately leaves open how the judgments of these different teachers led to promotion and the awarding of prizes; this may be an important moment in the emergence of the grading system as we see it used everywhere today.
A system of six classes was required, each with a course length of one year. Besides the rector there were only three teachers, and there were only 27 pupils in total, so the classes were divided into ‘four schools or rooms.’ The first was the highest class, the second school comprised the second and third classes, the third school classes four and five, and the fourth school the sixth or lowest class. Half-yearly admission was a thing of the past; the course began in September. The half-yearly examination was also abolished, although the curators did institute an ‘inspection examination’ around Christmas. At the summer examination the system of prizes at promotion, that is with ‘Gratiassen’ and ‘Oratiumculae’, was maintained. But because the teachers now taught by subject, spread over different classes, and were no longer tied to a single class as before, the calculation for awarding prizes had to be adapted. For the time being one settled for acknowledging this problem; the solution was postponed until further notice.
(Coebergh van den Braak, 1988, p. 114)
In the 18th century there were indeed ideas about a compulsory one-year stay per class, which was also partly introduced:
. . . a proposal by van Staveren to the curators from 1752 to have every pupil henceforth stay a full year in the prima (highest), secunda and quarta class, even when the novitii or minores would beat the veterani or maiores in making the themata.
(p. 127; original text printed in full on p. 190.)
At that time promotion took place half-yearly, apparently on the basis of points earned, i.e. themata completed and conduct. Van Staveren argues that rapid promotion after half a year means that these pupils miss an important part of the programmed subject matter. Each class then consisted of veterans who had been in it for more than half a year, and novices who had been in it for less than half a year; there were important parts of the subject matter that were treated only for the veterans. That the fourth class (tertia in the formal terminology) posed no problems is, according to Van Staveren, because it is a class in which material is only repeated.

Tim Gill & Tom Bramley (2013) How accurate are examiners’ holistic judgements of script quality?, Assessment in Education: Principles, Policy & Practice, 20:3, 308-324. abstract

Dylan Wiliam interview - Designing the future. Extracts from video feature presentation at ACER Research Conference 2015. vimeo.com/136773589

New to me: personal best scoring, a kind of ipsative judgment. Also: the Japanese way of teaching how to calculate the area of the trapezium.

Warren W. Willingham, Judith M. Pollack & Charles Lewis (2002). Grades and test scores: Accounting for observed differences. Journal of Educational Measurement, 39, 1-37. abstract

Niels Smits, Gideon J. Mellenbergh & Harrie C. M. Vorst (2002). Alternative missing data techniques to grade point average: Imputing unavailable grades. Journal of Educational Measurement, 39, 187-206. pdf [the pdf is from a republication as book chapter, without the list of references]

Brian P. Godor (2016). Revisiting differential grading standards anno 2014: an exploration in Dutch higher education. Assessment & Evaluation in Higher Education abstract

Susan M. Brookhart, Thomas R. Guskey, Alex J. Bowers, James H. McMillan, Jeffrey K. Smith, Lisa F. Smith, Michael T. Stevens and Megan E. Welsh (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86, 803-848. pdf

Regeling omzetting scores in cijfers centrale examens en rekentoets VO 2016. In force from 06-04-2017 to the present. regulation

See also http://www.wiskundebrief.nl/553.htm#1.

Anja J. Boevé, Rob R. Meijer, Hans J. A. Beldhuis, Roel J. Bosker, and Casper J. Albers (2019). On Natural Variation in Grades in Higher Education, and Its Implications for Assessing Effectiveness of Educational Innovations. Educational Measurement: Issues and Practice. abstract

Overall, about 17% of the variation in grades could be attributed to random variation between years and courses. With respect to passing courses, this percentage was almost 40%. Nonsignificant improvements in grades could be flagged as highly significant when this is ignored, thus leading to an overrepresentation of significant effects in educational literature. As a consequence, too many educational innovations are claimed to be effective.
from the abstract
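
The mechanism described in this abstract can be made concrete with a small simulation. The sketch below is not the authors' analysis: it assumes normally distributed grades, takes the 17% figure from the abstract as the share of variance between cohorts, and uses hypothetical values for cohort size and number of trials. It compares two cohorts that differ only by chance and applies a naive two-sample z-test that ignores cohort-level variation.

```python
import random
import statistics


def simulate_false_positive_rate(n_trials=500, n_students=200,
                                 share_between=0.17, seed=1):
    """Share of 'significant' cohort differences when no true effect exists."""
    rng = random.Random(seed)
    sd_between = share_between ** 0.5          # cohort-level (year/course) noise
    sd_within = (1.0 - share_between) ** 0.5   # student-level noise
    flagged = 0
    for _ in range(n_trials):
        cohorts = []
        for _ in range(2):  # e.g. the year before and the year after an innovation
            shift = rng.gauss(0.0, sd_between)
            cohorts.append([shift + rng.gauss(0.0, sd_within)
                            for _ in range(n_students)])
        before, after = cohorts
        diff = statistics.fmean(after) - statistics.fmean(before)
        # Naive standard error: treats every grade as an independent observation.
        se = (statistics.variance(before) / n_students
              + statistics.variance(after) / n_students) ** 0.5
        if abs(diff / se) > 1.96:
            flagged += 1
    return flagged / n_trials


rate = simulate_false_positive_rate()
print(f"naive test flags {rate:.0%} of null comparisons as significant")
```

Under these assumptions the naive test flags far more than the nominal 5% of comparisons, although no innovation effect was simulated.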

B. M. van Dalfsen (1930). De samenhang der Rapportcijfers voor de verschillende leervakken eener H. B. S. Paedagogische Studiën, 11, 230. online

Bert Meuffels (March 2004). Cijfergeven over de grens. Examens. Tijdschrift voor de Toetspraktijk, 15-17. article €

Grading scales, internationally.

H. W. F. Stellwag (1955). Selectie en selectiemethoden. Een inleidende studie in het aansluitingsvraagstuk L.O. en V.H.M.O. J. B. Wolters. Chapter 6, on the grade.

J.N. v.d. Ende (1954). Cijfers op de middelbare school. Paedagogische Studiën.

'Over cijfergeven gesproken'

De Groot once wrote about it: 'Vijven en zessen'. When I asked whether he knew the origin of grading, he told me he was surprised by the question: he had never asked himself.

Is this anecdote relevant? I think so. For I am convinced that grading is a standardized form of rank ordering. For centuries the rank order of pupils or examination candidates was the relevant outcome, with a special reward for the #1 and possibly the #2 as well. The system stems from a humanist development in the medieval schools that replaced punishing errors (with the rod, the ferule, the 'pechvogel') by rewarding achievement (but then only that of the best). It was intended to motivate, but everyone saw that it did not work.

Halfway through the 19th century, a century of standardization and statistics, the dissatisfaction with rank ordering translated into 'improvements' of that rank ordering. Instead of ranking from best to worst, juries started doing so from 1 (worst) to 10 (best).

A further refinement is then not to use the highest marks for a poorly performing group, and likewise for the lowest marks. [A. Chervel, 1993, 'Histoire de l'agrégation. Contribution à l'histoire de la culture scolaire' p. 136 ff.] That was progress, and it found followers!

So, with grading in education we are still doing something medieval: rank ordering pupils. Let that sink in. In a system of rank ordering, not all pupils can beat the system, by definition. See also Karen Heij 14-16.

Those who actually want to get rid of grading also benefit enormously from the insight that grading amounts to rank ordering. And rank ordering is something we nowadays do with great reluctance, only when there really is no other way. Rank ordering is disconnected from learning, from teaching.

For anyone who finds all this far-fetched: a thorough book on everything to do with ranking is Amy N. Langville & Carl D. Meyer, 2012, 'Who's #1? The Science of Rating and Ranking.' http://press.princeton.edu/titles/9661.html Thus far. (I have written about this before: 'Assessment in historical perspective')

Twitter thread on grades, 23 November 2021:

https://twitter.com/benwilbrink/status/1463094983188176900

Literature: http://benwilbrink.nl/literature/cijfergeven.htm… What matters is how rank ordering (since the Middle Ages) gave way to grading. See http://benwilbrink.nl/publicaties/97AssessmentStEE.htm: "In France the marking system seems to have evolved from the ranking system: Chervel (1993, p. 136 ff.)"

Grading systems differ from one another in 'length' (from 1 to 10, 1-20, 1-6, et cetera), 'direction' ('1' as the lowest rating, or rather as the highest), and the use of numbers or letters, but that is the surface. In general: grades are pseudo-objective, because in essence they are rank orders.

Because grades are a pseudo-objective form of rank ordering, the temptation has been great to designate a point somewhere on the grading scale where the sheep are separated from the goats. Nonsense of course, but just try getting rid of it.

https://twitter.com/benwilbrink/status/1463097964285542407

Adriaan de Groot wrote about it in (among others) 'Vijven en zessen'.

The Dutch madness is that we pretend the grading scale is an interval scale. With that, Dutch education shoots itself in both feet: performing 'insufficiently' gets an enormous weight. #overgewicht

The grading scale is of course an ordinal scale. 'Averaging' grades is a form of Dutch polder madness. Apologies, the syndrome occurs elsewhere too. Somehow the idea has also crept in that grades should be 'normally distributed', like IQ. Crazy and super crazy.
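
The objection to averaging can be illustrated with a small sketch. On an ordinal scale only the order of the labels carries meaning, so any relabelling that preserves that order should leave conclusions unchanged. The grades and the relabelling table below are hypothetical; they show that a comparison of means can flip under such a relabelling, while a comparison of medians does not.

```python
import statistics

# Hypothetical report grades for two pupils on the Dutch 1-10 scale.
pupil_a = [5, 5, 9]
pupil_b = [6, 6, 6]

# A hypothetical relabelling that keeps the rank order of all ten grades
# but compresses the distances at the top of the scale.
relabel = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 6.5, 8: 6.75, 9: 7, 10: 7.25}


def summarize(grades):
    """Return (mean, median) of a list of grades."""
    return statistics.fmean(grades), statistics.median(grades)


mean_a, median_a = summarize(pupil_a)
mean_b, median_b = summarize(pupil_b)
mean_a2, median_a2 = summarize([relabel[g] for g in pupil_a])
mean_b2, median_b2 = summarize([relabel[g] for g in pupil_b])

# Means: A is ahead on the original labels, B is ahead after relabelling.
print(mean_a > mean_b, mean_a2 > mean_b2)          # True False
# Medians: B is ahead under both labellings.
print(median_a < median_b, median_a2 < median_b2)  # True True
```

Which pupil has the better average thus depends on the assumed distances between grades, which is exactly the interval-scale assumption that the text calls into question.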

Handing out failing grades such as '1', '2' or '3' is a form of mental (child) abuse. See also the case studies accompanying the 'Model gedragscode toetsen, beoordelen en beslissen in het voortgezet onderwijs' U Twente 1998 http://ben-wilbrink.nl/Model_gedragscode_toetsen_beoordelen_en_beslissen_in_het_voortgezet_onderwijs.pdf (secondary education did not adopt this!!!)

On that silly dichotomy between failing and passing grades (what on earth has that ever been good for?): Education minister Gerrit Bolkestein changed (before WWII) the meaning of the '5' from 'just sufficient' to 'just insufficient'. Who still understands it?

A. E. N. Rommes, W. K. B. Hofstee & G. N. Kema (1968). Omzetting van testscores in schoolcijfers. Pedagogische Studiën. open

https://twitter.com/benwilbrink/status/1487155781694894084 More than just amusing: which grades were given in Drenthe for arithmetic and language (primary education)? Rommes, Hofstee, Kema 1968 'Omzetting van testscores in schoolcijfers' https://objects.library.uu.nl/reader/index.php?obj=1874-205265&lan=en#page//56/66/32/56663252141502817414213513161243720065.jpg/mode/1up

Robert Coe, Jeff Searle, Patrick Barmby, Karen Jones, Steve Higgins (2008). Relative difficulty of examinations in different subjects. CEM Centre, Durham University. Report for SCORE (Science Community Supporting Education) via academia.edu

Paul van der Molen & Jos Keuning (2023). Steeds meer zesjes. Cito. pdf [this link is dead; is the piece no longer on the Cito site? The report by Van der Molen and Keuning is, see below.]

Cijferverdelingen in het voortgezet onderwijs. Een historisch perspectief en recente ontwikkelingen. By Paul van der Molen and Jos Keuning (undated). pdf

Wouter de Jong (15 March 2023). Hoe cijfers de motivatie van leerlingen om zeep helpen – twee reacties. blog

Ben Wilbrink (1 March 2023). Deugdelijk toetsen: psychometrie, grondrechten, en ethiek. blog

Ben Wilbrink (9 February 2023). ‘Meten is weten’. Werkelijk? blog

Ben Wilbrink (19 November 2022). Cijfers, cijfers, cijfers, en zittenblijven. blog

Ben Wilbrink (10 August 2022). Rangordenen en cijfergeven, kan dat ook samengaan? A thread.

Ben Wilbrink (14-11-2022). thread

Everyone does their best to do the right thing. That does not prevent 'Vijven en zessen'. Promotion decisions are also predictions. Supplement your 'data' with information on how the pupils fared afterwards, on whether the predictions were right. If so, ask yourself why.
Grades obtained on tests are data, true. But they are one-sided; they tell only part of what you really ought to know for important decisions. As far as pupils are concerned, grades are the resultant of prior knowledge, aspiration level, and time spent.
At the level of the school, of schools, of the 'system', forces are at play that ensure that grade repetition and moving down to a lower track remain a persistent problem in Dutch educational culture. Posthumus once described it: https://www.dbnl.org/tekst/_gid001194001_01/_gid001194001_01_0040.php
Link the two together: the pupil trying to hold their own within the constraints set by this selective educational culture. Then you see that pupils and teachers face each other, hold each other in a stranglehold, so to speak, in grading.
The individual pupil can obtain better grades through extra effort and private tutoring. When more pupils try to do the same, grades will improve a good deal less. And the remaining pupils get lower grades. (The disaster of shadow education)
Can you and your fellow teachers do something about that? That is a tough one. Teachers are collectively caught in the same bind. It is not impossible for schools to break free from these mechanisms around grading, but that takes courage, insight, and firm policy.
For those who do find this an interesting matter: I have had the opportunity to demonstrate this mechanism of tacit negotiation between (in this case) students and lecturers on the basis of data: http://benwilbrink.nl/publicaties/92ColemanApplicationECER.htm
In this thread you also see the problem of communication between the teacher who really does think about grading and how best to do it, and researchers who are freed up to study grading at a level other than that of the individual teacher or pupil.
This thread contains enough explosive material to fill a few conferences, ResearchEds, and books. I am therefore permanently in discussion about it. It fills my days, so to speak.
https://twitter.com/benwilbrink/status/1592069276001001472
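
The claim in this thread that extra effort pays off for one pupil but not for all can be sketched under a deliberately extreme assumption: grades are assigned purely by rank. The function `grades_from_ranks` and the scores are hypothetical, and real school grading is not purely rank-based; the sketch only shows the zero-sum logic of ranking.

```python
def grades_from_ranks(scores, low=1.0, high=10.0):
    """Spread grades evenly from low to high by rank order of the raw scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    step = (high - low) / (len(scores) - 1)
    grades = [0.0] * len(scores)
    for rank, i in enumerate(order):
        grades[i] = low + rank * step
    return grades


scores = [40, 50, 60, 70]                 # hypothetical raw test scores
base = grades_from_ranks(scores)          # [1.0, 4.0, 7.0, 10.0]

# One pupil (index 0) takes extra lessons and gains 25 points:
# that pupil's grade rises, two classmates' grades fall.
one_tutored = grades_from_ranks([65, 50, 60, 70])   # [7.0, 1.0, 4.0, 10.0]

# Every pupil gains the same 25 points: nobody's grade changes.
all_tutored = grades_from_ranks([s + 25 for s in scores])

print(base, one_tutored, all_tutored)
```

In this sketch the gain of the tutored pupil is paid for by classmates whose own performance did not change, and when everyone invests equally the grades stay exactly where they were.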

Dylan Wiliam (18-3-2023). What happens when students' high-stakes tests are scored by teachers in their own school? Here is what happened with the New York Regents exams (local and Regents diplomas require scores of 55 and 65 respectively): http://bit.ly/3TseCbQ ($) Twitter

Benjamin S. Bloom (May 1968). Learning for Mastery. Instruction and Curriculum. Reprint from Evaluation Comment, 1 (2), 1-12 pdf

See in particular the section 'The normal curve'.

Keith Derrick (July 19, 2020). The story of the normal distribution of grades. Teach to Impact blog

(9 March 2023). Rapportage Onderzoek LVS en eindtoets. DUO Onderwijsonderzoek en Advies. pdf

See also:

Pointer (March 2023). Toetscultuur zet jonge kinderen onder druk: 60 procent leraren schat in dat leerlingen buiten school trainen. article

Jan Drentje. didactiefonline. text

Lars Grue & Arvid Heiberg (2006). Notes on the History of Normality: Reflections on the Work of Quetelet and Galton. Scandinavian Journal of Disability Research, Vol. 8, No. 4, 232-246. pdf

J.G. Hondebrink (19-9-1989). Weg met al die verschillende onvoldoendes. NRC. A whole page on grading, but what helplessness. Delpher

examination grades

Barrett, G.V., & Alexander, R.A. (1989). Rejoinder to Austin, Humphreys, and Hulin: critical reanalysis of Barrett, Caldwell, and Alexander. Personnel Psychology, 42, 597-612. (on grades: large differences between disciplines, between colleges).

Barrett, G.V., & Depinet, R.L. (1991). A reconsideration of testing for competence rather than for intelligence. AP, 46, 1012-1024. (Discusses the aftermath of McClelland's 1973 article on the limited value of school grades as a predictor of later success; shows that only limited support for McClelland's claims of the time can be found in the literature).

Berk, R. A. (1986). A consumer’s guide to setting performance standards on criterion referenced tests. RER, 56, 137-172.

Berk, R.A. (1977). Determination of optimal cutting scores in criterion referenced measurement. JExE, 45, 4-9.

Berkel, K. van (1996). Dijksterhuis, een biografie. Amsterdam: Bert Bakker.

Contains a nice case about grading: a conflict between two mathematics teachers, Dijksterhuis and Kerremans, in the 1920s and 1930s, ending with Kerremans's dismissal. About incorrect and too low grading.

Beuk, C.H. (1982). Vooraf normeren van (examen) toetsen. Een methode voor systematische cesuurbepaling. Cito, algemene publikatie nr. 7.

Beuk, C.H. (1984). A method for reaching a compromise between absolute and relative standards in examinations. JEM, 21, 147-152.

Birnbaum (1950). On the effect of the cutting score when selection is performed against a dichotomized criterion. Pm, 15, 385-389. (fc)

Birnbaum, A., & Maxwell, A. E. Classification procedures based on Bayes's formula. In Cronbach, L.J., & Gleser, G. C. Psychological tests and personnel decisions. London: University of Illinois Press, 1965.

Pierre Merle (2013). L’évaluation par les notes: quelle fiabilité et quelles réformes ? In L’école, une utopie à reconstruire. Regards croisés sur l’économie, no 12. Paris : La Découverte, 2013, 264 p. (ch. 14).

See also: Pierre Merle (Dir.) (1993). La compétence en question. École, insertion, travail. Presses universitaires de Rennes. isbn 2868470866. Containing: Pierre Merle: L'exactitude de l'expertise professorale comme objet de croyance. L'exemple des épreuves écrites du baccalauréat. 15-52 (with data)

Ben Wilbrink (8 February 2024). thread

We have not the faintest notion of what we are actually doing with grading in education. We even encounter a strong indication of this daily, without taking notice: the idiocy of distinguishing 5 levels of 'insufficient' in our grading scale.

Susan M. Brookhart (2009, 2nd ed.). Grading. Merrill. Internet Archive borrow

[more publications by Brookhart on Anna's Archive https://annas-archive.org/search?q=brookhart+grading ]

Ph. Kohnstamm (1929). School-cijfers en school-geschiktheid. Wolters. Mededeelingen van het Nutsseminarium voor Paedagogiek aan de Universiteit van Amsterdam. No. 4. 12 pages. Delpher

Ph. Kohnstamm (1928). De aansluiting M. O. en L. O. en het toelatingsexamen. Wolters. Mededeelingen van het Nutsseminarium voor Paedagogiek aan de Universiteit van Amsterdam. No. 1. Delpher

#zittenblijven #toelating

G. van Veen en Ph. Kohnstamm (1928). De aaneensluiting tusschen lager en middelbaar (gymnasiaal) onderwijs. Wolters. Mededeelingen van het Nutsseminarium voor Paedagogiek aan de Universiteit van Amsterdam. No. 3 Delpher

(1938). Rapport van het Nutsseminarium aan de directeuren-vereniging van hogere burgerscholen in de voormalige 5e Inspectie naar aanleiding van de proef, in september 1936 genomen met taal- en rekenwerk in den geest van het Rapport-Bolkestein. Wolters. Mededelingen van het Nutsseminarium voor Paedagogiek aan de Universiteit van Amsterdam. No. 32. Delpher

Susan M. Brookhart and many others (2016). A Century of Grading Research: Meaning and Value in the Most Common Educational Measure. Review of Educational Research, 86, 803-848. pdf

March 2023 \ contact ben at at at benwilbrink.nl

http://www.benwilbrink.nl/literature/cijfergeven.htm http://goo.gl/ioZlY