This is a kind of portal page on the mischief done by performance indicators, especially in education: links to reports, critical analyses, and scientific background (or the lack of it).

Performance indicators (indicator systems)

Ben Wilbrink

School league tables are becoming ever more common, but who has good reason to be pleased about them? De Volkskrant of Monday 7 February 2011 gave an overview of where parents in dire need of information can turn. De Volkskrant is careless with URLs; the URLs below were at least correct today, 7 February:

The Dutch Inspectorate of Education (Onderwijsinspectie): http://www.onderwijsinspectie.nl/ for green and red cards for schools.

Elsevier: http://www.elsevier.nl/web/Nieuws/Nederland/286390/Elsevieronderzoek-de-Beste-Scholen-2011.htm?rss=true

Trouw: http://www.trouwcommunities.nl/onderwijs/schoolprestaties.html

VO-Raad: http://www.schoolvo.nl/

See also the page on the idea that more competition in education would be beneficial, and the page with literature and annotations on rankings (universities, schools).

A fantastic document to kick off with:

Sharon L. Nichols and David C. Berliner (2005). The Inevitable Corruption of Indicators and Educators Through High-Stakes Testing. Education Policy Studies Laboratory, Arizona State University pdf (180 pp.).

Contents:
Criticisms of Testing - Corrupting the Indicators and the People in the World Outside of Education - Corrupting the Indicators and the People in Education - Methodology - Administrator and Teacher Cheating - Student Cheating and the Inevitability of Cheating When the Stakes are High - Excluding Students from the Test - Misrepresentation of Dropout Data - Teaching to the Test - Narrowing the Curriculum - Conflicting Accountability Ratings - The Changing Meaning of Proficiency - The Morale of School Personnel - Errors of Scoring and Reporting.
Executive Summary
This research provides lengthy proof of a principle of social science known as Campbell's law: "The more any quantitative social indicator is used for social decisionmaking, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor." Applying this principle, this study finds that the over-reliance on high-stakes testing has serious negative repercussions that are present at every level of the public school system. Standardized-test scores and other variables used for judging the performance of school districts have become corruptible indicators because of the high stakes attached to them. These include future employability of teachers and administrators, bonus pay for school personnel, promotion/non-promotion of a student to a higher grade, achievement/non-achievement of a high school degree, reconstitution of a school, and losses or gains in federal and state funding received by a school or school district.
Evidence of Campbell's law at work was found in hundreds of news stories across America, and almost all were written in the last few years. The stories were gathered using LexisNexis, Inbox Robot, Google News Alerts, The New York Times, and Ed Week Online. In addition to news stories, traditional research studies, and stories told by educators about the effects of high-stakes testing are also part of the data. The data fell into 10 categories. Taken together these data reveal a striking picture of the corrupting effects of high-stakes testing:
- Administrator and Teacher Cheating: In Texas, an administrator gave students who performed poorly on past standardized tests incorrect ID numbers to ensure their scores would not count toward the district average.
- Student Cheating: Nearly half of 2,000 students in an online Gallup poll admitted they have cheated at least once on an exam or test. Some students said they were surprised that the percentage was not higher.
- Exclusion of Low-Performance Students From Testing: In Tampa, a student who had a low GPA and failed portions of the state's standardized exam received a letter from the school encouraging him to drop out even though he was eligible to stay, take more courses to bring up his GPA, and retake the standardized exam.
- Misrepresentation of Student Dropouts: In New York, thousands of students were counseled to leave high school and to try their hand at high school equivalency programs. Students who enrolled in equivalency programs did not count as dropouts and did not have to pass the Regents' exams necessary for a high school diploma.
- Teaching to the Test: Teachers are forced to cut creative elements of their curriculum like art, creative writing, and hands-on activities to prepare students for the standardized tests. In some cases, when standardized tests focus on math and reading skills, teachers abandon traditional subjects like social studies and science to drill students on test-taking skills.
- Narrowing the Curriculum: In Florida, a fourth-grade teacher showed her students how to navigate through a 45-minute essay portion of the state's standardized exam. The lesson was helpful for the test, but detrimental to emerging writers because it diluted their creativity and forced them to write in a rigid format.
- Conflicting Accountability Ratings: In North Carolina, 32 schools rated excellent by the state failed to make federally mandated progress.
- Questions about the Meaning of Proficiency: After raising achievement benchmarks, Maine considered lowering them over concerns that higher standards will hurt the state when it comes to No Child Left Behind.
- Declining Teacher Morale: A South Carolina sixth-grade teacher felt the pressure of standardized tests because she said her career was in the hands of 12-year-old students.
- Score Reporting Errors: Harcourt Educational Measurement was hit with a $1.1 million fine for incorrectly grading 440,000 tests in California, accounting for more than 10 percent of the tests taken in the state that year.
High-stakes tests cannot be trusted - they are corrupted and distorted. To avoid exhaustive investigations into these tests that turn educators into police, this research supports building a new indicator system that is not subject to the distortions of high-stakes testing.

J. R. Lockwood, Daniel F. McCaffrey, Laura S. Hamilton, Brian Stecher, Vi-Nhuan Le, and José Felipe Martinez (2007). The Sensitivity of Value-Added Teacher Effect Estimates to Different Mathematics Achievement Measures. Journal of Educational Measurement, 44(1), 47-67 (Spring 2007).

abstract Using longitudinal data from a cohort of middle school students from a large school district, we estimate separate “value-added” teacher effects for two subscales of a mathematics assessment under a variety of statistical models varying in form and degree of control for student background characteristics. We find that the variation in estimated effects resulting from the different mathematics achievement measures is large relative to variation resulting from choices about model specification, and that the variation within teachers across achievement measures is larger than the variation across teachers. These results suggest that conclusions about individual teachers’ performance based on value-added models can be sensitive to the ways in which student achievement is measured.
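To make the abstract's central comparison concrete, here is a minimal sketch with invented numbers (not data from Lockwood et al.): each hypothetical teacher has two value-added estimates, one per mathematics subscale. It contrasts the spread of the teachers' mean estimates (between-teacher variance) with the average spread of each teacher's two estimates (within-teacher variance across measures). This crude descriptive split is an assumption for illustration, not the model-based estimation used in the paper.

```python
import statistics

# Invented value-added estimates (subscale 1, subscale 2) for five teachers.
# Illustrative only; these are NOT figures from Lockwood et al. (2007).
effects = {
    "T1": (0.30, -0.10),
    "T2": (0.05, 0.25),
    "T3": (-0.20, 0.10),
    "T4": (0.15, -0.25),
    "T5": (-0.10, 0.05),
}


def variance_split(effects):
    """Return (between-teacher variance, mean within-teacher variance).

    Between: population variance of each teacher's mean over the two measures.
    Within: average over teachers of the population variance of their two estimates.
    """
    teacher_means = [statistics.mean(pair) for pair in effects.values()]
    between = statistics.pvariance(teacher_means)
    within = statistics.mean(statistics.pvariance(pair) for pair in effects.values())
    return between, within


def ranking(effects, measure):
    """Teacher IDs from highest to lowest estimate on one subscale (index 0 or 1)."""
    return sorted(effects, key=lambda t: effects[t][measure], reverse=True)


between, within = variance_split(effects)
print(f"between-teacher variance: {between:.4f}")
print(f"within-teacher variance:  {within:.4f}")
print("ranking on subscale 1:", ranking(effects, 0))
print("ranking on subscale 2:", ranking(effects, 1))
```

With these made-up numbers the within-teacher variance (about 0.024) is more than three times the between-teacher variance (0.007), and the two subscales order the teachers very differently. That is the pattern the abstract describes: which test you use matters more than which teacher you look at.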

Randolph Sloof and Mirjam van Praag (2005). Performance measurement, expectancy and agency theory: an experimental study. University of Amsterdam and Tinbergen Institute. SCHOLAR project. pdf

Among other topics this article is about the effects of performance indicators that are not well aligned with the institution's proper objectives. [For example: the number of teaching hours in secondary education as a performance indicator, instead of the quality of the teaching. b.w.]

Daniel M. Koretz, Daniel F. McCaffrey and Laura S. Hamilton (2001). Toward a framework for validating gains under high-stakes conditions (CSE Tech. Rep. 551). Center for the Study of Evaluation. pdf

Harold Berlak (2001). Yes, President Bush, Johnny's Test Scores May Be Up, But Can He Read? Center for Education Research, Analysis, and Innovation, School of Education, University of Wisconsin-Milwaukee http://www.asu.edu/educ/epsl/EPRU/point_of_view_essays/cerai-01-10.htm [Dead link? May 3, 2009]

"Many Americans are rightly worried that their children are not learning the basics needed to thrive in the competitive global economy. President Bush's solution is to raise standards by testing both Johnny and his teachers.
The argument for the policy is simple: Provide tangible rewards to those who succeed, in the form of more money and access to educational and job opportunities; and punish principals, teachers and students for their failures.
Does it work? In Texas the scores are up, and the new President assures us he will bring the Texas miracle to the entire nation.
A closer look at both the size and educational significance of the gains[1] in Texas, California, and elsewhere tells a different story. The gains average 5 percentile points. On a fifty-item standardized reading test, this is a gain of 2.5 multiple-choice test questions - paltry considering the many billions spent in direct and indirect costs, and the enormous commitment of school time, energy and resources devoted to coaching students on tests.[2]"
1. At best the gains are mixed. California reports 4-5 percentile points on the Stanford 9. Texas reports as much as 11 percentile points gain on its own test (TAAS). A recent Rand report, Improving Student Achievement: What State NAEP Scores Tell Us (available at http://www.rand.org as a book, and also free of charge chapter by chapter as pdf), shows gains of three percentile points or less. On the other hand, the Nation's Report Card compiled by the National Center for Education Statistics indicates a small but steady decline in NAEP reading scores of high school students (available at http://nces.ed.gov).
2. Seven years ago, Boston College Researchers Walter Haney, George Madaus, and Robert Lyons estimated indirect costs at 20 billion annually (The Fractured Marketplace for Standardized Testing, Boston: Kluwer, 1993). According to the Bowker Annual, direct expenditures on tests doubled annually between 1980 and 1997 to 200 million dollars. These are low estimates, given the proliferation of tests over the last five years.
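Berlak's 2.5 questions comes from reading 5 percentile points as 5 percent of 50 items. Percentile points are not percent-correct, though: how many raw-score items a percentile gain represents depends on the score distribution and on where a student starts. A minimal sketch, assuming (hypothetically) normally distributed raw scores with mean 30 and standard deviation 8 items:

```python
from statistics import NormalDist

ITEMS = 50           # test length from Berlak's example
GAIN_PCT_POINTS = 5  # reported average gain in percentile points

# Berlak's conversion: treat percentile points as a share of the items.
naive_items = ITEMS * GAIN_PCT_POINTS / 100


def raw_gain(start_pct, gain_pct, mean, sd):
    """Raw-score gain (in items) for a move from start_pct to start_pct + gain_pct.

    Assumes raw scores follow a normal distribution with the given mean and sd;
    both values are hypothetical, not taken from any test mentioned above.
    """
    dist = NormalDist(mean, sd)
    return dist.inv_cdf((start_pct + gain_pct) / 100) - dist.inv_cdf(start_pct / 100)


print(f"naive conversion: {naive_items} items")
print(f"from the 50th percentile: {raw_gain(50, GAIN_PCT_POINTS, 30, 8):.1f} items")
print(f"from the 90th percentile: {raw_gain(90, GAIN_PCT_POINTS, 30, 8):.1f} items")
```

Under these assumed figures, 5 percentile points near the median amount to about one item and near the 90th percentile to almost three. Berlak's 2.5 is therefore an order of magnitude rather than an exact conversion; on either reading the gain stays small.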

Peter Afflerbach (2004). High Stakes Testing and Reading Assessment. National Reading Conference Policy Brief. doc

"... high stakes testing may repress the realization of high quality reading instruction and assessment and it poses a threat to the development of students who are accomplished, lifelong readers"

John H. Bishop (2005). High School Exit Examinations: When Do Learning Effects Generalize? Cornell, Center for Advanced Human Resource Studies, Working paper 05-04. http://www.ilr.cornell.edu/depts/cahrs/downloads/PDFs/WorkingPapers/WP05-04.pdf [Dead link? May 3, 2009]

from the abstract This paper reviews international and domestic evidence on the effects of three types of high school exit exam systems: voluntary curriculum-based external exit exams, universal curriculum-based external exit exam systems and minimum competency tests that must be passed to receive a regular high school diploma. The nations and provinces that use Universal CBEEES (and typically teacher grades as well) to signal student achievement have significantly higher achievement levels and smaller differentials by family background than otherwise comparable jurisdictions that base high stakes decisions on voluntary college admissions tests and/or teacher grades. The introduction of Universal CBEEES in New York and North Carolina during the 1990s was associated with large increases in math achievement on NAEP tests.
(...) The most positive finding about MCTs is that students in MCT states earn significantly more during the first eight years after graduation than comparable students in other states suggesting that MCTs improve employer perceptions of the quality of the recent graduates of local high schools.
In short, this is a very different approach to the high-stakes problem. For the time being I assume both are right: the negative effects arise at the institutional level (with individual students as the victims), the positive effects of national exit examinations and the like at the individual level (with the institutions, of course, as the third party that profits).
John H. Bishop (2004). Drinking from the Fountain of Knowledge: Student Incentive to Study and Learn - Externalities, Information Problems and Peer Pressure. Cornell, Center for Advanced Human Resource Studies Working paper 04-15 pdf
John H. Bishop (1999). Are national exit examinations important for educational efficiency? Swedish Economic Policy Review. 6, 349-398. pdf
- from the summary: Students in countries with these exams tend to outperform students in other countries in science, math, reading, and geography, when national economic development levels are accounted for.
  The paper also argues that the elimination of the Swedish exit examination system in the 1970s, in combination with changes in the way university applicants were selected, appears to have led to a decline in the number of upper secondary school students taking rigorous courses in mathematics and science.

Stephen W. Raudenbush (2004). Schooling, statistics, and poverty: Can we measure school improvement? The ninth annual William H. Angoff Memorial Lecture was presented at Educational Testing Service, Princeton, New Jersey, on April 1, 2004. pdf

A first condition for interpreting league tables of schools and universities correctly is at least some insight into the enormous problems that arise when someone wants to prove that school X is better than school Y, or that school A is doing better now than five years ago. That is why no one can get around Raudenbush.

ABSTRACT Under No Child Left Behind legislation, schools are held accountable for making 'adequate yearly progress.' Presumably, a school progresses when its impact on students improves. Yet questions about impact are causal questions that are rarely framed explicitly in discussions of accountability. One causal question about school impact is of interest to parents: 'Will my child learn more in School A or School B?' Such questions are different from questions of interest to district administrators: 'Is the instructional program in School A better than that in School B?' Answering these two kinds of questions requires different kinds of evidence. In this paper, I consider these different notions of school impact, the corollary questions about school improvement, and the validity of causal inferences that can be derived from data available to school districts. I compare two competing approaches to measuring school quality and school improvement, the first based on school-mean proficiency, the second based on value added. Analyses of four data sets spanning elementary and high school years show that these two approaches produce pictures of school quality that are, at best, modestly convergent. Measures based on mean proficiency are shown to be scientifically indefensible for high-stakes decisions. In particular, they are biased against high-poverty schools during the elementary and high school years. The value-added approach, while illuminating, suffers inferential problems of its own. I conclude that measures of mean proficiency and value added, while providing potentially useful information to parents and educators, do not reveal direct evidence of the quality of school practice. To understand such quality requires several sources of evidence, with local test results augmented by expert judgment and a coherent national agenda for research and development in education.
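Raudenbush's point that mean proficiency is biased against high-poverty schools can be seen in a toy example. The sketch below uses invented scores for two hypothetical schools and a deliberately crude value-added proxy (mean gain over the prior-year score). It is not the value-added model Raudenbush analyses, which adjusts for much more and, as he notes, has inferential problems of its own.

```python
from statistics import mean

# Invented (prior-year score, current score) pairs for pupils in two hypothetical schools.
schools = {
    "A (high poverty)": [(40, 52), (45, 56), (50, 61), (35, 48)],
    "B (low poverty)": [(70, 74), (75, 80), (80, 83), (65, 71)],
}


def mean_proficiency(pupils):
    """Status measure: average current score, ignoring where pupils started."""
    return mean(current for _, current in pupils)


def mean_gain(pupils):
    """Crude value-added proxy: average gain over the prior-year score."""
    return mean(current - prior for prior, current in pupils)


for label, measure in [("mean proficiency", mean_proficiency), ("mean gain", mean_gain)]:
    order = sorted(schools, key=lambda s: measure(schools[s]), reverse=True)
    print(f"{label:>16}: {' > '.join(order)}")
```

On mean proficiency school B comes out ahead (77 versus 54.25); on mean gain school A does (11.75 versus 4.5). The ranking flips because the status measure largely reflects the intake, not what the school adds.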

Anneke van der Hoeven-van Doornum (2005). Development on scale, instruction at measure. OBIS, a system of value added indicators in primary education. Nijmegen: ITS. ISBN 90-5554-286-5 pdf

"This report deals with the OBIS, a test on the development and school performances of 4-6-year-old children. OBIS was derived from the British PIPS Baseline Assessment by translation and careful adaptation when necessary. The research with OBIS consists of two parts. The first part deals with questions concerning the construction of OBIS, and the reliability and validity aspects thereof. In the second part OBIS is applied to answer questions about:
- the impact of testing on pupils progress;
- the prediction of learning achievements using OBIS and background of pupils;
- the use of OBIS to identify pupils that need special education."
So this is a test for young children (kleuters, ages 4 to 6), the kind of instrument that has caused a lot of fuss in our little country, largely, if I am not mistaken, thanks to sociologist Wim Meijnen. Psychologists, following Dolf Kohnstamm, would have played it safe and stayed away from this project. So I am very curious what the project has ultimately delivered, and how Van der Hoeven assesses the harm this testing mania could do to the youngest generations in our country. See also Maria Straathof in conversation with Anneke van der Hoeven (doc), which shows that she entered this project with her eyes wide open and full of good intentions, which I will never doubt. And fraud by the teacher? The test is administered on a computer, and that cannot be tampered with. Long live technology.
The problems are 1) that this 'test' claims the status of a psychological test, which is something other than the more modest kind of test used in educational research, as in the PRIMA study, where the tests are not intended for individual use; and 2) that the next step will be to use a test like this as a performance indicator, which in the US is called high-stakes testing. See the other literature on this page for the chilling effects that can have on students.
The dilemma is this: will politicians, or the institution, misuse this type of instrument, or is this type of instrument needed to enable the teacher to offer each individual child the best conditions for growth? I am not readily convinced of the latter. See Paul Black's argument 'Raising standards through formative assessment' for what is possible, but he needs no standardized instruments for it: www.gtce.org.uk/shared/contentlibs/93802/93125/pupil_assessment.pdf [no longer available, 2-2008]. Even stronger: the explicit advice is not to give grades (or test scores) on formative assessments at all, only comments the pupil can act on.
This test takes 20 minutes, and the teacher administers it. Isn't that lovely. "This appears to provide the teacher with valuable diagnostic information on the child." No psychologist is involved any more; why would you call one in?
Why do I list this study on a page about performance indicators? p. 1: "The possible role of early testing, in particular using OBIS, in the allotment of finances to schools was discussed." The upshot of that discussion will have to be dug out of the report somewhere.
This report deserves a thorough critical analysis, if only to give the lie to the kind of flippant commentary I have given above. I will not make that analysis for the time being, simply for lack of time.
Still, a quick look through the text.
- p. 20: "Test validity focuses on what a test measures, and how well this is done." This is a Pipo de Clown definition of validity, i.e. one fit for children's television.
- Van der Hoeven does use the AERA/APA/NCME Standards, certainly, but the 1985 edition; she is apparently unaware of the considerably expanded 1999 edition. html
- Translating a test (the PIPS test) poses no problem for Van der Hoeven, even though the English original is tied to the English curriculum (p. 21). "All its items strongly correlate with future literacy and numeracy." That would be a sorry state of affairs if it were not so. But that correlation says absolutely nothing. Another strange statement (p. 22): "It should be clear that the PIPS is not an intelligence-test but an assessment that refers to future learning."
- Something more on the quality of test administration (p. 27): "Teachers often comment that spending 20 minutes with each child helps to build a good relationship and that it is not just the child's reaction to assessment items that matters but the way in which they respond that gives valuable information (Tymms & Merrell, 2004)."
- I find it a problem that no attention is paid to the reception in England of all that testing at key stage 1, which has caused a great deal of controversy, and probably still does. The conflict is between what politicians want (testing) and what practitioners and experts consider feasible and meaningful. Van der Hoeven does not refer to that debate, which I find worrying.
- Chapter 2 is supposed to report the construction etc. of OBIS. The reader has a lot of data thrown at them, but nothing resembling the kind of study required, for example, for certification by COTAN (the Dutch committee that reviews psychological tests). That certification has apparently not been applied for, which gives hope that this 'test' will not come onto the market. After all, bringing a test to market without a COTAN rating does not seem smart to me.
- And then, after all, on p. 82, mention of a dissenting voice: that of the Commissie Indicatiestelling Onderwijsachterstanden (committee on identifying educational disadvantage).
- The age of these young children obviously matters. I find little about it in the report, which will also disappoint Klaas Doornbos (he defended a dissertation on the effects of these age differences, e.g. on grade retention). What one may suspect, then, is that part of the fine correlations presented throughout is an artifact of age differences. I am curious how exactly that works out.
- All worrying. I have not come across that discussion about using this instrument to set up league tables for schools; I would have to search for it again.
- Dolf Kohnstamm wrote, nearly ten years ago by now I think, an advisory report on testing children at this young age. The advice: don't. I find not a word devoted to it in Van der Hoeven's report. She does know that piece (see the conversation with Maria Straathof).
- Do I have a reasonable idea of what developing an intelligence-like test involves? Yes. The numbers of pupils you need for it alone, the spread over urban and rural areas, and so on, make the ITS project look a bit ridiculous. Norm tables should also be developed, among other things for age differences: I have not come across them anywhere.
- Sorry, Anneke. I know the bills have to be paid; in contract research you cannot pick and choose your projects, and you cannot always shape commissioned work to your own professional standards. It is risky to accept impossible assignments. Nevertheless, tests of this kind can have a profound impact on the lives of the young children tested, so the responsibility is considerable. This is not, let me put it this way, run-of-the-mill window-dressing research for the government or other bigwigs with something to hide. A critical concluding discussion could surely have been included, couldn't it?
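The suspicion that part of the reported correlations is an age artifact can be illustrated with a small simulation. The sketch is purely hypothetical: age in months (48 to 60) is the only common cause of an early test score and a later achievement score, so the raw correlation between the two is substantial, while the partial correlation controlling for age is close to zero. Nothing here is taken from the OBIS report; the variable names and distributions are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical cohort: age drives both scores; they share NO other common cause.
rng = random.Random(1)
age = [rng.uniform(48, 60) for _ in range(2000)]
early = [a + rng.gauss(0, 3) for a in age]
later = [a + rng.gauss(0, 3) for a in age]

print(f"raw correlation early/later:  {pearson(early, later):.2f}")
print(f"partial correlation given age: {partial_corr(early, later, age):.2f}")
```

In real data age would be one of several common causes, so controlling for it would shrink rather than erase a correlation; age-specific norm tables are the practical counterpart of this adjustment.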

R. Bosker, J. Fond Lam, H. Luyten, et al. (1998). Het vergelijken van scholen [Comparing schools]. Enschede: Universiteit Twente. ISBN 9036512204

"Seven contributions on school improvement and on establishing indicators of educational quality: 'On the usefulness of publishing school data' (the quality cards of the Inspectorate of Education and the Trouw school performance tables); 'Publishing school data: an international comparison'; 'Difficult and less difficult examinations'; 'Efficiency' (efficiency measures and how to measure efficiency); 'Comparing schools' (how the results of schools on the key figures can be judged); 'Quality as added value' (determining the added value of schools requires individual pupil data); 'Looking back... and where next?' (the dangers of making school results public, and a number of possible solutions)."

Jaap Scheerens (1999). School effectiveness in developed and developing countries; A review of the research evidence. doc

Walt Haney, George Madaus, Lisa Abrams, Anne Wheelock, Jing Miao, and Ilena Gruia (2004). The Education Pipeline in the United States, 1970-2000. Education Pipeline Project, National Board on Educational Testing and Public Policy, Center for the Study of Testing, Evaluation, and Educational Policy, Lynch School of Education, Boston College, Chestnut Hill, MA 02467 pdf

I will quote from it at length, because
1. it concerns data for the whole of the US since 1970,
2. the conclusions are alarming,
3. the alarming abuses are a direct consequence of attempts by institutions and responsible officials to manipulate the results of high-stakes testing at the expense of a great many students,
4. the research is careful,
5. the researchers are putting their reputations at stake with it.
"In this report we present results of analyses of data on grade enrollment and graduation over the last several decades both nationally and for all 50 states. The main reasons for these analyses are that state-reported dropout statistics are often unreliable and most states do not regularly report grade retention data, that is data on the rates at which students are held back to repeat grades."
"A second major finding from our cohort progression analyses is that the rate at which students disappear between grades 9 and 10 has tripled over the last 30 years."
"This combination, of increasing attrition of students between grades 9 and 10, and increasingly more students enrolled in grade 9 relative to grade 8, is surely a reflection of the fact that more students nationally were being flunked to repeat grade 9."
"The combination of findings presented in the last two sections should make our fourth finding come as no surprise: high school graduation rates have been falling in the United States in recent years."
"Nonetheless, we argue, as did Leonard Ayres a century ago [1909: 'Laggards in our schools: A study of retardation and elimination in city school systems'], that rates of student progress through elementary and secondary school are one of the best measures of the health of an educational system. While the news from our analysis of the education pipeline is not altogether bleak, evidence suggests that constrictions in the secondary school pipeline are likely leading to unfortunate negative consequences not just for young people but for society as a whole."
"As stated previously, proving cause and effect regarding historical developments is no easy matter, but what seems clear is that constriction in the education pipeline has been associated with three waves of education reform over the last three decades, namely minimum competency testing, academic standards movement and high stakes testing."
"Standards-based reform refers to a process by which states have been encouraged to develop grade level academic 'standards,' then to develop tests based on those standards, and finally to use results of those tests to make decisions about both students and schools based on test results." (...) "Though the idea of such a reform strategy is seductively simple, there are a number of things wrong with it. First, even brief reflection ought to make clear that the aims of public education in the U.S. extend far beyond merely academic learning (much less merely raising scores on a small number of tests of academic subjects). Second, to base high school graduation decisions on standardized test results in isolation, irrespective of other evidence about student performance in high school, is contrary to recognized professional standards regarding appropriate use of test results. (See for example the statement of American Educational Research Association, http://www.aera.net/about/policy/stakes.htm [no longer on the website]). (...) Third, documentation of widespread errors in test scoring, scaling, and reporting in the testing industry should make clear how unwise it is to make important decisions mechanically based on test scores in isolation (Henriques & Steinberg, May 20, 2001 html; Steinberg & Henriques, May 21, 2001 [Rhoades & Madaus, 2003 pdf]). Indeed, in Minnesota, one large testing company was forced into a $10 million settlement after it was shown that hundreds of students had been wrongfully denied high school diplomas. (...)
Fourth, recent research has demonstrated conclusively that 'low-tech' tests like those being used in all the states (that is, paper-and-pencil tests in which students answer multiple-choice questions or write answers on paper long hand) seriously underestimate the skills of students used to writing with computers (Haney & Russell, 2000 [not found in the reference list]; Russell & Plati, 2001 [Effects of Computer Versus Paper Administration of a State-Mandated Writing Assessment, for sale online]). (...) "Finally, however, it is clear that when the same fallible technology (and all bureaucratic accountability systems and high stakes testing systems are such fallible technologies, Madaus, 1990 ['Testing as a social technology', not online]), is used to make decisions about children and social institutions, the latter will always be in a better position to protect their interests than the former."
For example, when schools are under intense pressure to increase test score averages, and are not given the resources or tools for doing so in an educationally sound manner, the easiest way to make test pass rates (or score averages) appear to increase in the grade at which high stakes tests are administered is to exclude 'low achieving' students from being tested. One way to exclude them, at least temporarily, is to flunk them to repeat the grade before the grade tested. Another is to push students out of school altogether.
There is ample historical evidence of this phenomenon. In the payment by results era of school accountability in Britain in the latter part of the 19th century, when grants to schools were based on examination results, weaker pupils were often kept back or told to stay away from school on the exam date (Rapple, 1994, pdf). And when Ireland used a primary school leaving exam during the mid-20th century and schools' reputations were strongly influenced by student pass rates, teachers would flunk students at higher rates in the grades before the high stakes exam grade (Madaus & Greaney, 1985 "The Irish experience in competency testing" American Journal of Education, 93, 268-294)." This is followed by a barrage of press reports and the like on abuses on precisely this point.
"In closing, we should acknowledge that in recent years there has been considerable debate about the merits of high stakes testing. We do not try to review that debate here (though our position on the matter should by now be clear). Rather what we wish to emphasize is that whatever has been causing the constriction in the high school pipeline - the increasing rate at which students are being flunked to repeat grade 9 and the falling rate at which students are graduating from high school - this development should be viewed as a real national emergency. When students are squeezed out of the high school pipeline and do not even graduate from high school, this has dire consequences not just for these young people but for society as a whole. The reason we say this is that recent research shows that there is an increasingly strong link between people's failing to graduate from high school and their ending up in prison. "
news release
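The cohort progression findings on grade 9 quoted above rest on simple enrollment ratios. A minimal sketch, assuming two ratios: grade 10 enrollment in year t+1 divided by grade 9 enrollment in year t (its shortfall below 1 is the "disappearance" between grades 9 and 10), and grade 9 enrollment in year t+1 divided by grade 8 enrollment in year t (a value above 1 points to students repeating grade 9). The enrollment counts are invented, and the report's actual procedure may differ in detail.

```python
# Invented enrollment counts (in thousands) by school year and grade.
enrollment = {
    1990: {8: 100, 9: 105, 10: 98},
    1991: {8: 100, 9: 110, 10: 96},
}


def progression_ratio(enroll, year, grade):
    """Enrollment in grade + 1 in the next year, divided by enrollment in grade this year."""
    return enroll[year + 1][grade + 1] / enroll[year][grade]


ratio_8_9 = progression_ratio(enrollment, 1990, 8)   # grade 8 (1990) -> grade 9 (1991)
ratio_9_10 = progression_ratio(enrollment, 1990, 9)  # grade 9 (1990) -> grade 10 (1991)

print(f"grade 8 -> 9 ratio:  {ratio_8_9:.2f}")
print(f"grade 9 -> 10 ratio: {ratio_9_10:.2f}")
print(f"apparent attrition between grades 9 and 10: {1 - ratio_9_10:.1%}")
```

A grade 8 to 9 ratio above 1 combined with a grade 9 to 10 ratio well below 1 is the pattern the authors read as students being held back in grade 9. Taken on their own, ratios like these do not distinguish retention from dropping out or migration, which is why the report combines several lines of evidence.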

Brendan A. Rapple (1994). Payment by Results: An Example of Assessment in Elementary Education from Nineteenth Century Britain. Education Policy Analysis Archives, vol. 2, pdf

Abstract: Today the public is demanding that it exercise more control over how tax dollars are spent in the educational sphere, with multitudes also canvassing that education become closely aligned to the marketplace's economic forces. In this paper I examine an historical precedent for such demands, i.e. the comprehensive 19th century system of accountability, "Payment by Results," which endured in English and Welsh elementary schools from 1862 until 1897. Particular emphasis is focused on the economic market-driven aspect of the system whereby every pupil was examined annually by an Inspector, the amount of the governmental grant being largely dependent on the answering. I argue that this was a narrow, restrictive system of educational accountability though one totally in keeping with the age's pervasive utilitarian belief in laissez-faire. I conclude by observing that this Victorian system might be suggestive to us today when calls for analogous schemes of educational accountability are shrill.

E. E. White (1888). Examinations and promotions. Education, 8, 519-522. Madaus and Kellaghan (1992, p. 125) refer back to it:

"By 1888, the superintendent in Cincinnati complained that when these essay tests were used to determine the promotion and classification of children they perverted 'the best efforts of teachers, and narrowed and grooved their instruction; they have occasioned and made well-nigh imperative the use of mechanical and rote methods of teaching; they have occasioned cramming and the most vicious habits of study; they have caused much of the overpressure charged upon schools, some of which is real; they have tempted both teachers and pupils to dishonesty; and last but not least, they have permitted a mechanical method of schools supervision.'"
According to the Centrale Catalogus Periodieken (the union catalogue of periodicals) of the KB, this journal is not held in the Netherlands for these years, only from 1978 onwards (almost a century too late).
G. F. Madaus and T. Kellaghan (1992). Curriculum evaluation and assessment. In P. W. Jackson (Ed.), Handbook of research on curriculum (pp. 119-154). New York: Macmillan.

Rankings - League Tables - Lists

See the rankings page for details on world rankings of universities or national league tables of schools.


Amy N. Langville & Carl D. Meyer (2012). Who's #1? The Science of Rating and Ranking. Princeton University Press. site

Chapter One: Introduction to Ranking free pdf
This theorem of Arrow’s and the accompanying dissertation from 1951 were judged so valuable that in 1972 Ken Arrow was awarded the Nobel Prize in Economics. While Arrow’s four criteria seem obvious or self-evident, his result certainly is not. He proves that it is impossible for any voting system to satisfy all four common sense criteria simultaneously. Of course, this includes all existing voting systems as well as any clever new systems that have yet to be proposed. As a result, the Impossibility Theorem forces us to have realistic expectations about our voting systems, and this includes the ranking systems presented in this book. Later we also argue that some of Arrow’s requirements are less pertinent in certain ranking settings and thus, violating an Arrow criterion carries little or no implications in such settings.
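Arrow's theorem itself does not fit in a few lines, but the classic example behind it does: Condorcet's paradox, in which every voter has a perfectly consistent ranking, yet pairwise majority voting produces a cycle. The sketch below uses three invented ballots over hypothetical candidates A, B and C; it only illustrates why aggregating rankings can break transitivity, and is not code from the book.

```python
from itertools import combinations

# Three hypothetical voters, each ranking three hypothetical candidates (best first).
BALLOTS = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def prefers(ballot, x, y):
    """True if this ballot ranks x above y."""
    return ballot.index(x) < ballot.index(y)

def majority_winner(ballots, x, y):
    """Return whichever of x and y a strict majority prefers, or None on a tie."""
    votes_x = sum(prefers(b, x, y) for b in ballots)
    votes_y = len(ballots) - votes_x
    if votes_x > votes_y:
        return x
    if votes_y > votes_x:
        return y
    return None

def pairwise_results(ballots):
    """Majority winner for every pair of candidates."""
    candidates = sorted(ballots[0])
    return {(x, y): majority_winner(ballots, x, y)
            for x, y in combinations(candidates, 2)}

if __name__ == "__main__":
    for (x, y), winner in pairwise_results(BALLOTS).items():
        print(f"{x} vs {y}: majority prefers {winner}")
```

The majority prefers A to B and B to C, but also C to A, so no consistent collective ranking exists for these ballots.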

H. G. Morrison & P. C. Cowan (1996). The state schools book: A critique of a league table. British Educational Research Journal, 22, 241-249.

p. 243: The focus of this article is not the performance tables but the use newspapers make of such data in order to compile simplistic league tables with questionable measurement properties.

Universities


Frans A. van Vught & Frank Ziegele (Eds.) (2012). Multidimensional Ranking. The Design and Development of U-Multirank. Springer. abstract

I am not at all keen on this, but it is still useful to have an up-to-date overview of the issues at hand. If at all possible, I will leave this book closed. All this fuss about rankings, in my conviction, comes at the expense of university teaching and research, not to their benefit. It creates work for hordes of civil servants and ranking researchers, who in turn keep legions of university administrators and scientists from their real work.

Berlin Principles on Ranking of Higher Education Institutions pdf

Do I endorse them? Don't think so. But then, I do not endorse ranking in whatever form. What is the worth of good intentions?

the mechanics of ranking

Rankings may look rather simple - 1. Harvard University; 2. Cambridge University; etcetera - but the ranking methods are, more often than not, rather complex, while the interpretation of the data used is far from evident. Take for example the THES 'The World's top 200 universities' 2006 (see reference and annotation below). The ranking is done on an overall score composed of the following subscores:

40% peer review score
10% recruiter review
5% international faculty score
5% international students score
20% faculty/student score
20% citations/faculty score
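A minimal sketch of how such a composite could be computed, assuming the overall score is simply the weighted sum of the six published 0-100 subscores with the weights listed above. The THES/QS may well have standardized or rescaled the data differently; the function names and the example subscores are hypothetical.

```python
# Weights as listed above for the THES 2006 ranking (they sum to 1.0).
WEIGHTS = {
    "peer_review": 0.40,
    "recruiter_review": 0.10,
    "international_faculty": 0.05,
    "international_students": 0.05,
    "faculty_student": 0.20,
    "citations_faculty": 0.20,
}

def overall_score(subscores: dict) -> float:
    """Weighted sum of 0-100 subscores (assumed aggregation rule)."""
    missing = WEIGHTS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(weight * subscores[name] for name, weight in WEIGHTS.items())

def peer_share(subscores: dict) -> float:
    """Fraction of the overall score contributed by the peer review alone."""
    return WEIGHTS["peer_review"] * subscores["peer_review"] / overall_score(subscores)

if __name__ == "__main__":
    # Hypothetical university: strong reputation, middling on everything else.
    example = {
        "peer_review": 90, "recruiter_review": 50,
        "international_faculty": 20, "international_students": 30,
        "faculty_student": 40, "citations_faculty": 60,
    }
    print(round(overall_score(example), 1), round(peer_share(example), 2))
```

With these made-up numbers the peer review alone supplies more than half of the overall score, which is what a 40% weight on a single reputational survey amounts to.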

This set of indicators is remarkable both for the items that are used as indicators and for the items that have not been used. Better scores on the indicators used are for sale in the market place. The one truly important indicator missing, therefore, is the financial assets of the universities.

It is the kind of problem that Alexander Astin has repeatedly warned against, using the hospital analogy. Very rich and highly selective hospitals that admit only healthy patients will score very high on indicators of the kind that the THES uses for universities. Therefore the THES indicators (more about them below) are highly misleading, because they do not allow any inference about which universities make the best of the resources they have.
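Astin's point can be put in a toy model, assuming (hypothetically) that an output indicator equals the level of the intake plus the value the institution adds. Ranking on output then rewards selection, while ranking on value added can reverse the order. The institutions and numbers below are invented for illustration.

```python
# Toy model (assumption): observed outcome = intake level + value added.
INSTITUTIONS = {
    "rich_selective": {"intake": 85.0, "value_added": 2.0},
    "open_access": {"intake": 50.0, "value_added": 15.0},
    "mid_selective": {"intake": 70.0, "value_added": 8.0},
}

def outcome(profile: dict) -> float:
    """What an output-only indicator sees."""
    return profile["intake"] + profile["value_added"]

def rank_by(key) -> list:
    """Institution names sorted from best to worst on the given criterion."""
    return sorted(INSTITUTIONS, key=lambda name: key(INSTITUTIONS[name]), reverse=True)

if __name__ == "__main__":
    print("by outcome:    ", rank_by(outcome))
    print("by value added:", rank_by(lambda p: p["value_added"]))
```

In this made-up case the two orderings are exact mirror images: the institution that adds least looks best on the output indicator.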

Dutch universities in the THES top 200

67 (70) Eindhoven University of Technology
69 (58) Amsterdam University
86 (53) Delft University of Technology
90 (138) Leiden University
92 (57) Erasmus University Rotterdam
95 (120) Utrecht University
97 (108) Wageningen University
115 (217) University of Twente
137 (177) Nijmegen University
172 (157) Maastricht University
183 (186) Free University of Amsterdam
Groningen and Tilburg either did not make the top 200 or were not in the race.

The THES reports the subscores of every university. If one wants to check whether, for example, the University of Amsterdam has something special, one may do so (a nice peer review score seems to have done the trick). The reported subscores have been scaled by assigning 100 points to the institution scoring highest.
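A sketch of that rescaling, assuming it is purely proportional (score = 100 × raw value / highest raw value). The text only says that the top institution gets 100, so a min-max or other transformation is also possible. Under the proportional assumption, a runner-up at 55 has 55% of the leader's raw value. Names and values are hypothetical.

```python
def scale_to_top(raw: dict) -> dict:
    """Rescale raw indicator values so the highest becomes 100 (proportional scaling assumed)."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("the highest raw value must be positive")
    return {name: 100.0 * value / top for name, value in raw.items()}

if __name__ == "__main__":
    # Hypothetical raw citations-per-staff values.
    print(scale_to_top({"leader": 120.0, "runner_up": 66.0, "other": 30.0}))
```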

Highest scoring institutions and their runner up on the indicators used

peer review score: Cambridge 100; Berkeley 92
recruiter review: Harvard 100; MIT 93
international faculty: Macquarie U. (Australia) 100; Otago U. (New Zealand) 94
international students: London School of Economics 100; School of Oriental and African Studies (UK) 74
faculty/student score: Duke University (US) 100; Yale 93
citations/faculty: California Institute of Technology 100; Harvard & Stanford 55
overall: Harvard 100; Cambridge 96.8

The table illustrates a phenomenon already known from school sports days and the Olympic Games: differences between the players at the absolute top are relatively large. A further observation is that on the lesser indicators the best institutions are not generally known to the public. That illustrates the problem of choosing adequate indicators for a worldwide ranking such as this one.

"There remain issues about the advantages enjoyed by English-language universities and those institutions with a base in science and medicine, but there will be continuing efforts to level the playing field as far as is practicable."

The Times Higher, October 6 2006 World University Rankings Editorial p. 2

The English language itself is indeed a positive asset here. The THES editorial is aware that the playing field is not exactly level, but does not mention the financial assets problem. If the idea is that it is fair to take only achievements into account, why then is the process variable of the student/staff ratio given some weight? Why not finance as well? The point is that Harvard might spend its dollars in reckless ways, while continental European universities really try to get the best value for their euro. It is now well known that Harvard's quality of instruction is way out of line with its prestige (Bok, 2006), to give but one incisive example. Ince himself, in 'How the land of the free charged right to the top', mentions the sobering fact that US universities take only 55 places in the top 200, compared to 88 for Europe's universities, an indication that the university dollar is spent less well than the university euro.

The ranking methodology is complex and tricky, and surely will not stay identical from year to year. There have been some changes in the 2006 methodology; for example, "a shift from measuring ten years of citations to five." Ponder this: in five or ten years the methodology might differ radically from that used in 2006. Are you willing to bet on today's method by letting your investment decisions depend on their likely effects on these rankings? After all, those effects might materialize only after five or ten years.

the peer review (40%)
Martin Ince, the coordinator of the rankings and contributing editor to the THES, describes the procedure for the peer review, and some other methodological issues, in the October 6 edition, partly available here also.
The peers were 3,703 academics, each was "asked to name up to 30 universities they regard as the top institutions in their area." Ince: "This is a robust and simple test, and is almost immune to fraud."

the recruiter review (10%)
The reviews are supposed to be qualitative measures; together they determine 50% of the total score (this is not an exact way of expressing what is actually counted, but I follow Ince here). The graduate recruiters, a sample of 736, "were asked which universities they like to recruit from."

No, Ince does not disclose exactly how the sample was obtained, what was asked from these recruiters, etcetera.

Neither does he comment on selective admissions. As an Englishman he must be aware of the selectivity factor, yet in the recruiter review he allows the selective universities to cash in on qualities of their alumni that they might have bought (=selected for), not taught (=educated).

the faculty/student score (20%)
The scores in this rubric are remarkable, to say the least. I do not think the student/staff ratio is less reliable than the other indicators, yet its relation to the world rank seems to be nil. First place goes to Duke (world rank 13), second to Yale (4=), third to Eindhoven University of Technology (67). Note who did not make the top twenty here: Cambridge is 27th, Oxford 31st, Harvard 37th, Stanford 119th, Berkeley 158th. This is one more illustration that universities fiercely competing for prestige (see Brewer et al.) tend to let their students pay at least part of the bill.

"We measure teaching by the classic criterion of staff-to-student ratio." Now this is asking for trouble, as Ince is well aware. Who is a student, who is a teacher? In the medieval universities these were activities, not persons. Is it much different nowadays? How much? Much more problematic are the following disclosures by Ince.

INCENTIVES TO LIE

"This [the staff-to-student ratio] is captured by asking universities how many staff and students they have.(...)

We ask universities to count people studying towards degrees or other substantial qualifications, not those taking short courses.(...)

We ask universities to submit a figure based on staff with some regular contractual relationship with the institution."

Every administration will fill out the THES/QS forms asking for student and staff figures creatively; this much is absolutely certain, if only because each will be convinced that other administrations will do so. Ince does not mention any counter-measure; hopefully the THES/QS people have a secret plan to detect fraudulent data.

One reason for trusting the university administrations seems to be that there is no universal official statistic that could be used instead. Then again, official statistics are not free of fraud either.

For Dutch readers: in the Netherlands we had the amusing episode of the commotion about spookstudenten ('ghost students'), fanned by Wim Kok on the basis of incorrect information (a single phone call as a check would have set it straight), and investigated by, among others, the undersigned html. Ghost students turned out not to exist, but there are quite a few ways to be enrolled twice for all sorts of things, and that gives university number-crunchers room to shuffle the figures. Something similar happens when data are produced for computing numerical completion rates (percentages passing the propedeuse, etc.); on this, see Voorthuis and Wilbrink (1987) in the evaluation of the wet tweefasenstructuur (two-phase structure act) html.

the citations/faculty score (20%)
The data used here are from Thomson's Essential Science Indicators database, see this page. How much data: "more than 40,000 papers and more than a million citations each for Texas and Harvard universities, the world's top two generators of scholarship on this measure" over the recent five-year period. "To compile our analysis, we divide the number of citations by staff numbers to correct for institution size and to give a measure of how densely packed each university is with the most highly cited and impactful researchers." Ince does not mention whether this number of staff differs from the one used in the student/staff ratio; presumably it is the same number.
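Ince's correction for size is a plain division, so the score inherits whatever staff count the institution reports. A minimal sketch, assuming one hypothetical five-year citation total and two hypothetical ways of counting the same staff:

```python
def citations_per_staff(citations: int, staff: float) -> float:
    """Citations divided by the (self-reported) staff count, as in the THES size correction."""
    if staff <= 0:
        raise ValueError("staff count must be positive")
    return citations / staff

if __name__ == "__main__":
    citations = 200_000   # hypothetical five-year citation total
    narrow_count = 2_500  # hypothetical: permanent academic staff only
    broad_count = 4_000   # hypothetical: every contract type included
    print(citations_per_staff(citations, narrow_count))
    print(citations_per_staff(citations, broad_count))
```

Nothing about the research differs between the two results; only the denominator does. If the same staff figure also enters the faculty/student score, a small count helps here and hurts there.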

From the THES analysis: "The reason for Caltech's dominance is clear. It has fewer than 1,000 undergraduates but 1,200 postgraduates and 1,200 academic staff, not including visitors. And they are concentrated in high-impact areas, mainly science and technology, with a growing emphasis on the life sciences." The highest-ranking Dutch university on this indicator, in 8th place, is Erasmus University Rotterdam, which "has gained its position by well-cited medical publishing."

Well, having seen the Thomson website, there seems to be a vast industry making a living from the scientific publications of this world. Which Big Brothers are paying for this unproductive industry? Quite amazing, really.

the number of 'overseas' students and staff (together 10%)
Ince does not have much to say on the why and how of this measure. Is the increasingly international nature of higher education the reason, a case of mistaken historical identity? The how is not addressed at all; probably the institutions themselves fill out the THES/QS questionnaires, and that's it.

which institutions, what disciplines?
Ince: "We gather data on universities that teach undergraduates only." ['only' refers to the gathering, not the universities] Wow, that is a true confession. Ince's point is that "this eliminates many high-quality specialist institutions such as Rockefeller University and the University of California, San Francisco, both of which are postgraduate medical institutions."

At this point I no longer think I understand what exactly has been ranked in the THES pages. Probably Ince's statement above is ambiguous, and the rankings are not restricted to everything undergraduate. The existence of postgraduate institutions within the university poses a serious problem in this ranking business, and taxes the honesty of university administrations in reporting their staff numbers. Also, I do not know whether Thomson registers exactly how authors are affiliated; I suspect THES/QS might use the Thomson data to decide such cases.

These rankings might be somewhat honest as far as broad universities are concerned. Better would be to compare the same departments on a global scale. That is exactly what the THES does by reporting separately the rankings in the arts and humanities, technical science, medicine, and the social sciences.

top 100 biomedicine (THES October 14, 2005)
The only Dutch universities that have made it into this league are Erasmus, 55 (51), and Amsterdam University, 71 (63). The bracketed figures are the 2004 rankings. Irregularities in the ranking arise because some institutions are left out since they do not teach undergraduates, others because they did not publish at least 5000 papers, and Beijing University, placed 8 (11), may have been handicapped by its researchers publishing little in English.

top 100 social sciences (THES October 21, 2005)
Harvard comes first, London School of Economics second. Then 21 (20) Erasmus, 44 (-) Amsterdam University, and 72 (77=) Maastricht. The ranking is based on the peer score only.

top 100 social sciences (THES October 27, 2006)
Oxford now first. 30= Amsterdam University, 37 Erasmus, 76 Utrecht, 82 Maastricht, 84= Leiden. "The London Business School is not listed here because it does not teach undergraduates, but it is a free-standing college of the University of London."

top 100 arts and humanities (THES October 21, 2005)
"Massachusetts Institute of Technology appears in our arts table at number 12, unchanged from last year, presumably on the basis of its small but highly visible work in art and music. It is also seventh in the social sciences. Its best-known social scientist, political activist Noam Chomsky, was at one stage in his career the world's most-cited social scientist."

Dutch universities: 56= (37) Amsterdam University, 74 (-) Erasmus, and 79 (-) Delft.

top 100 arts and humanities (THES October 27, 2006)
21 MIT; 24 Amsterdam University; 53 Leiden; 63 Utrecht.

top 100 science and engineering (THES October 7, 2005)
I must look it up.

top 200 institutions (THES October 28, 2005)
I missed this one. Can anybody send me the article as published on the THES site?

The THES 2006 ranking


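[Figure: THES 2004 peer-review scores (vertical) against citation-analysis-based scores (horizontal), after Van Raan (2005)]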

The figure shows that the position in the peer ranking does not predict the position in the citation ranking, and vice versa (Van Raan, 2005; vertically: expert scores; horizontally: citation-analysis-based scores; all data from the 2004 THES). Van Raan: "This result is sufficient to seriously doubt the value of the THES ranking study." If in doubt about this doubt, take the THES data and compute some correlations yourself.
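For readers who want to take up that suggestion: a stdlib-only sketch of Spearman's rank correlation (the Pearson correlation of the ranks, with tied values sharing their average rank). The THES data are not included; one would feed in, for instance, the peer review and citations/faculty subscores for the same list of universities. Function names and the example numbers are illustrative.

```python
from statistics import mean
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Hypothetical peer-review and citation subscores for the same five universities.
    peer = [100, 92, 80, 75, 60]
    citations = [40, 100, 55, 30, 70]
    print(round(spearman(peer, citations), 2))
```

A value near +1 would mean the two indicators rank universities alike; Van Raan's figure suggests the real THES values are nowhere near that.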

The critical review of the THES ranking I have just given is rather superficial. After all, I am not an expert in ranking universities etcetera. A more incisive critique is to be found in Anthony F. J. van Raan (2005). Challenges in Ranking of Universities. pdf

One remarkable fact is that rankings tend to be constructed in very, very different ways. For example, the THES uses peer reviewers assessing the contemporary quality of research (in a way the THES keeps completely opaque), while the Shanghai ARWU uses achievements of the past, such as Nobel Prizes, and bibliometric data from 1981 until 1999. The problem resulting from these erratic ranking methods is illustrated in the Van Raan figure reproduced here: the THES and the ARWU rankings have nothing in common, meaning that the position in the THES ranking does not predict the ARWU position, not even a little bit. Except, of course, at the absolute top, taken by Harvard, Cambridge, etc.

Van Raan is an expert in bibliometrics, and bibliometric data are a major ingredient of any ranking that takes itself seriously. Therefore one should take notice of what Van Raan has to say about the (lack of) reliability of bibliometric data. For example, a rather large proportion of references in scientific articles turn out to be faulty, and therefore will not count as valid citations. Another major problem is the correct and unambiguous naming of the institution the author is affiliated with. A sensible policy for your institution, therefore, is for its personnel to be absolutely precise on these points when getting their articles published.

Melville L. McMillan and Wing H. Chan (2006). University Efficiency: A Comparison and Consolidation of Results from Stochastic and Non-stochastic Methods. Education Economics, 14, 1-30.

From the abstract: High-efficiency and low-efficiency groups are evidenced but the rank for most universities is not significantly different from that of many others. The results emphasize the need for caution when employing efficiency scores for management and policy purposes, and they recommend looking for confirmation across viable alternatives.
The article is not freely downloadable; I have not seen it.

Philip Andrew Stevens (2005). A Stochastic Frontier Analysis of English and Welsh Universities. Education Economics, 13, 355-374.

Abstract: With imperfect markets for the services of the higher education sector, it is important to assess the effectiveness of institutions. Previous studies have analysed the costs of universities but few their efficiency. In this paper, we examine the costs and efficiency of English and Welsh universities as suppliers of teaching and research using the method of stochastic frontier analysis on a panel of 80 institutions over four years. We also investigate the impact of staff and student characteristics on inefficiency.
Regrettably, the ERIC-style abstract does not indicate the results of the study.
The article is not freely downloadable; I have not seen it.

Schools

behind the scenes

"The headmaster asked [Herman Koch], months before the exams, whether he would stay at home, because he was doing so badly that he was bound to drag the school average down."

Robin Gerrits (19 May 2009). Eindexamens 2009. Herman Koch. Over het examen Nederlands [Final exams 2009. Herman Koch on the Dutch language exam]. De Volkskrant, p. 2.

United Kingdom's league tables

H. G. Morrison & P. C. Cowan (1996). The state schools book: A critique of a league table. British Educational Research Journal, 22, 241-249.

From the abstract: Structural equation modelling is used to analyse one of Britain’s most widely disseminated league tables, The Sunday Times State Schools Book. It is demonstrated that the table fails to meet any of the technical requirements which would assure its internal construct validity and its technical shortcomings are shown to impact adversely on a large number of schools. Finally, it is argued that modern validity inquiry effectively rules out the creation of ‘valid’ league tables purporting to rank schools according to quality. p. 243: The focus of this article is not the performance tables but the use newspapers make of such data in order to compile simplistic league tables with questionable measurement properties.

Stephen Gorard & Emma Smith (2010). Equity in Education. An International Comparison of Pupil Perspectives. Palgrave Macmillan. sample chapter 1

Chapter 2: Querying the traditional role of schools in attainment. This chapter contains strong criticism of the phenomenon of league tables (rankings of schools such as those published by Trouw and Elsevier). Gorard is extremely well informed about the English scene in this respect. He also points to the strictly random character of the outcomes of school effectiveness studies, such as the one by Thomas, Peng & Gray below.

Sally Thomas, Wen Jung Peng & John Gray (2007). Modelling patterns of improvement over time: value added trends in English secondary school performance across ten cohorts. Oxford Review of Education, 33, 261-295. abstract [weak schools, very weak schools: purely on the basis of random fluctuations, or is there more to it after all?]

However, underlying these linear improvement trajectories it appears that only one in 16 schools managed to improve continuously for more than four years at some point over the decade in terms of value added.
Anyone who has ever understood statistics will see that this is a perfect chance result: the average of all schools is the reference point, so the chance of scoring 'better' in year X is 0.5, of doing so in both year X and year X+1 0.25, and therefore in four consecutive years 1 in 16. See also Gorard & Smith (2010, ch. 2).
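The reasoning can be checked numerically. A minimal sketch, assuming, as the argument above does, that there are no real differences between schools: each school's yearly score is pure noise, and 'better' means above the median of all schools that year. The exact probability for four years is 0.5^4 = 1/16, and the simulated share should come out close to 0.0625. The parameters are arbitrary.

```python
import random
from statistics import median

def exact_probability(years: int) -> float:
    """Chance of being above the median every year if each year is a fair coin flip."""
    return 0.5 ** years

def simulate(n_schools: int = 10_000, years: int = 4, seed: int = 1) -> float:
    """Share of schools above the all-school median in every year, with pure-noise scores."""
    rng = random.Random(seed)
    # scores[y][s]: score of school s in year y; no real differences between schools.
    scores = [[rng.gauss(0.0, 1.0) for _ in range(n_schools)] for _ in range(years)]
    medians = [median(year) for year in scores]
    always_above = sum(
        all(scores[y][s] > medians[y] for y in range(years))
        for s in range(n_schools)
    )
    return always_above / n_schools

if __name__ == "__main__":
    print(exact_probability(4))  # 0.0625
    print(simulate())
```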

Gary Eason (January 11, 2007). Tables turned: changes this year. BBC News html

Panic and chaos about English and maths. The tables are turned by now adding a new measure of the contextual value added (CVA) by the school. "The top school in the country on this basis is a multi-faith United Learning Trust academy in Liverpool, St Francis of Assisi, with a score of 1078.7 (the measure is based around 1000). On the other key benchmark, the English and maths GCSEs, it scored 17%." "At the other end of the table is Eastbourne Comprehensive in Darlington, Co. Durham, with a CVA measure of 919.2 (and an English-and-maths score of 15%)."

Inspectie van het Onderwijs (Dutch Inspectorate of Education)

The kwaliteitskaart (quality card) of the Inspectie van het Onderwijs.

The Inspectorate's website is particularly weak in providing crucial information, to put the verdict in their own terminology. It takes quite some searching before you find out what the four possible scores on the quality card mean. And even then. I will try to explain it, within the limits of what I understand of it. But then, what do you expect: I have only almost half a century of educational research behind me.
Take the Gymnasium Apeldoorn, which on 21 January 2006 is listed as having 666 pupils (that is the gymnasium branch only, in my day called the Stedelijk Gymnasium). I assume that the quality card rests on the inspections listed, four of them for this school. The oldest of these is dated 14 June 2002. It is a report card, and truly, the analogy with pupils' report cards holds completely: the school is judged by rule of thumb, more or less comparatively, without reference to any absolute or at least objective standard whatsoever. Not that I feel sorry for the school; it is, after all, a taste of their own medicine, isn't it? But keep in mind that subjective judgements of this kind amount to a lottery of arbitrariness.
For each of a series of quality characteristics there is a series of indicators. An 'indicator' is found to an insufficient or sufficient degree, or somewhere in between: "the indicator was found, but not convincingly and structurally." Damned subjective, all of it; I would like to see a critical evaluation of the Inspectorate's way of working. Anyway, once all the indicator judgements have been given, there follows a second subjective judgement for the quality characteristic as a whole, in four possible scores, probably corresponding to the four blocks on the quality card: predominantly weak - more weak than strong - more strong than weak - predominantly strong. Hurray. Indeed, if you have weak data, you should not express them in a pseudo-exact figure with one or more decimals. But this is a bit absurd. The tipping point on these little score scales is the 'norm', something like 'as weak as strong', whatever that may mean: what the report says about it is incomprehensible; "per lesson: four of the six indicators" probably means that at least four of the six indicators must be positive. This norm is most likely bureaucratically objective: given the scores on the indicators, a clerk can establish whether or not the 'norm' has been met. Ah, so it is about judgements per lesson, lessons an inspector actually sits in on? (p. 5: "We attended 30 partial lessons ...") I do not know what to make of this, but let it pass. From those per-lesson scores a score for the school still has to be brewed, and for that too a bureaucratic rule has been formulated for the norm (that tipping point between more weak than strong and more strong than weak).
The last bureaucratic rule usually still contains a subjective component as well, such as 'plus a sufficient variety of teaching methods (indicator 5.7)'. I do not know how an inspector establishes that from the scores of the individual lessons; possibly it means that at least 75% of the lessons must score positively on that indicator.
A particular problem arises with the outcomes (opbrengsten) listed in table 1 of the extended report. There the norm refers to 'average' or 'higher'. What is that 'average'?? In any case it is not explicitly stated in the report, at least I cannot find it. Nor do I have the faintest idea whether that average refers to the school itself, to programmes within that same school, to comparable programmes in all schools nationally, or what.
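The two rules guessed at above can be written down as a sketch, which also shows how mechanical they become once the per-indicator judgements exist. Both thresholds are the readings suggested in the text, not documented Inspectorate rules: at least four of six indicators positive per lesson, and an indicator such as 5.7 counting as met when it is positive in at least 75% of the observed lessons. Which position in the list corresponds to 5.7 is a hypothetical choice, and how lessons are aggregated into a school score is not stated, so it is not modelled here.

```python
from typing import Sequence

def lesson_meets_norm(indicators: Sequence[bool], minimum: int = 4) -> bool:
    """Assumed per-lesson norm: at least `minimum` of the six indicators judged positive."""
    if len(indicators) != 6:
        raise ValueError("expected six indicator judgements per lesson")
    return sum(indicators) >= minimum

def indicator_met_across_lessons(lessons: Sequence[Sequence[bool]], index: int,
                                 share: float = 0.75) -> bool:
    """Assumed rule for an indicator like 5.7: positive in at least `share` of the lessons."""
    if not lessons:
        raise ValueError("no lessons observed")
    positive = sum(1 for lesson in lessons if lesson[index])
    return positive / len(lessons) >= share

if __name__ == "__main__":
    # Four hypothetical observed lessons, six True/False indicator judgements each.
    lessons = [
        [True, True, True, True, False, False],
        [True, False, True, True, True, False],
        [True, True, False, False, False, True],
        [False, True, True, True, True, True],
    ]
    print([lesson_meets_norm(lesson) for lesson in lessons])
    print(indicator_met_across_lessons(lessons, index=0))
```

The arithmetic is trivial; everything contentious sits upstream, in the subjective True/False judgements fed into it.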

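Inspectie van het Onderwijs (2003). Toezichtkader Voortgezet Onderwijs. Inhoud en werkwijze van het inspectietoezicht conform de WOT [Supervision framework for secondary education: content and procedure of inspection under the WOT, the Wet op het onderwijstoezicht]. www.kwaliteitskaart.nl/Documents/pdf/Brochure_toezichtkader_VO [no longer available as of 2-2008]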

Kervezee: "With this brochure we provide clarity about what we do and how we do it. We want to be transparent and accountable for what we do." That is wonderful; then surely all the answers to the questions raised by the quality card and the reports behind it will be found here.
No, they are not. It contains hardly any more information than the school report itself.
This means that schools cannot really know how to go about obtaining better judgements from the Inspectorate. Of course the Inspectorate will indicate this itself in certain cases, but it gives no guarantees. In other words: a different inspection, a different inspector, and the game starts all over again.
In particular, I miss in this report what the Inspectorate does to ensure that its own inspectors apply the same framework in the same way. Take, for example, the Anglo-Saxon practice of extensively training examiners to mark and grade examination work; surely we may expect something similar from the Inspectorate when it comes to something as far-reaching as judging 'the' quality of a school programme?
No, I am not happy with this. Below it becomes clear what a mess all this can lead to: the media are rather greedy for the data the Inspectorate publishes, and immediately translate them into league tables. And those have consequences for the market position of at least those schools that have direct competitors in their geographical area.

The CPB joins in the tune as well: more confusion still.

The following publications of the Centraal Planbureau (CPB, the Netherlands Bureau for Economic Policy Analysis) are the subject of this Bon blog 6874 (11 May 2010). The crux is that the CPB, on the basis of statistical analysis, believes that schools have successfully taken action on their relatively unflattering position in the Trouw league table. I focused my criticism on the neglect of regression to the mean. In a shorter publication in ESB the authors do remark on this, but that does not remove the objection. Incidentally, this is not a discussion about statistical techniques. The essential point is that it has never really been consistently demonstrated that there are systematic differences in effectiveness between schools once differences in, for example, pupil intake are taken into account. Gorard & Smith (2010) go into this once more, in depth and in summary form, in their chapter 2. If differences between schools are purely due to chance, then over a short series of four years you may expect 1 in 16 schools to come out positive every time. That is exactly what Thomas, Peng & Gray (2007) find (see above, also for the book by Gorard & Smith).
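A small simulation of the regression-to-the-mean objection, under arbitrary assumptions: each school has a stable level plus independent yearly noise, and no school does anything in response to the league table. The schools that score worst in year 1 nonetheless 'improve' on average in year 2. This is not the CPB's model; it only shows why a before/after comparison of low-ranked schools needs a correction for this effect. All parameters are hypothetical.

```python
import random
from statistics import mean
from typing import Tuple

def regression_to_mean_demo(n_schools: int = 2000, true_sd: float = 0.5,
                            noise_sd: float = 1.0, bottom_share: float = 0.1,
                            seed: int = 7) -> Tuple[float, float]:
    """Mean year-1 and year-2 score of the schools that were worst in year 1.

    Model (assumption): score = stable school level + independent yearly noise.
    Nothing changes between the years; any 'improvement' is regression to the mean.
    """
    rng = random.Random(seed)
    level = [rng.gauss(0.0, true_sd) for _ in range(n_schools)]
    year1 = [q + rng.gauss(0.0, noise_sd) for q in level]
    year2 = [q + rng.gauss(0.0, noise_sd) for q in level]
    cutoff = sorted(year1)[int(bottom_share * n_schools)]
    bottom = [i for i in range(n_schools) if year1[i] < cutoff]
    return mean(year1[i] for i in bottom), mean(year2[i] for i in bottom)

if __name__ == "__main__":
    before, after = regression_to_mean_demo()
    print(f"worst 10% in year 1: {before:.2f} -> year 2: {after:.2f}")
```

With true_sd=0.0 (no real differences at all) the year-1 losers fall back to roughly the overall average in year 2, which is the pure-chance case discussed above.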

Pierre Koning and Karen van der Wiel (2010). Ranking the schools. How quality information affects school choice in the Netherlands. CPB Discussion Paper 150. PDF

Pierre Koning and Karen van der Wiel (2010). School responsiveness to quality rankings. An empirical analysis of secondary education in the Netherlands. CPB Discussion Paper 149. PDF

Pierre Koning and Karen van der Wiel (2010). Kwaliteitsinformatie middelbare scholen maakt verschil [Quality information on secondary schools makes a difference]. Economisch Statistische Berichten, 95, #4585, 14 May, 294-297.

Elsevier/Inspectorate lists 2009

In January 2009 Elsevier published lists with data on almost all secondary schools in the Netherlands, based on data from the Inspectorate of Education, with its own processing and verdict. This is called 'the best schools', which immediately shows the problem: the label does not cover the contents. Elsevier explains how it forms its own verdict from the Inspectorate data, a verdict it calls stricter. Why a stricter verdict on the quality of schools is needed? That is for the deputy gods in Elsevier's offices to decide. In any case, for each school a number of interesting key figures are listed. Is that good or not? Both. Figures by themselves say little; they need interpretation. The Inspectorate does its best at that, hearing both sides together with the schools concerned. Elsevier does not.

Let me give an example. Elsevier makes a lot of the average grades of school exams versus central written final exams (see also my analysis of Jaap Dronkers's views on this here). There is no nuance to be found. There is the remark that a higher percentage of APC pupils (pupils living in a postcode area classified as deprived) seems to go together with relatively higher school-exam grades compared with the central written exam, from which they suggest that schools go easy on these pupils, so that after a final exam passed in this way they will perform less well in society than others. This is an example of clumsy handling of school key figures. To me those key figures by themselves say little or nothing. It is the same problem I have with a student's question: 70% of the grades on my last exam were fails, surely that is not allowed? Why not? I cannot judge that from a distance. Perhaps the students did not prepare at all, and the other 30% do not deserve a pass either. Are you beginning to see what the problem is?

Another example. A school gets the label 'excellent' if it has 'better' key figures than 90% of schools. What happens: gymnasia are simply put in the pool of vwo schools, and are therefore showered with excellence (which, by the way, does indicate that those key figures have something to do with quality, in this case probably not of the gymnasia as schools, but of how sharply categorial gymnasia select at admission compared with other vwo schools). Any first-year pupil can see that this is not a fair comparison of vwo schools with categorial gymnasia. Enough about this misery that goes by the name of 'the best schools'. See for yourself:

Arthur van Leeuwen and Ruud Deijkers (8 January 2009). Beste scholen 2009: opnieuw te hoge cijfers bij examens [Best schools 2009: again grades too high in exams]. Elsevier. html. This web page also links to a number of pdf documents from Elsevier, including an overview of the schools with the largest and the smallest differences between final exam grades (school exam and central written exam) pdf.

Trouw lists

December 2008: the school results for 2008 can be searched on the Trouw website. Aleid Truijens (de Volkskrant, 23 December 2008) explains the substance: the results have to be read carefully, for instance in relation to the recommendations with which pupils entered the school. A school with mediocre results, but where a large share of pupils with a havo recommendation sat the final exam at vwo level, is obviously doing fantastically!

In principle the lists for every year since 1996 (published in 1997) are available on the Trouw site. In practice, apparently, not all of this works properly. [January 2007]

Ages ago Trouw scored a huge hit by forcing, through the courts, the release of key data on schools. Those were the data for 1996, that is, for the school year 1995-1996 that ended in that year. Jaap Dronkers immediately made a successful attempt to steer the journalistic enthusiasm into less risky channels, by also computing for each school an indicator of what the school itself had added. Value added is truly a world apart from final exam results and other results that can be compiled from simple school records. In England they have gradually come to realise this too (see above); in the Netherlands this insight was brought in right away in 1997, and I suspect that in later years it largely faded from attention again. So it is far from easy to compile league tables that are fair in all important respects: to parents, but also to teachers, and to society. Perhaps it is in principle even an undertaking one should not want to start, but that is hindsight. The fact is that education has always reacted with extreme reserve to attempts to make school performance public, even if that 'public' were limited to participation councils and parents. My personal experience, in which a modest but well-considered proposal html was blown away by the organised teachers (as distinct from those same teachers in personal conversation), a few years before the Trouw action, will have been shared by many parents in many participation councils (MRs).

Youth 2006, Trouw, 25 February 2006.

After the school results (see below), it is now the turn of the youth policy of 467 municipalities. Trouw publishes a ranking of roughly all municipalities in the Netherlands on how their young people are faring. The Hilda Verwey-Jonker Instituut supplied the data in this form in Kinderen in tel; I cannot read the newspaper any other way. The report itself will be presented next Tuesday (28 February); I wonder whether that will add any nuance. Drawing up a ranking is an outrageous move for an institute that calls itself scientific. The well-meaning commissioners are Stichting Kinderpostzegels, Jantje Beton, Unicef, and Defence for Children. However well meant, publishing rankings that have no connection whatsoever with the actual policies municipalities pursue mainly does harm. Those involved whistle in the dark, claiming it is good that this policy now gets more attention. The end justifies the means once again. Let me explain.
The sorrow of the Netherlands is put on display. It is an age-old pitfall into which more social scientists have fallen in the past: the Hilda Verwey-Jonker Instituut used a series of indicators that one-sidedly cast the problematic sides of youth in figures: child mortality, truancy, juvenile crime. And in addition the percentage of families on benefits, where it is unclear whether this is included as an indicator on a par with child mortality etc., or is meant as a kind of correction factor. That anything good can also come of youth is, judging from the report in Trouw, unknown at the Hilda Verwey-Jonker Instituut.
The idea of mapping the state of youth at the municipal level is of course fine. It fits a trend of recent years towards a stronger youth policy, with a minister for youth of its own in the next cabinet. But doing it this way does damage the credibility of the social sciences in this country. For the blunder in the Hilda Verwey-Jonker Instituut study is that the ranking takes no account of the enormously different situations in municipalities such as Amsterdam and Rotterdam, at the head of the list of misery, on the one hand, and Naarden, Rozendaal and Oegstgeest, the highest-scoring municipalities, on the other. It is striking that the Trouw editors lend themselves to this blunder, for the merit of Trouw's publication of school results in 1997 was precisely that it took into account the large differences in the nature of schools' pupil intake. That was the merit of Jaap Dronkers.
This ranking is a disgrace. That should already be clear from the headline on the front page of Trouw: Kind is slechtst af in Rotterdam [Children are worst off in Rotterdam]. That is defamation. Children hear and read this too, dear commissioners! Scientists who compile rankings should not be surprised when newspaper editors add headlines like this.

School results 2005, Trouw supplement, 15 December 2005. Note: the results concern the school year 2003-2004. For the front-page article Onderwijsprestaties / Onderwijs presteert matig. Tien procent onvoldoende; vooral havo-afdelingen zwak [Educational performance / Education performs moderately. Ten percent insufficient; havo departments especially weak], by Ingrid Weel, see the Trouw website; for the explanation of the figures see this page of Trouw. The supplement can be downloaded as separate pdf documents from the Trouw site, for which Trouw deserves credit.

What surprises me about this front-page story is the sensational language: ten percent score INSUFFICIENT. I don't know what that means; I will comb the whole of today's paper for it.
It is ALWAYS the case that 10% of schools, athletes, or shrimps are the worst, also when viewed over five years.
Stronger still: that is also the case if we secretly knew that all schools are of equal quality: through all kinds of chance factors they still score differently, and 10% of them draw the short straw. The article says as much in so many words: of the 'insufficient' vwo schools in 2000, only 35% were still insufficient in 2001. That is a statistical regularity, not an educational one, and the effect is stronger the more chance enters into the scores.
Splendid material, then, for a good question in the Wetenschapsquiz 2006 [Science Quiz]. A little remedial teaching for journalists, a breather for shamed schools and for everyone in them doing her very best.
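The regression effect described above can be sketched in a few lines. The model is an assumption, not Trouw's method: each school's yearly score is a stable component (standard deviation `true_sd`) plus independent yearly noise (`noise_sd`). With `true_sd=0` all schools are equal and only about 10% of last year's bottom decile is in the bottom decile again; the larger the stable component, the higher that share. A figure like the 35% reported in the article therefore points to some stable differences, without telling us whether those are differences in quality or merely in intake.

```python
import random

def repeat_bottom_share(n_schools=20_000, true_sd=0.0, noise_sd=1.0,
                        share=0.10, seed=7):
    """Of the schools in the bottom `share` in year 1, the fraction that is
    in the bottom `share` again in year 2.
    Score = stable component (sd true_sd) + fresh yearly noise (sd noise_sd)."""
    rng = random.Random(seed)
    stable = [rng.gauss(0.0, true_sd) for _ in range(n_schools)]
    k = round(share * n_schools)

    def bottom_schools():
        scores = [s + rng.gauss(0.0, noise_sd) for s in stable]
        ranked = sorted(range(n_schools), key=scores.__getitem__)
        return set(ranked[:k])

    year1 = bottom_schools()
    year2 = bottom_schools()
    return len(year1 & year2) / len(year1)

print(repeat_bottom_share())                           # all schools equal: about 0.10
print(repeat_bottom_share(true_sd=1.0, noise_sd=1.0))  # half stable, half chance
```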

The supplement opens with an explanation of the study, also by Ingrid Weel, and despite, or rather precisely because of, the nice little graphs, I am none the wiser. The odd thing about those graphs is that 'average' is not average; guess how that is possible. In 2004, 70% of vwo schools score ABOVE average. For havo it is 55%, and for mavo/vmbo-gt 60%. On top of that, the 'average' groups are 20%, 35% and 30% in size respectively, and roughly half of those also lie above the arithmetic mean, don't they? Let's face it, that is what we agreed on in mathematics. So these final grades are abracadabra to me, a word that, like mathematics, supposedly comes from the Arab Middle Ages. Quickly on to the Toelichting Schoolprestaties 2005 [Explanation of School Results 2005]. From it I gather that the intention is indeed that the verdict 'average' should match the national average performance for that school type, etcetera. The formulas for converting results into these category scores simply do not fit this intention. For example: the national average final exam grade for havo is 6.3, while Trouw says it rates all schools with an average below 6.3 as 'poor'. Strange things are happening here: with the typesetter, the writer, the data?

With Trouw's higher mathematics there is a sharp upward trend in school results, hence the second article, also by Ingrid Weel: Scholen doen het steeds beter. Stijgende prestaties [Schools keep doing better. Rising performance]. Splendid, isn't it? A quick phone call to Meijnen and Dronkers, and the article immediately fills up with speculation that is of no use to anyone. So nobody can say anything sensible about these Trouw data. I sent the following text to Trouw, and it was posted on the discussion list.

Can Trouw explain how it is possible that in 2004, 70% of vwo schools score above average (and another 20% average)? (page 4, figure 1). This is higher statistics, making mincemeat of the convention that 'average' should lie somewhere around 50%: Trouw arrives at 70 + about half of 20 = 80%. The problem occurs across the board in the Trouw ratings, e.g. p. 5, where all lines in the figure lie ABOVE the average of 3.0. Trouw constantly suggests that the ratings are absolute, otherwise schools could not on average be doing 'better and better', but from the method of calculation I cannot but conclude that everything is relative, with 'the average' of the year as the benchmark. What is the truth? Or are the group boundaries of 2003/4 simply applied to earlier years? Or, conversely, those of 1998/99 to the following years? And why would you be allowed to do that? Help us out.

Incidentally, the editors of Netwerk were not alert to all this hocus-pocus either; they eagerly took up the suggestion that relative results have absolute meaning.

Ingrid Weel, of Trouw's education desk, tells me that the averages computed for the school year 1997/8 are indeed still used as the reference. So where Trouw in 2005 speaks of 'the average', this means 'the 1997/8 average'. That explains how in later years all school types can on average score above average: the 2003/4 average lies above the 1997/8 average.
Ingrid: "In the 97/98 school year, on average 82% of mavo pupils, 53% of havo pupils and 58% of vwo pupils went from year 3 to their diploma without delay. In 2003/2004 (the most recent year) this has risen to 86% in vmbo-gt, 63% in havo, 67% in vwo. In other words, the average of '98 is no longer the average of 2004, but (as I also wrote in my previous mail) we still use the same norms."
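Why a fixed reference year makes 'above average' drift upwards can be shown with a toy calculation. Only the 58% (vwo, 1997/8) and the 67% average for 2003/4 come from the quotation above; the ten school rates are invented so that they average 67%, and treating 'above average' as simply 'above the 1997/8 mean' is a simplifying assumption about Trouw's norms.

```python
from statistics import mean

def share_above(rates, benchmark):
    """Fraction of schools whose rate is strictly above the benchmark."""
    return sum(rate > benchmark for rate in rates) / len(rates)

# Hypothetical vwo rates (% of pupils from year 3 to diploma without delay)
# for ten schools in 2003/4, chosen so that their mean is 67.
rates_2004 = [52, 57, 60, 63, 66, 68, 70, 72, 75, 87]
NORM_1997_8 = 58  # 1997/8 vwo average quoted above, used as a fixed norm

print(mean(rates_2004))                           # 67
print(share_above(rates_2004, NORM_1997_8))       # 0.8: 'above average' by the old norm
print(share_above(rates_2004, mean(rates_2004)))  # 0.5: above this year's own mean
```

Against the old norm 80% of these hypothetical schools are 'above average'; against their own 2003/4 mean only half are.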

So Trouw owes its readers an explanation of those declining percentages of pupils repeating a year, which give an artificial impression that the 'efficiency' (rendement) of schools keeps improving over the years. Schools 'buy' that 'improvement' by letting more pupils move down to other, lower school types. I have a dark suspicion that this MOVING DOWN is less adequately reflected in the calculation of the Trouw efficiency figures.

Darker still: schools may be pursuing policies on this that take the Trouw publications into account, in order to come out better (less badly). At the top of this web page is a damn good and penetrating overview of effects of this kind as observed by top researchers in the US:
Sharon L. Nichols and David C. Berliner (2005). The Inevitable Corruption of Indicators and Educators Through High-Stakes Testing. Education Policy Studies Laboratory, Arizona State University. pdf

Trouw, Sjoerd Karsten (in Trouw, 15-12-2005), and probably also the Inspectorate of Education (to be read in Trouw on Saturday 17-12-2005) now think, guided by Trouw's conclusions about schools' positions over a five-year period, that 10% of schools score STRUCTURALLY insufficient. Around this hangs a cloud of negative associations: that the education at these schools is of substandard quality, which has in no way been demonstrated, and that they fail to improve it, which has not been demonstrated either.
Moreover, there is no benchmark against which to compare the five-year data. An artificial benchmark can perhaps be constructed: what would happen under the assumption that observed differences between schools rest on influences that may be regarded as 'random'? That can probably be simulated with a computer model. It gives an indication of the number of schools that stay dangling at the bottom over a five-year period PURELY through chance circumstances.
That number may well be smaller than Trouw's 10%, but I do not see that Trouw has made a serious attempt to take a sample of a few schools from that 10% and compare their 'efficiency scores' with the more balanced judgment of the Inspectorate of Education.
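For the simplest case the chance benchmark can even be computed exactly. Assuming (hypothetically) that a school's position is independent from year to year and that each year exactly 10% of schools are labelled 'insufficient', the number of 'insufficient' years out of five for a school of average quality follows a binomial distribution. Trouw's exact criterion for 'structurally' insufficient is not given here, so the threshold `k` is left as a free parameter.

```python
from math import comb

def p_insufficient_at_least(k, years=5, p=0.10):
    """Probability that a school of average quality is 'insufficient' in at
    least k of `years` years, if each year is an independent draw with
    probability p."""
    return sum(comb(years, j) * p**j * (1 - p) ** (years - j)
               for j in range(k, years + 1))

for k in (3, 4, 5):
    print(k, p_insufficient_at_least(k))
```

Under these assumptions the probabilities are about 0.0086 for three or more 'insufficient' years, 0.00046 for four or more, and 0.00001 for all five: under pure chance well under 1% of schools, far from 10%. What lies between that and 10% still needs explaining, and that explanation need not be quality.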

If schools pursue policies to score better on Trouw's 'efficiency', then schools that simply work on the quality of their teaching, and do not let themselves be thrown off course by Trouw efficiency figures, are penalised for it. Small differences in policy might well quickly lead to some schools apparently 'lagging behind'.

10% structurally insufficient?
Trouw's claim that 10% of schools score structurally 'insufficient' is worrying. As far as I know, there is no scientific research from recent decades suggesting that this claim could be correct. On the contrary, all attempts to show that structural (consistent over the years) differences in performance exist between schools have 'failed', i.e. it has not been possible to demonstrate such differences from the data. But then, that is old news (on school effects see effectiviteit.htm); the researchers of the VOCL cohort might well speak out in Trouw about their results compared with Trouw's claims.

What I have in mind is a different approach to taking aim at Trouw's claim. It should be possible to set up a simple model and determine by simulation how large real quality differences between schools must be to result in 10% of schools scoring 'insufficient' over five years. A few things that Trouw overlooks are important here:

1) We may assume that there are small systematic differences between schools on the Trouw indicators, differences that can be traced back to differences in intake etc. and that therefore do NOT concern differences in quality.

2) We may assume that there is no perfect match between quality differences and differences in performance on the Trouw indicators: the indicators do not capture quality perfectly. In particular, this means there can be differences on the performance indicators even when there are NO quality differences, and vice versa. These deviations of the Trouw indicators from quality may very well be 'structural', i.e. consistent over several years.

3) More specifically, schools will differ in the extent to which they tune their policies to the indicators the outside world uses to judge their quality. I do not want to go so far as to say that schools that bow to what the outside world asks as 'proof' of their quality therefore have a failing quality policy, but it is a risk that lies in wait. What we may assume is that schools of equal quality can in this way differ consistently on the performance indicators. So this point goes a good deal further than point 2).

4) Ideally I would want a model of the process through which scores on the performance indicators come about. For test scores such a process can be postulated, so in principle it should also be possible for classes, year groups, and schools. Perhaps that becomes too complicated, and then the process could be approximated by postulating a certain spread over the indicators, based on the empirical data gathered over the years. I carried out such an approach before (1980) to work out the effects of a bill by Pais on admission to numerus fixus programmes. With spectacular results: boys who had not yet done their military service and fell short on one other point turned out to have hardly any chance of admission. html

5) Assume the indicators are computed the typical Trouw way: it is Trouw's claim that is at stake, so we are bound to that method (though not uncritically).

6) Now assume there are no consistent differences between schools, but that because of the limitations of the processes and points 1, 2 and 3 above there is a considerable spread between schools every year. Then it should be possible to use a random process for each year, assuming that every year the cards are really reshuffled completely. And then see how many schools, under these conditions, score 'insufficient' three times over five years, or whatever.

7) Do the same, assuming that there are to some extent real, consistent quality differences between schools. Again count those 'insufficients'.

8) Also look at the possibility of an intermediate position with some consistency from year to year, so that the cards are not completely reshuffled every year, but with no consistency left over periods longer than a year.

9) Depending on the results of these exercises, we continue the research. If there were indeed five-year consistency in the Trouw data, is it an effect with any realistic social significance? NB: for the parent who has to choose a school, differences that exist even in Trouw's terms are most likely utterly irrelevant, but this claim too needs support. In general, however, school effects, if they exist, are aggregate phenomena of which you find damn all back at the class level, let alone at the pupil level.
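Points 6 to 8 can be sketched as a single simulation. Everything here is an assumption for illustration: 600 schools (a hypothetical number), normally distributed scores, the bottom 10% labelled 'insufficient' each year, `quality_sd` for stable quality differences (point 7; zero gives point 6), and `carryover` for year-to-year persistence of the non-quality part of the score (point 8, modelled here as a first-order autoregressive process, so that consistency fades over longer spans; 0 reshuffles the cards every year). It does not reproduce Trouw's actual indicator computation (point 5).

```python
import random

def count_repeat_failures(n_schools=600, years=5, min_fails=3,
                          quality_sd=0.0, carryover=0.0, share=0.10, seed=3):
    """Number of schools labelled 'insufficient' (bottom `share` of the yearly
    ranking) in at least `min_fails` of `years` years.
    quality_sd: sd of stable quality differences (0 = none, point 6; >0, point 7)
    carryover:  year-to-year correlation of the remaining, non-quality part of
                the score (0 = cards fully reshuffled each year; 0..1, point 8)"""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, quality_sd) for _ in range(n_schools)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_schools)]
    k = round(share * n_schools)
    fails = [0] * n_schools
    for _ in range(years):
        scores = [q + e for q, e in zip(quality, noise)]
        for i in sorted(range(n_schools), key=scores.__getitem__)[:k]:
            fails[i] += 1
        # AR(1) update: keep part of last year's noise, redraw the rest,
        # scaled so that the noise keeps variance 1
        fresh = (1.0 - carryover ** 2) ** 0.5
        noise = [carryover * e + fresh * rng.gauss(0.0, 1.0) for e in noise]
    return sum(f >= min_fails for f in fails)

for q_sd, rho in [(0.0, 0.0), (0.0, 0.6), (0.5, 0.0), (0.5, 0.6)]:
    print(f"quality_sd={q_sd}, carryover={rho}: "
          f"{count_repeat_failures(quality_sd=q_sd, carryover=rho)} of 600 schools")
```

Comparing such counts with the share Trouw reports would be the step described in point 9; note that a high count only shows consistency in the indicator, which per points 1 to 3 need not be consistency in quality.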

Anyone who wants to contribute ideas or data: please do. Unfortunately this project cannot be worked out in a few days, otherwise I would do it.

April 2006: The Inspectorate publishes a list of 'very weak' schools: 63 primary schools, 1 special primary school, 10 special secondary schools and 8 secondary schools (De Volkskrant). Online from 1 May: www.onderwijsinspectie.nl/watdoenwij/zzs [no longer available, February 2008]. A new phenomenon, whose Achilles heel is probably that it is based mainly on (bureaucratic) procedures. An indication of this is the large number of Vrije Scholen (Steiner schools) on it, typically a kind of education where the paper reality of 'action plans, progress protocols' has difficulty getting in (keep it that way, Vrije Scholen! And throw that Steiner out!). After all, the Inspectorate's explanation below indicates that what the school adds to the 'value' of its pupils plays no part in it, which is rather obvious, because that added value only shows in the longer term and is not at all easy to establish. Should we be happy with the Inspectorate's pillory? Let me say, in the light of the commentary above on the Trouw publications, that it makes clear that problematically functioning schools are not simply the bottom 10%, but the 1 to 2 percent of pathology that is more or less unavoidable in this complex world. As with disasters and major accidents, it of course pays to find out in individual cases what is going on; no argument there.

"A very weak school is a school that achieves insufficient educational results (final outcomes) and in addition shows insufficient quality on crucial parts of the teaching-learning process.
A school is added to this overview when the Inspectorate has judged the school very weak, and the report of the periodic quality inspection (PKO) has been adopted and posted on the internet.
A school is removed from the list again when, after a period of intensified supervision, the Inspectorate has established in the concluding inspection of quality improvement (OKV) that the school has improved sufficiently, and the report of this inspection has been adopted and posted on the internet."

D.R. Veenstra, A.B. Dijkstra, J.L. Peschar and T.A.B. Snijders (1998). Discussie. Scholen op rapport. Een reactie op het Trouw-onderzoek naar schoolprestaties [Discussion. Schools on report. A reaction to the Trouw study of school results]. pdf

Jaap Dronkers (1999). Veranderden leerlingaantallen in het voortgezet onderwijs in het schooljaar 1998-1999 door de publicatie van inspectiegegevens en de berekening van het schoolcijfer door Trouw in oktober 1997? Een nadere analyse. Tijdschrift voor Onderwijsresearch, 24, 63-66.
- This short article is not available digitally. Dronkers shows here that there are small differences ['significant', he calls them, but this is not a sample of schools], but that in practical terms they are extremely small: 2% of the differences in intake compared with the previous year could be the result of parents choosing differently on the basis of the Trouw publications. By 2006 the effect had grown considerably in his memory (De Volkskrant, January): "The publication by Trouw of school results led, the following year, to substantial shifts in applications of new pupils." That is, to put it mildly, a distortion of his own conclusions.
Dronkers, J., 1998. "Het betere is de vijand van het goede. Een reactie op de commentaren over het Trouw rapportcijfer." [The better is the enemy of the good. A reaction on the comments on the Trouw school grades] Pedagogische Studiën 75:142-150. [no online version available]
Dronkers, J., 1998. "Het Trouw-rapportcijfer van scholen in het voortgezet onderwijs; een analyse van de inspectiegegevens over de schooljaren 1995/96 en 1996/97." [The Trouw grading of schools in secondary education; an analysis of the inspectorate data of the years 1995/96 and 1996/97] Tijdschrift voor Onderwijsresearch 23:159-176. [no online version available]
J. Roeleveld and J. Dronkers (1994). Bijzondere of buitengewone scholen? Verschillen in effectiviteit van openbare en confessionele scholen in regio's waarin hun richting een meerderheids- of minderheidspositie inneemt. Mens en Maatschappij, 69, 85-108.
Anne Bert Dijkstra, René Veenstra and Jules Peschar (27-4-2002). Hitlijsten voor scholen gevaarlijk. Publicatie van schoolprestaties kunnen ongelijkheid vergroten [League tables for schools dangerous. Publication of school results may increase inequality]. Friesch Dagblad html
- The authors assume, without putting it that way, that the league tables are valid, i.e. that the differences correspond to a range of quality differences. After all, if that is not the case, those lists are merely a waste of time, energy, pleasure, and paper. The point is: there is no research making it plausible that the differences in these league tables are related to quality differences that are stable over time. (Even if they held today but were quite different tomorrow, they would still be of no damn use in choosing a school.)
Elsevier lists

Elsevier Beste studies 2006 - WO www.elsevier.nl/survey/onderzoek/asp/surveyid/31/positie/23/index.html [student ratings; also: HBO; also: ratings by professors]

Elsevier, 20 January 2007, 'De beste scholen editie 2007' [The best schools, 2007 edition]. Alle middelbare scholen van Nederland beoordeeld [All secondary schools in the Netherlands rated]. pp. 34-69.

P. Koopman and J. Dronkers (1994). De effectiviteit van algemeen bijzondere scholen in het algemeen voortgezet onderwijs. Pedagogische Studiën, 71, 420-41.
R. H. Hofman (1993). Effectief schoolbestuur. Een studie naar de bijdrage van schoolbesturen aan de effectiviteit van basisscholen. Groningen: RION.
R. H. Hofman et al. (1996). Variation in Effectiveness between Private and Public Schools: The Impact of School and Family Networks. Educational Research and Evaluation, 2, 366-94.
J. W. M. Knuver (1993). De relatie tussen klas- en schoolkenmerken en het functioneren van leerlingen. Groningen: RION.
R. Veenstra (2001). Academic Achievement in Public, Religious, and Private Schools: Sector and Outcomes Differences in Holland. Paper presented at the annual meeting of the American Educational Research Association, Seattle, April 11.
J. Dronkers (1998). Het betere is de vijand van het goede. Een reactie op de commentaren over het Trouw rapportcijfer. Pedagogische Studiën, 75, 142-50
D. R. Veenstra et al. (1998). Scholen op rapport. Een reactie op het Trouw-onderzoek naar schoolprestaties. Pedagogische Studiën, 75, 121-34.
A. B. Dijkstra et al. (Eds.) (2001). Het oog der natie: scholen op rapport. Standaarden voor de publicatie van schoolprestaties. Assen: Van Gorcum.

Internal links

On school effects, see effectiviteit.htm

External links

David Colquhoun (2007) How should universities be run to get the best out of people? http://www.dcscience.net/goodsci/goodscience.htm

This is a longer version of comments published in the Times Higher Education Supplement, June 1, 2007.

College and university rankings. Education & Social Science Library, University of Illinois http://www.library.uiuc.edu/edx/rankings.htm
International: html

Cornell University Higher Education Research Institute, Working papers page. Measuring Up 2004.
Technical Guide Documenting Methodology, Indicators, and Data Sources For Measuring Up 2004: The National and State Report Card on Higher Education November 2004 pdf

"Measuring Up 2004 consists of the national report card for higher education and fifty state report cards. Its purpose is to provide the public and policymakers with information to assess and improve postsecondary education in each state. Measuring Up 2004 is the third in a series of biennial report cards. This web site provides state leaders, policymakers, researchers, and others with access to the national report card as well as access to all fifty state report cards. In addition, the site can compare any state with the best-performing states in each performance category, compare indicator scores and state grades for any performance category, obtain source and technical information for indicators and weights, and download the reports. Further, the Measuring Up web site has the capacity to view previous report cards from 2000 and 2002."

National Commission on Accountability in Higher Education (2005). Accountability for better results. A national imperative for higher education. A project of the State Higher Education Executive Officers. pdf (Draws, among other things, on the insights of Burke, 2004)

Joseph C. Burke (Editor) (2004). Achieving Accountability in Higher Education: Balancing Public, Academic, and Market Demands. Jossey-Bass. Site with an example chapter by Burke pdf, and contents pdf.

Literature

Martin Cave, Stephen Hanney, Mary Henkel, and Maurice Kogan (1997). The use of performance indicators in higher education. The challenge of the quality movement. London: Jessica Kingsley Publishers. Third edition 1997.

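M. S. R. Segers (1993). Kwaliteitsbewaking in het hoger onderwijs. Een exploratieve studie naar prestatie-indicatoren in theorie en praktijk [Quality assurance in higher education. An exploratory study of performance indicators in theory and practice]. Doctoral dissertation, Rijksuniversiteit Limburg.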

James Monks and Ronald G. Ehrenberg (1999). The Impact of US News and World Report College Rankings on Admission Outcomes and Pricing Decisions at Selective Private Institutions. NBER Working Paper No. 7227*

Available at NBER. *Published as "U.S. News and World Report's College Rankings: Why Do They Matter", Change, Vol. 31, no. 6 (November/December 1999): 42-51.
---- Abstract ----- Despite the widespread popularity of the U.S. News & World Report College rankings there has been no empirical analysis of the impact of these rankings on applications, admissions, and enrollment decisions, as well as on institutions' pricing policies. Our analyses indicate that a less favorable rank leads an institution to accept a greater percentage of its applicants, a smaller percentage of its admitted applicants matriculate, and the resulting entering class is of lower quality, as measured by its average SAT scores. While tuition levels are not responsive to less favorable rankings, institutions offer less visible price discounts in the form of slightly lower levels of expected self-help (loans and employment opportunities) and significantly more generous levels of grant aid. These decreases in net tuition are an attempt to attract additional students from their declining applicant pool.

Ronald G. Ehrenberg (2003). Method or Madness? Inside the USNWR College Rankings. Cornell Higher Education Research Institute, Working paper WP 39.

forthcoming in the Journal of College Admissions
Treats the same subject as his (2001) paper, see below, in a different way. Sections:
Why Americans Have Become Obsessed with College Rankings
How Higher Education Institutions Try to Manipulate the USNWR Rankings
What's Wrong with the Ratings
The verdict: The rankings exacerbate, but are not the major cause of the increased competition in American higher education that has taken place over the last few decades. The real shame is that this competition has focused institutions on improving the selectivity of their entering first-year classes. Institutions appear to be increasingly valued for the test scores of the students they attract, not for their value added to their students and to society.

Ronald G. Ehrenberg (2001). Reaching for the Brass Ring: How the U.S. News and World Report Rankings Shape the Competitive Environment in U.S. Higher Education. Cornell Higher Education Research Institute, Working paper WP 17. pdf

Published in the Review of Higher Education (Winter 2003).
[first paragraph:]
In a relatively short period of time the U.S. News & World Report (henceforth USNWR) annual ranking of the nation's colleges and universities as undergraduate institutions has become the 'gold standard' of the ranking business. Perhaps this occurred because the USNWR ranking has the appearance of scientific objectivity (institutions are ranked along various dimensions with explicit weights being assigned to each dimension). Perhaps this occurred because institutions at the top of each category, for example the top 50 national universities, are ranked numerically within their categories and the American public wants to know which institution is number one.
[a quotation from p. 16, to give an impression of the paper:]
Virtually every academic institution engages each year in a process of examining all of the data it is planning to submit to USNWR to see if there are legitimate adjustments that it can make to the data that will improve its position in the ranking. Of course it is a rare institution that carefully examines whether it unintentionally erroneously reported something that overstates its position. One may well wonder if the resources that each institution devotes to preparing, checking and adjusting its data could more productively be either saved or used to educate students.
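Ehrenberg's two observations, that the USNWR ranking combines "various dimensions with explicit weights" and that institutions comb their submitted data for "legitimate adjustments", can be made concrete with a toy weighted-sum ranking. What follows is a minimal sketch under stated assumptions: the institutions, the three dimensions, their weights and all scores are invented for illustration and do not represent the actual USNWR methodology.

```python
# Toy weighted-sum ranking. The dimensions, weights, institutions and scores
# are all hypothetical; this is NOT the actual USNWR methodology.

WEIGHTS = {"reputation": 0.5, "selectivity": 0.3, "resources": 0.2}


def composite(scores: dict[str, float]) -> float:
    """Weighted sum of an institution's dimension scores (0-100 scale)."""
    return sum(weight * scores[dim] for dim, weight in WEIGHTS.items())


def rank(institutions: dict[str, dict[str, float]]) -> list[str]:
    """Institution names ordered from highest to lowest composite score."""
    return sorted(institutions, key=lambda name: composite(institutions[name]), reverse=True)


data = {
    "College A": {"reputation": 80, "selectivity": 70, "resources": 60},  # composite 73.0
    "College B": {"reputation": 78, "selectivity": 71, "resources": 62},  # composite 72.7
}
before = rank(data)

# Hypothetical "legitimate adjustment": College B reports its selectivity
# figure slightly more favourably (71 -> 73); nothing about its teaching changes.
adjusted = {**data, "College B": {**data["College B"], "selectivity": 73}}
after = rank(adjusted)

print(before)  # ['College A', 'College B']
print(after)   # ['College B', 'College A']
```

A two-point change on a single input, worth 0.6 points of composite score under these invented weights, is enough to swap the two positions. That leverage is what makes the annual data review Ehrenberg describes worth an institution's resources.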

Denis Meuret, Sophie Morlaix et al. (2005). L'équité des systèmes éducatifs européens. Un ensemble d'indicateurs. Final report of the project "Construire des indicateurs internationaux d'équité des systèmes éducatifs", carried out within the Socrates 6.1.2 programme. GERESE (Groupe européen de recherche sur l'équité des systèmes éducatifs), Liège, Service de pédagogie théorique et expérimentale (Université de Liège), March 2005. pdf, 176 pp., entirely in French.

Georges Felouzis (2004). Les indicateurs de performance des lycées, une analyse critique pdf
See also http://indicateurs.education.gouv.fr/ for the data, etc.,
and, more generally, http://www.education.gouv.fr/stateval/etat/etat.htm

Ian Bednowitz (2000). The Impact of the Business Week and U.S. News & World Report Rankings on the Business Schools They Rank. Cornell University Senior Honors Thesis, May 2000. pdf.

Abstract:
This paper examines the widely popular Business Week and U.S. News & World Report rankings of the top business schools to determine their impact on the admissions outcomes, pricing policies, and career placement outcomes of the business schools they rank. The analysis indicates that both ranking systems have a significant impact on students and administrators in the short term and long term, but employers are only impacted by long-term changes in ranking. While both ranking systems are shown to have significant effects, some evidence indicates that Business Week's ranking is slightly more influential with students and significantly more influential with recruiters. In general, a fall in either ranking system leads a school to become less selective because its applicant pool shrinks and declines in quality, and a smaller percentage of applicants who are accepted matriculate. In addition, administrators are forced to either cut tuition or increase grant and scholarship aid to attract more students from its declining applicant pool. A more favorable ranking allows a school to become more selective as it attracts higher quality students who are more eager to attend the university, and the school can then decrease its grant and scholarship aid or slightly raise its tuition. Employers do not respond to yearly changes in rank, but a prolonged change in a school's ranking by either system leads employers to change their behavior. A long-term increase in a program's ranking leads to more of its students obtaining job offers, higher salaries for these offers, and more offers per student, in addition to an overall increase in the value of the MBA (as measured by change in salary). Similarly, a program which encounters a long-term decline in rank will see fewer of its students obtain lower-paying jobs, fewer options for each student, and a devaluing of the program's MBA.

From the implications section:
Due to the subjective and varying natures of both ranking systems, it is not clear what they are measuring, but this leads to another question: Does it even matter? The rankings seem to be measuring perceptions of quality, and if students want the best jobs, they want to go to the institution with the highest perceived quality, and employers do the same thing. In this way, the rankings serve as a tool for communicating between the two consumers, and if the goal of an MBA is simply to improve one's career prospects, then the rankings are somewhat serving their purpose.
Indeed, that is another way of looking at it.

Robert H. Frank (2001). Higher Education: The Ultimate Winner-Take-All Market? Cornell Higher Education Research Institute, Working paper WP 2. pdf

An edited version published in M. Devlin and J. Meyerson ed. Forum Futures - Exploring the Future of Higher Education - 2000 Papers (San Francisco: Jossey-Bass, 2001).

John Maynard Keynes once compared investing in the stock market to picking the winner of a beauty contest. In each case, it's not who you think will win, but who you think others will pick. The same characterization increasingly applies to a student's choice among universities. This choice depends much less now on what any individual student may think, and much more on what panels of experts think. The U.S. News & World Report's annual college ranking issue has become by far the magazine's biggest seller, and the same is true of Business Week's biennial issue ranking the nation's top MBA programs. The size of a school's applicant pool fluctuates sharply in response to even minor movements in these rankings.
In my remarks today, I'll discuss some of the reasons for the growing importance of academic rankings. I'll also explore how our increased focus on them has affected the distribution of students and faculty across schools, the distribution of financial aid across students, and the rate at which costs have been escalating in higher education.

Jennifer A. O'Day (2002). Complexity, Accountability, and School Improvement. Harvard Educational Review, 72. http://www.hepg.org/oday.html [Dead link? May 3, 2009] or pdf

A very handy, recent, and extensive reference list can be found in Sharon L. Nichols and David C. Berliner (2005). The Inevitable Corruption of Indicators and Educators Through High-Stakes Testing. Education Policy Studies Laboratory, Arizona State University pdf (pp. 171-180 of a 180-page document). Many items in this list refer to online documents, most of them American, of course.

Reports on indicators

OCW (2004). Kennis in kaart 2004 pdf
OCW (2005). Kennis in kaart 2005 http://www.minocw.nl/ho/doc/2005/kenniskaart.pdf [no longer available, February 2008]

In 2004 (but no longer in 2005) the indicators for higher education were grouped into four categories: quality, accessibility, efficiency, and societal role. In principle this report is published annually by the Ministry of Education, Culture and Science (OCW).
I have yet to study these tables, but I fear the worst. For example, p. 33 (2005) states that of those entering higher education only about two-thirds actually obtain a degree. That is absurd, because sound research on study careers found not so long ago that almost 90% obtain a degree. Meanwhile the bachelor-master structure (BaMa) has been introduced, so that percentage will certainly not be lower (provided the bachelor's counts as a degree). If a large part of these reports is contaminated with completion rates at the level of individual study programmes, which can sometimes be very low, we may conclude that the ministry has once again done the country a fine service. [I have written about dropout and completion rates before (damnably tricky subjects, I admit); see for example my 1980 paper on dropout and delay html 44k, or my 1987 paper on completion rates under the two-phase structure html. Shall we throw in the ghost students as well? pdf 380k]
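Why can programme-level completion rates be so much lower than the share of entrants who eventually obtain a degree? Students who switch programmes count as non-completers at programme level, yet many of them still graduate elsewhere in the system. What follows is a minimal sketch with an invented cohort of ten students; the programmes and outcomes are hypothetical and are not taken from Kennis in kaart or from the study-career research mentioned above.

```python
from collections import defaultdict

# Hypothetical cohort of ten entrants: (programme entered, programme graduated
# from, or None if the student left higher education without a degree).
cohort = [
    ("law", "law"), ("law", "law"), ("law", "history"), ("law", None),
    ("physics", "physics"), ("physics", "computing"), ("physics", "computing"),
    ("history", "history"), ("history", "law"), ("history", "history"),
]


def program_completion(records):
    """Per programme: share of its entrants who graduate from that same programme.

    A student who switches programmes counts as a non-completer here.
    """
    entrants = defaultdict(int)
    completers = defaultdict(int)
    for first, graduated in records:
        entrants[first] += 1
        if graduated == first:
            completers[first] += 1
    return {prog: completers[prog] / entrants[prog] for prog in entrants}


def pooled_program_completion(records):
    """Share of all entrants who graduate from the programme they first entered."""
    return sum(graduated == first for first, graduated in records) / len(records)


def system_completion(records):
    """Share of all entrants who graduate from any programme."""
    return sum(graduated is not None for _, graduated in records) / len(records)


print(program_completion(cohort))         # law 0.5, physics ~0.33, history ~0.67
print(pooled_program_completion(cohort))  # 0.5
print(system_completion(cohort))          # 0.9
```

Only half of this invented cohort "completes" at programme level, yet nine in ten obtain a degree. Reporting programme-level rates as if they were graduation rates therefore says little about how many entrants eventually graduate.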
Paul J. Hanges, Julie S. Lyon (2005). Relationship Between U.S. News and World Report's and the National Research Council's Ratings/Rankings of Psychology Departments. American Psychologist, 60(9), 1035-1037.
Abstract: Every year, U.S. News and World Report (USNEWS) creates a stir among academics and the public by publishing its ranking of universities and various departments within those universities. Although members of the public rely on the USNEWS rankings when making their academic choices, psychologists and other academics tend to rely on the National Research Council (NRC) report to differentiate various academic departments. Given the concerns about the scientific merit of the USNEWS rankings, the authors gathered some empirical information about the correlates of the USNEWS department ratings/rankings. They address the following questions in this comment: How similar are the ratings/rankings from USNEWS and the NRC? Are the USNEWS and NRC ratings/rankings related to other indices of department quality? Finally, what do these correlations say about the utility of these two rating systems? The authors believe that this comparison provides an initial exploration of the meaningfulness of two resources that are heavily relied on by the public and academia. The authors found that although they expected a positive correlation between the NRC and USNEWS rankings, they did not expect the magnitude of the relationship to be so substantial. Further, both of these measures exhibited significant and substantial relationships with two other NRC criteria of department effectiveness and several weaker but clearly nontrivial relationships with the APA graduate student data. At the very least, the present results do not support the belief of some academics that the USNEWS ratings/rankings lack scientific merit. Indeed, these results seem to suggest that the USNEWS rankings of psychology departments substantially duplicate the NRC rankings.
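Hanges and Lyon compare two rankings by correlating them. For two rankings without ties, a standard measure is Spearman's rank correlation, rho = 1 - 6 Σd² / (n(n² - 1)), where d is the difference between an item's two ranks and n the number of items. What follows is a minimal sketch with invented ranks for six hypothetical departments; it does not reproduce the authors' data, and they may have used a different statistic.

```python
def spearman_rho(rank_a: list[int], rank_b: list[int]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same items.

    rank_a[i] and rank_b[i] must refer to the same item.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of six departments under two rating systems.
magazine_rank = [1, 2, 3, 4, 5, 6]
council_rank = [2, 1, 3, 5, 4, 6]

print(spearman_rho(magazine_rank, council_rank))  # about 0.886
```

Two swaps of neighbouring positions still leave rho close to 1. This is one reason why a high correlation between two rankings shows agreement in broad ordering, not agreement on who is number one.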

News

On 14 February 2006, De Volkskrant reported that the education ministers want a 'bureaucracy meter' for school organizations. Now that is the kind of indicator I rather like. The problem, as with all indicators, will be that there are countless ways to bend the data for a bureaucracy meter in the desired direction before they are reported. Will anyone end up any the wiser? I put the question to you. After all, every school organization has numerous stakeholders, in many bodies, whose job is to keep the bureaucratic tendencies of the management (and of many others) in check, bureaucracy meter or no bureaucracy meter. A matter of mentality and culture, then. But I will keep an eye on it. www.minocw.nl/nieuws/35058 [no longer available, February 2008] As of 28 April 2006 there seems to be no news yet about this check on the check on the check. Mega-Big Brother.

See the rankings page for details on world rankings of universities or national league tables of schools.

See the competition page for details on the idea and practice of competition in the field of education.

Dick van der Wateren (10 January 2015). De verleiding van toegevoegde waarde [The temptation of value added]. blog

American Statistical Association (ASA) (2014). ASA Statement on Using Value-Added Models for Educational Assessment.
https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf

Internetconsultatie (3 March 2015). Wijzigingsbesluit aanpassing van de indicatoren voor de beoordeling van de leerresultaten [amending decree adjusting the indicators for assessing learning outcomes]. webpage

Kafka in the classroom. See the American Statistical Association statement above.

Bruno S. Frey & Margit Osterloh (2009). Onderzoeksevaluaties: verborgen kosten, twijfelachtige voordelen en betere alternatieven [Research evaluations: hidden costs, questionable benefits and better alternatives]. In Thijs Jansen, Gabriël van den Brink & Jos Kole (Eds.) (2009). Beroepstrots. Een ongekende kracht. Boom, 194-221. pdf (whole book via researchgate.net). On performance appraisal, with a very extensive reference list.

February 2023 / contact: ben at benwilbrink.nl / freelance advice, development, research

http://www.benwilbrink.nl/projecten/prestatieindicatoren.htm

Rankings - League Tables - Lists

Universities

Schools

United Kingdom's league tables

Inspectie van het Onderwijs

The CPB chimes in as well, adding to the confusion.

Elsevier/Inspectie lists 2009

Trouw lists

Elsevier lists

Internal links

External links

Literature

Reports on indicators

News