Literature on assessments (as opposed to tests)


Ben Wilbrink



Bernard R. Gifford (Ed.) (1989). Test policy and the politics of opportunity allocation: the workplace and the law. National Commission on Testing and Public Policy. Kluwer Academic Publishers. isbn 0792390156 info and previews




Bernard R. Gifford (Ed.) (1989). Test policy and test performance: education, language and culture. National Commission on Testing and Public Policy. Kluwer Academic Publishers. isbn 0792390148 info and previews




E. F. Lindquist (Ed.) (1951). Educational measurement. American Council on Education. Walter W. Cook: The functions of measurement in the facilitation of learning 3-46; Ralph W. Tyler: The functions of measurement in improving instruction 47-67; John G. Darley & Gordon V. Anderson: The functions of measurement in counseling 68-84; Henry Chauncey & Norman Frederiksen: The functions of measurement in educational placement 85-117; E. F. Lindquist: Preliminary considerations in objective test construction 119-158; K. W. Vaughn: Planning the objective test 159-184; Robert L. Ebel: Writing the test item 185-249; Herbert S. Conrad: The experimental tryout of test materials 250-265; Frederick B. Davis: Item selection techniques 266-328; Arthur E. Traxler: Administering and scoring the objective test 329-416; Geraldine Spaulding: Reproducing the test 417-454; David G. Ryans & Norman Frederiksen: Performance tests of educational achievement 455-494; John M. Stalnaker: The essay type of examination 495-530; Irving Lorge: The fundamental nature of measurement 533-559; Robert L. Thorndike: Reliability 560-620; Edward E. Cureton: Validity 621-694; John C. Flanagan: Units, scores and norms 695-763; Charles I. Mosier: Batteries and profiles 764-809



E. F. Lindquist (1969). The impact of machines on educational measurement. 351-369. [Separate publication: The impact of machines on educational measurement - a monograph. AERA-PDK award lecture, annual meeting of the American Educational Research Association, Chicago, February 9, 1968] In Ralph W. Tyler (Ed.) (1969). Educational evaluation: New roles, new means. The Sixty-eighth Yearbook of the National Society for the Study of Education. NSSE. paywalled https://nsse-chicago.org/yearbooks.asp?cy=1969




Christopher Stray (2009). From oral to written examinations. In R. Lowe (Ed.) The history of higher education: Major themes in education, volume 4 (159-207). Routledge. draft version


I am listed in the acknowledgement, thanks Chris. Refers to Assessment in historical perspective, 1997.



Gwyneth Hughes (2014). Ipsative assessment. Motivation through marking progress. Palgrave Macmillan. [not (yet?) available as an eBook in the KB] info


In July 2014 I posted some notes on this book on Twitter, mainly in the form of publications available online that Hughes refers to, or in fact does not refer to ;-).



Gordon Stobart (2008). Routledge. [available as an eBook in the KB] info (30 pp preview)




Dominique Sluijsmans, Sabine van Eldik, Desirée Joosten-ten Brinke & Linda Jakobs (2014). Bewust en bekwaam toetsen. Wat zouden lerarenopleiders moeten weten over toetsing? pdf




Jo-Anne Baird, Therese N. Hopfenbeck, Paul Newton, Gordon Stobart & Anna T. Steen-Utheim (2014). Assessment and learning. State of the field review. Norwegian Knowledge Center for Education. pdf


An interesting chapter 7 on PISA tests. Everything about them gets criticized except their constructivist bias, even though the constructivism/situationism of the PISA tests has been described adequately.



George F. Madaus (1988). The influence of testing on the curriculum. In Laurel N. Tanner (Ed.) (1988). Critical issues in curriculum (83-121). NSSE. [immediately following: Daniel Tanner (1988). The textbook controversies. pp 122-147.] [feedforward, backwash, washback] paywalled




James E. Carlson & Matthias von Davier (2013). Item response theory. ETS SPC-13-05 pdf




Saskia Wools (2007). Evaluatie van een instrument voor kwaliteitsbeoordeling van competentieassessments. pdf







Michèle Lamont (2009). How professors think. Inside the curious world of academic judgment. Harvard University Press. isbn 9780674057333 info




Matthew Jensen Hays, Nate Kornell & Robert A. Bjork (2013). When and why a failed test potentiates the effectiveness of subsequent study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 290–296. abstract




Stefan Johansson, Eva Myrberg & Monica Rosén (2012). Teachers and tests: assessing pupils' reading achievement in primary schools. Educational Research and Evaluation: An International Journal on Theory and Practice, 18, 693-711. abstract




Marjorie C. Kirkland (1971). The effects of tests on students and schools. Review of Educational Research, 41, 303-350.


backwash, feedforward



Yigal Attali & Don Powers (2010). Immediate Feedback and Opportunity to Revise Answers to Open-Ended Questions. Educational and Psychological Measurement, 70, 22-35 abstract


Now this is an intriguing idea: immediately after candidates answer a test question, tell them whether the answer is correct, and also give them the opportunity to improve it! To think that I never thought of that myself. I have no doubt seen it pass by before in the form of ‘the answer-until-correct method for MC items (Pressey, 1926)’.



Gregory Ethan Stone, Kristin L. K. Koskey & Toni A. Sondergeld (2011). Comparing Construct Definition in the Angoff and Objective Standard Setting Models: Playing in a House of Cards Without a Full Deck. Educational and Psychological Measurement, 71, 942. abstract


This is a line of research by Gregory Stone. I do not like it at all that he speaks of objectively derived standards. I should definitely take a good look at this sometime. The only correct method is the one I described in the TOR in 1980. It is not that difficult, by the way, for a selection psychologist with some affinity for decision theory.



George Engelhard, Jr. (2011). Evaluating the Bookmark Judgments of Standard-Setting Panelists. , 909–924. abstract




David Spendlove (2009). Putting Assessment for Learning into Practice. Continuum. site


Perhaps quite a nice little book, but the recommendations are authoritarian, that is: given without any source references. There is a short list of further reading. That may be a choice that suits a book of tips of modest size, but I still prefer tips with a specific source reference, so that readers can check for themselves what a tip means and what supports it.



Natalia Karelaia & Robin M. Hogarth (2008). Determinants of Linear Judgment: A Meta-Analysis of Lens Model Studies. Psychological Bulletin, 134, 404-426. pdf



Grant Wiggins (1994). The immorality of test security. Educational Policy, 8, 157-182. abstract



Grant P. Wiggins (1993). Assessing student performance. Exploring the purpose and limits of testing. Jossey-Bass. isbn 1555425925



Ron J. Pat-El (2012). Lost in Translation. Congruency of teacher and student perceptions of assessment as a predictor of intrinsic motivation in ethnodiverse classrooms. Doctoral dissertation, Leiden University. availability of chapters; summary


The doctoral candidate writes about ‘eikpunten’ (a misspelling of ‘ijkpunten’, benchmarks), which does not suggest an excessive degree of care. The ideology is that of social constructivism, which seems to me rather less fitting for an academic piece of work. But then, some of the chapters have already been published in refereed scientific journals.

In a social-constructivist learning environment the teacher has more the role of supporter of the learning process than of transmitter of knowledge. Regular (informal) evaluations can serve as ‘eikpunten’ [sic; benchmarks] both for pupils, who receive information about points to improve, and for teachers, who gain insight into what needs to be worked on in subsequent lessons.

Who signed off on this dissertation: supervisors Paul Vedder and Mien Segers, co-supervisor Harm Tillema. Committee members: Roel Bosker, P. van den Broek, P. den Brok, C. Espin.

Conspicuously absent from this research: the subject-matter content of education. This is superfluous research, and that is putting it kindly (unkindly put: this is harmful research, it endorses mismanagement in the field of education).



Dirk Ifenthaler, Deniz Eseryel & Xun Ge (Eds.) (2012). Assessment in Game-Based Learning. Foundations, Innovations, and Perspectives. Springer.



Lorrie A. Shepard (2000). The role of classroom assessment in teaching and learning. CSE Technical Report 517. Published in V. Richardson (Ed.) (2001), Handbook of research on teaching (4th ed). Washington, DC: American Educational Research Association. pdf




Robert L. Brennan (Ed.) (2006). Educational Measurement. National Council on Measurement in Education; American Council on Education.

[no online files available?]
Editor's Preface
1. Perspectives on the Evolution and Future of Educational Measurement
Robert L. Brennan
Part I: Theory and General Principles
2. Validation
Michael T. Kane
3. Reliability
Edward H. Haertel
4. Item Response Theory
Wendy M. Yen and Anne R. Fitzpatrick
5. Scaling and Norming
Michael J. Kolen
6. Linking and Equating
Paul W. Holland and Neil J. Dorans
7. Test Fairness
Gregory Camilli
8. Cognitive Psychology and Educational Assessment
Robert J. Mislevy
Part II: Construction, Administration, and Scoring
9. Test Development
Cynthia B. Schmeiser and Catherine J. Welch
10. Test Administration, Security, Scoring, and Reporting
Allan S. Cohen and James A. Wollack
11. Performance Assessment
Suzanne Lane and Clement A. Stone
12. Setting Performance Standards
Ronald K. Hambleton and Mary J. Pitoniak
13. Technology and Testing
Fritz Drasgow, Richard M. Luecht, and Randy E. Bennett
Part III: Applications
14. Old, Borrowed and New Thoughts in Second Language Testing
Micheline Chalhoub-Deville and Craig Deville
15. Testing for Accountability in K-12
Daniel M. Koretz and Laura S. Hamilton
16. Standardized Assessment of Individual Achievement in K-12
Steve Ferrara and Gerald E. DeMauro
17. Classroom Assessment
Lorrie A. Shepard
18. Higher Education Admissions Testing
Rebecca Zwick
19. Monitoring Educational Progress With Group-Score Assessments
John Mazzeo, Stephen Lazer, and Michael J. Zieky
20. Testing for Licensure and Certification in the Professions
Brian E. Clauser, Melissa J. Margolis, and Susan M. Case
21. Legal and Ethical Issues
S. E. Phillips and Wayne J. Camara
Index



Satomi Mizutani (2009). The Mechanism of Washback on Teaching and Learning. A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Educational Psychology, The University of Auckland, 2009. (supervisors: Professor John Hattie, Dr. Christine Rubie-Davies, and Dr. Jenefer Philp) pdf



Greaney, V., & Kellaghan, T. (1996). Monitoring the learning outcomes of educational systems. Washington, D.C.: The World Bank. [no direct attention to washback]



Kathleen M. Bailey (1999). Washback in language testing. Educational Testing Service MS-15, June 1999. pdf



Elana Shohamy, Smadar Donitsa-Schmidt & Irit Ferman (1996). Test impact revisited: washback effect over time. Language Testing, 13, 298-317. abstract



Mary Spratt (2005). Washback and the classroom: the implications for teaching and learning of studies of washback from exams. Language Teaching Research, 9, 5-29. abstract A file is available online: pdf



Shahrzad Saif (2006). Aiming for positive washback: a case study of international teaching assistants. Language Testing, 23, 1-34 abstract



Ana P. Muñoz and Marta E. Álvarez (2010).  Washback of an oral assessment system in the EFL classroom. Language Testing, 27, 33-49. abstract



M. L. Smith (1991). Put to the test: The effects of external testing on teachers. Educational Researcher, 20(5), 8-11. first page



M. L. Smith and C. Rottenberg (1991). Unintended consequences of external testing in elementary schools. Educational Measurement: Issues and Practice, 10(4), 7-11. [See also Gregory J. Cizek (2001). More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 20, 19-27. final draft]



David R. Krathwohl (2002). A revision of Bloom’s taxonomy: An overview. Theory into Practice, 41, 212-218. pdf



Ineke Huibregtse & Wilfried Admiraal (2000). De score op een ja/nee-woordenschattoets: correctie voor raden en persoonlijke antwoordstijl. TOR, 24, 110- . online



F. M. Edens, F. Rink & M. J. Smilde (2000). De studentenrechtbank: een evaluatieonderzoek naar beoordelingslijsten voor prestatievaardigheden. Tijdschrift voor Onderwijsresearch, 24, 265-274. online



Mary E. Lunz, Betty A. Bergstrom & Richard C. Gershon (1994). Computer adaptive testing. International Journal of Educational Research, 21, 623-634. [Relevant to the rekentoets (arithmetic test), WisCat, etc.]



Martin Brunner, Cordula Artelt, Stefan Krauss & Jürgen Baumert (2007). Coaching for the PISA test. Learning and Instruction, 18, 321-336.



P. Vedder (1992). Het Cito-leerlingvolgsysteem. Pedagogische Studiën, 69, 284-290. With reply: P. Gillijns & P. Verhoeven (1992). Het Cito-leerlingvolgsysteem: met het oog op de praktijk. Pedagogische Studiën, 69, 291-296.



Hartmut von Hentig (1980). Die Krise des Abiturs und eine Alternative. Klett-Cotta. Stuttgart, Ernst Klett.



Harold L. Kleinert, Diane M. Browder & Elizabeth A. Towles-Reeves (2009). Models of Cognition for Students With Significant Cognitive Disabilities: Implications for Assessment. Review of Educational Research, 79, 301-326.



Maarten van Gils (1977). De onbetrouwbaarheid van selektieve tekstbegriptoetsen. Pedagogische Studiën, 54, 52-61.



Willem K. B. Hofstee (2009). Promoting intersubjectivity: a recursive-betting model of evaluative judgments. Netherlands Journal of Psychology, 65. abstract


Notes: toetsmodellen.htm#Hofstee_intersubjectivity



Jean-Yves Rochex (2006). Social, Methodological, and Theoretical Issues Regarding Assessment: Lessons From a Secondary Analysis of PISA 2000 Literacy Tests. Review of Research in Education, 30, 163-212.



Maarten Pinxten, Bieke De Fraine, Jan Van Damme & Ellen D’Haenens (2010). Causal ordering of academic self-concept and achievement: Effects of type of achievement measure. British Journal of Educational Psychology, 80, 689-709. download UBUw



Ana Maria Pazos Rego (2009?). The alphabetic principle, phonics, and spelling. In Jeanne Shay Schumm: Reading assessment and instruction for all learners. The Guilford Press.



Elana Shohamy (2008). Assessment in multicultural societies: Applying democratic principles and practices to language testing. In Charles A. MacArthur, Steve Graham & Jill Fitzgerald: Handbook of writing research. The Guilford Press. 72-92.



Evert Gijsbert Harskamp & Conradus Johannes Maria Suhre (1997?). Toetsen basisvorming: Een onderzoek onder scholen, ouders en leerlingen. GION. isbn 9789066904446 SVO-project 96080 (I put that project out to tender)



Paul Black & Dylan Wiliam (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21. draft



Paul E. Newton (2012). Clarifying the Consensus Definition of Validity. Measurement: Interdisciplinary Research and Perspectives, 10, 1-29. abstract



Robert J. Mislevy, Linda S. Steinberg & Russell G. Almond (2003). On the structure of educational assessments. CSE Technical Report 597. pdf



James W. Pellegrino, Naomi Chudowsky, and Robert Glaser (Eds.) (2001). Knowing what students know. The Science and Design of Educational Assessment. Board on Testing and Assessment, Center for Education, Division of Behavioral and Social Sciences and Education, National Research Council. download the pdf of the whole book here



Shepard, L. (1991). Psychometricians’ beliefs about learning. Educational Researcher, 20, 2-16. (Full text online as html or as a directly downloadable pdf)

p. 9: Conclusion: Implications for Measurement Practice. Three main points are made in the respective sections of this article: 1. On the basis of qualitative analysis of interview data from a representative sample of 50 district testing directors, it is asserted that approximately half of all measurement specialists operate from implicit learning theories that encourage close alignment of tests with curriculum and judicious teaching of tested content. 2. These beliefs, associated with criterion-referenced testing, derive from behaviorist learning theory, which requires sequential mastery of constituent skills and behaviorally explicit testing of each learning step. 3. The sequential, facts-before-thinking model of learning is contradicted by a substantial body of evidence from cognitive psychology. My argument is that hidden assumptions about learning should be examined precisely because they are covert. What we believe about learning and the intended effect of testing on learning should be considered directly, not "smuggled in" by the adoption of a popular test theory. (..) This article is an exercise in making implicit beliefs explicit so that they become available for debate and evaluation.



Harry Torrance (2012). Formative assessment at the crossroads: conformative, deformative and transformative assessment. Oxford Review of Education, 38, 323-342. http://dx.doi.org/10.1080/03054985.2012.689693



Caroline V. Gipps (1994). Beyond testing. Towards a theory of educational assessment. Falmer Press. [book no longer available]