Literature on test psychology (psychometrics, methodology)


Ben Wilbrink

See also projecten/raden.htm

See also projecten/geheimhouding.htm




Peter Herriot (Ed.) (1989). Assessment and selection in organizations. Methods and practice for recruitment and appraisal. Chichester: Wiley. isbn 0471916404




K. Anders Ericsson and H. Simon. Cognition: An Historical Overview [scan available in Simon archive http://diva.library.cmu.edu/Simon/ ] [never reprinted] In Thomas V. Merluzzi, Carol R. Glass and Myles Genest (Eds.) (1981). Cognitive assessment. New York: Guilford Press.




Earl Hunt (2011). Human Intelligence.




Kofi Kissi Dompere (2014). Fuzziness, Democracy, Control and Collective Decision-choice System: A Theory on Political Economy of Rent-Seeking and Profit-Harvesting. Springer. [eBook in KB] info


I do not have much literature on rent-seeking apart from the world of selection at the gate, so I have simply noted this one.



Helga A. H. Rowe (Ed.) (1991). Intelligence: Reconceptualization and Measurement. Erlbaum. [as eBook in KB] preview Questia




Maria Elena Oliveri & Matthias von Davier (2014). Toward Increasing Fairness in Score Scale Calibrations Employed in International Large-Scale Assessments. International Journal of Testing, 14, 1-21. open access


Data used: PIRLS



Arne Evers, Klaas Sijtsma, Wouter Lucassen & Rob R. Meijer (2010). The Dutch Review Process for Evaluating the Quality of Psychological Tests: History, Procedure, and Results. International Journal of Testing, 10. abstract [paywall]


History and working procedure of COTAN.



Franić, Sanja; Dolan, Conor V.; Borsboom, Denny; Hudziak, James J.; van Beijsterveldt, Catherina E. M.; Boomsma, Dorret I. (2013). Can genetics help psychometrics? Improving dimensionality assessment through genetic factor modeling. Psychological Methods, 18, 406-433. abstract




Wim J. van der Linden (1998). A discussion of some methodological issues in international assessments. International Journal of Educational Research, 29, 569-577. abstract




Stephen G. Sireci and Polly Parker (2006). Validity on Trial: Psychometric and Legal Conceptualizations of Validity. Educational Measurement: Issues and Practice, fall, 27-34. abstract




Shudong Wang, Hong Jiao, Michael J. Young, Thomas Brooks and John Olson (2008). Comparability of Computer-Based and Paper-and-Pencil Testing in K-12 Reading Assessments: A Meta-Analysis of Testing Mode Effects. Educational and Psychological Measurement, 68, 5. abstract




Fadia Nasser-Abu Alhija & Adi Levy (2009). Effect Size Reporting Practices in Published Articles. Educational and Psychological Measurement, 69, 245-265. abstract




Alvaro J. Arce-Ferrer and Elvira Martínez Guzmán (2009). Studying the Equivalence of Computer-Delivered and Paper-Based Administrations of the Raven Standard Progressive Matrices Test. Educational and Psychological Measurement, 69, 855-867. abstract


Finds no differences, in contrast to the earlier review by Kubinger (1991).



Anneke C. Timmermans, Tom A. B. Snijders and Roel J. Bosker (2013). In Search of Value Added in the Case of Complex School Effects. Educational and Psychological Measurement, 73, 210-228. abstract


I see this article mainly as a technical analysis: specify a model, take an available dataset, and start computing. Check how model A leads to different outcomes than model B. In this article, at least, the authors hardly address the question of whether estimating value added is a meaningful undertaking with which society may be bothered. They simply compute with models and, as is then guaranteed to happen, that yields certain outcomes. However, quite a few tacit and less tacit presuppositions underlie this way of working.



Robert W. Lissitz (2009). Validity. Revisions, new directions, and applications. Information Age Publishing. [not yet seen]



Wim J. van der Linden & Minjeong Jeon (2012). Modeling Answer Changes on Test Items. Journal of Educational and Behavioral Statistics, 37, 180-199. abstract pdf

On fraudulent changes.



Wim J. van der Linden, Minjeong Jeon & Steve Ferrara (2011). A Paradox in the Study of the Benefits of Test-Item Review. Journal of Educational Measurement, 48, 380-398. pdf



Kristian E. Markon (2013). Information Utility: Quantifying the Total Psychometric Information Provided by a Measure. Psychological Methods, 18, 15-35. abstract



Gregory J. Cizek (2012). Defining and Distinguishing Validity: Interpretations of Score Meaning and Justifications of Test Use. Psychological Methods, 17, 31-43. abstract



Ken Kelley & Kristopher J. Preacher (2012). On effect size. Psychological Methods, 17, 137-172. accepted concept



Michèle Nuijten, Marie Deserno, Angélique Cramer & Denny Borsboom (2013). Psychologische stoornissen als complexe netwerken. De Psycholoog, January, 12-23 [corrected reference, see De Psycholoog, February 2013, p. 4]



Han L. J. van der Maas, Conor V. Dolan, Raoul P. P. P. Grasman, Jelte M. Wicherts, Hilde M. Huizenga, and Maartje E. J. Raijmakers (2006). A Dynamical Model of General Intelligence: The Positive Manifold of Intelligence by Mutualism. Psychological Review, 113, 842-861. pdf




Nate Silver (2012). The signal and the noise. Why so many predictions fail -- but some don't. The Penguin Press. isbn 9781594204111 http://www.nytimes.com/2012/10/24/books/nate-silvers-signal-and-the-noise-examines-predictions.html http://www.npr.org/2012/10/10/162594751/signal-and-noise-prediction-as-art-and-science



Cynthia G. Parshall, Judith A. Spray, John C. Kalohn, and Tim Davey (2002). Practical Considerations in Computer-Based Testing. Springer. [Not yet seen. Reviewed by Rob Meijer: Applied Psychological Measurement, Vol. 27 No. 1, January 2003, 78-80]



David Thissen & Howard Wainer (Eds.) (2001). Test Scoring. Springer. [Not yet seen. Reviewed by Rob Meijer: Applied Psychological Measurement, Vol. 27 No. 1, January 2003, 75-77]



Ronald K. Hambleton (2000). Advances in Performance Assessment Methodology. Applied Psychological Measurement, 24, 291-293. [Introduction to special issue]



Randy Elliot Bennett, Mary Morley & Dennis Quardt (2000). Three Response Types for Broadening the Conception of Mathematical Problem Solving in Computerized. Applied Psychological Measurement, 24, 294-309. abstract



M. David Miller & Robert L. Linn (2000). Validation of performance-based assessments. Applied Psychological Measurement, 24, 367-378. abstract



Wim J. van der Linden (2000). Optimal Assembly of Tests with Item Sets. Applied Psychological Measurement, 24, 225-240. abstract


This concerns the type of exam question in which a text is given, about which several questions are then asked. That form is used a lot in final examinations. The word ‘optimal’ of course has only a limited meaning: optimal within given constraints. If those constraints are poor, such as the quality of the questions in the item pool from which items are drawn, then ‘optimal’ is a euphemism.



Tom Verguts & Paul de Boeck (2000). A Rasch Model for Detecting Learning While Solving an Intelligence Test. Applied Psychological Measurement, 24, 151-162. abstract


A striking title. Intriguing.



E. Matthew Schulz, Michael J. Kolen & W. Alan Nicewander (1999). A Rationale for Defining Achievement Levels Using IRT-Estimated Domain Scores. Applied Psychological Measurement, 23, 347-362. abstract



Rob R. Meijer & Michael L. Nering (1999). Computerized Adaptive Testing: Overview and Introduction. Applied Psychological Measurement, 23, 187-194. abstract



Chi-Keung Leung, Hua-Hua Chang & Kit-Tai Hau (2005). Computerized adaptive testing: A mixture item selection approach for constrained situations. British Journal of Mathematical and Statistical Psychology, 58, 239-257. abstract



T. J. H. M. Eggen (1999). Item Selection in Adaptive Testing with the Sequential Probability Ratio Test. Applied Psychological Measurement, 23, 249-261. abstract



Almond & Mislevy (1999). Graphical Models and Computerized Adaptive Testing. Applied Psychological Measurement, 23, 223-237. abstract



Tenko Raykov (1999). Are Simple Change Scores Obsolete? An Approach to Studying Correlates and Predictors of Change. Applied Psychological Measurement, 23, 120-126. abstract



Nambury S. Raju, Reyhan Bilgic, Jack E. Edwards & Paul F. Fleer (1999). Accuracy of Population Validity and Cross-Validity Estimation: An Empirical Comparison of Formula-Based, Traditional Empirical, and Equal Weights Procedures. Applied Psychological Measurement, 23, 99-115. abstract



Wim J. van der Linden (1999). Empirical Initialization of the Trait Estimator in Adaptive Testing. Applied Psychological Measurement, 23, 21-29. abstract



Gideon J. Mellenbergh (1999). A Note on Simple Gain Score Precision. Applied Psychological Measurement, 23, 87-89. abstract



John R. Bergan, Richard D. Schwarz & Linda A. Reddy (1999). Latent Structure Analysis of Classification Errors in Screening and Clinical Diagnosis: An Alternative to Classification Analysis. Applied Psychological Measurement, 23, 69-86. abstract



Klaas Sijtsma & Anton C. Verweij (1999). Knowledge of Solution Strategies and IRT Modeling of Items for Transitive Reasoning. Applied Psychological Measurement, 23, 55-68. abstract


Study in which the pupils had to justify their answers on the test. See also chapter 2 of Toetsvragen ontwerpen.



Wim van der Linden (1998). Optimal assembly of psychological and educational tests. Applied Psychological Measurement, 22, 195-211. abstract



Anat Ben-Simon, David V. Budescu and Baruch Nevo (1997). A Comparative Study of Measures of Partial Knowledge in Multiple-Choice Tests. Applied Psychological Measurement, 21, 65-88. abstract



Craig W. Deville (1996). An empirical link of content and construct validity evidence. Applied Psychological Measurement, 20, 127-139. abstract



Richard H. Williams & Donald W. Zimmerman (1996). Are simple gain scores obsolete? Applied Psychological Measurement, 20, 59-69. abstract



Rolf Langeheine, Elsbeth Stern & Frank van de Pol (1994). State Mastery Learning: Dynamic Models for Longitudinal Data. Applied Psychological Measurement, 18, 277-291. abstract



Menucha Birenbaum, Kikumi K. Tatsuoka & Yaffa Gutvirtz (1992). Effects of Response Format on Diagnostic Assessment of Scholastic Achievement. Applied Psychological Measurement, 16, 353-363. abstract


In the case of algebra items.



A.H.G.S. van der Ven & F.M. Gremmen (1992). The Knowledge or Random Guessing Model for Matching Tests. Applied Psychological Measurement, 16, 177-194. abstract



Mary E. Lunz, Betty A. Bergstrom & Benjamin D. Wright (1992). The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Test. Applied Psychological Measurement, 16, 33-40. abstract



Frits E. Zegers (1991). Coefficients for interrater agreement. Applied Psychological Measurement, 15, 321-333. abstract



Frits E. Zegers (1989). Het meten van overeenstemming. Nederlands Tijdschrift voor de Psychologie, 44, 145-156.

“. . . one teacher gives the grades 7, 8 and 9, while the other gives the grades 2, 3 and 4, respectively, for the same essays. The pmc [product-moment correlation] between these sets of scores is maximal (+1), but it is hard to defend that the teachers fully agree with each other.”

p. 145
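
The point of the quotation can be checked numerically. The following is a minimal sketch in Python, using only the grades from the quotation; the function names are hypothetical. The third index, 2Σxy / (Σx² + Σy²), is one simple agreement measure that is sensitive to differences in level and spread; whether it matches the coefficient Zegers discusses in the cited article is an assumption, not something stated on this page.

```python
from statistics import mean


def pearson_r(x, y):
    """Product-moment correlation: insensitive to shifts in level."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_abs_difference(x, y):
    """Average absolute gap between the two raters' grades."""
    return mean([abs(a - b) for a, b in zip(x, y)])


def identity_index(x, y):
    """Agreement index on raw scores (assumed form, see lead-in):
    equals 1 only if both raters give identical grades."""
    cross = sum(a * b for a, b in zip(x, y))
    squares = sum(a * a for a in x) + sum(b * b for b in y)
    return 2 * cross / squares


teacher_a = [7, 8, 9]   # grades from the quotation
teacher_b = [2, 3, 4]

r = pearson_r(teacher_a, teacher_b)
gap = mean_abs_difference(teacher_a, teacher_b)
ident = identity_index(teacher_a, teacher_b)

print(f"correlation = {r:.2f}, mean gap = {gap}, identity index = {ident:.2f}")
```

The correlation is perfect although the two teachers differ by five grade points on every essay, which is the gap an agreement coefficient is meant to expose.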



W. K. B. Hofstee & F. E. Zegers (1991). Idiographic correlation: modeling judgments of agreement between school grades. Tijdschrift voor Onderwijsresearch, 16, 331-336.



John B. Carroll (1990). Estimating Item and Ability Parameters in Homogeneous Tests With the Person Characteristic Function. Applied Psychological Measurement, 14, 109-125. abstract



Huub van den Bergh (1990). On the Construct Validity of Multiple-Choice Items for Reading Comprehension. Applied Psychological Measurement, 14, 1-12. abstract



Michael I. Waller (1989). Modeling Guessing Behavior: A Comparison of Two IRT Models. Applied Psychological Measurement, 13, 233-243. abstract



Jerry S. Gilmer (1989). The Effects of Test Disclosure on Equated Scores and Pass Rates. Applied Psychological Measurement, 13, 245-255. abstract



Terry A. Ackerman (1989). Unidimensional IRT Calibration of Compensatory and Noncompensatory Multidimensional Items. Applied Psychological Measurement, 13, 113-127. abstract



Marion S. Aftanas (1988). Theories, Models, and Standard Systems of Measurement. Applied Psychological Measurement, 12, 325-338. abstract



Terry A. Ackerman & Philip L. Smith (1988). A Comparison of the Information Provided by Essay, Multiple-Choice, and Free-Response Writing Tests. Applied Psychological Measurement, 12, 117-128. abstract



David V. Budescu (1988). On the Feasibility of Multiple Matching Tests — Variations on a Theme by Gulliksen. Applied Psychological Measurement, 12, 5-14. abstract



David V. Budescu (1987). Open-Ended Versus Multiple-Choice Response Formats—It Does Make a Difference for Diagnostic Purposes. Applied Psychological Measurement, 11, 385-395. abstract



Wim J. van der Linden (1986). The Changing Conception of Measurement in Education and Psychology. Applied Psychological Measurement, 10, 325-332. abstract


Technocratic.



Catharina C. van Thiel & Michel A. Zwarts (1986). Development of a Testing Service System. Applied Psychological Measurement, 10, 391-403. abstract



Ronald K. Hambleton & Richard J. Rovinelli (1986). Assessing the Dimensionality of a Set of Test Items. Applied Psychological Measurement, 10, 287-302. abstract



Harold Gulliksen (1986). Perspective on Educational Measurement. Applied Psychological Measurement, 10, 109-132. abstract



Neal Schmitt & Daniel M. Stults (1986). Methodology Review: Analysis of Multitrait-Multimethod Matrices. Applied Psychological Measurement, 10, 1-22. abstract



J. P. Guilford (1985). A Sixty-Year Perspective on Psychological Measurement. Applied Psychological Measurement, 9, 341-349. abstract



Anne Anastasi (1985). Some Emerging Trends in Psychological Measurement: A Fifty-Year Perspective. Applied Psychological Measurement, 9, 121-138. abstract



Gail Ironson, Susan Homan & Ruth Willis (1984). The Validity of Item Bias Techniques with Math Word Problems. Applied Psychological Measurement, 8, 391-396. abstract



Albert C. Oosterhof & Pamela K. Coats (1984). Comparison of Difficulties and Reliabilities of Quantitative Word Problems in Completion and Multiple-Choice Item Formats. Applied Psychological Measurement, 8, 287-294. abstract



Robert L. Linn & C. Nicholas Hastings (1984). Group differentiated prediction. Applied Psychological Measurement, 8, 165-172. abstract



Michael Kane & Jennifer Wilson (1984). Errors of Measurement and Standard Setting in Mastery Testing. Applied Psychological Measurement, 8, 107-115. abstract



Isaac I. Bejar (1983). Subject Matter Experts' Assessment of Item Statistics. Applied Psychological Measurement, 7, 303-310. abstract



Henk Blok & Wim E. Saris (1983). Using Longitudinal Data to Estimate Reliability. Applied Psychological Measurement 7, 295-301. abstract



Anne R. Fitzpatrick (1983). The Meaning of Content Validity. Applied Psychological Measurement 7, 3-13. abstract



Ronald K. Hambleton (1983). Application of Item Response Models to Criterion-Referenced Assessment. Applied Psychological Measurement 7, 33-44. abstract



R. A. Weitzman (1982). Sequential Testing for Selection. Applied Psychological Measurement 6, 337-51. abstract



Jo P. M. Pieters & Ad H. G. S. van der Ven (1982). Precision, Speed, and Distraction in Time-Limit Tests. Applied Psychological Measurement 6, 93-103. abstract



Rand R. Wilcox (1981). A Cautionary Note on Estimating the Reliability of a Mastery Test with the Beta-Binomial Model. Applied Psychological Measurement 5, 531-537. abstract



Lawrence J. Stricker (1981). The Role of Noncognitive Measures in Medical School Admissions. Applied Psychological Measurement 5, 313-323. abstract



Gary B. Forbach & Ronald G. Evans (1981). The Remote Associates Test as a Predictor of Productivity in Brainstorming Groups. Applied Psychological Measurement 5, 333-339. abstract



Susan E. Whitely & Lisa M. Schneider (1981). Information Structure for Geometric Analogies: A Test Theory Approach. Applied Psychological Measurement 5, 383-397. abstract



Robert L. Linn, Michael V. Levine, C. Nicholas Hastings & James L. Wardrop (1981). Item Bias in a Test of Reading Comprehension. Applied Psychological Measurement 5, 159-173. abstract



Ronald K. Hambleton (1980). Contributions to Criterion-Referenced Testing Technology: An Introduction. Applied Psychological Measurement 4, 421-424. abstract



Rand R. Wilcox (1980). Determining the Length of a Criterion-Referenced Test. Applied Psychological Measurement 4, 425-446. abstract



Lorrie Shepard (1980). Standard Setting Issues and Methods. Applied Psychological Measurement 4, 447-467. abstract



Wim J. van der Linden (1980). Decision Models for Use with Criterion-Referenced Tests. Applied Psychological Measurement 4, 469-492. abstract



George B. Macready & C. Mitchell Dayton (1980). The Nature and Use of State Mastery Models. Applied Psychological Measurement 4, 493-516. abstract



Ross E. Traub & Glenn L. Rowley (1980). Reliability of Test Scores and Decisions. Applied Psychological Measurement 4, 517-545. abstract



Robert L. Linn (1980). Issues of Validity for Criterion-Referenced Measure. Applied Psychological Measurement 4, 547-561. abstract



Ronald A. Berk (1980). A Framework for Methodological Advances in Criterion-Referenced Testing. Applied Psychological Measurement 4, 563-573. abstract



Samuel Livingston (1980). Comments on Criterion-Referenced Testing. Applied Psychological Measurement 4, 575-581. abstract



Howard Wainer (1980). A Test of Graphicacy in Children. Applied Psychological Measurement 4, 331-340. abstract



Luis M. Laosa (1980). Measures for the Study of Maternal Teaching Strategies. Applied Psychological Measurement 4, 355-366. abstract



Robert B. Frary (1980). The Effect of Misinformation, Partial Information, and Guessing on Expected Multiple-Choice Test Item Scores. Applied Psychological Measurement 4, 79-90. abstract



Wim J. van der Linden (1979). Binomial Test Models and Item Difficulty. Applied Psychological Measurement 3, 401-411. abstract



D. Magnusson & G. Backteman (1978). Longitudinal Stability of Person Characteristics: Intelligence and Creativity. Applied Psychological Measurement 2, 481-490. abstract



R. R. Schmeck & F. D. Ribich (1978). Construct Validation of the Inventory of Learning Processes. Applied Psychological Measurement 2, 551-562. abstract



Robert T. Keller & Winford E. Holland (1978). A Cross-Validation Study of the Kirton Adaption-Innovation Inventory in Three Research and Development Organizations. Applied Psychological Measurement 2, 563-570. abstract



Wim J. van der Linden & Gideon J. Mellenbergh (1978). Coefficients for Tests from a Decision Theoretic Point of View. Applied Psychological Measurement 2, 119-134. abstract



Wim J. van der Linden & Gideon J. Mellenbergh (1977). Optimal Cutting Scores Using A Linear Loss Function. Applied Psychological Measurement 1, 593-599. abstract



Norman Frederiksen & William C. Ward (1978). Measures for the Study of Creativity in Scientific Problem-Solving. Applied Psychological Measurement 2, 1-24. abstract



Susan E. Whitely (1977). Information-Processing on Intelligence Test Items: Some Response Components. Applied Psychological Measurement 1, 465-476. abstract



Robyn M. Dawes (1977). Suppose We Measured Height With Rating Scales Instead of Rulers. Applied Psychological Measurement 1, 267-273. abstract; pdf






P. W. Van Rijn, T. J. H. M. Eggen, B. T. Hemker & P. F. Sanders (2002). Evaluation of Selection Procedures for Computerized Adaptive Testing with Polytomous Items. Applied Psychological Measurement, 26, 393-411. abstract



Dimiter M. Dimitrov (2007). Least Squares Distance Method of Cognitive Validation and Analysis for Binary Items Using Their Item Response Theory Parameters. Applied Psychological Measurement, 31, 367-387. abstract

I am afraid that all of this is tremendously complicated, and completely irrelevant. The theoretical background is stimulus-response theory, but that in itself need not be wrong. I have no time to sort this out now.



Donald W. Zimmerman & Richard H. Williams (2003). A New Look at the Influence of Guessing on the Reliability of Multiple-Choice Tests. Applied Psychological Measurement, 27, 357-371. abstract



Theo J. H. M. Eggen & Angela J. Verschoor (2006). Optimal Testing With Easy or Difficult Items in Computerized Adaptive Testing. Applied Psychological Measurement, 30, 379-393. abstract



Wim van der Linden (2006). Equating Error in Observed-Score Equating. Applied Psychological Measurement, 30, 355-378. abstract



Wim van der Linden (2006). Equating Scores From Adaptive to Linear Tests. Applied Psychological Measurement, 30, 493-. abstract



Neil J. Dorans, Jinghua Liu & Shelby Hammond (2008). Anchor Test Type and Population Invariance: An Exploration Across Subpopulations and Test Administrations. Applied Psychological Measurement, 32, 81-97. abstract



Robert L. Brennan (2008). A Discussion of Population Invariance. Applied Psychological Measurement, 32, 102-114. abstract



Qing Yi, Deborah J. Harris & Xiaohong Gao (2008). A Discussion of Population Invariance of Equating. Applied Psychological Measurement, 32, 98-101. abstract

“If the conversions for various subgroups of interest are not comparable or population invariant, then the psychometric implication is that different conversions should be used for different groups. However, in practice, testing programs cannot use different linkings for different groups. In today’s social and political climate, it would be very difficult for a testing program to justify assigning different reported scores to two candidates from different groups who have the same number-correct score on the test. So if the results of population invariance studies show indications of population sensitivity, then great care needs to be taken in selecting a data collection design and a subpopulation (of the total testing population) for use for all item and test analyses and for score equating. And the subpopulation for which score comparability is expected to hold should be specified in the programs’ technical manual. Careful specification of the analysis population used for a test will improve score equity and improve scale stability across test administrations and test forms.”
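
The consequence described in the quotation can be illustrated with a toy calculation. The sketch below uses simple linear equating (matching means and standard deviations), which is swapped in here for illustration and is not necessarily the method used in the cited studies. All scores are invented, and the names `linear_equating`, `group_a_x` and so on are hypothetical. It assumes a single-group design in which each subgroup took both form X and form Y.

```python
from statistics import mean, pstdev


def linear_equating(x_scores, y_scores):
    """Return a function converting a form-X score to the form-Y scale
    by matching the mean and standard deviation of the two forms."""
    mx, sx = mean(x_scores), pstdev(x_scores)
    my, sy = mean(y_scores), pstdev(y_scores)

    def convert(x):
        return my + (sy / sx) * (x - mx)

    return convert


# Invented number-correct scores for two subgroups on both forms.
group_a_x = [10, 12, 14, 16, 18]
group_a_y = [12, 14, 16, 18, 20]   # form Y is easier for group A
group_b_x = [8, 10, 12, 14, 16]
group_b_y = [8, 10, 12, 14, 16]    # forms equally hard for group B

convert_a = linear_equating(group_a_x, group_a_y)
convert_b = linear_equating(group_b_x, group_b_y)

raw = 14
equated_a = convert_a(raw)
equated_b = convert_b(raw)

# A non-zero difference means the conversion is population-sensitive.
print(f"raw {raw}: group A -> {equated_a:.1f}, group B -> {equated_b:.1f}")
```

In this constructed case the same number-correct score would receive two different reported scores depending on the subgroup used for equating, which is the situation the authors say a testing program cannot defend in practice.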



Qing Yi, Deborah J. Harris & Xiaohong Gao (2008). Invariance of Equating Functions Across Different Subgroups of Examinees Taking a Science Achievement Test. Applied Psychological Measurement, 32, 62-80. abstract



Robert Semmes, Mark L. Davison & Catherine Close (2011). Modeling individual differences in numerical reasoning speed as a random effect of response time limits. Applied Psychological Measurement, 35, 433-446. abstract


With arithmetic tests the question is: are we testing (differences in) arithmetic skill, intelligence, or what? The answer also depends on the time available for taking the test: does everyone have ample time to finish the work, or is time so limited that a non-negligible number of participants cannot give proper attention to all items? Translated to the Dutch situation of the arithmetic tests that are being added to the examinations in secondary education: does the technique of digital test administration put pupils in a situation of too little time to answer all items properly? If so, a time factor is in play. Digital administration creates a complicated situation that is not the same as limited time for the whole test: if an item cannot be answered immediately, the pupil (in the current software used by Cito) cannot return to that item later. From the reference list:



Wim J. van der Linden (2011). Setting time limits on tests. Applied Psychological Measurement, 35, 183-199. abstract




Jihyun Lee & James Corter (2011). Diagnosis of subtraction bugs using Bayesian networks. Applied Psychological Measurement, 35, 27-47. abstract




Timo M. Bechger, Gunter Maris & Ya Ping Hsiao (2010). Detecting Halo Effects in Performance-Based Examinations. Applied Psychological Measurement, 35, 27-47. abstract




Wim J. van der Linden & Marie Wiberg (2010). Local Observed-Score Equating With Anchor-Test Designs. Applied Psychological Measurement, 35, 27-47. abstract




Robert C. Daniel & Susan E. Embretson (2010). Designing Cognitive Complexity in Mathematical Problem-Solving Items. Applied Psychological Measurement, 35, 27-47. abstract




Susan Embretson (Ed.) (2010). Measuring psychological constructs. Advances in model-based approaches. American Psychological Association. site



William W. Cooley & Paul R. Lohnes (1976). Evaluation Research in Education. Irvington Publishers.



William W. Cooley & Paul R. Lohnes (1962). Multivariate Procedures for the Behavioral Sciences. Wiley. Lib. Congress 62-18990.



Daniel H. Robinson, Joel R. Levin, Leslie O'Ryan & Duane Halbur-Ramseyer (2001). Does Statistical Language Constitute a "Significant" Roadblock to Readers' Interpretations of Research Results? Journal of Educational Psychology, 93, 646-654. abstract




AERA, APA & NCME (1999). The Standards for Educational and Psychological Testing. see here - unauthorized summary



Anne E. Magurran & Brian J. McGill (Eds.) (2011). Biological Diversity. Frontiers in measurement and assessment. Oxford University Press.



Alexander W. Wiseman (2010). The uses of evidence for educational policy making: global contexts and international trends. Review of Research in Education, 34, 1-24.



Richard J. Murnane & John B. Willett (2011). Methods Matter. Improving Causal Inference in Educational and Social Science Research. Oxford University Press. [mentioned in blog 11 of the series on realistic arithmetic]



Robert F. Dedrick, John M. Ferron, Melinda R. Hess, Kristine Y. Hogarty, Jeffrey D. Kromrey, Thomas R. Lang, John D. Niles, and Reginald S. Lee (2009). Multilevel Modeling: A Review of Methodological Issues and Applications. Review of Educational Research, 79, 69-102



G. van den Berg (1981). Onderwijskundig onderzoek: twee doelstellingen, één onderzoeksmodel. Pedagogische Studiën, 58, 213-225.



W. Wardekker (1979). Interdisciplinaire onderwijskunde: modellen voor een wetenschap. Pedagogische Studiën, 56, 183-196. Not relevant.



Theresa Ann Sipe & William L. Curlette (Guest eds.) (1997). A meta-synthesis of factors related to educational achievement: A methodological approach to summarizing and synthesizing. International Journal of Educational Research, 25 #7, 583-698.



Richard P. Phelps (Ed.) (2009). Correcting fallacies about educational and psychological testing. APA. [in KB as eBook] [ UBL PEDAG. 64.b.44 ]




J. Tinbergen (1936). Grondproblemen der theoretische statistiek. De Erven F. Bohn.



Donald W. Zimmerman (2009). The Reliability of Difference Scores in Populations and Samples. Journal of Educational Measurement, 46, 19-42.



Stephen Gorard (2010) 'All evidence is equal: the flaw in statistical reasoning' Oxford Review of Education, 36, 63 -- 77.



Stephen B. Broomell & David V. Budescu (2009). Why are experts correlated? Decomposing correlations between judges. Psychometrika, 74, 531-553.



Rolf Haenni (2008). Aggregating referee scores: An algebraic approach. In U. Endriss and P. W. Goldberg: COMSOC'08, 2nd International Workshop on Computational Social Choice, 277-288, 2008. pdf



F. Roels (Ed.) (1928). Cinquième Conférence Internationale de Psychotechnique Tenue à Utrecht. Comptes-Rendus. Dekker & v.d. Vegt.



Andrew H. Jazwinski (1970). Stochastic Processes and Filtering Theory. Academic Press. (Applications: measurement procedures, error correction.)



Willem K. B. Hofstee (2009). Promoting intersubjectivity: a recursive-betting model of evaluative judgments. Netherlands Journal of Psychology, 65. abstract


Notes: toetsmodellen.htm#Hofstee_intersubjectivity



Tilmann Gneiting & Adrian E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378. pdf



William Meredith (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.




Audrey Amrein-Beardsley (2008). Methodological concerns about the education value-added system. Educational Researcher, 37, 65-75.



Robert E. Slavin (2008). Perspectives on evidence-based research in education. Educational Researcher, 37, 5-14.



Kenneth R. Howe (2009). Epistemology, methodology, and education sciences. Positivist dogmas, rhetoric, and the education science question. Educational Researcher, 38, 428-440. pdf




David J. Bartholomew, Ian J. Deary & Martin Lawn (2009). The origin of factor scores: Spearman, Thomson and Bartlett. British Journal of Mathematical and Statistical Psychology, 62, 569-582. Nice, historical.



A. Shreider (1964). Method of statistical testing. Monte Carlo method. Elsevier.




Keith Morrison (2009). Causation in Educational Research. Routledge.



Thomas D. Cook & Donald T. Campbell (1979). Quasi-Experimentation. Design & Analysis Issues for Field Settings. Rand McNally.



George A. Anastassiou (2010). Probabilistic Inequalities. World Scientific. isbn 9789814280785 981428078X



Stephen Stark, Oleksandr S. Chernyshenko & Fritz Drasgow (2004). Examining the Effects of Differential Item (Functioning and Differential) Test Functioning on Selection Decisions: When Are Statistically Significant Effects Practically Important? Journal of Applied Psychology, 89, 497-508. abstract



Schmidt, Frank L., and Ryan D. Zimmerman (2004). A counterintuitive hypothesis about employment interview validity and some supporting evidence. Journal of Applied Psychology, 89, 553-561. abstract



Timothy A. Judge, Amy E. Colbert & Remus Ilies (2004). Intelligence and Leadership: A Quantitative Review and Test of Theoretical Propositions. Journal of Applied Psychology, 89, 542-552.



Frederick L. Oswald, Neal Schmitt, Brian H. Kim, Lauren J. Ramsay, and Michael A. Gillespie (2004). Developing a Biodata Measure and Situational Judgment Inventory as Predictors of College Student Performance. Journal of Applied Psychology, 89, 187-207. pdf



Gilian B. Yeo & Andrew Neal (2004). A Multilevel Analysis of Effort, Practice, and Performance: Effects of Ability, Conscientiousness, and Goal Orientation. Journal of Applied Psychology, 89, 231-247. abstract



Weekley, Jeff A., Frank Blake, Edward J. O'Connor, and Lawrence H. Peters (1985). A comparison of three methods of estimating the standard deviation of performance in dollars. Journal of Applied Psychology, 70, 122-126.



Richard R. Reilly & James W. Smither (1985). An Examination of Two Alternative Techniques to Estimate the Standard Deviation of Job Performance in Dollars. Journal of Applied Psychology, 70, 651-661. abstract



Burke, Michael J., and James T. Fredrick (1986). A comparison of economic utility estimates for alternative SDy estimation procedures. Journal of Applied Psychology, 71, 334-339.



C-L. C. Kulik & J. A. Kulik (1982). Effects of ability grouping on secondary school students: a meta-analysis of evaluation findings. American Educational Research Journal, 19, 415-428.



J. A. Kulik, R. L. Bangert-Drowns & C-L. C. Kulik (1984). Effectiveness of coaching for aptitude tests. Psychological Bulletin, 95, 179-188.



Robert L. Bangert-Drowns, James A. Kulik & Chen-Lin C. Kulik (1983). Effects of coaching programs on achievement test performance. Review of Educational Research, 53, 571-585. abstract



Robert J. Mislevy (1993). A framework for studying differences between multiple-choice and free-response test items. In Randy Elliot Bennett and William C. Ward (Eds.), Construction versus choice in cognitive measurement (pp. 75-106). Erlbaum.

This really is a miserable definition; Mislevy can do much better than this. It is miserable because it does not exclude anything; anything goes here. Nevertheless, it has appeared in print, and as such it reveals a kind of oversimplification that tends to be typical of a lot of psychometric work. It seems the thinking goes into the models themselves, not into the situations they are supposedly representing.
The educational decisions, by the way, are strictly reserved to institutional representatives. Mislevy does not see students as making their own decisions, whether on the basis of test results or any other information. This, in my opinion, is unprofessional neglect that has somehow come to be regarded as professional, notwithstanding the one chapter in Cronbach and Gleser, 1957, that emphasizes the individual decision maker.



Gideon J. Mellenbergh & Wulfert P. van den Brink (1998). The Measurement of Individual Change. Psychological Methods, 3, 470-485. abstract



David J. Weiss and Shannon Von Minden (2011). Measuring Individual Growth With Conventional and Adaptive Tests. Journal of Methods and Measurement in the Social Sciences Vol. 2, No. 1, 80-101. pdf



J. B. Carlin & D. B. Rubin (1991). Summarizing multiple-choice tests using three informative statistics. Psychological Bulletin, 110, 338-349. beta-binomial model



J. G. C. Verheij (1994, submitted for publication). An improved maximum likelihood procedure for estimating the parameters of the beta-binomial distribution. beta-binomial [Submitted for publication, but I do not know to which journal; googling turns up nothing, 2020] A lot of work went into it. It is of no direct use to me, because there is nothing I can refer to, but going through it once might be quite nice, so I am keeping it.



Henri Rouanet (1996). Bayesian Methods for Assessing Importance of Effects. Psychological Bulletin, 119, 149-158. pdf



Michael T. Kane (1992). An Argument-Based Approach to Validity. Psychological Bulletin, 112, 527-535. abstract




Ju-Whei Lee & J. Frank Yates (1992). How Quantity Judgment Changes as the Number of Cues Increases: An Analytical Framework and Review. Psychological Bulletin, 112, 363-377. abstract



Deborah A. Prentice and Dale T. Miller (1992). When Small Effects Are Impressive. Psychological Bulletin, 112, 160-164. pdf



Peter Z. Schochet & Hanley S. Chiang (online first 2012). What Are Error Rates for Classifying Teacher and School Performance Using Value-Added Models? Journal of Educational and Behavioral Statistics. abstract



Walter van Dyke Bingham (1937/1942). Aptitudes and Aptitude Testing. Harper & Brothers Publishers. abstract


(Not mentioned by Anne Anastasi, 1984).



Lotte Schenk-Danzinger (1953). Entwicklungstests für das Schulalter. I. Teil Altersstufe 5-11 Jahre. Wien: Verlag für Jugend und Volk. Curious book, not psychology but pedagogy. A huge number of little tests, and everything is subjective.



Barbara S. Plake (Ed.) (1984). Social and Technical Issues in Testing. Implications for Test Construction and Usage. Erlbaum. pdf's




Edward F. Alf & Donald D. Dorfman (1967). The classification of individuals into two criterion groups on the basis of a discontinuous payoff function. Psychometrika, 32, 115-123.



Ellen Condliffe Lagemann (2000). An Elusive Science: The Troubling History of Education Research. University of Chicago Press. isbn 0226467724 review short review 1997 article: http://www.jstor.org/discover/10.2307/1176271



F. Allan Hanson (1993). Testing testing. Social consequences of the examined life. University of California Press. online


p. 81 APA statement on selection using the lie detector! If it is 85% accurate, more candidates are wrongly branded as liars than are correctly identified as liars. p. 114 Bentham's panopticon!
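
The p. 81 remark about 85% accuracy is a base-rate argument, and a small calculation makes it concrete. The sketch below is my own illustration, not taken from Hanson or the APA statement. It assumes the detector is equally accurate for liars and for honest candidates (sensitivity = specificity = 0.85), and the 10% share of liars among candidates is a purely hypothetical figure. Under those assumptions the claim holds whenever fewer than 15% of the candidates actually lie.

```python
def screening_counts(n_candidates, base_rate, accuracy):
    """Return (true_positives, false_positives) for a yes/no screening test.

    Assumption for illustration only: the test is equally accurate for
    liars and for honest candidates (sensitivity = specificity = accuracy).
    """
    liars = n_candidates * base_rate
    honest = n_candidates - liars
    true_positives = liars * accuracy            # liars correctly flagged
    false_positives = honest * (1.0 - accuracy)  # honest candidates wrongly flagged
    return true_positives, false_positives


def break_even_base_rate(accuracy):
    """Base rate of liars at which false positives equal true positives."""
    # (1 - p) * (1 - a) = p * a  solves to  p = 1 - a
    return 1.0 - accuracy


if __name__ == "__main__":
    # Hypothetical applicant pool: 1000 candidates, 10% of whom lie.
    tp, fp = screening_counts(1000, base_rate=0.10, accuracy=0.85)
    print(f"correctly flagged: {tp:.0f}, wrongly flagged: {fp:.0f}")
```

With these hypothetical numbers, 85 liars are correctly flagged while 135 honest candidates are wrongly flagged. If the share of liars rises above 15%, the balance tips the other way.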



Royce R. Ronning, Jane C. Conoley, John A. Glover, and Joseph C. Witt (Eds.) (1987). The influence of cognitive psychology on testing. Buros-Nebraska Symposium on Measurement and Testing. Volume 3. Erlbaum. isbn 0898598982 open access



The chapters of all volumes of this series are available online as pdf's: http://digitalcommons.unl.edu/buroscogpsych/



Earl Hunt (1974): Quote the Raven? Nevermore! pp 129-158 in Lee W. Gregg (Ed.) (1974). Knowledge and Cognition. Erlbaum. goo.gl/a63ThD




Alfred Binet & Théodore Simon (1916/1973 reprint). The development of intelligence in children. (The Binet-Simon Scale). Translated by Elizabeth S. Kite. Reprint: New York: Arno Press. isbn 0405051350 https://archive.org/details/developmentofint00bineuoft




R. W. van der Giessen (1957). Enkele aspecten van het probleem der predictie in de psychologie, speciaal met het oog op de selectie van militair personeel. Swets en Zeitlinger. dissertation VU, propositions


Good overview of this field, in the Netherlands but also in the UK and the US.



Abe D. Hofman, Brenda R. J. Jansen, Susanne M. M. de Mooij, Claire E. Stevenson and Han L. J. van der Maas (2018). A Solution to the Measurement Problem in the Idiographic Approach Using Computer Adaptive Practicing. Journal of Intelligence, 6(1), 14. open




Janne Adolf, Noémi K. Schuurman, Peter Borkenau, Denny Borsboom and Conor V. Dolan (2014). Measurement invariance within and between individuals: a distinct problem in testing the equivalence of intra- and inter-individual model structures. Frontiers in Psychology, 19 September. https://doi.org/10.3389/fpsyg.2014.00883 open




Theo J.H.M. Eggen and Bernard P. Veldkamp (Editors) (2012). Psychometrics in Practice at RCEC. academia.edu




Denny Borsboom, Gideon J. Mellenbergh, and Jaap van Heerden (2003). The Theoretical Status of Latent Variables. Psychological Review, 110, 203-219. pdf




Thomas M. Haladyna & Steven M. Downing (2004). Construct-irrelevant Variance in High-Stakes Testing. Educational Measurement: Issues and Practice




Ross E. Traub (Winter 1997). Classical test theory in historical perspective. Educational Measurement: Issues and Practice, 8-14. pdf




Neil J. Dorans (2012). The Contestant Perspective on Taking Tests: Emanations From the Statue Within. Educational Measurement: Issues and Practice, December. https://doi.org/10.1111/j.1745-3992.2012.00250.x




Darrell Bock (1997). A brief history of item response theory. Educational Measurement: Issues and Practice. 10.1111/j.1745-3992.1997.tb00605.x abstract & scihub pdf




Co van Calcar & Bert Tellegen (1967). Gedragsbeoordeling en prestatie. Enschede: Pedagogisch Centrum. [report; my copy may well be very rare]


Views of first-grade teachers on, among other things, weak readers.



Walt Haney (1984). Testing reasoning and reasoning about testing. Review of Educational Research, 54, 597-654. abstract & scihub




Wim Pesch & Albert Ponsioen (2004). Flinterdunne en flagrante Flynn-effecten bij licht verstandelijk gehandicapte kinderen. Aanbevelingen voor het gebruik van de WISC-III. De Psycholoog.




Robert J. Sternberg (1981). Testing and cognitive psychology. American Psychologist, 36, 1181-1189. abstract




Sandra Scarr (1981). Testing for children. Assessment and the many determinants of intellectual competence. American Psychologist, 36, 1159-1166. abstract




Barbara Lerner (1981). The minimum competence testing movement. Social, scientific, and legal implications. American Psychologist, 36. abstract




Daniel J. Reschly (1981). Psychological testing in educational classification and placement. American Psychologist, 36, 1094-1102. abstract




Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common sense. American Psychologist, 50(11), 912-927. https://doi.org/10.1037/0003-066X.50.11.912 abstract




Riet van Bork (2019). Interpreting psychometric models. Dissertation UvA (Denny Borsboom). read online




Wim J. van der Linden (2005). Classical test theory. In K. Kempf-Leonard (Ed.), Encyclopaedia of social measurement (Vol. 1) (pp. 301‑307). Academic Press.




Randy Bennett & Matthias von Davier (Eds.). Advancing Human Assessment. The Methodological, Psychological and Policy Contributions of ETS. open access




Matthew J. Salganik and many others (2020). Measuring the predictability of life outcomes with a scientific mass collaboration. PNAS, 117(15), 8398-8403, April 14, 2020. www.pnas.org/cgi/doi/10.1073/pnas.1915006117 open access; via Paige Harden, 'The genetic lottery'




Denny Borsboom, Jan-Willem Romeijn and Jelte M. Wicherts (2008). Measurement invariance versus selection invariance: Is fair selection possible? Psychological Methods, 13, 75-98 pdf