Literature on bias (partijdigheid)


Annotated by Ben Wilbrink (work in progress)



For now I have included here everything from my literature file that is probably of interest. There is a great deal of duplication: not only are the same wheels reinvented time and again, but nearly every article opens with an overview of the various models in circulation. Many articles are also review articles; in that category really only the most recent ones are perhaps of interest. I will weed things out in due course.


Arvey, R.D., & Faley, R.H. (1988). Fairness in selecting employees. (2nd edition) Amsterdam: Addison-Wesley.

Berk, Ronald A. (Ed.) (1982). Handbook of methods for detecting test bias. Baltimore: The Johns Hopkins University Press.

Bickel et al. (1975). Sex bias in graduate admissions: data from Berkeley. Science, 187, 398-404. Reprinted in Fairley & Mosteller (1977): Statistics and public policy.

Boehm, V.R. (1978). Populations, preselection, and practicalities: a reply to Hunter and Schmidt. Journal of Applied Psychology, 63, 15-18. (Arguments are presented indicating that Hunter and Schmidt's (1978) conclusions are both statistically questionable and irrelevant to practical issues involved in differential prediction.)

K. Bügel and P. F. Sanders (1998). Richtlijnen voor de ontwikkeling van onpartijdige toetsen. Arnhem: Cito. pdf

Denny Borsboom, Jan-Willem Romeijn and Jelte M. Wicherts (2008). Measurement invariance versus selection invariance: Is fair selection possible? Psychological Methods, 13, 75-98 pdf

Sorel Cahan and Eyal Gamliel (2006). Definition and Measurement of Selection Bias: From Constant Ratio to Constant Difference. Journal of Educational Measurement, 43, 131-144. [not yet seen]

Gregory Camilli and Lorrie Shepard (1987). The inadequacy of ANOVA for detecting test bias. Journal of Educational Statistics, 12, 87-99. pdf

Cleary, T. Anne (1968). Test bias: prediction of grades of Negro and white students in integrated colleges. Journal of Educational Measurement, 5, 115-124. first page JSTOR (access via KB membership: full article)

Cleary, & Hilton (1968). An investigation of test bias. Educational and Psychological Measurement, 28, 61-75.

Cleary, et al. (1975). Educational uses of tests with disadvantaged students. American Psychologist, 30, 15-41.

Cohen, A. S., & Kim, S-H (1993). A comparison of Lord's chi-square and Raju's area measures on detection of DIF. Applied Psychological Measurement, 17, 39-52.

Cole, Nancy S. (1972). Bias in selection. ACT Research Reports, no. 51. Also published under the same title in Journal of Educational Measurement, 10, 237-255.

Cole, N.S., and Moss, P.A. (1989). Bias in test use. In Linn, R.L. (Editor) (1989). Educational Measurement. London: Collier Macmillan Publishers, 201-220.

Cole, Nancy S., and Michael J. Zieky (2001). The new faces of fairness. Journal of Educational Measurement, 38, 369-382.

Cronbach, Lee J. (1976). Equity in selection - Where psychometrics and political philosophy meet. Journal of Educational Measurement, 13, 31-42.

Crooks (1972). An investigation of sources of bias in the prediction of job performance. A six-year study. Proceedings of the invitational conference, E.T.S.

Darlington, R.B. (1971). Another look at 'cultural fairness'. Journal of Educational Measurement, 8, 71.

Drasgow (1982). Biased test items and differential validity. Psychological Bulletin, 92, 526-531.

Edith van Eck, Ard Vermeulen and Ben Wilbrink (1994). Doelmatigheid en partijdigheid van psychologisch onderzoek bij de selectie van schoolleiders in het primair onderwijs. Amsterdam: SCO-Kohnstamm Instituut. (report 359) [Chapter 3. Het psychologisch onderzoek html; Chapter 5. Seksepartijdigheid en rendement html]

Einhorn, H. J., and A. R. Bass (1971). Methodological considerations relevant to discrimination in employment testing. Psychological Bulletin, 75, 261-269.

Feingold, A. (1994). Gender differences in personality: a meta-analysis. Psychological Bulletin, 116, 429-456.

Flaugher (1978). The many definitions of test bias. American Psychologist, 33, 671- .

Henk Van Der Flier, Gideon J. Mellenbergh, Herman J. Adèr, Marina Wijn (1984). An Iterative Item Bias Detection Method. Journal of Educational Measurement, 21, 131-145.

Frazer, Miller and Epstein (1975). Bias in prediction: a test of three models with elementary school children. Journal of Educational Psychology, 67, 490-494.

Mark J. Gierl, Jeffrey Bisanz, Gay L. Bisanz, Keith A. Boughton (2003). Identifying Content and Cognitive Skills That Produce Gender Differences in Mathematics: A Demonstration of the Multidimensionality-Based DIF Analysis Paradigm. Journal of Educational Measurement, 40, 281-306. jstor

Mark J. Gierl, Yinggan Zheng, and Ying Cui (2008). Using the attribute hierarchy method to identify and interpret cognitive skills that produce group differences. Journal of Educational Measurement, 45, 65-89. pdf in a free sample (#1, 2008), as of April 2009.

Gifford, B.R. (Editor, 1989). Test policy and test performance: education, language and culture. National Commission on Testing and Public Policy. Dordrecht: Kluwer Academic Publishers.

Goldman and Hewitt (1976). Predicting the success of black, chicano, oriental and white college students. Journal of Educational Measurement, 13, 107-118.

Gross and Su (1975). Defining a 'fair' or 'unbiased' selection model: a question of utilities. Journal of Applied Psychology, 60, 345-351.

Hedges, L. V., & Friedman, L. (1993). Gender differences in variability in intellectual abilities: a reanalysis of Feingold's results. Review of Educational Research 63, 94-105.

Helms, J.E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47, 1083-1101.

Paul W. Holland (1985?). On the study of differential item performance without IRT. pdf

Hook and Cook (1979). Equity theory and the cognitive ability of children. Psychological Bulletin, 86, p. 429.

Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-98.

Hunter, J.E., Schmidt, F.L., and Rauschenberger, J.M. (1977). Fairness of psychological tests: implications of four definitions for selection utility and minority hiring. Journal of Applied Psychology, 62, 245-260.

Hunter, Schmidt & Hunter (1979). Differential validity of employment tests by race: a comprehensive review and analysis. Psychological Bulletin, 86, 721-735.

Ironson, Gail H., and Michael J. Subkoviak (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16, 209-226.

Ironson, G.H., Guion, R.M., and Ostrander, M. (1982). Adverse impact from a psychometric perspective. Journal of Applied Psychology, 67, 419-432. (Applying latent trait theory to an analysis of a 64-item multiple choice skill test administered to 1,035 police recruits, we illustrate how two shorter tests measuring the same attribute, but having different test characteristic curves, have different degrees of adverse impact. ... We propose that the concept of adverse impact be redefined in terms of the degree to which test scores distort any underlying true subgroup differences in the attribute measured.)

Jensen, A. R. (1980). Bias in mental testing. London: Methuen.

Kaye, D. (1982). Statistical evidence of discrimination. Journal of the American Statistical Association, 77, 773-783.

Frank Kok (1988). Vraagpartijdigheid. Methodologische verkenningen. Doctoral dissertation, UvA. SCO publication 88.

Frank G. Kok, Gideon J. Mellenbergh, Henk Van Der Flier (1985). Detecting Experimentally Induced Item Bias Using the Iterative Logit Method. Journal of Educational Measurement, 22, 295-303. Jstor

Ledvinka, J., Markos, V.H., & Ladd, R.T. (1982). Long-range impact of 'fair selection' standards on minority employment. Journal of Applied Psychology, 67, 18-36.

Lewy (1973). Discrimination among individuals versus discrimination among groups. Journal of Educational Measurement, 10, 19-24.

Linn, Robert L (1973). Fair test use in selection. Review of Educational Research, 43, 139-163.

Linn, Robert L. (1976). In search of fair selection procedures. Journal of Educational Measurement, 13, 53-58.

Linn, Robert L. (1978). Single-group validity, differential validity, and differential prediction. Journal of Applied Psychology, 63, 507-512.

Linn, Robert L. (1984). Selection bias: multiple meanings. Journal of Educational Measurement, 21, 33-47.

Linn, Robert L, & Harnisch, D.L. (1981). Interactions between item content and group membership on achievement test items. Journal of Educational Measurement, 18, p. 109-..

Linn, R. L., and C. N. Hastings (1984). Group Differentiated Prediction. Applied Psychological Measurement, 8, 165-172.
Abstract: Studies of predictive bias have frequently shown that a prediction equation based on majority group members tends to overpredict the criterion performance of minority group members. Two statistical artifacts that may cause the overprediction finding are reviewed and evaluated using data for black and white students at 30 law schools. It is shown that (1) the degree of overprediction decreases as the predictive accuracy for white students increases, and (2) that overprediction can be caused by the effects of selection on variables not included in the regression model. Use of Heckman's (1979) procedure to adjust the estimates of the regression parameters was found to essentially eliminate overprediction.
p. 165: Predictive bias has been the focus of a substantial number of studies in a wide variety of selection situations, including military, employment, and educational settings. The basic paradigm of these studies is quite familiar by now. Within-group regression equations are computed and the standard errors of prediction, the slopes, and the intercepts are compared. If different prediction systems are obtained for two groups, e.g., a minority group and a majority group, or men and women, then the use of an equation based upon one group will result in systematic errors of prediction when applied to the other group. The natural question is then: What is the magnitude and direction of those systematic errors? A common approach to answering this question is to use the majority group prediction equation to obtain predictions for values of the predictors equal to the minority group means and to compare these predictions to the actual minority group mean on the criterion.
Alternatively, predictions based on the two equations may be made for various combinations of the two predictors to define regions where one equation yields higher predictions than the other. The naive expectation, in keeping with a belief that tests are biased against minority group members, was that the predicted criterion performance from the majority group equation would be lower than the actual performance of minority group members. That is, that there would be a bias against minority group members in the sense that their criterion performance would be underpredicted. However, the results of most studies run counter to this expectation. The bulk of the evidence shows either no difference in the predictions from minority and majority group equations or that the majority group equation tends to overpredict the minority group performance (Linn, 1982; Schmidt & Hunter, 1981). These results led Schmidt and Hunter (1981, p. 1128) to conclude that "cognitive ability tests . . . are fair to minority group applicants in the sense that they do not underestimate expected job performance of minority groups."
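
[Note: the "common approach" described in the excerpt above (fit the majority group equation, predict at the minority group predictor mean, compare with the actual minority criterion mean) might be sketched as below. This is only an illustration under simplifying assumptions of my own, not Linn and Hastings' analysis: a single predictor, ordinary least squares, made-up data, and hypothetical helper names `fit_line` and `overprediction_at_minority_mean`. The Heckman adjustment mentioned in the abstract is not included.]

```python
def fit_line(pairs):
    """Ordinary least-squares line through (predictor, criterion) pairs.

    Returns (intercept, slope).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def overprediction_at_minority_mean(majority, minority):
    """Majority-equation prediction at the minority predictor mean,
    minus the actual minority criterion mean.

    Positive: the majority equation overpredicts minority performance.
    Negative: it underpredicts.
    """
    intercept, slope = fit_line(majority)
    n = len(minority)
    mean_x = sum(x for x, _ in minority) / n
    mean_y = sum(y for _, y in minority) / n
    predicted = intercept + slope * mean_x
    return predicted - mean_y


# Made-up (predictor, criterion) data, for illustration only.
majority = [(0.0, 1.0), (2.0, 2.0), (4.0, 3.0)]
minority = [(1.0, 1.0), (3.0, 2.0)]

gap = overprediction_at_minority_mean(majority, minority)
print("prediction minus actual minority mean:", gap)
```

With these made-up numbers the gap is positive, i.e. the majority equation overpredicts. The alternative mentioned in the excerpt (comparing the two within-group equations over regions of the predictor space) would need a second call to `fit_line` on the minority data.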

Gitta H. Lubke, Conor V. Dolan, Henk Kelderman, Gideon J. Mellenbergh (2003). On the relationship between sources of within- and between-group differences and measurement invariance in the common factor model. Intelligence, 31, 543–566.

Gideon J. Mellenbergh (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7, 105-118. Jstor

Mellenbergh, G. J. (1989). Item bias and item response theory. International Journal of Educational Research, 13, 127–143.

Meredith, William (1965). A method for studying differences between groups. Psychometrika, 30, 15-30.

Meredith, William (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.

Meredith, William, and Roger E. Millsap (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289-311.

Millsap, Roger E. (2007). Invariance in measurement and prediction revisited. Psychometrika, 72, 461-473. fc

Millsap, Roger E., and Howard T. Everson (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334. [not yet seen; I still need to get hold of a pdf, or simply a photocopy]

Millsap, Roger E., and Oi-Man Kwok (2004). Evaluating the Impact of Partial Factorial Invariance on Selection in Two Populations. Psychological Methods, 9, 93-115.

Moran, M. P. (1990). The problem of cultural bias in personality assessment. In Reynolds, C. R., & Kamphaus, R. W. (Eds.) (1990). Handbook of psychological and educational assessment of children. Personality, behavior, & context. London: The Guilford Press, 491-523.

Novick, Melvin R., & D. D. Ellis (1977). Equal opportunity in educational and employment selection. American Psychologist, 32, 306-320.

Novick, Melvin R., & Nancy S. Petersen (1976). Towards equalizing educational and employment opportunity. Journal of Educational Measurement, 13, 77-88.

Oppler, S.H., Campbell, J.P., Pulakos, E.D., & Borman, W.C. (1992). Three approaches to the investigation of subgroup bias in performance measurement: review, results, and conclusions. Journal of Applied Psychology, 77, 201-217.

Steven Osterlind (1987). Psychometric validity for test bias in the work of Arthur Jensen. In Sohan Modgil and Celia Modgil (Eds) (1987). Arthur Jensen. Consensus and controversy (191-198). The Falmer Press. (Shepard replies to Osterlind, Osterlind replies to Shepard, Gordon replies to Shepard, Gordon replies to Osterlind, Scheuneman replies to Osterlind, Osterlind replies to Gordon, Osterlind replies to Scheuneman; 199-211)

Petersen, Nancy S., & Melvin R. Novick (1976). An evaluation of some models for culture-fair selection. Journal of Educational Measurement, 13, 3-30.

Rudner, Lawrence M., Pamela R. Getson & David L. Knight (1980). Biased item detection techniques. Journal of Educational Statistics, 5, 213-233.

Rudner, Lawrence M., Pamela R. Getson & David L. Knight (1980). A Monte Carlo Comparison of Seven Biased Item Detection Techniques. Journal of Educational Measurement, 17, 1-10.

Sawyer, R.L., Cole, N.S., & Cole, J.W.L. (1976). Utilities and the issue of fairness in a decision theoretic model for selection. Journal of Educational Measurement, 13, 59-76.

Janice Scheuneman (1979). A Method of Assessing Bias in Test Items. Journal of Educational Measurement, 16, 143-152.

Janice Dowd Scheuneman (1982). A posteriori analysis of biased items. In Ronald A. Berk: Handbook of methods for detecting test bias (pp. 180-198). The Johns Hopkins University Press.

Janice Dowd Scheuneman (1987). An Experimental, Exploratory Study of Causes of Bias in Test Items. Journal of Educational Measurement, 24, 97-118.

Janice Dowd Scheuneman (1987). An argument opposing Jensen on test bias: The psychological aspects. In Sohan Modgil and Celia Modgil (Eds) (1987). Arthur Jensen. Consensus and controversy (155-170). The Falmer Press. (Reply by Gordon, Reply to Gordon; 171-175)

Janice Dowd Scheuneman and Kalle Gerritz (1990). Using Differential Item Functioning Procedures to Explore Sources of Item Difficulty and Group Performance Characteristics. Journal of Educational Measurement, 27, 109-131. Jstor

Tamara van Schilt-Mol (2007). Differential Item Functioning en Itembias in de Cito-Eindtoets Basisonderwijs. Aksant Academic Publishers. Doctoral dissertation, Tilburg University.

Schmitt, A.P., and Dorans, N.J. (1990). Differential item functioning for minority examinees on the SAT. Journal of Educational Measurement, 27, 67-80.

Shepard et al. (1981). Comparison of procedures for detecting test-item bias with both internal and external ability criteria. Journal of Educational Statistics, 6, 317-375.

Lorrie Shepard (1982). Definitions of bias. In Ronald A. Berk: Handbook of methods for detecting test bias (pp. 9-30). The Johns Hopkins University Press.

Lorrie Shepard, Gregory Camilli and David M. Williams (1984). Accounting for statistical artifacts in item bias research. Journal of Educational Statistics, 9, 93-128. pdf

Lorrie A. Shepard (1987). The case for bias in tests of achievement and scholastic aptitude. In Sohan Modgil and Celia Modgil (Eds) (1987). Arthur Jensen. Consensus and controversy (177-190). The Falmer Press.

Lorrie Shepard, Gregory Camilli and David M. Williams (1985). Validity of Approximation Techniques for Detecting Item Bias. Journal of Educational Measurement, 22, 77-105.

Gary Skaggs, Robert W. Lissitz (1992). The Consistency of Detecting Item Bias across Different Test Administrations: Implications of Another Failure. Journal of Educational Measurement, 29, 227-242.

Stanley (1971). Predicting college success of the educationally disadvantaged. Science, 171, 640-647. Reprinted in Aiken (1973: 130).

Martha L. Stocking, Ida Lawrence, Miriam Feigenbaum, Thomas Jirele, Charles Lewis, Thomas Van Essen (2002). An Empirical Investigation of Impact Moderation in Test Construction. Journal of Educational Measurement, 39, 235-252.

Thomas, G.E. (1980). Race and sex group equity in higher education: institutional and major field enrollment statuses. American Educational Research Journal, 17, 171-181.

Thorndike, Robert L. (1971). Concepts of culture-fairness. Journal of Educational Measurement, 8, 63-70.

Toepasbaarheid van psychologische tests bij allochtonen. Report of the test screening committee established by the LBR in consultation with the NIP. Utrecht: Landelijk Bureau Racismebestrijding, 1990.

Henny Uiterwijk has done interesting studies at Cito; unfortunately these (apart from the summary of his dissertation) are not available on the Cito site, and I have not yet collected them in hard copy.

Wicherts, J. M., Dolan, C. V., & Hessen, D. J. (2005). Stereotype threat and group differences in test performance: A question of measurement invariance. Journal of Personality and Social Psychology, 89, 696-716.

Jelte M. Wicherts, Conor V. Dolan, David J. Hessen, Paul Oosterveld, G. Caroline M. van Baal, Dorret I. Boomsma, Mark M. Span (2004). Are intelligence tests measurement invariant over time? Investigating the nature of the Flynn effect. Intelligence, 32, 509–537.

Jelte M. Wicherts & Roger E. Millsap (2009). The absence of underprediction does not imply the absence of measurement bias. American Psychologist, 64, 281-283.

Wilbrink (1968). Multiple discriminant analyse van de Cattell 16 P.F.Q. voor studenten in zeven studierichtingen aan de T. H. E. Eindhoven: Groep Onderwijsresearch. (internship research report, unpublished) html



More literature (older, more technical, more of the same)


P. W. Holland and D. T. Thayer (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. Braun: Test validity. Erlbaum. (pp. 129-146). questia

Paul W. Holland and Howard Wainer (1993). Differential Item Functioning. Erlbaum. questia

Xiaohui Wang, Eric T. Bradlow, Howard Wainer and Eric S. Muller (2008). A Bayesian method for studying DIF: A cautionary tale filled with surprises and delights. Journal of Educational and Behavioral Statistics, 33, 363-384.



3 July 2009 \ contact ben at at at benwilbrink.nl

http://www.benwilbrink.nl/literature/bias.htm