This publication is important to my own work for several reasons. To begin with, it presents some history of standardized testing in the early twentieth century. Essentially, though, it is about the concept of measurement and its abuses by psychologists and other social scientists. I am not interested in the abuses for their own sake, only insofar as they may have serious consequences for the ways in which we think it just to test pupil achievement. The designer of achievement test items should make clear how these items measure what they are intended to measure, and what the arguments for that intention are in the first place.
Another reason this work by Joel Michell interests me is the SPA model: strategic preparation for achievement tests. A host of measurement issues are at stake in this model; I urgently need to make clear what they are, and how they affect the model itself or its application in the search for optimal achievement testing.
"This book is about an error, an error in scientific method fundamental to quantitative psychology." (p. xi) This error, and a second one built on top of it, became locked into scientific practice. Michell sets out to investigate how this was possible, an exercise in the philosophy of science. This looks interesting to me, but the point of studying these errors and their development is to be able to avoid them from now on: will Michell's work make that possible? Does the error really affect educational measurement?
This is a very difficult book to read, because of both its message and its organization. OK, it is about an error, so what? Show me this error and its consequences, and I will follow your explanation of its origins and development. Michell does not seem to do this. I would expect him first to explain what the error might mean for people being tested, or for particular kinds of decisions based on test scores. Instead, he tells us how mistaken the godfathers of psychometrics have been in their use of the concept of measurement. In particular, I want to know in what way 'educational measurement' (as in Linn, 1989) might be 'in error.' So I will first check whether, and where, Michell tells me what this error might be in the SAT I or in its uses. If I do not find it, I will look for answers in other publications by Michell, or in publications about his work, or in the brilliant but equally difficult article by Lumsden in the Annual Review of Psychology of 1975 or thereabouts. Only then will I go on annotating Michell's text.
What more do I expect to get from this book? First, I will note any and all definitions of basic concepts such as 'attribute,' 'psychological attribute,' 'measurement,' 'number,' 'length,' 'quantitative,' and so on, because I need a good collection of them for the book Test item design. Second, I will have to draw up schemes of the relations between basic terms. Third, I will have to make an inventory of the work, ideas and mistakes of the many scientists and philosophers figuring in this book; this inventory will be my guide in studying psychometrics as well as the philosophy of science in the years to come. To be clear: I will not take Michell's position as the only correct or even the only possible one; I will annotate the inventory using work by others, among them Borsboom, Mellenbergh and Van Heerden, Lagemann, and Dijksterhuis.
This is only the preface, of course. What I would like to see are examples of supposedly quantitative attributes that turn out not to be quantitative, and what that demonstration entails for practical uses. I still do not know whether this book delivers on this point. It is possible to test whether or not 'intelligence' is a quantitative attribute (using the techniques mentioned in chapter 8). Has anything of the sort been done yet?
"Quantitative psychologists presumed that the psychological attributes which they aspired to measure were quantitative. There is no question that presuming instead of testing was an error in scientific method." [p. xi] Fechner was the original presumer.
Having read the book once, I am gradually realizing that psychologists are using a toy definition of measurement (the Stevens legacy): any rule consistently assigning numbers to objects. I, like many others, have been brainwashed by the Lords and Novicks, and the Kerlingers, of psychology into believing this crap. It also explains the fuss about construct validation, because somehow or other some sense must be made of those 'assignments.' See also 'The concept of validity' by Borsboom, Mellenbergh, and Van Heerden, which deconstructs construct validity.
To get a feel for the drama going on here, take a glance at Thorndike's influential 1904 book, in which he forcefully, page after page, sells the idea of 'measurement' in connection with psychological tests. Lagemann (2000) shows how the community of test developers in the US (the UK is quite another story) got hooked on the vast sums of money they were able to make in the first decades of the twentieth century.
Michell, on p. xii, draws the contrast between the classical idea of science as trying to find out how natural systems work, and "the view that success in science derives from the solving of 'practical' problems." Given the history of psychological testing in the twentieth century, it is certainly true that testing has been science of the second kind, while claiming to find out about 'intelligence' and the like. There is nothing wrong with trying to be practical, of course. Science has the particular branch called engineering, which has made it possible for the Earth to sustain some seven billion people (see Wilbrink and Roos, 1991, on strategic science policy). In psychology, however, as in the social sciences in general, 'practical science' itself influences the way psychological and social things are (Rosenthal effects, Pygmalion in the classroom, and so on), a point Michell does not explicitly mention, I think. That is a pity, for psychological tests, as well as aptitude and achievement tests, have evidently changed society, daily life and educational life alike, markedly compared to the situation a century ago. If the things these tests purport to measure do not exist in the way social scientists tell us they do, society is in trouble.
p. xiv: qualitative features of human life are mentioned here. What are they? Examples? This level of abstraction is an ongoing problem in this book: what exactly demarcates quantitative from qualitative attributes? Quantitative attributes are explained; it is then left largely to the reader to supply, by default, examples of attributes that fail to be quantitative (which might simply be because no adequate methods have yet been found to show them to be quantitative). Think of sex as an evidently qualitative attribute, and discover in this way that for a property to be qualitative is not the same as for it to be immaterial.
What about the attributes that educational assessment is about? Is there an authoritative list of what they are? Do the many 'standards' qualify as defining 'attributes'? It seems we have identified a serious problem here, haven't we? It might be, of course, that the question has been wrongly put. But then, if educational assessment is not the assessment of attributes, what might it be? One hypothesis is that assessment is part of a game in which pupils and their teachers are engaged, a game that can be described using James Coleman's social system theory (see my 1992 publication for one such exercise). More directly, the game is between the individual pupil and the system, a situation that has been modeled by Van Naerssen (1970) and by myself (the SPA project, Strategic Preparation for Achievement testing). From the perspective of the pupil, are achievement tests 'measurements'? The pupil is not trying to measure anything; she will probably be trying to prove a point by sitting the test. What is going on here, and in what way can this still be characterized as 'measurement in psychology'? That is what I will have to find out in analyzing Michell's text.
For recent work on (the concept of) causality, see the publications of Judea Pearl. The concept of quantity is treated by Michell. The last statement cited is highly remarkable, and foreshadows a disturbing journey through the work of a number of fine philosophers, such as Campbell, Cohen and Nagel, on what it is to 'measure': philosophers who seemingly lost touch with their own philosophical roots.
This is the problem with most psychological tests, and this is not a discovery claimed by Michell. Item response theories, such as those using the Rasch model, claim to solve the problem by using probabilities. "If Rasch's hypothesis is correct, the estimates can be regarded as measures of the ability involved. Some psychologists claim to be able in this way to measure intellectual abilities." [p. 12]
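The Rasch model mentioned in the quotation can be made concrete with a short sketch. It assumes the standard dichotomous form of the model, in which the probability of a correct answer is exp(θ - b) / (1 + exp(θ - b)), with θ the person's ability parameter and b the item's difficulty. The function names and parameter values are hypothetical, chosen for illustration only, and nothing here is taken from Michell. It shows merely what 'using probabilities' amounts to: the log-odds of success is θ - b, so the difference between two persons comes out the same on every item.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    theta: person ability parameter (logit scale)
    b: item difficulty parameter (same logit scale)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_odds(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# When ability equals difficulty, the model predicts an even chance of success.
print(rasch_probability(0.0, 0.0))  # 0.5

# Two hypothetical persons, one logit apart, answering three hypothetical items:
# the difference in log-odds equals theta_a - theta_b on every item.
theta_a, theta_b = 1.0, 0.0
for b in (-1.0, 0.0, 2.0):
    diff = log_odds(rasch_probability(theta_a, b)) - log_odds(rasch_probability(theta_b, b))
    print(round(diff, 6))  # 1.0 on each pass
```

Whether actual test data show this invariance is an empirical matter; the arithmetic alone does not establish that the ability involved is a quantitative attribute.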
The difficulty here is that my own naive concept of measurement is quite different: measuring length is answering the question 'how long is it?' What has become completely automatic, and is therefore no longer consciously available, is the original idea of measurement as a counting operation using a particular unit.
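The older idea of measurement as counting with a unit can be sketched as follows. As a simplifying assumption, the rod and the unit are represented by exact rational numbers (Python's `fractions.Fraction`) only so that the program can compare and lay off lengths; the function `count_units` and its decimal subdivision scheme are hypothetical illustrations. The result is obtained purely by laying copies of the unit end to end and counting them, so what the number expresses is the ratio of the rod to the unit.

```python
from fractions import Fraction


def count_units(length: Fraction, unit: Fraction, subdivisions: int = 3) -> Fraction:
    """Express a length as a number of units by laying off copies and counting.

    First count how many whole copies of the unit fit into the length; then
    divide the unit into tenths and count how many of those fit into the
    remainder, and so on for the given number of subdivisions. Whatever is
    left after the last subdivision is ignored.
    """
    if unit <= 0:
        raise ValueError("unit must be positive")
    total = Fraction(0)
    remaining = length
    current_unit = unit
    worth = Fraction(1)  # value of current_unit expressed in original units
    for _ in range(subdivisions + 1):
        count = 0
        while remaining >= current_unit:  # lay one more copy end to end
            remaining -= current_unit
            count += 1
        total += count * worth
        current_unit /= 10
        worth /= 10
    return total


rod = Fraction(37, 10)
print(count_units(rod, Fraction(1)))  # 37/10
print(count_units(rod, Fraction(1, 2)))  # 37/5
print(count_units(Fraction(1, 3), Fraction(1)))  # 333/1000
```

Changing the unit changes the number (7.4 half-units instead of 3.7 units) but not the ratio between two lengths measured with the same unit.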
In the testing of intelligence the risk is to take it for granted that the test works just like our measuring rod in the case of length. It does not; there is no natural unit of intelligence independent of the test(s) used. (I am not summarizing Michell here.)
Michell has a lot to say about this definition by Stevens. For me, it will do to observe that this definition excludes nothing except, perhaps, the random assignment of numerals. It is therefore useless. The Stevens mantra has nevertheless been repeated countless times by psychologists and many others, and has faced no competition whatsoever from any rival definition (not counting the work of Suppes, Luce and others as psychology).
This is a funny question. It has to be asked "before we can conclude that any attribute is quantitative (and therefore measurable)."
Lord and Novick did not pose the question. See Michell on this point: p. 21. "Their response to this problem is simply to stipulate that total test scores are interval scale measures of theoretical attributes and to state that to the extent that a set of test scores produce 'a good empirical predictor the stipulated interval scaling is justified' (p. 22 [in L&N]). They then see it as being a 'major problem of theoretical psychology .... to 'explain' the reason for the efficacy of any particular scaling that emerges from [such] empirical work' (p. 22). (...) No discussion of scientifically crucial tests figures at all in the text by Lord and Novick."
And so the concept of causality is made available.
What window on the mind would achievement tests offer? Are they designed to offer a window on the pupil's mind? I don't think so, knowing how they typically are constructed. Do you?
Warming up for the following chapters.
Reading this chapter I am continuously reminded of Whitehead's Process and reality, one of the great metaphysical works of the last century. Chapter three is very much about number systems, the concept of continuity, and other fundamental things and problems. It is not clear exactly how this exposition is relevant to the issue of measurement abuses in the social science community. It looks rather as if measurement issues were the driving force behind important developments in mathematics and physics in medieval and modern times.
Michell uses a paper by Otto Hölder "on the axioms of quantity and the theory of measurement." Roche (1998) does not mention this paper; how could that be? Is Roche's exposition totally independent of Michell's? I will have to look into that. It may be that Michell himself has commented on this in a later publication.
Michell uses the concept of length as his preferred case. The common-sense conception is that lengths are observable attributes, and that it is straightforward to quantify them. Both assumptions are highly problematic, however. And making that clear seems to be one of the missions of this chapter: simple attributes like length and weight are themselves theoretical terms, not observables. Nowadays it is absolutely out of the question to measure length or weight simply by observing them. By the same token, the 'assignment' of numbers to particular measurements is a theoretical act as well. The number system as such, of course, was not known in classical and medieval times: measurements were done by establishing proportions, not by assigning numbers. It might just be the case that proportions still have a life in folk math; I will have to look into that question (this has nothing to do with the developmental stages researched by Jean Piaget). What is highly intriguing is the insight that the way we automatically equate measurements with numbers indicating the quantity measured is a way of handling numbers that has been with us for only a very short time.
This chapter is a synthesis of developments in mathematics and physics (better: philosophy and logic) pertaining to measurement issues; it does not report original work by Michell. He did publish an English translation of the Hölder paper, though (Michell and Ernst, 1996, 1997).
Wow, that is not difficult at all! Why doesn't Michell stuff his text with clear examples?
It seems there is more to 'attributes' than meets the eye. Is it possible to analogously construct:
Does sex have additive structure? Does it mean anything to 'add' masculine and feminine together? Evidently not. Does intelligence have additive structure? 'Being twice as intelligent as person a' is not a statement I can interpret meaningfully. Is a grade of 'A' twice as 'good' as one of 'B'? So lots of attributes do not have additive structure. This is because, Michell would undoubtedly say, there are no numerical relations to begin with.
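The question about letter grades can be illustrated with a small sketch, assuming for the sake of the example that grades carry no more than order information. The two codings below are hypothetical; both preserve the order A > B > C > D, yet they disagree about how many times 'better' an A is than a B. Note that this is the invariance argument familiar from Stevens's scale types, not Michell's own line of reasoning, which goes further by denying that such attributes have numerical relations at all.

```python
# Two hypothetical numerical codings of the same letter grades.
coding_1 = {"A": 4, "B": 3, "C": 2, "D": 1}
coding_2 = {"A": 10, "B": 5, "C": 4, "D": 1}
order = ["A", "B", "C", "D"]  # from best to worst


def preserves_order(coding: dict[str, int], order: list[str]) -> bool:
    """True if the coding gives strictly decreasing values along the order."""
    values = [coding[grade] for grade in order]
    return all(higher > lower for higher, lower in zip(values, values[1:]))


for name, coding in (("coding_1", coding_1), ("coding_2", coding_2)):
    # Both codings are faithful to the order of the grades...
    assert preserves_order(coding, order)
    # ...but they disagree about how many times 'better' an A is than a B.
    print(f"{name}: A is {coding['A'] / coding['B']:.2f} times B")
# coding_1: A is 1.33 times B
# coding_2: A is 2.00 times B
```

On this view a statement such as 'an A is twice as good as a B' is not meaningful for ordinal data, because its truth depends on an arbitrary choice among order-preserving codings.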
So, while Michell devotes a lot of space to numerical relations and additive structure, for a good understanding of his main thesis it might have been smarter for him to discuss the ways in which a host of important attributes lack these relations and this structure.
Titles I have mentioned in my annotations.
Google Scholar Joel Michell
Michell and Ernst, 1996, 1997
Joel Michell (2000). Normal Science, Pathological Science and Psychometrics. Theory & Psychology, 10, 639-667.
Joel Michell (2003). Epistemology of Measurement: The Relevance of its History for Quantification in the Social Sciences. Social Science Information, 42, 515-534.