Joel Michell (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge University Press.


Ben Wilbrink

For several reasons this publication is important to my own work. To begin with, it presents some history on standardized testing in the early twentieth century. Essentially, though, it is about the concept of measurement, and its abuses by psychologists and other social scientists. I am interested in the abuses themselves only insofar as they can have serious consequences for the ways we think it is just to test pupil achievement. The designer of achievement test items should make clear how these items measure what they are intended to measure, and what the arguments for that intention are in the first place.

Another reason for my interest in this work by Joel Michell is the SPA model: strategic preparation for achievement tests. A host of measurement issues is at stake in this model; I urgently need to make clear what they are, and how they affect the model itself, or its application in the search for optimal achievement testing.


"This book is about an error, an error in scientific method fundamental to quantitative psychology." (p. xi) This error, and another one put on top of it, became locked into scientific practice. Michell sets out to research how this was possible, an exercise in philosophy of science. This looks interesting to me, but the point of studying these errors and their development is to be able to avoid them from now on: will the work of Michell make such a thing possible? Does the error really touch on educational measurement?

This is a very difficult book to read because of its message and organization. OK, it is about an error, so what? Show me this error and its consequences, and I will follow your explanation as to its origins and development. Michell does not seem to do this. I expect him to explain first what the error might mean for people being tested, or for particular kinds of decisions based on test scores. Instead, he tells us how mistaken the godfathers of psychometrics have been in their use of the concept of measurement. In particular I want to know in what way 'educational measurement' (such as in Linn, 1989) might be 'in error.' So I will first check whether and where Michell tells me what this error might be in the SAT I or in its uses. If I do not find it, I will look into other publications of Michell for answers. Or publications about the work of Michell. Or the brilliant but equally difficult article by Lumsden in the Annual Review of Psychology, 1975 or thereabouts. Only then will I go on annotating the text of Michell.

What more do I expect to get from this book? I will note any and all definitions of basic concepts such as 'attribute,' 'psychological attribute,' 'measurement,' 'number,' 'length,' 'quantitative,' and so on, because I need a good collection of those for the book Test item design. Second, I will have to make schemes of relations between basic terms. Third, I will have to make an inventory of the work, ideas and mistakes of the many scientists and philosophers figuring in this book; this inventory will be my guide in studying psychometrics as well as philosophy of science in years to come. To be clear: I will not take Michell's position as the only one that is right or even possible; I will annotate the inventory based on work by others, among them Borsboom, Mellenbergh and Van Heerden, Lagemann, Dijksterhuis.

Quantitative attributes are attributes having a quite specific kind of structure. [p. xi]

This is only the preface, of course. What I would like to see are examples of supposedly quantitative attributes that prove to be not so, and what that proof entails in terms of practical uses. I still do not know whether this book delivers on this point. It is possible to show whether or not 'intelligence' is a quantitative attribute (using the techniques mentioned in chapter 8). Has something of the sort been done yet?
"Quantitative psychologists presumed that the psychological attributes which they aspired to measure were quantitative. There is no question that presuming instead of testing was an error in scientific method." [p. xi] Fechner was the original presumer.
After having read the book once, it is gradually becoming clear to me that psychologists are using a toy definition of measurement (the Stevens legacy): any rule consistently assigning numbers to objects. I - and many others - have been brainwashed by the Lords and Novicks, and Kerlingers, of psychology into believing this crap. It explains also the fuss about construct validation, because somehow or other some sense must be made of those 'assignments.' See also 'The concept of validity' by Borsboom, Mellenbergh, and Van Heerden, deconstructing construct validity.

... mainstream quantitative psychologists (...) presumed that psychological attributes are quantitative. [p. xii]


... measurement is simply the assignment of numerals to objects and events according to rule.
This definition was proposed by the psychologist Stanley Smith Stevens in 1946 [Michell, p. xii]

... this second error [the definition of measurement] disguises the first [presuming psychological attributes to be quantitative] so successfully and has persisted within psychology now for more than half a century ... [p. xii]

To get a feeling for the drama that is going on here, take a glance at Thorndike's influential 1904 book, where he is forcefully - page after page - selling the idea of 'measurement' in connection with psychological tests. Lagemann (2000) shows how the community of test developers in the US (the UK is quite another story) got hooked on the vast sums of money they were able to make in the first decades of the twentieth century.

Michell, on p. xii, constructs the contrast between the classical idea of science as trying to find out how natural systems work, and "the view that success in science derives from the solving of 'practical' problems." It is certainly true, given the history of psychological testing in the 20th century, that psychological testing is science of the second kind, while pretending to find out about 'intelligence,' etcetera. There is nothing wrong with trying to be practical, of course. In science there is the particular branch called engineering, which makes it possible that the Earth now sustains some seven billion people (Wilbrink and Roos, 1991, on strategic science policy). In psychology, however, as in the social sciences in general, 'practical science' itself influences the way psychological and social things are - Rosenthal effects, Pygmalion in the classroom, etcetera - a point not explicitly mentioned by Michell, I think. That is a pity, for evidently psychological tests, as well as aptitude and achievement tests, have changed society - daily life, educational life - markedly compared to the situation one century ago. If the things these tests pretend to measure do not exist in the way social scientists tell us they do, society is in trouble.

p. xiv: Qualitative features of human life are mentioned here. What are they? Examples? This level of abstractness is an ongoing problem in this book: what exactly demarcates quantitative from qualitative attributes? Quantitative attributes get explained; it is then left somewhat to the reader to fill in - by default - examples of attributes failing to be quantitative (which might simply be the case because no adequate methods have been found yet to show them to be quantitative). Think of sex as an evidently qualitative attribute. And in that way discover that for a property to be qualitative is not the same as being immaterial.

What about the attributes educational assessment is about? Is there an authoritative list of what they are? Do the many 'standards' qualify as defining an 'attribute'? It seems we have identified a serious problem here, haven't we? It might be, of course, that the question has been wrongly put. But then, if educational assessment is not assessment of attributes, what might it be? One hypothesis is that assessment is part of a game that pupils and their teachers are engaged in, and that can be described by using James Coleman's social system theory (see Wilbrink, 1992, for one such exercise). More directly, the game is between the individual pupil and the system, a situation that has been modeled by Van Naerssen (1970) and myself (the SPA project, Strategic Preparation for Achievement testing). From the perspective of the pupil, are the achievement tests 'measurements'? The pupil is not trying to measure anything; she will probably try to prove a point in sitting the test. What is going on here, and in what way is this still to be characterized as 'measurement in psychology'? That is what I will have to find out in analyzing Michell's text.

Chapter 1. Numerical data and the meaning of measurement.


Two such [categorial features], of fundamental importance to theories in physics, are causality and quantity. The category of causality underwrites the experimental method, that of quantity, measurement.

The relationship between quantity, as a category of being, and measurement, as a method of science has never been rigorously examined. [p. 3]

For recent work on (the concept of) causality see the publications of Judea Pearl. The concept of quantity is treated by Michell. The last statement cited is highly remarkable, and forebodes a disturbing journey through the work of a number of fine philosophers (on what it is to 'measure') like Campbell, Cohen and Nagel, who seemingly lost the connection with their own philosophical roots.

intellectual abilities


Obviously, with the exception of the two possible extreme scores on any test, two people could get the same total score by getting different items correct. [p. 9-10]

This is the problem with most psychological tests, and this is not a discovery claimed by Michell. Item response theories, such as those using the Rasch model, claim to solve the problem by using probabilities. "If Rasch's hypothesis is correct, the estimates can be regarded as measures of the ability involved. Some psychologists claim to be able in this way to measure intellectual abilities." [p. 12]
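The point about equal totals, and the way the Rasch model deals with it, can be sketched in a few lines of Python. This is an illustration only, not taken from Michell: it assumes the standard dichotomous form of the Rasch model, in which the probability of a correct answer depends only on the difference between a person's ability and an item's difficulty. The item difficulties and the names `rasch_p`, `pattern_likelihood` and `patterns_with_total` are invented for the example.

```python
import math
from itertools import product

# Hypothetical item difficulties (assumed values, for illustration only).
DIFFICULTIES = [-1.0, 0.0, 0.5, 1.5]


def rasch_p(theta, b):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_likelihood(pattern, theta, difficulties=DIFFICULTIES):
    """Likelihood of a 0/1 response pattern for a person with ability theta."""
    likelihood = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        likelihood *= p if x == 1 else 1.0 - p
    return likelihood


def patterns_with_total(total, n_items=len(DIFFICULTIES)):
    """All response patterns that yield the given total score."""
    return [p for p in product((0, 1), repeat=n_items) if sum(p) == total]


# Two people with the same total score but different items correct.
person_a = (1, 1, 0, 0)
person_b = (0, 0, 1, 1)

# Under the Rasch model the ratio of the two likelihoods is the same at
# every ability level: once the total score is known, which items were
# answered correctly carries no further information about ability
# (assuming the model holds).
ratios = [
    pattern_likelihood(person_a, t) / pattern_likelihood(person_b, t)
    for t in (-2.0, 0.0, 2.0)
]
```

With four items there are six different ways to obtain a total of 2, but only one way to obtain 0 or 4, which is the exception for extreme scores in the quotation. Whether the model's assumption is actually true of a given test is, of course, exactly the empirical question at issue.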


... Ai, may be measured relative to a unit, Aj. The measure of Ai relative to Aj is r if and only if Ai/Aj is r. That is, quite generally, measurement is just the process of discovering or estimating the measure of some magnitude of a quantitative attribute relative to a given unit. That is, measurement is the discovery or estimation of the ratio of some magnitude of a quantitative attribute to a unit (a unit being, in principle, any magnitude of the same quantitative attribute). [p. 14]

measurement the discovery or estimation of the ratio of a magnitude of a quantity to a unit of the same quantity [p. 222 glossary]

The difficulty here is that the - my own - naive concept of measurement is quite different: measuring length is answering the question 'how long is it?' What has become completely automated, and is therefore no longer consciously available, is the original idea of measurement as a counting operation using a particular unit.
In the testing of intelligence the risk is to take it for granted that the test works in quite the same way as our measuring rod in the case of length. That is not the case at all; there is no natural unit of intelligence independent of the test(s) used. (I am not summarizing Michell here)
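The ratio definition quoted above can be illustrated with a minimal Python sketch (not from Michell's text). Lengths are stored as exact fractions of a metre purely for convenience, which already presupposes that length is quantitative; the objects `table` and `rod` and the function name `measure` are hypothetical.

```python
from fractions import Fraction


def measure(magnitude, unit):
    """Measure of a magnitude relative to a unit: the ratio magnitude / unit."""
    return magnitude / unit


# Hypothetical lengths, stored as exact fractions of a metre for convenience.
table = Fraction(3, 2)          # 1.5 m
rod = Fraction(3, 4)            # 0.75 m
centimetre = Fraction(1, 100)
inch = Fraction(254, 10000)     # 2.54 cm

# The numeral obtained depends on the unit chosen ...
table_in_cm = measure(table, centimetre)      # 150
table_in_inches = measure(table, inch)        # 7500/127

# ... but the ratio between two magnitudes does not.
ratio_in_cm = measure(table, centimetre) / measure(rod, centimetre)
ratio_in_inches = measure(table, inch) / measure(rod, inch)
```

Both ratios come out as 2: the table is twice as long as the rod whatever unit is used. No comparable statement is available for test scores unless the attribute behind them has first been shown to have this kind of structure.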


Measurement is the assignment of numerals to objects or events according to (any) rule. [p. 15]

Michell has a lot to say about this definition by Stevens. For me, it will do to observe that this definition does not exclude anything except, perhaps, random assignment of numerals. Therefore, it must be useless. The Stevens mantra has nevertheless been repeated countless times by psychologists and many others, and has received no competition whatsoever from any rival definition (not counting the work of Suppes, Luce and others as psychology).
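How little the definition excludes can be shown with a small Python sketch. The pupils and the three rules below are invented for illustration; none of them comes from Stevens or Michell.

```python
# Hypothetical pupils; the names are made up for this example.
pupils = ["Anna", "Bob", "Christine", "Dirk"]


def rule_alphabetical(name):
    """Assign the 1-based alphabetical rank of the name within the group."""
    return sorted(pupils).index(name) + 1


def rule_name_length(name):
    """Assign the number of letters in the name."""
    return len(name)


def rule_constant(name):
    """Assign the same numeral to everyone."""
    return 7


rules = [rule_alphabetical, rule_name_length, rule_constant]

# Each rule is consistent: the same pupil always gets the same numeral.
assignments = {rule.__name__: [rule(p) for p in pupils] for rule in rules}
```

Each of the three rules assigns numerals to objects according to a rule, so each counts as 'measurement' under the definition quoted, although none of them says anything about whether some attribute of the pupils is quantitative.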

What are the marks of quantity?

This is a question which those who accept Stevens' definition will not understand. It emphasises the fundamental, practical difference between the two concepts of measurement. [p. 19]

This is a funny question. It has to be asked "before we can conclude that any attribute is quantitative (and therefore measurable)."
Lord and Novick did not pose the question. See Michell on this point: p. 21. "Their response to this problem is simply to stipulate that total test scores are interval scale measures of theoretical attributes and to state that to the extent that a set of test scores produce 'a good empirical predictor the stipulated interval scaling is justified' (p. 22 [in L&N]). They then see it as being a 'major problem of theoretical psychology .... to 'explain' the reason for the efficacy of any particular scaling that emerges from [such] empirical work' (p. 22). (...) No discussion of scientifically crucial tests figures at all in the text by Lord and Novick."

In their quest for mental measurement, psychologists have contrived devices (tests or experimental situations) which, when appropriately applied, yield numerical data. These devices are treated as windows upon the mind, as if in the fact of yielding numerical data they revealed quantitative attributes of the mind. However, the windows upon the mind presumption dissolves the distinction between cause and effect, in this case the attributes of the mental system causing behaviour and attributes of the effects this behaviour has upon the devices contrived. That the latter possess quantitative features in no way entails that all of the former must. [p. 21-22]

And so the concept of causality is made available.

What window on the mind would achievement tests offer? Are they designed to offer a window on the pupil's mind? I don't think so, knowing how they typically are constructed. Do you?

Chapter 2. Quantitative psychology's intellectual inheritance

Warming up for the following chapters.

Euclid's concept of ratio provided a principled rationale for the application of arithmetic to continuous magnitudes and, hence, of measurement itself. [p. 32]

Chapter 3. Quantity, number and measurement in science

Reading this chapter I am continuously reminded of Whitehead's Process and Reality, one of the great metaphysical works of the last century. Chapter three is very much about number systems, the concept of continuity, and other fundamental things and problems. It is not quite clear in what way exactly this exposition is relevant to the issue of measurement abuses in the social science community. It looks more as if measurement issues were the driving force behind important developments in mathematics and physics in medieval and modern times.

Michell uses a paper by Otto Hölder "on the axioms of quantity and the theory of measurement." Roche (1998) does not mention this paper; how could that be? Is the Roche exposition totally independent of that of Michell? I will have to look into that. It might be that Michell himself has commented on this in a later publication.

Michell uses the concept of length as his preferred case. The common sense conception is that lengths are observable attributes, and that it is straightforward to quantify them. Both are highly problematic, however. And that seems to be one of the missions of this chapter: to make it clear that simple attributes like lengths and weights are themselves theoretical terms, not observables. Nowadays it is absolutely out of the question to measure length or weight simply by observing them. By the same token, that makes the 'assignment' of numbers to particular measurements a theoretical act also. The number system as such, of course, was not known in classical and medieval times: measurements were done by establishing proportions, not by assigning numbers. It might just be the case that proportions still have a life in folk math; I will have to look into that question (this has nothing to do with the developmental stages researched by Jean Piaget). What is highly intriguing is the insight that the way we automatically equate measurements with numbers indicating the quantity measured is a way of handling numbers that has been with us for only a very short time.

This chapter is a synthesis of developments in mathematics and physics - better: philosophy and logic - pertaining to measurement issues; it is not about original work done by Michell. He did publish an English translation of the Hölder paper, though (Michell and Ernst, 1996, 1997).

defining the concept of quantity

Indeed, as is evident, different lengths stand in numerical relations to one another (e.g., the length of X is 12 times the length known as the centimetre). It is the possibility of this sort of relation between different levels of an attribute, one level being r times another (where r is a positive number), that distinguishes quantitative from non-quantitative attributes. Non-quantitative attributes do not stand in numerical relations of this sort to one another. [p. 48]

sex: a non-quantitative attribute

I will think of an example here myself, because Michell keeps on talking about length. A non-quantitative attribute is sex: being male is not r times 'sexier' than being female.

Wow, that is not difficult at all! Why doesn't Michell stuff his text with clear examples?


Where we deal with a range of properties, all of the same general kind, such as the class of all lengths [or of all sexes, b.w.] the class constitutes what is here meant by an attribute.

Note: Some attributes, such as the lengths of objects, are ranges of properties; others, such as the distance between two points in space, are ranges of relations. [p. 48]

It seems there is more to 'attributes' than meets the eye. Is it possible to analogously construct:


It is possible to hypothesize 'intelligence' to be a quantitative property of persons. Then, what does it mean to say two persons differ in intelligence: is this a distance, and therefore a range of relations also? Michell surely will answer this one, let's watch out for it.


Michell seems to reserve this term to relations of the kind 'this object is 12 times the centimetre.' Be warned.
Numerical relations require additive structure. [p. 48]


Michell presents this concept [p. 48-9] in a highly abstract way. That is a sensible thing to do, of course, but it in no way helps the naive reader to understand what is so special about this additivity [of length], which we know already must be true: for any lengths a and b, there is a length c = a + b, etcetera.

Does sex have additive structure? Does it mean anything to 'add' masculine and feminine together? Evidently not. Does intelligence have additive structure? Being twice as intelligent as person a is not a statement that I can interpret meaningfully. Is a grade of 'A' two times as 'good' as one of 'B'? So, lots of attributes do not have additive structure. Because, Michell undoubtedly would say, there are no numerical relations to begin with.
So, while Michell takes lots of space to treat numerical relations and additive structure, for a good understanding of his main thesis it might have been smarter for him to treat the ways in which a host of important attributes do not have these relations and this structure.

showing how magnitudes of a quantity relate to numbers

the theory that the area of a rectangle is its length times its breadth

Chapter 8. Quantitative psychology and the revolution in measurement theory.

For example, verbal ability is the ability to do well in verbal tasks. Sometimes the best we can do in science is to identify something via its effects, but this never justifies defining it as a disposition to produce those effects, as if absurdly it has no intrinsic character, only effects. [p. 207]


Titles I have mentioned in my annotations.

Other publications by Michell


Michell and Ernst, 1996, 1997

Joel Michell (2000). Normal Science, Pathological Science and Psychometrics. Theory & Psychology, 10, 639-667.

Joel Michell (2003). Epistemology of Measurement: The Relevance of its History for Quantification in the Social Sciences. Social Science Information, 42, 515-534.

Publications on Michell

Sept. 23, 2006
