Joel Michell (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge University Press.

Annotated

Ben Wilbrink

on this annotation
This annotation is a kind of personal reconstruction of the concept of measurement in psychology. No, not a reconstruction of the book, but of the content of my mind. It is a journey from my rather naive conceptions to the exact ones presented by Michell. It is a kind of journey every student in every discipline has to make. Michell is not very concerned with the needs of the traveller, as is typical for most of the didactics everywhere in our educational systems. But that is not the point. What I am trying to get from the book is a set of precise concepts that I will use in the three main projects I am working on: selective processes in education - e.g. the use of intelligence tests - , the design of achievement test items, and a model - a theory if you want - about strategic preparation for sitting achievement tests.
As a personal construction, this annotation is rather idiosyncratic. I bring my own intellectual resources - e.g. having tried to read Whitehead's book Process and reality some three decades ago, my interest in the history of assessment in education - and I will think of examples illustrating the messages of Michell that I might use in my own work. However, as I am publishing my annotations on the www, I will do my best to make them interesting as well as understandable to the reader interested in Michell's theme - I expect one such visitor a year, please contact me.

For several reasons this publication is important to my own work. To begin with, it presents some history on standardized testing in the early twentieth century. Essentially, though, it is about the concept of measurement, and its abuses by psychologists and other social scientists. I am not interested in the abuses themselves, except insofar as they can have serious consequences for the ways we think it is just to test pupil achievement. The designer of achievement test items should make it clear how it is that these items measure what they are intended to measure, and what the arguments for the intention are in the first place.

Another reason I should be interested in this work by Joel Michell is because of the spa-model: strategic preparation for achievement tests. A host of measurement issues is at stake in this model; I very urgently need to make it clear what they are, and how they affect the model itself, or its application in the search for optimal achievement testing.

Preface

"This book is about an error, an error in scientific method fundamental to quantitative psychology." (p. xi) This error, and another one put on top of it, became locked into scientific practice. Michell sets out to research how this was possible, an exercise in philosophy of science. This looks interesting to me, but the point of studying these errors and their development is to be able to avoid them from now on: will the work of Michell make such a thing possible? Does the error really touch on educational measurement?

This is a very difficult book to read because of its message and organization. OK, it is about an error, so what? Show me this error and its consequences, and I will follow your explanation as to its origins and development. Michell does not seem to do this. I expect him to explain first what the error might mean for people being tested, or for particular kinds of decisions based on test scores. Instead, he tells us how mistaken the godfathers of psychometrics have been in their use of the concept of measurement. In particular I want to know in what way 'educational measurement' (such as in Linn, 1989) might be 'in error.' So I first will check whether and where Michell tells me what this error might be in the SAT I or in its uses. If I do not find it, I will look into other publications of Michell for answers. Or publications about the work of Michell. Or the brilliant but equally difficult article by Lumsden in the Annual Review of Psychology of 1975 or thereabouts. Only then will I go on annotating the text of Michell.

Some answers to these questions are given on pages 159-161.
"If all that a particular level of general ability means, employing Comrey's principle [numbers are used to represent the results of certain operations that have been performed], is a performance of a certain class, then the concept of general ability cannot be invoked to explain performance of that kind because the explanation and that what is to be explained are the same. In so far as a scientific explanation of some effect identifies causes and in so far as cause and effect must always be logically distinct occurrences, theoretical quantities defined by Comrey's principle fail as explanatory concepts." (p. 159)
The figure (used in 'Test item design') shows how psychologists like to see things. The point is that the intelligence concept is defined in terms of observable behavior and test scores. This kind of pseudo-theory does not explain anything; it is circular.
Cronbach and Gleser 1957:
"As Cronbach and Gleser (1957) later came to acknowledge, the usefulness of tests in predictive or decision-making contexts is an issue independent of whether or not tests measure psychological attributes. Cronbach and Gleser advocated replacing the 'measurement model' by the 'decision theory model' and, while this recommendation may have value in specific, practical contexts, the issue of whether or not such tests measure anything remains a genuine scientific issue. As already noted, discovering that a particular mental test is useful in a particular context raises the scientific issue of why that should be so, but in and of itself, it does not contribute towards settling that scientific issue and, certainly, does not imply that anything is measured." (p. 159-160)
One such practical context, without doubt, is assessment in education (a better term than 'educational measurement'). In educational assessment psychometrics should be exchanged for decision theory (my conclusion; Michell does not explicitly address educational assessment issues as such).
History is clear on this point: psychologists have opted for clinging to the concept of tests as measurements, following the lead of Stevens, instead of the pragmatic approach of Cronbach and Gleser, acknowledging that mental tests, for the time being, do not qualify as measurements (and dropping the rhetoric of 'measurements' or 'theory of mental test scores').
On pp. 160-161 Michell reveals how Stevens manipulated Campbell's statements in the Ferguson Committee in a fraudulent way (not the term Michell uses, at least not in this place).

What more do I expect to get from this book? I will note any and all definitions of basic concepts such as 'attribute,' 'psychological attribute,' 'measurement,' 'number,' 'length,' 'quantitative,' and so on, because I need a good collection of those for the book Test item design. Second, I will have to make schemes of relations between basic terms. Third, I will have to make an inventory of the work, ideas and mistakes of the many scientists and philosophers figuring in this book; this inventory will be my guide in studying psychometrics as well as philosophy of science, in years to come. Do not misunderstand me: I will not take Michell's position as the only one that is right or even possible. I will annotate the inventory based on work by others, among them Borsboom, Mellenbergh and Van Heerden, Lagemann, Dijksterhuis.

Quantitative attributes are attributes having a quite specific kind of structure. [p. xi]

This is only the preface, of course. What I would like to see are examples of supposedly quantitative attributes that prove to be not so, and what that proof entails in terms of practical uses. I still do not know whether this book delivers on this point. It is possible to show whether or not 'intelligence' is a quantitative attribute (using the techniques mentioned in chapter 8). Has something of the sort been done yet?
"Quantitative psychologists presumed that the psychological attributes which they aspired to measure were quantitative. There is no question that presuming instead of testing was an error in scientific method." [p. xi] Fechner was the original presumer.
After having read the book once, it is gradually becoming clear to me that psychologists are using a toy definition of measurement (the Stevens legacy): any rule consistently assigning numbers to objects. I - and many others - have been brainwashed by the Lords and Novicks, and Kerlingers, of psychology into believing this crap. It explains also the fuss about construct validation, because somehow or other some sense must be made of those 'assignments.' See also 'The concept of validity' by Borsboom, Mellenbergh, and Van Heerden, deconstructing construct validity.

... mainstream quantitative psychologists (...) presumed that psychological attributes are quantitative. [p. xii]

FAKE DEFINITION

... measurement is simply the assignment of numerals to objects and events according to rule.
This definition was proposed by the psychologist Stanley Smith Stevens in 1946. [Michell, p. xii]

... this second error [the definition of measurement] disguises the first [presuming psychological attributes to be quantitative] so successfully and has persisted within psychology now for more than half a century ... [p. xii]

To get a feeling for the drama that is going on here, take a glance at Thorndike's influential 1904 book, where he is forcefully - page after page - selling the idea of 'measurement' in connection with psychological tests. Lagemann (2000) shows how the community of test developers in the US (the UK is quite another story) got hooked on the vast sums of money they were able to make in the first decades of the twentieth century.

Michell, on p. xii, constructs the contrast between the classical idea of science as trying to find out how natural systems work, and "the view that success in science derives from the solving of 'practical' problems." It is certainly true, given the history of psychological testing in the 20th century, that testing is science of the second kind, while pretending to find out about 'intelligence,' etcetera. There is nothing wrong with trying to be practical, of course. In science there is the particular branch called engineering, which makes it possible that the Earth now sustains some seven billion people (Wilbrink and Roos, 1991, on strategic science policy). In psychology, however, as in the social sciences in general, 'practical science' itself influences the way psychological and social things are - Rosenthal effects, Pygmalion in the classroom, etcetera - a point not explicitly mentioned by Michell, I think. That is a pity, for evidently psychological tests, as well as aptitude and achievement tests, have changed society - daily life, educational life - markedly compared to the situation one century ago. If the things these tests pretend to measure do not exist in the way social scientists tell us they do, society is in trouble.

p. xiv: Qualitative features of human life are mentioned here. What are they? Examples? This level of abstractness is an ongoing problem in this book: what exactly demarcates quantitative from qualitative attributes? Quantitative attributes get explained; it is then left somewhat to the reader to fill in - by default - examples of attributes failing to be quantitative (which might simply be the case because no adequate methods have been found yet to show them to be quantitative). Think of sex as an evidently qualitative attribute. And that way discover that for a property to be qualitative is not the same as being immaterial.

What about the attributes educational assessment is about? Is there an authoritative list of what they are? Do the many 'standards' qualify as defining an 'attribute'? It seems we have identified a serious problem here, haven't we? It might be, of course, that the question has been wrongly put. But then, if educational assessment is not assessment of attributes, what might it be? One hypothesis is that assessment is part of a game that pupils and their teachers are engaged in, and that can be described by using James Coleman's social system theory (see my 1992 paper for one such exercise). More directly, the game is between the individual pupil and the system, a situation that has been modeled by Van Naerssen (1970) and myself (the SPA project, Strategic Preparation for Achievement testing). From the perspective of the pupil, are the achievement tests 'measurements'? The pupil is not trying to measure anything; she will probably try to prove a point in sitting the test. What is going on here, and in what way is this still to be characterized as 'measurement in psychology'? That is what I will have to find out in analyzing Michell's text.

Chapter 1. Numerical data and the meaning of measurement.

CATEGORIAL FEATURES

Two such [categorial features], of fundamental importance to theories in physics, are causality and quantity. The category of causality underwrites the experimental method, that of quantity, measurement.

The relationship between quantity, as a category of being, and measurement, as a method of science has never been rigorously examined.[p. 3]

For recent work on (the concept of) causality see the publications of Judea Pearl. The concept of quantity is treated by Michell. The last statement cited is highly remarkable, and forebodes a disturbing journey through the work of a number of fine philosophers (on what it is to 'measure') like Campbell, Cohen and Nagel, who seemingly lost the connection with their own philosophical roots.

intellectual abilities

EQUAL SCORES ARE UNEQUAL

Obviously, with the exception of the two possible extreme scores on any test, two people could get the same total score by getting different items correct. [p. 9-10]

This is the problem with most psychological tests, and this is not a discovery claimed by Michell. Item response theories, such as those using the Rasch model, claim to solve the problem by using probabilities. "If Rasch's hypothesis is correct, the estimates can be regarded as measures of the ability involved. Some psychologists claim to be able in this way to measure intellectual abilities." [p. 12]
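A minimal Python sketch, not taken from Michell, may make both points concrete. It first counts how many distinct right/wrong patterns share each total score on a four-item test, and then checks numerically that under the Rasch model two patterns with the same total are equally informative about ability: their likelihood ratio does not depend on the ability parameter. The item difficulties and the function names are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import exp, isclose

def patterns_by_total(n_items):
    """Group all right/wrong response patterns of an n-item test by total score."""
    groups = {}
    for pattern in product((0, 1), repeat=n_items):
        groups.setdefault(sum(pattern), []).append(pattern)
    return groups

def rasch_p(theta, b):
    """Rasch probability of a correct answer, given ability theta and difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def likelihood(pattern, theta, difficulties):
    """Probability of a complete response pattern under the Rasch model."""
    result = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        result *= p if x == 1 else 1.0 - p
    return result

# Only the two extreme scores correspond to a single pattern.
counts = {total: len(pats) for total, pats in patterns_by_total(4).items()}

# Hypothetical item difficulties, and two patterns with the same total score of 2.
difficulties = [-1.0, -0.5, 0.5, 1.0]
easy_items_right = (1, 1, 0, 0)
hard_items_right = (0, 0, 1, 1)

# The likelihood ratio of the two patterns at several ability levels.
ratios = [
    likelihood(easy_items_right, theta, difficulties)
    / likelihood(hard_items_right, theta, difficulties)
    for theta in (-2.0, 0.0, 2.0)
]
print(counts)
print(ratios)
```

The ratio is the same at every ability level, which is the sense in which the Rasch model treats the total score as carrying all the information about ability. Whether the model's hypothesis actually holds for a given test is a separate, empirical question, which is exactly the condition in Michell's "If Rasch's hypothesis is correct."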

MEASUREMENT DEFINED

... A_i, may be measured relative to a unit, A_j. The measure of A_i relative to A_j is r if and only if A_i/A_j is r. That is, quite generally, measurement is just the process of discovering or estimating the measure of some magnitude of a quantitative attribute relative to a given unit. That is, measurement is the discovery or estimation of the ratio of some magnitude of a quantitative attribute to a unit (a unit being, in principle, any magnitude of the same quantitative attribute). [p. 14]

measurement: the discovery or estimation of the ratio of a magnitude of a quantity to a unit of the same quantity [p. 222 glossary]

The difficulty here is that the - my own - naive concept of measurement is quite different: measuring length is answering the question 'how long is it?' What has become completely automated, and is therefore not consciously available anymore, is the original idea of measurement as a counting operation using a particular unit.
In the testing of intelligence the risk is to take it for granted that the test works quite the same as our measuring rod in the case of length. That is not the case at all; there is no natural unit of intelligence independent of the test(s) used. (I am not summarizing Michell here.)
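The ratio definition quoted above can be illustrated with a small Python sketch. It is only an illustration under a loud assumption: the magnitudes are stored as millimetre counts, which is hypothetical bookkeeping that already presupposes that length is quantitative. The point it shows is that the measure of a magnitude changes with the unit chosen, while the ratio between two magnitudes does not.

```python
from fractions import Fraction

def measure(magnitude, unit):
    """Measure of a magnitude relative to a unit: the ratio magnitude / unit."""
    return Fraction(magnitude) / Fraction(unit)

# Hypothetical magnitudes, stored as millimetre counts for bookkeeping only.
rod = 1200
table = 600

# Two units of the same quantitative attribute (length).
centimetre = 10
inch = Fraction(254, 10)

rod_in_cm = measure(rod, centimetre)
rod_in_inches = measure(rod, inch)

# The ratio between the two magnitudes is the same whatever the unit.
ratio_in_cm = measure(rod, centimetre) / measure(table, centimetre)
ratio_in_inches = measure(rod, inch) / measure(table, inch)

print(rod_in_cm, rod_in_inches)
print(ratio_in_cm, ratio_in_inches)
```

For a test score there is nothing that plays the role of `centimetre` or `inch` here: no magnitude of the attribute itself that could serve as a unit independently of the test.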

STEVENS' DEFINITION

Measurement is the assignment of numerals to objects or events according to (any) rule. [p. 15]

Michell has a lot to say about this definition by Stevens. For me, it will do to observe that this definition does not exclude anything except, perhaps, random assignment of numerals. Therefore, it must be useless. The Stevens mantra has nevertheless been repeated countless times by psychologists and many others, and has received no competition whatsoever from any rival definition (not counting the work of Suppes, Luce and others as psychology).
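How little the definition excludes can be shown with a toy Python sketch. The pupils and both rules are hypothetical and deliberately silly; each rule assigns numerals consistently, so each would count as 'measurement' under Stevens' definition, yet the two rules disagree about every ratio.

```python
def stevens_assign(objects, rule):
    """Assign a numeral to each object by applying one and the same rule to all."""
    return {obj: rule(obj) for obj in objects}

# Hypothetical pupils.
pupils = ["Anna", "Bram", "Carla"]

# Rule 1: the number of letters in the name.
by_name_length = stevens_assign(pupils, len)

# Rule 2: the alphabet position of the first letter.
by_alphabet = stevens_assign(pupils, lambda name: ord(name[0]) - ord("A") + 1)

print(by_name_length)
print(by_alphabet)

# 'Carla is so many times Anna' depends entirely on the rule chosen.
print(by_name_length["Carla"] / by_name_length["Anna"])
print(by_alphabet["Carla"] / by_alphabet["Anna"])
```

Nothing about the pupils decides between the two rules, which is one way of seeing why a definition that admits both cannot tell us whether anything has been measured.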

What are the marks of quantity?

This is a question which those who accept Stevens' definition will not understand. It emphasises the fundamental, practical difference between the two concepts of measurement. [p. 19]

This is a funny question. It has to be asked "before we can conclude that any attribute is quantitative (and therefore measurable)."
Lord and Novick did not pose the question. See Michell on this point: p. 21. "Their response to this problem is simply to stipulate that total test scores are interval scale measures of theoretical attributes and to state that to the extent that a set of test scores produce 'a good empirical predictor the stipulated interval scaling is justified' (p. 22 [in L&N]). They then see it as being a 'major problem of theoretical psychology .... to 'explain' the reason for the efficacy of any particular scaling that emerges from [such] empirical work' (p. 22). (...) No discussion of scientifically crucial tests figures at all in the text by Lord and Novick."

In their quest for mental measurement, psychologists have contrived devices (tests or experimental situations) which, when appropriately applied, yield numerical data. These devices are treated as windows upon the mind, as if in the fact of yielding numerical data they revealed quantitative attributes of the mind. However, the windows upon the mind presumption dissolves the distinction between cause and effect, in this case the attributes of the mental system causing behaviour and attributes of the effects this behaviour has upon the devices contrived. That the latter possess quantitative features in no way entails that all of the former must. [p. 21-22]

And so the concept of causality is made available.

What window on the mind would achievement tests offer? Are they designed to offer a window on the pupil's mind? I don't think so, knowing how they typically are constructed. Do you?

Chapter 2. Quantitative psychology's intellectual inheritance

Warming up for the following chapters.

Euclid's concept of ratio provided a principled rationale for the application of arithmetic to continuous magnitudes and, hence, of measurement itself. [p. 32]

Chapter 3. Quantity, number and measurement in science

Reading this chapter I am continuously reminded of Whitehead's Process and reality, one of the great metaphysical works of the last century. Chapter three is very much about number systems, the concept of continuity, and other fundamental things and problems. It is not quite clear in what way this exposition is relevant to the issue of measurement abuses in the social science community. It looks more as if measurement issues were the driving force behind important developments in mathematics and physics in medieval and modern times.

Michell uses a paper by Otto Hölder "on the axioms of quantity and the theory of measurement." Roche (1998) does not mention this paper; how could that be? Is the Roche exposition totally independent of that of Michell? I will have to look into that. It might be that Michell himself has commented on this in a later publication.

Michell uses the concept of length as his preferred case. The common sense conception is that lengths are observable attributes, and that it is straightforward to quantify them. Both assumptions are highly problematic, however. And that seems to be one of the missions of this chapter: to make it clear that simple attributes like lengths and weights themselves are theoretical terms, not observables. Nowadays it is absolutely out of the question to measure length or weight simply by observing them. By the same token, that makes the 'assignment' of numbers to particular measurements a theoretical act also. The number system as such, of course, was not known in classical and medieval times: measurements were done by establishing proportions, not by assigning numbers. It might just be the case that proportions still have a life in folk math; I will have to look into that question (this has nothing to do with the developmental stages researched by Jean Piaget). What is highly intriguing is the insight that the way we automatically equate measurements with numbers indicating the quantity measured is a way of handling numbers that has been with us for only a very short time.

This chapter is a synthesis of developments in mathematics and physics - better: philosophy and logic - pertaining to measurement issues; it is not about original work done by Michell. He did publish an English translation of the Hölder paper, though (Michell and Ernst, 1996, 1997).

defining the concept of quantity

Indeed, as is evident, different lengths stand in numerical relations to one another (e.g., the length of X is 12 times the length known as the centimetre). It is the possibility of this sort of relation between different levels of an attribute, one level being r times another (where r is a positive number), that distinguishes quantitative from non-quantitative attributes. Non-quantitative attributes do not stand in numerical relations of this sort to one another. [p. 48]

sex: a non-quantitative attribute

I will think of an example here myself, because Michell keeps on talking about length. A non-quantitative attribute is sex: being male is not r times 'sexier' than being female.

Wow, that is not difficult at all! Why doesn't Michell stuff his text with clear examples?

ATTRIBUTE

Where we deal with a range of properties, all of the same general kind, such as the class of all lengths [or of all sexes, b.w.] the class constitutes what is here meant by an attribute.

Note: Some attributes, such as the lengths of objects, are ranges of properties; others, such as the distance between two points in space, are ranges of relations. [p. 48]

It seems there is more to 'attributes' than meets the eye. Is it possible to analogously construct:

INDIVIDUAL DIFFERENCES

It is possible to hypothesize 'intelligence' to be a quantitative property of persons. Then, what does it mean to say two persons differ in intelligence: is this a distance, and therefore a range of relations also? Michell surely will answer this one, let's watch out for it.

NUMERICAL RELATIONS

Michell seems to reserve this term to relations of the kind 'this object is 12 times the centimetre.' Be warned.

Numerical relations require additive structure. [p. 48]

ADDITIVE STRUCTURE

Michell presents this concept [p. 48-9] in a highly abstract way. That is a sensible thing to do, of course, but it in no way helps the naive reader to understand what is so special about this additivity [of length], which we know already must be true: for any lengths a and b, there is a length c = a + b, etcetera.

Does sex have additive structure? Does it mean anything to 'add' masculine and feminine together? Evidently not. Does intelligence have additive structure? Being twice as intelligent as person a is not a statement that I can interpret meaningfully. Is a grade of 'A' two times as 'good' as one of 'B'? So, lots of attributes do not have additive structure. Because, Michell undoubtedly would say, there are no numerical relations to begin with.
So, while Michell takes lots of space to treat numerical relations and additive structure, for a good understanding of his main thesis it might have been smarter for him to treat the kind of ways a host of important attributes do not have these relations and structure.
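The contrast between an attribute with additive structure and one without can be sketched in Python. This is my own illustration under stated assumptions, not Michell's formal treatment: the lengths and both grade codings are hypothetical. For lengths, the relation c = a + b and the ratios between lengths survive a change of unit. For grades, two codings that preserve the same order disagree about whether 'A' is 4/3 times or 2 times 'B', so the ratio reflects the coding, not the grades.

```python
from fractions import Fraction

def same_order(coding_1, coding_2):
    """True if two numerical codings rank the same levels in the same order."""
    levels = list(coding_1)
    return sorted(levels, key=coding_1.get) == sorted(levels, key=coding_2.get)

def ratio(coding, x, y):
    """Ratio of the number coded for level x to the number coded for level y."""
    return Fraction(coding[x], coding[y])

# Hypothetical lengths: c is what you get by laying a and b end to end.
lengths_cm = {"a": 30, "b": 45, "c": 75}
lengths_mm = {name: 10 * value for name, value in lengths_cm.items()}

# Additivity and ratios hold in either unit.
additive_in_cm = lengths_cm["a"] + lengths_cm["b"] == lengths_cm["c"]
additive_in_mm = lengths_mm["a"] + lengths_mm["b"] == lengths_mm["c"]
length_ratio_cm = ratio(lengths_cm, "c", "a")
length_ratio_mm = ratio(lengths_mm, "c", "a")

# Two hypothetical codings of grades that preserve the same order.
grades_1 = {"A": 4, "B": 3, "C": 2}
grades_2 = {"A": 10, "B": 5, "C": 1}
grades_same_order = same_order(grades_1, grades_2)
grade_ratio_1 = ratio(grades_1, "A", "B")
grade_ratio_2 = ratio(grades_2, "A", "B")

print(additive_in_cm, additive_in_mm, length_ratio_cm, length_ratio_mm)
print(grades_same_order, grade_ratio_1, grade_ratio_2)
```

The sketch does not prove that grades lack additive structure; it only shows that nothing in the order of the grades fixes the ratios, so any additive structure would have to be established by other, empirical means.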

showing how magnitudes of a quantity relate to numbers

the theory that the area of a rectangle is its length times its breadth

Chapter 8. Quantitative psychology and the revolution in measurement theory.

For example, verbal ability is the ability to do well in verbal tasks. Sometimes the best we can do in science is to identify something via its effects, but this never justifies defining it as a disposition to produce those effects, as if absurdly it has no intrinsic character, only effects. [p. 207]

References

Titles I have mentioned in my annotations.

John J. Roche (1998). The mathematics of measurement. A critical history. London: Athlone.

Other publications by Michell

Google Scholar Joel Michell

Michell and Ernst, 1996, 1997

Joel Michell (2000). Normal Science, Pathological Science and Psychometrics. Theory & Psychology, 10, 639-667.

Joel Michell (2003). Epistemology of Measurement: The Relevance of its History for Quantification in the Social Sciences. Social Science Information, 42, 515-534.

not for free online http://ssi.sagepub.com/cgi/reprint/42/4/515
abstract Five episodes in the history of quantitative science provided the occasions for changes in the understanding of measurement important for attempts at quantification in the social sciences. First, Euclid's generalization of the ancient concept of measure to the concept of ratio provided a clear rationale for the use of numbers in quantitative science, a rationale that has been important through the history of science and one that contradicts the definition of measurement currently fashionable within the social sciences. Second, Duns Scotus's modelling of qualitative change upon quantitative change provided the opportunity to extend measurement from extensive to intensive attributes, a shift that makes it clear that the possibility of measuring qualitative attributes in the social sciences is not one that can be ruled out a priori. Third, Hölder's specification of the character of quantitative attributes showed that quantitative structure is a specific kind of empirical structure, one that is not logically necessary and, therefore, it shows that it is not necessary that any psychological attributes must be quantitative either. Taking the points emanating from Duns Scotus and Hölder together, the issue of whether psychological attributes are quantitative is shown to be an empirical issue. Fourth, Campbell's delineation of the categories of fundamental and derived measurement, and his subsequent critique of psychophysical measurement, showed that attempts at psychological measurement raised new challenges for measurement theory. Fifth, the articulation of the theory of conjoint measurement by Luce and Tukey reveals one way in which those challenges might be met. Taken as a whole, these episodes show that attempts at measurement in the social sciences are continuous with the rest of science in the sense that the issue of whether social science attributes can be measured raises empirical questions that can be answered only in the light of scientific evidence.

Publications on Michell

Sept. 23, 2006. Contact: ben apenstaartje benwilbrink.nl

http://www.benwilbrink.nl/literature/michell1999.htm