Measurement

Annotated references

Ben Wilbrink

Measurement on this page is broadly conceived. See the special page on validity for the special position of achievement tests as 'measurement instruments.' John Roche's (1998) The mathematics of measurement will undoubtedly be of much use, as will Denny Borsboom's (2003) work on measurement issues in psychology. At the start of the inventory my feeling is that I do not understand the least about the concept of measurement, strange as it may seem because almost all of the books and articles to be mentioned here are part of my own library. Regrettably, I recently sold the Krantz et al. volume I on measurement to someone who needed that text on a daily basis. Deep problems are involved in the concept, also in practical issues regarding achievement testing.

APPETIZER

Most people think achievement tests are instruments that 'measure' the level of mastery of student a and b, or the difference in their mastery of course content. What do you think: is there a difference that gets 'measured' here? No? Then why does everybody believe this crap? Yes? Mail me your argument, and I will publish it here.

The problem here, of course, is that a mastery of 70% - let us say as defined on the domain of test items this particular test has been drawn from - is in no way the same kind of attribute as being 186 cm tall. Whatever instrument you use to measure this height, it will give 186 cm plus or minus a small observation error. Not so with every reasonable number of items from the domain you might wish to use to 'measure' your own mastery: the results will show a bewildering variety. A statistician might tell you there is some order in the observations you make, because you are in fact using a binomial process in your measuring procedure.
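The binomial point can be made concrete with a small simulation. This is only a sketch under stated assumptions: mastery is taken to be exactly 0.70, every test is a fresh random draw of 20 items from the domain, and items are answered independently. The function names `binomial_sd` and `simulate_scores` are hypothetical, chosen for this illustration.

```python
import math
import random
import statistics


def binomial_sd(mastery, n_items):
    """Standard deviation of the proportion correct on an n-item test,
    assuming independent items each answered correctly with probability
    `mastery` (the binomial model)."""
    return math.sqrt(mastery * (1.0 - mastery) / n_items)


def simulate_scores(mastery, n_items, n_tests, seed=0):
    """Proportion correct on `n_tests` random tests of `n_items` items each."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_tests):
        # Each item is a Bernoulli trial with success probability `mastery`.
        correct = sum(rng.random() < mastery for _ in range(n_items))
        scores.append(correct / n_items)
    return scores


if __name__ == "__main__":
    scores = simulate_scores(mastery=0.70, n_items=20, n_tests=2000)
    print("lowest and highest score:", min(scores), max(scores))
    print("simulated spread:", round(statistics.stdev(scores), 3))
    print("binomial spread: ", round(binomial_sd(0.70, 20), 3))
```

With 20 items the binomial standard deviation of the proportion correct is about 0.10, so scores anywhere between roughly 50% and 90% are unremarkable for one and the same mastery of 70%. A ruler applied repeatedly to a person of 186 cm shows nothing of the kind.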

Now there is this strange habit among psychometricians, the people who like to tell you what to believe about your test results, of claiming that the variability is in fact a lot of 'observational error' around the true - but unknown - mastery of yours. These people must be crazy: you have answered a lot of these test items, and you have seen that they are perfectly true to the course content: they are perfectly valid in the sense in which Borsboom, Mellenbergh and Van Heerden (2004) define the concept of validity. Therefore, you think, the variability in test scores must be 'true variability,' not 'error variability.'

The psychometrician will tell you that your proposal sounds interesting, but this philosophy will not make the slightest difference to the achievement testing business. Now, is he or she right in thinking so? I don't think so. If every item is a valid item, which it should be, then why would it be a good idea to count the number of items answered correctly? The presumption is that mastery is a kind of quantitative concept. If you care to read the literature about what it is to truly get to master mathematics, statistics, physics, biology, etcetera, you will more often than not find that this is about qualities, not quantities. There is a big difference between the two concepts. Reasoning along the quality line, what counts is not the number of items right, but having at least one item right. Having run this particular distance in the school's best time ever, even if only once, does qualify you as a master runner, doesn't it? Think about it. Following this idea up might liberate education from the thick blanket of achievement tests that is suffocating it.

Mail me your reaction, or suggestions to underpin the idea with some good arguments, from philosophy, empirical research, whatever. Read the education pages I am assembling for physics and a number of other disciplines.

Denny Borsboom (2003). Conceptual issues in psychological measurement. Dissertation University of Amsterdam.

Denny Borsboom (2005). Measuring the Mind. Conceptual Issues in Contemporary Psychometrics. Cambridge University Press. site

Denny Borsboom, Gideon J. Mellenbergh and Jaap van Heerden (2004). The concept of validity. Psychological Review, 111, 1061-1071. pdf

Ellen Condliffe Lagemann (2000). An elusive science: The troubling history of education research. University of Chicago Press. site

Joel Michell (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge University Press. questia

John Rawls (2001). Justice as fairness. A restatement. Belknap Harvard University Press.

the Mises Review

John J. Roche (1998). The mathematics of measurement. A critical history. London: Athlone.

Measurement fundamentals

R. Duncan Luce and John W. Tukey (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1-27. pdf

R. Duncan Luce and Patrick Suppes (1974). Theory of measurement. Encyclopedia Britannica. pdf

Louis Narens and R. Duncan Luce (1976). The algebra of measurement. Journal of Pure and Applied Algebra, 8, 197-233. pdf

R. Duncan Luce (1978). Conjoint measurement: A brief survey. pdf

R. Duncan Luce (1978). A mathematician as psychologist. pdf

A scientific autobiography

R. Duncan Luce (1979). Suppes' contributions to the theory of measurement. In R. J. Bogdan (Ed.), Patrick Suppes. Reidel. pdf

R. Duncan Luce and Louis Narens (1981). Axiomatic measurement theory. SIAM-AMS Proceedings vol 13. pdf

R. Duncan Luce and Louis Narens (1983). Symmetry, scale types, and generalizations of classical physical measurement. pdf

R. Duncan Luce (1985). Mathematical modeling of perceptual, learning, and cognitive processes. pdf

R. Duncan Luce and Louis Narens (1986). Measurement: The theory of numerical assignments. Psychological Bulletin, 99, 166-180. pdf

R. Duncan Luce and Louis Narens (1987). Measurement scales on the continuum. Science, 236, 1527-1532. pdf

abstract In a seminal article in 1946, S. S. Stevens noted that the numerical measures then in common use exhibited three admissible groups of transformations: similarity, affine, and monotonic. Until recently, it was unclear what other scale types are possible. For situations on the continuum that are homogeneous (that is, objects are not distinguishable by their properties), the possibilities are essentially these three plus another type lying between the first two. These types lead to clearly described classes of structures that can, in principle, be incorporated into the classical structure of physical units. Such results, along with characterizations of important special cases, are potentially useful in the behavioral and social sciences.

R. Duncan Luce (1988). Goals, achievements, and limitations of modern fundamental measurement theory. In H. H. Bock (Ed.), Classification and related methods of data analysis. Elsevier. pdf

R. Duncan Luce and Louis Narens (submitted as of 2007). Theory of measurement. In L. Blume and S. N. Durlauf (Eds) Palgrave Dictionary of Economics. pdf

Louis Narens and R. Duncan Luce (submitted as of 2007). Meaningfulness and invariance. In L. Blume and S. N. Durlauf (Eds) Palgrave Dictionary of Economics. pdf

Patrick Suppes (2002). Representation and invariance of scientific structures. MIT Press. isbn 1575863332

Honored with the Lakatos Award 2003 html
ch. 1 pdf
Suppes' articles downloadable site, in particular also the articles referred to in his 2002.
Nancy Cartwright 2005: In praise of the representation theorem pdf
reviewed by F. A. Muller (2004) pdf
reviewed by Jean-Marc Bernard (2003?) pdf

D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky (1971/2007). Foundations of Measurement. Volume I: Additive and Polynomial Representations. Dover (reprint appearing January 30, 2007).

P. Suppes, R. D. Luce, D. H. Krantz and A. Tversky (2007). Foundations of Measurement Volume II: Geometrical, Threshold, and Probabilistic Representations. Dover (reprint appearing January 30, 2007). Reviewed by George W. Furnas, APM, 15, 103-105

P. Suppes, R. D. Luce, D. H. Krantz and A. Tversky (2007). Foundations of Measurement Volume III: Representation, Axiomatization, and Invariance. Dover (reprint appearing January 30, 2007). Reviewed by F. Gregory Ashby in APM 15, 105-108.

In the words of Luce et al (1990): “experimenters have no business comparing the arithmetic averages of ordinal scale measurements for two groups because such comparisons are noninvariant, and hence meaningless, when arbitrary monotone transformations of the scores are permissible” (p. 269).
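The noninvariance in the quotation is easy to demonstrate. The sketch below uses made-up scores for two groups (hypothetical data, not from any study) and the logarithm as one arbitrary monotone transformation. The names `compare` and `transformed` are illustrative only.

```python
import math
import statistics

# Hypothetical ordinal scores for two groups; only their order is meaningful.
group_a = [1, 1, 5]
group_b = [2, 2, 2]


def compare(stat, xs, ys):
    """Return 'A', 'B' or 'tie' for the group with the larger statistic."""
    a, b = stat(xs), stat(ys)
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


def transformed(scores, f):
    """Apply a transformation f to every score."""
    return [f(x) for x in scores]


if __name__ == "__main__":
    log_a = transformed(group_a, math.log)
    log_b = transformed(group_b, math.log)
    # The comparison of means flips under the monotone transformation ...
    print("mean, raw scores:  ", compare(statistics.mean, group_a, group_b))
    print("mean, log scores:  ", compare(statistics.mean, log_a, log_b))
    # ... while the comparison of medians does not.
    print("median, raw scores:", compare(statistics.median, group_a, group_b))
    print("median, log scores:", compare(statistics.median, log_a, log_b))
```

On the raw scores group A has the higher mean (7/3 against 2); after taking logarithms group B has the higher mean. Which group 'scores higher on average' therefore depends on an arbitrary choice of numbering, whereas the comparison of medians survives any strictly increasing transformation.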

Norman Robert Campbell (1920/1957). Foundations of science. The philosophy of theory and experiment. Dover.

The same text was called Physics: The elements. in its 1920 Cambridge University Press edition.
This is a really nasty text, highly influential though: Campbell is not in the habit of mentioning his sources.
Joel Michell 1999, p. 121-131, is highly critical of his treatment of the concept of measurement.

P. W. Bridgman (1927). The logic of modern physics. New York: Macmillan

Joel Michell 1999, p. 169 ff on the operationism made popular by this book by Nobel Prize winner Bridgman.

Joel Michell (1990). An introduction to the logic of psychological measurement. Erlbaum. questia

Explains what it is to 'measure' quantitatively. Conjoint measurement theory.

At once scientific and psychologic. Joel Michell (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge University Press. questia

For my personal annotations to the book see here
Psychologists like to present their theories as quantitative theories. Nobody has proved these theories to be quantitative, however. See chapter 8, or Joel's (1990), to see what it takes to prove measurements to be quantitative. The measurement theory is in place - 'the revolution that happened' in work by Suppes and Luce, among others - so it is known by now how extremely difficult it will be to prove psychological measurements to be quantitative. The trick has not been pulled off, yet.

R. Duncan Luce (2000). Utility of Gains and Losses: Measurement-Theoretical, and Experimental Approaches. Erlbaum. questia

"This monograph brings together in one place my current understanding of the behavioral properties people either exhibit or should exhibit when they make selections among valued alternatives, and it investigates how these properties lead to numerical representations of these preferences."

William P. Fisher, Jr., and Benjamin D. Wright (Eds) (1994). Applications of Probabilistic Conjoint Measurement. International Journal of Educational Research, 21, 559-664.

abstract This special issue demonstrates the symmetry and rigor of probabilistic conjoint measurement in practical applications in education and other human sciences. Following the opening chapter introducing the theory, other chapters focus on conjoint measurement in testing and student evaluation, standard setting, and the study of behavior and attitude. (ERIC SLD)

Judea Pearl (2000). Causality. Models, reasoning, and inference. Cambridge: Cambridge University Press. html

His home page gives you a list of key publications on causality. See esp. a summary of chapter 6 from the book: Simpson's paradox: An anatomy pdf
Statistical and causal inference: A review. Test, 12, 2003, 281-345 pdf
Reasoning with cause and effect. UCLA Cognitive Systems Laboratory, Technical Report (R-265), July 1999. Summary of IJCAI-99 lecture. In Proceedings of the International Joint Conference on Artificial Intelligence, Morgan Kaufmann, San Francisco, CA 1437-1449, 1999 pdf
The New Challenge: From a Century of Statistics to the Age of Causation. UCLA Cognitive Systems Laboratory, Technical Report (R-249), January 1997. Presented at the IASC Second World Congress, Pasadena, CA, February 1997. pdf
Structural and Probabilistic Causality. In D.R. Shanks, K.J. Holyoak, and D.L. Medin (Eds.), The Psychology of Learning and Motivation, Vol. 34: Causal Learning, Academic Press, San Diego, CA 393-435, 1996. pdf
Graphical Models for Probabilistic and Causal Reasoning. In A.B. Tucker, Jr. (Ed.), The Computer Science and Engineering Handbook, Chapter 31, CRC Press, Inc. 697--714, 1997. pdf
Causal diagrams for empirical research. Expanded version of a paper in Biometrika, 1995 pdf

William P. Fisher, Jr. (2001). Invariant Thinking vs. Invariant Measurement. Rasch Measurement Transactions, 14, 778-81. html

A critique of Neil J. Dorans and Paul W. Holland, 'Population invariance and the equatability of tests: basic theory and the linear case.'

David J. Bartholomew (Ed.) (2006). Measurement. SAGE. 4-volume set. contents

from the description With the literature on social measurement scattered across disciplinary boundaries, this collection provides a unique resource for researchers and libraries. It brings together over 60 key articles from the fields of sociology, economics, psychology, psychometrics, political science and management science, as well as cross-disciplinary fields such as epidemiology and education.

Thom G. G. Bezembinder (1970). Van rangorde naar continuum. Een verhandeling over datastructuren in de psychologie. Deventer: Van Loghum Slaterus.

Cartwright, Nancy (1999). The dappled world. A study of the boundaries of science. Cambridge: Cambridge University Press.

Patrick Suppes (1951). A set of independent axioms for extensive quantities. Portugaliae Mathematica, 10, 163-172. Reprinted in Suppes (1969), 36-45. pdf

See Michell (1999) on the significance of this article.

Patrick Suppes and Dana Scott (1958). Foundational aspects of theories of measurement. The Journal of Symbolic Logic, 23, 113-128. Reprinted in Suppes (1969), 46-64. pdf

This article anticipates most of the ideas worked out in the Suppes and Zinnes 1963 chapter. "I mention the joint article with Zinnes especially for those who are interested in general questions about the theory of measurement, but find the fourth article somewhat heavy going." (Suppes, 1969, p. 3)

Andersen, E.B. (1973). Conditional inference and models for measuring. Copenhagen: Mentalhygiejnisk Forlag. (referenced in Eggen's 2004 dissertation) [I have not yet seen this one, not in library Univ. Leiden]

Gerhard H. Fischer (2000). Applying the Postulate of Specific Objectivity to the Measurement of Treatment Effects in Clinical Psychology. Open and Distance Learning. html

Mathematics and the physical sciences

Hasok Chang (2004/2007). Inventing temperature. Measurement and scientific progress. Oxford University Press.

short abstracts of chapters
A key publication. Did physicists really solve their measurement problems? Work in the spirit of Nancy Cartwright.
This work won him the Lakatos Award html

R. Luther & K. Ostwald (1910). Ostwald-Luther Hand- und Hülfsbuch zur Ausführung physiko-chemischer Messungen. Verlag von Wilhelm Engelmann. 3rd edition

David Z. Albert (1992). Quantum mechanics and experience. Harvard University Press.

The subject really is the measurement problem in quantum physics. Also: Bohm's theory.

Max Jammer (1989). The conceptual development of quantum mechanics. American Institute of Physics.

The last chapter: 'Two fundamental problems,' the second of which is 'Observation and measurement.' (p. 392-397)

Harold Jeffreys and Bertha Swirles Jeffreys (1946). Methods of mathematical physics. Cambridge at the University Press.

p. 49: "Any physical measurement is the assignment of a single magnitude. Such magnitudes are called scalars. Physics may be defined as the study of the relations between scalars, so that from one set of measurements other sets, given the conditions of observation, can be predicted."

Alexander Koyré (1968). Metaphysics and measurement. Essays in scientific revolution. London: Chapman & Hall.

(Galileo and the scientific revolution of the seventeenth century (1943) - Galileo and Plato (1943) - Galileo's treatise 'de motu gravium': the use and abuse of imaginary experiment (1960) - An experiment in measurement (1953) - Gassendi and science in his time (1953) - Pascal savant (1954))

M. Norton Wise (Ed.) (1995). The values of precision. Princeton University Press. isbn 0691016011

1. Quantification, Precision, and Accuracy: Determinations of Population in the Ancien Regime. Andrea Rusnock
2. A Revolution to Measure: The Political Economy of the Metric System in France. Ken Alder
3. "The Nicety of Experiment": Precision of Measurement and Precision of Reasoning in Late Eighteenth-Century Chemistry. Jan Golinski
4. Precision: Agent of Unity and Product of Agreement. Part I: Traveling. M. Norton Wise
5. The Meaning of Precision: The Exact Sensibility in Early Nineteenth-Century Germany. Kathryn M. Olesko
6. Accurate Measurement Is an English Science. Simon Schaffer
7. Precision and Trust: Early Victorian Insurance and the Politics of Calculation. Theodore M. Porter
8. The Images of Precision: Helmholtz and the Graphical Method in Physiology. Frederic L. Holmes and Kathryn M. Olesko
9. Precision: Agent of Unity and Product of Agreement. Part II: The Age of Steam and Telegraphy. M. Norton Wise
10. The Morals of Energy Metering: Constructing and Deconstructing the Precision of the Victorian Electrical Engineer's Ammeter and Voltmeter. Graeme J. N. Gooday
11. Precision Implemented: Henry Rowland, the Concave Diffraction Grating, and the Analysis of Light. George Sweetnam
12. The Laboratory of Theory or What's Exact about the Exact Sciences? Andrew Warwick
13. Precision: Agent of Unity and Product of Agreement. Part III: "Today Precision Must be Commonplace". M. Norton Wise

Wayne A. Fuller (1987). Measurement error models. New York: Wiley.

J. Osinga and J. W. Maaskant (1982). Handboek elektronische meetinstrumenten. Deventer: Kluwer Technische Boeken.

Quite fascinating to see how the validity of a measuring instrument is developed by mapping the relevant theory into the design of the instrument.
Of course, the term 'validity' is not used at all here.
The term 'reliability' is not used either, except once where the reliability interval is explained. Instead, there are a number of different concepts, together covering what in the social sciences tends to get called 'reliability' of measurements:
exactness [nauwkeurigheid]: agreement of the measurement result with the true value of the variable measured
precision [precisie] or replicability [dupliceerbaarheid of herhaalbaarheid]: agreement between successive measurements of the same variable, using the same instruments and methods by the same observer
for example, cheap instruments might be as exact or accurate as expensive ones, the expensive ones being more precise (having a better resolution).
The problem, then, is this: is the exactness related to or identical to the concept of validity as used in the social sciences, or to the concept of validity as proposed by Borsboom, Mellenbergh and Van Heerden (2004)?
The authors also separate the measurement process from the instrument itself. Exactness concerns the process, precision (sensitivity of the instrument and readability of its output) the instrument.
On top of those rather ambiguous concepts they need the idea of reproducibility [p. 20]: agreement between the results of different measurements of the same variable, by different observers, following different methods, using different instruments in different laboratories at sufficiently broad intervals of time. Is this akin to what De Groot in his Methodology would call reliability (in a very broad sense)?
A condition for exactness, of course, is that the measuring instrument has recently been calibrated appropriately. Aha.
Therefore, standards are absolutely necessary, and they should be absolute standards, of course. Hofstee and Ten Berge have published on the concept of absolute measurement in clinical psychology; I must retrieve these articles.
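The distinction between exactness and precision can be illustrated with simulated readings. This is a rough sketch and not taken from Osinga and Maaskant: it assumes that an instrument's reading is the true value plus a fixed calibration offset plus Gaussian noise, that exactness can be summarized by the distance of the mean reading from the true value, and precision by the spread of repeated readings. All names and numbers are hypothetical.

```python
import random
import statistics


def read_instrument(true_value, bias, noise_sd, n_readings, seed=0):
    """Simulate repeated readings: true value + calibration bias + noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_readings)]


def exactness_error(readings, true_value):
    """Distance between the mean reading and the true value (smaller = more exact)."""
    return abs(statistics.mean(readings) - true_value)


def precision_spread(readings):
    """Spread of repeated readings (smaller = more precise)."""
    return statistics.stdev(readings)


if __name__ == "__main__":
    true_value = 10.0
    instruments = {
        "cheap, calibrated":     read_instrument(true_value, 0.0, 0.50, 1000, seed=1),
        "expensive, calibrated": read_instrument(true_value, 0.0, 0.05, 1000, seed=2),
        "expensive, off by 0.3": read_instrument(true_value, 0.3, 0.05, 1000, seed=3),
    }
    for name, readings in instruments.items():
        print(name,
              "| exactness error:", round(exactness_error(readings, true_value), 3),
              "| spread:", round(precision_spread(readings), 3))
```

In this toy model the cheap and the expensive calibrated instruments are about equally exact while the expensive one is far more precise, and the miscalibrated expensive instrument is precise without being exact, which is why calibration against a standard is a condition for exactness.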

John D. Trimmer (1950). Response of Physical Systems. Wiley. questia

Esp. ch. 7 'Measuring instruments'

Alex Hebra (2003). Measure for Measure: The Story of Imperial, Metric, and Other Units. Johns Hopkins. [I have not yet seen this one; UB Leiden 9669 F 10]

contents Blessing our countings: units and numbers - Going to great lengths - Degrees of separation: angles and solid angles - The "obvious" unit of time - Weighty matters: of mass and force - Gravimetric standards - The matters with mass - Empire of light: luminosity and intensity - Hot stuff: temperature, pressure, and thermodynamics - The missing link: energy - Compound units - Invasion of aliens and nihilists - The inter(galactic)net - Units, physics, and mathematics.

John J. Roche (1998). The mathematics of measurement. A critical history. London: Athlone.

Philip Catton (). The most measured understanding of spacetime. doc

C. Th. J. Alkemade, A. M. Hoogenboom and J. A. Smit (1979). Inleiding tot de fysische meetmethoden. Utrecht: Bohn, Scheltema & Holkema.

Thoroughness through and through: short descriptions of the theory underlying the phenomena (or their characteristics, etc.) to be measured, the justification of the measurement methods, the instrumentation; everything is treated at an elementary level, but that is precisely what makes the book attractive, because it covers such an incredibly broad spectrum - if I may put it that way - of measurable phenomena. The tone is positivistic; there is not a flicker of attention for the undoubtedly enormous blunders in the history of the theory concerned and of the attempts to arrive at valid and reliable measurements (which would have added decidedly impressive content, for example, for the measurement of temperature versus heat, treated here in the same chapter). So neither is there any attention for the sins and mistakes that are undoubtedly made routinely in everyday physics and applied practice. And there must surely be physical topics where people are still struggling to develop adequate measurement methods or techniques, but not a word on that either. That is a pity, because in other respects the exposition by these professors of experimental physics brings out very well that measuring is no sinecure, that procedures and techniques always come with their share of snags, that choosing suitable methods is itself a difficult task, and so on. For a behavioral or social scientist reading all this, it is rather impressive that even with material that does not talk back, measuring turns out to be so difficult. Also impressive is the possibility of measuring almost as accurately as the instruments can be made, given the typical nature of the material being measured: almost perfectly homogeneous and therefore almost infinitely divisible if need be. Until molecular levels are reached, of course, but then the measurer enters a world of partly discrete phenomena - just count!
Scale properties do not seem to play the role here that they do in psychology: as long as the measurements are accurate, the scale can be bent to one's own purposes should that be desirable.

Kathryn M. Olesko (1991). Physics as a calling. Discipline and practice in the Königsberg Seminar for Physics. Ithaca: Cornell University Press.

19th century beginnings of university teaching of physics, heavily emphasizing measurement and error: Franz Neumann's teaching methods - error analysis and measurement - scientific work based on seminar exercises - the nature of science teaching
The emphasis on errors of measurement -- psychometricians would call that reliability -- was very much linked to concerns the same psychometricians would call validity: the attempt to bring theory and experimental data (measurements, error and all) into harmony. The Olesko book bristles with examples of this fight for validity without anyone -- or Olesko herself -- ever calling it that. A lot of experimenting on temperature and heat, for example; or induction; and a lot of other phenomena that were difficult to interpret, as well as difficult to 'measure.'

British Rheologists' Club (1949). The principles of rheological measurement. Report of General Conference, Bedford College, University of London, October 1946. Thomas Nelson and Sons.

Jacob Swart (1856, 3rd ed.). Handleiding voor de praktische zeevaartkunde. Amsterdam: Wed. G. Hulst van Keulen.

Psychometrics past and present

Robyn M. Dawes (1977). Suppose We Measured Height With Rating Scales Instead of Rulers. Applied Psychological Measurement 1, 267-273. abstract; pdf

R. Duncan Luce (1967). Remarks on the theory of measurement and its relation to psychology. plus Discussion pdf

R. Duncan Luce (1972). What sort of measurement is psychophysical measurement? American Psychologist, 96-106. pdf

Edwin G. Boring (1961). The beginning and growth of measurement in psychology. Isis, 52, 238-257. Reprinted in Donald T. Campbell, Robert I. Watson (1963). History, Psychology, and Science: Selected Papers by Edwin G. Boring (p. 140-158). Erlbaum. questia

W. Grant Dahlstrom (1985). The Development of Psychological Testing. In Gregory A. Kimble and Kurt Schlesinger: Topics in the History of Psychology Vol. 2. Erlbaum. questia

Marion S. Aftanas (1989). Theories, Models, and Standard Systems of Measurement. Applied Psychological Measurement, 12, 325-338. abstract

Harold Gulliksen (1986). Perspective on Educational Measurement. Applied Psychological Measurement, 10, 109-132. abstract

J. P. Guilford (1985). A Sixty-Year Perspective on Psychological Measurement. Applied Psychological Measurement, 9, 341-349. abstract

Anne Anastasi (1985). Some Emerging Trends in Psychological Measurement: A Fifty-Year Perspective. Applied Psychological Measurement, 9, 121-138. abstract

Edward L. Thorndike (1904). Theory of mental and social measurements. New York: The Science Press.

Rudolf Pintner (1923). Intelligence Testing: Methods and Results. Henry Holt. questia

E. L. Thorndike, E. O. Bregman, M. V. Cobb, E. Woodyard and the Staff of the Division of Psychology of the Institute of Educational Research of Teachers College, Columbia University (1925). The measurement of intelligence. New York: Teachers College Bureau of Publications, Columbia University.

George Rasch (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press. Expanded edition of the original 1960 text.

Rogosa, D., D. Brandt, & M. Zimowski (1982). A Growth Curve Approach to the Measurement of Change. Psychological Bulletin, 92, 726-748.

abstract The measurement of individual change is approached from the standpoint of individual time paths and statistical models for individual change. Other distinctive features of this paper are (a) the consideration of both statistical and psychometric properties of measures of individual change and (b) the examination of measures of change for data with more than two observations on each individual. Many results and conclusions are at odds with the previous literature in the behavioral sciences.
"... many current recommendations about the measurement of change are unsound. This paper provides a conceptual and mathematical framework for the measurement of individual change."

Isn't this a strange thing, trying to measure change? Is change an attribute one could measure? In what sense are Rogosa and others using the term 'measurement' here? Is this a move to not having to discuss the validity problem at all?

George Engelhard, Jr. (1992). Historical Views of Invariance: Evidence from the Measurement Theories of Thorndike, Thurstone, and Rasch. Educational and Psychological Measurement, Vol. 52, No. 2, 275-291

abstract The purpose of this study is to provide a historical perspective on the concept of invariance within measurement theory. Two major classes of invariant measurement are described—sample-invariant item calibration and item-invariant measurement of individuals. The work of Stevens is used to help clarify the concept of invariance. The importance of invariance as a key measurement concept is then illustrated with the measurement theories of Thorndike, Thurstone, and Rasch. A case is made for viewing invariance as a fundamental aspect of measurement in the behavioral sciences; invariance appears to be essential in order to realize the advantages of objective measurement.

As useful as standard tests and standard test theory have proven in large scale evaluation, selection, and placement problems, their focus on who is competent and how many items they can answer falls short when the goal is to improve individuals' competencies.

Robert J. Mislevy (1993, p. 84). A framework for studying differences between multiple-choice and free-response test items. In Randy Elliot Bennett, William C. Ward (1993). Construction versus choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment. Erlbaum. questia

K. Bollen and R. Lennox (1991). Conventional wisdom on measurement: a structural equation perspective. Psychological Bulletin, 110, 305-314.

Mellenbergh, G. J. (1996). Measurement precision in test score and item response models. Psychological Methods, 1, 293-299.

Benjamin D. Wright (nd). A history of social science measurement. Measurement for social science and education. http://www.rasch.org/memo62.htm

Last sentence: "Today there is no methodological reason why social science cannot become as stable, as reproducible and hence as useful as physics."
This already was the intention of Thorndike in 1904. Except for this sentence, this is an article that does a nice job of summarizing what its title promises. Of course, Wright is heavily involved in Rasch measurement.

Susan E. Embretson and Scott L. Hershberger (Eds) (1999). The new rules of measurement. What every psychologist and educator should know. Erlbaum. questia

Wim J. van der Linden (2005). Linear models for optimal test design. Springer.

In the '70s I bought books on optimal programming such as the one by Wagner. It was immediately evident that these techniques could be of use in the design and application of tests, in much the same way as measurement instruments in physics might be optimized under one or more specified constraints. The example already indicates that it is my expectation that these techniques do not touch on the issue of measurement. In other words: optimizing an instrument that does not qualify as a measuring instrument in the eyes of, for example, Joel Michell, will not make it one. Maybe for this reason Wim van der Linden does not refer to the work of Michell, nor even that of his Dutch colleagues Borsboom, Mellenbergh and Van Heerden.
p. xi: "(...) test theory has developed from a timid fledgling to a mature discipline, with numerous results that nowadays support item and test analysis and test scoring at nearly every testing organization around the world."
Van der Linden belongs to the school in educational measurement that is interested in applying statistical techniques to given tests or test items, as contrasted to the design of the items in the first place. It might even be fair to say that Van der Linden is in the business of reliability of tests and their uses, not in that of their validity. His intended audience is the measurement specialists of corporations like the American ETS or the Dutch Cito, and they are not the people writing the test items in the first place. It would not be a problem, if only the written test items were valid to the course content etcetera. Well, they are not. It is not a good thing to try to optimize tests consisting of items that simply are not valid to what it is that education should be about. Wim van der Linden does not acknowledge this problem. Had he done so, he might have remarked that it is not fair to reproach the developer of optimizing techniques for the lack of quality of test items used in the optimization processes. And he would be right.
But then there is a measurement problem, isn't there?
What I'd like to know is: how do these techniques impact on tests and testees? I will skim the book for that information.
I wonder what Van der Linden's philosophy on testing is. I get the impression that the optimizing techniques in the book are meant to be applied to individual tests, not to combinations of a series of tests, for example all the tests in the course or the exam. Is this the case? His chapter 6 is about assembling multiple tests.
The basic problem in achievement testing is that, ultimately, test items are of the all-or-nothing type - the answer is right or wrong - while the mastery of the student is assumed to lie somewhere between totally absent and perfect. No instrument meant to 'measure' mastery can do so in the way most physical measuring instruments do their job. Wim van der Linden does not seem to recognize this special character of achievement testing (or psychological testing in general).
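The contrast between right-or-wrong items and an in-between mastery can be made concrete with a small simulation. This is only a sketch under a strong assumption: each item is answered correctly with a probability equal to the student's mastery, independently of the other items, which makes the number correct binomially distributed. The function names, the mastery of 0.7 and the test length of 40 items are illustrative choices, not taken from Van der Linden's book.

```python
import random


def simulate_scores(mastery, n_items, n_tests, seed=0):
    """Draw n_tests number-correct scores, each from n_items right/wrong items.

    Assumption: every item is answered correctly with probability `mastery`,
    independently of all other items (a plain binomial process).
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < mastery for _ in range(n_items))
        for _ in range(n_tests)
    ]


def binomial_sd(mastery, n_items):
    """Standard deviation of the number correct under the binomial model."""
    return (n_items * mastery * (1 - mastery)) ** 0.5


# One student, one fixed mastery, a thousand parallel 40-item tests.
scores = simulate_scores(mastery=0.7, n_items=40, n_tests=1000)
print("lowest and highest score:", min(scores), max(scores))
print("mean score:", sum(scores) / len(scores))
print("binomial sd:", round(binomial_sd(0.7, 40), 2))
```

Under these assumptions the scores scatter around 28 out of 40 with a standard deviation of almost three items, although nothing about the student changes between tests. A ruler applied repeatedly to the same 186 cm would not behave like that.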
p. xi: "(...) test theory has developed by careful modeling of response processes on test items and by using sophisticated statistical tools for estimating model parameters and evaluating model fit." Well, anybody perusing the contents of Psychometrika from the very beginning until the present day will have a difficult time finding anything even remotely resembling 'modeling of response processes,' forget the 'careful.' Quite the opposite is the case: more often than not there is no process at all that is being modeled: true scores are imputed to exist, but that is not modeling a process. To do so, the binomial process should be used, or more sophisticated variants of it. Lord and Novick shied away from it, to name but two in the field.
Christiaan Huygens then could have been called the father of test theory. Instead, Van der Linden takes Spearman's 1904 paper on the association between things as the start of the field. Edgeworth's publications on examination statistics, 1889, would have been more appropriate as a starting point, I think.
p. xi: "(...) in spite of its enormous progress, although test theory is omnipresent, its results are used in a peculiar way. Any outsider entering the testing industry would expect to find a spin-off in the form of well-developed technology that enables us to engineer tests rigorously to our specifications. Instead, test theory is mainly used for post-hoc quality control, to weed out unsuccessful items, sometimes after they have been pretested, but sometimes after they have been in operational use. Apparently, our primary mode of operation is not to create good tests, but only to prevent bad tests." This is an uncommon observation in the testing field. Wim van der Linden has taken up the challenge, and has developed a test design technology, a pro-active test theory one is tempted to say.
Van der Linden does not have much to say about the design of individual test items, in the sense of making a proper translation from course content and course goals to items testing for the realization of those goals. p. xii: "Part of the explanation for our lack of technology may be a deeply ingrained belief among some in the industry that test items are unique and that test development should be treated as an art rather than a technology."
What Van der Linden's optimal test design accomplishes is exactly what he promises, given the availability of adequately researched test items. He does not address the problem of designing the individual test item in the first place. This would not need to be problematic if design principles for individual items were available. Such is not the case, however, barring some interesting developments in niches in the educational system, or experiments in particular disciplines.
The problem then is, how is it possible to build a valid design technology that does not address the question of validity at the level of the design of the individual test item? The pragmatic answer will be that techniques of item analysis will get rid of really problematic items ... Van der Linden says this much, on p. xiv: "The steps of item pretesting and calibration executed in this stage are treated well in several other books and chapters (e.g., Hambleton & Swaminathan, 1985; Lord, 1980; Lord & Novick, 1968), and it is not necessary to repeat this material here. As for the preceding step of writing items for the pool, I do go as far as to show how blueprints for items can be calculated at the level of specific item writers and offer suggestions on how to manage the item-writing process (Chapter 10). But I do not deal with the actual process of item writing." (Here he refers to Irvine and Kyllonen, item generation for test development, 2002; I suspect, having seen the contents specification, that the techniques in this volume are not techniques of test item design proper, but of test item generation given adequate item forms. But then, how to design adequate item forms?) (Chapter 10 also assumes the individual test items to be already available; there is no discussion of their design.) In the absence of a clear conception of what it takes to develop individual test items, how is it possible to impute the process of answering the very same items by students?
I have taken some space to discuss this book, because its existence in itself is a demonstration of the abstract technological character of thinking in the test theory field. Not once is there a discussion of the impact of this kind of approach on individual students, let alone of the impact of neglecting the development of techniques of design at the proper level, that of the translation of course content and the goals articulated on it into individual test items. Validity is at stake here. Can we trust Van der Linden and his friends to handle validity issues adequately? He himself might be the first to acknowledge the problem; indeed he has repeatedly expressed his observations on this point. His theoretical work, however, does not quite reflect his worries.
Why is it that students themselves are wholly absent in this theory? After all, many students have worked with Wim van der Linden on the development of it, in the first place. Did it occur to no one that students themselves also are stakeholders in the testing field?
Van der Linden's theory is in the best interests of institutions. Do students suffer from this? He does not pose the question, nor does he answer it. It would be a miracle if the constraints students would prefer, to use his own terminology, were the same ones the institutions would choose.

Robert Lissitz (Ed.) (2006). Longitudinal and Value Added Models of Student Performance. JAM Press.

The publisher's info includes a list of contributions. None of the contributions is available on the internet. One may use the contents listing to search for publications on growth by these authors that are available on the internet. I get the impression that this is about quality of accountability, not about quality of education. Is that because measurement is the name of the game?
Another book edited by Lissitz is (2005) Value added models in education: Theory and applications. Again mostly on accountability issues. I have not seen either book itself. An older publication is Bryk et al. (1998) Assessing School Academic Productivity: The Case of Chicago School Reform. Social Psychology of Education, 2, 103-142 (pdf). And of course there is the volume by James Coleman (1981). Longitudinal data analysis. Basic Books.

Robert J. Mislevy and Geneva D. Haertel (2006). Implications of Evidence-Centered Design for Educational Testing. PADI Technical Report 17. pdf

"Evidence-centered design (ECD) views an assessment as an evidentiary argument: An argument from what we observe students say, do, or make in a few particular circumstances, to inferences about what they know, can do, or have accomplished more generally (Mislevy, Steinberg, & Almond, 2003). The view of assessment as argument is a cornerstone of test validation (Kane, 1992, Messick, 1989). ECD applies this perspective proactively to test design."
Looks quite interesting, is highly abstract, and way too ambitious at the same time, taking psychometric theory as the theory into which the whole lot should be fitted. Not figuring in the references: Borsboom, Michell, Pearl. The kind of 'evidence' in the intellectual world of Mislevy and Haertel is ETS-invented, it seems. Keep an eye on the PADI site for further developments.

John B. Carroll (1987). New perspectives in the analysis of abilities. In Royce R. Ronning, Jane C. Conoley, John A. Glover, and Joseph C. Witt (Eds.) (1987). The influence of cognitive psychology on testing. Buros-Nebraska Symposium on Measurement and Testing. Volume 3 (pp. 267-84).

A thought-provoking article. I have annotated it in my carroll1987.htm

Social and individual 'measurement'

The fundamental problem in the social sciences is that people do not passively let their measures be taken.

In education, the purpose is to manipulate measurement in the sense that pupils are supposed to grow in learning, their growth subsequently being 'measured.'

A. Myrick Freeman, III (1993). The measurement of environmental and resource values. Theory and methods. Washington, D.C.: Resources for the Future.

newer edition: 2003
for example, see ECO2 (drafting group under WG2B) (2004). Assessment of Environmental and Resource Costs in the Water Framework Directive (Version no.: Final draft. Date: November 12, 2004) pdf

Measurement. Interdisciplinary Research and Perspectives. Contents of this journal: html. no free online articles.

David F. Lohman and Thomas Rocklin (1993). Current Issues in the Assessment of Intelligence and Personality. To appear in D. H. Saklofske and M. Zeidner: International Handbook of Personality and Intelligence. New York: Plenum. Please do not cite this draft without permission. pdf

The authors do not enter the debate about measurement. Their chapter relates current developments to historical and actual ones, providing a perspective on 20th century events in this field.

Zenderland, Leila (1998). Measuring minds. Henry Herbert Goddard and the origins of American intelligence testing. Cambridge University Press.

Carleton W. Washburne (1922). Educational measurements as a key to individualizing instructions and promotions. Journal of Educational Research, 5, 195-206. [Does someone have a pdf for me?]

Thomas M. Ostrom (1989). Interdependence of attitude theory and measurement. In Anthony R. Pratkanis, Steven J. Breckler and Anthony G. Greenwald: Attitude structure and function, p. 11-37. London: Erlbaum.

Offers a short history, rather superficial

Edward H. Haertel and Joan L. Herman (2005). A Historical Perspective on Validity Arguments for Accountability Testing. Yearbook of the National Society for the Study of Education, 104. http://www.nsse-chicago.org/Yearbooks.asp Who can send me a pdf?

Wim K. B. Hofstee (1981). Psychologische uitspraken over personen. Beoordeling/voorspelling/advies/test. Deventer: Van Loghum Slaterus. [Psychological judgments of persons. Assessment/prediction/advice/test]

This is the land of self-fulfilling prophecies, make-believe, pretension. Typically, people do not let themselves be 'measured' as rocks do. Especially so if the measurements are about the mental or social domain.

Jean Piaget and Bärbel Inhelder

They really are in a category of their own. What is of importance here in their work is not the developmental psychology itself, but the attributes postulated and their way of 'measuring' them, or rather the transitions from one developmental stage to another. As such their work is mentioned in studies like Michell (2003) and Borsboom (2005).
Jean Piaget et Bärbel Inhelder (1948). La représentation de l'espace chez l'enfant. Paris: Presses Universitaires de France. [The child's conception of space.]
Jean Piaget (1923/1948) Le langage et la pensée chez l'enfant. Neuchâtel: Delachaux et Niestlé. [The language and thought of the child]
Jean Piaget (1941) La genèse du nombre chez l'enfant. Neuchâtel: Delachaux et Niestlé. [The child's conception of number]
Jean Piaget (1924/1947) Le jugement et le raisonnement chez l'enfant. Neuchâtel: Delachaux et Niestlé. 3e éd. 1947 [Judgment and reasoning in the child]
Jean Piaget et Bärbel Inhelder (1941) Le développement des quantités chez l'enfant. Conservation et atomisme. Neuchâtel: Delachaux et Niestlé.[The child's construction of quantities: conservation and atomism]
Jean Piaget (1945). La formation du symbole chez l'enfant. Imitation, jeu et rêve. Image et représentation. Neuchâtel: Delachaux et Niestlé.[Play, dreams and imitation in childhood]
Evert W. Beth and Jean Piaget (1966). Mathematical epistemology and psychology. Translated from the French by W. Mays [Épistémologie mathématique et psychologie. First published by Presses Universitaires de France, Paris, as Volume XIV of the Études d'Épistémologie Génétique]. Dordrecht/Boston: Reidel.
The Jean Piaget Society homepage
The JPS Rasch Analysis Homepage.

An interesting source on historical problems with valid measurement is the study of incommensurable theories (incommensurable cognitive conceptions). The overview by Susan Carey listed below brings together a number of important ones: the phlogiston theory, the Experimentalists' views on heat versus the later differentiation into temperature and heat, and Carey's own research on children's wholly distinct cognitive system for the 'heaviness' of objects, which is incommensurable with the physical differentiation into density and mass.

In the concept pairs temperature-heat and density-mass, the first is in each case an intensive quantity and the second an extensive one: when bodies are combined, the extensive quantities add up, while the intensive ones average out.
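The difference between adding up and averaging out can be shown with two bodies that are put together. This is a minimal sketch, assuming ideal mixing in which volumes simply add; the function name and the numbers are made up for the illustration. Mass behaves as the extensive quantity, density as the intensive one.

```python
def combine(mass_a, volume_a, mass_b, volume_b):
    """Combine two bodies, assuming volumes add without any shrinkage.

    Returns (mass, volume, density) of the combined body.
    """
    mass = mass_a + mass_b          # extensive: the masses add up
    volume = volume_a + volume_b    # extensive: the volumes add up
    density = mass / volume         # intensive: a volume-weighted average
    return mass, volume, density


# Body A: 2 kg in 1 litre (density 2); body B: 6 kg in 1 litre (density 6).
mass, volume, density = combine(2.0, 1.0, 6.0, 1.0)
print(mass)     # 8.0, the sum of the two masses
print(density)  # 4.0, in between the two densities, not their sum
```

A child's undifferentiated notion of 'heaviness' has no such rule of combination, which is one way to see why it is incommensurable with the physical pair.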

Susan Carey (1992). The origin and evolution of everyday concepts. In R. Giere (ed.), Cognitive Models of Science (Minnesota Studies in the Philosophy of Science, Vol. XV). Minneapolis: University of Minnesota Press, 89-128. pdf

I arrive at this literature because I am increasingly convinced that common-sense theories play an important (obstructive) role in education, and that didactics are therefore needed that enable the student to make the developmental step from his or her own intuitive theory to the intended textbook theory. In physics education important progress has probably already been made on this point, in other fields less so or not at all. Not to mention the disaster of competence-based learning, which dashes every hope of a strong didactics.

So I am looking not only for clear descriptions of those common-sense theories, but also for theory and experiments in the relevant field of 'conceptual change' (which lies very close to important topics in the history and philosophy of science, such as those concerning paradigm shifts). See meno.htm.

All of this is needed in order to develop a design technology for test items, as you will already have guessed.

Carey's article offers sufficient entry points into the literature.

Educational 'measurement'

I will narrow this subject down to achievement tests, whether standardized or teacher-made. In a separate page I will collect some definitions of testing from the literature. A primary question, then, is that of the significance of the 'educational' here. Jerome Popp (1998) provides some answers in an attractively adequate and short book. He does not, however, extend his exposition to measurement in education as well.

Jerome A. Popp (1998). Naturalizing philosophy of education. John Dewey in the Postanalytic Period. Southern Illinois University Press.

If one wants to educate, first solve the riddle of how it is that one comes to know something, then apply that solution to the instructional process. Empiricism has failed to provide the answers it promised; nevertheless Popper's falsificationism and, for example, Wesley Salmon's Bayesianism clearly are very helpful, and it is immediately evident how instruction can make use of these insights. And that is just for starters, with five chapters still to go. I look forward to the chapter on Churchland's treatment of folk psychology as being in the same league as folk physics (see e.g. my (2006) html [in Dutch] or html [in English]).
Jerome A. Popp (1999). Cognitive Science and Philosophy of Education: Toward a Unified Theory of Learning and Teaching. Caddo Gap Press. [I have not yet seen this one]

Paul Davis Chapman (1988). Schools as sorters. Lewis M. Terman, Applied Psychology, and the Intelligence Testing Movement, 1890-1930. New York: New York University Press.

E. F. Lindquist (Ed.) (1951). Educational measurement. Washington, D.C.: American Council on Education.

Robert Ladd Thorndike (1971). Educational Measurement. Second edition. Washington: American Council on Education.

Robert L. Linn (Ed.) (1989). Educational measurement. Third edition. New York: American Council on Education / Macmillan.

On validity, see validity.htm

Measurement histories

Measuring length and weight the way we are used to is a relatively new concept in science, with the exception of special sciences such as astronomy or geometry. Until the eighteenth century the Greek tradition of comparing proportions was the technique used. See Roche (1998), Murdoch (1963). It is quite understandable why that should be so. In astronomy it comes naturally to treat, for example, the distance from the Earth to the Sun as a standard distance, while in geometry the technique of using only one known length, and then determining all other lengths by triangulation, effectively makes the original length a kind of standard length [I have yet to check the literature on how exactly these things were done]. The Greek tradition of using proportions obviates the need to standardize lengths, times, etcetera; no effort was expended on standardizing until the eighteenth century, when the French Revolution offered the opportunity to enforce national standards in the economy at large. In particular, in the Greek tradition it was highly unusual to 'measure' entities by assigning numbers to them according to some procedure. In that tradition, characteristics were compared the way weights are compared using a balance; circles are compared using the square of their diameter instead of our modern πr² (Murdoch, p. 262: "Circles are to one another as the squares on the diameters"). It is, therefore, quite amazing to see how fast the Western public, already in the early years of the twentieth century, became brainwashed into the idea that there is such a thing as one's personal intelligence that can be measured by testing it and assigning a number to it (a confidence interval, really, but that is a nuance lost on lay persons being 'measured').
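The proposition quoted from Murdoch can be checked against the modern formula in a few lines. This is only an illustrative sketch: the function name and the diameters 3 and 2 are arbitrary choices. The proportional comparison needs neither a unit of length nor the constant π, which drops out of the ratio.

```python
import math
from fractions import Fraction


def area_ratio(diameter_1, diameter_2):
    """Ratio of two circle areas in the manner of the language of proportions:
    circles are to one another as the squares on their diameters."""
    return Fraction(diameter_1) ** 2 / Fraction(diameter_2) ** 2


# Proportional comparison: only the two diameters relative to each other.
ratio = area_ratio(3, 2)
print(ratio)  # 9/4

# Modern comparison: assign a number to each circle first (area = pi * r**2),
# then divide; pi cancels and the same ratio results.
modern = (math.pi * (3 / 2) ** 2) / (math.pi * (2 / 2) ** 2)
print(math.isclose(float(ratio), modern))  # True
```

Whatever unit the diameters are expressed in, the ratio stays the same, which is why this tradition could do without standard lengths.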

WEIGHTS

"That weights were used in the early history of mankind is shown by the fact that the equal-arm balance can be traced back to the year 5000 B.C. 'Weights' are also mentioned in the Bible. In Deuteronomy, chapter 25, verse 13, we read: 'You shall not have in your bag two kinds of weights, a large and a small .... a full and just weight you shall have.' Or in Proverbs, chapter 11, verse 1, it is said: 'A false balance is an abomination to the Lord, but a just weight is his delight.'"

Jammer, 2000, p. 7-8.

Max Jammer (2000). Concepts of mass in contemporary physics and philosophy. Princeton University Press.

A sequel, covering the period 1960-2000, to Max Jammer (1961). Concepts of mass in classical and modern physics. Harvard University Press.
If there is one book that treats measurement in a fundamental way, this one must be it!

John J. Roche (1998). The mathematics of measurement. A critical history. London: Athlone. Springer. isbn 0387915818

p. 30: "Many qualitative physical phenomena were analysed, or experimented upon in Antiquity, often without any kind of quantification. These include pneumatics, the concept of a vacuum, colours, the tides, freezing, and electrical and magnetic attraction. Qualitative beliefs and concepts, together with qualitative experimental discoveries have frequently been necessary preconditions for quantitative results." [Cohen and Drabkin (1969) A source book in Greek science, 247-55, 310-314, 389-91; Aristotle (1991) [The complete works of Aristotle ed. J. Barnes] On Colours; Meteorology, 347a 1 -347b 35; Sambursky (1987) The physical world of late Antiquity, 119-21]
p. 47: "The model of magnitude used in the more rigorous versions of this tradition [this Greek inspired tradition of stating and analyzing quantitative laws using the language of proportion] was the non-metric geometrical length known directly, and not the magnitude measured numerically. Lengths, and other magnitudes such as weight and time are, therefore, assumed to be known through a direct sensory encounter with the quantity concerned rather than by numerical measurement. As a result, this tradition did not emphasize measurement of any kind, nor the search for units and exact standards [see John E. Murdoch, 1963, p. 262]."

John E. Murdoch (1963). The medieval language of proportions: Elements of the interaction with Greek foundations and the development of new mathematical techniques. In A. C. Crombie: Scientific change. Historical studies in the intellectual, social and technical conditions for scientific discovery and technical invention, from antiquity to present. London: Heinemann. p. 237-271.

Stephen Jay Gould (1981). The mismeasure of man. New York: Norton.

reviewed by Lloyd G. Humphreys in Applied Psychological Measurement, 1983, 113-118
chapters: American polygeny and craniometry before Darwin: Blacks and Indians as separate, inferior species - Measuring heads: Paul Broca and the heyday of craniology - Measuring bodies: Two case studies on the apishness of undesirables - The hereditarian theory of IQ: An American invention - The real error of Cyril Burt: Factor analysis and the reification of intelligence
Gould does not touch on measurement issues as such: the term 'measurement' does not figure in his index

Witold Kula (1986). Measures and men. Princeton: Princeton University Press.

K. M. C. Zevenboom (1959). Bijdrage tot de kennis van de oude Amsterdamse graanmaat. Noord-Hollandsche Uitgevers Maatschappij.

K. M. C. Zevenboom (1960). De bemoeiingen van het Instituut en de Akademie met het ijkwezen. Noord-Hollandsche Uitgevers Maatschappij.

G. J. C. Nipper (2004). 18 eeuwen meten en wegen in de Lage Landen. Walburg Pers. isbn 9057302802

Ronald Edward Zupko (1990). Revolution in measurement: Western European weights and measures since the age of science. Philadelphia: The American Philosophical Society.

Zupko does not employ the concepts of validity or reliability, even though his treatment evidently is about problems of validity and reliability in measurements.
Products and quantities (p. 11-13). "Even when measures had standardized counts, capacities, or weights, the actual size depended on the characteristics or peculiarities of the product involved or on other factors. In England, for example, the bale for bolting cloth was 20 pieces; buckram, 60 pieces; fustian, 40 or 45 half-pieces; paper, 10 reams; pipes, 10 gross or 1440 in number; and thread, 100 bolts."
"Hundreds of other units containing thousands of additional variations existed and they made the operation of regional and interregional commerce extremely difficult and complicated. It should appear evident how such a confusing condition contributed to constant fraudulent practices and to continuous misunderstandings in business transactions."
Coinage, wages, and prices (p. 17-18). "It was customary in the Middle Ages to base agricultural area or superficial measures of land either on coinage standards or on units of income derived through production."
"The English librate was an amount of land worth one pound (monetary) a year. Its total acreage depended on local soil conditions and on the value of the pound, and it seems to have varied from several bovates or oxgangs (often four) to as much as 1/2 knight's fee. The knight's fee probably originated as an amount of land needed to support a knight and his family for a period of one year."
Agriculture and taxes (p. 18-19). "Measures were also based on food production and tax assessments."
Such measures were common everywhere in feudal and manorial Europe, particularly so on ecclesiastical estates. All of these and similar measures varied greatly by individual, political connection, financial worth, and region of residence.
"To complicate these variations and irregularities, capacity measures throughout Europe were either heaped, striked, or shallow. The heaped measure (...) contained an amount of grain extending above its rim. (...) Unfortunately, public and private employers usually demanded payments in heaped measures while they ordinarily rendered their compensations in shallow measures. Tremendous societal friction was caused in both manorial and nonmanorial Europe by such activities. Frequent riots and rebellions were aimed specifically at the eradication of these inequities." [See especially Kula, 1986, bw].
Labor functions and time allotments (p. 20-21). "Medieval land and product measures were also based customarily on work functions and time allotments."
"In Herefordshire a math equaled approximately 1 acre (ca. 0.40 ha) or the amount of land that a man could mow in one day."
Production spans (p. 21-22). "The production span or strength potential of one or more animals constituted still another method of establishing standards."
"Some linear measures were based on a specific number of steps or paces, on bodily feats or capacity, and on the range of the human voice. The mile, for instance, had a number of special variations prior to its standardization under Elizabeth I at 5280 feet (1.609 km)."
"No standardized system of weights and measures could possibly be formulated on such haphazard methodologies"
Human dimensions (p. 22-23). "Pre-dating the categories discussed above, of course, came measurement based on man's own body. Perhaps the oldest from the standpoint of time were linear measures based on the sizes of dimensions of human limbs and appendages."

The citations from Zupko should remind us that in education for centuries the unit of measurement of achievement was the error made, or nota falsa. It was simply accumulated across the semester, and students' totals of notae falsae were carefully written down by the teacher as well as by the students themselves. (Wilbrink, 1997 html).

Note that the current habit of counting the number correct in achievement tests amounts to much the same as the medieval usage of counting things etcetera by number, disregarding quality, and surely it is not different in kind from the counting of errors made by pupils.

M. Crosland (1972/1995). 'Nature' and measurement in eighteenth-century France. Reprinted in M. Crosland (1995). Studies in the culture of science in France and Britain since the enlightenment. Aldershot: Variorum. 277-309.

Edmund Whittaker & G. Robinson (1924/1944) The calculus of observations. A treatise on numerical mathematics. Blackie and son limited. archive.org

The book is available on archive.org, but the fine print of formulas etcetera sets a premium on the physical book ;-)
Chapters include, among others: Normal frequency distributions - The method of least squares - Graduation, or the smoothing of data - Correlation

J. L. Heilbron (1979). Electricity in the 17th and 18th centuries. A study of early modern physics. University of California Press.

J. L. Heilbron (1993). Weighing imponderables and other quantitative science around 1800. Historical Studies in the Physical and Biological Sciences, Supplement to vol. 24, Part 1, 1-337. isbn 0918102170

Chapter 5 rehearses the story of the greatest mobilization of the exact sciences for national purposes that occurred during the 18th century: the design, execution, and implementation of the metric system of weights and measures.

Robert W. Massof (2002). The Measurement of Vision Disability. Optometry & Vision Science, 79, 516-552. pdf 4Mb

Covers everything: measurement in history, psychometrics classical as well as irt

Joel Michell (1999). Measurement in psychology. A critical history of a methodological concept. Cambridge University Press.

G. E. R. Lloyd (1995). The Revolutions of Wisdom. Studies in the Claims and Practice of Ancient Greek Science. University of California Press. eScholarship

Chapter Five: Measurement and mystification.

Not mentioned, read ....

This is my waste basket for, for example, articles I have not been able to collect, except their abstract.

William P. Fisher, Jr. (2003) Objectivity in psychosocial measurement: what, why, how.

J. P. Holman (1984, 4th ed.) Experimental methods for engineers. McGraw-Hill. isbn 0070296138

Good demonstration of the special character of measurement in the physical sciences: objects etcetera do not anticipate, and do not get punished or rewarded. Therefore: this kind of measurement is not a good model for achievement testing.

A. W. Richeson (1966). English land measuring to 1800: Instruments and practice. The Society for the History of Technology / The M.I.T. Press. lccc 66-21357

Alex Hebra (2003). Measure for measure. The story of imperial, metric, and other units. The Johns Hopkins University Press.

Edwin Danson (2006). Weighing the world. The quest to measure the earth. Oxford University Press. isbn 0195181697 info

Norbert Herz (1905). Geodäsie. Eine Darstellung der Methoden für die Terrainaufnahme, Landvermessung und Erdmessung. Mit einem Anhange: Anleitung zu astronomischen, geodätischen und kartographischen Arbeiten auf Forschungsreisen. Leipzig und Wien: Deuticke.

archive.org

M. de Haas (1919, 4th ed.). Practische oefeningen in natuurkunde voor aanstaande technologen. Delft: Waltman.

W. S. B. Woolhouse (1890, 7th ed./1979). Historical measures, weights, calendars & moneys of all nations. And an analysis of the Christian, Hebrew and Muhammadan calendars (with tables up to 2000 A.D.). Chicago: Ares. isbn 0890052816

E. G. Ellis: Measurements of the rheological properties of lubricating greases - K. Weissenberg: Geometry of rheological phenomena - Wilfred W. Barkas: The anisotropic elastic properties of wood - E. Orowan: Mechanical testing of solids - L. R. G. Treloar: The meaning and use of certain rheological terms.
Catalogue of monographs, pamphlets, reprints and journals of the British Society of Rheology’s SCOTT BLAIR COLLECTION. doc
Springerlink Journal: Rheologica Acta

W. Kula (1986). Measures and men. Princeton University Press. isbn 0691054460 info

M. Aimé Witz (1883). Cours de manipulations de physique, préparatoire a la licence. Paris: Gauthier-Villars. online

Donald Laming (1997). The measurement of sensation. Oxford University Press. isbn 0198523424 info

This is about subjective estimates of the strength of stimuli etc. It turns out to be a rather hectic research topic, unlike what Stevens suggested in his 1957 study. What interests me is that these are estimates of stimulus strengths where the physical properties of the stimuli are exactly known. That makes it an interesting model for subjective probabilities and subjective utility, as dealt with in the general test model. (Fechner - Stevens - sensory discrimination)

A. F. P. H. Bloemen & A. D. Mesritz (1946). Electrotechniek. Electrische meetinstrumenten en meetschakelingen. Technische Uitgeverij H. Stam.

Schermerhorn, Van Steenis & Wagenaar (1982). Landmeten en waterpassen. Leerboek voor het onderwijs en de praktijk.

Alfred W. Crosby (1997). The measure of reality. Quantification and Western society, 1250-1600. Cambridge University Press. isbn 0521554276

R. Rentenaar (Uitg.) (1971). Van Swindens vergelijkingstafels van lengtematen en landmaten. Centrum voor landbouwpublikaties en landbouwdocumenten. isbn 9022003523

H. K. Roessingh (1969). Gelderse landmaten in de 17e en 18e eeuw. Bijdragen en Mededelingen van het Historisch Genootschap, vol. 83, 53-98. Wolters. Hardbound.

Clark Blaise (2000). Time Lord. Sir Sandford Fleming and the Creation of Standard Time. Weidenfeld & Nicolson. isbn 029784136X

Gerhard Dohrn-van Rossum (1996). History of the hour. Clocks and modern temporal orders. Chicago: University of Chicago Press. [Original: Die Geschichte der Stunde: Uhren und moderne Zeitordnungen. München: Carl Hanser Verlag, 1992.] isbn 0226155102

J. M. Verhoeff (1983). De oude Nederlandse maten en gewichten. Meertens-Instituut. isbn 907038907X

John P. A. Ioannidis (2005). Why Most Published Research Findings Are False. PLOS Medicine open access

January 10, 2016 \ contact ben apenstaartje benwilbrink.nl

http://www.benwilbrink.nl/projecten/measurement.htm