Choosing students: Higher education admissions tools for the 21st century
edited by Wayne J. Camara and Ernest W. Kimmel
Lawrence Erlbaum Associates, 2005. isbn 0-8058-4752-9. questia
reviewed by Ben Wilbrink
This 2005 volume is easily the most up-to-date and comprehensive description of the American 'system' of admissions to higher education. Critical notes on the functioning of that system are not missing from its chapters, yet the overwhelming impression is that these authors regard the American system as the best system there is, and that the many problems perceived can be solved by making the system and its tests et cetera 'work better.'
The problem that is not discussed - or even recognized - in the book is that institutions competing with each other for the 'best' students use their selectivity itself as a major instrument to attract students. In combination with the American tendency to define educational quality as selectivity, the analysis of the admissions process runs the risk of running in circles. See the Brewer et al. (2002) book (mentioned at the end of this page) on the real competitive processes behind admissions in American higher education.
Lee C. Bollinger: Competition in higher education and admissions testing.
Bollinger, a university president, provides a reflective kickoff. The competition in admissions has hardened considerably in recent years. Institutions get more selective; students and their parents invest more in getting into the 'best' colleges or universities. It is a rent-seeking game: invest in the admission process, and once admitted, gather the fruits of your investment. The resulting anxiety about admissions results especially in blaming the SAT I as the culprit. The SAT is sold by the College Board as an 'aptitude' test. The problem is, it contains many items of the kind otherwise found in intelligence tests (analogies, for example). The College Board only fairly recently admitted that coaching for the test is possible (a little), at the same time stressing that good instruction should enable students to get fine scores (very much). This kind of bullshitting does not help in making the use of instruments like the SAT acceptable.
Bollinger would like admissions to be less narrowly determined by scholastic achievement. He proposes giving more emphasis to other kinds of talents as well, in order to achieve more diversity in the student body (and reduce competitiveness?). Of course, this touches on the debate on affirmative action and recent court cases (the University of Michigan cases). It will surely be a big theme in admissions for many years to come.
The disappointment in reading Bollinger's analysis and proposal is that he takes the selective 'system' itself for granted. Otherwise he might have speculated on the kinds of measures, procedures and instruments that could contribute to an evolution in the direction of 'open door' policies. After all, there is nothing specifically 'American' about its selective admissions system; the only freedom of choice students have is the choice of where to apply for admission, not much of a choice anyway for most people. The real freedom of choice is on the institution's side; the institutions really are free to pick and choose, and they are highly secretive about how the real choices are made by their admissions officers (Patricia Conley researched this) (no, not about their philosophies, this PR is very important, of course). In contrast, in the Netherlands students really do have freedom of choice on entering higher education, barring a few studies where a special law restricts admission to a fixed maximum number of applicants, using a combination of scholastic achievement and lottery (numerus-fixus studies are, for example, medicine and psychology).
Robert H. Frank (2001). Higher Education: The Ultimate Winner-Take-All Market? Cornell Higher Education Research Institute, Working paper WP 2. http://www.ilr.cornell.edu/cheri/wp/cheri_wp02.pdf [broken link? 1-2009]
- An edited version published in M. Devlin and J. Meyerson ed. Forum Futures - Exploring the Future of Higher Education - 2000 Papers (San Francisco: Jossey-Bass, 2001).
[the last paragraph:]
No university, acting alone, can escape the powerful logic of the positional arms race. Yet there remain compelling ethical reasons both for limiting the escalation in the cost of acquiring higher education and for basing financial aid more heavily on need than on merit. Indeed, the growth in income and wealth inequality caused by spreading winner-take-all markets makes the case for cost containment and need-based aid more compelling than ever. But such goals can be met only through collective action. Positional arms control agreements may be the only practical way to keep higher education within reach for the average American family. To resist such agreements on the grounds that they are anti-competitive would make sense only if the market for higher education were just like the market for an ordinary private good or service.
Robert Laird: What is it we think we are trying to fix and how should we fix it? A view from the admissions office
The somewhat mystifying title covers a chapter on admissions standards - yes, think of 'standards' in the way the American Psychological Association uses it in its 'Standards for Educational and Psychological Testing' - that may come to rule admissions processes in the 21st century.
For European readers it may come as a shock that Robert Laird begins by reminding his audience that one of the most selective institutions, Berkeley (approximately 25% admitted), only instituted selective admissions in 1973. Like most public institutions, Berkeley is only now beginning to consider applications on an individual basis, quite in contrast to private institutions, where applicants' forms are as a matter of course assessed individually.
A number of court cases, and especially a set of guidelines from the Office for Civil Rights, 'The Use of Tests as Part of High-Stakes Decision-Making for Students: A resource guide for educators and policymakers' (2000) (pdf), are summarized by Laird in this set of ten standards:
- An institution should have a clear statement of purpose for its admission policy
- An institution should use appropriate criteria in its admission process
- If an institution assigns formal weights or values to its admission criteria, it should be sure that such weights or values are reasonable
- An institution should regularly evaluate its individual criteria and its aggregated criteria to determine if they are achieving the goals attributed to them and to the admission policy in the statement of purpose
- An institution should not use formulas or rigid raw numeric cut-offs in its process
- An institution should read as many individual applications as possible
- An institution should consider an applicant's context in assessing his or her achievements
- An institution should learn all it can about the high schools that provide its applicants
- An institution should track the performance of the graduates of each high school who enroll at that institution and consider that information in evaluating applicants
- An institution should develop, to the extent possible, verification procedures for information supplied by applicants
Well, this is quite an impressive list. Maybe it looks more impressive than it amounts to in practice. For an example of a highly selective admissions procedure in the Netherlands - for its Police Academy - that is in accord with most of these standards, see Wilbrink and others (1990) html. In this selection procedure, however, a lot of information that Laird and his colleagues would like to consider in admissions was deliberately not solicited. That is the problem, isn't it? Considering every piece of information that might be relevant, and looking into its validity as well, will make the process very costly and will stretch it in time. Laird is very aware of the problem, but court rulings will not allow institutions to use budgetary constraints to argue that the standards are too high for them, at least not if race or ethnicity is part of the information considered.
These standards are very commendable, but they will not solve the basic problem in the American system - the artificial scarcity of places - only mitigate it. I wonder how a lottery procedure of the kind we use in the Netherlands (Hofstee, 1983 [html]) would compare to these standards, and to the costly admissions process Laird and his fellow admissions officers are committed to.
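A weighted lottery of the kind Hofstee argues for can be sketched in a few lines of code. The grade bands and weights below are purely hypothetical, not Hofstee's actual proposal; the point is only that every qualified applicant keeps a nonzero chance of admission, while better school results improve the odds - no rigid cutoff, no file-by-file deliberation.

```python
import random

def weighted_lottery(applicants, n_places, weight_fn, rng=None):
    """Draw n_places winners without replacement; each remaining
    applicant's chance is proportional to weight_fn(applicant)."""
    rng = rng or random.Random()
    pool = list(applicants)
    winners = []
    while pool and len(winners) < n_places:
        weights = [weight_fn(a) for a in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        winners.append(pool.pop(pick))
    return winners

# Hypothetical bands on the Dutch 10-point scale: a higher
# secondary-school GPA draws better odds, but no band is excluded.
def band_weight(applicant):
    gpa = applicant["gpa"]
    if gpa >= 8.0:
        return 3.0
    if gpa >= 7.0:
        return 2.0
    return 1.0

applicants = [{"id": i, "gpa": 6.0 + (i % 5)} for i in range(100)]
admitted = weighted_lottery(applicants, 20, band_weight, random.Random(42))
print(len(admitted))  # 20
```

Compared with the ten standards above, such a procedure trivially satisfies the demands for a clear statement of purpose and reasonable weights, at a fraction of the cost of individual file reading.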
This chapter by Robert Laird is quite impressive. And it goes very much against the grain of the thinking of European policymakers on educational selection. Maybe Laird can explain to the Dutch Minister of Education that using a simple cutoff on an intelligence test as a criterion for admission to special education is a procedure that in the United States would not survive the first day of its introduction?
Atkinson, R. C. (2001). Standardized tests and access to American universities. html
Hofstee, W. K. B. (1983) The case for compromise in educational selection and grading. In S. B. Anderson and J. S. Helmick (Eds) On educational testing (p. 109-127). San Francisco: Jossey-Bass. [html]
The Fairness and Accuracy in Student Testing Act, the act that never made it through (Bush's) Congress. Senator Paul Wellstone died in a plane crash in 2002, and his act is not even available in an online file.
Ernest W. Kimmel: Who is at the door? The demography of higher education in the 21st century
Kimmel would like to know the demography of higher education in, say, 2012, and mentions several developments and theories (predicted developments), among them demographic ones, influencing enrollments and their quality. This resembles an exercise Jaap Dronkers and I undertook in 1993, to identify factors influencing the growing enrollment in (higher) education html. Using the 1993 study, Kimmel would have been able to formulate his conclusions more pointedly, but otherwise they would not be different.
The projected growth of enrollment in higher education is in absolute numbers (only?), is unevenly distributed among states (most in the South and West), ethnic groups (Hispanics and Asian-Americans) and income groups (poor and disadvantaged), and is higher for women than for men. The prediction problem is not the demography of the population itself, but the effects of politics. The longer trend is for states to cut budgets for higher education institutions as well as financial aid for students. Recently the cuts have been very large indeed. The institutions, of course, balance their income by 'sharply increasing tuition.' Knowing that enrollment growth is expected to come from students from disadvantaged or otherwise poor backgrounds, the question is how prospective students and their parents will choose in the face of huge financial debts. Kimmel does not provide much of an answer to this question. Knowing his Boudon or Bourdieu, he might have warned that students from poor backgrounds will be very reluctant - compared to better-off students - to accumulate debts, and therefore to seek admission to higher education.
Kimmel mentions another threat to the growing participation of students from poor backgrounds: higher grading standards in high school will probably result in fewer students from these groups graduating from high school.
Kimmel warns that these mechanisms will result in American society getting more unequal than it is already.
American Association of State Colleges and Universities (2003). Student charges and financial aid as used by Kimmel, and the more recent 2005 http://www.aascu.org/student_charges_05/default.htm [broken link? 1-2009] [complete document pdf].
Audrey L. Amrein and David C. Berliner (2002). The impact of high-stakes tests on student academic performance: An analysis of NAEP results in states with high-stakes tests and ACT, SAT, and AP test results in states with high school graduation exams. Educational Policy Research Unit, Arizona State University. http://www.asu.edu/educ/epsl/EPRU/documents/EPSL-0211-126-EPRU.pdf [broken link? 1-2009]. ["This study looked at data from 28 states where high-stakes testing programs are already in place and found no systemic evidence of improved achievement after states implemented high-stakes testing programs. Report and Appendix total 236 pages in two sections"]
Anthony P. Carnevale and Richard A. Fry (2000). Crossing the great divide: Can we achieve equity when generation Y goes to college? Princeton, NJ: Educational Testing Service. pdf.
National Center for Public Policy and Higher Education (2002). Losing ground: A national status report on the affordability of American higher education. For downloads (among others a 9.2 Mb color version) see webpage.
See also documents on the webpages of the National Center for Public Policy and Higher Education NCPPHE.
Julie P. Noble, William L. Roberts, Richard L. Sawyer (2006). Student Achievement, Behavior, Perceptions, and Other Factors Affecting ACT Scores. ACT Research Report 2006-1. pdf
Wayne J. Camara: Broadening criteria of college success and the impact of cognitive predictors
This is an introductory chapter. Remarkably, Camara reports that most institutions do not study the predictive validity of their admissions process, and are rather reluctant to cooperate in validity studies by third parties. To ameliorate the situation somewhat, "The College Board, as well as ACT and other organizations that conduct admissions testing, offers free validation studies to institutions." (p. 57). This, of course, is no charity, for test developers have the obligation to study the validity of their instruments.
Interesting references used by Camara are the volume edited by Samuel Messick, and the "work sponsored by the U.S. Army Research Institute for the Behavioral and Social Sciences (ARI) from the early 1980s to the mid-1990s known collectively as Project A." (Campbell and Knapp, 2001)
S. Messick (Ed.) (1999). Assessment in higher education: Issues of access, quality, student development and public policy. Erlbaum.
J. P. Campbell and D. J. Knapp (Eds) (2001). Exploring the limits in personnel selection and classification. Erlbaum. (appr. 600 pp., reissue paperback, not available in public libraries in the Netherlands)
Wayne J. Camara: Broadening predictors of college success
This chapter is also rather introductory. In a way both chapters by Camara introduce the reader to the one and only paradigm the contributors to this volume have in mind: that of personnel selection. In more than one place in this review I will have the opportunity to explain that the personnel selection paradigm is not the self-evident paradigm for admissions to higher education. At least in the Netherlands it is not regarded as such in expert opinion on admissions issues related to the numerus fixus legislation (restricted numbers of places available in a small number of studies, particularly medical ones). Of course, lay opinion assumes the personnel selection paradigm to be valid in admissions to higher education as well. As indeed it may be in special cases, one of which is the highly selective admission to the national police academy NPA. This academy is not in the public education domain; in fact it is an in-service training otherwise comparable to non-university higher education studies in the Netherlands. An evaluation study of its admissions process is available (Wilbrink and others, 1990 html). In this review I will use this study as an example of admissions according to the personnel selection paradigm, and contrast it with an alternative paradigm that is valid in educational transitions, and is explained in, for example, the work of Alexander Astin (and many others, mainly outside the U.S.).
Paul R. Sackett: The performance-diversity tradeoff in admission testing
The big dilemma in admissions - anywhere around the globe - is that ethnic groups might differ in mean scores on admissions tests, and in many cases do. Please note that these tests do not suffer from predictive bias, so bias is not the problem here. Sackett explains the dilemma in clear language - the diversity in the student population preferably should reflect that in the population at large, but using admission tests alone will not accomplish this - and looks into three possible approaches to soften the dilemma. The first two prove to offer no solution: modify procedures (including item selection procedures), and substitute new tests for the ones used now. The third approach is to 'supplement existing tests with new measures.' Sackett shows that theoretically the third approach can change things only by a rather small amount, against high costs. Therefore Sackett must conclude, after having tried to square the circle looking for possibilities to make tests both more selective and less divisive:
There is no readily available or complete solution to the performance-diversity tradeoff; the hope is that this chapter will contribute to a clearer picture of what can and cannot be achieved through the use of various proposed methods.
Sackett surely is right about the dilemma, and has painted a clear picture of the limited possibilities to soften it. I might add that the dilemma will not go away by changing the system into a less competitive one as far as admissions are concerned. It would, however, substantially change in character if the criterion for admission were something like 'human capital added.' Alas, human capital theorists never seem to think of doing research allowing causal - instead of correlational - interpretations of the factors resulting in - preceding - growth of (personal) human capital.
I have one nit to pick. Look at the next citation (p. 110).
... decreasing emphasis on the use of tests in the interest of achieving a diverse group of selectees often results in a substantial reduction in the performance gains that can be recognized through test use ... .
This proposition rests on the strong - but implicit - assumptions 1) that predictive validity is highly specific to the institution involved (it almost never is) and 2) that the loss to the institution is a loss to society (which it is not). If all institutions were to decrease emphasis on the use of tests, no loss to any institution or to society at large would result. Of course, the situation might - on the institutional level - be one of a prisoner's dilemma. Force a productive solution by organising countervailing powers.
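The tradeoff Sackett analyzes can be illustrated with a toy Monte Carlo simulation (all numbers hypothetical: a one-standard-deviation score gap, a 20% minority group, a 10% admission quota). Strict top-down selection maximizes the mean test score of the admitted group but nearly empties it of minority applicants; a lottery among all applicants above a modest floor trades test-score 'performance' for diversity.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 20% group B, scoring on average one standard
# deviation below group A (the kind of gap Sackett discusses).
people = []
for i in range(10_000):
    group = "B" if i % 5 == 0 else "A"
    mean = -1.0 if group == "B" else 0.0
    people.append((rng.gauss(mean, 1.0), group))

def admit_topdown(people, quota):
    """Strict top-down selection on the test score."""
    return sorted(people, key=lambda p: -p[0])[:quota]

def admit_lottery_above(people, quota, floor, rng):
    """Random draw among everyone above a modest score floor."""
    eligible = [p for p in people if p[0] >= floor]
    return rng.sample(eligible, quota)

quota = 1000  # admit the top 10%
strict = admit_topdown(people, quota)
relaxed = admit_lottery_above(people, quota, -0.5, rng)

def share_b(selected):
    return sum(1 for _, g in selected if g == "B") / len(selected)

def mean_score(selected):
    return statistics.mean(s for s, _ in selected)

print(f"top-down: B share {share_b(strict):.2f}, mean test {mean_score(strict):.2f}")
print(f"lottery:  B share {share_b(relaxed):.2f}, mean test {mean_score(relaxed):.2f}")
```

Even the lottery variant does not reach the 20% population share, because the floor itself already excludes more of group B - which is exactly Sackett's point that no method fully resolves the tradeoff.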
Warren E. Willingham: Prospects for improving grades for use in admissions
While reading the article, amazement about one obvious omission rises steadily. It is the complete absence of even the slightest doubt about the rationality of competitive selection as the means of allocating students to the places available. An old fox like Willingham should not fail to pose the validity question in a way unusual for psychometricians: how 'valid' is the use of competitive selection procedures against available alternatives? The question probably applies to all the contributions in the volume, but Willingham's happens to be the one I picked to start reading.
Taking the validity of competitive selection for granted, the analysis of the value that the use of grades could add is a somewhat vacuous exercise. In this light, his table 7.1 on the respective strengths of grades and tests looks somewhat capricious. The impression is strengthened by his stressing the narrowness of college GPA as a criterion for selection validity, while at the same time stubbornly clinging to it. The question at the close of the article had better be at its beginning:
How good is the college GPA as a measure of success in college - in adult life?
Willingham has done some quite interesting research on grades as compared to scores on standardized tests, in a cohort study of some 8000 students, started in 1988. I must admit, his analysis is quite perceptive of different kinds of differences between grades and test scores, grades and grades, within and between schools (see also Warren W. Willingham, Judith M. Pollack and Charles Lewis (2002). Grades and test scores: Accounting for observed differences. Journal of Educational Measurement, 39, 1-38). It is their contemplated high-stakes function that worries me, as it does Willingham for that matter. He succeeds in giving the impression that a more humane procedure of allocating places in higher education is what is needed, but fails in articulating its urgency and its possibilities.
I must make one more comment, on the Dutch system of standardized examinations. The Dutch system would not be the answer to Willingham's quest for standardized grading. The Dutch 'standard' is that of equal content and difficulty only. For most students, the examination result is first and foremost important for obtaining the diploma; there is no high stakes function of the examination results in admission to higher education. The one important exception is limited access (called 'numerus fixus' and regulated by law) to medical education and veterinary studies, where grade point average plays a major role.
Robert L. Linn: Evaluating college applicants : some alternatives
Linn, among other things the editor of the prestigious third edition of 'Educational measurement,' starts by reminding his readers that
"In fact, of course, most colleges are not very selective. Indeed, many colleges will accept any high school graduate."
The alternatives prove to be alternatives to the exclusive use of high school grades, and they are standardized tests like the SAT I. This is not surprising, and neither is his useful treatment of four groups of standards for these tests: appropriateness of content, prediction, fairness, and consequences of test use. A test like the SAT I, as an intelligence test (see the chapter by Sternberg), is weak on the aspect of appropriate content: being high on intelligence is not bad, but the business of education is not to boost intelligence. The SAT II tests are matched to curricular content in high school, and therefore appropriate in content. The risk in using tests not appropriate in content is that students - or, for that matter, schools - will not be interested in high achievement as such, because it does not seem to contribute much to results on the SAT I, no matter the claims of the College Board to the contrary. Prediction is the song sung in almost every chapter in this volume. Consequences of test use are taken by Linn - and others - to be institutional ones only: 'costs, benefits, practical considerations, and side effects,' not those for the individual test takers. On that last point, read Hanson's 'Testing testing' or Lemann's 'The big test' instead of Linn's chapter. Fairness also gets the institutional treatment: a lot of talk of fairness to ethnic or racial groups, a major problem in American society, as it is in almost any society. Again: useful, but not quite insightful.
The problem here is - as elsewhere; I won't blame Linn for it - that so much about admissions is taken for granted that the very basic questions never get asked. Instead of calling in the troops of John Rawls, to get the right questions asked about the justice of the distribution of positional goods in society, authors start firing off their endless details about standardized tests, predictive validity, diversity and the ramifications - Supreme Court cases et cetera - of affirmative action. To the perceptive European reader it is not difficult to see the meritocratic (Young, 1958) assumptions behind the admissions game as it is played in the United States. Young's essay has made it clear that meritocracy is not the best friend of democracy. Less dramatically, it is clear that Linn and his colleagues in this volume are hiding behind petty arguments about predictive validity and alternatives to the current admissions tests - and all the rest of it - in order to avoid publicly discussing the American myth of meritocracy. Not every American scientist is scared to do so; Alexander Astin isn't, John Rawls isn't; it is time the educational measurement community woke up to reality.
Michael Young (1958). The rise of the meritocracy 1870 - 2033. An essay on education and equality. London: Thames and Hudson. See for a recent treatment the online paper of Robert M. Hauser (1998). Meritocracy, Cognitive Ability, and the Sources of Occupational Success. pdf
F. Allan Hanson (1993). Testing testing. Social consequences of the examined life. Berkeley: University of California Press. It is (partly) available online html
Robert J. Sternberg and the Rainbow Project collaborators: Augmenting the SAT through assessments of analytical, practical, and creative skills
Sternberg, of course, is the star researcher in the field of human intelligence (interview). (Gardner is another one, not quite in the same league, but working together with Sternberg on 'tacit intelligence.') Well, yes, the SAT in the eyes of Sternberg is an intelligence test (see Bollinger, the first chapter); "a 3-hour examination currently measuring verbal comprehension and mathematical thinking skills". It surely is a fantastic instrument to predict success in almost anything, including success in college. That is not a particular merit of the SAT: any other qualified intelligence test will do the same trick. This predictive validity of intelligence tests is a global finding; it would apply in the Netherlands as well. Happily, the Dutch need not have their souls searched by intelligence tests when applying for higher education. The rare exception is the Dutch Police Academy; its admission procedure was evaluated in 1989, using the known predictive value of intelligence tests.
The Sternberg chapter reports on an experiment to add other 'intelligences' to the SAT, in order to achieve better predictability as well as better equity. Both have been achieved, which is a highly remarkable result, and I have no argument with it, except this: equity being a problem in the use of instruments like the SAT in the admission process, it is not clear how tinkering with the SAT to make it 'more equitable' makes its use more acceptable, especially not if this 'tinkering' also results in 'better' predictability and therefore a further restriction of the freedom of choice students should have (or shouldn't they?). Back to the 'intelligences': what are these other 'intelligences'? They are called practical intelligence and creative intelligence, and are well known from Sternberg's research on intelligence. It is certain they broaden the concept of intelligence, even of tested intelligence.
I suspect Sternberg considers adding these intelligences to the admission process an option that should be taken seriously, knowing full well that the time costs of extra testing will be considerable. As such, the Sternberg proposal boils down to strengthening instruments, without reflection on the justification of this kind of admission procedure itself. It's a pity we do not hear Sternberg's reflection on the rationality of the American admissions tradition, which - by the way - in its current form is not much older than half a century.
Rebecca Zwick (Ed.) (2004). Rethinking the SAT. The future of standardized testing in university admissions. RoutledgeFalmer questia
- among others Nicholas Lemann: A history of admissions testing.
William E. Sedlacek: The case for noncognitive measures
Sedlacek is one of the authors of the Noncognitive Questionnaire (NCQ). Therefore this chapter discusses the noncognitive measures in the NCQ, being:
understanding and dealing with racism
preferring long-range goals
availability of a strong support person
successful leadership experience
nontraditional knowledge acquired
This is a highly remarkable list, containing just the kind of variables that an educational system worthy of its name would count among its goal variables. The educational philosophy of Sedlacek and his colleagues must be that one should prefer to admit to the most selective institutions the students who have already attained these important educational goals. Do not misunderstand me; this might be just the right thing to do, but Sedlacek does not make it clear why that should be so. In the meantime I would prefer the reverse objective: the better schools should preferably select the students needing their better education. The intensive care unit of a hospital should admit (selectively, if need be, for example in the case of a major catastrophe with many casualties: by triage) those patients standing to profit from this kind of intensive care, and from this kind of intensive care only. Sedlacek and his NCQ do not even come close to achieving this. Reporting correlations with later GPA is, of course, meaningless, because these correlations bear no clear relation to the 'value added' by the institution, as contrasted with the 'value selected for' in its admission process.
I regard the use of this kind of noncognitive measures in admissions to institutions of higher education as highly controversial. There is no evident relation to constraints in the kind of societal functions open to the alumni of these institutions. An illustration of such a constraint would be body length for admission to pilot training in the air force, or mental stability in admission to a police academy.
Another kind of problem in the use of these and other kinds of noncognitive measures (biographical, physical, motivational) in admissions is the arbitrary character of whether or not such measures will be used, and if so, which ones, and in which ways. In the American cultural tradition it might seem self-evident that it is the prerogative of the selecting institution to make such choices, but even American court cases make it very clear that such is not the case at all.
Again it is the assumption that better ways should be found to select candidates for selective institutions that is the stumbling block. These 'better' ways aggravate the problems in the admissions process. Why not look for ways to reduce the selectivity of those admissions processes, beginning with the iviest of Ivy League institutions: Harvard University? One way to do so would be to step up self-selection, by heightening the intellectual demands of the courses offered. Other possibilities will come to mind easily. Some of them will temporarily cause other kinds of trouble than that caused by selective admissions itself, but then what else would one expect when one changes market forces?
Patrick C. Kyllonen (sept. 2005). The case for noncognitive assessments. ETS. pdf
Neal Schmitt, Frederick L. Oswald, and Michael A. Gillespie: Broadening the performance domain in the prediction of academic success
The goals of (higher) education, of course, are much broader than the GPA used in much research on prediction and admissions. Schmitt, Oswald and Gillespie therefore try to develop a test battery tapping more goals - the more realistic criteria of academic success - as well as instruments to predict them better. One of those instruments uses a kind of in-basket test: "how an applicant might react in different relevant contexts," another taps biographical data.
The authors identify the following dimensions of college performance (you might compare them to Sedlacek's noncognitive measures above):
knowledge, learning, mastery of general principles
continuous learning, intellectual interests, and curiosity
artistic and cultural appreciation and curiosity
multicultural tolerance and appreciation
social responsibility, citizenship, involvement
physical and psychological health
adaptability and life skills
ethics and integrity
The chapter reports the empirical results assembled by the authors on predictive validities et cetera, using the traditional design of tests predicting criteria. There is nothing wrong with that, as long as the tests and the criteria have been chosen wisely, and isn't that just what the chapter is about?
The problem here is amazingly simple. In the Netherlands some seven years ago the first league tables for secondary education were published by the daily paper Trouw. Thanks to an intervention by sociologist Jaap Dronkers the article also contained data on the input characteristics of students in these schools, thereby providing some insight into the value added by these schools. There is a less than perfect correlation between position in the league table - based on outcomes only - and the value added measure. The same point has been made by Alexander Astin regarding the American myth that, for example, the educational quality of Ivy League institutions would be so high compared to that of other institutions: there is no empirical evidence whatsoever that their value added is higher than that of other institutions, and - as every American reader should know - Astin has collected a tremendous amount of information on American higher education institutions (for example, his 'What matters in college?' studies the variables related to the value the institution adds to the human and social capital of its students).
Therefore, what Schmitt, Oswald and Gillespie fail to do is to provide some insight into how institutions differ in value added on their list of criteria, and into how the admissions tests they propose predict differences in value added. In personnel selection the authors would not have made the same omission; there the psychologist uses her test battery to predict differences in the value the candidates would contribute to corporate results. I have written a simulation program to study the effects of complex selection procedures on corporate results; look there to get an impression of what this is about (pdf).
Patricia M. Etienne and Ellen R. Julian: Assessing the personal characteristics of premedical students
This is about admissions to medical schools, to graduate education. The situation lends itself to comparison with admissions to medicine studies in the Netherlands, the only difference being the age of the applicants: in the Netherlands applicants typically are 17 or 18 years old.
The Association of American Medical Colleges (AAMC) has begun to develop tests of personal characteristics as an additional component of the Medical College Admission Test (MCAT).
The MCAT tests verbal reasoning - a bit of intelligence -, writing, and the biological as well as the physical sciences - a bit of 'appropriate content'. Typically the MCAT, together with undergraduate GPA, plays a major role in the selection of applicants. The problem here - and it has been bothering the AAMC too, since the 1960s - is that future physicians get selected on characteristics that are not particularly appropriate for the caring professions. The AAMC wants to give more weight in the admissions process to characteristics deemed desirable for practising physicians. This is exactly the kind of argument that plays a major role in the Netherlands as well, in discussions about the admissions process for medicine. In the Netherlands medicine is one of the fields of study where the number of places available has been limited by law, and where the institutions have limited freedom to design their own admissions procedures.
The AAMC has conducted research on personal characteristics critical for functioning in the workplace. The result is a catalogue of behaviors for students and residents:
Shaping the learning experience (students)
Extra effort and motivation
Technical knowledge and skill
Self-management and coping skills (students)
Interpersonal skills and professionalism
Interacting with patients and families
Fostering a team environment (students)
Mentoring and educating medical students (residents)
Maintaining calm under pressure (residents)
What can one say of these items? It looks like the result of a kind of job analysis in which the researchers forgot about the job and looked at personal behavior in a very broad sense. Why not use a personality test, a Big Five test for example, instead? Well, the reason no standardized personality test can be used here, according to the authors, is the risk of the test getting into the public domain, which would invalidate it rather effectively. The kind of admissions test needed is like the SAT I, an intelligence test whose items continuously get refreshed. It must be possible to construct many parallel forms of such a personal characteristics test.
The demarcation of personality and skill is not at all clear, though. Many of the items in the list above look like personality characteristics. At the same time they are skills in the medical context, in a job context. The list would do a good job serving as a list of educational goals, for medical school, or any school for that matter. It baffles me completely how one can use a test of skills to select for an educational track that should train these very same skills. This looks very much like hospitals in an emergency situation instructing their triage officers to admit only the healthiest catastrophe victims, instead of giving absolute priority to victims needing immediate surgery in order to survive.
Well, test development is not easy here, so the AAMC has picked the development of a listening skills test for starters. I do not understand why one should want to develop such a test for admission, instead of for use in the instructional process itself or as a summative test for it. This might just be a fine test to use in the latter way! Dutch readers: see the dissertation of Gertrude Smit (1995) on the construction of this kind of instrument. All readers: see, for example, Pisters, Bakx and Lodewijks (2002), Multimedia Assessment of Social Communicative Competence. html.
One bit of information I am missing in this chapter is whether or not the number of places in medical schools is strictly limited state- or nationwide. In Linn's words: is every medical school (highly) selective, or are the (highly) selective medical schools only a (small?) proportion of all medical schools? Speaking of ethics (number 6 in the above list ....), it does make a difference which situation in fact obtains. In the Netherlands the number of places available nationally is strictly limited by law; a 'numerus fixus' has been set ('numerus clausus' would have been the better name, as used in Germany). The government determines who is to be admitted, and who is not. The way the government regulates admissions is not ethically neutral, and not economically neutral either, for that matter. On this matter there has been fierce debate since the early 1970s. How can one, or should one, treat differently two applicants who both want to study medicine, the one having a brilliant GPA, the other being merely mediocre? I will not give any answers here; for a survey of the discussion, in Dutch, see my (1997) html; for a description of the use of a lottery method (in English) see Hofstee (1983 [html]).
W. K. B. Hofstee. (1983). The case for compromise in educational selection and grading. In S.B. Anderson and J. S. Helmick (Eds) On educational testing (p. 109-127). San Francisco: Jossey-Bass. [html]
W. K. B. Hofstee (1990). Allocation by lot: a conceptual and empirical analysis. Social Science Information, 29, 745-763. [pdf]
Bea Pisters, Anouke W. E. A. Bakx, Hans Lodewijks (2002). Multimedia Assessment of Social Communicative Competence. International Electronic Journal for Leadership in Learning, Volume 6, Number 1. html.
Gertrude N. Smit (1995). De beoordeling van professionele gespreksvaardigheden. Constructie en evaluatie van rollenspel, video- en schriftelijke toetsen. Baarn: Nelissen. Proefschrift RU Groningen.
Ben Wilbrink (1997). Opsomming van de discussie over toelating bij numerus fixusstudies. In: Gewogen loting gewogen. Advies van de Commissie Toelating Numerus Fixusopleidingen, Bijlage, 121-203. [208k html]
For a listing of publications on lotteries as a method in admissions and in the distribution of scarce goods in general, see my
On dental education, see [not online available]:
John T. Mayhall (1990). Aptitude testing and the selection of dental students. Australian Dental Journal, 35, 548.
M. H. Spratley (1990). Aptitude testing and the selection of dental students. Australian Dental Journal, 35, 159-168. "... it is apparent that many questions remain unanswered."
In pharmacy also a communication skills test is seen as a gift from selection heaven:
Janet Jones, Ines Krass, Gerald M. Holder and Rosalie A. Robinson (2000). Selecting Pharmacy Students with Appropriate Communication Skills. American Journal of Pharmaceutical Education, 64, 159-168. pdf
Peter J. Pashley, Andrea E. Thornton, and Jennifer R. Duffy: Access and diversity in law school admissions
Camara and Kimmel have given this chapter away to the Law School Admission Council (LSAC, lsac.org) to advertise itself and its test, the Law School Admission Test LSAT. At least they are honest about it, I love that.
This chapter begins by reviewing the LSAC's commitment to diversity. It then turns to the impact of the LSAT on admissions and the efforts to promote good admission practices. Research on a novel approach to the selection process is described before a consideration of future directions for the LSAC.
Law school is at the graduate level. In the Netherlands law students typically enter at 17 or 18 years of age. Another difference is that in the United States the number of places available nationally is limited (artificially? under pressure of associations of 'workers' in the field of law? See Longley (1998) for some numbers), while in the Netherlands there is no limitation. Artificial scarcity in the U.S. might have as an (intended?) consequence that the study of law is a high-status study, while in the Netherlands many school leavers, not quite knowing what direction to take next, apply for law studies - a golden opportunity for the departments involved; regrettably, they have never seized this opportunity to offer their less motivated students a stimulating educational experience.
Maybe another consequence of the artificial scarcity of places in law school, combined with the use of tests like the LSAT that mainly measure verbal abilities, is that the population of admitted students does not reflect that of the nation at large as regards the representation of ethnic groups. The LSAC's 'novel approaches' mentioned above aim at improving the diversity of the law school population by complementing the use of the LSAT with other measures and practices. The urgency of such policies is recognized by all stakeholders, including the Supreme Court, by the way. The ethnic diversity situation is quite dramatic.
Unfortunately, the LSAT, like most standardized tests, routinely yields disparate average scores across racial and ethnic subgroups. For example, the average difference between scores from African American and White test takers has been approximately one standard deviation (Dalessandro & Stilwell, 2002).
To illustrate the enormous difference implied by 'one standard deviation', I have constructed the two artificial score distributions in the figure: their means are approximately one standard deviation apart, each group contains 1,000 cases, the test is assumed to have 120 items, and the cutoff score (for a particular law school, say) is set at 85. Under these conditions, only half of the 'red' group is admitted, while more than 80% of the 'blue' group is admitted. In reality the score distributions will be more spread out than shown here, but then the standard deviation will be higher also, leaving the selectivity difference of more than 30% intact. Note that the higher the cutoff score is placed, the bigger the selectivity difference gets. [The instrument used to construct the figure is an applet available at applets. For the parameter values used to construct the figure, click it.]
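The arithmetic behind the figure is easy to verify. A minimal sketch in Python, assuming normal score distributions and hypothetical parameter values chosen to echo the figure (a standard deviation of 10 points, the cutoff placed at the lower group's mean):

```python
from statistics import NormalDist

# Hypothetical parameters echoing the figure: a 120-item test, two groups
# whose means lie one standard deviation apart, cutoff score at 85.
sd = 10.0                               # assumed standard deviation
blue = NormalDist(mu=95.0, sigma=sd)    # higher-scoring group
red = NormalDist(mu=85.0, sigma=sd)     # group scoring one SD lower
cutoff = 85.0

admit_blue = 1.0 - blue.cdf(cutoff)     # fraction of 'blue' above the cutoff
admit_red = 1.0 - red.cdf(cutoff)       # fraction of 'red' above the cutoff

print(f"blue admitted: {admit_blue:.0%}")   # about 84%
print(f"red admitted:  {admit_red:.0%}")    # exactly 50%
```

Raising the cutoff to 95 would leave 50% of the 'blue' group but only about 16% of the 'red' group above it: the selectivity gap widens with the cutoff, as noted above.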
Therefore, the LSAC emphasizes again and again, for example by doing studies like Wightman's (1998), that the LSAT score (or profile) is only one kind of information; other applicant characteristics should be used as well, possibly in non-traditional ways. One such alternative approach is 'choosing a class rather than individual students,' see Pashley and Thornton (1999). The concept of admitting students in such a way that a certain diversity is present in the admitted class is not new in itself. The technique proposed is also known as an Operational Research approach. The combination of both is radically different, however: applying OR techniques, individual applicants are no longer pitted against each other in the admissions process, but compared directly to the desired class diversity profile and the characteristics available in the pool of applicants. It also makes it possible to use diversity as a criterion without using quotas, a concept that is fiercely contested in many states in the U.S. Assuming selection to be necessary, this surely is a promising approach. Its drawback might be that, because of its abstractness, it may not be quite transparent to applicants.
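To make the 'choosing a class' idea concrete, here is a toy sketch; all names, scores, and the diversity profile are hypothetical, not taken from Pashley and Thornton. Instead of ranking applicants one by one, every feasible class is scored against a target profile and the best class as a whole is admitted:

```python
from itertools import combinations

# Hypothetical applicant pool: (name, test score, background group).
applicants = [
    ("a1", 172, "stem"), ("a2", 168, "stem"), ("a3", 165, "arts"),
    ("a4", 160, "arts"), ("a5", 158, "stem"), ("a6", 155, "social"),
]
class_size = 3
target = {"stem": 1, "arts": 1, "social": 1}  # desired minimum per group

def feasible(cls):
    """A class is feasible if it meets the diversity profile."""
    return all(
        sum(1 for _, _, g in cls if g == group) >= minimum
        for group, minimum in target.items()
    )

# Exhaustive search is fine for a toy pool; real class crafting would use
# integer programming or another Operational Research technique.
best = max(
    (c for c in combinations(applicants, class_size) if feasible(c)),
    key=lambda cls: sum(score for _, score, _ in cls),
)
print([name for name, _, _ in best])   # ['a1', 'a3', 'a6']
```

Note that a2 outscores both a3 and a6 yet is rejected: the top three by score alone (a1, a2, a3) would contain no 'social' member at all, so no ranking of individuals against each other produces this class.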
The big question remains - to the European reader at least - why a relatively cheap education in law (that in medicine is expensive) should need competitive selection. The labor market is artificially (politically?) prevented from doing its corrective work, as it probably also is in the case of medical schools. In the Netherlands it used to be the case in higher vocational education that schools were selective; this selectivity was abolished by law around 1980, exceptions being granted to a small number of schools only (the hotel school, dance, schools of music). No disasters followed this change.
Somewhat bewildering: the LSAC promoting diversity and techniques to accomplish it, while at the same time investing heavily in research on the power of the LSAT to predict first year grade point average (FYA) ... .
In the Supreme Court of the United States. BARBARA GRUTTER, Petitioner, v. LEE BOLLINGER, et al., Respondents. BRIEF OF THE LAW SCHOOL ADMISSION COUNCIL AS AMICUS CURIAE IN SUPPORT OF RESPONDENTS pdf. Grutter v. Bollinger gets mentioned in almost all chapters in the Camara-Kimmel book, nevertheless its index does not mention Grutter, Bollinger, or the Supreme Court, for that matter.
Joe G. Baker (2001). Employment patterns of law school graduates. LSAC Research Report 00-01. pdf
S. P. Dalessandro and L. A. Stilwell (2002). LSAT performance with regional, gender, and racial/ethnic breakdowns: 1995-1996 through 2001-2002 testing years. LSAC Technical Report 02-01. [not available on the LSAC site, too hot to handle?]
Charles Longley (1998). Law school admissions, 1985 to 1995. Assessing the effect of application volume. LSAC Research Report 97-02. pdf
S. W. Luebke, and others (2003). Final report: LSAC skills analysis law school task survey. LSAC Report 02-02. pdf
P. J. Pashley and A. E. Thornton (1999). Crafting an incoming law school class: Preliminary results. Newtown, PA.: LSAC. pdf
Linda F. Wightman (1997). The threat to diversity in legal education: An empirical analysis of the consequences of abandoning race as a factor in law school admission decisions. New York University Law Review, 72 (1), 1-53. pdf
Linda F. Wightman (1998). Are Other Things Essentially Equal? An Empirical Investigation of the Consequences of Including Race as a Factor in Law School Admission. Southwestern University Law Review, 28(1), 1043. pdf
Linda F. Wightman (2000). Beyond FYA: Analysis of the Utility of LSAT Scores and UGPA for Predicting Academic Success in Law School. University of North Carolina at Greensboro. Law School Admission Council Research Report 99-05, July 2000. pdf
Isaac I. Bejar: Toward a science of assessment
Wow, this one is about item writing, and how to change that from an 'art' into a design process! Why the exclamation mark? First, because the chapter's content is so much smaller than its title promises. Second, because it is so right to treat item writing as the backbone of that science of assessment. Third, I wrote a book on item writing by design, in Dutch regrettably, but I will honor anyone's request for a translation of certain parts. The book is online as a pdf file [1.4 Mb] or, chapter by chapter, as html pages.
Now the peculiar thing about Bejar's contribution - peculiar that is to me, probably not to Bejar and his fellow Americans - is that his vision of a 'science of assessment' is limited to institutional assessment of individuals in competition. The term 'institutional' is used here to contrast it with 'individual,' a contrast theoretically already spelled out by Cronbach and Gleser in their path breaking 1957 study Psychological tests and personnel decisions. The term 'individuals in competition' fits the admissions testing framework, at least in its American definition which is competition-driven. Choosing a diametrically different position will result in a 'didacometrics' where individual students play the key role, and achievement criteria will be defined in terms that are 'absolute' as far as individual students are concerned. There is a certain similarity here to mastery learning in contrast to normative assessment, but that similarity will only go so far. In the Netherlands this kind of didacometrics has been proposed and researched by Van Naerssen (1970, in Dutch, abstracts in English) and myself (in progress, in English, providing instruments that you may use yourself in your browser). The work mentioned is proof that a 'science of assessment' is possible along lines radically different from those followed by Educational Testing Service. What does Bejar have to say about the work in progress at ETS?
For starters, it is named Evidence-Centered Design, a name that does not reveal anything of its content. The ETS approach is to modularize the design process in an object-oriented way, meaning that the reusability of specific models etcetera is emphasized. The key model is that of the student's expertise (not Bejar's term, by the way). This reminds one of the student model or expert model in serious programming of courseware and expert systems. In contrast, the approach I have taken in my 1983 treatment of item writing by design is to abandon psychological models of knowledge, in particular also the approach taken by Bloom and his team in the well-known cognitive taxonomy of educational objectives, and to replace them with philosophical approaches such as are to be found in the theory of knowledge (for example, Carl G. Hempel's work). The ETS approach is psychology-driven, and it stands to gain a lot by incorporating insights from the theory of knowledge. One reason for ETS not to choose the philosopher's approach is its clinging to the idea that assessment is (also? mostly?) assessment of intelligence. A lot of research is invested in constructing student models etcetera for analogical and logical reasoning.
Bejar's chapter is highly abstract, which is understandable given the sophisticated techniques - item response theory, natural language processing, Bayes nets - deemed necessary to accomplish this science of assessment. I do not understand, however, why Bejar does not mention how this 'science of assessment' could be in the interest of the primary stakeholders - students and teachers - and why they should accept a 'science' they will not be able to understand even though the quality of their lives will come to depend on it. This particular new science of assessment - brand ETS - does not in any way empower students and teachers. On the contrary: they will understand even less of what is happening to them. In no way is this predictable result of the choices made at ETS an inevitable one: why not use the immense research resources to develop instructional technologies empowering students and teachers directly? Isaac Bejar, what is your answer? Does your colleague Randy Elliott Bennett (2001) present a beginning (How the internet will help large-scale assessment reinvent itself, EPAA vol. 9, html), or does 'reinvent' here mean reinstituting the same in a different format?
Online available examples of the kind of research this chapter discusses:
Bejar, I. I., Lawless, R. R., Morley, M. E., Wagner, M. E., Bennett, R. E., & Revuelta, J. (2003). A feasibility study of on-the-fly item generation in adaptive testing. Journal of Technology, Learning, and Assessment, 2(3). pdf. Or the ETS-Report 02-23 GRE Board Professional Report No. 98-12P pdf.
Sandip Sinharay and Matthew Johnson (2005). Analysis of Data From an Admissions Test With Item Models. Princeton, NJ: ETS. pdf
Chao-Lin Liu, Chun-Hung Wang, Zhao-Ming Gao and Shang-Ming Huang (2005). Applications of lexical information for algorithmically composing multiple-choice cloze items. Proceedings of the 2nd Workshop on Building Educational Applications Using NLP. Ann Arbor. p. 1-8. Association for Computational Linguistics. pdf
Russell G. Almond, Linda S. Steinberg, and Robert J. Mislevy (2003). Enhancing the Design and Delivery of Assessment Systems: A Four-Process Architecture. Journal of Technology, Learning, and Assessment, 1. pdf
Randy Elliot Bennett (2004). Inexorable and Inevitable: The Continuing Story of Technology and Assessment. Journal of Technology, Learning, and Assessment, 1. pdf. Also in Dave Bartram and Ron Hambleton (Eds) (Oct. 2005). Computer-Based Testing and the Internet: Issues and Advances. New York: Wiley.
Robert J. Mislevy, Russell G. Almond, Duanli Yan, and Linda S. Steinberg (2000). Bayes Nets in Educational Assessment: Where Do the Numbers Come From? CSE Technical Report 518 pdf.
Stanley Rabinowitz: The integration of secondary and postsecondary assessment systems: cautionary concerns
In recent years there has been an enormous growth in the number of tests used in education in America, particularly so at the end of high school, most of it high-stakes testing. Rabinowitz is not the only one to express concern about this development (Linn, 2001). Many parties now are looking for opportunities for efficiency, the most important of which seems to be to use tests that serve multiple purposes in secondary (exit exams) as well as in postsecondary (admissions) education. Rabinowitz calls attention to the somewhat mindless addition of one kind of testing program after another, a very big one being that of the No Child Left Behind Act.
Rabinowitz's analysis, however, stays very much at the surface of these developments. He tends to list scores of particular difficulties that research should help to tackle and overcome. What is missing is a comprehensive vision of the general problem of transition from one educational system - secondary education - to another - post-secondary education - or between education and the labor market, which poses in many respects a similar kind of problem.
Comparison to the situation in other countries is lacking, as it is in almost all of the chapters in this book. In the Netherlands, for example, the transition between secondary and post-secondary education is rather clear-cut in comparison to that in the States. There are state-mandated end-of-course examinations that also provide access to post-secondary education. This is not to say that the Dutch situation is without its own problems, but in comparison they seem to be in another league. The double function of the state-mandated examinations is somewhat problematic, recently giving rise, ironically, to serious concerns about declining standards in secondary education. The problems in the Netherlands, however, are clearly perceived to be problems in the educational process itself, not in the kinds or techniques of testing as Rabinowitz has it.
In the paragraph on 'required features of statewide assessment systems' the narrow perspective Rabinowitz takes is most clearly present. These features are of one kind only, psychometric ones: reliability and validity. Tests should meet high standards (AERA/NCME/APA) on these criteria. In itself this demand is perfectly legitimate (leave out the reliability concern, however; it will take care of itself if validities are proven to be all right). But what is highly questionable is the suggestion that the problem lies in the psychometric qualities of the tests, instead of in the exuberant use of standardized tests itself. The chapter opened on a high note, claiming that the latter really is the problem, but here Rabinowitz clearly falls back on piecemeal tinkering with the quality of tests as the way out of the problems. It can't be denied that there are many promising developments here, can it? There are, but the point is whether we should wish to pursue these new developments (the research of Sternberg is really promising, though). The point is, and Rabinowitz does not even pay lip service to it, that at the end of the day it is the quality of education itself that matters, not the quality of its multipurpose assessments. Express its importance in terms of human capital, addition to gross national product, human well-being, whatever; on every count it is immediately clear that this overkill of assessments is sapping the resources of the educational process itself (for example, see Nichols and Berliner, 2005).
The last paragraph, 'additional cautions', is the last opportunity to produce some new insights, and the chapter misses it. It is about the busy traffic of students between secondary and post-secondary institutions. Of course there are problems here of choosing the right kind of courses or schools, in preparation for a vocational life. But that is not the point the chapter makes. Again in this book it is the selectivity and diversity problem that attracts all attention, and again this problem gets stamped as one that can be solved by the power of tests. Not a word about what post-secondary education is for, what it adds to the individual's and society's human capital. Let alone about the supposed differences between educational institutions in this respect, never substantiated by quantitative research (see the lifetime work of Alexander Astin in this respect). At least in personnel psychology there is nowadays a clear notion of what it is that the psychological test battery should predict (and it does); in this book such a clear conception is monumentally missing. Being in the assessment business does not give one the right to promote assessment as the solution to all problems (the chapter's closing statement is given below). What is more, this is exactly the kind of thinking that has brought about the predicament the educational system finds itself in today:
Next, policymakers must invest in research on future assessment methodologies, be they types of testing formats or systems of delivery, particularly those employing computer-adaptive approaches. The future does hold great possibilities for advancements in assessment power and efficiency but only if important questions of comparability, access, and fairness are addressed both from an assessment perspective, and more importantly, at the classroom instructional level as well.
There glimmers a bit of hope here: let us look at the classroom instructional level itself. Do it, Rabinowitz and colleagues!
For online available research on the topics of this chapter see for example the testing page of AACT Education Policy Clearing House. [not online 2-2008]
AERA/NCME/APA (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association. [This is a fine set of standards, the result of half a century of experience and development. However, in themselves standards like these do not and cannot answer important questions about the desirability of using standardized tests in a particular environment. Regrettably, these associations have not made the standards available online; they have to be ordered in Washington, DC.]
Audrey L. Amrein and David C. Berliner (2002). High-Stakes Testing, Uncertainty, and Student Learning. Education Policy Analysis Archives, 10. html. Closing statement: "Both the uncertainty associated with high-stakes testing data, and the questionable validity of high-stakes tests as indicators of the domains they are intended to reflect, suggest that this is a failed policy initiative. High-stakes testing policies are not now and may never be policies that will accomplish what they intend. Could the hundreds of millions of dollars and the billions of person hours spent in these programs be used more wisely? Furthermore, if failure in attaining the goals for which the policy was created results in disproportionate negative affects on the life chances of America's poor and minority students, as it appears to do, then a high-stakes testing policy is more than a benign error in political judgment. It is an error in policy that results in structural and institutional mechanisms that discriminate against all of America's poor and many of America's minority students. It is now time to debate high-stakes testing policies more thoroughly and seek to change them if they do not do what was intended and have some unintended negative consequences, as well."
Robert Linn (2001). The design and evaluation of educational assessment and accountability systems. Tech. Rep. 539. Los Angeles: Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, University of California. http://www.cse.ucla.edu/CRESST/Reports/TR539.pdf [not online as of 2-2008] (from the abstract: "The importance of evaluating and reporting the precision of assessment and accountability results are discussed. Finally, a key validity issue - the degree to which reports of performance and of improvement based on observed assessment results support inferences about student learning - is addressed." Words, words, words.)
Sharon L. Nichols and David C. Berliner (2005). The Inevitable Corruption of Indicators and Educators Through High-Stakes Testing. Education Policy Studies Laboratory, Arizona State University pdf (180 pp.). (contents:
Criticisms of Testing - Corrupting the Indicators and the People in the World Outside of Education - Corrupting the Indicators and the People in Education - Methodology - Administrator and Teacher Cheating - Student Cheating and the Inevitability of Cheating When the Stakes are High - Excluding Students from the Test - Misrepresentation of Dropout Data - Teaching to the Test - Narrowing the Curriculum - Conflicting Accountability Ratings - The Changing Meaning of Proficiency - The Morale of School Personnel - Errors of Scoring and Reporting).
Michael W. Kirst: Rethinking admission and placement in an era of new K-12 standards
Kirst quantifies Linn's remark that most colleges are not very selective: "About 80% of high-school graduates who attend postsecondary education go to broad access institutions that are open to enrollment or accept all qualified applicants (McCormick, 2001)." And this group of students experiences loads of problems in the transition to post-secondary education. While the American system of transition to post-secondary education is perceived - by Americans themselves (Adelman, 2001) as well as by foreigners - first and foremost as selective, for the great majority of students it is in fact a hurdle defined by a lack of adjustment to the needs of almost all stakeholders involved. The result is an unbelievable 50% of students finding themselves in remedial classes, high numbers of dropouts, almost a full school year being lost to productive instruction, etcetera. The chapter, highlighting these difficulties in the transition process to post-secondary education, therefore is the most important in the book, proving the subtitle of the book to be a misnomer. The 21st century challenge is not to find better tools for admissions, but to create the synergy between institutions necessary for smooth transitions to post-secondary education.
The chapter's themes are the 'Babel' of assessments confronting high school students, the stagnation in minority enrollment after the banning of affirmative action in admissions policies, the loss of productivity in the senior year, and the disruptive effects and inequity resulting from using high school rank in admissions.
In an extensive appendix Kirst describes 'The Bridge project: Strengthening K-16 policies.' Project site.
- (Report: Betraying the College Dream: How Disconnected K-12 and Postsecondary Education Systems Undermine Student Aspirations. pdf. Publications from the project download)
In six participating states "Project researchers studied 1) the content of K-16 student transition policies, such as postsecondary admissions requirements, placement exams, and remediation policies, and statewide high school assessments, and 2) the ways in which these policies are communicated to, and understood by, stakeholders."
From the executive summary of 'Betraying':
America's high school students have higher educational aspirations than ever before. Eighty-eight percent of 8th graders expect to participate in some form of postsecondary education, and approximately 70 percent of high school graduates actually do go to college within two years of graduating. (...)
But states have created unnecessary and detrimental barriers between high school and college, barriers that are undermining these student aspirations. The current fractured systems send students, their parents, and K-12 educators conflicting and vague messages about what students need to know and be able to do to enter and succeed in college. (...)
Other findings highlighted issues such as inequalities throughout education systems in college counseling, college preparation course offerings, and connections with local postsecondary institutions; sporadic and vague student knowledge regarding college curricular and placement policies; the importance of teachers in advising students about college preparation issues; student overestimation of tuition; and an inequitable distribution of college information to parents. (...)
Clifford Adelman (2001). Putting on the glitz. New England Journal of Higher Education, 15, 24-30. pdf
Education Trust (1999). Ticket to nowhere: The gap between leaving high school and entering college and high-performance jobs. Thinking K-16 Series, 3. Washington, DC. html
A. C. McCormick (Ed.) (2001). The Carnegie classification of institutions of higher education, 2000 edition. Menlo Park, CA: The Carnegie Foundation for the Advancement of Teaching. The book is not available online. For recent data - but selectivity is not in the tables - see individual profiles lookup. A complete classification data file will soon be available.
Vi-Nhuan Le (2002). Alignment among secondary and post-secondary assessment in five case study states. Santa Monica, CA: Rand Corporation. pdf
Andrea Venezia, Michael W. Kirst, and Anthony L. Antonio (2003). Betraying the college dream: How disconnected K-12 and postsecondary education systems undermine student aspirations. Stanford, CA: Stanford Institute for Higher Education Research. pdf
United States Department of Education (2001). Condition of education 2001. Washington, DC: U.S. Government Printing Office. pdf
University of California, Board of Admissions (2002). First-year implementation of comprehensive review in freshman admissions: A progress report from the board of admissions and relations with schools. Oakland: University of California. pdf
Norman L. Webb (1999). Alignment of science and mathematics standards and assessments in four states. National Institute for Science Education. Madison: University of Wisconsin. pdf
David T. Conley: Proficiency-based admissions
To European eyes a chapter like this comes as a total surprise: these Americans are trying to build a system we in Europe have always had! Our secondary education exit exams are exactly the kind of proficiency-based tests these Americans are trying to build. The surprise, of course, should be no surprise at all: the reason the selective admissions system in the U.S. has grown to be what it is today is that educational institutions were not regulated at the national level, as was - and is - the case in continental Europe, and largely in the U.K. as well. But then our European politicians and conservative intellectuals should immediately stop holding up the American admissions system as an ideal to be followed in Europe too. Regrettably, I must predict they never will.
Conley emphasizes the many difficulties on the way to establishing the ideal of proficiency-based admissions. Or is it his ideal? Conley's assumption seems to be that the integration of secondary and post-secondary assessments should result in a system without any serious problems at all. Such systems simply do not exist. The existing American system of selective admissions is an example of a system with serious shortcomings, and even a better alternative system will have shortcomings of its own; this is more or less unavoidable in the transition between two separate educational systems. Let me illustrate this with an example. One criterion used over and over again is the prediction of success in post-secondary education, but 'success' here is not a simple concept, because many dropouts eventually return to school and finish their education. In the Netherlands, for example, most politicians and university administrators are convinced that dropout rates are very high, and the regular statistics do prove them right. But these regular statistics do not reveal whether dropouts register again in another program or at another institution. A major research project revealed that almost 90% of students entering higher education in the Netherlands succeed in leaving it with a diploma. Therefore it might be a better idea for the many states struggling with this problem to settle for reasonable solutions instead of perfect ones, and to downplay the importance of the prediction of study success.
A remarkable feature of this chapter is its treatment of reliability issues. The problem here is a general one in the testing community. Somehow authors do not seem to understand that high validity is itself proof of good reliability. By the way, this validity must be empirically shown to be present, not merely an opinion that the test content seems 'valid.' The misunderstanding about reliability has a long history: in the second edition of Educational Measurement, for example, the author of the chapter on reliability - Julian Stanley - started off with a remark of the sort I made above, and then filled almost a hundred pages on the subject. It is not a minor problem, though, because the misunderstanding about reliability threatens the validity of already established tests and procedures that one tries to make 'more reliable.' The problem is aggravated even more where equity issues get connected to reliability, which they should not be: equity issues demand particular kinds of proven validity. Much of the resources spent on training teachers to score 'more reliably,' or on the abundant use of 'moderation panels' for an independent re-assessment of materials, are lost to the educational process itself. This is easily seen if one remembers that the best way for students to learn is to get immediate feedback. Such feedback is assessment. All other forms of assessment in education serve purposes other than immediate instructional feedback; they are therefore more or less bureaucratic and parasitic on the educational process, even if not wholly avoidable. Now apply the reliability concept to this kind of immediate feedback; that is not quite enlightening, is it? Well then, do not bother teachers with it.
Some problems caused by this fascination with reliability are highly visible. Conley mentions the trend toward more assessments using the multiple-choice format, with probably counterproductive results for the quality of education itself. This is a prime example of reliability concerns crowding out validity. This kind of development has been observed repeatedly in the history of examinations, in Imperial China as well as in Cambridge around 1800 (Wilbrink, 1997).
One last remark about reliability: part of it is sampling variability, which technically - and rightly - is regarded as part of reliability. We should be very clear about this aspect of reliability, because it is inherent in education as well as in most forms of mental measurement. A group of students with a mastery of 0.6 of the course material will have test scores that follow a binomial distribution, meaning that only a small fraction of these students will have exactly 60% of the items correct. The figure shows such distributions for a 20-item test. The reader who wants to experiment with this kind of sampling probability can use the instrument I have made available as a Java applet, module one of the Strategic Preparation for Achievement Tests model.
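The binomial point above can also be checked with a few lines of code instead of the applet. This is a minimal sketch of my own (it is not the Java applet mentioned, and the function name is arbitrary); the 20 items and mastery 0.6 are the figures from the text.

```python
from math import comb

def score_distribution(n_items, mastery):
    """Binomial probability of each raw score (0..n_items) on a test,
    for students whose true mastery of the material equals `mastery`."""
    return [comb(n_items, k) * mastery**k * (1 - mastery)**(n_items - k)
            for k in range(n_items + 1)]

dist = score_distribution(20, 0.6)

# Only about 18% of students with true mastery 0.6 score exactly 12/20 (60%):
print(round(dist[12], 3))  # -> 0.18

# The rest of the probability mass is spread over neighbouring scores;
# a sizeable fraction even lands at 10/20 or lower:
print(round(sum(dist[:11]), 3))
```

The first printed figure illustrates the claim in the text: even for a student whose true mastery is exactly 0.6, a score of exactly 60% correct is the single most likely outcome but still occurs in fewer than one in five cases.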
Ben Wilbrink (1997). Assessment in historical perspective. Studies in Educational Evaluation, 23, 31-48.
references in Conley online available
C. Adelman (1999). Answers in the toolbox: Academic intensity, attendance patterns, and bachelor's degree attainment. page
National Center for Education Statistics (2001). Conditions of education 2001. Washington, DC: US. Department of Education. Downloads page. ("Also in the 2001 edition is a special focus essay on the access, persistence, and success of first-generation students in postsecondary education.")
No Child Left Behind
Robert Perkins, Brian Kleiner, Stephen Roey, and Janis Brown (2004). The High School Transcript Study: A Decade of Change in Curricula and Achievement, 1990-2000. National Center for Education Statistics. see download page.
A. Venezia, M. Kirst and A. Antonio (2003). Betraying the college dream: How disconnected K-12 and postsecondary systems undermine student aspirations. Palo Alto, CA: Stanford Institute for Higher Education Research. pdf
Having read this far, you will have an intuitive feeling about the criteria I use in evaluating the proposals etcetera in the Camara-Kimmel book. Let me spell them out, for clarity's sake.
To begin with, I use my lifetime expertise in studying admissions issues in the Netherlands and in advising politicians about the possibilities and impossibilities involved. Needless to say, politicians prefer to base their decisions on the impossibilities the selection psychologist will identify.
There is an excellent collection of standards governing the production and use of (psychological) tests in high-stakes testing, published by the American Psychological Association. These standards have been used in the courts (Lerner, 1978), in Congress (for example Eva Baker, 2001), etcetera. It is especially the use of instruments like the SAT in admissions that is highly problematic in the light of these standards. This fact goes unnoticed by many stakeholders, because everybody seems to take for granted 1) that it is the institutions that take the decisions (imagine your hospital treating you like this), 2) that they are free to select the students that are most cost-effective for the institution (imagine your hospital taking only the patients who need the least care), 3) that students gifted with talents merit admission to the 'best' institutions (do they 'merit' the talents they are born with, or born into?), and 4) that the more selective the institution, the more value its educational programs add to the capabilities of its students.
Alexander Astin (the hospital metaphor above is his), who made it his lifetime job to study the selectivity and quality of America's institutions of higher education, denies ever having seen evidence for 4) above. Meaning that the use of the SAT in admissions does not add the slightest value to the output of America's higher education measured in terms of human capital, or Gross National Product for that matter. The point here is simply that institutions should not be allowed to compete for the 'best' students using the admissions process. Which boils down to the question of who, or which institution, is entitled to take admissions decisions. This is the big question behind almost every remark in the review above about the positions taken by the authors and the ways some of these positions may be criticized. In the Netherlands there is no question that the student is the primary decision maker. At least that was so until recently, for now there is a strong current in politics to adopt American ways of admission (politicians are always attracted to the impossibilities: the most prestigious courses in the Netherlands - e.g., natural sciences, engineering, mathematics, informatics - suffer from falling numbers of applicants!).
Which brings me to the point of the review. American selective admissions are highly visible to politicians abroad and to visiting professors, and therefore will influence admissions policies everywhere in the world, as they will in the Netherlands. Nothing wrong with that, as long as the American admissions process proves to be sound and productive (and exportable, which it is not). Have Camara and Kimmel convinced this selection psychologist that it is sound and productive?
American Psychological Association (1999). The Standards for Educational and Psychological Testing. Developed jointly by: American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME). http://www.apa.org/science/standards.html
American Psychological Association (1999). Code of fair testing practices in education. Prepared by the Joint Committee on Testing Practices. http://www.apa.org/science/fairtestcode.html#a
Alexander W. Astin (1985). Achieving educational excellence. San Francisco: Jossey-Bass.
Alexander W. Astin (1993). Assessment for excellence: the philosophy and practice of assessment and evaluation in higher education. American Council on Education / Oryx series on higher education.
Alexander W. Astin (1993). What matters in college? Four Critical Years revisited. San Francisco: Jossey-Bass.
Burton Clark (1985). The school and the university. University of California Press. [Admissions in other countries]
Barbara Lerner (1978). The Supreme Court and the APA, AERA, NCME Test Standards: past references and future possibilities. American Psychologist, 33, 915-919. (And a bonus article:)
Barbara Lerner (1997). America's schools: still failing after all these years. National Review, Sept 15. html (about another kind of standards, not unrelated to the admissions question)
Special page of annotated references on admissions, in the US as well as elsewhere.