IMO, there are exactly two reasons to give a test:
1. To sort students.
2. To help students learn more.
I believe reason #1 is the main reason tests (in particular, standardized tests) are given. We know this because for most standardized tests, teachers and students get no feedback at all about what items have been missed and why. Certainly by the time results of any kind are received, the student has moved on to a new teacher.
If the purpose of a test is to learn more, then it needs to be designed as such, and teachers need to treat it as such. Why, then, would there need to be a score? When was the last time your tennis coach gave you a precise grade on your backhand? Would that have helped you play tennis better?
The same can be said for end-of-term grades.
At the time, I promised Jack that I would respond to his challenge.
The science behind high-stakes testing is based on giving all students the opportunity to show what they know under the same conditions by writing the same test — a test that will count significantly toward their final grade and has been developed to be reliable and valid in providing information about student learning and mastery. This is important because students working with different teachers, and completing different assignments and assessments during the year can end up with the same teacher-awarded grade at the end of the year — say, 85 per cent — but actually possess very different levels of preparedness, learning and mastery. Committees of content, technical and assessment specialists, composed of highly experienced educators and scientists, create high-stakes tests. These committees, using the latest educational, scientific and technical methods for test design and development, make sure that (a) the content material taught in classrooms is adequately covered in the high-stakes test, (b) new test items are reviewed and field-tested before they appear on the final operational test to make sure the wording is understandable, does not bias or offend students, and conforms to the technical standards of previous items, (c) test items are double and triple-checked using a variety of technical analyses to ensure that the results are consistent within a pattern or trend — for example, students who respond correctly to one item are also responding correctly to items measuring the same material; this is done to ensure that items are not underestimating or overestimating what students know, and (d) test results are constantly monitored so that the test continues to measure the appropriate content and skills in students who have learned the material well and achieved mastery. Advancements in the science of testing are continuously integrated into the design and development of high-stakes tests.
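To make point (c) a little more concrete, here is a minimal sketch of one common internal-consistency statistic, Cronbach's alpha. This is an illustration only: the passage above does not say which analyses the committees actually run, and the function name and the tiny response tables are made up for the example. Each row is one student, each column one item, with 1 for a correct answer and 0 for an incorrect one.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table of scores (rows = students, columns = items).

    Illustrative helper, not taken from any testing agency's toolkit.
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("need at least two items")

    # Sample variance of each item (column) across students.
    item_variances = [
        statistics.variance([row[i] for row in item_scores])
        for i in range(n_items)
    ]
    # Sample variance of each student's total score.
    total_variance = statistics.variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("all students have the same total; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: students who get one item right tend to get the others right.
consistent = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

# Hypothetical data: success on one item says little about the others.
inconsistent = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]

alpha_consistent = cronbach_alpha(consistent)
alpha_inconsistent = cronbach_alpha(inconsistent)
print(f"consistent items:   alpha = {alpha_consistent:.2f}")
print(f"inconsistent items: alpha = {alpha_inconsistent:.2f}")
```

A value near 1 suggests the items are measuring the same material; a low or negative value suggests they are not hanging together. Real test development relies on far larger samples and on more than one statistic.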
If we read this uncharitably, perhaps Jack is right. Leighton says that the test does, indeed, sort students, but, she adds, it does it well. I’ll grant this as trivially true. On the other hand, well-constructed exams provide a second level of assurance that the student has met some standard or other, that the student is competent. Further, they set the standard of competence in a publicly understood way. Yes, we are separating those who meet the standard from those who do not, but what alternative do we have?
I think that there is another important function of these examinations: they provide evidence of system-wide performance. That is, it is crucial that a publicly sponsored and funded system of education provide a curriculum that is attainable by the students, that the resources appropriately support the curriculum, that teachers teach the curriculum to students and that students learn the curriculum. Any individual student can have an especially fortunate or unfortunate day. The test does not guarantee that the student has learned (or not learned) the material, but it shouldn’t be too gruesomely off. Because randomness and luck travel in all directions, they tend to cancel out across a group: a complete class of students should have a test average that is not too far off an accurate measurement of all their learning. This is important information for teachers, schools, and jurisdictions.
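The claim that luck washes out in a class average can be checked with a small simulation. This is a rough sketch under stated assumptions, not a model of any real exam: the class size (30), the range of "true" scores and the size of the day-to-day noise (a standard deviation of 8 points) are all invented for illustration, and the noise is assumed to be independent from student to student.

```python
import math
import random
import statistics

# All numbers below are assumptions chosen for illustration.
CLASS_SIZE = 30
NOISE_SD = 8.0      # how far a good or bad day can move one student's score
SITTINGS = 2000     # imaginary repeats of the same exam

rng = random.Random(42)
true_scores = [rng.uniform(50, 90) for _ in range(CLASS_SIZE)]
true_mean = statistics.mean(true_scores)


def sit_exam(scores, noise_sd, rng):
    """Observed score = true score + independent luck for each student."""
    return [score + rng.gauss(0, noise_sd) for score in scores]


individual_errors = []   # how far one student's score lands from the truth
class_mean_errors = []   # how far the class average lands from the truth
for _ in range(SITTINGS):
    observed = sit_exam(true_scores, NOISE_SD, rng)
    individual_errors.append(abs(observed[0] - true_scores[0]))
    class_mean_errors.append(abs(statistics.mean(observed) - true_mean))

typical_individual_error = statistics.mean(individual_errors)
typical_class_mean_error = statistics.mean(class_mean_errors)

# Textbook result for independent noise: the class mean's standard error
# shrinks with the square root of the class size.
standard_error = NOISE_SD / math.sqrt(CLASS_SIZE)

print(f"typical error, one student:   {typical_individual_error:.2f} points")
print(f"typical error, class average: {typical_class_mean_error:.2f} points")
print(f"standard error of the mean:   {standard_error:.2f} points")
```

The cancellation only works for luck that is independent across students. An error shared by everyone, such as a badly worded item or a test that does not match the curriculum, shifts the whole class in the same direction and does not average out.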
All the above is predicated on the sort of high-quality, curriculum-referenced test that Jacqueline Leighton is talking about. Test construction is a highly technical business, and it should not be taken lightly. No matter how high you make the stakes, a badly constructed test provides little to no valuable information about individuals or groups. Similarly, generic ability tests cannot measure the curricular achievement of students.
So if you’re going to have a high-stakes examination, make sure it’s professionally constructed, validated, and relevant to the curriculum taught and learned.
Oh, and if you want some fun reading in your spare time, here’s a book Jacquie co-edited and in which I co-authored Chapter 3.