Using professional judgement to equate exam standards
In the UK, and perhaps in many other countries too, we don’t really set standards very often. Our main concern in most educational assessment is to maintain a standard once it has been set. We use many techniques to equate a new test to the previous standard, but a key feature is the part played by professional judgement. Often, we select some scripts from at around the expected pass mark and ask judges to decide - somehow - which single mark best represents the existing standard. But this is fraught with difficulty, and with controversy.
I will describe a wholly different approach, which does not require tests to be marked at all - no marking means no marker training, no marker standardisation, no ‘award meetings’, and a great reduction in time and costs. Instead, this approach relies on the professionalism of teachers, and their ability to recognise quality in a student’s performance. Marking is replaced by (Adaptive) Comparative Judgement, one of Thurstone’s methods for creating genuine measurement scales for psychological phenomena: judges simply make many binary choices, choosing A or B as the ‘better’ of two scripts they are shown. The result is an extremely reliable scale of the quality of the scripts. For normal operational scoring, all the scripts come from one test; for equating purposes, a small number from, say, last year’s test are included, and a composite scale is created which carries the standard forward automatically.
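As a minimal sketch of how such a scale can be built from binary judgements: the Bradley-Terry model (a close relative of Thurstone's pairwise method, and the one commonly fitted in CJ work) can be estimated with Hunter's MM algorithm. Everything below is illustrative rather than the specific software used in this work; the script names are invented, and equating simply means including last year's anchor scripts among the items being compared.

```python
import math
from collections import defaultdict

def fit_bradley_terry(judgements, items, iters=200):
    """Fit a quality scale from binary judgements.

    judgements: list of (winner, loser) script pairs.
    items: all scripts on the scale (this year's plus any
           anchor scripts from last year, for equating).
    Returns {script: log-strength}, an interval-like scale.
    Uses Hunter's MM updates; assumes every script wins at
    least once and the comparison graph is connected.
    """
    wins = defaultdict(int)
    pairs = defaultdict(int)          # comparisons per unordered pair
    for w, l in judgements:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
    p = {i: 1.0 for i in items}       # strengths, exp(theta)
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(n / (p[i] + p[j])
                        for pair, n in pairs.items() if i in pair
                        for j in pair - {i})
            new[i] = wins[i] / denom if denom else p[i]
        # normalise: the scale's origin is arbitrary
        g = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        p = {i: v / g for i, v in new.items()}
    return {i: math.log(v) for i, v in p.items()}

# Hypothetical data: A usually beats B, B usually beats C.
judgements = ([("A", "B")] * 3 + [("B", "A")] +
              [("B", "C")] * 3 + [("C", "B")] +
              [("A", "C")] * 3 + [("C", "A")])
theta = fit_bradley_terry(judgements, ["A", "B", "C"])
```

Because anchor and new scripts sit on one composite scale, last year's cut score maps directly onto this year's scripts with no marking at all.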
Very powerful statistical controls are available for ‘CJ’ data, which means that most concerns about using judgement for equating can be monitored, and uncertainties can be quantified. Although this work is still at an early stage, the results so far are very promising!
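To give a flavour of how such uncertainties might be quantified (this is a generic sketch, not the specific diagnostics used in the work described): given a fitted scale of log-strengths, approximate standard errors follow from the Fisher information of the Bradley-Terry likelihood, and a scale-separation reliability coefficient, analogous to a conventional reliability index, compares the spread of the scale with those errors. The function name and data are invented for illustration.

```python
import math
from collections import defaultdict

def cj_diagnostics(thetas, judgements):
    """Approximate standard errors and scale-separation
    reliability (SSR) for a comparative-judgement scale.

    thetas: {script: log-strength} from a Bradley-Terry fit.
    judgements: list of (winner, loser) pairs.
    """
    counts = defaultdict(int)
    for w, l in judgements:
        counts[(w, l)] += 1
    info = defaultdict(float)
    for (a, b), n in counts.items():
        # P(a beats b) under the fitted scale
        p = 1.0 / (1.0 + math.exp(thetas[b] - thetas[a]))
        fisher = n * p * (1.0 - p)   # information from this pairing
        info[a] += fisher
        info[b] += fisher
    se = {i: 1.0 / math.sqrt(info[i]) for i in thetas}
    vals = list(thetas.values())
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    msq = sum(v * v for v in se.values()) / len(se)
    ssr = (var - msq) / var          # near 1.0 = highly reliable scale
    return se, ssr

# Hypothetical well-separated scripts, 20 comparisons per pair.
thetas = {"A": 2.0, "B": 0.0, "C": -2.0}
judgements = [("A", "B")] * 20 + [("B", "C")] * 20 + [("A", "C")] * 20
se, ssr = cj_diagnostics(thetas, judgements)
```

The same machinery flags trouble: a script whose standard error stays large, or a reliability that fails to rise as judgements accumulate, signals that more (or better-targeted) comparisons are needed.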