Trude Nilsen & Rolf Vegar Olsen
In this paper we compare and discuss similarities and differences in the way PISA and TIMSS develop descriptions of what students typically know or can do in the domains included in the tests. The procedures applied use item difficulties to empirically delimit and verbally describe a set of discrete levels of progression along the internationally standardised scales. These procedures can be viewed as reduced versions of the standard-setting method commonly referred to as bookmarking. Using Item Response Theory and a set of empirical and pragmatic rules, the scales are divided into a limited number of intervals by a few benchmarks or cut-off scores. The derived descriptions in the two studies differ in that the process in TIMSS results in statements reflecting the actual content of the items involved, while PISA presents more generic descriptions. Grading in school subjects in Norway is based on very generic and domain-independent criterion descriptions referring to the curricular goals. However, grading is a complex and judgemental process, and we argue that the application of methods like those referred to above is highly relevant for achieving theoretically sound and empirically robust principles for grading in our country.
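The core of the bookmark-style procedure can be illustrated with a minimal sketch. Under the Rasch model, a bookmark placed at an item of difficulty b translates into a cut-off score: the ability at which that item is answered correctly with a chosen response probability (e.g. 0.67, a value commonly used in bookmarking). The item difficulties and bookmark positions below are hypothetical, chosen only to show the mechanics; they are not values from PISA or TIMSS.

```python
import math

def rasch_cut_score(b, rp=0.67):
    """Ability (in logits) at which an item with Rasch difficulty b
    is answered correctly with response probability rp."""
    return b + math.log(rp / (1.0 - rp))

# Hypothetical item difficulties (logits), sorted into an
# 'ordered item booklet' as in the bookmark method.
difficulties = sorted([-1.8, -1.1, -0.4, 0.2, 0.9, 1.5, 2.3])

# Hypothetical bookmark placements: indices of the first item
# judged to characterise each successive proficiency level.
bookmarks = [1, 3, 5]

# Each bookmark yields one cut-off score on the ability scale.
cut_scores = [rasch_cut_score(difficulties[i]) for i in bookmarks]
for level, cut in enumerate(cut_scores, start=1):
    print(f"cut-off for level {level}: {cut:.2f} logits")
```

Because the ordered booklet is sorted by difficulty, the resulting cut-off scores are necessarily increasing, dividing the scale into the discrete, described intervals referred to above.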