Optimal scores in comparison to sum scores and parametric IRT scores

Marie Wiberg, James Ramsay & Juan Li

Session 4B, 11:20 - 12:05, VIA

Many standardized tests use sum scores, i.e. the number of items answered correctly, as a measure of test takers' ability, because they are easy to interpret and fast to compute. Sum scores have, however, some limitations: they can only be calculated after a test has been administered, and they target the whole test rather than single items. When constructing a test it is instead common to model the items with parametric item response theory (IRT). A well-known problem with parametric IRT is that not all items can be satisfactorily modeled with a parametric IRT model. Recently, optimal scores were proposed for use in addition to sum scores, serving as a flexible alternative both for scoring the test and for estimating item characteristic curves. Optimal scores use the interaction between test takers' performance and item impact, thus giving more weight to items that carry more information. The aim of this presentation is to present and discuss optimal scores and to compare them with sum scores and parametric IRT scores using both real test data and simulated data. Examples of how to fit different real test items will be given in comparison to parametric IRT models. The simulation study examines bias and root mean squared error for optimal scores as compared with the alternatives. The results indicate that accuracy can be improved when optimal scores are used and that optimal scores provide a flexible alternative for estimating item characteristic curves. The latter is of particular interest when items do not fit a parametric IRT model. The presentation ends with a discussion that includes some future directions of research.
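As a rough illustration of the kind of comparison described above, the sketch below simulates dichotomous responses under a 2PL IRT model and contrasts sum scores with parametric IRT ability estimates in terms of bias and root mean squared error. This is only a minimal sketch under arbitrary assumptions (a 2PL generating model, known item parameters, a normal-quantile rescaling of sum scores); it is not the optimal-scoring procedure itself, which additionally weights items by nonparametrically estimated item information.

```python
# Minimal illustrative sketch (not the authors' implementation): compare
# sum scores with parametric IRT maximum-likelihood scores on simulated
# 2PL data, using bias and RMSE as in the kind of simulation described above.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, rankdata

rng = np.random.default_rng(1)
n_persons, n_items = 1000, 40
theta = rng.normal(0.0, 1.0, n_persons)     # true abilities (assumed N(0,1))
a = rng.uniform(0.5, 2.0, n_items)          # discriminations (arbitrary)
b = rng.normal(0.0, 1.0, n_items)           # difficulties (arbitrary)

def p_correct(th, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (th - b)))

# Simulate a 0/1 response matrix
probs = p_correct(theta[:, None], a[None, :], b[None, :])
X = (rng.uniform(size=probs.shape) < probs).astype(int)

# Sum scores: number of items correct, rescaled to the ability metric via
# a simple rank-based normal transformation (one common convention).
sum_scores = X.sum(axis=1)
ranks = (rankdata(sum_scores) - 0.5) / n_persons
theta_sum = norm.ppf(ranks)

# Parametric IRT scores: per-person ML estimate of theta given known a, b.
def neg_loglik(th, x):
    p = np.clip(p_correct(th, a, b), 1e-9, 1 - 1e-9)
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

theta_irt = np.array([
    minimize_scalar(neg_loglik, bounds=(-4, 4), args=(x,), method="bounded").x
    for x in X
])

def bias_rmse(est, true):
    err = est - true
    return err.mean(), np.sqrt((err ** 2).mean())

print("sum-score estimate: bias=%.3f rmse=%.3f" % bias_rmse(theta_sum, theta))
print("IRT ML estimate:    bias=%.3f rmse=%.3f" % bias_rmse(theta_irt, theta))
```

In this toy setup the IRT scores typically show somewhat lower RMSE because items are weighted by their discrimination; optimal scores pursue the same idea while estimating the item characteristic curves flexibly rather than assuming a parametric form.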
