A new likelihood approach for the simultaneous estimation of IRT equating coefficients on multiple forms

Waldir Leoncio & Michela Battauz

Session 2B, 12:45 - 14:15, VIA

Test equating is a statistical procedure that ensures that scores from different test forms are comparable and can be used interchangeably (González and Wiberg, 2017). Within the Item Response Theory (IRT) framework, if each test form is calibrated independently, the resulting item parameters will be on different scales and thus not directly comparable. Equating solves this problem by transforming the item parameters so that they are all on the same scale. Popular methods for equating pairs of test forms include the mean-sigma, mean-mean, Stocking–Lord and Haebara methods (Kolen and Brennan, 2014). For multiple forms, it might be necessary to employ more elaborate methods that take into account all the relationships between the forms.
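To make the pairwise case concrete, the following is a minimal sketch of the mean-sigma and mean-mean methods for a two-parameter model, assuming the usual linear convention in which difficulties transform as b* = A·b + B and discriminations as a* = a/A. The function names (`mean_sigma`, `mean_mean`, `rescale`) and the toy values for the common items are illustrative assumptions, not part of the work presented here.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Mean-sigma coefficients from common-item difficulty estimates."""
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def mean_mean(a_from, a_to, b_from, b_to):
    """Mean-mean coefficients from common-item discriminations and difficulties."""
    A = mean(a_from) / mean(a_to)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def rescale(a, b, A, B):
    """Put item parameters from the 'from' form on the scale of the 'to' form."""
    return [ai / A for ai in a], [A * bi + B for bi in b]


# Hypothetical, noise-free common-item estimates built with A = 2 and B = 0.5,
# so that both methods should recover exactly these coefficients.
a_from, b_from = [2.0, 1.0, 1.5], [-1.0, 0.0, 1.0]
a_to, b_to = [1.0, 0.5, 0.75], [-1.5, 0.5, 2.5]

A_ms, B_ms = mean_sigma(b_from, b_to)
A_mm, B_mm = mean_mean(a_from, a_to, b_from, b_to)
a_new, b_new = rescale(a_from, b_from, A_ms, B_ms)
```

With real, noisy estimates the two methods generally give different coefficients, and neither uses the standard errors of the item parameter estimates.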

We propose a new statistical methodology that simultaneously equates a large number of test forms. Simultaneous equating methods are not new in the literature: Haberman (2009) proposed a linear regression method, Battauz (2013) presented chain and average equating coefficients, and Battauz (2017) introduced generalizations of some well-known methods such as those mentioned above. Our proposal differs from the current state of the art by using the likelihood function of the true item parameters and the equating coefficients to estimate all equating coefficients concurrently. By taking into account the heteroskedasticity of the item parameter estimates as well as the correlations between the item parameter estimates of each test form, this new method yields equating coefficient estimates that are more efficient than those currently available in the literature.

When dealing with large-scale assessments, often composed of several test forms with dozens or hundreds of items each, the number of parameters to be estimated can easily become a concern. Each new item adds at least one IRT parameter to the likelihood function, and each additional test form can introduce several new items as well as two equating coefficients of its own. This can quickly make the proposed approach computationally too demanding. We overcome this problem by treating the equating coefficients as parameters of interest and the true item parameters as nuisance parameters. With this setup, the profile likelihood can be used instead of the full likelihood, potentially avoiding the costly estimation of hundreds of parameters of secondary importance.
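The idea of profiling out nuisance parameters can be illustrated with a deliberately simplified toy model, which is not the likelihood proposed in this work. The sketch assumes two forms, difficulty parameters only, independent normal estimation errors with known standard errors (so correlations are ignored), and form 1 as the base scale. Under these assumptions the true difficulties can be maximised out in closed form, leaving a function of the equating coefficients A and B alone. All function names, the search bracket for A, and the data are hypothetical.

```python
import math


def profile_loglik(A, B, x, y, s1, s2):
    """Profile log-likelihood of (A, B), up to an additive constant.

    x, y: difficulty estimates of the common items on forms 1 and 2.
    s1, s2: their (assumed known) standard errors.
    The true difficulties have already been replaced by their maximisers.
    """
    total = 0.0
    for xj, yj, e1, e2 in zip(x, y, s1, s2):
        d = xj - (A * yj + B)
        total += d * d / (e1 ** 2 + (A * e2) ** 2)
    return -0.5 * total


def best_B(A, x, y, s1, s2):
    """For a fixed A, the maximising B is a precision-weighted mean."""
    w = [1.0 / (e1 ** 2 + (A * e2) ** 2) for e1, e2 in zip(s1, s2)]
    num = sum(wj * (xj - A * yj) for wj, xj, yj in zip(w, x, y))
    return num / sum(w)


def fit_profile(x, y, s1, s2, lo=0.2, hi=5.0, tol=1e-8):
    """Golden-section search over A, assuming a single maximum in [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0

    def g(A):
        return profile_loglik(A, best_B(A, x, y, s1, s2), x, y, s1, s2)

    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) > g(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
    A = 0.5 * (lo + hi)
    return A, best_B(A, x, y, s1, s2)


# Hypothetical noise-free data generated with A = 1.25 and B = -0.3,
# so the search should recover these values.
y = [-1.0, 0.0, 1.0, 2.0]
x = [1.25 * yj - 0.3 for yj in y]
s1 = [0.10, 0.15, 0.12, 0.20]
s2 = [0.12, 0.10, 0.18, 0.14]

A_hat, B_hat = fit_profile(x, y, s1, s2)
```

In this toy case the search runs over a single coefficient no matter how many common items there are, which is the kind of reduction in dimensionality that profiling is meant to provide.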

The statistical and computational properties of the proposed methods are being investigated in controlled simulations, and the results so far are promising. Possible practical applications include any large-scale assessment that administers and equates several test forms.

Published Sep. 5, 2018 1:39 PM - Last modified Sep. 5, 2018 1:39 PM