Measurement invariance in PISA 2015: A systematic investigation of patterns across questionnaires, scales and countries

Janine Buchholz & Johan Braeken

Session 3A, 9:45 - 11:15, HAGEN 2

International large-scale assessments (ILSAs) such as the Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS) aim to measure and compare latent constructs among respondents from a large number of participating countries. This endeavor requires measurement invariance (MI) to be established across all participating countries. The most commonly employed technique for MI testing is multigroup confirmatory factor analysis (MGCFA; e.g., Greiff & Scherer, 2018). However, the method has been shown to be unsuitable for the large number of countries participating in these assessments (Rutkowski & Svetina, 2014). In addition, it relies on global model fit and is therefore unable to pinpoint group-specific misfit.

Using a recently developed measure of group fit rooted in MGCFA, the present study systematically investigates the 58 questionnaire scales reported in the most recent cycle of PISA (OECD, 2016). PISA was chosen for the following reasons: (1) PISA can be regarded as having "strategic prominence in international education policy debates" (Hopfenbeck et al., 2017, p. 1); (2) with about 70 participating countries in PISA 2015, the number of tested groups is particularly large; (3) within ILSAs, the questionnaires are hardly ever subjected to MI testing (e.g., Braeken & Blömeke, 2016); (4) most scientific publications on PISA are secondary analyses of constructs administered through the questionnaires (Hopfenbeck et al., 2017), so these studies depend on cross-country comparisons of those constructs being valid; (5) in its most recent cycle, PISA implemented an innovative approach to MI testing based on IRT item fit (OECD, 2016), which raises the question of whether its findings replicate under more common analysis techniques.

Based on a quantification of the amount of measurement (non-)invariance across scales and countries, we will report patterns related to scale properties (e.g., length, number of response categories, previous use) and to country characteristics (e.g., previous participation, geographic location, language group, gross domestic product). These findings will help to identify subsets of countries for which meaningful comparisons are appropriate, and they may also guide questionnaire development in the context of ILSAs.

Published Sep. 5, 2018, 1:42 PM. Last modified Sep. 5, 2018, 1:42 PM.