Peculiar Subgroup’s Aberrance Response Behavior in Multistage Adaptive Testing: A simulation study
Session 1A, 10:30 - 12:00, HAGEN 2
This simulation study investigates a peculiar subgroup’s aberrant response behavior under a three-stage multistage test (MST) design. Aberrant responses can introduce error into proficiency estimation because the resulting estimates do not reflect the examinees’ actual proficiency. Topics related to aberrant responding, such as person-misfit statistics, item selection strategies, and response times, have been widely investigated in the context of computerized adaptive testing (CAT) (Karabatsos, 2003; Meijer, 2003; van der Linden, 2008). Like CAT, MST is adaptive, but it differs substantially from CAT in its design structure and adaptive algorithm. MST bases routing decisions on performance on preassembled sets of items, called modules. A test form consists of a series of stages, in each of which one or more modules are administered. An MST design comprises a small number of separate modules, each of which can be assembled to meet specifications such as item content and item difficulty. Adaptation to an examinee’s ability occurs between stages and is based on the examinee’s cumulative performance on the preceding modules. Accordingly, MST offers fewer adaptation points than CAT. MST designs also vary substantially in the number of stages, the number of modules, and the number of items per module; Figure 1 shows an example of a three-stage multistage testing structure. It is therefore difficult to generalize findings from CAT directly to the MST context. Few studies have investigated aberrant examinee behavior in MST, and only a two-stage design has been examined (Kim & Moses, 2016). As MST attracts growing attention for its features and efficiency, more research on examinees’ aberrant responses in MST is needed.
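The between-stage routing described above can be illustrated with a minimal sketch. It assumes routing on the cumulative number-correct score; the function name `route` and the cut-off values are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def route(stage_modules, cumulative_correct, cutoffs):
    """Pick the next module from the cumulative number-correct score.

    stage_modules: module labels ordered from easiest to hardest.
    cutoffs: ascending score thresholds (hypothetical values), one fewer
             than the number of modules in the stage.
    """
    # Count how many thresholds the score has reached; that count is the
    # index of the module to administer next.
    index = sum(cumulative_correct >= c for c in cutoffs)
    return stage_modules[index]


# Module labels for a 1-2-3 design (Stage 1 has a single module, so no routing).
stage2 = ["low", "high"]
stage3 = ["low", "middle", "high"]

# An examinee with 6 correct after Stage 1, then 9 correct after Stage 2.
stage2_module = route(stage2, 6, cutoffs=[5])
stage3_module = route(stage3, 9, cutoffs=[7, 13])
```

Adaptation happens only at these two routing points, which is why a three-stage MST adapts far less often than an item-level CAT.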
The simulation is based on a three-stage 1-2-3 MST in which each item is parameterized according to the two-parameter logistic (2PL) item response theory (IRT) model. For the no-aberrance condition, the mean item difficulty is set to 0.0 for Stage 1; −0.5 (low) and +0.5 (high) for the Stage 2 modules; and −1.0 (low), 0.0 (middle), and +1.0 (high) for the Stage 3 modules. The mean item discrimination is set to either 1.0 or 0.5 in all modules. I simulate 100 examinees at each of 41 quadrature points on a theta scale ranging from −3.0 to +3.0 in steps of 0.15 (N = 4,100). To simulate the peculiar subgroup’s item responses, which differ from those in the no-aberrance condition, responses for 15% of the examinees at each theta point are generated from altered item parameters that manipulate difficulty or discrimination: (1) 0.3 is subtracted from the true b parameters; (2) 0.3 is added to the true b parameters; (3) 0.1 is subtracted from the true a parameters; or (4) 0.1 is added to the true a parameters. The resulting proficiency estimates are compared with the examinees’ true proficiency values (i.e., the generated thetas). Full details are omitted here due to space constraints. The findings are expected to contribute to the MST literature.
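The response-generation step can be sketched as follows for condition (1) and a single Stage 1 module. This is only an illustration under stated assumptions: the module length, the individual item difficulties, the seed, and the helper names `p_correct` and `simulate_responses` are hypothetical, and routing and proficiency estimation are left out.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(theta, a_params, b_params, rng):
    """Draw one 0/1 response per item for a single examinee."""
    return [int(rng.random() < p_correct(theta, a, b))
            for a, b in zip(a_params, b_params)]


rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# 41 quadrature points from -3.0 to +3.0 in steps of 0.15.
theta_points = [round(-3.0 + 0.15 * k, 2) for k in range(41)]

# Hypothetical Stage 1 module: 5 items, mean difficulty 0.0, discrimination 1.0.
b_true = [-0.6, -0.3, 0.0, 0.3, 0.6]
a_true = [1.0] * len(b_true)

# Condition (1): the subgroup's responses are generated with b - 0.3.
b_shifted = [b - 0.3 for b in b_true]

records = []  # (true theta, aberrant flag, response vector)
for theta in theta_points:
    for person in range(100):
        aberrant = person < 15  # 15% of the examinees at each theta point
        b_used = b_shifted if aberrant else b_true
        responses = simulate_responses(theta, a_true, b_used, rng)
        records.append((theta, aberrant, responses))
```

Conditions (2) to (4) would follow the same pattern, swapping in `b + 0.3`, `a - 0.1`, or `a + 0.1` for the subgroup while the remaining 85% keep the true parameters.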