Abstracts
Résumé
Aberrant responses are usually detected with fit indices that determine whether a response pattern is inappropriate given the characteristics of the test. This approach, however, requires a prior estimation of the item parameters, which is often carried out on the same data set. The presence of aberrant responses could therefore influence both the calibration process and the detection of inappropriate patterns. This article presents an iterative process to reduce the risk of a biased calibration caused by aberrant responses. The approach consists of successively removing the patterns identified as inappropriate from the item calibration process. The process is illustrated by analyzing data from an English as a second language placement test (TCaLS-II) in Quebec. Applying the iterative process to these data reveals an increase in the number of response patterns detected as inappropriate, a relatively weak impact on the estimated item parameters, and a small number of iterations needed for the iterative process to converge.
Keywords:
- item response theory,
- item calibration,
- aberrant responses,
- inappropriate patterns,
- lz index,
- purification
Abstract
The presence of response disturbances, or aberrant response patterns, is often assessed by computing person-fit indexes. These indicate whether a pattern, as a whole, can be considered abnormal with respect to the test characteristics. However, most of these indexes require pre-calibrated item parameters, and this calibration is usually performed on the same data set. The presence of response disturbances might therefore affect the item calibration process and, subsequently, the identification of person misfit. This paper presents a straightforward iterative process to reduce the risk of biased item calibration due to response disturbances. The idea consists of iteratively removing the patterns flagged as aberrant from the item calibration process and re-computing the person-fit indexes with the newly calibrated parameters. The process is illustrated by analyzing data from an English skill assessment questionnaire in Quebec. Applying the process to these data reveals an increase in the number of response patterns flagged as aberrant, a relatively weak impact on the estimated item parameters, and a limited number of iterations required for the process to converge.
Keywords:
- item response theory,
- item calibration,
- aberrant responses,
- person fit,
- lz index,
- purification
Resumo
Aberrant responses are usually detected with fit indices that determine whether a response pattern is inappropriate given the characteristics of the test. However, this approach requires a prior estimation of the item parameters, which is carried out on the same data set. The presence of aberrant responses could therefore influence both the calibration process and the detection of inappropriate patterns. This article presents an iterative process to reduce the risk of a biased calibration due to aberrant responses. The procedure consists of successively removing the patterns identified as inappropriate from the item calibration process. The process is illustrated by analyzing data from an English as a second language skills assessment test (TCaLS-II) in Quebec. Applying the iterative process to these data reveals an increase in the number of response patterns detected as inappropriate, a relatively weak impact on the estimated item parameters, and a small number of iterations needed for the iterative process to converge.
Keywords:
- item response theory,
- item calibration,
- aberrant responses,
- inappropriate patterns,
- lz index,
- purification
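The iterative purification process described in the abstracts can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Rasch model, replaces proper item calibration (for example, marginal maximum likelihood) with a crude logit-of-proportion-correct estimate, uses the standardized log-likelihood index lz with an assumed cutoff of -1.645, and re-evaluates every pattern at each iteration. All function names are hypothetical.

```python
import math


def prob(theta, b):
    """Rasch probability of a correct response (the model is an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate(patterns):
    """Crude stand-in for item calibration: b_i = -logit(proportion correct)."""
    n = len(patterns)
    difficulties = []
    for i in range(len(patterns[0])):
        p = sum(x[i] for x in patterns) / n
        p = min(max(p, 0.02), 0.98)  # keep the logit finite
        difficulties.append(-math.log(p / (1.0 - p)))
    return difficulties


def estimate_theta(x, b, bound=4.0):
    """Maximum likelihood ability via Newton-Raphson, clamped to [-bound, bound]."""
    theta = 0.0
    for _ in range(50):
        ps = [prob(theta, bi) for bi in b]
        grad = sum(xi - pi for xi, pi in zip(x, ps))
        info = sum(pi * (1.0 - pi) for pi in ps)
        step = grad / info
        theta = min(max(theta + step, -bound), bound)
        if abs(step) < 1e-8:
            break
    return theta


def lz(x, theta, b):
    """Standardized log-likelihood person-fit index for a 0/1 pattern."""
    l0 = mean = var = 0.0
    for xi, bi in zip(x, b):
        p = prob(theta, bi)
        l0 += xi * math.log(p) + (1 - xi) * math.log(1.0 - p)
        mean += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        var += p * (1.0 - p) * math.log(p / (1.0 - p)) ** 2
    return (l0 - mean) / math.sqrt(var) if var > 0 else 0.0


def purify(patterns, cutoff=-1.645, max_iter=20):
    """Recalibrate without flagged patterns until the flagged set is stable.

    Returns (difficulties, flagged indices, iterations used).
    The cutoff and the stopping rule are assumptions of this sketch.
    """
    flagged = set()
    b = []
    for iteration in range(1, max_iter + 1):
        kept = [x for k, x in enumerate(patterns) if k not in flagged]
        if not kept:
            raise ValueError("every pattern was flagged; nothing left to calibrate")
        b = calibrate(kept)
        new_flagged = {
            k for k, x in enumerate(patterns)
            if lz(x, estimate_theta(x, b), b) < cutoff
        }
        if new_flagged == flagged:  # converged: same patterns flagged twice in a row
            return b, flagged, iteration
        flagged = new_flagged
    return b, flagged, max_iter
```

Calling `purify(patterns)` on a list of 0/1 response vectors returns the final difficulty estimates, the indices of the flagged patterns, and the number of iterations used. In practice the calibration step would be delegated to dedicated item response theory software, and the `max_iter` guard is there because the flagged set is not guaranteed to stabilize.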