Validity of test score interpretations and cross-cultural comparisons in the First and Second International Science Studies
Educational Assessment, Evaluation and Accountability, ISSN: 1874-8600
2024
Metrics Details
- Captures: 4
- Readers: 4
Article Description
In international large-scale assessments, student performance is frequently compared across educational systems to assess the state and development of different domains. These results often have a large impact on educational policy and on perceptions of an educational system's performance. Early assessments, such as the First and Second International Science Studies (FISS and SISS), have been used alongside recent studies to create unique scales for investigating changes in constructs over time. System comparisons implicitly assume that the measures are valid, reliable, and comparable, yet these assumptions have not always been investigated thoroughly. This study investigates the validity and cross-system comparability of scores from FISS and SISS, conducted by the International Association for the Evaluation of Educational Achievement in 1970–1971 and 1983–1984. Findings based on item response theory (IRT) modeling indicate that scores in most educational systems can be viewed as reliable measures of a single science construct, supporting the validity of test score interpretations within these systems individually. In a robust assessment of measurement invariance using standard IRT methods, an alignment-based method, and the root mean square difference (RMSD) fit statistic, we demonstrate that measurement invariance is violated across systems. The alignment-based method identified a well-fitting model with complex restrictions, but no item exhibited invariance across all systems, a result supported by the RMSD statistics. These results call into question the appropriateness of score comparisons across systems in FISS and SISS. We discuss the implications of these results and outline consequences for score comparisons across time.
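The RMSD item-fit statistic mentioned in the abstract quantifies, for each item and educational system, the discrepancy between that system's item characteristic curve and the curve implied by common (invariant) item parameters. A minimal sketch of the idea, assuming a 2PL IRT model and a standard-normal ability density; all parameter values and function names here are illustrative and are not taken from the study:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: P(correct response | ability theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rmsd(p_group, p_model, density):
    """Density-weighted root mean square difference between a group-specific
    ICC and the ICC implied by the common (invariant) item parameters."""
    w = density / density.sum()
    return float(np.sqrt(np.sum(w * (p_group - p_model) ** 2)))

# Ability grid weighted by a standard-normal density
theta = np.linspace(-4.0, 4.0, 161)
density = np.exp(-0.5 * theta**2)

# Common item parameters from a concurrent calibration (illustrative values)
p_model = icc_2pl(theta, a=1.2, b=0.0)

# A system where the item behaves exactly as the common model predicts ...
p_system_a = icc_2pl(theta, a=1.2, b=0.0)
# ... and one where the item is noticeably harder, i.e. invariance is violated
p_system_b = icc_2pl(theta, a=1.2, b=0.5)

print(rmsd(p_system_a, p_model, density))  # 0.0: no misfit for this system
print(rmsd(p_system_b, p_model, density))  # clearly nonzero: item flagged
```

In practice, items whose RMSD exceeds a chosen cutoff in some system are treated as non-invariant there, which is how the statistic supports the study's conclusion that no item was invariant across all systems.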
Bibliographic Details
Springer Science and Business Media LLC