Investigating EFL listening comprehension skills: An empirical validation of C1 level test scores

Authors

  • Ármin Kövér, Language Pedagogy PhD Programme, Eötvös Loránd University, Budapest

DOI:

https://doi.org/10.61425/wplp.2017.11.80.95

Keywords:

listening comprehension skills, empirical validation, item analysis, Many-Facet Rasch Measurement

Abstract

Testing listening comprehension skills is a difficult task because of the complex nature of the listening process. This complexity also makes designing listening comprehension tests challenging. Overcoming such challenges was one of the central tasks in a project at a major Hungarian university, during which four test‑based listening practice booklets targeting the C1 proficiency level were designed. The present study investigates the reliability of the items in the first practice booklet by analysing and empirically validating the test scores of 98 test‑takers. The test in the first practice booklet contained 30 items in four different item formats. Data analysis follows a quantitative approach. The reliability of the test scores was examined with the Iteman software, applying classical test theory, and with the Facets software, applying Many‑Facet Rasch Measurement (MFRM). The latter method was also used to empirically validate the test scores by identifying misfitting test‑takers and items in the dataset. The empirical validation with MFRM offered a subtle way of strengthening the reliability of the test scores by artificially connecting the dataset. Even though the results of the present study could be improved by pre‑testing the remaining three practice tests and by removing a total of six misfitting test‑takers and four misfitting items from the dataset, a higher degree of reliability was reached for the fitting test items. The results indicate that applying MFRM for the empirical validation of test scores might be beneficial not only for validating listening comprehension test scores but also for validating other types of test scores, especially in large‑scale testing.
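The study itself relies on Iteman and Facets, but the classical test theory step it describes can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' analysis: it computes KR-20 reliability and corrected item-total (point-biserial) discrimination for a dichotomously scored 98 × 30 response matrix. Since the original dataset is not available, simulated Rasch-style responses stand in for the study's scores, and the flagging threshold (rpb < .20) is an illustrative assumption.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """Kuder-Richardson Formula 20: internal-consistency reliability
    for dichotomously scored (0/1) items."""
    n_items = responses.shape[1]
    p = responses.mean(axis=0)                     # proportion correct per item
    q = 1.0 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / total_var)

def point_biserial(responses: np.ndarray) -> np.ndarray:
    """Corrected item-total correlation per item: each item is
    correlated with the total score excluding that item."""
    totals = responses.sum(axis=1)
    rpb = np.empty(responses.shape[1])
    for i in range(responses.shape[1]):
        rest = totals - responses[:, i]            # total score without item i
        rpb[i] = np.corrcoef(responses[:, i], rest)[0, 1]
    return rpb

# Simulated stand-in for the study's 98-person x 30-item score matrix,
# generated from a simple Rasch (one-parameter logistic) model.
rng = np.random.default_rng(0)
ability = rng.normal(size=(98, 1))
difficulty = rng.normal(size=(1, 30))
prob = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
scores = (rng.random((98, 30)) < prob).astype(int)

print(f"KR-20 reliability: {kr20(scores):.3f}")
weak = np.where(point_biserial(scores) < 0.20)[0]
print("Items with low discrimination (rpb < .20):", weak + 1)
```

Items flagged this way by classical statistics are candidates for closer inspection; the MFRM step in the study goes further by estimating person and item measures jointly and reporting fit statistics for both, which is how the six misfitting test-takers and four misfitting items were identified.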

Published

2017-12-01

Issue

Vol. 11 (2017)

Section

Articles