Towards a profile-based writing scale for a local post-admission ESL writing assessment

YAN Xun
University of Illinois, Urbana-Champaign

Abstract
Post-admission language tests tend to have a restricted range of proficiency levels among test-takers due to considerations made during the admission selection process. Although range restriction can present challenges for proficiency-focused assessment, it can also bring opportunities to zoom in on fine-grained performance profiles of test-takers. This study reports on the validation of a profile-based rating scale for an ESL writing placement test at a US university. The profile-based rating scale was created through a three-stage, hybrid scale development approach, to provide not only accurate placement decisions but also fine-grained diagnostic information regarding ESL students' writing performance profiles. The scale strikes a balance between argument development and lexico-grammar, to better account for the range of writing performances among test-takers. To gather validity evidence for the profile-based rating scale, this study employs a sequential, mixed-methods approach to examine the quality of test-taker performances across profiles and rater perceptions of the scale. Nine certified raters were recruited to conduct independent evaluations of lexico-grammar and argumentation on a sample of 150 test-taker performances. These evaluations were subjected to many-facet Rasch measurement analysis to examine the differences across the writing performance profiles included in the rating scale. Next, semi-structured follow-up interviews were conducted with the raters to complement the quantitative findings on the usability and effectiveness of the scale. The findings provide supportive evidence for the validity of the profile-based rating scale. I argue that by focusing on performance profiles, post-admission language tests can strengthen the alignment across curriculum, instruction, and assessment in ESL writing programs.

Keywords: learner profile; scale development; Rasch analysis; local language tests

1. Introduction

In the last two decades, many universities in the US have witnessed a significant increase in their international student populations, and this trend was ongoing until the recent COVID-19 pandemic. Against this backdrop, the need for English for Academic Purposes (EAP) courses has increased, particularly in relation to academic writing skills. Many large universities in the US have set minimum cut-off scores on English proficiency tests for admission. In addition, many universities also require students who scored lower than a certain test score to take post-admission placement tests and appropriate ESL courses after enrollment. Therefore, the students who end up taking post-admission placement tests fall within a restricted range of proficiency, with little variance in their performance, and this presents unique challenges for local ESL programs.

In most tertiary-level assessment contexts, students' language proficiency levels often fall on the higher end of the full proficiency spectrum. This restricted-range problem makes large-scale language tests unsuitable for direct use in local assessment contexts, because large-scale tests often fail to make finer distinctions among test-takers within the narrow range (Dimova et al. 2020). Meanwhile, test-takers of similar overall language proficiency can differ in terms of subskill profiles. Even within a subskill (e.g., writing ability), test-takers can also differ in terms of subconstructs (e.g., rhetorical vs. linguistic skills) (Ginther & Yan 2018). The provision of fine-grained information about different profiles can be pedagogically useful in local language curricula, especially if the profiles can be incorporated into local language placement tests.
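To make the notion of a performance profile concrete, the following minimal Python sketch (my illustration, not the scale reported in this study) labels a writer by combining two independently rated subscales. The subscale names mirror the constructs discussed in this paper, but the 1-6 score range and the cut point of 4 are hypothetical.

def writing_profile(argumentation: int, lexico_grammar: int, cut: int = 4) -> str:
    """Label a writer's profile from two subscale scores (hypothetical 1-6 range)."""
    strong_arg = argumentation >= cut
    strong_lex = lexico_grammar >= cut
    if strong_arg and strong_lex:
        return "balanced high"
    if strong_arg:
        return "stronger argumentation, weaker lexico-grammar"
    if strong_lex:
        return "stronger lexico-grammar, weaker argumentation"
    return "balanced low"

# Two writers with the same total score (8) but different profiles:
print(writing_profile(5, 3))  # stronger argumentation, weaker lexico-grammar
print(writing_profile(3, 5))  # stronger lexico-grammar, weaker argumentation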

This study reports on the validation of a profile-based rating scale for an English writing placement test designed to capture the fine-grained differences among ESL students' writing performances at a large US university. The study also aims to demonstrate how information provided by such a scale can assist instruction within an ESL writing program.

2. Literature review

2.1 Range restriction in post-admission language assessment

A common problem in educational and psychological research and practice is the issue of direct range restriction (Wiberg & Sundström 2009). Due to admission or employment selection, the variables of interest tend to be restricted to the higher end of the spectrum. This issue can influence analyses, inferences, and decisions related to language proficiency (Cho & Bridgeman 2012; Ginther & Yan 2018). For example, in most large research universities (e.g., University of Illinois at Urbana-Champaign, Indiana University, Iowa State University), the cutscores for admission on large-scale English language proficiency tests, in terms of total score, tend to be around 80 on the TOEFL iBT or 6.5 on the IELTS; most universities also do not have minimum requirements for subsection scores. As a result, the overall language proficiency of admitted international students tends to fall between the intermediate-advanced and advanced levels. To help international students succeed in academic coursework, most universities provide post-admission ESL support, mostly focusing on writing skills and sometimes also on speaking skills. Thus, universities also institute an in-house placement test along with some kind of exemption criteria (based on large-scale English proficiency tests) to 1) exempt students who do not need ESL support, and 2) place those in need of ESL support into appropriate writing and/or speaking courses.
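As a rough illustration of this two-step logic (first exempt, then place), consider the following minimal Python sketch; the exemption thresholds and course labels are hypothetical rather than any university's actual policy.

def esl_decision(toefl_ibt=None, ielts=None, placement_score=None):
    """Hypothetical two-step post-admission screening: exempt, otherwise place."""
    # Step 1: exemption via large-scale proficiency scores (assumed thresholds).
    if ((toefl_ibt is not None and toefl_ibt >= 103)
            or (ielts is not None and ielts >= 7.5)):
        return "exempt from ESL coursework"
    # Step 2: placement via the in-house test (hypothetical 1-5 scale).
    if placement_score is None:
        return "take the post-admission placement test"
    return ("advanced ESL writing course" if placement_score >= 4
            else "foundational ESL writing course")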

These assessment needs give rise to local language tests in university settings, and in fact, the majority of language testing professionals reside and flourish (in terms of both testing practice and scholarship) in such local contexts (Dimova et al. 2022).

Local language testing presents unique challenges and opportunities. Many local programs rely on or adopt existing rating scales from general proficiency tests like the TOEFL, which target a fuller range of proficiency. This may be largely due to budgetary constraints, lack of expertise, and/or the amount of effort and time involved in creating new rating scales. However, commercially available or theory-based scales have many disadvantages in local assessment contexts. On the one hand, because of the restricted range of language proficiency (the lack of examinees at either end of the scale, i.e., very high or very low), these scales are not necessarily sensitive enough to capture the nuances of writing performance among students within the restricted range (Cho & Bridgeman 2012; Bridgeman et al. 2015; Ginther & Yan 2018). Likewise, it is also difficult to employ a theory-based approach to develop a new scale from either experienced teachers' intuition or the literature, because such scales tend to target the full range of proficiency. On the other hand, scales developed through performance data-driven approaches are possible; however, if the scale focuses holistically on overall proficiency or the target construct, the resultant scale tends to distinguish only two or three levels reliably. If the scale is stretched further into multiple levels, it might be difficult to see substantial differences between adjacent levels.

Nonetheless, the restricted-range problem also presents opportunities for innovation in writing scale development. In addition to accurate placement decisions, post-admission language tests are also expected to provide diagnostic information for language instructors about students' actual language/writing performance profiles (i.e., an independent evaluation of each subskill). Oftentimes, finer-grained diagnosis of students' language ability takes place in classrooms, and teachers need to collect students' writing samples for diagnostic purposes. However, due to either logistical considerations or lack of expertise, there has been a lack of attention to, or interest in, diagnostic tests (e.g., Alderson 2005). Thus, if post-admission language tests can provide both placement and diagnostic information about learners' performance profiles, this type of test would be more learning-oriented and more closely aligned with the local language curriculum, in ways that large-scale tests cannot achieve (Dimova et al. 2020, 2022).

Post-admission language tests offer a special advantage for examining learner profiles. It has been well documented in second language research that the nature of language proficiency changes across speakers of different proficiency levels (e.g., Oltman et al. 1988; Alderson 1991; Kunnan 1992), and different “profiles” of writers exist even within the same proficiency group, with varying strengths and weaknesses in their subskills. When learners' proficiency levels are within a restricted range, the differences across learner profiles will likely be more prominent. For instance, within the intermediate-advanced proficiency range, some students have strong receptive skills (reading and listening) while showing contrasting weaknesses in productive skills (speaking and writing) (Bridgeman et al. 2015; Ginther & Yan 2018). Jarvis et al. (2003) have also shown that multiple profiles exist even among highly rated compositions, despite some identifiable common traits in their writing quality. These profiles can have different implications for instruction. Therefore, in order to fully capture the subtle profile differences in students' performances, rating scales for post-admission language tests can explore the possibility of identifying learner profiles, which can offer both placement and diagnosis of ESL students.

2.2 Integrated tasks for post-admission ESL writing placement tests in US universities

Performance-based assessment has become the norm in language testing over the past few decades. This assessment method, in which test-takers produce a performance sample such as an essay instead of answering multiple-choice items, has gone from an “almost unheard-of method” in 1970 (Vaughan 1991:111) to “the most widely used method for assessing writing” in 1990 (Barkaoui 2007:100). In higher education in particular, the ability to develop an argument is an important skill across disciplines, and argumentative writing is central to second language (L2) writing pedagogy and assessment for academic purposes. In L2 assessment, argumentative writing is often assessed through integrated tasks, in which test-takers are required to integrate information from source materials (which usually serves as supporting evidence) into their writing and argue for a stance. This type of essay task is considered an authentic method to assess L2 learners' writing skills for administrative and instructional decisions, and also to investigate the nature of the L2 writing construct and the effectiveness of instructional interventions on L2 writing learning and development (Barkaoui 2007).

This move toward direct assessment of writing has brought raters and rating scales to the fore. Unlike indirect assessment of writing (e.g., multiple-choice items), direct assessment of writing requires test-takers to produce at least one writing sample, which raters then score according to a rating scale. As Upshur & Turner (1995) pointed out, test scores in the context of performance-based assessment are influenced not only by the examinee's ability and the task type but also by the rater and the rating scale. That is, test scores cannot be free from the effects of raters and rating scales (as well as task effects), and in order for test scores to reflect test-takers' true writing abilities, a substantial amount of effort needs to be devoted to the development, implementation, and validation of the rating scale. In this context, raters and the rating scale arguably play a major role in both the reliability and validity of a given assessment.

2.3 Approaches to scale development

To develop a rating scale that captures profile differences in test-taker performances, it is necessary to discuss the general scale development approaches in the literature. Fulcher et al. (2011) distinguished two major approaches to scale development, namely the measurement-based approach and the performance-based, data-driven approach. The measurement-based approach relies on existing scales or the intuitions of “experts” (i.e., those with extensive experience) in teaching or assessing the subject to identify common features and/or descriptors at varying levels of proficiency. Once identified, level descriptors are placed into a single scale based on estimates of their difficulty. No real performance analysis is required at this stage, but a post-hoc measurement method (i.e., Rasch analysis) can be used afterwards to examine the reliability of the scale.
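For reference, the many-facet Rasch model that typically underlies such post-hoc analyses (Linacre's MFRM; the formulation below is the standard one from the measurement literature, not reproduced from this study) expresses the log-odds of a performance receiving category k rather than k-1 as an additive function of the facets:

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \beta_i - \alpha_j - \tau_k

Here \theta_n is the ability of test-taker n, \beta_i the difficulty of rating criterion i, \alpha_j the severity of rater j, and \tau_k the threshold of scale step k. Fit statistics and separation indices derived from this model indicate how consistently raters apply the scale and how many levels it reliably distinguishes.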

The advantage of a measurement-based scale lies in the experts' sensitivity to test-taker performance levels, accumulated from prior experience. However, because of the subjective nature of expert intuition, such “a priori” developed scales have been criticized for being unspecific and imprecise, and thus for resulting in inconsistent ratings across raters (Knoch 2009). Many scholars have pointed out that the language used in these scales is often relativistic, abstract, and impressionistic, which allows for subjective interpretations of the features differentiating bands or proficiency levels (Upshur & Turner 1995; Brindley 1999; Mickan 2003). Due to this weak link between the scale (meaning) and performance (score) in intuitively developed scales (Pollitt & Murray 1996; Fulcher et al. 2011), there have been concerns that raters might not be able to make fine-grained distinctions between different traits across levels, thereby losing important diagnostic information (Knoch 2009).

Unlike the measurement-based approach, which relies on intuitively derived, pre-determined features and post-hoc measurement to ensure its reliability and/or validity, the performance-based, data-driven approach starts from collecting and analyzing actual performance samples. The analyses of performance data identify the key features or traits that can distinguish performances at different proficiency levels. The number of levels in a scale is also empirically established using discriminant analysis, and the features identified in the earlier analysis are used to describe each level of the scale (Fulcher 1993, 1996, 2003; Fulcher et al. 2011).
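As a minimal sketch of how discriminant analysis can inform the number of scale levels (my illustration, not Fulcher's actual procedure; the feature values below are random placeholders for real text measures), one can fit a linear discriminant model on essay features and compare cross-validated classification accuracy for candidate level groupings; the finest grouping that still classifies clearly above chance suggests how many levels the data can support.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_essays = 150
# Placeholder features standing in for measured traits
# (e.g., lexical diversity, error rate, argument-move counts).
X = rng.normal(size=(n_essays, 4))
levels_5 = rng.integers(0, 5, size=n_essays)  # candidate 5-level scale
levels_3 = levels_5 // 2                      # same data collapsed to 3 levels

for k, y in [(5, levels_5), (3, levels_3)]:
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
    print(f"{k}-level grouping: cross-validated accuracy = {acc:.2f}")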

Even though this method allows a close analysis of actual performance samples and strengthens the link between the scale and actual performance, it is not without criticisms. Researchers have noted that the data-driven approach to scale development can be time-consuming, and that it produces analytic descriptors (often linguistic constructs) that human raters might find difficult to use in real-time rating (e.g., Upshur & Turner 1995; Fulcher 2003;
