Chapelle et al. (2015) examine two EAP automated diagnostic assessments for writing by means of a validity argument. All assessments, they suggest, "need to be evaluated in view of the validity of their intended interpretations, uses and consequences" (pp. 385-6). The question is "how to frame a validity argument by identifying five types of inferences that one might wish to make on the basis of results from an assessment". I need to see Chapelle et al. (2008) for Domain Definition. Also important are the warrants, inferences, assumptions and ramifications (consequences?) of an assessment. From these we can build a framework for a validity argument. [This has echoes of Bachman and consequential validity.] Resist the temptation to go sideways from this ref; just bear the concepts in mind for now and come back later.
Youn (2015) is also interested in argument-based validity (in relation to assessing L2 pragmatics in interaction), and he also refers to Chapelle et al. (2008) [which I can't find in the library database, but which I'm going to have to read...]. Youn refers to the need, in his study, for "isolating concrete features of interaction for scoring", where we could substitute speaking (or any construct) for "interaction". The point is (I know because I've designed many a test) that to give a test score you need to be able to point to something tangible with at least two levels of performance (Can Do v. Can't Do). This, Youn argues, goes to "backing" (?) for the "evaluation inference", which is an element in a validity argument.
[I know I wasn't going to go sideways in this current reading bout, but this applies to my own research. It doesn't really matter what the test designer/user/administrator says about validity. We can design a validity argument framework for it, including, say, aspects of speaking which can be scored to find an evaluation inference, whatever the test's originators say.]
[NB, from Youn I gather that Kane (3 refs) is the originator of argument-based validity. Still resisting the urge to go sideways on this, but I'm picking up that this, rather than consequential validity, is the paradigm. If so, see esp. Kane (2013). It may be that I don't need to look at all the other validities, but can just stick to argument-based. It seems complicated enough (with the need to build a framework) to keep me busy anyhow. Of three articles mentioning validity in 2015 so far, two have been argument-based (Chapelle and Youn) and another didn't really address the issue in any depth (Power, 2014; see last Friday for reference). If that's the case, I need to address this in the Lit Rev outline.]
Huhta et al. (2014) opine that the CEFR has "raised awareness of the principles of valid and fair assessment", which is interesting, as the GESE is of course CEFR-based. We can't really get away from the CEFR here: it is tied up, on many levels that impinge on validity, with any assessment based on it.
NB, come back to July 2013 which is all about "assessment literacy, a concept introduced with reference to the knowledge assessors need to possess".
Zhao (2013) refers to a "mixed methods approach" in connection with the design and validation of a rubric for authorial voice in writing, and I'll come back to this tomorrow, inshallah.
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405. http://doi.org/10.1177/0265532214565386
Chapelle, C. A., Enright, M., & Jamieson, J. (Eds.) (2008). Building a validity argument for the Test of English as a Foreign Language. London: Routledge.
Huhta, A., Alanen, R., Tarnanen, M., Martin, M., & Hirvela, T. (2014). Assessing learners’ writing skills in a SLA study: Validating the rating process across tasks, scales and languages. Language Testing, 31(3), 307–328. http://doi.org/10.1177/0265532214526176
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kane, M. T. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed.) (pp. 17–64). Westport, CT: Greenwood.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 1–73.
Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225. http://doi.org/10.1177/0265532214557113
Zhao, C. G. (2013). Measuring authorial voice strength in L2 argumentative writing: The development and validation of an analytic rubric. Language Testing, 30(2), 201–230. http://doi.org/10.1177/0265532212456965