[PPT] - Validating a semiadaptive Korean placement test AATK 2014, Boston PowerPoint Presentation

SLIDE 1

Sun-Young Shin & Hyo Sang Lee

Indiana University

AATK 2014, Boston University June 21, 2014

Validating a semiadaptive Korean placement test

SLIDE 2

Overview

Background
Context for the study
Placement testing; Computerized (semi-)adaptive test
Validity
Outstanding issues
Methods
instruments: IU online semiadaptive Korean placement test & TOPIK
data analysis
Results
Discussion

SLIDE 3

Background

Context for the study
Placement decision in a language program

Common placement tools (Brown, Hudson, & Clark, 2004):

(i) an institutional proficiency test (a multiple-choice and cloze test, plus an essay); (ii) an oral interview; (iii) self-placement

Placing students at the appropriate levels is important to ensure that course curriculum and materials are well targeted to their learning needs (Green, 2012)

SLIDE 4

Background

Context of the problem
Limitations of using a proficiency test for placement in a Korean language program in the U.S.
Lack of alignment with the content of the courses into which students are placed (Green & Weir, 2004)
Little information about high- and low-level students, because most test items are of medium difficulty

SLIDE 5

Background

Larger standard errors of measurement (SEMs) are often obtained at the high end or the low end of the scale

This is particularly problematic in Korean language programs in the US, where student populations are often polarized between high-end (heritage Korean learners) and low-end (non-heritage Korean learners) proficiency levels (Sohn & Shin, 2007)
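The link between medium-difficulty items and larger SEMs at the extremes can be illustrated with a small sketch. The slides do not name a measurement model, so the Rasch model and the item difficulties below are assumptions made purely for illustration.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta, difficulties):
    """Conditional standard error: 1 / sqrt(test information at theta)."""
    info = sum(p * (1.0 - p) for p in (rasch_p(theta, b) for b in difficulties))
    return 1.0 / math.sqrt(info)


# Hypothetical 40-item test whose items cluster around medium difficulty.
difficulties = [-0.5, -0.25, 0.0, 0.25, 0.5] * 8

for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  SE={standard_error(theta, difficulties):.2f}")
```

In this sketch the standard error is smallest near the middle of the scale and grows toward both ends, which is the pattern the slide describes for heritage and non-heritage learners.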

SLIDE 6

Background

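A computer adaptive language test (CALT) has been recommended to address such problems because its items can be tailored to test takers’ ability levels in a shorter, quicker test, and it also allows for the use of more innovative item types (Chalhoub-Deville & Deville, 1999)

SLIDE 7

Background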

Limitations of CALT (Meunier, 1994)
A large item bank needs to be constructed and piloted
The CALT algorithm is technically challenging to develop
Content/construct validity is questionable

SLIDE 8

Background

A semi-adaptive language test (Ockey, 2009) or the testlet model (Wainer & Kiely, 1987), under which bundles of items are arranged linearly and presented based on test takers’ responses to them, has been recommended to overcome such limitations of CALT

To date, however, few studies have been conducted on the development and validation of an online semi-adaptive foreign language test

SLIDE 9

Background

Validity
Appropriateness of the inferences and uses that we make based on test scores (Messick, 1989)
Discussed in terms of different sources of evidence for validity (Bachman, 1990)
Content validity
Concurrent validity
Construct validity

SLIDE 10

Outstanding issues

How can we go about developing an online semi-adaptive Korean placement test?
What validity evidence is there for an online semi-adaptive Korean placement test?

SLIDE 11

Developing an online semi-adaptive placement test

Why online & computerized?
Mixture of audios, videos, graphics, and texts
Mixture of language skills in a single test item
Semi-adaptive

SLIDE 12

Characteristics of the test

Why online and computerized?
Any time, any place, year-round
Self-assessment, no proctoring needed
Automatic grading
Instant result notification
Incorporating multimedia, multimodal material
Ease of data extraction and analysis

SLIDE 13

Characteristics of the test

Diverse stimuli: audios, videos, graphics and texts
Reflection of natural language input environments
Measuring receptive and productive skills simultaneously
Contextualized and content-oriented as opposed to structure (vocabulary, grammar)-oriented

SLIDE 14

Characteristics of the test

Mixture of language skills in a single test item
Both questions and options can be in audio or text (text-text, audio-text, text-audio, audio-audio)

Integrating language skills

SLIDE 15

Characteristics of the test

Why adaptive?
A single test: all students have to finish it
Level-specific tests: the appropriate level cannot be identified a priori

SLIDE 16

Characteristics of the test

Semi-adaptive: a compromise between the ideal and the practical under the circumstances
A fully adaptive test is impractical with limited resources: it requires a large number of test items and sophisticated, complex computer programming
Page-by-page instead of item-by-item: each page has 3 or more question items
Cutoff lines: 1-10 (57 items), 11-22 (50 items), 51-63 (66 items), 64-69 (39 items)
The order of pages reflects the progression of the course material
One needs to get 60% of the items on a given page correct to move forward
The test stops when a test taker has failed to reach 60% three times and has not answered 70% of the total items completed at that point correctly
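The page-by-page stopping rule can be sketched roughly as follows. This is only one reading of the slide: the function name and data layout are hypothetical, and the sketch assumes that the three failed pages need not be consecutive and that a failed page still counts toward the running total.

```python
PAGE_PASS_PCT = 60      # per-page threshold, from the slide
OVERALL_PASS_PCT = 70   # cumulative threshold, from the slide
MAX_FAILED_PAGES = 3    # failed pages tolerated, from the slide


def pages_administered(pages):
    """Return how many pages a test taker sees before the test stops.

    pages: list of pages in course order; each page is a list of booleans
    (True = item answered correctly). Hypothetical data layout.
    """
    failed_pages = 0
    correct = 0
    seen = 0
    for page_no, responses in enumerate(pages, start=1):
        page_correct = sum(responses)
        correct += page_correct
        seen += len(responses)
        # Integer arithmetic avoids floating-point edge cases at exactly 60%.
        if 100 * page_correct < PAGE_PASS_PCT * len(responses):
            failed_pages += 1
        # Assumed reading: stop only when both conditions hold.
        if (failed_pages >= MAX_FAILED_PAGES
                and 100 * correct < OVERALL_PASS_PCT * seen):
            return page_no
    return len(pages)


weak = [[True] * 5, [True, False, False], [False] * 3, [False] * 3, [True] * 3]
print(pages_administered(weak))
```

Because the pages follow the course sequence, the page at which the test stops can then be compared against the cutoff lines to suggest a course level.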

SLIDE 17

Methods

Data
112 students enrolled in K101 & K102 at a large Midwestern university who took both an online semi-adaptive Korean placement test (KPT) and the Test of Proficiency in Korean (TOPIK)
336 test takers who took the online semi-adaptive Korean placement test

Data analysis
Data were analyzed using Pearson’s correlation coefficients, an agreement index, and independent-samples t-tests in SPSS 21 (2012)

SLIDE 18

Results

Reliability
Item discrimination and the B-index (for item bundles, i.e., pages) were calculated:
Average item discrimination: .31 (reasonably good)
Some items with low discrimination (6% of the total items) were found and need to be revised
B-index results: M = .39, ranging from .05 to .79
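As a rough illustration of the B-index, the sketch below uses its usual definition: the item facility among test takers who passed minus the item facility among those who did not. The slides do not say how pass/fail was determined or how page-level scores were formed, so the data and the function name are hypothetical.

```python
def b_index(scores, passed):
    """B-index = mean score of passers minus mean score of non-passers.

    scores: per-person proportion correct on one item or one page (bundle).
    passed: per-person booleans for the pass/fail decision (assumed given).
    """
    pass_scores = [s for s, p in zip(scores, passed) if p]
    fail_scores = [s for s, p in zip(scores, passed) if not p]
    return sum(pass_scores) / len(pass_scores) - sum(fail_scores) / len(fail_scores)


# Hypothetical responses to a single item from six test takers.
scores = [1, 1, 0, 1, 0, 0]
passed = [True, True, True, False, False, False]
print(round(b_index(scores, passed), 2))
```

Values close to 1 mean the item or page separates passers from non-passers well; values near 0 mean it does not.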

SLIDE 19

Results

Content validity – KPT has a direct relationship to the course contents

Concurrent validity – KPT vs. TOPIK
Correlation coefficient: .75 (item level); .89 (page level)
Agreement index: .71
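The two concurrent-validity statistics can be sketched as below. The slide does not define the agreement index, so this sketch assumes it is the proportion of test takers that both tests classify into the same category. All data shown are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement_index(decisions_a, decisions_b):
    """Proportion of identical classifications (assumed definition)."""
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


# Hypothetical scores and placement decisions for four test takers.
kpt_scores = [12, 30, 45, 51]
topik_scores = [20, 41, 58, 77]
kpt_levels = ["K101", "K102", "K201", "K101"]
topik_levels = ["K101", "K102", "K102", "K101"]

print(round(pearson_r(kpt_scores, topik_scores), 2))
print(agreement_index(kpt_levels, topik_levels))
```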

SLIDE 20

Results

Construct validity – group differences in test performance (experimental approach; Brown, 2005)
Those who passed TOPIK and those who did not scored significantly differently on KPT (t = 7.64, p < .001)
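For reference, an independent-samples t statistic of the kind reported here can be computed as in this sketch. It assumes the pooled-variance (equal variances) form; the slides do not say which variant SPSS output was used. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, n_a + n_b - 2


# Hypothetical KPT scores for TOPIK passers and non-passers.
passers = [2, 4, 6, 8, 10]
non_passers = [1, 2, 3, 4, 5]
t_stat, df = independent_t(passers, non_passers)
print(f"t({df}) = {t_stat:.2f}")
```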

SLIDE 21

Results

Construct validity – internal evidence
Do the testlets (item bundles) for each level differ in difficulty as intended?

Level   Pages (#)   Questions (#)   Mean IF
K101    10          57              .72 (.13)
K102    12          50              .43 (.09)
K201    13          66              .22 (.04)
K202    6           39              .20 (.01)

F(3, 37) = 81.01, p < .001
Yes, except for K201 vs. K202
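The F test can be approximately cross-checked from the summary table alone. The sketch assumes that the values in parentheses are standard deviations and that the page is the unit of analysis. Neither is stated on the slide, although the page counts do add up to the reported degrees of freedom.

```python
def f_from_summary(ns, means, sds):
    """One-way ANOVA F statistic from group sizes, means, and SDs."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(ns) - 1
    df_within = total_n - len(ns)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


pages = [10, 12, 13, 6]              # pages per level, from the table
mean_if = [0.72, 0.43, 0.22, 0.20]   # mean item facility per level
sd_if = [0.13, 0.09, 0.04, 0.01]     # assumption: parenthesized values are SDs

f_stat, df1, df2 = f_from_summary(pages, mean_if, sd_if)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Under these assumptions the degrees of freedom come out as (3, 37), matching the slide, and F lands near 80, close to the reported 81.01. The small gap is consistent with rounding of the means and SDs.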

SLIDE 22

Discussion

An online semi-adaptive placement test can be developed and implemented for Korean language programs in a relatively reliable and valid manner
Note that this is a costly and time-consuming project, so a cost-benefit analysis should be conducted in advance
Classification errors for placement based on KPT are minimal, but some false-negative errors were found
Some items still need to be revised to improve the test

SLIDE 23

References

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Brown, J. D. (2005). Testing in Language Programs: A Comprehensive Guide to English Language Assessment. New York: McGraw-Hill.
Brown, J. D., Hudson, T., & Clark, M. (2004). Issues in language placement. Manoa, Hawai’i: National Foreign Language Resource Center. Retrieved June 14, 2014 from http://nflrc.hawaii.edu/NetWorks/NW41.pdf
Chalhoub-Deville, M., & Deville, C. (1999). Computer adaptive testing in second language contexts. Annual Review of Applied Linguistics, 19, 273-299.
Green, A. (2012). Placement Testing. In C. Coombe, B. O’Sullivan, P. Davidson & S. Stoynoff (Eds.), The Cambridge Guide to Language Assessment (pp. 164-170). Cambridge: Cambridge University Press.
Green, A., & Weir, C. (2004). Can placement testing inform instructional decisions? Language Testing, 21, 467-494.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.
Meunier, L. E. (1994). Computer Adaptive Language Tests Offer a Great Potential for Functional Testing. Yet, Why Don't They? CALICO Journal, 11(4), 23-39.
Ockey, G. (2009). Developments and challenges in the use of computer-based testing (CBT) for assessing second language ability. Modern Language Journal, 93, 836-847.
Sohn, S., & Shin, S-K. (2007). True beginners, false beginners, and fake beginners: Placement strategies for Korean heritage speakers. Foreign Language Annals, 40(3), 407-418.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185-201.
