
slide-1
SLIDE 1
of the National Institute

for Testing & Evaluation

Tzur Karelitz Sept 2016

slide-2
SLIDE 2

Outline

  • NITE: history, structure & objectives
  • Overview of NITE’s assessment services & activities
  • A detailed look at the RAMA project: analyzing and reporting data from large-scale assessments and surveys for the Israeli Ministry of Education

slide-3
SLIDE 3

NITE: History & Structure

  • Established in 1981 by a consortium of the universities in Israel to centralize admissions testing for applicants to higher education.
  • NITE is a public, not-for-profit organization supervised by a board of directors (representatives of the founding universities).
  • NITE's staff includes about 130 professionals: item writers, statisticians, psychometricians, computer programmers, graphic designers, language editors, and logistic & administrative staff.

slide-4
SLIDE 4

NITE’s Organizational Structure

  • CEO: Dr. Anat Ben-Simon
  • Deputy Director: Dr. Naomi Gafni
  • Departments
      • Test development
      • Scoring
      • Research
      • Operations
      • IT
      • Finance & administration
  • Units
      • CAT
      • Non-cognitive tests
      • Test accommodation
      • Computerized LD diagnosis system
      • Automated text analysis
      • RAMA project
slide-5
SLIDE 5

NITE: Main Objectives

  • Provision of assessment services, primarily to institutions of higher education, but also to the educational system (K-12) and other public organizations
  • Conducting research on admissions, placement, assessment and evaluation in institutions of higher education
  • Advancement of the fields of measurement, testing and psychometrics in Israel

slide-6
SLIDE 6

Typical Assessment Service

  • Design and development of assessments & tools
  • Test registration
  • Test administration & accommodations
  • Scoring and quality control
  • Reporting of results
  • Conducting related research

slide-7
SLIDE 7

Other Assessment Services

  • Expert item review
  • Test translation/adaptation
  • Computerizing P&P tests
  • Analyzing test results
  • Conducting evaluation projects
  • Training professional staff
  • Consulting for organizations in Israel and abroad
  • Teaching relevant academic courses

slide-8
SLIDE 8

Tests in Higher Education

Admission tests:

  Test          Admission to                Type                ~N
  PET           Universities & colleges     P&P / CAT           70,000
  MEIMAD        Pre-academic prep schools   Online              6,000
  MOR/MIRKAM    Medical schools             Assessment centers  1,700
  MITAM         Psych. graduate studies     P&P                 1,300

Proficiency tests:

  Test          Proficiency in              Type                ~N
  AMIR/AMIRAM   English                     P&P / CAT           45,000
  YAEL          Hebrew                      P&P                 25,000

And multiple smaller-scale admission & proficiency tests for various programs and organizations.

slide-9
SLIDE 9

The Psychometric Entrance Test (PET)

  • Admission to higher ed. is based on the mean of the PET score and school matriculation exam grades (BAGRUT).
  • PET is a scholastic aptitude test consisting of:
      • Verbal Reasoning (~60 MC items)
      • Writing (one task)
      • Quantitative Reasoning (~50 MC items)
      • English as a Foreign Language (~55 MC items)
  • PET is a standardized P&P test given 5 times a year
      • It takes about 3.5 hours to complete
      • Scores range 300-800, with mean 540 and SD 110
      • Adapted versions exist for examinees with disabilities
  • PET is translated into Arabic, Russian, English, French, and Spanish (and sometimes Italian & Portuguese)

slide-10
SLIDE 10

Other Projects

  • MATAL – a comprehensive, standardized, computer-based test battery for the diagnosis of learning disabilities
      • Aids the provision of test accommodations in higher education
  • HLP – computational tools for analyzing and rating Hebrew texts
      • NiteRater – an automatic essay rating system
  • ICAP – an initiative to advance educational measurement and psychometrics in Israel

slide-11
SLIDE 11

The RAMA Project: Analyzing and reporting results from large-scale tests and surveys for the Ministry of Education

slide-12
SLIDE 12

Outline

  • Background
  • Team
  • Main projects
      • Tasks
      • Tools
  • School Climate and Pedagogical Environment (CPE) surveys
  • Growth and Effectiveness Measures for Schools (GEMS)
  • Report production
  • Challenges and successes

17/12/2015 Rama Project 12

slide-13
SLIDE 13

Background

  • A 5-year, variable-quota contract issued by the MOE, which began in 2012; expected to be renewed in 2017.
  • Providing services to “RAMA” – The National Authority for Measurement and Evaluation in Education (a branch of the MOE).
  • The main project cycle runs between May and January, peaking in July-October.

Project workflow: Test/Survey Development (Climate: RAMA; Achievement: CET) → Administration, Human Rating & Data Entry (TALDOR) → Cleansing, Analysis, Reporting (NITE)

slide-14
SLIDE 14

The RAMA Team

  • Tzur – Psychometrician
  • Eran – DBA
  • Analysts: Eliran, Evgeny, Shaul, Valla
  • Report Production: Nethanel, Matan (Manager), and 6 part-time assistants

slide-15
SLIDE 15

Main Projects – 2015

  • Growth and Effectiveness Measures for Schools (GEMS): achievement tests for 5th & 8th grades
      • First language (Hebrew/Arabic), Math, English, Science & Technology
      • About 200,000 records per year
  • Climate and Pedagogical Environment (CPE) surveys
      • 5th-9th grades: about 150,000 students and 12,000 teachers per year
      • 10th & 11th grades: about 50,000 students and 8,000 teachers per year
  • Results for surveys and tests are reported on 4 levels:
      • Schools (1/3 of the country): about 900 elementary & middle schools and 300 high schools per year
      • Municipalities: about 100 per year
      • Districts: 8 every year
      • National: by language, school type (secular/religious), sub-groups within the Arab population, SES
  • Hebrew as a Second Language for Arabic speakers
      • 7,000 6th-grade students (test and survey) and 600 teachers (survey)
      • Results reported nationally

slide-16
SLIDE 16

Main Project Tasks

  • Database design and maintenance
  • Data cleansing
      • Dealing with inconsistent, missing, corrupted, inappropriate or duplicate data in surveys, tests and background information
  • Quality control of surveys prepared by RAMA
  • Maintenance of item properties in the database
  • Factor analysis for surveys and tests
  • Item analysis for surveys and tests
  • Scoring using classical scaling and calibration
  • Aggregations and norms
  • IRT analyses (sometimes)

(The main analyses run through a parallel channel; see Tools.)

slide-17
SLIDE 17

Main Project Tasks (cont.)

  • Extraction of historical comparison data
  • Dealing with special cases
  • Generation of reports for project control & monitoring
  • Automatic generation of personalized reports
      • Levels: school, district, municipality, national
  • Human and automatic quality control of reports
  • Language and format editing of reports
  • Writing insights and conclusions based on results
  • Preparation of CDs and envelopes for mailing
  • Secondary data analysis (research questions)
  • Documentation and technical reports
slide-18
SLIDE 18

Tools

  • SQL – data management, cleansing and history
  • SAS, SPSS – data manipulation and analysis
      • Parallel channel: all the main analyses are performed separately by different analysts using different code and software; the results are compared, and inconsistencies are resolved.
  • VBA – automation, quality control, post-processing of reports
  • Magic Publisher – inserting SPSS output into Word
  • Winsteps – IRT
  • Word – reports
  • Excel – everything…
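The parallel-channel workflow above ends in a comparison step. A minimal sketch of that reconciliation in Python, with made-up item names and values (the team's actual channels run in SAS and SPSS, so this is an illustration, not the production code):

```python
def reconcile(channel_a, channel_b, tol=1e-6):
    """Compare per-item results computed independently by two analysis
    channels; return the item ids whose values disagree beyond tol."""
    shared = channel_a.keys() & channel_b.keys()
    return sorted(k for k in shared
                  if abs(channel_a[k] - channel_b[k]) > tol)

# Hypothetical example: the same item means from two analysts' channels
sas_results = {"q1": 0.512, "q2": 0.733, "q3": 0.208}
spss_results = {"q1": 0.512, "q2": 0.731, "q3": 0.208}
conflicts = reconcile(sas_results, spss_results, tol=1e-3)  # ["q2"]
```

Items flagged this way would then be traced back to the code or data that produced the discrepancy before reporting.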
slide-19
SLIDE 19

School Climate and Pedagogical Environment (CPE) Surveys

slide-20
SLIDE 20

Goal of CPE Reports

The surveys aim to provide a detailed picture of the aspects of school social climate and pedagogical processes that are essential for educational quality: satisfaction, relationships within the school community, security and safety, discipline and behavioral issues, emotional-motivational aspects, teaching-learning-assessment processes, value-based education, etc.

The reports are meant to help school personnel set evidence-based goals and to plan, track and monitor important aspects of schooling such as inter-personal relationships, educational interaction, motivation and educational aptitude among students.

slide-21
SLIDE 21

Reported Indicators, 2015

  Climate | Pedagogy
  An overall positive attitude among students towards school | Practices of quality teaching-learning-assessment
  Close relationship and caring between teachers and students | Class assignments that promote inquiry learning
  Positive relations between students and their peers | Self-learning strategies
  Involvement in violent incidents | Receiving feedback to promote learning
  Digital violence via social networks and on the Internet | School's efforts to encourage social and civic involvement
  Verbal violence | School's efforts to promote tolerance of diversity
  School’s efforts to encourage a sense of safety | School trips & tours
  Proper behavior of students in the classroom | Students’ recreational activities
  Teachers’ satisfaction with school | Differentiated instruction at school
  Involvement of parents in school | Giving feedback to promote learning
  Teachers’ lack of a sense of safety | Teamwork at school
  Competence, curiosity and interest in learning | Collaborative learning in school
  School's efforts to encourage motivation and curiosity among students | Teachers’ professional development

slide-22
SLIDE 22

Parameter Report

  • Main source of information about survey items
      • Content, type, instructions for coding and analyzing, conditioning, etc.
  • Effective and uniform communication with RAMA
  • Improved automation, quality control, code design

Fields (per item): Item #, Item ID, Core/version, Code type, Item text, Item response, Indicator name, Use?, Resp. range, Miss. vals, Recode, Code ID, Miss. val code ID

slide-23
SLIDE 23

Coding Items and Aggregating Indicators

  • Coding
      • Dicho: Strongly Agree & Agree are coded as 1, rest = 0
      • R_Dicho: Strongly Disagree & Disagree are coded as 1, rest = 0
  • Aggregating
      • Calculate the mean of each coded item across all respondents. The mean is calculated within the desired aggregation level (e.g., the mean of each item within each language group, or school type).
      • Calculate the mean of item means over all the items that belong to the indicator. The result is the mean % of respondents endorsing the statements that constitute the indicator.
      • For categorical items, we calculate the number of respondents who selected each category within each aggregation level.
  • Historical comparison data
      • Past results are extracted from the DB and presented in reports.
      • Exclusionary rules must be followed because in some cases past data should be censored.
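The coding-and-aggregation steps above can be sketched in plain Python. The data here is hypothetical (the actual pipeline runs in SAS/SPSS); the logic follows the slide: Dicho-code each item, average within the aggregation level, then average the item means:

```python
# Hypothetical responses: 1 = Strongly Agree ... 4 = Strongly Disagree
responses = [
    {"school": "A", "q1": 1, "q2": 2},
    {"school": "A", "q1": 2, "q2": 2},
    {"school": "B", "q1": 3, "q2": 1},
    {"school": "B", "q1": 4, "q2": 3},
]
items = ["q1", "q2"]

def dicho(value):
    """Dicho coding: Strongly Agree (1) & Agree (2) -> 1, rest -> 0."""
    return 1 if value in (1, 2) else 0

def indicator_score(rows, items, level_key):
    """Mean of item means within each aggregation level, as a percent:
    the mean % of respondents endorsing the indicator's statements."""
    levels = {}
    for row in rows:
        levels.setdefault(row[level_key], []).append(row)
    out = {}
    for level, group in levels.items():
        item_means = [sum(dicho(r[it]) for r in group) / len(group)
                      for it in items]
        out[level] = 100 * sum(item_means) / len(item_means)
    return out

scores = indicator_score(responses, items, "school")  # {"A": 100.0, "B": 25.0}
```

Averaging item means (rather than pooling all responses) gives each item equal weight in the indicator even when items have different response counts.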

slide-24
SLIDE 24

Aggregation Levels

  • School level:
      • Whole school
      • Across age groups: 5th-6th grades, 7th-9th grades, 5th-9th grades
      • Within age groups: 5th, 6th, 7th, 8th, 9th
  • Comparison norms:
      • Language: Hebrew, Arabic
      • Across age groups: 5th-6th, 7th-9th, 5th-9th grades
      • Educational authority: religious, secular
      • Arabic sub-groups: Arab, Druze, Bedouin
      • SES: low, medium, high

slide-25
SLIDE 25

National CPE Report

[Chart: "Positive general attitude towards school" (students), % agreeing by grade band (5-6, 7-9, 10-11), for all schools, Hebrew schools and Arabic schools, 2008-2015]

slide-26
SLIDE 26

Goal of the GEMS Achievement Tests

To examine the extent to which elementary and middle school students are performing at the expected level according to the curriculum in four core subjects: First Language (Hebrew/Arabic), Mathematics, English, and Science and Technology.


slide-27
SLIDE 27

Achievement Tests – Background

  • Each school participates once every 3 years, taking all relevant tests.
      • Every year, ⅓ of the schools take the external tests (data is sent to RAMA), and ⅔ use internal tests (the same test, but the data stays in the school).

[Table: which subjects (Language, Math, English, Science & Technology) are tested in 5th and 8th grade (post-2014 tests)]

slide-28
SLIDE 28

Achievement Tests – Forms

  • Every year:
      • Two operational forms in Hebrew → translated into Arabic
      • Two pilot forms (next year's tests, administered securely)
      • Form adaptations for the Ultra-Orthodox population
      • Students with special needs are tested with accommodations
  • The test forms cover:
      • Main topics from the curriculum
      • A range of skills, abilities and levels of thinking
  • The forms are composed of:
      • Multiple-choice items
      • Open-ended items (dichotomous or polytomous scoring)
      • Matching items
      • Items with multiple scoring dimensions
      • Listening comprehension items
      • Multi-stage items
      • Testlets

slide-29
SLIDE 29

Achievement Tests – Scoring

  • Scoring items based on the parameter report
      • Parameter reports are created by the test developers and contain the information needed to score & aggregate items.
  • Calculations are performed in two parallel channels and compared.
      • SAS: macro-based code gets its input from the parameter report
      • SPSS: VBA code generates an SPSS analysis script based on the parameter report
      • Calculations are also triple-checked by RAMA.
  • Calculating a total score for each examinee
  • Calculating sub-scores for each examinee

slide-30
SLIDE 30

Achievement Tests – Item Analysis

  • Using VBA, analysis outputs are compared, pasted into Excel, and formatted to highlight problems.
  • Item analysis includes:
      • Descriptive statistics of the total score and sub-scores
      • Reliability analyses of the total score and sub-scores
      • Correlations between items and total scores
      • Item response and score distributions (+ total score means)
      • Correlations between the total score and sub-scores
      • Graphical item analysis (response distribution over total score)
      • Analysis of time and effort from self-report data
  • DIF analysis includes:
      • Form A vs. Form B
      • Hebrew form vs. Arabic form
      • Ultra-Orthodox form vs. regular forms (Jewish sample only)
      • Boys vs. girls, within language
      • Pilot form vs. operational form (for making equating decisions)

slide-31
SLIDE 31

Item Analysis – Graphical Aids

[Charts: response-distribution-over-total-score plot for item Q12; scatter plot comparing scores on forms 30 and 31; bar chart of per-item score differences between forms 30 and 31]

slide-32
SLIDE 32

Achievement Tests – Scaling

  • Goal: transforming all raw scores to a scaled score that is uniform across the different forms of a test. This scaled score is used for reporting the total score and sub-scores to schools.
  • Assumptions:
      • Equivalent populations within a year
      • Translation does not affect the difficulty of the form
  • Method: linearly scaling all forms to the main operational form (Form A):

    α = S(A)/S(B),  β = M(A) − M(B)·α
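A sketch of the scaling formula above with hypothetical raw scores (the real computation runs in the SAS/SPSS channels). A Form B score x maps to α·x + β on the Form A metric, so the scaled Form B distribution matches Form A's mean and SD:

```python
from statistics import mean, pstdev

def linear_scale(ref, other):
    """Linear scaling of 'other' onto the reference form's metric:
    alpha = S(A)/S(B), beta = M(A) - M(B)*alpha (slide notation)."""
    alpha = pstdev(ref) / pstdev(other)
    beta = mean(ref) - mean(other) * alpha
    return alpha, beta

# Hypothetical raw scores on Forms A and B
form_a = [50, 60, 70, 80]
form_b = [40, 50, 60, 70]
alpha, beta = linear_scale(form_a, form_b)      # alpha = 1.0, beta = 10.0
scaled_b = [alpha * x + beta for x in form_b]   # Form B on the Form A metric
```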

slide-33
SLIDE 33

Equating Achievement Test Scores

  • Goal: transforming the GEMS scores onto a scale that is uniform across years, to aid interpretation.
  • Background:
      • Established in 2008, with mean = 500 and SD = 100.
      • Applies only to the total scores, not the sub-scores.
  • Method:
      • The previous year's pilot forms are linked to the current year's operational forms using anchor items.
      • Building an "equating chain" (using both linear equating and the Tucker method) to obtain parameters for transforming this year's scaled score to the multi-year score.

slide-34
SLIDE 34

Multi-year Equating of GEMS Tests

[Diagram: the Pilot 2014 form links Form A 2015 and Form B 2015 to Form A 2014 (Tucker: α2, β2); α5, β5 — the parameters for the multi-year score, calculated last year — complete the chain]

  α1 = S(B)/S(A),     β1 = M(B) − M(A)·α1
  α3 = S(A*)/S(A1*),  β3 = M(A*) − M(A1*)·α3
  α4 = α1·α2·α3,      β4 = α2·α3·β1 + α3·β2 + β3
  α6 = α4·α5,         β6 = α5·β4 + β5
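The chain parameters (α4, β4) and (α6, β6) are compositions of linear transforms x → a·x + b. A sketch with hypothetical link parameters, checking the composition rule against the slide's formulas:

```python
def compose(outer, inner):
    """Compose linear transforms x -> a*x + b, applying 'inner' first:
    (a2, b2) o (a1, b1) = (a2*a1, a2*b1 + b2)."""
    a2, b2 = outer
    a1, b1 = inner
    return a2 * a1, a2 * b1 + b2

# Hypothetical link parameters for the three steps of the chain
t1 = (1.05, -3.0)   # alpha1, beta1
t2 = (0.98, 4.0)    # alpha2, beta2 (Tucker link)
t3 = (1.02, -1.0)   # alpha3, beta3

# alpha4, beta4 on the slide: t3 applied after t2 after t1
a4, b4 = compose(t3, compose(t2, t1))
```

Expanding the composition reproduces the slide's α4 = α1·α2·α3 and β4 = α2·α3·β1 + α3·β2 + β3, and one more composition with (α5, β5) gives α6 and β6.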

slide-35
SLIDE 35

Achievement Tests – Aggregations

  • Reported results:
      • Mean scores, quartiles and percentiles, attitudes towards the subject, # of examinees, and response rates
      • Historical comparison data
  • Aggregation levels:
      • Grade, school, municipality, district, language, educational supervision, national
      • Excluding special needs, recent immigrants & ultra-orthodox schools
      • Using sampling & nonresponse weights
  • Segmentation within aggregation levels: student SES, school SES, special needs

[Chart: multi-year score means for 5th-grade Math, Hebrew speakers vs. Arabic speakers, 2007-2015]
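A minimal sketch of the "sampling & nonresponse weights" bullet above: a weighted mean, where each examinee's score is multiplied by a weight reflecting how many students they represent. The scores and weights are hypothetical:

```python
def weighted_mean(scores, weights):
    """Mean score with sampling & nonresponse weights applied:
    each score counts in proportion to its weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical: the second student's stratum was undersampled, so
# that score carries three times the weight of the first
m = weighted_mean([500, 600], [1.0, 3.0])  # 575.0
```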

slide-36
SLIDE 36

Report Production

  • Building & designing report templates
  • Mapping reports: which datum goes where?
  • Bookmarking for automatic data insertion
  • Dissecting a report into sub-reports
  • Running SPSS code to create the report's XML
  • Inserting data into the report using Magic Publisher
  • Post-processing – deletions, comments, visual editing
  • Dealing with exceptions and special cases
  • Performing manual & automatic data checking
  • Burning CDs, preparing envelopes, uploading files to the web
  • A typical school report contains 150 pages, 50 tables, 50 graphs, and over 5,000 data points.
      • The municipal, district and national reports are even bigger.
      • These numbers have increased by more than 150% since 2012.
slide-37
SLIDE 37

Magic Publisher

A tool for inserting SPSS output into Word and PPT


slide-38
SLIDE 38

Generating Report Maps

[Figure: a table template, its table map, and a table-map check, ending with the final table after data insertion — the example table reports the number and percentage of 5th-grade examinees by subject (Hebrew, Math, English)]

slide-39
SLIDE 39


Report Template

slide-40
SLIDE 40

Tables after Data Insertion


slide-41
SLIDE 41

Using Macros to Build Tables & Graphs


slide-42
SLIDE 42

Dealing with Exceptions

  • Low response rates
  • Cheating
  • Exemption
  • Refusal
  • Bilingual schools
  • Exceptions in testing conditions
  • Exceptions in historical comparisons for particular schools or groups of schools

slide-43
SLIDE 43

Challenges

  • Tight schedule and bottlenecks
  • Long learning curve
  • Distribution of knowledge
  • New projects and requests every year
  • Extensive changes every year
  • Psychometric challenges
  • Need to work in parallel channels
  • Exceptions and special cases
  • Extraction of historical data
  • Documentation


slide-44
SLIDE 44

Successes

  • Meeting deadlines & quality control standards
  • Improving work processes, organization and automation
  • Improving and extending data analysis and report production
  • Reducing the time for treating special cases
  • Designing and building a new project database