The National Institute for Testing & Evaluation
Tzur Karelitz, Sept 2016
Outline
NITE: history, structure & objectives Overview of NITE’s assessment services & activities A detailed look at the RAMA project:
- Analyzing and reporting data from large-scale assessments
and surveys for the Israeli Ministry of Education.
NITE: History & Structure
Established in 1981 by a consortium of Israeli universities to centralize admissions testing for applicants to higher education. NITE is a public, not-for-profit organization supervised by a board of directors (representatives of the founding universities). NITE's staff includes about 130 professionals: item writers, statisticians, psychometricians, computer programmers, graphic designers, language editors, and logistic & administrative staff.
NITE’s Organizational Structure
CEO: Dr. Anat Ben-Simon
Deputy Director: Dr. Naomi Gafni
Departments:
- Test development
- Scoring
- Research
- Operations
- IT
- Finance & administration
Units:
- CAT
- Non-cognitive tests
- Test accommodation
- Computerized LD diagnosis system
- Automated text analysis
- RAMA project
NITE: Main Objectives
- Provision of assessment services, primarily to institutions of higher education, but also to the educational system (K-12) and other public organizations
- Conducting research on admissions, placement, assessment and evaluation in institutions of higher education
- Advancement of the fields of measurement, testing and psychometrics in Israel
Typical Assessment Service
- Design and development of assessments & tools
- Test registration
- Test administration & accommodations
- Scoring and quality control
- Reporting of results
- Conducting related research
Other Assessment Services
- Expert item review
- Test translation/adaptation
- Computerizing P&P tests
- Analyzing test results
- Conducting evaluation projects
- Training professional staff
- Consulting organizations in Israel and abroad
- Teaching relevant academic courses
Tests in Higher Education
Admission tests:

Test | Admission to | Type | ~N
PET | Universities & colleges | P&P / CAT | 70,000
MEIMAD | Pre-academic prep schools | Online | 6,000
MOR/MIRKAM | Medical schools | Assessment centers | 1,700
MITAM | Psych. graduate studies | P&P | 1,300

Proficiency tests:

Test | Proficiency in | Type | ~N
AMIR/AMIRAM | English | P&P / CAT | 45,000
YAEL | Hebrew | P&P | 25,000

And multiple smaller-scale admission & proficiency tests for various programs and organizations.
The Psychometric Entrance Test (PET)
Admission to higher ed. is based on the mean of PET scores and school matriculation exam grades (BAGRUT).
PET is a scholastic aptitude test consisting of:
- Verbal Reasoning (~60 MC items)
- Writing (one task)
- Quantitative Reasoning (~50 MC items)
- English as a Foreign Language (~ 55 MC items)
PET is a standardized P&P test given 5 times a year
- It takes about 3.5 hours to complete
- Scores range from 300 to 800; mean = 540, SD = 110.
- Adapted versions for examinees with disabilities
PET is translated into Arabic, Russian, English, French,
and Spanish (and sometimes into Italian & Portuguese)
Other projects
MATAL – a comprehensive, standardized, computer-based test battery for the diagnosis of learning disabilities.
- Aiding the provision of test accommodations in higher education.
HLP – computational tools for analyzing and rating Hebrew texts.
- NiteRater – an automatic essay rating system.
ICAP- an initiative to advance educational measurement and psychometrics in Israel.
The RAMA Project: Analyzing and reporting results from large-scale tests and surveys for the Ministry of Education
Outline
- Background
- Team
- Main projects
- Tasks
- Tools
- School Climate and Pedagogical Environment
(CPE) surveys
- Growth and Effectiveness Measures for Schools
(GEMS)
- Report production
- Challenges and successes
17/12/2015 Rama Project 12
Background
- A 5-year, variable-quota contract issued by the MOE, which began in 2012 and is expected to be renewed in 2017.
- Providing services to RAMA – The National Authority for Measurement and Evaluation in Education (a branch of the MOE).
- The main project cycle runs from May to January, peaking in July–October.
Division of labor:
- Test/survey development – RAMA (climate), CET (achievement)
- Administration, human rating & data entry – TALDOR
- Cleansing, analysis, reporting – NITE
The RAMA Team
- Tzur – Psychometrician
- Eran – DBA
- Eliran, Evgeny, Shaul, Valla – Analysts
- Nethanel, Matan (manager) and 6 part-time assistants – Report production
Main Projects - 2015
- Growth and Effectiveness Measures for Schools (GEMS):
Achievement Tests for 5th & 8th grades
- First language (Heb/Arab), Math, English, Science & Technology
- About 200,000 records per year
- Climate and Pedagogical Environment (CPE) Surveys
- 5th–9th grades: about 150,000 students and 12,000 teachers per year
- 10th & 11th grades: about 50,000 students and 8,000 teachers per year
Results for surveys and tests are reported on 4 levels:
- Schools (1/3 of the country): about 900 elementary & middle schools,
and 300 high schools, per year.
- Municipalities: about 100 per year
- Districts: 8 every year
- National: by language, school type (secular/religious), sub-groups
within the Arab population, SES.
- Hebrew as a Second Language for Arabic speakers
- 7,000 6th grade students (test and survey) and 600 teachers (survey).
- Results reported nationally.
Main Project Tasks
- Database design and maintenance
- Data cleansing
- Dealing with inconsistent, missing, corrupted, inappropriate or
duplicate data in surveys, tests and background information
- Quality control of surveys prepared by RAMA
- Maintenance of item properties in the database
- Factor analysis for surveys and tests
- Item analysis for surveys and tests
- Scoring using classical scaling and calibration
- Aggregations and norms
- IRT analyses (sometimes)
- Parallel channel: the scoring and analysis tasks above are performed twice, independently (see Tools)
(cont.) Main Project Tasks
- Extraction of historical comparison data
- Dealing with special cases
- Generation of reports for project control & monitoring
- Automatic generation of personalized reports
- Levels: school, municipality, district, national
- Human and automatic quality control of reports
- Language and format editing of reports
- Writing insights and conclusions based on results
- Preparation of CDs and envelopes for mailing
- Secondary data analysis (research questions)
- Documentation and technical reports
Tools
- SQL – Data management, cleansing and history
- SAS, SPSS – Data manipulation and analysis
- Parallel channel: all the main analyses are performed
separately by different analysts using different code and software; the results are compared and inconsistencies are resolved.
- VBA – automation, quality control, post-processing of reports
- Magic Publisher – Inserting SPSS output into Word
- Winsteps – IRT
- Word – Reports
- Excel – Everything…
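At its core, the parallel-channel check reduces to diffing two independently produced score files. A minimal sketch in Python (the real comparison is done in SAS and SPSS; the IDs, tolerance and data structure here are illustrative assumptions):

```python
def compare_channels(scores_a, scores_b, tol=1e-6):
    """Compare per-examinee scores produced by two independent channels.

    scores_a, scores_b: dicts mapping examinee ID -> score.
    Returns (id, score_a, score_b) tuples for every inconsistency
    that must be resolved by hand.
    """
    mismatches = []
    # IDs present in one channel but not the other count as inconsistencies.
    for eid in sorted(set(scores_a) | set(scores_b)):
        a, b = scores_a.get(eid), scores_b.get(eid)
        if a is None or b is None or abs(a - b) > tol:
            mismatches.append((eid, a, b))
    return mismatches

# Example: one score disagrees, one examinee is missing from the second channel.
sas_scores = {"1001": 536.0, "1002": 612.5, "1003": 480.0}
spss_scores = {"1001": 536.0, "1002": 612.4}
print(compare_channels(sas_scores, spss_scores))
```

In practice the two channels also use different software and different analysts, so a clean diff is evidence against both coding and specification errors.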
School Climate and Pedagogical Environment (CPE) Surveys
Goal of CPE Reports
The surveys aim to provide a detailed picture of aspects of the social climate and pedagogical processes in schools that are essential for educational quality: satisfaction, relationships within the school community, security and safety, discipline and behavioral issues, emotional-motivational aspects, teaching-learning-assessment processes, value-based education, etc. The reports are meant to help school personnel set evidence-based goals and to plan, track and monitor important aspects of schooling such as interpersonal relationships, educational interaction, motivation and educational aptitude among students.
Reported Indicators, 2015
Climate:
- An overall positive attitude among students towards school
- Close relationship and caring between teachers and students
- Positive relations between students and their peers
- Involvement in violent incidents
- Digital violence via social networks and on the Internet
- Verbal violence
- School's efforts to encourage a sense of safety
- Proper behavior of students in the classroom
- Teachers' satisfaction with school
- Involvement of parents in school
- Teachers' lack of a sense of safety
- Competence, curiosity and interest in learning
- School's efforts to encourage motivation and curiosity among students

Pedagogy:
- Practices of quality teaching-learning-assessment
- Class assignments that promote inquiry learning
- Self-learning strategies
- Receiving feedback to promote learning
- School's efforts to encourage social and civic involvement
- School's efforts to promote tolerance of diversity
- School trips & tours
- Students' recreational activities
- Differentiated instruction at school
- Giving feedback to promote learning
- Teamwork at school
- Collaborative learning in school
- Teachers' professional development
Parameter Report
- Main source of information about survey items
- Content, type, instructions for coding and analyzing, conditioning, etc.
- Effective and uniform communication with RAMA
- Improved automation, quality control, code design
Parameter report fields include: Item #, Item ID, Core/version, Code type, Item text, Item response, Indicator name, Use?, Resp. range, Miss. vals, ID Recode, ID Code, Miss. val code.
Coding Items and Aggregating Indicators
Coding
- Dicho: Strongly Agree & Agree are coded as 1, rest=0
- R_Dicho: Strongly Disagree & Disagree are coded as 1, rest=0
Aggregating
- Calculate the mean of the coded items across all respondents.
- The mean is calculated within the desired aggregation level (e.g., the mean of each item within each language group, or school type).
- Calculate the mean of item means over all the items that belong to the indicator.
- The result is the mean % of respondents endorsing the statements that constitute the indicator.
- For categorical items, we calculate the number of respondents
who selected each category within each aggregation level.
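The Dicho coding and mean-of-item-means aggregation can be sketched as follows. This is a simplified illustration; the function and item names are hypothetical, and the real pipeline also handles missing-value codes and multiple aggregation levels:

```python
def dicho(resp, positive=("Strongly Agree", "Agree")):
    # Dicho coding: the top-two agreement categories -> 1, everything else -> 0.
    return 1 if resp in positive else 0

def indicator_mean(responses, items):
    """Mean of item means for one indicator.

    responses: list of dicts (one per respondent) mapping item name -> response.
    items: the list of items that belong to the indicator.
    """
    item_means = []
    for item in items:
        coded = [dicho(r[item]) for r in responses if item in r]
        item_means.append(sum(coded) / len(coded))
    # Result: mean % of respondents endorsing the indicator's statements.
    return 100 * sum(item_means) / len(item_means)

respondents = [
    {"q1": "Agree", "q2": "Disagree"},
    {"q1": "Strongly Agree", "q2": "Agree"},
]
print(indicator_mean(respondents, ["q1", "q2"]))  # item means 1.0 and 0.5 -> 75.0
```

R_Dicho coding would be the mirror image, treating the two disagreement categories as 1.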
Historical comparison data
- Past results are extracted from the DB and presented in reports.
- Exclusionary rules must be followed because in some cases, past data
should be censored.
Aggregation Levels
- School level:
- Whole school
- Across age groups: 5th –6th grades, 7th –9th grades, 5th –9th
grades
- Within age groups: 5th, 6th, 7th, 8th, 9th
Comparison norms:
- Language: Hebrew, Arabic
- Across age groups : 5th –6th, 7th –9th, 5th –9th grades
- Educational authority: religious, secular
- Arabic sub-groups: Arab, Druze, Bedouin
- SES: low, medium, high
National CPE Report
[Figure: "Positive general attitude towards school (for students)" – % agreeing, by year (2008–2015) and grade band (5–6, 7–9, 10–11), for all schools, Hebrew-speaking schools, and Arabic-speaking schools]
Goal of the GEMS Achievement Tests
To examine the extent to which elementary and middle school students are performing at the expected level according to the curriculum in four core subjects: First Language (Hebrew/Arabic), Mathematics, English, and Science and Technology.
Achievement Tests – Background
Each school participates once every 3 years, by taking all relevant tests.
- Every year, ⅓ of the schools take the external tests (data is
sent to RAMA), and ⅔ use internal tests (same test, but data stays in school).
Tests (post-2014):
[Table: which subjects (First Language, Math, English, Science & Technology) are tested in 5th and 8th grade]
Achievement Tests – Forms
- Every year:
- Two operational forms in Hebrew, translated into Arabic
- Two pilot forms (next year’s test, administered securely)
- Form adaptations for Ultra-Orthodox population
- Students with special needs are tested with accommodations
The test forms cover:
- Main topics from the curriculum
- A range of skills, abilities and levels of thinking
The forms are composed of:
- Multiple-choice items
- Open-ended items (dichotomous or polytomous scoring)
- Matching items
- Items with multiple scoring dimensions
- Listening comprehension items
- Multi-stage items
- Testlets
Achievement Tests – Scoring
- Scoring items based on the parameter report
- Parameter reports are created by the test developers and contain the information needed to score & aggregate items.
- Calculations are performed in two parallel channels and compared:
- SAS: macro-based code takes its input from the parameter report
- SPSS: VBA code generates SPSS analysis syntax based on the parameter report
- Calculations are also triple-checked by RAMA.
- Calculating a total score for each examinee
- Calculating sub-scores for each examinee
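A toy sketch of parameter-driven scoring: a small "parameter report" table drives how each item response is turned into points. The actual implementation is SAS macro code and generated SPSS syntax; every field name below is an assumption for illustration:

```python
# Each row of this (hypothetical) parameter report says how to score one item.
PARAMS = {
    "q1": {"type": "MC", "key": "3", "points": 1},  # multiple-choice item
    "q2": {"type": "open", "max": 3},               # polytomous open-ended item
}

def score_item(item, response, params=PARAMS):
    """Score a single response using the item's parameter-report row."""
    p = params[item]
    if p["type"] == "MC":
        # Full points for the keyed answer, zero otherwise.
        return p["points"] if response == p["key"] else 0
    if p["type"] == "open":
        # Open-ended responses arrive as rater codes; cap at the item maximum.
        return min(int(response), p["max"])
    raise ValueError(f"unknown item type for {item}")

def total_score(answers):
    # Total score: sum of item scores for one examinee.
    return sum(score_item(item, resp) for item, resp in answers.items())

print(total_score({"q1": "3", "q2": "2"}))  # 1 + 2 = 3
```

Keeping the scoring rules in a table rather than in code is what lets two independently written programs score the same data and be compared.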
Achievement Tests – Item Analysis
- Using VBA, analysis outputs are compared, pasted
into Excel, and formatted to highlight problems.
- Item analysis includes:
- Descriptive statistics of the total score and sub-scores
- Reliability analyses of the total score and sub-scores
- Correlations between items and total scores
- Item response and score distributions (+ total score means)
- Correlations between total score and sub-scores
- Graphical item analysis (response distribution over total score)
- Analysis of time and effort from self-report data
- DIF analysis includes:
- Form A vs. Form B
- Hebrew form vs. Arabic form
- Ultra-Orthodox form vs. regular forms (Jewish sample only)
- Boys vs. girls within language
- Pilot form vs. operational form (for making equating decisions)
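One of the item-analysis checks, the correlation between an item and the total score, can be computed from first principles. The data below are illustrative; the real analyses run in SAS/SPSS:

```python
from statistics import mean

def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between one item's scores and the total score,
    a standard check on whether the item discriminates as expected."""
    mx, my = mean(item_scores), mean(total_scores)
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    vx = sum((x - mx) ** 2 for x in item_scores)
    vy = sum((y - my) ** 2 for y in total_scores)
    return cov / (vx * vy) ** 0.5

# Five examinees: dichotomous item score and total test score.
item = [0, 1, 1, 0, 1]
total = [42, 55, 61, 40, 58]
print(round(item_total_correlation(item, total), 3))
```

A low or negative value flags an item for review, which is what the conditional Excel formatting highlights.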
Item Analysis – Graphical Aids
[Figure: Item Q12 – response distribution (answers 1–4, 98, 99) as a function of total score]
[Figure: Score comparison, Forms 30 & 31 – per-item scores on Form 31 vs. Form 30]
[Figure: Score difference, Forms 30 & 31 – per-item score differences (range ±0.2) across items S8–S26]
Achievement Tests – Scaling
- Goal: Transforming all raw scores to a scaled score,
uniform across different forms of a test. This scaled score will be used for reporting total score and sub- scores to schools.
- Assumptions:
- Equivalent populations within a year
- Translation does not affect the difficulty of the form
- Method: linearly scaling all forms to the main
operational form (Form A)
α = S(A) / S(B);  β = M(A) − α · M(B)
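A sketch of this linear scaling, using the same α and β definitions (the score vectors are illustrative):

```python
from statistics import mean, pstdev

def linear_scale(scores_b, scores_a):
    """Linearly scale Form B raw scores onto Form A's scale:
    alpha = S(A)/S(B), beta = M(A) - alpha * M(B)."""
    alpha = pstdev(scores_a) / pstdev(scores_b)
    beta = mean(scores_a) - alpha * mean(scores_b)
    return [alpha * x + beta for x in scores_b]

form_a = [40, 50, 60, 70, 80]  # illustrative Form A raw scores
form_b = [30, 45, 50, 55, 70]  # illustrative Form B raw scores
scaled_b = linear_scale(form_b, form_a)
# After scaling, Form B matches Form A's mean and SD.
print(round(mean(scaled_b), 6), round(pstdev(scaled_b), 6))
```

This works only under the stated assumptions (equivalent populations within a year, translation not affecting difficulty); otherwise the mean/SD match would be confounded with real group differences.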
Equating Achievement Test Scores
- Goal:
- Transforming the GEMS scores into a uniform scale across
years, to aid interpretation.
- Background:
- The scale was established in 2008, with mean = 500 and SD = 100.
- Applies only to the total scores, not the subscores.
- Method:
- Previous year's pilot forms are linked to the current year's
operational forms using anchor items.
- Building an “equating chain” (using both linear equating
and Tucker method) to obtain parameters for transforming this year’s scaled score to the multi-year score.
Multi-year Equating of GEMS Tests
Equating chain (steps 1–6): 2015 Form B → 2015 Form A → (via the 2014 pilot, Tucker: α2, β2) → 2014 Form A → multi-year score (α5, β5, calculated last year).

α1 = S(B)/S(A);  β1 = M(B) − α1·M(A)
α3 = S(A*)/S(A1*);  β3 = M(A*) − α3·M(A1*)
α4 = α1·α2·α3;  β4 = α2·α3·β1 + α3·β2 + β3
α6 = α4·α5;  β6 = α5·β4 + β5
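The composition rules for α4, β4 and α6, β6 follow from chaining linear transforms t(x) = α·x + β: applying t1, then t2, then t3 gives α = α1·α2·α3 and β = α2·α3·β1 + α3·β2 + β3. A small sketch verifying the algebra (the link parameters are made up):

```python
def compose(t2, t1):
    """Compose linear transforms t(x) = alpha*x + beta:
    returns t2 ∘ t1, i.e. apply t1 first, then t2."""
    a2, b2 = t2
    a1, b1 = t1
    return (a2 * a1, a2 * b1 + b2)

# Three links of a hypothetical equating chain.
t1 = (1.1, -3.0)
t2 = (0.95, 4.0)
t3 = (1.02, 1.5)
chain = compose(t3, compose(t2, t1))

# The closed-form parameters for the composite link.
alpha = t1[0] * t2[0] * t3[0]
beta = t2[0] * t3[0] * t1[1] + t3[0] * t2[1] + t3[1]
print(all(abs(c - e) < 1e-9 for c, e in zip(chain, (alpha, beta))))  # True
```

Composing the links once and storing (α, β) is what allows this year's scaled scores to be carried onto the multi-year scale without revisiting earlier years' data.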
Achievement Tests – Aggregations
- Reported results:
- Mean scores, quartiles and percentiles, attitudes
towards the subject, # examinees and response rates
- Historical comparison data
- Aggregation levels:
- Grade, school, municipality, district,
language, educational supervision, national
- Excluding special needs, recent
immigrants & ultra-orthodox schools
- Using sampling & nonresponse weights
- Segmentation within aggregation levels: student
SES, school SES, special needs
[Figure: Math, 5th grade – multi-year score means by year (2007–2015), Hebrew speakers vs. Arabic speakers]
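Aggregation with sampling and nonresponse weights is, at bottom, a weighted mean; a minimal sketch (the weights are illustrative, not the project's actual weighting scheme):

```python
def weighted_mean(scores, weights):
    """Aggregate scores using sampling/nonresponse weights,
    so the 1/3 external sample can represent the full population."""
    total_w = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_w

scores = [520, 480, 560]
weights = [1.0, 2.5, 1.5]  # e.g., larger weight for an under-sampled stratum
print(weighted_mean(scores, weights))  # 2560.0 / 5.0 = 512.0
```

The same function is applied within each aggregation level (grade, school, municipality, district, national) before segmentation by SES or special-needs status.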
Report Production
- Building & designing report templates
- Mapping reports: which datum goes where?
- Bookmarking for automatic data insertion
- Dissecting a report into sub-reports
- Running SPSS code for creating the report’s XML
- Inserting data into the report using Magic Publisher
- Post-processing – deletions, comments, visual editing
- Dealing with exceptions and special cases
- Performing manual & automatic data checking
- Burning CDs, preparing envelopes, uploading files to web
- A typical school report contains 150 pages, 50 tables, 50
graphs, and over 5,000 data points.
- The municipal, district and national reports are even bigger
- These numbers increased by more than 150% since 2012.
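In the spirit of the report-mapping step ("which datum goes where?"), a simple pre-flight check that every bookmark in a template has a datum and every datum a bookmark. All names are hypothetical; the actual insertion is done with Magic Publisher and VBA:

```python
def check_report_map(template_bookmarks, data):
    """Return (missing_data, orphan_data): bookmarks with no datum,
    and data keys with no bookmark in the template."""
    bm = set(template_bookmarks)
    keys = set(data)
    return sorted(bm - keys), sorted(keys - bm)

# Hypothetical bookmark names for one school-report table.
bookmarks = ["math_5_mean", "math_5_n", "heb_5_mean"]
data = {"math_5_mean": 531, "math_5_n": 212, "eng_5_mean": 498}
print(check_report_map(bookmarks, data))  # (['heb_5_mean'], ['eng_5_mean'])
```

With 5,000+ data points per report, an automated check of this kind is what makes the human quality-control pass tractable.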
Magic Publisher
A tool for inserting SPSS output into Word and PPT
Generating Report Maps
[Figure: a table template, its table map, and the table map check – a table of number and percentage of examinees by subject (Hebrew, Math, English) and grade (5th)]
Final table after data insertion
Report Template
Tables after Data Insertion
Using Macros to Build Tables & Graphs
Dealing with Exceptions
- Low response rates
- Cheating
- Exemption
- Refusal
- Bilingual schools
- Exceptions in testing conditions
- Exceptions in historical comparisons for
particular schools or groups of schools
Challenges
- Tight schedule and bottlenecks
- Long learning curve
- Distribution of knowledge
- New projects and requests every year
- Extensive changes every year
- Psychometric challenges
- Need to work in parallel channels
- Exceptions and special cases
- Extraction of historical data
- Documentation
Successes
- Meeting deadlines & quality control standards
- Improving work processes, organization and
automation
- Improving and extending data analysis and report
production
- Reducing time for treating special cases
- Designing and building a new project database