The National Institute for Testing & Evaluation
Tzur Karelitz, Sept 2016
Outline
NITE: history, structure & objectives Overview of NITE’s assessment services & activities A detailed look at the RAMA project:
- Analyzing and reporting data from large-scale assessments
and surveys for the Israeli Ministry of Education.
NITE: History & Structure
Established in 1981 by a consortium of Israeli universities to centralize admissions testing for applicants to higher education. NITE is a public, not-for-profit organization supervised by a board of directors (representatives of the founding universities). NITE's staff includes about 130 professionals: item writers, statisticians, psychometricians, computer programmers, graphic designers, language editors, and logistic & administrative staff.
NITE’s Organizational Structure
CEO: Dr. Anat Ben-Simon
Deputy Director: Dr. Naomi Gafni
Departments:
- Test development
- Scoring
- Research
- Operations
- IT
- Finance & administration
Units:
- CAT
- Non-cognitive tests
- Test accommodation
- Computerized LD diagnosis system
- Automated text analysis
- RAMA project
NITE: Main Objectives
- Provision of assessment services, primarily to institutions of higher education, but also to the educational system (K-12) and other public organizations
- Conducting research on admissions, placement, assessment and evaluation in institutions of higher education
- Advancement of the fields of measurement, testing and psychometrics in Israel
Typical Assessment Service
- Design and development of assessments & tools
- Test registration
- Test administration & accommodations
- Scoring and quality control
- Reporting of results
- Conducting related research
Other Assessment Services
- Expert item review
- Test translation/adaptation
- Computerizing P&P tests
- Analyzing test results
- Conducting evaluation projects
- Training professional staff
- Consulting organizations in Israel and abroad
- Teaching relevant academic courses
Tests in Higher Education
Admission tests:

Test | Admission to | Type | ~N
PET | Universities & colleges | P&P / CAT | 70,000
MEIMAD | Pre-academic prep schools | Online | 6,000
MOR/MIRKAM | Medical schools | Assessment centers | 1,700
MITAM | Psych. graduate studies | P&P | 1,300

Proficiency tests:

Test | Proficiency in | Type | ~N
AMIR/AMIRAM | English | P&P / CAT | 45,000
YAEL | Hebrew | P&P | 25,000

And multiple smaller-scale admission & proficiency tests for various programs and organizations.
The Psychometric Entrance Test (PET)
Admission to higher ed. is based on the mean of PET scores and school matriculation exam grades (BAGRUT).
PET is a scholastic aptitude test consisting of:
- Verbal Reasoning (~60 MC items)
- Writing (one task)
- Quantitative Reasoning (~50 MC items)
- English as a Foreign Language (~ 55 MC items)
PET is a standardized P&P test given 5 times a year
- It takes about 3.5 hours to complete
- Scores range from 300 to 800; mean = 540, SD = 110.
- Adapted versions for examinees with disabilities
PET is translated into Arabic, Russian, English, French,
and Spanish (and sometimes into Italian & Portuguese)
Other projects
MATAL – a comprehensive, standardized, computer-based test battery for the diagnosis of learning disabilities.
- Aiding the provision of test accommodations in higher education.
HLP – computational tools for analyzing and rating Hebrew texts.
- NiteRater – an automatic essay rating system.
ICAP- an initiative to advance educational measurement and psychometrics in Israel.
The RAMA Project: Analyzing and reporting results from large-scale tests and surveys for the Ministry of Education
Outline
- Background
- Team
- Main projects
- Tasks
- Tools
- School Climate and Pedagogical Environment
(CPE) surveys
- Growth and Effectiveness Measures for Schools
(GEMS)
- Report production
- Challenges and successes
17/12/2015 Rama Project 12
Background
- A 5-year, variable-quota contract issued by the MOE, which began in 2012 and is expected to be renewed in 2017.
- Providing services to RAMA – The National Authority for Measurement and Evaluation in Education (a branch of the MOE).
- The main project cycle runs from May to January, peaking in July–October.
Division of labor:
- Test/survey development – RAMA (climate), CET (achievement)
- Administration, human rating & data entry – TALDOR
- Cleansing, analysis, reporting – NITE
The RAMA Team
- Tzur – Psychometrician
- Eran – DBA
- Eliran, Evgeny, Shaul, Valla – Analysts
- Nethanel, Matan (manager) and 6 part-time assistants – Report production
Main Projects - 2015
- Growth and Effectiveness Measures for Schools (GEMS):
Achievement Tests for 5th & 8th grades
- First language (Heb/Arab), Math, English, Science & Technology
- About 200,000 records per year
- Climate and Pedagogical Environment (CPE) Surveys
- 5th–9th grades: about 150,000 students and 12,000 teachers per year
- 10th & 11th grades: about 50,000 students and 8,000 teachers per year
Results for surveys and tests are reported on 4 levels:
- Schools (1/3 of the country): about 900 elementary & middle schools,
and 300 high schools, per year.
- Municipalities: about 100 per year
- Districts: 8 every year
- National: by language, school type (secular/religious), sub-groups
within the Arab population, SES.
- Hebrew as a Second Language for Arabic speakers
- 7,000 6th grade students (test and survey) and 600 teachers (survey).
- Results reported nationally.
Main Project Tasks
- Database design and maintenance
- Data cleansing
- Dealing with inconsistent, missing, corrupted, inappropriate or
duplicate data in surveys, tests and background information
- Quality control of surveys prepared by RAMA
- Maintenance of item properties in the database
- Factor analysis for surveys and tests
- Item analysis for surveys and tests
- Scoring using classical scaling and calibration
- Aggregations and norms
- IRT analyses (sometimes)
- Parallel channel: the scoring and analysis tasks above are performed twice, independently (see Tools)
(cont.) Main Project Tasks
- Extraction of historical comparison data
- Dealing with special cases
- Generation of reports for project control & monitoring
- Automatic generation of personalized reports
- Levels: school, municipality, district, national
- Human and automatic quality control of reports
- Language and format editing of reports
- Writing insights and conclusions based on results
- Preparation of CDs and envelopes for mailing
- Secondary data analysis (research questions)
- Documentation and technical reports
Tools
- SQL – Data management, cleansing and history
- SAS, SPSS – Data manipulation and analysis
- Parallel channel: all the main analyses are performed
separately by different analysts using different code and software; the results are compared and inconsistencies are resolved.
- VBA – automation, quality control, post-processing of reports
- Magic Publisher – Inserting SPSS output into Word
- Winsteps – IRT
- Word – Reports
- Excel – Everything…
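At its core, the parallel-channel check reduces to diffing two independently produced score files. A minimal sketch in Python (the real comparison is done in SAS and SPSS; the IDs, tolerance and data structure here are illustrative assumptions):

```python
def compare_channels(scores_a, scores_b, tol=1e-6):
    """Compare per-examinee scores produced by two independent channels.

    scores_a, scores_b: dicts mapping examinee ID -> score.
    Returns (id, score_a, score_b) tuples for every inconsistency
    that must be resolved by hand.
    """
    mismatches = []
    # IDs present in one channel but not the other count as inconsistencies.
    for eid in sorted(set(scores_a) | set(scores_b)):
        a, b = scores_a.get(eid), scores_b.get(eid)
        if a is None or b is None or abs(a - b) > tol:
            mismatches.append((eid, a, b))
    return mismatches

# Example: one score disagrees, one examinee is missing from the second channel.
sas_scores = {"1001": 536.0, "1002": 612.5, "1003": 480.0}
spss_scores = {"1001": 536.0, "1002": 612.4}
print(compare_channels(sas_scores, spss_scores))
```

In practice the two channels also use different software and different analysts, so a clean diff is evidence against both coding and specification errors.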
School Climate and Pedagogical Environment (CPE) Surveys
Goal of CPE Reports
The surveys aim to provide a detailed picture of aspects of the social climate and pedagogical processes in schools that are essential for educational quality: satisfaction, relationships within the school community, security and safety, discipline and behavioral issues, emotional-motivational aspects, teaching-learning-assessment processes, value-based education, etc. The reports are meant to help school personnel set evidence-based goals and to plan, track and monitor important aspects of schooling such as interpersonal relationships, educational interaction, motivation and educational aptitude among students.
Reported Indicators, 2015
Climate:
- An overall positive attitude among students towards school
- Close relationship and caring between teachers and students
- Positive relations between students and their peers
- Involvement in violent incidents
- Digital violence via social networks and on the Internet
- Verbal violence
- School's efforts to encourage a sense of safety
- Proper behavior of students in the classroom
- Teachers' satisfaction with school
- Involvement of parents in school
- Teachers' lack of a sense of safety
- Competence, curiosity and interest in learning
- School's efforts to encourage motivation and curiosity among students

Pedagogy:
- Practices of quality teaching-learning-assessment
- Class assignments that promote inquiry learning
- Self-learning strategies
- Receiving feedback to promote learning
- School's efforts to encourage social and civic involvement
- School's efforts to promote tolerance of diversity
- School trips & tours
- Students' recreational activities
- Differentiated instruction at school
- Giving feedback to promote learning
- Teamwork at school
- Collaborative learning in school
- Teachers' professional development
Parameter Report
- Main source of information about survey items
- Content, type, instructions for coding and analyzing, conditioning, etc.
- Effective and uniform communication with RAMA
- Improved automation, quality control, code design
Parameter report fields include: Item #, Item ID, Core/version, Code type, Item text, Item response, Indicator name, Use?, Resp. range, Miss. vals, ID Recode, ID Code, Miss. val code.
Coding Items and Aggregating Indicators
Coding
- Dicho: Strongly Agree & Agree are coded as 1, rest=0
- R_Dicho: Strongly Disagree & Disagree are coded as 1, rest=0
Aggregating
- Calculate the mean of the coded items across all respondents.
- The mean is calculated within the desired aggregation level (e.g., the mean of each item within each language group, or school type).
- Calculate the mean of item means over all the items that belong to the indicator.
- The result is the mean % of respondents endorsing the statements that constitute the indicator.
- For categorical items, we calculate the number of respondents
who selected each category within each aggregation level.
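The Dicho coding and mean-of-item-means aggregation can be sketched as follows. This is a simplified illustration; the function and item names are hypothetical, and the real pipeline also handles missing-value codes and multiple aggregation levels:

```python
def dicho(resp, positive=("Strongly Agree", "Agree")):
    # Dicho coding: the top-two agreement categories -> 1, everything else -> 0.
    return 1 if resp in positive else 0

def indicator_mean(responses, items):
    """Mean of item means for one indicator.

    responses: list of dicts (one per respondent) mapping item name -> response.
    items: the list of items that belong to the indicator.
    """
    item_means = []
    for item in items:
        coded = [dicho(r[item]) for r in responses if item in r]
        item_means.append(sum(coded) / len(coded))
    # Result: mean % of respondents endorsing the indicator's statements.
    return 100 * sum(item_means) / len(item_means)

respondents = [
    {"q1": "Agree", "q2": "Disagree"},
    {"q1": "Strongly Agree", "q2": "Agree"},
]
print(indicator_mean(respondents, ["q1", "q2"]))  # item means 1.0 and 0.5 -> 75.0
```

R_Dicho coding would be the mirror image, treating the two disagreement categories as 1.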
Historical comparison data
- Past results are extracted from the DB and presented in reports.
- Exclusionary rules must be followed because in some cases, past data
should be censored.
Aggregation Levels
- School level:
- Whole school
- Across age groups: 5th –6th grades, 7th –9th grades, 5th –9th
grades
- Within age groups: 5th, 6th, 7th, 8th, 9th
Comparison norms:
- Language: Hebrew, Arabic
- Across age groups : 5th –6th, 7th –9th, 5th –9th grades
- Educational authority: religious, secular
- Arabic sub-groups: Arab, Druze, Bedouin
- SES: low, medium, high
National CPE Report
[Figure: "Positive general attitude towards school (for students)" – % agreeing, by year (2008–2015) and grade band (5–6, 7–9, 10–11), for all schools, Hebrew-speaking schools, and Arabic-speaking schools]
Goal of the GEMS Achievement Tests
To examine the extent to which elementary and middle school students are performing at the expected level according to the curriculum in four core subjects: First Language (Hebrew/Arabic), Mathematics, English, and Science and Technology.
Achievement Tests – Background
Each school participates once every 3 years, by taking all relevant tests.
- Every year, ⅓ of the schools take the external tests (data is
sent to RAMA), and ⅔ use internal tests (same test, but data stays in school).
Tests (post-2014):
[Table: which subjects (First Language, Math, English, Science & Technology) are tested in 5th and 8th grade]
Achievement Tests – Forms
- Every year:
- Two operational forms in Hebrew, translated into Arabic
- Two pilot forms (next year’s test, administered securely)
- Form adaptations for Ultra-Orthodox population
- Students with special needs are tested with accommodations
The test forms cover:
- Main topics from the curriculum
- A range of skills, abilities and levels of thinking
The forms are composed of:
- Multiple-choice items
- Open-ended items (dichotomous or polytomous scoring)
- Matching items
- Items with multiple scoring dimensions
- Listening comprehension items
- Multi-stage items
- Testlets
Achievement Tests – Scoring
- Scoring items based on the parameter report
- Parameter reports are created by the test developers and contain the information needed to score & aggregate items.
- Calculations are performed in two parallel channels and compared:
- SAS: macro-based code takes its input from the parameter report
- SPSS: VBA code generates SPSS analysis syntax based on the parameter report
- Calculations are also triple-checked by RAMA.
- Calculating a total score for each examinee
- Calculating sub-scores for each examinee
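A toy sketch of parameter-driven scoring: a small "parameter report" table drives how each item response is turned into points. The actual implementation is SAS macro code and generated SPSS syntax; every field name below is an assumption for illustration:

```python
# Each row of this (hypothetical) parameter report says how to score one item.
PARAMS = {
    "q1": {"type": "MC", "key": "3", "points": 1},  # multiple-choice item
    "q2": {"type": "open", "max": 3},               # polytomous open-ended item
}

def score_item(item, response, params=PARAMS):
    """Score a single response using the item's parameter-report row."""
    p = params[item]
    if p["type"] == "MC":
        # Full points for the keyed answer, zero otherwise.
        return p["points"] if response == p["key"] else 0
    if p["type"] == "open":
        # Open-ended responses arrive as rater codes; cap at the item maximum.
        return min(int(response), p["max"])
    raise ValueError(f"unknown item type for {item}")

def total_score(answers):
    # Total score: sum of item scores for one examinee.
    return sum(score_item(item, resp) for item, resp in answers.items())

print(total_score({"q1": "3", "q2": "2"}))  # 1 + 2 = 3
```

Keeping the scoring rules in a table rather than in code is what lets two independently written programs score the same data and be compared.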
Achievement Tests – Item Analysis
- Using VBA, analysis outputs are compared, pasted
into Excel, and formatted to highlight problems.
- Item analysis includes:
- Descriptive statistics of the total score and sub-scores
- Reliability analyses of the total score and sub-scores
- Correlations between items and total scores
- Item response and score distributions (+ total score means)
- Correlations between total score and sub-scores
- Graphical item analysis (response distribution over total score)
- Analysis of time and effort from self-report data
- DIF analysis includes:
- Form A vs. Form B
- Hebrew form vs. Arabic form
- Ultra-Orthodox form vs. regular forms (Jewish sample only)
- Boys vs. girls within language
- Pilot form vs. operational form (for making equating decisions)
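One of the item-analysis checks, the correlation between an item and the total score, can be computed from first principles. The data below are illustrative; the real analyses run in SAS/SPSS:

```python
from statistics import mean

def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between one item's scores and the total score,
    a standard check on whether the item discriminates as expected."""
    mx, my = mean(item_scores), mean(total_scores)
    cov = sum((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    vx = sum((x - mx) ** 2 for x in item_scores)
    vy = sum((y - my) ** 2 for y in total_scores)
    return cov / (vx * vy) ** 0.5

# Five examinees: dichotomous item score and total test score.
item = [0, 1, 1, 0, 1]
total = [42, 55, 61, 40, 58]
print(round(item_total_correlation(item, total), 3))
```

A low or negative value flags an item for review, which is what the conditional Excel formatting highlights.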
Item Analysis – Graphical Aids
[Figure: Item Q12 – response distribution (answers 1–4, 98, 99) as a function of total score]
[Figure: Score comparison, Forms 30 & 31 – per-item scores on Form 31 vs. Form 30]
[Figure: Score difference, Forms 30 & 31 – per-item score differences (range ±0.2) across items S8–S26]
Achievement Tests – Scaling
- Goal: Transforming all raw scores to a scaled score,
uniform across different forms of a test. This scaled score will be used for reporting total score and sub- scores to schools.
- Assumptions:
- Equivalent populations within a year
- Translation does not affect the difficulty of the form
- Method: linearly scaling all forms to the main
operational form (Form A)
α = S(A) / S(B);  β = M(A) − α · M(B)
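A sketch of this linear scaling, using the same α and β definitions (the score vectors are illustrative):

```python
from statistics import mean, pstdev

def linear_scale(scores_b, scores_a):
    """Linearly scale Form B raw scores onto Form A's scale:
    alpha = S(A)/S(B), beta = M(A) - alpha * M(B)."""
    alpha = pstdev(scores_a) / pstdev(scores_b)
    beta = mean(scores_a) - alpha * mean(scores_b)
    return [alpha * x + beta for x in scores_b]

form_a = [40, 50, 60, 70, 80]  # illustrative Form A raw scores
form_b = [30, 45, 50, 55, 70]  # illustrative Form B raw scores
scaled_b = linear_scale(form_b, form_a)
# After scaling, Form B matches Form A's mean and SD.
print(round(mean(scaled_b), 6), round(pstdev(scaled_b), 6))
```

This works only under the stated assumptions (equivalent populations within a year, translation not affecting difficulty); otherwise the mean/SD match would be confounded with real group differences.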
Equating Achievement Test Scores
- Goal:
- Transforming the GEMS scores into a uniform scale across
years, to aid interpretation.
- Background:
- The scale was established in 2008, with mean = 500 and SD = 100.
- Applies only to the total scores, not the subscores.
- Method:
- Previous year's pilot forms are linked to the current year's
operational forms using anchor items.
- Building an “equating chain” (using both linear equating
and Tucker method) to obtain parameters for transforming this year’s scaled score to the multi-year score.
Multi-year Equating of GEMS Tests
Equating chain (steps 1–6): 2015 Form B → 2015 Form A → (via the 2014 pilot, Tucker: α2, β2) → 2014 Form A → multi-year score (α5, β5, calculated last year).

α1 = S(B)/S(A);  β1 = M(B) − α1·M(A)
α3 = S(A*)/S(A1*);  β3 = M(A*) − α3·M(A1*)
α4 = α1·α2·α3;  β4 = α2·α3·β1 + α3·β2 + β3
α6 = α4·α5;  β6 = α5·β4 + β5
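The composition rules for α4, β4 and α6, β6 follow from chaining linear transforms t(x) = α·x + β: applying t1, then t2, then t3 gives α = α1·α2·α3 and β = α2·α3·β1 + α3·β2 + β3. A small sketch verifying the algebra (the link parameters are made up):

```python
def compose(t2, t1):
    """Compose linear transforms t(x) = alpha*x + beta:
    returns t2 ∘ t1, i.e. apply t1 first, then t2."""
    a2, b2 = t2
    a1, b1 = t1
    return (a2 * a1, a2 * b1 + b2)

# Three links of a hypothetical equating chain.
t1 = (1.1, -3.0)
t2 = (0.95, 4.0)
t3 = (1.02, 1.5)
chain = compose(t3, compose(t2, t1))

# The closed-form parameters for the composite link.
alpha = t1[0] * t2[0] * t3[0]
beta = t2[0] * t3[0] * t1[1] + t3[0] * t2[1] + t3[1]
print(all(abs(c - e) < 1e-9 for c, e in zip(chain, (alpha, beta))))  # True
```

Composing the links once and storing (α, β) is what allows this year's scaled scores to be carried onto the multi-year scale without revisiting earlier years' data.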
Achievement Tests – Aggregations
- Reported results:
- Mean scores, quartiles and percentiles, attitudes
towards the subject, # examinees and response rates
- Historical comparison data
- Aggregation levels:
- Grade, school, municipality, district,
language, educational supervision, national
- Excluding special needs, recent
immigrants & ultra-orthodox schools
- Using sampling & nonresponse weights
- Segmentation within aggregation levels: student
SES, school SES, special needs
[Figure: Math, 5th grade – multi-year score means by year (2007–2015), Hebrew speakers vs. Arabic speakers]
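Aggregation with sampling and nonresponse weights is, at bottom, a weighted mean; a minimal sketch (the weights are illustrative, not the project's actual weighting scheme):

```python
def weighted_mean(scores, weights):
    """Aggregate scores using sampling/nonresponse weights,
    so the 1/3 external sample can represent the full population."""
    total_w = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_w

scores = [520, 480, 560]
weights = [1.0, 2.5, 1.5]  # e.g., larger weight for an under-sampled stratum
print(weighted_mean(scores, weights))  # 2560.0 / 5.0 = 512.0
```

The same function is applied within each aggregation level (grade, school, municipality, district, national) before segmentation by SES or special-needs status.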
Report Production
- Building & designing report templates
- Mapping reports: which datum goes where?
- Bookmarking for automatic data insertion
- Dissecting a report into sub-reports
- Running SPSS code for creating the report’s XML
- Inserting data into the report using Magic Publisher
- Post-processing – deletions, comments, visual editing
- Dealing with exceptions and special cases
- Performing manual & automatic data checking
- Burning CDs, preparing envelopes, uploading files to web
- A typical school report contains 150 pages, 50 tables, 50
graphs, and over 5,000 data points.
- The municipal, district and national reports are even bigger
- These numbers increased by more than 150% since 2012.
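In the spirit of the report-mapping step ("which datum goes where?"), a simple pre-flight check that every bookmark in a template has a datum and every datum a bookmark. All names are hypothetical; the actual insertion is done with Magic Publisher and VBA:

```python
def check_report_map(template_bookmarks, data):
    """Return (missing_data, orphan_data): bookmarks with no datum,
    and data keys with no bookmark in the template."""
    bm = set(template_bookmarks)
    keys = set(data)
    return sorted(bm - keys), sorted(keys - bm)

# Hypothetical bookmark names for one school-report table.
bookmarks = ["math_5_mean", "math_5_n", "heb_5_mean"]
data = {"math_5_mean": 531, "math_5_n": 212, "eng_5_mean": 498}
print(check_report_map(bookmarks, data))  # (['heb_5_mean'], ['eng_5_mean'])
```

With 5,000+ data points per report, an automated check of this kind is what makes the human quality-control pass tractable.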
Magic Publisher
A tool for inserting SPSS output into Word and PPT
Generating Report Maps
[Figure: a table template, its table map, and the table map check – a table of number and percentage of examinees by subject (Hebrew, Math, English) and grade (5th)]
Final table after data insertion
Report Template
Tables after Data Insertion
Using Macros to Build Tables & Graphs
Dealing with Exceptions
- Low response rates
- Cheating
- Exemption
- Refusal
- Bilingual schools
- Exceptions in testing conditions
- Exceptions in historical comparisons for
particular schools or groups of schools
Challenges
- Tight schedule and bottlenecks
- Long learning curve
- Distribution of knowledge
- New projects and requests every year
- Extensive changes every year
- Psychometric challenges
- Need to work in parallel channels
- Exceptions and special cases
- Extraction of historical data
- Documentation
Successes
- Meeting deadlines & quality control standards
- Improving work processes, organization and
automation
- Improving and extending data analysis and report
production
- Reducing time for treating special cases
- Designing and building a new project database