SLIDE 1
Ioannis Caragiannis, University of Patras
Joint work with George Krimpas and Alexandros Voudouris
massive: available to a large number of people (16-18 million students)
online: through the internet/web
open: no cost for the
SLIDE 2
SLIDE 3
www.edx.org
www.coursera.org
www.udacity.com
> 100 employees each
business model: verified certificates, headhunting (connecting students to industry), specializations, corporate collaborations
SLIDE 4
400+ universities
2400+ courses
22 out of the top-25 US universities
3000+ instructors
TAs, video assistants
13 languages (80% English, 8.5% Spanish, French, Chinese)
subjects: humanities, computer science, business & management
SLIDE 5
SLIDE 6
Daphne Koller, Andrew Ng (Coursera founders):
- “… courses in the humanities and social sciences - in
which the material is more open to interpretation - have proven more complicated to translate into an
online format, especially when it came to the
assessment and grading of the students.”
SLIDE 7
What? Should result in quantitative information
- successfully completed her class, achieved a 9/10
(A+), ranked in the top 1% of her class of 100,000, etc
Why? Information in the verified certificate,
important for employers (new revenue source)
Who? Experts (graders, TAs) are costly
A common solution: automatic grading
(multiple choice questions)
SLIDE 8
Highly unsatisfactory when evaluating the
students’ ability of
- proving a mathematical statement
- expressing their critical thinking over an issue
- demonstrating their creative writing skills
In these cases, assessment and grading is a
human computation task
Alternative solution: peer grading
- outsource the grading task to the students
SLIDE 9
How does it work?
- each student grades some of the other students’
assignments (as part of her own assignment)
Allowing the students to grade using cardinal
scores is risky:
- not experienced in assessing their peers’
performance in absolute terms
- have strong incentives to assign low scores
Solution: ordinal peer grading
SLIDE 10
Cardinal peer grading
- Piech, Huang, Chen, Do, Ng, & Koller (2013)
- Kulkarni, Wei, Le, Chia, Papadopoulos, Cheng, Koller,
& Klemmer (2013)
- Walsh (2014)
- de Alfaro & Shavlovsky (2014)
- www.crowdgrader.org
Ordinal peer grading
- Raman & Joachims (2014)
- Shah, Bradley, Parekh, Wainwright, & Ramachandran
(2014)
SLIDE 11
n students (exam papers)
Distributing the exam papers: each student
gets k<<n exam papers to grade so that each exam paper is given to k students
Grading: each student ranks the exam papers
assigned to her
Rank aggregation: compute a global ranking
from the partial ranks
Goal: to come up with a global ranking that is
“as correct as possible”
SLIDE 12
Similarities:
- on input a profile of rankings, compute a final full
ranking
Differences:
- each student is simultaneously an alternative and a
voter
- voters do not have to rank all alternatives
- the alternatives to be ranked are decided externally
SLIDE 13
(n,k)-bundle graph: k-regular bipartite graph
G=(U,V,E) with |U|=|V|=n
U: exam papers (randomly assigned to nodes)
V: graders
Edge (u,v) with u in U and v in V indicates that
exam paper u will be given to student v
Warning! Nodes corresponding to a grader and
her exam paper should not be connected
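
A minimal sketch of one way to build an (n,k)-bundle graph that respects the warning above. It is an illustrative circulant construction, not necessarily the one used in the talk: students are placed at random on n nodes, and the grader at node p receives the papers of the students at nodes p+1, ..., p+k (mod n). The function name and the seed parameter are assumptions made for this example.

```python
import random

def circulant_bundle_graph(n, k, seed=None):
    """Return {grader: [papers]} describing a k-regular bipartite bundle graph.

    Hypothetical helper: a simple circulant layout, chosen only because it is
    easy to verify. Since the offsets are 1..k with k < n, no grader ever
    receives her own paper.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    rng = random.Random(seed)
    position = list(range(n))      # position[p] = student placed at node p
    rng.shuffle(position)          # random assignment of students to nodes
    bundles = {}
    for p, grader in enumerate(position):
        # papers of the students sitting at the next k nodes (cyclically)
        bundles[grader] = [position[(p + j) % n] for j in range(1, k + 1)]
    return bundles
```

Each grader gets exactly k distinct papers and each paper goes to exactly k graders, so the graph is k-regular on both sides. Note that a circulant layout like this one generally contains 4-cycles.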
SLIDE 14
The students participate in the exam and submit
their papers
Scenario I:
- the instructor announces indicative solutions and
grading instructions
- the students use this info when grading
Scenario II:
- no info by the instructor
- students’ grading performance is similar to their
performance in the exam
SLIDE 15
Basic assumption: there is a ground truth
ranking of the exam papers
Perfect grading: each grader ranks the k exam
papers she gets consistently with the ground truth
SLIDE 16
Quality measure: number of pairs of exam
papers which compare in the global ranking as in the ground truth
- ... or total number of pairs minus the Kendall tau
distance
- (bad) example: a random permutation recovers
correctly 50% of pairwise relations on average
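
The quality measure can be sketched in a few lines. The function below counts the pairs that the global ranking orders as in the ground truth; the Kendall tau distance is then the total number of pairs, n(n-1)/2, minus this count. For a uniformly random permutation each pair is ordered correctly with probability 1/2, which gives the 50% figure above. The function name is an assumption made for this example, and rankings are assumed to be lists with the best paper first.

```python
from itertools import combinations

def correct_pairs(global_ranking, ground_truth):
    """Count pairs of papers ordered the same way in both rankings."""
    pos = {paper: i for i, paper in enumerate(global_ranking)}
    # combinations() yields (a, b) with a ahead of b in the ground truth
    return sum(1 for a, b in combinations(ground_truth, 2) if pos[a] < pos[b])

def kendall_tau_distance(global_ranking, ground_truth):
    """Number of pairs on which the two rankings disagree."""
    n = len(ground_truth)
    return n * (n - 1) // 2 - correct_pairs(global_ranking, ground_truth)
```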
SLIDE 17
Find the minimum-degree (n,k)-bundle graph
that guarantees that the whole ground truth is always recovered if perfect grading is used
[Figure: bipartite bundle graph with graders 1-7 and exam papers 1-7]
k = Θ(n^{1/2})
SLIDE 18
[Figure: bipartite bundle graph with graders 1-7 and exam papers 1-7]
k = Θ(n^{1/2})
Find a minimum-degree diameter-3 bipartite graph
Find the minimum-degree (n,k)-bundle graph
that guarantees that the whole ground truth is always recovered if perfect grading is used
Miller and Siran (2013)
SLIDE 19
Use much simpler bundle graphs
- e.g., any k-regular bipartite graph for small values of k
- even by putting together copies of K_{k,k}
- or a k-regular bipartite graph not containing a 4-cycle
Aggregation rules
- plurality, approval
- Borda
- Random serial dictatorship
- Markov-chain-based aggregation rules
SLIDE 20
Each grader gives k-i+1 points to the exam
paper she ranks i-th
Global ranking is obtained by sorting the exam
papers in terms of non-increasing number of total points (Borda score)
Ties are broken randomly
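
A minimal sketch of Borda aggregation over partial rankings, assuming the usual convention that the paper a grader ranks i-th earns k-i+1 points (so her top paper earns k and her last paper earns 1). The function name, the input format (one best-first list per grader) and the seed parameter are assumptions made for this example.

```python
import random
from collections import defaultdict

def borda_aggregate(partial_rankings, seed=None):
    """Return a global ranking (best first) from per-grader partial rankings."""
    rng = random.Random(seed)
    score = defaultdict(int)
    for ranking in partial_rankings:
        k = len(ranking)
        for i, paper in enumerate(ranking, start=1):
            score[paper] += k - i + 1   # i-th ranked paper earns k-i+1 points
    papers = list(score)
    rng.shuffle(papers)                 # random order first, so that ...
    # ... the stable sort below breaks ties between equal scores at random
    papers.sort(key=lambda p: score[p], reverse=True)
    return papers
```

For example, with rankings [0, 1], [1, 2] and [0, 2], the Borda scores are 4, 3 and 2 for papers 0, 1 and 2, so the global ranking is 0, 1, 2.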
SLIDE 21
Theorem: When Borda is applied on partial
rankings that are consistent with the ground truth, the expected fraction of correctly recovered pairwise relations is at least 1-O(1/k) when the bundle graph is 4-cycle-free and at least 1-O(1/k^{1/2}) in general
SLIDE 22
SLIDE 23
Students have qualities in [1/2,1]
- ability to compare correctly two exam papers
(probability to find the correct outcome)
Qualities define the ground truth ranking σ*
Grading according to a Mallows noise model
for generating random rankings
- each grader of quality p ranks each pair among the k
exam papers she gets as in σ* with prob. p and incorrectly with prob. 1-p
- if the pairwise outcomes do not define a ranking (they contain a cycle), she repeats
C., Procaccia, & Shah (2013)
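
The noisy grading process above can be simulated by rejection sampling. This is a sketch under stated assumptions: the function name, the argument layout (true_pos maps each paper to its position in σ*) and the transitivity check via win counts are choices made for this example, not details taken from the talk.

```python
import random
from itertools import combinations

def noisy_ranking(bundle, true_pos, p, rng):
    """Sample the ranking (best first) of a grader with quality p.

    Each pair is ordered as in the ground truth with probability p and
    flipped otherwise; the whole draw is repeated until it is a ranking.
    """
    papers = sorted(bundle, key=lambda x: true_pos[x])
    k = len(papers)
    while True:
        wins = {x: 0 for x in papers}
        for a, b in combinations(papers, 2):   # a precedes b in the ground truth
            winner = a if rng.random() < p else b
            wins[winner] += 1
        # the pairwise outcomes form a ranking iff the win counts are 0..k-1
        if sorted(wins.values()) == list(range(k)):
            return sorted(papers, key=lambda x: wins[x], reverse=True)
```

With p = 1 the grader always returns the ground truth order of her bundle. For p close to 1/2 and larger k most draws contain a cycle, so the loop may need many repetitions before it accepts.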
SLIDE 24
Comparison of Borda and RSD in 500 executions
(n = 1000, k = 8)
SLIDE 25
Theory:
- Is a 1-O(1/k^2) fraction (or better) possible? Upper
bounds?
- Analysis for noisy grading?
- Impact of incentives?
Practice:
- Which is the most realistic noise model for grading?
- How do the methods considered perform in practice
(with real students)?
SLIDE 26