How far down the digital road will EL assessment go?




slide-1
SLIDE 1

Trinity College London

English qualifications for real-world communication

TECHNOLOGY FOR TEACHERS IN ASSESSMENT – THE IMMEDIATE FUTURE 1&2 November, 2018 Alex Thorp Lead Academic - Europe

How far down the digital road will EL assessment go?

slide-2
SLIDE 2

Overview

  • 1. Back to the start – Introduction
  • 2. Introducing AI – history and definitions
  • 3. AI and language – NLP
  • 4. Chatbots
  • 5. AI and Language assessment

(Speaking focus)

  • 6. Case study – Communicative competence
  • 7. Summary
  • 8. Test evaluation – the 3 c’s
  • 9. Future considerations
slide-3
SLIDE 3

Introduction

slide-4
SLIDE 4

Introduction – true or false?

  • Current AI still hugely limited, processing equivalent to a 2 year old
  • AI and, more particularly NLP, can now offer a fully automated 4-skill assessment solution
  • AI dates back as far as the 1950s
  • The human brain provided the model for modern machine learning
  • That which humans find easy, computers find difficult – and vice versa
  • Elon Musk labelled AI ‘a fundamental risk to the existence of civilization’
  • Machine scoring is more reliable than human scoring
  • I’ve utilized AI this morning!

slide-5
SLIDE 5
slide-6
SLIDE 6

Spot the odd one out?

slide-7
SLIDE 7

Name of tool | Developer | Language learning/testing
Write & Improve | English Language iTutoring | Learning
Write & Improve +Class View | English Language iTutoring | Learning
Write & Improve +Test Zone | English Language iTutoring | Testing
Read & Improve (coming soon) | English Language iTutoring | Learning
Duolingo | Duolingo | Learning / testing
e-Rater | ETS | Testing
Writing Mentor | ETS | Learning
Language Muse Activity Palette | ETS | Learning / testing
AuraLang | AuraLang | Learning
BetterAccentTutor | Better Accent | Learning
TriplePlayPlus | Syracuse Language Systems | Learning
Test of English Language Learning | Pearson | Testing
Intelligent Essay Assessor | Pearson | Testing
IntelliMetric | Vantage Learning | Testing
MyAccess! | Vantage Learning | Learning
Project Essay Grade | MI | Learning / testing

Summary table of identified commercially-available language learning and language testing tools. Gillings et al. 2018

Computers as ‘tutors or tools’?

slide-8
SLIDE 8

Introducing AI

slide-9
SLIDE 9

Coding is the application of linguistic resource through a range of cognitive processes to generate meaning – often described as competences

Back to basics - Communication cycle

slide-10
SLIDE 10

Coding is the application of linguistic resource through a range of cognitive processes to generate meaning – often described as competences

Back to basics - Communication cycle

slide-11
SLIDE 11
slide-12
SLIDE 12

Back to the beginning

29,086 measures barley 37 months. Kushim

A clay tablet with an administrative text from the city of Uruk, c.3400–3000 BC. Probably our first ever recorded code. If Kushim was indeed a person, he may be the first individual in history whose name is known to us! (Y N Harari, 2015)

slide-13
SLIDE 13

Let’s go back

  • Partial scripts
  • Numerical partial script became the language of advancement
  • As societies developed, external codes were required to cope with sociological demands and to support larger collectives
  • Full scripts
  • On to the era of computers…

slide-14
SLIDE 14

Can computers think like humans?

H Simon and A Newell – Pittsburgh 1955. A thinking machine?

slide-15
SLIDE 15

Can computers think like humans?

Alan Turing, 1948 – first chess programme

  • How to overcome combinatorial explosion?
  • How to give it the intelligence to make good decisions?
  • Turing developed rules to guide it.

slide-16
SLIDE 16

The birth of Classical AI

  • A problem defined, a set of programmed rules applied (heuristics)
  • Could plan complex operations in highly controlled environments
  • Could deliver maximum efficiency and economy
  • But classical AI couldn’t engage with its environment

slide-17
SLIDE 17

Our world is a little more… chaotic

slide-18
SLIDE 18

Enter Machine learning

Systems’ ability to learn for themselves from raw data (training datasets)

Systems learn from first principles – from structure in data – and seek potential solutions to problems

  • Image recognition
  • Voice recognition
  • Optical character recognition
  • Advanced customisation
  • Intelligent data analysis
  • Sensory data analysis
  • Model (predicts) based on Parameters
  • Input to inform (training data)
  • Learner (adjusts parameters through differences in prediction and actual)
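
The model / input / learner loop in the last three bullets can be sketched in a few lines. This is a minimal illustration only: the straight-line model, the training data, the learning rate and the number of passes are all assumptions chosen to keep the example small, not anything taken from the talk.

```python
# Toy "model / input / learner" loop: fit y = w * x + b by gradient descent.
# The data, learning rate and epoch count are illustrative assumptions.

def predict(w, b, x):
    """Model: predicts an output from the current parameters w and b."""
    return w * x + b

def train(data, lr=0.05, epochs=2000):
    """Learner: adjusts w and b using the gap between prediction and actual."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, actual in data:
            error = predict(w, b, x) - actual  # difference: prediction vs actual
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Input: training data generated from the hidden rule y = 2x + 1
training_data = [(x, 2 * x + 1) for x in range(5)]
w, b = train(training_data)
print(round(w, 2), round(b, 2))  # parameters settle close to 2 and 1
```

The learner is never told the rule; it only sees examples and keeps nudging the parameters until its predictions match the training data.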
slide-19
SLIDE 19

  • 1960s – Bayesian methods introduced for probabilistic inference
  • 1980s – backpropagation
  • 1990s – shift from a knowledge-driven to a data-driven approach – analysis of large amounts of data
  • >1990s – Support Vector Machines and Recurrent Neural Networks
  • 2010> – ANN and Deep Learning

Enter Machine learning

Machine learning: Algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions. The algorithm needs to be told how to make an accurate prediction

slide-20
SLIDE 20

The Moravec Paradox

  • The things that our brains find difficult to cope with, that require a lot of conscious mental effort, like chess, were simple for AI.
  • The things that our brains find easy to cope with, that require little conscious mental effort, like making sense of what we see and hear, or movement, were very difficult for AI.

“We are prodigious Olympians in perceptual and motor areas… abstract thought though is a new trick… We’ve not yet mastered it” (Moravec 1988)

slide-21
SLIDE 21

How does ML work? Enter Artificial Neural Networks

You recognised a dog instantaneously, by the firing of choral assemblies of neural networks

slide-22
SLIDE 22

Neural Networks consist of the following components

  • An input layer, x
  • An arbitrary number of hidden layers
  • An output layer, ŷ
  • A set of weights and biases between each layer, W and b
  • A choice of activation function for each hidden layer, σ
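
As a rough sketch of how these components fit together, the forward pass below feeds an input x through one hidden layer to an output ŷ. The layer sizes, the weight and bias values, and the use of a sigmoid for σ (applied to the output layer as well) are all illustrative assumptions.

```python
import math

def sigmoid(z):
    """Activation function σ: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: weighted sum of the inputs plus a bias, passed through σ."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, params):
    """Feed the input layer x through each (W, b) pair to obtain ŷ."""
    activation = x
    for W, b in params:
        activation = layer(activation, W, b)
    return activation

# Hypothetical network: 2 inputs -> 3 hidden units -> 1 output.
params = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]
y_hat = forward([1.0, 0.0], params)
print(y_hat)
```

Training a network means adjusting the numbers in `params`; the structure of the calculation stays the same.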

Enter Artificial Neural Networks

slide-23
SLIDE 23

Artificial Neural Networks

Input checks:

  • Is there a full stop?
  • Is there a capital?
  • Is it at start of para?
  • Is there a subject?
  • Is there an object?
  • Is there a noun?

Intermediate patterns: SVOCA? OC? VA?

Output: Sentence / Non sentence

Training data: sample sentences

slide-24
SLIDE 24

Training data – each time we tell it what it’s looking at, it tweaks the connections to better recognise what it’s looking for.

AI is now booming

  • Optimise harvesting
  • Interpret medical images
  • Grade students
  • Identify financial opportunities
  • Driverless cars

AI ANN: taught, then develops

Runs tens of thousands of simulations every second and chooses the best one

slide-25
SLIDE 25

Enter Deep Learning

Solve intelligence. Use it to make the world a better place. (Mission statement – DeepMind)

Demis Hassabis - CEO

  • Enters a process (e.g. playing a game) through a ‘learning algorithm’ that changes millions of connections in a neural network to reinforce or stop an action to improve the desired outcome (not a task-based algorithm)
  • Uses Representation Learning – automatically discovers the characteristics needed for feature detection or classification of raw data, which are then used to perform a task
  • Deep learning: ML requires input – DL can learn by itself through a learning algorithm. E.g. automatic light – ML accepts only ‘dark’, DL would learn ‘I can’t see’

slide-26
SLIDE 26

Could a DL neural network system go beyond human understanding? AlphaGo played a completely unpredictable move – it can come up with a new idea beyond the remit of human thought…

Let’s ‘Go’

In DL systems, the algorithm learns how to make accurate predictions through its own data processing (ML needs to be told).

slide-27
SLIDE 27

AI Limitations

Can find patterns in, and learn from, data, but has no real understanding of what those patterns actually mean; there is no meaningful conceptual thinking.

  • Patterns in complex data
  • Convert data into meaningful concepts
  • Process ‘predictable’ (images / outcomes)
  • ‘Understand’ content or images – easily tricked

With no real conceptual understanding of patterns, the hardest challenge of all is the ability that relies on exactly this: language (Prof Al Khalili)

  • Data engagement beyond human capacity
  • Operate autonomously – based on training datasets
slide-28
SLIDE 28

AI and language

slide-29
SLIDE 29

Recognise these? Chatbot, NLP, NLU, ASR, NLG, AI, SDS, DMS

slide-30
SLIDE 30

Coding is the application of linguistic resource through a range of cognitive processes to generate meaning – often described as competences

Communication cycle

slide-31
SLIDE 31

AI in language - NLP

  • Automatic Speech Recognition (ASR)
  • Speech generation
  • Text recognition
  • Text generation (NLG) [Response driven]

When was Elvis born?

slide-32
SLIDE 32

AI in language – Speech Recognition

Limited until the advent of AI and Machine Learning techniques

  • Collects waveforms (phonetic input)
  • Fast Fourier Transform = spectrogram
  • Identifies resonances of production
  • Labels ‘formants’, recognising phonemes, words and phrases
  • Converts to text – ‘best fit hypothesis’
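
The waveform-to-frequency step can be illustrated with a synthetic signal. The sketch below uses a plain discrete Fourier transform (an FFT gives the same result faster) on one short window; the sample rate, window length and the two resonance frequencies are invented for illustration, and real ASR front ends do considerably more than this.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Plain discrete Fourier transform; returns the magnitude of each
    frequency bin below the Nyquist limit."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(
            s * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        mags.append(abs(total))
    return mags

SAMPLE_RATE = 8000  # samples per second (assumed)
N = 400             # one 50 ms analysis window (assumed)

# Synthetic "vowel" with two resonances at 700 Hz and 1200 Hz (illustrative)
signal = [
    math.sin(2 * math.pi * 700 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1200 * t / SAMPLE_RATE)
    for t in range(N)
]

mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N  # each frequency bin is 20 Hz wide
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peaks_hz = sorted(k * bin_hz for k in strongest)
print(peaks_hz)  # the two resonances hidden in the waveform
```

A spectrogram is this calculation repeated over successive windows, so that the peaks can be tracked as they move through time.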

slide-33
SLIDE 33
slide-34
SLIDE 34

ASR Challenges – Who ate all the cake?

I think David ate all the delicious chocolate cake.

Tonic / Keywords / Onset – Volume / Pitch / Length / Pausing

Remarkable number of variables – an immense amount of comparative data has to be processed to arrive at a correct hypothesis as to meaning beyond denotation. Yet any communication act is a combination of oral production and non-verbal cues, paralinguistics and contextual parameters.

slide-35
SLIDE 35

AI in language – Speech Recognition

Formants – limited with 44 phonemes and syntactic training

If only it were that easy :-)

Requires a ‘language model’

slide-36
SLIDE 36

Automatic Speech Recognition (ASR)

INPUT: Speech signal (audio) → Decoding → OUTPUT: Orthographic representation

Decoding draws on:

  • Language models
  • Acoustic model
  • Lexical data

Training data – learns with more training data
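
One way to picture the decoding step is as a contest between candidate transcriptions, each scored by how well it fits the audio and how plausible it is as language. The sketch below is a toy version under stated assumptions: the candidate phrases and every probability are invented, and real decoders search far larger hypothesis spaces.

```python
import math

# Hypothetical acoustic-model scores: P(audio | candidate transcription).
acoustic_scores = {
    "recognise speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical bigram language model: P(word | previous word).
bigram_probs = {
    ("<s>", "recognise"): 0.02, ("recognise", "speech"): 0.30,
    ("<s>", "wreck"): 0.001, ("wreck", "a"): 0.20,
    ("a", "nice"): 0.05, ("nice", "beach"): 0.02,
}

def language_log_prob(sentence, floor=1e-6):
    """Sum of log bigram probabilities; unseen bigrams get a small floor."""
    words = ["<s>"] + sentence.split()
    return sum(
        math.log(bigram_probs.get(pair, floor))
        for pair in zip(words, words[1:])
    )

def best_hypothesis(candidates):
    """Pick the transcription with the highest combined log score."""
    return max(
        candidates,
        key=lambda s: math.log(candidates[s]) + language_log_prob(s),
    )

best = best_hypothesis(acoustic_scores)
print(best)
```

Here the acoustically stronger candidate loses because the language model rates its word sequence as far less likely, which is the sense in which the output is a ‘best fit hypothesis’.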

slide-37
SLIDE 37

Text recognition

  • Fails if can’t parse sentences – higher risk when rules based
  • Can process sentence meaning (denotative)
  • Tag sentence structure (syntactic)
  • Parse tree – tag words with likely part of speech
  • Phrase structure rules (e.g. parts of speech)
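
A minimal rules-based sketch shows both the idea and the risk. The tiny lexicon and the two phrase structure rules (NP -> optional determiner + noun; S -> NP VERB NP) are assumptions made for illustration only.

```python
# Hypothetical mini-lexicon mapping words to parts of speech.
LEXICON = {
    "the": "DET", "a": "DET",
    "david": "NOUN", "dog": "NOUN", "cake": "NOUN",
    "ate": "VERB", "saw": "VERB",
}

def tag_words(sentence):
    """Tag each word with its part of speech (None if the word is unknown)."""
    words = sentence.rstrip(".?!").split()
    return [LEXICON.get(word.lower()) for word in words]

def match_noun_phrase(tags, start):
    """NP -> (DET) NOUN. Returns the index after the phrase, or None."""
    i = start
    if i < len(tags) and tags[i] == "DET":
        i += 1
    if i < len(tags) and tags[i] == "NOUN":
        return i + 1
    return None

def is_sentence(sentence):
    """S -> NP VERB NP. True only if the whole input fits the rule."""
    tags = tag_words(sentence)
    after_subject = match_noun_phrase(tags, 0)
    if after_subject is None or after_subject >= len(tags):
        return False
    if tags[after_subject] != "VERB":
        return False
    return match_noun_phrase(tags, after_subject + 1) == len(tags)

print(is_sentence("David ate the cake."))
print(is_sentence("David ate all the delicious chocolate cake."))
```

The second sentence is perfectly good English, but the rules have no entry for ‘all’, ‘delicious’ or ‘chocolate’, so the parse fails. That brittleness is the higher risk of a purely rules-based approach.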

slide-38
SLIDE 38

AI in language - NLP

  • Automatic Speech Recognition (ASR)
  • Speech generation
  • Text recognition
  • Text generation (NLG) [Response driven]

When was Elvis born?

slide-39
SLIDE 39

Natural Language Generation (text)

  • Fails if can’t access relevant semantic meaning – higher risk when rules based
  • Produces sentence ‘parsed text’ related to meaning (denotative)
  • Knowledge Graph generated (Google 70b+ facts end 2016)
  • Exploits web of semantic information (entities linked through meaningful relationships)
  • Codifying of language applied
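
The ‘When was Elvis born?’ example can be sketched as a lookup in a tiny knowledge graph followed by a templated reply. The graph contents, the alias table, the single question pattern and the reply templates are all assumptions for illustration; a production system would cover vastly more entities and phrasings.

```python
import re

# Tiny hypothetical knowledge graph: (entity, relation) -> value.
KNOWLEDGE_GRAPH = {
    ("Elvis Presley", "born"): "8 January 1935",
    ("Elvis Presley", "birthplace"): "Tupelo, Mississippi",
}
ALIASES = {"elvis": "Elvis Presley"}

def answer(question):
    """Parse one question pattern, look up the fact, generate a sentence."""
    match = re.match(r"when was (\w+) born\??$", question.strip(), re.IGNORECASE)
    if not match:
        return "Sorry, I cannot parse that question."
    entity = ALIASES.get(match.group(1).lower())
    fact = KNOWLEDGE_GRAPH.get((entity, "born"))
    if fact is None:
        return "Sorry, I do not know."
    return f"{entity} was born on {fact}."

print(answer("When was Elvis born?"))
print(answer("When was Mozart born?"))
```

Generation fails as soon as the relevant entity or relation is missing from the graph, which mirrors the first bullet above.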

slide-40
SLIDE 40

Speech synthesis

  • Speech recognition in reverse
  • Text broken into phonetic elements
  • Speech sound generated
  • Rules of phonemic representation manipulable
  • ML can extrapolate models from input (training) data

slide-41
SLIDE 41

Putting the pieces together

So a computer can…

  • Convert our speech to text
  • Establish meaning
  • Generate a text response
  • Convert this text to speech

But can it have a meaningful conversation?

slide-42
SLIDE 42

Spoken Dialogue Systems (SDS)

  • SDS use both speech and NLP technologies to enable extended human-machine conversation.
  • Determine appropriate system response.
  • Commercially driven to achieve success in constrained conversation to achieve a specific scenario’s goal (Litman et al, 2016).
  • Limited application in assessing interactive language.

slide-43
SLIDE 43
  • DMS uses ASR and NLU, in conjunction with an internal representation of ‘system state’

SDS – Dialogue Management system

  • Limited number of ‘states’ – interaction at any point represented by one ‘state’
  • Each utterance moves the interaction from one ‘state’ to another
  • Applicable to mapped dialogues (scripted)
slide-44
SLIDE 44

  • System ask: ‘Do you live in a town or the countryside?’
  • System ask: ‘Which town do you live in?’
  • System ask: ‘How far is the nearest town?’
  • System say: (not_understood)

SDS – Dialogue Management system

  • NLU – live – Town
  • NLU – live – Countryside
  • NLU – live – ?

‘Finite state machine’ – predictable path of interaction – not spontaneous
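
The town / countryside dialogue on this slide maps naturally onto a small finite state machine. In the sketch below the state names, the keyword-spotting stand-in for NLU, and the mapping of each NLU result to a follow-up question are assumptions inferred from the slide rather than a documented system.

```python
# System prompts, one per dialogue state (wording taken from the slide).
STATES = {
    "ask_home": "Do you live in a town or the countryside?",
    "ask_town": "Which town do you live in?",
    "ask_distance": "How far is the nearest town?",
    "not_understood": "(not_understood)",
}

def nlu_live(utterance):
    """Very rough stand-in for NLU: spot keywords in the user's answer."""
    text = utterance.lower()
    if "town" in text or "city" in text:
        return "town"
    if "country" in text or "village" in text:
        return "countryside"
    return "?"

# Each (state, NLU result) pair leads to exactly one next state.
TRANSITIONS = {
    ("ask_home", "town"): "ask_town",
    ("ask_home", "countryside"): "ask_distance",
    ("ask_home", "?"): "not_understood",
}

def next_state(state, utterance):
    """Move the interaction from one state to another."""
    return TRANSITIONS.get((state, nlu_live(utterance)), "not_understood")

state = next_state("ask_home", "I live in the countryside")
print(STATES[state])
```

Every path through the conversation has to be mapped in advance, which is why the interaction is predictable rather than spontaneous.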

slide-45
SLIDE 45

Summary - AI in language - NLP

  • Automatic Speech Recognition (ASR)
  • Speech generation
  • Text recognition
  • Text generation (NLG) [Response driven]

When was Elvis born?

slide-46
SLIDE 46

Chatbots

slide-47
SLIDE 47

Can AI simulate human interaction?

Chatbots – several programmes simultaneously analyse the output; these generate a wide range of hypothetical responses and choose the one most likely to prolong the dialogic exchange:

  • Person bot – personality with character and baseline facts
  • Rapport bot – find out about you and interests
  • Wikibot – seek facts based on conversation content
  • A ranking function – choosing the best response
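
The ensemble idea can be sketched as several small bots each proposing a reply, with a ranking function choosing among them. Everything here is hypothetical: the three bots, their canned replies and the scoring rule are illustrative stand-ins and do not describe how Alana actually works.

```python
def persona_bot(message):
    """Personality and baseline facts."""
    return "I am a chatbot and I enjoy talking about music."

def rapport_bot(message):
    """Finds out about the user and their interests."""
    return "What kind of music do you like?"

def wiki_bot(message):
    """Offers a fact if the message mentions a known topic."""
    facts = {"elvis": "Elvis Presley was born in 1935."}
    for keyword, fact in facts.items():
        if keyword in message.lower():
            return fact
    return None

def rank(message, response):
    """Toy ranking: reward questions (they invite a reply) and shared words."""
    score = 0.0
    if response.endswith("?"):
        score += 1.0
    shared = set(message.lower().split()) & set(response.lower().split())
    score += 0.5 * len(shared)
    return score

def reply(message):
    """Collect every bot's candidate and return the top-ranked one."""
    candidates = [bot(message) for bot in (persona_bot, rapport_bot, wiki_bot)]
    candidates = [c for c in candidates if c is not None]
    return max(candidates, key=lambda c: rank(message, c))

print(reply("I like music"))
```

The ranking rule is where the goal of prolonging the exchange is encoded; change the rule and the same bots produce a very different conversation.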

Alana the bot – Heriot-Watt University, Prof Oliver Lemon

slide-48
SLIDE 48

Can AI simulate human interaction?

In communication there is a lot more going on than just words. Whilst AI can recognise complex patterns it cannot understand concepts. AI still very limited in terms of:

  • Pragmatics
  • Socio-linguistic competence
  • Strategic competence
  • Co-constructed dialogue / authentic exchange
slide-49
SLIDE 49

Let’s have a chat with a bot

Eliza the psychotherapist

  • Heuristic engine
  • https://www.masswerk.at/eliza/

Mitsuku

  • ML engine
  • 60b+ messages processed
  • https://www.pandorabots.com/mitsuku/

slide-50
SLIDE 50

Chatbot - task

In pairs, sharing a device, have a chat with Mitsuku (or an alternative conversational chatbot)

1: Try a simple interaction

2: Try a more demanding dialogue

  • What went right? Why do you think it worked?
  • What did not work so well? What do you think was the cause of the communication breakdown?

slide-51
SLIDE 51
slide-52
SLIDE 52

AI and language assessment

(speaking)

slide-53
SLIDE 53

AI and language assessment (Productive skills)

Machine scores automatically generated

Utilise set criteria and dependent variables (e.g. repeat accuracy, length of production, fluency, vocabulary, grammar and pronunciation)

Compared to reference scores (manually set)

slide-54
SLIDE 54

ASR - automated and human correlation

  • Correlations improve with longer utterances (Bernstein, 2012; Neumeyer et al, 2000)
  • Repeat accuracy = high correlation (0.92) (Graham et al, 2008)
  • Repeat accuracy used as predictor of oral proficiency
  • Further high correlation studies as predictors (Cook et al, 2011; De Wet et al, 2009)
  • Predictive measures for fluency stronger for read speech rather than spontaneous (Cucchiarini et al, 2010)
  • Correlations higher for rate of speech and accuracy compared to ‘goodness of pronunciation’ (Müller et al, 2009)
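
For readers unfamiliar with how figures such as 0.92 are obtained, the sketch below computes a Pearson correlation between machine scores and human reference scores. The two score lists are invented purely for illustration and are not data from any of the studies cited.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented scores for five candidates (illustration only).
human_scores = [2.0, 3.0, 3.5, 4.0, 5.0]    # reference scores set by raters
machine_scores = [2.2, 2.9, 3.9, 3.8, 4.9]  # automatically generated scores

r = pearson(machine_scores, human_scores)
print(round(r, 2))
```

A value near 1 means the machine ranks candidates much as the human raters do; it does not by itself show that both are measuring the intended construct.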

slide-55
SLIDE 55

AI scores – case studies

 | Pearson PTE | ETS – TOEFL iBT
System | Versant | Speechrater
Task example | Read aloud; Repeat sentence; Short answer | Opinion on familiar topic; Speak based on reading (total 6 tasks)
Scoring includes | Pronunciation; Fluency; Vocabulary; Sentence mastery | Pronunciation; Fluency; Grammatical facility; Topical coherence; Idea progression (Multiple regression scoring)
Correlation | 0.84 - 0.92 | 0.73
Construct | Psycholinguistic (Van Moere, 2012) | Direct and immediate interaction (Butler et al, 2000)
Predictive | Ability to use core language in real time / use lexis to build phrases and clauses and articulate | Contextualised and limited restriction – account for content, coherence and interactive (but task monologic)

Adapted from Litman et al, 2018

Only practice tests subject to further research

slide-56
SLIDE 56

Automatic Speech Recognition (ASR) in assessment

  • Repeat accuracy
  • Length of production
  • Fluency (rate of speech)
  • Vocabulary – complexity and accuracy
  • Grammar – complexity and accuracy
  • Pronunciation (compared to reference acoustic model)
  • Test task limited: e.g. elicited imitation, reading aloud or short free responses
  • Limited opportunity for spontaneous or dialogic speech
  • Copes with transactional rather than interactional dialogue

slide-57
SLIDE 57

Dialogue Management system – Finite state

At utterance level ‘States’ created (for example)

Syntactic analysis

  • Grammar errors
  • (NLU:Grammar = No)

Semantic analysis

  • Meaning for expected answer
  • Detail or gist

Pragmatics

  • Politeness
  • Contextual coherence

Acoustic input

  • Prosodic features
  • Fluency

Alternative ‘State tracking’ – DMS gives probability of path

Fewer ASR errors and can resolve ambiguities further in dialogue

Potential application of holistic scales including CEFR. (Shashidhar et al, 2015)
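
State tracking can be pictured as keeping a probability for each possible dialogue state and revising it after every turn, rather than committing to one path straight away. The sketch below is a simple Bayesian update under stated assumptions: the two states and all the likelihood values are invented for illustration.

```python
def update_belief(belief, likelihood):
    """Multiply each state's probability by how well the new evidence
    fits that state, then renormalise so the probabilities sum to 1."""
    unnormalised = {s: belief[s] * likelihood.get(s, 0.0) for s in belief}
    total = sum(unnormalised.values())
    if total == 0:
        return dict(belief)  # evidence uninformative: keep the old belief
    return {s: v / total for s, v in unnormalised.items()}

# Start undecided between two hypothetical states.
belief = {"town": 0.5, "countryside": 0.5}

# Turn 1: a noisy recognition result slightly favours "town".
belief = update_belief(belief, {"town": 0.6, "countryside": 0.4})

# Turn 2: a clearer follow-up answer strongly favours "countryside".
belief = update_belief(belief, {"town": 0.2, "countryside": 0.8})

print(belief)
```

Because the first, possibly misrecognised, turn is never treated as final, later turns can overturn it, which is how ambiguities get resolved further into the dialogue.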

slide-58
SLIDE 58

Spoken Dialogue Systems (SDS) in assessment (Finite state)

  • User tolerance of recognition errors
  • Pedagogical value of misrecognised utterances
  • Narrow domain scenario-guided conversation
  • Useful for constrained and transactional dialogues
  • Applicable where semi-scripted dialogue used
  • Conversations simple and constrained
  • Based on L1 competent model
  • Test-takers have limited speaking skills, whereas SDSs are designed to process speech from proficient users
  • Most conversational responses are not right or wrong (as required from tutorial dialogue system technology)
  • SDS needs to be easily configurable by language experts
  • Limited training data (despite machine learning)

slide-59
SLIDE 59

The assessor will need to consider the extent to which their construct can accommodate [SDS’s] deviation from authentic dialogue (Litman et al, 2018)

Spoken Dialogue Systems (SDS)

  • Opportunities for spontaneous yet non-conversational speech, within constrained domain
  • State tracking SDSs – overcome ASR difficulties during dialogue

slide-60
SLIDE 60

Applying AI – case study

slide-61
SLIDE 61

Bachman & Palmer (2010): communicative competence model

Communicative Competence

  • Linguistic competence
  • Socio-linguistic competence
  • Discourse competence
  • Strategic competence

Case study - Communicative competence

slide-62
SLIDE 62

Conversational features in co-constructed dialogue

Communicative competence - features

  • Higher level contextual user ability – often related to concept
  • Semantic and topical relationship – tied to utterance history
  • Appropriate conversational functions (e.g. ending dialogue)
  • Linguistic devices (referring expressions, prosody etc.)
  • Turn taking conventions (linguistic signalling etc.)
  • Conversation coordination (confirming understanding, recovering etc.)
  • .
  • .
slide-63
SLIDE 63
slide-64
SLIDE 64

Communicative competence

1: In groups of 3 or 4 you will be allocated one of the four competences.

  • Identify elements of one competence (e.g. Socio-cultural = register)
  • What parameters would need to be measured in spoken performance to assess these elements?
  • Discuss if you think the AI systems covered today could be applied to assess each element / the overall competence?

2: Cross-group into groups of 4, with one person covering each competence.

  • Share your ideas around the application of AI to the communicative competences

slide-65
SLIDE 65

Summary

slide-66
SLIDE 66

AI and automated language assessment

[Figure: speaking constructs assessed against development of AI over time. Labels: 2010, 2018, Linguistic competence, ? ? ?, Fully automated language assessment]

Initial criticism as only narrow constructs could be assessed

slide-67
SLIDE 67

Construct or assessment engine?

Choose construct to assess, audit available technologies, and compensate for shortcomings with human intervention

or

Select available technologies, align them to the construct they can cover, and decide if compensation is necessary

To what extent can contained measures be used as indicators or predictors of overarching language proficiency?

slide-68
SLIDE 68

Construct – considerations

There is a rethinking of what speaking constructs could be….

  • Expand theoretical definition of interactional competence
  • Encompass co-constructive and dynamic dialogue
  • Engage personal cognitive and contextual factors
  • Incorporate digital literacies, human – machine interaction
  • Consider narrower / partial constructs as sufficient predictors of proficiency
  • Scope for plurilingual and translanguaging competencies
  • Inclusion of transferable skills and mediation

There is a long road ahead…

slide-69
SLIDE 69

Role of individual agency – impact on identity in test-taking experience

‘I deserve to engage with a human’

AI in language assessment - identity

slide-70
SLIDE 70

To conclude - some predictions

  • Increasing number of collaborations between exam developers and high-tech IT companies
  • Increasing use of blended modes of assessment delivery – digital / human
  • Inclusion of digital literacies written into assessed constructs (coping with latency, paucity of NVs or paralinguistics, digital interface engagement, mediating NLP shortcomings etc.)
  • Blended modes may include tasks of recorded human interaction that machines score – but not actual interaction with the machine (to overcome restrictions with SDSs etc.)
  • Commercial opportunities for establishing L2 spoken corpora at differentiated CEFR levels for training datasets
  • Development of AI formative assessment engine integrated into course delivery

slide-71
SLIDE 71

To conclude - some predictions

And in the long term….

  • Localisation to class-level through local-populated datasets driving adaptive assessment on an ongoing and formative basis – mediated through individual devices…

slide-72
SLIDE 72
slide-73
SLIDE 73

Trinity College London

English qualifications for real-world communication

Alex Thorp Lead Academic, Language (Europe) alex.thorp@trinitycollege.com