Trinity College London
English qualifications for real-world communication
TECHNOLOGY FOR TEACHERS IN ASSESSMENT – THE IMMEDIATE FUTURE
1 & 2 November 2018
Alex Thorp, Lead Academic - Europe
How far down the digital road will EL assessment go?
Overview
(Speaking focus)
Introduction – true or false?
Current AI is still hugely limited, with processing equivalent to that of a 2-year-old
AI and, more particularly, NLP can now offer a fully automated 4-skill assessment solution
AI dates back as far as the 1950s
The human brain provided the model for modern machine learning
That which humans find easy, computers find difficult – and vice versa
Elon Musk labelled AI ‘a fundamental risk to the existence of civilization’
Machine scoring is more reliable than human scoring
I’ve utilized AI this morning!
Spot the odd one out?
Name of tool | Developer | Language learning/testing
Write & Improve | English Language iTutoring | Learning
Write & Improve +Class View | English Language iTutoring | Learning
Write & Improve +Test Zone | English Language iTutoring | Testing
Read & Improve (coming soon) | English Language iTutoring | Learning
Duolingo | Duolingo | Learning / testing
e-Rater | ETS | Testing
Writing Mentor | ETS | Learning
Language Muse Activity Palette | ETS | Learning / testing
AuraLang | AuraLang | Learning
BetterAccentTutor | Better Accent | Learning
TriplePlayPlus | Syracuse Language Systems | Learning
Test of English Language Learning | Pearson | Testing
Intelligent Essay Assessor | Pearson | Testing
IntelliMetric | Vantage Learning | Testing
MyAccess! | Vantage Learning | Learning
Project Essay Grade | MI | Learning / testing
Summary table of identified commercially-available language learning and language testing tools. Gillings et al. 2018
Computers as ‘tutors or tools’?
Coding is the application of linguistic resource through a range of processes to generate meaning – often described as competences
Back to basics - Communication cycle
Back to the beginning
‘29,086 measures barley 37 months. Kushim’
A clay tablet with an administrative text from the city of Uruk, c.3400–3000 BC.
Probably the first individual in history whose name is known to us! (Y N Harari 2015)
Let’s go back
Partial scripts
Numerical partial script became the language of advancement
As societies developed, external codes were required to cope with the sociological demands of supporting larger collectives
Full scripts
Unto the era of computers…
Can computers think like humans?
H Simon and A Newell – Pittsburgh 1955. A thinking machine?
Can computers think like humans?
Alan Turing, 1948: 1st chess programme
How to overcome combinatorial explosion?
How to give intelligence to make good decisions?
Turing developed rules to guide.
The birth of Classical AI
A problem is defined, a set of programmed rules applied (heuristics)
Could plan complex […] controlled environments
Could deliver maximum efficiency and economy
But classical AI couldn’t engage with its environment
Our world is a little more… chaotic
Enter Machine learning
Systems’ ability to learn for themselves from raw data (training datasets)
Systems learn from first principles, from structure in the data, and seek potential solutions to problems
1960s: Bayesian methods introduced for probabilistic inference
1980s: backpropagation
1990s: shift from a knowledge-driven to a data-driven approach – analysis of data
>1990s: support vector machines and recurrent neural networks
2010>: ANNs and deep learning
Enter Machine learning
Machine learning: Algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions. The algorithm needs to be told how to make an accurate prediction
The Moravec Paradox
The things that our brains find difficult to cope with, that require a lot of conscious mental effort, like chess, were simple for AI.
The things that our brains find easy to cope with, that require little conscious mental effort, like making sense of what we see and hear, or movement, were very difficult for AI.
“We are prodigious Olympians in perceptual and motor areas… abstract thought though is a new trick… We’ve not yet mastered it” (Moravec 1988)
How does ML work? Enter Artificial Neural Networks
You recognised a dog instantaneously, by the firing of choral assemblies of neural networks
Neural networks consist of the following components:
Layers (input, hidden and output)
Weights and biases between each layer, W and b
An activation function for each hidden layer, σ
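The components named on this slide (layers, W and b, σ) can be sketched as a forward pass through a very small network. This is an illustration only: the sigmoid is assumed as σ, and the layer sizes and weight values are hand-picked and hypothetical, not taken from the talk.

```python
import math

def sigmoid(z):
    """Activation function σ: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, W, b):
    """One layer: weighted sum of the inputs plus a bias, then σ."""
    return [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(W, b)
    ]

def forward(x, layers):
    """Pass an input vector through every (W, b) layer in turn."""
    activation = x
    for W, b in layers:
        activation = layer_forward(activation, W, b)
    return activation

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
HIDDEN = ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1])
OUTPUT = ([[1.2, -0.7]], [0.05])

prediction = forward([1.0, 0.0], [HIDDEN, OUTPUT])
print(prediction)  # a single value between 0 and 1
```

Training would consist of adjusting W and b until the outputs match labelled examples; here the values are fixed so that only the flow of data is shown.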
Enter Artificial Neural Networks
Artificial Neural Networks
Questions asked of each sample: Is there a full stop? Is there a capital? Is it at start of para? Is there a subject? Is there an […]? Is there a noun? SVOCA? OC? VA?
Output for each sample: Sentence / Non sentence
Training data
Training data – each time we tell it what it’s looking at, it tweaks the connections to better recognise what it’s looking for. AI is now booming
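The idea of 'tweaking the connections' each time the system is told what it is looking at can be sketched with a single artificial neuron (a perceptron) that learns to separate sentences from non-sentences. This is a simplified stand-in rather than the network on the slide: it uses only two of the questions (full stop, capital), and the training samples are invented for illustration.

```python
def features(text):
    """Answer two of the questions as 0/1 inputs."""
    text = text.strip()
    return [
        1.0 if text.endswith(".") else 0.0,   # Is there a full stop?
        1.0 if text[:1].isupper() else 0.0,   # Is there a capital?
    ]

def predict(weights, bias, x):
    """Fire (1 = sentence) when the weighted evidence is positive."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Perceptron rule: nudge the connections only after a wrong answer."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = features(text)
            error = label - predict(weights, bias, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

# Invented training data: (sample, 1 = sentence / 0 = non-sentence).
TRAINING_DATA = [
    ("The dog barked.", 1),
    ("She reads every day.", 1),
    ("the dog barked", 0),
    ("Barked the", 0),
    ("every day.", 0),
]

weights, bias = train(TRAINING_DATA)
print(predict(weights, bias, features("Cats sleep a lot.")))
```

With features this crude the neuron only learns 'capital plus full stop'; a string such as 'Barked the dog.' would still be accepted, which is why real systems need far richer features and far more training data.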
AI ANN: taught, then develops
Runs tens of thousands of simulations every second and chooses the best one
Enter Deep Learning
Solve intelligence. Use it to make the world a better place. (Mission statement – DeepMind)
Demis Hassabis - CEO
Entering a process (e.g. playing a game) through a ‘learning algorithm’ that changes millions of connections in a neural network to reinforce or stop an action to improve the desired outcome (not a task-based algorithm)
Uses Representation Learning – automatically discovers the characteristics needed for feature detection or classification of raw data, which are then used to perform a task
Deep learning: ML requires input – DL can learn by itself through a learning algorithm. E.g. automatic light – ML accepts only ‘dark’, DL would learn ‘I can’t see’
Could a DL neural network system go beyond human understanding? AlphaGo played a completely unpredictable move – can come up with a new idea beyond the remit of human thought….
Let’s ‘Go’
In DL systems, the algorithm learns how to make accurate predictions through its own data processing (ML needs to be told).
AI Limitations
Can find patterns in, and learn from, data, but has no real understanding of what those patterns actually mean; there is no meaningful conceptual thinking.
tricked
With no real conceptual understanding of patterns, the hardest challenge of all is the ability that relies on exactly this: language (Prof Al Khalili)
human capacity
Recognise these?
Chatbot, NLP, NLU, ASR, NLG, AI, SDS, DMS
Coding is the application of linguistic resource through a range of processes to generate meaning – often described as competences
Communication cycle
AI in language - NLP
Automatic Speech Recognition (ASR)
Speech generation
Text recognition
Text generation (NLG) [Response driven]
When was Elvis born?
AI in language – Speech Recognition
Limited until advent
Learning techniques
Collect waveforms (phonetic input)
Fast Fourier Transform = spectrogram
Identifies resonances
Labels ‘formants’, recognising phonemes, words and phrases
Converts to text – ‘best fit hypothesis’
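The first steps above (waveform, Fourier transform, resonances) can be sketched on a synthetic signal. The sketch makes several assumptions: the 'recording' is just two pure tones at 700 Hz and 1200 Hz, the sample rate and frame length are arbitrary, and a naive discrete Fourier transform is used for readability (an FFT computes the same values faster). A spectrogram is a sequence of such spectra taken over successive frames.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N = 400              # frame length, giving 20 Hz per frequency bin

def synth_frame(freqs):
    """Stand-in for a recorded waveform: a sum of pure tones."""
    return [
        sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in freqs)
        for n in range(N)
    ]

def magnitude_spectrum(frame):
    """Naive discrete Fourier transform of one frame."""
    size = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size)))
        for k in range(size // 2)
    ]

def strongest_peaks(spectrum, count=2):
    """Return the frequencies (Hz) of the `count` largest local maxima."""
    peaks = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
    ]
    peaks.sort(key=lambda k: spectrum[k], reverse=True)
    bin_width = SAMPLE_RATE / N
    return sorted(k * bin_width for k in peaks[:count])

frame = synth_frame([700, 1200])
resonances = strongest_peaks(magnitude_spectrum(frame))
print(resonances)
```

Real speech is far messier than two clean tones, so labelling formants and mapping them to phonemes is a statistical judgement rather than a lookup.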
ASR Challenges – Who ate all the cake?
I think David ate all the delicious chocolate cake.
Tonic / Keywords / Onset – Volume / Pitch / Length / Pausing
A remarkable number of variables: an immense amount of comparative data has to be processed to arrive at the correct hypothesis as to meaning beyond denotation.
Yet any communication act is a combination of oral production and non-verbal cues, paralinguistics and contextual parameters.
AI in language – Speech Recognition
Formants – limited with 44 phonemes and syntactic training
If only it were that easy :-)
Requires a ‘Language model’
Automatic Speech Recognition (ASR)
INPUT: Speech signal (audio)
Decoding (language models, acoustic model, lexical data)
OUTPUT: Orthographic representation
Training data
Learns with more training data
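Why decoding needs a language model as well as an acoustic model can be sketched with a toy decoder that picks the 'best fit hypothesis'. All probabilities below are hypothetical numbers chosen for illustration; a real recogniser estimates them from training data.

```python
import math

# Hypothetical acoustic-model scores: P(audio | word sequence).
ACOUSTIC = {
    "recognise speech": 0.40,
    "wreck a nice beach": 0.45,
    "wreck an ice peach": 0.15,
}

# Hypothetical language-model scores: P(word sequence).
LANGUAGE = {
    "recognise speech": 0.020,
    "wreck a nice beach": 0.001,
    "wreck an ice peach": 0.0001,
}

def decode(acoustic, language, lm_weight=1.0):
    """Return the hypothesis with the highest combined log score."""
    def score(hyp):
        return math.log(acoustic[hyp]) + lm_weight * math.log(language[hyp])
    return max(acoustic, key=score)

best = decode(ACOUSTIC, LANGUAGE)                          # sound and likelihood of the wording
acoustic_only = decode(ACOUSTIC, LANGUAGE, lm_weight=0.0)  # sound alone

print(best)
print(acoustic_only)
```

With the language model switched off, the decoder prefers the hypothesis that merely sounds closest; with it switched on, the more plausible word sequence wins.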
Text recognition
Fails if can’t parse sentences – higher risk when rules based
Can process sentence meaning (denotative)
Tag sentence structure (syntactic)
Parse tree – tag words with likely part of speech
Phrase structure rules (e.g. parts of speech)
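A rules-based version of the steps on this slide (tag parts of speech, apply phrase structure rules, build a parse tree) can be sketched as follows. The lexicon and the three rules (S -> NP VP, NP -> Det N | N, VP -> V NP | V) are deliberately tiny and hypothetical, which is exactly why the parser fails as soon as it meets a word or structure it has no rule for.

```python
LEXICON = {
    "david": "N", "cake": "N", "dog": "N",
    "the": "Det", "a": "Det",
    "ate": "V", "barked": "V",
}

def tag(sentence):
    """Tag each word with its part of speech; None if a word is unknown."""
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON.get(w) for w in words]
    return None if None in tags else list(zip(words, tags))

def parse_np(tagged, i):
    """NP -> Det N | N. Returns (subtree, next index) or None."""
    if i < len(tagged) and tagged[i][1] == "Det":
        if i + 1 < len(tagged) and tagged[i + 1][1] == "N":
            return ("NP", tagged[i], tagged[i + 1]), i + 2
        return None
    if i < len(tagged) and tagged[i][1] == "N":
        return ("NP", tagged[i]), i + 1
    return None

def parse_vp(tagged, i):
    """VP -> V NP | V. Returns (subtree, next index) or None."""
    if i < len(tagged) and tagged[i][1] == "V":
        obj = parse_np(tagged, i + 1)
        if obj:
            return ("VP", tagged[i], obj[0]), obj[1]
        return ("VP", tagged[i]), i + 1
    return None

def parse(sentence):
    """S -> NP VP. Returns a parse tree, or None when parsing fails."""
    tagged = tag(sentence)
    if not tagged:
        return None
    np = parse_np(tagged, 0)
    if not np:
        return None
    vp = parse_vp(tagged, np[1])
    if not vp or vp[1] != len(tagged):
        return None
    return ("S", np[0], vp[0])

print(parse("David ate the cake."))
print(parse("David ate the delicious cake."))  # unknown word, so no parse
```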
AI in language - NLP
Automatic Speech Recognition (ASR)
Speech generation
Text recognition
Text generation (NLG) [Response driven]
When was Elvis born?
Natural Language Generation (text)
Fails if can’t access relevant semantic meaning – higher risk when rules based
Produces sentence ‘parsed text’ related to meaning (denotative)
Knowledge Graph generated (Google 70b+ facts end 2016)
Exploits web of semantic information (entities linked through meaningful relationships)
Codifying of language applied
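Response-driven text generation from a knowledge graph can be sketched with the running example 'When was Elvis born?'. The graph here is a two-fact dictionary and the question patterns and answer templates are hypothetical; the sketch only shows how an entity, a relation and a template combine, and how generation fails when no relevant fact can be found.

```python
import re

# Tiny stand-in for a knowledge graph: (entity, relation) -> value.
KNOWLEDGE_GRAPH = {
    ("elvis presley", "born_on"): "8 January 1935",
    ("elvis presley", "born_in"): "Tupelo, Mississippi",
}

# Short mentions mapped to graph entities (hypothetical alias table).
ALIASES = {"elvis": "elvis presley"}

# Question pattern -> relation to look up -> answer template.
PATTERNS = [
    (re.compile(r"when was (.+) born\?", re.I), "born_on",
     "{name} was born on {value}."),
    (re.compile(r"where was (.+) born\?", re.I), "born_in",
     "{name} was born in {value}."),
]

def answer(question):
    """Generate a sentence from the graph, or None if no fact matches."""
    for pattern, relation, template in PATTERNS:
        match = pattern.fullmatch(question.strip())
        if not match:
            continue
        mention = match.group(1).lower()
        entity = ALIASES.get(mention, mention)
        value = KNOWLEDGE_GRAPH.get((entity, relation))
        if value is None:
            return None  # no relevant fact: generation fails
        return template.format(name=entity.title(), value=value)
    return None  # question form not covered by any rule

print(answer("When was Elvis born?"))
print(answer("Why was Elvis famous?"))
```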
Speech synthesis
elements
representation manipulable
from input (training) data
Putting the pieces together
So a computer can…
But can it have a meaningful conversation?
Spoken Dialogue Systems (SDS)
SDS use both speech and NLP technologies to enable extended human-machine conversation.
Determine appropriate system response
Commercially driven to achieve success in constrained conversation to achieve a specific scenario’s goal (Litman et al, 2016).
Limited application in assessing interactive language.
SDS – Dialogue Management system
System ask: ‘Do you live in a town or the countryside?’
NLU – live – Town → System ask: ‘Which town do you live in?’
NLU – live – Countryside → System ask: ‘How far is the nearest town?’
NLU – live – no match → System say: (not_understood)
‘Finite state machine’ – predictable path of interaction – not spontaneous
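The dialogue on this slide can be sketched as a finite state machine: each recognised answer leads to exactly one next prompt, so the path of the interaction is predictable. The keyword matching below is a hypothetical, very crude stand-in for the NLU component.

```python
OPENING = "Do you live in a town or the countryside?"

def understand(utterance):
    """Toy NLU: map the learner's reply to a value for the 'live' slot."""
    text = utterance.lower()
    if "town" in text or "city" in text:
        return "town"
    if "country" in text:
        return "countryside"
    return None  # nothing recognised

# One fixed transition per recognised state.
TRANSITIONS = {
    "town": "Which town do you live in?",
    "countryside": "How far is the nearest town?",
    None: "(not_understood)",
}

def next_prompt(utterance):
    """Return the system's next move for a reply to the opening question."""
    return TRANSITIONS[understand(utterance)]

print(OPENING)
print(next_prompt("I live in a small town"))
print(next_prompt("I like pizza"))
```

Anything outside the anticipated answers falls through to (not_understood), which illustrates why such systems cope poorly with spontaneous interaction.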
Summary - AI in language - NLP
Automatic Speech Recognition (ASR)
Speech generation
Text recognition
Text generation (NLG) [Response driven]
When was Elvis born?
Can AI simulate human interaction?
Chatbots – several programmes simultaneously analyse the output, generate a wide range of hypothetical responses and choose the one most likely to prolong the dialogic exchange:
Alana the bot (Prof Oliver Lemon, Heriot-Watt University)
Can AI simulate human interaction?
In communication there is a lot more going on than just words. Whilst AI can recognise complex patterns it cannot understand concepts. AI still very limited in terms of:
Let’s have a chat to a bot
Eliza the psychotherapist: https://www.masswerk.at/eliza/
Mitsuku (60b+ messages processed): https://www.pandorabots.com/mitsuku/
Chatbot - task
In pairs, sharing a device, have a chat with Mitsuku (or alternative conversational chatbot)
1: Try a simple interaction
2: Try a more demanding dialogue
What went right? Why do you think it worked?
What did not work so well? What do you think was the cause of the communication breakdown?
AI and language assessment (Productive skills)
Machine scores automatically generated
Utilise set criteria and dependent variables (e.g. repeat accuracy, length of production, fluency, vocabulary, grammar and pronunciation)
Compared to reference scores (manually set)
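How a machine score might be built from such variables and compared with a manually set reference can be sketched as a weighted sum. Everything numeric here is hypothetical: the weights, the feature values (assumed to be scaled to 0-1) and the reference score are invented, and operational systems estimate their weights from responses already scored by human raters.

```python
# Hypothetical weights for each measured variable (they sum to 1).
WEIGHTS = {
    "repeat_accuracy": 0.30,
    "fluency": 0.25,
    "vocabulary": 0.15,
    "grammar": 0.15,
    "pronunciation": 0.15,
}

def machine_score(features):
    """Weighted sum of feature values, each already scaled to 0-1."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)

# Invented measurements for one spoken response.
response = {
    "repeat_accuracy": 0.9,
    "fluency": 0.7,
    "vocabulary": 0.6,
    "grammar": 0.8,
    "pronunciation": 0.5,
}
reference_score = 0.75   # manually set reference (hypothetical)

score = machine_score(response)
difference = score - reference_score
print(round(score, 2), round(difference, 2))
```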
ASR - automated and human correlation
spontaneous (Cucchiarini et al, 2010)
pronunciation’ (Müller et al, 2009)
AI scores – case studies
 | Pearson PTE | ETS – TOEFL iBT
System | Versant | SpeechRater
Task example | Read aloud; Repeat sentence; Short answer | Opinion on familiar topic; Speak based on reading (total 6 tasks)
Scoring includes | Pronunciation; Fluency; Vocabulary; Sentence mastery | Pronunciation; Fluency; Grammatical facility; Topical coherence; Idea progression (multiple regression scoring)
Correlation | 0.84 - 0.92 | 0.73
Construct | Psycholinguistic (Van Moere, 2012) | Direct and immediate interaction (Butler et al, 2000)
Predictive | Ability to use core language in real time / use lexis to build phrases and clauses and articulate | Contextualised and limited restriction – account for content, coherence and interactive (but task monologic)
Adapted from Litman et al, 2018
Only practice tests subject to further research
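Correlation figures of this kind summarise how closely machine scores follow human scores across a set of responses. A minimal sketch of the Pearson correlation coefficient is given below; the two score lists are invented for illustration and are not data from either test.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented scores for six responses on a 1-5 scale.
human = [2, 3, 3, 4, 5, 5]
machine = [2.4, 2.9, 3.6, 3.8, 4.4, 5.1]

r = pearson(human, machine)
print(round(r, 2))
```

A value near 1 means the machine ranks candidates much as the human raters do; it says nothing about whether both are measuring the intended construct.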
Automatic Speech Recognition (ASR) in assessment
accuracy
accuracy
reference acoustic model)
reading aloud or short free responses
interactional dialogue
Dialogue Management system – Finite state
At utterance level ‘States’ created (for example) Syntactic analysis
Semantic analysis
expected answer
Pragmatics
coherence Acoustic input
features
Alternative: ‘state tracking’ – the DMS gives a probability for each path
Fewer ASR errors, and ambiguities can be resolved later in the dialogue
Potential application of holistic scales including CEFR. (Shashidhar et al, 2015)
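The state-tracking alternative can be sketched as a running probability over the possible dialogue states, updated after each noisy recognition result. This is a simplified Bayesian update under stated assumptions: there are only two states, and `confidence` is a hypothetical figure for how often the recogniser reports the true state.

```python
def update_belief(belief, observed, confidence):
    """Bayesian update of P(state) after a noisy ASR/NLU observation.

    `confidence` is the assumed probability that the recogniser reports
    the true state; the remainder is spread evenly over the other states.
    """
    others = len(belief) - 1
    posterior = {}
    for state, prior in belief.items():
        if state == observed:
            likelihood = confidence
        else:
            likelihood = (1 - confidence) / others
        posterior[state] = prior * likelihood
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}

belief = {"town": 0.5, "countryside": 0.5}
belief = update_belief(belief, "town", confidence=0.6)         # uncertain first turn
belief = update_belief(belief, "countryside", confidence=0.9)  # clearer later turn
print(belief)
```

Because the system keeps every path alive with a probability instead of committing to the first recognition result, a later, clearer turn can overturn an earlier misrecognition.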
Spoken Dialogue Systems (SDS) in assessment (Finite state)
errors
misrecognised utterances
conversation
transactional dialogues
dialogue used
SDSs – designed to process speech from proficient users
right or wrong (as required from tutorial dialogue system technology)
language experts
learning)
The assessor will need to consider the extent to which their construct can accommodate [SDS’s] deviation from authentic dialogue (Litman et al, 2018)
Spoken Dialogue Systems (SDS)
Opportunities for spontaneous yet non-conversational speech, within constrained domain
State tracking SDSs – overcome ASR difficulties during dialogue
Bachman & Palmer (2010): communicative competence model
Communicative Competence
Linguistic competence
Socio-linguistic competence
Discourse competence
Strategic competence
Case study - Communicative competence
Conversational features in co-constructed dialogue
Communicative competence - features
Communicative competence
1: In groups of 3 or 4 you will be allocated one of the four competences.
assess these elements?
each element / the overall competence?
2: Cross-group into groups of 4, with one person covering each competence.
competences
Development
time
Speaking constructs assessed
AI and automated language assessment
2018 2010 Linguistic competence
? ? ?
Fully automated language assessment
Initial criticism as only narrow constructs could be assessed
Construct or assessment engine?
Choose the construct to assess, audit available technologies, and compensate for shortfalls with human intervention
Select available technologies, align them to the construct they can cover, and decide if compensation is necessary
To what extent can contained measures be used as indicators or predictors of overarching language proficiency?
Construct – considerations
There is a rethinking of what speaking constructs could be….
There is a long road ahead…
Role of individual agency – impact on identity in test taking experience
‘I deserve to engage with a human’
AI in language assessment - identity
To conclude - some predictions
high-tech IT companies
human
with latency, paucity of NVs or paralinguistics, digital interface engagement, mediating NLP shortcomings etc.)
machines score – but not actual interaction with the machine (to
differentiated CEFR levels for training datasets
course delivery
To conclude - some predictions
And in the long term….
adaptive assessment on an ongoing and formative basis – mediated through individual devices…
Alex Thorp Lead Academic, Language (Europe) alex.thorp@trinitycollege.com