Verbal grammars for weather bulletins in isiXhosa and isiZulu: Generation and similarity
Zola Mahlaza (zmahlaza@cs.uct.ac.za)
Department of Computer Science, University of Cape Town
September, SAICSIT ’17
Supervisor: Dr. C. Maria Keet

Outline
◮ Field of study: brief summary.
◮ Identified problem.
◮ Current solution.
◮ Proposed improved solution.
◮ Research questions.
◮ Methodology.
◮ Results.
◮ Conclusion and final remarks.

Background
◮ Natural language processing.
◮ Natural language understanding.
◮ Natural language generation.
  ◮ Natural language texts from structured representations of data, information, or knowledge.
Figure: An example of input and output of an NLG system (Source: Arria NLG plc n.d.)

Background
◮ Met Office (S. Sripada et al. 2014).
  ◮ Online trial ended 17 May 2016.
  ◮ Five-day weather forecast for 10,000 locations worldwide in under 2 minutes.
  ◮ Different climates and time zone changes.
  ◮ Based on the Arria NLG engine.
◮ Swiss Federal Institute for Snow and Avalanche Research (Winkler, Kuhn, and Volk 2014).
  ◮ Avalanche warnings.
  ◮ German, French, Italian, and English.
  ◮ Catalogue-based system.

Background
Table: List of NLG systems that have been developed to produce weather forecasts
System name | Establishing literature | Realisation method | Languages | Year
WMO-based and NATURAL | Gkatzia, Lemon, and Rieser 2016 | SimpleNLG | English | 2016
CBR-METEO | Adeyanju 2015 | String manipulation | English | 2015
Winkler-Kuhn-Volk’s system | Winkler, Kuhn, and Volk 2014 | Catalogued phrases | German, French, Italian, English | 2014
Zhang-Wu-Gao-Zhao-Lv’s system | H. Zhang et al. 2011 | Not implemented | Chinese | 2011
pCRU | Belz 2008 | Statistical methods | Possibly all | 2007
SumTime-Mousam | S. G. Sripada et al. 2002 | “Grammar” | English | 2003
SumTime | S. G. Sripada et al. 2002 | “Grammar” | English | 2001
Mitkov’s system | Mitkov 1991 (as cited by Sigurd et al. 1992) | - | - | 2001
Autotext | - | - | - | 2000
MLWFA | Yao, D. Zhang, and Wang 2000 | Grammar | English, German, Chinese | 2000
Siren | - | - | - | 2000
Scribe | - | - | - | 1999
TREND | Boyd 1998 | FUF/SURGE | English | 1998
Multimeteo | - | - | - | 1998
ICWF | Ruth and Peroutka 1993 | Grammar | English | 1993
IGEN | Rubinoff 1992 | Grammar | English | 1992
Kerpedjiev’s system | Kerpedjiev 1992 | Grammar | English | 1992
Weathra | Sigurd et al. 1992 | Grammar | English, Swedish | 1992
FoG | Bourbeau et al. 1990 | MTT Models | English, French | 1990
MARWORDS | Goldberg, Kittredge, and Polguère 1988 | Grammar | English, French | 1988
RAREAS | Kittredge, Polguère, and Goldberg 1986 | - | English, French | 1986
Glahn’s system | Glahn 1970 | Templates | English | 1970

Problem
In our examination of the current state and use of Nguni languages, we have observed that there is no fast and large-scale producer, automated or otherwise, of weather summaries in said languages.

Current reporting
◮ SABC TV station (SABC 1) daily report.
  ◮ IsiZulu/isiXhosa report at 19h00 South African Standard Time (SAST).
  ◮ IsiNdebele/siSwati report at 17h30 SAST.
◮ Nguni language radio stations (e.g. Umhlobo Wenene, http://www.umhlobowenenefm.co.za/; Ukhozi, http://www.ukhozifm.co.za/).
Figure: SABC weather report (Source: SABCNewsOnline)

Possible solution and challenges
◮ Four NLG systems.
◮ Languages are “verby” (Nurse 2008).
◮ Agglutinating morphology + concordial agreement system.
◮ zizakuhamba (they will walk/leave) → [zi][za][ku]hamb[a].
Figure: Bantu verb structure (Source: Keet and Khumalo 2016).

Possible solution and challenges
◮ Templates are incompatible (Keet and Khumalo 2014; Keet and Khumalo 2017).
◮ Grammars are a solution for realisation.
◮ Nguni languages S40: isiXhosa S41, isiZulu S42, siSwati S43, and isiNdebele S44 (Maho 1999).
Figure: Example of a database table with South African domestic bus schedules (Adapted from Gyawali 2016, p.20).
Template: The bus [bus number] departing from [origin] reaches [destination] in [duration].
Figure: Example of template for describing the bus schedules (Source: Gyawali 2016, p.20).
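The bus-schedule template above can be filled mechanically. The following is a minimal sketch of template-based realisation, not code from the presentation; the record values and the names TEMPLATE and realise are hypothetical.

```python
# Template-based realisation: fixed text with slots filled from a data record.
# The slot names mirror the slide's template; the record is invented for illustration.
TEMPLATE = "The bus {bus_number} departing from {origin} reaches {destination} in {duration}."


def realise(record):
    """Fill every slot of the template from one database row."""
    return TEMPLATE.format(**record)


# Hypothetical row of a bus schedule table.
record = {
    "bus_number": "101",
    "origin": "Cape Town",
    "destination": "Durban",
    "duration": "22 hours",
}
sentence = realise(record)
print(sentence)
```

This works for English because the slot fillers do not change the words around them. With agglutinating morphology and concordial agreement, the verb itself must change with its subject and object, which is why the slides turn to grammars for realisation.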

Research questions
◮ How grammatically similar are isiZulu verbs to their isiXhosa counterparts?
◮ Can a single merged set of grammar rules be used to produce correct verbs for both languages?

Methodology
◮ A corpus to determine the output text requirements (Dale and Reiter 2000).
  ◮ The weather corpus will be collected from the South African Weather Service (SAWS).
  ◮ Translated into isiXhosa by members of the School of African Languages and Literature at UCT.
◮ Incrementally develop grammar rules for isiZulu and isiXhosa through a literature-intensive approach.
◮ The evaluation of the quality of the rules will use an expertise-oriented approach (Rovai 2003, p.117; Ross 2010, p.483).
◮ IsiXhosa and isiZulu compared through verb rule parse trees and ‘language’ space using binary similarity measures.

Corpus development
Directed to Western Cape regional office
◮ South African Weather Service (SAWS): No records.
After further queries to Tshwane office
◮ SAWS: Forecast for first day of each month in 2015 (Jan 2015 - Dec 2015).

Corpus development
◮ Data cleaning (“The expected UVB sunburn index”).
◮ Randomly sampled 48 sentences for translation from English to isiXhosa.
◮ School of African Languages & Literature at UCT.
“Lipholile kumkhwezo wonxweme apho kulindeleke izibhaxu zenkungu yakusasa ngaphaya kokoliyakuthi gqabagqaba ngamafu kwaye libeshushu okanye litshise kwaye libeneziphango ezithe saa emantla”
◮ 53 verbs, only 27 unique. ‘Verb’ means string, not verb root.
◮ 22 indicative, 2 participial, 3 subjunctive.
◮ Near past, present, and near future.
◮ Simple, exclusive, and progressive.

CFG Development
◮ Increment 0: Prefix
  ◮ Gathering preliminary rules.
  ◮ Verb generation, correctness classification, and elimination of incorrect verbs.
◮ Increment 1: Prefix + Object Concord + Verb Root + Suffix - Final Vowel
  ◮ Suffix addition, verb generation and correctness classification.
  ◮ Elimination of incorrect verbs, verb generation and correctness classification.
◮ Increment 2: Complete verbs
  ◮ Investigate missing features, add missing features (where necessary), add final vowel, correctness classification.
  ◮ Elimination of incorrect verbs, verb generation and correctness classification.

CFG Development
Indicative and Participial (isiXhosa)
◮ Verb → NPC_2 A_pes OC VR S_p
◮ Verb → NPC_0 A_pes OC VR S_np
Figure: Context-free grammar rules that generate isiXhosa past tense indicative and participial verbs.
Indicative and Participial (isiZulu)
◮ Verb → NPC_0 A_pes OC VR S_np
◮ Verb → NPC_2 A_pes OC VR S_p
Figure: Context-free grammar rules that generate isiZulu past tense indicative and participial verbs.
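A minimal sketch of how a context-free rule of this kind can be expanded exhaustively into verb strings. The grammar below is a toy and not the authors' rule set: the nonterminal names SC, FUT, VR and FV and their terminals are assumptions for illustration, built from the zizakuhamba segmentation and the -zol- root mentioned in the slides. The slides report using NLTK for generation; plain Python is used here only to keep the sketch self-contained.

```python
from itertools import product

# Toy grammar (hypothetical): each nonterminal maps to a list of productions,
# and each production is a list of symbols. Symbols not in the dict are terminals.
GRAMMAR = {
    "Verb": [["SC", "FUT", "VR", "FV"]],
    "SC": [["zi"]],           # assumed subject-concord slot
    "FUT": [["za", "ku"]],    # assumed future-tense slot
    "VR": [["hamb"], ["zol"]],
    "FV": [["a"]],            # final vowel
}


def expand(symbol):
    """Return every string derivable from `symbol`, morphemes concatenated."""
    if symbol not in GRAMMAR:
        return [symbol]
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then take all combinations.
        parts = [expand(s) for s in production]
        results.extend("".join(combo) for combo in product(*parts))
    return results


verbs = expand("Verb")
print(verbs)
```

As in the increments described in the slides, strings produced this way are only candidates: the sketch applies no phonological conditioning and makes no claim that every generated string is a correct verb.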

CFG IsiXhosa Quality
Table: Number of correct and incorrect words generated using the third increment isiXhosa grammar (indicative and participial mood). Correctness is divided into semantic and syntactic categories.
Tense | Category | Percentage correct | Correct | Incorrect | Total
Past | Syntax | 97.4% | 38 | 1 | 39
Past | Semantics | 51.3% | 20 | 19 | 39
Present | Syntax | 80.0% | 28 | 7 | 35
Present | Semantics | 45.7% | 16 | 19 | 35
Future | Syntax | 98.6% | 72 | 1 | 73
Future | Semantics | 53.4% | 39 | 34 | 73

CFG IsiZulu Quality
Table: Number of correct and incorrect words generated using the third increment isiZulu grammar (indicative and participial mood). Correctness is divided into semantic and syntactic categories.
Tense | Category | Percentage correct | Correct | Incorrect | Total
Past | Syntax | 97.2% | 35 | 1 | 36
Past | Semantics | 47.2% | 17 | 19 | 36
Present | Syntax | 88.9% | 16 | 2 | 18
Present | Semantics | 55.6% | 10 | 8 | 18
Future | Syntax | 98.6% | 72 | 1 | 73
Future | Semantics | 53.4% | 39 | 34 | 73

CFG Linguist Evaluation
◮ 2 linguists (UCT & UKZN).
◮ 25 isiZulu and isiXhosa verbs from an English-isiZulu dictionary (Doke et al. 1990).
◮ For the -zol- root, 5 pairs of subject and object concords are randomly selected.
◮ Generated 49400 strings using the Natural Language Toolkit (NLTK), and sampled 100.
◮ Packaged 99 in a spreadsheet, and sent to linguists.
◮ Strings are not subjected to phonological conditioning.
◮ True/False for syntactic correctness, True/False for semantic correctness, and an optional comment.

CFG Linguist Evaluation
Table: Summary of the linguists’ semantic and syntactic correctness evaluation of the isiXhosa and isiZulu generated strings.
Language | Category | Percentage correct | Correct | Incorrect | Total
IsiXhosa | Syntax | 52% | 51 | 48 | 99
IsiXhosa | Semantics | 58% | 57 | 42 | 99
isiZulu | Syntax | 23% | 16 | 57 | 71
isiZulu | Semantics | 25% | 17 | 52 | 69

CFG Linguist Evaluation
◮ Statistically significant association between syntactic correctness and language (two-tailed p = 0.0001, Fisher’s exact test).
◮ The same is true for semantic correctness and language (two-tailed p = 0.0023, Fisher’s exact test).
◮ Verb phrases without semantic correctness annotation.
◮ Updated values show a strong statistically significant association between syntactic correctness and language (two-tailed p < 0.0001, Fisher’s exact test).
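A minimal sketch of a two-tailed Fisher's exact test on a 2×2 table of language by correctness, written in plain Python. The counts are the syntax correct/incorrect cells as printed in the linguists' summary table; the function name is hypothetical, and the result is not guaranteed to match the p-values quoted on the slides, which may rest on different or updated counts.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Rows: isiXhosa, isiZulu; columns: syntactically correct, incorrect.
p_syntax = fisher_exact_two_tailed(51, 48, 16, 57)
print(f"two-tailed p = {p_syntax:.6f}")
```

A small p-value here says only that correctness rates differ between the two languages' generated strings; it does not say why they differ.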

Similarity Questions and Methods
Asking
◮ How grammatically similar are isiZulu verbs to their isiXhosa counterparts?
◮ Can a single merged set of grammar rules be used to produce correct verbs for both languages?
Answer by
◮ Manual scanning
◮ Parse tree analysis
◮ Binary similarity measures
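A minimal sketch of one binary similarity measure applied to grammar rules. The slides do not name which measures were used; the Jaccard coefficient is assumed here as a common example. The rule sets are the two past tense rules shown per language in the CFG Development figure, with subscripts written as underscores, and the function name is hypothetical.

```python
def jaccard(rules_a, rules_b):
    """Jaccard coefficient: shared rules divided by all distinct rules."""
    a, b = set(rules_a), set(rules_b)
    if not a and not b:
        return 1.0  # two empty rule sets are treated as identical
    return len(a & b) / len(a | b)


# Past tense indicative and participial rules as listed per language.
XHOSA_RULES = {
    "Verb -> NPC_2 A_pes OC VR S_p",
    "Verb -> NPC_0 A_pes OC VR S_np",
}
ZULU_RULES = {
    "Verb -> NPC_0 A_pes OC VR S_np",
    "Verb -> NPC_2 A_pes OC VR S_p",
}

similarity = jaccard(XHOSA_RULES, ZULU_RULES)
print(similarity)
```

Treating each rule as present or absent in a language is what makes the measure binary. The value computed here covers only the two rules shown on the slide, not the full grammars.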
