Grammatical inference: an introduction (Colin de la Higuera, PowerPoint presentation transcript)


  1. Grammatical inference: an introduction. Colin de la Higuera, University of Nantes

  2. Nantes @wikipedia 2 Colin de la Higuera, Nantes 2013

  3. Acknowledgements: Pieter Adriaans, Hasan Ibne Akram, Anne-Muriel Arigon, Leo Becerra-Bonache, Cristina Bibire, Alex Clark, Rafael Carrasco, Paco Casacuberta, Pierre Dupont, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jeffrey Heinz, Jean-Christophe Janodet, Satoshi Kobayachi, Laurent Miclet, Thierry Murgue, Tim Oates, Jose Oncina, Frédéric Tantini, Franck Thollard, Sicco Verwer, Enrique Vidal, Menno van Zaanen, ... http://pagesperso.lina.univ-nantes.fr/~cdlh/ http://videolectures.net/colin_de_la_higuera/

  4. Practical information
     - Grammatical Inference is module X9IT050
     - 18 hours
     - http://pagesperso.lina.univ-nantes.fr/~cdlh/X9IT050.html
     - Exam: to be decided

  5. Some useful links
     - Grammatical Inference Software: the Repository, https://logiciels.lina.univ-nantes.fr/redmine/projects/gisr/wiki
     - Talks on http://videolectures.net
     - A book
     - Articles
     - Start here: http://pagesperso.lina.univ-nantes.fr/~cdlh/X9IT050.html

  6. What I plan to talk about
     1. 11/9/2013: An introduction to grammatical inference. About what learning a language means, how we can measure success
     2. 18/9/2013: An introduction to grammatical inference. A motivating example
     3. 25/9/2013: Learning: identifying or approximating?
     4. 2/10/2013: Learning from text
     5. 9/10/2013: Learning from text: the window languages
     6. 16/10/2013: Learning from an informant: the RPNI algorithm and variants
     7. 23/10/2013: Learning distributions: why? How should we measure success? About distances between distributions
     8. 6/11/2013: Learning distributions: learning the weights given a structure. EM, Gibbs sampling and the spectral methods
     9. 13/11/2013: Learning distributions: state merging techniques
     10. 20/11/2013: Active learning 1: about active learning
     11. 27/11/2013: Active learning 2: the MAT algorithm
     12. 4/12/2013: Learning transducers
     13. 11/12/2013: Learning probabilistic transducers
     14. 18/12/2013: Exam

  7. Outline (of this first talk)
     1. What is grammatical inference about?
     2. Why is it a difficult task?
     3. Why is it a useful task?
     4. Validation issues
     5. Some criteria

  8. 1. Grammatical inference is about learning a grammar given information about a language
     - The information consists of strings, trees or graphs
     - The information can (typically) be presented as:
       - Text: only positive examples
       - Informant: labelled examples
       - Actively sought data (query learning, teaching)
     The lists above are not exhaustive
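To make the two classic presentation modes concrete, here is a minimal Python sketch; the target language and both samples are invented for illustration:

```python
def in_target(w):
    # Hypothetical target language: strings over {a, b}
    # containing an even number of a's.
    return w.count("a") % 2 == 0

# Text presentation: positive examples only.
text_sample = ["", "b", "aa", "aba", "bbaab"]

# Informant presentation: labelled examples (string, in the language?).
informant_sample = [("", True), ("a", False), ("aa", True),
                    ("ab", False), ("bb", True)]

assert all(in_target(w) for w in text_sample)
assert all(in_target(w) == label for w, label in informant_sample)
```

Learning from text is the harder setting: no example ever tells the learner which strings are outside the language.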

  9. The functions/goals
     - Languages and grammars from the Chomsky hierarchy
     - Probabilistic automata and context-free grammars
     - Hidden Markov Models
     - Patterns
     - Transducers

  10. The Chomsky hierarchy, each class strictly containing the next: recursively enumerable languages ⊃ context-sensitive languages ⊃ context-free languages ⊃ regular languages

  11. The Chomsky hierarchy revisited
     - Regular languages: recognized by DFA and NFA, generated by regular grammars, described by regular expressions
     - Context-free languages: generated by CF grammars, recognized by stack (pushdown) automata
     - Context-sensitive languages: generated by CS grammars (parsing is not in P)
     - Recursively enumerable languages: recognized by Turing machines (parsing is undecidable)
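As a concrete instance of the lowest level, a DFA is just a transition table. The sketch below is invented for illustration (not taken from the slides); it recognizes the regular language of strings over {a, b} that end in "ab":

```python
# Illustrative DFA: states q0 = start, qa = just read an 'a',
# qab = just read "ab" (the only accepting state).
DFA = {
    "start": "q0",
    "accept": {"qab"},
    "delta": {("q0", "a"): "qa", ("q0", "b"): "q0",
              ("qa", "a"): "qa", ("qa", "b"): "qab",
              ("qab", "a"): "qa", ("qab", "b"): "q0"},
}

def accepts(dfa, w):
    """Run the DFA on w and report whether it halts in an accepting state."""
    state = dfa["start"]
    for symbol in w:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]

assert accepts(DFA, "aab")
assert not accepts(DFA, "abb")
```

State-merging algorithms such as RPNI (see the schedule) output exactly this kind of object.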

  12. Other formalisms
     - Topological formalisms
     - Semilinear languages
     - Hyperplanes
     - Balls of strings

  13. Distributions of strings: a probabilistic automaton defines a distribution over the strings
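A one-state example makes this concrete. The automaton below is invented for illustration: at each step it stops with probability 0.5, or emits 'a' (probability 0.3) or 'b' (probability 0.2) and continues, so P(w) = 0.3^#a(w) · 0.2^#b(w) · 0.5, and these values sum to 1 over all of {a, b}*:

```python
from itertools import product

def prob(w, p_a=0.3, p_b=0.2, p_stop=0.5):
    """Probability of string w under the one-state automaton."""
    p = 1.0
    for c in w:
        p *= p_a if c == "a" else p_b
    return p * p_stop

# Summed over all strings of length < 12 the mass is already close to 1,
# which is what "defines a distribution over the strings" means.
total = sum(prob("".join(t))
            for n in range(12)
            for t in product("ab", repeat=n))
assert abs(total - 1.0) < 0.02
```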

  14. Fuzzy automata
     - A fuzzy automaton says that string w belongs to the language with probability p
     - The difference with probabilistic automata is that:
       - the total sum of the probabilities may be different from 1 (it may even be infinite)
       - a fuzzy automaton cannot be used as a generator of strings
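The following sketch, an invented example, shows a membership function that no probabilistic automaton could realise: every string over {a, b} gets degree 0.9^|w|, and the sum over all strings diverges, so there is no distribution behind it to sample from:

```python
def degree(w):
    """Fuzzy membership degree of w: 0.9 to the power of its length."""
    return 0.9 ** len(w)

# There are 2^n strings of length n, each with degree 0.9^n, so the
# would-be total mass is sum_n (2 * 0.9)^n, a divergent series.
partial = [sum((2 * 0.9) ** n for n in range(k)) for k in (5, 10, 15)]
assert partial[0] < partial[1] < partial[2]   # the "total" keeps growing
assert partial[0] > 1.0                       # already past any valid total
```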

  15. The data: examples of strings. A string in Gaelic and its translation to English:
     - Tha thu cho duaichnidh ri èarr àirde de a’ coisich deas damh
     - You are as ugly as the north end of a southward traveling ox

  16. Time series pose the problem of the alphabet:
     - An infinite alphabet?
     - Discretizing?
     - An ordered alphabet?
     http://www.flickr.com/photos/popfossa/3992549630/
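One common answer is to discretize the real values into a small ordered alphabet, in the spirit of SAX-style symbolic representations; the thresholds below are invented for illustration:

```python
def discretize(series, thresholds=(-0.5, 0.5), alphabet="abc"):
    """Map each value to a symbol: 'a' below -0.5, 'b' in [-0.5, 0.5), 'c' above."""
    return "".join(alphabet[sum(x >= t for t in thresholds)] for x in series)

# A numeric time series becomes a string over a finite, ordered alphabet,
# ready for the string-learning algorithms above.
assert discretize([-1.0, 0.0, 1.0, 0.2, -0.7]) == "abcba"
```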

  17. Giorgio Bernardi, Regina Goursot, Edda Rayko, René Goursot, Baya Cherif-Zahar, and Roberta Melis, http://www.scopenvironment.org/downloadpubs/scope44/chapter05.html

  18. >A BAC=41M14 LIBRARY=CITB_978_SKB
      AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA
      GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT
      GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA
      GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA
      CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT
      GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
      GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA
      GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA
      CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA
      CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA
      TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA
      TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA
      AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA
      TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG
      TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC
      TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA
      GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT
      GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA
      AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC
      TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC
      TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA
      TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT

  19. http://bandelestudio.com/tutoriel-mao-sur-la-creation-musicale/

  20. http://fr.wikipedia.org/wiki/Philippe_VI_de_France

  21. (image-only slide)

  22. <book>
        <part>
          <chapter>
            <sect1/>
            <sect1>
              <orderedlist numeration="arabic">
                <listitem/>
                <f:fragbody/>
              </orderedlist>
            </sect1>
          </chapter>
        </part>
      </book>

  23. <?xml version="1.0"?>
      <?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
      <?cocoon-process type="xslt"?>
      <!DOCTYPE pagina [
        <!ELEMENT pagina (titulus?, poema)>
        <!ELEMENT titulus (#PCDATA)>
        <!ELEMENT auctor (praenomen, cognomen, nomen)>
        <!ELEMENT praenomen (#PCDATA)>
        <!ELEMENT nomen (#PCDATA)>
        <!ELEMENT cognomen (#PCDATA)>
        <!ELEMENT poema (versus+)>
        <!ELEMENT versus (#PCDATA)>
      ]>
      <pagina>
        <titulus>Catullus II</titulus>
        <auctor>
          <praenomen>Gaius</praenomen>
          <nomen>Valerius</nomen>
          <cognomen>Catullus</cognomen>
        </auctor>
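XML documents like these are natural input for grammatical inference over trees: the text content is dropped and only the tag skeleton is kept. A minimal sketch using the standard library (the document below is a made-up fragment in the spirit of the slide):

```python
import xml.etree.ElementTree as ET

def skeleton(elem):
    """Serialize the tag-tree of an element, ignoring text and attributes."""
    return elem.tag + "(" + "".join(skeleton(c) for c in elem) + ")"

doc = ET.fromstring(
    "<pagina><titulus>Catullus II</titulus>"
    "<poema><versus>...</versus><versus>...</versus></poema></pagina>")

assert skeleton(doc) == "pagina(titulus()poema(versus()versus()))"
```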

  24. (image-only slide)

  25. And also:
     - Business processes
     - Bird songs
     - Images (contours and shapes)
     - Robot moves
     - Web services
     - Malware
     - ...

  26. 2. What does learning mean?
     - Suppose we write a program that can learn grammars... are we done?
     - A first question is: "why bother?"
     - If my program works, why do anything more about it?
     - Why should we do something when other researchers in Machine Learning are not?

  27. Motivating reflection #1
     - Is 17 a random number?
     - Is 0110110110110101011000111101 a random sequence?
     - (Is grammar G the correct grammar for a given sample S?)
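A computable, if crude, way to approach this reflection is compression: a sequence looks lawful to the extent that it is compressible. The sketch below uses zlib as a rough stand-in for Kolmogorov complexity, which is uncomputable; the strings are invented for illustration:

```python
import random
import zlib

patterned = "011" * 100                       # clearly lawful
random.seed(0)                                # reproducible "noise"
noisy = "".join(random.choice("01") for _ in range(300))

c_pat = len(zlib.compress(patterned.encode()))
c_noisy = len(zlib.compress(noisy.encode()))

# The repetitive string compresses far better than the random-looking one.
assert c_pat < c_noisy
```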

  28. Motivating reflection #2
     - In the case of languages, learning is an ongoing process
     - Is there a moment where we can say we have learnt a language?

  29. Motivating reflection #3
     - The statement "I have learnt" does not make sense
     - The statement "I am learning" makes sense
     - At least when learning over infinite spaces

  30. What usually is called "having learnt":
     - that the grammar/automaton is the smallest, or the best with respect to a score (a combinatorial characterisation)
     - that some optimisation problem has been solved
     - that the "learning" algorithm has converged (EM)

  31. What is not said: that, having solved some complex combinatorial question, we have an Occam, compression, MDL or Kolmogorov-complexity-like argument which gives us some guarantee with respect to the future. Computational learning theory has such results.
