Language and Statistics 11-761/11-661: Introduction, Objectives, Logistics



slide-1
SLIDE 1

Language and Statistics 11-761/11-661: Introduction, Objectives, Logistics
Statistical Language Modeling (SLM); Computational Linguistics (CL)

Bhiksha Raj

1 11-761

slide-2
SLIDE 2

11-761 2

slide-3
SLIDE 3

Language and Statistics

  • Iozmne pqmnzg habfbngyeydh shahmw
  • Language or not?

3 11-761

slide-4
SLIDE 4

Language and Statistics

  • Iozmne pqmnzg habfbngyeydh shahmw
  • Language or not?
  • pair none fair happy happy happy but but but brave brave

brave the the the the deserves

  • Language or not?
  • happy happy happy pair

none but the brave none but the brave none but the brave deserves the fair

  • Language or not?

4 11-761

slide-5
SLIDE 5

Language and Statistics

  • Iozmne pqmnzg habfbngyeydh shahmw
  • Language or not?
  • pair none fair happy happy happy but but but brave brave

brave the the the the deserves

  • Language or not?
  • happy happy happy pair

none but the brave none but the brave none but the brave deserves the fair

  • Language or not?

5 11-761

slide-6
SLIDE 6

Language and Statistics

  • Composed of mutually agreed upon units
  • pair happy none fair deserves but the
  • In a mutually agreed upon arrangement
  • happy happy happy pair

none but the brave none but the brave none but the brave deserves the fair

6 11-761

slide-7
SLIDE 7

The linguistic point of view

  • Language is the outcome of a complex process of lexical

semiosis to communicate information

  • Requiring conceptualization, planning, formation and delivery
  • Based on a set of implicitly agreed upon units and rules of

combination

  • Phonological, morphological and syntactic rules
  • Adequately conveying semantics requires following rules
  • Deep complex theories dating back to Plato..
  • Key point: absolutely not random!
  • Random gobbledygook doesn’t convey any useful meaning

7 11-761

slide-8
SLIDE 8

The linguistic point of view

  • Language is the outcome of a complex process of lexical

semiosis to communicate information

  • Requiring conceptualization, planning, formation and delivery
  • Based on a set of implicitly agreed upon units and rules of

combination

  • Phonological, morphological and syntactic rules
  • Adequately conveying semantics requires following rules
  • Deep complex theories dating back to Plato..
  • Key point: absolutely not random!
  • Random gobbledygook doesn’t convey any useful meaning

8 11-761

slide-9
SLIDE 9

“Mutually agreed upon”?

  • When a fox is in the bottle where the tweetle

beetles battle with their paddles in a puddle on a noodle-eating poodle, THIS is what they call…a tweetle beetle noodle poodle bottled paddled muddled duddled fuddled wuddled fox in socks, sir

9 11-761

slide-10
SLIDE 10

“Mutually agreed upon”?

  • When a fox is in the bottle where the tweetle

beetles battle with their paddles in a puddle on a noodle-eating poodle, THIS is what they call…a tweetle beetle noodle poodle bottled paddled muddled duddled fuddled wuddled fox in socks, sir

  • ’Twas brillig, and the slithy toves

Did gyre and gimble in the wabe: All mimsy were the borogoves, And the mome raths outgrabe.

10 11-761

slide-11
SLIDE 11

Rules?

  • It’s like déjà vu all over again.
  • We made too many wrong mistakes.
  • I never said most of the things I said.
  • The future ain’t what it used to be.
  • Bill Dickey is learning me his experience.
  • Who?
  • “How do you like them apples?”
  • "Soylent Green is people!“
  • "It's a fool looks for logic in the chambers of the human heart."
  • Slim said, "You hadda, George. I swear you hadda. Come on with me." He led

George into the entrance of the trail and up toward the highway. Curley and Carlson looked after them. And Carlson said, "Now what the hell ya suppose is eatin' them two guys?“

11 11-761

slide-12
SLIDE 12

Rules?

  • It’s like déjà vu all over again.
  • We made too many wrong mistakes.
  • I never said most of the things I said.
  • The future ain’t what it used to be.
  • Bill Dickey is learning me his experience.
  • Who?
  • “How do you like them apples?”
  • "Soylent Green is people!“
  • "It's a fool looks for logic in the chambers of the human heart."
  • Slim said, "You hadda, George. I swear you hadda. Come on with me." He led

George into the entrance of the trail and up toward the highway. Curley and Carlson looked after them. And Carlson said, "Now what the hell ya suppose is eatin' them two guys?“

12 11-761

slide-13
SLIDE 13

Rules?

  • It’s like déjà vu all over again.
  • We made too many wrong mistakes.
  • I never said most of the things I said.
  • The future ain’t what it used to be.
  • Bill Dickey is learning me his experience.
  • Who?
  • “How do you like them apples?”
  • "Soylent Green is people!“
  • "It's a fool looks for logic in the chambers of the human heart."
  • Slim said, "You hadda, George. I swear you hadda. Come on with me." He led

George into the entrance of the trail and up toward the highway. Curley and Carlson looked after them. And Carlson said, "Now what the hell ya suppose is eatin' them two guys?“

13 11-761

slide-14
SLIDE 14

Language and Statistics

  • Why are these understandable?
  • Bill Dickey is learning me his experience
  • It's a fool looks for logic
  • What’s eating them two guys
  • They are built on common usages
  • Statistically plausible
  • Statistical approach: The “acceptability” of a sequence of words is related to how frequently it is used
  • Or how statistically plausible it is

14 11-761

slide-15
SLIDE 15

Statistical approach

  • Based entirely on frequency of occurrence
  • “Acceptable” word sequences will occur
  • “Unacceptable” ones won’t
  • Actually – predicted frequency of occurrence
  • Not just counting from whatever we have already observed
  • Will require predicting probability of word sequences we

have never encountered

  • Some sequences we have never seen are nevertheless much more

likely to be expressed in a valid sentence than other sequences we have never seen

11-761 15
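A minimal sketch of this frequency-based view (Python, not from the original slides): a bigram model estimated from the toy word sequence used earlier, with add-one smoothing so that word pairs never seen in training still get a small nonzero probability. The corpus and scoring functions are illustrative assumptions.

    from collections import Counter

    # Toy corpus: the "none but the brave deserves the fair" example from the earlier slides.
    corpus = "none but the brave none but the brave none but the brave deserves the fair".split()

    vocab = sorted(set(corpus))
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def bigram_prob(w1, w2):
        """P(w2 | w1) with add-one (Laplace) smoothing, so unseen pairs get > 0 probability."""
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))

    def sequence_prob(words):
        """Probability of a word sequence under the bigram model (ignoring sentence boundaries)."""
        p = 1.0
        for w1, w2 in zip(words, words[1:]):
            p *= bigram_prob(w1, w2)
        return p

    # An "acceptable" ordering scores far higher than a scrambled one built from the same words.
    print(sequence_prob("none but the brave deserves the fair".split()))
    print(sequence_prob("fair the deserves brave the but none".split()))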

slide-16
SLIDE 16

The problem with the statistical approach

char O, o[]; main(l) {for(;~l;O||puts(o)) O=(O[o]=~(l=getchar())?4<(4^l>>5)?l:46:0)?-~O & printf("%02x ",l)*5:!O;}

  • Will a statistical model know with certainty if the

above is valid code?

11-761 16

slide-17
SLIDE 17

The linguist’s objection

  • The statistical approach treats language as a random process
  • Language is not random
  • A blind statistical approach ignores agency:
  • Human (or animal) language treated no differently from other

patterned sequences of symbolic units

  • Is the sequence of sounds your car produces really language?
  • Language has an agent
  • Is generally the outcome of a deliberate act of communication
  • With an entire sequence of conceptualization, composition and

communication

  • Agents intend to communicate
  • The rules of language affect what unseen word sequences are likely

11-761 17

slide-18
SLIDE 18

In this course

  • We take the perspective that the statistical framework is

more appropriate

  • Can never explicitly catalogue all the rules of language
  • Particularly when they change all the time
  • Not utilizing the prescriptive theory of linguistics
  • Frequency/plausibility of usage is representative of the rules of the

language

  • Statistical characterization of language
  • Related to descriptive theory of linguistics
  • But the framework may be informed by linguistics or

linguistic intuition

  • Required in particular to predict occurrences/behaviors of

previously unseen patterns

11-761 18

slide-19
SLIDE 19

The fiction we maintain

  • Language comes from a probabilistic source..
  • Which randomly produces the text we see
  • We will focus on written language
  • We will concede agency
  • The source is trying to convey a message, not just to

produce text

  • But we will often ignore it(!)

11-761 19

slide-20
SLIDE 20

The fiction we maintain

  • To generate a text, the source randomly chooses a “hidden” message ℎ
  • The concept to be conveyed
  • It also randomly produces a “surface form” to convey the message ℎ
  • The accessible form
  • Words, sentences, paragraphs, documents..
  • We only get to observe the surface form
  • This is what we must work with
  • To try to decipher inner message ℎ
  • Or just to learn all about valid surface forms
  • Course objectives: Learn all about statistical mechanisms to achieve the

above..

11-761 20

slide-21
SLIDE 21

21

Course Goals

  • Teaching statistical foundation and techniques for language

technologies

  • Plugging gaping holes in LTI/CS grad student education in

probability, statistics and information theory.

  • “This course is about how to convert linguistic intuition and understanding of language into statistical models.”
  • “About how to develop statistically sound methodology, but informed by what we know of the domain of language.”

11-761

slide-22
SLIDE 22

22

Course philosophy

  • Socratic Method
  • Based on discussion and engagement
  • Participation strongly encouraged (please state your name)
  • Highly interactive
  • Highly adaptable
  • based on how fast we move
  • Lots of Probability, Statistics, Information theory
  • not in the abstract, but rather as the need arises
  • Lectures emphasize intuition, not rigor or detail
  • background reading will have rigor & detail
  • Will be done partially using slides, and partially on the board

11-761

slide-23
SLIDE 23

23

Course Prerequisites & Mechanics

  • You need to be able to program, from scratch.
  • Largest program is O(100) lines
  • You need to be comfortable with probabilities
  • Can you derive Bayes equation in your sleep?
  • 11-661 (master’s level): no final project
  • Hand in assignments via Blackboard
  • Vigorous enforcement of collaboration &

disclosure policy

11-761

slide-24
SLIDE 24

24

Background Material

No single book exists which covers the course material.

  • “Foundations of Statistical NLP”, Manning & Schütze
  • Computational Linguistics perspective
  • “Statistical Methods in Speech Recognition”, Jelinek
  • “Text Compression”, Bell, Cleary & Witten
  • first 4 chapters; rest is mostly text compression
  • “Probability and Statistics”, DeGroot
  • “All of Statistics” & “All of Nonparametric Statistics”, Wasserman
  • Lots of individual articles

11-761

slide-25
SLIDE 25

25

High Level Syllabus (subject to change)

  • Language Technology formalisms
  • source-channel formulation
  • Bayes classifier
  • Words, Words, Words
  • type vs. token, Zipf, Mandelbrot, heterogeneity of language
  • Modeling Word distributions - the unigram:
  • [estimators, ML, zero frequency, smoothing, shrinkage, G-T]
  • N-grams:
  • Deleted Interpolation Model, backoff, toolkit
  • Measuring Success: perplexity
  • Info theory [entropy, KL-div, MI], the entropy of English,

alternatives

11-761

slide-26
SLIDE 26

26

Syllabus (continued)

  • Clustering:
  • class-based N-grams, hierarchical clustering
  • hard and soft clustering
  • Latent Variable Models, EM
  • Hidden Markov Models, revisiting interpolated and class

n-grams

  • Part-Of-Speech tagging, Word Sense Disambiguation
  • Decision & Regression Trees
  • Particularly as applied to language
  • Stochastic Grammars
  • (SCFG, inside-outside alg., Link grammar)

11-761

slide-27
SLIDE 27

27

Syllabus (continued)

  • Maximum Entropy Modeling
  • exponential models, ME principle, feature induction...
  • Language Model Adaptation
  • caches, backoff
  • Dimensionality reduction
  • latent semantic analysis, word2vec
  • Syntactic Language Models

11-761

slide-28
SLIDE 28

Logistics

  • Several assignments:
  • ~ once a week
  • One exam at the three-quarters point of the term
  • And a project
  • Everyone has the same project definition, but you may

choose your approach

  • Ideally teams of four
  • Project presentations (~10 mins/team) at the end of the course
  • With truffles as the prize for the best project.

28 11-761

slide-29
SLIDE 29

Topics for the rest of today

  • Surface (s) and hidden (h) components of language
  • The p(s,h) function
  • Statistical language modeling: estimating p(s)
  • Distribution of words, sentences, documents
  • Computational linguistics / NLP: estimating p(h|s)
  • The source-channel model (aka, a Bayes classifier for everything)
  • SLM used as prior: speech, translation, spelling correction, OCR,...
  • SLM used as likelihood: document classification,...
  • [Probability: prior, posterior, Bayes' theorem, Bayes classifier]

29 11-761

slide-30
SLIDE 30

Our fiction

  • Language comes from a probabilistic source
  • That it has a distribution
  • Which may change with time
  • Language has a surface form
  • The accessible, non-controversial text form
  • E.g. words, sentences, paragraphs
  • We will assume text for this course
  • And under the surface is the hidden form
  • The deeper aspect of meaning
  • Not always apparent from the surface form

30 11-761

slide-31
SLIDE 31

Surface vs Hidden

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • The surface form is non-controversial
  • Just the words
  • What about the meaning?
  • Hard to decipher. This little snippet of text is

loaded with ambiguity.

11-761 31

slide-32
SLIDE 32

The hidden meaning

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • What are all the entities (people/objects/locations)

that are directly identifiable?

11-761 32

slide-33
SLIDE 33

The hidden meaning

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • What are all the entities (people/objects/locations)

that are directly identifiable?

11-761 33

[Entities highlighted on the slide: Mary, Jane]

slide-34
SLIDE 34

Ambiguities

Mary saw Jane standing by the bank with a telescope. She waved to her.

11-761 34

slide-35
SLIDE 35

Part of speech ambiguity

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Saw (noun)

OR

  • Saw (verb, to see)
  • Part of speech tagging
  • If the part of speech is wrongly identified, downstream

analysis will break badly

11-761 35
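For a concrete illustration of part-of-speech tagging (an assumed setup, not from the slides), the sketch below runs NLTK's off-the-shelf tagger on the example sentence; it presumes the 'punkt' and 'averaged_perceptron_tagger' resources can be downloaded.

    import nltk

    # One-time downloads (assumed available): tokenizer and POS tagger models.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sentence = "Mary saw Jane standing by the bank with a telescope."
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # A correct tagging labels "saw" as a past-tense verb (VBD), not a noun (NN);
    # getting this wrong would derail any downstream parsing or interpretation.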

slide-36
SLIDE 36

Ambiguities

Mary saw Jane standing by the bank with a telescope. She waved to her.

11-761 36

slide-37
SLIDE 37

Word sense disambiguation

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Word sense disambiguation
  • Bank senses:
  • Financial institution
  • River bank
  • To tilt
  • To depend on something…

11-761 37

slide-38
SLIDE 38

Word senses

  • A “word sense” is one of the possible meanings of a

word

  • Some words can have 50 or more senses (Wikipedia quotes

“play” as an example)

  • Misinterpreting word sense will completely destroy

the intent of the sentence

  • WordNet: Lexical resource created by George Miller,

which characterizes word senses rather than words as the basic building block

11-761 38
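As a small hedged example (assuming NLTK with the WordNet corpus installed; not part of the slides), one can inspect the senses WordNet lists for “bank” and count the senses of “play”:

    import nltk
    from nltk.corpus import wordnet as wn

    nltk.download("wordnet", quiet=True)  # assumed one-time download

    # Each synset is one word sense, with a short gloss.
    for synset in wn.synsets("bank")[:4]:
        print(synset.name(), "-", synset.definition())

    # "play" is a famously polysemous word; WordNet records dozens of senses for it.
    print(len(wn.synsets("play")), "senses of 'play' in WordNet")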

slide-39
SLIDE 39

Ambiguities

Mary saw Jane standing by the bank with a telescope. She waved to her.

11-761 39

slide-40
SLIDE 40

Prepositional phrase attachment

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Did Mary see Jane through the telescope, or was

Jane standing with a telescope?

  • Are there other such examples here?

11-761 40

slide-41
SLIDE 41

Prepositional phrase attachment

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • “Saw by the bank” or “Standing by the bank”?

11-761 41

slide-42
SLIDE 42

More ambiguities

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Mary or Jane?

11-761 42

slide-43
SLIDE 43

More ambiguities

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Mary or Jane?
  • Coreference resolution..
  • Also pronoun resolution

11-761 43

slide-44
SLIDE 44

Coreference resolution

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Mary or Jane?
  • Coreference resolution..

11-761 44

slide-45
SLIDE 45

Coreference resolution

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • Who waved to whom?

11-761 45

slide-46
SLIDE 46

World knowledge affects interpretation

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • “She” and “her” cannot both refer to the same person..

11-761 46

slide-47
SLIDE 47

World knowledge affects interpretation

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • If they’re close enough to see each other when they

wave to one another, Mary probably doesn’t need a telescope to see Jane

  • => “with a telescope” probably attaches to

standing..

11-761 47

slide-48
SLIDE 48

Still more ambiguity

Mary saw Jane standing by the bank with a telescope. She waved to her.

  • If they’re close enough to see each other when they

wave to one another, Mary probably doesn’t need a telescope to see Jane

  • => “with a telescope” probably attaches to

standing..

  • But who is standing??

11-761 48

slide-49
SLIDE 49

Parsing Ambiguity

  • Several other valid parses
  • In general very hard to disambiguate

11-761 49

[Two parse trees of “Mary saw Jane standing with a telescope”, with “with a telescope” attached differently in each]

slide-50
SLIDE 50

Surface vs Hidden

  • The surface form is just the observed sequence of words
  • Analyzed by language modellers
  • But underneath is the hidden form with the fully annotated

richness of meaning(s)

  • Word senses
  • Various attachments
  • Coreferences
  • Part of speech
  • Named entity
  • Semantic attachments
  • Topic identification
  • Deriving these from the surface form is the task of computational

linguists

11-761 50

slide-51
SLIDE 51

Formalizing our fiction

  • The probabilistic source draws hidden messages h and surface forms s from a joint distribution P(h, s)
  • Note: a given meaning h may have multiple surface forms s, and vice versa
  • Knowing P(h, s) would solve nearly all problems in language technologies and computational linguistics
  • CL problem: Given a surface form s, make inferences about the hidden message h
  • Can be done if P(h, s) is known
  • LM problem: Determine the plausibility of surface forms
  • Know P(s)
  • Can be done if P(h, s) is known

51 11-761

slide-52
SLIDE 52

CL problem

  • Determine h from a given s
  • A good heuristic: find the most likely hidden value for the given surface form:

    h* = argmax_h P(h|s) = argmax_h P(h, s)

  • In theory, h includes all hidden information (complete annotation)
  • In practice, computational linguists will solve one problem at a time, and h will be defined accordingly
  • WSD: h represents word senses, s is a sentence
  • Topic recognition: h represents topics, s is a document

11-761 52

slide-53
SLIDE 53

LM Problem

  • Language modelling has to do with simply estimating the probability of surface phenomena, P(s)
  • Easily derived if P(h, s) is known:

    P(s) = Σ_h P(h, s)

  • In practice we will often try to learn P(s) directly and not bother with h or P(h, s)

11-761 53
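To make the two problems concrete, here is a tiny numerical sketch with an invented joint distribution P(h, s) over two senses of “bank” and two sentences (the numbers are illustrative assumptions, not course data): it recovers the LM quantity P(s) by marginalization and the CL quantity h* by argmax.

    # Toy joint distribution P(h, s): keys are (hidden sense, surface sentence).
    # The numbers are invented purely for illustration and sum to 1.
    P = {
        ("bank=financial", "she sat by the bank"): 0.10,
        ("bank=riverside", "she sat by the bank"): 0.30,
        ("bank=financial", "she robbed the bank"): 0.55,
        ("bank=riverside", "she robbed the bank"): 0.05,
    }

    def p_s(s):
        """LM problem: P(s) = sum over hidden h of P(h, s)."""
        return sum(p for (h, s2), p in P.items() if s2 == s)

    def best_h(s):
        """CL problem: h* = argmax_h P(h|s) = argmax_h P(h, s)."""
        return max((h for (h, s2) in P if s2 == s), key=lambda h: P[(h, s)])

    print(p_s("she sat by the bank"))     # 0.40
    print(best_h("she sat by the bank"))  # bank=riverside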

slide-54
SLIDE 54

Requirement for P(s): Automatic Speech Recognition

  • Input: An acoustic signal A
  • Output: Some piece of language w
  • Sequence of words
  • The ASR system maps A → w
  • How..
  • And why does this involve P(w)?

11-761 54

[Diagram: acoustic signal A → ASR Engine → word sequence w]

slide-55
SLIDE 55

The source-channel model

  • Model: The acoustic signal A actually used to be text w produced by the source
  • But the text w first passed through a “noisy” channel before being output to us
  • The noisy channel mangled and modified w so that it was converted to an acoustic signal A
  • From channel output A we must now work our way backward and figure out what went into the channel
  • Note the inversion of order of the process model from the original problem statement

11-761 55

[Diagram: source → w → Noisy Channel → A]

slide-56
SLIDE 56

The source-channel model for speech

  • For speech, this is not an absurd model
  • Reasonable to imagine that the sounds we produce

started off as sentences in our heads, which were then translated by our cognitive and speech-production processes into a speech signal

  • For other problems the model will be obviously silly
  • But we’ll use it anyway..

11-761 56

[Diagram: source → w → Noisy Channel → A]

slide-57
SLIDE 57

Automatic Speech Recognition

  • Given only A we must guess w
  • We will do it as:

    w* = argmax_w P(w|A)

  • Using Bayes rule to reverse dependencies:

    w* = argmax_w P(w) P(A|w)

  • The AM P(A|w) is much harder to estimate than the LM P(w).
  • Most of the effort in automatic speech recognition goes into the AM

57

[Diagram: source → w → Noisy Channel → A]

slide-58
SLIDE 58

Automatic Speech Recognition

  • Given only A we must guess w
  • We will do it as:

    w* = argmax_w P(w|A)

  • Using Bayes rule to reverse dependencies:

    w* = argmax_w P(w) P(A|w)

  • The AM P(A|w) is much harder to estimate than the LM P(w).
  • Most of the effort in automatic speech recognition goes into the AM

58

[Diagram: source → w → Noisy Channel → A. Language Model P(w): quantifies the plausibility of word sequences]

slide-59
SLIDE 59

Automatic Speech Recognition

  • Given only A we must guess w
  • We will do it as:

    w* = argmax_w P(w|A)

  • Using Bayes rule to reverse dependencies:

    w* = argmax_w P(w) P(A|w)

  • The AM P(A|w) is much harder to estimate than the LM P(w).
  • Most of the effort in automatic speech recognition goes into the AM

59

[Diagram: source → w → Noisy Channel → A. Language Model P(w): quantifies the plausibility of word sequences. Acoustic Model P(A|w): quantifies the degree of match between a candidate word sequence and the observed acoustics]

slide-60
SLIDE 60

Automatic Speech Recognition

  • Given only A we must guess w
  • We will do it as:

    w* = argmax_w P(w|A)

  • Using Bayes rule to reverse dependencies:

    w* = argmax_w P(w) P(A|w)

  • The AM P(A|w) is much harder to estimate than the LM P(w).
  • Most of the effort in automatic speech recognition goes into the AM

60

[Diagram: source → w → Noisy Channel → A. Language Model P(w): quantifies the plausibility of word sequences. Acoustic Model P(A|w): quantifies the degree of match between a candidate word sequence and the observed acoustics]

slide-61
SLIDE 61

Automatic Speech Recognition in practice

  • Given only A we must guess w
  • We will do it as:

    w* = argmax_w P(w|A)

  • Using Bayes rule to reverse dependencies:

    w* = argmax_w P̂(w) P̂(A|w)

  • We won’t have the actual probability distributions for language and acoustics. We must estimate them

61

[Diagram: source → w → Noisy Channel → A. Estimated Language Model P̂(w); Estimated Acoustic Model P̂(A|w)]
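A hedged sketch of how this decomposition is used at decoding time (the candidate list and both scoring tables below are invented stand-ins, not the course's models): rescore candidate transcriptions by adding an estimated LM log-probability to an estimated AM log-probability and take the argmax.

    import math

    # Hypothetical candidates an ASR decoder might propose for one utterance.
    candidates = ["recognize speech", "wreck a nice beach"]

    # Stand-ins for the estimated models; a real system would use an n-gram or
    # neural LM for lm_logprob and an acoustic model for am_logprob.
    lm_logprob = {"recognize speech": math.log(1e-5), "wreck a nice beach": math.log(1e-8)}
    am_logprob = {"recognize speech": math.log(0.02), "wreck a nice beach": math.log(0.03)}

    # w* = argmax_w P̂(w) P̂(A|w), computed as a sum of log-probabilities.
    best = max(candidates, key=lambda w: lm_logprob[w] + am_logprob[w])
    print(best)  # the language model outweighs the slightly better acoustic match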

slide-62
SLIDE 62

Another example: Machine Translation

  • Problem: given a sentence in one language (e.g.

English) come up with an equivalent in another (e.g. Spanish)

  • Input: Sentence in English (E)
  • Output: Translation into Spanish (S)
  • The MT system is an E → S converter

11-761 62

[Diagram: “four score and seven years ago” (E) → MT System → “Hace cuatro y siete años” (S)]

slide-63
SLIDE 63

Source channel model for MT

  • Spanish went into a noisy channel and came out as English
  • Again note inversion of process order from target inference
  • Best guess for Spanish

    S* = argmax_S P(S|E) = argmax_S P(S) P(E|S)

11-761 63

[Diagram: Spanish S → Noisy Channel → English E]

slide-64
SLIDE 64

Source channel model for MT

  • Spanish went into a noisy channel and came out as English
  • Again note inversion of process order from target inference
  • Best guess for Spanish

    S* = argmax_S P(S|E) = argmax_S P(S) P(E|S) ≈ argmax_S P̂(S) P̂(E|S)

11-761 64

[Diagram: Spanish S → Noisy Channel → English E]

slide-65
SLIDE 65

ASR and MT: The problem of search

    S* = argmax_S P̂(S) P̂(E|S)

  • The set of all possible sentences from which to select the most likely one is exponentially large
  • Exhaustive search is impossible
  • Heuristics must be applied to select the most promising set to evaluate
  • The problem of search

11-761 65

[Diagram: the MT and ASR noisy channels]
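One common heuristic is beam search: build hypotheses incrementally and keep only the top-k partial hypotheses at each step instead of scoring every complete sequence. The sketch below is illustrative only; the vocabulary and the stand-in conditional scores are assumptions, not the course's decoder.

    import math

    vocab = ["none", "but", "the", "brave", "deserves", "fair"]  # toy vocabulary

    def step_logprob(prev, word):
        # Stand-in for log P(word | prev): mildly favors the familiar ordering.
        good_pairs = {("none", "but"), ("but", "the"), ("the", "brave"),
                      ("brave", "deserves"), ("deserves", "the"), ("the", "fair")}
        return math.log(0.5) if (prev, word) in good_pairs else math.log(0.01)

    def beam_search(length=6, beam=3):
        # Each hypothesis is (log-probability, word list); start with every single word.
        hyps = [(0.0, [w]) for w in vocab]
        for _ in range(length - 1):
            expanded = [(lp + step_logprob(words[-1], w), words + [w])
                        for lp, words in hyps for w in vocab]
            # Prune: keep only the 'beam' best partial hypotheses.
            hyps = sorted(expanded, key=lambda h: h[0], reverse=True)[:beam]
        return hyps[0]

    print(beam_search())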

slide-66
SLIDE 66

A simpler problem without search: Topic classification

  • Given a document, determine the topic it’s about
  • Sports/politics/finance..
  • Only a finite number of topics
  • Computationally feasible to search exhaustively over all topics to find the best guess for a document

11-761 66

[Diagram: example document (“There was ease in Casey’s manner as he stepped into his place; there was pride in Casey’s bearing and a smile lit Casey’s face. And when, responding to the cheers, he lightly doffed his hat, no stranger in the crowd could doubt ’twas Casey at the bat.”) → Topic Classifier → SPORTS]

slide-67
SLIDE 67

Source channel model

  • Source produces a topic t
  • Noisy channel converts a topic t into a full document D
  • Reasonable model: The channel is a writer who takes in a topic and produces a document
  • Guessing the topic from the document:

    t* = argmax_t P(t|D) = argmax_t P(t) P(D|t) ≈ argmax_t P̂(t) P̂(D|t)

67

[Diagram: topic (SPORTS) → Noisy Channel → the “Casey at the bat” document]

slide-68
SLIDE 68

Source channel model

  • Source produces a topic t
  • Noisy channel converts a topic t into a full document D
  • Reasonable model: The channel is a writer who takes in a topic and produces a document
  • Guessing the topic from the document:

    t* = argmax_t P(t|D) = argmax_t P(t) P(D|t) ≈ argmax_t P̂(t) P̂(D|t)

68

[Diagram: topic (SPORTS) → Noisy Channel → the “Casey at the bat” document. P̂(t): an estimated probability distribution over the topics of documents (a multinomial). P̂(D|t): a separate language model for every topic, e.g. P(D|sports), P(D|politics), etc.]
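A minimal sketch of this source-channel classifier (a Naive Bayes model with an invented toy training set; the add-one smoothing choice is an assumption, not the course's recipe): estimate P̂(t) from topic counts and P̂(D|t) as a per-topic unigram model, then pick t* = argmax_t P̂(t) P̂(D|t).

    import math
    from collections import Counter

    # Invented toy training data: (topic, document) pairs.
    train = [
        ("SPORTS",   "the crowd cheered as casey stepped up to bat"),
        ("SPORTS",   "the pitcher threw and the batter swung"),
        ("POLITICS", "the senate passed the bill after a long debate"),
        ("POLITICS", "voters went to the polls to elect a new senator"),
    ]

    topics = Counter(t for t, _ in train)            # for the prior P̂(t)
    words = {t: Counter() for t in topics}           # per-topic unigram counts
    for t, doc in train:
        words[t].update(doc.split())
    vocab = {w for c in words.values() for w in c}

    def log_posterior(t, doc):
        # log P̂(t) + sum_w log P̂(w|t), with add-one smoothing for unseen words.
        lp = math.log(topics[t] / sum(topics.values()))
        denom = sum(words[t].values()) + len(vocab)
        for w in doc.split():
            lp += math.log((words[t][w] + 1) / denom)
        return lp

    doc = "casey swung the bat and the crowd cheered"
    print(max(topics, key=lambda t: log_posterior(t, doc)))  # SPORTS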

slide-69
SLIDE 69

Source-channel model is Bayes classification

  • The generic source-channel model:

    X* = argmax_X P(X|Y) = argmax_X P(X) P(Y|X)

  • This is also the Bayes classifier
  • Also known as the Maximum A Posteriori classifier
  • This is the optimal classification rule when we have the true values of P(X|Y) (or alternately of P(X) and P(Y|X))
  • Optimal in that it guarantees the least expected misclassification
  • No guarantee of optimality if we use estimates for any of the distributions

11-761 69

[Diagram: source → X → Noisy Channel → Y]

slide-70
SLIDE 70

Source-channel model is Bayes classification

  • The generic source-channel model:

    X* = argmax_X P(X|Y) = argmax_X P(X) P(Y|X)

  • This is also the Bayes classifier
  • Also known as the Maximum A Posteriori classifier
  • This is the optimal classification rule when we have the true values of P(X|Y) (or alternately of P(X) and P(Y|X))
  • Optimal in that it guarantees the least expected misclassification
  • No guarantee of optimality if we use estimates for any of the distributions

11-761 70

[Diagram: source → X → Noisy Channel → Y. P(X): the a priori probability distribution of X (also called the prior); tells us about the natural biases in the data – which Xs are produced more preferentially by the source]

slide-71
SLIDE 71

Source-channel model is Bayes classification

  • The generic source-channel model:

    X* = argmax_X P(X|Y) = argmax_X P(X) P(Y|X)

  • This is also the Bayes classifier
  • Also known as the Maximum A Posteriori classifier
  • This is the optimal classification rule when we have the true values of P(X|Y) (or alternately of P(X) and P(Y|X))
  • Optimal in that it guarantees the least expected misclassification
  • No guarantee of optimality if we use estimates for any of the distributions

11-761 71

[Diagram: source → X → Noisy Channel → Y. P(X): the a priori probability distribution of X (the prior); tells us which Xs are produced more preferentially by the source. P(Y|X): the conditional probability of Y given X; gives us a measure of the “fit” of Y to a given X]

slide-72
SLIDE 72

Source-channel model is Bayes classification

  • The generic source-channel model:

    X* = argmax_X P(X|Y) = argmax_X P(X) P(Y|X)

  • This is also the Bayes classifier
  • Also known as the Maximum A Posteriori classifier
  • This is the optimal classification rule when we have the true values of P(X|Y) (or alternately of P(X) and P(Y|X))
  • Optimal in that it guarantees the least expected misclassification
  • No guarantee of optimality if we use estimates for any of the distributions

11-761 72

[Diagram: source → X → Noisy Channel → Y. P(X): the prior, which Xs the source prefers. P(Y|X): the fit of Y to a given X. The decomposition allows us to learn these two from two entirely different datasets]

slide-73
SLIDE 73

X* = argmax_X P(X) P(Y|X)   vs.   X* = argmax_X P(X|Y)

  • Advantages to the generative framework:
  • Sometimes the generative story just makes more sense
  • E.g. in ASR P(A|w) is more natural, since acoustic production of sounds is a well-studied problem
  • More generally, the decomposition separates out two sources of evidence
  • P(Y|X): Captures goodness of fit
  • E.g. captures different ways of translating English to Spanish
  • Requires expensive bilingual data to learn
  • P(X): Captures natural preference for some X over others
  • E.g. Some ways of saying things in Spanish are preferred over others
  • Can be trained separately from large amounts of monolingual data
  • Enables better usage of training data
  • P(Spanish) can be trained from large quantities of monolingual text
  • And adaptation
  • P(Spanish) may change over time
  • But P(English|Spanish) is relatively stable

11-761 73

slide-74
SLIDE 74

Sources of error in Bayes classification

    X* = argmax_X P(X) P(Y|X)

  • Intrinsic error of classifier (intrinsic confusability of classes)
  • The same Y can be produced from different Xs
  • But the Bayes classifier will always give you the same X for a given Y
  • The “Bayes” error
  • Cannot do better than this

11-761 74

[Diagram: source → X → Noisy Channel → Y]

slide-75
SLIDE 75

Sources of error in Bayes classification

    X* = argmax_X P̂(X) P(Y|X)

  • Error in estimating the a priori probability of X
  • Now the error will be greater than the Bayes error

11-761 75

[Diagram: source → X → Noisy Channel → Y]

slide-76
SLIDE 76

Sources of error in Bayes classification

    X* = argmax_X P̂(X) P̂(Y|X)

  • Error in estimating the a priori probability of X
  • Error in estimating the conditional probability of Y
  • Will result in errors exceeding the optimal Bayes error

11-761 76

[Diagram: source → X → Noisy Channel → Y]

slide-77
SLIDE 77

Sources of error in Bayes classification

    X* = argmax_X P̂(X) P̂(Y|X)

  • Error in estimating the a priori probability of X
  • Error in estimating the conditional probability of Y
  • Insufficient search
  • Even with true probabilities, only optimal if you search over all possible Xs to pick the best one

77

[Diagram: source → X → Noisy Channel → Y]

slide-78
SLIDE 78

Sources of error in Topic recognition

    t* ≈ argmax_t P̂(t) P̂(D|t)

  • Bayes error
  • Error in estimating the prior
  • Error in estimating the conditional probability of the document given the topic

11-761 78

[Diagram: topic (SPORTS) → Noisy Channel → the “Casey at the bat” document]

slide-79
SLIDE 79

Sources of error in MT / ASR

    S* = argmax_S P̂(S) P̂(E|S)

  • Bayes error
  • Error in estimating prior
  • Error in estimating conditional
  • Search error from incomplete evaluation of the space of S

79

[Diagram: the MT and ASR noisy channels]

slide-80
SLIDE 80

Source-channel model

  • Can be applied to any LT problem: OCR, Spelling

Correction, Question answering, IR..

  • Originally formalized for automatic speech

recognition (in the 80s)

  • Applied to MT in the late 80s/ early 90s.
  • Now used everywhere (often the first thing tried)..

11-761 80

slide-81
SLIDE 81

The uncertainty principle of language modelling

  • Sub-languages vary along many dimensions
  • If you combine data from multiple genres, you get

more data for the combined category

  • Better estimation of distributions
  • But you lose the distinction between genres
  • Loss of resolution
  • One of the primary challenges of LM: How to make

better use of data, without losing resolution..

  • “Uncertainty principle of language modelling”

11-761 81