Linguistics in a nutshell, by hook or by crook. Jeremy G. Kahn

SLIDE 1

Big questions Survey of areas of linguistics Summary The lab

Linguistics in a nutshell

by hook or by crook

Jeremy G. Kahn

Signal, Speech & Language Interpretation Laboratory
Department of Linguistics
University of Washington

22 June 2008 / Workshop 2007

Kahn Linguistics brushup

SLIDE 2

Outline

1. Linguistics: big questions
  • Linguistics charter
  • What linguists look at
  • Linguistics’ role
2. Survey of areas of linguistics
  • Phonetics & Phonology
  • Morphology and syntax
  • Semantics
3. The lab

SLIDE 3

Business information

Linguistics introductions: by necessity, incomplete
Apologies for:
  • my personal speaking style
  • guessing about level of preparation
Caveat: I’m a computational linguist
Caveat: I have an engineering bias
Goal: informality. Questions are good
Thanks to Don Baumer (Linguistics) for letting me crib slides & examples

SLIDE 5

What is linguistics?

Scientific study of human language
  • How is language organized?
  • How is it used?
General questions about Language (capital L)
  • What do all languages have in common?
  • How can we describe how Language (or languages) works?
  • How can we describe how a language works?

SLIDE 6

Language & communication

All communications have:
  • mode or medium: speech, gesture, olfaction, etc.
  • semanticity: meaning carried
  • pragmatic function: intention carried
Some also have:
  • interchangeability: send *and* receive
  • cultural transmission: learned from other users
  • arbitrariness: non-iconicity
  • discreteness: "compositionality"
  • displacement: discuss things that aren’t here
  • productivity: new ways to organize it
Where do computer languages differ from human languages?

SLIDE 7

What makes language interesting?

Language is creative, but constrained
  • “Seattle is rainy.” (well-formed)
  • * “rainy Seattle is.” (ill-formed)
  • “I like caffeinated drinks without bubbles.”
  • * “Bubbles without drinks caffeinated like I”
Not just word order:
  • “pronk” could be an English word (in fact, it is)
  • “przak” could not be (how do you know?)
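
One way to make the “pronk” vs. “przak” intuition concrete is to check the consonants a word starts with against a list of permitted word-initial clusters. The sketch below is in Python and works on spelling rather than sounds; the onset inventory, the vowel-letter set, and the function names are all assumptions made for illustration, not a full account of English phonotactics.

```python
# Illustrative subset of English word-initial consonant clusters (onsets),
# written in ordinary spelling. This list is an assumption made for the
# sketch; it is not a complete inventory.
LEGAL_ONSETS = {
    "", "p", "b", "t", "d", "k", "g", "s", "z", "m", "n", "l", "r",
    "pr", "br", "tr", "dr", "kr", "gr", "pl", "bl", "kl", "gl",
    "sp", "st", "sk", "sl", "sm", "sn", "spr", "str", "skr",
}
VOWELS = set("aeiou")


def onset(word: str) -> str:
    """Return the consonant letters before the first vowel letter."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word  # no vowel letter at all


def could_be_english(word: str) -> bool:
    """True if the word starts with an onset from the toy inventory."""
    return onset(word.lower()) in LEGAL_ONSETS


print(could_be_english("pronk"))  # onset is "pr"
print(could_be_english("przak"))  # onset is "prz"
```

Speakers apply a constraint like this without ever having been taught it, which is the point of the slide.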

SLIDE 8

Constraint and creativity

Linguists like to say language is “rule-governed”.
Statistically-minded engineers might quibble...
Engineering way of looking at it (thanks Shannon):
  • sender wants to have a symbol for every idea
  • recipient won’t have those symbols
  • compositionality and productivity allow novelty and communication

SLIDE 10

Language as a part of the human OS

Language: not literacy.
  • major advantage over chimpanzees (e.g. displacement)
  • we’ve got specialist wetware
Competent language use
  • No school required
  • No explicit instruction required
  • Most humans competent in one language before age 3
What do we mean when we say “competent”?

SLIDE 11

Competence and Performance

Big idea in modern linguistics
Competence: what a native user of a language knows.
  • ability to produce & comprehend language
  • system or knowledge (“grammar”) that supports that
  • largely subconscious
  • learned (first-language) without effort
Performance: what language users do
  • often fully competent
  • not always: speech errors, typos, “brain-o’s”

SLIDE 12

What’s so neat about competence?

Many modern linguists care about competence more than performance.
Their view (Chomsky):
  • your competence is a window on the underlying structure of your grammar
  • your performance includes a bunch of messy wetware
These (self-proclaimed “theoretical”) linguists are very, very interested in trying to figure out what the OS is from the behavior of the code.

SLIDE 13

Grammaticality and meaningfulness

“Meaningful” and “grammatical” are not synonymous:
  • Grammatical, but meaningless: ‘Colorless green ideas sleep furiously.’ (Noam Chomsky)
  • Ungrammatical, but meaningful: ‘Around the survivors, a perimeter create.’ (Yoda, Episode 2)

SLIDE 15

What’s all this about grammar, then?

Descriptive grammar: an attempt to describe the acceptability judgments (or patterns of use/competence) of a speaker.
Prescriptive grammar: explicit instructions on how one should write (or speak); the language police.
Linguistics is not about prescriptive grammar.
  • We don’t tell you how you should.
  • We try to describe how you do.
Dogma: All human languages, stigmatized or not, are equally expressive.

SLIDE 16

Linguistics and semi-supervised learning

Humans do it
We get very little explicit labeling of our language data, yet we learn without instruction:
  • what words and parts of words mean
  • how to pronounce words we read
  • how to understand sophisticated sentence constructions (“respectively”)
  • and more...
It’s not all hard-coded (“universal grammar”): patterns are often language-specific

SLIDE 17

Linguistics and semi-supervised learning

The corpora are out there:
  • the web
  • email (Enron emails!)
  • newsgroups
Also speech corpora:
  • radio
  • television
  • podcasts
All mostly unlabeled but enormous
Natural language problems: perfect for semi-supervised work.

SLIDE 18

Overview of the different parts of language

(different parts of "grammar")
  • Phonetics - how sounds are made and perceived
  • Phonology - function and patterning of sounds
  • Morphology - structure of words
  • Syntax - analysis of sentence structure (word order)
  • Semantics - meaning (words to meaning)

SLIDE 19

Other areas of linguistic study

  • Historical linguistics - language evolution and creation
  • Pragmatics - what else is intended and performed
  • Typology - language classification and differences
  • Psycholinguistics - neurobiological basis for language
  • Language acquisition
  • Sociolinguistics - language’s influence on and indication of social status and behavior
  • Writing systems - ... a mess
  • and more...
We’ll not cover those here

SLIDE 21

Phonetics

Phonetics: the study of linguistic speech sounds
  • articulatory
  • auditory (perceptual)
  • acoustic
Problems phonetics works with:
  • no "spaces" between words: but we perceive them
  • sounds are in a continuous (acoustic) space, but we chunk them into the (discrete) space of the language’s segments
Tools phoneticians use:
  • spectrogram readers
  • human listening
  • transcription system (usually the International Phonetic Alphabet, IPA)
Why use IPA?

SLIDE 22

Spelling is not pronunciation

Probably obvious to non-native English speakers
Some languages have cleaner spelling-sound relationships (Spanish, Korean), but even a “clean” alphabetic language (e.g. Spanish) doesn’t have a 1:1 relationship between characters and phonetic segments:
  • “corazon” and “quesadilla” have the same initial sound
English is alphabetic, but with even noisier mappings:
  • “this” vs. “thought”
  • English voicing of the interdental (tongue-between-teeth) fricative: not represented in orthography, ever.
This is why we use IPA.

SLIDE 23

More on phonetics

Lots more available on phonetics:
  • articulatory names (parts of the speech system)
  • classification system
  • learning the IPA
  • “supra-segmentals”: articulations across multiple segments (e.g., pitch shapes)
... and still not even touching the perceptual or acoustic domain

SLIDE 24

Phonology

Phonology:
  • Study of the inventory of sounds in a language
  • How sounds pattern together or contrast
Minimal pair (research tool):
  • ‘had’ vs. ‘hat’: /t/ and /d/ are contrastive in English
  • ‘steel’ vs. ‘stale’ vs. ‘stool’: /i/, /e/, /u/ are contrastive
Contrastive sounds are phonemes: minimal units of sound
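
The minimal-pair test can be sketched mechanically: two words form a minimal pair when their sound sequences have the same length and differ in exactly one position. The Python sketch below assumes a tiny hand-built lexicon with simplified transcriptions; the lexicon contents and the function name are illustrative assumptions.

```python
from itertools import combinations

# Toy lexicon: word -> phoneme sequence. The transcriptions are simplified
# assumptions for illustration.
LEXICON = {
    "had": ("h", "æ", "d"),
    "hat": ("h", "æ", "t"),
    "steel": ("s", "t", "i", "l"),
    "stale": ("s", "t", "e", "l"),
    "stool": ("s", "t", "u", "l"),
}


def minimal_pairs(lexicon):
    """Yield (word_a, word_b, sound_a, sound_b) for words of equal length
    whose transcriptions differ in exactly one segment."""
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            yield w1, w2, diffs[0][0], diffs[0][1]


PAIRS = list(minimal_pairs(LEXICON))
for w1, w2, a, b in PAIRS:
    print(f"{w1} / {w2}: /{a}/ contrasts with /{b}/")
```

Each pair found is evidence that the two differing sounds are separate phonemes in the language the lexicon was drawn from.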

SLIDE 25

Phonology (2)

Complementary distribution: two sounds appear in consistently different environments (never the same).
  • [pʰ] ‘pit’
  • [p] ‘spit’
[pʰ], [p] not phonemically different: allophones of /p/
Glossing over much more in phonology...
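
A rough way to test for complementary distribution is to collect, for each sound, the set of environments it occurs in and check whether two sounds ever share one. The Python sketch below defines an environment as the (previous sound, next sound) pair, which is a simplifying assumption, and uses a small invented set of transcriptions.

```python
from collections import defaultdict

# Toy phonetic transcriptions; "#" marks a word boundary. The data are
# simplified assumptions for illustration.
WORDS = {
    "pit": ["pʰ", "ɪ", "t"],
    "pat": ["pʰ", "æ", "t"],
    "spit": ["s", "p", "ɪ", "t"],
    "spat": ["s", "p", "æ", "t"],
    "hat": ["h", "æ", "t"],
    "had": ["h", "æ", "d"],
}


def environments(words):
    """Map each phone to the set of (previous, next) contexts it occurs in."""
    envs = defaultdict(set)
    for phones in words.values():
        padded = ["#"] + phones + ["#"]
        for i in range(1, len(padded) - 1):
            envs[padded[i]].add((padded[i - 1], padded[i + 1]))
    return envs


def in_complementary_distribution(a, b, envs):
    """True if the two phones never share an environment in the data."""
    return envs[a].isdisjoint(envs[b])


ENVS = environments(WORDS)
print(in_complementary_distribution("pʰ", "p", ENVS))  # aspirated vs. plain p
print(in_complementary_distribution("t", "d", ENVS))   # hat vs. had
```

Disjoint environments in a sample this small are suggestive at best; a phonologist would also want far more data and would ask whether the two sounds are phonetically similar before calling them allophones.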

SLIDE 26

An aside for the deaf

Sign languages (e.g. American Sign Language) have phonology as well.
  • Handshapes and gestures are essentially phonemic
  • Different sign languages have different choices about how to cluster handshapes: different phonemes
I am not an expert, but I know it’s an open research area.

SLIDE 28

Morphology

Morphology is:
  • the study of words
  • the rules (patterns) of word formation
Word: a minimal free form. Can appear in isolation in multiple positions
“The hunter pursued the bears.”
  • is “-er” a word? No. (constrained after “hunt”)
  • is “the hunter” a word? No. (not minimal)
  • wait: what is “-er” then?
Morpheme: the smallest part of a word carrying meaning
Some morphemes can’t stand alone (affixes): prefix, suffix, infix, circumfix

SLIDE 29

Syntax

Lexicon: a dictionary (form and category)
Lexical category (also “content word”). “Open class”, e.g.
  • Noun (rabbit, bicycle)
  • Verb (die, love, walk)
  • Adjective (red, tall, frivolous)
  • Adverb (often, very)
Grammatical category (also “function word”). “Closed class”, e.g.
  • Preposition (with, on, of, for)
  • Conjunction (and, or, because)
  • Determiner (our, the, this, many)
  • Auxiliary (will, can, may)

SLIDE 30

Syntax

Some words are ambiguous (especially open-class). Consider “comb”.
How to tell what category it is? Some examples:
  • meaning: acting as a person/place/thing? probably NOUN
  • inflection: if you can add ‘-ed’ or ‘-ing’ to it? probably VERB
  • distribution: if it appears after a degree word (e.g. “very”)? probably ADJ
(Computational linguistics: “part-of-speech tagging”)
Morphology ties to syntax.
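
The inflection and distribution cues can be turned into a very crude category guesser. The Python sketch below is only an illustration of the idea: the word lists, the rule order, and the extra "follows a determiner" cue are assumptions added here, and a real part-of-speech tagger learns such cues from data instead.

```python
# Small closed-class word lists, assumed for illustration.
DETERMINERS = {"a", "an", "the", "this", "their", "our", "many"}
DEGREE_WORDS = {"very", "quite", "rather"}


def guess_category(previous_word: str, word: str) -> str:
    """Guess a coarse category from distribution first, then inflection."""
    prev = previous_word.lower()
    if prev in DEGREE_WORDS:
        return "ADJ"      # distribution: follows a degree word
    if prev in DETERMINERS:
        return "NOUN"     # distribution: follows a determiner (added cue)
    if word.lower().endswith(("ed", "ing")):
        return "VERB"     # inflection: carries a verbal suffix
    return "UNKNOWN"


print(guess_category("a", "comb"))
print(guess_category("is", "combing"))
print(guess_category("men", "comb"))
```

The last call shows the limit of local cues: in “the men comb their hair” nothing in the previous word or in the suffix identifies “comb” as a verb.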

SLIDE 31

Back to morphology

Nope, not done. Not just words in the lexicon: also morphemes:
  • closed-class (function) morphemes:
      • prepositions & articles (function words)
      • inflectional morphemes: don’t change class
  • open-class morphemes:
      • usually stand-alone (nouns, verbs, etc)
      • also ‘-ly’, ‘-er’, ‘anti-’: derivational morphemes (may change class of stem)

SLIDE 32

Word formation in English

Inflectional morphemes (no class change)

  -s      third person singular present
  -ed     past tense
  -ing    progressive
  -en     past participle
  -s      plural
  -’s     possessive
  -er     comparative
  -est    superlative

Derivational affixes (class change)

  input                       result
  happy [adj] + -ness         happiness [n]
  beauty [n] + -ful           beautiful [adj]
  beautiful [adj] + -ly       beautifully [adv]
  stable [adj] + -ize         stabilize [v]
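
The class-changing behavior of derivational affixes can be sketched as a lookup from suffix to (input class, result class). The Python sketch below covers only the four suffixes in the table; the two spelling adjustments and the function names are assumptions made so that the table's examples come out spelled correctly, not general rules of English.

```python
# suffix -> (class it attaches to, class of the result), from the table above
DERIVATIONAL = {
    "-ness": ("adj", "n"),
    "-ful": ("n", "adj"),
    "-ly": ("adj", "adv"),
    "-ize": ("adj", "v"),
}


def attach(stem: str, suffix: str) -> str:
    """Join stem and suffix with two toy spelling adjustments (assumptions)."""
    ending = suffix.lstrip("-")
    if stem.endswith("y"):
        stem = stem[:-1] + "i"            # happy + ness -> happiness
    elif stem.endswith("ble") and ending.startswith("i"):
        stem = stem[:-3] + "bil"          # stable + ize -> stabilize
    return stem + ending


def derive(word: str, word_class: str, suffix: str):
    """Apply one derivational suffix, checking the input class."""
    needs, gives = DERIVATIONAL[suffix]
    if word_class != needs:
        raise ValueError(f"{suffix} attaches to [{needs}], not [{word_class}]")
    return attach(word, suffix), gives


print(derive("beauty", "n", "-ful"))
print(derive(*derive("beauty", "n", "-ful"), "-ness"))
```

Because each step returns a new word with a new class, steps can be chained, as the second call shows.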

SLIDE 33

Subtleties in morphology

Perverse cases, even in English:

recursive-ish morphology:

  input                           result
  beauty [n] + -ful + -ness       beautifulness [n]

English has roughly one (rather rude, emphatic) infix:

  input                           result
  -****ing- + Massachusetts       "Massa-****ing-chusetts"

Comp ling task: stemming, morphological analysis (v. important in other languages, e.g. Czech)
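
As a feel for what stemming involves, here is a deliberately naive suffix stripper in Python. It removes at most one of the English inflectional suffixes; the suffix ordering, the minimum stem length, and the function name are assumptions, and this is not any published stemming algorithm.

```python
# English inflectional suffixes, longest first so that "-est" is tried
# before "-s". The ordering and the length guard are assumptions.
SUFFIXES = ("'s", "ing", "est", "ed", "en", "er", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip one inflectional suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["bears", "hunted", "walking", "tallest", "pursued", "is"]:
    print(w, "->", naive_stem(w))
```

The sketch over-strips (“pursued” loses its final “e”) and cannot tell inflectional “-er” (taller) from derivational “-er” (hunter), which hints at why morphological analysis is a real task, particularly for morphologically richer languages.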

SLIDE 34

Back to syntax

Review: some words are ambiguous (“comb”): what to do?
  • meaning
  • inflection
  • distribution
Distribution could be a lot:
Constituent: grammatical unit; part of a larger unit
  • sentence = noun phrase (NP) + verb phrase (VP)
  • noun phrase (NP) = determiner + noun
  • noun is a (minimal) constituent
Note recursion is possible.

SLIDE 35

Phrases and ambiguity

How does phrase structure help with ambiguity?

[S [NP [Det the] [N men]] [VP [V comb] [NP [Det their] [N hair]]]]

[S [NP [Det the] [N men]] [VP [V share] [NP [Det a] [N comb]]]]

Note that structure resolves lexical ambiguity: whether “comb” is noun or verb
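
The two trees can be written as nested tuples, and the category the structure assigns to “comb” can then be read off the leaves. This Python sketch assumes a simple (label, children...) encoding; the encoding and the function names are illustrative choices.

```python
# A tree is (label, child, child, ...); a leaf is (part_of_speech, word).
COMB_HAIR = ("S",
             ("NP", ("Det", "the"), ("N", "men")),
             ("VP", ("V", "comb"),
              ("NP", ("Det", "their"), ("N", "hair"))))
SHARE_COMB = ("S",
              ("NP", ("Det", "the"), ("N", "men")),
              ("VP", ("V", "share"),
               ("NP", ("Det", "a"), ("N", "comb"))))


def tagged_leaves(tree):
    """Return [(word, part_of_speech), ...] in left-to-right order."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    leaves = []
    for child in children:
        leaves.extend(tagged_leaves(child))
    return leaves


def category_of(word, tree):
    """Look up the part of speech the tree assigns to a word."""
    return dict(tagged_leaves(tree))[word]


print(category_of("comb", COMB_HAIR))
print(category_of("comb", SHARE_COMB))
```

The same word form gets a different category in each tree, because only one category lets the rest of the sentence fit together.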

SLIDE 36

Syntax and structural ambiguity

Another kind of ambiguity:
The woman shot the man with the gun.
Who has the gun? (she shot him with it):

[S [NP The woman] [VP [V shot] [NP [Det the] [N man]] [PP with the gun]]]

SLIDE 37

Syntax and structural ambiguity

Another kind of ambiguity:
The woman shot the man with the gun.
Who has the gun? (he had it):

[S [NP The woman] [VP [V shot] [NP [Det the] [N man] [PP with the gun]]]]

SLIDE 38

Syntax and structural ambiguity

No ambiguity about the meaning of any word
Two different kinds of attachment for “with the gun”
  • PP attachment? messy.
  • POS? fairly easy.
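
Structural ambiguity can be counted mechanically. The Python sketch below uses a small chart parser in the CKY style to count the parse trees a toy grammar assigns to the sentence. The grammar and lexicon are assumptions written for this one example (every word has a single category, so any ambiguity is structural), and the binary rules NP → NP PP and VP → VP PP are one common way to encode the two attachment sites.

```python
from collections import defaultdict

# Toy grammar in binary (Chomsky normal form) rules: parent -> left right.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"], "woman": ["N"], "man": ["N"], "gun": ["N"],
    "shot": ["V"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count distinct parse trees with a CKY-style chart."""
    n = len(words)
    # chart[(i, j)][label] = number of trees with that label over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]


print(count_parses("the woman shot the man with the gun".split()))
print(count_parses("the woman shot the man".split()))
```

Counting the readings is the easy part; choosing the intended one needs information the grammar does not have, which is why PP attachment is the messy problem.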

SLIDE 40

Semantics

Two major areas within the study of language meaning:
  • Lexical semantics: meaning of individual morphemes
  • Compositional semantics (or “phrasal semantics”): how meaning gets built up from pieces

SLIDE 41

Lexical semantics

  • synonymy: “means (almost) the same thing”: (angry, mad), (vomit, puke)
  • homonymy: “same form, unrelated meanings”: (pass[abstain], pass[succeed])
  • antonymy: “opposite meaning”
  • hyponymy (hypernymy): A is a hyponym of B (A is a special case of B; B is a hypernym of A; B is a generalization of A)
      • poodle, dog, animal (each a hyponym of the next)
      • sprint, run, move (each a hyponym of the next)
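
Hyponymy is transitive, so it is natural to store only the direct "is a kind of" links and walk upward. The Python sketch below encodes just the two example chains; the dictionary layout and function names are assumptions for illustration.

```python
# child -> parent ("is a kind of") links, taken from the two example chains
HYPERNYM_OF = {
    "poodle": "dog",
    "dog": "animal",
    "sprint": "run",
    "run": "move",
}


def hypernym_chain(word):
    """List the word followed by its increasingly general hypernyms."""
    chain = [word]
    while chain[-1] in HYPERNYM_OF:
        chain.append(HYPERNYM_OF[chain[-1]])
    return chain


def is_hyponym(a, b):
    """True if a is a (possibly indirect) special case of b."""
    return a != b and b in hypernym_chain(a)


print(hypernym_chain("poodle"))
print(is_hyponym("poodle", "animal"))
print(is_hyponym("animal", "poodle"))
```

With one parent per word the structure is a tree; a realistic lexical resource would need to allow several parents per word.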

SLIDE 42

Compositional semantics

  • sense (intension): the meaning of a word/phrase as a function (e.g., “rabbit” is a function from items to a boolean value)
  • reference (extension): which thing(s) in the world the function (word, phrase) picks out (the set of rabbits)
Example:
  • “Jeremy”
  • “today’s linguistics tutor”
Same reference (extension), different sense
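
The sense/reference distinction maps neatly onto code: a sense is a predicate (a function from an item to a boolean), and its reference is the set of items in a given world for which the predicate is true. In this Python sketch the world, its entries, and the function names are all invented for illustration.

```python
# A toy world: the things that exist, with a few properties.
# Every name and property here is invented for illustration.
WORLD = {
    "Jeremy": {"kind": "person", "role": "linguistics tutor"},
    "Flopsy": {"kind": "rabbit", "role": None},
    "Mopsy": {"kind": "rabbit", "role": None},
}


def rabbit(item):
    """Sense of "rabbit": a function from an item to a boolean."""
    return WORLD[item]["kind"] == "rabbit"


def named_jeremy(item):
    """Sense of the name "Jeremy"."""
    return item == "Jeremy"


def todays_linguistics_tutor(item):
    """Sense of "today's linguistics tutor"."""
    return WORLD[item]["role"] == "linguistics tutor"


def extension(sense):
    """Reference of a sense: the set of items in the world it picks out."""
    return {item for item in WORLD if sense(item)}


print(extension(rabbit))
print(extension(named_jeremy) == extension(todays_linguistics_tutor))
```

The last line compares extensions, which match in this world, while the two predicates remain different functions that could pick out different people if the world were different.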

SLIDE 43

Compositional semantics

Dealing with sentences.
Sentences are boolean functions on the universe.
  • “I like cheese”
  • “I live in Seattle”
Same reference (TRUE), different sense (different function).

SLIDE 44

Summarizing

Lots of areas of linguistic research.
  • Most of these are becoming approachable computationally
  • None are very easy
But: these represent what linguists think is going on in natural language
  • not necessarily what is needed: these classes may not relate to the task at hand in computation

SLIDE 45

Emotion detection task, revisited

What can we add to the emotion detection task?
  • Class together words (let’s use POS)
  • sequence of classes might be interesting
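
One simple way to expose "sequence of classes" to a classifier is to count n-grams over the POS tags of an utterance. The Python sketch below is an assumption about how such features might be built; the function name, the underscore-joined feature names, and the example tag sequence are illustrative and are not taken from the lab materials.

```python
from collections import Counter


def tag_ngrams(tags, n=2):
    """Count contiguous n-grams over a sequence of POS tags."""
    grams = zip(*(tags[i:] for i in range(n)))
    return Counter("_".join(gram) for gram in grams)


# Hypothetical tag sequence using Penn Treebank style tag names.
example = ["DT", "NN", "VBZ", "DT", "NN"]
print(tag_ngrams(example))
print(tag_ngrams(example, n=3))
```

Tag n-grams generalize across vocabulary: two sentences with no words in common can still share the same sequence of classes.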

SLIDE 46

The lab

1. Read the datafiles; extract text, write out datafile.tok
2. Invoke the Ratnaparkhi tagger on the tokenized text: datafile.maxHpos
3. Read the .maxHpos file and pull out just the tags (clean up the punctuation so it doesn’t break BoosTexter). Create datafile.pos, which must end with space-comma
4. Paste together datafile.pos with datafile.orig
5. Rerun the emotion detection, but this time with the extra sequence information
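
Steps 3 and 4 might look roughly like the Python sketch below (the lab itself suggests Perl). It assumes the tagger writes one sentence per line as space-separated `word_TAG` tokens, that stripping non-letters from tags is enough to keep the classifier input well-formed, and that the two files are joined line by line with a single space; all three are assumptions, and the function names are hypothetical.

```python
import re


def tags_from_tagged_line(line: str) -> str:
    """Turn 'word_TAG word_TAG ...' into a tag-only field ending in ' ,'."""
    tags = []
    for token in line.split():
        # Split on the LAST underscore so words containing "_" survive.
        _, _, tag = token.rpartition("_")
        # Keep letters only; a tag that was pure punctuation (such as "."
        # or ",") becomes PUNCT. Note this also merges tags like "PRP$"
        # into "PRP".
        clean = re.sub(r"[^A-Za-z]", "", tag)
        tags.append(clean or "PUNCT")
    return " ".join(tags) + " ,"


def paste(pos_lines, orig_lines):
    """Join matching lines of the .pos and .orig data, .pos field first."""
    return [f"{pos} {orig}" for pos, orig in zip(pos_lines, orig_lines)]


print(tags_from_tagged_line("I_PRP like_VBP cheese_NN ._."))
```

Check the actual output of the tagger and the input format expected by BoosTexter before relying on any of these conventions.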

SLIDE 47

The lab’s goals

  • Practice Perl
  • Practice practical scripting (Perl is great, but not always the answer)
  • Get comfortable with a new tool (the Ratnaparkhi tagger; very easy)
