
17/02/09 Susen Rabold 1

Getting Computers to Process Language II

Human Communication 1 Lecture 15


Computer applications of language technology (a)

  • How can we apply models of the kind shown so far in automatically processing language?
  • How is that related to current engineering practice?
  • What can we learn from this about humans?


Computer applications of language technology (b)

Language-based computer applications are of growing importance both for

  • improving the effective use of information
  • broadening the base of computer literacy

Major problems will arise in exploiting background knowledge in the same way as humans do.


Database query (a)

  • Which road links Edinburgh to Penicuik? might be represented as:

      Edinburgh(y)
      Penicuik(z)
      road(x)
      Link(x, y, z)
      x


Database query (b)

  • We can then investigate our model, which could be a database of UK roads, to see if there is such an x.

Possible problems:

  • syntactic coverage
  • semantic representations of e.g. plurals, times
  • disambiguation
  • working out what an appropriate response is.
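
As a rough illustration of checking a query representation against a model, the sketch below searches a toy database for values of x, y and z that satisfy every condition of the query. The road and place identifiers (r1, r2, e, p, g) and the function names are invented for this example; they are not real map data and not part of the lecture.

```python
from itertools import product

# Toy model: every fact here is invented for illustration.
FACTS = {
    "road": {("r1",), ("r2",)},
    "Edinburgh": {("e",)},
    "Penicuik": {("p",)},
    "Glasgow": {("g",)},
    "link": {("r1", "e", "p"), ("r2", "e", "g")},
}

# The query: discourse referents plus conditions over them.
REFERENTS = ["x", "y", "z"]
CONDITIONS = [
    ("Edinburgh", ("y",)),
    ("Penicuik", ("z",)),
    ("road", ("x",)),
    ("link", ("x", "y", "z")),
]


def entities(facts):
    """Collect every entity mentioned anywhere in the model."""
    return sorted({e for tuples in facts.values() for t in tuples for e in t})


def answers(facts, referents, conditions, target):
    """Return all values of `target` for which every condition holds."""
    found = set()
    for values in product(entities(facts), repeat=len(referents)):
        binding = dict(zip(referents, values))
        if all(tuple(binding[a] for a in args) in facts[pred]
               for pred, args in conditions):
            found.add(binding[target])
    return sorted(found)


print(answers(FACTS, REFERENTS, CONDITIONS, "x"))  # ['r1']
```

Trying every possible binding is only workable for a model this small; the point is just that answering the question amounts to finding an x that makes all the conditions true.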


Machine translation (a)

  • One way of doing this is to have semantic rules which map into the same DRSs from different languages.

      N → Hund with symbol “dog”
      V0 → bellen with symbol “bark”
      Det → ein with the same semantic rule as for “a”

  • So, ein Hund bellt will have the same semantic representation as “a dog barks”.
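
A minimal sketch of this idea, assuming two hypothetical mini-lexicons and a single "Det N V0" sentence pattern: each word maps to a category and a language-independent symbol, and the interpretation routine builds the same toy DRS whichever language the sentence came from. The dictionary layout and the function name interpret are assumptions made for this example.

```python
# Hypothetical mini-lexicons: surface word -> (category, semantic symbol).
GERMAN = {"ein": ("Det", "a"), "Hund": ("N", "dog"), "bellt": ("V0", "bark")}
ENGLISH = {"a": ("Det", "a"), "dog": ("N", "dog"), "barks": ("V0", "bark")}


def interpret(sentence, lexicon):
    """Build a toy DRS (referents, conditions) for a 'Det N V0' sentence."""
    entries = [lexicon[word] for word in sentence.split()]
    categories = [category for category, _ in entries]
    if categories != ["Det", "N", "V0"]:
        raise ValueError(f"unsupported structure: {categories}")
    (_, det), (_, noun), (_, verb) = entries
    if det != "a":
        raise ValueError("only the indefinite determiner is handled here")
    # The indefinite introduces one referent; noun and verb constrain it.
    referents = ("x",)
    conditions = frozenset({(noun, "x"), (verb, "x")})
    return referents, conditions


german_drs = interpret("ein Hund bellt", GERMAN)
english_drs = interpret("a dog barks", ENGLISH)
print(german_drs == english_drs)  # True
```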


Machine translation (b)

  • We can then define a routine to produce an English sentence on the basis of a DRS.
  • Problems:
      – determining a set of conditions to translate to
      – different languages seem to carve up the space of words differently
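
The generation step might look roughly like the sketch below, which turns a one-referent toy DRS into an English sentence. The DRS encoding (a tuple of referents plus a set of predicate/referent pairs), the tiny lexicon and the function name generate are all assumptions for illustration. The error branch marks where the first problem above shows up: the routine has to decide which conditions to put into words.

```python
# Hypothetical English lexicon, keyed by semantic symbol.
NOUNS = {"dog": "dog"}
VERBS = {"bark": "barks"}


def generate(drs):
    """Produce an English 'a N V' sentence from a one-referent toy DRS."""
    referents, conditions = drs
    if len(referents) != 1:
        raise ValueError("this sketch handles exactly one discourse referent")
    ref = referents[0]
    symbols = [pred for pred, arg in conditions if arg == ref]
    nouns = [NOUNS[s] for s in symbols if s in NOUNS]
    verbs = [VERBS[s] for s in symbols if s in VERBS]
    if len(nouns) != 1 or len(verbs) != 1:
        raise ValueError("cannot decide which conditions to translate")
    return f"a {nouns[0]} {verbs[0]}"


drs = (("x",), {("dog", "x"), ("bark", "x")})
print(generate(drs))  # a dog barks
```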


Speech understanding and synthesis

  • Add in rules for grouping sounds into words.
  • Problems:
      – phonetic ambiguity, which interacts multiplicatively with other forms of ambiguity
      – modelling the complex effects of speech production
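
To make "multiplicatively" concrete, the sketch below enumerates the readings of one utterance when each level of ambiguity multiplies the alternatives from the level before. The two word sequences are a familiar textbook-style pair of near-homophones. The parse and sense labels are placeholders, and the sketch assumes, unrealistically, that every word sequence has the same number of parses and senses.

```python
from itertools import product

# Phonetic level: two ways of grouping the same sounds into words.
word_sequences = ["recognise speech", "wreck a nice beach"]
# Placeholder alternatives at two further levels (assumed counts).
parses = ["parse-1", "parse-2", "parse-3"]
senses = ["sense-1", "sense-2"]

# Every combination is a distinct reading the system must consider.
readings = list(product(word_sequences, parses, senses))
print(len(readings))  # 12
```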


Limits (a)

  • To date, computer applications in processing natural language work but . . .
  • in limited domains: simplifying problems in interpretation, e.g. limiting ambiguity.
  • with restrictions on the kind of language/speech used
      – speech systems in which one must leave gaps between words
      – what looks or sounds reasonable to you may be rejected by the system


Limits (b)

  • Or to a limited extent:
      – in document processing, one may try just to extract key information rather than understand the whole of a document.


An engineering solution? (a)

  • Many approaches to “language engineering” adopt statistical methods. For example, to do machine translation:
  • Get a bilingual collection of texts (e.g. the Canadian Hansard)
  • Compute the frequency with which pairs of English and French words appear in similar positions in the text


An engineering solution? (b)

  • To translate, examine the source text and find the target text that is the best fit. The system “learns” the correspondences between English and French words. Various techniques can be used to improve the quality of the output.
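
The counting step can be sketched as follows, under strong simplifying assumptions: the four sentence pairs are invented stand-ins for a bilingual corpus, and "similar position" is reduced to "same word index in an aligned sentence pair". Real systems use far more careful alignment models; the function names here are hypothetical.

```python
from collections import Counter, defaultdict

# Invented sentence-aligned pairs standing in for a bilingual corpus.
PAIRS = [
    ("the house", "la maison"),
    ("the dog", "le chien"),
    ("a house", "une maison"),
    ("a dog", "un chien"),
]


def cooccurrence(pairs):
    """Count how often each English word shares a position with each French word."""
    counts = defaultdict(Counter)
    for english, french in pairs:
        for e_pos, e_word in enumerate(english.split()):
            for f_pos, f_word in enumerate(french.split()):
                if e_pos == f_pos:  # crude notion of "similar position"
                    counts[e_word][f_word] += 1
    return counts


def best_match(counts, word):
    """Return the most frequent French partner, or None for an unseen word."""
    candidates = counts.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


counts = cooccurrence(PAIRS)
print(best_match(counts, "house"))  # maison
print(best_match(counts, "dog"))    # chien
```

Words such as "the" and "a" end up with tied counts across several French forms in this tiny sample, which hints at why a larger corpus and better techniques are needed to improve output quality.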

Views on statistical methods (a)

  • A statistical (or non-symbolic) approach means that we don’t have to characterize the different kinds of knowledge we’ve identified in humans. More and more sophisticated statistical techniques are being applied to problems in language processing.
  • One drawback with the statistical method from the perspective of cognitive science is that once you’ve derived your set of statistics it’s difficult to extract general rules from them.


Views on statistical methods (b)

  • “attendu correlates with expected with factor 85%”
  • Machine translation uses statistical models
  • http://babelfish.yahoo.com/?fr=bf-res
  • Hidden Markov Model: a statistical model used in NLP, which can be considered the simplest dynamic Bayesian network.
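
As an illustration of how a Hidden Markov Model is used in NLP, the sketch below tags a three-word sentence with word categories using the Viterbi algorithm, which finds the most probable sequence of hidden states. All probabilities are invented for the example, and the three-state tag set is an assumption.

```python
STATES = ["Det", "N", "V"]

# Invented probabilities, for illustration only.
START = {"Det": 0.8, "N": 0.1, "V": 0.1}
TRANS = {
    "Det": {"Det": 0.05, "N": 0.9, "V": 0.05},
    "N": {"Det": 0.1, "N": 0.1, "V": 0.8},
    "V": {"Det": 0.5, "N": 0.3, "V": 0.2},
}
EMIT = {
    "Det": {"a": 0.9, "dog": 0.05, "barks": 0.05},
    "N": {"a": 0.05, "dog": 0.8, "barks": 0.15},
    "V": {"a": 0.05, "dog": 0.15, "barks": 0.8},
}


def viterbi(words):
    """Return the most probable state sequence for known words."""
    # Each column maps a state to (best path probability, previous state).
    best = [{s: (START[s] * EMIT[s][words[0]], None) for s in STATES}]
    for word in words[1:]:
        column = {}
        for s in STATES:
            prob, prev = max((best[-1][p][0] * TRANS[p][s], p) for p in STATES)
            column[s] = (prob * EMIT[s][word], prev)
        best.append(column)
    # Pick the best final state, then follow the back-pointers.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


print(viterbi("a dog barks".split()))  # ['Det', 'N', 'V']
```

A word missing from the emission table raises a KeyError in this sketch; handling unseen words is one place where smoothing becomes necessary.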


Symbolic or non-symbolic? (a)

Possible responses:

  • There are no general rules in this sense; everything is probabilistic. “Connectionism” and neural networks occupy this extreme, particularly if one disputes the claim that there are mental representations.


Symbolic or non-symbolic? (b)

  • It’s all symbolic; what we see (or model) as statistical behaviour is the result of complex interactions between different sources of knowledge not yet understood.
  • Some aspects of the methodology of linguistics lead towards this position.
  • or . . .

Symbolic or non-symbolic?

  • It’s a mixture; some aspects of processing are statistically based, others symbolically. A wishy-washy view or a golden mean?
  • It seems likely that computation at the level of neurons is non-discrete; neurons fire more rapidly as their inputs excite them more.
  • On the other hand, aspects of linguistic processing seem more discrete: we either hear a sound as, say, a “b”, or we don’t.


Hybrid Systems (a)

Using a combination of linguistic knowledge and statistics helps one:

  • acquire a statistical model with sparse training data (via more accurate smoothing)
  • estimate which features will be most informative during the learning phase
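
One way to picture the first point is class-based smoothing, sketched below with invented counts. A word never seen in training gets probability zero from raw relative frequencies, but if we know its word class we can share probability mass among the members of that class. The word classes, counts and function names are assumptions for this example, and this is only one of many possible smoothing schemes.

```python
from collections import Counter
from fractions import Fraction

# Linguistic knowledge (assumed): the class of each word in the vocabulary.
WORD_CLASS = {"dog": "N", "cat": "N", "fox": "N", "barks": "V", "sleeps": "V"}
# Sparse training data (invented): "fox" and "sleeps" were never observed.
COUNTS = Counter({"dog": 3, "cat": 1, "barks": 2})
TOTAL = sum(COUNTS.values())


def mle(word):
    """Raw relative frequency: zero for any unseen word."""
    return Fraction(COUNTS[word], TOTAL)


def class_smoothed(word):
    """P(class) * P(word | class), with add-one smoothing inside the class."""
    cls = WORD_CLASS[word]
    members = [w for w, c in WORD_CLASS.items() if c == cls]
    class_count = sum(COUNTS[w] for w in members)
    p_class = Fraction(class_count, TOTAL)
    p_word_given_class = Fraction(COUNTS[word] + 1, class_count + len(members))
    return p_class * p_word_given_class


print(mle("fox"))             # 0
print(class_smoothed("fox"))  # 2/21
```

The smoothed estimates still sum to one over the vocabulary, and the unseen noun inherits some probability from the fact that nouns as a class were frequent in the data.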


Hybrid Systems (b)

From a cognitive science perspective, we have seen that we want these different levels of description. That is, we want both

  • explicit rules that capture some aspects of humans’ knowledge of language, e.g. intuitions about meaning, and
  • to be able to express information about the relative frequency with which people use certain words, or linguistic constructions (a grammar only says what’s possible, not what’s frequent).


Summary

Today we have seen:

  • how to get computers to do part of the job of processing language
  • difficulties that arise in this
  • applications in language technology
  • the debate between statistical and symbolic approaches.