Entropy minimization in emergent languages
Eugene Kharitonov, Rahma Chaabouni, Diane Bouchacourt, Marco Baroni
Setup: signalling game (Lewis, 1969). Two deterministic neural agents, Sender and Receiver, solve a task collaboratively: Sender sends a discrete one-symbol message, and Receiver observes the message and performs an action.
[Diagram: Sender receives Sender’s input and emits a message; Receiver receives the message together with Receiver’s input and produces Receiver’s output. The loss for some task depends on Receiver’s action.]
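As a concrete sketch of one round of this game, a minimal PyTorch version follows; the architectures, hidden sizes, and the 256-symbol vocabulary are illustrative assumptions, not the exact models from the paper.

```python
# Minimal sketch of one round of the signalling game.
# Architectures and sizes are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = 256  # message = one discrete symbol from this vocabulary

class Sender(nn.Module):
    def __init__(self, n_inputs, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU(),
                                 nn.Linear(hidden, VOCAB))
    def forward(self, x):
        # Greedy symbol choice shown here; training instead uses
        # relaxations / REINFORCE-style estimators (see later slides).
        return self.net(x).argmax(dim=-1)

class Receiver(nn.Module):
    def __init__(self, n_inputs, n_actions, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, hidden)
        self.net = nn.Sequential(nn.Linear(hidden + n_inputs, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
    def forward(self, message, own_input):
        # Receiver conditions on both the message and its own input.
        h = torch.cat([self.embed(message), own_input], dim=-1)
        return self.net(h)  # scores over possible actions

sender, receiver = Sender(8), Receiver(8, 2 ** 8)
x_s = torch.randint(0, 2, (1, 8)).float()   # Sender's input
x_r = x_s.clone(); x_r[:, 3:] = 0.0         # Receiver's (partially masked) input
action_scores = receiver(sender(x_s), x_r)  # Receiver's output
```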
Why study the emergent protocol? Motivated by: building agents that can communicate with humans (Mikolov et al., 2016), and studying the emergence of language itself (Hurford, 2014).
Suppose Receiver has only a part of the information required to perform the task, while Sender has all of the available information. There are two opposite scenarios of successful communication: Sender can transmit its entire input, or it can transmit only the minimal information that Receiver is missing.
We measure the complexity of the protocol by its entropy, i.e., the entropy of the distribution of messages.
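A minimal sketch of such a measurement, assuming we estimate the entropy from an observed sample of messages (function name is mine):

```python
from collections import Counter
from math import log2

def message_entropy(messages):
    """Plug-in estimate of H(m), in bits, from a sample of messages."""
    counts = Counter(messages)
    n = len(messages)
    return -sum(c / n * log2(c / n) for c in counts.values())

# A protocol that only ever uses two messages carries at most 1 bit:
print(message_entropy(["BLOP", "BLIP", "BLOP", "BLOP"]))  # ~0.81 bits
```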
Processing its input, the deterministic Sender cannot increase entropy: the message m is a function of Sender’s input i_s, and applying a function does not increase entropy, so H(m) ≤ H(i_s). Conditioning does not increase entropy either, and when the task is solved, Receiver’s output is equal to the ground truth l; given Receiver’s input i_r, l is then a function of m, so the messages must carry at least the information about l that i_r does not provide: H(m) ≥ H(l | i_r). Hence the entropy of the messages is bounded between the amount of information Receiver is missing and the entropy of Sender’s inputs.
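Spelled out as a chain of inequalities (notation mine: i_s, i_r for the two inputs, m for the message, l for the ground truth):

```latex
% Sender is deterministic, so m = S(i_s); a function of a random
% variable has no more entropy than the variable itself:
H(m) \le H(i_s).
% Conditioning does not increase entropy:
H(m \mid i_r) \le H(m).
% When the task is solved, Receiver's output R(m, i_r) equals the
% ground truth l, so given i_r, l is a function of m:
H(l \mid i_r) = H(R(m, i_r) \mid i_r) \le H(m \mid i_r).
% Chaining the inequalities:
H(l \mid i_r) \;\le\; H(m) \;\le\; H(i_s).
```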
Efficiency pressures are frequently observed in language and other biological communication systems (Ferrer i Cancho et al., 2013; Gibson et al., 2019; Zaslavsky et al., 2018, 2019).
Would something similar happen when two agents are communicating with each other?
First game: Sender observes an 8-bit binary vector; Receiver needs to perform the task of reconstructing the full vector, but sees a version of it with some dimensions masked.

Example, 5 dimensions masked: Sender’s input 1 0 0 1 1 0 1 1 → message B L O P → Receiver’s input 1 0 0 * * * * *, Receiver’s output [1 0 0 1 1 0 1 1].

Example, 1 dimension masked: Sender’s input 1 0 0 1 1 0 1 1 → message B L I P → Receiver’s input 1 0 0 1 1 0 1 *, Receiver’s output [1 0 0 1 1 0 1 1].
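A sketch of the data generation for this game; the convention of zeroing the masked dimensions, masking the trailing bits, and the function name are my assumptions.

```python
import torch

def bit_game_batch(batch_size, n_bits=8, n_masked=5):
    """Sender sees the full bit vector; Receiver sees it with the last
    `n_masked` dimensions hidden; the target is the full vector."""
    sender_input = torch.randint(0, 2, (batch_size, n_bits)).float()
    receiver_input = sender_input.clone()
    receiver_input[:, n_bits - n_masked:] = 0.0  # hide masked dimensions
    # Target: the vector encoded as an integer in [0, 2**n_bits),
    # so Receiver can output it as a single class.
    powers = 2 ** torch.arange(n_bits - 1, -1, -1)
    target = (sender_input.long() * powers).sum(dim=-1)
    return sender_input, receiver_input, target
```

With n_masked dimensions hidden, the lower bound on the message entropy is exactly n_masked bits.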
Second game: Sender observes a pair of digit images forming a two-digit number (00 … 99, uniformly sampled from MNIST train data); Receiver must output the number’s class.

Example, 100 classes (every number is its own class): Sender’s input: images of 9 and 6 → message B E E P → Receiver’s output: [class 96].

Example, 4 classes (the 100 numbers are grouped into 4 classes): Sender’s input: a two-digit image pair → message T A D A → Receiver’s output: [class 0].
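A sketch of the label construction; the modulo grouping rule is an assumption (for the entropy argument only the number of classes matters, since the lower bound is log2 of the class count).

```python
import random

N_CLASSES = 4  # e.g. 100 or 4; lower bound = log2(N_CLASSES) bits

def sample_example(mnist_by_digit):
    """mnist_by_digit[d] is a list of MNIST train images of digit d."""
    number = random.randrange(100)            # 00 ... 99, uniform
    tens, units = divmod(number, 10)
    images = (random.choice(mnist_by_digit[tens]),
              random.choice(mnist_by_digit[units]))
    label = number % N_CLASSES                # assumed grouping rule
    return images, label
```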
We experiment with two ways of training through the discrete channel: the Gumbel-Softmax relaxation and the stochastic computation graph / REINFORCE estimator (Schulman et al., 2015). In both cases, the agents succeed in solving the task.
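A sketch of the relaxed channel using PyTorch’s built-in Gumbel-Softmax; the temperature value and the straight-through flag are illustrative choices.

```python
import torch
import torch.nn.functional as F

def send(logits, tau=1.0, hard=True):
    """Differentiable sample of a one-hot message from Sender's logits.
    Lower tau (with hard=True) makes the relaxed samples closer to
    truly discrete one-hot symbols."""
    return F.gumbel_softmax(logits, tau=tau, hard=hard)

message = send(torch.randn(1, 256))  # one-hot over a 256-symbol vocabulary
```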
[Plot: entropy of the messages vs. how much information Receiver needs to perform the task; reference lines mark the lower bound on the information required for solving the task and the degenerate case of non-communication.]
First game: upper bound on the entropy: 8 bits (the entropy of Sender’s 8-bit input).
Second game: upper bound on the entropy: 10 bits. Only log2 of the number of classes is necessary: ≈6.6 bits for 100 classes, 2 bits for 4 classes.
The entropy of the protocol consistently approaches the lower bound while the agents still solve the task. The level of discreteness of the channel impacts the tightness of this approximation. The discrete channel has useful properties.
Efficiency pressures arise in artificial discrete communication systems
Discrete protocols have useful properties
Agents would not develop complex languages (protocols) unless that is absolutely required.