Towards a Computational History of the ACL: 1980–2008
Ashton Anderson, Dan McFarland, Dan Jurafsky
Stanford University
Intro + Motivation
Simple data-driven methodology for computational history of science
What are the natural
T.L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004.
David Hall, Daniel Jurafsky, and Christopher D. Manning. Studying the history of ideas using topic models. EMNLP, 2008.
history through large scale text mining. CIKM 2011.
[Figure: columns labeled Topic 1, Topic 2, Topic 3, Topic 4, ...]
ACL Anthology
Threshold (> 0.1)
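The threshold step can be sketched as below. This is a minimal sketch that assumes the 0.1 threshold is applied to each paper's topic proportions, which the slide does not state. The names `topics_above_threshold` and `paper_topics` are illustrative, and the numbers are made up.

```python
def topics_above_threshold(topic_weights, threshold=0.1):
    """Return the indices of topics whose weight is strictly above the threshold."""
    return [k for k, w in enumerate(topic_weights) if w > threshold]


# Hypothetical per-paper topic proportions (not data from the talk).
paper_topics = {
    "paper_a": [0.70, 0.25, 0.05, 0.00],
    "paper_b": [0.08, 0.02, 0.45, 0.45],
}

# Assumption: a paper counts as belonging to every topic above the threshold.
assigned = {pid: topics_above_threshold(w) for pid, w in paper_topics.items()}
```

Under this reading a paper can belong to several topics at once, or to none if no weight clears the threshold.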
[Figure: yearly axis, 1980 through 2007]
[Figure: axis marks at 1980 and 1993]
— Topics only need to be similar in how people move in and out of them
— Not necessarily similar in content
First compute how people moved in and out of all topics in adjacent time windows:
[Figure: flow matrix with rows and columns labeled Topic 1, Topic 2, Topic 3, Topic 4, ...]
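One way to picture the flow computation is the sketch below. It assumes each window maps a person to the set of topics they worked on, and that flow from topic i to topic j is a raw count of people in topic i in the earlier window and topic j in the later one. The slides do not say whether counts are normalized, so `flow_matrix` and the sample data are illustrative only.

```python
def flow_matrix(earlier, later, n_topics):
    """flow[i][j] counts people in topic i (earlier window) and topic j (later window)."""
    flow = [[0] * n_topics for _ in range(n_topics)]
    for person, src_topics in earlier.items():
        # People absent from the later window contribute no flow.
        for i in src_topics:
            for j in later.get(person, ()):
                flow[i][j] += 1
    return flow


# Hypothetical people and topic memberships for two adjacent windows.
earlier = {"ann": {0}, "bob": {0, 1}, "cat": {2}}
later = {"ann": {1}, "bob": {1}, "dan": {2}}

flow = flow_matrix(earlier, later, n_topics=3)
```

Here two people move from topic 0 to topic 1, one stays in topic 1, and nobody who was in topic 2 appears in the later window.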
Then, a flow profile for topic i is the concatenation of the ith row and ith column of each matrix:
[Figure: flow matrices for adjacent time-window pairs; surviving labels: 1980, 1981, 1982, 1983 and 1983-85, 1984-86, 1985-87, 1986-88]
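The flow profile defined above can be sketched as follows. This assumes each flow matrix is a square list of lists indexed by topic; the function name `flow_profile` and the two small matrices are illustrative, not data from the talk.

```python
def flow_profile(matrices, i):
    """Concatenate the ith row and ith column of every flow matrix."""
    profile = []
    for m in matrices:
        profile.extend(m[i])                 # ith row
        profile.extend(row[i] for row in m)  # ith column
    return profile


# Two hypothetical 2x2 flow matrices, one per pair of adjacent windows.
m1 = [[0, 2],
      [1, 3]]
m2 = [[4, 0],
      [5, 6]]

profile_0 = flow_profile([m1, m2], 0)
profile_1 = flow_profile([m1, m2], 1)
```

With T topics and W window pairs, each profile has 2 * T * W entries, so all topics end up with vectors of the same length.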
Using these flow profiles we can easily compute similarity between topics, and thus group topics into clusters
Our optimal cluster solution groups the 73 topics into 9 clusters:
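The slides do not name the similarity measure or the clustering algorithm. The sketch below swaps in cosine similarity and a simple average-linkage agglomerative procedure purely for illustration; `cosine`, `cluster_profiles`, and the toy profiles are assumptions, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_profiles(profiles, n_clusters):
    """Merge the most similar pair of clusters until n_clusters remain."""
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                sims = [cosine(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical flow profiles for four topics.
profiles = [
    [5, 0, 1, 0],
    [4, 0, 1, 0],
    [0, 3, 0, 2],
    [0, 4, 0, 2],
]
clusters = cluster_profiles(profiles, n_clusters=2)
```

In the talk's setting the input would be 73 profiles and the target 9 clusters; how the number 9 was chosen as optimal is not stated on the slide.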
[Figure panels: 1980–83 to 1984–88; 1986–88 to 1989–91; 1989–91 to 1992–94]
Finally, we define flow between clusters to be the average flow between topics in those clusters
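The averaging step can be sketched as below. It assumes a topic-level flow matrix and non-empty clusters given as lists of topic indices; `cluster_flow` and the numbers are illustrative. Whether same-topic (diagonal) entries count toward within-cluster flow is not stated, so this sketch includes them.

```python
def cluster_flow(flow, clusters):
    """out[a][b] is the mean of flow[i][j] over topics i in cluster a and j in cluster b."""
    k = len(clusters)
    out = [[0.0] * k for _ in range(k)]
    for a, src in enumerate(clusters):
        for b, dst in enumerate(clusters):
            pairs = [flow[i][j] for i in src for j in dst]
            out[a][b] = sum(pairs) / len(pairs)
    return out


# Hypothetical topic-level flow between four topics in two clusters.
topic_flow = [
    [0, 2, 4, 0],
    [2, 0, 0, 4],
    [1, 1, 0, 6],
    [1, 1, 6, 0],
]
topic_clusters = [[0, 1], [2, 3]]

between = cluster_flow(topic_flow, topic_clusters)
```

Averaging rather than summing keeps large clusters from dominating simply because they contain more topics.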
[Figure panels: 2002–04 to 2005–07; 1992–94 to 1995–98]