Sign Clustering and Topic Extraction in Proto-Elamite

Logan Born¹, Kate Kelley², Nishant Kambhatla¹, Carolyn Chen¹, Anoop Sarkar¹

¹ Natural Language Laboratory, School of Computing Science, Simon Fraser University
² Department of Classical, Near Eastern, and Religious Studies, University of British Columbia

7 June 2019

Outline
◮ Introduction to Proto-Elamite
◮ Experiments
◮ Sign Clustering
◮ n-Gram Frequency
◮ LDA Topic Modeling
◮ Summary
◮ References

Introduction

Proto-Elamite Overview

&P008016 = MDP 06, 217
#atf: lang qpc
@tablet
@obverse
1. |M218+M218| , # header
2. M056~f M288 , 1(N14) 3(N01)
3. |M054+M384~i+M054~i| M365 , 5(N01)
4. M111~e , 4(N14) 1(N01) 3(N39B)
5. M365 , 1(N14) 3(N01)
6. M075~g , 1(N14) 3(N01)
7. M387~l M348 , 1(N14) 3(N01)
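The transliteration above can be split into sign tokens with a few lines of Python. This is only a minimal sketch, not the authors' released code: the function name `parse_entry` is hypothetical, and the numeric-sign pattern (a count followed by a parenthesised N-sign, as in 1(N14)) and the tilde-joined variant notation (as in M056~f) are assumptions read off this one sample.

```python
import re

# Assumed shape of a numeric sign, based only on the sample: count(N-sign).
NUMERIC = re.compile(r"^\d+\(N\d+[A-Z]?\)$")

def parse_entry(line):
    """Split one numbered ATF line into (non-numeric signs, numeric signs)."""
    body = re.sub(r"^\d+\.\s*", "", line)   # drop the leading line number
    body = body.split("#", 1)[0]             # drop a trailing '# ...' annotation
    tokens = [t for t in body.split() if t != ","]
    signs = [t for t in tokens if not NUMERIC.match(t)]
    numbers = [t for t in tokens if NUMERIC.match(t)]
    return signs, numbers

sample = [
    "1. |M218+M218| , # header",
    "2. M056~f M288 , 1(N14) 3(N01)",
    "3. |M054+M384~i+M054~i| M365 , 5(N01)",
    "4. M111~e , 4(N14) 1(N01) 3(N39B)",
]
entries = [parse_entry(line) for line in sample]
```

Complex graphemes such as |M054+M384~i+M054~i| contain no spaces, so whitespace splitting keeps each one as a single token.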

[Figure: proto-Elamite and proto-cuneiform forms of the numeric signs N08A, N01, N14, N34, N48, N45, N50]

Proto-Elamite Data
◮ Corpus transcribed by CDLI
◮ 1399 texts containing ≥ 1 readable non-numeric sign
◮ Average tablet length is 27 signs (10 non-numeric)
◮ 1623 sign types
  ◮ 49 numeric
  ◮ 287 basic non-numeric
  ◮ 1087 variants
  ◮ 249 complex graphemes

Data Exploration in Proto-Elamite
◮ Goal: Extract information to assist human decipherment experts
◮ Hierarchical clustering of signs
◮ n-gram frequencies
◮ LDA topic modelling

Contributions
◮ Rediscover results from manual investigation of the corpus
◮ Highlight novel patterns to inform future decipherment attempts
◮ Provide code for other groups to work with proto-Elamite

Sign Clustering

Sign Clustering Methodology
Goal:
◮ Group signs with similar distributions.
Three different clustering techniques:
◮ Co-occurrence vectors (left and right neighbors)
◮ Hidden Markov Model (HMM) emission probabilities
◮ Brown clustering
Reduce impact of noise by finding common groupings across all three techniques.
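The first technique, co-occurrence vectors over left and right neighbors, might be sketched as follows. The slides do not give the similarity measure or the linkage used for the hierarchical clustering, so cosine similarity here is an assumption, and the toy entries are invented placeholders rather than corpus data. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def neighbor_vectors(entries):
    """Map each sign to counts of its left ('L', x) and right ('R', x) neighbors."""
    vectors = defaultdict(Counter)
    for signs in entries:
        for i, sign in enumerate(signs):
            if i > 0:
                vectors[sign][("L", signs[i - 1])] += 1
            if i + 1 < len(signs):
                vectors[sign][("R", signs[i + 1])] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Invented toy entries: X and Y both occur between A and B.
toy = [["A", "X", "B"], ["A", "Y", "B"], ["C", "X", "D"]]
vecs = neighbor_vectors(toy)
similar = cosine(vecs["X"], vecs["Y"])      # shared contexts, high similarity
dissimilar = cosine(vecs["X"], vecs["A"])   # no shared contexts
```

A hierarchical clustering could then be run over the pairwise distances 1 - cosine, so that signs appearing in the same contexts end up in the same subtree.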

Sign Clustering Results
Rediscover results from manual work:
◮ Groups variants believed to have similar/identical function
◮ Groups “syllabic” signs (Dahl 2019, Desset 2016, Meriggi 1971)
[Figure: cluster panels for the Neighbor, HMM, and Brown methods]

Sign Clustering Results
Novel grouping: signs resembling numerals or written with rounded stylus.
[Figure: cluster panels for the Neighbor, HMM, and Brown methods]
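The groupings reported here are the ones that recur across the Neighbor, HMM, and Brown panels. One simple way to find such common groupings is to keep only the sign pairs that every clustering places together. The slides do not say how agreement was computed, so this pairwise intersection is an assumption, and the sign names and cluster labels below are hypothetical placeholders.

```python
from itertools import combinations

def co_clustered_pairs(assignment):
    """Return the set of sign pairs that share a cluster label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }

def consensus_pairs(*assignments):
    """Pairs grouped together by every one of the given clusterings."""
    pair_sets = [co_clustered_pairs(a) for a in assignments]
    return set.intersection(*pair_sets)

# Hypothetical flat cluster labels for four placeholder signs.
neighbor = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
hmm = {"s1": 2, "s2": 2, "s3": 2, "s4": 3}
brown = {"s1": "a", "s2": "a", "s3": "b", "s4": "b"}
agreed = consensus_pairs(neighbor, hmm, brown)
```

Only s1 and s2 are grouped by all three labelings, so a pairing produced by noise in a single method drops out of the consensus.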

n-Gram Frequency

n-Gram Frequency Methodology
Goal:
◮ Identify important (i.e. frequently repeated) signs and phrases.
◮ See signs in wider context.
Did not count n-grams containing numeric signs.
◮ Want to focus on undeciphered signs.
◮ Do not want n-grams spanning multiple entries.
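The counting rule described above can be sketched as follows: n-grams are collected one entry at a time, so none spans two entries, and any n-gram containing a numeric sign is skipped. The function name `entry_ngrams` and the numeric-sign pattern are assumptions; the first two entries are taken from the sample tablet, and the third is invented to produce a repeat.

```python
import re
from collections import Counter

# Assumed shape of a numeric sign: count(N-sign), e.g. 1(N14).
NUMERIC = re.compile(r"^\d+\(N\d+[A-Z]?\)$")

def entry_ngrams(entries, n):
    """Count n-grams within each entry, skipping any that contain a numeric sign."""
    counts = Counter()
    for tokens in entries:                        # one entry at a time
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if any(NUMERIC.match(t) for t in gram):
                continue                          # numeric signs are excluded
            counts[gram] += 1
    return counts

entries = [
    ["M056~f", "M288", "1(N14)", "3(N01)"],
    ["M387~l", "M348", "1(N14)", "3(N01)"],
    ["M056~f", "M288", "5(N01)"],                 # invented entry for illustration
]
bigrams = entry_ngrams(entries, 2)
```

Because the numeric notation closes each entry in the sample, dropping n-grams that contain a numeric sign also keeps the counts focused on the undeciphered part of each entry.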

n-Gram Frequency Results
Can group n-grams with low edit distance:
M305 M388 M240 M097~h M004 M218
M305 M388 M146 M097~h M004 M218
M305 M388 M347 M097~h M004 M218
Highlighted signs (the third sign in each n-gram: M240, M146, M347) may...
◮ Qualify M388?
◮ Identify specific classes of individual?
◮ Form series of names built on M097~h M004 M218?
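Grouping n-grams by edit distance can be illustrated with a standard Levenshtein distance computed over signs rather than characters. This is a sketch under that assumption; the slides do not state which edit operations or threshold were used.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of signs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when signs match)
            ))
        prev = curr
    return prev[-1]

ngrams = [
    "M305 M388 M240 M097~h M004 M218".split(),
    "M305 M388 M146 M097~h M004 M218".split(),
    "M305 M388 M347 M097~h M004 M218".split(),
]
distances = [edit_distance(ngrams[0], g) for g in ngrams]
```

Each of the three n-grams is one substitution away from the others, which is why they fall into a single group.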
