Process & Methodology (Quantitative)


SLIDE 1

J.Bonilla | PhD Defense |12/08/2016 Slide #22

Process & Methodology (Quantitative)

News (NYT, WSJ, FT)
  • ProQuest Newsstand
  • Search filter on: NYT, WSJ, FT; “analytics”; 2004-2015 → 8102 articles

Sampled Corpus
  • Random sample with 33% stratification → 2352 articles
  • Lexicon richness & complexity: descriptive statistics on # of words, types of words, and # of sentences
  • Document similarities: cosine distance
  • Corpus evaluation of readability, complexity, and lexical diversity

Text pre-processing
  • Stop words: syntax vs semantic words
  • Stemming: words in their root form
  • DTM: document term matrix representation of the corpus
  • Sparsity: handling of zeros in the DTM

Text analytics & Natural Language Processing
  • Frequency analysis
  • Thematic seasonality analysis
  • Hierarchical clustering
  • Probabilistic topic models
  • Statistical associations
  • Words in context
  • Named Entity Recognition
  • Features and Entities
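The pre-processing steps named above (stop-word removal, document term matrix, sparsity, cosine distance) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the three documents and the stop-word list are invented for the example and are not the thesis corpus, stemming is left out because the standard library has no stemmer, and the helper names (`tokenize`, `build_dtm`, `sparsity`, `cosine_distance`) are hypothetical.

```python
import math
import re
from collections import Counter

# Toy documents and stop-word list: illustrative assumptions, not the thesis corpus.
DOCS = [
    "The city uses analytics to study urban data",
    "Urban analytics is a new research program in the city",
    "The bank reported quarterly earnings",
]
STOP_WORDS = {"the", "to", "is", "a", "in", "of", "and"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def build_dtm(docs):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def sparsity(matrix):
    """Fraction of cells in the DTM that are zero."""
    cells = [v for row in matrix for v in row]
    return cells.count(0) / len(cells)


def cosine_distance(u, v):
    """1 - cosine similarity: near 0 for similar rows, 1 when no terms are shared."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


vocab, dtm = build_dtm(DOCS)
print(vocab)
print(f"sparsity: {sparsity(dtm):.2f}")
print(f"d(doc0, doc1) = {cosine_distance(dtm[0], dtm[1]):.3f}")
print(f"d(doc0, doc2) = {cosine_distance(dtm[0], dtm[2]):.3f}")
```

The two city-related documents end up closer to each other than either is to the finance one, which is the property the document-similarity step relies on.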

SLIDE 2

Case: “analytics” + “CUSP” → Corpus: (202 articles; years 2011-15)

SLIDE 3

Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)

Frequency Analysis

Words for analysis
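As a rough illustration of what a frequency analysis computes, the sketch below counts terms across a handful of documents after stop-word removal. The three sentences and the stop-word list are made up for the example (they only echo themes of the CUSP case), and `term_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Invented example sentences, not articles from the corpus.
ARTICLES = [
    "NYU opens a center for urban science in Brooklyn",
    "The Brooklyn center will study urban data",
    "Urban data and analytics research at NYU",
]
STOP_WORDS = {"the", "a", "for", "in", "will", "and", "at"}


def term_frequencies(docs, stop_words):
    """Count how often each non-stop-word term occurs across all documents."""
    freqs = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        freqs.update(t for t in tokens if t not in stop_words)
    return freqs


freqs = term_frequencies(ARTICLES, STOP_WORDS)
for term, count in freqs.most_common(5):
    print(f"{term:10s} {count}")
```

Ranking terms this way gives the candidate "words for analysis"; on a real corpus the counts would normally be taken from the document term matrix after stemming.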

SLIDE 4

Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)

Hierarchical clustering

What is this corpus about?
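One way to picture hierarchical clustering of terms is bottom-up (agglomerative) merging: start with every term in its own cluster and repeatedly join the two closest clusters. The sketch below assumes cosine distance and average linkage, neither of which the slides specify, and the term-by-document occurrence vectors are invented for the example. `agglomerate` and `average_linkage` are hypothetical helper names.

```python
import math

# Invented term occurrence vectors over four toy documents (1 = term appears).
TERMS = {
    "new":   [1, 1, 0, 0],
    "york":  [1, 1, 0, 0],
    "data":  [0, 0, 1, 1],
    "urban": [0, 1, 1, 1],
}


def cosine_distance(u, v):
    """1 - cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(a, b, vectors):
    """Mean pairwise distance between the members of clusters a and b."""
    dists = [cosine_distance(vectors[i], vectors[j]) for i in a for j in b]
    return sum(dists) / len(dists)


def agglomerate(labels, vectors):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        left = [labels[k] for k in clusters[i]]
        right = [labels[k] for k in clusters[j]]
        merges.append((left, right, d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return merges


labels = list(TERMS)
merges = agglomerate(labels, [TERMS[t] for t in labels])
for left, right, d in merges:
    print(f"{left} + {right} at distance {d:.3f}")
```

The merge history is what a dendrogram draws: terms that co-occur in the same documents join early, and loosely related groups join last.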

SLIDE 5

Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)

Stemmed terms from the cluster figure:
  • new, york, citi, nyu, research, institute, scienc, engin, center, technology, brooklyn, program, urban, inform
  • ne, work, innov
  • include, progres, appli, univers, will, school

Hierarchical clustering

What is this corpus about?

SLIDE 6

Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)

Probabilistic topic models

Hidden patterns & emerging themes

SLIDE 7

Case: “CUSP” & “analytics” → Corpus: (202 articles; years 2011-15)

  • “director”, “koonin”, “president”, “sexton”, “faculty”, “people”, “researchers”, “professor”, “student”, “leaders” → HR
  • “cyber”, “tech”, “app”, “mobile”, “online”, “energy”, “data”, “climate”, “computer”, “wireless”, “campus” → INFRASTRUCTURE
  • “private”, “partnership”, “entrepreneurship” → GOVERNANCE

Probabilistic topic models

Hidden patterns & emerging themes
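The slides do not say which probabilistic topic model was fitted. As one common choice, the sketch below assumes latent Dirichlet allocation (LDA) estimated by collapsed Gibbs sampling. The tokenised documents are invented (they only reuse a few terms from the lists above), the hyperparameters `alpha` and `beta` are arbitrary illustrative values, and `lda_gibbs` and `top_words` are hypothetical helper names.

```python
import random

# Invented tokenised documents; the terms are borrowed from the slide, the texts are not.
DOCS = [
    ["director", "professor", "faculty", "student", "president"],
    ["professor", "student", "faculty", "researchers", "director"],
    ["cyber", "mobile", "online", "data", "wireless"],
    ["data", "online", "mobile", "computer", "cyber"],
    ["private", "partnership", "entrepreneurship", "private", "partnership"],
]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns topic-word and doc-topic counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(n_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """The n most frequently assigned words for each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


topic_word, doc_topic = lda_gibbs(DOCS, n_topics=3)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

The model only returns word groupings; labels such as HR, INFRASTRUCTURE, and GOVERNANCE are an interpretation the analyst assigns after reading the top words of each topic.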