Natural Language Processing, STOR 390, 4/18/17 - PowerPoint PPT Presentation



SLIDE 1

Natural Language Processing

STOR 390 4/18/17

SLIDE 2

Kurt Vonnegut on the Shapes of Stories

https://www.youtube.com/watch?v=oP3c1h8v2ZQ

SLIDE 3

We know how to work with tidy data

SLIDE 4

We know how to work with tidy data

Regression: linear model, polynomial terms
Classification: K-nearest-neighbors, SVM
Clustering: K-means

SLIDE 5

Unstructured data: not all data is tidy

Networks
Text
Images

SLIDE 6

Network data

SLIDE 7

http://dogtime.com/puppies/255-puppies
http://www.dailytarheel.com/article/2017/04/a-title-to-remember-north-carolina-wins-its-sixth-ncaa-championship

Image data

SLIDE 8

https://emeraldcitybookreview.com/2014/06/beautiful-books-picturing-jane-austen_20.html

Text data

SLIDE 9

Unstructured ≠ no structure

SLIDE 10

Two strategies

Invent new tools: PageRank
Turn it into tidy data

SLIDE 11

https://medium.com/@ageitgey/machine-learning-is-fun-part-3-deep-learning-and-convolutional-neural-networks-f40359318721

Images are numbers
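The point of "images are numbers" is that a picture is already a grid of pixel intensities, so tidying it just means flattening that grid into a numeric feature vector. A minimal sketch with a made-up 2x2 grayscale "image":

```python
# A grayscale image is a grid of pixel intensities (0 = black, 255 = white).
image = [
    [0, 255],
    [255, 0],
]  # a tiny 2x2 checkerboard, purely illustrative

# Flatten row by row into one feature vector, as image models typically expect.
features = [pixel for row in image for pixel in row]
print(features)  # [0, 255, 255, 0]
```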

SLIDE 12

https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html

SLIDE 13

Text data

One document = string of words
Corpus = collection of documents

SLIDE 14

“A token is a meaningful unit of text, most often a word, that we are interested in using for further analysis, and tokenization is the process of splitting text into tokens.”

—Text Mining with R
SLIDE 15

Tokenization turns text into tidy format

Word
Sentence
Paragraph
Chapter
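The slides use R's tidytext (its `unnest_tokens` produces lowercase word tokens by default). An analogous sketch in Python, with a hypothetical `tokenize` helper, showing word-level tokenization plus the lowercasing from the next slides:

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens, one token per list entry,
    mirroring tidy-text word tokenization."""
    return re.findall(r"[a-z']+", text.lower())

print(tokenize("The Door opened; the door closed."))
# ['the', 'door', 'opened', 'the', 'door', 'closed']
```

Note that punctuation is dropped and "The"/"the" and "Door"/"door" collapse to the same tokens.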

SLIDE 16

Jane Austen’s books tokenized by word

SLIDE 17

Make text lower case

Make words more comparable: Door -> door

SLIDE 18

Tokenization loses information

Ignores word order

SLIDE 19

Most frequently appearing words
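Once text is in one-token-per-entry form, the most frequent words fall out of a simple count. A sketch with Python's standard-library `Counter` on toy tokens (not the Austen corpus):

```python
from collections import Counter

# Toy token list standing in for a tokenized document.
tokens = ["the", "door", "opened", "and", "the", "door", "closed"]

counts = Counter(tokens)
print(counts.most_common(2))  # [('the', 2), ('door', 2)]
```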

SLIDE 20

Remove stop words

Commonly occurring words: the, to, and
Hand code a list of words
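Removing stop words is a set-membership filter over the token list. A sketch with a tiny hand-coded stop list, as the slide suggests (real stop lists, like the one shipped with tidytext, are much longer):

```python
# Tiny hand-coded stop list; illustrative only.
stop_words = {"the", "to", "and"}

tokens = ["the", "door", "opened", "and", "the", "door", "closed"]
content = [t for t in tokens if t not in stop_words]
print(content)  # ['door', 'opened', 'door', 'closed']
```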

SLIDE 21

Most frequently occurring words (no stop words)

SLIDE 22

Sentiment analysis attempts to quantify emotional content

Assign each word an emotional value:
positive/negative
trust, fear, sadness, anger, surprise, disgust, joy, anticipation
-5, -4, …, 4, 5
SLIDE 23

There are precompiled lexicons

Hand coded
Crowdsourced: Amazon Mechanical Turk
Online reviews: Yelp

SLIDE 24

Assign each word a sentiment
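Word-level sentiment is then a lexicon lookup summed over the document. A sketch with a made-up mini-lexicon in the AFINN style (integer values in -5..5); the words and scores here are illustrative, not drawn from a real lexicon:

```python
# Hypothetical mini-lexicon; real lexicons (AFINN, bing, nrc) have thousands of entries.
lexicon = {"fun": 4, "noisy": -1, "miserable": -3}

tokens = ["statistics", "is", "so", "much", "fun"]
# Words missing from the lexicon contribute 0.
score = sum(lexicon.get(t, 0) for t in tokens)
print(score)  # 4
```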

SLIDE 25

Sentiment analysis is noisy

SLIDE 26

Sentiment analysis is noisy

Lexicons may not generalize
Unigrams miss context

SLIDE 27

Sentiment analysis is noisy

“Statistics is so much fun” vs. “Statistics is so much fun”
Same words, opposite tone; unigram sentiment cannot tell sincerity from sarcasm.

SLIDE 28

Jane Austen novels are fairly balanced

SLIDE 29

Different ways to quantify “time”

chapter
paragraph
line
sentence

SLIDE 30

Different ways to quantify “time”

chapter
paragraph
line
sentence

We choose one unit of time = 80 lines
SLIDE 31
SLIDE 32

index = linenumber %/% 80
sentiment = (# positive words) - (# negative words)
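The R expression above integer-divides the line number by 80 to bucket lines into time chunks, then nets positive against negative words per chunk. A sketch of the same bookkeeping in Python (toy word lists and text, not a real lexicon or novel); Python's `//` plays the role of R's `%/%`:

```python
from collections import defaultdict

# Toy stand-ins for a sentiment lexicon.
positive = {"happy", "love"}
negative = {"sad", "fear"}

# 200 toy lines of "text" alternating positive and negative.
lines = ["happy love", "sad day"] * 100

sentiment = defaultdict(int)
for linenumber, line in enumerate(lines):
    index = linenumber // 80  # bucket: one unit of time = 80 lines
    for word in line.split():
        if word in positive:
            sentiment[index] += 1
        elif word in negative:
            sentiment[index] -= 1

print(dict(sentiment))  # {0: 40, 1: 40, 2: 20}
```

Each full 80-line bucket nets +40 here (40 lines at +2, 40 at -1); the last, half-size bucket nets +20.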

SLIDE 33

Smooth the time series with a low-pass filter

http://www.matthewjockers.net/2015/02/02/syuzhet/
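Jockers' Syuzhet package does its smoothing with a Fourier-based low-pass filter; the simplest low-pass smoother is a moving average, which illustrates the same idea of suppressing high-frequency noise in the sentiment series. A sketch (not the package's method):

```python
def moving_average(series, window=3):
    """Low-pass smoothing sketch: replace each point with the mean of
    its window (truncated at the ends of the series)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half):i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

raw = [2, -5, 4, -1, 3, -4, 5]  # noisy per-chunk sentiment scores
print(moving_average(raw))
```

The output keeps the series length but damps the swings between adjacent chunks.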

SLIDE 34
SLIDE 35

References

Revealing Sentiment and Plot Arcs with the Syuzhet Package: http://www.matthewjockers.net/2015/02/02/syuzhet/
Text Mining with R: http://tidytextmining.com/