Natural Language Processing
Slide deck by Jan Sellner, 2020-01-29


SLIDE 1

https://files.jansellner.net/NLPSeminar.pdf

Natural Language Processing

[Title-slide word cloud: Word Embedding, word2vec, gensim, BERT, Transformers, Attention, Context, spaCy, Named Entity Recognition, Meaningful Representation, Probability Distribution, Hidden Layer, Kullback-Leibler Divergence, Linear Activation, Softmax Activation, Activation Function, Weight Matrix, Training Corpus, Pre-trained Models, Representation, One-hot Encoded, Character Codes, Electronic Health Record, Data Privacy, De-identification, Ontology Construction, Drug Interaction, king, queen, Task]

SLIDE 2

  • Named entity recognition
  • Sentence similarity
  • Family history extraction

What can we do with NLP?

S1: Mental: Alert and oriented to person, place, time, and situation.
S2: Neurological: He is alert and oriented to person, place, and time.
→ Similarity: 4.5/5

[1] [2] [3]
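As a quick taste of the first task above, a minimal named-entity-recognition sketch with spaCy (mentioned on the title slide); the example sentence and the model name en_core_web_sm are illustrative assumptions, not from the slides:

    import spacy

    # Assumes the small English model is installed:
    #   python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Jan Sellner presented BERT, a language model developed by Google in 2018.")
    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. "Jan Sellner" PERSON, "Google" ORG, "2018" DATE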

SLIDE 3

There are new streaky left basal opacities which could represent only atelectasis; however, superimposed pneumonia or aspiration cannot be excluded in the appropriate clinical setting. There is only mild vascular congestion without pulmonary edema.

Image vs. Text

[4]

SLIDE 4

  • Idea: map each word to a fixed-size vector from an embedding space

Embeddings

[5]
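A minimal sketch of that idea with gensim (also named on the title slide), using one of its downloadable pretrained models; note the large one-off download:

    import gensim.downloader

    # Pretrained word2vec vectors trained on Google News (roughly 1.6 GB download)
    vectors = gensim.downloader.load("word2vec-google-news-300")

    print(vectors["cat"].shape)   # (300,): every word maps to a fixed-size vector
    # The title slide's king/queen pair as a vector analogy: king - man + woman ≈ queen
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))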

SLIDE 5

  • Understanding the meaning of a sentence is hard
  • Words have multiple meanings
  • Word compounds may alter the meaning
  • Coreference resolution
  • ...

Language Understanding Requires More

SLIDE 6

Coreference Resolution (cf. Winograd Schemas)

The coin does not fit into the backpack because it is too small.
The coin does not fit into the backpack because it is too large.


SLIDE 8

Coreference Resolution (cf. Winograd Schemas)

The coin does not fit into the backpack because it is too small.
→ Die Münze passt nicht mehr in den Rucksack, weil er zu klein ist.
(In the German translation, the masculine pronoun 'er' can only refer to the backpack.)

The coin does not fit into the backpack because it is too large.
→ Die Münze passt nicht mehr in den Rucksack, weil sie zu groß ist.
(The feminine pronoun 'sie' can only refer to the coin, so German resolves the ambiguity grammatically.)

SLIDE 9

  • BERT: language model developed by Google
  • Word embeddings are no longer unique; they depend on the context instead
  • Different architecture: the model has an explicit mechanism to model word dependencies → attention

2019: One Step Towards Language Understanding

[6] [7]

SLIDE 10

  • Goal: model dependencies between words
  • Idea: allow each word to pay attention to other words

Attention (Transformer Architecture)

The black cat plays with the piano

  • “The” → “cat”: determiner-noun relationship
  • “black” → “cat”: adjective-noun relationship
  • “plays” → “with the piano”: verb-object relationship

[12]

SLIDE 11

How is Attention Calculated?

[Diagram: the words "The black cat" are embedded as 𝒘1, 𝒘2, 𝒘3 and projected to queries 𝒓i, keys 𝒍i, and values; dot-product scores such as ⟨𝒓1, 𝒍1⟩ = 96, ⟨𝒓1, 𝒍2⟩ = 0, ⟨𝒓1, 𝒍3⟩ = 112 are normalized (→ 12, 0, 14) and softmaxed into attention weights 𝒜i (e.g. 0.12 and 0.88), which weight and sum the values into the outputs 𝒚1, 𝒚2, 𝒚3. Pipeline: Input → Embedding → Queries/Keys/Values → Score → Normalization → Softmax → Weighting → Sum.]

Based on [8]
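The same pipeline as a minimal NumPy sketch of scaled dot-product attention (cf. [8]); the dimensions and random weights are placeholders rather than the slide's numbers:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    d_model, d_k = 300, 64                       # placeholder dimensions
    w = np.random.randn(3, d_model)              # embeddings w1..w3 for "The black cat"
    W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))  # learned projections

    r = w @ W_q                                  # queries r1..r3
    l = w @ W_k                                  # keys l1..l3
    v = w @ W_v                                  # values
    scores = r @ l.T                             # score: dot products <r_i, l_j>
    A = softmax(scores / np.sqrt(d_k))           # normalization + softmax -> weights A_i
    y = A @ v                                    # weighting + sum -> outputs y1..y3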

SLIDE 12

BERT Model

[Diagram: a self-attention layer turns the inputs "The black" into attention outputs 𝒜1, 𝒜2, from which the representations 𝒚1, 𝒚2 are computed.]

SLIDE 13

BERT Model

[Diagram: three attention heads (Head 1, Head 2, Head 3) each compute their own attention outputs 𝒜1, 𝒜2 for "The black" in parallel; their results are combined into 𝒚1, 𝒚2.]

SLIDE 14

BERT Model

[Diagram: the outputs of the attention heads pass through a feed-forward network, yielding the states 𝒔1, 𝒔2; multi-head attention plus feed-forward network together form one encoder block.]

SLIDE 15

BERT Model

[Diagram: BERT stacks several such encoders; "The black" is passed from encoder to encoder, and the states 𝒔1, 𝒔2 become the final outputs 𝒚1, 𝒚2.]
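A sketch of slides 12-15 using PyTorch's stock modules; this mirrors the structure (multi-head self-attention plus feed-forward network, stacked 12 times for BERT-base) but is not BERT's exact implementation, which differs in details such as positional embeddings and GELU activations:

    import torch
    import torch.nn as nn

    # One encoder block = multi-head self-attention + feed-forward network
    block = nn.TransformerEncoderLayer(d_model=768, nhead=12,
                                       dim_feedforward=3072, batch_first=True)
    encoder = nn.TransformerEncoder(block, num_layers=12)   # BERT-base stacks 12 blocks

    x = torch.randn(1, 2, 768)   # embeddings for "The black": (batch, tokens, dim)
    y = encoder(x)               # contextualized outputs y1, y2
    print(y.shape)               # torch.Size([1, 2, 768])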

SLIDE 16

  • Goal: BERT should get a basic understanding of the language
  • Problem: not enough annotated training data available
  • Idea: make use of the tons of unstructured data we have (Wikipedia, websites, Google Books) and define training tasks:
  • Next sentence prediction
  • Masking

Training

SLIDE 17

Masking

[Diagram: the input "My favourite colour is [MASK]" runs through BERT, producing states 𝒔1 … 𝒔5; an FNN + softmax on top of the masked position predicts the missing word: blue, red, orange, …]
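The slide's example can be reproduced with the Hugging Face transformers library listed under Literature; a minimal fill-mask sketch (the choice of bert-base-uncased is an assumption):

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")
    for pred in fill("My favourite colour is [MASK].")[:3]:
        print(pred["token_str"], round(pred["score"], 3))
    # The top predictions are colour words such as blue and red, as on the slide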

SLIDE 18

Attention in Action

Library: bertviz [9]
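A minimal usage sketch of bertviz, following its README (run inside a Jupyter notebook; the example sentence is taken from the earlier slides):

    from transformers import BertModel, BertTokenizer
    from bertviz import head_view

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

    inputs = tokenizer.encode("The black cat plays with the piano", return_tensors="pt")
    attention = model(inputs).attentions                 # one attention tensor per layer
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])
    head_view(attention, tokens)                         # interactive per-head attention view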

SLIDE 19

Cracking Transfer Learning

[10]

SLIDE 20

Model Size

[Chart: parameter counts of recent language models, in millions of parameters]

[11]

SLIDE 21

Real-World Implications

https://www.blog.google/products/search/search-language-understanding-bert/

SLIDE 22

  • Papers
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 'Attention Is All You Need'. In NIPS, 2017.
  • Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 'BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding'. arXiv:1810.04805 [cs], 10 October 2018.
  • Blogs
  • https://jalammar.github.io/
  • https://medium.com/synapse-dev/understanding-bert-transformer-attention-isnt-all-you-need-5839ebd396db
  • https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
  • Implementation
  • https://github.com/huggingface/transformers

Literature

SLIDE 23

  • [1] https://www.youtube.com/watch?v=2_HSKDALwuw&list=PLBmcuObd5An4UC6jvK_-eSl6jCvP1gwXc
  • [2] 2019 n2c2 Shared-Task and Workshop - Track 1: n2c2/OHNLP Track on Clinical Semantic Textual Similarity
  • [3] Lewis, Neal, Daniel Gruhl, and Hu Yang. 'Extracting Family History Diagnosis from Clinical Texts'. In BICoB, 2011.
  • [4] Johnson, Alistair E. W., Tom J. Pollard, Seth Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. 'MIMIC-CXR: A Large Publicly Available Database of Labeled Chest Radiographs'. arXiv preprint arXiv:1901.07042, 2019.
  • [5] https://jalammar.github.io/illustrated-word2vec/
  • [6] https://twitter.com/seb_ruder/status/1070470060987310081/photo/3
  • [7] https://mc.ai/how-to-fine-tune-and-deploy-bert-in-a-few-and-simple-steps-to-production/
  • [8] https://jalammar.github.io/illustrated-transformer/
  • [9] https://github.com/jessevig/bertviz
  • [10] https://jalammar.github.io/illustrated-bert/
  • [11] https://medium.com/huggingface/distilbert-8cf3380435b5
  • [12] https://medium.com/synapse-dev/understanding-bert-transformer-attention-isnt-all-you-need-5839ebd396db

References