Towards a Deep and Unified Understanding of Deep Neural Models in NLP


  1. Towards a Deep and Unified Understanding of Deep Neural Models in NLP. Chaoyu Guan*2, Xiting Wang*2, Quanshi Zhang1, Runjin Chen1, Di He2, Xing Xie2 (* equal contribution). 1 John Hopcroft Center and the MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China. 2 Microsoft Research Asia, Beijing, China.

  2. Introduction
     A key task in explainable AI is to associate latent representations with input units by quantifying the layerwise information discarding of inputs. Most explanation methods (e.g., DNN visualization) suffer from coherency and generality issues:
     • Coherency requires that a method generate consistent explanations across different neurons, layers, and models.
     • Generality: existing measures are usually defined under certain restrictions on model architectures or tasks.

  3. Our Solution
     We consider both coherency and generality:
     • A unified information-based measure that quantifies the information of each input word that is encoded in an intermediate layer of a deep NLP model.
     • The information-based measure as a tool for evaluating different explanation methods and for explaining different deep NLP models.
     • This measure enriches our capability of explaining DNNs.

  4. Problem
     • Quantification of sentence-level information discarding: quantify the information of an entire sentence $\mathbf{y}$ that is encoded in $\mathbf{t}$.
     • Quantification of word-level information discarding: quantify the information of each specific word $\mathbf{y}_j$ that is encoded in $\mathbf{t}$.
     • Fine-grained analysis of word attributes: analyze the fine-grained reason why $\mathbf{t}$ uses the information of $\mathbf{y}_j$.
     Notation: $\mathbf{y} = [\mathbf{y}_1^\top, \ldots, \mathbf{y}_n^\top]^\top \in \mathbf{Y}$ is the input sentence, where $\mathbf{y}_j$ is the embedding of the $j$-th word; $\mathbf{t} = \Phi(\mathbf{y}) \in \mathbf{T}$ is the hidden state, where $\Phi(\cdot)$ is the function of the intermediate layer. A toy instantiation of this notation follows below.
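     As a concrete reading of the notation, here is a minimal PyTorch sketch in which a toy LSTM plays the role of $\Phi(\cdot)$; the vocabulary size, dimensions, and token ids are all arbitrary placeholders, not anything from the paper:

```python
import torch
import torch.nn as nn

vocab_size, K, d = 100, 16, 32
embed = nn.Embedding(vocab_size, K)          # token id -> word embedding y_j
phi = nn.LSTM(K, d, batch_first=True)        # the intermediate layer Phi(.)

token_ids = torch.tensor([[5, 17, 42, 8]])   # one toy sentence, n = 4 words
y = embed(token_ids)                         # y in Y, shape [1, n, K]
t, _ = phi(y)                                # t = Phi(y) in T, shape [1, n, d]
```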

  5. Word Information Quantification: Multi-Level Quantification
     • Corpus level: $MI(\mathbf{Y};\mathbf{T}) = H(\mathbf{Y}) - H(\mathbf{Y}\mid\mathbf{T})$, where $H(\mathbf{Y}\mid\mathbf{T}) = \int_{\mathbf{t}\in\mathbf{T}} p(\mathbf{t})\, H(\mathbf{Y}\mid\mathbf{t})\, d\mathbf{t}$.
     • Sentence level: $H(\mathbf{Y}\mid\mathbf{t}) = -\int_{\mathbf{y}'\in\mathbf{Y}} p(\mathbf{y}'\mid\mathbf{t}) \log p(\mathbf{y}'\mid\mathbf{t})\, d\mathbf{y}'$.
     • Word level: $H(\mathbf{Y}\mid\mathbf{t}) \stackrel{*}{=} \sum_j H(\mathbf{Y}_j\mid\mathbf{t})$, where $H(\mathbf{Y}_j\mid\mathbf{t}) = -\int_{\mathbf{y}_j'\in\mathbf{Y}_j} p(\mathbf{y}_j'\mid\mathbf{t}) \log p(\mathbf{y}_j'\mid\mathbf{t})\, d\mathbf{y}_j'$. Here $H(\mathbf{Y}_j\mid\mathbf{t}=\Phi(\mathbf{y}))$ reflects how much information from the word $\mathbf{y}_j$ is discarded by $\mathbf{t}$ during the forward propagation.
     * Suppose the words in one sentence are independent.
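     The starred equality follows in one step from the independence assumption, under which $p(\mathbf{y}'\mid\mathbf{t}) = \prod_{j=1}^{n} p(\mathbf{y}_j'\mid\mathbf{t})$:

```latex
H(\mathbf{Y}\mid\mathbf{t})
  = -\int p(\mathbf{y}'\mid\mathbf{t}) \sum_{j=1}^{n} \log p(\mathbf{y}_j'\mid\mathbf{t})\, d\mathbf{y}'
  = -\sum_{j=1}^{n} \int p(\mathbf{y}_j'\mid\mathbf{t}) \log p(\mathbf{y}_j'\mid\mathbf{t})\, d\mathbf{y}_j'
  = \sum_{j=1}^{n} H(\mathbf{Y}_j\mid\mathbf{t})
```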

  6. Word Information Quantification: Perturbation-Based Approximation
     We use $H(\tilde{\mathbf{Y}}_j\mid\mathbf{t})$ to approximate $H(\mathbf{Y}_j\mid\mathbf{t})$ by minimizing the following loss:
     $L(\boldsymbol{\sigma}) = \mathbb{E}_{\boldsymbol{\epsilon}} \left[ \| \Phi(\tilde{\mathbf{y}}) - \mathbf{t} \|^2 \right] - \lambda \sum_{j=1}^{n} H(\tilde{\mathbf{Y}}_j\mid\mathbf{t})$, where the perturbed word is $\tilde{\mathbf{y}}_j = \mathbf{y}_j + \boldsymbol{\epsilon}_j$ with $\boldsymbol{\epsilon}_j \sim \mathcal{N}(\mathbf{0}, \sigma_j^2 \mathbf{I})$.
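     A minimal sketch of this optimization, assuming PyTorch and using the closed-form entropy of a $K$-dimensional Gaussian, $H(\tilde{\mathbf{Y}}_j\mid\mathbf{t}) = \frac{K}{2}\log(2\pi e \sigma_j^2)$; the function `phi` and all hyperparameters below are placeholders, not the authors' released implementation:

```python
import math
import torch

def estimate_sigma(phi, y, lam=1.0, n_samples=16, steps=200, lr=0.01):
    """Learn per-word noise scales sigma_j for an embedded sentence y [n, K].

    phi must map a batch of embedded sentences [..., n, K] to hidden states
    and be differentiable; a larger learned sigma_j means more of word j's
    information is discarded by t = phi(y).
    """
    n, K = y.shape
    t = phi(y.unsqueeze(0)).detach()                 # target hidden state
    log_sigma = torch.zeros(n, requires_grad=True)   # optimize log(sigma_j)
    opt = torch.optim.Adam([log_sigma], lr=lr)
    for _ in range(steps):
        sigma = log_sigma.exp()
        # Reparameterized perturbation: y~_j = y_j + sigma_j * eps_j
        eps = torch.randn(n_samples, n, K)
        y_tilde = y.unsqueeze(0) + sigma[None, :, None] * eps
        recon = ((phi(y_tilde) - t) ** 2).mean()     # E_eps ||phi(y~) - t||^2
        # Gaussian entropy: H(Y~_j | t) = K/2 * log(2*pi*e*sigma_j^2)
        entropy = 0.5 * K * (math.log(2 * math.pi * math.e) + 2 * log_sigma)
        loss = recon - lam * entropy.sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return log_sigma.exp().detach()                  # sigma_j per word
```

     The learned $\sigma_j$ acts as the word-level measure: the larger the noise a word tolerates without changing $\Phi$'s output, the less of that word's information the hidden state retains.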

  7. Fine-Grained Analysis of Word Attributes
     Disentangle the information of a common concept $\mathbf{c}$ away from each word $\mathbf{y}_j$:
     • Importance of the $j$-th word w.r.t. random words: $A_j = \log p(\mathbf{y}_j\mid\mathbf{t}) - \mathbb{E}_{\mathbf{y}_j'\in\mathbf{Y}_j} \log p(\mathbf{y}_j'\mid\mathbf{t})$.
     • Importance of the common concept $\mathbf{c}$ w.r.t. random words: $A_{\mathbf{c}} = \mathbb{E}_{\mathbf{y}_j'\in\mathbf{Y}_{\mathbf{c}}} \log p(\mathbf{y}_j'\mid\mathbf{t}) - \mathbb{E}_{\mathbf{y}_j'\in\mathbf{Y}_j} \log p(\mathbf{y}_j'\mid\mathbf{t})$.
     • $r_{j,\mathbf{c}} = A_j - A_{\mathbf{c}}$ indicates the remaining information of the word $\mathbf{y}_j$ when we remove the information of the common attribute $\mathbf{c}$ from the word.
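     A toy numeric sketch of this disentanglement; the log-density values are made up, and `log_p_random` / `log_p_concept` stand for samples of random replacement words and of words sharing the common concept $\mathbf{c}$:

```python
import numpy as np

def remaining_information(log_p_word, log_p_random, log_p_concept):
    """r_{j,c} = A_j - A_c: word-specific information beyond concept c."""
    A_j = log_p_word - np.mean(log_p_random)              # word vs. random
    A_c = np.mean(log_p_concept) - np.mean(log_p_random)  # concept vs. random
    return A_j - A_c

# Made-up log-densities log p(. | t): the word is far better explained by t
# than random words are (A_j = 8), but much of that is shared with its
# concept class (A_c = 5), so r = 3 word-specific nats remain.
r = remaining_information(-2.0, np.array([-9.0, -11.0]), np.array([-4.0, -6.0]))
print(r)  # 3.0
```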

  8. Comparative Study
     • Three baselines: LRP, gradient-based, and perturbation methods.
     • Conclusion: our method provides the most faithful explanations in across-timestamp analysis, across-layer analysis, and across-model analysis.
     • Our method clearly shows that the model gradually focuses on the most important parts of the sentence.

  9. Understanding Neural Models in NLP
     We explain four NLP models (BERT, Transformer, LSTM, and CNN):
     • What information is leveraged for prediction?
     • How does the information flow through layers?
     • How do the models evolve during training?

  10. Understanding Neural Models in NLP
     • BERT and the Transformer use words for prediction, while the LSTM and CNN use subsequences of the sentence for prediction.
     • Different models process the input sentence in different manners.

  11. Towards a Deep and Unified Understanding of Deep Neural Models in NLP. Please visit our poster at #62!
