Industrial Natural Language Processing & Information Extraction

SLIDE 1

Industrial Natural Language Processing & Information Extraction

SLIDE 2

Industrial Natural Language Processing

SLIDE 3

Industrial Natural Language Processing & Information Extraction, Chair of Technologies and Management of Digital Transformation, University of Wuppertal

Industrial Natural Language Processing

Overview

[Figure labels: NLP and NLU tasks: summarization, semantic parsing, sentiment analysis, dialogue agents, natural language inference, question answering, machine translation, text categorization, syntactic parsing, POS tagging, keyword extraction, named entity recognition, topic recognition]

Natural Language Processing

Developing and applying techniques and methods for the automatic processing of text

Industrial Natural Language Processing

Developing and applying techniques and methods for the automatic processing of text in industry by explicitly considering the requirements and circumstances of industrial environments

SLIDE 4

Industrial Natural Language Processing

Research Goals
▪ Reliable application and deployment of NLP in industrial environments
▪ Anonymization of textual data in order to be able to forward it to third parties
▪ Exploration of new areas for the use of natural language processing in the wild

SLIDE 5

Industrial Natural Language Processing

NLP in Industrial Environments

Reliable application and deployment of NLP in industrial environments
▪ Design, develop and evaluate software architectures to make NLP usable by non-technical users
▪ Improve the process of deploying and using NLP in industrial environments
▪ Analyze textual data based on state-of-the-art NLP approaches

[Figure labels: Software Architectures, Usability, Analyze]

SLIDE 6

Industrial Natural Language Processing

Anonymization

Anonymization of textual data in order to be able to forward it to third parties
▪ The analysis of unstructured company data in the cloud is either undesired by the company itself or even forbidden by law (GDPR; German: DSGVO)
▪ Cloud services provide more accurate and sophisticated analytics
▪ Develop new anonymization approaches using machine learning, as the current approaches are too inaccurate

SLIDE 7

Industrial Natural Language Processing

Exploiting New Application Domains

Exploration of new areas for the use of natural language processing in the wild
▪ Most of the available data is unstructured, in particular textual documents
▪ Identify available data sources and derive meaningful use cases from them
▪ Develop appropriate models and applications for the identified use cases that generate additional value

[Figure labels: Identify Data Sources, Derive Use Cases, Explore Possible Solutions]

SLIDE 8

Selected Research Projects and Applications

SLIDE 9

Industrial Natural Language Processing

Integrating External Data into an Enterprise Information System

Goal
▪ Integrating internal & external data into an enterprise information system to gain faster insights into changing markets, relations etc.

Approach
▪ Identifying relevant data sources (e.g., news websites, social media, internal enterprise data)
▪ Integrating data into a common data storage
▪ Creation of dedicated analytical services for specific user requirements, like
  ▪ Natural Language Processing
  ▪ Translations
  ▪ Overview of business knowledge graph
  ▪ Sentiment analysis
  ▪ Recommendation of relevant data

Results: External Information Extraction Tool
▪ Personalized quick and easy access to a large amount of data from several different sources within a single tool

[Figure labels: News Websites, Social Media, Enterprise Information System, Information, Data]

SLIDE 10

Industrial Natural Language Processing

Utilizing Textual Maintenance Data from Production

Goal
▪ Utilizing unstructured textual information from machines' maintenance protocols to gain insights and optimize processes

Approach
▪ Extracting textual reports from maintenance staff
▪ Classify text into description of symptoms, causes and solutions
▪ Calculation of relevant statistics
▪ Creation of dedicated analytical services for staff and decision makers, like
  ▪ Occurrence of similar error descriptions over time and location
  ▪ Costs per machine location
  ▪ Troubleshooting proposal for specified symptoms

Results: Maintenance Data Insights Tool
▪ Tool for
  ▪ assisted generation of maintenance report texts
  ▪ supported finding of solutions
  ▪ visualization of errors and costs

[Figure labels: Maintenance Data Platform, Information, Solution Hints; example report snippets: "… defect, please check", "… part was exchanged", "… machine losing oil", "… spare part ordered"]
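
The step of classifying report text into symptoms, causes and solutions can be illustrated with a deliberately simple keyword baseline. This is only a sketch: the cue words in `CUES` and the function name `classify_report` are assumptions made for illustration, and the slides do not state which classifier the project actually uses.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical cue words per category (assumption, not taken from the project).
CUES: Dict[str, List[str]] = {
    "symptom": ["losing", "leak", "noise", "defect", "error"],
    "cause": ["worn", "broken", "loose", "because", "due to"],
    "solution": ["exchanged", "replaced", "ordered", "repaired", "tightened"],
}


def classify_report(text: str) -> str:
    """Return the category whose cue words occur most often, or 'unknown'."""
    lowered = text.lower()
    scores = {
        label: sum(cue in lowered for cue in cues)
        for label, cues in CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


# Report snippets quoted on the slide.
reports = [
    "defect, please check",
    "part was exchanged",
    "machine losing oil",
    "spare part ordered",
]
labels = [classify_report(report) for report in reports]

# A simple statistic: how many reports fall into each category.
counts = Counter(labels)
```

A keyword baseline like this is easy to inspect but brittle; it mainly serves as a reference point against which a trained text classifier could be compared.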

SLIDE 11

Industrial Natural Language Processing

Anonymization of Enterprise Documents using the Cloud

Goal
▪ Enable the usage of cloud services for data processing and analysis without revealing sensitive information

Approach
▪ Development of an anonymization approach based on predefined rules and deep learning
▪ Implementation and testing of the hybrid anonymizer
▪ Deployment of the anonymizer within the customer's ecosystem
▪ Methods: Natural Language Processing, Deep Learning, Micro-Service Architecture

Results: Hybrid Anonymizer
▪ Functional hybrid system for the automatic anonymization/pseudonymization of textual data
▪ Enabled the use of cloud analysis for textual documents
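
A minimal sketch of how predefined rules and a learned detector might be combined for pseudonymization. Everything named here is hypothetical: `RULES`, `pseudonymize` and `toy_name_detector` are illustrative names, the rules are assumed to be regular expressions, and the detector is a fixed lookup standing in for a deep learning model. It is not the project's implementation.

```python
import re
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Hypothetical predefined rules: label -> regular expression.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ()/-]{6,}\d"),
}


def rule_spans(text: str) -> List[Span]:
    """Collect all spans matched by the rule-based component."""
    return [
        (match.start(), match.end(), label)
        for label, pattern in RULES.items()
        for match in pattern.finditer(text)
    ]


def toy_name_detector(text: str) -> List[Span]:
    """Stand-in for a trained model: flags names from a fixed list."""
    known_names = ["Erika Mustermann"]
    return [
        (match.start(), match.end(), "PERSON")
        for name in known_names
        for match in re.finditer(re.escape(name), text)
    ]


def pseudonymize(
    text: str, ml_detector: Callable[[str], List[Span]]
) -> Tuple[str, Dict[str, str]]:
    """Replace detected spans with placeholders; return text and mapping."""
    spans = sorted(rule_spans(text) + ml_detector(text))
    mapping: Dict[str, str] = {}  # original value -> placeholder
    counters: Dict[str, int] = {}
    parts: List[str] = []
    cursor = 0
    for start, end, label in spans:
        if start < cursor:  # skip spans that overlap an earlier one
            continue
        value = text[start:end]
        if value not in mapping:
            counters[label] = counters.get(label, 0) + 1
            mapping[value] = f"<{label}_{counters[label]}>"
        parts.append(text[cursor:start])
        parts.append(mapping[value])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


if __name__ == "__main__":
    sample = "Contact Erika Mustermann at erika@example.com or +49 202 1234567."
    print(pseudonymize(sample, toy_name_detector)[0])
```

Keeping the mapping on the customer's side would allow results returned from a cloud service to be re-identified locally, which is one plausible reason to prefer pseudonymization over irreversible anonymization.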

SLIDE 12

Industrial Natural Language Processing

AISLE – Support learning academic phrases

Goal
▪ Support students at the beginning of their studies in reading and understanding scientific publications

Approach
▪ Construct a large domain- and target-group-specific text corpus using NLP methods
▪ Use recent methods in the area of natural language processing for extracting and evaluating words and phrases based on their relevance
▪ Development of an adaptive learning system to improve vocabulary on the basis of a developed learning algorithm and the built-up corpora

Results: AISLE
▪ Web platform that is actively used by students to improve their vocabulary
▪ User studies showed the system's positive impact on vocabulary growth

[Figure labels: Interact, Enter Word, Vocabulary Size, Evaluate & Select Words, View Results, Analyze Results]

SLIDE 13

Information Extraction

SLIDE 14

Information Extraction

Overview

"… Application of methods from practical computer science, artificial intelligence and computational linguistics to the problem of automatic machine processing of unstructured information …"
Source: Wikipedia

[Figure labels: Unstructured Data, Information Extraction, Different Types of Unstructured Data, Data-Specific Processing, Structured Data, Data Analysis, Results; example techniques shown: POS tagging, named entity recognition]

SLIDE 15

Information Extraction

Research Goals
▪ Leveraging of machine learning techniques to improve information extraction
▪ Transformation of unstructured data into useful structured information and knowledge

SLIDE 16

Information Extraction

Structuring Unstructured Data

Transformation of unstructured data into useful structured information and knowledge
▪ Identify all the relevant information that needs to be extracted
▪ Identify approaches for extracting information from unstructured data and turning it into valuable knowledge
▪ Develop processing pipelines to automatically extract the identified information and make it accessible in a structured way

[Figure labels: Identify Information, Choose Approaches, Develop Processing Pipeline]
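
The three steps above (identify the information, choose an approach, build a pipeline) can be sketched as a small extraction pipeline that turns free text into a structured record. This is a sketch under assumptions: the field names, the regular expressions and the function `run_pipeline` are hypothetical, and regular expressions merely stand in for whatever extraction approach is chosen.

```python
import re
from typing import Callable, Dict, List

Extractor = Callable[[str], List[str]]


def extract_emails(text: str) -> List[str]:
    """Find e-mail addresses with a simple, non-exhaustive pattern."""
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


def extract_urls(text: str) -> List[str]:
    """Find http(s) URLs up to the next whitespace, comma or semicolon."""
    return re.findall(r"https?://[^\s,;]+", text)


def extract_postal_codes(text: str) -> List[str]:
    """Find postal codes written as 'D-' followed by five digits."""
    return re.findall(r"\bD-\d{5}\b", text)


# Step 1 and 2: the identified fields and the approach chosen for each one.
PIPELINE: Dict[str, Extractor] = {
    "emails": extract_emails,
    "urls": extract_urls,
    "postal_codes": extract_postal_codes,
}


def run_pipeline(text: str) -> Dict[str, List[str]]:
    """Step 3: apply every extractor and collect a structured record."""
    return {field: extractor(text) for field, extractor in PIPELINE.items()}


if __name__ == "__main__":
    sample = (
        "Contact: info@example.org, https://www.example.org/ "
        "Main Street 1, D-12345 Sampletown"
    )
    print(run_pipeline(sample))
```

Because each field maps to an independent extractor, a rule can later be swapped for a trained model without changing the rest of the pipeline.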

SLIDE 17

Information Extraction

Machine Learning based Information Extraction

Leveraging of machine learning techniques to improve information extraction
▪ Combine or substitute classical information extraction approaches with machine learning
▪ Development of tools to improve the process of annotating unstructured data in order to create a suitable data set for the training of ML models
▪ Development & refinement of ML models

[Figure labels: Combine Machine Learning & Classical Approaches, Data Annotation, Model Training & Refinement]

SLIDE 18

Selected Research Projects and Applications

SLIDE 19

Information Extraction

Structuring PDF Documents

Goal
▪ Structuring of unstructured PDF documents to extract additional information and prepare the data for further analytics

Approach
▪ Using Deep Learning (CNNs) to detect different elements within PDF documents
▪ Extract additional information from diagrams and tables for further processing

Results: PDF Analyzer
▪ Tool for identifying document elements in PDF files
  ▪ Header
  ▪ Text body
  ▪ Tables
  ▪ Figures
  ▪ Formulas and Algorithms
▪ First approaches for deriving information from diagrams and tables exist
  ▪ Classifying diagram types
  ▪ Extracting values, axis labels etc.
  ▪ Consider additional context

SLIDE 20

Your Contact Person:

André Pomp, M.Sc.
Tel: +49 (0)202 439 1153
pomp@uni-wuppertal.de

Chair for Technologies and Management of Digital Transformation
Univ.-Prof. Dr.-Ing. Tobias Meisen
https://www.tmdt.uni-wuppertal.de/

Campus Freudenberg
Rainer-Gruenter-Str. 21
D-42119 Wuppertal
Germany

University of Wuppertal
School of Electrical, Information and Media Engineering