Natural Language Technology for Business Intelligence

Horacio Saggion & Adam Funk


  1. Natural Language Technology for Business Intelligence, Horacio Saggion & Adam Funk

  2. • Business Intelligence (BI) is the process of finding, gathering, aggregating, and analysing information for decision making
     • BI has relied on structured/quantitative information for decision making and has hardly ever used the qualitative information found in unstructured sources, which the industry is keen to use
     • NLP and BI:
       – gathering information through Information Extraction and text analysis
       – aggregating information through cross-source coreference, identity resolution, text clustering
     • This presentation is based in part on our work for the MUSING EU project

  3. • IE pulls facts from the document collection
     • It is based on the idea of a scenario template
       – some domains can be represented in the form of one or more templates
       – templates contain slots representing semantic information
       – IE instantiates the slots with values: strings from the text or associated values
     • IE is domain dependent and has to be adapted to each application domain, either manually or by machine learning
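A scenario template can be pictured as a small data structure. The sketch below is only an illustration: the class names, the slot names, and the choice of a joint-venture scenario are assumptions, with the fill values borrowed from the press example in this presentation. It shows a slot holding a string from the text and, optionally, an associated value.

```python
from dataclasses import dataclass, field


@dataclass
class SlotFill:
    text: str             # string copied from the document
    value: object = None  # optional associated (normalised) value


@dataclass
class ScenarioTemplate:
    """Hypothetical template: a scenario name plus a fixed set of slots."""
    scenario: str
    slot_names: tuple
    fills: dict = field(default_factory=dict)

    def fill(self, slot, text, value=None):
        # Only slots declared by the template may be instantiated.
        if slot not in self.slot_names:
            raise KeyError(f"unknown slot: {slot}")
        self.fills[slot] = SlotFill(text, value)

    def unfilled(self):
        # Slots the extractor has not instantiated yet.
        return [s for s in self.slot_names if s not in self.fills]


# Slot names below are illustrative assumptions, not the MUSING schema.
template = ScenarioTemplate(
    scenario="joint_venture",
    slot_names=("partner_1", "partner_2", "venture", "partner_1_share"),
)
template.fill("partner_1", "SENER")
template.fill("venture", "Torresol Energy")
template.fill("partner_1_share", "60%", value=0.60)  # text plus associated value
print(template.unfilled())
```

Adapting IE to a new domain then amounts to defining different templates and different rules or models for filling them.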

  4. SENER and Abu Dhabi’s $15 billion renewable energy company MASDAR new joint venture Torresol Energy has announced an ambitious solar power initiative to develop, build and operate large Concentrated Solar Power (CSP) plants worldwide… SENER Grupo de Ingeniería will control 60% of Torresol Energy and MASDAR, the remaining 40%. The Spanish holding will contribute all its experience in the design of high technology that has positioned it as a leader in world engineering. For its part, MASDAR will contribute with this initiative to diversifying Abu Dhabi’s economy and strengthening the country’s image as an active agent in the global fight for the sustainable development of the Planet.
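To make the slot-filling idea concrete for this text, here is a minimal rule-based sketch. The regular expression is handwritten for the single ownership sentence above and the output field names are assumptions; it is not the grammar used in MUSING or GATE, and real systems rely on many more patterns or on learned models.

```python
import re

SENTENCE = ("SENER Grupo de Ingeniería will control 60% of Torresol Energy "
            "and MASDAR, the remaining 40%.")

# Handwritten pattern for "<A> will control N% of <V> and <B>, the remaining M%".
OWNERSHIP = re.compile(
    r"(?P<major>[A-Z][\w ]+?) will control (?P<major_pct>\d+)% of "
    r"(?P<venture>[A-Z][\w ]+?) and (?P<minor>[A-Z][\w ]+?), "
    r"the remaining (?P<minor_pct>\d+)%"
)


def extract_ownership(text):
    """Return a filled ownership template, or None if the pattern is absent."""
    m = OWNERSHIP.search(text)
    if m is None:
        return None
    return {
        "venture": m["venture"],
        "shares": {
            m["major"]: int(m["major_pct"]),
            m["minor"]: int(m["minor_pct"]),
        },
    }


print(extract_ownership(SENTENCE))
```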

  5. • A template can be used to populate a database (slots in the template mapped to the DB schema)
     • A template can be used to generate a short summary of the input text
       – “SENER and MASDAR will form a joint venture to develop, build, and operate CSP plants”
     • The database can be used to perform querying/reasoning
       – Want all company agreements where company X is the principal investor
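These three uses can be sketched with Python's standard `sqlite3` module. The table layout, the function names, and the reading of "principal investor" as the partner with the largest share are all assumptions made for illustration; the share figures come from the press text quoted earlier.

```python
import sqlite3

# A filled template (field names are hypothetical).
template = {
    "venture": "Torresol Energy",
    "partners": [("SENER", 60), ("MASDAR", 40)],
    "purpose": "develop, build, and operate CSP plants",
}

# Map template slots onto a one-table schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agreement (venture TEXT, company TEXT, share_pct INTEGER)")
conn.executemany(
    "INSERT INTO agreement VALUES (?, ?, ?)",
    [(template["venture"], company, pct) for company, pct in template["partners"]],
)


def agreements_led_by(conn, company):
    """Ventures in which `company` holds the largest share (assumed meaning
    of 'principal investor')."""
    rows = conn.execute(
        """
        SELECT a.venture FROM agreement a
        WHERE a.company = ?
          AND a.share_pct = (SELECT MAX(b.share_pct) FROM agreement b
                             WHERE b.venture = a.venture)
        """,
        (company,),
    ).fetchall()
    return [row[0] for row in rows]


def summarise(t):
    """Generate a one-sentence summary from the template slots."""
    names = " and ".join(company for company, _ in t["partners"])
    return f"{names} will form a joint venture to {t['purpose']}"


print(agreements_led_by(conn, "SENER"))
print(summarise(template))
```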

  6. • The application domain (concepts, relations, instances, etc.) is modelled through an ontology or set of ontologies
     • Onto-based Information Extraction identifies in text instances of concepts and relations expressed in the ontology
       – the extraction task is modelled through “RDF templates”: X is a company; Z is a person; Z is manager of X; etc.
       – It generally uses the ontology as input and output
       – Extracted information is used to populate a knowledge repository
     • Updating the KR involves a process of identity resolution
     • GATE components are particularly well adapted for Ontology-based IE
       – in particular GATE has an API to manipulate the ontology, and the ontology can be manipulated in extraction grammars
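The "RDF template" idea and the identity-resolution step can be illustrated with plain Python triples. This is a sketch under stated assumptions, not GATE's ontology API: the class, the property names (`rdf:type`, `hasName`, `managerOf`), the name-normalisation rule, and the example names are all hypothetical.

```python
import re

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"ltd", "inc", "plc", "gmbh"}


def normalise(name):
    """Crude identity key: lowercase word tokens minus legal suffixes."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


class KnowledgeRepository:
    """Toy knowledge repository holding (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()
        self.ids = {}  # (concept, identity key) -> instance id

    def resolve(self, name, concept):
        # Identity resolution: reuse an existing instance when the key matches.
        key = (concept, normalise(name))
        if key not in self.ids:
            instance = f"{concept.lower()}_{len(self.ids) + 1}"
            self.ids[key] = instance
            self.triples.add((instance, "rdf:type", concept))
            self.triples.add((instance, "hasName", name))
        return self.ids[key]

    def add_manager(self, person, company):
        # Instantiates: X is a company; Z is a person; Z is manager of X.
        z = self.resolve(person, "Person")
        x = self.resolve(company, "Company")
        self.triples.add((z, "managerOf", x))
        return z, x


kr = KnowledgeRepository()
jane, acme_a = kr.add_manager("Jane Doe", "Acme Ltd.")  # hypothetical names
john, acme_b = kr.add_manager("John Roe", "ACME")
print(acme_a == acme_b)  # both mentions resolve to one company instance
```

Without the resolution step, each new document mentioning the same company under a slightly different name would create a duplicate instance in the repository.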

  7. [Architecture diagram; only its labels survive: data source provider, document collector, domain expert, ontology curator, MUSING ontology, user, user input, document, MUSING application, region selection model, ontology-based information extraction system, MUSING data repository, economic indicators, region rank, enterprise intelligence, company information, report, manually annotated documents, annotated document, annotation tool, ontology population, knowledge base, instances & relations]

  9. • Data sources are provided by MUSING partners and include balance sheets, company profiles, press data, web data, etc. (some private data)
       – Il Sole 24 ORE (Italian financial newspaper)
       – Some English press data (Financial Times)
       – Companies’ web pages (main, “about us”, “contact us”, etc.)
       – Wikipedia, CIA Fact Book, etc.
       – CreditReform (data provider): company profiles; payment information
       – European Business Registry (data provider): profiles, appointments
       – Discussion forums
       – Log files for IT related applications
     • Ontology is manually developed through interaction with domain experts and ontology curators
       – It extends the PROTON ontology and covers the financial, international, and IT operational risk domain
       – Particular methodology used to pull out the information from domain experts: “Competence Questions”
