
PROGENIE: Biographical descriptions for Intelligence Analysis - PowerPoint PPT Presentation



  1. PROGENIE: Biographical descriptions for Intelligence Analysis
     Pablo Duboue, Kathleen McKeown and Vasileios Hatzivassiloglou
     Computer Science Department
     Columbia University in the City of New York

  2. Goals
     • Provide end users with quick and concise descriptions of:
       – Foreign military personnel
       – Foreign political personnel
       – Terrorists
       – Criminals
     • Customizable for:
       – Different users
       – Different scenarios
       – Different requirements
     • PROGENIE's approach: on-the-fly generation of person descriptions

  3. Motivation and Relevance
     • Information Retrieval
       – Look for existing biographies
     • Summarization
       – Integrate pieces of text from various textual sources
     • Natural Language Generation (NLG)
       – Create text from structured information sources
     • PROGENIE's Approach
       – Builds on the NLG tradition
         ∗ Diverges from it by automatically constructing content plans
       – Combines a generator with an agent-based infrastructure
       – Mixes textual and non-textual sources

  4. System Description
     [Figure: architecture diagram. Labels: Internet; Knowledge Sources; Knowledge Component; Text and Knowledge Resource (text, KB); Learning Component; Content Planner Schema; Generation Component; Generated Biographies.]

  5. Learning Component
     • Content Planner
       – Structuring: distribution of the information among textual elements
       – Selection: filtering of the available data
     • Schemas
       – An implementation of content planners (McKeown, 1983)
     • Construct content planning schemas from training data
       – Training material: data and biographies
       – The learned schemas will be used with new, unseen people
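
To make the two content planner tasks concrete, here is a minimal Python sketch, assuming facts are stored as (attribute path, value) pairs and a schema is simply an ordered list of slots. The `Schema` class, `select`, `structure`, and the slot names are hypothetical illustrations, not PROGENIE's implementation and not McKeown's original schema formalism: selection keeps the facts some slot asks for, and structuring distributes them over the slots in order.

```python
from dataclasses import dataclass, field

# Hypothetical fact format: (attribute path, value), e.g. (("birth", "date", "year"), "1930").
Fact = tuple[tuple[str, ...], str]


def _matches(fact: Fact, prefix: tuple[str, ...]) -> bool:
    """True if the fact's attribute path starts with the given prefix."""
    return fact[0][: len(prefix)] == prefix


@dataclass
class Schema:
    # Ordered slots: (slot name, attribute-path prefixes the slot covers).
    # Slot order drives structuring; the union of all prefixes drives selection.
    slots: list[tuple[str, list[tuple[str, ...]]]] = field(default_factory=list)


def select(facts: list[Fact], schema: Schema) -> list[Fact]:
    """Selection: keep only the facts that some schema slot asks for."""
    prefixes = [p for _, ps in schema.slots for p in ps]
    return [f for f in facts if any(_matches(f, p) for p in prefixes)]


def structure(facts: list[Fact], schema: Schema) -> list[tuple[str, list[Fact]]]:
    """Structuring: distribute the selected facts over the ordered slots."""
    plan = []
    for name, prefixes in schema.slots:
        chunk = [f for f in facts if any(_matches(f, p) for p in prefixes)]
        if chunk:  # slots with no data for this person are dropped
            plan.append((name, chunk))
    return plan


schema = Schema([("overview", [("name",), ("birth",)]), ("career", [("occupation",)])])
facts = [
    (("name", "last"), "Connery"),
    (("birth", "date", "year"), "1930"),
    (("occupation", "TYPE"), "c-actor"),
    (("hobby",), "golf"),  # toy fact that no slot asks for
]
print(structure(select(facts, schema), schema))
```

A learned schema would supply the slot list; the same code then runs unchanged for a new, unseen person.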

  6. Text and Knowledge Resource
     • Celebrities
       – Easily available
       – Representative of the learning issues
       – Possibility of corpus redistribution
     • Size
       – Data frames for 1,100 different celebrities
       – Assorted biographies, ranging from 110 to 500 words
       – Data and biographies crawled from independent websites

  7. Example of Text and Knowledge Resource
     Biography text: "Actor, born Thomas Connery on August 25, 1930, in Fountainbridge, Edinburgh, Scotland, the son of a truck driver and charwoman. He has a brother, Neil, born in 1938. Connery dropped out of school at age fifteen to join the British Navy. Connery is best known for his portrayal of the suave, sophisticated British spy, James Bond, in the 1960s."
     [Figure: knowledge frame for person-2654, linked to frames name-1 (with the name values "Thomas", "Sean", and "Connery"), birth-1 (date-1, year 1930), occupation-1 (TYPE c-actor), and relative frames relative-1 and relative-2 (TYPE values c-son and c-grand-son) that point to other person frames such as person-7312, whose names include "Jason" and "Dashiel".]
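
The frame structure above can be read as a graph in which slot values are either literals or links to other frames. The sketch below uses a hypothetical toy frame store loosely based on the figure labels (only a few slots, with contents chosen for illustration) and flattens the graph into attribute-path/value pairs, the same kind of path as ⟨birth place state⟩ in the content selection example.

```python
# Toy frame store: each slot value is either a literal or the id of another
# frame (a key of FRAMES). Contents are illustrative, not the real corpus.
FRAMES = {
    "person-2654": {"name": "name-1", "birth": "birth-1",
                    "occupation": "occupation-1", "relative": "relative-1"},
    "name-1": {"last": "Connery"},
    "birth-1": {"date": "date-1"},
    "date-1": {"year": "1930"},
    "occupation-1": {"TYPE": "c-actor"},
    "relative-1": {"TYPE": "c-son", "person": "person-7312"},
    "person-7312": {"name": "name-2"},
    "name-2": {"first": "Jason"},
}


def paths(frames, root, prefix=(), visited=frozenset()):
    """Yield (attribute path, literal) pairs reachable from `root`.

    A value that is itself a frame id is followed; anything else is a literal.
    `visited` holds the frames on the current path, so cycles are not followed.
    """
    if root in visited:
        return
    visited = visited | {root}
    for slot, value in frames[root].items():
        path = prefix + (slot,)
        if value in frames:
            yield from paths(frames, value, path, visited)
        else:
            yield path, value


for path, value in paths(FRAMES, "person-2654"):
    print(" ".join(path), "=", value)
```

Each printed line, such as `birth date year = 1930`, is a candidate unit for the content selection rules.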

  8. Learning of Content Selection Rules (1)
     • To appear
       – Duboue and McKeown, “Statistical Acquisition of Content Selection Rules for Natural Language Generation”, EMNLP 2003
     • Goals
       – Analyze how variations in the data influence variations in the text
       – Obtain high-level content selection rules to filter the input

  9. Learning of Content Selection Rules (2)
     • Example
       Given:
       – (KB-1, Bio-1), (KB-2, Bio-2), (KB-3, Bio-3), (KB-4, Bio-4)
       If:
       – KB-{1,2} contain ⟨birth place state ‘MD’⟩
       – KB-{3,4} contain ⟨birth place state ‘NY’⟩
       Then:
       – Compare the language models of Bio-{1,2} against those of Bio-{3,4}.
       – If the models differ (as measured by cross-entropy), select ⟨birth place state⟩ as content.
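
The comparison step can be sketched as follows, under simplifying assumptions that are ours rather than the EMNLP 2003 method: unigram language models with add-one smoothing, whitespace tokenization, exactly two attribute values, and a hypothetical fixed `threshold` in place of a proper significance test. The cross-entropy of a text T under a model M is H(T; M) = -(1/N) Σ log₂ P_M(wᵢ); the score is the average number of extra bits per token that each group's biographies cost when coded with the other group's model. The biographies below are toy strings.

```python
import math
from collections import Counter


def tokens(texts):
    """Lower-cased whitespace tokens of a list of biographies."""
    return [w for t in texts for w in t.lower().split()]


def unigram_lm(texts, vocab):
    """Unigram model with add-one smoothing over a shared vocabulary."""
    counts = Counter(tokens(texts))
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(texts, lm):
    """H(T; M) = -(1/N) * sum_i log2 P_M(w_i), in bits per token."""
    toks = tokens(texts)
    return -sum(math.log2(lm[w]) for w in toks) / len(toks)


def divergence(bios_a, bios_b):
    """Average extra bits per token paid by coding each group with the other's model."""
    vocab = set(tokens(bios_a + bios_b))
    lm_a, lm_b = unigram_lm(bios_a, vocab), unigram_lm(bios_b, vocab)
    extra_a = cross_entropy(bios_a, lm_b) - cross_entropy(bios_a, lm_a)
    extra_b = cross_entropy(bios_b, lm_a) - cross_entropy(bios_b, lm_b)
    return (extra_a + extra_b) / 2


def should_select(pairs, threshold=0.5):
    """pairs: (attribute value, biography). Select the attribute if the groups differ."""
    groups = {}
    for value, bio in pairs:
        groups.setdefault(value, []).append(bio)
    if len(groups) != 2:
        raise ValueError("this sketch compares exactly two attribute values")
    bios_a, bios_b = groups.values()
    return divergence(bios_a, bios_b) > threshold


md = ["born in baltimore maryland", "grew up in baltimore maryland"]
ny = ["born in brooklyn new york", "grew up in brooklyn new york"]
print(should_select([("MD", b) for b in md] + [("NY", b) for b in ny]))
```

Identical groups score exactly zero, so an attribute whose values leave no trace in the biographies is filtered out.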

  10. Learning of Content Planning Schemas
      • Earlier experiments were performed on transcripts in a medical domain.
      • The corpus was collected during the evaluation described in McKeown et al. (2001).
      • In Duboue and McKeown (2001), we mined the corpus to extract order constraints between semantic elements.
      • In Duboue and McKeown (2002), we used the corpus to learn content planning schemas using an alignment-based fitness function.
      [Figure: learning pipeline diagram. Labels: semantic input; transcripts; order constraints; genetic search; genetic pool; operators; structure; atomic operators; generation system; planner; fitness fn.]
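
Two of the ingredients above can be sketched briefly. This is a simplified illustration, not the algorithms of the 2001 and 2002 papers: `mine_order_constraints` keeps a precedence pair when it holds in at least a hypothetical `min_support` fraction of the sequences containing both elements, and `fitness` scores a candidate ordering by its longest-common-subsequence alignment with the corpus sequences, the kind of function a genetic search over candidate plans would maximize. The genetic search itself is omitted, and the corpus is a toy list of slot sequences.

```python
from itertools import combinations


def mine_order_constraints(sequences, min_support=0.9):
    """Pairs (a, b) where a precedes b in >= min_support of the sequences holding both."""
    before, together = {}, {}
    for seq in sequences:
        first = {}
        for i, x in enumerate(seq):
            first.setdefault(x, i)  # position of first occurrence
        for a, b in combinations(sorted(first), 2):
            key = frozenset((a, b))
            together[key] = together.get(key, 0) + 1
            pair = (a, b) if first[a] < first[b] else (b, a)
            before[pair] = before.get(pair, 0) + 1
    return {pair for pair, n in before.items()
            if n / together[frozenset(pair)] >= min_support}


def lcs_len(x, y):
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y):
            cur.append(prev[j] + 1 if xi == yj else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def fitness(candidate, corpus):
    """Mean LCS alignment of a candidate ordering with each corpus sequence, in [0, 1]."""
    return sum(lcs_len(candidate, s) / max(len(candidate), len(s))
               for s in corpus) / len(corpus)


corpus = [["name", "birth", "occupation", "education"],
          ["name", "birth", "education"],
          ["name", "occupation", "birth", "education"]]
print(sorted(mine_order_constraints(corpus)))
print(fitness(["name", "birth", "occupation", "education"], corpus))
```

In this toy corpus, "birth" and "occupation" appear in both orders, so no constraint is kept between them, while "name" always comes first.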

  11. Knowledge Component
      • Data for Learning
        – Supplied by internal databases and networks
        – E.g., Intelink, IAFIS
      • Data for Execution
        – Information Extraction Agents on the Internet
        – Publicly available data as a test bed
        – Data represented in RDF (Semantic Web)
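
As a rough illustration of RDF-encoded input, the sketch below serializes a few facts about the example person as N-Triples. The `http://example.org/progenie/` namespace and the predicate names are hypothetical placeholders; a real deployment would use whatever vocabulary the extraction agents publish.

```python
# Hypothetical namespace for illustration; not PROGENIE's actual vocabulary.
EX = "http://example.org/progenie/"


def iri(local: str) -> str:
    """An IRI term in N-Triples syntax."""
    return f"<{EX}{local}>"


def literal(text: str) -> str:
    """A plain string literal; backslash is escaped first so later escapes stay intact."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """One 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


person = iri("person-2654")
print(to_ntriples([
    (person, iri("lastName"), literal("Connery")),
    (person, iri("birthYear"), literal("1930")),
    (person, iri("occupation"), iri("c-actor")),
]))
```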

  12. Generation Component
      1. Inference Module: limited world-knowledge inference
      2. Content Planner: McKeown's schemas
      3. Text Planner: splits a rhetorical tree into paragraphs
      4. Referring Expression Generator: handles pronominalization
      5. Aggregation: combines clauses with similar structure
      6. Lexical Chooser: selects words for concepts
      7. Surface Realizer: FUF/SURGE, a unification-based realizer
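
The seven stages can be pictured as functions composed into a pipeline. In the toy Python sketch below every stage is a hypothetical stand-in (a fixed slot order instead of a learned schema, a fixed paragraph map instead of a rhetorical tree, a single pronoun, a string join instead of FUF/SURGE), and the nationality rule is only an example of limited inference; none of it is PROGENIE's code.

```python
from functools import reduce

# Hypothetical stand-ins for PROGENIE's modules, for illustration only.
SCHEMA = ["birth_country", "nationality", "occupation"]  # slot order
PARAGRAPH = {"birth_country": 0, "nationality": 0, "occupation": 1}
LEXICON = {"birth_country": "was born in",
           "nationality": "is a national of",
           "occupation": "is"}


def infer(facts):
    """1. Inference: a toy rule deriving a missing nationality from birth country."""
    facts = dict(facts)
    if "birth_country" in facts:
        facts.setdefault("nationality", facts["birth_country"])
    return facts


def plan_content(facts):
    """2. Content planning: one (slot, value) message per value, in schema order."""
    msgs = []
    for slot in SCHEMA:
        value = facts.get(slot)
        if value is not None:
            msgs.extend((slot, v) for v in (value if isinstance(value, list) else [value]))
    return msgs


def plan_text(msgs):
    """3. Text planning: start a new paragraph when the paragraph id changes."""
    paras = []
    for slot, value in msgs:
        if paras and PARAGRAPH[paras[-1][-1][0]] == PARAGRAPH[slot]:
            paras[-1].append((slot, value))
        else:
            paras.append([(slot, value)])
    return paras


def refer(paras, name, pronoun="he"):
    """4. Referring expressions: full name on first mention, pronoun afterwards."""
    out, first = [], True
    for para in paras:
        out.append([])
        for slot, value in para:
            out[-1].append((name if first else pronoun, slot, value))
            first = False
    return out


def aggregate(paras):
    """5. Aggregation: merge adjacent clauses that realize the same slot."""
    out = []
    for para in paras:
        merged = []
        for subj, slot, value in para:
            if merged and merged[-1][1] == slot:
                merged[-1][2].append(value)
            else:
                merged.append((subj, slot, [value]))
        out.append(merged)
    return out


def lexicalize(paras):
    """6. Lexical choice: pick a verb phrase for each slot."""
    return [[(subj, LEXICON[slot], vals) for subj, slot, vals in para] for para in paras]


def realize(paras):
    """7. Surface realization: capitalize, coordinate objects, punctuate."""
    def coord(items):
        return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]
    return "\n\n".join(
        " ".join(f"{s[0].upper()}{s[1:]} {verb} {coord(vals)}." for s, verb, vals in para)
        for para in paras)


def generate(facts, name):
    stages = [infer, plan_content, plan_text, lambda p: refer(p, name),
              aggregate, lexicalize, realize]
    return reduce(lambda data, stage: stage(data), stages, facts)


print(generate({"birth_country": "Saudi Arabia",
                "occupation": ["a terrorist", "the leader of Al-Qaeda", "a civil engineer"]},
               "Usama Bin Laden"))
```

With these toy facts, aggregation merges the three occupation clauses into "He is a terrorist, the leader of Al-Qaeda and a civil engineer."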

  13. Generated Example: Osama Bin Laden
      • overview:
        – name of the person:
          ∗ He is Usama Bin Laden.
        – place of birth:
          ∗ He was born in Saudi Arabia.
        – nationality of the person:
          ∗ He was a national of Saudi Arabia.
          ∗ He does not currently have a nationality.
        – occupation:
          ∗ He is a terrorist.
          ∗ He is the leader of Al-Qaeda.
          ∗ He is a civil engineer.
          ∗ He is a constructor.
        – education received:
          ∗ He attended the primary school in Jeddah, Saudi Arabia.
          ∗ He attended the secondary school in Jeddah, Saudi Arabia.
          ∗ To study security, the CIA gave him training according to Hazhir Teimourian.

  14. Conclusions
      • PROGENIE
        – Addresses an existing requirement of intelligence and law enforcement personnel
      • Status
        – Prototype Learning Component implemented in an earlier domain
          ∗ New version: acquired content selection rules
        – Generation Component: five operational modules
        – Knowledge Component: under construction
