Natural Language Question-Answering with Visualizations, Bianca Yu (PowerPoint presentation)

slide-1
SLIDE 1

Natural Language Question-Answering with Visualizations

Bianca Yu, Hannah DeBalsi

CS 294W Spr 2020

slide-2
SLIDE 2

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-3
SLIDE 3

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-4
SLIDE 4

What is visualization?

  • “Transformation of the symbolic into the geometric” (McCormick et al. 1987)

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-5
SLIDE 5

What is visualization?

  • “Transformation of the symbolic into the geometric” (McCormick et al. 1987)
  • “... finding the artificial memory that best supports our natural means of perception.” (Bertin 1967)

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-6
SLIDE 6

What is visualization?

  • “Transformation of the symbolic into the geometric” (McCormick et al. 1987)
  • “... finding the artificial memory that best supports our natural means of perception.” (Bertin 1967)
  • “The use of computer-generated, interactive, visual representations of data to amplify cognition.” (Card, Mackinlay, and Shneiderman 1999)

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-7
SLIDE 7

We use visualization to ...

  • Record information

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-8
SLIDE 8

We use visualization to ...

  • Record information

Slide content from Maneesh Agrawala’s slides in CS 448B

http://galileo.rice.edu/sci/observations/moon.html

THE PURPOSE OF VISUALIZATION

slide-9
SLIDE 9

We use visualization to ...

  • Record information

Slide content from Maneesh Agrawala’s slides in CS 448B

http://galileo.rice.edu/sci/observations/moon.html; Getty Images

THE PURPOSE OF VISUALIZATION

slide-10
SLIDE 10

We use visualization to ...

  • Record information
  • Analyze information

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-11
SLIDE 11

We use visualization to ...

  • Record information
  • Analyze information

    ○ See data in context
    ○ Make a decision

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-12
SLIDE 12

See data in context

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

  • Example: Cholera outbreak, 1854
  • Tufte. Visual and Statistical Thinking 1997
slide-13
SLIDE 13

See data in context

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

  • Example: Cholera outbreak, 1854
  • Tufte. Visual and Statistical Thinking 1997
slide-14
SLIDE 14

Make a decision

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

  • Example: Challenger space shuttle launch, 1986

Wikipedia

slide-15
SLIDE 15

Make a decision

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

  • Example: Challenger space shuttle launch, 1986
  • Tufte. Visual and Statistical Thinking 1997
slide-16
SLIDE 16

Make a decision

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

  • Example: Challenger space shuttle launch, 1986
  • Tufte. Visual and Statistical Thinking 1997
slide-17
SLIDE 17

We use visualization to ...

  • Record information
  • Analyze information

    ○ See data in context
    ○ Make a decision

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-18
SLIDE 18

We use visualization to ...

  • Record information
  • Analyze information

    ○ See data in context
    ○ Make a decision

  • Convey information

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

slide-19
SLIDE 19

We use visualization to ...

  • Record information
  • Analyze information

    ○ See data in context
    ○ Make a decision

  • Convey information

Slide content from Maneesh Agrawala’s slides in CS 448B

THE PURPOSE OF VISUALIZATION

Rosalind Franklin and RG Gosling

slide-20
SLIDE 20

Graphs in statistical analysis

THE PURPOSE OF VISUALIZATION

“Graphs can have various purposes, such as: (i) to help us perceive and appreciate some broad features of the data, (ii) to let us look behind those broad features and see what else is there.” (Anscombe 1973)

slide-21
SLIDE 21

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-22
SLIDE 22

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-23
SLIDE 23

Why implement natural language interaction?

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-24
SLIDE 24

Why implement natural language interaction?

  • “Natural language interaction allows users to ask questions directly in complex programs without having to learn how to use an interface.” (Gao et al.)
  • Users of sophisticated visual analytic tools are “... usually domain experts with marginal knowledge of visualization techniques.” (Sun et al.)

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-25
SLIDE 25

Types of current natural language interfaces

1. Those that answer questions about existing visualizations
2. Those that create a new visualization

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-26
SLIDE 26

1) Answering questions about existing visualizations

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

Kim et al. Answering Questions about Charts and Generating Visual Explanations

slide-27
SLIDE 27

2) Creating new visualizations

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-28
SLIDE 28

2) Creating new visualizations

  • Commercial

    ○ IBM
    ○ Microsoft
    ○ Wolfram Alpha

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-29
SLIDE 29

2) Creating new visualizations

  • Commercial

    ○ IBM
    ○ Microsoft
    ○ Wolfram Alpha

  • Research Projects

    ○ Articulate
    ○ DataTone

NATURAL LANGUAGE INTERFACES FOR VISUALIZATION

slide-30
SLIDE 30

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-31
SLIDE 31

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-32
SLIDE 32

Motivation

  • Current tools are created for data analysts, not the general, curious public
  • “Amplify cognition” (Card, Mackinlay, and Shneiderman 1999)
  • Provide context for numerical responses to increase comprehension
  • Encourage curiosity and “see what else is there” (Anscombe 1973)

APPLICATION IN CONVERSATIONAL VIRTUAL ASSISTANTS

Understanding from single verbal response

Understanding from single verbal response + chart

slide-33
SLIDE 33

Challenges

  • Ambiguity
    ○ What is the user asking for specifically?
  • Inferring when to include a chart in the response
    ○ When does a user benefit from viewing a chart?
  • Determining what to display
    ○ What kind of additional data should be displayed?
    ○ What kind of chart is most effective?
  • CUI vs. GUI (conversational vs. graphical user interface)

APPLICATION IN CONVERSATIONAL VIRTUAL ASSISTANTS

slide-34
SLIDE 34

Question #1

Besides standard graphs, what other types of visualizations do you think would be helpful to integrate into a virtual assistant? For what domain(s)?

slide-35
SLIDE 35

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-36
SLIDE 36

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-37
SLIDE 37

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-38
SLIDE 38

DataTone System Architecture

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-39
SLIDE 39

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Client Side: a web-based interface that operates in standard web browsers
slide-40
SLIDE 40

Client Side Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-41
SLIDE 41

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Server Side: handles translation of user input to a visualization

slide-42
SLIDE 42

Query Analyzer

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-43
SLIDE 43

Query Analyzer

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-44
SLIDE 44

Tokenization

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

  • Identify low-level language features (words and phrases) that have meaning within the context of the dataset and analysis tasks
    ○ Example: words that identify column names

1. Construct the set of possible phrases
    ○ Extract all n-grams, ranging from n = 1 (single words) to n = k, the sentence length in words
    ○ Example: This is a sentence. => {this, is, a, sentence, this is, is a, a sentence, this is a, is a sentence, this is a sentence}

2. Identify n-grams with relevance to the dataset/query
    ○ Compare each n-gram to a set of regular expressions and a lexicon consisting of general phrases
    ○ Tag each matched n-gram with one of eight category labels
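Step 1 can be illustrated with a short Python sketch. It is not DataTone's code: the regex-based word splitting (which lowercases and drops punctuation) and the function name `extract_ngrams` are assumptions made for illustration.

```python
import re


def extract_ngrams(sentence: str) -> list[str]:
    """Return every contiguous n-gram of the sentence, for n = 1 .. k (k = word count)."""
    # Lowercase and keep only word characters, so "sentence." becomes "sentence".
    words = re.findall(r"[\w']+", sentence.lower())
    k = len(words)
    ngrams = []
    for n in range(1, k + 1):              # n-gram length
        for start in range(k - n + 1):     # every starting position
            ngrams.append(" ".join(words[start:start + n]))
    return ngrams


print(extract_ngrams("This is a sentence."))
# ['this', 'is', 'a', 'sentence', 'this is', 'is a', 'a sentence', 'this is a', 'is a sentence', 'this is a sentence']
```

A k-word sentence yields k(k+1)/2 n-grams (10 for the 4-word example), so the candidate set grows quadratically with sentence length, which is why step 2 discards every n-gram that does not match a category.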

slide-45
SLIDE 45

Category Labels

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

1. database attributes (i.e., column names)
2. database cell values
3. numerical values
4. time expressions
5. data operators and functions (greater than, less than, equal, sum, average, sort)
6. visualization key phrases (trend, correlation, relationship, distribution, time series, bars, stacked bars, line graph)
7. boolean operators (e.g., and, or)
8. “direct manipulation” terms (e.g., color)

slide-46
SLIDE 46

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

slide-47
SLIDE 47

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Numerical Values

slide-48
SLIDE 48

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Numerical Value, Time

slide-49
SLIDE 49

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Numerical Value, Time, DB Value

slide-50
SLIDE 50

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Numerical Value, Time, DB Value, Database Attribute

slide-51
SLIDE 51

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Operator

slide-52
SLIDE 52

Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Query: “What is the relationship between unemployment and family income for those families earning more than 20000 and less than 150000 between 2007 and 2010 for California and Michigan?”

1. Break into n-grams
2. Identify relevant n-grams by matching to categories

Visual Keyword, Boolean Operator
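The tagging in this example can be approximated with a toy tagger. This is a minimal sketch under stated assumptions: the lexicon entries, the year and number regular expressions, and the label strings are hypothetical stand-ins for DataTone's own regular expressions and lexicon, and only some of the eight categories are covered.

```python
import re
from typing import Optional

# Hypothetical lexicon for a toy census-style dataset. DataTone's actual
# lexicon and regular expressions are not reproduced here.
LEXICON = {
    "unemployment": "database attribute",
    "family income": "database attribute",
    "california": "database cell value",
    "michigan": "database cell value",
    "more than": "data operator",
    "less than": "data operator",
    "relationship": "visualization key phrase",
    "and": "boolean operator",
}
YEAR = re.compile(r"(?:19|20)\d{2}")   # 1900-2099 -> time expression
NUMBER = re.compile(r"\d+(?:\.\d+)?")  # any other number -> numerical value


def tag(ngram: str) -> Optional[str]:
    """Return a category label for one n-gram, or None if nothing matches."""
    if YEAR.fullmatch(ngram):
        return "time expression"
    if NUMBER.fullmatch(ngram):
        return "numerical value"
    return LEXICON.get(ngram)


def tag_query(query: str) -> list[tuple[str, str]]:
    """Extract all n-grams (n = 1..k) and keep only those that receive a label."""
    words = re.findall(r"[\w']+", query.lower())
    tagged = []
    for n in range(1, len(words) + 1):
        for i in range(len(words) - n + 1):
            ngram = " ".join(words[i:i + n])
            label = tag(ngram)
            if label is not None:
                tagged.append((ngram, label))
    return tagged


query = ("What is the relationship between unemployment and family income for those "
         "families earning more than 20000 and less than 150000 between 2007 and 2010 "
         "for California and Michigan?")
for ngram, label in tag_query(query):
    print(f"{ngram!r:>16} -> {label}")
```

Because every n-gram is tested independently, overlapping or misleading matches are all kept (for example, any number from 1900 to 2099 is tagged as a year, even if it were an income), so a real system needs a further step to resolve such conflicts.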

slide-53
SLIDE 53

Relation Identification

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

  • We now have a set of tokens with category tags
  • We need to define relationships between these tokens in order to construct a query

Natural Language Query → Stanford CoreNLP Parser → Grammatical relationships → Manually constructed set of patterns → Query filters

slide-54
SLIDE 54

Relation Identification Example

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

Stanford CoreNLP Parser

“Show me the states that had total sales greater than 20000.”

  • “total sales”: noun phrase (NP)
  • “greater than 20000”: adjective phrase (ADJP)
  • NP and ADJP are siblings in the sentence's parse tree

Manually constructed set of patterns

Apply:

  • SUM to Sales
  • the operator “>” to 20000
  • generate the filter SUM(Sales) > 20000
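The pattern applied in this example can be sketched as a single hand-written rule. The sketch assumes the parser has already produced the sibling NP and ADJP strings (no CoreNLP call is made here); the `AGGREGATES` and `OPERATORS` tables and the function name `filter_from_siblings` are hypothetical, and title-casing the column name is a simplification.

```python
import re
from typing import Optional

# Hypothetical pattern vocabulary; DataTone's hand-built pattern set is larger.
AGGREGATES = {"total": "SUM", "average": "AVG"}
OPERATORS = {"greater than": ">", "more than": ">", "less than": "<", "equal to": "="}
ADJP_PATTERN = re.compile(
    r"(" + "|".join(map(re.escape, OPERATORS)) + r")\s+(\d+(?:\.\d+)?)"
)


def filter_from_siblings(np_text: str, adjp_text: str) -> Optional[str]:
    """Pattern: an NP (optional aggregate word + attribute) whose sibling
    ADJP is (comparison phrase + number) becomes one query filter."""
    np_words = np_text.lower().split()
    if not np_words:
        return None
    agg = AGGREGATES.get(np_words[0])
    attribute = " ".join(np_words[1:] if agg else np_words).title()
    match = ADJP_PATTERN.fullmatch(adjp_text.lower().strip())
    if match is None:
        return None  # the ADJP does not fit this pattern
    operator, value = OPERATORS[match.group(1)], match.group(2)
    lhs = f"{agg}({attribute})" if agg else attribute
    return f"{lhs} {operator} {value}"


# Phrases as the parser would have bracketed them for
# "Show me the states that had total sales greater than 20000."
print(filter_from_siblings("total sales", "greater than 20000"))  # SUM(Sales) > 20000
```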

slide-55
SLIDE 55

Natural Language Parse ⇒ Data Specification (DSP)

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

  • DSPs contain:
    ○ Attributes: all column names in the original query (e.g., unemployment, family income)
    ○ Values: all strings, numbers, and times (e.g., California, Michigan, 20000, 150000)
    ○ Filters: as explained in relation identification
    ○ Aggregates: “Show me average medal count by country per year” → AVG(MedalCount)
    ○ Order: “Show me the sorted medal count by country from largest to smallest” → OrderBy(MedalCount, DESC)
  • Generate one database query for each DSP
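A DSP can be thought of as a small record that is rendered into one database query. The sketch below is a hypothetical illustration with SQL as the target: the field names, the `to_sql` helper, the table name `olympics`, and the rule that filters beginning with an aggregate function go into `HAVING` are all assumptions, and it combines the two medal-count examples above into a single DSP. It does no quoting or escaping, so it is not safe for untrusted input.

```python
from dataclasses import dataclass, field
from typing import Optional

AGG_FUNCS = ("SUM(", "AVG(", "COUNT(", "MIN(", "MAX(")


@dataclass
class DSP:
    """Hypothetical data specification mirroring the slide's five parts."""
    attributes: list[str]                                     # column names
    values: list[str] = field(default_factory=list)           # literals; assumed already folded into filters
    filters: list[str] = field(default_factory=list)          # e.g. "SUM(Sales) > 20000"
    aggregates: dict[str, str] = field(default_factory=dict)  # column -> aggregate function
    order: Optional[tuple[str, str]] = None                   # (column, "ASC" or "DESC")


def to_sql(dsp: DSP, table: str) -> str:
    """Render one DSP as one SQL string (no quoting or escaping: illustration only)."""
    def column(a: str) -> str:
        return f"{dsp.aggregates[a]}({a})" if a in dsp.aggregates else a

    grouped = [a for a in dsp.attributes if a not in dsp.aggregates]
    where = [f for f in dsp.filters if not f.startswith(AGG_FUNCS)]   # row-level filters
    having = [f for f in dsp.filters if f.startswith(AGG_FUNCS)]      # aggregate filters

    parts = ["SELECT " + ", ".join(column(a) for a in dsp.attributes), "FROM " + table]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    if dsp.aggregates and grouped:
        parts.append("GROUP BY " + ", ".join(grouped))
    if having:
        parts.append("HAVING " + " AND ".join(having))
    if dsp.order:
        col, direction = dsp.order
        parts.append(f"ORDER BY {column(col)} {direction}")
    return " ".join(parts)


# "Show me average medal count by country per year", sorted from largest to smallest
medals = DSP(attributes=["Country", "Year", "MedalCount"],
             aggregates={"MedalCount": "AVG"},
             order=("MedalCount", "DESC"))
print(to_sql(medals, "olympics"))
# SELECT Country, Year, AVG(MedalCount) FROM olympics GROUP BY Country, Year ORDER BY AVG(MedalCount) DESC
```

When an ambiguous question yields several DSPs, each one would be passed through the renderer separately, matching the one-query-per-DSP rule on the slide.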
slide-56
SLIDE 56

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-57
SLIDE 57

Outline

Visualization + Natural Language
  • The Purpose of Visualization
  • Natural Language Interfaces for Visualization
  • Application in Conversational Virtual Assistants

Backend Implementation
  • NLP Query to Database Query
  • From Answer to Display

slide-58
SLIDE 58

Query Analyzer

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-59
SLIDE 59

Visual Specification (VSP)

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

From Answer to Display

  • Template for a graph
  • Each template has constraints on how its parameters can be filled
    ○ Supported dimension and data types (categorical, quantitative, or time) for each parameter in the graph
  • Map each DSP to the VSP template(s) that can accept that specific DSP’s configuration
  • Bar Chart VSP:
    ○ x-axis: one categorical dimension
    ○ y-axis: one quantitative measure
    ○ color: a color encoding (mapping) of one dimension (optional)
  • Given a DSP, there may be several possible templates
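Matching a DSP against VSP templates amounts to checking whether each DSP field can be assigned to a distinct template parameter whose supported data type fits, with every required parameter filled. In the sketch below only the bar chart template follows the slide; the line chart and pie chart templates, the `Slot` class, and `matching_templates` are hypothetical. The brute-force search over permutations is a simplification that is adequate for templates with a handful of parameters.

```python
from dataclasses import dataclass
from itertools import permutations

CATEGORICAL, QUANTITATIVE, TIME = "categorical", "quantitative", "time"


@dataclass(frozen=True)
class Slot:
    name: str            # visual parameter, e.g. "x" or "color"
    accepts: frozenset   # data types this parameter can hold
    optional: bool = False


# Only the bar chart's slots come from the slide; the other two are hypothetical.
TEMPLATES = {
    "bar chart": [Slot("x", frozenset({CATEGORICAL})),
                  Slot("y", frozenset({QUANTITATIVE})),
                  Slot("color", frozenset({CATEGORICAL}), optional=True)],
    "line chart": [Slot("x", frozenset({TIME})),
                   Slot("y", frozenset({QUANTITATIVE})),
                   Slot("color", frozenset({CATEGORICAL}), optional=True)],
    "pie chart": [Slot("color", frozenset({CATEGORICAL})),
                  Slot("angle", frozenset({QUANTITATIVE}))],
}


def matching_templates(fields: dict[str, str]) -> dict[str, dict[str, str]]:
    """Return {template: {slot: field}} for every template that can hold all DSP fields."""
    matches = {}
    for name, slots in TEMPLATES.items():
        if len(fields) > len(slots):
            continue  # more fields than parameters
        # Try every assignment of fields to distinct slots.
        for chosen in permutations(slots, len(fields)):
            pairs = list(zip(chosen, fields.items()))
            types_ok = all(ftype in slot.accepts for slot, (_, ftype) in pairs)
            used = {slot.name for slot in chosen}
            required_ok = all(s.optional or s.name in used for s in slots)
            if types_ok and required_ok:
                matches[name] = {slot.name: fname for slot, (fname, _) in pairs}
                break  # keep the first valid assignment per template
    return matches


print(matching_templates({"State": CATEGORICAL, "Sales": QUANTITATIVE}))
# {'bar chart': {'x': 'State', 'y': 'Sales'}, 'pie chart': {'color': 'State', 'angle': 'Sales'}}
```

Here one categorical field plus one quantitative field fits both the bar chart and the pie chart, illustrating how a single DSP can produce several candidate visualizations that still need to be ranked or offered to the user.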
slide-60
SLIDE 60

VSP → Client → D3.js → Image

Slide content from https://dl.acm.org/doi/pdf/10.1145/2807442.2807478

NLP Query to Database Query

slide-61
SLIDE 61

Question #2

What are some of the category labels for n-grams in this string? What type of graph would you use to represent the answer?

“What were the trends of COVID-19 deaths in May between New York and California?”

1. database attributes (i.e., column names)
2. database cell values
3. numerical values
4. time expressions
5. data operators and functions (greater than, less than, equal, sum, average, sort)
6. visualization key phrases (trend, correlation, relationship, distribution, time series, bars, stacked bars, line graph)
7. boolean operators (e.g., and, or)
8. “direct manipulation” terms (e.g., color)