Profiling user belief in BI exploration for measuring subjective interestingness


SLIDE 1

Profiling user belief in BI exploration for measuring subjective interestingness

Alexandre Chanson, Ben Crulis, Krista Drushku, Nicolas Labroche, Patrick Marcel DOLAP 2019 - 26 March 2019

University of Tours

SLIDE 2

What is Alice's best next move?

In fact, it depends!

SLIDE 3

A very subjective question?

We would need to “brain dump” analysts

SLIDE 4

What is subjective interestingness?

  • Objective interestingness
    • user agnostic, based only on data
    • generality, reliability, peculiarity, diversity and conciseness
    • directly measurable evaluation metrics: support, confidence, lift or chi-squared measures in the case of association rules
    • summaries: compact descriptions of raw data at different concept levels (Geng & Hamilton)
  • Subjective interestingness
    • characterizes the patterns’ surprise and novelty when compared to previous user knowledge or expected data distribution
    • user-adaptive exploration
    • subjective interestingness for explorative data mining
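As an illustration of the objective, data-only metrics named above, here is a minimal Python sketch that computes support, confidence and lift for one association rule. The function name `rule_metrics` and the toy transactions are assumptions made for this example; they do not come from the talk.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent.

    transactions: list of item sets (toy data, hypothetical).
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))         # contain antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)
```

None of these values depends on who is looking at the rule, which is what makes them objective.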

SLIDE 5

De Bie’s framework

  • a pattern p ≈ restriction of data space
  • a belief(p) ≈ prior knowledge as a probability distribution over the pattern space
  • surprise(p) = −log(belief(p))

Interestingness(p) = surprise(p) / |p|
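A minimal Python sketch of these definitions. It assumes that |p| is the description length of the pattern and that interestingness is the ratio of surprise to that length, consistent with the trade-off between surprise and complexity of description mentioned later in the talk. The natural logarithm and the function names are also assumptions of this example.

```python
import math


def surprise(belief_p):
    """surprise(p) = -log(belief(p)); belief_p is a probability in (0, 1]."""
    if not 0.0 < belief_p <= 1.0:
        raise ValueError("belief must be a probability in (0, 1]")
    return -math.log(belief_p)


def interestingness(belief_p, description_length):
    """Surprise per unit of description length (assumed reading of |p|)."""
    if description_length <= 0:
        raise ValueError("description length must be positive")
    return surprise(belief_p) / description_length


# A pattern the user barely expects is more interesting than an expected one
# of the same length.
expected = interestingness(0.5, description_length=2)
unexpected = interestingness(0.01, description_length=2)
print(expected < unexpected)
```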

SLIDE 6

How to translate subjective interestingness to BI?

Two main problems:

  • Define the “pattern”
    • Cell?
    • Query?
    • Query parts?
  • Learn the belief function
    • how to take into account the specificities of BI?
    • how can we decide that two pieces of information are related in BI?
    • do we consider the usage (the query logs)?
    • do we consider the structure (the DB schema)?

SLIDE 7

Our proposal

SLIDE 8

Belief expressed over query parts

Classically, a query part is either:

  • A group by set attribute
  • A measure
  • A selection predicate
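One possible way to represent a query as a set of query parts, sketched in Python. The `QueryPart` class, the `query_parts` helper and the example attribute names are hypothetical and only illustrate the three kinds listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryPart:
    kind: str   # "group_by", "measure" or "selection"
    value: str  # attribute name, measure expression or predicate text


def query_parts(group_by, measures, selections):
    """Decompose a query into the set of its parts."""
    parts = {QueryPart("group_by", g) for g in group_by}
    parts |= {QueryPart("measure", m) for m in measures}
    parts |= {QueryPart("selection", s) for s in selections}
    return frozenset(parts)


# Hypothetical query: total sales by year and country, restricted to 2018.
q = query_parts(
    group_by=["year", "country"],
    measures=["sum(sales)"],
    selections=["year = 2018"],
)
print(len(q))
```

Because the parts are hashable, they can serve directly as vertices of a graph or as keys of a belief distribution.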

SLIDE 9

Query parts as patterns

Figure 1: Query as a restriction of the data space

SLIDE 10

Our recipe so far

Figure 2: Caption

What ingredients do we want to use? Knowing that the question is then: what is the probability that someone

SLIDE 11

Random walk for learning the distribution

  • consider a graph where vertices are query parts and edges are relations (precedence, co-occurrence) between them
  • the user does a random walk over this graph
  • the long-term distribution of the user gives a measure of importance of the query parts
  • it can be computed with PageRank
  • or better, with a Topic-Specific PageRank: a PageRank where the user’s query parts are more important than the others
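The steps above can be sketched in Python as a power iteration with a teleport vector concentrated on the user's query parts. This is an illustrative sketch, not the authors' implementation: the edge weighting (one unit per co-occurrence or precedence), the damping factor 0.85, the fixed number of iterations and all function names are assumptions.

```python
from collections import defaultdict
from itertools import permutations


def build_graph(log):
    """log: list of queries, each a list of hashable query parts."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, query in enumerate(log):
        for part in query:
            graph[part]  # register every part as a vertex
        for a, b in permutations(set(query), 2):
            graph[a][b] += 1.0  # co-occurrence, both directions
        if i + 1 < len(log):
            for a in set(query):
                for b in set(log[i + 1]):
                    if a != b:
                        graph[a][b] += 1.0  # precedence: query i -> query i+1
    return graph


def topic_specific_pagerank(graph, user_parts, damping=0.85, iterations=100):
    """Stationary distribution of a walk that teleports to the user's parts."""
    nodes = list(graph)
    # Fall back to all vertices (ordinary PageRank) if no user part is known.
    topic = {n for n in nodes if n in user_parts} or set(nodes)
    teleport = {n: (1.0 / len(topic) if n in topic else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            total = sum(out.values())
            if total == 0.0:
                # Dangling vertex: its mass follows the teleport vector.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
            else:
                for m, w in out.items():
                    new_rank[m] += damping * rank[n] * w / total
        rank = new_rank
    return rank


log = [
    ["year", "sum(sales)"],
    ["year", "country", "sum(sales)"],
    ["country", "avg(price)"],
]
belief = topic_specific_pagerank(build_graph(log), user_parts={"country"})
print(sorted(belief.items(), key=lambda kv: -kv[1]))
```

The returned values are non-negative and sum to 1, so they can be read as a probability distribution over query parts.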

SLIDE 12

Baking the pie

SLIDE 13

Experiments

SLIDE 14

Our “Users”

  • Artificial data generated with CubeLoad [1]
  • mimic prototypical explorations
  • More “consistent” than real users
  • Less noisy
  • Only 4 profiles

Figure 3: CubeLoad Templates

SLIDE 15

Protocol of the qualitative experiment

  • determine if there is a belief profile that is representative of each CubeLoad template

SLIDE 16

Different users, different beliefs

SLIDE 17

Protocol of the quantitative experiment

  • Introducing a user-agnostic recommender in the loop
  • Robustness to logs exploring different regions (of the cube)

SLIDE 18

Observing a cognitive bubble

Average Hellinger distance values on 10 runs when log files are identical
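The slide does not spell out the formula. A minimal Python sketch of the standard Hellinger distance between two discrete distributions, H(P, Q) = sqrt(Σ (√p_i − √q_i)² / 2), is given below. Representing a belief distribution as a dict from query part to probability is an assumption of this example.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts.

    Keys missing from one distribution are treated as probability 0.
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with disjoint support.
    """
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(squared / 2.0)


# Hypothetical belief distributions over three query parts.
belief_a = {"year": 0.5, "country": 0.3, "sum(sales)": 0.2}
belief_b = {"year": 0.2, "country": 0.3, "sum(sales)": 0.5}
print(hellinger(belief_a, belief_b))
```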

SLIDE 19

Conclusions

  • First attempt to model belief in BI
    • Capture potential relations between user knowledge as a graph
    • ⇒ use the well-known PageRank for estimating probabilities
  • Experiments
    • Different simulated user templates == different belief distributions
    • Possible detection of the cognitive bubble phenomenon

SLIDE 20

On-going and Future work

  • What about belief distribution over cell contents?
    • theoretically appealing but computationally painful...
    • (but we’re on it)
  • What about belief evolution along the exploration?
  • Subjective interestingness is a trade-off between surprise and complexity of description
    • how to measure complexity of description in BI?
  • How to validate a user “brain dump”?
    • Perform a user study based on an improved query recommender system with interestingness

SLIDE 21

Long term vision

SLIDE 22

Questions?

SLIDE 23

References

  • [1] S. Rizzi and E. Gallinucci. CubeLoad: A parametric generator of realistic OLAP workloads. In Advanced Information Systems Engineering - 26th International Conference, CAiSE 2014, Thessaloniki, Greece, June 16-20, 2014. Proceedings, pages 610–624, 2014.