Automatic User Preferences Elicitation: A Data-Driven Approach

Tong Li (1), Fan Zhang (2), Dan Wang (1). (1) Beijing University of Technology, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, China. Presented at the 24th REFSQ, Utrecht, The Netherlands.


SLIDE 1

Automatic User Preferences Elicitation: A Data-Driven Approach

Tong Li (1), Fan Zhang (2), Dan Wang (1)

(1) Beijing University of Technology, Beijing, China
(2) Institute of Software, Chinese Academy of Sciences, China

24th REFSQ @ Utrecht, The Netherlands
22 March 2018

SLIDE 2

Outline

  • Background and Motivation
  • Related Work
  • Proposal
  • Evaluation Plan
  • Conclusion and Future Work

SLIDE 3

Background and Motivation

  • (Start-up) company: develop a particular type of software application
  • What features have been developed for this type of application?
  • What features are most liked or disliked by users?
  • Survey existing applications! Look into application reviews!
  • More than 2 million apps, numerous reviews

SLIDE 4

Related Work

  • Research on mining user reviews [Carreño2013, Guzman2014]
      • Mining features from user reviews
      • Sentiment-based preference analysis
  • Research on mining application descriptions [Hariri2013]
      • Clustering-based feature extraction
      • Association rule-based feature recommendation

SLIDE 5

Proposal

  • User reviews
  • Features

Feature      Total comments   Positive   Negative
Feature 1    2000             1500       500
Feature 2    100              100
Feature 3    500              130        370
...          ...              ...        ...
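A table like the one above could be produced by tallying reviews once each has been linked to a feature and given a sentiment label. The sketch below assumes the input is a list of (feature, sentiment) pairs with the labels "positive" and "negative"; the function name `summarize` and the data layout are illustrative, not taken from the slides.

```python
from collections import Counter, defaultdict


def summarize(labeled_reviews):
    """Count total, positive, and negative comments per feature.

    labeled_reviews: iterable of (feature, sentiment) pairs, where
    sentiment is "positive" or "negative" (an assumed encoding).
    """
    counts = defaultdict(Counter)
    for feature, sentiment in labeled_reviews:
        counts[feature][sentiment] += 1
    return {
        feature: {
            "total": sum(c.values()),
            "positive": c["positive"],
            "negative": c["negative"],
        }
        for feature, c in counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("Feature 1", "positive"),
        ("Feature 1", "negative"),
        ("Feature 1", "positive"),
        ("Feature 3", "negative"),
    ]
    for feature, row in summarize(sample).items():
        print(feature, row["total"], row["positive"], row["negative"])
```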

SLIDE 6

Feature Identification

  • A Clustering-Based Method
      • Generate clusters (categories): doc2vec + density-peak
      • A collocation finding algorithm for identifying features
  • Topic Modeling-Based Method
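The clustering step can be sketched in plain Python. This is a minimal version of density-peak clustering, assuming the application descriptions have already been embedded (for example by doc2vec) into fixed-length vectors. The toy 2-D points, the cutoff distance `d_c`, the fixed number of clusters, and the function name `density_peak_clusters` are illustrative assumptions rather than the authors' implementation.

```python
import math


def density_peak_clusters(points, d_c, n_clusters):
    """Cluster points by density peaks; returns one label per point.

    points: list of equal-length numeric tuples (stand-ins for doc2vec vectors).
    d_c: cutoff distance used to count neighbours (assumed parameter).
    n_clusters: number of cluster centres to keep (must be >= 1).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: how many other points lie closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n   # distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centres are the points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id

    # Every other point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    print(density_peak_clusters(toy, d_c=1.5, n_clusters=2))
```

In practice the number of centres is usually chosen by inspecting the density/distance plot rather than fixed in advance; it is passed in here only to keep the sketch short.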

SLIDE 8

Associate Features with User Reviews

  • word2vec for producing word embeddings
      • Train a neural network model
      • Quantify and categorize semantic similarities between words
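One way to use word embeddings for this association is to average the word vectors of a review and of each feature phrase, then pick the feature with the highest cosine similarity. The sketch below assumes that approach; the slides do not spell out the matching rule. The tiny hand-written `embeddings` dictionary stands in for vectors a trained word2vec model would supply, and the `threshold` value is arbitrary.

```python
import math


def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(review, features, embeddings, threshold=0.5):
    """Return the most similar feature, or None if nothing reaches threshold."""
    review_vec = mean_vector(review.lower().split(), embeddings)
    if review_vec is None:
        return None
    best, best_score = None, threshold
    for feature in features:
        feature_vec = mean_vector(feature.lower().split(), embeddings)
        if feature_vec is None:
            continue
        score = cosine(review_vec, feature_vec)
        if score >= best_score:
            best, best_score = feature, score
    return best


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real word2vec vectors have far more dimensions.
    embeddings = {
        "photo": [1.0, 0.0], "filter": [0.9, 0.1], "camera": [0.8, 0.2],
        "login": [0.0, 1.0], "password": [0.1, 0.9], "account": [0.2, 0.8],
    }
    features = ["photo filter", "account login"]
    print(associate("camera filter", features, embeddings))
    print(associate("forgot password", features, embeddings))
```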

SLIDE 9

Sentiment Analysis

  • Train a sentiment classifier based on:
      • Lexical evidence
      • Syntactic structure
      • Semantic dependency
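The slides do not name a classifier, so the following is only a sketch of the lexical part: a bag-of-words Naive Bayes model with add-one smoothing, used here as a stand-in technique. Syntactic structure and semantic dependency features are not covered. The labels "pos" and "neg", the function names, and the four training reviews are made up for illustration.

```python
import math
from collections import Counter

LABELS = ("pos", "neg")  # assumed label set


def train(labeled_reviews):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability.

    Assumes every label appeared at least once in the training data.
    """
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train([
        ("love the new filters", "pos"),
        ("great camera works well", "pos"),
        ("login keeps crashing", "neg"),
        ("terrible sync always fails", "neg"),
    ])
    print(classify("great filters", model))
    print(classify("sync keeps crashing", model))
```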

SLIDE 10

Evaluation Plan

  • RQ1. To what extent can the topic modeling-based method and the clustering-based method, respectively, extract features of a category of software applications from unstructured descriptions?
  • RQ2. To what extent can the word2vec method associate user reviews with previously identified features?
  • RQ3. To what extent can our proposal accurately classify sentiments of user reviews?
  • RQ4. Can software companies benefit from our approach, and would they like to adopt it?

SLIDE 11

Evaluation Plan

  • Data collection
      • 5,000+ applications from app store
      • 1,000,000+ user reviews
  • Feature Identification
      • Randomly pick three categories
      • Manually identify features as ground truth
  • Review Association
      • Randomly choose 1,000 reviews
      • Manually associate them with features
  • Sentiment Analysis
      • Create a training dataset of 10,000 reviews
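Comparing extracted features against a manually built ground truth is commonly scored with precision, recall, and F1. The slides do not state which metrics will be used, so the sketch below is an assumption; the function name and the example feature names are hypothetical.

```python
def precision_recall_f1(predicted, ground_truth):
    """Score a set of predicted items against a ground-truth set."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up feature names, for illustration only.
    extracted = {"photo filter", "dark mode", "offline sync", "share link"}
    manual = {"photo filter", "dark mode", "push notification"}
    p, r, f1 = precision_recall_f1(extracted, manual)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The same function applies to the review association step if each item is a (review, feature) pair instead of a feature name.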

SLIDE 12

Conclusions and Future Work

  • A research preview of a data-driven user preference elicitation approach
  • Methods for filtering useless information from application descriptions
  • Syntactic templates for feature extraction
  • Effective visualization algorithms

SLIDE 13

THANK YOU!

Contact: litong@bjut.edu.cn