Exploring the Integration of User Feedback in Automated Testing of Android Applications. G. Grano, A. Ciurumelea, S. Panichella, F. Palomba, H. Gall. SANER 2018, 20-23 March, Campobasso (Italy). grano@ifi.uzh.ch, giograno90


SLIDE 1

Exploring the Integration of User Feedback in Automated Testing of Android Applications

G. Grano, A. Ciurumelea, S. Panichella, F. Palomba, H. Gall

SANER 2018, 20-23 March, Campobasso (Italy)

grano@ifi.uzh.ch giograno90
SLIDE 2

149 billion apps

12 million devs

60 billion

SLIDE 3

Competition

Satisfaction

Quality

SLIDE 4

Testing tools

Plethora of Android testing tools:
> Monkey: state of the practice
> Sapienz: now in Facebook
> Dynodroid
> ...
> and a lot of others!

Giovanni Grano @ s.e.a.l.
SLIDE 5

Limitations

> They are not suited for generating inputs that require human intelligence
> Redundancy of generated input sequences

SLIDE 6

Tools behavior

1. Stack Trace
2. Sequence of inputs
SLIDE 7

Stack Trace

// CRASH: com.danvelazco.fbwrapper (pid 4302)
// Short Msg: java.lang.NullPointerException
// Long Msg: java.lang.NullPointerException
// Build Label: samsung/espressowifixx/espressowifi:4.2.2/JDQ39/P3110XXDMH1:user/release-keys
// Build Changelist: 8291
// Build Time: 1419156873000
// java.lang.NullPointerException
//     at com.danvelazco.fbwrapper.activity.BaseFacebookWebViewActivity.onKeyDown(BaseFacebookWebViewActivity.java:649)
//     at com.danvelazco.fbwrapper.FbWrapper.onKeyDown(FbWrapper.java:429)
//     at android.view.KeyEvent.dispatch(KeyEvent.java:2640)
//     at android.app.Activity.dispatchKeyEvent(Activity.java:2433)
//     at com.android.internal.policy.impl.PhoneWindow$DecorView.dispatchKeyEvent(PhoneWindow.java:2021)
//     at android.view.ViewRootImpl$ViewPostImeInputStage.processKeyEvent(ViewRootImpl.java:3845)
//     at android.view.ViewRootImpl$ViewPostImeInputStage.onProcess(ViewRootImpl.java:3819)
//     at android.view.ViewRootImpl$InputStage.deliver(ViewRootImpl.java:3392)
//     at android.view.ViewRootImpl$InputStage.onDeliverToNext(ViewRootImpl.java:3442)
//     at android.view.ViewRootImpl$InputStage.forward(ViewRootImpl.java:3411)
//     at android.view.ViewRootImpl$AsyncInputStage.forward(ViewRootImpl.java:3518)
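
A crash report like the one above can be reduced to the few fields a developer usually looks at first. The following is a minimal sketch, not the tooling used in the study: the function name `parse_crash`, the regular expressions, and the abridged `CRASH_REPORT` string are illustrative assumptions, and the sample text is copied from the slide.

```python
import re

# Abridged copy of the report shown on the slide (illustrative input only).
CRASH_REPORT = """\
// CRASH: com.danvelazco.fbwrapper (pid 4302)
// Short Msg: java.lang.NullPointerException
// java.lang.NullPointerException
//     at com.danvelazco.fbwrapper.activity.BaseFacebookWebViewActivity.onKeyDown(BaseFacebookWebViewActivity.java:649)
//     at com.danvelazco.fbwrapper.FbWrapper.onKeyDown(FbWrapper.java:429)
//     at android.view.KeyEvent.dispatch(KeyEvent.java:2640)
"""

# One stack frame: "at <class>.<method>(<File>.java:<line>)".
FRAME = re.compile(r"at\s+([\w.$]+)\.(\w+)\(([\w$]+\.java):(\d+)\)")


def parse_crash(report):
    """Extract the package, the exception type, and the stack frames."""
    package = re.search(r"CRASH:\s+([\w.]+)", report).group(1)
    exception = re.search(r"Short Msg:\s+([\w.$]+)", report).group(1)
    frames = [
        {"class": cls, "method": method, "file": file, "line": int(line)}
        for cls, method, file, line in FRAME.findall(report)
    ]
    # Frames that belong to the app itself, as opposed to the Android framework.
    app_frames = [f for f in frames if f["class"].startswith(package)]
    return {
        "package": package,
        "exception": exception,
        "frames": frames,
        "app_frames": app_frames,
    }


crash = parse_crash(CRASH_REPORT)
top = crash["app_frames"][0]
print(crash["exception"], "in", top["file"], "line", top["line"])
```

Filtering on the package name is a simple heuristic for separating app code from framework code; it assumes the app's classes share the package reported in the `CRASH:` line.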
SLIDE 8

Sequence of Inputs

type= raw events
count= -1
speed= 1.0
start data >>
LaunchActivity(com.ringdroid,com.ringdroid.RingdroidSelectActivity)
DispatchKey(223989,223989,0,23,0,0,-1,0)
DispatchKey(224204,224204,1,23,0,0,-1,0)
DispatchPointer(224346,224347,0,479.0,774.0,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224351,2,479.60635,797.5855,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224353,2,482.31937,814.9475,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224357,2,483.44247,829.02045,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224359,2,486.9434,848.0035,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224361,2,490.1806,859.495,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224364,2,497.59595,872.6837,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224367,2,500.53647,894.2986,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224346,224369,1,503.94815,896.686,0.0,0.0,0,1.0,1.0,0,0)
DispatchPointer(224374,224374,0,166.0,4.0,0.0,0.0,0,1.0,1.0,0,0)
SLIDE 9

Can we make it easier?

SLIDE 10

History of success

> release planning [1, 2]
> change localization [3, 2]
> user feedback categorization [4]

[1] Villaroel et al. - Release planning of mobile apps based on user reviews
[2] Ciurumelea et al. - Analyzing reviews and code of mobile apps for better release planning
[3] Palomba et al. - Recommending and localizing change requests for mobile apps based on user reviews
[4] Panichella et al. - How can I improve my app? Classifying user reviews for software maintenance and evolution
SLIDE 11

Concrete Example

SLIDE 12

A Stack Trace

Long Msg: java.lang.NumberFormatException: Invalid int: "/"
java.lang.RuntimeException: An error occurred while executing doInBackground()
    at android.os.AsyncTask$3.done(AsyncTask.java:300)
    at java.util.concurrent.FutureTask.finishCompletion(FutureTask.java:355)
    ...
    at com.amaze.filemanager.services.asynctasks.LoadList.doInBackground(LoadList.java:120)
    at com.amaze.filemanager.services.asynctasks.LoadList.doInBackground(LoadList.java:50)
    at android.os.AsyncTask$2.call(AsyncTask.java:288)
    at java.util.concurrent.FutureTask.run(FutureTask.java:237)
    ... 3 more
SLIDE 13

A User Review

"Love the idea of this app but anytime I leave the page the screen goes completely white and won’t come back until force-stopped. Update: I thought the white screen was because my phone was so outdated but it still does it on my Nexus 6 ...."
SLIDE 14

Underlying idea

User reviews might be helpful for:
> comprehending the causes behind a failure
> easing the debugging phase
> discovering errors that tools cannot reveal
SLIDE 15

Research Questions

SLIDE 16

> RQ1: What type of user feedback can we leverage to detect bugs and support testing activities of mobile apps?
> RQ2: How complementary is user feedback information with respect to the outcomes of automated testing tools?
> RQ3: To what extent can we automatically link the crash-related information reported in both user feedback and testing tools?

SLIDE 17

[Figure: steps 1 (Data Collection) and 2 (Classification, ML). Labels: user reviews (6,600 reviews, 8 apps), external validator, golden set, HLT & LLT, tools, stack traces]

RQ1: which reviews can we use?

Data collection

> Reviews crawler for the Google Play Store
> Reviews manually validated by an external validator
> Run our apps against Monkey and Sapienz

Output

> Machine Learning classifier
> Two (high and low) level taxonomy
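
The slides do not say which learning algorithm the classifier uses. To show what "classifying reviews into taxonomy categories" can look like in code, here is a minimal sketch that swaps in a bag-of-words multinomial Naive Bayes model. The class `NaiveBayes`, the labels, and the six training reviews are invented for illustration and are not taken from the study's golden set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of reviews
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training reviews, written for this example.
train = [
    ("app crashes every time I open a file", "crash"),
    ("keeps crashing after the last update", "crash"),
    ("force closes when I rotate the screen", "crash"),
    ("please add a dark theme", "feature request"),
    ("would love an option to sort by date", "feature request"),
    ("add support for zip archives please", "feature request"),
]
model = NaiveBayes().fit(*zip(*train))
prediction = model.predict("it crashes when I open the settings")
print(prediction)
```

A two-level taxonomy could be handled the same way by training one model for the high-level categories and one per category for the low-level ones; that split is an assumption, not something the slides describe.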
SLIDE 18

Taxonomy
> Bugs: crashes, features & UI bugs
> Feature Requests: feature additions, feature improvements
> Usability
> Resources: performance, battery
> Request Information
> Compatibility & Update Issues

RQ1: Results

Category             Precision  Recall  F1 Score
Features & UI Bugs   0.83       0.82    0.83
Crashes              0.91       0.94    0.92
SLIDE 19

We can identify, with good precision, the reviews that report bugs

SLIDE 20

[Figure: step 3 (Complementarity). Labels: ML, golden set, crash-related, external validator, stack traces]

RQ2: complementarity

We gave an external inspector:
> stack traces
> event logs for crashes
> crash-related reviews
> apk and source
> emulator

Goal: establish manually validated links between reviews and stack traces
SLIDE 21

RQ2: Results

App      Common  Only Reviews  Only Tools
app 1    13.6%   68.2%         18.2%
app 2    23.1%   69.2%         7.7%
...      ...     ...           ...
Average  16%     62%           22%
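
Each row of the table sums to 100%, which suggests the three shares are taken over the union of distinct issues found by either source. The slides do not state the denominator, so that reading is an assumption. Under it, the split can be sketched as below; the function name `complementarity` and the issue identifiers are hypothetical.

```python
def complementarity(review_issues, tool_issues):
    """Split distinct issues into common / only-reviews / only-tools shares (%)."""
    reviews, tools = set(review_issues), set(tool_issues)
    total = len(reviews | tools)
    if total == 0:
        return {"common": 0.0, "only_reviews": 0.0, "only_tools": 0.0}
    return {
        "common": 100 * len(reviews & tools) / total,
        "only_reviews": 100 * len(reviews - tools) / total,
        "only_tools": 100 * len(tools - reviews) / total,
    }


# Made-up issue identifiers: four issues seen in reviews, three found by tools.
shares = complementarity({"A", "B", "C", "D"}, {"C", "D", "E"})
print(shares)
```

In this toy input two of the five distinct issues appear in both sources, two only in reviews, and one only in tool output.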

SLIDE 22

Testing tools potentially miss several failures experienced by users

SLIDE 23

[Figure: step 4 (Linking, IR). Labels: crash-related, stack traces, source, bag of words (x2)]

RQ3: linking

Goal: automatically link stack traces with user reviews

Steps

> Augmenting the stack trace with source code information
> Preprocessing both sources
> 2 bags of words, one for each source
> 3 different IR techniques: Dice, Jaccard, VSM
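
The three similarity measures named above can be sketched in a few lines. This is a simplified illustration under stated assumptions: Dice and Jaccard use their textbook symmetric definitions, VSM is shown as cosine similarity over raw term frequencies (a real setup often weights terms, for example with tf-idf), and the preprocessing is reduced to camelCase splitting and lowercasing. The function names and the two input strings are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Split camelCase identifiers, lowercase, and count alphabetic tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Shared terms divided by all distinct terms."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Twice the shared terms divided by the sum of both vocabulary sizes."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def cosine(a, b):
    """Cosine of the angle between the two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy inputs: a review fragment and terms taken from an augmented stack trace.
review = bag_of_words("crash loading file list")
trace = bag_of_words("LoadList fileManager crash")

print(jaccard(review, trace), dice(review, trace), cosine(review, trace))
```

Without stemming, "loading" and "load" count as different terms, which is one reason a preprocessing step matters before comparing the two bags. A review and a stack trace would be linked when the score exceeds a chosen threshold; the slides do not give that threshold.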
SLIDE 24

RQ3: results

App      Precision  Recall  F1 Score
app 1    67%        57%     62%
app 2    62%        68%     65%
...      ...        ...     ...
Average  82%        75%     78%
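
The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick arithmetic check, the short sketch below recomputes it from the rounded precision and recall values shown in the table; the helper name `f1_score` is just a label for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as printed in the table (already rounded).
rows = {
    "app 1": (0.67, 0.57),
    "app 2": (0.62, 0.68),
    "Average": (0.82, 0.75),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.0%}")
```

Because the inputs are rounded, a recomputed F1 can differ from a published one by a point in other tables; for these three rows it matches.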

SLIDE 25

Good performance in linking crash-related user reviews and stack traces

SLIDE 26

Future work

User-oriented testing

> summarization
> prioritization
> generation