SLIDE 1

Do the Dependency Conflicts in My Project Matter?

Ying Wang, Ming Wen, Zhenwei Liu, Rongxin Wu, Rui Wang, Bo Yang, Hai Yu, Zhiliang Zhu and Shing-Chi Cheung

1. Northeastern University
2. The Hong Kong University of Science and Technology

Ying Wang
2018-12-19

SLIDE 2

Example 1

java -cp a.jar;b.jar …

[Figure: foo.class appears in both a.jar and b.jar]
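The effect in Example 1 can be illustrated with a small simulation. This is a minimal Python sketch, not the JVM's actual class loader: the jar contents and the `resolve_class` helper are hypothetical, and the only assumption is that the classpath is searched in order and the first match wins.

```python
from typing import Dict, List, Optional

# Hypothetical jar contents: jar name -> {class file -> label of that copy}
JARS: Dict[str, Dict[str, str]] = {
    "a.jar": {"foo.class": "foo from a.jar"},
    "b.jar": {"foo.class": "foo from b.jar"},
}


def resolve_class(classpath: List[str], class_file: str) -> Optional[str]:
    """Return the first copy of class_file found while scanning the classpath in order."""
    for jar in classpath:
        contents = JARS.get(jar, {})
        if class_file in contents:
            return contents[class_file]  # any later copy is shadowed
    return None


print(resolve_class(["a.jar", "b.jar"], "foo.class"))  # foo from a.jar
print(resolve_class(["b.jar", "a.jar"], "foo.class"))  # foo from b.jar
```

Swapping the two entries changes which copy is used, even though both jars are still on the classpath.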

SLIDE 3

Example 2

[Figure labels: lib1, lib2]

SLIDE 4

Observations

[Chart: 2289 popular Java projects; bar labels 954 (42%), 1003 (44%), 1457 (64%)]

⚫ Projects that contain different versions of the same library
⚫ Projects that contain duplicate classes in different libraries
⚫ Projects that contain both conflicting classes and conflicting libraries

SLIDE 5
SLIDE 6

Example 3

879 days! 176 downstream clients! Maven-shade-plugin

SLIDE 7

Motivation

⚫ The dependency conflict (DC) problem is very common in practice.
⚫ Most build tools do not guarantee loading the most appropriate class for the client project.
⚫ Build tools do not differentiate benign DC warnings from harmful ones (e.g., those causing runtime exceptions).

SLIDE 8

Our work

⚫ Empirical study: manifestation patterns, fixing patterns
⚫ Automated diagnosis: detection, assessing DC severity levels
⚫ Evaluation: effectiveness, usefulness

SLIDE 9

Empirical study - Research questions

RQ1 (Issue manifestation patterns): What are the common manifestations of DC issues? Are there patterns that can be extracted to enable automated detection of these problems?

RQ2 (Issue fixing patterns): How do developers fix DC issues in practice? Are there factors that affect developers’ choices of different fixing solutions?

SLIDE 10

Empirical study - Data collection

Java open source projects built by Maven from the Apache ecosystem are selected as the subjects for our empirical study.

Keywords:
1) “library”, “dependency” or “compatibility”, etc.
2) “conflict” or “NoSuchMethodError”, etc.

Result: 135 DC issues (128 of them have been fixed)
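The keyword screening can be sketched as a simple text filter. This is only an illustration under stated assumptions: the slide lists two keyword groups but does not say how they are combined, so requiring one hit from each group is an assumption, and the helper name `looks_like_dc_issue` is hypothetical. The keyword tuples contain only the terms named on the slide (the "etc." entries are not filled in).

```python
# Keyword groups named on the slide (incomplete by design: the slide ends both with "etc.").
GROUP_1 = ("library", "dependency", "compatibility")
GROUP_2 = ("conflict", "nosuchmethoderror")


def looks_like_dc_issue(text: str) -> bool:
    """Assumed rule: keep an issue if it mentions at least one keyword from each group."""
    lowered = text.lower()
    hit_1 = any(keyword in lowered for keyword in GROUP_1)
    hit_2 = any(keyword in lowered for keyword in GROUP_2)
    return hit_1 and hit_2


# Hypothetical issue titles, for illustration only.
print(looks_like_dc_issue("Dependency upgrade causes NoSuchMethodError"))  # True
print(looks_like_dc_issue("Fix typo in README"))                           # False
```

A filter like this only produces candidates; the 135 issues on the slide are the ones reported as DC issues in the study.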

SLIDE 11

Empirical study - RQ1: Issue manifestation patterns

  • A. Conflicts in library versions: 29%
  • B. Conflicts in classes among libraries: 67%
  • C. Conflicts in classes between host project and libraries: 4%

SLIDE 12

Empirical study - RQ1: Issue manifestation patterns

  • A. Conflicts in library versions (39 out of 135 issues)

⚫ If the dependency tree contains multiple versions of the same library, Maven’s nearest-wins strategy chooses the version that appears nearest to the root (the host project) of the dependency tree.

⚫ If the host project references features defined only in the shadowed library (i.e., Lib2 v2.0), a runtime exception will occur.

[Figure labels: NoClassDefFoundError, NoSuchMethodError, System Failure]
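The nearest-wins rule for pattern A can be sketched as a breadth-first walk over the dependency tree. This is a simplified Python model, not Maven's resolver: the tree, the library names and the `nearest_wins` helper are hypothetical, and it assumes that ties at the same depth go to the dependency declared first.

```python
from collections import deque
from typing import Dict, List, Tuple

Dep = Tuple[str, str]  # (library, version)

# Hypothetical dependency tree: node -> ordered list of direct dependencies.
TREE: Dict[Dep, List[Dep]] = {
    ("host", "1.0"): [("Lib1", "1.0"), ("Lib2", "1.0")],
    ("Lib1", "1.0"): [("Lib2", "2.0")],
    ("Lib2", "1.0"): [],
    ("Lib2", "2.0"): [],
}


def nearest_wins(root: Dep) -> Tuple[Dict[str, str], List[Dep]]:
    """Breadth-first walk: the first version seen for a library is the nearest one."""
    chosen: Dict[str, str] = {}
    shadowed: List[Dep] = []
    queue = deque(TREE.get(root, []))
    while queue:
        lib, version = queue.popleft()
        if lib not in chosen:
            chosen[lib] = version
            # Only the chosen version contributes its own dependencies.
            queue.extend(TREE.get((lib, version), []))
        elif chosen[lib] != version:
            shadowed.append((lib, version))
    return chosen, shadowed


chosen, shadowed = nearest_wins(("host", "1.0"))
print(chosen)    # {'Lib1': '1.0', 'Lib2': '1.0'}
print(shadowed)  # [('Lib2', '2.0')]
```

In this toy tree, Lib2 v1.0 sits one level from the host and is chosen, while Lib2 v2.0 is two levels away and is shadowed; any feature that exists only in v2.0 would be missing at runtime.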

SLIDE 13

Empirical study - RQ1: Issue manifestation patterns

  • B. Conflicts in classes among libraries (90 out of 135 issues)

⚫ Under Maven’s first-declaration-wins strategy, the duplicate classes in the first declared library (i.e., Lib2) shadow those included in the other libraries (i.e., Lib1).

⚫ If the host project references features defined only in the shadowed classes (i.e., classes A, B and C in Lib1), a runtime exception will occur.

[Figure labels: NoSuchMethodError, System Failure]

SLIDE 14

Empirical study - RQ1: Issue manifestation patterns

  • C. Conflicts in classes between host project and libraries (6 out of 135 issues)

⚫ If the host project and a library it references (i.e., Lib1) include duplicate classes (i.e., A, B and C), then only the copies in the library (i.e., Lib1) are included during the packaging process.

⚫ The classes included in Lib1 shadowed those defined in the host project, which led to a runtime failure.

[Figure labels: NoSuchMethodError, System Failure]

SLIDE 15

Empirical study - RQ1: Issue manifestation patterns

[Figure labels: Referenced, Loaded]

SLIDE 16

Empirical study - RQ2: Issue fixing patterns

Pattern 1: Shading the conflicting libraries (25 out of 128 solutions)

Maven-Shade-Plugin can package the project, including its third-party libraries, into an uber JAR. It can also shade (i.e., rename) the packages of some of the libraries.

Pattern 2: Adjusting the classpath order of dependencies (42 out of 128 solutions)

Forcing a particular dependency order on the classpath is a strategy commonly used by developers for fixing DC issues at a relatively low cost.

[Figure: example #HDFS-10570; labels: HDFS, Hadoop, Hdfsproxy, Netty 2.0, Netty 2.8]

SLIDE 17

Empirical study - RQ2: Issue fixing patterns

Pattern 3: Harmonizing library versions (51 out of 128 solutions)

Solutions of this pattern upgrade or downgrade some of the JARs to resolve the version inconsistencies.

SLIDE 18

Empirical study - RQ2: Issue fixing patterns

Pattern 4: Classloader customization (5 out of 128 solutions)

This solution uses dynamic module system frameworks, such as OSGi and WildFly, to allow different versions of the same libraries or classes to coexist in one project by creating multiple classloaders.

Pattern 5: Other workarounds (5 out of 128 solutions)

The remaining issues are resolved in miscellaneous ways.

SLIDE 19

Dependency conflict diagnosis

⚫ Manifestation patterns: detect dependency conflict issues
⚫ Maintenance efforts on fixing solutions: assess their severity levels

SLIDE 20

Dependency conflict diagnosis

Inputs: library dependency management script, binary code file

1. Extract library dependency tree
2. Identify duplicate libraries or classes
3. Analyze relations between different feature sets (loaded, shadowed, referenced)
4. Assess warning severity levels (L1, L2, L3, L4)

[Figure: example dependency tree with Lib2, Lib3 v1.0, Lib3 v2.0, Lib4 and Lib5; Lib3 appears in two versions]
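The relation between the loaded, shadowed and referenced feature sets can be illustrated with plain set operations. This is a hedged sketch, not Decca's analysis: the slides do not spell out how the severity levels L1 to L4 are defined, so the code only separates references that resolve from references at risk. The function name and the method signatures are hypothetical.

```python
from typing import Dict, Set


def classify_references(
    referenced: Set[str], loaded: Set[str], shadowed: Set[str]
) -> Dict[str, Set[str]]:
    """Split the referenced features by where they are defined."""
    return {
        # Defined in the loaded copy: these references resolve at runtime.
        "resolved": referenced & loaded,
        # Defined only in the shadowed copy: candidates for NoSuchMethodError
        # or NoClassDefFoundError at runtime.
        "at_risk": referenced & (shadowed - loaded),
    }


# Hypothetical feature sets (method signatures) for one duplicated class.
loaded = {"A.read()", "A.write()"}
shadowed = {"A.read()", "A.write()", "A.flush()"}
referenced = {"A.read()", "A.flush()"}

result = classify_references(referenced, loaded, shadowed)
print(sorted(result["resolved"]))  # ['A.read()']
print(sorted(result["at_risk"]))   # ['A.flush()']
```

An empty `at_risk` set would correspond to a benign conflict in this simplified model; a non-empty one marks features the host project expects but the loaded copy does not provide.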

SLIDE 21

Evaluation

RQ3 (Effectiveness): How effectively can Decca detect real DC issues and assess their severity levels?

RQ4 (Usefulness): Can Decca detect unknown DC issues in real-world projects and help developers diagnose them?

SLIDE 22

Evaluation: Effectiveness of Decca

Subjects: A high-quality dataset containing high-severity (i.e., Level 3 and 4) and low-severity (i.e., Level 1 and 2) DC issues.

Assumption: Across different projects, bugs are usually repaired within 2 years of being introduced.

⚫ True Positive (TP): a conflict identified as high-severity (i.e., Level 3 or Level 4) that is a high-severity issue.
⚫ False Positive (FP): a conflict identified as high-severity (i.e., Level 3 or Level 4) that is a low-severity issue.
⚫ True Negative (TN): a conflict identified as low-severity (i.e., Level 1 or Level 2) that is a low-severity issue.
⚫ False Negative (FN): a conflict identified as low-severity (i.e., Level 1 or Level 2) that is a high-severity issue.

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = 2 × Precision × Recall / (Precision + Recall)

Results: Precision: 0.923, Recall: 0.766, F-measure: 0.837
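The three metrics can be written as small functions, and the reported F-measure can be checked against the reported precision and recall. This is a minimal Python sketch; the function names are illustrative, and the slide reports only the final ratios, not the underlying TP, FP and FN counts.

```python
def precision(tp: int, fp: int) -> float:
    """Share of conflicts flagged as high-severity that really are high-severity."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of truly high-severity conflicts that were flagged as high-severity."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Consistency check using the precision and recall reported on the slide.
print(round(f_measure(0.923, 0.766), 3))  # 0.837
```

The computed value matches the F-measure of 0.837 reported on the slide.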

SLIDE 23

Evaluation: Usefulness of Decca

Decca successfully identified 466 DC issues in 24 of the 30 projects analyzed.

Results (severity level counts per project, with the bug report filed):

ID  Project            L1  L2/L3/L4 (column positions not preserved)  Bug ID
1   Spark              40  1                                          SPARK-23509
2   Beam               17  2                                          BEAM-3690
3   Bahir              22  1, 1                                       BAHIR-159
4   Wicketstuff/Core   16  1                                          Issue #621
5   Javasoze clue      18  1                                          Issue #61
6   ActiveMQ Artemis   24
7   Apex Core          34
8   Ignite             7
9   Wicket             2
10  Closure-Compiler   4   1                                          Issue #2815
11  Orientdb           8   1                                          Issue #8111
12  Cm                 5   1                                          Issue #1
13  Brooklyn           20  1                                          BROOKLYN-581
14  CarbonData         25  4                                          CARBONDATA-2169
15  Prestodb           16  1                                          Issue #29
16  Solr               10  1                                          DATASOLR-447
17  Tomcat exporter    10  2                                          Issue #8
18  Hadoop Common      16  1                                          HADOOP-15261
19  Oozie              25  1                                          OOZIE-3185
20  Accumulo           33  1                                          ACCUMULO-4812
21  Eclipse jetty      6   2                                          Issue #2232
22  Parquet            2   1                                          PARQUET-1236
23  Apex Malhar        34  1                                          APEXMALHAR-2556
24  Atlas              44  1, 1                                       ATLAS-2437

Of the 466 issues, 438 (93.9%) are at Level 1, 20 (4.2%) are at Level 2, 4 (0.8%) are at Level 3, and 4 (0.8%) are at Level 4.

Bug report contents: severity, root cause, fixing suggestions

SLIDE 24

Evaluation: Usefulness of Decca

⚫ 11 of the 20 reported bugs (55%) were confirmed by developers as real issues within a few days.
⚫ 6 of the 11 confirmed bugs (55%) were quickly fixed using our suggestions.
⚫ 3 confirmed bugs (27%) are in the process of being fixed.
⚫ 2 confirmed bugs are to be resolved by the developers of upstream third-party libraries.

SLIDE 25

Evaluation: Usefulness of Decca

Subjects (Popular, Java, Maven platform):

ID  Project            Category         Revision  Size (LOC)  Star
1   Spark              Big data         8077bb0   130.0k      16262
2   Beam               Big data         a750128   337.0k      1722
3   Bahir              Extension tool   6ea42a8   0.9k        152
4   Wicketstuff/Core   Container        5cc41f5   228.5k      314
5   Javasoze clue      Command          23c9da4   2.8k        103
6   ActiveMQ Artemis   Network server   f6c5408   557.8k      271
7   Apex Core          Platform         4fb580f   87.0k       277
8   Ignite             OSGI             4e86660   2218.4k     1505
9   Wicket             Web framework    b728c69   352.5k      412
10  Closure-Compiler   JS compiler      900251b   427.6k      4005
11  Orientdb           Database         56ab1ac   496.3k      3366
12  Cm                 Web application  9e6f45b   19.k        12
13  Brooklyn           Cloud            48dbcc3   276.1k      69
14  CarbonData         Big data         9f2884a   127.9k      612
15  Prestodb           Big data         89fed3a   0.8k        15
16  Solr               Network server   d32048c   31.7k       295
17  Tomcat exporter    Exporter         70ac377   0.9k        19
18  Hadoop Common      Database         1e85a99   2042.8k     5883
19  Oozie              Big data         9e662c7   198.6k      364
20  Accumulo           Database         d98843b   563.8k      343
21  Eclipse jetty      Debugging        b71cd70   375.9k      1868
22  Parquet            Big data         b82d962   0.9k        550
23  Apex Malhar        Big data         0d98d05   243.7k      110
24  Atlas              Framework        6770091   123.4k      33

SLIDE 26

Evaluation: Feedback from developers

“This seems like a handy report, is the tool you used to identify this error open source? I am curious to give it a try (also for other stuff).”
-- BEAM-3690

“Related, but not the same: I have tried turning on dependency convergence in the Maven-enforcer-plugin. We need the same for gradle to ensure long-term health and protect from regressions. Maybe the tool that generated this fine-grained conflicts report can also fail the build? That would be nice.”
-- SPARK-23509
SLIDE 27

Conclusion

⚫ First empirical study of DC issues between host project and third-party libraries.
⚫ Formulation of the dependency conflict problem and its root cause.
⚫ An automated technique, Decca, to detect DC issues and assess their severity levels.

SLIDE 28

Thank you!