 
Kharon dataset: Android malware under a microscope

N. Kiss
EPI CIDRE, CentraleSupelec, Inria, Univ. Rennes 1, CNRS, F-35065 Rennes, France

J.-F. Lalande
INSA Centre Val de Loire, Univ. Orléans, LIFO EA 4022, F-18020 Bourges, France

M. Leslous, V. Viet Triem Tong
EPI CIDRE, CentraleSupelec, Inria, Univ. Rennes 1, CNRS, F-35065 Rennes, France

USENIX Association, LASER 2016 • Learning from Authoritative Security Experiment Results

Abstract

Background – This study concerns the understanding of the Android malware that now populates smartphone markets. Aim – Our main objective is to help other malware researchers better understand how malware works. Additionally, we aim to support the reproducibility of experiments that analyze malware samples: such a collection should improve the comparison of new detection or analysis methods. Methodology – To achieve these goals, we describe here an Android malware collection called Kharon. This collection represents, as far as possible, the diversity of malware types. We manually dissected each malware sample in this dataset by reversing its code. We ran the samples on a controlled and monitored real smartphone in order to extract their precise behavior. We also summarized their behavior using a graph representation of the information flows induced by an execution. With this process, we obtained precise knowledge of their malicious code and actions. Results and conclusions – Researchers can gauge the engineering efforts of malware developers and understand their programming patterns. Another important result of this study is that most malware now includes triggering techniques that delay and hide its malicious activities. We also think that this collection can initiate a reference test set for future research.

1 Introduction

Android malware has become a very active research subject in recent years. Inevitably, every new proposal for the detection, analysis, classification or remediation of malware must deal with its own evaluation. This evaluation relies on a set of "malicious indicators" that have to be detected/analyzed/classified as bad and a set of "legitimate indicators" that have to be ignored by the evaluated method. Designing a set of "good things" appears simple; in contrast, for a precise evaluation, the set of "bad things" should be perfectly understood. We claim here that rigorous experiments have to rely on malware samples that have been totally reversed.

Building an understandable dataset to be used for dynamic analysis is a difficult challenge. Indeed, no automatic methodology exists for reverse engineering a malware sample. First, no mature reverse engineering tool has been developed for Android that would be comparable to the ones used for x86 malware. Second, each malware sample is different, and automatically finding the malicious code by statically analyzing the bytecode is a very difficult task because this code is mixed with benign code. Human expertise is required to extract the relevant parts of the code. Finally, the most advanced malware now includes countermeasures that avoid triggering its malicious behavior at first run and in emulated environments. Thus, additional expertise is required to understand the special events and conditions the malware is waiting for.

Building an understandable malware dataset therefore requires a huge amount of work. We made this effort to evaluate our previous work [1], and we propose here to document our training dataset thoroughly in order to initiate the construction of a reference dataset of Android malware. Our goal is to build a well-documented set of malware that researchers can use to conduct reproducible experiments. This dataset tries to represent most of the known types of malware that can be found. When choosing a malware sample to represent a type, we excluded samples that are too obfuscated or encrypted to be reverse engineered in a reasonable time.

The contributions of the paper are:

1. A precise description of the internals of 7 malware samples, i.e. how each malware attacks the operating system, how it interacts with external servers, and the effects from the user perspective;

2. A graphical view of the induced information flows when the malware is successfully executed;