Dalvik VM Internals Dan Bornstein Google Intro Memory CPU - - PowerPoint PPT Presentation



SLIDE 1

Dalvik VM Internals

Dan Bornstein, Google

SLIDE 2
  • Intro
  • Memory
  • CPU
  • Advice
  • Conclusion
SLIDE 3

Dalvík, Iceland

SLIDE 4

The Big Picture

SLIDE 6

What is the Dalvik VM?

It is a virtual machine to…

  • run on a slow CPU
  • with relatively little RAM
  • on an OS without swap space
  • while powered by a battery
SLIDE 8

Memory Efficiency

SLIDE 9

Problem: Memory Efficiency

  • total system RAM: 64 MB
  • available RAM after low-level startup: 40 MB
  • available RAM after high-level services have started: 20 MB
  • multiple independent mutually suspicious processes
      • separate address spaces, separate memory
  • large system library: 10 MB jar
SLIDE 11

Dex File Anatomy

  • header
  • string_ids: "Hello World", "Lcom/google/Blort;", "println", …
  • type_ids: int, String[], com.google.Blort, …
  • proto_ids: void fn(int), double fn(Object, int), String fn(), …
  • field_ids: String.offset, Integer.MAX_VALUE, …
  • method_ids: PrintStream.println(…), Collection.size(), …
  • class_defs
  • data
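A minimal Java sketch of how the id sections chain together, using the slide's own examples (the type com.google.Blort and the prototype void fn(int)). The record names and the simplified ProtoId (just a return type plus parameter types) are assumptions for illustration: the real sections are fixed-size binary items. Resolving a method_id means following indices: its class goes through type_ids to string_ids, its name goes straight to string_ids, and its signature goes through proto_ids to type_ids.

```java
import java.util.List;
import java.util.StringJoiner;

/** Toy model of dex id sections: every entry is an index into another table. */
final class DexIdsSketch {
    // Simplified entries with hypothetical names; not the on-disk layout.
    record TypeId(int descriptorIdx) {}                          // -> STRINGS
    record ProtoId(int returnTypeIdx, int[] paramTypeIdxs) {}    // -> TYPES
    record MethodId(int classIdx, int protoIdx, int nameIdx) {}  // -> TYPES, PROTOS, STRINGS

    // string_ids are kept sorted, so "I" < "Lcom/google/Blort;" < "V" < "fn".
    static final List<String> STRINGS = List.of("I", "Lcom/google/Blort;", "V", "fn");
    static final List<TypeId> TYPES = List.of(
            new TypeId(0),   // int
            new TypeId(1),   // com.google.Blort
            new TypeId(2));  // void
    static final List<ProtoId> PROTOS = List.of(
            new ProtoId(2, new int[] {0}));  // void fn(int)
    static final List<MethodId> METHODS = List.of(
            new MethodId(1, 0, 3));          // Blort.fn(int)

    /** Resolves a type_ids index to its descriptor string. */
    static String type(int typeIdx) {
        return STRINGS.get(TYPES.get(typeIdx).descriptorIdx());
    }

    /** Follows the index chain and renders a method reference as text. */
    static String describe(int methodIdx) {
        MethodId m = METHODS.get(methodIdx);
        ProtoId p = PROTOS.get(m.protoIdx());
        StringJoiner params = new StringJoiner("", "(", ")");
        for (int t : p.paramTypeIdxs()) {
            params.add(type(t));
        }
        return type(m.classIdx()) + "->" + STRINGS.get(m.nameIdx())
                + params + type(p.returnTypeIdx());
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // Lcom/google/Blort;->fn(I)V
    }
}
```

Each string is stored once, and everything else refers to it by a small integer.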

SLIDE 15

Dex File Anatomy

Figure: a .jar file holds many .class files, each with its own heterogeneous constant pool plus other data. A .dex file instead has one constant pool per kind of entry (string_ids, type_ids, proto_ids, field_ids, method_ids), plus other data.

SLIDE 16

Shared Constant Pool

    public interface Zapper {
        public String zap(String s, Object o);
    }

    public class Blort implements Zapper {
        public String zap(String s, Object o) {
            ...;
        }
    }

    public class ZapUser {
        public void useZap(Zapper z) {
            z.zap(...);
        }
    }

SLIDE 17

Shared Constant Pool

Original .class files

Figure: the constant pools of the three original .class files.
  • class Zapper: "(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;", "java/lang/Object", "Zapper", "zap", "SourceFile", "Zapper.java"
  • class Blort: "(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;", "java/lang/Object", "Zapper", "zap", method ref, "Blort", "<init>", method ref, "Blort.java", "()V", "Code", "LineNumberTable", "SourceFile"
  • class ZapUser: "(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;", "java/lang/Object", "Zapper", "zap", method ref, "ZapUser", "<init>", method ref, "ZapUser.java", "()V", "Code", "LineNumberTable", "SourceFile", "useZap", "(LZapper;)V"

SLIDE 20

Shared Constant Pool

.dex file

Figure: the single shared constant pool of the .dex file.
  • strings: "zap", "<init>", "ZapUser.java", "useZap", "Ljava/lang/Object;", "LZapper;", "LZapUser;", "LBlort;", "V", "Zapper.java", "Blort.java", "Ljava/lang/String;"
  • 3 proto ids
  • 7 method ids

SLIDE 21

Shared Constant Pool

Memory is saved via…

  • minimal repetition
  • per-type pools (implicit typing)
  • implicit labeling
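A rough Java sketch of the "minimal repetition" point, using the string entries of the three .class constant pools from the earlier slide. Merging them into one sorted, de-duplicated pool reduces 30 entries to 16 distinct strings. This is only a toy model with hypothetical names. A real dex pool holds type descriptors such as "LZapper;", so its contents differ. The .dex slide, for instance, has no "Code" or "LineNumberTable" strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Merges per-class string pools into one sorted, de-duplicated pool. */
final class SharedPoolSketch {
    static final String SIG = "(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/String;";

    // String entries of the three .class constant pools shown on the slides.
    static final List<List<String>> CLASS_POOLS = List.of(
        List.of(SIG, "java/lang/Object", "Zapper", "zap", "SourceFile", "Zapper.java"),
        List.of(SIG, "java/lang/Object", "Zapper", "zap", "Blort", "<init>",
                "Blort.java", "()V", "Code", "LineNumberTable", "SourceFile"),
        List.of(SIG, "java/lang/Object", "Zapper", "zap", "ZapUser", "<init>",
                "ZapUser.java", "()V", "Code", "LineNumberTable", "SourceFile",
                "useZap", "(LZapper;)V"));

    /** Entries stored when every class keeps its own pool. */
    static int totalEntries() {
        int n = 0;
        for (List<String> pool : CLASS_POOLS) {
            n += pool.size();
        }
        return n;
    }

    /** One shared pool: each distinct string stored once, in sorted order. */
    static List<String> sharedPool() {
        TreeSet<String> unique = new TreeSet<>();
        for (List<String> pool : CLASS_POOLS) {
            unique.addAll(pool);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        System.out.println(totalEntries() + " entries across the .class pools"); // 30
        System.out.println(sharedPool().size() + " entries in one shared pool"); // 16
    }
}
```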
SLIDE 22

Size Comparison

Sizes in bytes, with each size as a percentage of the uncompressed jar:

                                 (U) uncompressed jar   (J) compressed jar   (D) uncompressed dex
    common system libraries  21445320 (100%)        10662048 (50%)       10311972 (48%)
    web browser app          470312 (100%)          232065 (49%)         209248 (44%)
    alarm clock app          119200 (100%)          61658 (52%)          53020 (44%)

SLIDE 23

4 Kinds Of Memory

  • clean vs. dirty
      • clean: mmap()ed and unwritten
      • dirty: malloc()ed
  • shared vs. private
      • shared: used by many processes
      • private: used by only one process
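"Clean" memory here means file-backed pages that are mapped and never written. Under memory pressure the kernel can drop such pages and read them back from the file later. A small Java analogue uses FileChannel.map with MapMode.READ_ONLY as a stand-in for mapping a dex file. The class name and the temporary file are hypothetical, and this does not show how Dalvik itself maps dex files.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Reads a file through a read-only memory mapping (a Java analogue of a clean mmap). */
final class CleanMappingSketch {
    static byte[] readViaMapping(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // READ_ONLY: the mapped pages are never written, so they stay clean and
            // file-backed; the kernel may drop them and re-read them from the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Copying out is only so the demo can return the bytes; the copy is heap memory.
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("clean", ".bin"); // stand-in for a dex file
        Files.write(tmp, "read-only bytes".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readViaMapping(tmp), StandardCharsets.US_ASCII));
        Files.delete(tmp);
    }
}
```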
SLIDE 24

4 Kinds Of Memory

  • clean (shared or private)
      • common dex files (libraries)
      • application-specific dex files
  • shared dirty
      • ???
  • private dirty
      • application “live” dex structures
      • application heap
SLIDE 25

Enter The Zygote

  • nascent VM process
  • starts at boot time
  • preloads and preinitializes classes
  • fork()s on command
SLIDE 26

Enter The Zygote

Figure: memory layout of the Zygote and the processes forked from it.
  • Zygote: core library dex files (mmap()ed); "live" core libraries (shared dirty; read-only); Zygote heap (shared dirty, copy-on-write; rarely written)
  • Home: Home dex file (mmap()ed); Home live code and heap (private dirty); shared from Zygote
  • Maps: Maps dex file (mmap()ed); Maps live code and heap (private dirty); shared from Zygote
  • Browser: Browser dex file (mmap()ed); Browser live code and heap (private dirty); shared from Zygote

SLIDE 27

4 Kinds Of Memory

  • clean (shared or private)
      • common dex files (libraries)
      • application-specific dex files
  • shared dirty
      • library “live” dex structures
      • shared copy-on-write heap (mostly not written)
  • private dirty
      • application “live” dex structures
      • application heap
SLIDE 28

GC And Sharing

Figure: two heap layouts. "Embedded mark bits": each object's data is stored next to its own mark bits. "Separated mark bits": object data is stored contiguously, and the mark bits live in a parallel table.

SLIDE 29

GC And Sharing

  • separate processes, separate heaps, separate GCs
  • GCs must be independent
  • GC should respect the sharing!
SLIDE 30

GC And Sharing

Mark bits kept separate from other heap memory.

  • avoids un-sharing pages
  • better small cache behavior
  • doesn’t waste memory

Figure: object data stored contiguously, with the mark bits in a parallel table.
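The following toy Java model shows why separated mark bits matter. It treats a child process's heap as pages shared copy-on-write with the Zygote: a page stays shared until the child first writes to it. With embedded mark bits, marking writes into every live object, so every page holding one gets copied. With a separate bitmap, the shared heap is never touched. The page size, the fixed 64-byte objects, and all names are assumptions. The bitmap is itself private memory, but in this model it costs one bit per object (800 bytes for 6,400 objects) instead of 100 copied 4 KB pages.

```java
import java.util.BitSet;

/**
 * Toy model of a heap shared copy-on-write with a parent (Zygote-like) process:
 * a page stays shared until the child first writes to it.
 */
final class MarkBitsSketch {
    static final int PAGE_SIZE = 4096;   // assumed page size in bytes
    static final int OBJECT_SIZE = 64;   // assumed fixed object size in bytes

    /** Heap pages the child has written, i.e. copied and no longer shared. */
    private final BitSet privatePages = new BitSet();

    private void writeHeapByte(long address) {
        privatePages.set((int) (address / PAGE_SIZE)); // first write unshares the page
    }

    /** Embedded mark bits: marking writes into each live object's header. */
    int markEmbedded(int liveObjects) {
        for (int i = 0; i < liveObjects; i++) {
            writeHeapByte((long) i * OBJECT_SIZE);
        }
        return privatePages.cardinality();
    }

    /** Separated mark bits: marking writes only to a side bitmap. */
    int markSeparate(int liveObjects) {
        BitSet markBits = new BitSet(liveObjects); // private, but outside the shared heap
        for (int i = 0; i < liveObjects; i++) {
            markBits.set(i);
        }
        return privatePages.cardinality();
    }

    public static void main(String[] args) {
        int live = 6_400; // 6,400 objects * 64 bytes = 100 pages of 4096 bytes
        System.out.println("embedded: " + new MarkBitsSketch().markEmbedded(live) + " pages unshared");
        System.out.println("separate: " + new MarkBitsSketch().markSeparate(live) + " pages unshared");
    }
}
```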
SLIDE 31

CPU Efficiency

SLIDE 32

Problem: CPU Efficiency

  • CPU speed: 250-500 MHz
  • bus speed: 100 MHz
  • data cache: 16-32 KB
  • available RAM for apps: 20 MB
SLIDE 33

No JIT

  • usually doesn’t matter
      • lots of native code
          • system provides libs for graphics, media
          • JNI available
      • hardware support common (graphics, audio)
SLIDE 34

Install-Time Work

  • verification
      • dex structures aren’t “lying”
          • valid indices
          • valid offsets
      • code can’t misbehave
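The "not lying" checks come down to bounds checks on every cross-reference. Below is a minimal Java sketch. It assumes a simplified method_ids entry and passes table sizes as plain counts; all names are hypothetical. A method pointing at a nonexistent prototype or name string is flagged up front, so code that runs later can presumably index the tables without re-checking. The "code can't misbehave" half of verification, which examines the bytecode itself, is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of structural checks done before code runs: every index must land
 * inside its target table and every offset inside the file.
 */
final class DexVerifySketch {
    /** Hypothetical, simplified method_ids entry. */
    record MethodId(int classIdx, int protoIdx, int nameIdx) {}

    static List<String> verify(int fileSize, int[] stringDataOffsets,
                               int typeCount, int protoCount, List<MethodId> methods) {
        List<String> errors = new ArrayList<>();
        int stringCount = stringDataOffsets.length;
        for (int i = 0; i < stringCount; i++) {
            int off = stringDataOffsets[i];
            if (off < 0 || off >= fileSize) {
                errors.add("string_ids[" + i + "]: offset " + off + " outside file");
            }
        }
        for (int i = 0; i < methods.size(); i++) {
            MethodId m = methods.get(i);
            if (!inRange(m.classIdx(), typeCount)) errors.add("method_ids[" + i + "]: bad class index");
            if (!inRange(m.protoIdx(), protoCount)) errors.add("method_ids[" + i + "]: bad proto index");
            if (!inRange(m.nameIdx(), stringCount)) errors.add("method_ids[" + i + "]: bad name index");
        }
        return errors;
    }

    private static boolean inRange(int idx, int count) {
        return idx >= 0 && idx < count;
    }

    public static void main(String[] args) {
        int[] stringOffsets = {112, 120, 131};
        List<MethodId> honest = List.of(new MethodId(0, 0, 2));
        List<MethodId> lying = List.of(new MethodId(0, 5, 7));
        System.out.println(verify(200, stringOffsets, 1, 1, honest)); // []
        System.out.println(verify(200, stringOffsets, 1, 1, lying));  // two errors
    }
}
```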
SLIDE 35

Install-Time Work

  • optimization
      • byte-swapping and padding (unnecessary on ARM)
      • static linking
      • “inlining” special native methods
      • pruning empty methods
      • adding auxiliary data
SLIDE 36

Register Machine

Why?

  • avoid instruction dispatch
  • avoid unnecessary memory access
  • consume instruction stream efficiently
  • higher semantic density per instruction
SLIDE 37

Register Machine

The stats

  • 30% fewer instructions
  • 35% fewer code units
  • 35% more bytes in the instruction stream
      • but we get to consume two at a time
SLIDE 38

Example #1: Source

    public static long sumArray(int[] arr) {
        long sum = 0;
        for (int i : arr) {
            sum += i;
        }
        return sum;
    }

SLIDE 39

Example #1: .class

    0000: lconst_0
    0001: lstore_1
    0002: aload_0
    0003: astore_3
    0004: aload_3
    0005: arraylength
    0006: istore 04
    0008: iconst_0
    0009: istore 05
    000b: iload 05          // rl ws
    000d: iload 04          // rl ws
    000f: if_icmpge 0024    // rs rs
    0012: aload_3           // rl ws
    0013: iload 05          // rl ws
    0015: iaload            // rs rs ws
    0016: istore 06         // rs wl
    0018: lload_1           // rl rl ws ws
    0019: iload 06          // rl ws
    001b: i2l               // rs ws ws
    001c: ladd              // rs rs rs rs ws ws
    001d: lstore_1          // rs rs wl wl
    001e: iinc 05, #+01     // rl wl
    0021: goto 000b
    0024: lload_1
    0025: lreturn

Legend: rl = read local, wl = write local, rs = read stack, ws = write stack

  • 25 bytes
  • 14 dispatches
  • 45 reads
  • 16 writes
SLIDE 40

Example #1: .dex

    0000: const-wide/16 v0, #long 0
    0002: array-length v2, v8
    0003: const/4 v3, #int 0
    0004: move v7, v3
    0005: move-wide v3, v0
    0006: move v0, v7
    0007: if-ge v0, v2, 0010            // r r
    0009: aget v1, v8, v0               // r r w
    000b: int-to-long v5, v1            // r w w
    000c: add-long/2addr v3, v5         // r r r r w w
    000d: add-int/lit8 v0, v0, #int 1   // r w
    000f: goto 0007
    0010: return-wide v3

  • 18 bytes
  • 6 dispatches
  • 19 reads
  • 6 writes
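The totals on these two slides can be re-derived from the per-instruction annotations for one pass of the loop, from the loop test through the backward goto. The operand accesses alone give 20 reads for .class and 10 for .dex. The slide figures of 45 and 19 match if each fetch from the instruction stream also counts as a read: 25 bytes for .class, and 9 sixteen-bit code units (18 bytes) for .dex. That reading of "reads" is an inference, not something the slides state. The class and record names below are hypothetical.

```java
import java.util.Arrays;

/**
 * Re-derives the per-iteration loop statistics from the annotated listings,
 * assuming "reads" = operand reads + instruction-stream fetch units.
 */
final class LoopStatsSketch {
    /** One instruction: its size in fetch units and its access annotation. */
    record Insn(int units, String access) {}

    // .class loop body 000b..0021; sizes in bytes.
    static final Insn[] CLASS_LOOP = {
        new Insn(2, "rl ws"), new Insn(2, "rl ws"), new Insn(3, "rs rs"),
        new Insn(1, "rl ws"), new Insn(2, "rl ws"), new Insn(1, "rs rs ws"),
        new Insn(2, "rs wl"), new Insn(1, "rl rl ws ws"), new Insn(2, "rl ws"),
        new Insn(1, "rs ws ws"), new Insn(1, "rs rs rs rs ws ws"),
        new Insn(1, "rs rs wl wl"), new Insn(3, "rl wl"), new Insn(3, ""),
    };

    // .dex loop body 0007..000f; sizes in 16-bit code units.
    static final Insn[] DEX_LOOP = {
        new Insn(2, "r r"), new Insn(2, "r r w"), new Insn(1, "r w w"),
        new Insn(1, "r r r r w w"), new Insn(2, "r w"), new Insn(1, ""),
    };

    /** Returns {fetch units, dispatches, reads, writes} for one pass of the loop. */
    static int[] tally(Insn[] loop) {
        int units = 0, operandReads = 0, writes = 0;
        for (Insn insn : loop) {
            units += insn.units();
            for (String tok : insn.access().split(" ")) {
                if (tok.startsWith("r")) operandReads++;
                else if (tok.startsWith("w")) writes++;
            }
        }
        return new int[] {units, loop.length, operandReads + units, writes};
    }

    public static void main(String[] args) {
        System.out.println(".class: " + Arrays.toString(tally(CLASS_LOOP))); // [25, 14, 45, 16]
        System.out.println(".dex:   " + Arrays.toString(tally(DEX_LOOP)));   // [9, 6, 19, 6]
    }
}
```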
SLIDE 41

Example #2: Source

    private static final int[] S33KR1T_1NF0RM4T10N = {
        0x4920616d, 0x20726174, 0x68657220, 0x666f6e64,
        0x206f6620, 0x6d756666, 0x696e732e
    };

SLIDE 42

Example #2: .class

    0000: bipush #+07
    0002: newarray int
    0004: dup
    0005: iconst_0
    0006: ldc #+4920616d
    0008: iastore
    0009: dup
    000a: iconst_1
    000b: ldc #+20726174
    000d: iastore
    000e: dup
    000f: iconst_2
    0010: ldc #+68657220
    0012: iastore
    0013: dup
    0014: iconst_3
    0015: ldc #+666f6e64
    0017: iastore
    0018: dup
    0019: iconst_4
    001a: ldc #+206f6620
    001c: iastore
    001d: dup
    001e: iconst_5
    001f: ldc #+6d756666
    0021: iastore
    0022: dup
    0023: bipush #+06
    0025: ldc #+696e732e
    0027: iastore
    0028: putstatic Example2.S33KR1T_1NF0RM4T10N:[I
    002b: return

    ... dup bipush #+NN ldc #VVVVVVVV iastore ...

SLIDE 43

Example #2: Hack!

    private static final int[] S33KR1T_1NF0RM4T10N;

    static {
        String s =
            "\u4920\u616d\u2072\u6174\u6865" +
            "\u7220\u666f\u6e64\u206f\u6620" +
            "\u6d75\u6666\u696e\u732e";
        S33KR1T_1NF0RM4T10N = new int[7];
        for (int i = 0, j = 0; i < 7; i++, j += 2) {
            // each int is rebuilt from two chars: high 16 bits, then low 16 bits
            S33KR1T_1NF0RM4T10N[i] = (s.charAt(j) << 16) | s.charAt(j+1);
        }
    }
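The hack turns the whole table into one string constant plus a short loop, instead of one dup/bipush/ldc/iastore group per element. Below is a Java sketch of hypothetical helpers for producing and checking such a string. pack splits each int into two chars, high 16 bits first. unpack mirrors the loop above. toJavaLiteral prints the escaped literal for pasting into source. For the seven values of S33KR1T_1NF0RM4T10N it prints the 14 escapes used in the static initializer.

```java
import java.util.Arrays;

/**
 * Helpers for the string-packing hack (hypothetical names): each int becomes two
 * chars, high 16 bits first, and is rebuilt with (hi << 16) | lo.
 */
final class PackedIntsSketch {
    static String pack(int[] values) {
        StringBuilder sb = new StringBuilder(values.length * 2);
        for (int v : values) {
            sb.append((char) (v >>> 16)).append((char) v); // high half, then low half
        }
        return sb.toString();
    }

    static int[] unpack(String s) {
        int[] out = new int[s.length() / 2];
        for (int i = 0, j = 0; i < out.length; i++, j += 2) {
            // char promotes to int; shifting by 16 restores the high half, sign bit included
            out[i] = (s.charAt(j) << 16) | s.charAt(j + 1);
        }
        return out;
    }

    /** Spells a string as a Java literal of backslash-u escapes, for pasting into source. */
    static String toJavaLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        int[] secret = {0x4920616d, 0x20726174, 0x68657220, 0x666f6e64,
                        0x206f6620, 0x6d756666, 0x696e732e};
        String packed = pack(secret);
        System.out.println(toJavaLiteral(packed));
        System.out.println(Arrays.equals(unpack(packed), secret)); // true
    }
}
```

Any int value round-trips, including values with the sign bit set, because shifting the promoted char left by 16 restores the high half.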

SLIDE 44

Example #2: .dex

    0000: const/4 v0, #int 7 // #7
    0001: new-array v0, v0, int[]
    0003: fill-array-data v0, 000a
    0006: sput-object v0, Example2.S33KR1T_1NF0RM4T10N:int[]
    0008: return-void
    0009: nop // spacer
    000a: array-data // for fill-array-data @ 0003
        0: 1226858861 // #4920616d
        1: 544366964 // #20726174
        2: 1751478816 // #68657220
        3: 1718578788 // #666f6e64
        4: 544171552 // #206f6620
        5: 1836410470 // #6d756666
        6: 1768846126 // #696e732e
    0026:
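The array-data block after the nop is a fill-array-data payload. According to the Dalvik bytecode format documentation, it starts with a ushort ident of 0x0300, a ushort element width, and a uint element count, followed by the raw little-endian element data. The nop spacer presumably keeps the payload 4-byte aligned. Below is a minimal Java sketch, with hypothetical names, that builds and decodes such a payload for the seven ints above. One instruction plus one block of data takes the place of the seven four-instruction groups in the .class version.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Encodes and decodes a fill-array-data payload of 32-bit elements. */
final class FillArrayDataSketch {
    static final int PAYLOAD_IDENT = 0x0300;

    /** Payload layout: ushort ident, ushort element_width, uint size, then the data. */
    static ByteBuffer encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 * values.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort((short) PAYLOAD_IDENT).putShort((short) 4).putInt(values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        return buf;
    }

    /** Roughly the work of fill-array-data: copy the payload's elements into an array. */
    static int[] decode(ByteBuffer payload) {
        ByteBuffer buf = payload.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int ident = buf.getShort() & 0xffff;
        int width = buf.getShort() & 0xffff;
        int size = buf.getInt();
        if (ident != PAYLOAD_IDENT) {
            throw new IllegalArgumentException("not an array-data payload");
        }
        if (width != 4) {
            throw new UnsupportedOperationException("sketch handles 4-byte elements only");
        }
        int[] array = new int[size];
        for (int i = 0; i < size; i++) {
            array[i] = buf.getInt();
        }
        return array;
    }

    public static void main(String[] args) {
        int[] secret = {0x4920616d, 0x20726174, 0x68657220, 0x666f6e64,
                        0x206f6620, 0x6d756666, 0x696e732e};
        ByteBuffer payload = encode(secret);
        System.out.println(payload.remaining() + " payload bytes");  // 36
        System.out.println(Arrays.equals(decode(payload), secret));  // true
    }
}
```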

SLIDE 45

Example #2: .dex

    0000: const/4 v0, #int 7 // #7
    0001: new-array v0, v0, int[]
    0003: fill-array-data v0, 000a
    0006: sput-object v0, Example2.S33KR1T_1NF0RM4T10N:int[]
    0008: return-void
    0009: nop // spacer
    000a: array-data // for fill-array-data @ 0003
        0: 1315272293 // #4e657665
        1: 1914726255 // #7220676f
        2: 1852727584 // #6e6e6120
        3: 1734964837 // #67697665
        4: 544829301 // #20796f75
        5: 544567355 // #2075703b
        6: 544105846 // #206e6576
        7: 1701978215 // #65722067
        8: 1869508193 // #6f6e6e61
        9: 543974772 // #206c6574
        10: 544829301 // #20796f75
        11: 543453047 // #20646f77
        12: 1848520238 // #6e2e2e2e
    003e:

SLIDE 46

Interpreters 101

The portable way

    #include <stdio.h>

    static void interp(const char* s) {
        for (;;) {
            switch (*(s++)) {
                case 'a': printf("Hell");   break;
                case 'b': printf("o");      break;
                case 'c': printf(" w");     break;
                case 'd': printf("rld!\n"); break;
                case 'e': return;
            }
        }
    }

    int main(int argc, char** argv) {
        interp("abcbde");  /* prints "Hello world!" */
        return 0;
    }

SLIDE 47

Interpreters 101

The gcc way

    // &&label and goto * are GCC's "labels as values" extension
    #define DISPATCH() \
        { goto *op_table[*((s)++) - 'a']; }

    static void interp(const char* s) {
        static void* op_table[] = {
            &&op_a, &&op_b, &&op_c, &&op_d, &&op_e
        };
        DISPATCH();
    op_a: printf("Hell");   DISPATCH();
    op_b: printf("o");      DISPATCH();
    op_c: printf(" w");     DISPATCH();
    op_d: printf("rld!\n"); DISPATCH();
    op_e: return;
    }
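Java has no equivalent of gcc's &&label and goto *. The closest portable shape is a table of handler objects indexed by opcode. The minimal Java sketch below (names hypothetical) runs the same toy program: each loop iteration fetches an opcode and calls through OP_TABLE. Unlike the gcc version, control returns to a single loop after every handler. So this sketch shows the table lookup, but not the direct jumps from handler to handler.

```java
/**
 * Java analogue of the op_table interpreter (hypothetical names). Java has no
 * labels-as-values or computed goto, so opcodes index an array of handlers.
 */
final class TableInterpSketch {
    /** A handler appends its output and returns false to stop the interpreter. */
    interface Op {
        boolean run(StringBuilder out);
    }

    static final Op[] OP_TABLE = {
        out -> { out.append("Hell"); return true; },   // 'a'
        out -> { out.append("o"); return true; },      // 'b'
        out -> { out.append(" w"); return true; },     // 'c'
        out -> { out.append("rld!\n"); return true; }, // 'd'
        out -> false,                                  // 'e': return
    };

    static String interp(String program) {
        StringBuilder out = new StringBuilder();
        int pc = 0;
        // One dispatch per pass: fetch the opcode, index the table, call the handler.
        while (OP_TABLE[program.charAt(pc++) - 'a'].run(out)) {
            // all work happens in the handlers
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(interp("abcbde")); // Hello world!
    }
}
```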

SLIDE 48

Interpreters 101

ARM assembly

    op_table:
        .word op_a
        .word op_b
        ...

    #define DISPATCH() ldrb r0, [rPC], #1 \
                       ldr pc, [rOP_TABLE, r0, lsl #2]

    op_a:
        ...
        DISPATCH()
    op_b:
        ...
        DISPATCH()
    ...

Two memory reads

SLIDE 49

Interpreters 101

ARM assembly (cleverer)

    #define DISPATCH() ldrb r0, [rPC], #1 \
                       add pc, rFIRST_OP, r0, lsl #6

    .align 64
    op_a:   // address gets stored in rFIRST_OP
        ... up to 16 instructions ...
    op_b:
        ... up to 16 instructions ...
    op_c:
        ... up to 16 instructions ...
    ...

One memory read
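Here is the address arithmetic behind the two ARM variants. In both, op is the opcode byte loaded by ldrb, which is the first memory read they share. The shift amounts come from the listings: .word table entries are 4 bytes, and each handler slot is 64 bytes. ARM (not Thumb) instructions are 4 bytes each.

```latex
\begin{aligned}
\text{table dispatch:}\quad & \mathit{handler} = \mathrm{mem}\!\left[\mathtt{rOP\_TABLE} + 4\,\mathit{op}\right] && (\texttt{lsl \#2}\text{: second memory read})\\
\text{computed dispatch:}\quad & \mathit{handler} = \mathtt{rFIRST\_OP} + 64\,\mathit{op} && (\texttt{lsl \#6}\text{: no further read})\\
\text{handler slot:}\quad & \frac{64\ \text{bytes}}{4\ \text{bytes per ARM instruction}} = 16\ \text{instructions}
\end{aligned}
```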

SLIDE 50

Optimizing Your Code

SLIDE 51

Time Scale

  • human interaction scale
      • 10-30 interactions / sec
  • human perception scale
      • 25-30 image frames / sec
      • continuous audio, synched within 100 msec
  • computer scale
      • run as much and as fast as possible
SLIDE 52

Get Plenty Of Rest

A well-behaved app…

  • spends most of its time sleeping
  • reacts quickly and decisively to user and network input
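One common way to "spend most of its time sleeping" is to block on an event source instead of polling it. Below is a generic Java sketch; it is not Android-specific API, and the names and the quit sentinel are hypothetical. BlockingQueue.take() parks the thread until an event arrives, so the loop consumes no CPU time while idle. When an event arrives, the loop handles it promptly and goes back to waiting.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch of an event loop that sleeps until input arrives instead of polling. */
final class SleepyLoopSketch {
    static final String QUIT = "quit"; // hypothetical sentinel event

    static int run(BlockingQueue<String> events) throws InterruptedException {
        int handled = 0;
        while (true) {
            String event = events.take();   // blocks (sleeps) until an event is queued
            if (QUIT.equals(event)) {
                return handled;
            }
            handled++;                      // react promptly, then go back to sleep
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            events.add("tap");
            events.add("network reply");
            events.add(QUIT);
        });
        producer.start();
        System.out.println(run(events) + " events handled"); // 2 events handled
        producer.join();
    }
}
```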
SLIDE 55

Loop Wisely

  (1) for (int i = initializer; i >= 0; i--)
  (2) int limit = calculate limit;
      for (int i = 0; i < limit; i++)
  (3) Type[] array = get array;
      for (Type obj : array)
  (4) for (int i = 0; i < array.length; i++)
  (5) for (int i = 0; i < this.var; i++)
  (6) for (int i = 0; i < obj.size(); i++)
  (7) Iterable<Type> list = get list;
      for (Type obj : list)

Danger! Danger! Danger! Danger!
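Below is a minimal Java sketch of four of the loop shapes above; the class and method names are hypothetical. All four compute the same sum but differ in what each pass re-evaluates. Shape (5) reloads a field on every pass, and shape (6) makes a method call. Shape (7) compiles to calls on an Iterator obtained from the Iterable, and for most collections that Iterator is a freshly allocated object. Hoisting the bound into a local, as in shape (2), does that work once. How much this saves depends on the VM.

```java
import java.util.List;

/**
 * Sketch of loop shapes (2), (5), (6) and (7); names are hypothetical.
 * All four compute the same sum but re-evaluate different things per pass.
 */
final class LoopShapesSketch {
    private final int[] data;
    private int var; // stands in for this.var in shape (5)

    LoopShapesSketch(int[] data) {
        this.data = data;
        this.var = data.length;
    }

    int sumFieldBound() {               // shape (5): reloads this.var on every pass
        int sum = 0;
        for (int i = 0; i < this.var; i++) sum += data[i];
        return sum;
    }

    int sumHoisted() {                  // shape (2): bound read once into a local
        int sum = 0;
        int limit = this.var;
        for (int i = 0; i < limit; i++) sum += data[i];
        return sum;
    }

    static int sumSizeBound(List<Integer> list) {     // shape (6): calls size() every pass
        int sum = 0;
        for (int i = 0; i < list.size(); i++) sum += list.get(i);
        return sum;
    }

    static int sumIterable(Iterable<Integer> list) {  // shape (7): obtains an Iterator object
        int sum = 0;
        for (int x : list) sum += x;
        return sum;
    }

    public static void main(String[] args) {
        LoopShapesSketch s = new LoopShapesSketch(new int[] {1, 2, 3, 4});
        List<Integer> list = List.of(1, 2, 3, 4);
        System.out.println(s.sumFieldBound() + " " + s.sumHoisted() + " "
                + sumSizeBound(list) + " " + sumIterable(list)); // 10 10 10 10
    }
}
```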

SLIDE 56

Avoid Allocation

  • short-lived objects need to be GCed
  • long-lived objects take precious memory
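The two bullets pull in opposite directions. A usual compromise is to keep one small reusable object rather than create a temporary on every call. In the minimal Java sketch below (names hypothetical), medianAllocating clones its input each time, creating short-lived garbage. medianReusing instead keeps one scratch array that grows only when needed. The trade-off is exactly the second bullet: the scratch array is long-lived and holds memory between calls. It also makes an instance unsafe to share across threads.

```java
import java.util.Arrays;

/** Allocating vs. reusing a temporary buffer (hypothetical example). */
final class ReuseBufferSketch {
    private int[] scratch = new int[0]; // long-lived: kept and reused between calls

    /** Allocates a fresh copy on every call, creating garbage for the GC. */
    static int medianAllocating(int[] values) {
        int[] copy = values.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2]; // upper median for even lengths
    }

    /** Reuses one buffer, growing it only when a larger input arrives. */
    int medianReusing(int[] values) {
        if (scratch.length < values.length) {
            scratch = new int[values.length];
        }
        System.arraycopy(values, 0, scratch, 0, values.length);
        Arrays.sort(scratch, 0, values.length);   // sort only the filled prefix
        return scratch[values.length / 2];
    }

    public static void main(String[] args) {
        ReuseBufferSketch r = new ReuseBufferSketch();
        int[] samples = {5, 1, 3};
        System.out.println(medianAllocating(samples) + " " + r.medianReusing(samples)); // 3 3
    }
}
```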
SLIDE 57

That’s all!

SLIDE 58

Questions?
