THE WHOLE TAMALE (PowerPoint presentation)


  1. THE WHOLE TAMALE Mostly

  2. ASSEMBLY/MEMORY IMAGE

      # Execution begins at address 0
          .pos 0
          irmovq stack, %rsp      # Set up stack pointer
          rrmovq %rsp, %rbp       # Initialize the base pointer
          call main               # Execute main program
          halt                    # Terminate program

      # Array of 4 elements
          .align 8
      array:
          .quad 0x000d000d000d
          .quad 0x00c000c000c0
          .quad 0x0b000b000b00
          .quad 0xa000a000a000

      main:
          pushq %rbp
          rrmovq %rsp, %rbp
          irmovq array, %rdi
          irmovq $4, %rsi
          call sum                # sum(array, 4)
          popq %rbp               # Restore the saved base pointer before returning
          ret

      # long sum(long *start, long count)
      # start in %rdi, count in %rsi
      sum:
          pushq %rbp
          rrmovq %rsp, %rbp
          irmovq $8, %r8          # Constant 8
          irmovq $1, %r9          # Constant 1
          xorq %rax, %rax         # sum = 0
          andq %rsi, %rsi         # Set CC
          jmp test                # Goto test
      loop:
          mrmovq (%rdi), %r10     # Get *start
          addq %r10, %rax         # Add to sum
          addq %r8, %rdi          # start++
          subq %r9, %rsi          # count--. Set CC
      test:
          jne loop                # Stop when 0
          popq %rbp               # Restore the saved base pointer before returning
          ret                     # Return

      # Stack starts here and grows to lower addresses
          .pos 0xf8
      stack:

      Memory image (figure labels, from high to low addresses):
        Kernel virtual memory
          Process-specific data structures (e.g., page tables, task and mm structs, kernel stack): different for each process
          Physical memory, and kernel code and data: identical for each process
        Process virtual memory
          User stack (top at %rsp)
          Memory mapped region for shared libraries
          Runtime heap (via malloc), top at brk
          Uninitialized data (.bss)
          Initialized data (.data)
          Code (.text), starting at 0x400000
          Address 0

  3. ASSEMBLY/MEMORY IMAGE (identical to slide 2)

  4. BASIC ARCHITECTURE

      Figure: hardware organization. The CPU (register file, ALU, bus interface) connects over the system bus to an I/O bridge. The I/O bridge connects over the memory bus to main memory and over the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a host bus adapter (SCSI/SATA) and disk controller serving a disk drive and a solid state disk, and expansion slots for other devices such as network adapters.

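  5. BASIC ARCHITECTURE

      Figure: the CPU divided into a control unit (functionality) and a data path. The control unit receives processor state (IR and the status flags S, Z, O) from the data path and sends it control signals (irw, pcw, pcr, cr, alu, cw, rd, rw, aw, rs, rr, maw, mdr, mdw). The data path exchanges data with main memory and input/output.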

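  6. DATA PATH

      Figure: 16-bit data path. A shared 16-bit bus connects the PC (pcw, pcr), the register file R0 to R3 (2-bit select fields; rd and rw select and write a destination register, rs and rr select and read a source register), the IR (irw), MA (maw), MD (mdw, mdr), the ALU input register A (aw), the ALU (2-bit alu operation code; status flags S, Z, O), and the ALU output register C (cw, cr). MA and MD connect the data path to main memory.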

  7. add r1, r2 # Register Transfer Notation

      time  RTN                     alu  rd  rs  rw  rr  aw  cw  cr  pcr  pcw  maw  mdr  mdw  irw
      0:    MA ← PC; C ← PC+2       11   00  00  0   0   0   1   0   1    0    1    0    0    0
      1:    MD ← M[MA]; PC ← C      00   00  00  0   0   0   0   1   0    1    0    0    0    0
      2:    IR ← MD                 00   00  00  0   0   0   0   0   0    0    0    1    0    1
      3:    A ← R[src]              00   00  01  0   1   1   0   0   0    0    0    0    0    0
      4:    C ← A + R[r2]           00   00  10  0   1   0   1   0   0    0    0    0    0    0
      5:    R[r2] ← C               00   10  00  1   0   0   0   1   0    0    0    0    0    0

      This is the register transfer notation for a register-to-register add, where the src and dest codes specify the register numbers. These codes come from the instruction register (IR). Time step 1 is unusual: MD is loaded directly from memory, so that transfer does not use the signals that control the data path.

  8. add (r1), r2(r3) # Register Transfer Notation

      What are the steps?

  9. ADD (R1), R2(R3) (data path figure from slide 6, used to trace this instruction)

  10. add (r1), r2(r3): partial build of slide 13 (step 1, time steps 0 to 2)

  11. add (r1), r2(r3): partial build of slide 13 (steps 1 and 2, time steps 0 to 7)

  12. add (r1), r2(r3): partial build of slide 13 (steps 1 to 3, time steps 0 to 10)

  13. add (r1), r2(r3) # Register Transfer Notation

      What are the steps?

      1. Get the instruction
           0:  MA ← PC; C ← PC+2
           1:  MD ← M[MA]; PC ← C
           2:  IR ← MD
      2. Get the value referenced by the destination
           3:  A ← R[r3]
           4:  C ← A + R[r2]
           5:  MA ← C
           6:  MD ← M[MA]
           7:  A ← MD
      3. Add the value referenced by the source
           8:  MA ← R[r1]
           9:  MD ← M[MA]
           10: C ← A + MD
      4. Store the result to the location referenced by the destination
           11: MD ← C
           12: A ← R[r3]
           13: C ← A + R[r2]
           14: MA ← C
           15: M[MA] ← MD

  14. MEMORY OPERATION

      int sumarraycols(int a[M][N])
      {
          int i, j, sum = 0;

          for (j = 0; j < N; j++)
              for (i = 0; i < M; i++)
                  sum += a[i][j];
          return sum;
      }

      Figure: a processor package with cores 0 to 3. Each core has its own registers, L1 d-cache, L1 i-cache, and L2 unified cache. An L3 unified cache, shared by all cores, connects to main memory. The figure also repeats the CPU, bus, and I/O organization from slide 4.

      Memory is pulled into the caches in chunks (cache lines) to exploit locality.

  15. BASIC ARCHITECTURE

      Memory hierarchy. Storage devices near the top are smaller, faster, and costlier per byte; devices near the bottom are larger, slower, and cheaper per byte.
        L0: Registers. CPU registers hold words retrieved from cache memory.
        L1: L1 cache (SRAM). Holds cache lines retrieved from the L2 cache.
        L2: L2 cache (SRAM). Holds cache lines retrieved from the L3 cache.
        L3: L3 cache (SRAM). Holds cache lines retrieved from memory.
        L4: Main memory (DRAM). Holds disk blocks retrieved from local disks.
        L5: Local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
        L6: Remote secondary storage (distributed file systems, Web servers).

  16. PRINCIPLE OF LOCALITY

      Spatial locality: once a given address has been referenced, nearby addresses are likely to be referenced within a short period of time.
      Temporal locality: once a particular memory item has been referenced, it is likely to be referenced again within a short period of time.

  17. ASSEMBLY/CACHING

      (Same Y86-64 program as slide 2.)

      Figure: direct-mapped cache. A 16-bit memory address is split into a 5-bit tag, an 8-bit group, and a 3-bit byte field. The cache has 256 lines (one per group, groups 0 to 255); each line holds one 8-byte block plus a tag and a valid bit. Main memory blocks 0 to 8191 are arranged as 256 groups by 32 tags (tags 0 to 31), so block number = tag × 256 + group. For example, group 0 contains blocks 0, 256, 512, and so on up to 7936. Cache contents shown: group 0 holds tag 30 (block 7680), group 1 holds tag 9 (block 2305), group 2 holds tag 1 (block 258), and group 255 holds tag 1 (block 511), all with the valid bit set.
