CS241 Computer Organization, Spring 2015: Buffer Overflow (4-02-2015)
Outline
⬛ Linking & Loading, continued
⬛ Buffer Overflow
Read:
▪ CSAPP2: section 3.12: out-of-bounds memory references & buffer overflow
▪ K&R: Chapter 5, section 5.11
▪ C Traps & Pitfalls (course website, on-line references)
⬛ Quiz today on IA32 (HW4)
⬛ Quiz Tuesday, April 7th on run-time stack (HW5)
⬛ Lab#3 BufferLab goes live tomorrow
⬛ HW#7 due today
⬛ HW#6 due: Tuesday, April 7th
Carnegie Mellon
Linker Symbols
⬛ Global symbols
▪ Symbols defined by module m that can be referenced by other modules.
▪ E.g.: non-static C functions and non-static global variables.
⬛ External symbols
▪ Global symbols that are referenced by module m but defined by some other module.
⬛ Local symbols
▪ Symbols that are defined and referenced exclusively by module m.
▪ E.g.: C functions and variables defined with the static attribute.
▪ Local linker symbols are not local program variables.
Resolving Symbols
main.c:
int buf[2] = {1, 2};          /* Global */

int main()
{
    swap();                   /* External */
    return 0;
}

swap.c:
extern int buf[];             /* External */

static int *bufp0 = &buf[0];  /* Local */
static int *bufp1;

void swap()                   /* Global */
{
    int temp;                 /* Linker knows nothing of temp */

    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}
Relocating Code and Data
Relocatable Object Files:
main.o
▪ .text: main()
▪ .data: int buf[2]={1,2}
swap.o
▪ .text: swap()
▪ .data: int *bufp0=&buf[0]
▪ .bss: static int *bufp1
Executable Object File:
▪ Headers
▪ .text: System code, main(), swap(), More system code
▪ .data: System data, int buf[2]={1,2}, int *bufp0=&buf[0]
▪ .bss: Uninitialized data (int *bufp1)
▪ .symtab
▪ .debug
Relocation Info (main)
main.o (Source: objdump):
00000000 <main>:
   0: 55                 push   %ebp
   1: 89 e5              mov    %esp,%ebp
   3: 83 ec 08           sub    $0x8,%esp
   6: e8 fc ff ff ff     call   7 <main+0x7>
                  7: R_386_PC32 swap
   b: 31 c0              xor    %eax,%eax
   d: 89 ec              mov    %ebp,%esp
   f: 5d                 pop    %ebp
  10: c3                 ret
Disassembly of section .data:
00000000 <buf>:
   0: 01 00 00 00 02 00 00 00

main.c:
int buf[2] = {1,2};

int main()
{
    swap();
    return 0;
}
Relocation Info (swap, .text)
swap.o:
Disassembly of section .text:
00000000 <swap>:
   0: 55                    push   %ebp
   1: 8b 15 00 00 00 00     mov    0x0,%edx
                  3: R_386_32 bufp0
   7: a1 04 00 00 00        mov    0x4,%eax
                  8: R_386_32 buf
   c: 89 e5                 mov    %esp,%ebp
   e: c7 05 00 00 00 00 04  movl   $0x4,0x0
  15: 00 00 00
                  10: R_386_32 bufp1
                  14: R_386_32 buf
  18: 89 ec                 mov    %ebp,%esp
  1a: 8b 0a                 mov    (%edx),%ecx
  1c: 89 02                 mov    %eax,(%edx)
  1e: a1 00 00 00 00        mov    0x0,%eax
                  1f: R_386_32 bufp1
  23: 89 08                 mov    %ecx,(%eax)
  25: 5d                    pop    %ebp
  26: c3                    ret

swap.c:
extern int buf[];

static int *bufp0 = &buf[0];
static int *bufp1;

void swap()
{
    int temp;

    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}
Relocation Info (swap, .data)
Disassembly of section .data:
00000000 <bufp0>:
   0: 00 00 00 00
                  0: R_386_32 buf
Executable After Relocation (.text)
080483b4 <main>:
 80483b4: 55                    push   %ebp
 80483b5: 89 e5                 mov    %esp,%ebp
 80483b7: 83 ec 08              sub    $0x8,%esp
 80483ba: e8 09 00 00 00        call   80483c8 <swap>
 80483bf: 31 c0                 xor    %eax,%eax
 80483c1: 89 ec                 mov    %ebp,%esp
 80483c3: 5d                    pop    %ebp
 80483c4: c3                    ret

080483c8 <swap>:
 80483c8: 55                    push   %ebp
 80483c9: 8b 15 5c 94 04 08     mov    0x804945c,%edx
 80483cf: a1 58 94 04 08        mov    0x8049458,%eax
 80483d4: 89 e5                 mov    %esp,%ebp
 80483d6: c7 05 48 95 04 08 58  movl   $0x8049458,0x8049548
 80483dd: 94 04 08
 80483e0: 89 ec                 mov    %ebp,%esp
 80483e2: 8b 0a                 mov    (%edx),%ecx
 80483e4: 89 02                 mov    %eax,(%edx)
 80483e6: a1 48 95 04 08        mov    0x8049548,%eax
 80483eb: 89 08                 mov    %ecx,(%eax)
 80483ed: 5d                    pop    %ebp
 80483ee: c3                    ret
Executable After Relocation (.data)
Disassembly of section .data:
08049454 <buf>:
 8049454: 01 00 00 00 02 00 00 00
0804945c <bufp0>:
 804945c: 54 94 04 08
Strong and Weak Symbols
⬛ Program symbols are either strong or weak
▪ Strong: procedures and initialized globals
▪ Weak: uninitialized globals
p1.c:
int foo=5;    /* strong */
p1() { }      /* strong */
p2.c:
int foo;      /* weak */
p2() { }      /* strong */
Linker’s Symbol Rules
⬛ Rule 1: Multiple strong symbols are not allowed
▪ Each item can be defined only once
▪ Otherwise: Linker error
⬛ Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol
▪ References to the weak symbol resolve to the strong symbol
⬛ Rule 3: If there are multiple weak symbols, pick an arbitrary one
▪ Can override this with gcc -fno-common, which turns multiple definitions of an uninitialized global into a link error (GCC 10 and later use -fno-common by default)
Linker Puzzles
Each puzzle lists the definitions in two modules, separated by |.
⬛ int x; p1() {}   |   p1() {}
▪ Link time error: two strong symbols (p1)
⬛ int x; p1() {}   |   int x; p2() {}
▪ References to x will refer to the same uninitialized int. Is this what you really want?
⬛ int x; int y; p1() {}   |   double x; p2() {}
▪ Writes to x in p2 might overwrite y! Evil!
⬛ int x=7; int y=5; p1() {}   |   double x; p2() {}
▪ Writes to x in p2 will overwrite y! Nasty!
⬛ int x=7; p1() {}   |   int x; p2() {}
▪ References to x will refer to the same initialized variable.
Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.
Global Variables
⬛ Avoid if you can
⬛ Otherwise
▪ Use static if you can
▪ Initialize if you define a global variable
▪ Use extern if you reference an external global variable
Packaging Commonly Used Functions
⬛ How to package functions commonly used by programmers?
▪ Math, I/O, memory management, string manipulation, etc.
⬛ Awkward, given the linker framework so far:
▪ Option 1: Put all functions into a single source file
  ▪ Programmers link big object file into their programs
  ▪ Space and time inefficient
▪ Option 2: Put each function in a separate source file
  ▪ Programmers explicitly link appropriate binaries into their programs
  ▪ More efficient, but burdensome on the programmer
Solution: Static Libraries
⬛ Static libraries (.a archive files)
▪ Concatenate related relocatable object files into a single file with an index (called an archive).
▪ Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
▪ If an archive member file resolves a reference, link it into the executable.
Creating Static Libraries
atoi.c → Translator → atoi.o
printf.c → Translator → printf.o
...
random.c → Translator → random.o
atoi.o, printf.o, …, random.o → Archiver (ar) → libc.a (C standard library)

unix> ar rs libc.a \
          atoi.o printf.o … random.o

⬛ Archiver allows incremental updates
▪ Recompile the function that changes and replace its .o file in the archive.
Commonly Used Libraries
⬛ libc.a (the C standard library)
▪ 8 MB archive of 900 object files.
▪ I/O, memory allocation, signal handling, string handling, date and time, random numbers, integer math
⬛ libm.a (the C math library)
▪ 1 MB archive of 226 object files.
▪ floating point math (sin, cos, tan, log, exp, sqrt, …)

% ar -t /usr/lib/libc.a | sort
…
fork.o
…
fprintf.o
fpu_control.o
fputc.o
freopen.o
fscanf.o
fseek.o
fstab.o
…
% ar -t /usr/lib/libm.a | sort
…
e_acos.o
e_acosf.o
e_acosh.o
e_acoshf.o
e_acoshl.o
e_acosl.o
e_asin.o
e_asinf.o
e_asinl.o
…
Linking with Static Libraries
main2.c (includes vector.h) → Translators (cpp, cc1, as) → main2.o (relocatable object file)
addvec.o, multvec.o → Archiver (ar) → libvector.a
Static libraries: libvector.a, libc.a
main2.o + addvec.o (from libvector.a) + printf.o and any other modules called by printf.o (from libc.a) → Linker (ld) → p2 (fully linked executable object file)
Using Static Libraries
⬛ Linker’s algorithm for resolving external references:
▪ Scan .o files and .a files in the command line order.
▪ During the scan, keep a list of the current unresolved references.
▪ As each new .o or .a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj. (From an archive, only members that define a currently unresolved symbol are linked in.)
▪ If any entries remain in the unresolved list at the end of the scan, report an error.
⬛ Problem:
▪ Command line order matters!
▪ Moral: put libraries at the end of the command line.

unix> gcc -L. libtest.o -lmine
unix> gcc -L. -lmine libtest.o
libtest.o: In function `main':
libtest.o(.text+0x4): undefined reference to `libfun'
Loading Executable Object Files
Executable Object File:
▪ ELF header
▪ Program header table (required for executables)
▪ .init section
▪ .text section
▪ .rodata section
▪ .data section
▪ .bss section
▪ .symtab
▪ .debug
▪ .line
▪ .strtab
▪ Section header table (required for relocatables)
Run-time memory image (high to low addresses):
▪ Kernel virtual memory (memory invisible to user code), from 0xc0000000 up
▪ User stack (created at runtime), top marked by %esp (stack pointer)
▪ Memory-mapped region for shared libraries, at 0x40000000
▪ Run-time heap (created by malloc), top marked by brk
▪ Read/write segment (.data, .bss), loaded from the executable file
▪ Read-only segment (.init, .text, .rodata), loaded from the executable file, starting at 0x08048000
▪ Unused
Internet Worm and IM War
⬛ November, 1988
▪ Internet Worm attacks thousands of Internet hosts.
▪ How did it happen?
⬛ July, 1999
▪ Microsoft launches MSN Messenger (instant messaging system).
▪ Messenger clients can access popular AOL Instant Messaging Service (AIM) servers
[Figure: AIM and MSN Messenger clients connecting to AIM servers]
Internet Worm and IM War (cont.)
⬛ August 1999
▪ Mysteriously, Messenger clients can no longer access AIM servers.
▪ Microsoft and AOL begin the IM war:
  ▪ AOL changes server to disallow Messenger clients
  ▪ Microsoft makes changes to clients to defeat AOL changes.
  ▪ At least 13 such skirmishes.
▪ How did it happen?
⬛ The Internet Worm and AOL/Microsoft War were both based on stack buffer overflow exploits!
▪ Many Unix functions do not check argument sizes.
▪ This allows target buffers to overflow.
String Library Code
⬛ Implementation of Unix function gets()
▪ No way to specify limit on number of characters to read
⬛ Similar problems with other Unix functions
▪ strcpy: Copies string of arbitrary length
▪ scanf, fscanf, sscanf, when given %s conversion specification

/* Get string from stdin */
char *gets(char *dest)
{
    int c = getchar();
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;
        c = getchar();
    }
    *p = '\0';
    return dest;
}
Vulnerable Buffer Code
#include <stdio.h>

void echo(void);

int main()
{
    printf("Type a string:");
    echo();
    return 0;
}

/* Echo Line */
void echo()
{
    char buf[4]; /* Way too small! */
    gets(buf);
    puts(buf);
}

unix>./bufdemo
Type a string:1234567
1234567
unix>./bufdemo
Type a string:12345678
Segmentation Fault
unix>./bufdemo
Type a string:123456789ABC
Segmentation Fault
Buffer Overflow Disassembly
080484f0 <echo>:
 80484f0: 55                 push   %ebp
 80484f1: 89 e5              mov    %esp,%ebp
 80484f3: 53                 push   %ebx
 80484f4: 8d 5d f8           lea    0xfffffff8(%ebp),%ebx
 80484f7: 83 ec 14           sub    $0x14,%esp
 80484fa: 89 1c 24           mov    %ebx,(%esp)
 80484fd: e8 ae ff ff ff     call   80484b0 <gets>
 8048502: 89 1c 24           mov    %ebx,(%esp)
 8048505: e8 8a fe ff ff     call   8048394 <puts@plt>
 804850a: 83 c4 14           add    $0x14,%esp
 804850d: 5b                 pop    %ebx
 804850e: c9                 leave
 804850f: c3                 ret

Call site in main:
 80485f2: e8 f9 fe ff ff     call   80484f0 <echo>
 80485f7: 8b 5d fc           mov    0xfffffffc(%ebp),%ebx
 80485fa: c9                 leave
 80485fb: 31 c0              xor    %eax,%eax
 80485fd: c3                 ret
Buffer Overflow Stack
echo:
    pushl %ebp              # Save %ebp on stack
    movl  %esp, %ebp
    pushl %ebx              # Save %ebx
    leal  -8(%ebp),%ebx     # Compute buf as %ebp-8
    subl  $20, %esp         # Allocate stack space
    movl  %ebx, (%esp)      # Pass buf to gets (store it at the top of the stack)
    call  gets              # Call gets
    . . .

/* Echo Line */
void echo()
{
    char buf[4]; /* Way too small! */
    gets(buf);
    puts(buf);
}

Stack before call to gets (higher addresses first):
Stack frame for main:
    . . .
    Return Address
Stack frame for echo:
    Saved %ebp             <- %ebp
    Saved %ebx             (%ebp-4)
    buf: [3] [2] [1] [0]   (%ebp-8)
Buffer Overflow Stack Example
unix> gdb bufdemo
(gdb) break echo
Breakpoint 1 at 0x8048583
(gdb) run
Breakpoint 1, 0x8048583 in echo ()
(gdb) print /x $ebp
$1 = 0xffffc638
(gdb) print /x *(unsigned *)$ebp
$2 = 0xffffc658
(gdb) print /x *((unsigned *)$ebp + 1)
$3 = 0x80485f7

 80485f2: call 80484f0 <echo>
 80485f7: mov  0xfffffffc(%ebp),%ebx   # Return Point

Stack before call to gets (each row shows 4 bytes from highest to lowest address, like buf's [3] [2] [1] [0]):
Stack frame for main:
    Return Address    08 04 85 f7   (0x80485f7)
Stack frame for echo:
    Saved %ebp        ff ff c6 58   (0xffffc658)   <- %ebp = 0xffffc638
    buf               xx xx xx xx
Buffer Overflow Example #1
Overflow buf, but no problem
Stack after gets, input 1234567 (%ebp = 0xffffc638; bytes shown highest address first):
    Return Address       08 04 85 f7
    Saved %ebp           ff ff c6 58
    %ebp-4 .. %ebp-1     00 37 36 35
    buf                  34 33 32 31
Buffer Overflow Example #2
Base pointer corrupted
Stack after gets, input 12345678 (%ebp = 0xffffc638; bytes shown highest address first):
    Return Address       08 04 85 f7
    Saved %ebp           ff ff c6 00   (low byte overwritten by the terminating '\0')
    %ebp-4 .. %ebp-1     38 37 36 35
    buf                  34 33 32 31

. . .
 804850a: 83 c4 14   add    $0x14,%esp   # deallocate space
 804850d: 5b         pop    %ebx         # restore %ebx
 804850e: c9         leave               # movl %ebp, %esp; popl %ebp
 804850f: c3         ret                 # Return
Buffer Overflow Example #3
Return address corrupted
Stack after gets, input 123456789ABC (%ebp = 0xffffc638; bytes shown highest address first):
    Return Address       08 04 85 00   (low byte overwritten by the terminating '\0')
    Saved %ebp           43 42 41 39
    %ebp-4 .. %ebp-1     38 37 36 35
    buf                  34 33 32 31

 80485f2: call 80484f0 <echo>
 80485f7: mov  0xfffffffc(%ebp),%ebx   # Return Point
Malicious Use of Buffer Overflow
⬛ Input string contains byte representation of executable code
⬛ Overwrite return address with address of buffer
⬛ When bar() executes ret, will jump to exploit code

int bar() {
    char buf[64];
    gets(buf);
    ...
    return ...;
}

void foo() {
    bar();
    ...
}

Stack after call to gets() (higher addresses first):
foo stack frame:
    return address: originally A, now B
bar stack frame:
    pad
    exploit code      <- B
(The exploit code, the pad, and the new return address B are all data written by gets().)
Exploits Based on Buffer Overflows
⬛ Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines
⬛ Internet worm
▪ Early versions of the finger server (fingerd) used gets() to read the argument sent by the client:
  ▪ finger droh@cs.cmu.edu
▪ Worm attacked fingerd server by sending phony argument:
  ▪ finger “exploit-code padding new-return-address”
  ▪ exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker.
Exploits Based on Buffer Overflows
⬛ Buffer overflow bugs allow remote machines to execute arbitrary code on victim machines
⬛ IM War
▪ AOL exploited existing buffer overflow bug in AIM clients
▪ exploit code: returned 4-byte signature (the bytes at some location in the AIM client) to server.
▪ When Microsoft changed code to match signature, AOL changed signature location.
Date: Wed, 11 Aug 1999 11:30:57 -0700 (PDT)
From: Phil Bucking <philbucking@yahoo.com>
Subject: AOL exploiting buffer overrun bug in their own software!
To: rms@pharlap.com

Mr. Smith,
I am writing you because I have discovered something that I think you might find interesting because you are an Internet security expert with experience in this area. I have also tried to contact AOL but received no response. I am a developer who has been working on a revolutionary new instant messaging client that should be released later this year. ... It appears that the AIM client has a buffer overrun bug. By itself this might not be the end of the world, as MS surely has had its share. But AOL is now *exploiting their own buffer overrun bug* to help in its efforts to block MS Instant Messenger. .... Since you have significant credibility with the press I hope that you can use this information to help inform people that behind AOL's friendly exterior they are nefariously compromising peoples' security. Sincerely, Phil Bucking Founder, Bucking Consulting philbucking@yahoo.com
It was later determined that this email originated from within Microsoft!
Code Red Worm
⬛ History
▪ June 18, 2001. Microsoft announces buffer overflow vulnerability in IIS Internet server
▪ July 19, 2001. Over 250,000 machines infected by new virus in 9 hours
▪ White House must change its IP address. Pentagon shut down public WWW servers for a day
⬛ When We Set Up CS:APP Web Site
▪ Received strings of the form
GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN....NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 325 "-" "-"
Code Red Exploit Code
⬛ Starts 100 threads running
⬛ Spread self
▪ Generate random IP addresses & send attack string
▪ Between 1st & 19th of month
⬛ Attack www.whitehouse.gov
▪ Send 98,304 packets; sleep for 4-1/2 hours; repeat
  ▪ Denial of service attack
▪ Between 21st & 27th of month
⬛ Deface server’s home page
▪ After waiting 2 hours
Code Red Effects
⬛ Later Version Even More Malicious
▪ Code Red II
▪ As of April, 2002, over 18,000 machines infected
▪ Still spreading
⬛ Paved Way for NIMDA
▪ Variety of propagation methods
▪ One was to exploit vulnerabilities left behind by Code Red II
⬛ ASIDE (security flaws start at home)
▪ .rhosts used by Internet Worm
▪ Attachments used by MyDoom (1 in 6 emails Monday morning!)
Avoiding Overflow Vulnerability
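⬛ Use library routines that limit string lengths
▪ fgets instead of gets (gets was removed from the C standard in C11)
▪ strncpy instead of strcpy (but strncpy does not write a terminating '\0' when the source is too long, so terminate the buffer yourself)
▪ Don’t use scanf with %s conversion specification
  ▪ Use fgets to read the string
  ▪ Or use %ns where n is at most the buffer size minus 1 (e.g. %3s for char buf[4])

/* Echo Line */
void echo()
{
    char buf[4]; /* Way too small! */
    fgets(buf, sizeof buf, stdin); /* reads at most 3 chars plus '\0' */
    puts(buf);
}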
System-Level Protections
unix> gdb bufdemo
(gdb) break echo
(gdb) run
(gdb) print /x $ebp
$1 = 0xffffc638
(gdb) run
(gdb) print /x $ebp
$2 = 0xffffbb08
(gdb) run
(gdb) print /x $ebp
$3 = 0xffffc6a8
⬛ Randomized stack offsets
▪ At start of program, allocate random amount of space on stack
▪ Makes it difficult for hacker to predict beginning of inserted code
⬛ Nonexecutable code segments
▪ In traditional x86, can mark region of memory as either “read-only” or “writeable”
  ▪ Can execute anything readable
▪ Add explicit “execute” permission
Worms and Viruses
⬛ Worm: A program that
▪ Can run by itself
▪ Can propagate a fully working version of itself to other computers
⬛ Virus: Code that
▪ Adds itself to other programs
▪ Cannot run independently
⬛ Both are (usually) designed to spread among