Viruses
based on slides by Vitaly Shmatikov and Ninghui Li
Malware
- Malicious code often masquerades as good software or attaches itself to good software
- Some malicious programs need host programs
- Trojan horses, logic bombs, viruses
- Others can exist and propagate independently
- Worms, automated viruses
- There are many infection vectors and propagation mechanisms
[Geer]
Remote Vulnerabilities
[Figure: new vulnerabilities and exploitable targets]
Trojan Horses
- A trojan horse is malicious code hidden in an apparently useful host program
- When the host program is executed, the trojan does something harmful or unwanted
- User must be tricked into executing the host program
- In 1995, a program distributed as PKZ300B.EXE looked like a new version of PKZIP. When executed, it formatted the hard drive.
- Trojans do not replicate
- This is the main difference from worms and viruses, but today many trojans are spread by virus-like mechanisms
Viruses
- A virus propagates by infecting other programs
- It automatically creates copies of itself, but to propagate, a human has to run an infected program
– Self-propagating malicious programs are usually called worms
- Many propagation methods
- Insert a copy into every executable (.COM, .EXE)
- Insert a copy into boot sectors of disks
– “Stoned” virus infected PCs booted from infected floppies, stayed in memory and infected every floppy inserted into PC
- Infect TSR (terminate-and-stay-resident) routines
– By infecting a common OS routine, a virus can always stay in memory and infect all disks, executables, etc.
Virus Techniques
- Macro viruses
- A macro is an executable program embedded in a word processing document (MS Word) or spreadsheet (Excel)
- When an infected document is opened, the virus copies itself into the global macro file and makes itself auto-executing (e.g., it gets invoked whenever any document is opened)
- Stealth techniques
- Infect OS so that infected files appear normal
– Used by rootkits (we’ll look at them later)
- Mutate, encrypt parts of code with random key
Viruses in P2P Networks
- Millions of users willingly download files
- KaZaA: 2.5 million users in May 2006
- Easy to insert an infected file into the network
- Pretend to be an executable of a popular application
– “Adobe Photoshop 10 full.exe”, “WinZip 8.1.exe”, …
– ICQ and Trillian seem to be the most popular names
- Infected MP3 files are rare
- Malware can open backdoor, steal confidential information, spread spam
- 70% of infected hosts already on DNS spam blacklists
[Shin, Jung, Balakrishnan]
Prevalence of Viruses in KaZaA
- 2006 study of 500,000 KaZaA files
- Look for 364 patterns associated with 71 viruses
- Up to 22% of all KaZaA files infected
- 52 different viruses and Trojans
- Another study found that 44% of all executable files on KaZaA contain malicious code
- When searching for “ICQ” or “Trillian”, chances of hitting an infected file are over 70%
- Some infected hosts are active for a long time
- 5% of infected hosts seen in February 2006 were still active in May 2006
[Shin, Jung, Balakrishnan]
Propagation via Websites
- Websites with popular content
- Games: 60% of websites contain executable content, one-third contain at least one malicious executable
- Celebrities, adult content, everything except news
- Most popular sites with malicious content (Oct 2005)
- Large variety of malware
- But most of the observed programs are variants of the same few adware applications (e.g., WhenU)
[Moshchuk et al.]
Malicious Functionality
- Adware
- Display unwanted pop-up ads
- Browser hijackers
- Modify home page, search tools, redirect URLs
- Trojan downloaders
- Download and install additional malware
- Dialer (expensive toll numbers)
- Keylogging
[Moshchuk et al.]
Drive-By Downloads
- Website “pushes” malicious executable to user’s browser with inline JavaScript or pop-up window
- Naïve user may click “Yes” in the dialog box
- Can also install malicious software automatically by exploiting bugs in the user’s browser
- 1.5% of URLs crawled in the Moshchuk et al. study
- Constant change
- Many infectious sites exist only for a short time or change substantially from month to month
- Many sites behave non-deterministically
Virus: Buffer Overflow
- Used in the 1988 Morris Internet Worm; still extremely common today
- References (not recent but still good)
- Aleph One’s “Smashing The Stack For Fun And Profit” in Phrack Issue 49 (1996) popularized stack buffer overflows
http://insecure.org/stf/smashstack.html
- “Buffer overflows: Attacks and defenses for the vulnerability of the decade”, Cowan et al.
Buffer Overflow
Two goals:
1. Arrange for attack code to be in the program’s address space
- Inject it: supply a string containing the malicious code, which the program stores in some buffer (e.g., stack, heap, static area)
- It is already there: e.g., if the attack code needs to execute exec("/bin/sh") and libc already contains code that executes exec(arg), the attacker only needs to make arg point to "/bin/sh" to gain a shell
2. Get the program to jump to that code with suitable parameters in registers and memory
- Buffer overflow: change the return address of a procedure
- Exploit function pointers, e.g., void (*foo)()
- Corrupt checkpoints based on setjmp/longjmp
Buffer Overflow
- Attacker needs to know which CPU and OS are running on the target machine
- Familiarity with machine code
- Knowledge of how system calls are made
- The exec() system call
- Our examples are for x86 running Linux
- Details vary slightly between CPUs and OSes:
- Stack overflow
- Shell code
- Return-to-libc
– Overflow sets ret-addr to address of libc function
- Off-by-one
- Overflow function pointers & longjmp buffers
- Heap overflow
Stack Frame
[Figure: stack frame created when a procedure is called, showing parameters, return address, stack frame pointer, and local variables, with SP and the direction of stack growth marked]
What are buffer overflows?
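- Consider the following function:
    #include <string.h>

    void func(char *str) {
        char buf[128];
        strcpy(buf, str);      /* no bounds check on str */
        do_something(buf);
    }
- When the function is invoked, the stack looks like:
[Figure: top of stack; buf, sfp, ret-addr, str]
- What if *str is 136 bytes long? After strcpy:
[Figure: top of stack; *str fills buf and overwrites sfp and ret-addr, followed by str]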
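A minimal sketch of how the copy in func() could be bounds-checked, so that a 136-byte input is rejected instead of spilling past buf. The helper name copy_checked and its return convention are illustrative assumptions, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked replacement for the strcpy() call in func().
 * Copies str into buf only if it fits, including the terminating '\0'.
 * Returns 0 on success, -1 if the input would overflow the buffer. */
int copy_checked(char *buf, size_t buf_size, const char *str)
{
    assert(buf != NULL && str != NULL);

    size_t len = strlen(str);
    if (len >= buf_size) {
        return -1;              /* no room for len bytes plus '\0' */
    }
    memcpy(buf, str, len + 1);  /* len + 1 also copies the '\0' */
    return 0;
}
```

With a 128-byte buf, copy_checked(buf, sizeof buf, str) fails for any string of 128 or more characters, so the saved frame pointer and return address next to buf are never touched.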
Buffer overflow: example
The following example shows how to inject the attacker’s code and jump to it at the same time
- Suppose *str is such that after strcpy the stack looks like:
[Figure: top of stack; *str overwrites ret, followed by the code for P]
- Program P: exec(“/bin/sh”) (exact shell code by Aleph One)
- When func() exits, the user will be given a shell!
- Note: attack code runs on the stack.
- To determine ret, guess the position of the stack when func() is called.
Some unsafe C lib functions
strcpy(char *dest, const char *src)
strcat(char *dest, const char *src)
gets(char *s)
scanf(const char *format, …)
printf(const char *format, …)
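As a hedged illustration of the alternative, the sketch below uses the standard snprintf() to bound a strcpy/strcat-style concatenation, and passes untrusted text to fprintf() as data rather than as the format string. The helper names join_bounded and print_untrusted are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bounded stand-in for strcpy() followed by strcat(): writes a then b into
 * dst, never more than dst_size bytes including the '\0'.
 * Returns 1 if the whole result fit, 0 if it was truncated. */
int join_bounded(char *dst, size_t dst_size, const char *a, const char *b)
{
    assert(dst != NULL && dst_size > 0 && a != NULL && b != NULL);

    /* snprintf returns the length the full result would have needed. */
    int needed = snprintf(dst, dst_size, "%s%s", a, b);
    return needed >= 0 && (size_t)needed < dst_size;
}

/* Print untrusted text as an argument, never as the format string,
 * so that any '%' characters in it are not interpreted. */
int print_untrusted(FILE *out, const char *text)
{
    assert(out != NULL && text != NULL);
    return fprintf(out, "%s", text);
}
```

The caller has to check the return value of join_bounded: truncation is safe for memory but may still be a logic error for the application.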
Exploiting buffer overflows
- Suppose a web server calls func() with a given URL.
- Attacker can create a 200-byte URL to obtain a shell on the web server
- Some complications for stack overflows:
- Program P should not contain the ‘\0’ character.
- Overflow should not crash the program before func() exits.
Other control hijacking opportunities
- Stack smashing attack:
- Overwrite the return address in the stack activation record by overflowing a local buffer variable.
- Function pointers: (used in attack on PHP 4.0.2)
[Figure: heap or stack; buf[128] followed by FuncPtr]
- Overflowing buf will overwrite the function pointer.
- Longjmp buffers: longjmp(pos) (used in attack on Perl 5.003)
- Overflowing buf next to pos overwrites the value of pos.
return-to-libc attack
- “Bypassing non-executable-stack during exploitation using return-to-libc” by c0ntex (libc: standard C library)
- Shell code attack: the stack holds *str, ret, and the code for P, where program P is exec(“/bin/sh”)
- Return-to-libc attack: the stack holds *str, ret, fake_ret, and “/bin/sh”, where ret is set to the address of system() in libc
Preventing Buffer Overflow Attacks
- Static source code analysis
- Use type safe languages (Java, ML).
- Use safe library functions
- Non-executable stack
- Run time checking: StackGuard
- Randomization
- Detect deviation of program behavior
- Sandboxing
- Access control … (covered later in course)
Static source code analysis
- Statically check source code to detect buffer overflows.
- Several consulting companies.
- Main idea: automate the code review process.
- Several tools exist:
- Coverity (Engler et al.): Test trust inconsistency.
- Microsoft program analysis group:
– PREfix: looks for fixed set of bugs (e.g., null ptr ref)
– PREfast: local analysis to find idioms for prog errors.
- Berkeley: Wagner, et al. Test constraint violations.
- Find lots of bugs, but not all.
Marking stack as non-executable
- Basic stack exploit can be prevented by marking the stack segment as non-executable.
- Support in Windows XP SP2. Code patches exist for Linux, Solaris.
Problems:
- Does not defend against the return-to-libc exploit.
- Some apps need an executable stack (e.g., LISP interpreters).
- Does not block more general overflow exploits:
– Overflow on heap, overflow of function pointer.
Run time checking: StackGuard
- There are many run-time checking techniques …
- Solution 1: StackGuard
- Run-time tests for stack integrity.
- Embed “canaries” in stack frames and verify their integrity prior to function return
[Figure: top of stack; two frames (Frame 1, Frame 2), each containing local, canary, sfp, ret, str]
Canary Types
- Random canary:
- Choose random string at program startup.
- Insert canary string into every stack frame.
- Verify canary before returning from function.
- To corrupt a random canary, attacker must learn the current random string.
- Terminator canary: Canary = 0, carriage return, linefeed, EOF
- String functions will not copy beyond terminator.
- Hence, attacker cannot use string functions to corrupt stack.
StackGuard implemented as a GCC patch.
- Program must be recompiled.
- Minimal performance effects: 8% for Apache.
- Note: Canaries don’t offer foolproof protection.
- Some stack smashing attacks can leave canaries untouched.
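The canary idea can be illustrated with a toy model. This is only a sketch: the struct frame layout, field sizes, and function names below are assumptions chosen for the example, and a real compiler lays out and checks the frame itself. The overflow is simulated inside one struct so the behavior stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_SIZE 16

/* Toy model of one stack frame: a local buffer, then the canary,
 * then the saved return address. */
struct frame {
    unsigned char local[LOCAL_SIZE];
    uint32_t canary;
    uint32_t ret;
};

/* Unchecked write starting at local[], modelling a strcpy-style overflow.
 * It may run past local[] into canary and ret, but never leaves the struct. */
void frame_write(struct frame *f, const unsigned char *src, size_t n)
{
    assert(f != NULL && src != NULL && n <= sizeof *f);
    memcpy((unsigned char *)f, src, n);
}

/* Check performed just before "returning": has the canary changed? */
int canary_intact(const struct frame *f, uint32_t expected)
{
    assert(f != NULL);
    return f->canary == expected;
}
```

A write that stays within local[] leaves the canary intact. A linear overflow long enough to reach ret must pass through the canary first, so the check fails before the corrupted return address is used.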
Randomization: Motivations.
- Buffer overflow and return-to-libc exploits need to know the (virtual) address to which to pass control
- Address of attack code in the buffer
- Address of a standard kernel library routine
- Same address is used on many machines
- Slammer infected 75,000 MS-SQL servers using the same code on every machine
- Idea: introduce artificial diversity
- Make stack addresses, addresses of library routines, etc. unpredictable and different from machine to machine
Address Space Layout Randomization
- Arrange the positions of key data areas randomly in a process’s address space
- e.g., the base of the executable and the positions of libraries (libc), heap, and stack
- Effect: a return-to-libc attack needs to know the addresses of the key functions
- Attacks:
– Repeatedly guess the randomized address
– Spray injected attack code
- Vista has this enabled; software packages are available for Linux and other UNIX variants
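To give a rough feel for the repeated-guessing attack, the sketch below computes the chance of hitting the right address within a number of distinct guesses. It assumes the randomized address is uniform over 2^bits possibilities and that the layout does not change between attempts. Both are simplifying assumptions, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Probability of guessing a randomized address within `tries` distinct
 * guesses, assuming `bits` bits of uniform entropy and a layout that
 * stays fixed between attempts (assumptions, not a model of any real OS). */
double guess_success_probability(unsigned bits, uint64_t tries)
{
    assert(bits < 64);

    uint64_t space = (uint64_t)1 << bits;   /* number of possible layouts */
    if (tries >= space) {
        return 1.0;                         /* every candidate was tried */
    }
    return (double)tries / (double)space;
}
```

Under these assumptions each extra bit of entropy doubles the number of guesses needed for the same success probability, which is why low-entropy randomization gives limited protection against brute force.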
Instruction Set Randomization
- Instruction Set Randomization (ISR)
- Each program has a different and secret instruction set
- Use translator to randomize instructions at load-time
- Attacker cannot execute its own code.
- What constitutes the instruction set depends on the environment.
- For binary code, it is the CPU instruction set
- For an interpreted program, it depends on the interpreter
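A minimal sketch of the ISR idea, assuming the simplest possible scheme: every code byte is XORed with a single-byte per-process key at load time, and the execution engine XORs it again before decoding. The one-byte key and the function name are assumptions for illustration; a real scheme would use longer keys and needs support from an emulator or hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR each byte with the secret key. Because XOR is its own inverse, the
 * same routine models both the load-time scrambling of legitimate code
 * and the fetch-time unscrambling done by the execution engine. */
void isr_xor(uint8_t *code, size_t n, uint8_t key)
{
    assert(code != NULL);
    for (size_t i = 0; i < n; i++) {
        code[i] ^= key;
    }
}
```

Legitimate code scrambled at load time comes back unchanged at fetch time. Bytes injected by an attacker were never scrambled, so the fetch-time step turns them into different bytes, and without the key the attacker cannot predict which.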
Anti-Virus Technologies
- Simple anti-virus scanners
- Look for signatures (fragments of known virus code)
- Heuristics for recognizing code associated with viruses
– Polymorphic viruses often use decryption loops
- Integrity checking to find modified files
– Record file sizes, checksums, MACs (keyed hashes of contents)
– Often used for rootkit detection (we’ll see TripWire later)
- Generic decryption and emulation
- Emulate CPU execution for a few hundred instructions; the virus will eventually decrypt itself, and its known body can then be recognized
– Does not work very well against mutating viruses and viruses not located near the beginning of the infected executable
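The two simplest techniques above, signature scanning and integrity checking, can be sketched as follows. The function names are illustrative assumptions. FNV-1a is used only as a stand-in checksum: it is neither keyed nor cryptographic, so a real integrity checker would use a MAC as the slide says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature scan: returns the offset of the first occurrence of sig
 * in data, or -1 if the signature does not occur. */
long find_signature(const uint8_t *data, size_t data_len,
                    const uint8_t *sig, size_t sig_len)
{
    assert(data != NULL && sig != NULL);
    if (sig_len == 0 || sig_len > data_len) {
        return -1;
    }
    for (size_t i = 0; i + sig_len <= data_len; i++) {
        if (memcmp(data + i, sig, sig_len) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Integrity check helper: 32-bit FNV-1a hash of a file's contents.
 * Record it once, recompute later, and compare to spot modification. */
uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint32_t h = 2166136261u;        /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}
```

A scanner would run find_signature once per known signature, while an integrity checker would store fnv1a32 (or better, a keyed hash) of each file and flag any file whose recomputed value differs.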
Polymorphic Viruses
- Encrypted viruses: virus consists of a constant decryptor, followed by the encrypted virus body
- Relatively easy to detect because decryptor is constant
- Polymorphic viruses: constantly create new random encryptions of the same virus body
- Marburg (Win95), HPS (Win95), Coke (Win32)
- Virus includes an engine for creating new keys and new encryptions of the virus body
– Crypto (Win32) decrypts its body by brute-force key search to avoid explicit decryptor code
- Decryptor can start with millions of NOPs to defeat emulation
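Why re-encryption defeats a fixed signature can be seen with harmless data. The sketch below assumes a single-byte XOR as the "encryption" and a plain text marker as the "body"; both are stand-ins chosen for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy "encryption": XOR every byte with a one-byte key (self-inverse). */
void xor_bytes(uint8_t *buf, size_t n, uint8_t key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        buf[i] ^= key;
    }
}

/* Returns 1 if the byte pattern sig occurs anywhere in data, else 0. */
int contains(const uint8_t *data, size_t n, const uint8_t *sig, size_t m)
{
    assert(data != NULL && sig != NULL && m > 0);
    for (size_t i = 0; i + m <= n; i++) {
        if (memcmp(data + i, sig, m) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Two copies of the same body under different keys share no fixed byte pattern, so a signature taken from the plaintext body matches neither. Once a copy is decrypted, which is what emulation waits for, the same signature matches again.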
Virus Detection by Emulation
[Figure: the virus body randomly generates a new key and corresponding decryptor code, yielding Mutation A, Mutation B, and Mutation C; each mutation decrypts and executes the virus body]
- To detect an unknown mutation of a known virus, emulate CPU execution until the current sequence of instruction opcodes matches the known sequence for the virus body
Metamorphic Viruses
- Obvious next step: mutate the virus body, too!
- Virus can carry its source code (which deliberately contains some useless junk) and recompile itself
- Apparition virus (Win32)
- Virus first looks for an installed compiler
– Unix machines have C compilers installed by default
- Virus changes junk in its source and recompiles itself
– New binary mutation looks completely different!
- Mutation is common in macro and script viruses
- Macros/scripts are usually interpreted, not compiled
Mutation / Obfuscation Techniques
- Goal: prevent analysis of code and signature-based detection; foil reverse-engineering
- Insert garbage opcodes and change control structure
- Different code in each instance
- Effect of code execution is the same, but difficult to detect by passive analysis
- Same code, different register names
- Regswap (Win32)
- Same code, different subroutine order
- BadBoy (DOS), Ghost (Win32)
- Decrypt virus body instruction by instruction, push instructions on stack, insert and remove jumps, rebuild body on stack
- Zmorph (Win95)
Mutation Engines
- Real Permutating Engine/RPME, ADMutate, etc.
- Large set of obfuscating techniques
- Instructions are reordered, branch conditions reversed
- Jumps and NOPs inserted in random places
- Garbage opcodes inserted in unreachable code areas
- Instruction sequences replaced with other instructions that have the same effect, but different opcodes
– Mutate SUB EAX, EAX into XOR EAX, EAX
– Mutate PUSH EBP; MOV EBP, ESP into PUSH EBP; PUSH ESP; POP EBP
- There is no constant, recognizable virus body!
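From the defender’s side, one possible response to instruction substitution is to classify equivalent encodings as the same idiom before matching. The sketch below is a toy under stated assumptions: it handles only the two substitutions named above, uses their 32-bit x86 encodings, and expects the caller to pass exactly one already-delimited instruction sequence. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum idiom { IDIOM_UNKNOWN = 0, IDIOM_ZERO_EAX, IDIOM_FRAME_SETUP };

/* Map equivalent 32-bit x86 byte sequences to one idiom, so that a
 * detector can compare idioms instead of raw opcodes. */
enum idiom classify_idiom(const uint8_t *code, size_t n)
{
    assert(code != NULL);

    /* SUB EAX,EAX (29 C0 or 2B C0) and XOR EAX,EAX (31 C0 or 33 C0). */
    if (n == 2 && code[1] == 0xC0 &&
        (code[0] == 0x29 || code[0] == 0x2B ||
         code[0] == 0x31 || code[0] == 0x33)) {
        return IDIOM_ZERO_EAX;
    }

    /* PUSH EBP; MOV EBP,ESP (55 89 E5 or 55 8B EC)
     * and PUSH EBP; PUSH ESP; POP EBP (55 54 5D). */
    if (n == 3 && code[0] == 0x55 &&
        ((code[1] == 0x89 && code[2] == 0xE5) ||
         (code[1] == 0x8B && code[2] == 0xEC) ||
         (code[1] == 0x54 && code[2] == 0x5D))) {
        return IDIOM_FRAME_SETUP;
    }

    return IDIOM_UNKNOWN;
}
```

The byte sequences differ, which is what breaks opcode signatures, yet both members of each pair classify to the same idiom. Real engines combine many more substitutions with reordering and junk insertion, so a table like this only illustrates the principle.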
Example of Zperm Mutation
- From Szor and Ferrie, “Hunting for Metamorphic”
Putting It All Together: Zmist
- Zmist was designed in 2001 by Russian virus writer Z0mbie of “Total Zombification” fame
- New technique: code integration
- Virus merges itself into the instruction flow of its host
- “Islands” of code are integrated into random locations in the host program and linked by jumps
- When/if virus code is run, it infects every available portable executable
– Randomly inserted virus entry point may not be reached in a particular execution
MISTFALL Disassembly Engine
- To integrate itself into host’s instruction flow, virus must disassemble and rebuild host binary
- See overview at http://vx.netlux.org/lib/vzo21.html
- This is very tricky
- Addresses are based on offsets, which must be recomputed when new instructions are inserted
- Iterative process: rebuild with new addresses, see if branch destinations changed, then rebuild again
– Requires 32MB of RAM and explicit section names (DATA, CODE, etc.) in the host binary, so it doesn’t work with every file
How Hard Is It to Write a Virus?
- 2268 matches for “virus creation tool” in CA’s Spyware Information Center
- Including dozens of poly- and metamorphic engines
- OverWritting Virus Construction Toolkit
- “The perfect choice for beginners”
- Biological Warfare Virus Creation Kit
- Vbs Worm Generator (for Visual Basic worms)
- Used to create the Anna Kournikova worm
- Many others
Reading Assignment
- Kaufman 1.12
- Buffer overflows: Attacks and defenses for the vulnerability of the decade, Cowan et al.
www.ece.cmu.edu/~adrian/630-f04/readings/cowan-vulnerability.pdf
- Technical (buffer overflow)
- Aleph One’s “Smashing The Stack For Fun And Profit” in Phrack Issue 49 (1996) popularized stack buffer overflows
http://insecure.org/stf/smashstack.html
- Advanced:
- Hunting for metamorphic (advanced techniques for viruses)
www.symantec.com/avcenter/reference/hunting.for.metamorphic.pdf