Viruses (based on slides by Vitaly Shmatikov and Ninghui Li)


slide-1
SLIDE 1

based on slides by Vitaly Shmatikov and Ninghui Li

Viruses

slide-2
SLIDE 2

Malware

  • Malicious code often masquerades as good software or attaches itself to good software
  • Some malicious programs need host programs
  • Trojan horses, logic bombs, viruses
  • Others can exist and propagate independently
  • Worms, automated viruses
  • There are many infection vectors and propagation mechanisms

slide-3
SLIDE 3

Remote Vulnerabilities

[Figure: new vulnerabilities vs. exploitable targets] [Geer]

slide-4
SLIDE 4

Trojan Horses

  • A trojan horse is malicious code hidden in an apparently useful host program
  • When the host program is executed, the trojan does something harmful or unwanted
  • User must be tricked into executing the host program
  • In 1995, a program distributed as PKZ300B.EXE looked like a new version of PKZIP… When executed, it formatted your hard drive.
  • Trojans do not replicate
  • Main difference from worms and viruses, but today many trojans are spread by virus-like mechanisms

slide-5
SLIDE 5

Viruses

  • Virus propagates by infecting other programs
  • Automatically creates copies of itself, but to propagate, a human has to run an infected program

– Self-propagating malicious programs are usually called worms

  • Many propagation methods
  • Insert a copy into every executable (.COM, .EXE)
  • Insert a copy into boot sectors of disks

– “Stoned” virus infected PCs booted from infected floppies, stayed in memory and infected every floppy inserted into PC

  • Infect TSR (terminate-and-stay-resident) routines

– By infecting a common OS routine, a virus can always stay in memory and infect all disks, executables, etc.

slide-6
SLIDE 6

Virus Techniques

  • Macro viruses
  • A macro is an executable program embedded in a word processing document (MS Word) or spreadsheet (Excel)
  • When an infected document is opened, the virus copies itself into the global macro file and makes itself auto-executing (e.g., gets invoked whenever any document is opened)

  • Stealth techniques
  • Infect OS so that infected files appear normal

– Used by rootkits (we’ll look at them later)

  • Mutate, encrypt parts of code with random key
slide-7
SLIDE 7

Viruses in P2P Networks

  • Millions of users willingly download files
  • KaZaA: 2.5 million users in May 2006
  • Easy to insert an infected file into the network
  • Pretend to be an executable of a popular application

– “Adobe Photoshop 10 full.exe”, “WinZip 8.1.exe”, …

– ICQ and Trillian seem to be the most popular names

  • Infected MP3 files are rare
  • Malware can open a backdoor, steal confidential information, spread spam

  • 70% of infected hosts already on DNS spam blacklists

[Shin, Jung, Balakrishnan]

slide-8
SLIDE 8

Prevalence of Viruses in KaZaA

  • 2006 study of 500,000 KaZaA files
  • Look for 364 patterns associated with 71 viruses
  • Up to 22% of all KaZaA files infected
  • 52 different viruses and Trojans
  • Another study found that 44% of all executable files on KaZaA contain malicious code
  • When searching for “ICQ” or “Trillian”, chances of hitting an infected file are over 70%
  • Some infected hosts are active for a long time
  • 5% of infected hosts seen in February 2006 were still active in May 2006

[Shin, Jung, Balakrishnan]

slide-9
SLIDE 9

Propagation via Websites

  • Websites with popular content
  • Games: 60% of websites contain executable content, one-third contain at least one malicious executable
  • Celebrities, adult content, everything except news
  • Most popular sites with malicious content (Oct 2005)
  • Large variety of malware
  • But most of the observed programs are variants of the same few adware applications (e.g., WhenU)

[Moshchuk et al.]

slide-10
SLIDE 10

Malicious Functionality

  • Adware
  • Display unwanted pop-up ads
  • Browser hijackers
  • Modify home page, search tools, redirect URLs
  • Trojan downloaders
  • Download and install additional malware

  • Dialer (expensive toll numbers)
  • Keylogging

[Moshchuk et al.]

slide-11
SLIDE 11

Drive-By Downloads

  • Website “pushes” malicious executable to user’s browser with inline JavaScript or pop-up window
  • Naïve user may click “Yes” in the dialog box
  • Can also install malicious software automatically by exploiting bugs in the user’s browser
  • 1.5% of URLs crawled in the Moshchuk et al. study
  • Constant change
  • Many infectious sites exist only for a short time or change substantially from month to month

  • Many sites behave non-deterministically
slide-12
SLIDE 12

Virus: Buffer overflow

  • Used in the 1988 Morris Internet Worm; still extremely common today
  • Reference (not recent but still good)
  • Aleph One’s “Smashing The Stack For Fun And Profit” in Phrack Issue 49 in 1996 popularized stack buffer overflows

http://insecure.org/stf/smashstack.html

  • Buffer overflows: Attacks and defenses for the vulnerability of the decade, Cowan et al.

slide-13
SLIDE 13

Buffer Overflow

Two goals:

  • 1. Arrange attack code in the program’s address space
  • Inject it: place a string that contains the malicious code into some buffer (e.g., stack, heap, static area)
  • It is already there: e.g., if the attack code needs to execute exec("/bin/sh") and libc already contains code that executes exec(arg), the attacker only needs to make arg point to "/bin/sh" to gain a shell
  • 2. Get the program to jump to that code with suitable parameters in registers and memory
  • Buffer overflow: change the return address of a procedure
  • Exploit function pointers, e.g., void (*foo)()
  • Exploit checkpointing based on setjmp/longjmp
slide-14
SLIDE 14

Buffer Overflow

  • Attacker needs to know which CPU and OS are running on the target machine.
  • Familiarity with machine code.
  • Know how system calls are made.
  • The exec() system call.
  • Our examples are for x86 running Linux.
  • Details vary slightly between CPUs and OSes:
  • Stack overflow
  • Shell code
  • Return-to-libc

– Overflow sets ret-addr to the address of a libc function

  • Off-by-one
  • Overflow function pointers & longjmp buffers
  • Heap overflow
slide-15
SLIDE 15

Stack Frame:

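When a procedure is called, a stack frame is pushed.

[Figure labels, in order: Parameters | Return address | Stack Frame Pointer | Local variables | SP, with an arrow showing the direction of stack growth]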

slide-16
SLIDE 16

What are buffer overflows?

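  • Consider the following function:

    void func(char *str) {
        char buf[128];
        strcpy(buf, str);      /* no bounds check */
        do_something(buf);
    }

  • When the function is invoked the stack looks like:

    [top of stack] buf | sfp | ret-addr | str

  • What if *str is 136 bytes long? After strcpy:

    [top of stack] *str | ret | str

  • On 32-bit x86, the 136 bytes fill buf (128 bytes) and then overwrite sfp and ret-addr (4 bytes each).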
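The overflow described above can be sketched as a simulation. This is a minimal model, assuming a 32-bit x86 layout (128-byte buf, then a 4-byte saved frame pointer, then a 4-byte return address); it uses a Python bytearray rather than real memory, and the helper names and addresses are hypothetical.

```python
import struct

BUF_SIZE = 128          # char buf[128] from func()
SFP_OFF = BUF_SIZE      # saved frame pointer sits just above buf (assumed layout)


def make_frame(sfp, ret):
    """Build a toy frame: buf (zeroed) | sfp | ret-addr, lowest address first."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<II", sfp, ret))


def unchecked_copy(frame, src):
    """Mimic strcpy(buf, str): copy up to the first NUL with no bounds check."""
    end = src.find(b"\x00")
    data = src if end == -1 else src[:end]
    # A real strcpy would keep writing past the frame; the toy model clips at its edge.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]


def saved_values(frame):
    """Return (sfp, ret) as currently stored in the frame."""
    return struct.unpack_from("<II", frame, SFP_OFF)


frame = make_frame(sfp=0xBFFFF000, ret=0x08048123)   # hypothetical addresses
unchecked_copy(frame, b"A" * 136)                    # 128 + 4 + 4 bytes
sfp, ret = saved_values(frame)
print(hex(sfp), hex(ret))                            # both are now 0x41414141
```

A 128-byte input leaves sfp and ret-addr untouched in this model; only the extra 8 bytes reach them.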

slide-17
SLIDE 17

Buffer overflow: example

The following example shows how to inject and jump to the attacker’s code at the same time

  • Suppose *str is such that after strcpy the stack looks like:

    [top of stack] *str | ret | Code for P

    Program P: exec("/bin/sh") (exact shell code by Aleph One)

  • When func() exits, the user will be given a shell!
  • Note: the attack code runs on the stack.
  • To determine ret, guess the position of the stack when func() is called.

slide-18
SLIDE 18

Some unsafe C lib functions

    strcpy (char *dest, const char *src)
    strcat (char *dest, const char *src)
    gets (char *s)
    scanf (const char *format, … )
    printf (const char *format, … )

slide-19
SLIDE 19

Exploiting buffer overflows

  • Suppose a web server calls func() with a given URL.
  • Attacker can create a 200-byte URL to obtain a shell on the web server
  • Some complications for stack overflows:
  • Program P should not contain the ‘\0’ character.
  • Overflow should not crash the program before func() exits.
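The ‘\0’ restriction follows from how strcpy works: it stops copying at the first NUL byte, so anything after it never reaches the buffer. A small sketch of that behavior, with hypothetical helper names and inputs:

```python
def bytes_copied_by_strcpy(src):
    """Return the part of src that a C strcpy would copy (up to the first NUL)."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


def survives_strcpy(payload):
    """True if the whole payload would be copied, i.e. it contains no NUL byte."""
    return b"\x00" not in payload


with_nul = b"AAAA\x00BBBB"      # hypothetical input containing a NUL
without_nul = b"AAAABBBB"

print(len(bytes_copied_by_strcpy(with_nul)))     # 4: everything after the NUL is dropped
print(survives_strcpy(without_nul))              # True
```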

slide-20
SLIDE 20

Other control hijacking opportunities

Stack smashing attack:

  • Overwrite the return address in the stack activation record by overflowing a local buffer variable.
  • Function pointers: (used in attack on PHP 4.0.2)

    [Heap or stack] buf[128] | FuncPtr

  • Overflowing buf will overwrite the function pointer.
  • Longjmp buffers: longjmp(pos) (used in attack on Perl 5.003)
  • Overflowing buf next to pos overwrites the value of pos.

slide-21
SLIDE 21

return-to-libc attack

  • “Bypassing non-executable-stack during exploitation using return-to-libc” by c0ntex (libc: standard C library)

Shell code attack:

    *str | ret | Code for P          (Program P: exec("/bin/sh"))

Return-to-libc attack:

    *str | ret | fake_ret | "/bin/sh"          (ret points to system() in libc)

slide-22
SLIDE 22

Preventing Buffer Overflow Attacks

  • Static source code analysis
  • Use type safe languages (Java, ML).
  • Use safe library functions
  • Non-executable stack
  • Run time checking: StackGuard
  • Randomization
  • Detect deviation of program behavior
  • Sandboxing
  • Access control … (covered later in course)
slide-23
SLIDE 23

Static source code analysis

  • Statically check source code to detect buffer overflows.
  • Several consulting companies.
  • Main idea: automate the code review process.
  • Several tools exist:
  • Coverity (Engler et al.): Test trust inconsistency.
  • Microsoft program analysis group:

– PREfix: looks for a fixed set of bugs (e.g., null ptr ref)

– PREfast: local analysis to find idioms for programming errors.

  • Berkeley: Wagner, et al. Test constraint violations.
  • Find lots of bugs, but not all.
slide-24
SLIDE 24

Marking stack as non-executable

  • Basic stack exploit can be prevented by marking the stack segment as non-executable.
  • Support in Windows XP SP2. Code patches exist for Linux, Solaris.

Problems:

  • Does not defend against the ‘return-to-libc’ exploit.
  • Some apps need an executable stack (e.g., LISP interpreters).
  • Does not block more general overflow exploits:

– Overflow on heap, overflow func pointer.

slide-25
SLIDE 25

Run time checking: StackGuard

  • There are many run-time checking techniques …
  • Solution 1: StackGuard
  • Run time tests for stack integrity.
  • Embed “canaries” in stack frames and verify their integrity prior to function return

    [Figure: each frame (Frame 1, Frame 2) is laid out from the top of the stack as local | canary | sfp | ret | str]

slide-26
SLIDE 26

Canary Types

  • Random canary:
  • Choose random string at program startup.
  • Insert canary string into every stack frame.
  • Verify canary before returning from function.
  • To corrupt a random canary, the attacker must learn the current random string.

  • Terminator canary: Canary = 0, newline, linefeed, EOF
  • String functions will not copy beyond terminator.
  • Hence, attacker cannot use string functions to corrupt stack.

StackGuard implemented as a GCC patch.

  • Program must be recompiled.
  • Minimal performance effects: 8% for Apache.
  • Note: Canaries don’t offer foolproof protection.
  • Some stack smashing attacks can leave canaries untouched.
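
The random-canary check can be sketched as a toy model. This is an illustration under stated assumptions, not StackGuard's actual implementation: the frame is a Python bytearray laid out as buf | canary | ret-addr (the saved frame pointer is omitted), and the class name, sizes and address are hypothetical.

```python
import secrets
import struct

BUF_SIZE = 16
CANARY_SIZE = 4


class CanaryFrame:
    """Toy frame laid out as buf | canary | ret-addr (lowest address first)."""

    def __init__(self, ret, canary):
        self.expected = canary
        self.mem = bytearray(BUF_SIZE) + bytearray(canary) + bytearray(struct.pack("<I", ret))

    def unchecked_write(self, data):
        """Copy into buf with no bounds check (clipped to the toy frame)."""
        n = min(len(data), len(self.mem))
        self.mem[:n] = data[:n]

    def canary_intact(self):
        """The check performed just before the function returns."""
        return bytes(self.mem[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == self.expected

    def ret(self):
        return struct.unpack_from("<I", self.mem, BUF_SIZE + CANARY_SIZE)[0]


canary = secrets.token_bytes(CANARY_SIZE)    # chosen once, "at program startup"
frame = CanaryFrame(ret=0x08048123, canary=canary)

frame.unchecked_write(b"A" * BUF_SIZE)       # fits in buf: canary untouched
print(frame.canary_intact())

frame.unchecked_write(b"A" * (BUF_SIZE + CANARY_SIZE + 4))   # reaches ret-addr
print(frame.canary_intact())
```

A linear overflow cannot reach ret-addr without first writing over the canary, which is why the check catches it. Writes that skip over the canary, such as through a corrupted pointer, are the kind of attack the last bullet refers to.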
slide-27
SLIDE 27

Randomization: Motivations.

  • Buffer overflow and return-to-libc exploits need to know the (virtual) address to which to pass control
  • Address of attack code in the buffer
  • Address of a standard kernel library routine
  • Same address is used on many machines
  • Slammer infected 75,000 MS-SQL servers using the same code on every machine
  • Idea: introduce artificial diversity
  • Make stack addresses, addresses of library routines, etc. unpredictable and different from machine to machine

slide-28
SLIDE 28

Address Space Layout Randomization

  • Arranging the positions of key data areas randomly in a process' address space.
  • e.g., the base of the executable and the positions of libraries (libc), heap, and stack
  • Effects: a return-to-libc attack needs to know the addresses of the key functions.
  • Attacks:

– Repetitively guess the randomized address

– Spray injected attack code

  • Vista has this enabled; software packages are available for Linux and other UNIX variants
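
How practical repeated guessing is depends on how many bits of the address are randomized. A rough sketch of the arithmetic, assuming each guess is independent and uniform (for example, the layout is re-randomized after every failed attempt); the 16-bit figure is only an illustrative input, not a claim about any particular system.

```python
def success_probability(entropy_bits, attempts):
    """Chance that at least one of `attempts` independent uniform guesses
    hits an address drawn from 2**entropy_bits possibilities."""
    p_single = 1.0 / (2 ** entropy_bits)
    return 1.0 - (1.0 - p_single) ** attempts


def expected_attempts(entropy_bits):
    """Mean number of independent guesses until the first hit (geometric distribution)."""
    return 2 ** entropy_bits


bits = 16   # hypothetical amount of randomization
print(expected_attempts(bits))
print(round(success_probability(bits, 2 ** bits), 3))
```

If the layout is not re-randomized between attempts, the attacker can enumerate candidates without repetition and needs about half as many guesses on average.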
slide-29
SLIDE 29

Instruction Set Randomization

  • Instruction Set Randomization (ISR)
  • Each program has a different and secret instruction set
  • Use translator to randomize instructions at load-time
  • Attacker cannot execute its own code.
  • What constitutes instruction set depends on the environment.
  • For binary code, it is the CPU instruction set
  • For an interpreted program, it depends on the interpreter
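
The idea can be illustrated with a toy interpreter. This is a sketch under strong simplifying assumptions: the four-opcode instruction set is hypothetical, and the "randomization" is a single-byte XOR key, which a real ISR scheme would not rely on.

```python
VALID_OPCODES = {0x01: "LOAD", 0x02: "ADD", 0x03: "STORE", 0x04: "HALT"}  # hypothetical toy ISA


def randomize(program, key):
    """Load-time translation: XOR every instruction byte with the per-process key."""
    return bytes(b ^ key for b in program)


def execute(encoded, key):
    """Decode each byte with the key and 'run' it; raise on an invalid opcode."""
    trace = []
    for b in encoded:
        op = b ^ key
        if op not in VALID_OPCODES:
            raise ValueError("illegal instruction 0x%02x" % op)
        trace.append(VALID_OPCODES[op])
    return trace


key = 0x5A                                  # secret, different for each program
legit = bytes([0x01, 0x02, 0x03, 0x04])

print(execute(randomize(legit, key), key))  # translated program runs normally

try:
    execute(legit, key)                     # injected bytes were never translated
except ValueError as err:
    print("injected code faulted:", err)
```

Code injected by an attacker who does not know the key decodes to different bytes, which in this model are invalid opcodes and cause a fault instead of running.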
slide-30
SLIDE 30

Anti-Virus Technologies

  • Simple anti-virus scanners
  • Look for signatures (fragments of known virus code)
  • Heuristics for recognizing code associated with viruses

– Polymorphic viruses often use decryption loops

  • Integrity checking to find modified files

– Record file sizes, checksums, MACs (keyed hashes of contents)

– Often used for rootkit detection (we’ll see TripWire later)

  • Generic decryption and emulation
  • Emulate CPU execution for a few hundred instructions, virus will eventually decrypt, can recognize known body

– Does not work very well against mutating viruses and viruses not located near the beginning of the infected executable
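
Integrity checking with keyed hashes can be sketched using Python's standard hmac module. To stay self-contained, the example works on an in-memory mapping of file names to contents rather than real files; the function names, file names and key are hypothetical.

```python
import hashlib
import hmac


def file_mac(key, contents):
    """Keyed hash (HMAC-SHA256) of a file's contents."""
    return hmac.new(key, contents, hashlib.sha256).hexdigest()


def build_baseline(key, files):
    """Record (size, MAC) for every entry of a {name: contents} mapping."""
    return {name: (len(data), file_mac(key, data)) for name, data in files.items()}


def modified_files(key, baseline, files):
    """Names whose current size or MAC differs from the baseline, or that are new."""
    changed = []
    for name, data in files.items():
        recorded = baseline.get(name)
        size, mac = len(data), file_mac(key, data)
        if recorded is None or recorded[0] != size or not hmac.compare_digest(recorded[1], mac):
            changed.append(name)
    return sorted(changed)


key = b"scanner-secret-key"                          # hypothetical key kept away from the host
files = {"app.exe": b"original code", "notes.txt": b"hello"}
baseline = build_baseline(key, files)

files["app.exe"] = b"original code + appended bytes"  # simulate an infection
print(modified_files(key, baseline, files))           # ['app.exe']
```

Using a MAC rather than a plain checksum means that code which modifies a file cannot also forge a matching record unless it learns the key.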

slide-31
SLIDE 31

Polymorphic Viruses

  • Encrypted viruses: virus consists of a constant decryptor, followed by the encrypted virus body
  • Relatively easy to detect because the decryptor is constant
  • Polymorphic viruses: constantly create new random encryptions of the same virus body
  • Marburg (Win95), HPS (Win95), Coke (Win32)
  • Virus includes an engine for creating new keys and new encryptions of the virus body

– Crypto (Win32) decrypts its body by brute-force key search to avoid explicit decryptor code

  • Decryptor can start with millions of NOPs to defeat emulation
slide-32
SLIDE 32

Virus Detection by Emulation

  • Virus body: randomly generates a new key and corresponding decryptor code
  • Each new copy is a different mutation (Mutation A, Mutation B, Mutation C) that decrypts and executes the same body
  • To detect an unknown mutation of a known virus, emulate CPU execution until the current sequence of instruction opcodes matches the known sequence for the virus body
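
Why emulation helps can be shown with a harmless toy. In this sketch the "body" is a plain marker string, each "mutation" is one key byte followed by the body XORed with that key, and "emulation" is reduced to running that single decryption step; a real emulator interprets the sample's actual instructions. All names are hypothetical.

```python
KNOWN_BODY = b"HARMLESS-TEST-BODY"   # stand-in for the known body signature


def make_mutation(key):
    """Toy 'mutation': one key byte followed by the body XORed with that key."""
    return bytes([key]) + bytes(b ^ key for b in KNOWN_BODY)


def static_scan(sample):
    """Plain signature match against the bytes as stored."""
    return KNOWN_BODY in sample


def emulate_and_scan(sample):
    """Stand-in for emulation: apply the sample's own decryption, then match the body."""
    if not sample:
        return False
    key, payload = sample[0], sample[1:]
    decrypted = bytes(b ^ key for b in payload)
    return KNOWN_BODY in decrypted


for key in (0x11, 0x7F, 0xC3):
    sample = make_mutation(key)
    print(hex(key), static_scan(sample), emulate_and_scan(sample))
```

Every mutation looks different on disk, so the static scan misses all of them, while the decrypted body is identical each time.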

slide-33
SLIDE 33

Metamorphic Viruses

  • Obvious next step: mutate the virus body, too!
  • Virus can carry its source code (which deliberately contains some useless junk) and recompile itself

  • Apparition virus (Win32)
  • Virus first looks for an installed compiler

– Unix machines have C compilers installed by default

  • Virus changes junk in its source and recompiles itself

– New binary mutation looks completely different!

  • Mutation is common in macro and script viruses
  • Macros/scripts are usually interpreted, not compiled
slide-34
SLIDE 34

Mutation / Obfuscation Techniques

  • Goal: prevent analysis of code and signature-based detection; foil reverse-engineering
  • Insert garbage opcodes and change control structure
  • Different code in each instance
  • Effect of code execution is the same, but difficult to detect by passive analysis
  • Same code, different register names
  • Regswap (Win32)
  • Same code, different subroutine order
  • BadBoy (DOS), Ghost (Win32)
  • Decrypt virus body instruction by instruction, push instructions on stack, insert and remove jumps, rebuild body on stack
  • Zmorph (Win95)
slide-35
SLIDE 35

Mutation Engines

  • Real Permutating Engine/RPME, ADMutate, etc.
  • Large set of obfuscating techniques
  • Instructions are reordered, branch conditions reversed
  • Jumps and NOPs inserted in random places
  • Garbage opcodes inserted in unreachable code areas
  • Instruction sequences replaced with other instructions that have the same effect, but different opcodes

– Mutate SUB EAX, EAX into XOR EAX, EAX, or PUSH EBP; MOV EBP, ESP into PUSH EBP; PUSH ESP; POP EBP

  • There is no constant, recognizable virus body!
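
The two substitutions named above can be checked on a toy register machine. This is a sketch assuming simplified x86 semantics: flags are not modeled, memory left behind below ESP after a POP is ignored, and PUSH reads its operand before ESP is decremented. The tuple encoding and starting register values are hypothetical.

```python
def run(program):
    """Execute a list of (op, dst, src) tuples; return (registers, stack contents)."""
    state = {"EAX": 0x1234, "EBP": 0x1000, "ESP": 0x2000}
    stack = {}
    for op, dst, src in program:
        if op == "SUB":
            state[dst] = (state[dst] - state[src]) & 0xFFFFFFFF
        elif op == "XOR":
            state[dst] = state[dst] ^ state[src]
        elif op == "MOV":
            state[dst] = state[src]
        elif op == "PUSH":
            value = state[dst]            # operand is read before ESP changes
            state["ESP"] -= 4
            stack[state["ESP"]] = value
        elif op == "POP":
            state[dst] = stack.pop(state["ESP"])
            state["ESP"] += 4
        else:
            raise ValueError(op)
    return state, stack


zero_a = [("SUB", "EAX", "EAX")]
zero_b = [("XOR", "EAX", "EAX")]
prologue_a = [("PUSH", "EBP", None), ("MOV", "EBP", "ESP")]
prologue_b = [("PUSH", "EBP", None), ("PUSH", "ESP", None), ("POP", "EBP", None)]

print(run(zero_a) == run(zero_b))          # same final state, different instructions
print(run(prologue_a) == run(prologue_b))
```

Each pair ends in the same machine state even though the instruction sequences, and therefore the bytes a signature would match, differ.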
slide-36
SLIDE 36

Example of Zperm Mutation

  • From Szor and Ferrie, “Hunting for Metamorphic”

slide-37
SLIDE 37

Putting It All Together: Zmist

  • Zmist was designed in 2001 by Russian virus writer Z0mbie of “Total Zombification” fame
  • New technique: code integration
  • Virus merges itself into the instruction flow of its host
  • “Islands” of code are integrated into random locations in the host program and linked by jumps
  • When/if virus code is run, it infects every available portable executable

– Randomly inserted virus entry point may not be reached in a particular execution

slide-38
SLIDE 38

MISTFALL Disassembly Engine

  • To integrate itself into the host’s instruction flow, the virus must disassemble and rebuild the host binary
  • See overview at http://vx.netlux.org/lib/vzo21.html
  • This is very tricky
  • Addresses are based on offsets, which must be recomputed when new instructions are inserted
  • Iterative process: rebuild with new addresses, see if branch destinations changed, then rebuild again

– Requires 32MB of RAM and explicit section names (DATA, CODE, etc.) in the host binary, so it doesn’t work with every file

slide-39
SLIDE 39

How Hard Is It to Write a Virus?

  • 2268 matches for “virus creation tool” in CA’s Spyware Information Center
  • Including dozens of poly- and metamorphic engines
  • OverWritting Virus Construction Toolkit
  • “The perfect choice for beginners”
  • Biological Warfare Virus Creation Kit
  • Vbs Worm Generator (for Visual Basic worms)
  • Used to create the Anna Kournikova worm
  • Many others
slide-40
SLIDE 40

Reading Assignment

  • Kaufman 1.12
  • Buffer overflows: Attacks and defenses for the vulnerability of the decade, Cowan et al.

www.ece.cmu.edu/~adrian/630-f04/readings/cowan-vulnerability.pdf

  • Technical (buffer overflow)
  • Aleph One’s “Smashing The Stack For Fun And Profit” in Phrack Issue 49 in 1996 popularized stack buffer overflows

http://insecure.org/stf/smashstack.html

  • Advanced:
  • Hunting for metamorphic: (advanced techniques for viruses)

www.symantec.com/avcenter/reference/hunting.for.metamorphic.pdf