C-Style Strings: CS2253, Owen Kaser, UNBSJ

SLIDE 1

C-Style Strings

CS2253 Owen Kaser, UNBSJ

SLIDE 2

Strings

  • In C and some other low-level languages, strings are just consecutive memory locations that contain characters. A special “null character” (ASCII code 0) terminates the string.
  • Common string-processing library routines are a good source of assembly-language examples.

SLIDE 3

Making a Constant String

  • (Review) Use DCB and don't forget the null character terminator
  • mystring dcb "hello",0
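
For comparison, here is a minimal C sketch of the same constant. It assumes a C11 compiler (for `static_assert`); the names `mystring_bytes` and `same_layout` are made up for illustration. It shows that a string literal already carries the null terminator that the DCB line has to spell out.

```c
#include <assert.h>
#include <string.h>

/* The compiler appends the null terminator to a string literal,
   so this array occupies 6 bytes, not 5. */
static const char mystring[] = "hello";
static_assert(sizeof mystring == 6, "five characters plus the null terminator");

/* The same bytes spelled out one at a time, like DCB "hello",0. */
static const char mystring_bytes[] = { 'h', 'e', 'l', 'l', 'o', 0 };

/* Returns 1 if both arrays hold identical bytes, terminator included. */
int same_layout(void)
{
    return sizeof mystring == sizeof mystring_bytes
        && memcmp(mystring, mystring_bytes, sizeof mystring) == 0;
}
```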
SLIDE 4

A String Local Variable

  • Suppose you know you need a string local variable. If you know the maximum length you could possibly need (say 50 characters), proceed as follows:

    mySubroutine
            STMFD   SP!, {some regs, LR}
            SUB     SP, SP, #52          ; 50 chars + null = 51, rounded up to 52 to maintain SP alignment
            MOV     R0, #0               ; null character
            STRB    R0, [SP]             ; terminate string (show picture)
            ; ... use space from SP to SP+51 for your string ...
            ADD     SP, SP, #52          ; pop off space used by string
            LDMFD   SP!, {some regs, PC}
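
A rough C counterpart, as a sketch only: the compiler does the SUB/ADD on SP for us when a local array is declared. The routine name `build_local_string` and the strings it builds are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_CHARS 50   /* longest string we expect, as in the slide */

/* Hypothetical routine: builds a short string in a local buffer and
   returns its length. The buffer lives in this call's stack frame and
   disappears when the function returns. */
size_t build_local_string(void)
{
    char buf[MAX_CHARS + 1];   /* 50 characters plus the null terminator */
    buf[0] = '\0';             /* start with an empty string, like STRB R0,[SP] */

    strcat(buf, "hi ");        /* well within the 50-character limit */
    strcat(buf, "there");

    assert(strlen(buf) <= MAX_CHARS);
    return strlen(buf);
}
```

Note that only the length leaves the routine; the address of `buf` must not.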

SLIDE 5

Stack Smashing

  • Q: What if someone is allowed to put a 56-byte string into your 52-byte area?
  • A: You overwrite whatever sits in the memory addresses just above your string.
  • The STMFD stored the saved registers directly above the string area, with the return address (LR) at the highest address. If LR is the only register saved, the 4 extra bytes land on it, so you have a wrong return address. (With more saved registers, a slightly longer string reaches it.)
  • A cracker can write some nasty machine-code program as the 56-byte “string” and arrange for you to return to her program.
  • Moral: String locals need to be very carefully checked to see that they are not too long.
  • Some modern CPUs allow the stack region of memory to be marked “nonexecutable” to help. You can still be forced to return to an arbitrary location in the existing program, which may be good enough for the cracker.
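
One way to act on the moral, sketched in C under the assumption that rejecting an over-long input is acceptable. The routine name `store_checked` and its return convention are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AREA_SIZE 52   /* size of the local string area from the earlier slide */

/* Hypothetical routine: copies `input` into a local 52-byte area only if it
   fits, terminator included. Returns the number of characters stored, or -1
   if the input was rejected as too long. */
int store_checked(const char *input)
{
    char area[AREA_SIZE];
    size_t len;

    assert(input != NULL);
    len = strlen(input);
    if (len + 1 > AREA_SIZE) {   /* +1 for the null terminator */
        return -1;               /* refuse rather than overwrite the frame */
    }
    memcpy(area, input, len + 1);
    return (int)strlen(area);
}
```

The check happens before any byte is written, so the saved registers above the area are never touched.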

SLIDE 6

Returning a String

  • Suppose your subroutine is supposed to return a string.
  • You can just return the memory address of somewhere in memory that holds the characters of your string. (In C terminology, you return a pointer to your characters.)
  • But that somewhere needs to be “safe”: not subject to arbitrary destruction.
  • Any stack location below the top of the stack (that is, at an address lower than SP) is not safe.

SLIDE 7

Bad Scenario

  • main subroutine calls foo
  • foo has a local string variable, v, that it puts some lovely string into.
  • foo returns the address of v to main
  • main turns around and calls bar
  • bar returns. main tries to use the lovely string. Unhappiness results.

SLIDE 8

Bad Scenario, picture 1

SLIDE 9

Bad Scenario, picture 2

  • Because the string address sent by 'foo' to main was in the danger zone, 'bar' trashed it. Not bar's fault.
  • Solution: Never return the address of a local variable.

SLIDE 10

Non-Reentrant Solution

  • If a subroutine S needs to return a string (whose maximum length is known), then it can put the string in a “buffer” memory location set aside just for S, and it can return the address of that buffer to its caller.
  • S's buffer is safe enough...except from itself. This approach means S won't be reentrant: S cannot be recursive.
  • And callers to S should copy out the answer, in case anyone they invoke also calls S.

SLIDE 11

Example

    S_buffer  DCB   0
              SPACE 31               ; total length 32
    S         STMFD SP!, {..., LR}
              ; ... put some string into S_buffer ...
              LDR   R0, =S_buffer    ; return value in R0
              LDMFD SP!, {..., PC}   ; return to caller
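
The same idea in C, as a sketch: a `static` array plays the role of S_buffer. The routine `label_for`, its "item-N" text, and the helper `copy_label` are all hypothetical; they only illustrate why callers should copy the answer out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buffer set aside just for label_for(); every call reuses it. */
static char s_buffer[32];

/* Hypothetical routine S: formats a small label into its private buffer and
   returns the buffer's address. */
const char *label_for(int n)
{
    snprintf(s_buffer, sizeof s_buffer, "item-%d", n);
    return s_buffer;
}

/* A careful caller copies the answer out before anything else calls S. */
void copy_label(int n, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    strncpy(out, label_for(n), out_size - 1);
    out[out_size - 1] = '\0';
}
```

Two calls to `label_for` return the same address, so the second result silently replaces the first; a copy made with `copy_label` survives.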

SLIDE 12

Length of a String (in R0)

    strlen  mov   R1, #0          ; length counter
    loop    ldrb  R2, [R0], #1    ; get current character, advance pointer
            cmp   R2, #0
            addne R1, R1, #1
            bne   loop
            mov   R0, R1          ; return value in R0
            mov   PC, LR          ; return

  • Since this is a leaf method, we didn't need STM and LDM
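
The loop reads more easily next to a C rendering. This is a sketch, named `my_strlen` (a made-up name) to avoid clashing with the standard library's `strlen`; the comments map each line back to the assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the assembly loop: load a byte, advance the pointer, and count
   until the null character is found. */
size_t my_strlen(const char *s)
{
    size_t count = 0;            /* R1: length counter */

    assert(s != NULL);
    while (*s++ != '\0') {       /* ldrb R2,[R0],#1 then cmp R2,#0 */
        count++;                 /* addne R1,R1,#1 */
    }
    return count;                /* result goes back in R0 */
}
```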

SLIDE 13

Reverse (buffer version, untested)

    rev_buffer  SPACE 32
    reverse     mov   R1, R0                ; R1 is caller save
                stmfd SP!, {R1, LR}         ; save start of input and return address
                bl    strlen                ; length in R0
                mov   R1, #0
                ldr   R2, =rev_buffer
                strb  R1, [R2, R0, LSL #0]  ; mark end
                sub   R0, R0, #1
                ldr   R1, [SP]              ; recover start of input (stored below LR)
    rev_loop    ldrb  R3, [R1], #1          ; the copying loop (labels renamed to avoid clashing with strlen's)
                cmp   R3, #0
                beq   rev_done
                strb  R3, [R2, R0, LSL #0]
                sub   R0, R0, #1
                b     rev_loop
    rev_done    ldmfd SP!, {R1, LR}
                ldr   R0, =rev_buffer       ; return value
                mov   PC, LR

SLIDE 14

Or, Use a Stack

  • Can push a bunch of characters to the stack from the input. (And count them).
  • Pop them off, one at a time, and append to buffer
  • Then return address of buffer.
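
A C sketch of the push-then-pop idea, using an explicit array as the stack rather than the machine stack. The name `reverse_with_stack`, the `REV_MAX` limit, and the NULL-on-overflow convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define REV_MAX 31                     /* longest input this sketch accepts */

static char rev_buffer[REV_MAX + 1];   /* result buffer, 32 bytes as before */

/* Hypothetical routine: reverses `s` by pushing its characters onto an
   explicit stack and popping them back off. Returns rev_buffer, or NULL if
   the input is longer than REV_MAX characters. */
const char *reverse_with_stack(const char *s)
{
    char stack[REV_MAX];
    size_t top = 0;                    /* number of characters pushed so far */
    size_t out = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (top == REV_MAX) {
            return NULL;               /* would not fit in the buffer */
        }
        stack[top++] = *s++;           /* push, and count */
    }
    while (top > 0) {
        rev_buffer[out++] = stack[--top];   /* pop and append */
    }
    rev_buffer[out] = '\0';
    return rev_buffer;
}
```

Because the last character pushed is the first one popped, the buffer fills in reverse order without any index arithmetic on the length.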
SLIDE 15

Alternative Approach

  • We can make the caller responsible for finding space for us to store the returned string.
  • The address of the space for the returned string (probably in the caller's activation record) is passed as a parameter.
  • This is a little better than the buffer approach.
SLIDE 16

Reverse (param 2 has address)

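    reverse     stmfd SP!, {R0, R1, LR}     ; R0, R1 are caller save: keep input and output addresses
                bl    strlen                ; length in R0 (strlen overwrites R1 and R2)
                ldr   R1, [SP, #4]          ; recover address of output space (param 2)
                mov   R2, #0
                strb  R2, [R1, R0, LSL #0]  ; mark end
                sub   R0, R0, #1
                ldr   R2, [SP]              ; recover start of input
    rev_loop    ldrb  R3, [R2], #1          ; the copying loop
                cmp   R3, #0                ; a load does not set the flags, so compare explicitly
                beq   rev_done
                strb  R3, [R1, R0, LSL #0]
                sub   R0, R0, #1
                b     rev_loop
    rev_done    ldmfd SP!, {R0, R1, PC}     ; no return value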
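
In C the caller-supplied space is simply a second pointer parameter. The sketch below uses the made-up name `reverse_into` and, like the assembly, trusts the caller to have provided enough room.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C counterpart: `dst` (parameter 2) is space owned by the
   caller. Assumes dst has room for strlen(src) + 1 bytes; nothing here
   checks that. */
void reverse_into(const char *src, char *dst)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    dst[len] = '\0';                 /* mark end */
    while (*src != '\0') {
        dst[--len] = *src++;         /* fill the output from the back */
    }
}
```

A caller would typically pass a local array, for example `char out[16]; reverse_into("hello", out);`, so the result lives in the caller's own activation record.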

SLIDE 17

Making It Robust

  • When the address of an output buffer is passed in, you should usually pass along another parameter to indicate how long the buffer is.
  • And the string routine should be coded to avoid overflowing the buffer.
  • Without the “how long” parameter, the string routine would have no way of knowing when overflow might occur.
  • Early design of the C string library didn't really seem to appreciate this enough. Later additions did, but by then, programmers had developed sloppy habits.
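
A sketch of a string routine with the “how long” parameter, in C. The name `reverse_bounded` and its 0 / -1 return convention are assumptions for illustration, not part of any standard library. For comparison, the standard `strcpy` takes no size, while the later `snprintf` does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant: `dst_size` is the total size of the caller's
   buffer in bytes. Returns 0 on success, or -1 (leaving dst untouched) if
   the reversed string plus its terminator would not fit. */
int reverse_bounded(const char *src, char *dst, size_t dst_size)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    len = strlen(src);
    if (len >= dst_size) {           /* need len + 1 bytes in total */
        return -1;
    }
    dst[len] = '\0';
    while (*src != '\0') {
        dst[--len] = *src++;
    }
    return 0;
}
```

Passing `sizeof buffer` at the call site keeps the size and the buffer together, which is the habit the slide is arguing for.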