11 x86-64 Assembly

8 minute read

Published:

Data format and registers

16 64-bit general purpose registers (super fast memory locations)

alt)

  • If only need to read bottom 32-bit (E. %r15), designate as a different name (%r15d)

  • Caller saved: Caller saves the register content before calling the function ^6260ff
  • Callee saved: Callee needs to restore to its previous value before return
CIntel data typeAssembly-code suffixSize (B)
charByteb1
shortwordw2
intDwordl4
longQwordq8
char *Qwordq8
floatSingle precisions4
doubledouble precisionl8
movabsq $0x0011223344556677, %rax  # %rax = 0011223344556677
movb $-1, %al                      # %rax = 00112233445566FF
movw $-1, %ax,                     # %rax = 001122334455FFFF
movl $-1, %eax                     # %rax = 00000000FFFFFFFF, kills first l!
movq $-1, %rax                     # %rax = FFFFFFFFFFFFFFFF

First mov fills %rax with a quadword (64-bit) quantity, 0x0011223344556677.

  • Regular mov can only take a 32-bit immediate. movabs allows a 64-bit imm.
  • movl $-1 %eax has a side effect of killing the first half to 0!

Memory access

Expression: <instr> <source>, <target>

  • Starting address of int E[] is stored in %rdx
  • int i stored in %rcx
  • Trying to store in %eax for data or %rax for pointers
  • Dereference syntax: <off>(<base>, <i>, <size>)
  • lea: load effective address: same format as mov, but does not dereference
    • Simply do pointer arithmetic
ExpressionTypeValueAssembly code
Eint *$x_E$movq %rdx, %rax
E[0]int$M[x_E]$movl (%rdx), %eax
E[i]int$M[x_E + 4i]$movl (%rdx, %rcx, 4), %eax
&E[2]int *$x_E+8$leaq 8(%rdx), %rax
E+i-1int *$x_E + 4i - 4$leaq -4(%rdx, %rcx, 4), %rax
*(E+i-3)intsmovl -12(%rdx, %rcx, 4), %eax
&E[i]-Elong$i$movq %rcx, %rax
  1. E: source: %rdx, target: %rax
  2. (%rdx) dereferences %rdx
  3. movl (%rdx, %rcx, 4) means dereferences `%rdx + %rcx * 4

Stack operation

Grows down from high address Top of stack stored in %rsp (stack pointer)

  • push: grow stack down, push the %rax to the stack
  • pop: shrinks stack up, copy the value to %rdx (target register)

alt)

Stack frame

Sometimes the compiler will generate instructions that put stuff but not move the stack pointer

When calling functions, need arguments and returns Stack frame: amount of stack used by a function Suppose P calls Q, stack space:

  • Earlier frames
  • Frame for calling the function P (caller)
    • Arguments (7-n) (arguments 1-6 are passed via the [[x86-reg.pngregisters]])
    • Return address (pushed onto the stack after P calls, when Q returns, it will store the return value in %rax)
  • Frames for executing a function Q (callee)
    • [[#^6260ff(callee) Saved registers]]
    • Local variables
    • Argument build area
  • When Q rets, it will pop the return address from P and jump to it

%rbp is the base pointer: start of a function’s frame

  • When a function is called, it will set its tack pointer to the previous base pointer
push   %rbp              # Save the old frame pointer on the stack
						 # changed rsp 8 bytes down
mov    %rsp,%rbp         # Set the new frame pointer             

# <function body>

mov    %rbp,%rsp         # Roll up the stack to %rbp
pop    %rbp              # Restore the old frame pointer
ret

Sample code

Compilation

gcc -g -Wall -O0 -

sum.c:

long sum(long a, long b) {
    return a + b;
}

long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}

main.c:

long sum_array(long *p, int n);

int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}

sum.o

Now disassemble sum.o, of the .text section, unoptimized

gcc -g -Wall -O0 -fno-omit-frame-pointer -fno-stack-protector -c -o sum.o sum.c
# force to use the frame pointer
objdump -d sum.o
  • -fno-omit-frame-pointer: always include stack pointer, even if not needed
  • -fo-no-stack-protector: avoid using stack protector (a guard value at the stack bottom, if overwritten, will quit the program) ```assembly 0: endbr64 # security instruction

4: push %rbp # Save the old frame pointer on the stack 5: mov %rsp,%rbp # Set the new frame pointer


```assembly
8:	mov    %rdi,-0x8(%rbp)    # Loads arguments a and b onto the stack
c:	mov    %rsi,-0x10(%rbp)   # without adjusting rsp
  • %rd1: first argument (a)
  • %rsi: second argument

%rsp is not adjusted at all, since sum is a leaf procedure. No need for it

10:	mov   -0x8(%rbp),%rdx    # Loads stack copies of a and b into %rdx and %rax
14:	mov   -0x10(%rbp),%rax   # Note that %rax holds function return value

Unoptimized: move arguments into memory %rbp

18:	add    %rdx,%rax         # b += a

Performs the addition

  • %rax: chosen so the result is left into %rax
1b:	pop    %rbp              # Restore the old frame pointer
1c:	ret                      # Return to caller

Prepare for return

  • Rolling up the stack omitted, since we never changed the stack

sum_array.o

1d:	endbr64 
21:	push   %rbp
22:	mov    %rsp,%rbp
25:	sub    $0x20,%rsp           # Allocate 32 bytes on the stack

Compiler analyzes the code and allocates 32 bytes on the stack

29:	mov    %rdi,-0x18(%rbp)     # Load arguments p and n onto the stack
2d:	mov    %esi,-0x1c(%rbp)
30:	movq   $0x0,-0x8(%rbp)      # s = 0
37:	00 
38:	movl   $0x0,-0xc(%rbp)      # i = 0

32 bytes: s, i, p, n, (int takes 8 but only uses the top 4)

Condition check: loads i into %eax, and compares i with n

3f:	jmp    6f <sum_array+0x52>  # Jump to 0x6f
...
6f:	mov    -0xc(%rbp),%eax      # %eax = i
72:	cmp    -0x1c(%rbp),%eax     # Compare i and n
75:	jl     41 <sum_array+0x24>  # Jump to 0x41 if i < n
  • jl: jump if less

Loop body:

41:	mov    -0xc(%rbp),%eax      # %eax = i
44:	cltq                        # %rax = (long) %eax
46:	lea    0x0(,%rax,8),%rdx    # Calculate offset (i * sizeof(long))
4d:	00 
4e:	mov    -0x18(%rbp),%rax     # %rax = p
52:	add    %rdx,%rax            # p += byte-offset (find &p[i])
55:	mov    (%rax),%rdx          # %rdx = *p
58:	mov    -0x8(%rbp),%rax      # %rax = s

Now set up the arguments to call sum()

5c:	mov    %rdx,%rsi            # %rsi = *p (second arg)
5f:	mov    %rax,%rdi            # %rdi = s (first arg)

62:	call   67 <sum_array+0x4a>  # Call add() (address not resolved yet)

67:	mov    %rax,-0x8(%rbp)      # Store return value into s

6b:	addl   $0x1,-0xc(%rbp)      # i++

6f:	mov    -0xc(%rbp),%eax      # %eax = i
72:	cmp    -0x1c(%rbp),%eax     # Compare i and n
75:	jl     41 <sum_array+0x24>  # Jump to 0x41 if i < n

cltq: sign extend 4 bytes to 8 bytes (convert long to quad) call address of sum() resolved at linking (HW 5). Currently just a placeholder

  • 62: e8 00 00 00 00: e8 is the call instruction. The register that has the address of the next instruction is 67 + the offset 00 00 00 00, which will be filled in at the linking stage.
77:	mov    -0x8(%rbp),%rax      # %rax = s as return value
7b:	leave                       # mov %rbp,%rsp then pop %rbp
7c:	ret   

main.o

  • Set up array a[]
  • Call sum_array()
  • Call printf() (also a8 00 00 00 00)
    • Object file has a section designated to tell apart different functions for linking

Optimization

When compiled with -O1 (optimization level 1) instead of -O0, machine code optimized

  • sum.o now doesn’t set up a stack frame. Just lea (%rdi, %rsi, 1), %rax
    • lea can be used for addition!
  • sum_array.o now doesn’t use i, just pointer p to track the position
  • sum_array does not call sum(). It in-lined sum(), reducing the calling overhead

Linked executable

  • Function calls filled in, addresses are little-endian (number of bytes to jump)

GDB

gdb ./main
# gdb will load your program and begin

(gdb) start
(gdb) layout 
  • s (step) to go to the next line
  • continue: run to the end of the program
  • print <var>: print the value of <var>
  • break sum.c:10: if continue, stop at line 10
  • reg: show all register values
  • quit