11 x86-64 Assembly
Data format and registers
16 64-bit general-purpose registers (very fast storage locations)
To access only the low 32 bits of a register (e.g. `%r15`), use its 32-bit name (`%r15d`)
- Caller-saved: the caller saves the register's contents before calling the function, because the callee is free to overwrite them ^6260ff
- Callee-saved: the callee must restore the register to its previous value before it returns
C declaration | Intel data type | Assembly-code suffix | Size (bytes) |
---|---|---|---|
char | Byte | b | 1 |
short | Word | w | 2 |
int | Double word | l | 4 |
long | Quad word | q | 8 |
char * | Quad word | q | 8 |
float | Single precision | s | 4 |
double | Double precision | l | 8 |
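The integer rows of the table can be checked against a C compiler. The sketch below assumes an x86-64 Linux (LP64) target, where `long` and pointers are 8 bytes; `suffix_for_size` is a made-up helper name for illustration, not part of any toolchain.

```c
#include <stddef.h>

/* Hypothetical helper: map an integer operand size in bytes to the
   assembly-code suffix listed in the table above. Returns '?' for a
   size the table does not list. */
char suffix_for_size(size_t bytes) {
    switch (bytes) {
    case 1: return 'b';   /* byte */
    case 2: return 'w';   /* word */
    case 4: return 'l';   /* double word */
    case 8: return 'q';   /* quad word */
    default: return '?';
    }
}
```

Calling `suffix_for_size(sizeof(int))` gives `'l'`, which is why an `int` move is written `movl`.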
```assembly
movabsq $0x0011223344556677, %rax # %rax = 0011223344556677
movb $-1, %al                     # %rax = 00112233445566FF
movw $-1, %ax                     # %rax = 001122334455FFFF
movl $-1, %eax                    # %rax = 00000000FFFFFFFF, upper 32 bits are zeroed!
movq $-1, %rax                    # %rax = FFFFFFFFFFFFFFFF
```
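The first instruction, `movabsq`, fills `%rax` with a quadword (64-bit) quantity, `0x0011223344556677`.
- A regular `mov` can only take a 32-bit immediate; `movabs` allows a 64-bit immediate.
- `movl $-1, %eax` has the side effect of zeroing the upper 32 bits of `%rax`, whereas `movb` and `movw` leave the remaining bytes untouched.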
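The effect of each write on the full 64-bit register can be modelled with masks. This is only a C model of the behaviour described above, not real register access, and the function names are invented for the illustration.

```c
#include <stdint.h>

/* movb: replace the low 8 bits, keep the other 56 */
uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~0xFFULL) | v;
}

/* movw: replace the low 16 bits, keep the other 48 */
uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~0xFFFFULL) | v;
}

/* movl: replace the low 32 bits AND zero the upper 32 */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;              /* the old value does not survive */
    return (uint64_t)v;
}

/* movq: replace all 64 bits */
uint64_t write_rax(uint64_t rax, uint64_t v) {
    (void)rax;
    return v;
}
```

Starting from `0x0011223344556677`, `write_al(rax, 0xFF)` yields `0x00112233445566FF`, matching the comment on the `movb` line.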
Memory access
Instruction format: `<instr> <source>, <target>`
- The starting address of `int E[]` is stored in `%rdx`
- `int i` is stored in `%rcx`
- The result goes in `%eax` for data or `%rax` for pointers
- Dereference syntax: `<off>(<base>, <i>, <size>)` accesses memory at address `<off> + <base> + <i> * <size>`
- `lea` (load effective address): same operand format as `mov`, but it does not dereference
  - It simply does the pointer arithmetic
Expression | Type | Value | Assembly code |
---|---|---|---|
E | int * | $x_E$ | movq %rdx, %rax |
E[0] | int | $M[x_E]$ | movl (%rdx), %eax |
E[i] | int | $M[x_E + 4i]$ | movl (%rdx, %rcx, 4), %eax |
&E[2] | int * | $x_E+8$ | leaq 8(%rdx), %rax |
E+i-1 | int * | $x_E + 4i - 4$ | leaq -4(%rdx, %rcx, 4), %rax |
*(E+i-3) | int | $M[x_E + 4i - 12]$ | movl -12(%rdx, %rcx, 4), %eax |
&E[i]-E | long | $i$ | movq %rcx, %rax |
- `E`: the source is `%rdx`, the target is `%rax`
- `(%rdx)` dereferences `%rdx`
- `movl (%rdx, %rcx, 4), %eax` dereferences the address `%rdx + %rcx * 4`
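The address arithmetic in the table can be reproduced in C and compared with the pointer expressions. This is a sketch: `effective_address` is a hypothetical helper that computes what the operand form `off(base, index, scale)` denotes, and it assumes a 64-bit target with 4-byte `int`.

```c
#include <stdint.h>

/* Address denoted by off(base, index, scale):
   off + base + index * scale. Pass index = 0 when the operand
   has no index register. */
uint64_t effective_address(int64_t off, uint64_t base,
                           uint64_t index, uint64_t scale) {
    /* Unsigned arithmetic wraps, so a negative off subtracts correctly. */
    return base + index * scale + (uint64_t)off;
}
```

With `x_E` as the base and `i` as the index, `effective_address(-4, x_E, i, 4)` equals the value of `E + i - 1`, which is exactly what `leaq -4(%rdx, %rcx, 4), %rax` leaves in `%rax`.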
Stack operation
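- The stack grows down from high addresses
- The top of the stack is stored in `%rsp` (the stack pointer)
- `push %rax`: grows the stack down by 8 bytes and stores the value of `%rax` at the new top
- `pop %rdx`: copies the value at the top of the stack to `%rdx` (the target register) and shrinks the stack by 8 bytes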
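The two operations can be modelled on a small byte array. This is a toy sketch, not how a real CPU is accessed: `struct machine`, `pushq`, and `popq` are names invented here, `rsp` is an offset into the array rather than a real address, and there are no overflow checks.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy machine: mem stands in for stack memory, rsp is the offset
   of the current top of the stack. */
struct machine {
    uint8_t mem[STACK_BYTES];
    uint64_t rsp;
};

void machine_init(struct machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;   /* empty stack: rsp starts at the high end */
}

/* push: move rsp down 8 bytes, then store the value there */
void pushq(struct machine *m, uint64_t src) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &src, 8);
}

/* pop: load the value at rsp, then move rsp up 8 bytes */
uint64_t popq(struct machine *m) {
    uint64_t dst;
    memcpy(&dst, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return dst;
}
```

Two pushes followed by two pops return the values in reverse order and leave `rsp` where it started.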
Stack frame
Sometimes the compiler generates instructions that store data on the stack without moving the stack pointer
Function calls need space for arguments and return information
Stack frame: the amount of stack used by a function
Suppose `P` calls `Q`; the stack space is laid out as:
- Earlier frames
- Frame for the calling function `P` (caller)
  - Arguments 7 to n (arguments 1-6 are passed via the [[x86-reg.png registers]])
  - Return address (pushed onto the stack when `P` executes `call`; when `Q` returns, it leaves its return value in `%rax`)
- Frame for the executing function `Q` (callee)
  - [[#^6260ff (callee) Saved registers]]
  - Local variables
  - Argument build area
- When `Q` executes `ret`, it pops the return address (a location in `P`) off the stack and jumps to it
`%rbp` is the base pointer: the start of a function's frame
- When a function is called, it saves the previous base pointer on the stack and then sets its base pointer to the current stack pointer
```assembly
push %rbp       # Save the old frame pointer on the stack
                # (this moves %rsp 8 bytes down)
mov %rsp,%rbp   # Set the new frame pointer
# <function body>
mov %rbp,%rsp   # Roll up the stack to %rbp
pop %rbp        # Restore the old frame pointer
ret
```
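The prologue and epilogue above can be traced on a toy model to see that they undo each other. This is a sketch under simplifying assumptions: the stack is an array of 8-byte slots, `rsp` and `rbp` are slot indices rather than addresses, and every name (`frame_model`, `prologue`, `epilogue`) is invented for the illustration.

```c
#include <stdint.h>

/* Toy model: one slot = 8 bytes of stack; a lower index is a lower address. */
struct frame_model {
    uint64_t slot[32];
    uint64_t rsp;   /* index of the top of the stack */
    uint64_t rbp;   /* index of the current frame base */
};

void push_slot(struct frame_model *m, uint64_t v) { m->slot[--m->rsp] = v; }
uint64_t pop_slot(struct frame_model *m) { return m->slot[m->rsp++]; }

/* push %rbp ; mov %rsp,%rbp ; then reserve `locals` slots
   (the reservation is what a `sub $N,%rsp` would do) */
void prologue(struct frame_model *m, uint64_t locals) {
    push_slot(m, m->rbp);
    m->rbp = m->rsp;
    m->rsp -= locals;
}

/* mov %rbp,%rsp ; pop %rbp */
void epilogue(struct frame_model *m) {
    m->rsp = m->rbp;
    m->rbp = pop_slot(m);
}
```

After `prologue` the slot at `rbp` holds the caller's base pointer, and after `epilogue` both `rsp` and `rbp` are back to their values at function entry, so the next pop yields the return address.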
Sample code
Compilation
gcc -g -Wall -O0 -
`sum.c`:
```c
long sum(long a, long b) {
    return a + b;
}

long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}
```
`main.c`:
```c
#include <stdio.h>  /* for printf */

long sum_array(long *p, int n);

int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}
```
sum.o
Now disassemble the `.text` section of the unoptimized `sum.o`:
```bash
gcc -g -Wall -O0 -fno-omit-frame-pointer -fno-stack-protector -c -o sum.o sum.c
# force to use the frame pointer
objdump -d sum.o
```
- `-fno-omit-frame-pointer`: always set up the frame pointer, even if it is not needed
- `-fno-stack-protector`: do not use the stack protector (a guard value placed on the stack; if it is overwritten, the program quits)
```assembly
0: endbr64          # security instruction
4: push %rbp        # Save the old frame pointer on the stack
5: mov %rsp,%rbp    # Set the new frame pointer
```
```assembly
8: mov %rdi,-0x8(%rbp)    # Stores arguments a and b on the stack
c: mov %rsi,-0x10(%rbp)   # without adjusting rsp
```
- `%rdi`: first argument (`a`)
- `%rsi`: second argument (`b`)
- `%rsp` is not adjusted at all, since `sum` is a leaf procedure (it calls no other function), so there is no need for it
```assembly
10: mov -0x8(%rbp),%rdx    # Loads stack copies of a and b into %rdx and %rax
14: mov -0x10(%rbp),%rax   # Note that %rax holds function return value
```
Unoptimized: the arguments are first moved into memory (at offsets from `%rbp`) and then loaded back
```assembly
18: add %rdx,%rax   # b += a
```
Performs the addition
- `%rax` is chosen as the destination so that the result is left in `%rax`, the return-value register
```assembly
1b: pop %rbp   # Restore the old frame pointer
1c: ret        # Return to caller
```
Prepare for return
- Rolling up the stack is omitted, since `%rsp` was never changed after the prologue
sum_array.o
```assembly
1d: endbr64
21: push %rbp
22: mov %rsp,%rbp
25: sub $0x20,%rsp   # Allocate 32 bytes on the stack
```
The compiler analyzes the code and allocates 32 bytes on the stack
```assembly
29: mov %rdi,-0x18(%rbp)   # Store arguments p and n on the stack
2d: mov %esi,-0x1c(%rbp)
30: movq $0x0,-0x8(%rbp)   # s = 0
37: 00
38: movl $0x0,-0xc(%rbp)   # i = 0
```
- The 32 bytes hold `s`, `i`, `p`, and `n` (each `int` gets an 8-byte slot but only uses the top 4 bytes)
Condition check: loads `i` into `%eax` and compares `i` with `n`
```assembly
3f: jmp 6f <sum_array+0x52>   # Jump to 0x6f
...
6f: mov -0xc(%rbp),%eax       # %eax = i
72: cmp -0x1c(%rbp),%eax      # Compare i and n
75: jl 41 <sum_array+0x24>    # Jump to 0x41 if i < n
```
- `jl`: jump if less
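The shape of this control flow (initialize, jump to the test at the bottom, loop body above the test) can be written back in C with `goto`. This is a hand-written equivalent for illustration, not something the compiler emits; `sum_array_goto` and the label names are made up, and the address comments refer to the `-O0` listing in these notes.

```c
long sum(long a, long b) {
    return a + b;
}

/* sum_array with the loop expressed the way the -O0 code lays it out */
long sum_array_goto(long *p, int n) {
    long s = 0;
    int i = 0;
    goto check;              /* 3f: jmp 6f */
body:
    s = sum(s, p[i]);        /* 41..67: loop body */
    i++;                     /* 6b: addl $0x1,-0xc(%rbp) */
check:
    if (i < n) goto body;    /* 6f..75: mov, cmp, jl 41 */
    return s;
}
```

Because the test runs before the first iteration, the body is skipped entirely when `n` is 0, just as in the original `for` loop.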
Loop body:
```assembly
41: mov -0xc(%rbp),%eax      # %eax = i
44: cltq                     # %rax = (long) %eax
46: lea 0x0(,%rax,8),%rdx    # Calculate byte offset (i * sizeof(long))
4d: 00
4e: mov -0x18(%rbp),%rax     # %rax = p
52: add %rdx,%rax            # %rax = p + byte offset (that is, &p[i])
55: mov (%rax),%rdx          # %rdx = p[i]
58: mov -0x8(%rbp),%rax      # %rax = s
```
Now set up the arguments to call `sum()`
```assembly
5c: mov %rdx,%rsi               # %rsi = p[i] (second arg)
5f: mov %rax,%rdi               # %rdi = s (first arg)
62: call 67 <sum_array+0x4a>    # Call sum() (address not resolved yet)
67: mov %rax,-0x8(%rbp)         # Store return value into s
6b: addl $0x1,-0xc(%rbp)        # i++
6f: mov -0xc(%rbp),%eax         # %eax = i
72: cmp -0x1c(%rbp),%eax        # Compare i and n
75: jl 41 <sum_array+0x24>      # Jump to 0x41 if i < n
```
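The address computation at `41` to `55` can be mirrored one instruction at a time in C. This is a sketch that assumes an 8-byte `long` (as on x86-64 Linux); `load_element` is a hypothetical name, and the register names appear only as local variables.

```c
#include <stdint.h>

/* Load p[i] in the same steps as the -O0 loop body. */
long load_element(long *p, int i) {
    int64_t rax = (int64_t)i;       /* cltq: sign-extend the 32-bit index */
    int64_t rdx = rax * 8;          /* lea 0x0(,%rax,8),%rdx: byte offset */
    uintptr_t addr = (uintptr_t)p + (uintptr_t)rdx;   /* add %rdx,%rax */
    return *(long *)addr;           /* mov (%rax),%rdx: read p[i] */
}
```

The sign extension matters for negative indices: with `i = -1` the byte offset must become `-8` in 64 bits, not a large positive number.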
- `cltq`: sign-extends the 4 bytes in `%eax` to 8 bytes in `%rax` (convert long to quad)
- `call`: the address of `sum()` is resolved at linking (HW 5); currently it is just a placeholder
- `62: e8 00 00 00 00`: `e8` is the opcode of the `call` instruction. The call target is the address of the next instruction (held in `%rip`, here `0x67`) plus the 4-byte offset `00 00 00 00`, which will be filled in at the linking stage.
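The target computation can be written out in C. This is a minimal sketch, assuming the 5-byte `e8` form shown above with a little-endian signed 32-bit offset; `call_target` is a made-up helper, not an objdump or linker API.

```c
#include <stdint.h>

/* Target of a 5-byte `call` (opcode e8) located at address addr.
   insn points at the 5 instruction bytes. */
uint64_t call_target(uint64_t addr, const uint8_t insn[5]) {
    /* Bytes 1..4 hold the offset, least significant byte first. */
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    int32_t rel = (int32_t)raw;     /* the offset is signed */
    uint64_t next = addr + 5;       /* address of the next instruction */
    return next + (uint64_t)(int64_t)rel;
}
```

With the placeholder bytes `e8 00 00 00 00` at `0x62` the result is `0x67`, which is why objdump prints `call 67`. An offset of `-0x67` (bytes `99 ff ff ff`) would instead reach offset `0`, where `sum` starts in this object file.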
```assembly
77: mov -0x8(%rbp),%rax   # %rax = s as return value
7b: leave                 # mov %rbp,%rsp then pop %rbp
7c: ret
```
main.o
- Set up array `a[]`
- Call `sum_array()`
- Call `printf()` (also `e8 00 00 00 00`)
- The object file has a designated section that lets the linker tell the different functions apart
Optimization
When compiled with `-O1` (optimization level 1) instead of `-O0`, the machine code is optimized:
- `sum.o` no longer sets up a stack frame; the body is just `lea (%rdi, %rsi, 1), %rax`
  - `lea` can be used for addition!
- `sum_array.o` no longer uses `i`; it just uses the pointer `p` to track the position
- `sum_array` does not `call` `sum()`; the compiler inlined `sum()`, reducing the `call` overhead
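In C terms, the optimized code behaves roughly like the function below. This is a hand-written approximation of the two transformations (inlining and pointer walking), not decompiled compiler output, and `sum_array_opt` is a name chosen for the illustration.

```c
/* sum_array after inlining sum() and replacing the index i
   with a moving pointer. */
long sum_array_opt(long *p, int n) {
    long s = 0;
    long *end = p + (n > 0 ? n : 0);   /* one past the last element */
    for (; p != end; p++) {
        s += *p;                       /* inlined body of sum(s, *p) */
    }
    return s;
}
```

It returns the same result as the original `sum_array` for any `n`, including `n <= 0`, where the loop never runs.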
Linked executable
- Function calls are filled in; the call offsets (number of bytes to jump) are stored in little-endian byte order
GDB
```bash
gdb ./main
# gdb will load your program and begin
(gdb) start
(gdb) layout
```
- `s` (`step`): go to the next line
- `continue`: run to the end of the program (or to the next breakpoint)
- `print <var>`: print the value of `<var>`
- `break sum.c:10`: on `continue`, stop at line 10 of `sum.c`
- `reg`: show all register values
- `quit`