11 x86-64 Assembly

8 minute read

Published: July 25, 2025

Data format and registers

16 64-bit general purpose registers (super fast memory locations)

alt )

If only need to read bottom 32-bit (E. %r15), designate as a different name (%r15d）
Caller saved: Caller saves the register content before calling the function ^6260ff
Callee saved: Callee needs to restore to its previous value before return

C	Intel data type	Assembly-code suffix	Size (B)
`char`	Byte	`b`	1
`short`	word	`w`	2
`int`	Dword	`l`	4
`long`	Qword	`q`	8
`char *`	Qword	`q`	8
`float`	Single precision	`s`	4
`double`	double precision	`l`	8

movabsq $0x0011223344556677, %rax  # %rax = 0011223344556677
movb $-1, %al                      # %rax = 00112233445566FF
movw $-1, %ax,                     # %rax = 001122334455FFFF
movl $-1, %eax                     # %rax = 00000000FFFFFFFF, kills first l!
movq $-1, %rax                     # %rax = FFFFFFFFFFFFFFFF

First mov fills %rax with a quadword (64-bit) quantity, 0x0011223344556677.

Regular mov can only take a 32-bit immediate. movabs allows a 64-bit imm.
movl $-1 %eax has a side effect of killing the first half to 0!

Memory access

Expression: <instr> <source>, <target>

Starting address of int E[] is stored in %rdx
int i stored in %rcx
Trying to store in %eax for data or %rax for pointers
Dereference syntax: <off>(<base>, <i>, <size>)
lea: load effective address: same format as mov, but does not dereference
- Simply do pointer arithmetic

Expression	Type	Value	Assembly code
`E`	`int *`	$x_E$	`movq %rdx, %rax`
`E[0]`	`int`	$M[x_E]$	`movl (%rdx), %eax`
`E[i]`	`int`	$M[x_E + 4i]$	`movl (%rdx, %rcx, 4), %eax`
`&E[2]`	`int *`	$x_E+8$	`leaq 8(%rdx), %rax`
`E+i-1`	`int *`	$x_E + 4i - 4$	`leaq -4(%rdx, %rcx, 4), %rax`
`*(E+i-3)`	`int`	`s`	`movl -12(%rdx, %rcx, 4), %eax`
`&E[i]-E`	`long`	$i$	`movq %rcx, %rax`

E: source: %rdx, target: %rax
(%rdx) dereferences %rdx
movl (%rdx, %rcx, 4) means dereferences `%rdx + %rcx * 4

Stack operation

Grows down from high address Top of stack stored in %rsp (stack pointer)

push: grow stack down, push the %rax to the stack
pop: shrinks stack up, copy the value to %rdx (target register)

alt )

Stack frame

Sometimes the compiler will generate instructions that put stuff but not move the stack pointer

When calling functions, need arguments and returns Stack frame: amount of stack used by a function Suppose P calls Q, stack space:

Earlier frames
Frame for calling the function P (caller)
- Arguments (7-n) (arguments 1-6 are passed via the [[x86-reg.png registers]])
- Return address (pushed onto the stack after P calls, when Q returns, it will store the return value in %rax)
Frames for executing a function Q (callee)
- [[#^6260ff (callee) Saved registers]]
- Local variables
- Argument build area
When Q rets, it will pop the return address from P and jump to it

%rbp is the base pointer: start of a function’s frame

When a function is called, it will set its tack pointer to the previous base pointer

push   %rbp              # Save the old frame pointer on the stack
						 # changed rsp 8 bytes down
mov    %rsp,%rbp         # Set the new frame pointer             

# <function body>

mov    %rbp,%rsp         # Roll up the stack to %rbp
pop    %rbp              # Restore the old frame pointer
ret

Sample code

Compilation

gcc -g -Wall -O0 -

sum.c:

long sum(long a, long b) {
    return a + b;
}

long sum_array(long *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) {
        s = sum(s, p[i]);
    }
    return s;
}

main.c:

long sum_array(long *p, int n);

int main() {
    long a[5] = {0, 1, 2, 3, 4};
    long sum = sum_array(a, 5);
    printf("sum=%ld\n", sum);
}

`sum.o`

Now disassemble sum.o, of the .text section, unoptimized

gcc -g -Wall -O0 -fno-omit-frame-pointer -fno-stack-protector -c -o sum.o sum.c
# force to use the frame pointer
objdump -d sum.o

-fno-omit-frame-pointer: always include stack pointer, even if not needed
-fo-no-stack-protector: avoid using stack protector (a guard value at the stack bottom, if overwritten, will quit the program) ```assembly 0: endbr64 # security instruction

4: push %rbp # Save the old frame pointer on the stack 5: mov %rsp,%rbp # Set the new frame pointer

```assembly
8:	mov    %rdi,-0x8(%rbp)    # Loads arguments a and b onto the stack
c:	mov    %rsi,-0x10(%rbp)   # without adjusting rsp

%rd1: first argument (a)
%rsi: second argument

%rsp is not adjusted at all, since sum is a leaf procedure. No need for it

10:	mov   -0x8(%rbp),%rdx    # Loads stack copies of a and b into %rdx and %rax
14:	mov   -0x10(%rbp),%rax   # Note that %rax holds function return value

Unoptimized: move arguments into memory %rbp

18:	add    %rdx,%rax         # b += a

Performs the addition

%rax: chosen so the result is left into %rax

1b:	pop    %rbp              # Restore the old frame pointer
1c:	ret                      # Return to caller

Prepare for return

Rolling up the stack omitted, since we never changed the stack

`sum_array.o`

1d:	endbr64 
21:	push   %rbp
22:	mov    %rsp,%rbp
25:	sub    $0x20,%rsp           # Allocate 32 bytes on the stack

Compiler analyzes the code and allocates 32 bytes on the stack

29:	mov    %rdi,-0x18(%rbp)     # Load arguments p and n onto the stack
2d:	mov    %esi,-0x1c(%rbp)
30:	movq   $0x0,-0x8(%rbp)      # s = 0
37:	00 
38:	movl   $0x0,-0xc(%rbp)      # i = 0

32 bytes: s, i, p, n, (int takes 8 but only uses the top 4)

Condition check: loads i into %eax, and compares i with n

3f:	jmp    6f <sum_array+0x52>  # Jump to 0x6f
...
6f:	mov    -0xc(%rbp),%eax      # %eax = i
72:	cmp    -0x1c(%rbp),%eax     # Compare i and n
75:	jl     41 <sum_array+0x24>  # Jump to 0x41 if i < n

jl: jump if less

Loop body:

41:	mov    -0xc(%rbp),%eax      # %eax = i
44:	cltq                        # %rax = (long) %eax
46:	lea    0x0(,%rax,8),%rdx    # Calculate offset (i * sizeof(long))
4d:	00 
4e:	mov    -0x18(%rbp),%rax     # %rax = p
52:	add    %rdx,%rax            # p += byte-offset (find &p[i])
55:	mov    (%rax),%rdx          # %rdx = *p
58:	mov    -0x8(%rbp),%rax      # %rax = s

Now set up the arguments to call sum()

5c:	mov    %rdx,%rsi            # %rsi = *p (second arg)
5f:	mov    %rax,%rdi            # %rdi = s (first arg)

62:	call   67 <sum_array+0x4a>  # Call add() (address not resolved yet)

67:	mov    %rax,-0x8(%rbp)      # Store return value into s

6b:	addl   $0x1,-0xc(%rbp)      # i++

6f:	mov    -0xc(%rbp),%eax      # %eax = i
72:	cmp    -0x1c(%rbp),%eax     # Compare i and n
75:	jl     41 <sum_array+0x24>  # Jump to 0x41 if i < n

cltq: sign extend 4 bytes to 8 bytes (convert long to quad) call address of sum() resolved at linking (HW 5). Currently just a placeholder

62: e8 00 00 00 00: e8 is the call instruction. The register that has the address of the next instruction is 67 + the offset 00 00 00 00, which will be filled in at the linking stage.

77:	mov    -0x8(%rbp),%rax      # %rax = s as return value
7b:	leave                       # mov %rbp,%rsp then pop %rbp
7c:	ret

`main.o`

Set up array a[]
Call sum_array()
Call printf() (also a8 00 00 00 00)
- Object file has a section designated to tell apart different functions for linking

Optimization

When compiled with -O1 (optimization level 1) instead of -O0, machine code optimized

sum.o now doesn’t set up a stack frame. Just lea (%rdi, %rsi, 1), %rax
- lea can be used for addition!
sum_array.o now doesn’t use i, just pointer p to track the position
sum_array does not call sum(). It in-lined sum(), reducing the calling overhead

Linked executable

Function calls filled in, addresses are little-endian (number of bytes to jump)

GDB

gdb ./main
# gdb will load your program and begin

(gdb) start
(gdb) layout 

s (step) to go to the next line
continue: run to the end of the program
print <var>: print the value of <var>
break sum.c:10: if continue, stop at line 10
reg: show all register values
quit

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Ming Gong