Displaying all argv in x64 assembly - Eli Bendersky's website

Recently I've been doing some x64 assembly hacking, and something I had to Google a bit and collect from a few places is how to go over all command-line arguments (colloquially known as argv from C) and do something with them.

I have previously discussed how arguments get passed into a program (not the C main, mind you, but the real entry point of a program, _start), so all that was left was a small matter of implementation. Here it is, in GNU Assembler (gas) syntax for Linux. This is pure assembly code; it does not use the C standard library or runtime at all. It demonstrates several interesting concepts: reading command-line arguments, issuing Linux system calls and string processing.

#---------------- DATA ----------------#
    .data
    # We need buf_for_itoa to be large enough to contain the decimal form of
    # a 64-bit integer: at most 19 digits, a sign and the terminating null.
    # endbuf_for_itoa will point to the last byte of buf_for_itoa and is
    # useful for passing to itoa.
    .set BUFLEN, 32
buf_for_itoa:
    .space BUFLEN, 0x0
    .set endbuf_for_itoa, buf_for_itoa + BUFLEN - 1
newline_str:
    .asciz "\n"
argc_str:
    .asciz "argc: "


#---------------- CODE ----------------#
    .globl _start
    .text
_start:
    # On entry to _start, argc is in (%rsp), argv[0] in 8(%rsp),
    # argv[1] in 16(%rsp) and so on.
    lea argc_str, %rdi
    call print_cstring

    mov (%rsp), %r12               # save argc in r12

    # Convert the argc value to a string and print it out
    mov %r12, %rdi
    lea endbuf_for_itoa, %rsi
    call itoa
    mov %rax, %rdi
    call print_cstring
    lea newline_str, %rdi
    call print_cstring

    # In a loop, pick argv[n] for 0 <= n < argc and print it out,
    # followed by a newline. r13 holds n.
    xor %r13, %r13

.L_argv_loop:
    mov 8(%rsp, %r13, 8), %rdi      # argv[n] is in (rsp + 8 + 8*n)
    call print_cstring
    lea newline_str, %rdi
    call print_cstring
    inc %r13
    cmp %r12, %r13
    jl .L_argv_loop

    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall
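
The loop above counts up to argc. The System V AMD64 ABI also places a null pointer right after the last argv entry, so an alternative is to walk the pointers until that terminator is reached. The following is a sketch of a replacement _start along those lines, assuming it is assembled together with the print_cstring and newline_str defined in this post. The label names and the choice of rbx as the cursor are illustrative assumptions, not part of the original program.

```asm
    .globl _start
    .text
_start:
    # rbx is callee-saved and is not touched by print_cstring or by the
    # system calls it makes, so it is safe to keep the cursor there.
    lea 8(%rsp), %rbx               # rbx = address of argv[0]

.L_argv_walk:
    mov (%rbx), %rdi                # rdi = current argv pointer
    test %rdi, %rdi                 # a null pointer ends the argv array
    jz .L_argv_walk_done
    call print_cstring
    lea newline_str, %rdi
    call print_cstring
    add $8, %rbx                    # advance to the next pointer
    jmp .L_argv_walk

.L_argv_walk_done:
    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall
```

This variant never reads argc, so it skips the "argc: " line and needs neither itoa nor its buffer.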

This code uses a couple of support functions. The first is print_cstring:

# Function print_cstring
#   Print a null-terminated string to stdout.
# Arguments:
#   rdi     address of string
# Returns: void
print_cstring:
    # Find the terminating null
    mov %rdi, %r10
.L_find_null:
    cmpb $0, (%r10)
    je .L_end_find_null
    inc %r10
    jmp .L_find_null
.L_end_find_null:
    # r10 points to the terminating null, so r10-rdi is the length
    sub %rdi, %r10

    # Now that we have the length, we can call sys_write
    # sys_write(unsigned fd, char* buf, size_t count)
    mov $1, %rax
    # Populate address of string into rsi first, because the later
    # assignment of fd clobbers rdi.
    mov %rdi, %rsi
    mov $1, %rdi
    mov %r10, %rdx
    syscall
    ret

More interestingly, here is itoa. It's a bit more general than what I actually use in the main program because it also supports negative numbers. It can convert any number that fits into a 64-bit register. Note the unusual API for receiving and returning the place where the actual string is written. Since it's very natural for an itoa implementation to emit the digits in reverse, I wanted to avoid actual string reversing by writing the digits into a buffer from the end towards the beginning.

# Function itoa
#   Convert an integer to a null-terminated string in memory.
#   Assumes that there is enough space allocated in the target
#   buffer for the representation of the integer. Since the number itself
#   is accepted in the register, its value is bounded.
# Arguments:
#   rdi:    the integer
#   rsi:    address of the *last* byte in the target buffer
# Returns:
#   rax:    address of the first byte in the target string that
#           contains valid information.
itoa:
    movb $0, (%rsi)        # Write the terminating null into the last byte.

    # If the input number is negative, we mark it by placing 1 into r9
    # and negate it. In the end we check if r9 is 1 and add a '-' in front.
    mov $0, %r9
    cmp $0, %rdi
    jge .L_input_positive
    neg %rdi
    mov $1, %r9
.L_input_positive:

    mov %rdi, %rax          # Place the number into rax for the division.
    mov $10, %r8            # The base is in r8

.L_next_digit:
    # Prepare rdx:rax for division by clearing rdx. rax remains from the
    # previous div. rax will be rax / 10, rdx will be the next digit to
    # write out.
    xor %rdx, %rdx
    div %r8
    # Write the digit to the buffer, in ascii
    dec %rsi
    add $0x30, %dl
    movb %dl, (%rsi)

    cmp $0, %rax            # We're done when the quotient is 0.
    jne .L_next_digit

    # If we marked in r9 that the input is negative, it's time to add that
    # '-' in front of the output.
    cmp $1, %r9
    jne .L_itoa_done
    dec %rsi
    movb $0x2d, (%rsi)

.L_itoa_done:
    mov %rsi, %rax          # rsi points to the first byte now; return it.
    ret
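
To see the negative-number path in action, here is a small sketch of an alternative _start that converts and prints -42. It assumes it is assembled together with itoa, print_cstring and the data section from this post, in place of the original _start. The value -42 is just an arbitrary example.

```asm
    .globl _start
    .text
_start:
    mov $-42, %rdi                  # the immediate is sign-extended to 64 bits
    lea endbuf_for_itoa, %rsi       # address of the last byte of the buffer
    call itoa                       # rax = address of the first character
    mov %rax, %rdi
    call print_cstring
    lea newline_str, %rdi
    call print_cstring

    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall
```

Inside itoa the value is negated to 42, the digits '2' and '4' are written right to left just before the terminating null, and the '-' is placed in front of them, so the program should print -42 followed by a newline.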

Some notes about the code:

GAS vs. Intel syntax: I used to believe the Intel syntax is better looking, but grew to tolerate GAS because it's the default used by tools on Linux. After a very short time you get used to it and don't really mind it any longer. Yes, even the weird indirect addressing syntax (mov 8(%rsp, %r13, 8), %rdi) grows on you. In other words, focus on the code, not syntax.
I could pick any representation for strings, but ended up going with C-like null-terminated strings. If you look carefully at print_cstring you'll notice that a length-prefixed representation could be better, since the write system call doesn't care about the null and wants the length passed explicitly. However, since real-life assembly code often has to interoperate with C, null-terminated strings make more sense.
Even though my own functions could use any calling convention, I'm sticking with the System V AMD64 ABI. It's natural because Linux system calls follow nearly the same convention for passing arguments and return values (the fourth argument goes in r10 rather than rcx). The syscall instruction itself overwrites rcx and r11, and rax receives the return value, so care must be taken not to keep live values in those registers across system calls.
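
To make the length-prefix remark above concrete, here is a small standalone sketch of that alternative representation. Everything in it is an assumption made for illustration: the names print_lstring and hello_lstr are hypothetical, and the 8-byte length prefix is just one possible layout. It is a separate program with its own _start, not something to paste into the listing above.

```asm
    .data
hello_lstr:
    .quad hello_end - hello_bytes   # 8-byte length prefix (assumed layout)
hello_bytes:
    .ascii "hello\n"                # the bytes themselves, no terminating null
hello_end:

    .globl _start
    .text
_start:
    lea hello_lstr, %rdi
    call print_lstring

    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall

# Function print_lstring (hypothetical)
#   Print a length-prefixed string to stdout.
# Arguments:
#   rdi     address of the 8-byte length, followed directly by the bytes
# Returns: void
print_lstring:
    mov (%rdi), %rdx                # count = the length prefix
    lea 8(%rdi), %rsi               # buf = first byte after the prefix
    mov $1, %rax                    # sys_write
    mov $1, %rdi                    # fd = stdout; rdi is no longer needed
    syscall                         # overwrites rcx and r11; rax = result
    ret
```

Compared with print_cstring, the scan for the terminating null disappears entirely, because the length is read straight from memory. Assembled with as and linked with ld, this should print hello followed by a newline.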