In a previous section, we wrote text out to the user, but so far we have had no way to get input back. In this section, we will introduce a new system call which allows us to read a line of text from the console.
sys_read is the opposite of sys_write. While sys_write writes data from memory to the console, sys_read reads data from the console and saves that data into memory for later use by the program. Making a sys_read system call is very similar to using sys_write: all we have to do is set the registers to the appropriate values and tell the operating system when we're ready.
To make a sys_write call:
- rax must be set to 1, indicating sys_write.
- rdi must be set to 1, indicating stdout (console output).
- rsi must be set to an address in memory where the string to be printed can be found.
- rdx must be set to the number of characters to write from memory to the console.
Compare the above to sys_read, which is pretty similar:
- rax must be set to 0, indicating sys_read.
- rdi must be set to 0, indicating stdin (console input).
- rsi must be set to an address in memory where the input string can be saved.
- rdx must be set to the maximum number of characters to accept from the user.
The first program we'll make that uses sys_read will be very simple. It will accept input from the user and then print that same string right back out. Create a new file called "repeat.asm" and type the following program into it:
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

section .bss

%define buffer_len 64
buffer: resb buffer_len

section .text
global _start
_start:
    ; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall

    ; Write whatever the user entered back out
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall

    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
There are three high-level operations here:
- Read a line of input from the user and save that input into memory.
- Write the input string from memory back out to the console.
- Exit the program.
Let's go through the source file in detail:
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0
These are the constants we'll be using to make system calls. sys_exit is 60, stdin is 0, etc. This is just like previous programs, but we've added some new definitions because of the new system call being made.
section .bss

%define buffer_len 64
buffer: resb buffer_len
Previously, we have worked with the text and data sections. This is a new type of section, called bss. Take a look at how these three sections compare:
- The text section is where code (instructions) goes.
- The data section is for initialized data. This is memory for which we have an initial value when the program starts. In the "Hello, world!" section, we printed a string out to the user whose value we knew ahead of time.
- The bss section is for uninitialized data. This is memory which will be set dynamically by the program as it runs. Since the value of this memory will be set to whatever the user enters, we don't know what it will be ahead of time.
We could use the data section for this if we really wanted to, by giving buffer some garbage initial value that we expect to be overwritten, but it's wasteful to include that garbage data in the executable file. The bss section allows us to say we need a region of memory reserved, without actually taking up that number of bytes on disk. The operating system will reserve the requested number of bytes in memory each time the program runs.
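To make the difference concrete, here is a minimal sketch that reserves the same 64 bytes both ways. The label names data_buffer and bss_buffer are made up for this illustration and are not used anywhere else in this section.

```nasm
; Sketch: two ways to get a 64-byte buffer.
; data_buffer and bss_buffer are hypothetical names for this example only.
%define sys_exit 60
%define success 0

section .data
; Initialized data: all 64 bytes are written into the executable file.
data_buffer: times 64 db 0

section .bss
; Uninitialized data: only the size is recorded, no bytes are stored on disk.
bss_buffer: resb 64

section .text
global _start
_start:
    ; Neither buffer is used here; the program just exits successfully.
    mov rax, sys_exit
    mov rdi, success
    syscall
```

Both labels can be used in exactly the same way by the code. The only difference is where the bytes come from when the program is loaded.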
So the purpose of this section is to make a region of memory which the user's input can be written to. Let's break it up into pieces and check out each line individually:
section .bss
This defines the beginning of the bss section, where any uninitialized memory is declared.
%define buffer_len 64
This creates a constant called buffer_len, which will be the total number of bytes of memory reserved for storing user input. In this case, anywhere we use the text "buffer_len" in the code, it will be replaced with the number 64. This value can be basically whatever you want, but 64 is a reasonable number in this case.
Note: this is not actually part of the bss section. %define is an example of an assembler directive, which does not translate directly to machine code. It's a convenience offered by the assembler which allows us to define the size of the buffer once and then refer to it elsewhere, so if we ever want to change the size of the buffer, we only have to change it in this one place.
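As a small illustration of that substitution, the sketch below uses buffer_len as the exit status instead of as a buffer size. This is only a way to make the replaced value visible; it is not something the "repeat.asm" program does.

```nasm
; Sketch: %define is a plain text substitution done by the assembler.
%define sys_exit 60
%define buffer_len 64

section .text
global _start
_start:
    mov rax, sys_exit
    mov rdi, buffer_len     ; assembled exactly as: mov rdi, 64
    syscall                 ; the program exits with status code 64
```

Changing the single %define line to another value changes every place the name is used, without touching the instructions themselves.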
buffer: resb buffer_len
This is where the magic happens. This line declares the area in memory where the user's input will be stored. The line has three parts:
- buffer is the name of the area in memory we're declaring. Anywhere the name "buffer" appears in the code, it will be replaced with the memory address of the beginning of this region.
- resb stands for "reserve bytes". This tells the assembler we're reserving some number of bytes of memory.
- buffer_len gives the number of bytes we want to reserve. In this case we're using the constant buffer_len, which is 64. We could alternatively just type the number 64 here.
All together, this reserves a 64-byte area in memory which we can refer to by the name "buffer". When the program runs, this memory will be reserved for the program and we'll be able to read and write to it.
section .text
Now we're getting into more familiar territory. This is where the code begins.
global _start
_start:
This is the entry-point of the program, marking the first instruction that will be executed when the program is run.
    ; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall
The first thing the program does is read input from the user by making a sys_read system call. Like other system calls, we set up the registers with the details of the operation we want carried out and then issue the syscall instruction, which notifies the operating system to do our bidding.
In this case, we're telling the operating system to read characters from the console and store them in memory at the location given by buffer. The operating system will let the user type until they hit the enter key, and then up to 64 characters of text will be saved to memory. After the syscall instruction executes, the total number of bytes entered by the user will be available to us in the rax register. Whatever text the user entered will be stored in memory, and we'll be able to access it through the label buffer.
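The value left in rax is worth a second look. Besides a byte count, sys_read can leave 0 there (end of input, for example when input is redirected from an empty file) or a negative number (an error). The following variation on "repeat.asm" is a sketch of how that could be checked. The cmp and jle instructions and the label done are additions for this example and are not part of the program above.

```nasm
; Sketch: check the result of sys_read before using it.
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

section .bss

%define buffer_len 64
buffer: resb buffer_len

section .text
global _start
_start:
    ; Read input from the user
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, buffer
    mov rdx, buffer_len
    syscall

    ; rax holds the byte count, 0 at end of input, or a negative error code
    cmp rax, 0
    jle done                ; nothing was read, so skip the write

    ; Write back exactly the number of bytes that were read
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall

done:
    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
```

For interactive use the simpler version behaves the same; the check only matters when no input arrives.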
    ; Write whatever the user entered back out
    mov rdx, rax
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, buffer
    syscall
Now that the user's input is stored in memory and we can reference that region of memory with the name buffer, we just print whatever the user typed right back out to them.
This is very similar to previous sys_write calls, with one major difference. Previously, we printed a static string "Hello, world!", meaning that we knew what the string would be ahead of time, as well as how many characters it would be. This time around, we don't actually know how many characters the user may have entered. We know they couldn't have entered more than 64 characters, but other than that, we have no idea. Luckily, sys_read returns the number of characters the user entered in the register rax. sys_write expects the number of characters to write to be in the register rdx. So we copy the value left by sys_read in rax into rdx, where sys_write expects it. This copy has to happen first, because the very next instruction overwrites rax with the sys_write call number.
Altogether, this system call tells the operating system to write the bytes that were previously read, starting at the address buffer in memory, out to the console.
    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
Finally, we make a third system call to exit the program successfully.
Make sure the program is typed correctly as listed above, save it as "repeat.asm", and run it using the "run" script from previous sections:
The program should appear to pause and do nothing, waiting for input from you. Type some text (like "Greetings!") and press enter. The program should repeat whatever you typed and exit. The total output should look something like this:
Greetings!
Greetings!
0
Remember that 0 is the program status code, indicating that the program exited successfully.
The "repeat.asm" program doesn't tell the user what to do: it just hangs until the user presses enter. We can mix and match sys_write and sys_read calls to provide some instructions to the user and some formatting to the output.
For this next program, we're going to ask the user for their name and then greet them. This can be broken down into the following system calls:
- sys_write - print "Please enter your name: "
- sys_read - input the user's name
- sys_write - print "Hello, "
- sys_write - print the user's name
- sys_write - print "!"
This will produce final output that looks a bit like this (depending on what you enter):
Please enter your name: Brian
Hello, Brian!
To get started, create a new file called "helloname.asm" and type the following program in:
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

%define newline 10

section .bss

%define name_max_len 64
name: resb name_max_len
name_len: resq 1

section .data

prompt: db "Please enter your name: "
prompt_len: equ $-prompt

response_start: db "Hello, "
response_start_len: equ $-response_start

response_end: db "!", newline
response_end_len: equ $-response_end

section .text
global _start
_start:
    ; Write the prompt out to the console
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, prompt
    mov rdx, prompt_len
    syscall

    ; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall

    ; Store the number of characters entered by the user
    mov [name_len], rax

    ; Write the start of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_start
    mov rdx, response_start_len
    syscall

    ; Write the user's name
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, name
    mov rdx, [name_len]
    syscall

    ; Write the end of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_end
    mov rdx, response_end_len
    syscall

    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
This is a much longer program than the previous one, but it mostly just reuses the same concepts. There are only a couple of new things here. Let's step through it in detail:
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

%define newline 10
These are the same constants we defined before. The only difference is the inclusion of newline, with a value of 10. This is the newline character (produced when you press enter). We'll use this for formatting purposes.
section .bss

%define name_max_len 64
name: resb name_max_len
name_len: resq 1
Here we declare our uninitialized data. Like before, we reserve a 64-byte area in memory for user input. This time we call it name since this is where the user's name will be stored.
We also declare a new value called name_len. This is where we'll store the number of characters the user inputs (the length of name), so we can use it later. The declaration follows the same structure as the name declaration:
- name_len names the memory we're reserving so we can refer to it in the code.
- resq means to reserve a quad-word. This is 8 bytes, or 64 bits. On a 64-bit processor, the registers are 64 bits each. This makes 64 bits a natural size for an integer, since it requires no conversion to move it around between registers and memory.
- 1 means we only need one quad-word reserved. This is not a series of bytes like the string, it's only one piece of data: the number of characters typed by the user.
Altogether, the bss section defines two regions of memory:
- name, which is 64 bytes and will be used to store up to 64 characters entered by the user.
- name_len, which is 8 bytes and will be used to store a single integer indicating the total number of characters entered by the user.
section .data

prompt: db "Please enter your name: "
prompt_len: equ $-prompt

response_start: db "Hello, "
response_start_len: equ $-response_start

response_end: db "!", newline
response_end_len: equ $-response_end
Here is the data section, where we declare some initialized data. This is memory for which we have values ahead of time. We're declaring 3 static strings, plus a length count for each:
- prompt will be shown to the user first, telling them what to do.
- response_start will be printed before the user's name is repeated back to them.
- response_end will be printed after the user's name, giving punctuation and formatting to the response: an exclamation point and a newline character.
Each of these also has an accompanying _len value so we know how many characters each string contains.
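The _len values rely on the assembler's $ symbol, which stands for the current position in the section being assembled. Placed right after a string, $ minus the string's label gives the number of bytes in between, which is the length of the string. The sketch below shows this by using prompt_len as the exit status; that use of the exit status is an assumption made only to make the value visible.

```nasm
; Sketch: how equ $-label computes a string length at assembly time.
%define sys_exit 60

section .data

prompt: db "Please enter your name: "
prompt_len: equ $-prompt    ; 24 characters, counting the trailing space

section .text
global _start
_start:
    mov rax, sys_exit
    mov rdi, prompt_len     ; assembled as: mov rdi, 24
    syscall                 ; the program exits with status code 24
```

Because equ defines a constant rather than reserving memory, prompt_len is used without square brackets, unlike name_len.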
section .text
global _start
_start:
Now we get to the code!
    ; Write the prompt out to the console
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, prompt
    mov rdx, prompt_len
    syscall
The first thing we do is make a sys_write call to print out "Please enter your name: " when the program starts.
    ; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall
Next up, we read some input from the user. Whatever they type is stored in memory starting at the address indicated by name.
    ; Store the number of characters entered by the user
    mov [name_len], rax
After the sys_read call returns, the number of characters entered by the user will be provided in the rax register. We're going to need this later, but unlike in the previous program, we won't be using it immediately. We're going to print the static string "Hello, " first, which will involve overwriting both rax and rdx. By the time we get around to writing the user's name back out, the information we need (the number of characters in the user's name) will be gone.
In order to get around this, we need a place to temporarily save the number of characters in the user's name.
The instruction above copies the value from rax into memory at the address indicated by name_len. Notice the phrasing there. name_len is a memory address: information about where we can store this data. This is unlike dealing with registers, which are storage locations themselves. You can copy a value directly to a register, but when dealing with a memory address you have to clarify that you want to copy the value to memory at the given address.
This is where the square brackets come in. They're necessary because name_len refers to an address in memory where data can be stored. The actual value of name_len might be something like 0x6001b4, or wherever the operating system chooses to put it. We want the value of rax to be copied into memory at that address.
You may be wondering why the square brackets aren't always required. For example, when we read the user's input into memory, the instruction has no square brackets:
mov rsi, name
In the code above, name is a memory address just like name_len. The difference is that the sys_read system call expects an address. It expects rsi to contain an address in memory where it can write the input data. If we put name in square brackets, that would copy the contents of memory into rsi instead of the address. When the sys_read call tried to write to that location in memory it would end up in the wrong place.
Let's take a short digression to explain this better. Here is a table showing some (made up) locations in memory:
Label     Address     Value
string    0x6001b0    G
          0x6001b1    r
          0x6001b2    e
          0x6001b3    e
          0x6001b4    t
          0x6001b5    i
          0x6001b6    n
          0x6001b7    g
          0x6001b8    s
The table above shows 9 bytes in memory, containing the string "Greetings". Each byte has its own unique address ranging from 0x6001b0 to 0x6001b8. The first byte has a label: string.
If we refer to string directly, we're talking about the memory address. For example:
mov rax, string
The above instruction would set rax to the value 0x6001b0, which is the address of the beginning of the string.
However, if we refer to string with square brackets, we're referring to the value stored in memory at the address 0x6001b0:
mov rax, [string]
This instruction would set rax to the value of the first 8 characters in the string: "Greeting". We can also refer to individual characters:
mov al, [string]
mov bl, [string + 4]
These instructions would load the character "G" into the register al and the character "t" into the register bl.
Data labels like name and name_len are just addresses which point to locations in memory which contain data. Adding square brackets indicates that you're interested in the data at that location in memory, not the address itself.
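The difference can be seen in a tiny program. This is only a sketch: the label value and the number 42 are invented for the example, and the exit status is used purely as a way to observe what ended up in a register.

```nasm
; Sketch: a label without brackets is an address, with brackets it is the data.
; "value" is a hypothetical label used only in this example.
%define sys_exit 60

section .data

value: dq 42                ; one quad-word in memory holding the number 42

section .text
global _start
_start:
    mov rax, value          ; rax = the address of value
    mov rbx, [value]        ; rbx = 42, the contents stored at that address
    mov rdi, [rax]          ; brackets also work on a register holding an address: rdi = 42

    mov rax, sys_exit
    syscall                 ; the program exits with status code 42
```

If the third instruction were written as mov rdi, rax instead, the program would try to exit with the address itself rather than the 42 stored there.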
    ; Write the start of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_start
    mov rdx, response_start_len
    syscall
Now that the user has entered their name, we begin to respond. This system call prints out the string response_start, which is "Hello, ".
    ; Write the user's name
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, name
    mov rdx, [name_len]
    syscall
Next, we print the name the user entered. Again, notice the square brackets:
name_len is an address in memory. It might be something like 0x6001b4 (or wherever the operating system decided to locate it). We don't want to print 0x6001b4 bytes to the console, since there aren't nearly that many available. Instead, we want to look up the value stored at the address 0x6001b4 and print that number of characters. This should be a more reasonable number like 5 or 8, depending on the length of the user's name. So we use the square brackets to indicate this.
The total output so far will look something like this (if your name happens to be Brian):
Please enter your name: Brian
Hello, Brian
Now we finish up the output:
    ; Write the end of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_end
    mov rdx, response_end_len
    syscall
To finish off the sentence and apply some formatting, we write the string response_end: "!\n" to the console. The exclamation point is added to the end of the name and the newline character \n is for formatting purposes.
    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
Finally, we end the program here. Type it all into a file called "helloname.asm" and run it with the "run" script:
Enter your name when it prompts you, and you should see something like the following:
Please enter your name: Brian
Hello, Brian
!
0
Okay, not quite what we were going for. Why is the exclamation point on its own line? To troubleshoot the problem, try returning the number of characters entered by the user as the program status code to see how many characters the OS thinks we entered. Change the following:
mov rdi, success
to:
mov rdi, [name_len]
This will report the number of characters we enter as the program status code so we can get some feedback. Make the change, save the file, and rerun it. You should see something more like this:
Please enter your name: Brian
Hello, Brian
!
6
6?! I only typed 5 letters! The thing is, the operating system includes the newline produced by the enter key pressed after typing the name. So for the name "Brian", the actual string we get back is "Brian\n". That extra newline is not part of the data; it's just formatting. We can prevent the newline from being written by subtracting 1 from the value of name_len. Even though the string will still have a newline after it (we can't stop the operating system from including it), we can ignore it by only paying attention to the first 5 characters.
Change the following section:
    ; Store the number of characters entered by the user
    mov [name_len], rax
to:
    ; Store the number of characters entered by the user
    dec rax
    mov [name_len], rax
rax contains the number of characters entered by the user. Before saving that value to name_len for later use, we now decrement that value. This means to subtract 1 from it. The instruction dec rax subtracts 1 from whatever value happens to be in rax. If you entered 6 characters including the enter key, this will change it to 5. If you entered 8, this will change it to 7.
By subtracting 1 from the number of characters we write out, we effectively ignore the last character in the string by printing only the part of the string we care about.
Make the change, save, and rerun. You should now get something like this:
Please enter your name: Brian
Hello, Brian!
5
The formatting is no longer messed up. We're ignoring the last character in the string by printing one fewer than the number of characters the operating system returned. The trailing newline is not printed, so our exclamation point appears on the same line as the name.
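Subtracting 1 unconditionally assumes the input always ends with a newline. That holds when a short name is typed and enter is pressed, but not when the input is piped in without a trailing newline or when the line is longer than the 64 bytes we asked for. The following is a sketch of a more careful variant that only drops the last byte if it really is a newline. It is a cut-down version of "helloname.asm" (no prompt, no "Hello, "), and the labels check_newline and store_len, along with the cmp, jg, jne, and jmp instructions, are additions for this example.

```nasm
; Sketch: strip the trailing newline only when one is present.
%define sys_exit 60
%define sys_read 0
%define sys_write 1

%define stdin 0
%define stdout 1

%define success 0

%define newline 10

section .bss

%define name_max_len 64
name: resb name_max_len
name_len: resq 1

section .data

response_end: db "!", newline
response_end_len: equ $-response_end

section .text
global _start
_start:
    ; Read the user's name from the console
    mov rax, sys_read
    mov rdi, stdin
    mov rsi, name
    mov rdx, name_max_len
    syscall

    ; If nothing was read (0 or a negative error code), use a length of 0
    cmp rax, 0
    jg check_newline
    mov rax, 0
    jmp store_len

check_newline:
    ; The last byte read is at address name + rax - 1
    cmp byte [name + rax - 1], newline
    jne store_len           ; no trailing newline, keep the full length
    dec rax                 ; trailing newline found, ignore it

store_len:
    ; Store the number of characters to print
    mov [name_len], rax

    ; Write the user's name
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, name
    mov rdx, [name_len]
    syscall

    ; Write the end of the response
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, response_end
    mov rdx, response_end_len
    syscall

    ; End the program
    mov rax, sys_exit
    mov rdi, success
    syscall
```

Typing "Brian" and pressing enter should print "Brian!" on one line, the same as the dec rax version. The difference only shows up for input that does not end in a newline, where the simpler version would cut off the last real character.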