C is a high-level language, and it needs a compiler to convert source code into executable code that can run on our machine. Knowing how compilation works can be very helpful both when writing code and when debugging.
This video explains the theory behind the C compilation process:
The following video shows the output of the C compilation process step by step:
I will be using a system with the following configuration:
OS – Ubuntu 16.04
I will be using the VIM editor for writing and viewing the files. This post won’t cover how to use VIM. That said, let’s begin.
Compilation of a C program is a multi-stage process. At an overview level, the process can be split into four separate stages: preprocessing, compilation, assembly, and linking. Traditional C compilers orchestrate this process by invoking other programs to handle each stage.
For the purpose of this post, I will use a simple C program:
#include<stdio.h>
#define MAX_NUM 25

int main(void){
    printf("Hello world\n");
    printf("The value of the MAX_NUM is = %d\n",MAX_NUM);
    return 0;
}
Let’s call this program comp_process.c.
In this post, I’ll walk through each of the four stages of compiling the C program:
Preprocessing:
This is the first phase in the compilation process. This phase includes:
- Removal of Comments – All the comments that we have in the file will be removed
- Expansion of Macros – All the macros will be expanded. In our program, the MAX_NUM macro will be replaced with the number 25
- Expansion of the included files – The contents of the ‘stdio.h’ file will be copied to the output file
During a normal compilation, the preprocessed file is not saved to disk. To have all the intermediate files saved to disk, we have to use the -save-temps option while compiling. We will look into that later.
For now, we will run the preprocessor on its own and store its output in the comp_process.i file. To preprocess a file, use the following command:
cpp comp_process.c > comp_process.i
In the image below, you can see the output of the command
We can also view the contents of the preprocessed file. Execute the following command:
vi comp_process.i
At the end of the file, we can see that the #include and the #define lines are missing. The #include line has been replaced by the contents of the stdio.h file and the #define macro has been expanded. In the image below, you can see that the MAX_NUM has been replaced by 25 and the comments are also removed.
We will now use the output of this step, i.e., the comp_process.i file for the next step in the compilation process.
Compilation:
Yes, that’s correct: compilation is the second step in the process of compilation. In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate, human-readable language. The output of this stage is a .s file (comp_process.i –> comp_process.s). This step is accomplished by passing the -S option and the preprocessed (.i) file to gcc.
gcc -S comp_process.i
You can see in the image that the .s file is generated after executing the mentioned command. Let us now see what it holds:
Assembly
In this stage, the assembler is used to translate the assembly instructions to machine code, or object code. The output consists of actual instructions to be run by the target processor. In order to invoke the assembler, use the following command:
as comp_process.s -o comp_process.o
The above command generates a file named comp_process.o containing the object code of the program. The contents of this file are in a binary format and can be inspected using hexdump or od by running either of the following commands:
- hexdump – It displays the contents of the file in hexadecimal format
hexdump comp_process.o
- od – It displays the contents in octal format.
od comp_process.o
Linking
The last step in the compilation process is linking. The object code generated in the assembly stage contains the machine instructions that the processor understands. However, some pieces of the program are out of order or missing. In order to produce an executable program, the existing pieces have to be rearranged and the missing ones filled in. This process is called linking. Moreover, this stage links all the function calls with their definitions, since the linker knows where these functions are implemented. In the case of our program, the linker will connect the call to printf with its implementation in the C library. The result of this stage is the final executable program. When run without options, gcc will name this file a.out. To name the file something else, pass the -o option to gcc:
Invoking gcc without the -o option
gcc comp_process.c
Now, to execute the file run the following command
./a.out
Invoking gcc with the -o option
gcc -o compro comp_process.c
To execute the file, run the following command
./compro
You can also see the output generated by the file
Save all the intermediate files
Invoking the gcc command on the C file generates only the final output file. If you need the intermediate files, pass the -save-temps option to gcc, for example gcc -save-temps comp_process.c. This saves the preprocessed (.i), assembly (.s), and object (.o) files to disk along with the executable.
Vivek is a Senior Embedded Innovation Specialist. He has been working on Embedded Systems for the past 10 years. He loves to share his knowledge and train those who are interested. Nerdyelectronics.com was started out of this interest.