Understanding a C program’s compilation process

Tuesday, June 12th, 2007

I still remember my first program on Linux; it was a simple one that printed my name on the console.

I used the vi text editor, an editor in the same vein as GNU Emacs, to write and edit my program, and the GNU C compiler to compile it. Most of you are probably familiar with compiling, and like many of you, I knew it as the process of converting a source file into an executable file that can run on a computer. Exploring this process further on a Linux system, however, revealed more stages and sub-processes, which left me wanting to learn more.

I could see that it is a step-by-step process. The source file is first preprocessed, which resolves all preprocessor directives such as #include, #define and #if. Next, the preprocessed file is translated into an assembly file containing processor-specific assembly instructions. The assembly file is then assembled into a relocatable binary (an object file), and finally the object file is linked into the executable file that runs on your computer. Technically speaking, the steps from the source file up to the relocatable binary are referred to as the compiling process, and the step from the relocatable binary to the executable as the build process. We can sum it up by saying that the GNU C compiler is a collection of tools, such as the C compiler proper, the GNU assembler and the linker, which together do the job of compiling and building.

 

Let us now pause at each stage of compilation and see what has become of the main function.

This is the source file (demo.c):

 

#include <stdio.h>

#define VALUE 20

main ()
{
  printf ("Hello World!%d \n", VALUE);
}

The snippet below shows the preprocessed stage. Here you can see that the contents of the header file have been inserted into the program and that VALUE has been replaced with 20 in the main function.

To stop at this stage of compilation, issue the following command:

#gcc -E demo.c -o demo.i

(Here demo.c is the source file, -E stops the compiler after preprocessing, and -o writes the output to demo.i.)

The output can be viewed in a text editor such as vi.

Note: This .i file has been trimmed for our understanding.

# 1 "demo.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "demo.c"

# 1 "/usr/include/stdio.h" 1 3 4
# 28 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 314 "/usr/include/features.h" 3 4
# 1 "/usr/include/sys/cdefs.h" 1 3 4
# 315 "/usr/include/features.h" 2 3 4
# 337 "/usr/include/features.h" 3 4
# 1 "/usr/include/gnu/stubs.h" 1 3 4

typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;	  
# 837 "/usr/include/stdio.h" 3 4

# 3 "demo.c" 2
main ()
{
  printf ("Hello World!%d \n", 20);
}

Let’s look at the assembly file. Here we can see the assembly instructions for our main function; a single line of C code has generated many lines of assembly.

To stop at this stage of compilation, issue the following command:

#gcc -S demo.i -o demo.s

The output can be viewed in a text editor such as vi.

        .file   "demo.c"
        .section        .rodata
.LC0:
        .string "Hello World!%d \n"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $20, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.1.0 20060304 (Red Hat 4.1.0-3)"
        .section        .note.GNU-stack,"",@progbits

Now we come to the relocatable file. Look at what happens to main: the listing contains some mysterious numbers and symbols. These are relocatable addresses; for example, the zero operand of the second movl and the operand of the call are placeholders that the linker fills in later. For now, let’s not worry about them.

To stop at this stage of compilation, issue the following command:

#gcc -c demo.s -o demo.o

Here the output cannot be viewed in a text editor because it is a binary file. To inspect this type of file on Linux, there is a tool called objdump.

Note: The objdump output below has been trimmed for our understanding.

#objdump -D demo.o

demo.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  0xfffffffc(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 14                sub    $0x14,%esp
  11:   c7 44 24 04 14 00 00    movl   $0x14,0x4(%esp)
  18:   00
  19:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  20:   e8 fc ff ff ff          call   21 <main+0x21>
  25:   83 c4 14                add    $0x14,%esp
  28:   59                      pop    %ecx
  29:   5d                      pop    %ebp
  2a:   8d 61 fc                lea    0xfffffffc(%ecx),%esp
  2d:   c3                      ret

Now for the final executable. Apart from the program we have written, it contains some additional code called the runtime code. This runtime code is ultimately responsible for arranging the system resources that our program needs in order to execute on a given computer.

To produce the executable, issue the following command:

#gcc demo.o -o demo

Note: The runtime code in this executable is not shown; only the disassembly of main is listed.

08048384 <main>:
 8048384:       8d 4c 24 04             lea    0x4(%esp),%ecx
 8048388:       83 e4 f0                and    $0xfffffff0,%esp
 804838b:       ff 71 fc                pushl  0xfffffffc(%ecx)
 804838e:       55                      push   %ebp
 804838f:       89 e5                   mov    %esp,%ebp
 8048391:       51                      push   %ecx
 8048392:       83 ec 14                sub    $0x14,%esp
 8048395:       c7 44 24 04 14 00 00    movl   $0x14,0x4(%esp)
 804839c:       00
 804839d:       c7 04 24 60 84 04 08    movl   $0x8048460,(%esp)
 80483a4:       e8 0f ff ff ff          call   80482b8 <printf@plt>
 80483a9:       83 c4 14                add    $0x14,%esp
 80483ac:       59                      pop    %ecx
 80483ad:       5d                      pop    %ebp
 80483ae:       8d 61 fc                lea    0xfffffffc(%ecx),%esp
 80483b1:       c3                      ret
 80483b2:       90                      nop
 80483b3:       90                      nop

And finally, to run your executable, use the following command:

# ./demo

Understanding this whole process helps us a lot. For example, it lets us debug a program at different stages of compilation, since debugging at the source level is not always enough, and it helps us optimize our code to occupy less memory.

The GNU C compiler is free software; you can download it and, if needed, change it to create your own customized compiler. No wonder the GNU C compiler is used on more platforms than other compilers. You can also choose among several C compilers, such as GNU C and Turbo C, from different vendors, rather than being tied to a proprietary compiler that comes from a single vendor. This, to me, symbolizes the true power and freedom that only come with free software like the GNU system.