Archive for March, 2012

ARM Linux Booting Process

Friday, March 30th, 2012

We will look at the boot process of the Linux kernel on the AT91RM9200 system-on-chip, which is built around the ARM920T ARM Thumb processor. KwikByte builds an embedded board called kb9202 based on the AT91RM9200. We will take this board as an example and see how Linux boots up on it.

Before you start reading this, you should read the AT91RM9200 data sheet (specification). You can download the data sheet from the following link.

www.keil.com/dd/docs/datashts/atmel/at91rm9200_ds.pdf

You also need to read the ARM Architecture Reference Manual to better understand the boot process. You can download it from the following link.

http://www.lysator.liu.se/~kjell-e/embedded/ARM-ARM.pdf

Components in Boot Process:

The Linux boot sequence involves execution of the following components:

  1. Boot-loader initialization
  2. Kernel initialization
  3. User space initialization

Boot-loader:

A boot-loader is a small program which loads the kernel image into RAM and boots it. It is also called a bootstrap, as it brings (pulls) up the system by loading an operating system. The boot-loader runs before any other software, initializes the processor and makes the CPU ready to execute a program such as an operating system. Most processors have a default address from which the first bytes of code are fetched when power is applied or the board is reset. Hardware designers use this information to store the boot-loader code at that address in ROM or flash. Since it must initialize the CPU and run a program located at an architecture-specific address, a boot-loader is highly processor-specific and board-specific. Every embedded board comes with a bootstrap to download the kernel image or a standalone application into the board and start executing it. The boot-loader is executed when power is applied to a processor board. Basically, it has a minimal set of features to load the image and boot it.

It is also possible to control the system using a hardware debug interface such as JTAG. This interface may be used to write the boot-loader program into bootable non-volatile memory (e.g. flash) by instructing the processor core to perform the actions needed to program the non-volatile memory. This is generally done the first time, to download the basic boot-loader, and for recovery. JTAG is a standard and popular interface provided by many board vendors. Some micro-controllers provide special hardware interfaces which cannot be used to take arbitrary control of a system or to run code directly; instead, they allow boot code to be inserted into bootable non-volatile memory (like flash memory) via simple protocols. At the manufacturing phase, such interfaces are used to inject boot code (and possibly other code) into non-volatile memory. After a system reset, the micro-controller begins to execute the code programmed into its non-volatile memory, just as ordinary processors use ROMs for booting. In many cases such interfaces are implemented by hardwired logic. In other cases they are implemented by software running in an integrated on-chip boot ROM, driven through GPIO pins.

There are also third-party boot-loaders available which provide a rich set of features and an easy user interface. You can download these third-party boot-loaders into the board and make them the default boot-loader for your board. Boot-loaders provided by board vendors are generally replaced with these third-party boot-loaders. There are quite a few third-party boot-loaders available; some of them are open source (or free) and some are commercial. Some of them are Das U-Boot, RedBoot, GRUB (for desktops), LILO, Loadlin, bootsect-loader, SYSLINUX, EtherBoot and ELILO.

We will look at the U-boot boot-loader. U-boot is a widely used boot-loader in embedded systems. I will explain code from the u-boot-2010.03 source. You can download U-boot from the following site.

http://www.denx.de/wiki/U-Boot

How U-boot is built:

————————-

Based on the U-boot configuration, all the assembly files (.S) and C files (.c) are compiled using a cross compiler built for a particular architecture, and object files (.o) are generated. All these object files are linked by the linker, and an executable file is created. An object file or executable file is a collection of sections such as .text, .data and .bss, stored in a file format such as ELF. The sections of the object files are arranged in the executable file according to a script called the linker script. This script tells where each section is to be loaded in memory when the program runs. Understanding this script is very important to know how the boot-loader and kernel are composed and how their different sections are loaded into memory.

Generally, when a program is run (executed), a loader reads the executable file, loads its different sections at the specified memory locations and starts executing the start function (entry point) specified in the linker script. But when you want to run (load) a boot-loader, there is no loader to load the different sections of the executable file into memory (which basically requires understanding the file format). Instead, you use a tool called objcopy, which takes all sections from the executable file and creates a binary file without any file format. This binary file can be loaded into memory and executed, or written into ROM at a particular (architecture-specific) address, from which the CPU executes it when power is applied to the board. You can find a good tutorial on linker scripts at the following location.

http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/gnu-linker/scripts.html
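To make the objcopy step concrete, here is a minimal sketch of what `-O binary` does conceptually: each loadable section's bytes are copied to offset (address - base) in a flat buffer, gaps between sections are zero-filled, and NOLOAD sections such as .bss contribute nothing. The `struct section` type and `flatten_sections()` are hypothetical names for illustration, not objcopy's actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one output section of a linked image. */
struct section {
    const char *name;
    uint32_t addr;          /* load address assigned by the linker script */
    const uint8_t *data;    /* section contents */
    size_t size;
    int noload;             /* 1 for NOLOAD sections such as .bss */
};

/*
 * Lay the loadable sections out in 'out' as a flat image starting at 'base'.
 * Returns the image length (end of the last loadable section minus base),
 * or 0 if a section lies below 'base' or does not fit in 'out_cap'.
 */
size_t flatten_sections(const struct section *secs, size_t n, uint32_t base,
                        uint8_t *out, size_t out_cap)
{
    size_t len = 0;

    memset(out, 0, out_cap);            /* gaps between sections read as 0 */
    for (size_t i = 0; i < n; i++) {
        if (secs[i].noload || secs[i].size == 0)
            continue;                   /* .bss has no bytes in the file */
        if (secs[i].addr < base)
            return 0;
        size_t off = secs[i].addr - base;
        if (off + secs[i].size > out_cap)
            return 0;
        memcpy(out + off, secs[i].data, secs[i].size);
        if (off + secs[i].size > len)
            len = off + secs[i].size;
    }
    return len;
}
```

Note that .bss adds nothing to the image length. That is why the start-up code has to zero it at run time.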

File: cpu/arm920t/u-boot.lds

 32 OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
 33 OUTPUT_ARCH(arm)
 34 ENTRY(_start)
 35 SECTIONS
 36 {
 37         . = 0x00000000;
 38 
 39         . = ALIGN(4);
 40         .text :
 41         {
 42                 cpu/arm920t/start.o     (.text)
 43                 *(.text)
 44         }
 45 
 46         . = ALIGN(4);
 47         .rodata : { *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) }
 48 
 49         . = ALIGN(4);
 50         .data : { *(.data) }
 51 
 52         . = ALIGN(4);
 53         .got : { *(.got) }
 54 
 55         . = .;
 56         __u_boot_cmd_start = .;
 57         .u_boot_cmd : { *(.u_boot_cmd) }
 58         __u_boot_cmd_end = .;
 59 
 60         . = ALIGN(4);
 61         __bss_start = .;
 62         .bss (NOLOAD) : { *(.bss) . = ALIGN(4); }
 63         _end = .;
 64 }

OUTPUT_FORMAT in line #32 specifies the file format of the executable file. Here the executable file format is elf32 and the endianness is little endian. OUTPUT_ARCH in line #33 specifies the architecture on which this code runs. ENTRY in line #34 specifies the start function (entry point) of the u-boot program. Here the entry point is _start. SECTIONS in line #35 defines how the different sections are mapped in the executable file. A loader uses the addresses specified here to load the different sections of the program into memory. ‘.’ (the location counter) in line #37 specifies the start address where the following sections should be placed. In this case the start address is 0x00000000. After this, in line #39 the address is aligned to 4 bytes, and the .text section follows in line #40.

 40    .text :
 41         {
 42                 cpu/arm920t/start.o     (.text)
 43                 *(.text)
 44         }

At the ‘.’ position (0x00000000), the code from cpu/arm920t/start.o is mapped first, followed by the code in the .text sections of all other object (.o) files. cpu/arm920t/start.o contains the _start() function (in assembly language), which is the entry point of this program.

Now ‘.’ is at 0x00000000 + sizeof(.text). Again the address is aligned to 4 bytes, and the .rodata section follows in line #47.

 46         . = ALIGN(4);
 47         .rodata : { *(SORT_BY_ALIGNMENT(SORT_BY_NAME(.rodata*))) }

.rodata sections from all object files are mapped at this address. The .data and .got sections follow.

 49         . = ALIGN(4);
 50         .data : { *(.data) }
 51 
 52         . = ALIGN(4);
 53         .got : { *(.got) }
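The location-counter arithmetic is easy to check by hand. This small sketch, assuming made-up section sizes, computes where .text, .rodata and .data would start under the `. = ALIGN(4)` rules of this script. `align_up()` rounds an address up to the next multiple of a power-of-two alignment, which is what ALIGN(4) does to ‘.’; `place_sections()` and `struct layout` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Start addresses computed for three sections. */
struct layout {
    uint32_t text, rodata, data;
};

/* Hypothetical section sizes, in bytes; real values come from the object files. */
static struct layout place_sections(uint32_t base, uint32_t text_size,
                                    uint32_t rodata_size)
{
    struct layout l;
    uint32_t dot = align_up(base, 4);   /* . = 0x00000000; . = ALIGN(4); */

    l.text = dot;
    dot += text_size;                   /* . = 0x00000000 + sizeof(.text) */
    dot = align_up(dot, 4);             /* . = ALIGN(4); before .rodata */
    l.rodata = dot;
    dot += rodata_size;
    dot = align_up(dot, 4);             /* . = ALIGN(4); before .data */
    l.data = dot;
    return l;
}
```

For example, with a .text of 0x1235 bytes, .rodata starts at 0x1238 rather than 0x1235; the three padding bytes come from ALIGN(4).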

Each U-boot command is an object of type ‘cmd_tbl_t’, which contains the command name, a help string and a pointer to the function to be executed when the command is run. All these command objects are placed in memory sequentially. Each command object is built into a U-boot-defined section called .u_boot_cmd in the object file. All these .u_boot_cmd sections are placed in memory after the above sections (.data and .got).

 

 55         . = .;
 56         __u_boot_cmd_start = .;
 57         .u_boot_cmd : { *(.u_boot_cmd) }
 58         __u_boot_cmd_end = .;

__u_boot_cmd_start marks the start of the command objects and __u_boot_cmd_end marks their end. Next come the .bss (uninitialized global variables) sections.

 60         . = ALIGN(4);
 61         __bss_start = .;
 62         .bss (NOLOAD) : { *(.bss) . = ALIGN(4); }
 63         _end = .;

__bss_start marks the start address of .bss, and _end marks the end of all sections.
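The .u_boot_cmd trick (objects scattered across many source files, collected into one section, walked between a start and an end symbol) can be reproduced in a hosted C program. The sketch below is not U-boot's U_BOOT_CMD macro; it is a simplified stand-in, assuming GCC or Clang on an ELF system with GNU ld or a compatible linker. It places each command object in a section named `my_cmds` (a hypothetical name). Because that name is a valid C identifier, the linker automatically defines `__start_my_cmds` and `__stop_my_cmds`, which play the role of `__u_boot_cmd_start` and `__u_boot_cmd_end` in the linker script above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for U-boot's cmd_tbl_t. */
struct my_cmd {
    const char *name;
    const char *help;
    int (*fn)(int argc, char *const argv[]);
};

/* Put one command object into the "my_cmds" section; 'used' keeps it from
 * being discarded even though nothing references it by name. */
#define MY_CMD(sym, nm, hlp, f)                                          \
    static const struct my_cmd sym                                       \
    __attribute__((used, section("my_cmds"), aligned(sizeof(void *)))) = \
    { nm, hlp, f }

/* Defined by the linker for a section whose name is a C identifier. */
extern const struct my_cmd __start_my_cmds[];
extern const struct my_cmd __stop_my_cmds[];

static int do_version(int argc, char *const argv[]) { (void)argc; (void)argv; return 1; }
static int do_reset(int argc, char *const argv[])   { (void)argc; (void)argv; return 2; }

MY_CMD(cmd_version, "version", "print version", do_version);
MY_CMD(cmd_reset,   "reset",   "reset the board", do_reset);

/* Walk the table the way U-boot walks __u_boot_cmd_start..__u_boot_cmd_end. */
static const struct my_cmd *my_find_cmd(const char *name)
{
    for (const struct my_cmd *c = __start_my_cmds; c < __stop_my_cmds; c++)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```

Nothing in the program lists the commands explicitly; adding a command is just one more MY_CMD() line in any source file, which is the property U-boot gets from its .u_boot_cmd section.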

Using this linker script, the linker generates an executable file called u-boot. The objcopy tool is then used to generate a binary file from the u-boot executable file:

u-boot.bin:	u-boot
	$(OBJCOPY) ${OBJCFLAGS} -O binary $< $@

The U-boot binary file is copied to the board's RAM or written to flash. The AT91RM9200 comes with a boot program, a tiny program called the Atmel bootstrap, that can be used to download an image into flash or RAM and start executing it, or to reset a corrupted board. U-boot is stored in flash (or internal RAM, if it is small enough); when power is applied to the board, it is downloaded to RAM and executed. For this board (AT91RM9200), the code is always downloaded from device address 0x0000_0000 to address 0x0000_0000 of the SRAM after remap. That's why we gave 0x00000000 as the start address of the .text section. If you want to load U-boot anywhere in RAM and execute it there, you need to build it as position-independent code (PIC); instruction addresses are then computed as offsets from the PC (program counter) register. So the downloaded code must either be position-independent or be linked at address 0x0000_0000. For the purpose of explanation, assume that U-boot is linked at 0x00000000, and that the boot program downloaded U-boot from the DataFlash (check the AT91RM9200 spec for the download process) and called the U-boot entry point.

U-boot Execution:

———————-

As specified in the linker script, the U-boot starting function (entry point) is _start():cpu/arm920t/start.S.

File: cpu/arm920t/start.S

_start is written in assembly language. The first instruction executed in _start() is a branch to start_code:cpu/arm920t/start.S.

.globl _start

	_start: b       start_code

start_code performs the following tasks.

1) Set the CPU to supervisor mode. The current processor status is held in the Current Program Status Register (CPSR). The CPSR holds:

• four ALU flags (Negative, Zero, Carry, and Overflow),

• two interrupt disable bits, one for each type of interrupt (FIQ and IRQ),

• one bit to indicate ARM or Thumb execution,

• five bits to encode the current processor mode

Check the ARM processor data sheet for more details on the CPSR register.

        /*
         * set the cpu to SVC32 mode
         */
        mrs     r0, cpsr                /* load cpsr register into r0 */
        bic     r0, r0, #0x1f           /* clear the low 5 bits of cpsr (processor mode) */
        orr     r0, r0, #0xd3           /* supervisor mode (10011) + I and F set (bits 7:5 = 110) */
        msr     cpsr, r0                /* write r0 to cpsr */
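The mrs/bic/orr/msr sequence is plain bit arithmetic on the CPSR value. The hypothetical helper below reproduces it in C so the constants can be checked: 0x1f masks the five mode bits, and 0xd3 sets the mode to 0b10011 (SVC, 0x13) together with the I (0x80) and F (0x40) disable bits. The sequence does not touch the T bit (0x20) or the condition flags.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1fu   /* M[4:0] */
#define CPSR_MODE_SVC  0x13u   /* 0b10011 */
#define CPSR_T_BIT     0x20u   /* Thumb state */
#define CPSR_F_BIT     0x40u   /* 1 = FIQ disabled */
#define CPSR_I_BIT     0x80u   /* 1 = IRQ disabled */

/* C model of: bic r0, r0, #0x1f ; orr r0, r0, #0xd3 */
static uint32_t cpsr_to_svc_irqs_off(uint32_t cpsr)
{
    cpsr &= ~CPSR_MODE_MASK;                            /* bic r0, r0, #0x1f */
    cpsr |= CPSR_I_BIT | CPSR_F_BIT | CPSR_MODE_SVC;    /* orr r0, r0, #0xd3 */
    return cpsr;
}
```

For example, starting from 0x60000010 (user mode, Z and C flags set), the result is 0x600000d3: the flags in the top bits survive unchanged. enable_interrupts(), described later, does the reverse for IRQs by clearing only CPSR_I_BIT.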

Try to relocate the U-boot code to RAM if we are running from flash.

#ifndef CONFIG_SKIP_RELOCATE_UBOOT
relocate:                               /* relocate U-Boot to RAM           */
        adr     r0, _start              /* r0 <- current position of code   */
        ldr     r1, _TEXT_BASE          /* test if we run from flash or RAM */
        cmp     r0, r1                  /* don't reloc during debug         */
        beq     stack_setup

        ldr     r2, _armboot_start      /* _armboot_start is defined as
                                         * .word _start at line #436 in cpu/arm920t/start.S
                                         */
        ldr     r3, _bss_start          /* _bss_start holds __bss_start,
                                         * which is defined in the linker script */
        sub     r2, r3, r2              /* r2 <- size of armboot (size of all sections) */
        add     r2, r0, r2              /* r2 <- source end address        */

copy_loop:
        ldmia   r0!, {r3-r10}           /* copy from source address [r0]    */
        stmia   r1!, {r3-r10}           /* copy to   target address [r1]    */
        cmp     r0, r2                  /* until source end address [r2]    */
        ble     copy_loop
#endif  /* CONFIG_SKIP_RELOCATE_UBOOT */
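In C, the relocation step amounts to the minimal sketch below. It is a hosted model under stated assumptions: `flash` and `ram` are ordinary buffers standing in for the two memory regions, `len` corresponds to `_bss_start - _armboot_start`, and `relocate_image()` is a hypothetical name. It skips the copy when the code already runs at its link address, like the `cmp r0, r1; beq stack_setup` test. One difference: the assembly loop moves 32 bytes (eight registers) per iteration and tests with `ble`, so it may copy up to one extra 32-byte block past the source end; the sketch copies exactly the requested length, rounded up to whole words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy 'len' bytes (rounded up to whole 32-bit words) from the image at
 * 'src' to its link address 'dst'. Returns 0 if no copy was needed.
 */
static int relocate_image(const uint32_t *src, uint32_t *dst, size_t len)
{
    if (src == dst)                 /* already running at _TEXT_BASE */
        return 0;

    const uint32_t *end = src + (len + 3) / 4;   /* source end address */
    while (src < end)
        *dst++ = *src++;            /* ldmia/stmia, one word at a time */
    return 1;
}
```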

Set up the stack now. U-boot code has been relocated to _TEXT_BASE (defined in board/kb9202/config.mk). The stack is set up below this address.

stack_setup:

        ldr     r0, _TEXT_BASE                          /* load r0 with _TEXT_BASE */
        sub     r0, r0, #CONFIG_SYS_MALLOC_LEN          /* r0 = r0 - CONFIG_SYS_MALLOC_LEN
                                                         * (malloc area) */
        sub     r0, r0, #CONFIG_SYS_GBL_DATA_SIZE       /* r0 = r0 - CONFIG_SYS_GBL_DATA_SIZE
                                                         * (holds the global board data) */

Point the stack pointer sp just below r0.

        sub     sp, r0, #12             /* sp = r0 - 12 (leave 3 words (12 bytes) for the abort stack) */
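The three subtractions are easier to follow with numbers. The sketch below uses made-up configuration values (the real ones come from the board configuration) and a hypothetical `compute_layout()` helper to show where the malloc area, the global data area and the initial stack pointer end up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example values; the real ones come from the board configuration. */
#define TEXT_BASE                0x21F00000u
#define CONFIG_SYS_MALLOC_LEN    (128u * 1024u)
#define CONFIG_SYS_GBL_DATA_SIZE 128u

struct mem_layout {
    uint32_t malloc_start;   /* lowest address of the malloc area */
    uint32_t gbl_data;       /* start of the gd_t/bd_t area */
    uint32_t sp;             /* initial stack pointer */
};

/* Mirror of the subtractions done in stack_setup. */
static struct mem_layout compute_layout(uint32_t text_base)
{
    struct mem_layout m;

    m.malloc_start = text_base - CONFIG_SYS_MALLOC_LEN;
    m.gbl_data = m.malloc_start - CONFIG_SYS_GBL_DATA_SIZE;
    m.sp = m.gbl_data - 12;          /* 3 words for the abort stack */
    return m;
}
```

With these values, the malloc area spans 0x21EE0000 up to TEXT_BASE, the global data starts at 0x21EDFF80, and sp starts at 0x21EDFF74. The stack grows downward from there, so it never overwrites the reserved areas above it.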

clear_bss:

        ldr     r0, _bss_start          /* load start of bss segment into r0
                                         * (_bss_start holds __bss_start from the linker script) */
        ldr     r1, _bss_end            /* load end of bss segment into r1
                                         * (_bss_end holds _end from the linker script) */
        mov     r2, #0x00000000         /* 0 will be written to the bss memory */

clbss_l:str     r2, [r0]                /* store 0 in the 4 bytes at r0 (*r0 = r2) */
        add     r0, r0, #4              /* move 4 bytes ahead, r0 = r0 + 4 */
        cmp     r0, r1                  /* compare r0 and r1 */
        ble     clbss_l                 /* if (r0 <= r1) then goto clbss_l; */

Call  start_armboot(), a C function.

ldr     pc, _start_armboot

        _start_armboot: .word start_armboot

start_armboot() is defined in lib_arm/board.c and is common to all ARM-based boards. The following are the tasks performed by start_armboot().

void start_armboot(void):

1)  Allocate memory for the global data structure gd_t, which is defined in include/asm-arm/global_data.h. When we set up the stack, we left some space (CONFIG_SYS_GBL_DATA_SIZE) for the gd_t data structure below the malloc area under _TEXT_BASE, where U-boot has been relocated.

gd = (gd_t*)(_armboot_start - CONFIG_SYS_MALLOC_LEN - sizeof(gd_t));
memset ((void*)gd, 0, sizeof (gd_t));

2)  Some information, like the architecture number (unique board id), the boot params that have to be passed to the kernel image and the baud rate, is stored in a data structure called bd_t, whose pointer is stored in gd_t. Memory for this bd_t is allocated just below the gd_t.

gd->bd = (bd_t*)((char*)gd - sizeof(bd_t));
memset (gd->bd, 0, sizeof (bd_t));
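The pointer arithmetic in steps 1 and 2 can be modeled on an ordinary buffer. In this sketch, `struct fake_gd` and `struct fake_bd` are hypothetical, heavily reduced stand-ins for gd_t and bd_t, and `region_top` plays the role of `_armboot_start - CONFIG_SYS_MALLOC_LEN`. (In U-boot on ARM, gd itself lives in a register; here it is a plain local pointer.)

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-ins for bd_t and gd_t. */
struct fake_bd {
    unsigned long bi_baudrate;
    unsigned long bi_arch_number;
    unsigned long bi_boot_params;
};

struct fake_gd {
    struct fake_bd *bd;
    unsigned long flags;
    unsigned long baudrate;
};

/*
 * Carve gd just below 'region_top' and bd just below gd, then zero both,
 * following the same pattern as start_armboot().
 */
static struct fake_gd *carve_global_data(char *region_top)
{
    struct fake_gd *gd = (struct fake_gd *)(region_top - sizeof(struct fake_gd));
    memset(gd, 0, sizeof *gd);

    gd->bd = (struct fake_bd *)((char *)gd - sizeof(struct fake_bd));
    memset(gd->bd, 0, sizeof *gd->bd);
    return gd;
}
```

No allocator is involved: both structures are placed at fixed offsets inside the area that stack_setup reserved, which is why they can be used before malloc is initialized.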

3)  Call a set of functions which initialize all subsystems.

for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
        if ((*init_fnc_ptr)() != 0) {
                hang ();
        }
}

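init_sequence is an array of function pointers defined in lib_arm/board.c, terminated by a NULL entry. The above loop takes each function pointer in turn and calls it; if any function returns a non-zero value, hang() stops the boot.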
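The table-driven loop is a common C idiom. Here is a self-contained sketch with hypothetical init functions: the array ends with a NULL sentinel, and the loop stops at the first function that returns non-zero. U-boot would call hang() at that point; the sketch returns the failing index instead so it can be tested on a host.

```c
#include <assert.h>
#include <stddef.h>

typedef int (init_fnc_t)(void);

static int init_calls;   /* counts how many init functions ran */

/* Hypothetical init steps standing in for board_init, timer_init, ... */
static int fake_board_init(void)  { init_calls++; return 0; }
static int fake_timer_init(void)  { init_calls++; return 0; }
static int fake_serial_init(void) { init_calls++; return -1; }  /* fails */
static int fake_never_run(void)   { init_calls++; return 0; }

static init_fnc_t *fake_init_sequence[] = {
    fake_board_init,
    fake_timer_init,
    fake_serial_init,
    fake_never_run,
    NULL,                       /* terminates the table */
};

/* Returns -1 if every step succeeded, else the index of the failing step. */
static int run_init_sequence(init_fnc_t **seq)
{
    for (init_fnc_t **p = seq; *p; ++p)
        if ((*p)() != 0)
            return (int)(p - seq);   /* U-boot calls hang() here */
    return -1;
}
```

Here the third step fails, so only three functions run and fake_never_run() is never called. Adding a new init step means adding one table entry, with no change to the loop.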

The following are some of the important functions called.

board_init():

int board_init (void)
{
        /* Enable Ctrlc */
        console_init_f ();

        /* memory and cpu-speed are setup before relocation */
        /* so we do _nothing_ here */

        gd->bd->bi_arch_number = MACH_TYPE_KB9200;

        /* address of boot parameters */
        gd->bd->bi_boot_params = PHYS_SDRAM + 0x100;

        return 0;
}

This is a board-specific function and must be defined by each board; it performs any board-specific initialization that is needed. When you port u-boot to a new board, you must define this function. For the kb9202 board this function is defined in board/kb9202/kb9202.c. It just sets the board's machine number and tells where the boot params for Linux are stored.

timer_init():

This is a CPU-specific function, and each CPU's code must define it. It basically initializes the timer services of the CPU. timer_init() for the AT91RM9200 CPU is defined in cpu/arm920t/at91rm9200/timer.c.

init_baudrate():

This architecture-specific function sets the baud rate for serial port communication. For ARM, this function is defined in lib_arm/board.c.

static int init_baudrate (void)
{
        char tmp[64];   /* long enough for environment variables */
        int i = getenv_r ("baudrate", tmp, sizeof (tmp));

        gd->bd->bi_baudrate = gd->baudrate = (i > 0)
                        ? (int) simple_strtoul (tmp, NULL, 10)
                        : CONFIG_BAUDRATE;

        return (0);
}

If the ‘baudrate’ environment variable is set, the baud rate is taken from it; otherwise it is taken from the CONFIG_BAUDRATE macro defined in the board-specific header file include/configs/kb9202.h.
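The fallback logic can be exercised on a host by replacing the environment lookup. In this sketch, `env_baudrate` is a hypothetical stand-in for what getenv_r() would find (NULL meaning the variable is unset), the standard strtoul() replaces U-boot's simple_strtoul(), and 115200 is only an example value for CONFIG_BAUDRATE.

```c
#include <assert.h>
#include <stdlib.h>

#define CONFIG_BAUDRATE 115200   /* example value; the real one is in the board header */

/* Pick the baud rate from the environment string if present, else the default. */
static int pick_baudrate(const char *env_baudrate)
{
    if (env_baudrate != NULL && env_baudrate[0] != '\0')
        return (int)strtoul(env_baudrate, NULL, 10);
    return CONFIG_BAUDRATE;
}
```

An empty string falls back to the default, matching the `i > 0` length check in init_baudrate().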

serial_init():

This is a common function called to set up the serial port. Internally, it calls the CPU- or board-specific init function of the selected serial device.

int serial_init (void)
{
        if (!(gd->flags & GD_FLG_RELOC) || !serial_current) {
                struct serial_device *dev = default_serial_console ();

                return dev->init ();
        }

        return serial_current->init ();
}

At this point, the u-boot code has not yet been relocated to RAM and is running from flash.

GD_FLG_RELOC is set when u-boot has been relocated to RAM. serial_current points to the struct serial_device object of the serial device currently in use. As no serial device has been initialized yet, serial_current is NULL. default_serial_console() is a function which, by default, resolves to __default_serial_console(), defined in common/serial.c.

Other entries of init_sequence include:

        console_init_f,         /* stage 1 init of console */
        display_banner,         /* say that we are here */
        dram_init,              /* configure available RAM banks */

Based on the configuration, the following functions are also called:

#if defined(CONFIG_ARCH_CPU_INIT)
        arch_cpu_init,          /* basic arch cpu dependent setup */
#endif
        board_init,             /* basic board dependent setup */
#if defined(CONFIG_USE_IRQ)
        interrupt_init,         /* set up exceptions */
#endif
#if defined(CONFIG_HARD_I2C) || defined(CONFIG_SOFT_I2C)
        init_func_i2c,
#endif

4) Initialize NAND if configured.

#if defined(CONFIG_CMD_NAND)
        puts ("NAND:  ");
        nand_init();            /* go init the NAND */
#endif

nand_init() function is defined in drivers/mtd/nand/nand.c

5) Initialize DataFlash if it is configured.

#ifdef CONFIG_HAS_DATAFLASH
        AT91F_DataflashInit();
        dataflash_print_info();
#endif

6) Get the list of all input and output devices.

stdio_init ();  /* get the devices list */

stdio_init() is defined in common/stdio.c.

7) Call console_init_r () to set up the console, i.e. where input (stdin) is taken from and where output (stdout) is sent.

console_init_r ();      /* fully init console as a device */

8)  Enable the interrupts

enable_interrupts ();

enable_interrupts() for ARM boards is defined in lib_arm/interrupts.c. It clears bit 7 (the I bit, mask 0x80) of the CPSR register, which enables IRQ interrupts. enable_interrupts() is the following assembly code:

void enable_interrupts (void)
{
        unsigned long temp;

        __asm__ __volatile__("mrs %0, cpsr\n"
                             "bic %0, %0, #0x80\n"
                             "msr cpsr_c, %0"
                             : "=r" (temp)
                             :
                             : "memory");
}

9) Get the ‘loadaddr’ environment variable (set in u-boot with setenv). The kernel image is loaded at this load address.

        /* Initialize from environment */
        if ((s = getenv ("loadaddr")) != NULL) {
                load_addr = simple_strtoul (s, NULL, 16);
        }

10) All initialization is done; now enter the main loop, where commands from the user are accepted and executed.

        /* main_loop() can return to retry autoboot, if so just run it again. */
        for (;;) {
                main_loop ();
        }

The main_loop() function is defined in common/main.c. This function is common to all boards.

When U-boot boots up, it takes the default command given in the U-boot environment variable bootcmd and executes it. If this variable is not defined, U-boot provides a prompt where the user can enter commands. U-boot can be made to go to its prompt even if bootcmd is defined, by setting bootdelay: U-boot waits until bootdelay expires or the user presses a key. If the user does not press any key, U-boot executes the default command defined in bootcmd; otherwise U-boot goes to its prompt.

void main_loop (void)
{
#if defined(CONFIG_BOOTDELAY) && (CONFIG_BOOTDELAY >= 0)
        s = getenv ("bootdelay");
        bootdelay = s ? (int)simple_strtol(s, NULL, 10) : CONFIG_BOOTDELAY;

        debug ("### main_loop entered: bootdelay=%d\n\n", bootdelay);

        s = getenv ("bootcmd");

        debug ("### main_loop: bootcmd=\"%s\"\n", s ? s : "<UNDEFINED>");

        if (bootdelay >= 0 && s && !abortboot (bootdelay)) {
                run_command (s, 0);
        }

If the user interrupts U-boot by pressing a key, U-boot enters an infinite loop in which it accepts user commands and executes them.

        for (;;) {
                /* Read the command */
                len = readline (CONFIG_SYS_PROMPT);
                if (len > 0)
                        strcpy (lastcommand, console_buffer);

                /* Execute the command */
                rc = run_command (lastcommand, flag);
        }

CONFIG_BOOTDELAY is the number of seconds u-boot waits for a user interrupt before it takes its default action. CONFIG_BOOTDELAY is defined in the board-specific header file, in this case include/configs/kb9202.h.
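The autoboot decision boils down to one condition. The sketch below separates it from the console I/O: `key_poll_fn` is a hypothetical callback that reports whether a key arrived during a given second, `fake_abortboot()` is a simplified stand-in for abortboot(), and `should_autoboot()` returns 1 when bootcmd should run and 0 when U-boot should fall through to its prompt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: returns non-zero if a key was pressed during second 'sec'. */
typedef int (*key_poll_fn)(int sec);

/* Count down bootdelay seconds; return 1 if the user interrupted. */
static int fake_abortboot(int bootdelay, key_poll_fn key_pressed)
{
    for (int sec = 0; sec < bootdelay; sec++)
        if (key_pressed(sec))
            return 1;
    return 0;
}

/* Mirrors: if (bootdelay >= 0 && s && !abortboot (bootdelay)) run_command (s, 0); */
static int should_autoboot(int bootdelay, const char *bootcmd, key_poll_fn key_pressed)
{
    return bootdelay >= 0 && bootcmd != NULL && !fake_abortboot(bootdelay, key_pressed);
}

static int never_pressed(int sec) { (void)sec; return 0; }
static int pressed_at_2(int sec)  { return sec == 2; }
```

A negative bootdelay or an unset bootcmd skips autoboot entirely, and a key press during the countdown does the same. In all three cases U-boot ends up in the command loop.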

As explained earlier, each U-boot command is an object of type struct cmd_tbl_s. The command name and the function to be executed are stored in this structure. All command structures are kept in a particular memory region called the command table; __u_boot_cmd_start and __u_boot_cmd_end contain its start and end addresses.

run_command() takes the command name, finds the data structure belonging to this command in the command table, and calls the corresponding command function. Let's assume that bootcmd is not configured and U-boot has provided the prompt to enter commands. Also assume that the Linux image has been loaded into RAM at a particular address using the tftpboot command or a ymodem download, and that the bootm command is given at the U-boot prompt.

kb-9202# bootm 0x00280000
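Before the table lookup, run_command() has to split such a line into words. Here is a minimal sketch, assuming a one-entry hypothetical table and ignoring U-boot's quoting, ';' separators and variable expansion. It splits the line on spaces into argv, finds the entry whose name equals argv[0], and calls its function with (argc, argv). `fake_do_bootm()` only records the address argument; it is not the real do_bootm().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 16

struct cmd {
    const char *name;
    int (*fn)(int argc, char *argv[]);
};

static unsigned long last_bootm_addr;

/* Hypothetical bootm handler: only records the image address argument. */
static int fake_do_bootm(int argc, char *argv[])
{
    if (argc < 2)
        return 1;
    last_bootm_addr = strtoul(argv[1], NULL, 16);
    return 0;
}

static const struct cmd cmd_table[] = {
    { "bootm", fake_do_bootm },
};

/* Split 'line' in place, look up argv[0] and run it; -1 if unknown. */
static int fake_run_command(char *line)
{
    char *argv[MAX_ARGS];
    int argc = 0;

    for (char *tok = strtok(line, " \t"); tok && argc < MAX_ARGS;
         tok = strtok(NULL, " \t"))
        argv[argc++] = tok;
    if (argc == 0)
        return -1;

    for (size_t i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++)
        if (strcmp(cmd_table[i].name, argv[0]) == 0)
            return cmd_table[i].fn(argc, argv);
    return -1;
}
```

For the line above, argv becomes {"bootm", "0x00280000"} and the handler receives argc = 2.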

The function do_bootm() is responsible for executing the bootm command. bootm loads the kernel image and starts it.
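For ARM Linux of this era, starting the image means jumping to it with r0 = 0, r1 = the machine type number (bi_arch_number, set in board_init()) and r2 = the address of a tagged list (ATAGs), here at bi_boot_params. The sketch below builds a minimal tag list (ATAG_CORE, one ATAG_MEM, and a terminating ATAG_NONE written with size 0) into a buffer. The tag IDs and layouts follow the kernel's tagged-list format; the buffer, `build_atags()`, and the example memory bank are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u
#define ATAG_MEM  0x54410002u

/*
 * Write ATAG_CORE, one ATAG_MEM and ATAG_NONE into 'p' (32-bit words).
 * Each tag starts with a header: size in words (header included), then tag id.
 * Returns the number of words written.
 */
static size_t build_atags(uint32_t *p, uint32_t mem_start, uint32_t mem_size)
{
    uint32_t *start = p;

    *p++ = 5; *p++ = ATAG_CORE;     /* header + flags, pagesize, rootdev */
    *p++ = 0; *p++ = 0; *p++ = 0;

    *p++ = 4; *p++ = ATAG_MEM;      /* header + size, start */
    *p++ = mem_size;
    *p++ = mem_start;

    *p++ = 0; *p++ = ATAG_NONE;     /* end of list */
    return (size_t)(p - start);
}
```

On the board, the list would be written at bi_boot_params (PHYS_SDRAM + 0x100 in board_init()), and U-boot's ARM bootm code then calls the kernel entry point through a function pointer with (0, machine number, bi_boot_params) as arguments.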

kernel image (zImage) decompression

    arch/arm/boot/compressed/head.S: start (108)
        First code executed, jumped to by the bootloader, at label "start" (108)
        save contents of registers r1 and r2 in r7 and r8 to save off architecture ID and atags pointer passed in by bootloader (118)
        execute arch-specific code (inserted at 146)
            arch/arm/boot/compressed/head-xscale.S or other arch-specific code file
            added to build in arch/arm/boot/compressed/Makefile
            linked into head.S by linker section declaration:  .section "start"
            flush cache, turn off cache and MMU
        load registers with stored parameters (152)
            sp = stack pointer for decompression code (152)
            r4 = zreladdr = kernel entry point physical address
        check if running at link address, and fix up global offset table if not (196)
        zero decompression bss (205)
        call cache_on to turn on cache (218)
            defined at arch/arm/boot/compressed/head.S (320)
            call call_cache_fn to turn on cache as appropriate for processor variant
                defined at arch/arm/boot/compressed/head.S (505)
                walk through proc_types list (530) until find corresponding processor
                call cache-on function in list item corresponding to processor (511)
                    for ARMv5tej core, cache_on function is __armv4_mmu_cache_on (417)
                        call setup_mmu to set up initial page tables since MMU must be on for cache to be on (419)
                        turn on cache and MMU (426)
        check to make sure won't overwrite image during decompression; assume not for this trace (232)
        call decompress_kernel to decompress kernel to RAM (277)
        branch to call_kernel (278)
            call cache_clean_flush to flush cache contents to RAM (484)
            call cache_off to turn cache off as expected by kernel initialization routines (485)
            jump to start of kernel in RAM (489)
                jump to address in r4 = zreladdr from previous load
                    zreladdr = ZRELADDR = zreladdr-y
                    zreladdr-y specified in arch/arm/mach-vx115/Makefile.boot
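The step "check if running at link address, and fix up global offset table if not" is simple arithmetic in outline. Each GOT entry holds a link-time address, so adding the difference between the run-time and link-time addresses makes it valid again. Below is a hedged C model in which `got` is an ordinary array and `fixup_got()` is a hypothetical name; the real code is a short assembly loop in head.S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add (run_addr - link_addr) to each GOT entry; no-op if they match. */
static void fixup_got(uint32_t *got, size_t n, uint32_t link_addr, uint32_t run_addr)
{
    uint32_t delta = run_addr - link_addr;   /* unsigned wrap handles moves down too */

    if (delta == 0)
        return;                              /* running where we were linked */
    for (size_t i = 0; i < n; i++)
        got[i] += delta;
}
```

Because only the GOT entries are patched, the instructions themselves stay untouched. This is what makes the decompressor position independent.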

ARM-specific kernel code

    arch/arm/kernel/head.S: stext (72)
        call __lookup_processor_type (76)
            defined in arch/arm/kernel/head-common.S (146)
            search list of supported processor types __proc_info_begin (176)
                kernel may be built to support more than one processor type
                list of proc_info_list structs 
                    defined in arch/arm/mm/proc-arm926.S (467) and other corresponding proc-*.S files
                    linked into list by section declaration:  .section ".proc.info.init"
            return pointer to proc_info_list struct corresponding to processor if found, or loop in error if not
        call __lookup_machine_type (79)
            defined in arch/arm/kernel/head-common.S (194)
            search list of supported machines (boards)
                kernel may be built to support more than one board
                list of machine_desc structs 
                    machine_desc struct for boards defined in board-specific file vx115_vep.c
                    linked into list by section declaration that's part of MACHINE_DESC macro
            return pointer to machine_desc struct corresponding to machine (board)
        call __create_page_tables to set up initial MMU tables (82)
        set lr to __enable_mmu, r13 to address of __switch_data (91, 93)
            lr and r13 used for jumps after the following calls
            __switch_data defined in arch/arm/kernel/head-common.S (15)
        call the __cpu_flush function pointer in the previously returned proc_info_list struct (94)
            offset is #PROCINFO_INITFUNC into struct
            this function is __arm926_setup for the ARM 926EJ-S, defined in arch/arm/mm/proc-arm926.S (392)
                initialize caches, writebuffer
                jump to lr, previously set to address of __enable_mmu
        __enable_mmu (147)
            set page table pointer (TTB) in MMU hardware so it knows where to start page-table walks (167)
            enable MMU so running with virtual addresses (185)
            jump to r13, previously set to address of __switch_data, whose first field is address of __mmap_switched
                __switch_data defined in arch/arm/kernel/head-common.S (15)

    arch/arm/kernel/head-common.S: __mmap_switched (35)
        copy data segment to RAM (39)
        zero BSS (45)
        branch to start_kernel (55)
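Both __lookup_processor_type and __lookup_machine_type are linear searches over tables that the linker gathered from special sections. A processor entry matches when (cpu_id & mask) == value, and a machine entry matches when its number equals the one passed in r1. The C model below uses hypothetical tables; the ID/mask pairs are illustrative only, so check the proc-*.S files of your kernel tree for the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_proc_info {
    uint32_t cpu_val;
    uint32_t cpu_mask;
    const char *name;
};

struct fake_machine_desc {
    unsigned int nr;          /* machine type number passed in r1 */
    const char *name;
};

/* Illustrative values only. */
static const struct fake_proc_info procs[] = {
    { 0x41009200u, 0xff00fff0u, "arm920" },
    { 0x41069260u, 0xff0ffff0u, "arm926" },
};

static const struct fake_machine_desc machines[] = {
    { 100u, "board-a" },      /* hypothetical machine numbers */
    { 200u, "board-b" },
};

static const struct fake_proc_info *lookup_processor(uint32_t cpu_id)
{
    for (size_t i = 0; i < sizeof procs / sizeof procs[0]; i++)
        if ((cpu_id & procs[i].cpu_mask) == procs[i].cpu_val)
            return &procs[i];
    return NULL;                  /* the kernel loops in error instead */
}

static const struct fake_machine_desc *lookup_machine(unsigned int nr)
{
    for (size_t i = 0; i < sizeof machines / sizeof machines[0]; i++)
        if (machines[i].nr == nr)
            return &machines[i];
    return NULL;
}
```

The mask lets one entry cover a whole family of CPU revisions. The machine number, in contrast, must match exactly, which is why the boot-loader has to pass the right value in r1.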

Processor-independent kernel code

init/main.c: start_kernel (456)

Shared Libraries

Tuesday, March 27th, 2012

We’ve talked a bit about what object files and executables look like, so what do shared libraries look like? I’m going to focus on ELF shared libraries as used in SVR4 (and GNU/Linux, etc.), as they are the most flexible shared library implementation and the one I know best.

Windows shared libraries, known as DLLs, are less flexible in that you have to compile code differently depending on whether it will go into a shared library or not. You also have to express symbol visibility in the source code. This is not inherently bad, and indeed ELF has picked up some of these ideas over time, but the ELF format makes more decisions at link time and is thus more powerful.

When the program linker creates a shared library, it does not yet know which virtual address that shared library will run at. In fact, in different processes, the same shared library will run at different addresses, depending on the decisions made by the dynamic linker. This means that shared library code must be position independent. More precisely, it must be position independent after the dynamic linker has finished loading it. It is always possible for the dynamic linker to convert any piece of code to run at any virtual address, given sufficient relocation information. However, the relocation computations must be performed every time the program starts, implying that it will start more slowly. Therefore, any shared library system seeks to generate position independent code which requires a minimal number of relocations to be applied at runtime, while still running at close to the runtime efficiency of position dependent code.

An additional complexity is that ELF shared libraries were designed to be roughly equivalent to ordinary archives. This means that by default the main executable may override symbols in the shared library, such that references in the shared library will call the definition in the executable, even if the shared library also defines that same symbol. For example, an executable may define its own version of malloc. The C library also defines malloc, and the C library contains code which calls malloc. If the executable defines malloc itself, it will override the function in the C library. When some other function in the C library calls malloc, it will call the definition in the executable, not the definition in the C library.

There are thus different requirements pulling in different directions for any specific ELF implementation. The right implementation choices will depend on the characteristics of the processor. That said, most, but not all, processors make fairly similar decisions. I will describe the common case here. An example of a processor which uses the common case is the i386; an example of a processor which makes some different decisions is the PowerPC.

In the common case, code may be compiled in two different modes. By default, code is position dependent. Putting position dependent code into a shared library will cause the program linker to generate a lot of relocation information, and cause the dynamic linker to do a lot of processing at runtime. Code may also be compiled in position independent mode, typically with the -fpic option. Position independent code is slightly slower when it calls a non-static function or refers to a global or static variable. However, it requires much less relocation information, and thus the dynamic linker will start the program faster.

Position independent code will call non-static functions via the Procedure Linkage Table or PLT. This PLT does not exist in .o files. In a .o file, use of the PLT is indicated by a special relocation. When the program linker processes such a relocation, it will create an entry in the PLT. It will adjust the instruction such that it becomes a PC-relative call to the PLT entry. PC-relative calls are inherently position independent and thus do not require a relocation entry themselves. The program linker will create a relocation for the PLT entry which tells the dynamic linker which symbol is associated with that entry. This process reduces the number of dynamic relocations in the shared library from one per function call to one per function called.

Further, PLT entries are normally relocated lazily by the dynamic linker. On most ELF systems this laziness may be overridden by setting the LD_BIND_NOW environment variable when running the program. However, by default, the dynamic linker will not actually apply a relocation to the PLT until some code actually calls the function in question. This also speeds up startup time, in that many invocations of a program will not call every possible function. This is particularly true when considering the shared C library, which has many more functions than any typical program will call.

In order to make this work, the program linker initializes the PLT entries to load an index into some register or push it on the stack, and then to branch to common code. The common code calls back into the dynamic linker, which uses the index to find the appropriate PLT relocation, and uses that to find the function being called. The dynamic linker then initializes the PLT entry with the address of the function, and then jumps to the code of the function. The next time the function is called, the PLT entry will branch directly to the function.

Before giving an example, I will talk about the other major data structure in position independent code, the Global Offset Table or GOT. This is used for global and static variables. For every reference to a global variable from position independent code, the compiler will generate a load from the GOT to get the address of the variable, followed by a second load to get the actual value of the variable. The address of the GOT will normally be held in a register, permitting efficient access. Like the PLT, the GOT does not exist in a .o file, but is created by the program linker. The program linker will create the dynamic relocations which the dynamic linker will use to initialize the GOT at runtime. Unlike the PLT, the dynamic linker always fully initializes the GOT when the program starts.

For example, on the i386, the address of the GOT is held in the register %ebx. This register is initialized at the entry to each function in position independent code. The initialization sequence varies from one compiler to another, but typically looks something like this:

call __i686.get_pc_thunk.bx
add $offset,%ebx

The function __i686.get_pc_thunk.bx simply looks like this:

mov (%esp),%ebx
ret

This sequence of instructions uses a position independent sequence to get the address at which it is running. Then it uses an offset to get the address of the GOT. Note that this requires that the GOT always be a fixed offset from the code, regardless of where the shared library is loaded. That is, the dynamic linker must load the shared library as a fixed unit; it may not load different parts at varying addresses.

Global and static variables are now read or written by first loading the address via a fixed offset from %ebx. The program linker will create dynamic relocations for each entry in the GOT, telling the dynamic linker how to initialize the entry. These relocations are of type GLOB_DAT.

For function calls, the program linker will set up a PLT entry to look like this:

jmp *offset(%ebx)
pushl $index
jmp first_plt_entry

The program linker will allocate an entry in the GOT for each entry in the PLT. It will create a dynamic relocation for the GOT entry of type JMP_SLOT. It will initialize the GOT entry to the base address of the shared library plus the address of the second instruction in the code sequence above. When the dynamic linker does the initial lazy binding on a JMP_SLOT reloc, it will simply add the difference between the shared library load address and the shared library base address to the GOT entry. The effect is that the first jmp instruction will jump to the second instruction, which will push the index entry and branch to the first PLT entry. The first PLT entry is special, and looks like this:

pushl 4(%ebx)
jmp *8(%ebx)

This references the second and third entries in the GOT. The dynamic linker will initialize them to have appropriate values for a callback into the dynamic linker itself. The dynamic linker will use the index pushed by the first code sequence to find the JMP_SLOT relocation. When the dynamic linker determines the function to be called, it will store the address of the function into the GOT entry referenced by the first code sequence. Thus, the next time the function is called, the jmp instruction will branch directly to the right code.

That was a fast pass over a lot of details, but I hope that it conveys the main idea. It means that for position independent code on the i386, every call to a global function requires one extra instruction after the first time it is called. Every reference to a global or static variable requires one extra instruction. Almost every function uses four extra instructions when it starts to initialize %ebx (leaf functions which do not refer to any global variables do not need to initialize %ebx). This all has some negative impact on the program cache. This is the runtime performance penalty paid to let the dynamic linker start the program quickly.

On other processors, the details are naturally different. However, the general flavour is similar: position independent code in a shared library starts faster and runs slightly slower.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions. His work is that of a true open source hacker. Some of his major contributions, highlighting his career, are given here in his own words (excerpts): I wrote my first linker for the AMOS operating system which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working on, called gold, will be my third. It is exclusively an ELF linker.

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end Maintainer of GNU Compiler Collection
  • Active contributions to: autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutils

Object File Formats

Saturday, March 10th, 2012

An assembler turns human readable assembly language into an object file. An object file is a binary data file written in a format designed as input to the linker. The linker generates an executable file. This executable file is a binary data file written in a format designed as input for the operating system or the loader (this is true even when linking dynamically, as normally the operating system loads the executable before invoking the dynamic linker to begin running the program). There is no logical requirement that the object file format resemble the executable file format. However, in practice they are normally very similar.

Most object file formats define sections. A section typically holds memory contents, or it may be used to hold other types of data. Sections generally have a name, a type, a size, an address, and an associated array of data.

Object file formats may be classed in two general types: record oriented and section oriented.

A record oriented object file format defines a series of records of varying size. Each record starts with some special code, and may be followed by data. Reading the object file requires reading it from the beginning and processing each record. Records are used to describe symbols and sections. Relocations may be associated with sections or may be specified by other records. IEEE-695 and Mach-O are record oriented object file formats used today.

In a section oriented object file format the file header describes a section table with a specified number of sections. Symbols may appear in a separate part of the object file described by the file header, or they may appear in a special section. Relocations may be attached to sections, or they may appear in separate sections. The object file may be read by reading the section table, and then reading specific sections directly. ELF, COFF, PE, and a.out are section oriented object file formats.

Earlier I said that executable file formats were normally the same as object file formats. That is true for ELF, but with a twist. In ELF, object files are composed of sections: all the data in the file is accessed via the section table. Executables and shared libraries normally contain a section table, which is used by programs like nm. But the operating system and the dynamic linker do not use the section table. Instead, they use the segment table, which provides an alternative view of the file.

All the contents of an ELF executable or shared library which are to be loaded into memory are contained within a segment (an object file does not have segments). A segment has a type, some flags, a file offset, a virtual address, a physical address, a file size, a memory size, and an alignment. The file offset points to a contiguous set of bytes which are the contents of the segment, the bytes to load into memory. When the operating system or the dynamic linker loads a file, it will do so by walking through the segments and loading them into memory (typically by using the mmap system call). All the information needed by the dynamic linker (the dynamic relocations, the dynamic symbol table, etc.) is accessed via information stored in special segments.

Although an ELF executable or shared library does not, strictly speaking, require any sections, they normally do have them. The contents of a loadable section will fall entirely within a single segment.

The program linker reads sections from the input object files. It sorts and concatenates them into sections in the output file. It maps all the loadable sections into segments in the output file. It lays out the section contents in the output file segments respecting alignment and access requirements, so that the segments may be mapped directly into memory. The sections are mapped to segments based on the access requirements: normally all the read-only sections are mapped to one segment and all the writable sections are mapped to another segment. The address of the latter segment will be set so that it starts on a separate page in memory, permitting mmap to set different permissions on the mapped pages.

The segment flags are a bitmask which define access requirements. The defined flags are PF_R, PF_W, and PF_X, which mean, respectively, that the contents must be made readable, writable, or executable.

The segment virtual address is the memory address at which the segment contents are loaded at runtime. The physical address is officially undefined, but is often used as the load address when using a system which does not use virtual memory. The file size is the size of the contents in the file. The memory size may be larger than the file size when the segment contains uninitialized data; the extra bytes will be filled with zeroes. The alignment of the segment is mainly informative, as the address is already specified.

The ELF segment types are as follows:

  • PT_NULL: A null entry in the segment table, which is ignored.
  • PT_LOAD: A loadable entry in the segment table. The operating system or dynamic linker loads all segments of this type. All other segments with contents will have their contents contained completely within a PT_LOAD segment.
  • PT_DYNAMIC: The dynamic segment. This points to a series of dynamic tags which the dynamic linker uses to find the dynamic symbol table, dynamic relocations, and other information that it needs.
  • PT_INTERP: The interpreter segment. This appears in an executable. The operating system uses it to find the name of the dynamic linker to run for the executable. Normally all executables will have the same interpreter name, but on some operating systems different interpreters are used in different emulation modes.
  • PT_NOTE: A note segment. This contains system dependent note information which may be used by the operating system or the dynamic linker. On GNU/Linux systems shared libraries often have an ABI tag note which may be used to specify the minimum version of the kernel which is required for the shared library. The dynamic linker uses this when selecting among different shared libraries.
  • PT_SHLIB: This is not used as far as I know.
  • PT_PHDR: This indicates the address and size of the segment table. This is not too useful in practice as you have to have already found the segment table before you can find this segment.
  • PT_TLS: The TLS segment. This holds the initial values for TLS variables.
  • PT_GNU_EH_FRAME (0x6474e550): A GNU extension used to hold a sorted table of unwind information. This table is built by the GNU program linker. It is used by gcc’s support library to quickly find the appropriate handler for an exception, without requiring exception frames to be registered when the program starts.
  • PT_GNU_STACK (0x6474e551): A GNU extension used to indicate whether the stack should be executable. This segment has no contents. The dynamic linker sets the permission of the stack in memory to the permissions of this segment.
  • PT_GNU_RELRO (0x6474e552): A GNU extension which tells the dynamic linker to set the given address and size to be read-only after applying dynamic relocations. This is used for const variables which require dynamic relocations.

ELF Sections

Now that we’ve done segments, let's take a quick look at the details of ELF sections. ELF sections are more complicated than segments, in that there are more types of sections. Every ELF object file, and most ELF executables and shared libraries, have a table of sections. The first entry in the table, section 0, is always a null section.

ELF sections have several fields.

  • Name.
  • Type. I discuss section types below.
  • Flags. I discuss section flags below.
  • Address. This is the address of the section. In an object file this is normally zero. In an executable or shared library it is the virtual address. Since executables are normally accessed via segments, this is essentially documentation.
  • File offset. This is the offset of the contents within the file.
  • Size. The size of the section.
  • Link. Depending on the section type, this may hold the index of another section in the section table.
  • Info. The meaning of this field depends on the section type.
  • Address alignment. This is the required alignment of the section. The program linker uses this when laying out the section in memory.
  • Entry size. For sections which hold an array of data, this is the size of one data element.

These are the types of ELF sections which the program linker may see.

  • SHT_NULL: A null section. Sections with this type may be ignored.
  • SHT_PROGBITS: A section holding bits of the program. This is an ordinary section with contents.
  • SHT_SYMTAB: The symbol table. This section actually holds the symbol table itself. The section contents are an array of ELF symbol structures.
  • SHT_STRTAB: A string table. This type of section holds null-terminated strings. Sections of this type are used for the names of the symbols and the names of the sections themselves.
  • SHT_RELA: A relocation table. The link field holds the index of the section to which these relocations apply. These relocations include addends.
  • SHT_HASH: A hash table used by the dynamic linker to speed symbol lookup.
  • SHT_DYNAMIC: The dynamic tags used by the dynamic linker. Normally the PT_DYNAMIC segment and the SHT_DYNAMIC section will point to the same contents.
  • SHT_NOTE: A note section. This is used in system dependent ways. A loadable SHT_NOTE section will become a PT_NOTE segment.
  • SHT_NOBITS: A section which takes up memory space but has no associated contents. This is used for zero-initialized data.
  • SHT_REL: A relocation table, like SHT_RELA but the relocations have no addends.
  • SHT_SHLIB: This is not used as far as I know.
  • SHT_DYNSYM: The dynamic symbol table. Normally the DT_SYMTAB dynamic tag will point to the same contents as this section (I haven’t discussed dynamic tags yet, though).
  • SHT_INIT_ARRAY: This section holds a table of function addresses which should each be called at program startup time, or, for a shared library, when the library is opened by dlopen.
  • SHT_FINI_ARRAY: Like SHT_INIT_ARRAY, but called at program exit time or dlclose time.
  • SHT_PREINIT_ARRAY: Like SHT_INIT_ARRAY, but called before any shared libraries are initialized. Normally shared library initializers are run before the executable initializers. This section type may only be linked into an executable, not into a shared library.
  • SHT_GROUP: This is used to group related sections together, so that the program linker may discard them as a unit when appropriate. Sections of this type may only appear in object files. The contents of this type of section are a flag word followed by a series of section indices.
  • SHT_SYMTAB_SHNDX: ELF symbol table entries only provide a 16-bit field for the section index, and the values from SHN_LORESERVE (0xff00) upward are reserved for special meanings. For a file with more sections than that field can index, a section of this type is created. It holds one 32-bit word for each symbol. If a symbol’s section index is SHN_XINDEX, the real section index may be found by looking in the SHT_SYMTAB_SHNDX section.
  • SHT_GNU_LIBLIST (0x6ffffff7): A GNU extension used by the prelinker to hold a list of libraries found by the prelinker.
  • SHT_GNU_verdef (0x6ffffffd): A Sun and GNU extension used to hold version definitions (I’ll talk about symbol versions at some point).
  • SHT_GNU_verneed (0x6ffffffe): A Sun and GNU extension used to hold versions required from other shared libraries.
  • SHT_GNU_versym (0x6fffffff): A Sun and GNU extension used to hold the versions for each symbol.

These are the types of section flags.

  • SHF_WRITE: Section contains writable data.
  • SHF_ALLOC: Section contains data which should be part of the loaded program image. For example, this would normally be set for a SHT_PROGBITS section and not set for a SHT_SYMTAB section.
  • SHF_EXECINSTR: Section contains executable instructions.
  • SHF_MERGE: Section contains constants which the program linker may merge together to save space. The compiler can use this type of section for read-only data whose address is unimportant.
  • SHF_STRINGS: In conjunction with SHF_MERGE, this means that the section holds null terminated string constants which may be merged.
  • SHF_INFO_LINK: This flag indicates that the info field in the section holds a section index.
  • SHF_LINK_ORDER: This flag tells the program linker that when it combines sections, this section must appear in the same relative order as the section in the link field. This can be used to ensure that address tables are built in the expected order.
  • SHF_OS_NONCONFORMING: If the program linker sees a section with this flag, and does not understand the type or all other flags, then it must issue an error.
  • SHF_GROUP: This section appears in a group (see SHT_GROUP, above).
  • SHF_TLS: This section holds TLS data.

How to interpret call traces

Monday, March 5th, 2012

I have the following call trace and info from kernel:

INFO: task raw_device_benc:9684 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
raw_device_be D c1e5c800  5984  9684   9683
       ed9e9d5c 00000046 c1e5c800 c1e5c800 ed9e9d14 587dc805 00001e62 e884be00
       e884c054 c1e5c800 00000000 ed9e8000 f6333b80 00000000 00000000 001eda69
       00000000 e884be00 00000000 e884be00 c044319a 00000000 ed9e9d58 c013e912
Call Trace:
 [<c044319a>] ? _spin_unlock_irqrestore+0x36/0x3c
 [<c013e912>] ? trace_hardirqs_on+0xe9/0x111
 [<c044126d>] io_schedule+0x1e/0x28
 [<c018ec95>] __blockdev_direct_IO+0x9a9/0xaf2
 [<c014e2da>] ? generic_file_buffered_write+0x116/0x4e1
 [<c017efa6>] ? mnt_drop_write+0x4f/0xbc
 [<c018db1a>] blkdev_direct_IO+0x30/0x35
 [<c018da31>] ? blkdev_get_blocks+0x0/0xb9
 [<c014e067>] generic_file_direct_IO+0xda/0x125
 [<c014eecf>] generic_file_aio_read+0x9c/0x49f
 [<c013f7d1>] ? __lock_acquire+0xaea/0xb32
 [<c016c2c3>] do_sync_read+0xab/0xe9
 [<c0132f0d>] ? autoremove_wake_function+0x0/0x33
 [<c044192d>] ? mutex_unlock+0x8/0xa
 [<c018cb2d>] ? block_llseek+0xbe/0xcc
 [<c016c218>] ? do_sync_read+0x0/0xe9
 [<c016c9bc>] vfs_read+0x8a/0x106
 [<c016cdef>] sys_read+0x3b/0x60
 [<c0103809>] sysenter_past_esp+0x6a/0xb1
 =======================
no locks held by raw_device_benc/9684.

However, I am not quite sure what this section means:

raw_device_be D c1e5c800  5984  9684   9683
       ed9e9d5c 00000046 c1e5c800 c1e5c800 ed9e9d14 587dc805 00001e62 e884be00
       e884c054 c1e5c800 00000000 ed9e8000 f6333b80 00000000 00000000 001eda69
       00000000 e884be00 00000000 e884be00 c044319a 00000000 ed9e9d58 c013e912

Can anyone maybe explain a bit to me? Are those register values? More specifically I wonder if I could infer the arguments being passed to the function in the call trace from those values. Thanks.

Answer by Veda Solutions

Here is what the following line means:

 raw_device_be D c1e5c800  5984  9684   9683

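The hung process is named raw_device_be (the name is truncated in this line; the full name is raw_device_benc), it is in the D state (uninterruptible sleep), its saved program counter (pc) is c1e5c800, its free stack space is 5984, its pid is 9684, and its parent's pid is 9683.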

And the remaining few lines:

  ed9e9d5c 00000046 c1e5c800 c1e5c800 ed9e9d14 587dc805 00001e62 e884be00
   e884c054 c1e5c800 00000000 ed9e8000 f6333b80 00000000 00000000 001eda69
   00000000 e884be00 00000000 e884be00 c044319a 00000000 ed9e9d58 c013e912

is stack data: 24 words (96 bytes) of stack are printed, starting from the current stack pointer (sp) of the process. If you know how the stack grows and shrinks, and you have the disassembled code of the vmlinux, you can work out the arguments passed to a function. Keep in mind, though, that only this limited amount of stack data is printed.

Note: This conversation is from the Stack Overflow forum.

Linker Relocations

Monday, March 5th, 2012

A relocation is a computation to perform on the contents. Let’s take a closer look at the computation. In general a relocation has a type, a symbol, an offset into the contents, and an addend. From the linker’s point of view, the contents are simply an uninterpreted series of bytes. A relocation changes those bytes as necessary to produce the correct final executable.

For example, consider the C code g = 0; where g is a global variable. On the i386, the compiler will turn this into an assembly language instruction, which will most likely be movl $0, g. Now, the g in the C code is a global variable, and we all more or less know what that means. The g in the assembly code is not that variable. It is a symbol which holds the address of that variable.

The assembler does not know the address of the global variable g, which is another way of saying that the assembler does not know the value of the symbol g. It is the linker that is going to pick that address. So the assembler has to tell the linker that it needs to use the address of g in this instruction. The way the assembler does this is to create a relocation. We don’t use a separate relocation type for each instruction; instead, each processor will have a natural set of relocation types which are appropriate for the machine architecture. Each type of relocation expresses a specific computation.

In the i386 case, the assembler will generate these bytes:

c7 05 00 00 00 00 00 00 00 00

The c7 05 are the instruction (movl constant to address). The first four 00 bytes are the address, and the second four 00 bytes are the 32-bit constant 0 (in the i386 encoding the address field comes before the immediate constant). The assembler tells the linker to put the value of the symbol g into the address bytes by generating (in this case) an R_386_32 relocation. For this relocation the symbol will be g, the offset will be to the four address bytes (two bytes into the instruction), the type will be R_386_32, and the addend will be 0 (in the case of the i386 the addend is stored in the contents rather than in the relocation itself, but this is a detail). The type R_386_32 expresses a specific computation, which is: put the 32-bit sum of the value of the symbol and the addend into the offset. Since for the i386 the addend is stored in the contents, this can also be expressed as: add the value of the symbol to the 32-bit field at the offset. When the linker performs this computation, the address in the instruction will be the address of the global variable g. Regardless of the details, the important point to note is that the relocation adjusts the contents by applying a specific computation selected by the type.

An example of a simple case which does use an addend would be

char a[10]; // A global array.
char* p = &a[1]; // In a function.

The assignment to p will wind up requiring a relocation for the symbol a. Here the addend will be 1, so that the resulting instruction references a + 1 rather than a + 0.

To point out how relocations are processor dependent, let’s consider g = 0; on a RISC processor: the PowerPC (in 32-bit mode). In this case, multiple assembly language instructions are required:

li 1,0 // Set register 1 to 0
lis 9,g@ha // Load high-adjusted part of g into register 9
stw 1,g@l(9) // Store register 1 to address in register 9 plus low-adjusted part of g

The lis instruction loads a value into the upper 16 bits of register 9, setting the lower 16 bits to zero. The stw instruction adds a signed 16 bit value to register 9 to form an address, and then stores the value of register 1 at that address. The @ha part of the operand directs the assembler to generate a R_PPC_ADDR16_HA reloc. The @l produces a R_PPC_ADDR16_LO reloc. The goal of these relocs is to compute the value of the symbol g and use it as the store address.

That is enough information to determine the computations performed by these relocs. The R_PPC_ADDR16_HA reloc computes (SYMBOL >> 16) + ((SYMBOL & 0x8000) ? 1 : 0). The R_PPC_ADDR16_LO computes SYMBOL & 0xffff. The extra computation for R_PPC_ADDR16_HA is because the stw instruction adds the signed 16-bit value, which means that if the low 16 bits appear negative we have to adjust the high 16 bits accordingly. The offsets of the relocations are such that the 16-bit resulting values are stored into the appropriate parts of the machine instructions.

The examples I’ve shown are for relocations which appear in an object file; these types of relocations may also appear in a shared library, if they are copied there by the program linker. In ELF, there are also specific relocation types which never appear in object files but only appear in shared libraries or executables. These are the JMP_SLOT, GLOB_DAT, and RELATIVE relocations. Another type of relocation which only appears in an executable is a COPY relocation, which I will discuss later.

The specific examples of relocations I’ve discussed here are ELF specific, but the same sorts of relocations occur for any object file format. In the next part we will look at Object file formats.

Note: This post has been slightly modified by Veda Solutions and posted here

What does a linker do?

Friday, March 2nd, 2012

It’s simple: a linker converts object files into executables and shared libraries. Let’s look at what that means. For cases where a linker is used, the software development process consists of writing program code in some language: e.g., C or C++ or Fortran (but typically not Java, as Java normally works differently, using a loader rather than a linker). A compiler translates this program code, which is human readable text, into another form of human readable text known as assembly code. Assembly code is a readable form of the machine language which the computer can execute directly. An assembler is used to turn this assembly code into an object file. For completeness, I’ll note that some compilers include an assembler internally, and produce an object file directly. Either way, this is where things get interesting.

In the old days, many programs were complete in themselves. In those days there was generally no compiler (people wrote directly in assembly code), and the assembler actually generated an executable file which the machine could execute directly. As languages like Fortran and Cobol started to appear, people began to think in terms of libraries of subroutines, which meant that there had to be some way to run the assembler at two different times, and combine the output into a single executable file. This required the assembler to generate a different type of output, which became known as an object file (I have no idea where this name came from). And a new program was required to combine different object files together into a single executable. This new program became known as the linker (the source of this name should be obvious).
Linkers still do the same job today. In the decades that followed, one new feature has been added: shared libraries.

Shared libraries were invented as an optimization for virtual memory systems running many processes simultaneously. People noticed that there is a set of basic functions which appear in almost every program. Before shared libraries, in a system which runs multiple processes simultaneously, that meant that almost every process had a copy of exactly the same code. This suggested that on a virtual memory system it would be possible to arrange that code so that a single copy could be shared by every process using it. The virtual memory system would be used to map the single copy into the address space of each process which needed it. This would require less physical memory to run multiple programs, and thus yield better performance.

I believe the first implementation of shared libraries was on SVR3, based on COFF. This implementation was simple, and basically assigned each shared library a fixed portion of the virtual address space. This did not require any significant changes to the linker. However, requiring each shared library to reserve an appropriate portion of the virtual address space was inconvenient.

SunOS4 introduced a more flexible version of shared libraries, which was later picked up by SVR4. This implementation postponed some of the operation of the linker to runtime. When the program started, it would automatically run a limited version of the linker which would link the program proper with the shared libraries. The version of the linker which runs when the program starts is known as the dynamic linker. When it is necessary to distinguish them, I will refer to the version of the linker which creates the program as the program linker. This type of shared library required a significant change to the traditional program linker: it now had to build linking information which could be used efficiently at runtime by the dynamic linker.

Basic Linker Data Types
The linker operates on a small number of basic data types: symbols, relocations, and contents. These are defined in the input object files. Here is an overview of each of these.

A symbol is basically a name and a value. Many symbols represent static objects in the original source code, that is, objects which exist in a single place for the duration of the program. For example, in an object file generated from C code, there will be a symbol for each function and for each global and static variable. The value of such a symbol is simply an offset into the contents. This type of symbol is known as a defined symbol. It’s important not to confuse the value of the symbol representing the variable my_global_var with the value of my_global_var itself. The value of the symbol is roughly the address of the variable: the value you would get from the expression &my_global_var in C.

Symbols are also used to indicate a reference to a name defined in a different object file. Such a reference is known as an undefined symbol. There are other less commonly used types of symbols which I will describe later.

During the linking process, the linker will assign an address to each defined symbol, and will resolve each undefined symbol by finding a defined symbol with the same name.

A relocation is a computation to perform on the contents. Most relocations refer to a symbol and to an offset within the contents. Many relocations will also provide an additional operand, known as the addend. A simple, and commonly used, relocation is “set this location in the contents to the value of this symbol plus this addend.” The types of computations that relocations do are inherently dependent on the architecture of the processor for which the linker is generating code. For example, RISC processors which require two or more instructions to form a memory address will have separate relocations to be used with each of those instructions, such as “set this location in the contents to the lower 16 bits of the value of this symbol.”

During the linking process, the linker will perform all of the relocation computations as directed. A relocation in an object file may refer to an undefined symbol. If the linker is unable to resolve that symbol, it will normally issue an error (but not always: for some symbol types or some relocation types an error may not be appropriate).

The contents are what memory should look like during the execution of the program. Contents have a size, an array of bytes, and a type. They contain the machine code generated by the compiler and assembler (known as text). They contain the values of initialized variables (data). They contain static unnamed data like string constants and switch tables (read-only data or rdata). They contain uninitialized variables, in which case the array of bytes is generally omitted and assumed to contain only zeroes (bss). The compiler and the assembler work hard to generate exactly the right contents, but the linker really doesn’t care about them except as raw data. The linker reads the contents from each file, concatenates them all together sorted by type, applies the relocations, and writes the result into the executable file.

Basic Linker Operation

At this point we already know enough to understand the basic steps used by every linker.

  • Read the input object files. Determine the length and type of the contents. Read the symbols.
  • Build a symbol table containing all the symbols, linking undefined symbols to their definitions.
  • Decide where all the contents should go in the output executable file, which means deciding where they should go in memory when the program runs.
  • Read the contents data and the relocations. Apply the relocations to the contents. Write the result to the output file.
  • Optionally write out the complete symbol table with the final values of the symbols.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions; his work is that of a true open source hacker. Some of his major contributions, highlighting his career, in his own words (excerpts): “I wrote my first linker for the AMOS operating system, which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain, while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working on, called gold, will be my third. It is exclusively an ELF linker.”

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end maintainer of the GNU Compiler Collection
  • Active contributions to: autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutils

Reach Ian at: ian@airs.com

Note

This post has been slightly modified by Veda Solutions and posted here.