Archive for the ‘Ian Lance Taylor’ Category

Shared Libraries

Tuesday, March 27th, 2012

We’ve talked a bit about what object files and executables look like, so what do shared libraries look like? I’m going to focus on ELF shared libraries as used in SVR4 (and GNU/Linux, etc.), as they are the most flexible shared library implementation and the one I know best.

Windows shared libraries, known as DLLs, are less flexible in that you have to compile code differently depending on whether it will go into a shared library or not. You also have to express symbol visibility in the source code. This is not inherently bad, and indeed ELF has picked up some of these ideas over time, but the ELF format makes more decisions at link time and is thus more powerful.

When the program linker creates a shared library, it does not yet know which virtual address that shared library will run at. In fact, in different processes, the same shared library will run at different address, depending on the decisions made by the dynamic linker. This means that shared library code must be position independent. More precisely, it must be position independent after the dynamic linker has finished loading it. It is always possible for the dynamic linker to convert any piece of code to run at any virtual address, given sufficient relocation information. However, performing the reloc computations must be done every time the program starts, implying that it will start more slowly. Therefore, any shared library system seeks to generate position independent code which requires a minimal number of relocations to be applied at runtime, while still running at close to the runtime efficiency of position dependent code.

An additional complexity is that ELF shared libraries were designed to be roughly equivalent to ordinary archives. This means that by default the main executable may override symbols in the shared library, such that references in the shared library will call the definition in the executable, even if the shared library also defines that same symbol. For example, an executable may define its own version of malloc. The C library also defines malloc, and the C library contains code which calls malloc. If the executable defines malloc itself, it will override the function in the C library. When some other function in the C library calls malloc, it will call the definition in the executable, not the definition in the C library.

There are thus different requirements pulling in different directions for any specific ELF implementation. The right implementation choices will depend on the characteristics of the processor. That said, most, but not all, processors make fairly similar decisions. I will describe the common case here. An example of a processor which uses the common case is the i386; an example of a processor which make some different decisions is the PowerPC.

In the common case, code may be compiled in two different modes. By default, code is position dependent. Putting position dependent code into a shared library will cause the program linker to generate a lot of relocation information, and cause the dynamic linker to do a lot of processing at runtime. Code may also be compiled in position independent mode, typically with the -fpic option. Position independent code is slightly slower when it calls a non-static function or refers to a global or static variable. However, it requires much less relocation information, and thus the dynamic linker will start the program faster.

Position independent code will call non-static functions via the Procedure Linkage Table or PLT. This PLT does not exist in .o files. In a .o file, use of the PLT is indicated by a special relocation. When the program linker processes such a relocation, it will create an entry in the PLT. It will adjust the instruction such that it becomes a PC-relative call to the PLT entry. PC-relative calls are inherently position independent and thus do not require a relocation entry themselves. The program linker will create a relocation for the PLT entry which tells the dynamic linker which symbol is associated with that entry. This process reduces the number of dynamic relocations in the shared library from one per function call to one per function called.

Further, PLT entries are normally relocated lazily by the dynamic linker. On most ELF systems this laziness may be overridden by setting the LD_BIND_NOW environment variable when running the program. However, by default, the dynamic linker will not actually apply a relocation to the PLT until some code actually calls the function in question. This also speeds up startup time, in that many invocations of a program will not call every possible function. This is particularly true when considering the shared C library, which has many more function calls than any typical program will execute.

In order to make this work, the program linker initializes the PLT entries to load an index into some register or push it on the stack, and then to branch to common code. The common code calls back into the dynamic linker, which uses the index to find the appropriate PLT relocation, and uses that to find the function being called. The dynamic linker then initializes the PLT entry with the address of the function, and then jumps to the code of the function. The next time the function is called, the PLT entry will branch directly to the function.

Before giving an example, I will talk about the other major data structure in position independent code, the Global Offset Table or GOT. This is used for global and static variables. For every reference to a global variable from position independent code, the compiler will generate a load from the GOT to get the address of the variable, followed by a second load to get the actual value of the variable. The address of the GOT will normally be held in a register, permitting efficient access. Like the PLT, the GOT does not exist in a .o file, but is created by the program linker. The program linker will create the dynamic relocations which the dynamic linker will use to initialize the GOT at runtime. Unlike the PLT, the dynamic linker always fully initializes the GOT when the program starts.

For example, on the i386, the address of the GOT is held in the register %ebx. This register is initialized at the entry to each function in position independent code. The initialization sequence varies from one compiler to another, but typically looks something like this:

call __i686.get_pc_thunk.bx
add $offset,%ebx

The function __i686.get_pc_thunk.bx simply looks like this:

mov (%esp),%ebx
ret

This sequence of instructions uses a position independent sequence to get the address at which it is running. Then is uses an offset to get the address of the GOT. Note that this requires that the GOT always be a fixed offset from the code, regardless of where the shared library is loaded. That is, the dynamic linker must load the shared library as a fixed unit; it may not load different parts at varying addresses.

Global and static variables are now read or written by first loading the address via a fixed offset from %ebx. The program linker will create dynamic relocations for each entry in the GOT, telling the dynamic linker how to initialize the entry. These relocations are of type GLOB_DAT.
For function calls, the program linker will set up a PLT entry to look like this:

jmp *offset(%ebx)
pushl #index

jmp first_plt_entry

The program linker will allocate an entry in the GOT for each entry in the PLT. It will create a dynamic relocation for the GOT entry of type JMP_SLOT. It will initialize the GOT entry to the base address of the shared library plus the address of the second instruction in the code sequence above. When the dynamic linker does the initial lazy binding on a JMP_SLOT reloc, it will simply add the difference between the shared library load address and the shared library base address to the GOT entry. The effect is that the first jmp instruction will jump to the second instruction, which will push the index entry and branch to the first PLT entry. The first PLT entry is special, and looks like this:

pushl 4(%ebx)
jmp *8(%ebx)

This references the second and third entries in the GOT. The dynamic linker will initialize them to have appropriate values for a callback into the dynamic linker itself. The dynamic linker will use the index pushed by the first code sequence to find the JMP_SLOT relocation. When the dynamic linker determines the function to be called, it will store the address of the function into the GOT entry references by the first code sequence. Thus, the next time the function is called, the jmp instruction will branch directly to the right code.

That was a fast pass over a lot of details, but I hope that it conveys the main idea. It means that for position independent code on the i386, every call to a global function requires one extra instruction after the first time it is called. Every reference to a global or static variable requires one extra instruction. Almost every function uses four extra instructions when it starts to initialize %ebx (leaf functions which do not refer to any global variables do not need to initialize %ebx). This all has some negative impact on the program cache. This is the runtime performance penalty paid to let the dynamic linker start the program quickly.

On other processors, the details are naturally different. However, the general flavour is similar: position independent code in a shared library starts faster and runs slightly slower.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions. His works are reminiscent of a true open source hacker. Some of his major contributions, highlighting his career in his own words are (excerpts)I wrote my first linker for the AMOS operating system which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working, called gold, on will be my third. It is exclusively an ELF linker.

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end Maintainer of GNU Compiler Collection
  • Active Contributions to:  autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutils

Object File Formats

Saturday, March 10th, 2012

An assembler turns human readable assembly language into an object file. An object file is a binary data file written in a format designed as input to the linker. The linker generates an executable file. This executable file is a binary data file written in a format designed as input for the operating system or the loader (this is true even when linking dynamically, as normally the operating system loads the executable before invoking the dynamic linker to begin running the program). There is no logical requirement that the object file format resemble the executable file format. However, in practice they are normally very similar.

Most object file formats define sections. A section typically holds memory contents, or it may be used to hold other types of data. Sections generally have a name, a type, a size, an address, and an associated array of data.

Object file formats may be classed in two general types: record oriented and section oriented.

A record oriented object file format defines a series of records of varying size. Each record starts with some special code, and may be followed by data. Reading the object file requires reading it from the beginning and processing each record. Records are used to describe symbols and sections. Relocations may be associated with sections or may be specified by other records. IEEE-695 and Mach-O are record oriented object file formats used today.

In a section oriented object file format the file header describes a section table with a specified number of sections. Symbols may appear in a separate part of the object file described by the file header, or they may appear in a special section. Relocations may be attached to sections, or they may appear in separate sections. The object file may be read by reading the section table, and then reading specific sections directly. ELF, COFF, PE, and a.out are section oriented object file formats.

Earlier I said that executable file formats were normally the same as object file formats. That is true for ELF, but with a twist. In ELF, object files are composed of sections: all the data in the file is accessed via the section table. Executables and shared libraries normally contain a section table, which is used by programs like nm. But the operating system and the dynamic linker do not use the section table. Instead, they use the segment table, which provides an alternative view of the file.

All the contents of an ELF executable or shared library which are to be loaded into memory are contained within a segment (an object file does not have segments). A segment has a type, some flags, a file offset, a virtual address, a physical address, a file size, a memory size, and an alignment. The file offset points to a contiguous set of bytes which are the contents of the segment, the bytes to load into memory. When the operating system or the dynamic linker loads a file, it will do so by walking through the segments and loading them into memory (typically by using the mmap system call). All the information needed by the dynamic linker–the dynamic relocations, the dynamic symbol table, etc.–are accessed via information stored in special segments.

Although an ELF executable or shared library does not, strictly speaking, require any sections, they normally do have them. The contents of a loadable section will fall entirely within a single segment.

The program linker reads sections from the input object files. It sorts and concatenates them into sections in the output file. It maps all the loadable sections into segments in the output file. It lays out the section contents in the output file segments respecting alignment and access requirements, so that the segments may be mapped directly into memory. The sections are mapped to segments based on the access requirements: normally all the read-only sections are mapped to one segment and all the writable sections are mapped to another segment. The address of the latter segment will be set so that it starts on a separate page in memory, permitting mmap to set different permissions on the mapped pages.

The segment flags are a bitmask which define access requirements. The defined flags are PF_R, PF_W, and PF_X, which mean, respectively, that the contents must be made readable, writable, or executable.

The segment virtual address is the memory address at which the segment contents are loaded at runtime. The physical address is officially undefined, but is often used as the load address when using a system which does not use virtual memory. The file size is the size of the contents in the file. The memory size may be larger than the file size when the segment contains uninitialized data; the extra bytes will be filled with zeroes. The alignment of the segment is mainly informative, as the address is already specified.

The ELF segment types are as follows:

  • PT_NULL: A null entry in the segment table, which is ignored.
  • PT_LOAD: A loadable entry in the segment table. The operating system or dynamic linker load all segments of this type. All other segments with contents will have their contents contained completely within a PT_LOAD segment.
  • PT_DYNAMIC: The dynamic segment. This points to a series of dynamic tags which the dynamic linker uses to find the dynamic symbol table, dynamic relocations, and other information that it needs.
  • PT_INTERP: The interpreter segment. This appears in an executable. The operating system uses it to find the name of the dynamic linker to run for the executable. Normally all executables will have the same interpreter name, but on some operating systems different interpreters are used in different emulation modes.
  • PT_NOTE: A note segment. This contains system dependent note information which may be used by the operating system or the dynamic linker. On GNU/Linux systems shared libraries often have a ABI tag note which may be used to specify the minimum version of the kernel which is required for the shared library. The dynamic linker uses this when selecting among different shared libraries.
  • PT_SHLIB: This is not used as far as I know.
  • PT_PHDR: This indicates the address and size of the segment table. This is not too useful in practice as you have to have already found the segment table before you can find this segment.
  • PT_TLS: The TLS segment. This holds the initial values for TLS variables.
  • PT_GNU_EH_FRAME (0x6474e550): A GNU extension used to hold a sorted table of unwind information. This table is built by the GNU program linker. It is used by gcc’s support library to quickly find the appropriate handler for an exception, without requiring exception frames to be registered when the program start.
  • PT_GNU_STACK (0x6474e551): A GNU extension used to indicate whether the stack should be executable. This segment has no contents. The dynamic linker sets the permission of the stack in memory to the permissions of this segment.
  • PT_GNU_RELRO (0x6474e552): A GNU extension which tells the dynamic linker to set the given address and size to be read-only after applying dynamic relocations. This is used for const variables which require dynamic relocations.

ELF Sections

Now that we’ve done segments, lets take a quick look at the details of ELF sections. ELF sections are more complicated than segments, in that there are more types of sections. Every ELF object file, and most ELF executables and shared libraries, have a table of sections. The first entry in the table, section 0, is always a null section.

ELF sections have several fields.

  • Name.
  • Type. I discuss section types below.
  • Flags. I discuss section flags below.
  • Address. This is the address of the section. In an object file this is normally zero. In an executable or shared library it is the virtual address. Since executables are normally accessed via segments, this is essentially documentation.
  • File offset. This is the offset of the contents within the file.
  • Size. The size of the section.
  • Link. Depending on the section type, this may hold the index of another section in the section table.
  • Info. The meaning of this field depends on the section type.
  • Address alignment. This is the required alignment of the section. The program linker uses this when laying out the section in memory.
  • Entry size. For sections which hold an array of data, this is the size of one data element.

These are the types of ELF sections which the program linker may see.

  • SHT_NULL: A null section. Sections with this type may be ignored.
  • SHT_PROGBITS: A section holding bits of the program. This is an ordinary section with contents.
  • SHT_SYMTAB: The symbol table. This section actually holds the symbol table itself. The section contents are an array of ELF symbol structures.
  • SHT_STRTAB: A string table. This type of section holds null-terminated strings. Sections of this type are used for the names of the symbols and the names of the sections themselves.
  • SHT_RELA: A relocation table. The link field holds the index of the section to which these relocations apply. These relocations include addends.
  • SHT_HASH: A hash table used by the dynamic linker to speed symbol lookup.
  • SHT_DYNAMIC: The dynamic tags used by the dynamic linker. Normally thePT_DYNAMIC segment and the SHT_DYNAMIC section will point to the same contents.
  • SHT_NOTE: A note section. This is used in system dependent ways. A loadableSHT_NOTE section will become a PT_NOTE segment.
  • SHT_NOBITS: A section which takes up memory space but has no associated contents. This is used for zero-initialized data.
  • SHT_REL: A relocation table, like SHT_RELA but the relocations have no addends.
  • SHT_SHLIB: This is not used as far as I know.
  • SHT_DYNSYM: The dynamic symbol table. Normally the DT_SYMTAB dynamic tag will point to the same contents as this section (I haven’t discussed dynamic tags yet, though).
  • SHT_INIT_ARRAY: This section holds a table of function addresses which should each be called at program startup time, or, for a shared library, when the library is opened by dlopen.
  • SHT_FINI_ARRAY: Like SHT_INIT_ARRAY, but called at program exit time or dlclosetime.
  • SHT_PREINIT_ARRAY: Like SHT_INIT_ARRAY, but called before any shared libraries are initialized. Normally shared libraries initializers are run before the executable initializers. This section type may only be linked into an executable, not into a shared library.
  • SHT_GROUP: This is used to group related sections together, so that the program linker may discard them as a unit when appropriate. Sections of this type may only appear in object files. The contents of this type of section are a flag word followed by a series of section indices.
  • SHT_SYMTAB_SHNDX: ELF symbol table entries only provide a 16-bit field for the section index. For a file with more than 65536 sections, a section of this type is created. It holds one 32-bit word for each symbol. If a symbol’s section index is SHN_XINDEX, the real section index may be found by looking in theSHT_SYMTAB_SHNDX section.
  • SHT_GNU_LIBLIST (0x6ffffff7): A GNU extension used by the prelinker to hold a list of libraries found by the prelinker.
  • SHT_GNU_verdef (0x6ffffffd): A Sun and GNU extension used to hold version definitions (I’ll take about symbol versions at some point).
  • SHT_GNU_verneed (0x6ffffffe): A Sun and GNU extension used to hold versions required from other shared libraries.
  • SHT_GNU_versym (0x6fffffff): A Sun and GNU extension used to hold the versions for each symbol.

These are the types of section flags.

  • SHF_WRITE: Section contains writable data.
  • SHF_ALLOC: Section contains data which should be part of the loaded program image. For example, this would normally be set for a SHT_PROGBITS section and not set for a SHT_SYMTAB section.
  • SHF_EXECINSTR: Section contains executable instructions.
  • SHF_MERGE: Section contains constants which the program linker may merge together to save space. The compiler can use this type of section for read-only data whose address is unimportant.
  • SHF_STRINGS: In conjunction with SHF_MERGE, this means that the section holds null terminated string constants which may be merged.
  • SHF_INFO_LINK: This flag indicates that the info field in the section holds a section index.
  • SHF_LINK_ORDER: This flag tells the program linker that when it combines sections, this section must appear in the same relative order as the section in the link field. This can be used to ensure that address tables are built in the expected order.
  • SHF_OS_NONCONFORMING: If the program linker sees a section with this flag, and does not understand the type or all other flags, then it must issue an error.
  • SHF_GROUP: This section appears in a group (see SHT_GROUP, above).
  • SHF_TLS: This section holds TLS data.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions. His works are reminiscent of a true open source hacker. Some of his major contributions, highlighting his career in his own words are (excerpts)I wrote my first linker for the AMOS operating system which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working, called gold, on will be my third. It is exclusively an ELF linker.

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end Maintainer of GNU Compiler Collection
  • Active Contributions to:  autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutils

Linker Relocations

Monday, March 5th, 2012

Relocation is a computation to perform on the contents. Let’s take a closer look at the computation. In general relocation has a type, a symbol, an offset into the contents, and an addend. From the linker’s point of view, the contents are simply an uninterpreted series of bytes. Relocation changes those bytes as necessary to produce the correct final executable.

For example, consider the C code g = 0; where g is a global variable. On the i386, the compiler will turn this into an assembly language instruction, which will most likely be movl $0, g . Now, the g in the C code is a global variable, and we all more or less. The g in the assembly code is not that variable. It is a symbol which holds the address of that variable.

The assembler does not know the address of the global variable g, which is another way of saying that the assembler does not know the value of the symbol g. It is the linker that is going to pick that address. So the assembler has to tell the linker that it needs to use the address of g in this instruction. The way the assembler does this is to create relocation. We don’t use a separate relocation type for each instruction; instead, each processor will have a natural set of relocation types which are appropriate for the machine architecture. Each type of relocation expresses a
specific computation.

In the i386 case, the assembler will generate these bytes:

c7 05 00 00 00 00 00 00 00 00

The c7 05 are the instruction (movl constant to address). The first four 00 bytes are the 32-bit constant 0. The second four 00 bytes are the address. The assembler tells the linker to put the value of the symbol g into those four bytes by generating (in this case) a R_386_32 relocation. For this relocation the symbol will be g, the offset will be to the last four bytes of the instruction, the type will be R_386_32, and the addend will be 0 (in thecase of the i386 the addend is stored in the contents rather than in the relocation itself, but this is a detail). The
type R_386_32 expresses a specific computation, which is: put the 32-bit sum of the value of the symbol and the addend into the offset. Since for the i386 the addend is stored in the contents, this can also be expressed as: add the value of the symbol to the 32-bit field at the offset. When the linker performs this computation, the address in the instruction will be the address of the global variable g. Regardless of the details, the important point to note is that the relocation adjusts the contents by applying a specific computation selected by the type.

An example of a simple case which does use an addend would be

char a[10]; // A global array.

 

char* p = &a[1]; // In a function.

The assignment to p will wind up requiring a relocation for the symbol a. Here the addend will be 1, so that the resulting instruction references a + 1 rather than a + 0.

To point out how relocations are processor dependent, let’s consider g = 0; on a RISC processor: the PowerPC (in 32-bit mode). In this case, multiple assembly language instructions are required:

li 1,0 // Set register 1 to 0
lis 9,g@ha // Load high-adjusted part of g into register 9
stw 1,g@l(9) // Store register 1 to address in register 9 plus low adjusted
part g

The lis instruction loads a value into the upper 16 bits of register 9, setting the lower 16 bits to zero. The stw instruction adds a signed 16 bit value to register 9 to form an address, and then stores the value of register 1 at that address. The @ha part of the operand directs the assembler to generate a R_PPC_ADDR16_HA reloc. The @l produces a R_PPC_ADDR16_LO reloc. The goal of these relocs is to compute the value of the symbol g and use it as the store address.

That is enough information to determine the computations performed by these relocs. The R_PPC_ADDR16_HA reloc computes (SYMBOL >> 16) + ((SYMBOL & 0x8000) ? 1 : 0). The R_PPC_ADDR16_LO computes SYMBOL & 0xffff. The extra computation for R_PPC_ADDR16_HA is because the stw instruction adds the signed 16-bit value, which means that if the low 16 bits appears negative we have to adjust the high 16 bits accordingly. The offsets of the relocations are such that the 16-bit resulting values are stored into the appropriate parts of the machine instructions.

The examples I’ve shown are for relocations which appear in an object file, these types of relocations may also appear in a shared library, if they are copied there by the program linker. In ELF, there are also specific relocation types which never appear in object files but only appear in shared libraries or executables. These are the JMP_SLOT, GLOB_DAT, and RELATIVE relocations, Another type of relocation which only appears in an executable is a COPY relocation, which I will discuss later.

The specific examples of relocations I’ve discussed here are ELF specific, but the same sorts of relocations occur for any object file format. In the next part we will look at Object file formats.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions. His works are reminiscent of a true open source hacker. Some of his major contributions, highlighting his career in his own words are (excerpts)I wrote my first linker for the AMOS operating system which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working, called gold, on will be my third. It is exclusively an ELF linker.

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end Maintainer of GNU Compiler Collection
  • Active Contributions to:  autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutils

Note: This post has been slightly modified by Veda Solutions and posted here

What does a linker do?

Friday, March 2nd, 2012

It’s simple: a linker converts object files into executables and shared libraries. Let’s look at what that means. For cases where a linker is used, the software development process consists of writing program code in some language: e.g., C or C++ or Fortran (but typically not Java, as Java normally works differently, using a loader rather than a linker). A compiler translates this program code, which is human readable text, into into another form of human readable text known as assembly code. Assembly code is a readable form of the machine language which the computer can execute directly. An assembler is used to turn this assembly code into an object file. For completeness, I’ll note that some compilers include an assembler internally, and produce an object file directly. Either way, this is where things get interesting.

In the old days, many programs were complete in themselves. In those days there was generally no compiler–people wrote directly in assembly code–and the assembler actually generated an executable file which the machine could execute directly. As languages liked Fortran and Cobol started to appear, people began to think in terms of libraries of subroutines, which meant that there had to be some way to run the assembler at two different times, and combine the output into a single executable file. This required the assembler to generate a different type of output, which became known as an object file (I have no idea where this name came from). And a new program was required to combine different object files together into a single executable. This new program became known as the linker (the source of this name should be obvious).
Linkers still do the same job today. In the decades that followed, one new feature has been added: shared libraries.

Shared libraries were invented as an optimization for virtual memory systems running many processes simultaneously. People noticed that there is a set of basic functions which appear in almost every program. Before shared libraries, in a system which runs multiple processes simultaneously, that meant that almost every process had a copy of exactly the same code. This suggested that on a virtual memory system it would be possible to arrange that code so that a single copy could be shared by every process using it. The virtual memory system would be used to map the single copy into the address space of each process which needed it. This would require less physical memory to run multiple programs, and thus yield better performance.

I believe the first implementation of shared libraries was on SVR3, based on COFF. This implementation was simple, and basically assigned each shared library a fixed portion of the virtual address space. This did not require any significant changes to the linker. However, requiring each shared library to reserve an appropriate portion of the virtual address space was inconvenient.

SunOS4 introduced a more flexible version of shared libraries, which was later picked up by SVR4. This implementation postponed some of the operation of the linker to runtime. When the program started, it would automatically run a limited version of the linker which would link the program proper with the shared libraries. The version of the linker which runs when the program starts is known as the  dynamic linker. When it is necessary to distinguish them, I will refer to the version of the linker which creates the program as the program linker. This type of shared libraries was a significant change to the traditional program linker: it now had to build linking information which could be used efficiently at runtime by the dynamic linker.

Basic Linker Data Types
The linker operates on a small number of basic data types: symbols, relocations, and contents. These are defined in the input object files. Here is an overview of each of these.

A symbol is basically a name and a value. Many symbols represent static objects in the original source code–that is, objects which exist in a single place for the duration of the program. For example, in an object file generated from C code, there will be a symbol for each function and for each global and static variable. The value of such a symbol is simply an offset into the contents. This type of symbol is known as a defined symbol. It’s important not to confuse the value of the symbol representing the variable my_global_var with the value of my_global_varitself. The value of the symbol is roughly the address of the variable: the value you would get from the expression &my_global_var in C.

Symbols are also used to indicate a reference to a name defined in a different object file. Such a reference is known as an undefined symbol. There are other less commonly used types of symbols which I will describe later.

During the linking process, the linker will assign an address to each defined symbol, and will resolve each undefined symbol by finding a defined symbol with the same name. A relocation is a computation to perform on the contents.

Most relocations refer to a symbol and to an offset within the contents. Many relocations will also provide an additional operand, known as the addend. A simple, and commonly used, relocation is “set this location in the contents to the value of this symbol plus this addend.” The types of computations that relocations do are inherently dependent on the architecture of the processor for which the linker is generating code. For example, RISC processors which require two or more instructions to form a memory address will have separate relocations to be used with each of those instructions; for example, “set this location in the contents to the lower 16 bits of the value of this symbol.”

During the linking process, the linker will perform all of the relocation computations as directed. A relocation in an object file may refer to an undefined symbol. If the linker is unable to resolve that symbol, it will normally issue an error (but not always: for some symbol types or some relocation types an error may not be appropriate).

The contents are what memory should look like during the execution of the program. Contents have a size, an array of bytes, and a type. They contain the machine code generated by the compiler and assembler (known as text). They contain the values of initialized variables (data). They contain static unnamed data like string constants and switch tables (read-only data or rdata). They contain uninitialized variables, in which case the array of bytes is generally omitted and assumed to contain only zeroes (bss). The compiler and the assembler work hard to generate exactly the right contents, but the linker really doesn’t care about them except as raw data. The linker reads the contents from each file, concatenates them all together sorted by type, applies the relocations, and writes the result into the executable file.
Basic Linker Operation

At this point we already know enough to understand the basic steps used by every linker.

  • Read the input object files. Determine the length and type of the contents. Read the symbols.
  • Build a symbol table containing all the symbols, linking undefined symbols to their definitions.
  • Decide where all the contents should go in the output executable file, which means deciding where they should go in memory when the program runs.
  • Read the contents data and the relocations. Apply the relocations to the contents. Write the result to the output file.
  • Optionally write out the complete symbol table with the final values of the symbols.

About the Author
Ian Lance Taylor is a quintessential name in the field of serious open source contributions. His works are reminiscent of a true open source hacker. Some of his major contributions, highlighting his career in his own words are (excerpts)I wrote my first linker for the AMOS operating system which ran on Alpha Micro systems. I wrote my second linker in 1993 and 1994, prototyped by Steve Chamberlain while we both worked at Cygnus Support (later Cygnus Solutions, later part of Red Hat). The linker I am now working, called gold, on will be my third. It is exclusively an ELF linker.

Other major contributions include:

  • Maintainer of GNU Binutils from 1996 to 1999
  • Middle-end Maintainer of GNU Compiler Collection
  • Active Contributions to:  autoconf, automake, CVS, GDB, Newlib, RTEMS, PostgreSQL, Coreutil

Reach Ian at: ian@airs.com

Note

This post has been slightly modified by Veda Solutions and posted here