uClinux - Shared Libraries
Although this shared library support is specifically intended for Motorola ColdFire microprocessors, the methods used should carry over easily to other 68k-based processors, apart from a known issue relating to signal handling. The work should also carry over to other processor families with little effort.
SnapGear Inc. engineers have previously added other important improvements to uClinux, such as advanced memory management and eXecute In Place (XIP) technology. Shared library support was developed as an extension to the existing XIP code model
(-msep-data) to allow libraries to be shared amongst multiple applications. This greatly reduces the size of those executables in the file system, which in turn reduces the final image size. The overheads required to support these shared libraries are minimal and the modifications are relatively clean.
A shared library can be constructed from any combination of object and
library files with only a few restrictions. Each shared library is
capable of running initialisation and finalisation code via the CTORS
and DTORS mechanisms, and additionally via a library-specific main().
The only known restriction on shared libraries at this point is that the
environ global variable will not be initialised until the main program
has started its execution. Libraries are initialised in a specific,
well-defined order that cannot be overridden on a per-application basis.
The penalty for this code reuse is that every application carries the
entire data segment of each shared library it uses (e.g. our libc/libgcc
combination library requires 16 KB of data segment). Some of this data
space was previously required anyway, although no single application
would have needed all of it. An additional optimisation over our
previous tool chain releases allows read-only data which does not
contain relocations to be placed into the shared text segment.
Previously such data lived in the data segment. This change greatly
reduces the amount of library data that each application must carry.
The shared libraries themselves are flat files and are pretty much
indistinguishable from normal executables. No additional relocation and
run time linking information needs to be supplied. Most importantly,
no symbol table wastage appears on the embedded platform; this overhead
is carried by the host system.
The implementation requires a small modification to gcc to set up data
segment pointers correctly. The generated code is just as time and space
efficient as traditional PIC code is. Other changes are required in the
flat file loader and in the build process. Only the flat file loader
requires significant modifications and these changes should support
other platforms using the same method of shared library realisation.
The implementation produces a separate GOT and data segment for the
application and each shared library it uses. These are all allocated
separately and accessed using the %a5 register (the standard PIC
data segment pointer register on the 68000). The trick is to store
pointers to the other data segments at known fixed offsets from the base
of each data segment. Thus a function can load the pointer to its own
data segment by loading from the appropriate offset from the base of any
of the allocated data segments.
In this implementation, each library is allocated a specific library
identification number and this number is used to determine the offset
used to locate its private data segment. The main program is allocated
identification number 0, just as if it were a library. Thus every procedure
can determine the location of its data segment and calls into and back
out of library routines will function properly and without restriction.
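
The lookup described above can be sketched in C. This is only an
illustration under stated assumptions: the struct layout, the names
data_segment, link_segments and own_segment, and the position of the
pointer table at the start of the segment are all hypothetical and are
not taken from the actual gcc or loader changes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SHARED_LIBS 3                  /* loader limit quoted in this bulletin */
#define NUM_SLOTS (MAX_SHARED_LIBS + 1)    /* slot 0 belongs to the main program */

/* Hypothetical layout: every data segment carries a table with the base
 * address of each data segment, indexed by library identification number. */
struct data_segment {
    struct data_segment *bases[NUM_SLOTS];
    int32_t private_data[4];               /* stand-in for the GOT and variables */
};

/* Fill in the table of every segment so each one can reach all the others. */
static void link_segments(struct data_segment *segs[NUM_SLOTS])
{
    for (int i = 0; i < NUM_SLOTS; i++)
        for (int j = 0; j < NUM_SLOTS; j++)
            segs[i]->bases[j] = segs[j];
}

/* Models a function prologue: whatever segment the data pointer currently
 * refers to, one load at a fixed offset yields this code's own segment. */
static struct data_segment *own_segment(struct data_segment *current,
                                        unsigned lib_id)
{
    assert(lib_id < NUM_SLOTS);
    return current->bases[lib_id];
}
```

Because the table is identical in every segment, own_segment() returns
the same result no matter which segment the caller was using, which is
what lets calls cross library boundaries in either direction.
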
To handle cross references from a program to its libraries or from a
library to another library, another innovation is required. Every library
is statically linked such that the top eight address bits correspond to
its identification number. When the main program is linked, it refers to
these addresses directly. The flat file loader in the kernel has the task
of unravelling these cross references. Because the library identification
is included within the address it is possible to perform the necessary
relocations. Additionally, due to the way we build programs, all
address references in a program are made either via the GOT or via an
initialised address in the data segment. These are both located in the
private data segment of each library and program and thus the relocations
can be performed without disturbing the code (hence shared text segments
are preserved). This method of handling relocations allows a program to
directly access data included inside a library (e.g. stdin); the kernel
loader transparently handles all of the inter-library cross references.
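
A minimal sketch of how such an address could be unravelled, assuming
32-bit addresses with the identification number in the top eight bits
and a 24-bit offset below it. The function names and the load_base
table are hypothetical; the real flat file loader may be organised
quite differently.

```c
#include <assert.h>
#include <stdint.h>

#define LIB_ID_SHIFT 24            /* top eight bits hold the library id */
#define OFFSET_MASK  0x00FFFFFFu   /* low 24 bits hold the offset */

/* Library identification number encoded in a statically linked address. */
static unsigned reloc_lib_id(uint32_t linked_addr)
{
    return linked_addr >> LIB_ID_SHIFT;
}

/* Offset of the referenced item within that library or program. */
static uint32_t reloc_offset(uint32_t linked_addr)
{
    return linked_addr & OFFSET_MASK;
}

/* load_base[id] is the run-time address at which library `id` was loaded
 * (a hypothetical table; slot 0 is the main program). */
static uint32_t relocate(uint32_t linked_addr, const uint32_t *load_base,
                         unsigned num_slots)
{
    unsigned id = reloc_lib_id(linked_addr);
    assert(id < num_slots);
    return load_base[id] + reloc_offset(linked_addr);
}
```

For instance, with library 1 loaded at 0x00800000, the linked address
0x01000120 relocates to 0x00800120. The 24-bit offset field spans
exactly 16 megabytes.
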
There are some limitations, none of which are severe. Because each
library and application requires a GOT to access address information, a
limit of 8192 globals and distinct procedures exists for each library
and application. This is by no means a limitation on the size of data,
only on the total number of distinct names. It is important to note that
each library can have 8192 such entries in addition to the 8192 for
the main application. In practice we've never needed even a quarter
of these entries, and that was without shared libraries (i.e. using
-msep-data builds).
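
The figure of 8192 is consistent with a 32 KB table of 4-byte entries,
which matches the GOT size limit that GCC documents for -fpic on the
m68k. That this is the origin of the limit is an assumption; the text
above does not say. The names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GOT_ENTRY_BYTES 4u                /* one 32-bit address per entry */
#define GOT_LIMIT_BYTES (32u * 1024u)     /* assumed 32 KB GOT limit */
#define GOT_MAX_ENTRIES (GOT_LIMIT_BYTES / GOT_ENTRY_BYTES)

/* True if a library or program with this many distinct globals and
 * procedures stays within its own GOT. */
static bool got_fits(size_t distinct_symbols)
{
    return distinct_symbols <= GOT_MAX_ENTRIES;
}
```

The check applies separately to each library and to the main program,
since each has its own GOT.
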
There is another limitation on the number of shared libraries available.
There is a fairly strict limit of 255 shared libraries imposed by the
compiler modifications. This limit is system wide and will be troublesome
to overcome, although by no means impossible. The kernel flat file
loader imposes a smaller limit on the number of shared libraries.
This is currently set to 3 and can be increased simply by changing a
#define in the flat file loader C file. The reason this is set so low
is to reduce the overhead caused by having a larger number. For n shared
libraries there is an overhead of 4n+4 bytes per data segment, and with up
to n+1 data segments per application this quickly adds up. Since a shared
library can be made up of any combination of object files and libraries,
this limit isn't so bad. For example, our shared libc actually includes
both libc and libgcc (which has the compiler support routines). A C++
application suite could define a second shared library which contains
all the C++ libraries rolled together.
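
The overhead quoted above can be worked through with a short
calculation. The function names are made up for illustration; the
formulas are the 4n+4 bytes per data segment and the n+1 data segments
per application stated in the text.

```c
#include <assert.h>
#include <stddef.h>

#define COMPILER_MAX_LIBS 255u   /* system-wide limit quoted in the text */

/* Pointer table overhead in one data segment: 4n+4 bytes. */
static size_t table_bytes_per_segment(size_t n_libs)
{
    assert(n_libs <= COMPILER_MAX_LIBS);
    return 4 * n_libs + 4;
}

/* Worst case for one application that uses all n libraries:
 * n+1 data segments, each carrying the full table. */
static size_t table_bytes_per_app(size_t n_libs)
{
    return (n_libs + 1) * table_bytes_per_segment(n_libs);
}
```

With the current setting of 3 this comes to 16 bytes per data segment
and at most 64 bytes per application. Raising the setting to the
compiler's maximum of 255 would cost 1024 bytes per data segment and up
to 262144 bytes for an application that used every library.
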
Finally, there is a code size limitation due to the stealing of address
bits from the top of the relocation entries. This limitation is 16
megabytes per library and application. If your application grows
beyond this size, it will need to be split up into two or more pieces.
This limitation really couldn't be considered significant in the context
of an embedded environment.
The shared library tools can be downloaded from mid-April 2002 onwards from uClinux elf tools.