Relocation (computing)
Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses.[1][2] Prior to the advent of multiprocess systems, and still in many embedded systems, the addresses for objects were absolute, starting at a known location, often zero. Since multiprocessing systems dynamically link and switch between programs, it became necessary to be able to relocate objects using position-independent code. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.
Relocation is typically done by the linker at link time, but it can also be done at load time by a relocating loader, or at run time by the running program itself. Some architectures avoid relocation entirely by deferring address assignment to run time, as, for example, in stack machines with zero-address arithmetic or in some segmented architectures where every compilation unit is loaded into a separate segment.
Segmentation
Object files are divided into segments of various types. Examples include the code segment (.text), the initialized data segment (.data), and the uninitialized data segment (.bss), among others.
Relocation table
The relocation table is a list of pointers created by the translator (a compiler or assembler) and stored in the object or executable file. Each entry in the table, or "fixup", is a pointer to an absolute address in the object code that must be changed when the loader relocates the program so that it will refer to the correct location. Fixups are designed to support relocation of the program as a complete unit. In some cases, each fixup in the table is itself relative to a base address of zero, so the fixups themselves must be changed as the loader moves through the table.[2]
In some architectures a fixup that crosses certain boundaries (such as a segment boundary) or that is not aligned on a word boundary is illegal and flagged as an error by the linker.[3]
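As an illustration of how a loader might walk such a table, the following minimal sketch applies a list of fixups to a program image. It assumes a hypothetical format that is not taken from any real object file layout: 32-bit little-endian words, fixups given as byte offsets into the image, and an image linked for base address zero. The alignment check mirrors the kind of restriction described above.

```python
import struct

def apply_fixups(image, fixups, load_base):
    """Return a copy of image with load_base added to each fixed-up word.

    Hypothetical format: every fixup is the byte offset of a 32-bit
    little-endian word that holds an absolute address linked for base 0.
    """
    relocated = bytearray(image)
    for offset in fixups:
        if offset % 4 != 0:
            raise ValueError(f"fixup at {offset:#x} is not word-aligned")
        if offset + 4 > len(relocated):
            raise ValueError(f"fixup at {offset:#x} lies outside the image")
        (value,) = struct.unpack_from("<I", relocated, offset)
        struct.pack_into("<I", relocated, offset, (value + load_base) & 0xFFFFFFFF)
    return bytes(relocated)

# Word 0 is plain data and must not change; word 1 holds the address 0x10.
image = struct.pack("<II", 0xDEADBEEF, 0x00000010)
relocated = apply_fixups(image, fixups=[4], load_base=0x4000)
print([hex(word) for word in struct.unpack("<II", relocated)])
```

Only the word named in the table is adjusted; data that merely resembles an address is left alone, which is why the translator has to record the fixups explicitly.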
DOS and 16-bit Windows
Far pointers (32-bit segment:offset pointers, used to address the 20-bit (1 MB) real-mode address space, of which 640 KB is conventionally available to DOS programs) that point to code or data within a DOS executable (EXE) do not have absolute segments, because the actual address of the code or data depends on where the program is loaded in memory, and this is not known until the program is loaded.
Instead, segments are relative values in the DOS EXE file. These segments need to be corrected when the executable has been loaded into memory. The EXE loader uses a relocation table to find the segments that need to be adjusted.
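The following sketch shows the idea under simplifying assumptions: the load image is a plain byte string, each relocation entry is a (segment, offset) pair locating a 16-bit segment word inside that image, and the function names are illustrative rather than part of any DOS API. The loader adds the segment at which the image was actually loaded to every word the table points to.

```python
import struct

def linear_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def relocate_exe_image(image, relocations, load_segment):
    """Add load_segment to each 16-bit segment word named by the table."""
    relocated = bytearray(image)
    for rel_segment, rel_offset in relocations:
        position = (rel_segment << 4) + rel_offset  # position within the image
        (value,) = struct.unpack_from("<H", relocated, position)
        struct.pack_into("<H", relocated, position, (value + load_segment) & 0xFFFF)
    return bytes(relocated)

# A far pointer stored at image position 0x12: offset 0x0034, relative segment 0x0001.
image = bytearray(0x20)
struct.pack_into("<HH", image, 0x12, 0x0034, 0x0001)

# The table entry 0001:0004 points at the segment half of that pointer (position 0x14).
relocated = relocate_exe_image(bytes(image), [(0x0001, 0x0004)], load_segment=0x1000)
offset, segment = struct.unpack_from("<HH", relocated, 0x12)
print(hex(segment), hex(linear_address(segment, offset)))
```

The offset half of the pointer is untouched; only the segment half depends on where the program was loaded.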
32-bit Windows
With 32-bit Windows operating systems it is not mandatory to provide relocation tables for EXE files, since they are the first image loaded into the virtual address space and thus will be loaded at their preferred base address.
For DLLs, and for EXEs that opt into address space layout randomization (ASLR), an exploit mitigation technique introduced with Windows Vista, relocation tables once again become mandatory, because the binary may be dynamically moved before being executed, even though an EXE is still the first image loaded into the virtual address space.
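When an image cannot be loaded at its preferred base address, every recorded address is adjusted by the difference between the actual and the preferred base. The sketch below illustrates this with a single base relocation block in the style of the Portable Executable format: a page address and block size, followed by 16-bit entries whose upper 4 bits give the type and whose lower 12 bits give the offset within the page. The function name `rebase` and the in-memory image layout are assumptions made for the example, and real loaders handle more relocation types than shown here.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry, skipped
IMAGE_REL_BASED_HIGHLOW = 3    # adjust a 32-bit address
IMAGE_REL_BASED_DIR64 = 10     # adjust a 64-bit address

def rebase(image, reloc_block, preferred_base, actual_base):
    """Apply one base relocation block to an image indexed by RVA."""
    delta = actual_base - preferred_base
    relocated = bytearray(image)
    page_rva, block_size = struct.unpack_from("<II", reloc_block, 0)
    for pos in range(8, block_size, 2):
        (entry,) = struct.unpack_from("<H", reloc_block, pos)
        kind, target = entry >> 12, page_rva + (entry & 0x0FFF)
        if kind == IMAGE_REL_BASED_ABSOLUTE:
            continue
        if kind == IMAGE_REL_BASED_HIGHLOW:
            fmt, mask = "<I", 0xFFFFFFFF
        elif kind == IMAGE_REL_BASED_DIR64:
            fmt, mask = "<Q", 0xFFFFFFFFFFFFFFFF
        else:
            raise ValueError(f"unsupported relocation type {kind}")
        (value,) = struct.unpack_from(fmt, relocated, target)
        struct.pack_into(fmt, relocated, target, (value + delta) & mask)
    return bytes(relocated)

# The image holds one 32-bit pointer at RVA 8, linked for base 0x00400000.
image = bytearray(0x20)
struct.pack_into("<I", image, 8, 0x00401000)
block = struct.pack("<IIHH", 0, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 8, 0)

rebased = rebase(bytes(image), block, preferred_base=0x00400000, actual_base=0x00A50000)
print(hex(struct.unpack_from("<I", rebased, 8)[0]))
```

If the image is loaded at its preferred base, the difference is zero and the table need not be consulted at all, which is why it can be omitted for executables that are never moved.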
64-bit Windows
When running native 64-bit binaries on Windows Vista and above, ASLR is mandatory, and thus relocation sections cannot be omitted by the compiler.
Unix-like systems
The Executable and Linkable Format (ELF), the executable and shared library format used by most Unix-like systems, allows several types of relocation to be defined.[4]
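Each relocation type specifies how the value to be stored is computed. As a rough sketch, the following code evaluates three of the types defined for the Intel 386 in the ELF specification, using that specification's notation: S is the value of the symbol, A the addend, P the address of the location being patched, and B the base address at which a shared object is loaded. The function name and the sample addresses are invented for the example.

```python
R_386_32 = 1        # S + A: absolute 32-bit address
R_386_PC32 = 2      # S + A - P: address relative to the patched location
R_386_RELATIVE = 8  # B + A: address relative to the load base

def relocation_value(r_type, S=0, A=0, P=0, B=0):
    """Compute the 32-bit value to store for a few i386 relocation types."""
    if r_type == R_386_32:
        value = S + A
    elif r_type == R_386_PC32:
        value = S + A - P
    elif r_type == R_386_RELATIVE:
        value = B + A
    else:
        raise ValueError(f"relocation type {r_type} is not handled in this sketch")
    return value & 0xFFFFFFFF

# Absolute reference to a data object, 8 bytes past the symbol.
absolute = relocation_value(R_386_32, S=0x0804A010, A=8)
# PC-relative call: the addend of -4 accounts for the 4-byte operand field.
pc_relative = relocation_value(R_386_PC32, S=0x08049000, A=-4, P=0x08048502)
# Load-time adjustment of a pointer inside a shared object.
relative = relocation_value(R_386_RELATIVE, A=0x2010, B=0x56555000)
print(hex(absolute), hex(pc_relative), hex(relative))
```

PC-relative values do not change when caller and callee move together, which is one reason position-independent code needs fewer load-time relocations.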
Relocation procedure
The linker reads segment information and relocation tables in the object files and performs relocation by:
- merging all segments of common type into a single segment of that type
- assigning a unique run-time address to each section and each symbol, so that all code (functions) and data (global variables) have unique run-time addresses
- referring to the relocation table to modify symbol references so that they point to the correct run-time addresses.
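The three steps can be sketched with a toy linker. Everything about the data layout here is an assumption made for illustration: object files are dictionaries, each relocation names a 32-bit little-endian slot to be filled with a symbol's address, and the base address of each merged segment is supplied by the caller rather than computed.

```python
import struct

def link(objects, section_bases):
    """Merge sections, assign addresses, and patch symbol references."""
    merged = {name: bytearray() for name in section_bases}
    symbols = {}   # symbol name -> run-time address
    pending = []   # (section name, offset in merged section, symbol name)

    for obj in objects:
        # Step 1: append each section to the merged segment of the same type.
        starts = {}
        for name, data in obj["sections"].items():
            starts[name] = len(merged[name])
            merged[name] += data
        # Step 2: give every symbol a unique run-time address.
        for sym, (sec, off) in obj["symbols"].items():
            symbols[sym] = section_bases[sec] + starts[sec] + off
        for sec, off, sym in obj["relocations"]:
            pending.append((sec, starts[sec] + off, sym))

    # Step 3: use the relocation entries to patch in the final addresses.
    for sec, off, sym in pending:
        struct.pack_into("<I", merged[sec], off, symbols[sym])
    return {name: bytes(data) for name, data in merged.items()}, symbols

main_obj = {
    "sections": {".text": bytes(8)},
    "symbols": {"main": (".text", 0)},
    "relocations": [(".text", 4, "counter")],  # main refers to counter
}
util_obj = {
    "sections": {".text": bytes(4), ".data": struct.pack("<I", 7)},
    "symbols": {"helper": (".text", 0), "counter": (".data", 0)},
    "relocations": [],
}

image, symbols = link([main_obj, util_obj], {".text": 0x1000, ".data": 0x2000})
print({name: hex(addr) for name, addr in symbols.items()})
```

Patching is deferred until all objects have been read because a reference in one object file may name a symbol defined in a later one, as `counter` is here.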
Example
The following example uses Donald Knuth's MIX architecture and MIXAL assembly language. The principles are the same for any architecture, though the details will change.
- (A) Program SUBR is compiled to produce object file (B), shown as both machine code and assembler. The compiler may start the compiled code at an arbitrary location, often location 1 as shown. Location 13 contains the machine code for the jump instruction to statement ST in location 5.
- (C) If SUBR is later linked with other code it may be stored at a location other than 1. In this example the linker places it at location 120. The address in the jump instruction, which is now at location 133, must be relocated to point to the new location of the code for statement ST, now 125; both the instruction and its target have moved by 120 locations. (The bytes 1 61 shown in the instruction are the MIX machine code representation of 125, since 1 × 64 + 61 = 125.)
- (D) When the program is loaded into memory to run, it may be loaded at some location other than the one assigned by the linker. This example shows SUBR now at location 300. The address in the jump instruction, now at 313, needs to be relocated again so that it points to the updated location of ST, 305. (The bytes 4 49 are the MIX machine code representation of 305, since 4 × 64 + 49 = 305.)
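The arithmetic in this example can be checked with a short sketch. It assumes a binary MIX in which each byte holds 64 values, which is consistent with the byte pairs quoted above, and it treats relocation as adding a displacement to the addresses of the original object code. The function name is illustrative.

```python
BYTE_SIZE = 64  # assumption: binary MIX with 6-bit bytes

def mix_address_bytes(address):
    """Split an address into the two MIX bytes of an instruction's address field."""
    return divmod(address, BYTE_SIZE)

ST, JMP = 5, 13  # locations of statement ST and of the jump in the object file

results = {}
for stage, displacement in (("linked", 120), ("loaded", 300)):
    jump_location = JMP + displacement
    target = ST + displacement
    results[stage] = (jump_location, target, mix_address_bytes(target))
    print(stage, jump_location, target, mix_address_bytes(target))
```

In both stages the jump instruction and its target move by the same amount, so only the address field stored in the instruction has to be rewritten.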
References
- "Types of Object Code". iRMX 86 Application Loader Reference Manual (PDF). Intel. pp. 1-2, 1-3. Archived (PDF) from the original on 2020-01-11. Retrieved 2020-01-11.
[…] Absolute code, and an absolute object module, is code that has been processed by LOC86 to run only at a specific location in memory. The Loader loads an absolute object module only into the specific location the module must occupy. Position-independent code (commonly referred to as PIC) differs from absolute code in that PIC can be loaded into any memory location. The advantage of PIC over absolute code is that PIC does not require you to reserve a specific block of memory. When the Loader loads PIC, it obtains iRMX 86 memory segments from the pool of the calling task's job and loads the PIC into the segments. A restriction concerning PIC is that, as in the PL/M-86 COMPACT model of segmentation […], it can have only one code segment and one data segment, rather than letting the base addresses of these segments, and therefore the segments themselves, vary dynamically. This means that PIC programs are necessarily less than 64K bytes in length. PIC code can be produced by means of the BIND control of LINK86. Load-time locatable code (commonly referred to as LTL code) is the third form of object code. LTL code is similar to PIC in that LTL code can be loaded anywhere in memory. However, when loading LTL code, the Loader changes the base portion of pointers so that the pointers are independent of the initial contents of the registers in the microprocessor. Because of this fixup (adjustment of base addresses), LTL code can be used by tasks having more than one code segment or more than one data segment. This means that LTL programs may be more than 64K bytes in length. FORTRAN 86 and Pascal 86 automatically produce LTL code, even for short programs. LTL code can be produced by means of the BIND control of LINK86. […]
- Levine, John R. (2000) [October 1999]. "Chapter 1: Linking and Loading & Chapter 3: Object Files". Linkers and Loaders. The Morgan Kaufmann Series in Software Engineering and Programming (1 ed.). San Francisco, USA: Morgan Kaufmann. p. 5. ISBN 1-55860-496-0. OCLC 42413382. Archived from the original on 2012-12-05. Retrieved 2020-01-12.
- Borland (1999-09-01) [1998-07-02]. "Borland article #15961: Coping with 'Fixup Overflow' messages". community.borland.com. Technical Information Database - Product: Borland C++ 3.1. TI961C.txt #15961. Archived from the original on 2008-07-07. Retrieved 2007-01-15.
- "Executable and Linkable Format (ELF)" (PDF). skyfree.org. Tool Interface Standards (TIS) Portable Formats Specification, Version 1.1. Archived (PDF) from the original on 2019-12-24. Retrieved 2018-10-01.
Further reading
- Johnson, Glenn (1975-12-21) [1975-11-13], 11/34 Memory Management Basic Logic test, Digital Equipment Corporation (DEC), MAINDEC-11-DFKTA-A-D, retrieved 2017-08-19
- Kildall, Gary Arlen (February 1978). "A simple technique for static relocation of absolute machine code". Dr. Dobb's Journal of Computer Calisthenics & Orthodontia. People's Computer Company. 3 (2): 10–13 (66–69). ISBN 0-8104-5490-4. #22. Archived from the original on 2017-09-09. Retrieved 2017-08-19. (This "resize" method, named page boundary relocation, could be applied statically to a CP/M-80 disk image using MOVCPM in order to maximize the TPA for programs to run. It was also utilized dynamically by the CP/M debugger Dynamic Debugging Tool (DDT) to relocate itself into higher memory. The same approach was independently developed by Bruce Van Natta of IMS Associates to produce relocatable PL/M code. As paragraph boundary relocation, another variant of this method was later utilized by dynamically HMA self-relocating TSRs like KEYB, SHARE, and NLSFUNC under DR DOS 6.0 and higher. A much more sophisticated and byte-level granular method based on a somewhat similar approach was independently conceived and implemented by Matthias R. Paul and Axel C. Frinke for their dynamic dead-code elimination to dynamically minimize the runtime footprint of resident drivers and TSRs (like FreeKEYB).)
- Huitt, Robert; Eubanks, Gordon; Rolander, Thomas "Tom" Alan; Laws, David; Michel, Howard E.; Halla, Brian; Wharton, John Harrison; Berg, Brian; Su, Weilian; Kildall, Scott; Kampe, Bill (2014-04-25). Laws, David (ed.). "Legacy of Gary Kildall: The CP/M IEEE Milestone Dedication" (PDF) (video transcription). Pacific Grove, California, USA: Computer History Museum. CHM Reference number: X7170.2014. Archived (PDF) from the original on 2014-12-27. Retrieved 2020-01-19.
[…] Laws: […] "dynamic relocation" of the OS. Can you tell us what that is and why it was important? […] Eubanks: […] what Gary did […] was […] mind boggling. […] I remember the day at the school he came bouncing into the lab and he said, I have figured out how to relocate. He took advantage of the fact that the only byte was always going to be the high order byte. And so he created a bitmap. […] it didn't matter how much memory the computer had, the operating system could always be moved into the high memory. Therefore, you could commercialize this […] on machines of different amounts of memory. […] you couldn't be selling a 64K CP/M and a 47K CP/M. It'd just be ridiculous to have a hard compile in the addresses. So Gary figured this out one night, probably in the middle of the night thinking about some coding thing, and this really made CP/M possible to commercialize. I really think that without that relocation it would have been a very tough problem. To get people to buy it, it'd seem complicated to them, and if you added more memory you'd have to go get a different operating system. […] Intel […] had the bytes reversed, right, for the memory addresses. But they were always in the same place, so you could relocate it on a 256 byte boundary, to be precise. You could therefore always relocate it with just a bitmap of where those […] Laws: Certainly the most eloquent explanation I've ever had of dynamic relocation […]
(33 pages)
- Lieber, Eckhard; von Massenbach, Thomas (1987). "CP/M 2 lernt dazu. Modulare Systemerweiterungen auch für das 'alte' CP/M". c't - magazin für computertechnik (part 1) (in German). Heise Verlag. 1987 (1): 124–135; Lieber, Eckhard; von Massenbach, Thomas (1987). "CP/M 2 lernt dazu. Modulare Systemerweiterungen auch für das 'alte' CP/M". c't - magazin für computertechnik (part 2) (in German). Heise Verlag. 1987 (2): 78–85; Huck, Alex (2016-10-09). "RSM für CP/M 2.2". Homecomputer DDR (in German). Archived from the original on 2016-11-25. Retrieved 2016-11-25.
- Guzis, Charles "Chuck" P. (2015-03-16). "Re: CP/M assembly language programming". Vintage Computer Forum. Genre: CP/M and MP/M. Archived from the original on 2020-02-01. Retrieved 2020-02-01.
[…] Ever wonder how MOVCPM works? Since the BDOS and CCP is in high memory, above the user application, addresses have to be changed every time the system memory size is changed. Now that requires relocating addresses in 8080 code, since relative addressing is not part of the hardware. Without implementing a full-blown relocating assembler and loader, how does one go about this? It's actually pretty clever and MP/M even uses this scheme to construct its page-relocatable files. You simply assemble the source program twice with the second assembly origin 100H (256 bytes) higher than the first. The two binary images are then compared, byte for byte, and a map constructed of where pairs of bytes differ in value by exactly 100H. The result is a list of locations where the relocation value needs to be adjusted if the location of a program in memory is to be moved. MP/M calls this sort of file PRL (page relocatable), but I don't know that CP/M 2.2 ever coined a name for it. […]
- Guzis, Charles "Chuck" P. (2015-07-29). "Re: How does MOVCPM.COM work?". Vintage Computer Forum. Genre: CP/M and MP/M. Archived from the original on 2020-02-01. Retrieved 2020-02-01.
[…] MOVCPM uses an early type of PRL format. Basically, CP/M is assembled twice; the second time is 100H bytes offset. The two binaries are compared and a bitmap constructed. A set bit implies that the high-order byte of an address is to be adjusted. Low order address bytes are not affected; hence, "Page relocatble file". Each byte in the bitmap corresponds to 8 bytes in the binary data. […] So everything to be moved in MOVCPM is part of the image and its relocation bitmap. […]
- Guzis, Charles "Chuck" P. (2016-11-08). "Re: Is it safe to use RST 28h in CP/M assembly programs?". Vintage Computer Forum. Genre: CP/M and MP/M. Archived from the original on 2020-02-01. Retrieved 2020-02-01.
[…] I've referenced PRL files and how they originally got their start with MOVCPM, but became an integral part of MP/M and CP/M 3.0. But PRL files use a bit map in which every bit corresponds to a memory location; one bits indicate that a page relocation offset should be added to the corresponding memory location. If you have very few absolute memory references (as opposed to relative ones) you may want to employ a pointer list (2 bytes per reference) rather than a bitmap. This is unlikely in 8080 code which doesn't have relative jumps, but may be a consideration for Z80 code. The trick to quickly find this out is to assemble your program twice; the second time offset by 100H, then compare the two binaries. The advantage of run-time relocation is that you don't have to incur a penalty for code that attempts to get around the relocation issue--no "tricks"; just write straight code. […]
- Roth, Richard L. (February 1978) [1977]. "Relocation Is Not Just Moving Programs". Dr. Dobb's Journal of Computer Calisthenics & Orthodontia. Ridgefield, CA, USA: People's Computer Company. 3 (2): 14–20 (70–76). ISBN 0-8104-5490-4. #22. Archived from the original on 2019-04-20. Retrieved 2019-04-19.
- Calingaert, Peter (1979) [1978-11-05]. "8.2.2 Relocating Loader". Written at University of North Carolina at Chapel Hill. In Horowitz, Ellis (ed.). Assemblers, Compilers, and Program Translation. Computer software engineering series (1st printing, 1st ed.). Potomac, Maryland, USA: Computer Science Press, Inc. pp. 237–241. ISBN 0-914894-23-4. ISSN 0888-2088. LCCN 78-21905. Retrieved 2020-03-20. (2+xiv+270+6 pages)
- The Microsoft OBJ File Format. Microsoft, Product Support Services. Application Note SS0288. Archived from the original on 2017-09-09. Retrieved 2017-08-21.
- Tanenbaum, Andrew Stuart; Bos, Herbert (2015). Modern Operating Systems (4 ed.). Pearson Education Inc. ISBN 978-0-13359162-0.
- Elliott, John C. (2012-06-05) [2000-01-02]. "PRL file format". seasip.info. Archived from the original on 2020-01-26. Retrieved 2020-01-26.
[…] A PRL file is a relocatable binary file, used by MP/M and CP/M Plus for various modules other than .COM files. The file format is also used for FID files on the Amstrad PCW. There are several file formats which use versions of PRL: SPR (System PRL), RSP (Resident System Process). LINK-80 can also produce OVL (overlay) files, which have a PRL header but are not relocatable. GSX drivers are in PRL format; so are Resident System Extensions (.RSX). […]
- Elliott, John C. (2012-06-05) [2000-01-02]. "Microsoft REL format". seasip.info. Archived from the original on 2020-01-26. Retrieved 2020-01-26.
[…] The REL format is generated by Microsoft's M80 and Digital Research's RMAC. […]
- feilipu (2018-09-05) [2018-09-02]. "Support for PRL, page relocatable executable for MP/M". z88dk. Archived from the original on 2020-02-01. Retrieved 2020-01-26.
[…] Out of the assembled Microsoft .REL files the linker has to generate a .PRL format executable for MP/M. The .PRL format is essentially a .COM file with some additional information to enable the program and its data to be relocated onto any page. What does a .PRL file look like? The first bytes are size of the program, followed by the program origin at 0x0100. Following the program, there is a bit-for-byte mask appended to allow the MP/M system to know which bytes in the program need to be changed when the program is relocated. How does the linker do that without disassembling the whole application? In advance the program is linked for two different origins 0x0100 and 0x0200, from the .REL objects. The linker trick is simply recognising which bytes in the two versions of the executable differ. These bytes are then recorded in the bit mask stored following the executable, and the final .PRL program is designed to run from 0x0100 plus its page offset. The same trick is done for the .RSP and .SPR executable files, except that both these formats forego the offset, and run from 0x0000 plus their page offset. […]
- Brothers, Hardin (April 1983). "Understanding Relocatable Code". 80 Micro. The Next Step. 1001001, Inc. (39): 38, 40, 42, 45. ISSN 0744-7868. Retrieved 2020-02-06.
- Brothers, Hardin (April 1985). "Relocatable Programs: Microcomputing's Hoboes". 80 Micro. The Next Step. CW Communications/Peterborough, Inc. (63): 98, 100, 102–103. ISSN 0744-7868. Retrieved 2020-02-06.
- Mitchell, Bridger (July–August 1988). Carlson, Art (ed.). "Z3PLUS & Relocation - Information on ZCPR3PLUS, and how to write self relocating Z80 code". The Computer Journal (TCJ) - Programming, User Support, Applications. Advanced CP/M. Columbia Falls, Montana, USA (33): 9–15. ISSN 0748-9331. ark:/13960/t36121780. Retrieved 2020-02-09.
- Sage, Jay (September–October 1988). Carlson, Art (ed.). "ZCPR3 Corner - More on relocatable code, PRL files, ZCPR34, and Type-4 programs". The Computer Journal (TCJ) - Programming, User Support, Applications. Advanced CP/M. Columbia Falls, Montana, USA (34): 20–25. ISSN 0748-9331. ark:/13960/t0ks7pc39. Retrieved 2020-02-09.
- Ganssle, Jack (February 1992). "Writing Relocatable Code - Some embedded code must run at more than one address". Embedded Systems Programming. The Ganssle Group - Perfecting the Art of Building Embedded Systems / TGG. Archived from the original on 2019-07-18. Retrieved 2020-02-20.