Relinking Oracle Binaries: Understanding the Process

This blog entry attempts to explain what I believe the Oracle relinking process is all about. DBAs who have worked with the Oracle database for a while, and sysadmins who have supported Oracle database servers for a while, have heard of this process. After some time in the field, it has become apparent to me that the process is not well understood, and Oracle has released precious little in-depth documentation about it. A big reason is that the process overlaps four competencies: database administration, system administration, operating system internals, and software programming. A good indication of the misunderstanding is that many believe Oracle ships the actual source code of the database. That is not the case. Oracle is proprietary software, and you will not obtain the source code to their DBMS product. While we can’t go too deep into this process, I try to offer some very high-level reverse engineering here by explaining what linking C programs in general is and why we do it. Perhaps we can draw some conclusions about why this Oracle process exists from that general understanding. Please note I am not employed by Oracle. I do not possess any confidential internal knowledge of their database software, nor do I have access to any of their source code. This is my conjecture, having worked as a UNIX and Linux sysadmin and as an Oracle DBA.

The Oracle database software is unique in the way it installs onto the UNIX and Linux servers that are designated to be database servers. It does not use a package manager like LPP on AIX or RPM on many Linux systems. Because Oracle is not installed using the operating system’s native package manager, the operating system does not track the Oracle software installed on the system. Oracle software is tracked by Oracle itself via XML files it installs on the database server, which make up what is known as the Oracle Inventory. Oracle uses its own installation utility called the OUI (Oracle Universal Installer). The OUI is a graphical interface written in Java SE, and Oracle provides its own JRE to run it. It runs locally on the database server itself, and you are presented with a series of GUI wizard screens to guide you through the installation process. In basic terms, the installation process is copying, LINKING, and installing the Oracle software from the installation media to a location on the database server known as the Oracle home. The linking part is what we attempt to address in this blog. The Oracle home is the location in which the Oracle software resides once installed on the database server. This is like “C:\Program Files” on Windows servers or /opt for a lot of software on Linux systems.

For UNIX and Linux systems, the Oracle database software is available online via the Oracle Technology Network website. You can download it free of charge, with some usage restrictions, per the Oracle developer’s license. When you download the Oracle database software, you download two ZIP files. You download them from the website and transfer them to the database server. You then unzip these two files, navigate to the “database” directory into which everything was extracted, and run the shell script “runInstaller”. This script starts Java, which launches the OUI. The OUI will further unpack the “database” directory’s contents. It will extract many JAR (Java Archive) files named something like filegroup###.jar. A lot of these files contain C object files. I explain what C object files are later in this blog.

Oracle’s installation procedure is interesting. It doesn’t just copy pre-packaged executable/binary files to the Oracle home from the installation media. It actually completes the C development process: the installer links the C object files so that they become executables. In fact, many Oracle database features can only be enabled or disabled via the linking procedure (e.g., Partitioning, OLAP, Data Mining, Database Vault, etc.). This is a very interesting approach, and Oracle has been doing it this way for decades. There are several advantages to this. One is that it compacts the installation package on the install media. Another is that it makes porting easier. It also gives the best operating system compatibility. Operating system vendors typically guarantee backward compatibility when they update their operating systems, but linking on the target server pretty much guarantees that the system libraries Oracle calls from the operating system (which are beyond the scope of this blog) will work the way Oracle expects.

Let’s talk about the general C compilation process. A big portion of the Oracle database software is written in the C programming language (i.e., the main database kernel and most of its tools, such as SQL*Plus). It apparently is developed on Linux systems at Oracle. Note: Some of the database is written in Java and some parts probably in C++. The Oracle database runs on many different platforms (e.g., Linux, UNIX, etc.). If you write a program in C, you will have to recompile it for every platform on which you wish to run your program. You won’t necessarily have to modify or rewrite the actual C source code to port it to another platform, but you will definitely have to recompile it. That is because when you write a program in C, you are writing source code. The computer does not understand source code. The computer understands machine code, which is what actually executes on the computer’s processor. Think binary. To turn your C source code into something that your computer’s processor will understand, you need a special program called a compiler. A C compiler translates the human-readable C source code into a binary file that is executable. This resulting binary file can be run on the computer processor. The Java programming language is an exception. Java programs, like many programs written in “interpreted” languages, can run on any platform without the need to recompile them. That is because Java programs, once compiled to bytecode, run in a JVM (Java Virtual Machine). As long as the target platform has a virtual machine/interpreter installed, it can run your code without the need to recompile it. One of the popular C compilers on Linux systems is GCC (GNU Compiler Collection). There are many different C compilers available, both open source and proprietary.

Let’s go over the high level process of compiling/building a C program. This is what GCC does:

[Figure: High-level view of the C compiling process]

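These four steps are typically all implicit. They happen automatically when you compile with GCC, and the intermediate files are not kept unless you ask for them.

Note: the file extensions below are conventions. GCC uses them to decide how to treat an input file unless you tell it otherwise: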

  • Preprocess: Handles the preprocessor directives such as #include and #define. It pulls in header files and expands macros. It consumes a *.c file and results in a *.i file.
  • Compile: The preprocessed file is converted into assembly code, a low-level, human-readable language for talking to the processor. It consumes a *.i file and results in a *.s file.
  • Assemble: The assembly code is converted into machine instructions/binary. It consumes a *.s file and results in a *.o (object) file.
  • Link: The machine code is linked together with other object files, libraries, etc., into an executable file. It consumes one or more *.o files. This final step is what actually makes the file executable on your particular platform.

An analogy for this compilation process: you are in a foreign country and ask someone for directions on how to get somewhere. The person gets out a piece of paper and writes the directions down for you. He then folds up the piece of paper, hands it to you, and walks away. You thank him in his language (one of the few phrases you know). You open the piece of paper and realize that the directions are in your Good Samaritan’s native language, which you do not speak. You stop another stranger (assuming they speak that country’s language as well as yours) and ask them to translate it for you. The steps below are what that process might look like. In this analogy, you are the computer processor. The note with the directions on it is the file with C source code in it. You, the computer processor, only understand English/binary. So you/the computer need to run it through a translator/compiler to be able to understand it.

  • Preprocess: The translator opens the folded-up piece of paper with the instructions written on it. He scans the whole paper and, in his mind, expands the shorthand he finds written on the note (e.g., “Turn R” to “Turn Right”).
  • Compile & Assemble: Realizing that you will have to remember this, the translator gets out a blank piece of paper and starts to rewrite it in English for you to read.
  • Link: The translator, realizing that the original author of the directions didn’t explain where to look for certain things, expands most of the steps to make them clearer (e.g., “If you see a church at the end of the road, you have gone too far” to “If you see the St. Mary’s Cathedral church on the right-hand side at the end of the road on the corner of Cross Keys and Main St, then you have gone too far”).

You/computer are then given the directions/executable file translated into English/binary. You can now read it and execute the directions to get to where you have to go.

Simple Program Written in C:

#include <stdio.h>

int main() {

printf("Hello world!\n\n");
 return 0;
 }

Consider the simple program above. It just prints onto the console the string: “Hello world!” I won’t go into the syntax of the C source code. That is out of the scope of this blog.

By default, when you compile a C program, all four compilation stages are done automatically and implicitly. Let’s compile this C program on a Linux system:

$ ls -l program1.c
 -rw-r--r--. 1 root root 76 Oct 17 11:40 program1.c

If GCC is installed on your Linux system, go ahead and run it with the following options. The -o option tells GCC what you want the resulting executable file to be called. The argument “program1.c” tells GCC to compile the C source code contained in that file:

$ gcc -o program1 program1.c
 $ ls -l program1
 -rw-r-xr-x. 1 root root 4690 Jan 16 10:04 program1

If you use the file command, which inspects what is called the magic number of a file, you will see that this executable is in the ELF format. ELF stands for Executable and Linkable Format. Each platform has its own executable format (e.g., ELF on Linux, XCOFF on AIX, PE on Windows, etc.). Executable file formats are out of the scope of this blog. At this point, just understand that an executable file generated by a C compiler is seen in a special way by the operating system.

$ file program1
program1: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, …, not stripped

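GCC normally sets the execute permission on the files it links. If the owner’s execute bit is missing, as in the listing above, set it before running the program: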

$ chmod u+x program1

$ ls -l program1
-rwxr-xr-x. 1 root root 4690 Jan 16 10:04 program1

Run the program:

$ ./program1
 Hello world!

In the above example, with one step we generated an executable from our source code. Reminder: You cannot now transfer this executable to another platform like UNIX or Windows and expect it to run. The C source code must be recompiled by a compiler built for the target platform and that compiler will produce an executable with a format that platform can execute.

Let’s split the GCC compilation process into stages so we can see how it works.

Preprocess Stage:

The -E option of GCC tells the compiler to stop after the source code file is preprocessed.

$ gcc -E program1.c -o program1.i
 $ file program1.i
 program1.i: ASCII C program text
 $ cat program1.i
 
 extern int getw (FILE *__stream);

extern int putw (int __w, FILE *__stream);

extern char *fgets (char *__restrict __s, int __n, FILE *__restrict __stream);

…

Compile Stage:

The -S option of GCC tells the compiler to stop after the compile stage, before assembling.

$ gcc -S program1.i -o program1.s
$ cat program1.s
        .file   "program1.c"
        .section        .rodata
.LC0:
        .string "Hello world!\n"
        .text
        .globl  main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, (%esp)
        call    puts
        movl    $0, %eax

…

Assemble Stage:

The -c option of GCC tells the compiler to stop after the assemble stage, before linking.

$ gcc -c program1.s -o program1.o
 $ file program1.o
 program1.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

(Don’t attempt to cat the contents of the file created above. It is binary but not executable yet!)

Link Stage:

This stage completes the compilation process, resulting in an executable.

$ gcc program1.o -o program1
 $ ./program1
 Hello world!

There are two reasons why you would need to relink the Oracle software manually. (Oracle utilities that modify the Oracle software, such as OPatch, relink it automatically.)

  • If you upgrade or patch the operating system on which the Oracle software is installed, you will have to manually relink the Oracle software.
  • If you want to enable or disable certain Oracle database features, you will have to relink the Oracle software. Note: Relinking for this reason should be done under the direction of Oracle technical support.

(Please refer to Oracle documentation and/or Oracle technical support for further details into the linking process.)

We can deduce, based on what linking is and how a C compiler builds a program, that what you get when you download the Oracle database software from Oracle are C programs that are not fully built. Database developers at Oracle basically write their database software in C. They then preprocess it, compile it, and assemble it. They stop there, package the object files, and ship them to us. They let the OUI complete the build by linking the object files into executable files. The object files you get when you download the Oracle database software do not contain any source code. The C source code is long gone by the time you get it. This is as far as I can tell anyway.

In conclusion, I feel it is important to understand why a design or process decision was made by the vendor of the software one supports—and how it works as implemented—even if it is at a very high level. This type of understanding will allow you to either appreciate the vendor or question them. Either way, it should make you a better supporter of that software product and a better all-around thinker. Always think deeper. Don’t just do what the manual tells you. Ask and understand why.
