Introduction to programming modular applications in C++

Posted on 2014-11-22

This article is from my old blog at kacperkolodziej.com.

Nowadays nearly every application can be extended with add-ons or plugins. Thanks to them we can add new functions to our favourite applications without rebuilding them each time we want to extend or modify them. I'm going to show you how to write a modular application in C++.

Shared libraries and extensions

Modern applications often offer developers a comfortable API, or sometimes even special languages, in which they can write extensions to existing applications. It is a very good way to give users the opportunity to customize programs. I'm going to show you how to write an application which loads extensions from shared libraries.

You have to know that writing a modular application looks different on Unix-like systems and on Windows. Unix-like systems offer the dlfcn.h header, which declares a set of functions that can be used to load C functions from a compiled shared library. On Windows we do it in another way. In this article we will see how to do it on Linux (the solutions should also work on BSD and probably on Mac OS).

Tools

To follow the examples in this tutorial you will need a Linux operating system (I recommend Debian) with the g++ compiler and your favourite code editor (for me it will be vim).

Two approaches to modular applications on Linux

To begin with, I am going to show you the simplest example of an application which uses modules. Because the functions from the dlfcn.h header can look up symbols only by their plain, C-style names, we can implement modules in two ways:

writing all of the module's code in a C-style function
writing a C-style loader function which creates an object of a class that represents the module

In this article I'm going to show you both solutions.

Solution with C style module function

To begin with, I'd like to tell you something about the three functions which we will use in our application. They are called dlopen, dlsym and dlclose. All of them are declared in the dlfcn.h header file and are implemented in the Linux system libraries. The first of them is used to load a shared library file. It requires two arguments: the shared library's file name (const char*) and a flag (int). There are several flags which we can use, but the most common is RTLD_LAZY. It enables lazy binding: a function symbol is resolved only when the code that references it is executed for the first time. We will use this flag in our first examples. You can learn about all the flags in the Linux man pages (man dlopen). dlopen returns a void pointer (void*), the so-called handle, or a null pointer if the library could not be loaded.
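To make the role of the flag more concrete, here is a minimal sketch which opens a library with either lazy or immediate binding. The helper name open_library is hypothetical and is not part of dlfcn.h; RTLD_NOW is the standard counterpart of RTLD_LAZY.

```cpp
#include <dlfcn.h>

// Hypothetical helper, not part of dlfcn.h: opens a shared library and lets
// the caller choose between lazy and immediate symbol binding.
void* open_library(const char* path, bool resolve_now)
{
  // RTLD_LAZY: function symbols are resolved when they are first used.
  // RTLD_NOW:  all undefined symbols are resolved before dlopen returns.
  const int flag = resolve_now ? RTLD_NOW : RTLD_LAZY;
  return dlopen(path, flag);  // null pointer on failure
}
```

With RTLD_NOW a module with missing symbols is rejected as soon as it is loaded, while with RTLD_LAZY the problem may show up only when the missing function is called.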

The second one (dlsym) is used to get the address of a symbol from a library which has been loaded by dlopen. It takes the handle received from dlopen as its first argument and the name of the symbol (const char*) as the second one, and returns a void pointer (void*) to the function with that name. If the system cannot find such a function, dlsym returns a NULL pointer (zero address). In C++ we can cast the received pointer to the proper function pointer type.

The last function, dlclose, is used to close a shared library opened by dlopen.

Knowing these functions allows us to write our first modular application.
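Because every successful dlopen should be paired with dlclose, in C++ it can be convenient to let a smart pointer do the closing. The following is only a sketch of that idea; the names LibraryCloser, LibraryHandle and open_scoped are hypothetical and are not used in the rest of this article.

```cpp
#include <dlfcn.h>
#include <memory>

// Deleter that closes a library handle obtained from dlopen.
// std::unique_ptr invokes it only for non-null pointers.
struct LibraryCloser
{
  void operator()(void* handle) const
  {
    dlclose(handle);
  }
};

// Hypothetical alias: a handle which calls dlclose automatically when it
// goes out of scope.
using LibraryHandle = std::unique_ptr<void, LibraryCloser>;

// Opens a library with lazy binding; the result is empty if dlopen fails.
LibraryHandle open_scoped(const char* path)
{
  return LibraryHandle(dlopen(path, RTLD_LAZY));
}
```

The raw handle needed by dlsym is available through the get() member function of the smart pointer.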

Basic example

First of all, we have to write an application which is capable of loading modules and calling the function named module that each of them exports.

The names of the modules will be passed as command-line arguments:

./application module1 module2

To do this we will use the argc and argv arguments of the main function. The code of our application looks like this:

#include <dlfcn.h>
#include <iostream>

int main(int argc, char ** argv)
{
  if (argc == 1)
  {
    std::cerr << "Usage: " << argv[0] << " modules...\n";
    return 1;
  }

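  for (int i = 1; i < argc; ++i)
  {
    // Load the shared library; dlopen returns a null pointer on failure.
    void* shared_library = dlopen(argv[i], RTLD_LAZY);
    if (!shared_library)
    {
      std::cerr << "Error while loading: " << argv[i] << "\n";
      continue;
    }

    // Look up the C-style entry point called "module".
    void (*module)() = reinterpret_cast<void (*)()>(dlsym(shared_library, "module"));
    if (module)
    {
      module();
    } else
    {
      std::cerr << "Error while loading: " << argv[i] << "\n";
    }
    // Close the library whether or not the symbol was found.
    dlclose(shared_library);
  }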
  return 0;
}

Code discussion

In the loop which starts at line 12 we iterate through all command-line arguments. Each of them (except for the first one, indexed as 0, which is the name of the application's binary file) is the name of a module which should be loaded.

In the body of the for loop we use the dlopen function to load the shared library and the dlsym function to look up the function called module (in C style!). Next we call the loaded function and finally close the shared library using dlclose.

As you can see, we convert the void pointer (void*) returned by dlsym to a pointer to a function which takes no arguments and returns void (void (*)()). This is necessary in C++ in order to be able to call the function indicated by this pointer.

We have not used the dlerror function, which describes the most recent error, because this is a very simple application and its aim was to show how dynamic loading works. In the next part of this article we will write a more advanced tool, a so-called module loader, which will make use of dlerror.
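The cast can be hidden behind a small helper so that the pointer type is written only once. This is just a sketch: the names ModuleFunction and find_function are hypothetical, and the helper assumes that the caller knows the real type of the function stored under the given name.

```cpp
#include <dlfcn.h>

// Type of the entry point which every module exports: no arguments, no result.
using ModuleFunction = void (*)();

// Hypothetical helper: looks up `name` in an opened library and converts the
// void* returned by dlsym to the requested function pointer type.
// It returns a null pointer if the symbol cannot be found.
template <typename Function>
Function find_function(void* library, const char* name)
{
  return reinterpret_cast<Function>(dlsym(library, name));
}
```

In the application above it could be used as find_function<ModuleFunction>(shared_library, "module"). The compiler cannot check that the requested type matches the function in the library, so a wrong type leads to undefined behaviour when the function is called.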
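For the curious, a minimal sketch of how dlerror could be used is shown here. The helper try_open is hypothetical; it only reports why dlopen failed and is not the module loader mentioned above.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: tries to open a library and returns an empty string on
// success or the message reported by dlerror on failure.
std::string try_open(const char* path)
{
  dlerror();  // clear any error left over from an earlier call
  void* library = dlopen(path, RTLD_LAZY);
  if (!library)
  {
    // dlerror returns a human-readable description of the last error,
    // or a null pointer if there was none.
    const char* message = dlerror();
    return message ? message : "unknown error";
  }
  dlclose(library);
  return "";
}
```

The returned message usually names the file and the reason, which is far more helpful than the generic "Error while loading" text printed by our simple application.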

Compilation

To compile the above application we will run the g++ compiler (nearly any version will be good enough for this purpose):

g++ app.cpp -o app -ldl

We use the -ldl option to tell the linker that it should link our application with the dl (dynamic linking) library.

Let's write a module!

Now it's time to write one or two simple modules. Each module will be in its own *.cpp file. Such a file should define at least one function, and the name of this function should be module. As I said before, this function must be in C style, which means that we have to add extern "C" before its declaration. Without it the C++ compiler would mangle the function's name and dlsym would not find a symbol called module.

I am going to create two modules. Both will print a line of text to standard output. This is the code of the first one (module_start.cpp):

#include <iostream>

extern "C" void module()
{
  std::cout << "Start module function!\n";
}

and the second one (module_other.cpp):

#include <iostream>

extern "C" void module()
{
  std::cout << "Other module function!\n";
}

Now we can compile both modules using these commands:

g++ -fPIC -shared module_start.cpp -o module_start.so
g++ -fPIC -shared module_other.cpp -o module_other.so

You must remember the -shared and -fPIC options. The first one means that the code will be compiled as a shared library. The second one stands for Position Independent Code, which means that the generated machine code does not depend on being loaded at a fixed address. Thanks to that the library can be loaded dynamically at any position in the application's memory.

If compilation didn't fail, we can run application with one or both modules:

./app ./module_start.so
./app ./module_other.so
./app ./module_start.so ./module_other.so

As you can see, we pass relative paths of the shared libraries to the application. If the name contains no slash, so that it is neither a relative nor an absolute path, dlopen will look for the shared object in these locations (among others):

the paths from LD_LIBRARY_PATH
the libraries listed in the cache file /etc/ld.so.cache
/lib
/usr/lib

Otherwise, dlopen uses the given relative or absolute path.
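The rule which decides between the two cases is simple enough to write down as code. The function below is only an illustration with a hypothetical name; dlopen performs this check internally.

```cpp
#include <string>

// Returns true when dlopen would treat `name` as a path (it contains a
// slash) and false when it would search the library directories instead.
bool is_treated_as_path(const std::string& name)
{
  return name.find('/') != std::string::npos;
}
```

This is why we ran ./app ./module_start.so: with a plain module_start.so the file in the current directory would be found only if that directory happened to be on the search list.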

Example with C++ classes

The next example of a modular application will be very similar, but now each module will be a class which derives from the modules' base class. Each module will also have one function called loader, which allocates memory for an object of the module's class and constructs it. It returns a pointer to the allocated memory.

In order not to write everything from the beginning we will:

write a base class for modules
slightly change the code of the application
write new modules with loaders
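To give a rough idea of where this is going, the plan above might be sketched as follows. Everything except the function name loader is an assumption made for illustration: the class names Module and StartModule and the member functions name and run are hypothetical.

```cpp
#include <iostream>
#include <string>

// Hypothetical base class shared by the application and all modules.
class Module
{
public:
  virtual ~Module() {}
  virtual std::string name() const = 0;
  virtual void run() = 0;
};

// Example module which derives from the base class.
class StartModule : public Module
{
public:
  std::string name() const override { return "start"; }
  void run() override { std::cout << "Start module object!\n"; }
};

// C-style loader which the application could look up with dlsym.
// It allocates the module object; the caller is responsible for deleting it.
extern "C" Module* loader()
{
  return new StartModule();
}
```

In such a design the application would cast the pointer returned by dlsym to Module* (*)(), call it, use the object through the base class interface, and delete the object before closing the library.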

To read about building the module manager with C++ class-based modules, see Part 1 of this article series.