Back to Dynamic Libraries

2018/1/26
Tags: [ Hacking ] [ GNU/Linux ] [ tutorial ]

It has been a long time since I learned how to write libraries. It goes back to libdacav, which I wrote back in my bachelor's days.

I used to build my library with Automake/Autoconf/Libtool. In the following years I never got to apply that knowledge, and I slowly forgot most of the details. At work I use C and C++ every day, but everything is statically linked for the sake of simplicity, so I have no occasion to improve. It is a pity, since I'm definitely not fond of blobs, and since dynamic libraries enable much smoother packaging, among other things.

It is time to refresh my memory. I'll take notes in the form of a tutorial, since it might be helpful for me, and possibly for other people. It will be a badly written one, though, because I'm not following a specific script: it's more like the journal of an exploration. I give priority to learning, not to consistent, readable journaling.

From little bits

This old document (in TLDP) is still there. Good.

I've put together some dummy code and a dumb makefile accordingly:

dacav@lolcalhost:libtotem$ more totem.h totem.c Makefile
::::::::::::::
totem.h
::::::::::::::
#pragma once

typedef struct totem* totem_t;

totem_t totem_new(int);
int totem_get(totem_t);
void totem_del(totem_t);
::::::::::::::
totem.c
::::::::::::::
#include "totem.h"

#include <stdlib.h>

struct totem
{
    int value;
};

totem_t totem_new(int value)
{
    totem_t out = malloc(sizeof(struct totem));
    if (out == NULL)
        return NULL;
    out->value = value;
    return out;
}

int totem_get(totem_t totem)
{
    return totem->value;
}

void totem_del(totem_t totem)
{
    free(totem);
}
::::::::::::::
Makefile
::::::::::::::
soname := libtotem.so.1
minor := 0
patch := 0
realname := $(soname).$(minor).$(patch)

$(realname): totem.o
    gcc -shared -Wl,-soname,$(soname) -o $@ $^

totem.o: totem.c totem.h
    gcc -fPIC -g -c -Wall $<

.PHONY: clean
clean:
    rm $(realname) *.o

Some symbols hacking

Once the library is compiled (with make), its dynamic symbols are exported. They can be listed with objdump:

dacav@lolcalhost:libtotem$ objdump -T libtotem.so.1.0.0

libtotem.so.1.0.0:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 free
0000000000000000  w   D  *UND*  0000000000000000              _ITM_deregisterTMCloneTable
0000000000000000  w   D  *UND*  0000000000000000              __gmon_start__
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 malloc
0000000000000000  w   D  *UND*  0000000000000000              _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*  0000000000000000  GLIBC_2.2.5 __cxa_finalize
00000000000006f0 g    DF .text  0000000000000010  Base        totem_get
0000000000201028 g    D  .got.plt   0000000000000000  Base        _edata
00000000000006ba g    DF .text  0000000000000036  Base        totem_new
0000000000201030 g    D  .bss   0000000000000000  Base        _end
0000000000201028 g    D  .bss   0000000000000000  Base        __bss_start
0000000000000580 g    DF .init  0000000000000000  Base        _init
0000000000000700 g    DF .text  000000000000001b  Base        totem_del
000000000000071c g    DF .fini  0000000000000000  Base        _fini

The manpage of objdump mentions the meaning of the flags.
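
As a minimal sketch of how to read that flags column, here is a small C helper that classifies one line of the listing. The names sym_flags and decode_dynsym_line are mine and not part of any tool. It only covers the four flag letters that show up above (g, w, D and F, as described in objdump(1)), and it assumes the column layout printed in the listing above.

```c
#include <stdbool.h>
#include <string.h>

/* Subset of the flags printed by `objdump -T` (see objdump(1)). */
struct sym_flags {
    bool global;    /* 'g': global symbol */
    bool weak;      /* 'w': weak symbol */
    bool dynamic;   /* 'D': dynamic symbol */
    bool function;  /* 'F': the symbol names a function */
    bool undefined; /* section column is *UND*: defined in another object */
};

/* Hypothetical helper: classify one line of the listing above.
 * Assumption: the line starts with the address, followed by the flag
 * characters, followed by a section name starting with '.' or '*'. */
struct sym_flags decode_dynsym_line(const char *line)
{
    struct sym_flags out = { false, false, false, false, false };
    const char *p = line;

    /* Skip the address column (it may contain hex letters such as 'f'). */
    while (*p != '\0' && *p != ' ')
        p++;

    /* Flag characters run until the section name begins. */
    for (; *p != '\0' && *p != '.' && *p != '*'; p++) {
        switch (*p) {
        case 'g': out.global = true; break;
        case 'w': out.weak = true; break;
        case 'D': out.dynamic = true; break;
        case 'F': out.function = true; break;
        default: break; /* padding, or a flag this sketch ignores */
        }
    }

    out.undefined = strncmp(p, "*UND*", 5) == 0;
    return out;
}
```

Fed with the totem_get line, this reports a global, dynamic function defined in the library; fed with the __gmon_start__ line, it reports a weak, dynamic, undefined symbol.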

While functions and (unfortunately) globals are common in the regular software developer routine, weak symbols are, in my opinion, little known. Probably because they're supported by the ELF format, but not mentioned in the C/C++ standards (source: Wikipedia). The idea is that a weak symbol can be overridden at link time by a regular (strong) symbol of the same name, while no two strong symbols with the same name can live in the same linked executable.
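
To make the weak/strong distinction concrete, this is roughly how weak symbols look from C with GCC or Clang. It is only a sketch: the __attribute__((weak)) syntax is a compiler extension, not standard C, and totem_hook and totem_default are hypothetical names that are not part of libtotem.

```c
#include <stddef.h>

/* Weak *declaration*: if no object in the final link defines totem_hook,
 * this is not a link error and the address of the symbol is NULL. */
extern void totem_hook(void) __attribute__((weak));

/* Weak *definition*: used only when no strong totem_default is linked in. */
__attribute__((weak)) int totem_default(void)
{
    return 42;
}

/* Returns 1 if some object provided totem_hook, 0 otherwise. */
int totem_has_hook(void)
{
    return totem_hook != NULL;
}
```

If another object file in the same link defined a regular totem_default, that definition would be used instead, and the linker would not complain about a duplicate symbol.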

This raises the question: what if there are two competing weak symbols? Who takes the cake?

Compile & Link into an executable

At the code level, I'll add a totem_print function:

diff --git a/libtotem/totem.c b/libtotem/totem.c
index 40033f4..e82b83c 100644
--- a/libtotem/totem.c
+++ b/libtotem/totem.c
@@ -21,6 +21,11 @@ int totem_get(totem_t totem)
     return totem->value;
 }

+void totem_print(totem_t totem, FILE* stream)
+{
+    fprintf(stream, "totem %d\n", totem_get(totem));
+}
+
 void totem_del(totem_t totem)
 {
     free(totem);
diff --git a/libtotem/totem.h b/libtotem/totem.h
index 66eac5c..a0d6338 100644
--- a/libtotem/totem.h
+++ b/libtotem/totem.h
@@ -1,7 +1,10 @@
 #pragma once

+#include <stdio.h>
+
 typedef struct totem* totem_t;

 totem_t totem_new(int);
 int totem_get(totem_t);
+void totem_print(totem_t totem, FILE* stream);
 void totem_del(totem_t);

Now I can write a simple main.c which uses it and prints a totem, along with its makefile.

dacav@lolcalhost:libtutorial$ more main.c Makefile
::::::::::::::
main.c
::::::::::::::
#include <totem.h>
#include <stdlib.h>

int main()
{
    totem_t totem = totem_new(1024);
    totem_print(totem, stdout);
    totem_del(totem);

    return EXIT_SUCCESS;
}
::::::::::::::
Makefile
::::::::::::::
main: main.c
    gcc -I./libtotem -L./libtotem -ltotem $< -o $@

.PHONY : clean
clean:
    rm -f main *.o

So let's run make. We don't get much of a result:

dacav@lolcalhost:libtutorial$ make
gcc -I./libtotem -L./libtotem -ltotem main.c -o main
/usr/bin/ld: cannot find -ltotem
collect2: error: ld returned 1 exit status
make: *** [Makefile:2: main] Error 1

…as the library is there, but we are missing the ldconfig part of it. The library is not installed, so ldconfig won't do a thing. But it boils down to manually creating some links.

diff --git a/libtotem/Makefile b/libtotem/Makefile
index 5bcbf16..af62929 100644
--- a/libtotem/Makefile
+++ b/libtotem/Makefile
@@ -1,8 +1,15 @@
-soname := libtotem.so.1
+libname := libtotem.so
+soname := $(libname).1
 minor := 0
 patch := 0
 realname := $(soname).$(minor).$(patch)

+$(libname): $(soname)
+   ln -s $(soname) $(libname)
+
+$(soname): $(realname)
+   ln -s $(realname) $(soname)
+
 $(realname): totem.o
    gcc -shared -Wl,-soname,$(soname) -o $@ $^

 .PHONY: clean
 clean:
-   rm $(realname) *.o
+   rm -f $(libname) $(realname) $(soname) *.o

So this will create a symlink libtotem.so.1 → libtotem.so.1.0.0 (from SONAME to the actual file) as ldconfig would do. But there's also another link libtotem.so → libtotem.so.1 which ldconfig would not have created. We'll see why.
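
The three names follow a simple pattern, which can be sketched in C like this. The struct and function names are made up for illustration, and the 64-byte buffers are an arbitrary choice; the pattern itself is just the one used by the Makefile above.

```c
#include <stdio.h>

struct lib_names {
    char linker_name[64]; /* libtotem.so       : what -ltotem looks for */
    char soname[64];      /* libtotem.so.1     : passed to -Wl,-soname */
    char real_name[64];   /* libtotem.so.1.0.0 : the actual file */
};

/* Hypothetical helper: build the three names from the bare library name
 * and its version numbers, mirroring the variables of the Makefile. */
struct lib_names make_lib_names(const char *name, int major, int minor, int patch)
{
    struct lib_names out;

    snprintf(out.linker_name, sizeof out.linker_name, "lib%s.so", name);
    snprintf(out.soname, sizeof out.soname, "lib%s.so.%d", name, major);
    snprintf(out.real_name, sizeof out.real_name, "lib%s.so.%d.%d.%d",
             name, major, minor, patch);
    return out;
}
```

With ("totem", 1, 0, 0) this yields libtotem.so, libtotem.so.1 and libtotem.so.1.0.0, each one being a symlink to the next in the setup above.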

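Now we can compile. (On toolchains where the linker defaults to --as-needed, such as Ubuntu's, -ltotem has to come after main.c on the command line, otherwise the link fails with undefined references.)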

dacav@lolcalhost:libtutorial$ cd libtotem/
dacav@lolcalhost:libtotem$ make
gcc -fPIC -g -c -Wall totem.c
gcc -shared -Wl,-soname,libtotem.so.1 -o libtotem.so.1.0.0 totem.o
ln -s libtotem.so.1.0.0 libtotem.so.1
ln -s libtotem.so.1 libtotem.so
dacav@lolcalhost:libtotem$ cd ..
dacav@lolcalhost:libtutorial$ make
gcc -I./libtotem -L./libtotem -ltotem main.c -o main
dacav@lolcalhost:libtutorial$ ls -lR
.:
total 24
drwxr-xr-x. 2 dacav docker 4096 Jan 26 19:37 libtotem
-rwxr-xr-x. 1 dacav docker 8288 Jan 26 19:38 main
-rw-r--r--. 1 dacav docker  172 Jan 26 19:21 main.c
-rw-r--r--. 1 dacav docker  100 Jan 26 19:21 Makefile

./libtotem:
total 32
lrwxrwxrwx. 1 dacav docker    13 Jan 26 19:37 libtotem.so -> libtotem.so.1
lrwxrwxrwx. 1 dacav docker    17 Jan 26 19:37 libtotem.so.1 -> libtotem.so.1.0.0
-rwxr-xr-x. 1 dacav docker 11016 Jan 26 19:37 libtotem.so.1.0.0
-rw-r--r--. 1 dacav docker   396 Jan 26 19:24 Makefile
-rw-r--r--. 1 dacav docker   453 Jan 26 19:06 totem.c
-rw-r--r--. 1 dacav docker   186 Jan 26 19:06 totem.h
-rw-r--r--. 1 dacav docker  7248 Jan 26 19:37 totem.o

The execution of main won't work straight away, as the library is not installed in the system:

dacav@lolcalhost:libtutorial$ ./main
./main: error while loading shared libraries: libtotem.so.1: cannot open shared object file: No such file or directory

But we can of course use the LD_LIBRARY_PATH environment variable to add ./libtotem as a search directory for shared objects…

dacav@lolcalhost:libtutorial$ LD_LIBRARY_PATH=./libtotem ./main
totem 1024

Back to the symlinks. The libtotem.so → libtotem.so.1 link is what the TLDP document calls the linker name. The linker name is needed only when compiling, as it matches the argument of the -l option (-ltotem → libtotem.so). Once the binary is compiled, the link is no longer needed.
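
This also explains the earlier "cannot find -ltotem" error. As far as I understand the GNU ld documentation, for each -L directory the linker tries the shared object libNAME.so first and then the static archive libNAME.a, and neither existed when only libtotem.so.1.0.0 was there. A minimal sketch of that mapping, with a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: for `-L<dir> -l<name>`, write into buf the file
 * the linker would look for in that directory.
 * which == 0: the shared object (tried first);
 * which != 0: the static archive.
 * Returns 0 on success, -1 if buf is too small. */
int linker_candidate(const char *dir, const char *name, int which,
                     char *buf, size_t size)
{
    const char *ext = (which == 0) ? "so" : "a";
    int n = snprintf(buf, size, "%s/lib%s.%s", dir, name, ext);

    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For -L./libtotem -ltotem the candidates are ./libtotem/libtotem.so and ./libtotem/libtotem.a, which is why the versioned file alone is not enough at link time.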

The executable indeed records a dependency on the soname, not on the linker name. This can be easily seen by running ldd on the binary:

dacav@lolcalhost:libtutorial$ ldd ./main
linux-vdso.so.1 (0x00007ffcc5daa000)
libtotem.so.1 => not found
libc.so.6 => /lib64/libc.so.6 (0x00007f383434d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3834730000)

Interestingly, ldd does not find the shared object either. Of course LD_LIBRARY_PATH can help!

dacav@lolcalhost:libtutorial$ LD_LIBRARY_PATH=./libtotem ldd ./main
linux-vdso.so.1 (0x00007fff52d53000)
libtotem.so.1 => ./libtotem/libtotem.so.1 (0x00007fbda2aba000)
libc.so.6 => /lib64/libc.so.6 (0x00007fbda26d7000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbda2cbc000)
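
What LD_LIBRARY_PATH adds is, in essence, a list of directories to probe for the soname. The following is a deliberately simplified sketch of that lookup, not the dynamic loader's real algorithm: the actual loader also consults other sources (such as the ld.so cache and the default system directories), and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: look for `soname` in each directory of a
 * colon-separated list such as LD_LIBRARY_PATH. The first existing
 * candidate is written to `out` and 0 is returned; -1 if none is found.
 * Simplification: empty entries are skipped. */
int find_shared_object(const char *path_list, const char *soname,
                       char *out, size_t size)
{
    const char *start = path_list;

    while (*start != '\0') {
        size_t len = strcspn(start, ":"); /* length of this directory */

        if (len > 0) {
            int n = snprintf(out, size, "%.*s/%s", (int)len, start, soname);

            /* Accept the candidate only if it fits and the file exists. */
            if (n >= 0 && (size_t)n < size && access(out, F_OK) == 0)
                return 0;
        }

        start += len;
        if (*start == ':')
            start++; /* move past the separator */
    }
    return -1;
}
```

Called with "./libtotem" and "libtotem.so.1" from the libtutorial directory, it would produce the same ./libtotem/libtotem.so.1 path that ldd reports above.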

Distribution in packaging

And now something about packaging, to get the standpoint of library distribution. I'm not going to package this thing as an RPM, but we can see how it works with a real library. Let's consider libcurl and libcurl-devel under a CentOS 7 environment.

Let's start with libcurl, which will provide only the dynamic library used at runtime:

[vmuser@localhost ~]$ rpm -qlv libcurl
lrwxrwxrwx    1 root    root                       16 Nov 27 16:43 /usr/lib64/libcurl.so.4 -> libcurl.so.4.3.0
-rwxr-xr-x    1 root    root                   435128 Nov 27 16:43 /usr/lib64/libcurl.so.4.3.0

The RPM will install both the file and the link. This is curious, because the link would be generated by ldconfig anyway. But I can guess the reason for explicitly providing the link: ownership. If the SONAME link were generated by ldconfig, the RPM database would not mark it as owned by the libcurl package (disclaimer: this is just a wild guess, but it sounds like a good reason!).

Then we have the scriptlets: the RPM will run ldconfig after installation and after removal.

[vmuser@localhost ~]$ rpm -q --scripts libcurl
postinstall program: /sbin/ldconfig
postuninstall program: /sbin/ldconfig

This is because, besides the symlinks, ldconfig also maintains the /etc/ld.so.cache file, which is meant to speed up dynamic loading when a program is started.

Next we can look at libcurl-devel, which contains the development files for creating applications based on libcurl: headers, manpages, documentation… quite a few files:

[vmuser@localhost ~]$ rpm -qlv libcurl-devel | wc -l
135

Among these is the linker name, which is indeed useful only for the compilation phase.

[vmuser@localhost ~]$ rpm -qlv libcurl-devel | grep /usr/lib64/libcurl
lrwxrwxrwx    1 root    root                       16 Nov 27 16:43 /usr/lib64/libcurl.so -> libcurl.so.4.3.0

And finally let's see how /usr/bin/curl does the linking:

[vmuser@localhost ~]$ ldd /usr/bin/curl | grep libcurl
    libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f5ff0f57000)

It's worth noting that in CentOS 7 we can currently find libcurl-7.29.0-42.el7_4.1.x86_64. The library version is not the same as the SONAME: the SONAME is a matter of binary (ABI) compatibility, not of release numbering.
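
As a closing sketch, this is how the soname can be guessed from a real name under the naming convention used throughout this post (real name = soname plus minor and patch numbers). This is an assumption about naming only: the authoritative SONAME is the one recorded inside the shared object itself, and the helper name is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: derive "libcurl.so.4" from "libcurl.so.4.3.0".
 * Assumes the conventional lib<name>.so.<major>[.<minor>...] layout.
 * Returns 0 on success, -1 if the name does not match or out is too small. */
int soname_from_realname(const char *realname, char *out, size_t size)
{
    const char *so = strstr(realname, ".so.");
    if (so == NULL)
        return -1; /* no version suffix at all, e.g. a bare linker name */

    const char *major = so + 4; /* first character after ".so." */
    size_t digits = strspn(major, "0123456789");
    if (digits == 0)
        return -1;

    size_t len = (size_t)(major - realname) + digits;
    if (len + 1 > size)
        return -1;

    memcpy(out, realname, len);
    out[len] = '\0';
    return 0;
}
```

Two real names that map to the same soname (say libcurl.so.4.3.0 and a hypothetical libcurl.so.4.5.0) are meant to be interchangeable for already compiled programs, whatever the upstream release number is.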