Where's my data going?

2015/6/9
Tags: [ GNU/Linux ]

A large data transfer keeps my program busy for a long time. This deserves some investigation. What's going on here? First stop: top. It lists the running processes, and the most active one seems to be our man. top also gives us its pid.

Next step: ls -l /proc/$pid/fd/ shows all the file descriptors opened by our program. There can be many. Which one is the most used?
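When the list is long, it can help to narrow it down to the sockets only. What follows is a minimal sketch, assuming a Linux /proc and enough permissions to read the target's fd directory; socket_inode and list_sockets are names I made up for illustration.

```shell
# Extract the inode from a link target such as "socket:[2734601]".
# Prints nothing when the target is not a socket.
socket_inode() {
    printf '%s\n' "$1" | sed -n 's/^socket:\[\([0-9][0-9]*\)]$/\1/p'
}

# Print "fd inode" for every socket descriptor of the given pid.
list_sockets() {
    for fd in /proc/"$1"/fd/*; do
        inode=$(socket_inode "$(readlink "$fd" 2>/dev/null)")
        if [ -n "$inode" ]; then
            printf '%s %s\n' "${fd##*/}" "$inode"
        fi
    done
    return 0
}
```

Calling list_sockets "$pid" should then print one line per socket, with the descriptor number first and the inode second.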

Let's ask strace what the program is doing: strace -p $pid (best run as root)...

$ strace -p $pid
...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "$}Z\336\333...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "A\177\212\2...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "\26\23\344\...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...

Yeah, we got a winner here: file descriptor nr. 13. So if we look again at our ls -l /proc/$pid/fd, we will find out it is a socket:

lrwx------ 1 captain captain 64 Jun  9 09:40 13 -> socket:[2734601]
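Spotting the busiest descriptor by eye works for a short trace. For a longer one, the tally can be automated. This is only a sketch: count_fds is a hypothetical helper, and it assumes every interesting line starts with a system call name followed by the descriptor as first argument, as in the output above.

```shell
# Read strace output on stdin and count how often each file descriptor
# appears as the first argument of a system call.
# Output: "count fd" lines, busiest descriptor first.
count_fds() {
    sed -n 's/^[a-z_0-9]*(\([0-9][0-9]*\),.*/\1/p' | sort | uniq -c | sort -rn
}
```

Since strace writes its trace to stderr, something like strace -p $pid 2>&1 | count_fds would be needed; the counts only show up once strace is interrupted and the pipe is closed.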

Who are we communicating with?

In order not to give away sensitive work-related information, I will reproduce the scenario on my computer with netcat. In one shell I'll start nc -l -p 2290 (i.e. listening on TCP port 2290, local pid 2333), while in a second shell I will contact the first one with nc 127.0.0.1 2290 (connecting to port 2290, local pid 2338).

Let's see what happens on /proc/:

$ ls -l /proc/2333/fd
total 0
lrwx------. 1 dacav dacav 64 Jun  9 11:26 0 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun  9 11:26 1 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun  9 11:26 2 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun  9 11:26 5 -> socket:[7881450]

$ ls -l /proc/2338/fd
total 0
lrwx------. 1 dacav dacav 64 Jun  9 11:23 0 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun  9 11:23 1 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun  9 11:23 2 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun  9 11:23 3 -> socket:[7887090]

For stdin, stdout and stderr the file descriptors point to paths on the filesystem (although those are virtual files, actually), while on both sides the socket is flagged as such.

We already know a lot about those sockets, since we created them... But say we didn't. Where do they point? See that integer value in socket:[$number]? The $number matches a row in the /proc/net/tcp file, in the inode column.

$ cat /proc/net/tcp
sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode                                                     
...
10: 0100007F:CC56 0100007F:08F2 01 00000000:00000000 00:00000000 00000000  1000        0 7887090 1 ffff88010042c140 20 0 0 10 -1       
...
16: 0100007F:08F2 0100007F:CC56 01 00000000:00000000 00:00000000 00000000  1000        0 7881450 1 ffff88010042ecc0 20 0 0 10 -1                   
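Picking the row by inode can be left to awk. In the layout shown above the inode is the tenth whitespace-separated field, and the two addresses are the second and third. A small sketch, where rows_for_inode is a name of my own choosing:

```shell
# Read a /proc/net/tcp style table on stdin and print the local and
# remote address of every row whose inode column matches $1.
rows_for_inode() {
    awk -v inode="$1" '$10 == inode { print $2, $3 }'
}
```

For instance, rows_for_inode 7887090 < /proc/net/tcp should print the address pair of the client socket from the example.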

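The interesting part is in the local_address and rem_address fields. The address is the hexadecimal dump of a 32-bit word stored in network byte order, so on a little-endian machine like mine the bytes read back to front (0100007F is 7F.00.00.01), while the port is plain hexadecimal. You can see quite easily that both rows refer to 127.0.0.1, the server on port 2290 and the client on an ephemeral one: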

perl -E 'say for (0x7f, 0x00, 0x00, 0x01, 0x08f2)'
127
0
0
1
2290

perl -E 'say for (0x7f, 0x00, 0x00, 0x01, 0xcc56)'
127
0
0
1
52310
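The same conversion can be wrapped in a small shell function. This is a sketch that assumes IPv4 and a little-endian host (the address bytes are read back to front, the port as is); decode_endpoint is a made-up name.

```shell
# Turn an "ADDRESS:PORT" pair from /proc/net/tcp, such as 0100007F:08F2,
# into dotted-quad notation. Assumes IPv4 on a little-endian host.
decode_endpoint() {
    addr=${1%%:*}    # e.g. 0100007F
    port=${1##*:}    # e.g. 08F2
    # The address bytes are printed in reverse order, so read them backwards.
    a=$(printf '%s' "$addr" | cut -c7-8)
    b=$(printf '%s' "$addr" | cut -c5-6)
    c=$(printf '%s' "$addr" | cut -c3-4)
    d=$(printf '%s' "$addr" | cut -c1-2)
    printf '%d.%d.%d.%d:%d\n' "0x$a" "0x$b" "0x$c" "0x$d" "0x$port"
}
```

With the two rows above, decode_endpoint 0100007F:08F2 gives 127.0.0.1:2290 and decode_endpoint 0100007F:CC56 gives 127.0.0.1:52310.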

So, here we got the winner. By applying the same technique on the server I was working on, I was able to understand what the issue with the program was.

Note: my example worked because we used TCP over IPv4. With IPv6 the match would be in /proc/net/tcp6 instead. A good solution is to grep both files...

$ cat /proc/net/tcp{,6} | grep -E '(7881450|7887090)'

Of course with IPv6 we will need to interpret the address in a different way.
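As far as I can tell, /proc/net/tcp6 prints each address as 32 hexadecimal digits, made of four 32-bit words, each one with its bytes back to front on a little-endian host. Under that assumption, a sketch of the decoding could look like this (decode_ipv6 is a hypothetical helper, and it prints the long form without any :: shortening):

```shell
# Turn the 32 hex digit address of /proc/net/tcp6 into eight colon
# separated groups. Assumes a little-endian host, where each 32-bit
# word is printed with its bytes reversed.
decode_ipv6() {
    hex=$1
    out=
    for start in 1 9 17 25; do
        word=$(printf '%s' "$hex" | cut -c"$start"-"$((start + 7))")
        b1=$(printf '%s' "$word" | cut -c1-2)
        b2=$(printf '%s' "$word" | cut -c3-4)
        b3=$(printf '%s' "$word" | cut -c5-6)
        b4=$(printf '%s' "$word" | cut -c7-8)
        # Reverse the four bytes, then split them into two 16-bit groups.
        out="$out${out:+:}$b4$b3:$b2$b1"
    done
    printf '%s\n' "$out" | tr 'A-F' 'a-f'
}
```

The loopback address, which shows up as 00000000000000000000000001000000, comes out as 0000:0000:0000:0000:0000:0000:0000:0001, that is ::1. The port after the colon is plain hexadecimal, just like in the IPv4 table.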

Happy hacking.