A large data transfer keeps my program busy for a long time. This deserves
some investigation. What's going on here? First thing: top. It gives me
the list of running commands, and the most active one seems to be our man.
The top program also gives us the pid.
Next step, let's see what happens with ls -l /proc/$pid/fd/, and we
will see all the file descriptors opened by our program. There can be
many. Which one is the most used?
Let's ask strace what the program is doing: strace -p $pid (better
run it as root)...
$ strace -p $pid
...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "$}Z\336\333...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "A\177\212\2...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
read(13, "\26\23\344\...
write(2, "2015-06-09 ...
write(13, "\27\3\2\00...
read(13, "\27\3\2\0`"...
Yeah, we got a winner here: file descriptor nr. 13. So if we look at
the output of our ls -l /proc/$pid/fd, we will find out it is a socket:
lrwx------ 1 captain captain 64 Jun 9 09:40 13 -> socket:[2734601]
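Spotting the busiest descriptor by eye works on a short trace. For a longer one, the counting can be sketched roughly as below. The function name fd_tally is made up for this example, and it assumes the trace lines look like the ones above (syscall name, open parenthesis, descriptor number, comma).

```shell
# Hypothetical helper: count syscalls per file descriptor.
# Reads strace output on stdin, prints "count fd" lines, busiest first.
fd_tally() {
    # Keep only lines shaped like: read(13, "..."  and extract the number.
    sed -n 's/^[a-z0-9_]*(\([0-9][0-9]*\),.*/\1/p' |
        sort -n |    # group identical descriptors together
        uniq -c |    # count each group
        sort -rn     # highest count first
}
```

One way to feed it, assuming strace and timeout are available: save a few seconds of trace with timeout 5 strace -p $pid -o trace.txt, then run fd_tally < trace.txt. The first line of the output is the most used descriptor.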
Who are we communicating with?
In order not to give away work-related sensitive information, I will
reproduce the scenario on my computer with netcat. From one shell I'll
run nc -l -p 2290 (i.e. listening on tcp port 2290, local pid 2333),
while on a second shell I will contact the first one with
nc 127.0.0.1 2290 (connecting to port 2290, local pid 2338).
Let's see what happens on /proc/:
$ ls -l /proc/2333/fd
total 0
lrwx------. 1 dacav dacav 64 Jun 9 11:26 0 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun 9 11:26 1 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun 9 11:26 2 -> /dev/pts/7
lrwx------. 1 dacav dacav 64 Jun 9 11:26 5 -> socket:[7881450]
$ ls -l /proc/2338/fd
total 0
lrwx------. 1 dacav dacav 64 Jun 9 11:23 0 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun 9 11:23 1 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun 9 11:23 2 -> /dev/pts/8
lrwx------. 1 dacav dacav 64 Jun 9 11:23 3 -> socket:[7887090]
While for stdin, stdout and stderr the file descriptors point to paths on
the filesystem (although those are virtual files, actually), on both sides
the socket is flagged as such.
We already know a lot about those sockets, since we created them... But say
we don't. Where do they point? See that integer value in socket:[$number]?
The $number will match a row in the /proc/net/tcp file, by the inode
column.
$ cat /proc/net/tcp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
...
10: 0100007F:CC56 0100007F:08F2 01 00000000:00000000 00:00000000 00000000 1000 0 7887090 1 ffff88010042c140 20 0 0 10 -1
...
16: 0100007F:08F2 0100007F:CC56 01 00000000:00000000 00:00000000 00000000 1000 0 7881450 1 ffff88010042ecc0 20 0 0 10 -1
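The lookup from descriptor to table row can be scripted. What follows is only a sketch: socket_inode and find_by_inode are names invented here, and the inode is assumed to be the tenth whitespace-separated field, as in the rows above.

```shell
# Hypothetical helper: extract N from a link target like "socket:[N]".
# Prints nothing if the argument is not a socket link.
socket_inode() {
    printf '%s\n' "$1" | sed -n 's/^socket:\[\([0-9]*\)]$/\1/p'
}

# Hypothetical helper: print local and remote address of the rows whose
# inode column matches. $1 is the inode, the rest are tables to search.
find_by_inode() {
    inode=$1
    shift
    # Field 10 is the inode, fields 2 and 3 are the two addresses.
    awk -v inode="$inode" '$10 == inode { print $2, $3 }' "$@"
}
```

For example, find_by_inode "$(socket_inode "$(readlink /proc/2338/fd/3)")" /proc/net/tcp should print the two address fields of the client's row.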
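The interesting part is in the local_address and rem_address fields. The
address is printed as one hexadecimal word, so on a little-endian machine
its bytes appear reversed (0100007F is 127.0.0.1), while the port is plain
hexadecimal. Even so, you can see quite easily that they are both referring
to 127.0.0.1, the server on port 2290 and the client on an ephemeral one: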
perl -E 'say for (0x7f, 0x00, 0x00, 0x01, 0x08f2)'
127
0
0
1
2290
perl -E 'say for (0x7f, 0x00, 0x00, 0x01, 0xcc56)'
127
0
0
1
52310
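The same conversion can be wrapped in a small shell function. This is a sketch that assumes a little-endian machine (such as x86), where the address bytes show up reversed; decode_addr is a name made up for the example.

```shell
# Hypothetical helper: turn "0100007F:08F2" into "127.0.0.1:2290".
# Assumes a little-endian host, so the four address bytes are reversed.
decode_addr() {
    hex_ip=${1%%:*}      # part before the colon: 8 hex digits
    hex_port=${1##*:}    # part after the colon: 4 hex digits
    b1=$(printf '%s' "$hex_ip" | cut -c1-2)
    b2=$(printf '%s' "$hex_ip" | cut -c3-4)
    b3=$(printf '%s' "$hex_ip" | cut -c5-6)
    b4=$(printf '%s' "$hex_ip" | cut -c7-8)
    # Print the bytes last to first; the port needs no reordering.
    printf '%d.%d.%d.%d:%d\n' \
        "$((0x$b4))" "$((0x$b3))" "$((0x$b2))" "$((0x$b1))" "$((0x$hex_port))"
}
```

With the rows above, decode_addr 0100007F:08F2 prints 127.0.0.1:2290 and decode_addr 0100007F:CC56 prints 127.0.0.1:52310.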
So, here we got the winner. By applying the same technique on the server I was working on, I was able to understand what the issue with the program was.
Note: my example worked because we used TCP with IPv4. If we used IPv6
we would have a match in /proc/net/tcp6. But a good solution is to
grep for it in both files...
$ cat /proc/net/tcp{,6} | grep -E '(7881450|7887090)'
Of course with IPv6 we will need to interpret the address in a different way.
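As a rough sketch of that different interpretation: in /proc/net/tcp6 the address is 32 hex digits, which I assume here to be four 32-bit words, each printed the same way as the single IPv4 word. On a little-endian machine that means reversing the bytes inside each word. The name decode_addr6 is invented for this example, and the output is the full, uncompressed form of the address.

```shell
# Hypothetical helper: decode a tcp6 "ADDRESS:PORT" field.
# Assumes a little-endian host: bytes are reversed within each 32-bit word.
decode_addr6() {
    hex_ip=${1%%:*}      # 32 hex digits
    hex_port=${1##*:}    # 4 hex digits
    out=""
    for start in 1 9 17 25; do
        # Take one 8-digit word and split it into its four bytes.
        word=$(printf '%s' "$hex_ip" | cut -c"$start"-"$((start + 7))")
        b1=$(printf '%s' "$word" | cut -c1-2)
        b2=$(printf '%s' "$word" | cut -c3-4)
        b3=$(printf '%s' "$word" | cut -c5-6)
        b4=$(printf '%s' "$word" | cut -c7-8)
        # Reversed bytes give two 16-bit groups of the IPv6 address.
        out="$out${out:+:}$b4$b3:$b2$b1"
    done
    printf '[%s]:%d\n' "$out" "$((0x$hex_port))"
}
```

For the IPv6 loopback on port 2290, decode_addr6 00000000000000000000000001000000:08F2 prints [0000:0000:0000:0000:0000:0000:0000:0001]:2290, which is ::1.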
Happy hacking.