Troubleshooting an Apparent Process Hang



 

In this scenario, a user's goal is to create a file that takes data, automatically compresses it, and sends it to a new file. To perform this task, the user creates a named pipe and issues gzip, redirecting input from the pipe to a new file. The odd part is that the gzip process seems to hang, and the user cannot find the gzip process when searching ps -ef.
So you devise an action plan: Re-create the event and trace the process involved.

1.
Create a named pipe.
$ mknod /tmp/named_pipe p
$ ll /tmp/named_pipe
prw-r--r--    1 user2    user2         0 Oct  9 16:53
/tmp/named_pipe |
2.
Acquire the current process ID.
$ echo $$ # note that the current PID = the shell
5032
3.
From the same shell window, start the gzip process on the named pipe.
$ gzip < /tmp/named_pipe > /tmp/pipe.out.gz
4.
Find the process with a parent of 5032.
$ ps -emo pid,ppid,state,comm,time,pri,size,wchan | grep 5032
5236  5032 S bash             00:00:00  30 1040 pipe_wait

Notice that the command name is bash, and it is in the sleep state, sleeping on wait channel pipe_wait. Yet gzip was the command executed.
5.
In another shell window, start a trace on the parent before executing the gzip command.
$ strace -o /tmp/pipe.strace -f -F -r -T -v -p 5032
Process 5032 attached - interrupt to quit ........Parent shell \
process
Process 5236 attached ....................................The gzip \
process being forked
Process 5032 suspended

As mentioned earlier, fork() essentially creates a process structure by copying the parent. Until execve() executes the binary, the new executable is not loaded into memory, so ps -ef | grep gzip does not show the process. In this case, the gzip process waits for something to be sent to the pipe before executing gzip.


6.
A review of the trace explains why the ps -ef | grep gzip command does not show the process.
PID        Time               call()
...
5032       0.000079 fork()   = 5236 <0.000252>.........."GZIP was
executed at command line"
5032       0.000678 setpgid(5236, 5236) = 0 <0.000008>
5032       0.000130 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8)
= 0 <0.000007>
...
5032       0.000074 waitpid(-1, <unfinished ...>..........."man
waitpid: -1 means wait on child"
5236       0.000322 --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
5236       0.000078 getpid() = 5236 <0.000006>
5236       0.000050 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8)
= 0 <0.000007>
5236       0.000067 rt_sigaction(SIGTSTP, {SIG_DFL}, {SIG_IGN},
8) = 0 <0.000009>
5236       0.000060 rt_sigaction(SIGTTIN, {SIG_DFL}, {SIG_IGN},
8) = 0 <0.000007>
5236       0.000057 rt_sigaction(SIGTTOU, {SIG_DFL}, {SIG_IGN},
8) = 0 <0.000007>
5236       0.000055 setpgid(5236, 5236) = 0 <0.000008>
5236       0.000044 rt_sigprocmask(SIG_BLOCK, [CHLD TSTP TTIN
TTOU], [RTMIN], 8) = 0 <0.000007>
5236       0.000071 ioctl(255, TIOCSPGRP, [5236]) = 0 <0.000058>
5236       0.000102 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8)
= 0 <0.000007>
5236       0.000060 rt_sigaction(SIGINT, {SIG_DFL}, {0x8087030,
[], SA_RESTORER, 0x4005aca8}, 8) = 0 <0.000007>
5236       0.000075 rt_sigaction(SIGQUIT, {SIG_DFL}, {SIG_IGN},
8) = 0 <0.000007>
5236       0.000057 rt_sigaction(SIGTERM, {SIG_DFL}, {SIG_IGN},
8) = 0 <0.000007>
5236       0.000058 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x80776a0,
[], SA_RESTORER, 0x4005aca8}, 8) = 0 <0.000007>
5236       0.000262 open("/tmp/named_pipe", O_RDONLY|O_LARGEFILE)
= 3 <141.798572>
5236     141.798719 dup2(3, 0) = 0 <0.000008>
5236       0.000051 close(3) = 0 <0.000008>
5236       0.000167 open("/tmp/pipe.out.gz",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3 <0.000329>
5236       0.000394 dup2(3, 1) = 1 <0.000007>
5236       0.000042 close(3) = 0 <0.000008>
5236       0.000127 execve("/usr//bin/gzip", ["gzip"]

So 141.79 seconds after opening, the named pipe data was received, evecve() executed gzip, and the data was compressed and redirected to the file /tmp/pipe.out.gz. Only at this point would the gzip process show up in the ps listing. So what was initially thought to be a hung process is simply a sleeping process waiting on data.
7.
Now ps -ef | grep gzip works.
$ ps -ef | grep 5236
user2      5236  5032 0 17:01 pts/4    00:00:00 gzip