A user begins rewinding a tape but realizes that the wrong tape is in the drive. The user tries to kill the job but must wait for the process to finish.
Why?
The mt command has made an ioctl() call to the SCSI tape driver (st) and is sleeping uninterruptibly (state D) inside the driver. It must wait for the driver to return control to the process on its way back to user space; only then are signals, including SIGKILL, handled.
# mt -f /dev/st0 rewind
# ps -emo state,pid,ppid,pri,size,stime,time,comm,wchan | grep mt
D 9225 8916 24 112 20:46 00:00:00 mt wait_for_completion
[root@atlorca2 root]# kill -9 9225
[root@atlorca2 root]# echo $?   # This produces the return code for the previous command. 0 = success
0
[root@atlorca2 root]# ps -elf | grep 9225
0 D root 9225 8916 0 24 0 - 112 wait_f 20:46 pts/1 00:00:00 mt -f /dev/st0
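The kill command itself succeeded (return code 0), so SIGKILL was sent. However, mt is still in state D, sleeping in the wait channel wait_for_completion (truncated to wait_f in the ps -elf output). The signal will be processed only after the code returns from the driver.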
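The same check can be scripted. The sketch below assumes bash on Linux and procps ps (for the `wchan:32` column-width syntax); `list_dstate` and `wait_for_exit` are hypothetical helper names, not standard commands. `list_dstate` filters ps output down to tasks in uninterruptible sleep, and `wait_for_exit` polls /proc until the killed process disappears or becomes a zombie, which is what should happen once the driver returns and the pending SIGKILL is acted on.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for watching a process stuck in uninterruptible sleep.

# list_dstate: read "state pid ppid wchan comm" lines on stdin and print
# only those whose first column is D (uninterruptible sleep).
# Intended input: ps -eo state,pid,ppid,wchan:32,comm
list_dstate() {
  awk '$1 == "D"'
}

# wait_for_exit PID [TIMEOUT]: poll once per second until PID no longer
# exists or has become a zombie (state Z). Returns 0 if the process exited,
# 1 if it is still present after TIMEOUT seconds (default 60).
wait_for_exit() {
  local pid=$1 timeout=${2:-60} waited=0 state
  while [ -d "/proc/$pid" ]; do
    state=$(awk '/^State:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    if [ "$state" = "Z" ]; then
      return 0    # exited; only the parent's wait() is outstanding
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid still present (state ${state:-?}) after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Possible usage for the session above: `ps -eo state,pid,ppid,wchan:32,comm | list_dstate` to spot the stuck mt, then `kill -9 9225; wait_for_exit 9225 300` to wait up to five minutes for the rewind to finish and the kill to take effect.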
Let's check the pending signals:
cat /proc/9225/status
Name: mt
State: D (disk sleep)
Tgid: 9225
Pid: 9225
PPid: 8916
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0 1 2 3 4 6 10
VmSize: 2800 kB
VmLck: 0 kB
VmRSS: 640 kB
VmData: 96 kB
VmStk: 16 kB
VmExe: 32 kB
VmLib: 2560 kB
SigPnd: 0000000000000100
ShdPnd: 0000000000000100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
SigPnd is a bit mask that indicates which signals are pending for the task (ShdPnd is the corresponding mask for signals pending on the whole thread group). Each hex digit accounts for 4 bits. In this case, the pending signal has a value of 9, so the first bit of the third hex digit (counting from the right) is set. This algorithm is detailed in linux/fs/proc/array.c under the render_sigset_t() function. The following table illustrates this function.
Signal    :  1  2  3  4 .  5  6  7  8 .  9 10 11 12 . 13 14 15 16
Bit value :  1  2  4  8 .  1  2  4  8 .  1  2  4  8 .  1  2  4  8
kill -3 yields bit mask 0000000000000004
kill -9 yields bit mask 0000000000000100
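Put arithmetically, signal n occupies bit n-1 of the mask, so SIGKILL (9) is bit 8, and 1 << 8 = 0x100, which is the 0000000000000100 shown above. A minimal sketch of that decoding follows; `decode_sigmask` and `show_pending` are hypothetical helper names, and the code assumes bash's 64-bit arithmetic and the `SigPnd:`/`ShdPnd:` field names seen in the status output above.

```bash
#!/usr/bin/env bash
# decode_sigmask HEXMASK: print, one per line, the signal numbers whose bits
# are set in a mask as shown in /proc/<pid>/status (signal n is bit n-1).
decode_sigmask() {
  local mask=$((0x$1)) sig=1
  while [ "$sig" -le 64 ]; do
    if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
      echo "$sig"
    fi
    sig=$((sig + 1))
  done
}

# show_pending PID: print the SigPnd and ShdPnd masks for PID, each followed
# by the signal numbers it contains. Returns 1 if a field cannot be read.
show_pending() {
  local pid=$1 field mask
  for field in SigPnd ShdPnd; do
    mask=$(awk -v f="$field:" '$1 == f {print $2}' "/proc/$pid/status" 2>/dev/null)
    [ -n "$mask" ] || return 1
    # unquoted substitution on purpose: joins the signal numbers with spaces
    echo "$field $mask:" $(decode_sigmask "$mask")
  done
}
```

For the mt example, `decode_sigmask 0000000000000100` prints 9, and `decode_sigmask 0000000000000004` prints 3, matching the kill -9 and kill -3 rows of the table.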
Troubleshooting the hung process involves these steps:
1. Identify all the tasks (threads) for the program.
2. Assess the hanging process. Is it easily reproducible?
3. Assess the other things going on. What else is the machine doing? Check load and other applications' response time.
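For step 1, each thread of a Linux process appears as a directory under /proc/<pid>/task/. The sketch below uses a hypothetical `list_tasks` helper (bash assumed) to print each thread's ID, state, and wait channel, which shows which threads are blocked and where. It assumes a Linux /proc; depending on kernel version and privileges the wchan file may read as 0 or be unreadable, in which case the value printed is 0 or a `-` respectively.

```bash
#!/usr/bin/env bash
# list_tasks PID: print "TID STATE WCHAN" for every thread of PID,
# read from /proc/PID/task/TID/status and /proc/PID/task/TID/wchan.
list_tasks() {
  local pid=$1 dir tid state wchan
  for dir in /proc/"$pid"/task/*; do
    [ -d "$dir" ] || continue          # no such PID, or thread already gone
    tid=${dir##*/}
    state=$(awk '/^State:/ {print $2}' "$dir/status" 2>/dev/null)
    wchan=$(cat "$dir/wchan" 2>/dev/null)
    printf '%s %s %s\n' "$tid" "${state:-?}" "${wchan:--}"
  done
}
```

Running `list_tasks 9225` against the hung mt would be expected to list a single thread in state D; a multithreaded program may show several threads, only some of which are blocked.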