One of the things I’ve spent a lot of time with is EMC NetWorker (previously Legato NetWorker).
A fairly common issue is for a process of some kind – backups, staging to tape, restores, etc. – to simply stop making progress for no apparent reason.
Once you’ve checked off the common reasons – like making sure you haven’t run out of disk space or usable tapes – it seems like the only option is to restart NetWorker as a whole, losing any in-progress actions (even ones that are to devices that haven’t stalled).
I suspect that random underlying I/O issues can occasionally upset it, and it doesn’t quite recover. But, whatever. How do you make it recover a single device, without restarting the whole thing?
First up, get the PID of the main nsrd process. On Solaris:
ps -ef | grep nsrd
or on Linux:
ps uaxw | grep nsrd
Assuming the PID is 1234, you next need to run:
dbgcommand -p 1234 PrintDevInfo
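If you do this often, the PID lookup and the debug command can be chained together. What follows is only a rough sketch: the helper name nsrd_pid is made up for illustration, and it assumes a ps that accepts the POSIX -o pid= -o comm= options. Matching on the command-name column means the grep process itself, and nsrmmd, are never picked up by accident.

```shell
# Hypothetical helper: print the PID of the main nsrd process.
# Reads the output of `ps -e -o pid= -o comm=` on stdin, i.e. one
# process per line: PID first, then the command name.
nsrd_pid() {
    # Match "nsrd" exactly, with or without a leading path, so that
    # nsrmmd, nsrexecd and friends are skipped. Stop at the first hit.
    awk '$2 ~ /(^|\/)nsrd$/ { print $1; exit }'
}

# Usage (left commented out, since it pokes at a live NetWorker server):
#   pid=$(ps -e -o pid= -o comm= | nsrd_pid)
#   [ -n "$pid" ] && dbgcommand -p "$pid" PrintDevInfo
```

If more than one nsrd shows up for some reason, this simply takes the first one listed, so check by hand before trusting it.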
It should pretty quickly spit out a whole stack of debugging info to /nsr/logs/daemon.raw. It’s moderately complicated, but you should see that it’s a dump of the internal state of each device, including d_device (the *nix device or directory) and mm_number (the unique ID for the nsrmmd process for that device).
So – find the device you’re interested in, and find the mm_number for that device.
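To cut the dump down to just the two fields that matter here, something like the filter below might help. It is a sketch only: the function name dev_fields is made up, and since the exact layout of the dump isn't shown here, it does nothing smarter than keep any line that mentions d_device or mm_number. You still have to pair them up by eye.

```shell
# Hypothetical helper: keep only the lines of the device dump that
# mention d_device or mm_number. Reads the log text on stdin.
dev_fields() {
    grep -E 'd_device|mm_number'
}

# Usage (assumes the dump is near the end of the log; adjust the
# line count to taste):
#   tail -n 500 /nsr/logs/daemon.raw | dev_fields
```

If the raw log is hard to read on your version, running it through NetWorker's log renderer before filtering may help.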
Get a list of your nsrmmd processes, e.g.
ps -ef | grep nsrmmd
or
ps auxw | grep nsrmmd
If your mm_number is 5, then there will be a process:
nsrmmd -n 5
Kill the process, and it should re-spawn by itself on further access.
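Picking the right nsrmmd out of the ps listing by eye is easy to get wrong when there are a lot of devices (5 versus 15, say), so here is a small sketch that does the matching for you. The function name nsrmmd_pid is made up, and it assumes an awk that supports -v (on older Solaris that probably means nawk). It relies on the PID being the second column, which holds for both ps -ef and ps auxw.

```shell
# Hypothetical helper: print the PID of the nsrmmd process started
# with "-n <mm_number>". Reads ps -ef or ps auxw output on stdin.
nsrmmd_pid() {
    awk -v n="$1" '
        {
            # Scan the fields for "nsrmmd" (with or without a path)
            # followed by "-n" and exactly the requested number.
            for (i = 3; i <= NF - 2; i++) {
                if ($i ~ /(^|\/)nsrmmd$/ && $(i + 1) == "-n" && $(i + 2) == n) {
                    print $2    # PID column
                    next
                }
            }
        }'
}

# Usage (commented out on purpose; look at the PID before killing it):
#   pid=$(ps -ef | nsrmmd_pid 5)
#   [ -n "$pid" ] && kill "$pid"
```

Because it insists on the "-n 5" arguments following the command, the grep nsrmmd line that normally pollutes the listing is ignored as well.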