* I’m pretty sure at least one step I take in this post is unsupported by VMware, so please use at your own risk. If you don’t feel entirely comfortable with any part of this, and have a support contract – use it.
I was re-IPing a virtual machine (VM) today when it locked up and showed no signs of life (zero CPU usage, disk I/O, etc.). I went in through the host that it runs on using the vSphere Client and tried to hard reset it… which stuck at 95%. I restarted the management agents on the host, which successfully killed the reset command.
Now the VM showed as invalid, so I tried unregistering the VM, then selecting “Add to Inventory” by right-clicking on the <vmname>.vmx file in the VM’s folder on the datastore. While this made the host recognize the virtual machine correctly again, it didn’t help with booting the VM up – I was now getting an error while booting that told me the virtual machine swap file (<vmname>.vswp) was locked.
I’d seen similar errors before, and have a good history with the tried-and-true hard kill method for resolving these:
ps aux | grep <vmname>
This gave me a series of process IDs (PIDs) for the VM’s world, so I tried:
kill -9 <pid>
which resulted in:
cannot kill pid 123123: No such process
Uh-oh. At this point I began contemplating contacting all of the customers whose VMs (about 40 of them) were on this host to arrange an emergency reboot of their systems so I could reboot the physical host. Then I found some threads that talked about the vm-support command and how it can hard crash the VM world for you. Huh. Cool. Here’s how I used that to make my day much better:
vm-support -x (this lists out all of the VM worlds running on your host)
Once I found the numeric world ID (wid) for my VM, I ran this command and let it kill the VM:
vm-support -X <wid>
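If you would rather not pick the world ID out of the listing by eye, here is a minimal sketch of a helper that does the lookup. It assumes each VM appears in the vm-support -x listing as a line of the form vmid=<wid> <vmname>, which may differ between ESX versions, so check it against your own output first. The function name world_id_for and the VM name myvm are hypothetical, and the match only works for VM names without spaces.

```shell
# Print the world ID for the VM whose name exactly matches $1.
# Reads a `vm-support -x` style listing on stdin.
# ASSUMPTION: each VM is listed as "vmid=<wid> <vmname>"; verify this
# against the real output on your host before relying on it.
world_id_for() {
    VM_NAME="$1" awk '
        $1 ~ /^vmid=[0-9]+$/ && $2 == ENVIRON["VM_NAME"] {
            sub(/^vmid=/, "", $1)   # strip the prefix, leaving the number
            print $1
        }'
}

# Usage on the host (destructive, so left commented out):
#   wid=$(vm-support -x | world_id_for myvm)
#   [ -n "$wid" ] && vm-support -X "$wid"
```

The lookup is kept separate from the kill on purpose: you can print the ID and sanity-check it before handing it to vm-support -X.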
This process took a while (10-15 minutes), but it worked! Here’s a good KB article on the use of vm-support and other means of killing virtual machines that just won’t stop: