Tuesday, January 7, 2014

Red Hat Enterprise Linux Server in Amazon Web Services won't boot after yum update

Happy New Year, fellow people trying to make things work.

Hopefully everyone is mostly succeeding, despite the fact that the war on entropy cannot be won.

Interesting problem this morning. In a quest to live a more portable life, I just moved all my services from a home network to Amazon Web Services. The Free Tier there is a great deal, and the perfect excuse to evict all your local services and get them into the cloud.

Problem is, I did the normal thing you'd expect to work after bringing up the Amazon instances - a simple 'yum update'.

yum did its thing, packages were updated, and things looked just fine. You can't be sure your changes are good until you reboot though, and here's where the trouble started.

After a yum update on the stock RHEL 6 AMI, your AWS instance will fail to boot. You go to grab the system log from the instance and you see this most unhappy nugget at the bottom:

VFS: Cannot open root device "LABEL=_/" or unknown-block(0,0)

That's not a good moment. You're not alone though. An example with links to more info:


So, root cause is that Amazon Web Services has to use custom kernels in order to deal with their storage infrastructure. Hey, that's totally understandable - I'm not sure how they'd implement it if not that way, and I'm not paying anything anyway so I am certainly not complaining.

But I'm still not booting, and that's not good.

So here's how you pull your instance back to the land of the living, courtesy of Alex Bell:


The thing you need to change (assuming you followed Alex's instructions to mount the EBS volume) is /mnt/boot/grub/grub.conf. Delete the first kernel stanza (presumably the one the update just added) and leave only the entry for the Amazon-specific "ec2-starter" etc. kernel, so your instance will boot.
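
Before deleting anything, it helps to see which stanzas are in the file and which one GRUB will pick. Here is a small sketch for that, not anything from Alex's instructions. The function name list_grub_entries is made up for illustration, and it assumes a GRUB legacy config where each stanza starts with a "title" line in the first column.

```shell
# Hypothetical helper: list the boot entries in a GRUB legacy config,
# numbered from 0 the same way the "default=" setting counts them,
# followed by the current "default=" line.
list_grub_entries() {
    conf="$1"
    # Each stanza begins with a line starting with "title".
    grep '^title' "$conf" | awk '{ printf "%d: %s\n", NR - 1, $0 }'
    # Show which entry number GRUB boots by default.
    grep '^default=' "$conf"
}
```

With the volume mounted as described, you would run list_grub_entries /mnt/boot/grub/grub.conf, and it is worth copying the file somewhere safe before you start cutting stanzas out of it.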

Finally, to stop this from ever happening again, fire up ye ol' 'vi /etc/yum.conf' and add 'exclude=kernel*' as the last line of the [main] block at the top of the file, so the kernel is never updated automatically again.
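
If you'd rather script that edit than do it in vi, something like the following should work. It's a rough sketch: the function name exclude_kernel_updates is invented here, it assumes GNU sed (which is what RHEL ships), and it puts the line directly under the [main] header rather than at the end of that block, which yum treats the same way.

```shell
# Hypothetical helper: add "exclude=kernel*" to the [main] section of a
# yum config file. Refuses to touch a file that already has an exclude= line.
exclude_kernel_updates() {
    conf="$1"
    if grep -q '^exclude=' "$conf"; then
        echo "exclude= is already set in $conf; edit it by hand" >&2
        return 1
    fi
    # Keep a backup next to the original before changing anything.
    cp "$conf" "$conf.bak" || return 1
    # GNU sed: append the new line right after the [main] header.
    sed -i '/^\[main\]/a exclude=kernel*' "$conf"
}
```

Run it as root with exclude_kernel_updates /etc/yum.conf, then look at the file to confirm the line landed where you expect. The untouched original is left in /etc/yum.conf.bak.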

Anyone else seen this? Any other "hey I'm an AWS n00b" problems I should look for next?

Hope this helps someone.