Tuesday, January 7, 2014

Red Hat Enterprise Linux Server in Amazon Web Services won't boot after yum update


Happy New Year, fellow people trying to make things work.

Hopefully everyone is mostly succeeding, despite the fact that the war on entropy cannot be won.

Interesting problem this morning. In a quest to live a more portable life, I just moved all my services from my home network to Amazon Web Services. The Free Tier there is a great deal - the perfect excuse to evict all your local services and get them into the cloud.

Problem is, I did the normal thing you'd expect to work after bringing up the Amazon instances - a simple 'yum update'.

yum did its thing, packages were updated, and everything looked just fine. You can't be sure your changes are good until you reboot, though, and that's where the trouble started.
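For reference, the sequence was nothing exotic - just the standard update-and-reboot dance. A minimal sketch; the sudo setup and flags are whatever your instance happens to use:

    # Standard post-launch housekeeping on the fresh RHEL 6 instance.
    sudo yum -y update    # pulls in everything, including a new kernel package
    sudo reboot           # ...and this is where it never comes back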

After a yum update on the stock RHEL 6 AMI, your AWS instance will fail to boot. You go to grab the system log from the instance and you see this most unhappy nugget at the bottom:

VFS: Cannot open root device "LABEL=_/" or unknown-block(0,0)
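As an aside, if you'd rather pull that system log from the command line than click through the web console, the EC2 API will hand it to you. A sketch using the aws CLI - the instance ID below is a placeholder:

    # Fetch the same console output the AWS web console shows.
    # i-12345678 is a placeholder; substitute your broken instance's ID.
    aws ec2 get-console-output --instance-id i-12345678 --output text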

That's not a good moment. You're not alone, though. Here's an example thread with links to more info:

https://forums.aws.amazon.com/thread.jspa?messageID=477611&threadID=132185

So, the root cause is that Amazon Web Services has to use custom kernels to deal with their storage infrastructure. Hey, that's totally understandable - I'm not sure how else they'd implement it, and I'm not paying anything anyway, so I'm certainly not complaining.

But I'm still not booting, and that's not good.

So here's how you pull your instance back to the land of the living, courtesy of Alex Bell:

http://amazonserver.blogspot.com/2013/01/recover-broken-amazon-ec2-instance.html
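For the impatient, the gist of that procedure looks roughly like this - a sketch only, so treat Alex's post as the authority. The instance and volume IDs are placeholders, and the device names in particular vary between setups:

    # Stop the broken instance and move its root EBS volume to a rescue
    # instance in the same availability zone (IDs below are placeholders).
    aws ec2 stop-instances --instance-ids i-broken1
    aws ec2 detach-volume --volume-id vol-12345678
    aws ec2 attach-volume --volume-id vol-12345678 \
        --instance-id i-rescue1 --device /dev/sdf

    # On the rescue instance - the attached volume often appears as /dev/xvdf,
    # possibly with a partition suffix, so check 'dmesg' or 'fdisk -l' first.
    sudo mkdir -p /mnt
    sudo mount /dev/xvdf /mnt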

The file you need to change (assuming you followed Alex's instructions and mounted the EBS volume under /mnt) is /mnt/boot/grub/grub.conf - specifically, delete the first kernel stanza and leave only the entry for the Amazon-specific "ec2-starter" kernel, so your instance will boot.
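To make that concrete, the surviving grub.conf ends up looking something like this. This is a sketch from memory - the title text, kernel version, and paths are illustrative, so match everything against the stanzas actually sitting in your file and keep the one mentioning ec2-starter:

    # /mnt/boot/grub/grub.conf after the edit - the stanza for the
    # yum-installed kernel (which was listed first) has been deleted,
    # leaving only the Amazon-specific entry.
    default=0
    timeout=0
    hiddenmenu
    title EC2 Starter (2.6.32-xxx.el6.x86_64)
            root (hd0)
            kernel /boot/vmlinuz-2.6.32-xxx.el6.x86_64 ro root=LABEL=_/
            initrd /boot/initramfs-2.6.32-xxx.el6.x86_64.img

Once it's edited, unmount the volume, detach it, and attach it back to the original instance as its root device before starting things up again.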

Finally, to stop this from ever happening again, fire up ye ol' 'vi /etc/yum.conf' and add 'exclude=kernel*' as the last line of the [main] section, so the kernel is never automatically updated again.
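The [main] section then ends up looking something like this - the surrounding lines are the stock RHEL 6 defaults and yours may differ; the only line being added is the exclude:

    [main]
    cachedir=/var/cache/yum/$basearch/$releasever
    keepcache=0
    debuglevel=2
    logfile=/var/log/yum.log
    exclude=kernel*

If you ever do want a kernel update deliberately, 'yum --disableexcludes=main update kernel' will step around the exclusion for that one run.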

Anyone else seen this? Any other "hey I'm an AWS n00b" problems I should look for next?

Hope this helps someone.

Cheers!

2 comments:

Unknown said...

Thanks for this great post, man. The cause of the issue in my case wasn't quite the same as yours - mine occurred when I resized my EBS volume. After mounting the resized volume to my instance, it spat out the same error you described when the instance started. A couple of edits later, it finally booted. Thanks again!

Unknown said...

God, you saved my day. Thanks a lot for your post!