Monday, January 23, 2012

Amazon Web Services & The EC2 Support Team

I have been using Amazon's Elastic Compute Cloud (EC2) for nearly a month, and I have to say that, overall, at this early stage I'm more than happy. I've been working with Oracle tools and support services for several decades, and it really isn't fair to directly compare a relative upstart to a fairly mature company. Even with some rough edges, I have gotten what I needed from the EC2 Support Team so far.

At this point I'm not paying for support, and the public forum style of free support is adequate for this stage of my project. I have seen several occurrences of dissatisfied customers, but Amazon has made attempts to rectify complaints, and fairly often the problems are caused by the customer's own lack of knowledge.

One thing I'm still not happy about is that two Oracle-provided Amazon Machine Images (AMIs) have a very serious bug: Oracle Linux 5.6 (without a database), ami-42778a2b, and Oracle Linux 5.4 (without a database).

Both would become unreachable after being used for a while. In the case of the 5.4 version, if you attached an EBS volume and then rebooted the server, it would come back up but you could not log on.

I asked several times that the AMIs be corrected or removed or, at the least, that a warning be posted and the support staff notified of the problem, so that as little time and effort as possible would be lost. I received no confirmation that my suggestion was taken up. However, I did notice that the 5.4 version may no longer be available.

The problem in both cases involved the Ethernet interface eth0. The fix below was supplied by Amazon Support. Regrettably, you must apply it to a brand new instance, because a damaged instance cannot be reached to fix it.

Remove the entire HWADDR line from /etc/sysconfig/network-scripts/ifcfg-eth0:

> cd /etc/sysconfig/network-scripts
> mv ifcfg-eth0 backup_ifcfg-eth0
> grep --invert-match HWADDR backup_ifcfg-eth0 > ifcfg-eth0

Remove the entire class: NETWORK block from /etc/sysconfig/hwconf using an editor such as vi. I deleted the following lines:

-
class: NETWORK
bus: PCI
detached: 0
device: eth0
driver: pcnet32
desc: "Advanced Micro Devices AMD 79c970 PCnet32 LANCE"
network.hwaddr: 00:0C:29:FA:B6:CB
vendorId: 1022
deviceId: 2000
subVendorId: 1022
subDeviceId: 2000
pciType: 1
pcidom: 0
pcibus: 0
pcidev: 10
pcifn: 0
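
For anyone who would rather script this edit than do it by hand in vi, here is a minimal sketch. It is my own illustration, not part of the fix Amazon Support supplied. It assumes that hwconf is made up of records that each begin with a line containing only "-", as in the excerpt above, and the function name strip_network_class is hypothetical. It prints the filtered file to standard output and leaves the original untouched.

```shell
#!/bin/sh
# Sketch: print a hwconf-style file with every "class: NETWORK" record removed.
# Assumption: each record starts with a line that contains only "-".
strip_network_class() {
    awk '
        # Emit the buffered record unless it was marked as a NETWORK record.
        function flush() {
            if (n > 0 && !skip) printf "%s", buf
            buf = ""; n = 0; skip = 0
        }
        /^-$/               { flush() }      # a new record begins here
        /^class: NETWORK$/  { skip = 1 }     # mark the current record for removal
                            { buf = buf $0 "\n"; n++ }
        END                 { flush() }
    ' "$1"
}
```

One cautious way to use it is to write the result to a scratch file, for example strip_network_class /etc/sysconfig/hwconf > /tmp/hwconf.new, compare it with the original, and only then copy it into place.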

I have seen notes saying that you can shut down a damaged instance, detach its root volume, attach that volume to a new instance, bring up the new instance and repair the damaged root volume, and then reattach the volume to the original instance.
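
I have not tried that procedure myself, but the sequence can be sketched as a dry run. The function below only prints the commands; it does not run them. It is written against today's AWS command line interface (aws ec2 ...), not the tools I was using at the time. The function name, the instance and volume IDs, and the device names /dev/sdf and /dev/sda1 are all placeholder assumptions, and in practice you would wait for each stop, detach, and attach to finish before moving on.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the rescue steps for an unreachable instance.
# All IDs and device names are placeholders; check them against your own account.
print_rescue_plan() {
    broken="$1"   # instance that no longer accepts logins
    volume="$2"   # root EBS volume of the broken instance
    helper="$3"   # healthy instance in the same availability zone
    cat <<EOF
aws ec2 stop-instances --instance-ids $broken
aws ec2 detach-volume --volume-id $volume
aws ec2 attach-volume --volume-id $volume --instance-id $helper --device /dev/sdf
# on the helper: mount the volume, fix ifcfg-eth0 and hwconf under its etc/sysconfig, then unmount
aws ec2 detach-volume --volume-id $volume
aws ec2 attach-volume --volume-id $volume --instance-id $broken --device /dev/sda1
aws ec2 start-instances --instance-ids $broken
EOF
}
```

Calling print_rescue_plan with your own three IDs gives a checklist to review and then run one line at a time.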

Anyway, it was great that Amazon Support was willing to work on this problem. They were probably within their rights to refuse to touch the issue, because it was not their AMI. Oracle and Amazon are (for now) cloud partners, and it would have been a very disheartening warning sign if Amazon had refused to work on it.

At approximately $100 a month, having a development environment to experiment on is more than a bargain, and having some level of support on top of that makes it a winning situation.
