Wednesday, August 26, 2009

Blade Servers - to boot from local disk or SAN?

Blade servers are meant to reduce the data-center footprint in physical space, power, and cooling by replacing rack-mount servers. I do not want to debate whether they do that efficiently or effectively; instead, let us look at the various methods used to boot these blades. Most of today's blade servers have an internal local disk on which an operating system can be installed. If you are moving from rack-mount servers, the first question that will pop up in your mind is whether this local disk has any redundancy. Some blades support dual local hard drives and some do not; even where there is redundancy, the drives are often so small that they probably will not fit your requirements. Some blades use SSDs as local disks to speed up the boot process.

If your blade server does not have dual local hard drives, booting from the local disk should generally be out of the question. It also depends on what you are using these blades for. If they host a virtualized environment, a single drive may not matter, because the VMs can fail over to other available servers in the virtual cluster. Still, a drive failure would put a lot of load on the virtual environment, depending on how many VMs need to be moved and how they are configured, all because the blade server cannot have mirrored boot disks.

In some cases, boot from SAN is employed even when a mirrored boot disk is available. This may be purely for operational reasons, since the data can be moved or migrated to a new cluster if needed. Let us look at why this solution is good or bad.

From a storage perspective, in order to sustain boot IOPS from a whole bunch of servers (physical or virtual), the underlying storage needs to support a certain number of IOPS, and that number determines how many disks need to be in the boot pool. Even though it is rare for all of the servers to boot at the same time, the servers will be swapping in and out depending on the applications they run. If boot is from SAN, swap space obviously comes from the SAN as well, and that needs to be factored into the IO count when finalizing the storage design. Also, SAN storage is more expensive than local drives (even local SSDs). Blade server vendors are now implementing local SSD solutions with RAID capabilities, and that should be the preferred way to use them, especially for virtualized environments.
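
As a rough illustration of that sizing exercise, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function name, the per-server IOPS figures, the 150 IOPS per disk, the 50/50 read/write mix, and the RAID write penalty of 2 (the usual rule of thumb for mirrored sets). Plug in numbers measured from your own environment before trusting the result.

```python
import math

def disks_needed(servers, steady_iops_per_server, swap_iops_per_server,
                 iops_per_disk, write_fraction=0.5, raid_write_penalty=2):
    """Rough disk count for a SAN boot pool (hypothetical helper).

    All inputs are assumptions supplied by the caller; none of them
    come from a vendor specification.
    """
    # Front-end load: OS traffic plus swap traffic, since swap also
    # lives on the SAN when the servers boot from it.
    frontend = servers * (steady_iops_per_server + swap_iops_per_server)

    # Each write costs several back-end IOs depending on the RAID level
    # (commonly quoted as 2 for mirroring, 4 for RAID 5).
    reads = frontend * (1 - write_fraction)
    writes = frontend * write_fraction
    backend = reads + writes * raid_write_penalty

    return math.ceil(backend / iops_per_disk)

# Example with made-up numbers: 40 blades, 20 IOPS of OS traffic each,
# with and without 30 IOPS of swap traffic per blade.
with_swap = disks_needed(40, 20, 30, iops_per_disk=150)
without_swap = disks_needed(40, 20, 0, iops_per_disk=150)
print(with_swap, without_swap)
```

With these invented inputs, adding the swap traffic more than doubles the number of disks the boot pool needs, which is why it has to be part of the IO count.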

Unless there is a business case for booting off the SAN, it should not be done for a "just because he/she said so" reason. The already fine line between systems engineers and storage engineers is now even thinner with blade server setups, and if the storage layer is not designed properly, most "slow boot" issues tend to be blamed on it. A sufficient amount of cache on the storage will certainly help, as will the RAID configuration of the volumes where boot and swap live. Server design should be planned carefully so that there is ample RAM and no swapping takes place.
