The business case for NFS backing stores with VMware ESXi

I mentioned to the storage-discuss OpenSolaris mailing list, in the thread relating to my Solaris ZFS / iSCSI management script (covered in this post), that we mostly use NFS rather than iSCSI for our ESXi backing stores, and Christoph Jahn asked me to provide some background on this.

Here’s my response as posted to the list:

Hi Christoph.

There were several reasons; I'll try to outline those I can recall:

1. We found the performance of NFS to be far better than that of iSCSI, although we probably hadn't spent sufficient time tuning the iSCSI configuration. NB: we were exporting ZFS block devices as LUNs to virtual machines as RDMs (raw device mappings) rather than formatting them as VMFS, so the comparison perhaps isn't completely fair.

2. Overall the administration of iSCSI was overcomplicated, and prone to human error (there's a rough provisioning sketch after this list that illustrates the difference). Training our people to administer and troubleshoot it was likely to be too costly, and the additional ongoing costs associated with the increased administration had to be considered.

3. Similar to point (2) above, we found that you had to be careful to keep track of IQNs and LUN IDs when mapping in VMware; it's only possible to put names on targets, not LUNs, so mapping multiple disks into a VM was a potentially error-prone process.

4. The provisioning of iSCSI storage between ESXi and COMSTAR *felt* a little unstable; we had numerous instances of lengthy HBA rescans and of targets appearing and disappearing. Although all of this fell within 'expected behaviour', it was either a little counter-intuitive or too time-consuming.

5. My understanding of the nature of ESX NFS connections is that less IO blocking takes place, which explains why so many people get better throughput when running multiple VMs against the same SAN connection.

6. The limit of 48 NFS mappings in ESX wasn't going to be a constraint for us for the foreseeable future; we rarely load more than 15 VMs per node. It was therefore feasible to create a separate ZFS filesystem for each VM and export each one individually over NFS.

7. Everyone understands files; when you log in to the Solaris system you can navigate to the ZFS filesystem and see the .vmx and .vmdk files. These are nice and simple to manage, clone, back up, export and so on.

8. Related to point (7), when you use NFS in this way the virtual machine configuration is stored alongside the rest of the virtual machine data. This means that a snapshot of the ZFS filesystem for that VM gives us a true, complete backup at that point in time (see the snapshot one-liner after this list).
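To make points (2) and (6) a bit more concrete, here is roughly what the two provisioning workflows looked like for us. The pool, dataset, network and host names below are made up for illustration, and the exact commands vary between OpenSolaris/COMSTAR and ESXi releases, so treat this as a sketch rather than a recipe.

    # --- iSCSI / COMSTAR route (per virtual disk) ---
    # One-off service setup on the OpenSolaris storage node:
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    itadm create-target
    # Then, for every disk presented to a VM as an RDM:
    zfs create -V 20G tank/vm/vm01-disk0                # ZFS block device (zvol)
    sbdadm create-lu /dev/zvol/rdsk/tank/vm/vm01-disk0
    stmfadm add-view <lu-guid>                          # GUID reported by sbdadm create-lu
    # ...followed by an HBA rescan on each ESXi host and careful matching of
    # IQNs and LUN IDs before attaching the disk to the VM.

    # --- NFS route (per virtual machine) ---
    zfs create tank/vm/vm01
    zfs set sharenfs=rw=@192.168.10.0/24,root=@192.168.10.0/24 tank/vm/vm01
    # On the ESXi host (or via the remote CLI):
    esxcfg-nas -a -o filer01 -s /tank/vm/vm01 vm01      # mount as datastore 'vm01'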
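And because the whole VM, .vmx file included, lives in a single ZFS filesystem (points 7 and 8), a point-in-time copy really is a one-liner. Again, the names and the replication target are purely illustrative:

    # Snapshot everything belonging to vm01 (.vmx, .vmdk, logs) in one go:
    zfs snapshot tank/vm/vm01@nightly-20100115
    # Optionally replicate that snapshot to a second storage node:
    zfs send tank/vm/vm01@nightly-20100115 | ssh filer02 zfs receive -F backup/vm01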

I think that just about covers it. Essentially, although iSCSI feels like a cleverer way of solving our problem, and certainly presents itself as a more 'enterprise' solution, for us it amounted to overcomplicating things.

Several times in the last few years we've found that simple is better, even if it doesn't satisfy one's desire to implement the 'technically perfect solution'. It's more a question of balancing the economics (I'm running a business, after all) against the actual requirements. Just because you *can* do something doesn't mean you should, and I've often found that PERCEIVED requirements out-grow ACTUAL requirements simply because some technology exists to solve problems that you don't have [yet].

In short, I like to keep it fit for purpose, even if it does feel like a more agricultural solution.

I should also point out that the one major shortcoming of our NFS solution is the lack of any equivalent to iSCSI multipathing. If we had any machines that required true high availability or automated failover, this would probably have negated all of the points above – iSCSI multipathing is a beautiful thing, and it creates some awesome possibilities for fault tolerance.

As it stands, we take care of link failure at the network level (as opposed to the iSCSI MPIO protocol level) and deal with ESX node or storage node failure by manually remapping NFS filesystems from elsewhere. This is actually preferable to automated recovery since sometimes we don’t want to take the ‘default’ action during a failure scenario. By having a carefully documented failure plan I believe we have more flexibility, and can deal with recovery on a per-client basis, rather than a system-wide basis.
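For the curious, a manual remap is only a handful of commands. The dataset, datastore and host names are again illustrative, and this assumes the VM's filesystem has already been replicated to a second storage node as in the earlier snapshot example:

    # On the surviving storage node: make the replica writable (if it was kept
    # read-only) and share it out:
    zfs set readonly=off backup/vm01
    zfs set sharenfs=rw=@192.168.10.0/24,root=@192.168.10.0/24 backup/vm01
    # On the chosen ESXi host: drop the old datastore mapping, mount the replica
    # under the same label, then re-register the VM (it can then be powered on as usual):
    esxcfg-nas -d vm01
    esxcfg-nas -a -o filer02 -s /backup/vm01 vm01
    vim-cmd solo/registervm /vmfs/volumes/vm01/vm01.vmx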

Ultimately we are a business dealing with multiple clients hosted on shared hardware, so it’s important to keep our implementations client-centric, rather than system-centric.

Finally, I should add that although we don’t have automated failover of these systems, our solution does still permit us to stay well within our contracted SLAs, which serves the business need.

Regards,

Timothy Creswick
