Stories from the VMworld Solutions Exchange: PernixData FVP: The what and installation in the lab.

This year my plan is to write up some blog posts about solutions from the partners at the VMworld Solutions Exchange. Next to the VMworld general sessions, technical sessions, Hang Space and Hands-on Labs, an important part of VMworld is the partner ecosystem at the Solutions Exchange. I visited the Solutions Exchange floor several times this year (not only for the hall crawl) and I wanted to make a series about some of the companies I spoke with. Some I already know and some are new to me. The series is in absolutely no order of importance, but covers companies that all offer cool products/solutions and present them with a whole lotta love to help businesses and technologies get happy. I have been a bit busy these last couple of weeks, so this is probably a bit later than I first wanted, but here goes…

This time it is about getting familiar with PernixData’s FVP. I have seen enough of the concepts on the communities and the big bad Intarweb, and I had a chance to see it in real action in some demos and in the technical presentations by PernixData (thanks for those sessions guys :-) ). Time to take it for a spin in the test lab. But first a little why and what before the how.

PernixData Logo

What is PernixData FVP?

To know about PernixData FVP you first have to start with the issue this software solution is trying to solve: IO bottlenecks in the primary storage infrastructure. These IO bottlenecks add serious latency to the application workloads the virtual infrastructure is serving out to the users. Slow responses or unusable applications are the result. This leads to frustrated and suffering end users (which leads to anger and the dark side), but it also creates extra work for the IT department: more help desk calls, troubleshooting and analyzing performance issues, and extra costs to compensate for all that troubleshooting (in personnel and in hardware to patch up the issues). One of the options often used to try and solve the IO puzzle is to add flash, at first mostly to the storage infrastructure as a caching layer or in all-flash storage arrays. Flash has microsecond response times and delivers more performance than magnetic spinning disks with their millisecond response times. The problems with adding flash to the storage infrastructure are the high recurring costs and the fact that it does not really solve the problem. Sure, giving the storage processors faster IO response and more flash capacity will be an improvement of some sort versus traditional storage, but it needs constant reinvestment every time the limit is reached again, and the IO is still far from the workload. The IO still must travel through the busses, to the host adapter, over a network and through the storage processor to reach the flash, and back the same way for the acknowledgement or the requested data. Each component in that path adds its own handling and response time. Not to mention the extra load on the storage processors with all that additional processing. Flash normally does its responses in microseconds; that seems like a waste.
Okay, no problem, we add flash to the compute layer in the host. That is close to the application workloads. Yes, good, performance needs to cuddle with the workloads. We decouple storage performance from capacity: performance in the host and capacity in the traditional storage infrastructure. But just putting flash in the host does not solve it as a whole. The flash still needs to be presented to the workload, and locality needs to be handled for the VM mobility requirements (fault tolerance, HA, DRS and such). PernixData is not the only one trying to solve this issue, but some of the others present the local acceleration resources through a VSA (Virtual Storage Appliance) architecture. That in itself introduces yet another layer of complexity and additional IO handling, as those appliances act as an intermediary between workload, hypervisor and flash. Furthermore, as they are part of the virtual infrastructure, they may have to battle with other workloads (which are also using the VSA for their IO) for host resources. What we need is a storage virtualization layer that solves the mobility issue, optimizes IO for flash and talks to the flash as directly as possible, or else a protection mechanism or smart storage software of some sort for the IO appliances (there are solutions out there that handle that as well). The first is where PernixData FVP comes into play.

PernixData Overview

Architecture

The architecture of FVP is simple. All the intuition, magic and smartness is in the software itself. It uses flash and/or RAM in the host, a host extension on those hosts and a management server component. It currently works only with the VMware hypervisor (a lot of the smart people at PernixData previously worked at VMware). It can work with block (FC, iSCSI) or file (NFS) backend storage as well as direct attached storage, as long as it is on the VMware HCL.
The host extension is installed as a VMware VIB. The management server requires a Windows server and a database. The management server is installed as an extension to vCenter and uses either a service account (with rights to the VMware infrastructure and the database) or a local user (which can be SSO only); with the latter it uses Local System as the service account.
When adding a second FVP host, this host is automatically added to the default fault domain. By default, local acceleration is replicated to one peer in the same fault domain (with the Write-Back policy). This works out of the box, but you will probably want to match the domains (add your own) and settings to the architecture you are using. The default fault domain cannot be renamed, removed, or given explicit associations.
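
Before installing anything it can be handy to check from the ESXi shell what local acceleration resources the host actually has to offer. A minimal check (nothing PernixData specific, just standard esxcli) is to see whether ESXi flags the local devices as SSD and how much physical memory is available:

~ # esxcli storage core device list | grep -E "Display Name|Is SSD"
~ # esxcli hardware memory get

The first command lists each device with an Is SSD true/false field, the second shows the physical memory of the host.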

Installation

After installing the flash or RAM to use for acceleration in the hosts, we can install the host extension from the ESXi shell (local or remote over SSH). I downloaded the FVP components and placed the zip with the host extension on the local datastore, as I’m not installing across a lot of hosts. To install a VIB the host must be in maintenance mode.

~ # esxcli system maintenanceMode set --enable true
~ # esxcli software vib install -d /vmfs/volumes/datastore1/PernixData-host-extension-vSphere5.5.0_2.0.0.2-32837.zip
Installation Result
   Message: Operation finished successfully.
   Reboot Required: false
   VIBs Installed: PernixData_bootbank_pernixcore-vSphere5.5.0_2.0.0.2-32837
   VIBs Removed:
   VIBs Skipped:
~ # esxcli system maintenanceMode set --enable false
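
A quick sanity check that the host extension actually landed on the host (not an official verification procedure, just listing the installed VIBs and filtering for PernixData):

~ # esxcli software vib list | grep -i pernix

This should show the pernixcore VIB with the version you just installed.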

Next up is the management server. The Windows installer itself is pretty straightforward. Have a service account, a database, and access to the database and the vCenter inventory for that service account set up beforehand, and you are ready to roll.

Install - FVP Management Server IP-port Install - SQL Express Install - vCenter

When you are using the Web Client, the FVP client add-on is added when the management server registers with vCenter. Restart any active Web Client session by logging off and on again, and FVP will show up in the object navigator.

Web client Inventory

Next up, create your FVP cluster as a transparent IO acceleration tier and associate it with the vCenter cluster that contains the hosts to accelerate (the ones with the host extension and local resources). The hosts in that vCenter cluster will be added to the FVP cluster. With the cluster created and the hosts visible in the FVP cluster, we add the local acceleration resources (the flash and/or RAM) to use. Next we add datastores and VMs; at this level we can set the policies to use. Depending on your environment, certain policies will be in effect. In the manage Advanced tab we can blacklist VMs to be excluded from acceleration (for VADP or other reasons). In the Advanced tab we can also define which network to use for acceleration traffic; by default FVP automatically chooses a single vMotion network (see the shell check below). The Advanced tab is also where we put in the license information or create a support bundle if we ever need one. The manage Fault Domains tab is not just a clever name; here we can see the default domain and add our own when needed.
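
If you want to double-check from the ESXi shell which vmkernel interface is tagged for vMotion (and thus which network FVP will likely pick by default), you can list the interfaces and their tags. vmk1 is just the interface name in my lab, yours may differ:

~ # esxcli network ip interface list
~ # esxcli network ip interface tag get -i vmk1

The tag get command shows which services (Management, VMotion, and so on) that interface is tagged for.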

Add Flash Devices Add FVP Cluster Fault Domain
The Monitor tab is where we have the opportunity to look at what your environment is doing, and an overview of what FVP is doing is shown in the Summary tab. These are great places to get some more insight into what your workloads’ IO is doing. My test lab is getting acceleration from the moment it is started.

Monitor Tab Performance results writethrough and writeback two login sessions

I can also show a first comparison of the write policies on a Login VSI workload. (Keep in mind my test lab isn’t that much.) LoginVSI-Policies

But that is more something for another blog post about FVP.

Policies

When an application issues a write IO, the data can be committed to the flash device, but this data must always end up on the storage infrastructure. The timing of the write operation to the storage infrastructure is controlled by the write policies in FVP. There are two policies to choose from: the Write-Through and the Write-Back policy.

Write-Through. When a virtual machine application workload issues an IO operation, FVP determines if it can serve it from flash. When it is a write IO, the write goes straight to the storage infrastructure and the data is copied to the flash device. FVP acknowledges the completion of the operation to the application after it receives the acknowledgement from the storage system. In effect, write IOs are not accelerated by the flash devices with this write policy, but all subsequent reads of that particular data are served from flash. The write IOs still benefit from the read acceleration, as those read operations/requests are no longer hitting the storage infrastructure and more resources are available there to serve the writes.

Write-Back. This policy accelerates both read and write IO. When a virtual machine application workload issues a write IO operation, FVP forwards the command to the flash device. The flash device acknowledges the write to the application first and then handles the write operation to the storage system in the background. With these delayed writes there is a small time window in which the data is on the host but not yet written to the backend storage infrastructure. If something happens to the vSphere host in that window, this could end in data loss. For this, replicas of the data are used: the data is forwarded to the local acceleration device and to one or more remote acceleration devices. The result is that the application gets flash-level latencies while FVP deals with fault tolerance and with the latency/throughput levels of the storage infrastructure.

FVP allows you to set policies on datastores or on a per-VM level. You can have an environment where a great number of virtual machines run in Write-Through mode, while others run in Write-Back.

Simple

You notice from the installation and usage of FVP that this product is designed to give simplicity to its operators. Just add the VIB and some initial configuration, and your FVP solution is running and showing improvements within a few minutes (when you already have some flash/RAM in the hosts; otherwise you are installing those first). No calculating, sizing and deploying of virtual appliances for the IO to flow through; with the FVP host extension the IO is talking to the right place directly. Yes, you will need a Windows server for the management component, but that sits out of band of the IO flow.
If you have some experience with the VMware product line and understand the way the PernixData product is set up, the level of entry to using FVP is super low. You just have to familiarize yourself with the way FVP handles your workload performance/IO (the policies, the settings and the places to set them), next to actually knowing some of the workloads you have in the environment and what they are doing with IO. And there FVP can be of assistance as well: next to accelerating your IO workloads, it gives you lots of insight into what storage IO is doing in your environment by presenting several metrics in the vSphere management interface. And that is another big simplicity, integrating seamlessly into one management layer.

 

Sources: pernixdata.com