EUC Layers: Dude, where’s my settings?

With this blog post I am continuing my EUC Layers series. As I didn't know I was starting one, there is no real order to follow, other than that it is written somewhat from the user perspective, as that is a big part of End User Computing. But I cannot guarantee that will turn out to be the right order at the end of things.

If you would like to read back the other parts you can find them here:

For this part I would like to ramble on and sing my song about an important part of the user experience: User Environment Management.

User Environment

Organisations grant their users access to certain workspaces, an application, a desktop and/or the parts of data required for, or supporting, the user's role within the business processes. With that, these users are granted access to one or more operating systems below that workspace or application. The organization would also like to apply some kind of corporate policy to ensure the user works with the appropriate level(s) of access for doing their job and to keep the organization's data secure. Or, in some cases, to comply with rules and regulations, making the user's job a bit more difficult at the same time.

On the other side of the force, each user has a preferred way of using the workspace and will tend to make all sorts of changes that enable them to work as efficiently as humanly possible. Examples of these changes are look-and-feel options and e-mail signatures.

The combination of the organization policy and the user preferences is the User Environment Layer, also called the persona or user personality.

Whether a user is accessing a virtual desktop or a published application, a consistent experience across all resources is one of the essential objectives and requirements for End User Computing solutions. If you don't have a way of managing the User Environment, you will have disgruntled users and not much of a productive solution.

Dude

Managing the User Environment

Managing the User Environment is complicated, as there are a lot of factors and variables in the End User environment. Further complexity is added by what needs to be managed from the organization's perspective versus what your users expect.

Next to this, yet another layer is added to this complexity: the workspaces are often not just one dominating technology, but a combination of several pooled technologies. Physical desktop pools, virtual desktop pools, 3D engineering pools, application pools and so on.

That means a user does not always log on to the same virtual desktop each time, or logs on to a published application on another device, while still wanting the same settings in that application as in the application on the virtual desktop. A common factor is that the operating system layer is a Windows-based OS. The downside is that there are several OS versions and a lot of application options. We should make sure that user profiles are portable, in one way or another, from one session to the next.

It is absolutely necessary, when using differently versioned pooled workspaces, that the method of deploying applications and settings to users is fast, robust and automated. Both from the user context and from operational management.

Sync Personality

User Environment Managers

And cue the software solutions that abstract the user data and the corporate policies from the delivered operating system and applications, and manage them centrally.

There are a lot of solutions that provide a part of the puzzle with profile management and such. And some provide a more complete UEM solution, like:

  • RES ONE Workspace (previously known as RES Workspace Manager),
  • Ivanti Environment Manager (previously known as AppSense Environment Manager),
  • Liquidware Labs ProfileUnity,
  • VMware User Environment Manager (previously known as Immidio).

And probably some more…

Which one works best depends on your requirements and the fit with the rest of the solution components used. Use the one that fits the bill for your organisation now and in a future iteration. And look for some guidance and experience from the field via the community or the Intarweb.

User Profile Strategy

All the UEM solutions offer an abstraction of the Windows User Profile. The data and settings normally kept in the Windows User Profile are captured and saved to a central location. When the user session is started on the desktop, the context changes, applications start or stop, or the session is stopped, interaction between (parts of) the central location and the Windows Profile takes place to maintain a consistent user experience across any desktop. Just in time, when they are needed, and not bulk loaded at startup.

The Windows Profile itself comes in following flavours:

  • Roaming. Settings and data are saved to a network location. By default the complete profile is copied at log in and log out to any computer where the user starts a session. Which bits are copied or not can be tweaked with policies.
  • Local. Settings and data are saved locally to the desktop and remain on that desktop. They are not copied when roaming, and a new profile is created with a new session on another machine.
  • Mandatory. All user sessions use a prepared user profile. All changes the user makes to the profile are deleted when the user session is logged off.
  • Temporary. Something fubarred. This profile only comes into play when an error condition prevents the user's profile from loading. Temporary profiles are deleted at the end of each session, and changes made by the user to desktop settings and files are lost when the user logs off. Not something to use with UEM.

The choice of Windows profile used with(in) the UEM solution often depends on the to-be architecture and the phase you are in: the starting point and where you want to go. For example, starting with the bloating and error-prone roaming profiles, running UEM side-by-side to capture the current settings, and then moving to clean mandatory profiles. Throw Folder Redirection in the mix for centralized user data and presto.

Use mandatory profiles as the default wherever possible; they are a great fit for virtual desktops, published applications and host/terminal servers in combination with a UEM solution.
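
A mandatory profile is, at its core, a prepared profile whose user registry hive is renamed from NTUSER.DAT to NTUSER.MAN, so Windows throws away any changes at logoff. A minimal sketch of that last step in PowerShell, assuming you already copied a prepared profile to a share (the path is just a placeholder):

# Path to the prepared profile that will serve as the mandatory profile (placeholder path)
$mandatoryProfile = '\\fileserver\Profiles$\Mandatory.V6'

# Rename the user registry hive so Windows treats the profile as mandatory
Rename-Item -Path (Join-Path $mandatoryProfile 'NTUSER.DAT') -NewName 'NTUSER.MAN' -Force

# Verify the hive was renamed (the file is hidden, hence -Force)
Get-ChildItem -Path $mandatoryProfile -Filter 'NTUSER.*' -Force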

The User Profile strategy should also include something to mitigate the different Windows profile versions. Each OS version comes with its own profile version, and without a UEM solution you cannot roam settings between, say, a V2 and a V3 profile. So migrating or moving between different versions is not possible without tooling. The following table is created with the information from TechNet about User Profiles.

Windows OS and the corresponding User Profile Version:

  • Windows XP and Windows Server 2003: first version, no suffix
  • Windows Vista and Windows Server 2008: .V2
  • Windows 7 and Windows Server 2008 R2: .V2
  • Windows 8 and Windows Server 2012: .V3 (after the software update and registry key are applied), .V2 before
  • Windows 8.1 and Windows Server 2012 R2: .V4 (after the software update and registry key are applied), .V2 before
  • Windows 10: .V5
  • Windows 10, 1607 and 1703: .V6
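
To get a quick feel for which profile versions are already floating around on an existing profile share, a small sketch like this can help (the share path is a placeholder):

# Count roaming profile folders per profile version suffix on a profile share (placeholder path)
$profileShare = '\\fileserver\Profiles$'

Get-ChildItem -Path $profileShare -Directory |
    Group-Object { if ($_.Name -match '\.V\d+$') { $Matches[0] } else { 'no suffix (XP/2003 era)' } } |
    Sort-Object Name |
    Select-Object Name, Count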

Next to that, UEM offers to move user-context settings out of Group Policies and login/logoff scripts, again lowering the number of policies and scripts processed at login and logoff. And improving the user experience by lowering those waiting times, so you actually have what you need just in the time you need it.

And think about what your organization's user environment strategy is: what do you want to manage and control, what do you want to capture for users and applications, and what not.

VMware User Environment Manager

With VMware Horizon, VMware UEM will often be used. So what do we need for VMware UEM?

In short VMware UEM is a Windows-based application, which consists of the following main components:

  • Active Directory Group Policy for configuration of the VMware User Environment Manager.
  • UEM configuration share on a file repository.
  • UEM User Profile Archives share on a file repository.
  • The UEM agent, or FlexEngine, in the Windows guest OS where the settings are to be applied or captured.
  • The UEM SyncTool, for using UEM in offline conditions and synchronizing when the device connects to the network again.
  • UEM Management Console for centralized management of settings, policies, profiles and config files.
  • The Self-Support or Helpdesk Tool, for resetting to a previous settings state or troubleshooting by level 1 support.
  • The Application Profiler for creating application profile templates. Just run your application with Application Profiler and it automatically analyzes where the application stores its file and registry configuration. The analysis results in an optimized Flex config file, which can then be edited in the Application Profiler or used as-is in the UEM environment.

UEM works as long as the UEM shares and the engine component are available to the environment. With the latest release Active Directory isn't even a required dependency, thanks to the alternative NoAD mode. The last three components are for management purposes.

All coming together in the following architecture diagram:

UEM Architecture

That's it, no need for yet more application managers and database requirements. In fact UEM utilizes components that organizations already have in place. Pretty awesomesauce.

I am not going to cover installation and configuration of UEM, there are already a lot of resources available on the big bad web. Two excellent resources are http://www.carlstalhood.com/vmware-user-environment-manager/ or https://chrisdhalstead.net/2015/04/23/vmware-user-environment-manager-uem-part-1-overview-installation/. And of course VMware blogs and documentation center.

Important for the correct usage of UEM is to keep in mind that the solution works in the user context. Pre-session or computer settings will not be handled by UEM. And it will not solve application architecture misbehaviour. It can help with some duct tape, but it won't solve an application architecture change from version 1 to version 4.

VMware UEM continually evolves with even tighter integration with EUC using VMware Horizon Smart Policies, Application Provisioning integrations, Application authorizations, new templates and so on.

Happy Managing the User Environment!

Sources: vmware.com, microsoft.com, res.com, ivanti.com, liquidwarelabs.com

EUC Toolbox: O Sweet data of mine… Mining data with Lakeside Software’s SysTrack

I have already covered the importance of insights for EUC environments in some of my blog posts. The TL;DR of those is: if you don't have some kind of insight, you're screwed. As I find this a very important part of EUC and EUC projects, and see that insights are often lacking when I enter the ring… I would like to repeat myself: try to focus on the insights a bit more, pretty please.

Main message: to successfully move to and design an EUC solution, assessment is key, as designing, building and running isn't possible without visibility.

Assessment Phase

The assessment phase is made up of gathering information from the business, such as objectives, strategy, business non-functional and functional requirements, security requirements, issues and so on. And mostly getting questions from the business as well. This gathering consists of workshops with all kinds of business and user roles, questionnaires, and getting your hands on documentation regarding the strategy and objectives, plus some current state architecture and operational procedure explanations and documentation. Getting and creating a documentation kit.

The other fun part is getting some insights from the current infrastructure. Getting data about the devices, images, application usage, logon details, profiles, faults etc. etc. etc.

Important when getting this data is the correlation of user actions with these subjects. It is good to know, when the strategy is to move to cloud-only workspaces, that there will be several thousand steps between how a user is currently using his tool set to support the business process and that business objective. An intermediate step of introducing an any-device desktop and hosted application solution is likely to have a higher success rate. Or wanting to use user environment management while the current roaming profiles are bloated to 300 GB. But I will try not to get ahead of myself with theories; first get some assessment data.

Data mining dwarfs

Mining data takes time

Okay, out with this one. Mining, or gathering, data takes time and therefore a chunk of project time and budget. Unfortunately, with a lot of organisations it is unclear what this assessment will bring, there are costs involved, and a permanent solution is not in place. Yes, there are sometimes point-in-time software and application reports that can be pulled from a centralized provisioning solution, but these often miss the correlation of that data to the systems and to what the user is actually doing. And then there is the fact that Shadow IT is around.

Secondly, you need to know what to look for in the mined data to answer the business questions.

But we can help, and the time and cost are more of a planning issue, for example when there is no clarity up front on the effort involved.

The process: day 1 is installation. After a week a health check is done to see if data is flowing into the system. At day 1+14, initial reports and modeling can be started. At day 30+ a business cycle has been mined. This means enough data points have been captured, desktops that are not often connected have had their connection (and thus their agents reporting), and there is enough variation for good analytics. Month start and month closing procedures have been captured. What about half-year procedures? No, those are not in a 30-day assessment when the period doesn't include that specific procedure. Check with the business whether those are critical.

Assess the assessment

What is your information need, and are there any specific objectives the business would like to see addressed? If you don't know what you are looking for, the amount of data will be overwhelming and it will be hard to get useful reports out. Also, try to focus on what an assessment tool can do for you and how. Grouping objects in reports in ways that don't exist in the current infrastructure or the organization structure will need some additional technical skills, or needs to be placed in the "can't solve the organization with one tool" category.

Secondly, check the architecture of the chosen tool and how it fits in the current infrastructure. You probably need to deploy a server, have a place where its data is stored, and have client components identified and deployed. Check whether the users are informed; if not, do it. Are there desktops that don't always connect to the network, and how are these captured? Agents connect to the Master once per day to have their data mined.

Thirdly, check whether the data needs to remain within the organization's boundaries, or whether it can be saved or exported to a secure container outside the organization. For analyzing and reporting it is beneficial for the timelines if you can work with the data offsite; it saves a lot of traveling time throughout the project.

Fourthly, what kind of assessment is needed? Do we need a desktop assessment, server assessment, physical-to-virtual assessment or something else? What options do we have for gathering data: do we need agents, something in the network flow, etc.? This kinda defines the toolbox to use. Check if the vendor and/or community is involved in the product; this can prove very valuable for getting the right data and interpreting the data in reports. Fortunately for me, the tool for this blog post, SysTrack, can be used for all kinds of assessments. But for this EUC toolbox I will focus on the desktop assessment part.

SysTrack via Cloud

VMware teamed up with Lakeside Software to provide a desktop assessment tool that is free for 90 days, called the SysTrack Desktop Assessment. It will collect data for 60 days and keep that data in the cloud for an additional 30 days. After 90 days access to the data is gone. The free part is that you pay with your data. VMware does the vCloud Air hosting and adds the reports, Lakeside adds the software to the mix and voilà, magic happens. The assessment can be found at: https://assessment.vmware.com/. Sign up with an account and you're good to go. If you work together with a partner, be sure to link your registration to that partner so they have access to your information.

When registration is finished your bits will be prepared. The agent software is linked to your assessment. Use your deployment method of choice to deploy agents to the client devices, physical or virtual, as long as it's a Windows OS. Agents need to connect to the public cloud service to upload the data to the SysTrack system. Don't like all your agents connecting to the cloud? You can use a proxy: your clients connect to the proxy and the proxy connects to the cloud service. Check the collection state after deploying and again a week after deploying. After that, data will show up in the different visualizers and overviews.

SDA - Dashboard

If you have greyed-out options, be patient, there is nothing wrong (well, yet). These won't become active until a few days of data have been collected, to make sure representative information is in there before most of the Analyze, Investigate and Report options are shown.

Once a business cycle is in, you can use the reports for your design phase. The Horizon sizing tool output is an XML export that you can use in the Digital Workspace Designer (formerly known as Horizon Sizing Estimator); find it at https://code.vmware.com/group/dp/dwdesigner. Use the XML as a custom workload.

SDA- User Visualizer

SysTrack On site

Okay, now for the on site part. You got a customer that doesn't like its data on somebody else's computer, needs more time, needs customizations to reports, dashboards or further drill-down options -> tick on site deployment. It needs more preparation and planning between you and the customer. If the cloud data isn't a problem, let your customer start the SDA to have some information before the on site deployment is running; it will mostly take calls and operational procedures before a system is ready to install.

Architecture

Okay, so what do we need? First get a license from Lakeside or your partner for the number of desktops you want to manage. You will get the install bits, or the consultant doing the install will bring them.

Next the SysTrack Master server. Virtual or physical. 2 vCPU and 8 GB RAM (with Express use 12 GB) to start with; this grows when you have more endpoints. Use the calculator (requirements generator) available on the Lakeside portal. Windows Server, minimum 2008 R2 SP1. IIS web roles, .NET Framework (all), AppFabric and Silverlight (brr). If you did not set up the prerequisites they will be installed by the installer (but it will take time). That is… except .NET Framework 3.5, as this is a server feature for which you need an additional location for source files. Add this feature to the system prior to installation. And while you are at it, install the rest.
For a small environment, or one without non-persistent desktops, a SQL Express (2014) instance can be included in the deployment. Otherwise use an external database server with SQL Server Reporting Services (SSRS) set up. With Express, SSRS is set up for you by the way.
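
For that .NET Framework 3.5 prerequisite, a quick way to add the feature with an alternate source on Windows Server 2012 or later (the media path is a placeholder for wherever your Windows Server installation media is mounted):

# Add .NET Framework 3.5 on the Master server, pointing at the OS installation media for the source files
Install-WindowsFeature -Name NET-Framework-Core -Source 'D:\sources\sxs'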

Lakeside Launch

You need a SQL user (or the local system) with DBO on the newly created SysTrack database, and a domain user with admin rights to the reporting service and local admin rights on the Windows server. If you are not using an application provisioning mechanism or a desktop pool template, you can push or pull the agents from the SysTrack Master. For this you need an AD user with local admin rights on the desktops (to install the packages) and File and Print Services, Remote Management and Remote Registry enabled. If SCCM or an MSI installation in the template is used, you won't require local admin rights, remote registry and such.

If there is a firewall between the clients (or agents, or children) and the master server, be sure to open the port you used in the installation; by default you need 57632 TCP/UDP. And if there is something between the Master Server and the Internet blocking the registration, you will need to activate by phone. The Internet is only used for license activation though.
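
If the Windows Firewall on the Master itself is in the way, something like the sketch below opens that default port (assuming Windows Server 2012 or later, where the NetSecurity cmdlets are available; adjust the port if you changed it during installation):

# Allow inbound SysTrack agent traffic on the default port, both TCP and UDP
New-NetFirewallRule -DisplayName 'SysTrack agents (TCP 57632)' -Direction Inbound -Protocol TCP -LocalPort 57632 -Action Allow
New-NetFirewallRule -DisplayName 'SysTrack agents (UDP 57632)' -Direction Inbound -Protocol UDP -LocalPort 57632 -Action Allow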

And get a thermos of coffee, it can take some time.

To visualise the SysTrack architecture we can use the diagram from the documentation (without the coffee that is).

SysTrack Architecture

Installation is done in four parts: first the SysTrack Master Server (with or without SQL), secondly the SysTrack Web Services, thirdly the SysTrack Administrative Tools, and when 1-3 are installed and SysTrack is configured, you can deploy the agents as part four.

  • SysTrack Master Server is the master for the application intelligence, storing the data from the child agents (or connecting to a data repository), configuration, roles and so on.
  • SysTrack Web Services is for the front-end visualizers and reporting (SSRS on the SQL server).
  • SysTrack Administrative Tools contains, for example, the deployment tool for configuration.

You gotta catch them all.

SysTrack Install menu

And click on Start install.

The installers are straightforward. Typical choices are the deployment type, full or passive; the reporting service user that was prepared (you can add this later as well); and the database type, pre-existing (a new window will open for connection details) for an external database, or the Express version. Every component needs its restart. After the restart of the Master setup, the Web Services installer starts. After that restart, the Administrative Tools don't start automatically: just open the setup again, tick the third option and start the install.

Open the deployment tool. Connect to the master server. Add your license details if this is a new installation. Create a new configuration (Configuration – Alarming and Configuration). Selecting Base Roles\Windows Desktop and VMP works as a good start for desktop assessments. Set your newly created configuration as the default, or change it manually in the tree when clients have been added to it. And push the play button when ready to start or receive clients. Else nothing will come in.

vTestlab Master Deployment Tool

Now deploy the agent via MSI. The installation files are on the Master Server in the installation location: SysTrack\InstallationPackages. You have the SysTrack agent (System Management Agent 32-bit) and the prerequisite Visual C++ 2010 Redistributable.
With MSI deployments you pass the master server and port to the installer options. If the Master allows clients to automatically add themselves to the tree, which is the default with version 8.2, they will show up.
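
As a sketch of what such an MSI deployment could look like from a script or deployment tool: the MSI file name and the MASTERADDRESS/MASTERPORT property names below are placeholders, not verified names, so check the SysTrack documentation that ships with the installation packages for the real ones.

# Hypothetical silent install of the SysTrack agent MSI; path and property names are placeholders
$msi = '\\master01\SysTrack\InstallationPackages\SystemManagementAgent32.msi'
Start-Process -FilePath 'msiexec.exe' -Wait -ArgumentList @(
    '/i', $msi,
    '/qn',                       # silent install
    'MASTERADDRESS=master01',    # placeholder property: SysTrack Master server
    'MASTERPORT=57632'           # placeholder property: port chosen during installation
)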

"Normally" the clients won't notice the deployed SysTrack agent. There is no restart required for the agent installation.
In strict environments you can get a pop-up in Internet Explorer about the LSI Hook browser snap-in. You can suppress this by adding the CLSID of LSI Hook to the add-on list with a value of 1. Or you can edit your configuration and change Web browser plugins to false. This in turn means that web data from any browser is not collected by SysTrack.

Configuration Web Browser Plugin

In any case, be sure to test the behaviour in your environment before rolling out to a large group of clients.

Conclusion

While the cloud version is deployed in a snap, data is easily accessed within the provided tools, and the reports fit the why of the assessment, there is a big but: a big chunk of organisations don't like this kind of data going into the cloud, even when the user names are anonymized. Pro for the on site version is that it gives you more customization and reporting possibilities. Downside is that SysTrack on site is Windows based, and the architecture will require Windows licenses next to the Lakeside license. All the visualizers and tools can be clicked and drilled down from the interface, but it feels a little like several tools have been duct-taped together. You can customize whatever you want: dashboards, reports and grouping. But you need a pretty broad skill set, including how to build SQL queries, SSRS reports and the SysTrack products themselves. And what about the requirement for Microsoft Silverlight, a deprecated framework? Tsk tsk. Come on, this is 2017 calling…

But in the end it does not matter whether SysTrack from Lakeside Software or, for example, Stratusphere FIT from Liquidware Labs is used; that is your tool set. The most important part is to know what information is needed from which places, and to know thy ways to present it. Assess the assessment, plan some time and get mining for diamonds in your environment.

– Happy Mining!

Sources: vmware.com, lakesidesoftware.com

EUC Layers: Display protocols and graphics – the stars look very different today

In my previous EUC Layer post I discussed the importance of putting insights on screens; in this post I want to discuss the EUC Layer of putting something on the screen of the end user.

Display Protocols

In short, a display protocol transfers the mouse, keyboard and screen (ever wondered about the vSphere MKS error if that popped up?) input and output between a (virtual) desktop and the physical client endpoint device. Display protocols usually optimize this transfer by encoding, compressing, deduplicating and performing other magical operations to minimize the amount of data transferred between the client endpoint device and the desktop. Minimizing data equals less chance of interference equals better user experience at the client device. Yes, the one the end user is using.

For this blog post I will stick to the display protocols VMware Horizon has under its hood. VMware Horizon supports four ways of using a display protocol: PCoIP via the Horizon Client, Blast Extreme/BEAT via the Horizon Client, RDP via Horizon Client or MS Terminal Client, and any HTML5 compatible browser for HTML Blast connections.

The performance and experience of all the display protocols are influenced by the client endpoint device, everything in between, the desktop agent (for example the virtual desktop's Horizon Agent) and the road back to the client. USB-redirect a mass storage device to your application: good-bye performance. Network filtering: poof, black screen. Bad WiFi coverage: good-bye session when moving from the office cubicle to the meeting room.

poof-its-gone

RDP

Who? What? Skip this one when you are serious about display protocols. The only reason it is in this list is for troubleshooting when every other method fails. And yes, the Horizon Agent uses RDP as an installation dependency by default.

Blast Extreme

Just Beat it PCoIP. Not the official statement of VMware. VMware ensures it’s customers that Blast Extreme is not a replacement but an additional display protocol. But yeah…..sure…

With Horizon 7.1 VMware introduced BEAT to the Blast Extreme protocol. BEAT stands for Blast Extreme Adaptive Transport, a UDP-based adaptive transport that is part of the Blast Extreme protocol. BEAT is designed to ensure the user experience stays crisp across networks of varying quality. You know them, the ones with low bandwidth, high latency, high packet loss, jitter and so on. Great news for mobile and remote workers. And for spaghetti-incident local networks…

Blast uses standardized encoding schemes, by default H.264 for graphical encoding and Opus as the audio codec. If it can't do H.264 it will fall back to JPG/PNG, so always use H.264 and check for conditions in your environment that might cause a fallback. JPG/PNG is more of a codec for static graphics, or at least nothing larger than an animated GIF. H.264, the other way around, is more of a video codec, but it is also very good at encoding static images and will compress them better than JPG/PNG. Plus around 90% of client devices are already equipped with the capability to decode H.264. Blast Extreme is network friendlier by using TCP by default, which is easier to configure and performs better under congestion and drops. It is also efficient in not using up all the client resources, so that for example mobile device batteries are not drained by the device spending a lot of power feeding those resources.
Default protocol Blast Extreme selected.

PCoIP

PC-over-IP or PCoIP is a display protocol developed by Teradici. PCoIP is available in hardware, like Zero Clients, and in software. VMware and Amazon are licensed to use the PCoIP protocol, in VMware Horizon and Amazon WorkSpaces respectively. For VMware Horizon, PCoIP is an option with the Horizon Client or with PCoIP-optimized Zero Clients.
PCoIP is mainly a UDP-based protocol; it does use TCP, but only in the initial phase (TCP/UDP 4172). PCoIP is host rendered, multi-codec, and can dynamically adapt itself based on the available bandwidth. In low-bandwidth environments it utilizes a lossy compression technique where a highly compressed image is quickly delivered, followed by additional data to refine that image. This process is termed "build to perceptually lossless". The default protocol behaviour is to use lossless compression when minimal network congestion is expected. Lossy compression can also be explicitly disabled where image quality is more important than bandwidth, for example in medical imaging.
Images rendered on the server are captured as pixels, compressed and encoded, and then sent to the client, where decryption and decompression happen. Depending on the display content, different codecs are used to encode the pixels, since techniques that compress video images well can differ in effectiveness from those that work best for text.


HTML

Blast Extreme without the Horizon Client dependency. The client is an HTML5 compatible browser. HTML Access needs to be installed and enabled on the datacenter side.
HTML Access uses the Blast Extreme display protocol with the JPG/PNG codec. HTML Access is not at feature parity with the Horizon Client, which is why I am listing it as a separate display protocol option. As not all features can be used, it is not the best fit for most production use cases, but it is more than sufficient for remote or external use cases.

Protocol Selection

Depending on how the pool is configured in Horizon, the end user either has the option to change the display protocol from the Horizon Client, or the protocol is set on the pool with the setting that a user cannot change the protocol. The latter has to be selected when using GPU, but it depends a bit on the workforce and use case whether you would like to leave all the options available to the user.

horizon-client-protocol

Display Protocol Optimizations

Unlike what some might think, display protocol optimization benefits the user experience in all situations. Either from an end user point of view, or from IT having some control over what can and will be sent over the network; network optimizations in the form of QoS, for example. PCoIP and Blast Extreme can also be optimized via policy. You can add the policy items to your template, use Smart Policies and User Environment Management (highly recommended) to apply them under specific conditions, or use GPOs. IMHO use UEM first, and then template or GPO, in that order.

uem-smart-policy-example

For both protocols you can configure the image quality level and frame rate used during periods of network congestion. This works well for static screen content that does not need to be updated or in situations where only a portion of the display needs to be refreshed.

With regard to the amount of bandwidth a session eats up, you can configure the maximum bandwidth, in kilobits per second. Try to match these settings to the type of network connection, such as a LAN interconnect or an Internet connection, that is available in your environment. For example, a higher FPS means fluent motion, but more network bandwidth used. Lower is less fluent, but at a lower network bandwidth cost. Keep in mind that the network bandwidth includes all the imaging, audio, virtual channel, USB, and PCoIP or Blast control traffic.

You can also configure a lower limit for the bandwidth that is always reserved for the session. With this option set, a user does not have to wait for bandwidth to become available.
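
These settings end up as policy registry values on the agent, whether you deliver them via GPO, Smart Policies or your golden image. As a sketch only: the registry path and value name below are placeholders, not the documented names, so look them up in the policy reference mentioned below before using anything like this.

# Hypothetical example of seeding a session bandwidth policy value in a golden image;
# the key path and value name are placeholders, check the PCoIP/Blast policy documentation for the real ones.
$policyKey = 'HKLM:\SOFTWARE\Policies\ExampleVendor\DisplayProtocol'

if (-not (Test-Path $policyKey)) {
    New-Item -Path $policyKey -Force | Out-Null
}

# Maximum session bandwidth in kilobits per second (placeholder value name)
Set-ItemProperty -Path $policyKey -Name 'MaxSessionBandwidthKbps' -Value 90000 -Type DWord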

For more information, see the “PCoIP General Settings” and the “VMware Blast Policy Settings” sections in Setting Up Desktop and Application Pools in View on documentation center (https://pubs.vmware.com/horizon-7-view/index.jsp#com.vmware.horizon-view.desktops.doc/GUID-34EA8D54-2E41-4B71-8B7D-F7A03613FB5A.html).

If you are changing these values, do it one setting at a time. Check what the result of your change is and whether it fits your end users' needs. Yes, again, use real users. Make a note of the setting and the result, and move on to the next. Some values have to be revisited to find the sweet spot that works best. Most values are applied when disconnecting and reconnecting to the session in which you are changing the values.

Another optimization can be done by optimizing the virtual desktops themselves, so less is transferred, or so resources can be dedicated to encoding and not to, for example, defragmenting non-persistent desktops during working hours. VMware OS Optimization Tool (OSOT) Fling to the rescue, get it here.

Monitoring of the display protocols is essential. Use vROPS for Horizon to get insights into your display protocol performance. Blast Extreme and PCoIP are included in vROPS. The only downside is that these session details are only available while the session is active. There is no history or trending for session information.

Graphic Acceleration

There are other options to help the display protocols on the server side, by offloading some of the graphics rendering and encoding to specialized components. Software acceleration uses a lot of vCPU resources and just doesn't cut it for playing 1080p full screen video. Not even 720p full screen, for that matter. A higher processor clock speed will help graphical applications a lot, but at the cost that those processor types have lower core counts. A lower core count, combined with a low overcommitment and physical-to-virtual ratio, lowers the number of desktops on your desktop hosts. Specialized engineering, medical or map layering software requires graphics capabilities that are not offered by software acceleration, or requires hardware acceleration as a de facto standard. Here we need offloading to specialized hardware for VDI and/or published applications and desktops. NVIDIA, for example.

gpu-oprah-meme

What will those applications be using? How many frame buffers? Will the engineers use these applications most of the time, or just for a few moments before going back to the office suite to write their reports? For this NVIDIA supports all kinds of GPU profiles. Need more screens and frame buffers? Choose a profile for that use case. A board can support multiple profiles if it has multiple GPU cores, but per core only one type of profile can be used, multiple times, as long as you are not out of memory (frame buffers) yet. How to find the right profile for your workforce? Assessment and PoC testing. GPU monitoring can be a little hard, as not all monitoring applications have the metrics on board.
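
For a first look at what the GPUs are doing, NVIDIA's own command line tool on the host can already tell you a lot. A minimal sketch, run from the host's shell where the NVIDIA driver/vGPU manager is installed (the vgpu subcommand assumes an NVIDIA vGPU deployment):

# Overall GPU utilization, memory use and running processes on the host
nvidia-smi

# Refresh the overview every 5 seconds while a test user runs the target application
nvidia-smi -l 5

# Per-vGPU view (only on hosts with the NVIDIA vGPU manager installed)
nvidia-smi vgpu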

And don't forget that some applications need to be explicitly set to use hardware acceleration before they will use the GPU, and that other applications don't support it, or run worse with hardware acceleration, because their main resource demand is CPU (Apex, maybe).

Engineers only? What about Office Workers?

Windows 10, Office 2016, browsers, and streaming video are used all over the offices. These applications can benefit from graphics acceleration. The number of applications that support and use hardware graphics acceleration has doubled over the past years. That's why you see the hardware vendors changing their focus as well. NVIDIA's M10 is targeted at consolidation while its brother the M60 is targeted at performance, while still reaching higher consolidation ratios than the older K generation. But they cost a little bit more.

Think vGPU with one of the 0B/1B profiles, and there is a vGPU for everyone. The Q profiles can be saved for engineering. Set the profiles on the VMs and enable their use on the desktop pools.

And what can possibly go wrong?

Fast Provisioning – vGPU for instant clones

Yeah. Smashing graphics and deploying those desktops like crazy… me likes! The first iteration of instant clones did not support any GPU hardware acceleration. With the latest Horizon release instant clones can be used with vGPU. Awesomesauce.

– Enjoy looking at the stars!

Sources: vmware.com, wikipedia.org, teradici.com, nvidia.com

Digital Workspace Transformation: information security

Yes…. it has been a while since I posted on this blog, but I’m still alive ;-)

For a 2016 starter (what?!? is it June already), I want to ramble on about information security in the digital workspace. With a growing number of digital workspace transformations going on, information security is more important than ever. With the growing variety of client endpoints and methods of access in personal and corporate environments, users are becoming increasingly independent from physical company locations. That makes it interesting to centrally manage storage of data, passwords, access policies, application settings and network access (just examples, not the complete list). For any place, any device, any information and any application environments for your users (or do we want any user in there), it is not just a couple of clicks of this super-duper secure solution and we're done.

encrypt-image300x225
(image source blogs.vmware.com)

Storing data on, for example, virtual desktop servers (hello VMware Horizon!) in the data center is (hopefully) a bit more secure than storing it locally on the user's endpoint. At the same time, allowing users to access virtual desktops remotely puts your network at a higher risk than local-only access. But it's not all virtual desktops. We have mobile users who would like to have their presentations or applications directly on the tablet or handheld. I, for instance, don't want to have to open a whole virtual desktop for just one application. Ever tried a virtual desktop on an iPhone? It is technically possible, yes, but it works crappy. Erm, forgot my MacBook HDMI USB-C converter for this presentation; well, I'll send it to your Gmail or Dropbox for access with the native mobile apps in your conference room. And the information is gone from the company sphere… (a hypothetical situation of course..)

Data Leak

Great ideas, all those ways to get company information in and out. But but but… these also pose challenges that a lot of companies have not started thinking about. Sounds a bit foolish, as information is probably the biggest asset of a company. But unfortunately it's a fact (or maybe it's just the companies I visit). Sure, these companies have IT departments or IT vendors who think a bit about security. And in effect they mostly make their users' lives miserable with all sorts of technical barriers installed in the infrastructure. In turn the users, business and IT (!) users alike, will find all sorts of ways to get past these barriers. Why? First of all to increase their productivity, while effectively decreasing security, and secondly because they are not informed about the important why. And then those barriers are just a nuisance.

Break down the wall

IT’s Business

I have covered this earlier in my post (https://pascalswereld.nl/2015/03/31/design-for-failure-but-what-about-the-failure-in-designs-in-the-big-bad-world). The business needs to have full knowledge of its required processes and information flows, which support or process the information going in and out of the services supporting the business strategy. And of the persons that are part of the business and operate the services. And of what may be done with this information in which ways: is it allowed for certain users to access the information outside of the data center, and so on. Compliance with, for example, certain local privacy laws. Governance with policies and choices, and risk management: do we do this part or not, how do we mitigate some risk if we take approach Y, and what are the consequences if we do (or don't)?

Commitment from the business and people in the business is of utmost importance for information security. Start explaining, start educating and start listening.
If scratch is the starting point, start the writing first on a global level. What does the business mean by working from everywhere everyplace, what is this digital workspace and such.  What are the risks, how do we approach IAM, what do we have for data loss protection (DLP), is it allowed for IT to inspect SSL traffic (decrypt, inspect and encrypt) etc. etc.
Not to detailed at first it is not necessary, as it can take a long time to have a version 1.0. We can work on it. And to be fair information security and digital workspace for a fact, is continue evolving and moving. A continual improvement of these processes must be in place. Be sure to check with legal if there are no loops in what has been written in the first iteration.
Then map to logical components (think from the information, why is it there, where does it come from and where does it go, and think for the apps, the users) and then when you have defined the logical components. IT can then add the physical components (insert the providers, vendors, building blocks). Evaluate together, what works, what doesn’t, what’s needed and what is not. And rave and repeat…..

Furthermore, a target for a 100% safe environment all the time will just not cut it. Mission Impossible. Think about and define how to react to information leaks and minimize the surface of a compromise.

Design Considerations

With the above we should have a good starting point for the business requirement phase of a design and deploy of the digital workspace. And there will also be information from IT flowing back to the business for continual improvement.

Within the design of an EUC environment we have several software components where we can take actions to increase (or decrease, but I will leave that part out ;-)) security in the layers of the digital workspace environment. And yes, when software defined is not an option, there is always hardware…
And from the previous phase we have some idea which technical choices can be made to conform to the business strategy and policies.

If we think of the VMware portfolio and the technical software layers where we need to think about security, we can go from AirWatch/Workspace ONE, Access Point, Identity Manager, Security Server, Horizon and App Volumes to User Environment Management. And then there is two-factor authentication, one-time passwords (OTP), Microsoft Security Compliance Manager (SCM) for the Windows-based components, anti-virus and anti-malware, and network segmentation and access policies with SDDC NSX for Horizon. And what about business continuity and disaster recovery plans, and SRM and vDP.
Enterprise management with vROPS and Log Insight, with integration to, for example, a SIEM. vRealize for automating and orchestrating, to mitigate workarounds or faults in manual steps. And so on and so on. We have all sorts of layers where we can implement, or help implement, security and access policies. And how will all of these interact? A lot to think about. (It could be that a new blog post series subject is born…)

But the justification should start with the business… Start explaining and start acting! This is probably 80% of the success rate of implementing information security. The technical components can be made to fit, but… only after the strategy, policies and information architecture are somewhat clear…

And the people in the business are supporting the need for information security in the workspace. (Am I repeating myself a bit ;-)

Ideas, suggestions, conversation, opinions. Love to hear them.

Design for failure – but what about the failure in designs in the big bad world?

This post is a random thought post, not quite technical but in my opinion very important. The idea formed after some subjects and discussions at last week’s NL VMUG. This blog post’s main goal is to create a discussion, so why don’t you post a comment with your opinion … Here it goes…

Murphy, hardware failures and engineers tripping over cables in the data center: us tech gals and guys all know them and have probably experienced them. Disaster happens every day. But what about a state-of-the-art application that ticks all the boxes for functional and technical requirements, while users are not able to use it because of their lack of knowledge in this field, or because they are clueless why the business has created this thingy (why and how this application or data is supposed to help the information flow of the business processes)? Failure is a constant and needs to be handled accordingly, and from all angles.

Techies are used to looking at the environment from the bottom up. We design complete infrastructures with failure in mind and have the technology and knowledge to perfectly execute disaster avoidance or disaster recovery (forget the theoretical RTO/RPO of 0 here). We can do this at a lower cost (CAPEX) than ever before, and there are more benefits (OPEX and minimized downtime for business processes) than before. But subsequently, we should ask ourselves this: what about failing applications, or data which is generated but never reaches the required business processes (the people that are operating or using these processes)?
Designs need to tackle this problem, using a design based on the complete business view and connecting strategy, technical possibilities and users!

And how will we do this then?

Well, first of all, the business needs to have full knowledge of its required processes and information flows, which support or process the data going in and out of the services supporting the business strategy. Very important. And to be honest, only a few companies have figured out this part. Most experience difficulties. And they give up. Commitment from the business and the people in the business is of utmost importance. Be a strategic partner (to the management). Start with asking why certain choices are made, and explain the why a little more often than just the how, what and when!

Describe why and how information and data is collected, organized and distributed (in a fail-safe and secure method) and which information systems are used. Describe the applications (and their ROI, services, processes and buses), how the information is presented and how it flows back into the business (via the people or automated systems). How does your solution let the business grow and flourish? Keep clear of too much technical detail: present your story in a way the manager understands the added value, and knows which team members (future users) to delegate to project meetings.

Next up IT, or ICT as we call it here in the Netherlands: Information and Communication Technology. I really like the Communication part for this post; businesses must do that a little more often. Start looking at the business from different points of view, and make sure you understand the functional parts and what is required to operate them. To prevent people from working on their own without a common goal or reason, internal communication is essential. Know the ins and outs, describe why and how the desired result is achieved. Connect the different business layers. For this, a great part of business IT departments needs to refocus its 1984 vision to the now and the future. IT is not about infrastructure alone; it is a working part within the business, a facilitator, a placeholder (for lack of other words in my current vocabulary). IT needs to be about aligning the business services with applications and data, the tools and services that support and provide for the business. That is why IT is there in the first place; it is not the business that is (connected or not) there for IT. IT's business. Start listening, start writing at a global level first (what does the business mean by working from everywhere and every place), then map possibilities to logical components (think from the information: why is it there, where does it come from and where does it go; and think of the apps and the users), and then, when you have defined the logical components, you can add the physical components (insert the providers, vendors, hardware building blocks).

Sounds familiar? There are frameworks out there to use. Use your Google-Fu: Enterprise Architecture. Is this for enterprise-size organizations only? No, any size of company must know the why and why and why. And do something about it. A simplified version will work for SMB-size companies. Below is an example of a simplified model and the layers of attention this architectural framework brings to your organization.

Design for Failure

And…in addition to this, start using the following as a basis to include in your designs:

The best way to avoid failure is to fail constantly

Not my own, but from Netflix. This could not be closer to the truth. No testing or disaster recovery plan testing in iterations of half a year or a year: do it constantly and see if your environment and business are up to the task of keeping applications running when something goes down. Sure, there will be impact, for example services not running at 100% warp speed, but your users still being able to do things with the services is better than nothing at all. And knowing that your service keeps operating during a failure is the important part here. Now you can do something about not reaching full speed, for example scale out, so you can allow a service failure without a degraded service speed. Or know which of your services can actually go down without influencing business services for a certain time frame. This is valuable feedback that needs to go back to the business. Is going down acceptable for the business, or should we try and handle this part so it does not go down at all? Just don't use this at the infrastructure level only; include the data, application and information layers as well.
Big words here: trust and commitment. Trust the environment in place and test whether it succeeds in providing the services needed even when hell freezes over (or when some other unexpected thing happens). Trust that your environment can handle failure. Trust that the people can do something with, or about, the failures.
Commitment of the organization not to abandon the effort when reaching a brick wall over and over, but to keep going until you are all satisfied. And trust that your people can fail too. Let them be familiar with the procedures and let a broader range of people handle the procedures (not just the current user names mapped to the processes, but multiple people who can operate and analyze the information, within roles defined and mapped to services). Just like technical testing, your people are not operating 24x7x365; they like to go on leave and sometimes they tend to get ill.

Back to Netflix. For generating failure Netflix uses Chaos Monkey. With that name another Monkey comes to mind, Monkey Lives: http://www.folklore.org/StoryView.py?project=Macintosh&story=Monkey_Lives.txt. Not sure where the idea came from, but such a service and name cannot be a coincidence (if you believe coincidence exists in the first place). But that is not what this paragraph is about.
The Chaos Monkey's job is to automatically and randomly kill instances and services within the Netflix infrastructure architecture. When working with Chaos Monkey you will quickly learn that everything happens for a reason. And you will have to do something about it. Pretty awesome. And the engineers even shared Chaos Monkey on GitHub: https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey. It must not stop at the battle plan of randomly killing services; fill up the environment with random events where services get into some not-okay state (unlike a dead service) and see how the environment reacts to this.


Exchange DAG Rebuilding steps and a little DAG architecture

At a customer's site I was called in to do an Exchange health check and some troubleshooting. As I have not previously added Exchange content to this blog, I thought I'd capture the experience and write a new blog post in one go.

Situation
The environment is a two-site data center where site A is active/primary and site B is passive/secondary. A two-node Exchange setup is deployed on Hyper-V. The node in site A is CAS/HT/MBX and the node in site B is CAS/HT/MBX. The mailbox role is DAG'ed, where site A is active and the database copies are on site B. There is no CAS array (Microsoft best practice is to set one even if you have just one CAS, but this wasn't the case here). This is not ideal, as a fail-over of the CAS doesn't allow clients to automatically connect to another CAS; Exchange uses the CAS array (with a load balancer) for this. The CAS fail-over is manual (as is the HT). But when documented well, and when a small amount of downtime is acceptable for the organisation, this is no big issue.
Site A has a Hyper-V cluster where Exchange node A is a guest hosted on this cluster. Site B has an unclustered Hyper-V host where Exchange node B is a guest. Exchange node A is marked highly available. This again is not ideal; yes, maybe for the CAS/HT roles it can be used (these should then be separated from the mailbox role), but the mailbox role is already clustered at the application layer (the DAG), so preferably turn it off. Anyhow, these are some of the pointers I could discuss with the organisation. But there is a problem at hand that needs to be solved.

The Issue at hand (and some Exchange architecture)
The issue is that the secondary node is in a failed state and currently not seeding its database copies. Furthermore, the host is complaining about the witness share. You can check the DAG health with the PowerShell cmdlets Get-MailboxDatabaseCopyStatus and Test-ReplicationHealth.
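
For example, a quick health check from the Exchange Management Shell could look like this (NODEB is a placeholder for the passive DAG member):

# Copy and replay status of all database copies on the passive node
Get-MailboxDatabaseCopyStatus -Server NODEB | Format-Table Name, Status, CopyQueueLength, ReplayQueueLength

# Run all replication health checks against that node
Test-ReplicationHealth -Identity NODEB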

You can check the DAG settings and members with Get-DatabaseAvailabilityGroup -Identity DAGNAME -Status | fl. Here you can see the configured file share witness server.

RunspaceId : 0ffe8535-f78a-4cc1-85fd-ae27934a98e0
Name : DAGNAME
Servers : {Node A, Node B}
WitnessServer : Servername
WitnessDirectory : Directory name on WitnessServer
AlternateWitnessServer : Second Servername
AlternateWitnessDirectory : Directory name on Second Witness Server

(The AlternateWitnessServer is only used with Datacenter Activation Coordination (DAC) mode set to DagOnly; here DAC is off and therefore it is not used and not needed.)

Okay, witness share; some Exchange DAG architecture first. An Exchange DAG is an Exchange database mirroring service built on the failover clustering service (Microsoft calls it a hidden cluster service). You can mirror the databases in an active/passive setup (one node is active, the other only hosts replicas), or in an active/active setup (both nodes have active and passive databases). In both setups you get high availability and room for maintenance (in theory, that is). The mirroring is done by replicating the databases as database copies between members of the DAG. The DAG uses failover clustering services, where the DAG members participate as cluster nodes. A cluster uses a quorum to tell the cluster which server(s) should be active at any given time (a majority of votes). In case of a failure in the heartbeat networking there is a possibility of split brain: both nodes are active and try to bring up the cluster resources, as they are designed to do. Both nodes could then serve active databases, with the possibility of data mismatch, corruption or other failures. In this case a quorum is used to determine which node has the majority of votes to be active. A shared disk is often used for the cluster quorum. Another option is to use a file share on a server outside the cluster, the so-called file share witness, or file witness share in Exchange terms.

image

The above model shows the CAS and DAG HA components. Exchange architecture best practice is to place the file witness share on the HT role, but in the case of mixed roles you should select a server outside the DAG, and in this case outside the Exchange organisation. Any file server can be used, preferably a server in the same datacenter as the primary site serving the users (important).

So back to the issue: file witness share (FWS) access. I checked whether I could reach the file share (\\servername\DAGFQDN) from the server and checked the permissions (Exchange Trusted Subsystem and the DAG$ computer object should have full control). The Exchange Trusted Subsystem must be a member of the local Administrators group. The FWS is placed on a domain controller in this organization. Not ideal again (Exchange now needs domain-level Administrators group membership, as domain controllers don't have local groups), but working.

I checked the failover cluster service, and there the node is in a down state, including its networks. But in the Windows guest the networks are up and traffic is flowing to and from both nodes and the FWS. No firewall on or between the nodes, no NATting. Okay… Some other items (well, a lot) were checked, as several actions had been done in the environment recently. Also checked the Hyper-V settings and networking. Nothing blocking found (again some pointers for future actions).
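
To see what the cluster itself thinks of the nodes and networks, the Failover Clustering cmdlets (available on 2008 R2 and later where the feature is installed) are handy. A quick sketch:

# State of the DAG members as seen by the underlying cluster
Get-ClusterNode

# State of the cluster networks used for heartbeat and replication
Get-ClusterNetwork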

Well, let's try to remove the failed node from the DAG and add it back. This should have no impact on the organization, as the node is already in a failed state.

Removing a node from the DAG.

Steps to follow:
1. Depending on the state, suspend database seeding. When the copy is failed, suspend it via Suspend-MailboxDatabaseCopy -Identity <mailbox database>\<nodename>. When the status is failed and suspended this is not needed.
2. Remove the database copies of the mailbox databases on the failed node with Remove-MailboxDatabaseCopy -Identity <mailbox database>\<nodename>. Repeat for the other copies where needed.
3. Remove the server from the DAG: Remove-DatabaseAvailabilityGroupServer -Identity <DAGName> -MailboxServer <ServerName> -ConfigurationOnly.
4. Evict the node from the cluster.
As the cluster now has only one node, the quorum is moved to node majority automatically and the FWS object is removed from the cluster configuration. A consolidated example of these steps follows below.
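The removal steps as one sketch (DB01 and NODEB are placeholder names; repeat the database copy cmdlets for every copy on the failed node, and note that Remove-ClusterNode is just one way to do the evict):

# suspend the failed copy so no seeding attempts are running
Suspend-MailboxDatabaseCopy -Identity "DB01\NODEB" -Confirm:$false
# remove the database copy from the failed node
Remove-MailboxDatabaseCopy -Identity "DB01\NODEB" -Confirm:$false
# remove the server from the DAG, configuration only as the node is in a failed state
Remove-DatabaseAvailabilityGroupServer -Identity "DAGNAME" -MailboxServer "NODEB" -ConfigurationOnly
# evict the node from the underlying cluster (run on the surviving node)
Import-Module FailoverClusters
Remove-ClusterNode -Name "NODEB" -Cluster "DAGNAME" -Force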

Rebuilding the DAG by adding the removed node back

Steps to follow:

1. Add the server to the DAG. This will add the node back to the cluster: Add-DatabaseAvailabilityGroupServer -Identity <DAGName> -MailboxServer <ServerName>. Success, the node is healthy.
2. Add the database copies with activation preference 2 (the other node is still active): Add-MailboxDatabaseCopy -Identity <Mailbox Database> -MailboxServer <ServerName> -ActivationPreference 2.
3. In my case the time between the failed state and returning to the DAG was a bit long. The database copy came up, but returned to a failed state. We have to suspend it and seed manually: Suspend-MailboxDatabaseCopy -Identity <mailbox database>\<nodename>.
4. Update-MailboxDatabaseCopy -Identity "<Mailbox Database>\<Mailbox Server>" -DeleteExistingFiles. Wait for the bytes to be transferred across the line. When finished, the suspended state is lifted automatically.
Repeat for the other databases.
5. You would now expect to see a good state of the DAG and databases in the Exchange Management Console. Not yet: the file witness share is not back yet.
6. Add the witness share from the Exchange Management Shell: Set-DatabaseAvailabilityGroup -Identity <DAG Name> -WitnessServer <Servername> -WitnessDirectory <Server Directory>. When the DAG has at least two members, the FWS is recreated. This is also visible in Failover Cluster Manager. Again, a consolidated sketch follows below.
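The rebuild as one sketch (the database, node, witness server and directory names are placeholders; repeat the copy cmdlets per database):

# add the node back to the DAG, this also joins it to the cluster again
Add-DatabaseAvailabilityGroupServer -Identity "DAGNAME" -MailboxServer "NODEB"
# add the database copy with activation preference 2
Add-MailboxDatabaseCopy -Identity "DB01" -MailboxServer "NODEB" -ActivationPreference 2
# if the copy returns to a failed state, suspend it and reseed from scratch
Suspend-MailboxDatabaseCopy -Identity "DB01\NODEB" -Confirm:$false
Update-MailboxDatabaseCopy -Identity "DB01\NODEB" -DeleteExistingFiles
# put the file witness share configuration back
Set-DatabaseAvailabilityGroup -Identity "DAGNAME" -WitnessServer "WITNESSSRV" -WitnessDirectory "C:\DAGNAME"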

Root Cause Analysis

Okay this didn’t went so smooth as described above. When trying to add the cluster node back to the cluster this fails with the FWS error again. In cluster node command output it is noticed that on Node A Node B is down. And on Node B Node A is down and Node B is Joining. Hey wait there is a split and the Joining indicates that Node B is trying to bring up it’s own cluster. Good that it is failing. When removing the node from the DAG Kaspersky Virus protection is loosing connection as this is configured to the DAG databases. At the same time Node A has the same errors and something new, RPC Server errors with Kaspersky. Ahhhh Node A networking services not correctly working is the culprit here. The organisations admins could not tell if networking updates and changes had a maintenance restart/reboot. So there probably something is still in memory. So inform the users, check the backup and reboot the active node. The node came up good and low and behold node B could be added to the fail over cluster. At this time I could rebuild the DAG. Health checks are okay, and documented.

– Hope this helps when you have similar actions to perform.

 

Dissecting vSphere – Data protection

An important part of business continuity and disaster recovery plans is the way you protect your organisation's data. One way to do this is to have a backup and recovery solution in place. This solution should be able to get your organization back into production within the set RPOs/RTOs. The solution also needs to let you test your backups, preferably in a sandboxed test environment. I have seen situations at organisations where the backup software was reporting green lights on the backup operation, but when a crisis came up they couldn't get the data out, and so the recovery failed. Panicking people all over the place….

A backup and recovery solution can be (a mix of) commercial products that protect the virtual environment, like Veeam, in-guest agents like Veritas or DPM, or features of the OS (returning to a previous version with snapshots). Other options include solutions on the storage infrastructure. But what if you are budget constrained….

Well, VMware has vSphere Data Protection (VDP), which is included from the Essentials Enterprise Plus kit. This is the standard edition. The vSphere Data Protection Advanced edition is available from the Enterprise license.
So there are two flavours; what does standard offer, and what does it lack compared to advanced?
First the what: as previously stated, VDP is the backup and recovery solution from VMware. It is an appliance that is fully integrated with vCenter and is easy to deploy. It performs full virtual machine and file-level restore (FLR) without installing an agent in every virtual machine. It uses data deduplication for all backup jobs, reducing disk space consumption.

image

VDP Standard is capped at a 2 TB backup data store, whereas VDP Advanced allows dynamic capacity growth to 4 TB, 6 TB or 8 TB backup stores. VDP Advanced also provides agents for specific applications: agents for SQL Server and Exchange can be installed in the VM guest OS. These agents allow selecting individual databases or stores for backup or restore actions, application quiescing, and advanced options like truncating transaction logs.

image

At VMworld 2013 further capabilities of VDP 5.5 were introduced:

– Replication of backup data to EMC.
– Direct-to-Host Emergency Restore (without the need for vCenter, so perfect for backing up your vCenter).
– Backup and restore of individual VMDK files.
– Specific schedules for multiple jobs.
– VDP storage management improvements, such as selecting separate backup data stores.

Sizing and configuration

The appliance is configured with 4 vCPUs and 4 GB RAM. The available backup store capacities of 500 GB, 1 TB and 2 TB consume respectively 850 GB, 1.3 TB and 3.1 TB of actual storage. There is a 100 VM limit per appliance, so beyond that you would need another VDP appliance (with a maximum of 10 VDP appliances per vCenter).

After deployment the appliance needs to be configured via the VDP web service; the first time it starts in installation mode. Items such as IP, hostname, DNS (if you haven't added these with the OVF deployment), time and vCenter need to be configured. After completion (and successful testing) the appliance needs to be rebooted. A heads-up: the initial configuration reboot can take up to 30 minutes to complete, so have your coffee machine nearby.

After this you can use the Web Client connected to your VDP-connected vCenter to create jobs. Let the created jobs run under supervision the first time; the first backup of a virtual machine takes time, as all of the data for that virtual machine is being backed up. Subsequent backups of the same virtual machine take less time, because changed block tracking (CBT) and deduplication are used.
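If you are curious whether CBT has actually been switched on for your VMs, you can read the setting with PowerCLI (a sketch; it only reads the configuration and assumes you are already connected to vCenter with Connect-VIServer):

# list all VMs with their changed block tracking state
Get-VM | Select-Object Name, @{Name="CBTEnabled";Expression={$_.ExtensionData.Config.ChangeTrackingEnabled}}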

Performance

Well, this depends on the kind of storage you are going to use as the backup data store. If you go for low-cost storage (let's say most SMBs would want that), you pay for it in performance (or the lack of it, most of the time).

Storage Offsite

Most organizations want their backup data stored offsite in some way. VDP does not offer replication (or, with VDP 5.5, only to EMC), so you want to have some offsite replication or synchronization in place (and a plan for how you will restore from that data if your VDP appliance is lost as well). vSphere Replication only protects VMs, not your backup data store. Most SMBs don't have many replication-capable storage devices in place, and when they do, they are using them for production and not as a backup data store. Keep this in mind when researching this product for your environment.

– Enjoy protecting your data!

Learned Lessons – Nexus 1000V and the manual VEM installation

At an implementation project we deployed 24 ESXi hosts and used the Nexus 1000V to have a consistent network configuration, feature set and provisioning throughout the physical and the virtual infrastructure. When I tried to add the last host to the Nexus it failed on me with the InstallerApp (the Cisco-provided Java installer that installs the VEM and adds the ESXi host to the configured DVS and port groups). Another option is to use Update Manager (the Nexus is an appliance that can be updated by Update Manager), but that one threw an error code at me. I will describe the symptoms a little later; first some quick Nexus product architecture, so you have a bit of understanding of how the components work, where they live and how they interact.

Nexus 1000V Switch Product Architecture

Cisco Nexus 1000V Series Switches have two major components: the Virtual Ethernet Module (VEM), which runs inside the hypervisor (in other words, on the ESXi host), and the external Virtual Supervisor Module (VSM), which manages the VEMs (see the figure below). The VSM can be either a virtual appliance or a module in a hardware device (for example in a physical Nexus switch).
The Nexus 1000V replaces the VMware virtual switches and adds a Nexus distributed switch (which requires Enterprise Plus). Uplink and port group definitions are bound to the Cisco port-profile configurations. The VEM and VSM use a control and data link to exchange configuration items.

Configuration is performed on the VSM and is automatically propagated to the VEMs. Virtualization admins can then consume these configurations by selecting the corresponding port groups at VM provisioning.
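From the vSphere side the propagated port profiles simply show up as distributed port groups. A quick PowerCLI sketch to list them (the switch name N1KV-DVS is a placeholder):

# list the port groups the VSM has pushed to the Nexus DVS
Get-VDPortgroup -VDSwitch "N1KV-DVS" | Select-Object Name, VlanConfiguration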

image

Symptoms of a failing installation

The problem occurred as follows:

– Tried to install the VEM with the InstallerApp. The installer app finds the host and, when the deployment is done, it stops when adding the host to the existing Nexus DVS. This happens somewhere around moving the existing vSwitches to the DVS. The error presented is: got (vim.fault.PlatformConfigFault) exception.

– Checked the status of the host in Update Manager, which showed a green, compliant Cisco Nexus appliance. This probably delayed me a bit, because it really wasn't compliant.

– Tried to manually add the host to the Nexus DVS in the vSphere Web Client. This gave an error in the task. Further investigation led me to this line in vmkernel.log: invalid Net_Create: class cisco_nexus_1000v not supported. Say what?

– With the Cisco support site and a network engineer I tried some VEM CLI on the host. But hey, wait: vem isn't even there. An esxcli software vib list | grep cisco doesn't show anything either (duh), while on an ESXi host with the VEM installed this shows the installed VEM software version. So Update Manager is messing with me.

Manual Installation

That leaves me with manually installing the VEM. With the Cisco Nexus 1000V installation guides, the working solution is as follows:

  • The preferred Update Manager route does not work; it fails with an error 99.
  • Copy the vib file containing the VEM software from the VSM homepage using the following URL: http://Your_VSM_IP_Address/. Check an ESXi host that already has the VEM installed for the running version (esxcli software vib list | grep cisco) and download the matching file (save as).
  • Upload this file to a location the host can access: on the host itself or on a datastore accessible from the host. I did the latter, as the host did not have direct storage, and used WinSCP to transfer the file to a datastore directory called ManualCiscoNexus.
  • On the host I added this vib by issuing the following command:

esxcli software vib install -v /vmfs/volumes/<datastore>/ManualCiscoNexus/Cisco_bootbank_cisco-vem-v160-esx_4.2.1.2.2.1.0-3.1.1.vib

  • vem status -v now gives output. Look for "VEM Agent is running" in the output of the vem status command.
  • vemcmd show port vlans only shows the standard switches; communication with the VSM is not there yet.
  • I added the host manually to the Nexus DVS and, success. When migrating the standard vmkernel management port to the DVS port groups, the host also becomes visible on the VSM. Communication is flowing and the host is part of the Nexus 1000V.

I hope this post helps when you experience the same problem, and that it also teaches you a little about the Nexus 1000V product architecture.