CIO Turns to Data Analytics to Hunt Down VDI Ghost
- 30 May, 2014 05:53
A few years ago, Seattle Children's Hospital embraced virtual desktop infrastructure (VDI) in a big way. Not long after, an elusive "ghost in the machine" began causing major headaches for the IT organization, says CIO Wes Wright.
"We started experiencing poor performance between 8 and 10 in the morning," Wright says, noting that it was never at exactly the same time and didn't necessarily happen every day. "I set up teams several different times to try to figure out what was going on, but we couldn't find it."
The 107-year-old institution serves as the pediatric referral center for Washington, Alaska, Montana and Idaho. Its 40-member IT organization supports more than 100 applications for more than 8,500 users across 25 different physical locations, including the nine research centers that make up the Seattle Children's Hospital Research Institute.
Many of the institution's workers, particularly nurses and clinicians, are mobile; they move from station to station throughout the day. Before adopting VDI, that meant logging onto a device at every new location.
"Before VDI, it took about two-and-a-half minutes to log into a machine that was up and spinning," Wright says.
VDI Helps Hospital Cut Login Times for Nurses, Clinicians
For mostly stationary workers, two-and-a-half minutes might not be too terrible. But for the hospital's mobile workers, it was turning into a considerable expense. A single nurse might log in more than 40 different times during a 12-hour shift, Wright says. That's more than 1.5 hours per shift spent logging in. Multiply by several thousand nurses and it's not hard to understand why the hospital needed to make a change.
The answer Wright settled on was Citrix XenDesktop. The IT organization started with about 250 users at one of its remote locations, but rapidly rolled it out to the main campus due to user demand.
"The loudest demand was the emergency department," Wright says. "I was a little hesitant for our main ramp into the main campus to be the emergency department, where workflow and timing is critical, but we did it."
The results were impressive: whenever mobile workers arrived at a new station, they would log into their XenDesktop instance from a device at the station. Login times decreased from roughly 2.5 minutes to 12 seconds. In short order, the IT organization was delivering nearly 3,000 Windows 7 desktops through the Citrix environment.
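The login figures Wright quotes can be checked with a few lines of arithmetic. The sketch below is illustrative only: it treats 40 logins per shift as a floor (the article says "more than 40"), and the function and constant names are invented for this example.

```python
# Back-of-the-envelope check of the login overhead quoted in the article.
# Assumptions: 40 logins per 12-hour shift (the article says "more than 40"),
# 2.5 minutes (150 seconds) per login before VDI, 12 seconds after.

def login_minutes_per_shift(logins_per_shift: int, seconds_per_login: float) -> float:
    """Total minutes one worker spends logging in during a shift."""
    return logins_per_shift * seconds_per_login / 60


LOGINS_PER_SHIFT = 40  # lower bound taken from the article

before = login_minutes_per_shift(LOGINS_PER_SHIFT, 150)  # pre-VDI login time
after = login_minutes_per_shift(LOGINS_PER_SHIFT, 12)    # XenDesktop login time

print(f"Before VDI: {before:.0f} minutes per shift")
print(f"After VDI:  {after:.0f} minutes per shift")
print(f"Saved:      {before - after:.0f} minutes per shift")
```

At 40 logins, that is 100 minutes per shift before VDI against 8 minutes after, which squares with the "more than 1.5 hours" figure cited earlier.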
Then the ghost in the machine started appearing. On seemingly random mornings, logins would suddenly go from a few seconds to as long as 15 minutes.
Pinpointing End-User Performance Issues Wasn't Working
"We've got an environment with just about every technology you could think of," says Tim Holt, director of Enterprise Applications. "And consequently, it's very, very difficult to troubleshoot performance from an end-user perspective."
Wright and his team were hitting their heads against the wall trying to discover the reason for the performance hit.
"We always found ourselves trying to prove that a problem wasn't coming from a particular technology silo," Holt says. "We'd start with the network team, who would burn a whole bunch of time proving the network was operating as expected, and then you'd move on to the next level of the stack. 'Well, it's not here, must be somewhere else.'"
Agents and network sniffer technologies were not an option, Wright says (though he notes they wouldn't have worked either).
"I really wouldn't want to put an agent on a virtual desktop," he says. "Any application slows down performance (I don't even run antivirus -- these are nonpersistent images). It slows things down and it gives me jagged performance. When you start putting applications in your virtual desktop, then you don't know the performance characteristics of every virtual desktop. The agent on desktop A may be doing something that the agent on desktop B is doing differently. Then I lose my standardization."
Wire Data Analytics Provides Cross-Tier Visibility
Then one of Wright's senior engineers had a suggestion: bring in ExtraHop Networks, a Seattle firm that specializes in real-time wire data analytics. The ExtraHop Operational Intelligence platform analyzes all L2 to L7 communications, including full bidirectional transactional payloads.
ExtraHop is able to perform wire data analytics at line rate -- up to a sustained 20Gbps. When it receives the wire data traffic, it recreates the TCP state machines for every endpoint and reconstructs sessions, flows and transactions. If the traffic is encrypted, it performs bulk decryption at line rate so that it can reassemble the full streams.
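To give a rough sense of what reconstructing a flow from wire data involves, here is a deliberately simplified sketch. It is not ExtraHop's implementation, which the article does not describe in detail. The Segment type and reassemble function are hypothetical names, and the sketch ignores handshakes, gaps, overlapping segments, sequence-number wraparound and decryption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A captured TCP segment, reduced to the fields this sketch needs."""
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    payload: bytes


def reassemble(segments):
    """Group segments by one-directional flow and join payloads in sequence order.

    Retransmissions (a repeated sequence number) are dropped by keeping
    only the first copy seen for each sequence number.
    """
    flows = defaultdict(dict)
    for s in segments:
        key = (s.src, s.sport, s.dst, s.dport)
        flows[key].setdefault(s.seq, s.payload)
    return {
        key: b"".join(payload for _, payload in sorted(by_seq.items()))
        for key, by_seq in flows.items()
    }


# Usage: segments arrive out of order, with one retransmission.
capture = [
    Segment("10.0.0.5", 50000, "10.0.0.9", 80, 1005, b"login"),
    Segment("10.0.0.5", 50000, "10.0.0.9", 80, 1000, b"GET /"),
    Segment("10.0.0.5", 50000, "10.0.0.9", 80, 1000, b"GET /"),
]
streams = reassemble(capture)
```

A real analyzer tracks the full TCP state machine for both endpoints; the sketch only shows why ordering by sequence number is what turns individual packets back into a readable transaction.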
From there it analyzes the payload and content from L2 to L7, extracting application-level metrics and infrastructure, network and transaction metrics for all tiers. It discovers and classifies devices based on ongoing heuristic analysis of MAC addresses, IP addresses, naming protocols, transaction types and other elements. The metrics are then written to a purpose-built streaming datastore that powers trend-based alerts.
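A trend-based alert of the kind mentioned above can be approximated with a rolling baseline. The sketch below is an assumption-laden illustration, not the vendor's algorithm: the TrendAlert class, its window size, its threshold and the sample login times are all made up for the example.

```python
from collections import deque
from statistics import mean, stdev


class TrendAlert:
    """Flag a metric value that sits far above its recent rolling baseline."""

    def __init__(self, window=20, threshold=3.0, min_sigma=1.0, warmup=5):
        self.history = deque(maxlen=window)  # recent normal observations
        self.threshold = threshold           # allowed deviations above the mean
        self.min_sigma = min_sigma           # floor so a flat baseline is not oversensitive
        self.warmup = warmup                 # observations needed before alerting

    def observe(self, value):
        """Return True if value is anomalous; otherwise add it to the baseline."""
        if len(self.history) >= self.warmup:
            baseline = mean(self.history)
            spread = max(stdev(self.history), self.min_sigma)
            if value > baseline + self.threshold * spread:
                return True  # anomalous values are kept out of the baseline
        self.history.append(value)
        return False


# Usage: login times in seconds hover around 12, then one login takes 15 minutes.
detector = TrendAlert()
normal = [detector.observe(v) for v in [12, 11, 13, 12, 12, 14, 11, 12]]
spike = detector.observe(900)
```

Keeping flagged values out of the baseline is a design choice: it stops a single 15-minute login from inflating the average and masking the next one.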
Wright made the call and asked ExtraHop to do a proof of concept for the hospital: He wanted ExtraHop to find the ghost in the machine that his team had spent months hunting. Almost immediately, ExtraHop proved its worth, Wright says. Whenever a particular doctor logged in for the first time on a given morning -- sometimes first thing, sometimes after performing tasks that didn't require a computer -- the login caused severe contention at the storage tier.
It seemed the doctor had moved about 2GB of personal photos from his personal profile to his Citrix profile.
"The hit was the system spinning up the pictures when he logged on," Wright says. "It backed things up for a good 10 to 20 minutes."
Suddenly, the IT organization had cross-tier visibility to put troubleshooting issues in context. They restricted use of the My Pictures folder and made other optimizations that earned them the goodwill of their users.
"I've never seen anything comparable to ExtraHop," says Senior Systems Infrastructure Team Engineer Bruce Fulton. "It's our way to see how a transaction flows from start to finish through these various applications. We simply couldn't get that end-to-end perspective with any of our previous technologies."
While Wright admits that the ExtraHop platform is pricey -- it was no mean feat to wedge it into his budget cycle -- he says he wouldn't think of working without it.
"I do a lot of speaking on our virtual desktop story," he says. "Every time I talk about it, I tell people that if you're going to deploy virtual desktop, you've got to deploy it with ExtraHop or something like it, but I haven't found anything else like it out there. It saves you the pain of these ghosts in the machine."
Wire Data Analytics Helps Developers, Too
"Think outside the box on this one," he adds. "It's not just a monitoring tool for your technology folks. Get your application folks -- your developers and SMEs -- get them involved in the training. They'll appreciate seeing the performance of their applications from end user to database and they'll help with the monitoring. They want those applications to run better, faster, stronger than anybody else does."
Holt adds that taking that message to heart has helped Seattle Children's Hospital's IT staff really understand how its complex applications work.
"Before, I would ask people, can you map out what's really happening here -- for example, with logging in to a Cerner application -- and almost no one could map that end-to-end," Holt says. "Now with ExtraHop, we have at least 15 staff who can map that out in a heartbeat, and that number is growing."
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com.