Storage vexations of server virtualization
- 12 October, 2011 03:40
Server virtualization offers a host of efficiencies, but storage administrators say it may open a can of worms on the storage side. Resulting headaches can include huge I/O bottlenecks for primary and backup storage, as well as more complicated backup and disaster recovery efforts, among other things.
With multicore CPUs being utilized to create multiple virtual machines on servers -- and since the typical large-enterprise server farm is 70% to 80% virtualized -- there's a lot more application I/O moving back and forth between application servers and primary storage, and between primary storage and backup storage.
What's more, between 2000 and 2010, the number of servers worldwide multiplied by a factor of six, while the amount of storage increased by a factor of 69, thanks to server virtualization, according to researchers at IBM.
In July, Computerworld polled dozens of storage administrators to find out how server virtualization has complicated their work lives. Our findings yielded this list of five top headaches. But fear not: IT analysts and virtualization veterans offer their advice on how to deal with each challenge.
1. Storage Performance Slowdowns and I/O Bottlenecks
IT administrators are painfully aware that storage performance is growing at a much slower rate than computing power. So when it comes to virtualization, it's no surprise that I/O bottlenecks and slow storage performance are the No. 1 problem for one-third of the administrators who responded to the Computerworld poll.
"Virtualization lets you do a whole lot of workloads on one physical piece of hardware, but there's lots of different I/O [operations] mixed into the I/O stream, so it makes disks work harder and caching less effective," says Jeff Boles, senior analyst at Taneja Group in Phoenix. "Virtualization lets us easily do more than our compute power is capable of."
How to deal: The solution to the I/O bottleneck depends on where the problem lies: in the network or in the storage domain. Most often, it's in the storage environment, because improvements in storage capability have lagged behind those of all other infrastructure. "You have a very slow, creeping, linear progression of storage capability. Rotating disks can only go so fast. Part of the problem is visibility. Administrators can't see what's going on inside the storage environment, so they don't know how to fix it. Fortunately, we're getting some tools that can help you figure out that problem and address it [more easily]," Boles says.
Fibre Channel customers, for instance, might use Virtual Instruments' performance monitoring tool for storage area networks (SANs) to optimize performance and availability. Other vendors delivering visibility tools include NetApp, which recently acquired Akorri and its predictive tool for the virtual infrastructure, and EqualLogic, which has a graphical user interface that lets customers monitor storage system performance.
Boston-based ad agency Arnold Worldwide virtualized most of its servers five years ago. Chris Elam, senior systems engineer, remembers when he first started doing backups and noticed that throughput to the backups was dropping and that backup times were growing. But visibility tools on the firm's Dell Compellent SAN alerted Elam to the problem. He added more drives to increase I/O operations per second, and Compellent now spreads the data among the drives.
As an extra precaution, Arnold Worldwide's IT staff set most replications to take place during off-hours, except for those involving its production file servers, which it replicates during the day because data changes constantly. "That's an I/O hit we are willing to take," Elam says, adding that customer service is most important. "It's one thing if backups take longer; it's another thing if users start to complain [about slow systems]."
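Elam's replication policy can be expressed as a simple rule. The sketch below is hypothetical: the 8 p.m. to 6 a.m. window, the volume class names and the `may_replicate` function are assumptions for illustration, not Arnold Worldwide's actual configuration or any vendor's scheduler.

```python
from datetime import time

# Hypothetical off-hours window; the article does not give actual times.
OFF_HOURS_START = time(20, 0)  # 8 p.m.
OFF_HOURS_END = time(6, 0)     # 6 a.m.

# Hypothetical volume classes that replicate around the clock because
# their data changes constantly (production file servers, per the article).
ALWAYS_REPLICATE = {"production-file-server"}


def in_off_hours(now: time) -> bool:
    """True if `now` falls in the overnight window, which wraps past midnight."""
    return now >= OFF_HOURS_START or now < OFF_HOURS_END


def may_replicate(volume_class: str, now: time) -> bool:
    """Allow replication for always-on classes; everything else waits for off-hours."""
    if volume_class in ALWAYS_REPLICATE:
        return True
    return in_off_hours(now)


print(may_replicate("production-file-server", time(14, 0)))  # True
print(may_replicate("archive", time(14, 0)))                 # False
print(may_replicate("archive", time(23, 30)))                # True
```

Keeping the exceptions in one set makes the "I/O hit we are willing to take" an explicit, reviewable decision rather than a scattered scheduling detail.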
Performance is another important consideration in the I/O equation. "It's really important that administrators start to think about the I/O density and performance they need given the amount of infrastructure they have," Boles says. "Workload density has massively increased in the data center. Now you have 30 workloads in a single rack [running virtual servers]."
I/O density can be increased through solid-state drives and similar technologies, more effective caching or auto-tiering. And I/O demand will only grow as the enterprise puts more servers on a single storage system, so scale-out technologies, which scale performance as well as capacity, can help.
"Small and medium-size business customers can look at [tools from] Scale Computing, for example. The midrange customer could look at EqualLogic, and the enterprise could look at NetApp and 3Par," Boles says.
2. More Complicated Data Backup and Disaster Recovery
More than a quarter (27%) of the respondents in the Computerworld poll said that server virtualization has complicated backup and disaster recovery.
One of the biggest mistakes here is trying to protect a virtual infrastructure with traditional backup methods, according to Boles. With traditional backup, "the degradation in backup performance is more than a linear degradation as you scale the number of virtual machines on a piece of hardware. You're effectively creating a blender for backup contention as you're trying to protect these virtual servers overnight. You try to do 10 backups simultaneously on this one physical server, and you've got a lot of combat going on inside that server for memory, CPU, network and storage," he says.
Complicating matters are workload mobility tools, such as VMware's Storage vMotion, that let users relocate virtual machine disk files between and across shared storage locations. "Now you have to keep a backup going in relation to these virtual servers that are going to be moving around, and possibly run into other bottlenecks. That can be a serious headache," says Boles.
The virtual desktop I/O dilemma
The virtual desktop I/O workload is tremendously punishing on a hard disk array. For starters, although the I/O workload of an individual workstation is traditionally sequential, many IT departments are running thousands of virtual desktops on a single storage platform, which creates the I/O "blender effect."
"They're all doing sequential I/O in different regions of the disk, which turns those easy-to-service sequential I/O patterns into a nasty, random I/O pattern as far as the array is concerned," explains James Candelaria, CTO at WhipTail Technologies, a maker of solid-state storage arrays.
That's a big problem for traditional storage arrays because many don't have enough cache to keep up with the influx of data, and cache misses occur, slowing down the system.
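The blender effect is easy to reproduce in a toy model. The sketch below assumes each desktop reads strictly sequential blocks inside its own region of the disk and that the host interleaves their requests round-robin; the function names and region size are illustrative assumptions, and real hypervisor schedulers are less regular. Even so, it shows how streams that are each perfectly sequential reach the array with no sequential requests at all.

```python
def interleave_streams(num_streams: int, requests_per_stream: int,
                       region_size: int = 1_000_000) -> list[int]:
    """Merge per-desktop sequential streams round-robin.

    Stream s reads blocks s*region_size, s*region_size+1, ... so each stream
    is sequential on its own. Returns block addresses in arrival order.
    Assumes requests_per_stream < region_size so regions never overlap.
    """
    merged = []
    for i in range(requests_per_stream):
        for s in range(num_streams):
            merged.append(s * region_size + i)
    return merged


def sequential_fraction(addresses: list[int]) -> float:
    """Fraction of requests that immediately follow the previous block."""
    if len(addresses) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(addresses, addresses[1:]) if cur == prev + 1)
    return hits / (len(addresses) - 1)


print(sequential_fraction(interleave_streams(1, 100)))  # one desktop: 1.0
print(sequential_fraction(interleave_streams(8, 100)))  # eight desktops: 0.0
```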
How to deal: First, perform an I/O profile analysis to make sure you know what the I/O demand is going to be. "A general rule of thumb is that to support a typical user on [a virtual desktop infrastructure] in a steady state environment, you need ... 20 to 40 I/O per second per user," Candelaria says. "If you don't account for that I/O demand, your user experience is going to suffer drastically."
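Candelaria's rule of thumb turns into a quick sizing calculation, the same arithmetic behind Elam's fix of adding drives to raise I/O operations per second. Only the 20 to 40 IOPS per user range comes from the article; the 180 IOPS per spinning drive is an illustrative assumption (real figures depend on drive type, RAID layout and read/write mix), and RAID write penalties are ignored.

```python
import math

# Steady-state range quoted in the article.
IOPS_PER_USER_LOW = 20
IOPS_PER_USER_HIGH = 40


def vdi_iops_range(users: int) -> tuple[int, int]:
    """Return the (low, high) steady-state IOPS estimate for a VDI user count."""
    return users * IOPS_PER_USER_LOW, users * IOPS_PER_USER_HIGH


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Drives required to reach target_iops, ignoring RAID write penalty."""
    return math.ceil(target_iops / iops_per_drive)


low, high = vdi_iops_range(1000)
print(low, high)                 # 20000 40000
print(drives_needed(high, 180))  # 223, assuming 180 IOPS per drive
```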
Also, make sure you have a storage fabric and a transport fabric that are going to scale. "I see a lot of customers attempt to do virtual desktop projects without a high-speed storage fabric, and they're continually maxing out 1-gig storage links running on iSCSI," he says. "You need to look at a higher-speed transport like 10 Gigabit iSCSI or Fibre Channel."
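Why 1-gig links max out follows from multiplying IOPS by I/O size. The sketch below assumes a 4 KB average I/O size and ignores protocol overhead, so its output is a lower bound on the bandwidth needed; the function names and the 1,000-desktop example are illustrative.

```python
import math


def required_gbps(iops: int, io_size_bytes: int) -> float:
    """Payload bandwidth in Gbit/s for a given IOPS rate and I/O size."""
    return iops * io_size_bytes * 8 / 1e9


def links_needed(iops: int, io_size_bytes: int, link_gbps: float) -> int:
    """Number of links of the given speed needed to carry that payload."""
    return math.ceil(required_gbps(iops, io_size_bytes) / link_gbps)


# 1,000 desktops at 40 IOPS each, assuming 4 KB per I/O.
demand = 40_000
print(round(required_gbps(demand, 4096), 2))  # 1.31 (Gbit/s)
print(links_needed(demand, 4096, 1))          # 2 x 1 Gbit links
print(links_needed(demand, 4096, 10))         # 1 x 10 Gbit link
```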
If you're going to deliver virtual desktops to remote users, make sure you have enough bandwidth to ensure a favorable user experience.
Finally, make sure you can handle a substantial amount of write I/O, Candelaria says. If the environment is designed correctly, the desktop workload is predominantly write I/O rather than read I/O: vendors such as Citrix and VMware have come up with ways to keep read traffic from hitting the array, so desktops don't have to reach back to the array for data, which reduces the amount of redundant traffic the array sees.
How to deal: A handful of vendors are building backup and recovery tools that run within the virtual infrastructure itself. That way, the tools can capture and manage data right on top of the physical server and optimize it before it ever leaves the virtual server.
Acronis, for example, recently announced a product that can back up virtual machines in a matter of minutes and recover the data in about the same amount of time, while keeping data organized as virtual servers move around. Many vendors have harnessed some virtualized infrastructure capabilities, such as storage snapshot tools and replication, to make backup simpler and faster than was possible in the past.
When the Bank of Fayetteville in Arkansas first started virtualizing its servers, Les Barnes, senior vice president and IT manager, treated backups the same way he would with traditional servers: He used a tape library. But after a few months, he knew there had to be a better way. What's more, backups were traditionally performed overnight, but as more customers demanded 24/7 access to the online banking system, Barnes needed another solution.
He eliminated traditional backups entirely and replaced them with SAN replication and SAN snapshots as a way to keep multiple copies of the SAN's data off-site.
"The beauty of using SAN replication is that it completely offloads any I/O from the server," Barnes says. "It's now SAN-cluster-to-SAN-cluster communications. It's all back-channel stuff. There is no impact on the end user or the virtual machine. And if I have to recover, I can do it in a few minutes rather than a few hours or a couple of days."
Elam looks at Arnold Worldwide's SAN as a way to bring backups forward. "It's almost impossible now to just write everything to tape [over the weekend]," he says, adding that the ad agency holds 60 terabytes of data on its SAN. "But because we're replicating a lot of stuff off-site, that kind of serves as a backup. We also have snapshots that we keep live. We also do deduplication to get backups in a timely window."
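The article does not say how Arnold Worldwide's deduplication works. One common approach is to split data into chunks, hash each chunk, and store each unique chunk once along with a list of hashes for rebuilding the original. The sketch below uses fixed-size 4 KB chunks and SHA-256 from Python's standard library; many commercial products use variable-size chunking instead, and the function names are illustrative.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Store each unique fixed-size chunk once.

    Returns (store, recipe): store maps SHA-256 digest -> chunk bytes, and
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original bytes from the chunk store."""
    return b"".join(store[d] for d in recipe)


# Ten identical 4 KB blocks plus two different ones dedupe to two chunks.
data = b"A" * 4096 * 10 + b"B" * 4096 * 2
store, recipe = dedupe(data)
print(len(recipe), len(store))  # 12 2
```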
But Elam warns that those snapshots can be quite large: "The biggest thing we didn't realize when we rolled this out was the amount of space that snapshots or replays take up. We didn't even think about how much it takes. You need to plan for that in the amount of data storage you have."
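A rough model helps with the planning Elam describes. The sketch assumes one snapshot per day, where each retained snapshot holds the previous versions of the blocks overwritten that day. Only the 60-terabyte figure comes from the article; the 2% daily change rate and 14-day retention are illustrative assumptions, and actual snapshot (or replay) overhead depends on the array's implementation.

```python
def snapshot_space_tb(protected_tb: float, daily_change_rate: float,
                      retention_days: int) -> float:
    """Estimate space pinned by retained daily snapshots.

    daily_change_rate is the fraction of unique blocks overwritten per day.
    Each retained snapshot holds the old copies of that day's overwritten
    blocks, so space grows linearly with change rate and retention.
    """
    if not 0.0 <= daily_change_rate <= 1.0:
        raise ValueError("daily_change_rate must be between 0 and 1")
    return protected_tb * daily_change_rate * retention_days


# 60 TB on the SAN, assuming 2% daily change and two weeks of snapshots.
print(round(snapshot_space_tb(60, 0.02, 14), 1))  # 16.8 (TB)
```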
3. Difficulty Managing Shared Storage
Some 23% of the administrators we polled reported that server virtualization creates new headaches in managing shared storage.
Enterprises typically have a lot of different workloads being stored on storage systems, and for administrators, there aren't always clear connections among the storage volumes, the workloads that each volume supports, the demands against each volume, and who is consuming capacity.
"Essentially, the virtual infrastructure has created another layer of abstraction on top of the storage infrastructure without really freeing you from the complexity of the physical layer," Boles explains. "Now you have this virtual storage layer that you're managing, made up of [VMware's] VMFS, all the different virtual server files and data, and you're provisioning those resources inside the virtual infrastructure -- maybe even executing operations like Snapshot." On top of all that, "you still have to take care of the physical infrastructure and look at I/O demand. Having those two layers makes it harder to connect the dots between them," he says.
How to deal: Consider thin provisioning, a storage virtualization capability that helps raise low storage utilization. Instead of reserving a volume's full capacity up front, it allocates physical storage on demand from a shared pool, only when data is actually written. By using thin provisioning along with server virtualization, users can optimize both server and storage utilization rates. Virtualization appliances and arrays from vendors such as 3Par, Compellent, DataCore Software and NetApp include thin provisioning functionality.
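On-demand allocation from a shared pool can be sketched as a toy model. The `ThinPool` class below is hypothetical and measures capacity in whole extents; it shows volumes advertising more logical capacity than the pool physically holds, with physical extents consumed only on the first write to each extent.

```python
class ThinPool:
    """Toy thin-provisioned storage pool measured in fixed-size extents."""

    def __init__(self, physical_extents: int) -> None:
        self.free = physical_extents
        # volume name -> (logical size in extents, set of extents backed by disk)
        self.volumes: dict[str, tuple[int, set[int]]] = {}

    def create_volume(self, name: str, logical_extents: int) -> None:
        """Create a volume; no physical space is reserved up front."""
        self.volumes[name] = (logical_extents, set())

    def write(self, name: str, extent: int) -> None:
        """Back an extent with physical storage on its first write."""
        logical, allocated = self.volumes[name]
        if not 0 <= extent < logical:
            raise IndexError("extent outside volume")
        if extent in allocated:
            return  # rewrites reuse the extent already allocated
        if self.free == 0:
            raise RuntimeError("thin pool exhausted")
        allocated.add(extent)
        self.free -= 1

    def subscribed(self) -> int:
        """Total logical capacity promised to all volumes."""
        return sum(logical for logical, _ in self.volumes.values())

    def used(self) -> int:
        """Physical extents actually consumed."""
        return sum(len(allocated) for _, allocated in self.volumes.values())


pool = ThinPool(physical_extents=100)
pool.create_volume("vm1", 80)
pool.create_volume("vm2", 80)
pool.write("vm1", 0)
pool.write("vm1", 0)  # same extent again: no new allocation
pool.write("vm2", 5)
print(pool.subscribed(), pool.used(), pool.free)  # 160 2 98
```

Because subscribed capacity (160 extents here) can exceed physical capacity (100), a thin pool can run out of space if volumes fill up, so pool usage needs monitoring.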
4. The Need to Adapt the Storage Infrastructure to Serve Both Physical and Virtual Environments
In a finding that's similar to the backup and recovery dilemma, 20% of the polled administrators said that they find it hard to adapt their storage infrastructure to handle a mix of traditional and virtual processes.
How to deal: When jumping into virtualization solutions that will mingle with physical environments, "make sure you're doing it with the best storage vendor you can find for ease of use, simplicity and virtual infrastructure integration," Boles says.
Some of the big vendors' offerings are integrated with the virtual infrastructure, reducing the complexity of these systems "so you don't have to do a lot of crazy stuff, like disk group configuration," he adds. "You want one-click setup of storage and [access to] fine-grained provisioning of storage so you can carve up resources, understand who's using what, and manage it over time."
Some large-scale IT departments are even making a complete switch to technologies like an NFS-based NAS setup, which is ready to go into production underneath a virtual infrastructure. "You can store a whole bunch of virtual machines on one storage mount point and not have a lot of complexity around that," says Boles. "There aren't nearly as many headaches as trying to coordinate some of those physical storage resources with a very virtual server infrastructure."
5. Trouble Choosing the Right Kind of Networked Storage for Virtualized Servers
Some 18% of the storage professionals surveyed said that they can't decide on the right kind of networked storage for virtualized servers. "The right kind of networked storage makes a difference -- because you can scale, and get better performance and more simplicity in your processes [if you choose correctly]," Boles says. But the right solution depends largely on the organization's objectives.
At Purdue University's Krannert School of Management, for instance, the IT department's top priority wasn't 24/7 availability for its virtualized environment, but rather fast recovery time when -- not if -- the system went down, says IT manager Jeff Ellow.
Virtualizing storage-intensive servers without big performance losses requires a level of storage performance that SANs weren't able to achieve. The obvious choice for Purdue seemed to be 10 Gigabit iSCSI, but cost was a deterrent. Purdue ultimately went with LSI 6Gbps SAS switching technology, which offered the benefits of a failover SAN and the performance of an end-to-end native 6Gbps SAS data path at a price the school could afford.
"Even if our SAN goes down, we have enough local storage where we could limp along in another mode. Restoring more quickly is more important than staying up," Ellow says.
How to deal: Before choosing any vendor, be sure you understand the management capabilities, Barnes says. Server and storage virtualization can be simple: "You don't have to be a rocket scientist or have a degree in SAN management to take care of these things," he says.
At the end of the day, Elam says the benefits of virtualization are worth the trouble of grappling with these five challenges. "The pros far outweigh the cons just in its complete ease of use, stability, high availability, being able to replicate and do maintenance during the day, move stuff around as you need to and take hardware offline," he says. "There are lots of things you don't have to come in on a weekend to do anymore."