At Ingram Micro, executive vice president and CIO Mario Leone doesn't think about how much he will spend on disaster recovery.
That's because the global electronics distributor weaves its disaster recovery requirements into its broader business objectives and its service-level agreements (SLAs) with its 15,000 users. Since 2010, the IT shop has been cutting costs and meeting its service and disaster recovery commitments by using a "hybrid" cloud made up of its own virtualized hardware at colocation facilities in Chicago, Frankfurt and Singapore.
And rather than paying for dedicated recovery hardware that sits around waiting for a disaster, it uses virtualization to shift workloads from a failed server to one running a less critical workload. "We're always using that architecture for something," says Leone.
More and more IT shops are using technologies such as virtualization and replication to make disaster recovery just another service, sometimes using the same servers, network and storage that run order entry, email, application development or other services. This merges what historically were disaster recovery and business continuity efforts, protecting the business against not only rare disasters, but also human error or equipment failures.
Some store only data (and perhaps templates for virtual machines) off-site, creating (and paying for) the physical hardware to run them only when needed. "We can recover at our remote site much, much faster by just being able to fire up the system images of the VMs," says Justin Bell, systems administrator at Strand Associates, an engineering firm in Madison, Wis. Even if the server infrastructure at that site is less robust than the one at the primary site, "we could run in limited capacity, on much less hardware, until we got things back up at our primary site."
Other organizations have done away with dedicated disaster recovery systems. They shift production work to test or development servers during outages and defer work that's less critical.
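The workload-shifting approach described above can be sketched in a few lines of Python. This is only an illustration, not any company's actual tooling: the `Host` class, the host and workload names, and the numeric priority scheme (1 is most critical) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    workload: Optional[str]
    priority: int          # 1 = most critical; larger numbers are less critical
    healthy: bool = True


def fail_over(hosts, failed_name):
    """Move the failed host's workload onto the healthy host that is
    running the least critical work, and report what had to be deferred."""
    failed = next(h for h in hosts if h.name == failed_name)
    failed.healthy = False
    # Only hosts doing less critical work than the failed one may be displaced.
    candidates = [h for h in hosts if h.healthy and h.priority > failed.priority]
    if not candidates:
        return None  # nothing less critical to displace
    target = max(candidates, key=lambda h: h.priority)
    deferred = target.workload
    target.workload, target.priority = failed.workload, failed.priority
    failed.workload = None
    return target.name, deferred


# Hypothetical environment: one production host plus dev and test hosts.
hosts = [
    Host("prod-01", "order-entry", 1),
    Host("dev-01", "app-development", 3),
    Host("test-01", "nightly-tests", 4),
]
result = fail_over(hosts, "prod-01")
print(result)  # ('test-01', 'nightly-tests')
```

In this sketch the test host gives up its work first because it carries the lowest priority; a real environment would also have to check that the target has enough capacity for the production workload.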
More Demands, More Risk
These changes are driven by ongoing pressure to cut costs while maintaining continual uptime, and by the flexibility provided by server, storage and network virtualization. Meanwhile, a recent spate of natural disasters, along with stricter regulatory requirements, has made disaster recovery the No. 1 subject of client inquiries at research firm Gartner, says analyst John Morency.
However, Forrester Research reports that enterprise disaster recovery/business continuity budgets are stuck at 6% of total IT capital and operating budgets and that concerns such as "consolidation, business intelligence and virtualization" are given higher priority when it comes to spending.
Meanwhile, the list of critical services that need protection keeps growing, with communication tools such as voice over IP and email gaining "critical" status alongside traditional business applications like order entry and ERP. Finally, it's necessary to ensure uptime not only after major disasters, but also in the event of localized failures, and many companies need the ability to quickly recover just one file rather than an entire system.
Recovery in the Cloud
3 things to think about before choosing a cloud-based data recovery service
Start with applications that already perform well in virtual or private cloud environments but don't support your most critical systems. This gives you time to try different approaches and vendors.
Be realistic about SLAs, and know that most cloud providers won't take responsibility for your losses if you can't recover after a failure.
Understand the interdependencies among the applications and services you host in the cloud and those you host in a traditional data center so you properly test recovery.
Source: Forrester Research
By separating virtual servers, networks and storage capacity from physical hardware, virtualization gives users many more choices in disaster recovery strategies. "When you recover a virtual machine, it doesn't matter where we put it," says Kurtis Berger, IT manager at Provider Advantage NW, a healthcare software and services company in Beaverton, Ore. "At each of our data centers, all of our VM servers are pretty much the same. [Almost] any old box will handle the prescribed load, and it'll be good enough to recover some VMs onto."
Disaster recovery is also being transformed by fast, easy-to-use replication software that copies data between primary and recovery sites in near real time. One such offering, Double-Take software from Vision Solutions, allows users to sync data among servers and establish failover protection in about 20 minutes, says Joseph Pedano, senior vice president for data engineering at Evolve IP, a provider of cloud-based IT services in Wayne, Pa.
Martin Mazor, Ingram Micro's director of global information assurance, wouldn't discuss which products he uses, but he says replication allows his company to recover systems much more quickly than the full day it would take to ship tape offsite. Ingram Micro has also invested in tools that provide a single performance dashboard for all of its worldwide operations, and it has offered employees training in areas such as operational management and the handling of incidents and problems.
Evolve IP uses VMware virtualization technology, and Pedano says backup and recovery tools now feature improved VMware integration, making it easier to replicate and restore not just servers, but also their associated databases and security systems.
To successfully restore a business service such as email or order entry, IT must recover the application server as well as associated components (such as an Active Directory server that contains user information or a database that holds inventory records), and it must do so in the proper order. Taking these dependencies into account is a major area of focus for vendors.
Symantec, for example, recently announced that enhancements to its backup products combine more granular backup and recovery of VMs with the ability to account for dependencies among VMs. The enhancements, found in products for businesses of all sizes, also make it easier to use multiple public or private cloud backup services, and to convert a physical server at a production site to a virtual server at a recovery site, says Dan Lamorena, director of product marketing for Symantec's storage and availability management group.
Continuity Software's RecoverGuard software is designed to automatically check all critical infrastructure components, such as the file system and virtualization components, and identify vulnerabilities that could cause downtime and data loss. It looks for vulnerabilities using a database of "signatures" similar to the ones antivirus tools use to identify malware. The database is updated by the vendor's researchers and its users, says CEO Gil Hecht.
Snapshots in Time
Faster recovery is vendors' goal
Improving the performance of replication systems and related technologies, such as snapshot tools, and tailoring them to shorten virtualized disaster recovery times is a key focus for vendors. Here are some examples.
Actifio's Protection and Availability Storage (PAS) appliance allows users to execute a one-time transfer of data to a remote site, and then send only changes to the data, with the changes themselves deduplicated, says Actifio CEO Ash Ashutosh. This not only reduces bandwidth requirements; it can eliminate the need for backup software, he says.
The distributed object file system within PAS contains information about each block of stored data that makes it easier to find and reuse the data for purposes other than disaster recovery, such as test and development, regulatory compliance or legal searches, he says.
FalconStor's CDP aims to speed recovery by ensuring the most recent snapshot is always the most complete. This eliminates the need to factor in the incremental changes since the initial backup before recovering the data. And it can save hours when recovering tens of terabytes of data, says Fadi Albatal, vice president of marketing and product management.
Asigra's Cloud Backup eliminates the need for dedicated physical recovery hardware by automatically backing up VMs to virtualized environments and scaling up the VMs in the recovery environment so they can meet production needs. By automatically creating new servers and provisioning storage, it can reduce restore times from hours to minutes, says Eran Farajun, an executive vice president at Asigra.
Egenera's PAN (Processing Area Network) Manager software virtualizes the connections between physical or virtual hosts and a customer's network or storage resources, thereby speeding restoration by making it easier to create not just VMs, but also the network and storage connections needed to make them work, says John Humphreys, vice president of marketing.
PAN can also automatically detect failures in production servers and move them to the recovery environment, with the new server looking "just like it did before, with the same MAC address and same resources," says Scott Geng, senior vice president of engineering.
Dell's Compellent Live Volume enables a physical server or VMs to share a virtual storage volume among Dell's Compellent Storage Center SANs in a semi-synchronous configuration that enables always-available failover volumes or LUNs, and makes it possible to move data closer to users for performance reasons, says Brett Roscoe, general manager and executive director of data management at Dell.
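The "transfer once, then send only deduplicated changes" idea mentioned in the examples above can be illustrated with a small Python sketch. It is a generic content-hash approach and makes no claim about how any vendor's product works internally; the `RemoteSite` class, the `replicate` function and the tiny 4-byte block size are assumptions chosen to keep the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, purely for illustration


def split_blocks(data: bytes):
    """Cut the data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class RemoteSite:
    """Hypothetical recovery site that stores each unique block once."""

    def __init__(self):
        self.store = {}    # digest -> block bytes
        self.layout = []   # ordered digests describing the replicated volume

    def restore(self) -> bytes:
        return b"".join(self.store[d] for d in self.layout)


def replicate(data: bytes, remote: RemoteSite) -> int:
    """Sync data to the remote site and return how many blocks were sent.

    Checking remote.store directly stands in for the digest exchange a
    real sender and receiver would perform over the network.
    """
    sent = 0
    layout = []
    for block in split_blocks(data):
        d = digest(block)
        if d not in remote.store:   # new or changed block: must be sent
            remote.store[d] = block
            sent += 1
        layout.append(d)            # known block: only its digest is recorded
    remote.layout = layout
    return sent


remote = RemoteSite()
first = replicate(b"AAAABBBBCCCCAAAA", remote)    # initial full transfer
second = replicate(b"AAAABBBBDDDDAAAA", remote)   # one block changed
print(first, second)  # 3 1
```

The first pass sends three blocks rather than four because the repeated block is deduplicated, and the second pass sends only the single block that changed.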
Jason Buffington, an analyst at Enterprise Strategy Group, says applications like Microsoft Exchange, Microsoft SQL Server and some network-attached storage platforms offer capabilities such as replication and failover at little or no extra cost.
Robert L. Scheier
Other products with those capabilities include VMware vCenter Site Recovery Manager, which also supports custom scripting and automation to ensure that VMs are brought up and reconnected in the proper order across multiple sites, says Gaetan Castelein, VMware's director of product marketing.
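The ordering problem itself, bringing up a directory server or database before the applications that rely on it, is a dependency-sorting exercise. Here is a minimal sketch using the `graphlib` module in Python's standard library (Python 3.9 or later). The service names and the dependency map are hypothetical, and the sketch does not represent how any of the products mentioned here work.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each service lists the services that must be
# recovered before it can start.
dependencies = {
    "active-directory": set(),
    "inventory-db": set(),
    "email-server": {"active-directory"},
    "order-entry-app": {"active-directory", "inventory-db"},
}

# static_order() yields every service after all of its prerequisites.
recovery_order = list(TopologicalSorter(dependencies).static_order())

for step, service in enumerate(recovery_order, start=1):
    print(step, service)
```

The relative order of unrelated services may vary, but each application always appears after the components it depends on. A circular dependency would raise `graphlib.CycleError`, which is useful to catch during recovery testing rather than during an outage.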
Making It Pay
Often, the only way to get funding for disaster recovery systems is to demonstrate that they deliver more than just "insurance," or that they can even pay for themselves. For example, Strand uses FalconStor Software's Continuous Data Protector appliance to replicate about 50TB of data and 25 virtual servers between its remote offices and headquarters. This is not only easier and less expensive than using a colocation facility, but the higher bandwidth required for the replication also makes it easier for employees to videoconference and share complex engineering documents.
That bandwidth also allows Strand to "take snapshots every hour on the hour, so we can facilitate a file restore in about three to five minutes," says Bell. Given the expense the company would incur if an engineer had to repeat several hours of work, the ability to take snapshots helps justify the cost of disaster recovery even without a disaster, he says.
Thorntons Inc., a Louisville, Ky.-based convenience store operator, recoups much, if not all, of the cost of disaster recovery by using DataCore Software's SANsymphony storage virtualization software on XIOtech SANs it purchased to support its newest servers, while moving its older Dell Compellent SANs and older servers to nearby space it already leased as a disaster recovery site. Senior network engineer Kevin Schmidt says that gives the company disaster recovery for its full application environment, not just its data, and it has improved performance and cut the time required to produce a profit and loss statement from 10 or 12 hours to less than five hours.
Another benefit is that virtualization allows the company to use the Dell Compellent storage, for which it paid $350,000 in 2007, as a recovery platform for its newer XIOtech storage.
Cloud Disaster Recovery? Not So Fast
Some providers say cloud-based disaster recovery will bring the benefit of true disaster recovery, rather than just backup, to small and midsize businesses that until now couldn't afford it.
Pat O'Day, co-founder and CTO of Bluelock, a provider of public cloud virtual data centers, says customers are increasingly satisfied with cloud security. Many security experts say even public cloud environments in which multiple customers share hardware can be made secure with the proper processes.
But a fall 2011 Forrester Research survey showed that only 11% of large enterprises and 9% of small to midsize businesses had adopted recovery as a service, with 35% of large enterprises and 41% of SMBs saying they were interested in it but had no plans to adopt it.
New approaches to data recovery
Some IT shops are expanding disaster recovery to include not only servers, but also user devices. They're using portions of backup sites to store images of virtual desktops, laptops or even tablets so users can have access to their data and applications while they await replacement devices, says Eran Farajun, executive vice president at Asigra, which is also giving customers the ability to back up and restore data from consumer devices such as smartphones and tablets.
Jason Buffington, an analyst at Enterprise Strategy Group, says many companies now require branch offices to adopt the same protection standards as headquarters. He says products designed to help with such efforts include Riverbed Technology's Steelhead EX+ Granite appliances, which optimize the performance of wide area networks to speed backup and replication from branch offices to central data centers.
And many organizations are reducing or ending their use of tape for disaster recovery, although some still use it for long-term archiving.
"For us, tape is dead," says Kurtis Berger, IT manager at Provider Advantage NW. "It was the second tape drive that failed that finally pushed us toward a hard-drive-only solution. Hard drives are faster, and so cheap. We just couldn't find any reason to entertain the idea of tape anymore."
"Tape has been a love-hate relationship -- mostly hate," says Jason Axne, systems administrator at conveyor belt manufacturer Wire Belt Company of America. He cites tape's unreliability, the lengthy recovery periods for even single files or email inboxes, and the time required to manage backups. Using Actifio PAS and disk-based storage, he says, "I don't spend any time during the day managing our backups ... because it just works."
Robert L. Scheier
Berger says cloud providers only promise "not to go into your servers" when he questions them about security. "To me, that's not enough," he says, adding that the disaster recovery prices he's hearing -- $500 per month per server -- are "more than I can justify." He instead uses Acronis Backup & Recovery to back up approximately 60 VMs at two data centers. The facilities are only a half-hour apart, so this setup would not meet some definitions of a disaster recovery system, but he says it covers most of his needs because the applications aren't mission-critical.
Hecht downplays resistance to cloud-based disaster recovery, saying the smallest companies typically host their entire infrastructures in the cloud, and thus get some level of disaster recovery simply by keeping applications and data off-site.
Smaller companies that do choose the cloud typically don't do it for the savings, he says, but because "it's just so much simpler to have a system you set up and forget."
While midsize organizations have some incentive to consider disaster recovery in the cloud, few of them use the cloud for mission-critical systems that require true disaster recovery -- and what they get in the cloud is closer to dedicated hosting (with the customer's data and systems running on separate hardware) than to a multitenant, elastic, pay-as-you-go public cloud, Hecht says.
Most large organizations are big enough to provide disaster recovery themselves, he says, and even if they weren't, "there's no good solution" for protecting sensitive applications in the cloud.
Cloud disaster recovery is also not suited to applications that rely on older platforms that most cloud providers don't offer, or to large databases that don't perform well in the cloud, says Morency. Users also need to watch for hidden licensing costs, he says: Some cloud vendors charge for software that sits unused on remote VMs or disaster recovery systems.
Both Gartner and Forrester also warn that most cloud disaster recovery providers will refund only a portion of a customer's fee if disaster recovery falls short -- nowhere near enough to make up for the potential revenue loss that such an event could cause.
The cost of the bandwidth required to quickly recover an organization's VMs and data from the cloud is often an unwelcome surprise, says Alan Arnold, executive vice president and CTO at Vision Solutions, which provides high-availability and disaster recovery software and services. Some customers and providers opt to physically ship portable hard drives via overnight courier, says Arnold, recalling that one user joked that "FedEx is still the largest-bandwidth network out there."
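A rough calculation shows why the courier joke lands. The Python sketch below is back-of-the-envelope arithmetic only: the 50TB data set simply borrows the volume Strand replicates as a convenient example size, while the link speeds and the 24-hour delivery time are assumptions. It also ignores protocol overhead, compression and the time needed to load and unload the drives.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move size_bytes over a link at full line rate."""
    return size_bytes * 8 / link_bits_per_second / 3600


def courier_equivalent_gbps(size_bytes: float, delivery_hours: float) -> float:
    """Effective throughput, in Gbps, of shipping the data on drives."""
    return size_bytes * 8 / (delivery_hours * 3600) / 1e9


DATA_BYTES = 50e12  # 50TB in decimal units (illustrative size)

# Assumed link speeds: 100Mbps and 1Gbps.
for label, bps in [("100Mbps", 100e6), ("1Gbps", 1e9)]:
    hours = transfer_hours(DATA_BYTES, bps)
    print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")

# Assumed overnight delivery time of 24 hours.
overnight = courier_equivalent_gbps(DATA_BYTES, 24)
print(f"Overnight courier: about {overnight:.2f}Gbps effective")
```

Under these assumptions, a full restore over a 1Gbps link takes more than four and a half days, while an overnight shipment works out to better than 4.6Gbps of effective throughput.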
With IT so central to the business and budgets so tight, it's essential to get input from top business managers to assess which applications deserve the highest levels of protection. Ingram Micro, for example, conducted a business impact analysis that put various applications in different tiers, with voice, email, ERP and ordering among the top priorities. The company thought of it "just like an insurance policy," says Mazor. "It helped us think of how much insurance we're going to buy."
Scheier is a veteran technology writer. You can contact him at firstname.lastname@example.org.