CIO

Figuring out the data center fabric maze

  • Jim Duffy (Network World)
  • 23 March, 2012 03:32

Despite vendor pledges to support existing or developing industry standards, users are expected to deploy single-vendor data center and cloud switching fabrics from their primary suppliers.

Standards such as Transparent Interconnect of Lots of Links (TRILL), Shortest Path Bridging and Multi-Chassis Link Aggregation will be embraced by those vendors looking to dent Cisco's dominance in data center switching. The standards support will soothe customers looking to avoid vendor lock-in, but it's unlikely IT shops will mix and match multivendor switches within and between data centers.

"In the data center, customers do look for that path with the least bumps in the road just because of the critical need of the data center," says Zeus Kerravala, principal at ZK Research.

Cisco, meanwhile, will continue to expand and enhance its FabricPath approach to data center fabric switching -- which the company says supports TRILL but is not founded on it. Juniper will continue advocating a tagging mechanism in the Broadcom silicon inside its QFabric line to support multiple active paths and one-hop reachability in data centers and cloud environments.

Brocade says the current implementation of its VCS fabric technology does not include all of the features found in the TRILL standard. The company says its VDX data center switches can operate in VCS mode or in "classic" mode, which adheres more closely to IEEE 802.x standards. The data plane in a VCS implementation uses TRILL, but the control plane does not; it uses Fabric Shortest Path First (FSPF), the ANSI-standard link-state routing protocol used by all Fibre Channel SAN fabrics.

Alternative link-state routing protocols can be supported in VCS when they are standardized and available, Brocade says. But the link state protocol in TRILL, Intermediate System-to-Intermediate System (IS-IS), has been documented since July 2011 and even tested for interoperability at the University of New Hampshire almost a year before that, says Donald Eastlake, chairman of the TRILL working group in the IETF.
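To make the control-plane discussion concrete, here is a minimal sketch of the calculation that a link-state protocol such as IS-IS or FSPF enables: every switch holds the same map of the topology and runs a shortest-path-first computation over it, keeping every equal-cost next hop so that multiple paths can stay active. The switch names, link costs and the function name equal_cost_next_hops are hypothetical and chosen for illustration; this is not any vendor's implementation.

```python
import heapq

def equal_cost_next_hops(topology, source):
    """Return {destination: sorted first-hop neighbors on all shortest paths}.

    topology maps each switch to {neighbor: link_cost}; costs must be positive.
    """
    dist = {source: 0}
    next_hops = {source: set()}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, cost in topology[node].items():
            nd = d + cost
            # Leaving the source, the first hop is the neighbor itself;
            # otherwise the neighbor inherits the first hops of this node.
            hops = {neighbor} if node == source else next_hops[node]
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                next_hops[neighbor] = set(hops)
                heapq.heappush(heap, (nd, neighbor))
            elif nd == dist[neighbor]:
                next_hops[neighbor] |= hops  # another equal-cost path
    return {dst: sorted(h) for dst, h in next_hops.items() if dst != source}

# Hypothetical two-spine, two-leaf fabric with unit link costs.
topology = {
    "leaf1": {"spine1": 1, "spine2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
}
routes = equal_cost_next_hops(topology, "leaf1")
print(routes["leaf2"])  # ['spine1', 'spine2']
```

In this toy fabric, leaf1 reaches leaf2 through either spine at the same cost, so both links can carry traffic, whereas Spanning Tree would block one of them.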

"RFC 6326 is TRILL support of IS-IS," Eastlake says. "That has been out for a bit now. I think this year there will be a number of switches from a number of vendors that will support the TRILL control plane."

Eastlake wouldn't speculate on why vendors have been slow to make both the data and control planes of their fabric switches fully TRILL-compliant.

"There's nothing inherently evil about using the TRILL data plane instead of the control plane," he says. "It won't interoperate with a standard TRILL control plane but people manufacture things that don't interoperate all the time. They make the business decision that that's what they want to do. Other people make things that try very hard to be very interoperable because that's the business decision that they make."

HP says it supports TRILL and its own Intelligent Resilient Framework (IRF) multichassis bonding technology for flattening the data center network. HP just announced its new 5900 line of top-of-rack switches that support both.

Arista Networks, Extreme Networks and Alcatel-Lucent all currently advocate MC-LAG as the fabric architecture for their switches. MC-LAG extends IEEE-standard link aggregation across multiple chassis, although implementations vary by vendor. It is intended to replace the Ethernet Spanning Tree Protocol, improving resiliency and uptime and reducing latency by creating active/active network paths for load balancing and redundancy.

Since the fabrics are based on Ethernet, they should be as easy to deploy as Ethernet, according to Extreme.

"Ethernet is open," says Doug Wills, senior director of marketing at Extreme. "Why not Ethernet fabrics?"

Avaya supports the IEEE's Shortest Path Bridging (SPB) specification, an alternative to TRILL, for its VENA fabric architectures. Shortest Path Bridging is founded on the IEEE 802.1ah Provider Backbone Bridging MAC-in-MAC standard, which in the telecom world is proposed as a Layer 2 alternative to MPLS for Metro Ethernet deployments.

Some vendors, though, hedge their bets by supporting both SPB and TRILL. Huawei is in this camp.

Juniper chose to implement a tagging mechanism in the Broadcom silicon in its QFabric line because it functions like a fabric inside the switch itself; if a switch itself functions as a fabric, a fabric configured with like switches can form a flatter topology, Juniper asserts.

"You don't actually run a protocol," says Andy Ingram, worldwide managing director of data center sales for Juniper. "We use the hardware to transport the bits between the ingress and the egress point in the switch fabric."

Ingram says this has several advantages: The fabric can be managed as a single logical device, with no need to manage each node of the spine-and-leaf topology; latency is lower because there are fewer ASICs in the data path; and it should be less expensive because it has fewer components.

"The cards inside of a fabric switch interconnect, we don't ask them to be intelligent," Ingram says. "We're at 3kW (per port) versus something that might be 13kW. Because we're not asking it to be a switch, we can make it more efficient, we can make it faster, we can make it less expensive."

A packet is tagged and then hashed to determine which route it will take through the interconnect, Ingram says. This capability is built into the Broadcom Titan and Trident ASICs used in the QFabric nodes and interconnects.
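As a rough illustration of hash-based path selection in general, the sketch below hashes a flow's identifying fields and uses the result to pick one of several equal paths. This is an assumption-laden example, not Juniper's tagging mechanism or the hash used in the Broadcom silicon: the field set, the CRC32 hash, the path names and the function name pick_path are all hypothetical.

```python
import zlib

def pick_path(flow, paths):
    """Choose one path for a flow by hashing its 5-tuple (illustrative only)."""
    src_ip, dst_ip, protocol, src_port, dst_port = flow
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    # The same flow always hashes to the same index, so its packets
    # follow one path and are not reordered against each other.
    return paths[zlib.crc32(key) % len(paths)]

# Hypothetical interconnect paths and an example flow.
paths = ["interconnect-a", "interconnect-b", "interconnect-c", "interconnect-d"]
flow = ("10.0.0.5", "10.0.1.9", "tcp", 49152, 443)
chosen = pick_path(flow, paths)
print(chosen)
```

Because the choice depends only on the packet's own fields, different flows tend to spread across the available paths without the forwarding hardware keeping per-flow state.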

Juniper says it has 100 customers for QFabric, counting those using the stand-alone QFX3500 node/top-of-rack switch, the QFabric Interconnect or both, and more than half of them are in production, according to Ingram. He also says at least one-third are brand-new Juniper customers who've literally bought into the QFabric architecture.

There's been speculation that QFabric is having a hard time living up to its 6,144 10G port/one-hop billing in customer trials, but Ingram says Juniper's been working closely with customers through a prolonged testing period for what's essentially a new data center networking architecture.

"We have one that will go production in the next 60 days that was a beta account," he says. "It's a big deal, it's a big part of their business, it's a service provider in the cloud space, and they've been testing it extensively for some time. They're waiting for the next release to come out which would have some of the capabilities that are important for them to go to production. We have quite a few that will go into production in the next three to four months."

There's also speculation that many are waiting for a release of QFabric based on custom Juniper ASICs.

The New York Stock Exchange is a showcase account for Juniper and one that's been testing QFabric since its introduction. But UBS analyst Nikos Theodosopoulos reported in a recent bulletin that QFabric "champion" Andy Bach, senior vice president of technology for the NYSE, is leaving the exchange, "adding uncertainty to QFabric deployment there."

"QFabric remains difficult to gauge, but we view 2012 as a year of trials," Theodosopoulos writes in his bulletin.

NYSE declined a request to interview Bach or another NYSE networking official for this story. Ingram says QFabric is "exactly where I would expect it to be."

"We're not seeing reliability, performance or quality issues with this product in our customer base," says Denise Shiffman, vice president of Platform Systems Division marketing at Juniper.

As for NYSE specifically, Ingram says the exchange just upgraded its Juniper EX2500 top-of-rack switches to the QFX3500. NYSE had "a couple hundred" EX2500s deployed.

And Juniper has a proof-of-concept scheduled this month with NYSE at Juniper's Sunnyvale labs "to see if they want to take the next step" with the fabric interconnect, Ingram says.

Throwing a curve into all of this is software-defined networking and its most visible component, OpenFlow. OpenFlow and SDNs are seen as a way to make network hardware more programmable, which could make data center fabrics agile enough to handle large data sets, or big data.

Proponents of OpenFlow and SDNs say the technologies could help hybrid cloud computing services enable bandwidth-on-demand for data center interconnection; allow scientists to conduct research with collaborators worldwide by enabling the global exchange of massive data sets collected from research projects around the world; and balance loads across the fabric's multiple active links within and between data centers.

And perhaps most threatening to vendors, OpenFlow and SDNs could relegate proprietary fabrics to niches within data centers and clouds, rather than holistic, end-to-end architectures.

Then again, maybe not.

"In theory, it should make fabrics more flexible," ZK Research's Kerravala says. "And it should also allow application-to-network interfaces better. But have you ever heard an OpenFlow vendor explain to you what they can do that you couldn't do with traditional networking?"

Regardless, it's now time to show some proof points after all of the noise and hype surrounding fabric architectures over the past three years.

"2012 has to be the year of the customer deployment," Kerravala says. "I want to see some actual customer examples and, frankly, some quantifiable benefits. So this will be the year we go from vision to some deployments, and along with that have to come some proof points."
