
PCI Express-based fabrics: A low-cost alternative to InfiniBand

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Build on the natural strengths of PCI Express (PCIe) -- it's everywhere, it's fast, it's low power, it's affordable -- and add some straightforward, standards-compliant extensions for multi-host communication and I/O sharing, and the result is a universal interconnect that substantially improves on the status quo in high-performance cloud and data center installations.

One application now receiving considerable attention is the replacement of small InfiniBand clusters with a PCIe-based alternative. This approach to high-speed data center interconnects was addressed at the Supercomputing 2012 conference (SC12) in Salt Lake City, where the high-performance computing (HPC) community began to sit up and take notice.


The belief is that in cloud and data-center environments, PCIe-based fabrics can replace small InfiniBand clusters, offering Quad Data Rate (QDR)-like performance when communicating between CPUs, enabling straightforward sharing of I/O devices, and doing so at a much lower cost and power envelope. InfiniBand doesn't do this anywhere near as easily or cost-effectively. Figure 1 illustrates the simplicity of a PCIe-based fabric compared to InfiniBand.

InfiniBand predated PCIe and was originally envisioned as a unified fabric to replace most other data center interconnects. In the end, however, it did not achieve that goal, but did develop a niche as a high-speed clustering interconnect that replaced some proprietary solutions.

InfiniBand, like PCIe, has evolved considerably since its introduction. Its initial speed, Single Data Rate (SDR), delivered 2Gbps, the same effective data rate as PCIe Gen 1; in fact, the original PCIe specification borrowed heavily from InfiniBand at the signaling level. InfiniBand has since been enhanced through Double Data Rate (DDR) at 4Gbps and QDR at 8Gbps, and is now shipping as Fourteen Data Rate (FDR) at 13.64Gbps, with higher speeds envisioned moving forward.

The QDR data rate is the closest to PCIe Gen 3. With comparable bandwidth and latency, a PCIe-based fabric should provide performance similar to an InfiniBand solution at the same data rate. This is especially true if the fabric adds Remote DMA (RDMA) to the basic PCIe capability, which offers very low-latency host-to-host transfers by copying information directly between the host applications' memories.
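
To make the RDMA idea concrete, here is a minimal C sketch of the kind of zero-copy transfer being described, assuming the fabric exposes a remote host's receive buffer as a memory-mapped PCIe window. The device node, window size and offset are hypothetical stand-ins for whatever a real fabric driver would provide; the point is simply that an ordinary store into the mapped region becomes a PCIe memory write that lands directly in the other host's application memory.

    /* Conceptual sketch of an RDMA-style transfer over a PCIe fabric: the
     * sender maps a window that the fabric translates to a buffer in the
     * remote host's memory, then copies its payload straight into it, with
     * no intermediate protocol stack. The device node, window size and
     * offset are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define WINDOW_SIZE (1 << 20)  /* assumed 1MB window into remote memory */

    int main(void) {
        int fd = open("/dev/pcie_fabric0", O_RDWR);  /* hypothetical device node */
        if (fd < 0) { perror("open"); return 1; }

        /* Map the remote host's receive buffer into this application's
         * address space. */
        void *remote = mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (remote == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        /* The "RDMA write": a plain store into the mapped window becomes a
         * PCIe memory write that lands directly in the remote application's
         * buffer. */
        const char payload[] = "hello, remote host";
        memcpy(remote, payload, sizeof payload);

        munmap(remote, WINDOW_SIZE);
        close(fd);
        return 0;
    }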

PCIe also allows I/O devices to be shared using standard multifunction networking and telecommunications hardware and software, something InfiniBand can't easily do.

Native sharing of I/O devices and high-speed communication between the CPUs in a system are not part of the current PCIe specification. The specification does, however, provide a mechanism for vendors to add their own extensions while remaining compatible with the overall standard. Using these vendor-defined extensions, an enhanced implementation can still be exercised with existing test and analysis equipment while offering a more robust feature set.
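
As an illustration of that mechanism, the PCIe specification reserves Vendor-Defined Messages (message codes 0x7E and 0x7F) that carry the sender's vendor ID plus vendor-specific data, and it is this kind of hook that lets fabric features be layered on top of standard PCIe. The C sketch below is a simplified illustration only -- a real message TLP header carries additional routing and control fields, and the vendor ID and opcode shown are placeholders.

    /* Simplified sketch of a PCIe Vendor-Defined Message (VDM), the hook the
     * specification provides for vendor extensions. Message codes 0x7E (Type 0)
     * and 0x7F (Type 1) are reserved for VDMs, which carry a vendor ID plus
     * vendor-specific data. The layout below is abbreviated for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    struct vdm {
        uint8_t  msg_code;    /* 0x7E = Vendor_Defined Type 0, 0x7F = Type 1 */
        uint16_t vendor_id;   /* PCI-SIG assigned vendor ID (placeholder here) */
        uint32_t vendor_data; /* meaning is entirely up to the vendor */
    };

    int main(void) {
        /* Hypothetical example: a vendor-specific "fabric hello" opcode. */
        struct vdm msg = {
            .msg_code    = 0x7F,        /* Type 1 VDMs are silently ignored by
                                           devices that don't understand them */
            .vendor_id   = 0x1234,      /* placeholder vendor ID */
            .vendor_data = 0x00000001,  /* hypothetical opcode */
        };

        printf("VDM code=0x%02X vendor=0x%04X data=0x%08X\n",
               (unsigned)msg.msg_code, (unsigned)msg.vendor_id,
               (unsigned)msg.vendor_data);
        return 0;
    }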

So, a PCIe-based fabric can achieve InfiniBand-like performance, but does it add anything to data centers and cloud computing environments? PCIe delivers a range of advantages that accrue from its comparative strengths.

First is its ability to scale linearly for different bandwidth requirements -- from x1 connections on PC motherboards, to x2 connections to high-speed storage, to x4 and x8 connections on backplanes, and up to x16 for graphics applications.
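
A quick back-of-the-envelope calculation shows how that scaling works. The C sketch below (an illustration, not part of the original article) uses the PCIe Gen 3 parameters of 8GT/s per lane with 128b/130b encoding and ignores protocol overhead, so delivered throughput is somewhat lower than these raw numbers.

    /* Raw per-direction bandwidth of a PCIe Gen 3 link at common widths.
     * Assumes 8GT/s per lane with 128b/130b encoding and ignores protocol
     * overhead (TLP headers, flow control). */
    #include <stdio.h>

    int main(void) {
        const double gen3_gtps = 8.0;            /* transfers per second per lane, GT/s */
        const double encoding  = 128.0 / 130.0;  /* 128b/130b line coding efficiency */
        const int widths[] = { 1, 2, 4, 8, 16 };

        for (size_t i = 0; i < sizeof widths / sizeof widths[0]; i++) {
            double gbps = gen3_gtps * encoding * widths[i];  /* Gb/s, one direction */
            printf("x%-2d link: %5.1f Gb/s (about %.1f GB/s) per direction\n",
                   widths[i], gbps, gbps / 8.0);
        }
        return 0;
    }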

Another key advantage of PCIe is its simple, low-overhead protocol. Because PCIe builds on the legacy architecture of PCI, migrating to newer, faster connections has been quick and easy. InfiniBand has achieved very low latency with a relatively complex protocol, through special-purpose hardware and software drivers tuned over many years; PCIe starts out with low latency and simplicity by design, giving it an advantage in adaptability.

But the most powerful advantage for PCIe is that it's already a near-universal interconnection technology.

Thanks to its technical elegance and simplicity, and its long history of widespread adoption, almost every device in a cloud or data center system has at least one -- and often many -- PCIe connections. This includes CPUs, storage devices, I/O devices and special-purpose bridges. In fact, even InfiniBand host channel adapters connect to their host systems through a PCIe port.

And it is this fact that makes a PCIe fabric so much less expensive and lower power. With InfiniBand, a Host Channel Adapter (HCA) is required to translate from PCIe -- where the traffic starts at the host -- to the InfiniBand protocol, and the translation must be reversed at the other end. While that makes sense for connections traveling over relatively long distances, it is significant overkill for a rack, or a small cluster of racks, in a data center.

With PCIe, all those costly HCAs (in the hundreds of dollars each) are replaced with a simple re-timer device that ensures a clean signal and can cost as little as a few dollars in high volume. For a 36-slot rack, or a mini-cluster that can span hundreds of nodes, the math easily shows the cost differential.

The power advantage of PCIe over InfiniBand is similar, and also flows directly from the ability to use a simple re-timer rather than an HCA. A single re-timer device will dissipate less than 1W, while an InfiniBand HCA is in the 5-7W range. And, as with the cost aspect, this power difference adds up to a substantial advantage for PCIe in a data-center rack application.
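
A rough, illustrative comparison for a 36-node rack makes both points at once. The per-node figures in the C sketch below are assumptions drawn from the ballparks above -- an HCA at a few hundred dollars and 5-7W versus a re-timer at a few dollars and under 1W -- not quoted prices or measured power numbers.

    /* Back-of-the-envelope cost and power totals for a 36-node rack, using
     * illustrative per-node figures: an InfiniBand HCA at roughly $500 and 6W,
     * versus a PCIe re-timer at roughly $5 and 0.8W. These are assumptions. */
    #include <stdio.h>

    int main(void) {
        const int nodes = 36;

        const double hca_cost     = 500.0, hca_watts     = 6.0;  /* assumed per node */
        const double retimer_cost =   5.0, retimer_watts = 0.8;  /* assumed per node */

        printf("InfiniBand HCAs:  $%7.0f total, %6.1f W total\n",
               nodes * hca_cost, nodes * hca_watts);
        printf("PCIe re-timers:   $%7.0f total, %6.1f W total\n",
               nodes * retimer_cost, nodes * retimer_watts);
        return 0;
    }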

But while power and cost can be significant motivators, one of the most powerful advantages of PCIe for data centers is its ability to act as a truly converged rack-level fabric. The power and cost comparisons above assume that an InfiniBand backplane is simply replaced with a PCIe version. In practice, an InfiniBand backplane typically also carries an Ethernet switch (to enable communication between racks), so multiple fabrics coexist on the same backplane. PCIe can act as the only fabric, since it can effectively handle both the communications and the storage traffic.

Some newer InfiniBand implementations can combine with Ethernet in a single top-of-rack switch. While this provides a degree of convergence, Ethernet network interface cards (NICs) or InfiniBand HCAs are still needed on each node, and separate interconnect protocols still run from the nodes. This is an improvement over the traditional approach, but it doesn't address the node-level power and cost advantages of PCIe, which is where the majority of the savings come from.

One trend that makes PCIe an especially powerful fabric is that enterprise-level solid-state drives (SSDs) are converging on PCIe as their primary connection (see Figure 2). This allows a PCIe-based fabric to connect directly to SSD arrays, reducing latency -- which is crucial in SSD applications -- and increasing performance, while reducing part count and power. No other mainstream, general-purpose interconnect has this capability.

In summary, PCIe has grown from its original use as a high-speed, board-level graphics interconnect to a popular general-purpose solution. This has enabled it to penetrate every market segment: enterprise, servers, storage, embedded and consumer. And with some simple extensions it is a highly attractive solution for high-speed clustering in data center fabrics. It can offer performance comparable to a QDR InfiniBand solution, with a much lower cost and power envelope. InfiniBand technology has its place in those applications that require high data transfer rates without regard to power or cost, and those willing to pay a premium for this requirement will continue to use the technology. But for cloud and data center applications that need and value the three P's -- performance, power and price -- PCIe is the superior option.

Larry Chisvin is vice president of strategic initiatives and Krishna Mallampati is senior director of marketing at PLX Technology Inc.