As a former designer of big iron, scale-up computing remains dear to my heart. Some specialized applications still perform best on large, shared memory, tightly-coupled SMP servers. Even for workloads that need only fractions of a large server, the opportunity to share "headroom" allows multiple applications to reside on the same large SMP, typically via virtualization partitioning. These systems can run at a much higher utilization percentage by pooling the resources that are "reserved for peaks" than if each workload were deployed on separate, individual servers. That flexible pool of hardware resources is often just as valuable as the ability to host a large, monolithic application.
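The headroom argument can be made concrete with a little arithmetic. What follows is only a sketch: the workload figures are invented for illustration, and the pooled case assumes that peaks rarely coincide, so the shared server needs burst capacity for only the single largest spike (the `concurrent_peaks` parameter is my own hypothetical knob for relaxing that assumption).

```python
# Hypothetical workloads as (average demand, peak demand) in CPU cores.
# These numbers are illustrative assumptions, not measurements.
WORKLOADS = [(4, 16), (6, 16), (2, 8), (8, 24)]


def dedicated_utilization(workloads):
    """Average utilization when each workload gets its own server,
    sized for that workload's own peak."""
    total_avg = sum(avg for avg, _ in workloads)
    total_capacity = sum(peak for _, peak in workloads)
    return total_avg / total_capacity


def pooled_utilization(workloads, concurrent_peaks=1):
    """Average utilization on one shared server sized for the sum of
    the averages plus the largest `concurrent_peaks` bursts, where a
    burst is (peak - average). Assumes the other peaks do not overlap."""
    total_avg = sum(avg for avg, _ in workloads)
    bursts = sorted((peak - avg for avg, peak in workloads), reverse=True)
    capacity = total_avg + sum(bursts[:concurrent_peaks])
    return total_avg / capacity


if __name__ == "__main__":
    print(f"dedicated servers: {dedicated_utilization(WORKLOADS):.0%}")
    print(f"pooled server:     {pooled_utilization(WORKLOADS):.0%}")
```

Under these made-up numbers the pooled server runs noticeably busier than the separate boxes. If more peaks overlap, the advantage shrinks, and when every workload peaks at once the two cases converge.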
Unfortunately, the sophisticated scale-up designs created by innovative engineers must amortize their development expense across a relatively small volume of servers, compared to the ubiquitous industry-standard x86 platforms shipping as near-commodities. While I remain convinced that large, scale-up, shared memory servers are crucial for many computing environments, I acknowledge that x86 implementations offer compelling price/performance.
Is there a way to combine the best of both attributes? That is, can large shared memory servers be created from commodity-priced, industry-standard building blocks? Can some "glue," be it hardware or software, allow a collection of small, inexpensive servers to appear and perform as if it were a single large, shared memory system? This week, two firms, 3Leaf Systems and ScaleMP, have introduced alternatives they each believe will address significant portions of the market for economical scale-up systems by coupling standard x86 servers.
Both firms unveiled their solutions just before SC09, November's annual USA venue showcasing innovative high performance computing solutions. Why SC09 and not a trade show targeting the market for broader commercial computing? Attendees of the supercomputing conference usually have very demanding compute requirements, often needing very large servers, but they are also typically budget-constrained. The promise of a large, shared memory server at commodity price points understandably excites these users.
Of course, a virtual server that merely "appears like a very large shared memory SMP" will not perform identically to an actual large, hardware-optimized SMP server. Therein lies the difference between the 3Leaf and ScaleMP solutions. The 3Leaf platform employs a 3Leaf-designed ASIC that melds together the memory space of multiple x86 motherboards (currently AMD Opteron, with Xeon support planned for the next iteration of Intel QuickPath) to deliver low-latency shared memory reminiscent of SMPs. ScaleMP, on the other hand, forgoes the expense of unique hardware and accomplishes coherent shared memory via software. Admittedly, 3Leaf's hardware memory coherence is faster than ScaleMP's software mechanism, but ScaleMP claims its pre-fetch and local caching algorithms deliver the desired appearance of low latency without the expense of specialized hardware.
High Performance Computing (HPC) users are acutely aware that the road promising affordable high performance is littered with firms who dreamed of delivering champagne performance on beer budgets. NUMA, or non-uniform memory access, is at the heart of most "affordable" scale-up designs. Flat, uniform memory access designs prove costly as systems grow larger and larger. The key is to make NUMA's apparent non-uniformity seem as small as possible. KSR in the early 90s, Convex, SGI's Distributed Shared Memory, and other Scalable Coherent Interface (SCI) users (such as Sequent and Data General) all promised the performance of sophisticated, optimized SMPs at commodity server prices. Fairly recently, Virtual Iron attempted a hypervisor that could join multiple smaller servers into a virtual larger SMP, but it eventually abandoned that effort to focus only on applying virtualization for partitioning (the company has since been absorbed into Oracle).
Clearly, a virtualized NUMA SMP will not adequately substitute for a real SMP for all environments. The key question is which workloads will be satisfied by ScaleMP's software virtualization, which will benefit from 3Leaf's hardware memory coherence, and which applications demand the more uniform memory access of true large SMP implementations. All current operating systems have evolved to understand how to efficiently keep processing and associated memory physically close to each other on NUMA platforms. However, some workloads have less sensitivity to NUMA than others. Targeting an announcement at the supercomputing conference recognizes that many HPC applications depend less on the shared memory updating that plagued early NUMA attempts to address general purpose commercial computing. The inherent parallelization of HPC applications designed to exploit large infrastructures makes them better prepared to run well on 3Leaf and ScaleMP platforms. Optimized large SMP hardware designs will continue to offer the best performance for applications that exploit multiple compute engines and large shared memory. However, other application environments may benefit from hardware and/or software mechanisms that lash together lower cost commodity servers to create the virtual appearance of a large system, yet with very attractive pricing.
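One way to reason about NUMA sensitivity is a simple weighted-average model of memory access time. The sketch below is a back-of-the-envelope illustration only: the latency constants and the workload locality fractions are hypothetical values I picked for the example, not figures published by 3Leaf, ScaleMP, or any SMP vendor, and real systems add caching, prefetching, and contention effects that this model ignores.

```python
def average_access_ns(local_ns, remote_ns, local_fraction):
    """Weighted-average memory access time when `local_fraction` of
    accesses hit local memory and the rest go to a remote node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


def slowdown_vs_uniform(local_ns, remote_ns, local_fraction):
    """How many times slower the average access is than on an idealized
    machine where every access costs `local_ns`."""
    return average_access_ns(local_ns, remote_ns, local_fraction) / local_ns


# Hypothetical latencies, chosen only to make the arithmetic visible.
LOCAL_NS = 100.0
REMOTE_NS = 1000.0

# Hypothetical workloads, ordered from most to least NUMA-friendly.
EXAMPLES = [
    ("well-partitioned HPC job", 0.99),
    ("moderately shared data", 0.90),
    ("heavily shared updates", 0.50),
]

if __name__ == "__main__":
    for name, fraction in EXAMPLES:
        factor = slowdown_vs_uniform(LOCAL_NS, REMOTE_NS, fraction)
        print(f"{name}: {factor:.2f}x the uniform-memory access time")
```

The model makes the point of the paragraph above in numbers: a workload that keeps nearly all of its accesses local barely notices the remote penalty, while one that constantly touches shared data pays most of it. It also shows the two levers a vendor has, which are lowering the remote latency itself or raising the effective local fraction.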
We believe there will be growing customer demand for virtualized SMP solutions like 3Leaf and ScaleMP, which offer lower cost, even with the tradeoff of potentially lower performance. 3Leaf Systems' hardware ASIC is impressive in its use of InfiniBand switching as a substitute for custom SMP backplanes. The low latencies of today's InfiniBand implementations are likely to provide an adequate SMP appearance for many applications. ScaleMP's software-only solution may not match 3Leaf's low latencies, but it is likely to attract a number of customers who want access to large memory at a very economical cost. We predict that 3Leaf and ScaleMP will see first adoption among HPC users, but each will eventually carve out a niche among commercial application users. The optimized design of monolithic large SMPs will still be needed by many customers, but virtualized SMPs, aggregated from low-cost commodity servers, will offer an attractive price/performance alternative for a variety of workloads.
