New Green Server Competitor Emerging?

There has been a decent amount of chatter on all the media channels over some of Facebook’s efforts to move forward with innovative technology. The other day I wrote about its new “green” European data center based in Sweden. In addition, at the recent Open Compute Project Summit, Facebook announced its intention to contribute to greater standardization at the system level for data center server and hardware equipment. For some, minimizing heat and energy consumption is as high a priority as performance.

A potential competitor to Facebook is emerging in HP, which is launching a new effort, Project Moonshot. HP intends to use this program to develop:

…a new server development platform, “customer discovery lab” and partner ecosystem brought together with the purpose of reducing the complexity and energy consumption of environments that have thousands of servers along with all the network, storage, power, cooling and management technologies needed to support them.

But Facebook is a newcomer to the world of enterprise IT, and data centers are not its primary focus. So while HP may butt heads with Facebook, HP’s real target appears to be Intel.


Graph 500, Green500, HPCC, and SPEC: Alternative benchmarks for high-performance computing

Supercomputers have become a vital part of almost any innovative project undertaken by collaborative teams in the developed world. Server clusters can be found anywhere from the offices of small businesses to compartments in U.S. Navy submarines.

So which are the fastest supercomputers on earth? The usual measurement for high-performance computer (HPC) clusters is the TOP500 ranking, which is based on the High Performance LINPACK (HPL) benchmark. LINPACK stands for “linear equations software package”, and the benchmark measures how fast a supercomputer can solve a system of linear equations. The results are reported in units of billions of floating point operations per second (GFLOPS).
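To make the measurement concrete, here is a minimal sketch of a LINPACK-style timing run in plain Python. It is not the HPL code used for the TOP500, only an illustration of the idea: solve a dense random system, time it, and divide a nominal operation count by the elapsed time. The function names, the problem size, and the nominal count of 2/3·n³ + 2·n² floating-point operations (the figure conventionally quoted for LINPACK) are assumptions of this sketch.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (modifies a and b)."""
    n = len(b)
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_gflops(n, seed=0):
    """Time one dense solve of size n; return (GFLOPS, largest residual entry)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    a0 = [row[:] for row in a]  # keep originals to check the answer afterwards
    b0 = b[:]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    nominal_flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # assumed nominal count
    residual = max(
        abs(sum(a0[i][j] * x[j] for j in range(n)) - b0[i]) for i in range(n)
    )
    return nominal_flops / elapsed / 1e9, residual


gflops, residual = linpack_style_gflops(80)
print(f"n=80: {gflops:.4f} GFLOPS, max residual {residual:.2e}")
```

An interpreted, single-threaded loop like this will report a tiny fraction of what tuned numerical libraries achieve on the same hardware, which is a reminder that a benchmark score reflects the software stack as well as the machine.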

The High Performance LINPACK metric has long been the established standard for measuring computing performance, with intense competition worldwide for the lead spot in the TOP500. But some scientists criticize the TOP500 ranking for giving an incomplete picture of performance. The risk, as Mark Anderson describes in an article in IEEE Spectrum magazine, is that it motivates computer hardware manufacturers to develop less effective technologies.


GPU workstation sale (and other news)

Wow, this is the first update in a while on the ICC blog. We have been working on several web-based projects that have been keeping us busy, and I would like to highlight some of them (and other news) in this post.

Website and product news

First of all, as you may have noticed, our HPC by ICC section of the site launched earlier this month. It describes ICC solutions for high-performance computing clusters. There is an outline of the Platform Computing HPC Suite, an industry-leading cluster management software package, as well as a diagram that explains common cluster components. Our goal is to open up high-performance computing to industries that have been slow to adopt it, even though HPC may save them a lot of money in the long term and help them stay competitive. If you think you could benefit from an upgrade to your IT infrastructure, feel free to contact us for a free consultation.

At SC10, Supermicro unveiled its GPU SuperBlade server modules (SBI-7126TG), which will offer perhaps the highest-density CPU-GPU computing power available on the market. A 42U rack, composed of six 7U blade enclosures – each with ten GPU SuperBlade modules – can carry 120 CPUs and 120 GPUs. For comparison, a rack with 42 standard dual-processor 1U servers would have 84 CPUs and no GPUs. We will have these high-density server products available on our site soon after they are released. Read more about them in the Supermicro press release or on this podcast interview with Tau Leng, GM of Supermicro, by insideHPC.


China builds the world’s fastest supercomputer

Photo of Tianhe-1A supercomputer courtesy of NVIDIA.com

After almost a year-long run, the Jaguar supercomputer at Oak Ridge National Laboratory in Tennessee has relinquished its title as the world’s fastest computer. This honor now belongs to the Tianhe-1A supercomputer located in the National Supercomputing Center in Tianjin, China.

Tianhe-1A is expected to officially become the leader of the TOP500.org list of the world’s fastest supercomputers sometime in mid-November. It clocked an impressive 2.507 petaflops on the LINPACK benchmark, which is about the sum of the performance of supercomputers #6 to #10 on the TOP500 list, according to insideHPC. Jaguar, now the second most powerful supercomputer in the world, achieved about 1.75 petaflops on the same benchmark.

Although Tianhe-1A may re-ignite the anxiety in the West that usually accompanies news of great achievements from East Asia, this is not the first time that America or Europe has lost the #1 place on the TOP500. In 2002, Japan captured the top spot with its Earth Simulator (ES) supercomputer, which remained the world’s fastest until September of 2004, when IBM’s Blue Gene/L, built for Lawrence Livermore National Laboratory, surpassed it. The quasi-geopolitical competition for computing power is far from over, but China’s ascendancy is actually one of the less interesting things about Tianhe-1A.

Tianhe-1A can potentially usher in a new era in “personal supercomputing”. It is the first leader of the TOP500 to make extensive use of GPUs (Graphics Processing Units). In fact, it comprises 7,168 NVIDIA Tesla M2050 GPUs and 14,336 Intel CPUs. In comparison, Jaguar has 37,376 AMD CPUs and no GPUs.


HPC and the life sciences

This week, a team from our company visited a large laboratory located in the Chicago area. IT representatives there told us how a major focus for them has been migrating their computing resources from a model of individual workgroups using separate clusters to a shared private cloud that all research teams in the facility can access for running their jobs. This shift to private clouds for getting the most out of dedicated clusters is a hot topic of conversation in the HPC world.

HPC in the Cloud recently published an article responding to a case study written by Platform Computing about the implementation of a private cloud at the Harvard Medical School. Both are worth a read if you are interested in the challenges encountered by small- and medium-sized life sciences organizations when they try to adopt HPC clusters.

HPC holds much promise for organizations such as the Harvard Medical School. With middleware such as Platform Computing (we are biased, I must admit, since this is what HPC clusters by ICC deploy as well) it is getting easier to operate an HPC cluster with hosts running different operating systems and applications. It used to be that this multiplicity of software on the same cluster would cause extensive compatibility and usability problems, but not so much anymore. End-users in the life sciences (such as medical researchers) are benefiting from computing applications that are productive and easy to use.

So Harvard Medical School, as the HPC in the Cloud article describes, has migrated from an inefficient computing model of unshared individual computers scattered across various laboratories to a centralized private cloud that can be accessed by any of those users and managed as one unit. Simplifying maintenance while maximizing accessibility to HPC resources by medical school staff is most likely going to save money and increase the pace of innovation in the long run.

While this is a hopeful case study that sheds light on how other organizations can pool their computing resources to great effect, challenges remain for spreading this model to other small- and medium-size laboratories and businesses. For one, private medical companies are heavily regulated by the government and their IT infrastructure has to incorporate many time-consuming applications to store detailed records.

HPC is becoming more affordable and easier to use, but software has to continue evolving to accommodate the particular context of each industry. Only then will the life sciences (not to mention other markets) have a truly turn-key HPC solution that can benefit labs and private companies of every size.

Microsoft HPC Server 2008 R2 – cool new features

Yesterday, at an HPC conference for the financial industry, Microsoft announced an update (R2) for Windows HPC Server 2008. Aside from offering new features that will take advantage of innovations in cloud computing, Microsoft claims that this update will make HPC Server 2008 less expensive to operate than Linux.

The reasoning, according to Microsoft, is that Linux requires a great deal of expensive expertise to use, while the Windows interface is familiar to just about anybody in computing. Moreover, as Computer Reseller News reports, a Microsoft-funded study found that Windows HPC Server is 32-51% less expensive in the long term than a Linux HPC solution.

There are several other features about which the new update of Microsoft HPC Server 2008 can boast. First, number crunching large data sets in Excel (a popular method with the financial computation industry) has become a lot more efficient.

This is partly due to a second major innovation: the ability to outsource computing to more powerful clusters from one’s Windows workstation. A series of calculations that would have taken two hours to complete before, writes HPC Wire, can now take less than two minutes.

Finally, the new update also takes advantage of cloud computing from the other direction: not one powerful HPC cluster assisting individual workstations, but rather many PCs volunteering their time to help compute a few data-intensive calculations.

This is the same idea as SETI@home or Einstein@home: any PC user can allow their computer to be used as part of a cloud to perform others’ calculations when it is not needed by the user herself.

With these impressive additions to Windows HPC Server 2008, Microsoft seeks to chip away at Linux’s lead in the HPC market.

Five nines reliability standard

Uptime. It’s all about uptime; ask any sysadmin.

Components fail and networks go down. Power goes out. Users download viruses on to systems. Apophis could go to eleven on the Torino impact hazard scale, smack right into Euro Disney, and the call from marketing would be to inquire when the servers are going to be back up.

The Black Swan has a nasty habit of smacking us around, so we come up with contingencies and try to engineer redundancy. Redundancy has costs but the question remains, how critical is the underlying system? Can you afford, in the grand scheme of things, for that component to fail?

RAID, quad-port LAN, Twin blades, dual UPS, remote backups. Every layer adds complexity and cost but if reliability is the goal for mission critical applications, then those complexities have value.

Achieving five nines reliability, for most, is impractical and cost prohibitive. Vendor claims of five nines do not distinguish between availability and reliability. Availability means the total amount of time the product was up. Reliability means the number of times a product went down. A single long power outage means your system counts as reliable (it went down only once) but unavailable.

The maximum component downtime, to be considered five nines reliable, cannot exceed about 5 minutes 15 seconds a year (0.001% of the year). Five nines availability requires the problem be resolved in less than five seconds. This is where you try to justify the expense of backups in your budget meetings.
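The yearly downtime budget follows directly from the percentage. As a rough sketch (the function name is illustrative, and a 365-day year is assumed), the allowance for each level of "nines" can be computed like this:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, assuming a non-leap year


def max_downtime_seconds(nines):
    """Yearly downtime allowed at an uptime of `nines` nines (5 means 99.999%)."""
    unavailability = 10 ** (-nines)  # e.g. 0.00001 for five nines
    return SECONDS_PER_YEAR * unavailability


for nines in range(2, 6):
    seconds = max_downtime_seconds(nines)
    minutes, rest = divmod(seconds, 60)
    print(f"{nines} nines: {int(minutes)} min {rest:.1f} s of downtime per year")
```

Each additional nine cuts the allowance by a factor of ten, which is why the step from four nines (under an hour a year) to five nines (a little over five minutes) is so expensive to engineer.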

What does downtime cost? Depends on what alternatives are available to your customers in the event of single component failure.

How much is too much? That’s what the sysadmin is for.

PUE can be a misleading energy-efficiency standard

The Power Usage Effectiveness (PUE) standard is one of the leading benchmarks for measuring energy efficiency in a data center. But there are some situations where an energy-efficient change to a data center will actually register as more wasteful by the PUE. For this reason, it’s important to know the gaps in the PUE metric.

Winston Saunders has described just such a case on the Intel Server Room Blog in an article titled, “Turning the Tide: CIO’s dilemma with PUE”.

The dilemma, according to Saunders, is this: “If improving the efficiency of your data center is an important goal, should you incentive [sic] the organization to improve PUE?”

Taken as a goal in and of itself, the PUE may lead data center operators astray. Take this example discussed by Saunders. A data center is running old servers with processors from 2006 (e.g. Intel Xeon 5160). These processors consume more power and perform fewer calculations than processors that came out in 2010 (e.g. Intel Xeon 5670).

Here’s the problem: if that data center chose to upgrade its old and inefficient servers to newer systems running more efficient and higher-performing processors, the PUE metric of that data center would actually increase (meaning that the data center would be less efficient according to the PUE).

This happens because PUE is the ratio of the total energy a data center consumes to the energy consumed by its IT equipment alone; everything else (lighting, cooling the servers, and other infrastructure) counts as overhead. So the larger the share of total energy that powers the servers rather than the other systems, the lower (better) the PUE score and, in theory, the more efficient the data center.

Back to our example of the data center that upgraded its old servers. This data center’s new servers draw less power. This means that the IT portion of total energy consumption in the data center decreases, which means the PUE will register this hardware upgrade as less energy efficient. Of course, the truth of the matter is the exact opposite.
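A small numeric sketch makes the paradox easy to see. PUE is total facility power divided by IT equipment power. The kilowatt figures below are hypothetical, not taken from Saunders's article, and the sketch assumes the non-IT overhead stays fixed after the upgrade (in practice cooling load would probably fall somewhat too).

```python
def pue(it_kw, overhead_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return (it_kw + overhead_kw) / it_kw


OVERHEAD_KW = 500.0  # hypothetical lighting, cooling, and other infrastructure load
OLD_IT_KW = 1000.0   # hypothetical draw of the 2006-era servers
NEW_IT_KW = 700.0    # hypothetical draw after the upgrade

for label, it_kw in (("before upgrade", OLD_IT_KW), ("after upgrade", NEW_IT_KW)):
    total_kw = it_kw + OVERHEAD_KW
    print(f"{label}: total {total_kw:.0f} kW, PUE {pue(it_kw, OVERHEAD_KW):.2f}")
```

Under these assumptions total consumption falls from 1,500 kW to 1,200 kW, yet the PUE rises from 1.50 to about 1.71. The facility uses less energy and does more work, but the metric gets worse.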

Saunders illustrates the importance of considering PUE in context very well in his article. Anyone responsible for controlling data center costs should consider this (and other) gaps in PUE when making truly energy-efficient decisions.

Extending Moore’s Law

According to Science News, researchers at Rice University have created the first two-terminal, pure silicon memory chips, easily adaptable to nanoelectronic manufacturing.

Researchers discovered that silicon oxide could replace carbon in the process. When an electric charge is sent through silicon oxide (an insulator) between semiconducting sheets of polycrystalline silicon, it forms a conductive pathway as small as 5 nanometers (billionths of a meter) wide. This process creates a two-terminal resistive switch, far smaller than current circuits in computer architectures.

By continuously breaking and connecting these nano strips, one creates robust and reliable memory bits.

“The beauty of it is its simplicity,” said Professor James Tour. That, he said, will be key to the technology’s scalability. Silicon oxide switches or memory locations require only two terminals, not three (as in flash memory), because the physical process doesn’t require the device to hold a charge.

The promise of this technology lies in its implications for chip manufacturers and for the continuation of Moore’s Law.

Cooling Servers Under Oil

Photo courtesy of Matt Howard (http://www.flickr.com/people/35734278@N05) under the CC Attribution-Share Alike 2.0 Generic license

Michael Feldman of HPCwire has written an interesting article about a startup company called Green Revolution Cooling (GRC) that has a competitive and innovative product in the server cooling market.

GRC has developed a server rack that lies horizontally on the floor and is filled with an oil-based cooling fluid. Any server that is built according to the standard form factor can slide into the rack and be cooled by the oil bath.

Servers stored in a liquid solution? For this stroke of innovation (actually, the idea is not entirely new, as Feldman notes, but this is its most recent iteration) GRC has been selected as one of the “Disruptive Technologies of the Year” for SC ’09 and SC ’10.

So how can servers live underwater (actually, under oil, since oil does not conduct electricity) safely? GRC can take any standard server (including blades and GPUs), remove its fans, and seal it with a special coating to make it safe for oil immersion. According to GRC, 250,000+ server hours of testing their racks has not revealed any malfunctions due to the cooling system.

So what are the advantages of this new kind of cooling system? As Feldman states: “The solution is advertised to reduce the cooling energy by 90 percent and cut overall power consumption in the datacenter by up to 45 percent. The pitch is that a single 10kW server rack at 8 cents per kWh will save over $5,000 per year on energy costs alone.”
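The quoted pitch can be sanity-checked with simple arithmetic. The sketch below uses the figures from the quote (a 10 kW rack, 8 cents per kWh, up to 45 percent overall reduction) plus one assumption that is not in the article: a hypothetical PUE of 2.0 for the conventionally cooled facility, so that each kilowatt of server load carries another kilowatt of cooling and other overhead.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the rack runs all year


def annual_energy_cost(load_kw, price_per_kwh):
    """Yearly cost in dollars of a constant electrical load."""
    return load_kw * HOURS_PER_YEAR * price_per_kwh


RACK_KW = 10.0        # from the quote
PRICE_PER_KWH = 0.08  # from the quote, in dollars
OVERALL_CUT = 0.45    # "up to 45 percent", so this is the best case
ASSUMED_PUE = 2.0     # hypothetical; not stated in the article

it_cost = annual_energy_cost(RACK_KW, PRICE_PER_KWH)
facility_cost = it_cost * ASSUMED_PUE  # servers plus cooling and other overhead
best_case_saving = facility_cost * OVERALL_CUT

print(f"Rack IT load alone:       ${it_cost:,.0f} per year")
print(f"With assumed overhead:    ${facility_cost:,.0f} per year")
print(f"Best-case saving at 45%:  ${best_case_saving:,.0f} per year")
```

The rack's own draw costs about $7,008 a year, so a saving of over $5,000 is only plausible once cooling overhead is counted. Under the assumed PUE of 2.0 the best case comes to roughly $6,300, which is consistent with the advertised figure; a more efficient baseline facility would save less.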

Although at least one large supercomputing location (the Texas Advanced Computing Center) has begun using these server racks with happy results, GRC is having a difficult time partnering with server manufacturers to cover their servers under warranty if they are used in a GRC rack.

There are various liquid-cooling solutions on the market, but GRC’s is one of the most creative and cost-effective. It is indeed disruptive technology with lots of cost-savings potential. If GRC can overcome the stigma in the market against dunking servers into liquid, the company could become a key player in the cooling industry.