Intel MIC and its Comprehensive Networking Strategy

Last week we talked about the upcoming release of Intel’s Xeon E5 processor family. This week, we have some even more important announcements regarding Intel MIC and the strategic direction Intel is taking in high performance computing.


Die shot of 'Aubrey Isle,' the silicon chip included in the Intel MIC 'Knights Ferry' development platform


How a SAS switch can improve storage management

Last week, LSI announced the release of “the industry’s first 6Gb/s SAS switch”. The switch gives cluster managers new opportunities to improve the architecture of their storage systems.

The value of the SAS switch lies in transforming a cluster from a NAS (network-attached storage) structure into a DAS (direct-attached storage) structure. With DAS, storage data no longer has to be converted from the SAS protocol to a network protocol (Ethernet or InfiniBand) and back to SAS. The switch eliminates the bottleneck of that middle step, because all data I/O happens over SAS alone. This is especially useful for clusters that have, or plan to upgrade to, 6Gb/s RAID controllers: their throughput will be higher when they are connected to a 6Gb/s switch rather than to a network.

Another advantage of moving a cluster to a DAS configuration is that the RAID controllers migrate from the storage nodes to the compute nodes. In a NAS cluster, each storage node typically has its own RAID controller, which communicates with the compute nodes through a network. In a DAS cluster with a SAS switch, the storage nodes are JBODs (“Just A Bunch Of Drives”, essentially drive enclosures with no other computing components in their chassis), all of which are accessed by RAID controllers located directly inside the compute nodes.

This configuration separates the RAID controllers from the storage drives and consolidates them in the compute nodes for simpler management and improved performance. The cluster administrator can now decide how many RAID controllers may access which drives, in any quantity, across separate JBOD-based storage. The mechanism that allows this kind of interaction is known as SAS zoning and is illustrated in the diagram below:


Diagram showing the DAS configuration of a cluster with a SAS switch, RAID controllers located on the compute nodes, and SAS zoning of the JBOD storage nodes.
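To make the zoning idea concrete, here is a toy model, not LSI's software or configuration interface. In broad strokes, SAS zoning assigns devices to zone groups and uses a permission table to decide which groups may talk through the switch. The class name, device names and zone-group numbers below are hypothetical, and real zoning has rules this sketch ignores.

```python
from __future__ import annotations


class ZonedSasSwitch:
    """Toy model of SAS zoning (hypothetical class, not an LSI API)."""

    def __init__(self) -> None:
        self.zone_of: dict[str, int] = {}            # device name -> zone group
        self.permitted: set[frozenset[int]] = set()  # zone-group pairs allowed to talk

    def assign(self, device: str, zone_group: int) -> None:
        self.zone_of[device] = zone_group

    def permit(self, group_a: int, group_b: int) -> None:
        # Permissions are symmetric, so store each pair as an unordered set.
        self.permitted.add(frozenset((group_a, group_b)))

    def can_access(self, initiator: str, target: str) -> bool:
        pair = frozenset((self.zone_of[initiator], self.zone_of[target]))
        return pair in self.permitted


switch = ZonedSasSwitch()
switch.assign("node1-raid", 10)          # RAID controller in compute node 1
switch.assign("node2-raid", 11)          # RAID controller in compute node 2
for drive in ("jbodA-d0", "jbodA-d1"):
    switch.assign(drive, 20)             # drives in JBOD A
switch.assign("jbodB-d0", 21)            # a drive in JBOD B
switch.permit(10, 20)                    # node 1 may use JBOD A
switch.permit(11, 21)                    # node 2 may use JBOD B

print(switch.can_access("node1-raid", "jbodA-d0"))  # True
print(switch.can_access("node1-raid", "jbodB-d0"))  # False
```

Changing which compute node sees which JBOD then becomes a matter of editing the permission table rather than recabling.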

For more detailed information about the various uses of the LSI 6Gb/s SAS switch, read their white paper about this product. As storage technology continues to evolve, new solutions such as a DAS cluster configuration with a 6Gb/s SAS switch are helping overcome the various I/O bottlenecks that hamper computing performance.

Einstein@Home volunteers discover rare star

When I was a freshman in college, I helped the professor of my introductory astronomy class conduct some of his research. The job wasn’t hard: I had to look at digital maps of the sky and try to find a particular type of rare star. Open a map segment, click on the pixels around a light source (a star), evaluate whether the pixels have sufficient contrast, repeat. I never found the kind of star my professor was searching for.

Looking back on this experience, I realize my job could easily have been done by a computer program. A program would probably have been orders of magnitude more efficient than I was at analyzing the thousands of pixels on the map, and my professor wouldn’t have had to pay it $8.50/hr. Of course, as a college freshman, I was grateful for the research experience and the cash.
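The click-and-compare routine described above is easy to automate. The sketch below is only an illustration: the 3x3 neighbourhood, the mean-background estimate and the `min_contrast` threshold are assumptions, not the method the actual research project used.

```python
def find_bright_sources(image: list[list[float]], min_contrast: float) -> list[tuple[int, int]]:
    """Return (row, col) of pixels brighter than the mean of their 8 neighbours by min_contrast."""
    rows, cols = len(image), len(image[0])
    hits = []
    for r in range(1, rows - 1):          # skip the border so every pixel has 8 neighbours
        for c in range(1, cols - 1):
            neighbours = [image[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            background = sum(neighbours) / len(neighbours)
            if image[r][c] - background >= min_contrast:
                hits.append((r, c))
    return hits


# A tiny 5x5 "sky map" with one bright source in the middle.
sky = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 9, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
print(find_bright_sources(sky, min_contrast=5))  # [(2, 2)]
```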

These days, universities are becoming more sophisticated in the way they let amateurs help with research. A project called Einstein@Home recently had a breakthrough when its volunteers discovered a new pulsar, the first accomplishment of its kind.

As the National Science Foundation’s press release describes, Einstein@Home is a collaborative project that lets lay people donate the computing power of their PCs and laptops to search the sky for celestial objects that have not yet been discovered. Over a quarter of a million volunteers from almost every country on earth participate in this venture, and now it has paid off.

Some of the volunteers’ computers recently unearthed a rare star that had not been documented before. This type of star is very important to researchers studying Einstein’s general theory of relativity, one of the most complicated theories in science. For such a star to form, many conditions must be met. As the press release explains:

When two massive stars are born close together from the same cloud of gas, they can form a binary system and orbit each other from birth. If those two stars are at least a few times as massive as our sun, their lives will both end in supernova explosions. The more massive star explodes first, leaving behind a neutron star. If the explosion does not kick the second star away, the binary system survives. The neutron star can now be visible as a radio pulsar, and it slowly loses energy and spins down. Later, the second star can swell up, allowing the neutron star to suck up its matter. The matter falling onto the neutron star spins it up and reduces its magnetic field. This is called “recycling” because it returns the neutron star to a quickly-spinning state. Finally, the second star also explodes in a supernova, producing another neutron star. If this second explosion also fails to disrupt the binary, a double neutron star binary is formed. Otherwise, the spun-up neutron star is left with no companion and becomes a “disrupted recycled pulsar”, spinning between a few and 50 times per second.

Quite a find! With this recent success, collaborative computing projects such as Einstein@Home, which require very little involvement from the lay user, are likely to become more and more popular. There are many such opportunities available, and the download page for BOINC, the program that lets your computer contribute to scientific research, even has a special option for using a GPU if your computer has one. With NVIDIA releasing its new GeForce GTS 450 GPU today for just $129, beefy gaming computers can now easily scan the heavens when they’re not busy shooting alien mutants.

I know the first thing I’m going to do when I load up my personal laptop is install Einstein@Home. If I could not find the stars I was looking for when I was in astronomy class, maybe my computer can.

Software Driven Networking: Enabling New Internet Speed Protocols

Internet map. Photo courtesy of Matt Britt (http://en.wikipedia.org/wiki/User:Matt_Britt), under a CC license.

A team from the Korea Advanced Institute of Science and Technology has built a router from parts found in most high-end desktop computers that transmits data at nearly 40 Gbps.

The scientists’ technique could lead to cheap commodity chips replacing the custom-made hardware in high-performance routers. The software could also enable new techniques and protocols to replace the decades-old infrastructure on which the Internet currently runs.

Hardware routers use custom chips to route traffic between networks; software routers perform the same function in software running on general-purpose machines. Most commercial software routers achieve only about 3 Gbps, far below the 10 Gbps of common hardware routers. The Korean researchers developed a program called PacketShader, which uses GPUs to process data packets at nearly 40 Gbps.

Routers manipulate data packets in myriad ways, and many of these operations, such as encryption and authentication, can be applied to many packets at once. This parallel processing is where the GPU really shines. By handling large batches of packets simultaneously, the GPU frees the CPU to perform serial operations on the data, such as inspecting packets to detect network breach attempts.
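The key property here is that each packet can be processed independently of the others, so a whole batch can be handled at once. The Python sketch below is not PacketShader’s code. It uses the IPv4 header checksum (the one’s-complement sum from RFC 1071) as an example per-packet operation and maps it over a batch with a thread pool. It shows the data-parallel structure only: a GPU would run one lightweight thread per packet, and CPython threads do not speed up this kind of CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement checksum of an IPv4 header whose checksum field is zeroed.

    Assumes an even-length header, which always holds for IPv4 (multiples of 4 bytes).
    """
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # add each 16-bit big-endian word
    while total >> 16:                                # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def process_batch(headers: list[bytes]) -> list[int]:
    # Packets are independent, so the batch is mapped in parallel; results
    # come back in the same order as the input batch.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(ipv4_header_checksum, headers))


# A 20-byte IPv4 header with its checksum field (bytes 10-11) set to zero.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(sample)))  # 0xb861
```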

Gianluca Iannaccone, an engineer at Intel Labs Berkeley who is familiar with the software, says it could reduce the number of physical machines needed to build a terabit-per-second software router to one-third of what his previous research indicated would be required.

“One Terabit is the entry point for enterprise-grade routers–the routers in the core of the Internet,” says Iannaccone. Connecting enough 40 Gbps software routers together yields a 1 Tbps router, and such clusters could one day form routers made up entirely of software.
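The arithmetic behind clustering 40 Gbps machines into a 1 Tbps router is straightforward. The `efficiency` parameter in this sketch is a hypothetical knob, not a measured figure. It stands for the share of each machine’s capacity left for external traffic once cluster-internal forwarding is accounted for, and the default of 1.0 ignores that overhead entirely.

```python
import math


def routers_needed(target_gbps: float, per_router_gbps: float, efficiency: float = 1.0) -> int:
    """Smallest number of routers whose combined useful capacity reaches target_gbps.

    efficiency is an assumed fraction of each router's capacity available for
    external traffic (1.0 = no interconnect overhead).
    """
    return math.ceil(target_gbps / (per_router_gbps * efficiency))


print(routers_needed(1000, 40))                  # 25
print(routers_needed(1000, 40, efficiency=0.5))  # 50
```

Any real overhead pushes the count above the ideal 25, which is why cutting the number of machines per terabit matters.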

“We can expect killer apps out of this,” added KyoungSoo Park, who was also involved. “You can build an interesting packet- or network-management system on top of a PC-based software router that can’t be implemented with a hardware router. Ultimately, you can experiment with new protocols that are not used in today’s Internet.”

Combined with technologies such as OpenFlow, software routers could help us develop scalable, energy-efficient data networks to replace our current infrastructure.

Platform Computing – innovator in cloud computing software

When Platform Computing was founded in 1992, large organizations used computers much as they had for several previous decades. A complex computing job would run on a single machine, often only at certain times of day. There was no widespread use of clusters or computing clouds, and computational research that now takes several days or weeks to complete used to take many months or years.

The first commercial project that Platform Computing undertook, according to an interview on HPCintheCloud.com with Platform CEO Songnian Zhou, was to help the engine manufacturer Pratt & Whitney design the engines for the new Boeing 777 airliner. Zhou describes how supercomputers were used back then:

At that time, they were using one Cray supercomputer rather than IBM mainframes to do it — and every night they would run one job. One job! Per night! Using that one Cray they had to explore all the parameters — how big or small, how many blades, and so on — all the design alternatives; that takes dozens and dozens of runs. They had to run half a year, which is of course a big problem for their product cycle to serve the airlines and their customers.

Platform Computing sought to change the way organizations such as Pratt & Whitney used supercomputers. Instead of running one job on one computer at a time, Platform pioneered the use of software to break complex jobs into pieces that run on many computers connected together in a cluster. The airline and automotive industries, according to Zhou, were the early adopters of this technology and used it to speed up and simplify their design simulations.
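The Pratt & Whitney example is a parameter sweep: every design alternative is an independent run, so the runs can execute side by side instead of one per night. The sketch below illustrates that idea only and is not Platform’s software. `simulate_engine` and its parameters are hypothetical stand-ins, and a local thread pool plays the role that a scheduler dispatching jobs to cluster nodes would play in practice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate_engine(blade_count: int, fan_diameter_m: float) -> float:
    # Hypothetical stand-in for an expensive design simulation that would
    # normally run for hours; here it just returns a toy score.
    return blade_count * fan_diameter_m


def run_sweep(blade_counts, diameters, workers: int = 4) -> dict:
    """Run every design alternative as an independent job and collect the scores."""
    jobs = list(product(blade_counts, diameters))   # every combination of parameters
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda job: simulate_engine(*job), jobs)
        return dict(zip(jobs, scores))              # map preserves input order


results = run_sweep([20, 22, 24], [2.8, 3.0])
print(len(results))  # 6
```

With enough workers, the wall-clock time of the sweep approaches the time of a single run rather than the sum of all runs.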

Today, cloud computing and clustering have become industry standards, and many companies now offer software and other services to support them. Platform Computing still specializes in private clouds (a network of computers in the same facility, most likely owned by a single organization) and community clouds (a network of computers owned and shared exclusively by a few organizations), which means it caters to large corporations and organizations that can afford to purchase large clusters.

Public clouds, the type most people think of when they talk about cloud computing, are provided mostly by other companies, although Platform made an effort in 2010 to reach out to smaller businesses and organizations.

If you are interested in Platform Computing software for clustering or private clouds, feel free to contact ICC and we can talk to you about the different options they have available to take full advantage of your computing hardware resources.

AMD Fusion processors – from GPU to APU

GPUs (graphics processing units) are a favorite topic on this blog. GPU computing is an innovative and powerful idea with an almost awkward origin: the graphics card, which in the past performed only the calculations needed for visual rendering, is now used to help the processor perform millions of general-purpose computations. In effect, the GPU becomes a specialized processor in the computer.

GPU products have soared in popularity recently, especially for scientific uses. But the CPU-GPU arrangement still retains the old CPU-graphics card relationship: the CPU and GPU are connected the same way a CPU used to interact with a graphics card, through the PCI-E slot on the motherboard. GPUs are fast, but their speed is limited by this type of connectivity, a remnant from the days when GPUs were just graphics cards. In effect, PCI-E communication between the GPU and the motherboard is a performance bottleneck.
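A rough estimate shows why the PCI-E link can dominate. This sketch assumes about 8 GB/s, the theoretical per-direction peak of a PCI-E 2.0 x16 link, and a round 1 TFLOP/s of GPU throughput. Both are illustrative defaults, not measurements of any particular card. It also assumes the results copied back are as large as the input.

```python
def offload_breakdown(data_bytes: float, flops_needed: float,
                      pcie_bytes_per_s: float = 8e9,
                      gpu_flops_per_s: float = 1e12) -> dict:
    """Estimate how much of a GPU offload is spent copying data over PCI-E.

    Assumed defaults: 8e9 B/s is the theoretical per-direction peak of a
    PCI-E 2.0 x16 link; 1e12 FLOP/s is a round illustrative GPU throughput.
    """
    transfer_s = 2 * data_bytes / pcie_bytes_per_s   # copy input in, equal-sized result out
    compute_s = flops_needed / gpu_flops_per_s
    return {
        "transfer_s": transfer_s,
        "compute_s": compute_s,
        "transfer_fraction": transfer_s / (transfer_s + compute_s),
    }


# 1 GB of data with only 10 floating-point operations per byte:
r = offload_breakdown(1e9, 1e10)
print(r["transfer_s"], r["compute_s"])  # 0.25 0.01
```

Under these assumptions, copying data takes about 96% of the total time. An on-chip GPU is meant to remove exactly this cost, while workloads with far more arithmetic per byte are less affected by it.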

AMD is tackling this problem head-on with its upcoming Fusion line of processors. Instead of connecting a GPU to the motherboard like an add-on card, AMD proposes to put the GPU on the same silicon chip as the CPU, eliminating the need for PCI-E communication between them. AMD has dubbed this combination of CPU and GPU technology the “Accelerated Processing Unit” (APU).

For now, the AMD Fusion processor will be released for the consumer market in desktop and laptop computers. AMD is working on bringing this technology to server boards, but many issues (mostly related to software) need to be resolved before this can happen, as John Fruehe of AMD explains in his blog article, “Fusion for Servers”.

Nevertheless, this innovation holds some promise for the future evolution of GPU technology. It has the potential to eliminate the PCI-E bottleneck and make that remnant of the GPU’s original function as a graphics card a thing of the past.

But this won’t be easy for AMD. Although AMD has the advantage of being the only processor manufacturer that also produces graphics cards (it bought ATI in 2006), NVIDIA is still the leader in GPU technology. Many commercial battles will still be fought over the future of GPUs between these and other manufacturers, among them the competition between the CUDA and OpenCL programming models.

Despite these hurdles, AMD’s plan to use Fusion processors in servers is a nascent idea with a lot of potential to improve the GPU market and make computers, and supercomputers, even faster.

Microsoft to integrate technical applications, HPC and the cloud

HPCwire reports that Microsoft has assembled a new team, the Technical Computing Group, with the ambition of “modeling the world”: creating a software and hardware package that makes it easier than ever for scientists, researchers, doctors, designers, analysts, and engineers to run their mathematical models.

Microsoft already has a set of software tools that help scientists in their tasks, but they are disparate and, more importantly, not connected to the computing cloud.

The idea behind the “Modeling the World” initiative is that, while Microsoft will be working on new, high-level, intuitive software for scientists, it will also be powering that software with supercomputing technology. From the article:

Although little of this capability is in place today, the long-term goal is to be able to run a Windows-based HPC app on either a local cluster running the HPC Server, in the Azure cloud, on a workstation grid, or on some combination of the three. The idea is to make the underlying platform transparent to the applications, so that applications can be migrated as needed.

[. . .] In the future, they will integrate support for GPU computing — there’s already a beta plug-in for NVIDIA’s parallel Nsight — and extend the programming model to support a distributed runtime environment for clusters and clouds.
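One way to picture a platform that is “transparent to the applications” is an application written against a single interface, with interchangeable backends behind it. The sketch below is a hypothetical illustration of that design idea. `Backend`, `LocalCluster`, `CloudPool` and `run_model` are invented names, not Windows HPC Server or Azure APIs.

```python
from typing import Protocol


class Backend(Protocol):
    """Anything that can accept a job; hypothetical interface, not a Microsoft API."""
    def submit(self, job: str) -> str: ...


class LocalCluster:
    def submit(self, job: str) -> str:
        return f"{job} queued on local-cluster"


class CloudPool:
    def submit(self, job: str) -> str:
        return f"{job} queued on cloud"


def run_model(job: str, backend: Backend) -> str:
    # The application depends only on the interface, so migrating it between a
    # local cluster and the cloud means passing a different backend object.
    return backend.submit(job)


print(run_model("weather-model", LocalCluster()))  # weather-model queued on local-cluster
print(run_model("weather-model", CloudPool()))     # weather-model queued on cloud
```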

By combining the latest hardware innovations, such as GPU and cloud computing, with new software applications, Microsoft seeks to boldly launch a new trend in science and computing.

Network storage – past, present and future

In my last article, back in 2001, I envisioned network storage that would be accessible from anywhere at any time. Today we are starting to see network storage pools made accessible through techniques such as Cloud Storage. EMC’s Atmos Online provides customers with instant access to online storage, while “NetApp, VMware and Cisco have been collectively working on a joint Cloud solution …”, wrote NetApp’s Val Bercovici. Companies such as Mozy have already started offering network storage services to laptop users for a fee. Individuals can use network storage as an extension or replacement of their directly attached drive, so that if that local hard drive fails, their data is still available from network-attached storage.

We are now approaching true network storage centralization through virtualization. Converging technologies are enabling a leap to centralized virtual storage and moving us toward more efficient, unified green technology in which data is available all the time. But do we have mature, well-evolved storage building blocks and software layers to support it?

The historical evolution of storage components is what has made the current momentum behind virtual network storage technologies such as Cloud Storage possible. The basic underlying building blocks are storage interfaces such as SCSI, Fibre Channel (FC) and ATA. These were accepted by the market and have matured over the past two decades or so, with FC arriving last. Other technologies aimed at specific markets, such as ESDI, failed. FC customers benefit from storage scalability, large SANs and point-to-point interfaces; SCSI is popular for medium-sized pools of data but uses a parallel interface; and ATA is often used for lower-end applications. As the ATA, SCSI and FC markets grew, they matured, but pressure to expand the capabilities and features of these interfaces gradually caused them to overlap.

In the early 2000s, ATA was pushed to mimic FC’s point-to-point feature, and SATA was born. SCSI was mature but could not do point-to-point, so SAS was approved to fill that gap. SCSI was also pressured to piggy-back on networking protocols such as TCP/IP through tunneling, so that it could reach storage volumes farther away. This led to the birth of network storage SANs using block I/O over IP (i.e., iSCSI). Other network storage encapsulations included FCIP and iFCP.
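To show what “piggy-backing SCSI on TCP/IP” means at the byte level, this sketch wraps a SCSI READ(10) command inside the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU. It is simplified and based on the layout in RFC 3720: it fills in only the fields a read needs and does not open a TCP connection, perform the iSCSI login, or add digests. It also treats the LUN as a raw 8-byte field, with 0 meaning LUN 0. The function names are hypothetical.

```python
import struct


def read10_cdb(lba: int, blocks: int) -> bytes:
    # SCSI READ(10): opcode 0x28, flags, 4-byte LBA, group number,
    # 2-byte transfer length (in blocks), control byte; all big-endian.
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, blocks, 0)


def iscsi_scsi_command_bhs(cdb: bytes, lun_field: int, task_tag: int,
                           expected_len: int, cmd_sn: int, exp_stat_sn: int) -> bytes:
    """Build a simplified 48-byte Basic Header Segment for an iSCSI SCSI Command PDU."""
    opcode = 0x01                 # SCSI Command
    flags = 0x80 | 0x40           # F (final) and R (read) bits
    ahs_and_dsl = 0               # TotalAHSLength + DataSegmentLength: no extra data
    header = struct.pack(">BBHIQIIII", opcode, flags, 0, ahs_and_dsl, lun_field,
                         task_tag, expected_len, cmd_sn, exp_stat_sn)  # bytes 0-31
    return header + cdb.ljust(16, b"\x00")                            # CDB in bytes 32-47


# Read 8 blocks of 512 bytes starting at block 2048 of LUN 0.
pdu = iscsi_scsi_command_bhs(read10_cdb(2048, 8), lun_field=0, task_tag=1,
                             expected_len=8 * 512, cmd_sn=1, exp_stat_sn=1)
print(len(pdu))  # 48
```

The SCSI command itself is unchanged. iSCSI simply places it in a header that can travel as a TCP payload, which is the encapsulation the paragraph above describes.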

Meanwhile, smarter storage software layers such as new distributed filesystems and cloud storage were being developed to take advantage of these new interfaces. A variety of cluster filesystems, together with many techniques for mapping multiple drives into simple centralized volumes, have been gaining popularity. For example, the iSCSI stack found a niche when customers discovered that they could still use their SCSI-based storage server pools over the network. As a result, IDC predicts a compound annual growth rate of 75.8% in iSCSI revenue between 2005 and 2010. IDC also predicts that iSCSI revenue will top $5 billion, or 20% of the external disk storage market, in 2010, up from $305 million, or 3% of the external disk market, in 2005. So the market has endorsed encapsulation as the future direction.
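The two IDC revenue figures imply a growth rate of about 75% per year, which indicates that the 75.8% figure is an annual rate rather than a total increase over five years. The compound annual growth rate (CAGR) is the constant yearly rate that turns the starting value into the ending value.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that turns start into end over the given years."""
    return (end / start) ** (1 / years) - 1


rate = cagr(305e6, 5e9, 5)   # IDC's 2005 and 2010 iSCSI revenue figures
print(f"{rate:.1%}")         # 75.0%
```

Going the other way, $305 million growing at 75.8% per year reaches about $5.1 billion after five years, which is consistent with revenue that “will top $5 billion”.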

Great introduction to InfiniBand

The InfiniBand Trade Association published a very useful white paper that introduces InfiniBand technology. InfiniBand is a network technology that greatly boosts computing performance by letting applications on different servers exchange data directly, bypassing the operating system’s usual network stack. Because data moves with little involvement from the servers’ CPUs and operating systems, InfiniBand can significantly increase computational speed and performance. The white paper is definitely worth a read for anyone interested in high-performance computing (HPC).

Although InfiniBand has established itself as a standard for computing performance, you may also want to check out a different perspective about the competing technologies that are seeking to overtake InfiniBand.

As Nebojsa Novakovic argues in an article for The Inquirer, Intel, which created InfiniBand, may be putting it on the back burner to promote 10 Gigabit Ethernet instead. In his words:

IB has very decent application support in high performance computing these days, however its protocol stack is fattened by its envisioned need to act as a common fabric for everything from storage access to networking and clustering, which naturally increases CPU load and latency.

So, if you really want a common single interconnect architecture for your datacentre or supercomputer, 10GE might make more sense, since all applications you might ever think of run on it anyway.

Intel is also under pressure from its competitor AMD, which has developed its own networking standard, High Node Count HyperTransport, that uses InfiniBand hardware but not its protocol stack. Novakovic thinks InfiniBand’s future may be in doubt, but until the technologies challenging it mature, InfiniBand is still a sure bet for HPC.

Correction: Contrary to what is stated above, Intel was not the sole creator of InfiniBand. Intel was one of several companies working on “Next Generation I/O”, a project that later merged with a competing venture to form the InfiniBand Trade Association (see the O’Reilly Introduction to InfiniBand Architecture). Thanks to “B” for the clarification.