More chip cores can mean slower supercomputing, simulation shows

January 14th, 2009

THE MULTICORE DILEMMA: more cores on a single chip don't necessarily mean faster clock speeds, a Sandia simulation has determined. (Photo by Randy Montoya)

(PhysOrg.com) -- The worldwide attempt to increase the speed of supercomputers merely by increasing the number of processor cores on individual chips unexpectedly worsens performance for many complex applications, Sandia simulations have found.

A Sandia team simulated key algorithms for deriving knowledge from large data sets. The simulations show a significant increase in speed going from two to four cores, but an insignificant increase from four to eight. Exceeding eight cores causes a decrease in speed. Sixteen cores perform barely as well as two, and beyond that, speed declines steeply as more cores are added.

The problem is the lack of memory bandwidth as well as contention between processors over the memory bus available to each processor. (The memory bus is the set of wires used to carry memory addresses and data to and from the system RAM.)

The graph depicts simulations of four potential multicore computers: the “conventional,” which adds more standard cores to a single processor socket; an MTA, which looks like the processor used in the exotic Cray XMT supercomputer; and a PIM, which is based on Sandia's X-caliber processor design and includes memory tightly integrated with the processor. The fourth line simulates a conventional processor that represents a theoretical ideal. (Graphic by Richard Murphy et al., Sandia National Laboratories)


A supermarket analogy

To use a supermarket analogy, if two clerks at the same checkout counter are processing your food instead of one, the checkout process should go faster. Or, you could be served by four clerks.

Or eight clerks. Or sixteen. And so on.

The problem is, if each clerk doesn't have access to the groceries, he or she doesn't necessarily help the process. Worse, the clerks may get in each other's way.

Similarly, it seems a no-brainer that if one core is fast, two would be faster, four still faster, and so on.

But the lack of immediate access to individualized memory caches — the "food" of each processor — slows the process down instead of speeding it up once the number of cores exceeds eight, according to a simulation of high-performance computers by Sandia's Richard Murphy, Arun Rodrigues and former student Megan Vance.

"To some extent, it is pointing out the obvious — many of our applications have been memory-bandwidth-limited even on a single core," says Rodrigues. "However, it is not an issue to which industry has a known solution, and the problem is often ignored."

"The difficulty is contention among modules," says James Peery, director of Sandia's Computations, Computers, Information and Mathematics Center. "The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.' Then they have to wait until the answer to their request comes back. This causes delays."
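The "same pipe" effect Peery describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not Sandia's simulation: the function name `relative_speed`, the single shared bus with capacity 1.0, the per-core demand, and the linear contention penalty are all hypothetical choices made for illustration, and the parameter values are arbitrary rather than fitted to the published results.

```python
def relative_speed(cores, per_core_demand=0.25, contention=0.02):
    """Toy speedup (in units of one core's speed) for cores sharing one memory bus.

    Assumptions (illustrative only, not taken from the Sandia study):
    - the bus has a total capacity of 1.0;
    - each core needs `per_core_demand` of that capacity to run at full speed;
    - every additional core wastes `contention` of the bus on arbitration.
    """
    # Bandwidth the cores ask for, capped by what the bus can deliver.
    demanded = cores * per_core_demand
    delivered = min(demanded, 1.0)
    # Fraction of the bus left for useful work after contention overhead.
    efficiency = max(0.0, 1.0 - contention * (cores - 1))
    return delivered * efficiency / per_core_demand


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} cores -> relative speed {relative_speed(n):.2f}")
```

With these made-up parameters the curve rises while the bus has spare capacity, flattens once the bus saturates, and then falls as contention grows. The exact crossover points depend entirely on the assumed values.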

"The original AMD processors in Red Storm were chosen because they had better memory performance than other processors, including other Opteron processors," says Ron Brightwell. "One of the main reasons that AMD processors are popular in high-performance computing is that they have an integrated memory controller that, until very recently, Intel processors didn't have."

Multicore technologies are considered a possible savior of Moore's Law, the prediction that the number of transistors that can be placed inexpensively on an integrated circuit will double approximately every two years.
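As a rough illustration of the doubling rule just stated, the projection can be written as a one-line formula. The function name `projected_transistors` and the starting count used in the example are hypothetical; only the two-year doubling period comes from the text.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count forward, assuming it doubles every
    `doubling_period_years` years (the approximate Moore's Law period)."""
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a chip with one million transistors.
    print(projected_transistors(1_000_000, 10))
```

Ten years at a two-year doubling period means five doublings, so the count grows by a factor of 32.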

"Multicore gives chip manufacturers something to do with the extra transistors successfully predicted by Moore's Law," Rodrigues says. "The bottleneck now is getting the data off the chip to or from memory or the network."

A more natural goal of researchers would be to increase the clock speed of single cores, since the vast majority of applications, such as word processors and music and video programs, are designed for single-core performance. But power consumption, increased heat, and basic laws of physics involving parasitic currents meant that designers were reaching their limit in improving chip speed for common silicon processes.

"The [chip design] community didn't go with multicores because they were without flaw," says Mike Heroux. "The community couldn't see a better approach. It was desperate. Presently we are seeing memory system designs that provide a dramatic improvement over what was available 12 months ago, but the fundamental problem still exists."

In the early days of supercomputing, Seymour Cray produced a superchip that processed information faster than any other chip. Then a movement — led in part by Sandia — proved that ordinary chips, programmed to work different parts of a problem at the same time, could solve complex problems faster than the most powerful superchip. Sandia's Paragon supercomputer, in fact, was the world's first parallel processing supercomputer.

Today, Sandia has a large investment in message-passing programs. Its Institute for Advanced Architectures, operated jointly with Oak Ridge National Laboratory (ORNL) and intended to prepare the way for exaflop computing, may help solve the multichip dilemma.

ORNL's Jaguar supercomputer, currently the world's fastest for scientific computing, is a Cray XT model based on technology developed by Sandia and Cray for Sandia's Red Storm supercomputer. Red Storm's original and unique design is the most copied of all supercomputer architectures.

Provided by Sandia National Laboratories



  • dirk_bruere - Jan 15, 2009
    • Rank: 5 / 5 (1)
    I was doing research on multicore computers 30 years ago and this was a well-known fact in the field. Seems that researchers today spend their time reinventing the wheel.
  • Marquo - Jan 16, 2009
    • Rank: not rated yet
    http://www.cc.gat...rch.html
    I see that there is a move away from threads and static hardware. Multicore means we have to throw away the past and embrace the new paradigm of memory non-concurrent, but hardware concurrent functionality of the multicore. Bus architecture becomes irrelevant when processes reach the 32-64 CPU/PSU/GPU envisioned by Intel/IBM.
    Eventually Machine Learning will become hardwired when Topological Quantum Computing becomes commonplace. Microsoft has invested millions to discover the abelian 4/5 pair of kelvin silicon.
    Horizons often hide brilliant days or mask storm clouds. I vote for the sunny days and progress towards complex thoughts of problems. Luck to that braided pair/tied to a multicore classic core architecture.
    3D silicon shall one day be replaced with pure photonic computation, maybe. We see all this in the lab and a drive to push boundaries of hilbert space.
    ZIP60 is a wonderful example of something that could be exploited by classic/Quantum integration.
  • dirk_bruere - Jan 16, 2009
    • Rank: not rated yet
    The problem is CPUs requiring access to shared resources, e.g., memory. The way it's done now only works well for SIMD streams.
