Mastering multicore

April 26, 2010 by Larry Hardesty

Graphic: Christine Daniloff

(PhysOrg.com) -- MIT researchers have developed software that makes computer simulations of physical systems run much more efficiently on so-called multicore chips. In experiments involving chips with 24 separate cores -- or processors -- simulations of fluid flows were at least 50 percent more efficient with the new software than they were with conventional software. And that figure should only increase with the number of cores.

Complex computer models — such as atom-by-atom simulations of physical materials, or high-resolution models of weather systems — typically run on multiple computers working in parallel. A software management system splits the model into separate computational tasks and distributes them among the computers. In the last five years or so, as multicore chips have become more common, researchers have simply transferred the old management systems over to them. But John Williams, professor of information engineering in the Department of Civil and Environmental Engineering (CEE), CEE postdoc David Holmes, and Peter Tilke, a visiting scientist in the Department of Earth, Atmospheric and Planetary Sciences, have developed a new management system that exploits the idiosyncrasies of multicore chips to improve performance.

To get a sense of what it might mean to split a model into separate tasks, consider a two-dimensional simulation of a weather system over some geographical area — like the animated weather maps on the nightly news. The simulation considers factors like temperature, humidity and wind speed, as measured at different weather stations, and tries to calculate how they will have changed a few minutes later. Then it takes the updated factors and performs the same set of calculations again, gradually projecting its model out across hours and days.
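The repeated update described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the grid, the `rate` parameter, and the rule that each cell drifts toward the average of its four neighbors are all assumptions chosen to keep the example small.

```python
def step(grid, rate=0.1):
    """Advance a 2D grid by one time step.

    Hypothetical rule: each cell moves a fraction `rate` of the way
    toward the mean of its immediate (up/down/left/right) neighbors.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write into a copy so every cell reads old values
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            mean = sum(neighbours) / len(neighbours)
            new[r][c] = grid[r][c] + rate * (mean - grid[r][c])
    return new


def simulate(grid, steps, rate=0.1):
    """Feed the updated grid back into the same calculation `steps` times."""
    for _ in range(steps):
        grid = step(grid, rate)
    return grid
```

Calling `simulate(grid, steps)` with a larger `steps` projects the model further forward, at the cost of repeating the same per-cell work each time.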

Changes to the factors in a given area depend on the factors measured nearby, but not on the factors measured far away. So the computational problem can, in fact, be split up according to geographic proximity, with the weather in different areas being assigned to different computers — or cores. The same holds true for simulations of many other physical phenomena.
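A minimal sketch of why locality makes the split possible, using a one-dimensional field for brevity. The update rule (a three-point average) and the function names are assumptions for illustration. Each chunk is given one extra cell on either side, and updating the chunks independently reproduces the result of updating the whole field at once.

```python
def smooth(values):
    """One update of a 1D field: each interior point becomes the mean of
    itself and its two neighbors; the two end points are held fixed."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return out


def smooth_in_chunks(values, chunk_size):
    """Same update, computed chunk by chunk.

    Each chunk is extended by one neighboring cell on each side, because
    that is all the three-point rule ever reads. The chunks could therefore
    be handed to different computers or cores.
    """
    n = len(values)
    result = []
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        lo = max(start - 1, 0)  # left neighbor cell, if one exists
        hi = min(stop + 1, n)   # right neighbor cell, if one exists
        local = smooth(values[lo:hi])
        # Keep only the cells this chunk owns; drop the borrowed neighbors.
        result.extend(local[start - lo:start - lo + (stop - start)])
    return result
```

For any chunk size, `smooth_in_chunks(values, chunk_size)` returns the same list as `smooth(values)`, since no cell depends on anything beyond its immediate neighbors.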

Video: A computer model simulates the falling of a drop of water by calculating the forces that individual molecules exert on each other. The simulation can be broken into chunks, each representing a cluster of neighboring molecules, that are processed in parallel by different processing units, or “cores.”

Smaller is better

When such simulations run on a cluster of computers, the cluster’s management system tries to minimize the communication between computers, which is much slower than communication within a given computer. To do this, it splits the model into the largest chunks it can — in the case of the weather simulation, the largest geographical regions — so that it has to send them to the individual computers only once. That, however, requires it to guess in advance how long each chunk will take to execute. If it guesses wrong, the entire cluster has to wait for the slowest machine to finish its computation before moving on to the next part of the simulation.
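The cost of a wrong guess can be put in numbers. The sketch below assumes one large chunk per machine and made-up chunk costs in arbitrary time units; neither the function names nor the figures come from the article.

```python
def static_finish_time(chunk_costs):
    """One big chunk per machine: the step ends only when the slowest
    machine has finished its chunk."""
    return max(chunk_costs)


def idle_fraction(chunk_costs):
    """Fraction of total machine time spent waiting for the slowest machine."""
    finish = max(chunk_costs)
    busy = sum(chunk_costs)
    return 1 - busy / (finish * len(chunk_costs))


# Hypothetical case: four chunks were expected to take 10 units each,
# but one of them actually takes 16.
costs = [10, 10, 10, 16]
print(static_finish_time(costs))  # 16
print(idle_fraction(costs))       # 0.28125
```

In this made-up case, a single misjudged chunk leaves the cluster idle for about 28 percent of the available machine time.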

In a multicore chip, however, communication between cores, and between cores and memory, is much more efficient. So the MIT researchers’ system can break a simulation into much smaller chunks, which it loads into a queue. When a core finishes a calculation, it simply receives the next chunk in the queue. That also saves the system from having to estimate how long each chunk will take to execute. If one chunk takes an unexpectedly long time, it doesn’t matter: The other cores can keep working their way through the queue.
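The queue idea can be illustrated with a small scheduling simulation. This is a sketch of the general technique (a shared work queue), not the MIT system itself; the chunk costs are hypothetical and communication overhead is ignored.

```python
import heapq


def queue_finish_time(chunk_costs, n_cores):
    """Simulate cores pulling chunks from a shared queue in order.

    Whichever core becomes free first takes the next chunk. Returns the
    time at which the last core finishes.
    """
    free_at = [0.0] * n_cores  # time at which each core next becomes free
    heapq.heapify(free_at)
    for cost in chunk_costs:
        earliest = heapq.heappop(free_at)  # the core that frees up first
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# 46 units of work on 4 cores, split three different ways (made-up numbers).
print(queue_finish_time([10, 10, 10, 16], 4))  # 16.0: four big chunks
print(queue_finish_time([1] * 46, 4))          # 12.0: many small chunks
print(queue_finish_time([5] + [1] * 41, 4))    # 12.0: one slow chunk is absorbed
```

With small chunks, no advance estimate of chunk cost is needed, and one unexpectedly slow chunk no longer holds up the other cores.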

Perhaps more important, smaller chunks mean that the system is better able to handle the problem of boundaries. To return to the example of the weather simulation, factors measured along the edges of a chunk will affect factors in the adjacent chunks. In a cluster of computers, that means that computers working on adjacent chunks still have to use their low-bandwidth connections to communicate with each other about what’s happening at the boundaries.

Multicore chips, however, have a memory bank called a cache, which is relatively small but can be accessed very efficiently. The MIT researchers’ management system can split a simulation into chunks that are so small that not only do they themselves fit in the cache, but so does information about the adjacent chunks. So a core working on one chunk can rapidly update factors along the boundaries of adjacent chunks.
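As a rough sizing exercise, one can ask how large a chunk may be if it and a border of neighboring data must both fit in cache. The sketch below assumes a square chunk of grid cells and a fixed number of bytes per cell. The cache size and cell size are hypothetical, and the researchers' simulation may organize its data quite differently.

```python
import math


def largest_square_chunk(cache_bytes, bytes_per_cell, halo=1):
    """Largest side length s such that an s-by-s chunk, padded on every
    side by `halo` cells of neighboring data, fits in `cache_bytes`."""
    max_cells = cache_bytes // bytes_per_cell
    padded_side = math.isqrt(max_cells)  # side of the largest square that fits
    return max(padded_side - 2 * halo, 0)


# Hypothetical numbers: a 256 KiB cache and 32 bytes of state per cell.
print(largest_square_chunk(256 * 1024, 32))  # 88
```

Under these assumed numbers, a chunk of 88 by 88 cells plus its one-cell border (90 by 90 cells, 259,200 bytes) fits, while 89 by 89 would not.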

E pluribus unum

In theory, a single machine with 24 separate cores should be able to perform a simulation 24 times as rapidly as a machine with only one core. In the February issue of Computer Physics Communications, the MIT researchers report that, in their experiments, a 24-core machine using the existing management system was 14 times as fast as a single-core machine; but with their new system, the same machine was about 22 times as fast. And, Williams says, the new system’s performance advantage compounds with the number of cores, “like compound interest over time.”
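The reported speedups can be turned into parallel efficiency, meaning the fraction of the ideal 24-fold speedup actually achieved. The helper name below is an assumption; the figures are the ones quoted above.

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of the ideal (one-times-per-core) speedup that was achieved."""
    return speedup / n_cores


old = parallel_efficiency(14, 24)  # existing management system
new = parallel_efficiency(22, 24)  # new management system
gain = 22 / 14 - 1                 # relative improvement of new over old

print(f"{old:.0%} {new:.0%} {gain:.0%}")  # 58% 92% 57%
```

The relative gain of about 57 percent is consistent with the "at least 50 percent" figure given at the start of the article.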

Geoffrey Fox, professor of informatics at Indiana University, says that the MIT researchers’ system is “clever and elegant,” but he has doubts about its broad usefulness. The problems of greatest interest to many scientists and engineers, he says, are so large that they will still require clusters of computers, where the MIT researchers’ system offers scant advantages. “State-of-the-art problems will not run on single machines,” Fox says.

But Holmes points out that the model that he and his colleagues used in their experiments was a simulation of fluid flow through an oilfield, which is of immediate interest to the oilfield services company Schlumberger, which helped fund the research and employs Tilke when he’s not on loan to MIT. “We’re running problems with 50, 60 million particles,” Holmes says, “which is on the order of 20, 30 gigabytes.” Holmes also points out that 24-core computers “will not remain the state of the art for long.” Manufacturers have already announced lines of 128-core computers, and that could just be the tip of the iceberg.

Williams adds that, even for problems that still require clusters of computers, the new system would allow the individual machines within the clusters to operate more efficiently. “Cross-machine communication is one or two orders of magnitude slower than on-machine communication,” Williams says, “so it makes sense to keep cross-machine communication to a minimum, which is what our solution allows.”

More information: Project website: http://geonumerics.mit.edu/

Provided by Massachusetts Institute of Technology


El_Nose
Apr 26, 2010

Rank: 1 / 5 (1)
As every programmer knows, the speedup from adding cores or processors is limited by bandwidth, as the article stated, and you will never achieve a 1:1 speedup per core except on the most trivial of problems -- the ones easiest to split apart. While I am inclined to believe MIT researchers are better than most, I am curious to see the algorithm they split and still got a 22x speedup over 24 cores that is not a trivial test problem. But given it's MIT, I will accept it and politely review the paper for myself.
CSharpner
Apr 26, 2010

Rank: 1 / 5 (1)
The article is right when it says many OS/s just dump their stuff onto multi-core chips without really doing much (if anything) to optimize performance for multi-core, but putting tasks in a queue to be more efficient is nothing revolutionary either. Dividing tasks to work within the confines of individual cores and caches is not all that revolutionary either. They're both good ideas of course. These are tasks that should be in the heart of either the OS and/or the language compilers with enough runtime variables that the OS can optimize them for the hardware they're currently running on (so a particular program isn't just optimized for one specific hardware spec, but can adapt to different machines with different numbers of processors and cache sizes).

In short, nothing really new here.

But, THIS is a significant step forward:
http://www.physor...136.html
Expiorer
Apr 27, 2010

Rank: not rated yet
50% increase with 24 cores.
And that figure should only increase with the number of cores.
LOL
Actually 50% increase is very close to SHlT.
My teacher of algorithms said that 95% of software can be made at least 50% faster on the same system (showed many examples).
sender
Apr 27, 2010

Rank: not rated yet
True hardware concurrency relies on reprogrammable process constructors for direction and parallel bus junctions for hardware synchronicity rather than simple core distribution, it seems that linear functional programming is locked into functional calls rather than process redirection which loses cycles for calls over simply formatting the construct and passively exciting the desired changes in the overlaying active environment.
El_Nose
Apr 28, 2010

Rank: not rated yet
hey powerup just went through and gave 1's to people without disputing anything stated??

@ Explorer

Yes, if you are writing very inefficient code. Part of programming is knowing the art of using your basic structures. But a 50% increase is positively outstanding if all you did was add cores ... wait till you take a computer architecture course, and when you do the math you will understand.