Parallel course: Researchers help ease transition to parallel programming
October 23, 2009 by Larry Hardesty
An image of a Quad-Core AMD Opteron processor. Courtesy of AMD
(PhysOrg.com) -- In 1995, a good computer chip had a clock speed of about 100 megahertz. Seven years later, in 2002, a good computer chip had a clock speed of about three gigahertz -- a 30-fold increase. And now, seven years later, a good computer chip has a clock speed of... still about three gigahertz.
Four or five years ago, chip makers realized that they couldn't make chips go much faster, so they adopted a new strategy for increasing computers' power: putting multiple "cores," or processing units, on each chip. In theory, a chip with two cores working in parallel can accomplish twice as much as a chip with one core. But software developers tend to see their programs as lists of sequential instructions, and they've had trouble breaking up those instructions in ways that take advantage of parallel processing. Professor of Computer Science and Engineering Saman Amarasinghe and his colleagues at MIT's Computer Science and Artificial Intelligence Lab are giving them a hand.
In the past, computer science researchers had hoped that sequential programs could be converted into parallel programs automatically. "I spent a good part of my life trying to do that," says Amarasinghe. But Amarasinghe has now come to the conclusion that "if you want to get parallel performance, you have to start writing parallel code." At a high level, he believes, programmers will need to specify which tasks performed by their programs can run concurrently.
"Just writing anything parallel doesn't mean that it's going to run fast," Amarasinghe says. "A lot of parallel programs will actually run slower, because you parallelize the wrong place." And determining when to parallelize, and which cores to assign which tasks, is something that computers can do automatically, Amarasinghe believes.
Amarasinghe divides his group's work on multicore computing into two categories: tools to ease programmers' transition to parallel programming and tools to improve programs' performance once that transition is complete.
Predictable parallelism
The first set of tools addresses one of the central difficulties of parallel programming: if several tasks are assigned to separate cores, there's no way to be sure which will be completed first. Most of the time, that's not a problem. But occasionally, different cores have to access the same resource — updating the same stored value, for instance, or sending data to the monitor. Depending on which core reaches the resource first, the program's behavior could be quite different. Even given the exact same inputs, a multicore program may not execute in quite the same way twice in a row.
That's a nightmare for developers trying to debug their programs. To find a bug, developers run their programs over and over, with slight variations, to localize problems to ever-smaller blocks of code. But those variations don't convey any useful information unless the rest of the program behaves in the exact same way.
Amarasinghe and his grad students Marek Olszewski and Jason Ansel designed a system to make multicore programs more predictable. When a core tries to access a shared resource, it's assigned a priority based not on the time of its request but on the number of instructions it's executed. At any point during a parallel program's execution, any two cores will have executed about the same number of instructions. But if one core has had to, say, access a little-used value in a distant part of the computer's memory, it might have fallen slightly behind. In those cases, the researchers' system gives it a chance to catch up. Once it's reached the same instruction count, if it hasn't issued a higher-priority request for the shared resource, the first core's request is granted.
That waiting around means, of course, that the program as a whole will run more slowly. But in experiments, the researchers determined that, on average, a software implementation of their system increased program execution time by only about 16 percent. For developers, that's a small price to pay for reliable debugging. Moreover, if the ability to count instructions were built into the cores themselves, the system would be much more efficient. Indeed, Intel — which has demonstrated eight-core chips at trade shows — is talking with Amarasinghe about funding further research on the system.
"A lot of other people have this alternative approach," says Krste Asanovic, an associate professor at the Parallel Computing Lab at the University of California, Berkeley. "You kind of just run it [the parallel program] however it runs and then try and record what it did so then you can go back and replay it." But with the MIT system, he says, "You don't have to worry about recording how it executed because when you execute it, it will always run the same way. So it's strictly more powerful than replay."
Maximizing multicore
Amarasinghe's lab has two projects that fit into his second category — tools that optimize the performance of parallel programs. The first deals with streaming applications, such as video broadcast over the Web. Before your computer can display an Internet video, it needs to perform a slew of decoding steps — splitting the data stream into parallel signals, performing different types of decompression and color correction, recombining the signals, performing motion compensation, equalizing the signal, and so on. Traditionally, Amarasinghe says, video software will take a chunk of incoming data, pass it through all those decoding steps, and then grab the next chunk.
That approach, says Amarasinghe, squanders the inherent parallelism of signal-processing circuits. As one chunk of data exits each decoding step, the next chunk of data could be entering it. StreamIt, a language developed by Amarasinghe's group, allows programmers to specify the order of the steps but automatically determines when to pass how much data to each step.
"If you have some kind of specification document, they're all written in block diagrams" that illustrate a signal-processing path as a series of sequential steps, Amarasinghe says. "When they write to [the programming language] C, you lose those blocks." StreamIt, by contrast, "is very close to the original thinking," Amarasinghe says.
"Other people had talked about stream programming, but StreamIt was really putting everything together in a language and compiler tool chain so people could actually write streaming programs," says Asanovic. "So I think the language design and compiler tool chain that followed from that was pretty influential."
The group's other parallel-computing project helps programs adapt on the fly to changing conditions. Often, Amarasinghe explains, a programmer has several ways of tackling a particular task. There are, for example, many common algorithms for sorting data, with names like quicksort, insertion sort, radix sort, and the like. But the algorithms perform differently under different circumstances. Quicksort is often the best choice, but not "when the data is very small," Amarasinghe says. With huge data sets, however, radix sort may work even better than quicksort.
That variability is compounded when the algorithms are being assigned to different cores. So Amarasinghe's group has designed another language that asks the developer to specify four or five different methods for performing a given computational task. When the program is running, the computer automatically identifies the most efficient method.
It might seem that that approach would lead to bloated programs. But Amarasinghe points out that when a typical program executes, it devotes much more memory to storing data than it does to storing instructions. "Something like a sort is, like, five lines of code that work on arrays with billions of elements. So making five lines of code 25 is not a big issue," he says.
It does require some added work on the programmer's part. But "there's no free lunch: you won't get nice multicore performance by doing nothing," Amarasinghe says. "Believe me, we've been trying for 50 years."
Provided by Massachusetts Institute of Technology (news : web)
-
'Not so fast, supercomputers,' say software programmers
May 22, 2007 |
not rated yet |
0
-
Stanford, tech giants team up to enable software for parallel computers
May 01, 2008 |
not rated yet |
0
-
MIT, IBM team up on first PlayStation 3 course
May 08, 2007 |
not rated yet |
0
-
Next-generation, high-performance processor unveiled
Apr 24, 2007 |
not rated yet |
0
-
New approach eliminates software deadlocks using discrete control theory
Dec 02, 2008 |
not rated yet |
0
-
Engineers build first sub-10-nm carbon nanotube transistor
Feb 01, 2012 |
4.9 / 5 (31) |
30
-
Something old, something new: Evolution and the structural divergence of duplicate genes
Jan 31, 2012 |
4.6 / 5 (7) |
1
-
The hidden nanoworld of ice crystals: Revealing the dynamic behavior of quasi-liquid layers
Jan 30, 2012 |
5 / 5 (3) |
1
-
Stock market network reveals investor clustering
Jan 27, 2012 |
3.9 / 5 (23) |
8
-
Of microchemistry and molecules: Electronic microfluidic device synthesizes biocompatible probes
Jan 26, 2012 |
5 / 5 (1) |
0
-
Synergistic relations between computer science and technology.
Feb 06, 2012
-
how do iphone gloves work?
Feb 05, 2012
-
iPhone battery over time
Jan 30, 2012
-
Best alternate Tablet to an iPad for writing math or physics equations?
Jan 26, 2012
-
Sending SMS to a website
Jan 20, 2012
-
Need help with my technical fest!
Jan 19, 2012
- More from Physics Forums - Computing & Technology
More news stories
Google users warned of threat to smartphone wallets
Users of Google smartphone wallets were being warned on Friday that there is a way to crack pass codes intended to thwart thieves from going on illicit shopping sprees.
3 hours ago |
5 / 5 (1) |
0
Anonymous knocks CIA website offline (Update)
The website of the Central Intelligence Agency was inaccessible on Friday after the hacker group Anonymous claimed to have knocked it offline.
5 hours ago |
5 / 5 (6) |
10
New error-correcting codes guarantee the fastest possible rate of data transmission
Error-correcting codes are one of the triumphs of the digital age. Theyre a way of encoding information so that it can be transmitted across a communication channel such as an optical fiber o ...
Technology / Computer Sciences
13 hours ago |
5 / 5 (5) |
6
|
New power source discovered
(PhysOrg.com) -- Researchers at the Massachusetts Institute of Technology (MIT) and RMIT University have made a breakthrough in energy storage and power generation.
Technology / Energy & Green Tech
12 hours ago |
4.8 / 5 (19) |
7
|
Small modular reactor design could be a 'SUPERSTAR'
(PhysOrg.com) -- Though most of today's nuclear reactors are cooled by water, we've long known that there are alternatives; in fact, the world's first nuclear-powered electricity in 1951 came from a reactor ...
Technology / Energy & Green Tech
13 hours ago |
4.3 / 5 (11) |
20
|
Complex wiring of the nervous system may rely on a just a handful of genes and proteins
Researchers at the Salk Institute have discovered a startling feature of early brain development that helps to explain how complex neuron wiring patterns are programmed using just a handful of critical genes. ...
The power of estrogen -- male snakes attract other males
A new study has shown that boosting the estrogen levels of male garter snakes causes them to secrete the same pheromones that females use to attract suitors, and turned the males into just about the sexiest ...
Putting the squeeze on planets outside our solar system
(PhysOrg.com) -- Using high-powered lasers, scientists at Lawrence Livermore National Laboratory and collaborators discovered that molten magnesium silicate undergoes a phase change in the liquid state, abruptly ...
Humans may have helped the decline of African rainforests 3000 years ago
(PhysOrg.com) -- Large areas of rainforests in Central Africa mysteriously disappeared over three thousand years ago, to be replaced by savannas. The prevailing theory has been that the cause was a change ...
NASA sees wide-eyed cyclone Jasmine
Cyclone Jasmine's eye has opened wider on NASA satellite imagery, as it moves through the Southern Pacific Ocean.
NASA sees Giovanna reach cyclone strength, threaten Madagascar
Tropical Storm 12S built up steam and became a cyclone on February 10, 2012 as NASA's Terra satellite passed overhead. Residents of east-central Madagascar should prepare for this cyclone to make landfall ...
Oct 23, 2009
Rank: not rated yet
The article should make clarify the difference between hyperthreading and a true parallel processor.
Oct 23, 2009
Rank: not rated yet
it's assigned a priority based not on the time of its request but on the number of instructions it's executed.
Unfortunately it's not that simple because the parallel code sections may be accessing level 1, 2, or 3 cache memory or external dram so that the number of instructions executed has little to do with the time to execute them.
Oct 23, 2009
Rank: not rated yet
The second major hurdle is pushing the front side bus to keep up with CPU instructions reducing buffer and increasing hardware access time.
Oct 23, 2009
Rank: 5 / 5 (1)
Oct 23, 2009
Rank: not rated yet
Oct 23, 2009
Rank: 5 / 5 (1)
When I work on video games, I like the AI path-finding and the AI decision making to run at the same time since path-finding can be tedious when traversing a 3d environment. Physics engines is also something to have on a separate core. Render engine to run on its own core.
Oct 23, 2009
Rank: not rated yet
Oct 24, 2009
Rank: not rated yet
Exactly. Path finding absolutely MUST run on a separate core, or at the very least in a separate thread. That is, if you want your FPS to stay above 30 and not just stop completely while your algorithm iterates through the available options. Even with something like A* you still can be iterating through a ton of data, often recursively. You would definitely prefer that to run in parallel.
Oct 24, 2009
Rank: not rated yet
Oct 24, 2009
Rank: not rated yet
The (c) date ? 1987...