Futuristic 48-Core Intel Chip Could Reshape How Computers are Built (w/ Video)
December 3, 2009
Single Chip Cloud Computer has 48 Intel cores and runs at as low as 25 watts
(PhysOrg.com) -- Researchers from Intel Labs demonstrated an experimental, 48-core Intel processor, or "single-chip cloud computer," that rethinks many of the approaches used in today's designs for laptops, PCs and servers.
This futuristic chip boasts about 10 to 20 times the processing engines inside today's most popular Intel Core-branded processors.
The long-term research goal is to add incredible scaling features to future computers that spur entirely new software applications and human-machine interfaces. The company plans to engage industry and academia next year by sharing 100 or more of these experimental chips for hands-on research in developing new software applications and programming models.
While Intel will integrate key features in a new line of Core-branded chips early next year and introduce six- and eight-core processors later in 2010, this prototype contains 48 fully programmable Intel processing cores, the most ever on a single silicon chip. It also includes a high-speed on-chip network for sharing information along with newly invented power management techniques that allow all 48 cores to operate extremely energy efficiently at as little as 25 watts, or at 125 watts when running at maximum performance (about as much as today's Intel processors and just two standard household light bulbs).
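To put the power figures above in per-core terms, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted in the article (48 cores, 25 watts, 125 watts) and assumes, purely for illustration, that the chip's power budget is split evenly across cores. In practice part of that budget goes to the on-chip network and other circuitry, so these are averages at best.

```python
# Back-of-the-envelope per-core power, using only the figures quoted above.
CORES = 48
LOW_POWER_WATTS = 25.0    # whole chip, most energy-efficient operating point
MAX_POWER_WATTS = 125.0   # whole chip, maximum performance


def watts_per_core(total_watts: float, cores: int = CORES) -> float:
    """Average power per core, ASSUMING the budget is split evenly."""
    return total_watts / cores


if __name__ == "__main__":
    print(f"low: {watts_per_core(LOW_POWER_WATTS):.2f} W per core")
    print(f"max: {watts_per_core(MAX_POWER_WATTS):.2f} W per core")
```

Under that assumption the chip averages roughly half a watt per core at its low-power point and under three watts per core at full performance.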
Intel plans to gain a better understanding of how to schedule and coordinate the many cores of this experimental chip for its future mainstream chips. For example, future laptops with processing capability of this magnitude could have "vision" in the same way a human can see objects and motion as it happens and with high accuracy.
Imagine, for example, someday interacting with a computer for a virtual dance lesson or on-line shopping that uses a future laptop's 3-D camera and display to show you a "mirror" of yourself wearing the clothes you are interested in. Twirl and turn and watch how the fabric drapes and how the color complements your skin tone.
This kind of interaction could eliminate the need for keyboards, remote controls or joysticks for gaming. Some researchers believe computers may even be able to read brain waves, so that simply thinking a command, such as dictating words, would be enough to carry it out without speaking.
Intel Labs has nicknamed this test chip a "single-chip cloud computer" because it resembles the organization of datacenters used to create a "cloud" of computing resources over the Internet, a notion of delivering such services as online banking, social networking and online stores to millions of users.
Cloud datacenters comprise tens to thousands of computers connected by a physically cabled network, distributing large tasks and massive datasets in parallel. Intel's new experimental research chip uses a similar approach, yet all the computers and networks are integrated on a single piece of Intel 45nm, high-k metal-gate silicon about the size of a postage stamp, dramatically reducing the number of physical computers needed to create a cloud datacenter.
"With a chip like this, you could imagine a cloud datacenter of the future which will be an order of magnitude more energy efficient than what exists today, saving significant resources on space and power costs," said Justin Rattner, head of Intel Labs and Intel's Chief Technology Officer. "Over time, I expect these advanced concepts to find their way into mainstream devices, just as advanced automotive technology such as electronic engine control, air bags and anti-lock braking eventually found their way into all cars."
Cores Allow Software to Intelligently Direct Data for Efficiency
The concept chip features a high-speed network between cores to efficiently share information and data. This technique gives significant improvement in communication performance and energy efficiency over today's datacenter model, since data packets only have to move millimeters on chip instead of tens of meters to another computer system.
Application software can use this network to quickly pass information directly between cooperating cores in a matter of a few microseconds, reducing the need to access data in slower off-chip system memory. Applications can also dynamically manage exactly which cores are to be used for a given task at a given time, matching the performance and energy needs to the demands of each.
Related tasks can be executed on nearby cores, even passing results directly from one to the next as in an assembly line to maximize overall performance. In addition, this software control is extended with the ability to manage voltage and clock speed. Cores can be turned on and off or change their performance levels, continuously adapting to use the minimum energy needed at a given moment.
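The "assembly line" idea can be sketched in ordinary software. The example below is only an analogy, not Intel's programming interface for the chip: each Python thread stands in for one core, and each queue stands in for the direct core-to-core message path. The names `stage`, `run_pipeline` and `SENTINEL` are made up for this illustration.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and hand the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to stop as well
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Chain one worker per function; each worker stands in for one core."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()

    # Feed the first stage, then signal that no more work is coming.
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    # Collect results from the last stage until the marker arrives.
    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)

    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    # Two stages: add one, then double.
    print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

Each stage passes its result straight to the next one rather than writing it to a shared store and reading it back, which is the software-level counterpart of keeping data on chip instead of going out to system memory.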
Overcoming Software Challenges
Programming processors with multiple cores is a well-known challenge for the industry as computer and software makers move toward many cores on a single silicon chip. The prototype allows popular and efficient parallel programming approaches used in cloud datacenter software to be applied on the chip. As Rattner demonstrated today, researchers from Intel, HP and Yahoo's Open Cirrus collaboration have already begun porting cloud applications to this 48-core IA chip using Hadoop, a Java software framework that supports data-intensive, distributed applications.
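Hadoop is built around the map/reduce pattern: split the data into chunks, process each chunk independently, then merge the partial results. The sketch below shows that pattern with a word count. It does not use Hadoop or its Java API; a Python thread pool stands in for the cores, and the function names `map_count` and `word_count` are invented for this example. In CPython, threads do not run Python code on several cores at once, so this shows the structure of the approach rather than a real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_count(chunk: str) -> Counter:
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.lower().split())


def word_count(chunks, workers: int = 4) -> Counter:
    """Run the map step on every chunk in a pool, then reduce the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))

    # Reduce step: merge the per-chunk counts into one total.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    print(word_count(["the cloud on a chip", "a chip in the cloud"]))
```

Because each chunk is handled independently, the same program can in principle be spread over 4 workers or 48 without changing its logic, which is why this style of programming carries over from datacenters to many-core chips.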
Intel plans to build 100 or more experimental chips for use by dozens of industrial and academic research collaborators around the world with the goal of developing new software applications and programming models for future many-core processors.
"Microsoft is partnering with Intel to explore new hardware and software architectures supporting next-generation client plus cloud applications," said Dan Reed, Microsoft's corporate vice president of Extreme Computing. "Our early research with the single chip cloud computer prototype has already identified many opportunities in intelligent resource management, system software design, programming models and tools, and future application scenarios."
This milestone represents the latest achievement from Intel's Tera-scale Computing Research Program, aimed at breaking barriers to scaling future chips to tens to hundreds of cores. It was co-created by Intel Labs at its Bangalore (India), Braunschweig (Germany) and Hillsboro, Ore. (U.S.) research centers. Details on the chip's architecture and circuits are scheduled to be published in a paper at the International Solid-State Circuits Conference in February.





A mix of implicit/explicit threading through meta-instructions compiled into the application would be a fantastic approach and would take a lot of the burden off programmers.
And when I talk about threading I mean both across all cores and on any single core, the CPU should decide when to use extra or share cores unless given explicit instructions. That way software will always run utilizing far more of the CPU's potential.
I'm amazed it's left to the programmers to make use of the technology the way it is now. Software developers have to decide whether such additional programming work adds value. This results in pricier software if you want to take advantage of multithreading :(
If it doesn't take full advantage of the cores the way tailored programming might then there should be a way for the programmer to choose to override.
Can't be right. Since when does even off-chip memory access take a few microseconds...
You can access registers and different levels of cache an order of magnitude faster than RAM (main memory) and thousands of times faster than non-volatile (dear god) memory such as hard drives. When you're passing information between machines, you're running it over layers and layers of devices. From the registers in the CPU through main memory via some sort of a bus to a device that's sitting on the bus and is probably sharing said bus with other similar devices, which then has to send information over some kind of networked connection and then the process is repeated in reverse on the receiving end.
Instead what Intel is doing is passing information between registers of multiple cores, the same thing that makes RISC machines so great, you aren't making all these accesses to main memory and beyond. This article doesn't really explain just how hard it would be to implement something like this.
Whatever your particular OS religion, it's worth checking out the open-source (open-ish) Grand Central Dispatch.
There's some info here (follow the link in the article for an exhaustive review): http://arstechnic...inux.ars
I realize that, which is why I dedicated a significant part of my statements to describing an implicit *and* explicit multi-threading approach. For example, it could be possible for multiple applications to be working on the 'current' core, in which case the CPU will decide to offload the work to another core without the programmer actually having to command the CPU to do such a thing.
I'm a programmer with regular experience in this, and it's obvious to me that there are ways the systems can be architected to automate threading based on various conditions not directly tied to any application itself, but to the size of the job and the current consumption of a given core.
It's more about the distribution of work than it is about segmenting any given algorithm. Not 'automated parallelism', automated *distribution* of work.
Explicit threading for parallelism.
Implicit threading for redistribution of work across multiple cores for general processing.
IIRC, the Occam language written to work with the Transputer enabled automatic task distribution into parallel processes, pipelines etc. One benefit, IIRC, was that different cores ran asynchronously, so you could add more / faster hardware as easy as USB...
Between one and one billion cores, what is the optimum and do we have to design software around it? How long would that take?
I have a real doubt about how this could work. If I used this setup to type these two simple sentences it would come out as...
"I have a real doubt as to about no wait I have a real doubt as to how this could work man my grammar sucks where is my beer ok I don't think this could work what no i didn't take the damn garbage out what this stupid software ARAGGSDGHGH stop"
That's already being done by every single modern OS (beginning with the first flavors of UNIX.) The word you're looking for, is "scheduling". Of course, some OSes are better at scheduling and fair resource allocations, than others. My experience with Windows, for example, leads me to conclude that Microsoft can't be bothered to implement a decent scheduler. Very frequently, the entire computer hangs up (even on dual-core systems!) when a single task somehow manages to "hog" the CPU. One such routine culprit is (of all things) Outlook....
This isn't about memory registers; it's about more processing power with fixed clock speeds. More cores means more computing ability, meaning less time even talking to memory. This is about streamlining the process by overpowering the application. It will be interesting to see if this is the correct methodology going forward.
I'm wondering if they mean 25 - 125W PER CORE?! Not so power-efficient if this is the case! If these are full-blown cores then I would expect them to be at least 25W/core (or about 1200 W per package!)
http://en.wikiped...itecture