From Terabytes to Petabytes: Computer Scientists Develop New Hybrid Database System

August 26, 2009

(PhysOrg.com) -- As the amount of data stored by databases around the world enters the realm of the petabyte (the amount of data stored in a mile-high stack of CD-ROM disks), efficient data management is becoming more and more important. Now computer scientists at Yale University have combined the best features of multiple approaches to create an open source hybrid database system called HadoopDB.

Traditional approaches to managing data at this scale typically fall into one of two categories. The first includes parallel database management systems (DBMS), which are good at working with structured data, for instance tables with trillions of rows. The second includes the kind of approach taken by MapReduce, the software framework used by Google to search data contained on the Web, which gives the user more control over how the data is retrieved.

“In essence, HadoopDB is a hybrid of MapReduce and parallel DBMS technologies,” said Daniel Abadi, assistant professor of computer science at Yale and one of the system designers. “It’s designed to take the best features of both worlds. We get the performance of parallel database systems with the scalability and ease of use of MapReduce.”
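One way to picture the hybrid idea is the toy Python sketch below. It is an illustration under stated assumptions, not HadoopDB's code: each "node" is an in-memory SQLite database standing in for a single-node DBMS, and the hypothetical functions `make_node`, `map_phase` and `reduce_phase` mimic a MapReduce-style job that pushes a SQL aggregate down to every node and then merges the partial results. The `sales` table and its rows are made up for the example.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Create one 'node': an in-memory SQLite database holding one data partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(node):
    """Map step: let the node's own database run the aggregation as SQL."""
    return node.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()


def reduce_phase(partials):
    """Reduce step: merge the per-node partial sums into global totals."""
    totals = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


# Two hypothetical partitions of the same logical table.
nodes = [
    make_node([("east", 10.0), ("west", 5.0)]),
    make_node([("east", 2.5), ("north", 4.0)]),
]
totals = reduce_phase(map_phase(node) for node in nodes)
print(totals)
```

In this sketch the database engine does the structured, per-partition work, while the map and reduce steps handle distribution and merging; a real system would also have to deal with scheduling and node failures, which the toy version ignores.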

HadoopDB was announced on Abadi’s blog last month. Yale graduate students and co-creators Azza Abouzeid and Kamil Bajda-Pawlikowski will present the new system in more depth at the VLDB conference in Lyon, France, on August 27. They will also present results of a detailed performance analysis they conducted with Abadi, Avi Silberschatz, chair of computer science at Yale, and Alexander Rasin of Brown University. At the conference, the team will demonstrate the system’s performance on a range of representative queries over both structured and unstructured data, and will characterize HadoopDB in terms of run-time performance, loading time, fault tolerance and scalability.

Today’s databases collect and use huge amounts of data, from consumer information that retail chains use to improve buying experiences and reduce customer churn to financial information that banks collect to reduce risk and avoid another catastrophic financial collapse. Being able to store and analyze such vast amounts of data will only continue to grow in importance, Abadi said.

HadoopDB reduces the time it takes to perform some typical tasks from days to hours, making more complicated analysis possible - the kind that could be used to find patterns in the stock market, earthquakes, consumer behavior and even outbreaks, Abadi said. “People have all this data, but they’re not using it in the most efficient or useful way.”

Provided by Yale University

