A genomic CluE for cloud computing

April 23, 2009

DNA sequencing is the next frontier in biological research. As new sequencing technology becomes more efficient and affordable, it is increasingly available to small laboratories. Thus, sequencing data is being generated at a faster rate than ever before.

However, the computing capacity needed to analyze such vast amounts of data still has some catching up to do. Large networks of interconnected computers, called computer clusters, are required to analyze these data. Expensive to establish and maintain, these computer clusters are generally available only to labs that can afford them.

Enter Mihai Pop, an assistant professor in the department of computer science and in the Center for Bioinformatics and Computational Biology at the University of Maryland. He and colleague Steven Salzberg, director of the center and Horvitz Professor of computer science, recently received a grant from the National Science Foundation Cluster Exploratory Program (CluE) to fund research aimed at discovering how remote computer clusters (networks of computers made available over the internet) might be used to process DNA sequence data.

"There is a new initiative by NSF to figure out what you can do with cluster computers on the internet - like the ones through Amazon, Google, and IBM," Pop said. "Our NSF grant will be used to find out if remote clusters of computers are a better option for DNA sequence analysis than local clusters of computers."

Pop's goal is to develop the software required to analyze sequence data in parallel (on many computers simultaneously). This massively parallel computing allows faster alignment and genome assembly.
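The idea of analyzing sequence data in parallel can be sketched in a few lines of Python. This is only an illustration, not Pop's software: the function names, the exact-match "alignment," and the tiny example sequences are all assumptions made for the sketch, and a thread pool stands in for the separate machines of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def align_read(read, reference):
    """Toy alignment: first exact-match position of read in reference, or -1."""
    return reference.find(read)


def align_chunk(chunk, reference):
    """Align every read in one chunk; chunks are independent of each other."""
    return [(read, align_read(read, reference)) for read in chunk]


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks order-preserving slices."""
    if not items:
        return []
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_align(reads, reference, n_workers=4):
    """Hand each chunk of reads to its own worker, then merge the results.

    The thread pool is a stand-in (an assumption of this sketch) for the
    many machines of a local or remote cluster.
    """
    chunks = split_into_chunks(reads, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda chunk: align_chunk(chunk, reference), chunks)
        return [hit for chunk_hits in results for hit in chunk_hits]


# Hypothetical example data.
reference = "ACGTACGGTCAGTTACG"
reads = ["ACGG", "TCAG", "GGGG", "TTAC"]
hits = parallel_align(reads, reference, n_workers=2)
print(hits)  # [('ACGG', 4), ('TCAG', 8), ('GGGG', -1), ('TTAC', 12)]
```

Because each read is compared with the reference independently of the others, the work divides cleanly across workers; real aligners tolerate mismatches and use indexed references, which this sketch omits.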

While parallel computing is already being used on locally maintained computer clusters, Pop will be working on programs that will allow researchers to perform their DNA sequence analysis over the web by accessing remote computer clusters maintained by large companies on a pay-per-use basis. This paradigm is known as cloud computing.

So rather than buying and maintaining their own computer systems, researchers may simply be able to rent computer time at a fraction of the cost. But there are a few obstacles to overcome before cloud computing becomes a reality for genetic analysts.

"The first question is how to best split up the process of DNA sequence analysis to fit these computer clusters," Pop said. "The second is whether or not the benefits of cloud computing outweigh the costs of data transfer and storage."

The massive amounts of data generated by just one genome may take a significant amount of time to transfer over the internet. This, in addition to the data storage needed before analysis, might add costs that outweigh the benefits of using a remote computer cluster.

"Even if the analysis doesn't take long, the transfer may take forever and cost too much to make the whole thing worthwhile," said Pop.
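The trade-off Pop describes comes down to simple arithmetic. The sketch below is a rough back-of-the-envelope model, not a result from the project: the data size, link speeds, compute times, and prices are all hypothetical values chosen for illustration.

```python
def transfer_hours(data_gb, bandwidth_mbps):
    """Hours needed to move data_gb gigabytes over a bandwidth_mbps link."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / bandwidth_mbps / 3600


def cloud_total_hours(data_gb, bandwidth_mbps, cloud_compute_hours):
    """Upload time plus analysis time on the remote cluster."""
    return transfer_hours(data_gb, bandwidth_mbps) + cloud_compute_hours


def cloud_total_cost(data_gb, cloud_compute_hours, price_per_gb, price_per_hour):
    """Transfer/storage charges plus rented compute time (hypothetical prices)."""
    return data_gb * price_per_gb + cloud_compute_hours * price_per_hour


# Hypothetical scenario: 100 GB of reads, 1 hour of analysis in the cloud.
hours_fast_link = cloud_total_hours(100, 100, 1.0)  # 100 Mbps link: about 3.2 hours
hours_slow_link = cloud_total_hours(100, 10, 1.0)   # 10 Mbps link: about 23.2 hours
cost = cloud_total_cost(100, 1.0, price_per_gb=0.5, price_per_hour=2.0)

print(round(hours_fast_link, 1), round(hours_slow_link, 1), cost)
```

Under these made-up numbers, the same one-hour analysis takes about a day end to end on the slower link, which is exactly the situation in which a remote cluster stops being worthwhile.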

A Different Kind of Puzzle

DNA is made up of nucleotide bases that are abbreviated by the letters A, C, G, and T. Lined up in a double helix structure, they make up a code that is translated into the proteins that run our body processes. New technology can read this code and compare the genetic makeup of species and organisms.

However, the sequencing process cannot handle a whole genome at once. The DNA strands have to be chopped into small pieces, sequenced, and then those sequences have to be put back together again. Putting the pieces back together is what requires so much computing power.

There are two ways to put the pieces back together. If a reference genome is available from the same species, scientists can use the reference as a guide for piecing together the new sequence. However, if a reference is unavailable, the scientist faces the more difficult task of determining all possible combinations of the loosely fitting pieces and finding the best one.

Pop likens this process to completing a jigsaw puzzle. "If you have a reference genome, it's like having the box with the picture on the front to guide your assembly," he said. "With no reference, it's like having no picture and no idea what the finished product will look like; with lots of sky and ocean pieces that fit very loosely together."

Such a process requires a lot of computing power because of the number of possibilities and the level of uncertainty. Computer clusters can compare all the candidate combinations of sequences and decide on the best one. But computing power and the expense of these systems are limiting factors.
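To give a feel for why assembly without a reference is so demanding, here is a toy "greedy overlap" assembler in Python. It is a textbook-style simplification, not the method used by Pop's group: the function names, the minimum overlap length, and the example fragments are assumptions for illustration.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        # Every ordered pair is compared: this all-against-all step is
        # what makes assembly expensive as the number of pieces grows.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps enough to merge
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments cut from the sequence ATGGCGTACGTTAGC.
pieces = ["CGTACGTT", "ATGGCGTA", "CGTTAGC"]
contigs = greedy_assemble(pieces)
print(contigs)  # ['ATGGCGTACGTTAGC']
```

With three pieces the comparisons are trivial, but the number of pairs grows roughly with the square of the number of fragments, and repeated stretches of DNA (the "sky and ocean pieces" of the puzzle) can make several merges look equally good. That combination is what pushes real assemblies onto clusters.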

Pop's team will spend the next two years determining whether it is feasible and beneficial to do this analysis through cluster computers available on the internet. He will write software programs that, if successful, will be made available for researchers to use at no cost, and his results will be made available through journal articles and conference presentations.

Teaching and mentoring of both graduate and undergraduate students will also be a large component of the grant, which Pop hopes will help entice talented computer science students to go into the biotechnology industry, where their skills are needed.

Source: University of Maryland

