San Diego Supercomputer Center director offers tips on data preservation in the information age

December 11, 2008

The world has gone digital in just about everything we do. Almost every iota of information we access these days is stored in some kind of digital form and accessed electronically -- text, charts, images, video, music, you name it. The key questions are: Will your data be there when you need it? And who's going to preserve it?

In the December 2008 edition of Communications of the ACM, the monthly magazine of the Association for Computing Machinery, Dr. Fran Berman, director of the San Diego Supercomputer Center (SDSC) at the University of California, San Diego, provides a guide for surviving what has become known as the "data deluge."

Managing this deluge and preserving what's important is what Berman refers to as one of the "grand challenges" of the Information Age. The amount of digital data is immense: A 2008 report by the International Data Corporation (IDC), a global provider of information technology intelligence based in Framingham, Mass., predicts that by 2011, our "digital universe" will be 10 times the size it was in 2006 -- and almost half of this universe will not have a permanent home as the amount of digital information outstrips storage space.

"As a society, we have only begun to address this challenge at a scale concomitant with the deluge of data available to us and its importance in the modern world," writes Berman, a longtime pioneer in cyberinfrastructure – an open but organized aggregate of information technologies including computers, data archives, networks, software, digital instruments, and other scientific endeavors that support 21st century life and work.

Berman is a strong advocate of cyberinfrastructure that supports the management and preservation of digital data in the Information Age – data cyberinfrastructure: "Just like the physical infrastructures all around us -- roads, bridges, water and electricity -- we need a data cyberinfrastructure that is stable, predictable, and cost-effective."

In her article, Berman explores key trends and issues associated with preserving digital data, and what's required to keep it manageable, accessible, available, and secure. However, she warns that there is no "one-size-fits-all" solution for data stewardship and preservation.

"The 'free rider' solution of 'Let someone else do it'-- whether that someone else is the government, a library, a museum, an archive, Google, Microsoft, the data creator, or the data user -- is unrealistic and pushes responsibility to a single company, institution, or sector. What is needed are cross-sector economic partnerships," says Berman. She adds that the solution is to "take a comprehensive and coordinated approach to data cyberinfrastructure and treat the problem holistically, creating strategies that make sense from a technical, policy, regulatory, economic, security, and community perspective."

Berman's ACM article closes with a set of "Top 10" guidelines for data stewardship:

1. Make a plan. Create an explicit strategy for stewardship and preservation for your data, from its inception to the end of its lifetime; explicitly consider what that lifetime may be.

2. Be aware of data costs and include them in your overall IT budget. Ensure that all costs are factored in, including hardware, software, expert support, and time. Determine whether it is more cost-effective to regenerate some of your information rather than preserve it over a long period.

3. Associate metadata with your data. Metadata is needed to be able to find and use your data immediately and for years to come. Identify relevant standards for data/metadata content and format, and follow them to ensure the data can be used by others.

4. Make multiple copies of valuable data. Store some of them off-site and in different systems.

5. Plan for the transition of digital data to new storage media ahead of time. Include budgetary planning for new storage and software technologies, file format migrations, and time. Migration must be an ongoing process. Migrate data to new technologies before your storage media become obsolete.

6. Plan for transitions in data stewardship. If the data will eventually be turned over to a formal repository, institution, or other custodial environment, ensure it meets the requirements of the new environment and that the new steward indeed agrees to take it on.

7. Determine the level of "trust" required when choosing how to archive data. Are the resources of the U.S. National Archives and Records Administration necessary or will Google do?

8. Tailor plans for preservation and access to the expected use. Gene-sequence data used daily by hundreds of thousands of researchers worldwide may need a different preservation and access infrastructure from, for example, digital photos viewed occasionally by family members.

9. Pay attention to security. Be aware of what you must do to maintain the integrity of your data.

10. Know the regulations. Know whether copyright, the Health Insurance Portability and Accountability Act of 1996, the Sarbanes-Oxley Act of 2002, the U.S. National Institutes of Health publishing expectations, or other policies and/or regulations are relevant to your data, and ensure your approach to stewardship and publication is compliant.
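
As an illustration only, and not something taken from Berman's article, guidelines 3, 4, and 9 (metadata, multiple copies, and integrity) could be sketched in Python roughly as follows. The function names, the manifest fields, and the choice of SHA-256 checksums are all assumptions made for this example; a real stewardship plan would follow whatever metadata standards apply to the data in question.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(original: Path, manifest: Path, description: str) -> dict:
    """Store minimal (hypothetical) metadata and a checksum next to the data."""
    record = {
        "file": original.name,
        "description": description,
        "bytes": original.stat().st_size,
        "sha256": sha256_of(original),
    }
    manifest.write_text(json.dumps(record, indent=2))
    return record


def verify_copies(record: dict, copies: list[Path]) -> dict[str, bool]:
    """Map each copy's path to True if it exists and matches the manifest checksum."""
    return {
        str(copy): copy.exists() and sha256_of(copy) == record["sha256"]
        for copy in copies
    }


def demo() -> None:
    """Create a file, make two copies, damage one, and report which copies verify."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "survey.csv"
        original.write_text("id,value\n1,42\n")
        record = write_manifest(original, root / "survey.json", "Example survey data")

        # Two stand-ins for copies kept on different systems or off-site.
        copy_a = root / "copy_a.csv"
        copy_b = root / "copy_b.csv"
        shutil.copyfile(original, copy_a)
        shutil.copyfile(original, copy_b)
        copy_b.write_text("id,value\n1,43\n")  # simulate silent corruption

        for path, ok in verify_copies(record, [copy_a, copy_b]).items():
            print(path, "OK" if ok else "MISMATCH")


if __name__ == "__main__":
    demo()
```

Running such a check on a schedule is one simple way to notice a damaged or missing copy while an intact copy still exists to restore from.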

Source: University of California - San Diego

