Researchers create search engine to hunt molecules online
July 26, 2007
ChemxSeer, the first publicly available search engine designed specifically for chemical formulae, can sort out when "He" refers to helium and not a person more than nine times out of 10, according to the Penn State College of Information Sciences and Technology (IST) researchers who created the tool.
With the new engine, scientists searching for research on CH4 or methane no longer have to wade through search results about Channel 4 or Chapter 4, because ChemxSeer returns only documents that refer to the chemical formula.
The new algorithm also can identify related chemicals with different formula representations and chemicals with related substructures or similarities, said C. Lee Giles, professor of information sciences and technology and co-director of the IST Cyber Infrastructure Lab where the research originated.
"Results from our search engine are much more relevant than results returned by popular search engines," Giles said. "It is one of several cyber tools under development in our lab which will enable better access to and sharing of information and data among scientists and scholars."
The tool is described in a paper, "Extraction and Search of Chemical Formulae in Text Documents on the Web," presented at the recent 16th International World Wide Web Conference in Alberta, Canada. In addition to Giles, the authors are Bingjun Sun and Qingzhao Tan, graduate students in computer science and engineering, and Prasenjit Mitra, assistant professor of information sciences and technology and co-director of Penn State's Cyber Infrastructure Lab.
Electronically hunting for chemical formulae poses some unique challenges for popular search engines, which typically focus on key words. For one, scientists often search for parts of chemical formulae, and the part they want may appear at the beginning, at the end or in the middle of a formula.
Similarly, some chemical molecules can have more than one formula representation. As a result, if a person is searching for CH4 using a popular search engine and the article identifies the molecule as H4C, the article won't be included in the search results.
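The CH4 versus H4C problem can be illustrated with a small sketch. It is not the researchers' algorithm; it only assumes that a flat formula (no parentheses, charges or hydrates) can be reduced to element counts and rewritten in a fixed order, here plain alphabetical. The names element_counts and canonical are hypothetical.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def element_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'CH4' (assumption: no parentheses or charges)."""
    counts = Counter()
    pos = 0
    for match in TOKEN.finditer(formula):
        if match.start() != pos:
            # Something between two tokens could not be read as an element symbol.
            raise ValueError(f"cannot parse {formula!r}")
        pos = match.end()
        counts[match.group(1)] += int(match.group(2) or 1)
    if not formula or pos != len(formula):
        raise ValueError(f"cannot parse {formula!r}")
    return counts

def canonical(formula: str) -> str:
    """Rewrite a formula with elements in alphabetical order (an illustrative choice)."""
    counts = element_counts(formula)
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))

print(canonical("CH4"))  # CH4
print(canonical("H4C"))  # CH4
```

Indexing documents under the canonical string rather than the literal text lets a query for CH4 also find an article that writes H4C.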
In addition, molecules can be confused with nonchemical abbreviations. While people would recognize "OH" as Ohio in a particular context, a machine with a chemical dictionary could confuse it with the chemical notation for hydroxide. A similar slip-up can occur with "I" (iodine) or "In" (indium).
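A toy example shows why a dictionary alone is not enough. The sketch below is a deliberately naive matcher, not part of ChemxSeer: it flags any word built entirely from element symbols, using a small hand-picked symbol set (SYMBOLS) and a hypothetical function name (naive_formula_hits).

```python
import re

# Illustrative subset of element symbols; a real dictionary would list all of them.
SYMBOLS = {"H", "He", "C", "O", "I", "In"}

def naive_formula_hits(text: str) -> list[str]:
    """Return every word that can be read as a sequence of known element symbols."""
    hits = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        # The whole word must be symbols with optional counts, e.g. 'CH4' or 'OH'.
        if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", word):
            continue
        symbols = re.findall(r"[A-Z][a-z]?", word)
        if all(symbol in SYMBOLS for symbol in symbols):
            hits.append(word)
    return hits

sentence = "He moved to OH. In May I measured CH4."
print(naive_formula_hits(sentence))  # ['He', 'OH', 'In', 'I', 'CH4']
```

Only CH4 is a genuine formula in that sentence; the other four hits are the pronoun, the state abbreviation and two ordinary English words, which is the kind of confusion the surrounding context has to resolve.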
In designing the engine, the researchers built on their expertise in information-extraction algorithms created for CiteSeer, a search engine for academic and science documents.
Besides extracting formulae, ChemxSeer also allows for various query models appropriate for any scientist looking for a molecule. Not only does it query for exact matches, but it also queries for formulae with additional terms or elements as well as for formulae with similar structures. The engine also can search for the range of occurrence of an element in various formulae, the researchers said.
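Three of these query types can be sketched in a few lines, assuming each formula is reduced to a table of element counts. This is an illustration under that assumption, not the engine's actual implementation; the function names and the sample list are hypothetical, and structure-similarity search is left out.

```python
import re
from collections import Counter

def parse(formula: str) -> Counter:
    """Element counts for a flat formula (assumption: no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number or 1)
    return counts

def exact_match(query: str, formulae: list[str]) -> list[str]:
    """Formulae with exactly the same element counts as the query."""
    target = parse(query)
    return [f for f in formulae if parse(f) == target]

def contains_query(query: str, formulae: list[str]) -> list[str]:
    """Formulae that have at least the query's atoms, possibly with extra elements."""
    target = parse(query)
    return [f for f in formulae
            if all(parse(f)[el] >= n for el, n in target.items())]

def element_in_range(element: str, low: int, high: int, formulae: list[str]) -> list[str]:
    """Formulae in which the given element occurs between low and high times."""
    return [f for f in formulae if low <= parse(f)[element] <= high]

# Hypothetical formulae extracted from documents.
found = ["CH4", "H4C", "CH3OH", "C2H6", "H2O"]
print(exact_match("CH4", found))            # ['CH4', 'H4C']
print(contains_query("CH4", found))         # ['CH4', 'H4C', 'CH3OH', 'C2H6']
print(element_in_range("C", 2, 3, found))   # ['C2H6']
```

Comparing counts instead of strings is what lets the exact query treat CH4 and H4C as the same molecule.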
To create ChemxSeer, the researchers "taught" machines how to recognize chemical formulae by providing training samples of both chemical formulae and terms that look like formulae but are not.
"Teaching the computer to classify what is a formula and what is not was complex because language is inherently context sensitive and judging the meaning of a term using its context is hard for a machine," Mitra said.
Future research will focus on improving the reliability of identification, linking to existing molecular databases, data archiving and increasing the relevance of search results.
The engine is part of an open-source cyber infrastructure project that focuses on chemical document search for environmental chemistry and is funded by the National Science Foundation. The grant, awarded to the Penn State Department of Chemistry, aims to enable automatic data analysis.
"This tool replaces time-intensive manual searching, allowing our research team to focus more on solving problems with as much relevant information as possible," said Karl Mueller, professor of chemistry and PI of the cyber infrastructure grant.
Source: Penn State