February 15, 1999

UCSC computer scientists find success predicting protein structures

By Tim Stephens

Although scientific research is often highly competitive, the competition between research groups rarely takes the form of an outright contest. In recent years, however, UCSC's computational biology group has taken part in an unusual series of experiments in which dozens of research teams analyze the same data and submit their results for independent assessment to see whose techniques work best.

The experiments, called Critical Assessment of Techniques for Protein Structure Prediction (CASP), challenge researchers to predict the three-dimensional structure of various proteins based on each protein's unique sequence of amino acid building blocks. The latest CASP results indicate that UCSC's computational biologists are among the best in the world.

"This is one of the few instances where researchers make blind predictions, without an opportunity to check their results before releasing them," said Kevin Karplus, associate professor of computer engineering, who heads a team of UCSC researchers studying protein structure.

Protein molecules are long, linear chains of amino acids that fold into complex three-dimensional shapes. They carry out an enormous variety of functions in all forms of life. Because of the central role of proteins in biology, predicting protein structure is critically important to biologists, biomedical researchers, and the biotechnology and pharmaceutical industries.

For decades, biologists have been working to understand how the linear sequence of amino acids in a protein molecule ultimately determines the three-dimensional shape of the active protein. The field has advanced rapidly in recent years through the application of sophisticated computer techniques for sequence analysis. These techniques take advantage of the wealth of data on protein sequences and structures now available in public databases.

"Without this data, the best sequence analysis in the world wouldn't be able to do much," Karplus noted.

In the CASP experiments, participants apply their techniques to a set of protein sequences provided by the organizers. Other researchers determine the structures of these protein "targets" using the laborious techniques of X-ray crystallography and nuclear magnetic resonance spectroscopy. Those results are kept secret until the end of the experiment.

The first CASP experiment was held in 1994. The third and most recent version, CASP3, culminated in December with a conference in Asilomar, California. Karplus said his group's success in CASP3 varied depending on the difficulty of the protein target. In most cases, the UCSC researchers ranked among the top two or three groups in the accuracy of their predictions.

Karplus and his coworkers, who include associate professor of computer engineering Richard Hughey and graduate student Christian Barrett, use a variety of computer-based tools to analyze protein sequences. They have made their structure prediction method available on the Internet to anyone who wants to use it. Their approach primarily involves finding sequences related to the target protein by searching a large database of protein sequences whose structure is already known.

"On the easy targets, where a related structure was easily found with standard sequence search techniques, we did quite well," Karplus said.

On these easy targets, Karplus's group consistently placed among the top two or three out of 30 to 55 groups. (The number of groups making predictions varied depending on the target.) In predicting a particular aspect of protein structure known as secondary structure, the UCSC team came in second out of 32 groups for predictions over the full range of target difficulties.

On targets of "medium" difficulty, where the correct structure was hard to find in the database, they placed sixth out of about 43 groups. This result was notable because the UCSC group relied entirely on its ability to find related sequences, while some other groups incorporated additional techniques into their prediction methods.

"We have pushed the sequence techniques about as far as they'll go, so we'll probably have to start including other approaches to remain competitive in the next round of CASP," Karplus said.

The UCSC group and most other participants had little success on the most difficult targets, where no protein similar to the target existed in the database. Karplus said much of his group's current research is focused on developing new techniques that do not depend solely on finding related protein sequences in the database.