PET Projects
Figure: A 3-D structure of the protein falcipain-2. The structural elements are depicted by colored shapes: helical regions are red spirals; the sections of protein having a beta sheet are yellow strips; the turns in the structure are either green-colored or violet-colored regions on the cyan tube; and the cyan-colored tube traces the connectivity of those atoms that comprise the backbone of the protein and have no particular structural motif.
Zottola helps develop an Automated Biological Determination Tool
By Jake Weyant, Special to Link
A paper airplane and a page in a book have drastically different functions, but the only difference in their form is the way one is folded. The same is true of proteins.
In a recent study, Dr. Mark Zottola of High Performance Technologies, Inc. (HPTi) and Dr. Xiao Lian Gao of the University of Houston have collaborated on a project to automate the determination of biological structures based on nuclear magnetic resonance (NMR) experiments. Their work for the Programming Environment and Training (PET) program of the High Performance Computing Modernization Program Office (HPCMPO) successfully developed a tool to identify amino acid resonances in protein spectra.
Structure determination allows scientists to look at a protein and know what function its shape implies. There are 20 naturally occurring amino acids. These amino acids combine to form sequences called proteins, each of which has a unique three-dimensional structure based on its amino acid sequence. Specific amino acid sequences form specific structural elements such as helices, beta sheets, turns and random coils. It is the combination of these structural elements, called motifs, that creates the unique three-dimensional structures of proteins. The structure shows the unique orientations of the atoms in each amino acid and their spatial relationship to atoms in other amino acids.
Depending on the type of atoms and their inter-atomic distances, proteins can have a variety of biological functions. Atoms that are far apart in sequence may be very close once the folding of the protein is taken into consideration. The structure-to-function relationship is a cornerstone concept in the field of biochemistry.
NMR has been used extensively to study the three-dimensional structure of proteins and ascertain their biological function. NMR spectroscopy compares the changes in a protein’s NMR spectrum before it performs its biological function with the spectra obtained during and after the function.
The NMR experiment produces a spectrum that has a series of peaks (often called resonances), which can be used to determine the structure of the protein. Using a protein’s chemical shift (CS) pattern, one can identify which resonances belong to which amino acids. This is the assignment phase of the structure-determination problem. These structures then become a protein’s fingerprint, allowing researchers to know what structural changes occur to a protein as it performs its biological function.
The assignment phase is the most critical step in the structure-determination problem. An incorrect assignment will lead to an incorrect determination of the protein structure. The NMR spectrum of proteins is extremely crowded because resonances can be numerically very close to one another. Since proteins contain hundreds of amino acids, with each amino acid containing from 5 to 16 resonances, the spectral crowding encountered requires extremely precise assignment strategies. Currently, this work is done by an NMR spectroscopist, requiring a time-consuming 6 to 18 months for successful completion.
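The crowding problem described above can be illustrated with a minimal Python sketch. It is not the tool developed in the project; the function name, the resonance labels and the shift values are all hypothetical placeholders chosen only to show how two reference resonances that lie numerically close together leave a peak with more than one candidate assignment.

```python
# Illustrative sketch only. The labels and ppm values are made-up
# placeholders, not measured or published chemical shifts.

def candidate_assignments(peaks, reference, tolerance):
    """Map each observed peak (ppm) to every reference resonance
    whose shift lies within `tolerance` ppm of that peak."""
    return {
        peak: [name for name, shift in reference.items()
               if abs(shift - peak) <= tolerance]
        for peak in peaks
    }

# Hypothetical reference table: two resonances sit almost on top of each other.
reference = {"res_A": 4.10, "res_B": 4.12, "res_C": 8.30}
peaks = [4.11, 8.29]

result = candidate_assignments(peaks, reference, tolerance=0.02)
for peak, names in result.items():
    status = "ambiguous" if len(names) > 1 else "unique"
    print(f"{peak:.2f} ppm -> {names} ({status})")
```

In this toy case the peak near 4.11 ppm matches two reference resonances, which is the kind of ambiguity that makes precise reference values so important when hundreds of amino acids contribute peaks to one spectrum.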
Automation of this procedure would reduce the time required to solve an NMR spectrum from months to a few weeks; however, it would require a table containing extremely accurate values for amino acid resonances. First-principles methods of computing CS values for the resonances within an amino acid exist, but CS values calculated for isolated amino acids have proven too inaccurate for use in automated-assignment strategies.
Zottola and Gao’s team used a unique approach to circumvent the limitations of computational accuracy for CS. Realizing that the local environment of the amino acid has an influence on the value of the CS seen in the NMR spectrum, Zottola and Gao used a technique called the method of triplets. For the first time, a study of NMR chemical shifts considered the effects of each amino acid’s neighboring amino acids.
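The article does not spell out exactly how a triplet is defined. The sketch below assumes the simplest reading: each amino acid taken together with its two immediate neighbors in the sequence. The function name and the example sequence are hypothetical, and the code only illustrates the bookkeeping, not the chemical shift calculation itself.

```python
from itertools import product

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def triplets(sequence):
    """Yield (previous, central, next) for every interior residue.

    Assumption: a "triplet" is a residue plus its two immediate
    sequence neighbors; the first and last residues are skipped
    because each has only one neighbor.
    """
    for i in range(1, len(sequence) - 1):
        yield sequence[i - 1], sequence[i], sequence[i + 1]

# Every possible ordered triplet of the 20 amino acids: 20**3 combinations.
all_triplets = list(product(AMINO_ACIDS, repeat=3))
print(len(all_triplets))

# Triplets found in a short, made-up example sequence.
example = list(triplets("MKVLA"))
print(example)
```

Under this assumption there are 8,000 ordered triplets before structural motifs are taken into account, which gives a sense of why a large set of protein structures is needed to cover them.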
Dr. Zottola explained, “If you have someone who likes to go to bed at 9 p.m., but has a neighbor who plays loud music until 3 a.m., that person will have different sleep patterns than if he had a neighbor who went to bed at the same time.”
The method of triplets resulted in a study of more than 2,000 protein structures, which researchers used to gather data for every possible triplet combination of amino acids in all known structural motifs. Using these structures, CS calculations were performed and tabulated in a relational database that allowed search, retrieval and analysis of the thousands of data points. To validate the method of triplets, the calculated CS data was correlated with the NMR spectra in the BioMagResBank. The method of triplets produced a very high level of correlation, greater than 95%, between the calculated and experimental resonances found in NMR spectra of proteins.
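A rough idea of how such a database might be organized and checked can be sketched with Python's built-in `sqlite3` module. The table layout, column names, atom labels and all numeric values here are assumptions made for illustration; the article does not describe the actual schema, and it does not say which correlation measure was used, so the Pearson coefficient below is only one plausible choice.

```python
import sqlite3
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical schema: one calculated shift per (triplet, motif, atom).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shifts ("
    " triplet TEXT, motif TEXT, atom TEXT, calc_ppm REAL,"
    " PRIMARY KEY (triplet, motif, atom))"
)
# Placeholder rows; the ppm values are invented, not real results.
conn.executemany(
    "INSERT INTO shifts VALUES (?, ?, ?, ?)",
    [("AKV", "helix", "HA", 4.0),
     ("AKV", "sheet", "HA", 4.5),
     ("GKL", "helix", "HA", 4.2)],
)

def lookup(triplet, motif, atom):
    """Return the stored calculated shift, or None if there is no entry."""
    row = conn.execute(
        "SELECT calc_ppm FROM shifts WHERE triplet=? AND motif=? AND atom=?",
        (triplet, motif, atom),
    ).fetchone()
    return row[0] if row else None

print(lookup("AKV", "sheet", "HA"))
# Validation idea: correlate calculated shifts with experimental ones
# (both lists below are placeholders).
print(pearson([4.0, 4.5, 4.2], [4.1, 4.6, 4.2]))
```

Keying each entry on the triplet and the structural motif reflects the article's point that the same amino acid can show different shifts depending on its neighbors and its local structure.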
The resulting database will allow researchers to compare findings and rapidly determine protein structures. “We are going from prediction to identification,” said Dr. Zottola. “This tool effectively cuts the time for structure determination in half. This technology has implications for counterterrorism efforts that might need to determine a toxic biological agent and develop a treatment in very little time. The pharmaceutical applications are also astounding. Being able to rapidly determine protein structures would mean a significant increase in therapeutic agents reaching the market.”
While the Chemical Shift Correlated Database will neither require hundreds of gigabytes of storage space nor hundreds of processors in order to run, Dr. Zottola believes the database will serve as a feeder application for HPC resources.
“This database will be used to automate the assignment phase of protein NMR structure determination,” said Dr. Zottola. “With the ability to rapidly assign peaks from the spectrum, this will provide the necessary input for running the CPU-intensive task of structure determination using the data gleaned from the NMR experiment and data gleaned via the aegis of this database. By reducing the time to analyze NMR spectra from months to days means that NMR structure determination will become a regularly utilized tool. This utilization will result in a significant increase in usage of HPC resources.”
Already, Zottola has his eyes on the future of this technology. “This tool is a bridge between the NMR spectroscopist and the bio-informatician. It expands the scope of structural proteomics by providing a tool to help answer the ‘What ifs.’ It feels like we have climbed a rock wall and are getting our first glimpses of what lies in an undiscovered canyon. This is an exciting time to be a scientist,” added Zottola.
