Protein threading

 

Description

Protein threading, also known as fold recognition, is a method for the computational prediction of protein structure from amino acid sequence. Homology modelling is another method with the same goal.

Classification of protein structure

The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the structural and evolutionary relationships between proteins of known structure. Proteins are classified to reflect both structural and evolutionary relatedness. The hierarchy has many levels, but the principal ones are family, superfamily and fold, described below.

The different major levels in the hierarchy are:

Family (clear evolutionary relationship) 
Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% or greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family even though some members have sequence identities of only 15%.
Superfamily (probable common evolutionary origin) 
Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
Fold (major structural similarity) 
Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies.
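The 30% family guideline above is expressed as pairwise residue identity. As a minimal sketch, the Python below computes that identity from two sequences that have already been aligned, assuming identity is counted over columns where neither sequence has a gap (conventions differ; some methods divide by the length of the shorter sequence instead). The helper names and the threshold constant are hypothetical. As the globin example shows, the cutoff is only a rule of thumb, and the check says nothing about superfamily or fold membership.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# SCOP's rule of thumb for families; exceptions exist (e.g. globins at ~15% identity).
FAMILY_IDENTITY_THRESHOLD = 30.0


def meets_family_guideline(aligned_a: str, aligned_b: str) -> bool:
    """True if the pair reaches the ~30% identity usually seen within a SCOP family."""
    return percent_identity(aligned_a, aligned_b) >= FAMILY_IDENTITY_THRESHOLD
```

For example, `percent_identity("ACDE-G", "ACDFHG")` ignores the gapped column and finds 4 identical residues out of 5 aligned columns, giving 80.0.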

Protein threading

Protein threading, or fold recognition, is used for targets that have the same fold as proteins of known structure but have no homologous proteins of known structure. It predicts protein structures using statistical knowledge of the relationship between structure and sequence.

The prediction is made by "threading" (i.e., placing, aligning) each amino acid of the target sequence to a position in the template structure and evaluating how well the target fits the template. After the best-fitting template is selected, the structural model of the sequence is built from the alignment with the chosen template. Protein threading rests on two basic observations: first, the number of different folds in nature is fairly small (approximately 1000); second, according to PDB statistics, 90% of the new structures submitted to the PDB in the past three years have folds similar to ones already in the PDB.
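To make the idea of threading concrete, here is a deliberately simplified Python sketch. It assumes each template has been reduced to a string of hypothetical environment labels ('B' for buried, 'E' for exposed), uses a toy hydrophobic/polar fitness score rather than any published potential, and allows no gaps: the target is slid along each template, every offset is scored, and the best-fitting template is selected.

```python
from typing import Dict, Tuple

HYDROPHOBIC = set("AVILMFWC")  # toy hydrophobic/polar split (assumption)


def env_score(residue: str, env: str) -> int:
    """Toy environment fitness: +1 if the residue suits the environment label, else -1."""
    return 1 if (env == "B") == (residue in HYDROPHOBIC) else -1


def best_gapless_threading(target: str, template_env: str) -> Tuple[int, int]:
    """Slide the target along the template without gaps; return (best score, offset)."""
    if len(target) > len(template_env):
        raise ValueError("target is longer than the template")
    best = None
    for offset in range(len(template_env) - len(target) + 1):
        score = sum(env_score(r, template_env[offset + i]) for i, r in enumerate(target))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def pick_template(target: str, library: Dict[str, str]) -> Tuple[str, Tuple[int, int]]:
    """Thread the target onto every template and return the best-fitting one."""
    results = {name: best_gapless_threading(target, env) for name, env in library.items()}
    name = max(results, key=lambda k: results[k][0])
    return name, results[name]
```

For example, `pick_template("VIL", {"t1": "EEBBBEE", "t2": "BBEEEBB"})` returns `("t1", (3, 2))`: the three hydrophobic residues fit best over the buried stretch of the hypothetical template `t1`.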

Steps involved in protein threading

A general paradigm of protein threading consists of the following four steps:

The construction of a structure template database
Select protein structures from protein structure databases such as PDB, FSSP, SCOP, or CATH as structural templates, after removing structures with high sequence similarity to one another.
The design of the scoring function
Design a scoring function that measures the fitness between target sequences and templates, based on known relationships between structures and sequences. A good scoring function should include a mutation potential, an environment fitness potential, a pairwise potential, secondary structure compatibility terms, and gap penalties. The quality of this scoring (energy) function is closely related to the prediction accuracy, especially the alignment accuracy.
Threading alignment
Align the target sequence with each of the structure templates by optimizing the designed scoring function. When the scoring function includes a pairwise contact potential, this step is one of the major computational tasks of threading-based structure prediction programs; otherwise, a dynamic programming algorithm can solve it. Finding the optimal alignment under a scoring function that considers pairwise contacts is the difficult case, discussed further below.
Threading prediction
Select the threading alignment that is statistically most probable as the threading prediction. Then construct a structure model for the target by placing the backbone atoms of the target sequence at their aligned backbone positions of the selected structural template.
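When the scoring function has no pairwise terms, the threading alignment step reduces to dynamic programming. The Python sketch below assumes a score built only from a toy environment-fitness term and a linear gap penalty (the mutation, pairwise and secondary-structure terms listed above are omitted). It performs a Needleman–Wunsch-style global alignment of the target against a template's hypothetical environment string, then carries out a bare-bones version of the threading prediction step by copying the template's CA coordinates to the aligned target residues. All names, labels and the gap value are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

HYDROPHOBIC = set("AVILMFWC")  # toy hydrophobic/polar split (assumption)
GAP = -2                       # hypothetical linear gap penalty


def env_score(residue: str, env: str) -> int:
    """Toy environment fitness: +1 if the residue suits 'B'/'E', else -1."""
    return 1 if (env == "B") == (residue in HYDROPHOBIC) else -1


def align(target: str, template_env: str) -> Tuple[int, Dict[int, int]]:
    """Global DP alignment; returns (optimal score, target index -> template index)."""
    n, m = len(target), len(template_env)
    # F[i][j]: best score for aligning target[:i] with template_env[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * GAP
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + env_score(target[i - 1], template_env[j - 1]),
                          F[i - 1][j] + GAP,   # target residue left unaligned
                          F[i][j - 1] + GAP)   # template position skipped
    # Trace back from the bottom-right cell to recover the aligned pairs.
    pairs: Dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + env_score(target[i - 1], template_env[j - 1]):
            pairs[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return F[n][m], pairs


def build_model(target: str, pairs: Dict[int, int],
                template_ca: Sequence[Tuple[float, float, float]]
                ) -> List[Optional[Tuple[float, float, float]]]:
    """Copy template CA coordinates onto aligned target residues; unaligned ones get None."""
    return [template_ca[pairs[i]] if i in pairs else None for i in range(len(target))]
```

Aligning `"VK"` to the template environment `"BEE"` gives a score of 0 (two favourable matches, one skipped template position) and maps target residue 0 to template position 0 and residue 1 to position 2. A real model builder would also add side chains and model the unaligned regions.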

Difference between protein threading and homology modelling

Homology modelling and protein threading are both template-based methods, and there is no rigorous boundary between them in terms of prediction techniques. However, they target different proteins. Homology modelling is for targets that have homologous proteins with known structure, whereas protein threading is for targets for which only fold-level homology is found. In other words, homology modelling is for easy targets and protein threading is for hard targets.

Homology modelling treats the template in an alignment as a sequence, and only sequence homology is used for prediction. Protein threading treats the template as a structure, and both sequence and structure information extracted from the alignment are used for prediction. When no significant homology is found, protein threading can still make a prediction based on the structure information, which explains why protein threading may be more effective than homology modelling in many cases.

In practice, when the sequence identity in a sequence-sequence alignment is low (i.e. below 25%), homology modelling may not produce a significant prediction. In this case, if distant homology is found for the target, protein threading can still generate a good prediction.

More about threading

Fold recognition methods can be broadly divided into two types: (1) methods that derive a 1-D profile for each structure in the fold library and align the target sequence to these profiles, and (2) methods that consider the full 3-D structure of the protein template. A simple example of a profile representation would be to take each amino acid in the structure and label it according to whether it is buried in the core of the protein or exposed on the surface. More elaborate profiles might take into account the local secondary structure (e.g. whether the amino acid is part of an alpha helix) or even evolutionary information (how conserved the amino acid is). In the 3-D representation, the structure is modelled as a set of inter-atomic distances, i.e. the distances are calculated between some or all of the atom pairs in the structure. This is a much richer and far more flexible description of the structure, but it is much harder to use in calculating an alignment.

The profile-based fold recognition approach was first described by Bowie, Lüthy and Eisenberg in 1991. The term threading was coined by Jones, Taylor and Thornton in 1992 and originally referred specifically to the use of a full 3-D atomic representation of the protein template in fold recognition. Today, the terms threading and fold recognition are frequently (though somewhat incorrectly) used interchangeably.
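The 1-D profile and 3-D representations described above can both be derived from atomic coordinates. In this hypothetical Python sketch, the 3-D description is the full matrix of CA–CA distances, and the simple 1-D profile labels a residue buried ('B') when at least a chosen number of other CA atoms lie within a cutoff radius, and exposed ('E') otherwise. Using CA atoms only and a neighbour-count criterion with caller-chosen thresholds are assumptions for illustration; published methods use other burial measures, such as solvent accessibility.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca: Sequence[Coord]) -> List[List[float]]:
    """3-D representation: all pairwise CA-CA distances (in the coordinates' units)."""
    return [[math.dist(a, b) for b in ca] for a in ca]


def burial_profile(ca: Sequence[Coord], radius: float, min_neighbours: int) -> str:
    """1-D profile: 'B' if a residue has >= min_neighbours other CA atoms within radius,
    else 'E'. Both thresholds are illustrative assumptions chosen by the caller."""
    d = distance_matrix(ca)
    labels = []
    for i, row in enumerate(d):
        neighbours = sum(1 for j, dij in enumerate(row) if j != i and dij <= radius)
        labels.append("B" if neighbours >= min_neighbours else "E")
    return "".join(labels)
```

With `ca = [(0, 0, 0), (3.8, 0, 0), (0, 3.8, 0), (0, 0, 3.8), (20, 0, 0)]`, `burial_profile(ca, 10.0, 3)` returns `"BBBBE"`: each of the four clustered atoms has three neighbours within 10 units, while the distant fifth atom has none.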

Fold recognition methods are widely used and effective because there is believed to be a strictly limited number of different protein folds in nature, mostly as a result of evolution but also due to constraints imposed by the basic physics and chemistry of polypeptide chains. There is, therefore, a good chance (currently 70-80%) that a protein with a fold similar to the target protein has already been studied by X-ray crystallography or NMR spectroscopy and can be found in the PDB (Protein Data Bank). Currently there are just over 1100 different protein folds known (see the CATH database statistics for the latest figures), but new folds are still being discovered every year, thanks in part to the ongoing structural genomics projects.

Many different algorithms have been proposed for finding the correct threading of a sequence onto a structure, though many make use of dynamic programming in some form. For full 3-D threading, the problem of identifying the best alignment is very difficult (it is an NP-hard problem) and researchers have made use of many combinatorial optimization methods such as simulated annealing or branch and bound searching to arrive at heuristic solutions.
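Once pairwise contacts enter the score, dynamic programming no longer applies directly, because the contribution of one residue depends on where other residues are placed. The Python sketch below works under clearly toy assumptions: a hypothetical template given as buried/exposed labels plus a list of contacting template positions, a +2 bonus for a hydrophobic pair in contact, and every target residue placed in order on distinct template positions. It contrasts exhaustive enumeration of all order-preserving placements, whose count C(m, n) explodes with length, with a simple simulated-annealing search of the kind mentioned above. It is an illustration, not a reimplementation of any published threading program.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

HYDROPHOBIC = set("AVILMFWC")  # toy residue classes (assumption)
CONTACT_BONUS = 2              # hypothetical reward for a hydrophobic-hydrophobic contact


def threading_score(target: str, env: str, contacts: Sequence[Tuple[int, int]],
                    positions: Sequence[int]) -> int:
    """Environment fitness plus a pairwise term over contacting template positions.
    positions[i] is the template position of target residue i (strictly increasing)."""
    score = sum(1 if (env[p] == "B") == (r in HYDROPHOBIC) else -1
                for r, p in zip(target, positions))
    occupant = {p: r for r, p in zip(target, positions)}
    for p, q in contacts:
        if occupant.get(p, "") in HYDROPHOBIC and occupant.get(q, "") in HYDROPHOBIC:
            score += CONTACT_BONUS
    return score


def exhaustive(target: str, env: str, contacts) -> int:
    """Score every order-preserving placement: C(m, n) candidates, so tiny cases only."""
    return max(threading_score(target, env, contacts, pos)
               for pos in itertools.combinations(range(len(env)), len(target)))


def anneal(target: str, env: str, contacts, steps: int = 5000, t0: float = 2.0,
           seed: int = 0) -> Tuple[int, List[int]]:
    """Simulated annealing: shift one residue by one position, Metropolis acceptance."""
    n, m = len(target), len(env)
    if n > m:
        raise ValueError("every target residue needs its own template position")
    rng = random.Random(seed)
    pos = list(range(n))                       # start from the leftmost placement
    cur = threading_score(target, env, contacts, pos)
    best, best_pos = cur, pos[:]
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-3  # linear cooling, kept above zero
        i = rng.randrange(n)
        new = pos[:]
        new[i] += rng.choice((-1, 1))
        lo = new[i - 1] + 1 if i > 0 else 0
        hi = new[i + 1] - 1 if i < n - 1 else m - 1
        if not lo <= new[i] <= hi:
            continue                           # move would break order or leave template
        cand = threading_score(target, env, contacts, new)
        if cand >= cur or rng.random() < math.exp((cand - cur) / temp):
            pos, cur = new, cand
            if cur > best:
                best, best_pos = cur, pos[:]
    return best, best_pos
```

For the target `"VKL"`, template `"BEBEB"` and a single contact between positions 0 and 4, exhaustive search finds the optimum score of 5 (V and L on the two contacting buried positions). The annealer returns the best placement it encountered, which on such a tiny instance will usually, but not provably, be the same.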

It is interesting to compare threading methods to methods which attempt to align two protein structures (Protein structural alignment), and indeed many of the same algorithms have been applied to both problems.

References

  • Bowie JU, Lüthy R, Eisenberg D (1991). "A method to identify protein sequences that fold into a known three-dimensional structure". Science 253: 164–170. doi:10.1126/science.1853201. PMID 1853201. 
  • Jones DT, Taylor WR, Thornton JM (1992). "A new approach to protein fold recognition". Nature 358: 86–89. doi:10.1038/358086a0. 
  • Lathrop RH (1994). "The protein threading problem with sequence amino acid interaction preferences is NP-complete". Protein Eng 7: 1059–1068. doi:10.1093/protein/7.9.1059. PMID 7831276. 
  • Jones DT, Hadley C (2000). "Threading methods for protein structure prediction". In Higgins D, Taylor WR. Bioinformatics: Sequence, structure and databanks. Heidelberg: Springer-Verlag. pp. 1–13. 
  • Xu J, Li M, Kim D, Xu Y (2003). "RAPTOR: Optimal Protein Threading by Linear Programming". J Bioinform Comput Biol 1 (1): 95–117. doi:10.1142/S0219720003000186. PMID 15290783. 
  • Xu J, Li M, Lin G, Kim D, Xu Y (2003). "Protein threading by linear programming". Pac Symp Biocomput: 264–275. PMID 12603034. 