To run LOCK 2 in Single or Multiple mode, or to run a FoldMiner Structural Similarity Search, you will need to specify a query structure. The query structure can be specified as a four-character PDB accession code or as a six-character SCOP id. Alternatively, you may upload a PDB file. To obtain PDB accession codes, go to the PDB WWW Browser. The FoldMiner server cannot read proprietary file formats (such as Microsoft Word files) and therefore only accepts text files.
To select a particular chain in the query structure, you can either enter the chain identifier in the "Specify Chain (if present)" text area or append it to the PDB accession code as follows: 4hhb-A. This example selects only the "A" chain of 4hhb. This option is not available when the query is specified as a SCOP id.
In Multiple superposition mode, however, the targets cannot be uploaded from files. Instead, PDB accession codes or SCOP ids must be specified. To select a particular chain, you must append the chain identifier to the PDB code as follows: 4hhb-A. The list of targets must be entered in the provided text area, with entries separated by carriage returns.
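If you prepare target lists with a script, the following minimal sketch shows one way to split such a list into accession codes and optional chain identifiers before pasting it into the text area. The names `Target`, `PDB_WITH_CHAIN`, and `parse_targets` are hypothetical and are not part of the server; the pattern assumes that a PDB code is a digit followed by three alphanumeric characters and that a chain identifier is a single character. Any entry that does not match is passed through unchanged, on the assumption that it is a SCOP id.

```python
import re
from typing import List, NamedTuple, Optional


class Target(NamedTuple):
    code: str             # PDB accession code or SCOP id, as typed
    chain: Optional[str]  # chain identifier, or None if none was given


# Assumed pattern: 4-character PDB code with an optional "-<chain>" suffix.
PDB_WITH_CHAIN = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:-([A-Za-z0-9]))?$")


def parse_targets(text: str) -> List[Target]:
    """Split a target list (one entry per line) into codes and chains."""
    targets = []
    # splitlines() accepts "\r", "\n", and "\r\n" as line separators.
    for line in text.splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        match = PDB_WITH_CHAIN.match(entry)
        if match:
            targets.append(Target(match.group(1), match.group(2)))
        else:
            # Not a PDB code: keep the entry as-is (assumed to be a SCOP id).
            targets.append(Target(entry, None))
    return targets
```

For example, `parse_targets("4hhb-A\n1mbn")` yields one entry for chain A of 4hhb and one entry for 1mbn with no chain selected.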
Various subsets of SCOP have been extracted from ASTRAL. Two types of target databases are available. You may align your query to a database of targets in which the sequence identity between any two structures is less than a threshold value, or to a database of targets which represent all classifications within a given level of the SCOP hierarchy. These two options are mutually exclusive.
You may enter either PDB accession codes or SCOP domain identifiers:
If "sequential ordering" is selected, the residue alignment will contain no non-sequential pairs of aligned residues. Selecting "non-sequential ordering" allows target residues that are not sequential in sequence space to be aligned to sequential query residues. This is particularly helpful in cases such as circular permutations, and increases the number of aligned residues in many cases. If "non-sequential ordering" is selected, the residue alignment will still be restricted to sequential numbering within secondary structure elements. The alignment score does not depend on this parameter.
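To make the distinction concrete, the sketch below checks whether a list of aligned residue pairs is sequential in the sense described above. It is an illustration only and not the server's code; the function name `is_sequential` and the representation of an alignment as (query index, target index) pairs are assumptions.

```python
def is_sequential(pairs):
    """Return True if target indices increase along with query indices.

    pairs: list of (query_index, target_index) aligned residue pairs.
    """
    ordered = sorted(pairs)  # order by query index
    return all(t1 < t2 for (_, t1), (_, t2) in zip(ordered, ordered[1:]))


# A circular permutation: the start of the query matches the end of the target.
permuted = [(1, 50), (2, 51), (3, 1), (4, 2)]
straight = [(1, 10), (2, 11), (3, 12)]
```

Here `straight` would be allowed under either setting, while `permuted` could only be reported with non-sequential ordering selected.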
LOCK 2 aligns secondary structure elements and then refines the alignment at the residue level by iteratively matching nearest neighbors. By default, residues that are farther than 3 Å apart are never matched and are not reported as aligned. This threshold may be changed by entering a new value in Angstroms. The number of aligned residues and the RMSD will likely increase as the distance threshold is increased. We have found that 3 Å is appropriate for most applications. Press the "Reset to Default" button to restore the default value of 3 Å.
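The effect of the distance threshold can be illustrated with a simplified, single-pass version of nearest-neighbor matching. This is not the LOCK 2 refinement procedure, which is iterative; it is a greedy sketch that assumes the two structures are already superimposed and that each residue is represented by one (x, y, z) coordinate. The function name `match_within_cutoff` is hypothetical.

```python
import math


def match_within_cutoff(query, target, cutoff=3.0):
    """Greedily pair each residue with its closest unused partner.

    query, target: lists of (x, y, z) coordinates in Angstroms.
    Pairs farther apart than `cutoff` are never matched.
    Returns a list of (query_index, target_index, distance).
    """
    candidates = []
    for i, q in enumerate(query):
        for j, t in enumerate(target):
            d = math.dist(q, t)
            if d <= cutoff:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs are considered first

    used_q, used_t, pairs = set(), set(), []
    for d, i, j in candidates:
        if i not in used_q and j not in used_t:
            used_q.add(i)
            used_t.add(j)
            pairs.append((i, j, d))
    return pairs


query = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
tight = match_within_cutoff(query, target, cutoff=3.0)
loose = match_within_cutoff(query, target, cutoff=6.0)
```

With these toy coordinates, the 3 Å cutoff matches one pair, while the 6 Å cutoff matches two pairs at a larger average distance, which mirrors the trade-off described above.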
LOCK 2 uses a geometric hashing algorithm to test initial superpositions of the query and target structures. These superpositions are scored by a dynamic programming algorithm and the best is selected for refinement. The number of superpositions attempted depends on both the number of secondary structure elements and on the topology of the two proteins. Alignments of large proteins or structures with internal repeats may require on the order of several minutes or more of CPU time. In these cases, however, approximations can often be made in order to select fewer initial superpositions for evaluation. If you wish to make this approximation, select "Fewer Initial Superpositions." In extreme cases, we may override this option and test fewer initial superpositions. You will be notified if this occurs.
When performing a structural similarity search, the expectation determines the statistical significance cutoff (p value) at which the search is run according to the following formula:
p value cutoff = expectation / (number of structures in the target database)
An expectation of 10, for example, implies that an average of 10 false positives will be included in the search results. For a database of 1000 proteins, this corresponds to a statistical significance threshold of 0.01.
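The numbers in this example are consistent with dividing the expectation by the number of structures in the target database. A minimal sketch of that arithmetic follows; the function name `p_value_cutoff` is hypothetical and the relationship is inferred from the example above.

```python
def p_value_cutoff(expectation: float, database_size: int) -> float:
    """Significance cutoff implied by an expected number of false positives."""
    if database_size <= 0:
        raise ValueError("database_size must be positive")
    return expectation / database_size


cutoff = p_value_cutoff(10, 1000)  # the example from the text
```

For a fixed expectation, a larger target database implies a stricter cutoff.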
FoldMiner uses information from statistically significant LOCK 2 alignments to target structures in order to identify the core fold (or motif) in the query structure that is the basis for the structural similarity observed in high scoring targets. This information can be used to find additional targets that align well to the structurally conserved region of the query, but which may not have received high scores for other reasons. The algorithm used to detect additional targets and the query structure's core fold is described in more detail in a forthcoming publication. The extent to which this conservation information is used to detect additional homologous structures and to identify the core fold is controlled by the parameter 'x,' which may be adjusted. A value of zero will turn off this portion of the search procedure. This parameter affects both the search results and the definition of the motif shared by the query and high scoring targets.
FoldMiner is designed to favor alignments that are global with respect to both the query and target, but if no global alignments are found, you may wish to select the local alignment scope option and reanalyze the alignment results. Statistical significance values are preliminary at this point, so we suggest that you focus your attention on the top ranking results.
You may view a LOCK 2 alignment result in one of three ways:
The alignment score lies on the interval [0,1], where high scores represent high quality alignments. Raw alignment scores (not shown) are normalized to the maximum of the query vs. query and target vs. target scores in order to prefer alignments that are global with respect to both the query and target structures.
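The normalization described above can be sketched as follows. The function name `normalized_score` and the raw score values are hypothetical; the sketch assumes that a raw score is non-negative and that no alignment can score higher than a structure aligned to itself.

```python
def normalized_score(raw_query_target, raw_query_query, raw_target_target):
    """Normalize a raw score by the larger of the two self-alignment scores."""
    return raw_query_target / max(raw_query_query, raw_target_target)


# A small query that matches perfectly inside a much larger target:
local_hit = normalized_score(50.0, 50.0, 200.0)
# Two structures of similar size that match almost completely:
global_hit = normalized_score(180.0, 190.0, 200.0)
```

Dividing by the maximum rather than the minimum of the self-alignment scores is what penalizes the local hit: the small query is fully covered, but most of the target is not.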
Each alignment is assigned a statistical significance score. This score, or p-value, gives the probability of achieving an alignment score at least as high as the score for this alignment by chance. In the context of a structural similarity search, the highest acceptable p-value is determined by the expectation value and the number of structures in the target database.
In order to calculate statistical significance scores, background distributions of alignment scores were obtained by aligning each structure in a subset of SCOP domains to every other domain not in its own SCOP class. This last restriction is imposed in order to create distributions composed almost entirely of true negative scores; there are very few strong structural similarities that cross SCOP class boundaries. We use a subset of SCOP in which no two domains share more than 25% sequence identity.
The data from all structures in each SCOP fold are combined into one background distribution. We provide separate background distributions for each SCOP fold because certain characteristics of proteins, such as secondary structure composition and compactness, influence the probability of obtaining a false positive result. Some SCOP folds contain very few queries. In these cases, we use background distributions created from an entire SCOP class or all of SCOP. The cumulative distribution function for each background distribution is then used to provide a statistical significance score for an alignment.
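As an illustration of how a background distribution yields a significance score, the sketch below computes an empirical p-value: the fraction of background scores that are at least as high as the observed score. This is a simplified stand-in and an assumption on our part; the function name `empirical_p_value` and the toy background scores are hypothetical, and the server's own treatment of the cumulative distribution function may differ.

```python
from bisect import bisect_left


def empirical_p_value(score, background):
    """Fraction of background scores that are >= the observed score."""
    if not background:
        raise ValueError("background distribution is empty")
    ordered = sorted(background)
    # bisect_left returns the number of background scores strictly below `score`.
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return n_at_least / len(ordered)


background = [0.5, 0.1, 0.3, 0.2, 0.4]  # toy true-negative scores
p = empirical_p_value(0.4, background)
```

A real background distribution would contain many more scores, so that small p-values can be resolved.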
You will be notified if we cannot identify your query structure's SCOP fold or if its fold does not have its own background score distribution. You will be able to reanalyze your results (without repeating any alignments) using the background distribution of your choice.
If your browser supports Chime and Javascript, you may execute an arbitrary sequence of Chime commands to aid your visualization of an alignment. Separate commands with semicolons or carriage returns, and press the "execute" button to run the script. Use the word "query" to refer to the query residues (initially blue) and the word "target" to refer to the target residues (initially red).
Example script:
FoldMiner uses information contained in statistically significant alignments of the query structure to target structures to determine the structural motif that is the basis for the similarity between the query and high scoring targets. In some cases, this motif is global and may therefore be considered to be the query's core fold.
This motif is described probabilistically by the structural conservation of each query secondary structure element. If your browser supports Chime, you will see a Chime display in which the query's secondary structure elements are colored according to their conservation values. Bright and dark colors correspond to strongly and weakly conserved secondary structure elements, respectively.
The query structure's secondary structure elements are colored according to their conservation values, where bright colors represent strongly conserved secondary structure elements. The colors are consistent across all queries; i.e., a secondary structure element with a given conservation value always has the same color. In some cases, conservation values for a particular structure may not utilize the entire color range. Click the button to the left of the "renormalize color scale to use entire spectrum" link to use the entire color spectrum for this particular query. Press the button again to return to the original color scheme.
Press the button to the left of the "show conservation values" link to display each secondary structure element's conservation value. Press the button again to turn the labels off. Conservation values lie on the interval [0,1], where a value of 1 indicates complete conservation.
The SCOP fold statistics table at the bottom of the search results window shows the number of times each fold listed is represented among the statistically significant results. Sorting the results table by fold will group together domains of the same fold in the same order as in the fold statistics table. To highlight all members of a given fold, check the box next to that fold. Uncheck it to clear the highlighting. More than one fold may be highlighted at the same time.
In the current implementation of FoldMiner, only one motif is identified. To attempt to identify an additional motif in the query structure, select secondary structure elements to be excluded from consideration (most likely secondary structure elements with high conservation values). Press the 'blink' button to cause the secondary structure element in the Chime display to blink for purposes of identification. Because no additional alignments are performed, your results should appear in this window and in the window in which the search results table is displayed within several seconds.
You may navigate among your results using the history displayed in the FoldMiner control panel window. If you choose to repeat an iteration with a different value of x, both sets of results will be available to you.
To view your previous results, select the appropriate radio button. The motif display, iterative search options frame, and the results table (displayed in a separate window) will be updated. The value 'x' corresponds to the parameter that determines the conservation profile's contribution to the recalculated secondary structure element scores.
Clicking a radio button will display the corresponding search results.