Other information

Motivation for AH-DB

Conformational transitions are commonly observed in protein interactions that carry out important biological functions. For example, adenylate kinase (AdK), which catalyzes phosphoryl transfer from ATP to AMP, undergoes a large conformational change from an open state (the apo structure) to a closed state (the holo structure).

Other proteins, such as those responsible for gene regulation, signal transduction and ion flow, also undergo notable conformational transitions. Many studies have analyzed collections of apo and holo structure pairs to investigate conformational transitions and/or critical residues in protein-protein, protein-nucleic acid and protein-ligand interactions.

However, most of these studies compiled their apo-holo pairs from scratch. The collection process was usually complicated, varied from article to article and produced small datasets (fewer than 1,000 pairs). This study presents AH-DB (Apo and Holo structure DataBase), which provides an easy and unified way to prepare apo-holo pairs for proteins of interest. It contains 572,921 apo-holo pairs of 3,377 proteins from 681 organisms. AH-DB categorizes molecules into proteins, nucleic acids, ligands and ions, and identifies and maps molecules across complex structures from different Protein Data Bank (PDB) entries to pair apo and holo structures.

What is AH-DB?

AH-DB (Apo and Holo structures DataBase) collects apo and holo structure pairs of proteins. Proteins frequently associate with other molecules to perform their functions. Experimental structures determined in the bound state are called holo structures, while structures determined in the unbound state are called apo structures.

In AH-DB, we extended the definition of apo and holo structures. The apo structure of a protein may be part of an apo complex that contains other molecules. The corresponding holo structure must be part of a holo complex that contains all molecules in the apo complex plus some added molecules.

AH-DB is the largest database focusing on pairing apo and holo structures and provides a detailed view of the paired structures. It collects 562,424 apo-holo pairs, covering 3,369 proteins from 677 organisms. See the detailed statistics for more.

Terminology in AH-DB

AH-DB pairs protein structures before and after binding. Some conventions, such as calling the structure before binding the "apo" structure, are widely used in this field. Here we clarify the terminology used in AH-DB.

  • apo structure: The structure before binding, a.k.a. the unbound structure.
  • holo structure: The structure after binding, a.k.a. the bound structure.
  • apo complex: The apo structures of all molecules in a paired complex.
  • holo complex: The holo structures of all molecules in a paired complex.
  • target protein: The protein in the apo-holo complex pair that the user chooses to analyze; the name "target protein" is used only for convenience.
  • core molecules: The molecules common to both the apo and holo complexes, excluding the target protein.
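The extended apo/holo definition and the terms above can be expressed as a simple check on molecule composition. The sketch below is only an illustration: it assumes each complex has already been reduced to a list of hypothetical molecule labels (e.g. "AdK", "MG", "AP5") and ignores the structure-level identification and mapping of molecules across PDB entries that AH-DB performs. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical representation: a complex is a list of molecule labels,
# e.g. ["AdK", "MG"]; repeated labels mean several copies of a molecule.

def is_apo_holo_pair(apo_complex, holo_complex):
    """True if the holo complex contains every molecule of the apo complex
    (counting copies) plus at least one added molecule."""
    apo, holo = Counter(apo_complex), Counter(holo_complex)
    contains_apo = all(holo[mol] >= n for mol, n in apo.items())
    has_added = sum(holo.values()) > sum(apo.values())
    return contains_apo and has_added

def split_pair(apo_complex, holo_complex, target):
    """Return (core molecules, added molecules) for the chosen target protein."""
    if not is_apo_holo_pair(apo_complex, holo_complex):
        raise ValueError("not an apo-holo pair")
    if target not in apo_complex:
        raise ValueError("target protein must be in the apo complex")
    core = Counter(apo_complex)
    core[target] -= 1  # one copy is the target protein itself, not a core molecule
    added = Counter(holo_complex) - Counter(apo_complex)  # keeps positive counts only
    return sorted((+core).elements()), sorted(added.elements())
```

For example, `split_pair(["AdK", "MG"], ["AdK", "MG", "AP5"], "AdK")` gives `(["MG"], ["AP5"])`: MG is a core molecule and AP5 is the added molecule. Counting copies matters for cases such as homodimers, where the target protein and a core or added protein share the same label.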
Pseudo ligands in AH-DB

By default, AH-DB treats the following compounds as pseudo ligands: they are not substrates or natural ligands, so they are not counted as added molecules.

Ligand identifier in PDB    Ligand description
LA                          lanthanum ion
LU                          lutetium ion
MSE                         selenomethionine
OS                          osmium ion
PT                          platinum ion
RE                          rhenium ion
SM                          samarium ion
SR                          strontium ion
WO4                         tungstate ion
XE                          xenon ion
YB                          ytterbium ion
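Applying the default pseudo-ligand list amounts to filtering the added molecules of a pair. A minimal sketch, assuming added molecules are given as PDB ligand identifiers; the function name and the `ignore` parameter (a stand-in for a user-adjusted list) are hypothetical.

```python
# Default pseudo-ligand identifiers, copied from the table above.
PSEUDO_LIGANDS = frozenset(
    {"LA", "LU", "MSE", "OS", "PT", "RE", "SM", "SR", "WO4", "XE", "YB"}
)

def effective_added_molecules(added, ignore=PSEUDO_LIGANDS):
    """Drop pseudo ligands from a list of added-molecule identifiers.

    `ignore` is a hypothetical hook for replacing the default list."""
    return [mol for mol in added if mol.upper() not in ignore]
```

With the defaults, `effective_added_molecules(["AP5", "SR", "MSE"])` returns `["AP5"]`; passing `ignore=frozenset()` keeps every added molecule.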
Superimposition algorithms in AH-DB

In AH-DB, the residue pairs in an apo-holo structure pair are regarded as a set of paired vectors in 3D space, where each residue is represented by its alpha carbon. Two superimposition algorithms are used to superimpose the apo and holo structures of a target protein according to the residue mapping obtained in the previous steps.

  1. Least-Squares (LS)
    The first algorithm (Zhang, 1994) is a conventional least-squares method that minimizes the root-mean-square deviation (RMSD). Structural superposition has classically been solved with the standard statistical optimization method of least squares (Flower, 1999). The LS objective is to find the rotation and translation that minimize the sum of squared distances between corresponding atoms in the observed structures. A fundamental justifying assumption of LS (as given in the Gauss-Markov theorem) is that the errors have equal variance (Seber and Wild, 1989). When this assumption does not hold, a condition known in statistics as heteroscedasticity, LS can give misleading and inaccurate results.
  2. Maximum Likelihood (ML)
    The second algorithm, THESEUS (Theobald and Wuttke, 2006), is a maximum likelihood method that down-weights variable structural regions to obtain a better superimposition. ML is widely considered fundamental to statistical modeling and parameter estimation (Pawitan, 2001). ML superpositioning requires solving for four types of unknowns: a global covariance matrix describing the variances and correlations of each atom in the structures, a mean structure, and, for each structure in the analysis, a rotation matrix and a translation vector. The ML method accounts for uneven variances and correlations in the structures by weighting by the inverse of the atomic covariance matrix. Because the unknowns are interdependent and cannot be solved analytically, they are estimated simultaneously with an iterative numerical algorithm that maximizes the joint likelihood.
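The contrast between the two approaches can be illustrated with NumPy. The sketch below uses the SVD-based (Kabsch-style) solution of the weighted least-squares superposition problem; with equal weights it is ordinary LS. The text cites Zhang (1994) for the LS method, whose formulation may differ. The second function is a deliberately simplified stand-in for ML superposition, not THESEUS: it assumes one isotropic variance per residue (no correlations, no mean-structure estimation) and alternates between estimating variances and re-superposing with weights 1/variance. The function names and the `min_var` floor are assumptions for illustration.

```python
import numpy as np

def superpose(mobile, target, weights=None):
    """Superimpose mobile (N x 3 C-alpha coordinates) onto target (N x 3).

    Minimizes sum_i w_i * ||R x_i + t - y_i||^2 with the SVD (Kabsch) solution;
    weights=None gives ordinary least squares (minimum RMSD)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.ones(len(mobile)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu_x, mu_y = w @ mobile, w @ target          # weighted centroids
    X, Y = mobile - mu_x, target - mu_y
    H = (X * w[:, None]).T @ Y                   # 3 x 3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # d = -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # optimal proper rotation
    return X @ R.T + mu_y                        # superposed mobile coordinates

def rmsd(a, b):
    """Root-mean-square deviation between two N x 3 coordinate arrays."""
    return float(np.sqrt(((np.asarray(a) - np.asarray(b)) ** 2).sum(axis=1).mean()))

def variance_weighted_superpose(mobile, target, n_iter=20, min_var=1e-3):
    """Simplified illustration of ML-style down-weighting (NOT THESEUS):
    alternately estimate one isotropic variance per residue from the current
    fit and re-superpose with weights 1 / variance."""
    target = np.asarray(target, dtype=float)
    fitted = superpose(mobile, target)           # start from the LS fit
    for _ in range(n_iter):
        sq_dev = ((fitted - target) ** 2).sum(axis=1)
        var = np.maximum(sq_dev / 3.0, min_var)  # floor keeps weights finite
        fitted = superpose(mobile, target, weights=1.0 / var)
    return fitted
```

If one residue of an otherwise rigid copy is displaced, LS spreads that error over all residues, whereas the variance-weighted fit gives the displaced residue a small weight and superimposes the remaining residues almost exactly. This is the heteroscedasticity problem described under LS, in miniature.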
Home page

The following figure shows the home page of AH-DB, which provides a search facility for users to browse AH-DB.

[Figure: AH-DB home page]

The search facility includes four major parts:

  1. Organism: the organism to search.
  2. Target protein: search the apo and holo structure pairs of this protein.
  3. Added molecules: the molecules that appear only in the holo complex.
  4. Miscellaneous: all other search constraints, e.g. requiring that the added molecules include a protein identical to the target protein (i.e. the target protein and the added protein form a homodimer).
Search result page

The following figure shows the search result page of AH-DB, which lists all pairs that satisfy the search settings.

[Figure: AH-DB search result page]

This page includes two areas:

  1. Search information: shows the search settings specified on the previous page.
  2. Search results: a table of all apo and holo structure pairs that satisfy the search settings. For each pair, it lists the PDB IDs of the apo and holo complexes, the protein name, the organism, the composition of the added molecules, the conformational transitions and the structural quality.
Pair page

The following figure shows the pair page of AH-DB, which presents the details of an apo-holo structure pair.

[Figure: AH-DB pair page]

This page consists of four areas:

  1. Sequence view: shows the alignment of the primary and secondary structures of the target protein in the apo and holo structures.
  2. Structure view: uses a Jmol plugin to render the superimposed structure of the apo and holo complexes.
  3. Display: provides controls to show or hide each molecular element in the structure view. Users can also highlight residues with a specific secondary structure, disorder/order state, or conformational transition upon binding.
  4. Download: provides various data for download, such as the superimposed structure.
Keyword search

AH-DB provides a sophisticated keyword search for users to specify target proteins and added molecules of interest.

  • Protein identifiers
    PDB and UniProt identifiers.
  • Ligand identifiers
    The ligand identifier provided by PDB, such as AP5 in 1E4V.
  • Ion identifiers
    The ligand identifier provided by PDB, such as _CU in 1AZV.
  • Compound names / other keywords
    Any compound name, such as "adenylate kinase" or "amidin", can be used. AH-DB searches compound descriptions for terms that are not recognized as identifiers.
    • The given keywords must match whole words. For example, "amidin" matches " amidin ", "-amidin," and "(amidin)" but not "carboxamidine". Here the space, hyphen, comma and parentheses (" -,()") are common word boundaries. To match partial words, use the wildcard character (*).
    • Consecutive keywords are treated as one phrase. For example, "adenylate kinase" matches "adenylate kinase" but not "adenylate/guanylate kinase". See the AND, OR and NOT operators for more advanced usage.
  • Wildcard character (*)
    The asterisk matches any sequence of characters. For example, "amidin*" returns results whose descriptions contain words beginning with "amidin", and "*amidin*" returns results whose descriptions contain the string "amidin" anywhere, even inside a longer word.
  • AND operator
    Use AND to tell AH-DB that two keywords should be matched separately rather than as a phrase. For example, "adenylate AND kinase" matches both "adenylate kinase" and "adenylate/guanylate kinase". A comma (,), semicolon (;) or ampersand (&) invokes the AND operator implicitly, so "adenylate, kinase", "adenylate; kinase" and "adenylate & kinase" are all equivalent to "adenylate AND kinase".
  • OR operator
    OR is similar to AND except that only at least one of the separated keywords needs to be present in the search results. For example, "adenylate OR kinase" matches descriptions containing "adenylate", "kinase" or both. The plus symbol (+) also invokes the OR operator, so "adenylate + kinase" is equivalent to "adenylate OR kinase". When OR is mixed with the AND and NOT operators, OR has the lowest precedence, as in many packages such as MySQL.
  • NOT operator
    Use NOT to exclude results containing the keywords after the operator. For example, "adenylate NOT kinase" does not match "adenylate kinase". The minus symbol (-) invokes the NOT operator, so "adenylate - kinase" (keep the spaces around "-") is equivalent to "adenylate NOT kinase".
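The word-boundary and wildcard rules for a single keyword can be approximated with a regular expression. A minimal sketch, assuming case-insensitive matching and treating every character other than a letter or digit as a word boundary (the text names the space, hyphen, comma and parentheses as common boundaries, and the "adenylate/guanylate kinase" example is consistent with "/" acting as one too). The `*` is assumed to match any run of characters, including an empty one; the function names are hypothetical.

```python
import re

def keyword_regex(keyword):
    """Compile a keyword or multi-word phrase into a whole-word regex.

    '*' stands for any run of characters (possibly empty). Assumption: any
    character other than a letter or digit acts as a word boundary."""
    body = ".*".join(re.escape(piece) for piece in keyword.split("*"))
    return re.compile(r"(?<![0-9A-Za-z])" + body + r"(?![0-9A-Za-z])", re.IGNORECASE)

def keyword_matches(keyword, description):
    """True if the description contains the keyword as whole word(s)."""
    return keyword_regex(keyword).search(description) is not None
```

Under these assumptions, "amidin" matches "N-(amidin) transferase" but not "carboxamidine synthase", while "*amidin*" matches both. The lookbehind and lookahead require a non-word character (or the start or end of the text) on each side of the keyword without consuming it.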

To sum up, "ID OR DESC1 DESC2 AND DESC3* - DESC4" finds molecules that either (i) have the identifier ID, or (ii) contain the phrase "DESC1 DESC2" and a word beginning with DESC3, but not DESC4. Note that OR has the lowest precedence and that DESC1 and DESC2 are concatenated into one phrase.
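The operator rules and precedence described above can be sketched as a tiny parser that turns a query into an OR of AND/NOT clauses. This is an illustration under several assumptions, not AH-DB's implementation: operator words are uppercase, AND and NOT bind more tightly than OR and apply left to right, each molecule is a hypothetical (identifier, description) pair, and a term matches if it equals the identifier or matches the description as a whole-word keyword. All function names are hypothetical.

```python
import re

# Operator tokens from the text; assumption: operator words are uppercase.
OPERATORS = {"OR": "OR", "+": "OR",
             "AND": "AND", ",": "AND", ";": "AND", "&": "AND",
             "NOT": "NOT", "-": "NOT"}

def tokenize(query):
    # ',', ';', '&', '+' may touch words; '-' is NOT only when it stands alone.
    return re.findall(r"[,;&+]|[^\s,;&+]+", query)

def parse(query):
    """Return a list of OR-alternatives; each alternative is a list of
    (negated, phrase) clauses that must all hold."""
    alternatives, clauses, words, negated = [], [], [], False
    for tok in tokenize(query) + ["OR"]:        # sentinel OR closes the last alternative
        op = OPERATORS.get(tok)
        if op is None:
            words.append(tok)                   # consecutive keywords form one phrase
            continue
        if words:
            clauses.append((negated, " ".join(words)))
            words = []
        negated = op == "NOT"
        if op == "OR" and clauses:
            alternatives.append(clauses)
            clauses = []
    return alternatives

def term_matches(term, identifier, description):
    """Hypothetical rule: exact identifier match, else whole-word description match."""
    if term.upper() == identifier.upper():
        return True
    body = ".*".join(re.escape(p) for p in term.split("*"))
    pattern = r"(?<![0-9A-Za-z])" + body + r"(?![0-9A-Za-z])"
    return re.search(pattern, description, re.IGNORECASE) is not None

def query_matches(query, identifier, description):
    """True if any OR-alternative has all of its clauses satisfied."""
    return any(
        all(term_matches(phrase, identifier, description) != negated
            for negated, phrase in clauses)
        for clauses in parse(query)
    )
```

With this structure, the summary query parses into two alternatives: `ID` alone, and the phrase "DESC1 DESC2" AND `DESC3*` NOT `DESC4`, which matches the reading given in the text.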

Troubleshooting

To provide a better experience, AH-DB uses modern web technologies such as AJAX. The following checkpoints help users who are unfamiliar with these technologies.

  1. Browser
    The default settings of most modern browsers (Chrome, Firefox, Internet Explorer and Opera) should work with AH-DB.
    • Enable JavaScript (further information for Firefox and IE).
  2. Tested Environments
    The environments that have been tested for compatibility are as follows:
    1. Windows 7 - Chrome 4.1.249.1064 - Java 1.6.0_20
    2. Windows 7 - Internet Explorer 8.0 - Java 1.6.0_20
    3. Windows 7 - Firefox 3.6.3 - Java 1.6.0_20
    4. Windows 7 - Opera 10.53 - Java 1.6.0_20
    5. Windows XP - Chrome 4.1.249.1064 - Java 1.6.0_20
    6. Windows XP - Internet Explorer 8.0 - Java 1.6.0_20
    7. Windows XP - Firefox 3.6.3 - Java 1.6.0_20
    8. Windows XP - Opera 10.53 - Java 1.6.0_20
    9. Ubuntu 10.04 - Chrome 4.1.249.1064 - Java 1.6.0_20
    10. Ubuntu 10.04 - Internet Explorer 8.0 - Java 1.6.0_20
    11. Ubuntu 10.04 - Firefox 3.6.3 - Java 1.6.0_20
    12. Ubuntu 10.04 - Opera 10.53 - Java 1.6.0_20

We would appreciate it if you let us know about environments in which AH-DB runs well. Conversely, please feel free to contact us if you cannot use AH-DB in a specific environment; we will test AH-DB in that environment as soon as possible.

Mirrors: AHDB@NCKU.EE, AHDB@NTU.CSBB, AHDB@NTU.CSIE