Introduction
============

**Purpose**

This work aims to discover and systematically annotate the conformational space of protein molecules as a function of their intermolecular interactions. The software has many possible applications, for example the comparison of molecular simulation trajectories with experimental structures. It was written primarily with what we call Ensemble-Based Drug Discovery (EBDD) in mind. An example of its use for this purpose is the discovery of potential relationships between pharmacophoric interactions and the conformational flexibility of ligand binding sites.

**Overall Methodology**

To analyse all the structures available for a particular protein, an initial PDB code must be specified. Once this has been done, all structures with sequences within a given percentage identity are downloaded from the RCSB website, and a FASTA sequence alignment file is created from their ATOM records. This alignment file is given a standard name and serves as the reference frame for subsequent protein comparisons. Properties are then calculated on a per-residue basis, and only the properties of equivalent residues (those in the same alignment position) are compared. Properties describing the conformational state of a residue and its interactions with other molecules are included. In this way, the effect of a protein's environment can be related to its conformational state. To facilitate the analysis of large proteins and/or large numbers of proteins, intraprotein distances and similar quantities are not calculated. Three-dimensional aspects become apparent when similarities and differences in 1D properties are mapped onto the coordinates of example structures. Correlations between the property values of different residues can also be calculated.
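
The restriction to equivalent residues can be illustrated with a minimal sketch. It is not taken from the Polyphony source: the function names, the ``-`` gap character and the toy values are assumptions made for illustration only. Per-residue values are placed into alignment columns, and only columns occupied in both structures are compared.

```python
def map_to_alignment(aligned_seq, values, gap="-"):
    """Place per-residue values into alignment columns; gap columns become None.

    Hypothetical helper: `values` holds one entry per non-gap character.
    """
    n_residues = sum(1 for ch in aligned_seq if ch != gap)
    if n_residues != len(values):
        raise ValueError("number of values must match number of residues")
    it = iter(values)
    return [None if ch == gap else next(it) for ch in aligned_seq]


def compare_equivalent(aligned_a, values_a, aligned_b, values_b):
    """Return (column, value_a, value_b) for columns where both have a residue."""
    cols_a = map_to_alignment(aligned_a, values_a)
    cols_b = map_to_alignment(aligned_b, values_b)
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(cols_a, cols_b))
        if a is not None and b is not None
    ]


# Toy example: two aligned sequences with one gap each (made-up property values).
pairs = compare_equivalent("MK-LV", [0.1, 0.2, 0.3, 0.4],
                           "MKAL-", [1.0, 2.0, 3.0, 4.0])
```

Here columns 2 and 4 are skipped because one of the two sequences has a gap there, so only three residue pairs are compared.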

**Description of protein conformational state**

The backbone conformation of each protein structure is modelled as a spline, or trace, fitted through the C-alpha atoms. The conformational state of each residue is recorded as two numbers, the `curvature and torsion <http://en.wikipedia.org/wiki/Frenet%E2%80%93Serret_formulas>`_ of that spline at its C-alpha atom. Side-chain positions are recorded as a single x,y,z coordinate of a terminal atom after the main-chain atoms have been fitted to a reference frame.
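
For reference, curvature and torsion can be evaluated from the first three derivatives of a space curve using the standard Frenet-Serret expressions. The sketch below assumes those derivatives are already available at a C-alpha position; how they are obtained from the fitted spline is not shown, and the function names are hypothetical rather than part of Polyphony.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def curvature_torsion(d1, d2, d3):
    """Curvature and torsion from the 1st, 2nd and 3rd derivatives of a curve.

    curvature = |r' x r''| / |r'|^3
    torsion   = (r' x r'') . r''' / |r' x r''|^2
    """
    c = cross(d1, d2)
    c_norm = math.sqrt(dot(c, c))
    speed = math.sqrt(dot(d1, d1))
    curvature = c_norm / speed ** 3
    # Torsion is undefined where the curve is locally straight;
    # reporting 0.0 there is an assumption of this sketch.
    torsion = dot(c, d3) / c_norm ** 2 if c_norm > 0.0 else 0.0
    return curvature, torsion


# Check on a helix r(t) = (cos t, sin t, t) at t = 0, where both
# curvature and torsion equal 1 / (1 + 1) = 0.5 analytically.
kappa, tau = curvature_torsion((0.0, 1.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0))
```

A regular helix has constant curvature and torsion along its length, which is why these two numbers give a compact per-residue description of backbone shape.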

**Characterisation of intermolecular interactions**

The concept of the interaction fingerprint is used here. Each residue is assigned a fingerprint depending on the type and number of interactions it forms with other molecules. The fingerprint consists of counts of each interaction type in a defined order. Interactions with small molecule ligands are extracted from the `Credo database <http://www-cryst.bioc.cam.ac.uk/databases/credo>`_ and those with proteins from the `Piccolo database <http://www-cryst.bioc.cam.ac.uk/piccolo>`_.
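
A count-based fingerprint of this kind might be built as in the sketch below. The interaction type names and their order are placeholders chosen for illustration; they are not the types defined by Credo, Piccolo or Polyphony.

```python
from collections import Counter

# Hypothetical interaction types in a fixed order (an assumption of this sketch).
INTERACTION_TYPES = ("hbond", "ionic", "aromatic", "hydrophobic", "vdw")


def residue_fingerprint(interactions, types=INTERACTION_TYPES):
    """Return the count of each interaction type, in the order given by `types`."""
    counts = Counter(interactions)
    unknown = set(counts) - set(types)
    if unknown:
        raise ValueError("unknown interaction types: %s" % sorted(unknown))
    return tuple(counts[t] for t in types)


# A residue forming two hydrogen bonds and one hydrophobic contact.
fingerprint = residue_fingerprint(["hbond", "hydrophobic", "hbond"])
```

Because the order of types is fixed, fingerprints of equivalent residues in different structures can be compared position by position.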

**Visualisation of results**

Results of the analysis can be viewed in three main ways, two of which use third-party open-source software applications. For sequence-alignment-based visualisation, together with interactive 3D structure and dendrogram viewers, `Jalview <http://www.jalview.org/>`_ is used. Specifically, Jalview feature files and Newick format tree diagram files are produced by Polyphony. For more powerful 3D visualisation of results, `PyMOL <http://www.pymol.org/>`_ is used. For this purpose, Python functions have been written to allow interactive visualisation. The third way is via 2D plots produced using `PyLab <http://www.scipy.org/PyLab>`_/`matplotlib <http://matplotlib.sourceforge.net/>`_. It is also possible to view standalone dendrograms using `E.T.E. <http://ete.cgenomics.org/>`_.