polyphony_create_fasta_file.py

Polyphony. Python code for the analysis of protein structure ensembles.

Copyright (C) 2013 William R. Pitt

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Run by typing “python create_fasta_file.py -f my_cluster.html”. The input html file should be output from the RCSB sequence similarity web tool, e.g. “http://www.rcsb.org/pdb/explore/sequenceCluster.do?structureId=1HUW&entity=1&cluster=2856&seqid=70”. The pdb files will be downloaded if necessary and a .fasta alignment file will be created. Please view this alignment using your favourite multiple alignment software package and correct the alignment if necessary.

Please be aware that if the program tries to download a lot of files, the ftp server may complain. If this happens, rerun the script until all your files have been downloaded. The location of the ftp server is specified in polyphony.cfg.
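Rerunning the script by hand can be tedious, so the loop below is a minimal sketch of one way to automate it. It is not part of Polyphony. The helper name run_until_success and its parameters are hypothetical, and it assumes that the script exits with a non-zero status when a download fails, which the text above does not confirm.

```python
import subprocess
import time


def run_until_success(command, max_attempts=5, wait_seconds=60, runner=subprocess.call):
    """Rerun `command` until it exits with status 0.

    Returns the number of the attempt that succeeded, or None if every
    attempt failed. ASSUMPTION: a failed download makes the script exit
    with a non-zero status; the source does not confirm this.
    """
    for attempt in range(1, max_attempts + 1):
        if runner(command) == 0:
            return attempt
        if attempt < max_attempts:
            # Pause so the ftp server is not hit again immediately.
            time.sleep(wait_seconds)
    return None


# Example (not executed here); the chain code "A" is illustrative only:
# import sys
# run_until_success([sys.executable, "polyphony_create_fasta_file.py",
#                    "-p", "1HUW", "-c", "A", "-s", "70"])
```

Because files that have already been downloaded are only fetched "if necessary", each rerun should have less work to do than the previous one.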

Usage: polyphony_create_fasta_file.py [options]

Options:
-h, --help show this help message and exit
-u, --update use this option if you want to re-extract all sequence data from pdb files
-a, --use_all_models
 If pdbs contain multiple models, treat them as separate chains. Otherwise use the first model only (default)
Specifications for a RCSB sequence cluster:
-p PDBCODE, --pdbc=PDBCODE
 pdb code of target protein. Please note for NMR ensembles, only the first model will be used. If you would like to analyse all models, copy the pdb file to the current directory, use the -m option and then manually add the resulting sequences to your fasta file.
-c CHAINCODE, --chain=CHAINCODE
 single letter chain code for target. default = “”
-s SIMILARITY, --similarity=SIMILARITY
 Percentage sequence similarity cutoff, must be one of [100|95|90|70|50|40|30], default = 95
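The help text above looks like output from Python's optparse module, so the following sketch shows how an option group with a restricted similarity cutoff could be declared. It is an illustration only, not the script's actual source. The function name build_cluster_parser and the dest names are assumptions chosen to reproduce the PDBCODE, CHAINCODE and SIMILARITY placeholders.

```python
from optparse import OptionGroup, OptionParser

# Cutoffs listed in the help text; optparse "choice" options compare strings.
ALLOWED_SIMILARITIES = ["100", "95", "90", "70", "50", "40", "30"]


def build_cluster_parser():
    """Build a parser holding only the RCSB sequence cluster options."""
    parser = OptionParser(usage="usage: %prog [options]")
    group = OptionGroup(parser, "Specifications for a RCSB sequence cluster")
    group.add_option("-p", "--pdbc", dest="pdbcode",
                     help="pdb code of target protein")
    group.add_option("-c", "--chain", dest="chaincode", default="",
                     help="single letter chain code for target")
    group.add_option("-s", "--similarity", dest="similarity",
                     type="choice", choices=ALLOWED_SIMILARITIES, default="95",
                     help="percentage sequence similarity cutoff")
    parser.add_option_group(group)
    return parser
```

With this declaration, a value outside the allowed list (for example -s 80) makes optparse print an error and exit, so the cutoff never needs to be validated by hand.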
Own sequence alignment specifications:
-f ALIGNMENT_FILENAME, --fasta-file=ALIGNMENT_FILENAME
 name of sequence alignment file to use
Directory of pdb files:
-d DIRECTORY, --directory=DIRECTORY
 Specify a directory containing the pdb files you want to use. Can be the current directory (.). Note that Biopython halts reading pdb files if the element column on the far right has lower case letters, e.g. Cl. You’ll need to manually edit this to continue. Not my fault, sorry.
-e, --use_existing_names
 Use existing names, after replacing underscores with hyphens. Otherwise fake pdb codes will be created for each file.
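The manual edit of the element column mentioned for the -d option can also be scripted. The sketch below is not part of Polyphony and its function names are hypothetical. It assumes the standard fixed-width pdb layout, in which ATOM and HETATM records carry the element symbol in columns 77-78, and upper-cases only those two characters.

```python
def uppercase_element_column(line):
    """Return `line` with the element symbol (columns 77-78) upper-cased.

    Only ATOM and HETATM records long enough to contain the element
    column are changed; every other line is returned untouched.
    """
    if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 78:
        # Columns 77-78 are string indices 76 and 77.
        return line[:76] + line[76:78].upper() + line[78:]
    return line


def fix_pdb_lines(lines):
    """Apply the fix to every line of a pdb file given as an iterable."""
    return [uppercase_element_column(line) for line in lines]
```

A typical use would be to read a file with readlines(), pass the result through fix_pdb_lines, and write the output to a new file, so that the original stays available for comparison.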
Selection from multimodel pdb:
-m MULTIMODEL_PDB_FILENAME, --multimodel_pdb=MULTIMODEL_PDB_FILENAME
-r MRANGE, --range=MRANGE
 Range (3 values: start, end and step size). Use spaces between values and start at 0.
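To make the range format concrete, the sketch below parses a string of three space-separated integers and lists the model indices it would select. It is an illustration, not the script's own code, and the function names are hypothetical. It also assumes that the end value is exclusive, as in Python's built-in range; the help text does not say whether the end model is included.

```python
def parse_model_range(text):
    """Parse 'start end step' into a tuple of three integers."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError("expected three values: start, end and step size")
    start, end, step = (int(part) for part in parts)
    if start < 0 or step <= 0:
        raise ValueError("start must be >= 0 and step must be >= 1")
    return start, end, step


def select_model_indices(text, n_models):
    """List the zero-based model indices selected by a range string.

    ASSUMPTION: `end` is exclusive, as in Python's range(); the selection
    is also clipped to the number of models actually present.
    """
    start, end, step = parse_model_range(text)
    return list(range(start, min(end, n_models), step))
```

For example, under these assumptions the range string "0 10 2" applied to a file with 20 models selects models 0, 2, 4, 6 and 8.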