Introduction
uv init prody --name prody-project
cd prody
uv add prody
File Parsers
Let’s start with parsing a protein structure and then keep working on that in this part.
File parser function names are prefixed with parse.
When using parsePDB(), an identifier is usually sufficient. If the corresponding file is found in the current working directory, it is used; otherwise it is downloaded from the PDB servers.
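The local-first lookup described above can be pictured with a small standard-library sketch. The helper name `find_local_structure` and the two filename patterns it tries are assumptions made for illustration; ProDy's own search may recognise other naming schemes.

```python
from pathlib import Path
from typing import Optional


def find_local_structure(pdb_id: str, folder: str = ".") -> Optional[Path]:
    """Return a local file for pdb_id if one exists, else None.

    Hypothetical helper: only '<id>.pdb' and '<id>.pdb.gz' are checked.
    """
    for suffix in (".pdb", ".pdb.gz"):
        candidate = Path(folder) / f"{pdb_id.lower()}{suffix}"
        if candidate.is_file():
            return candidate
    return None  # a caller would fall back to downloading here
```

A caller would try `find_local_structure("5uoj")` first and only contact the PDB servers when it returns None.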
Let’s parse structure 5uoj of p38 MAP kinase (MAPK):
from prody import *
p38 = parsePDB('5uoj')  # returns an AtomGroup object
print(p38)  # typing in a variable name will give some information
We see that this structure contains 3138 atoms.
@> 3138 atoms and 1 coordinate set(s) were parsed in 0.01s.
AtomGroup 5uoj
Now, similar to listing parser function names, we can use tab completion to inspect the p38 object:
Let’s use some of them to get information on the structure:
print(f"Num Atoms {p38.numAtoms()}")
print(f"Num Coordsets {p38.numCoordsets()}")  # returns number of models
print(f"Num Residues {p38.numResidues()}")  # water molecules also count as residues
PDB Files
This example demonstrates how to use the flexible PDB fetcher, fetchPDB(). Valid inputs are the PDB identifier, for example 2k39, or a list of PDB identifiers, for example ["2k39", "1mkp", "1etc"]. Compressed PDB files (pdb.gz) will be saved in the current working directory or in a specific folder.
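Before handing a list of identifiers to the fetcher, it can be handy to screen out obviously malformed ones. The sketch below checks the classic four-character PDB format (a digit followed by three letters or digits). The helper `looks_like_pdb_id` is hypothetical, not a ProDy function, and ProDy's own validation may differ.

```python
import re

# Classic PDB identifiers: one digit followed by three letters or digits.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Hypothetical pre-check for the classic four-character format."""
    return _PDB_ID.fullmatch(text) is not None


candidates = ["2k39", "1mkp", "1etc", "@!~#"]
valid = [c for c in candidates if looks_like_pdb_id(c)]
print(valid)  # ['2k39', '1mkp', '1etc']
```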
Fetch PDB files
A single file
The fetchPDB function will return a filename if the download is successful:
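import os
from prody.proteins import localpdb

file = localpdb.fetchPDB("5uoj", folder="data")
# os.path.join yields 'data\\5uoj.pdb.gz' on Windows and 'data/5uoj.pdb.gz' elsewhere
assert file == os.path.join("data", "5uoj.pdb.gz")
Even if the file is compressed, you can view it directly in PyMOL (see Protein - PyMOL).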
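Because the download stays gzip-compressed, it can help to see that the file is ordinary PDB text underneath. The sketch below uses only Python's standard library. The helper `count_atom_records` and the small sample file are made up for illustration and are not part of ProDy.

```python
import gzip
import os
import tempfile


def count_atom_records(path: str) -> int:
    """Count ATOM/HETATM lines in a gzipped PDB file (hypothetical helper)."""
    with gzip.open(path, "rt") as handle:  # "rt" decompresses and decodes text
        return sum(line.startswith(("ATOM", "HETATM")) for line in handle)


# Tiny stand-in file so the sketch runs without a download.
sample = (
    "HEADER    DEMO\n"
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N\n"
    "HETATM    2  O   HOH A 101       5.000   5.000   5.000  1.00 30.00           O\n"
    "END\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pdb.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write(sample)
    n_records = count_atom_records(demo_path)

print(n_records)  # 2
```

Pointing `count_atom_records` at a real download such as the 5uoj file above should work the same way, though the count is not shown here.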
Multiple files
This function also accepts a list of PDB identifiers:
files = localpdb.fetchPDB(["5uoj", "1r39", "@!~#"], folder="data")
assert files == [os.path.join("data", "5uoj.pdb.gz"), os.path.join("data", "1r39.pdb.gz"), None]
For failed downloads, None (or a list containing a None element) will be returned.
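Since failures appear as None in the returned list, a caller usually wants to separate successes from failures before parsing. A minimal sketch follows, assuming the returned list keeps the order of the input identifiers, as the example above suggests. `split_results` is a hypothetical helper, and the literal lists stand in for a real call.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_results(
    ids: Sequence[str], files: Sequence[Optional[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Pair identifiers with returned filenames; collect failures separately."""
    downloaded = {i: f for i, f in zip(ids, files) if f is not None}
    failed = [i for i, f in zip(ids, files) if f is None]
    return downloaded, failed


# Literal stand-in for the list returned in the example above.
ids = ["5uoj", "1r39", "@!~#"]
files = ["data/5uoj.pdb.gz", "data/1r39.pdb.gz", None]
downloaded, failed = split_results(ids, files)
print(failed)  # ['@!~#']
```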
ProDy will provide a report of the download results and return a list of filenames. The report will be shown on the screen, which in this case would be:
@> WARNING '@!~#' is not a valid identifier.
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> Downloading PDB files via FTP failed, trying HTTP.
@> 5uoj downloaded (data\5uoj.pdb.gz)
@> 1r39 downloaded (data\1r39.pdb.gz)
@> PDB download via HTTP completed (2 downloaded, 0 failed).
Parse PDB files
ProDy offers a fast and flexible PDB parser, parsePDB(). The parser can be used to read well-defined subsets of atoms, specific chains, or models (in NMR structures) to improve performance. This example shows how to use the flexible parsing options.
Three types of user input are accepted:
- PDB file path, for example "../1MKP.pdb"
- Compressed PDB file path (gzipped), for example "5uoj.pdb.gz"
- PDB identifier, for example 2k39
The output is an AtomGroup instance that stores atomic data and can be used as input for functions and classes for dynamics analysis.
Parse a file
You can parse PDB files by passing a filename (gzipped files are handled).
We do so after downloading a PDB file:
from prody import AtomGroup
from prody.proteins import localpdb, pdbfile
# ...
## Parse PDB files
file = localpdb.fetchPDB("1OCE")  # Acetylcholinesterase
atoms: AtomGroup = pdbfile.parsePDB(file)
assert atoms.getTitle() == "1oce"
Analysis functions
http://www.bahargroup.org/prody/tutorials/prody_tutorial/basics.html#analysis-functions
Tutorials
Make this tutorial:
- http://www.bahargroup.org/prody/tutorials/prody_tutorial/
- http://www.bahargroup.org/prody/tutorials/insty_tutorial/