Protein - ProDy

Introduction

Create a new project with uv and add ProDy as a dependency:

Terminal window
uv init prody --name prody-project
cd prody
uv add prody

File Parsers

Let’s start by parsing a protein structure; we will keep working with it throughout this part.

File parser function names are prefixed with parse, for example parsePDB() and parseMMCIF().

When using parsePDB(), a PDB identifier is usually sufficient. If the corresponding file is found in the current working directory, it is used; otherwise, it is downloaded from the PDB servers.

Let’s parse structure 5uoj of p38 MAP kinase (MAPK):

from prody import *
p38 = parsePDB('5uoj') # returns an AtomGroup object
print(p38) # printing an AtomGroup shows its title

The log shows that this structure contains 3138 atoms:

@> 3138 atoms and 1 coordinate set(s) were parsed in 0.01s.
AtomGroup 5uoj

Just as tab completion lists the parser functions when you type parse and press Tab, it also lists the methods of the p38 object when you type p38. and press Tab (for example in IPython).

Let’s use some of them to get information on the structure:

print(f"Num Atoms {p38.numAtoms()}")
print(f"Num Coordsets {p38.numCoordsets()}") # returns number of models
print(f"Num Residues {p38.numResidues()}") # water molecules also count as residues

PDB Files

This example demonstrates how to use the flexible PDB fetcher, fetchPDB(). Valid inputs are a single PDB identifier, for example 2k39, or a list of PDB identifiers, for example ["2k39", "1mkp", "1etc"]. The files are saved compressed (.pdb.gz) in the current working directory, or in another folder if you pass the folder argument.

Fetch PDB files

A single file

The fetchPDB function will return a filename if the download is successful:

pdb.py
import os
from prody.proteins import localpdb
file = localpdb.fetchPDB("5uoj", folder="data")
assert file == os.path.join("data", "5uoj.pdb.gz")  # portable: data\5uoj.pdb.gz on Windows

Even though the file is compressed, you can open it directly in PyMOL (see Protein - PyMOL).

Multiple files

This function also accepts a list of PDB identifiers:

pdb.py
files = localpdb.fetchPDB(["5uoj", "1r39", "@!~#"], folder="data")
assert files == [os.path.join("data", "5uoj.pdb.gz"), os.path.join("data", "1r39.pdb.gz"), None]

When a download fails, None is returned in place of the filename: for a single identifier the result is None, and for a list the corresponding element is None.

ProDy also prints a report of the download results to the screen. In this case it reads:

Terminal window
@> WARNING '@!~#' is not a valid identifier.
@> Connecting wwPDB FTP server RCSB PDB (USA).
@> Downloading PDB files via FTP failed, trying HTTP.
@> 5uoj downloaded (data\5uoj.pdb.gz)
@> 1r39 downloaded (data\1r39.pdb.gz)
@> PDB download via HTTP completed (2 downloaded, 0 failed).
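If you fetch many files, it helps to separate successes from failures. A minimal sketch with a hypothetical helper, split_fetch_results(), that pairs each identifier with its result; it assumes the results come back in the same order as the identifiers, as in the example above, and reuses those reported values so it runs offline:

```python
import os
from typing import Optional, Union

def split_fetch_results(ids: Union[str, list[str]],
                        files: Union[Optional[str], list[Optional[str]]]):
    """Hypothetical helper: split fetchPDB() results into successes and failures.

    fetchPDB() may return a single filename or None instead of a list
    (for example for a single identifier), so both arguments are wrapped first.
    """
    if isinstance(ids, str):
        ids = [ids]
    if not isinstance(files, list):
        files = [files]
    downloaded = {pid: path for pid, path in zip(ids, files) if path is not None}
    failed = [pid for pid, path in zip(ids, files) if path is None]
    return downloaded, failed

# Offline example with the results reported above; in practice you would call
# files = localpdb.fetchPDB(ids, folder="data")
ids = ["5uoj", "1r39", "@!~#"]
files = [os.path.join("data", "5uoj.pdb.gz"),
         os.path.join("data", "1r39.pdb.gz"),
         None]
downloaded, failed = split_fetch_results(ids, files)
print(sorted(downloaded))  # ['1r39', '5uoj']
print(failed)              # ['@!~#']
```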

Parse PDB files

ProDy offers a fast and flexible PDB parser, parsePDB(). To improve performance, the parser can read only well-defined subsets of atoms, specific chains, or specific models (in NMR structures). This part shows how to use it.

Three types of user input are accepted:

  • PDB file path, for example "../1MKP.pdb"
  • Compressed PDB file path (gzipped), for example "5uoj.pdb.gz"
  • PDB identifier, for example 2k39

The output is an AtomGroup instance that stores atomic data and can be used as input for functions and classes for dynamics analysis.

Parse a file

You can parse a PDB file by passing its filename; gzipped files are handled transparently.

Here we first download the file and then parse it:

Acetylcholinesterase

pdb.py
from prody import AtomGroup
from prody.proteins import localpdb, pdbfile
# ...
## Parse PDB files
file = localpdb.fetchPDB("1OCE") # Acetylcholinesterase
atoms: AtomGroup = pdbfile.parsePDB(file)
assert atoms.getTitle() == "1oce"

Analysis functions

http://www.bahargroup.org/prody/tutorials/prody_tutorial/basics.html#analysis-functions

Tutorials

Make this tutorial:

TODO

Structural Analysis