PDBNucleicAcids

https://img.shields.io/pypi/v/pdbnucleicacids.svg Documentation Status Coverage Status

PDBNucleicAcids is a Biopython based package that can parse all nucleic acids in a PDB structure, with a special focus on base-pair representation.

Get Started

The official release is found in the Python Package Index (PyPI)

$ pip install pdbnucleicacids

You can parse single stranded and double stranded nucleic acids.

from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import DSNABuilder

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
     structure_id="1A02", filename="1a02-assembly1.cif"
)

# extract all double strand nucleic acids
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)

# take the first double strand nucleic acid as an example
dsna = dsna_list[0]

# extract base-pairs data from double stranded nucleic acid
df = dsna.get_dataframe()
df.head()
  i_chain_id  i_residue_index  ... j_residue_index j_chain_id
0          A             4003  ...            5020          B
1          A             4004  ...            5019          B
2          A             4005  ...            5018          B
3          A             4006  ...            5017          B
4          A             4007  ...            5016          B

Check the official documentation for more information.

TODO

  • in search_paired_base maybe add a scoring function instead of simple distance

  • in search_paired_base add a warning if there is more than one candidate or maybe more than one candidate with similar dist or score

  • in BasePair get other information: shear, stretch, buckle, propeller, opening

  • explore the is_nucleic(non_standard) and maybe check if it needs updating

  • Proper tests (WIP)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.