PDBNucleicAcids

PDBNucleicAcids is a Biopython-based package that parses all nucleic acids in a PDB structure, with a special focus on base-pair representation.

Free software: MIT license
Documentation: https://pdbnucleicacids.readthedocs.io.

Get Started

The official release is available on the Python Package Index (PyPI):

$ pip install pdbnucleicacids

You can parse both single-stranded and double-stranded nucleic acids. The example below extracts the double-stranded ones from an mmCIF file:

from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import DSNABuilder

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)

# extract all double-stranded nucleic acids
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)

# take the first double-stranded nucleic acid as an example
dsna = dsna_list[0]

# extract base-pair data from the double-stranded nucleic acid
df = dsna.get_dataframe()
df.head()

  i_chain_id  i_residue_index  ... j_residue_index j_chain_id
        A             4003  ...            5020          B
        A             4004  ...            5019          B
        A             4005  ...            5018          B
        A             4006  ...            5017          B
        A             4007  ...            5016          B
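
As a small follow-up on the output above, the sketch below works on plain Python records that mirror the four visible columns (the columns hidden behind "..." are left out). It assumes get_dataframe() returns a pandas DataFrame, in which case equivalent records could be obtained with df.to_dict("records"). The helper names count_pairs_by_chains and is_antiparallel are hypothetical and are not part of PDBNucleicAcids.

```python
from collections import Counter

# Records mirroring the visible columns of the df.head() output above.
records = [
    {"i_chain_id": "A", "i_residue_index": 4003, "j_residue_index": 5020, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4004, "j_residue_index": 5019, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4005, "j_residue_index": 5018, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4006, "j_residue_index": 5017, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4007, "j_residue_index": 5016, "j_chain_id": "B"},
]


def count_pairs_by_chains(rows):
    """Count base pairs for each (i_chain_id, j_chain_id) combination."""
    return Counter((row["i_chain_id"], row["j_chain_id"]) for row in rows)


def is_antiparallel(rows):
    """Return True if j_residue_index strictly decreases as i_residue_index increases."""
    ordered = sorted(rows, key=lambda row: row["i_residue_index"])
    j_indices = [row["j_residue_index"] for row in ordered]
    return all(a > b for a, b in zip(j_indices, j_indices[1:]))


print(count_pairs_by_chains(records))
print(is_antiparallel(records))
```

In the five rows shown, every base pair joins chain A to chain B, and the j residue index decreases as the i residue index increases, as expected when the two strands run in opposite directions.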

Check the official documentation for more information.

TODO

In search_paired_base, consider adding a scoring function instead of relying on simple distance.
In search_paired_base, add a warning when there is more than one candidate, or more than one candidate with a similar distance or score.
In BasePair, expose additional information: shear, stretch, buckle, propeller, opening.
Explore is_nucleic(non_standard) and check whether it needs updating.
Write proper tests (WIP).
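
The first two TODO items could look roughly like the sketch below. It is not the package's search_paired_base: the function pick_paired_base, the tolerance parameter, the candidate labels, and the distances are all hypothetical and chosen only for illustration. The scoring function is passed in as a callable, so plain distance remains one possible score.

```python
import warnings


def pick_paired_base(candidates, score, tolerance=0.2):
    """Return the lowest-scoring candidate and warn when the choice is ambiguous.

    candidates: list of arbitrary objects (hypothetical candidate bases)
    score: callable mapping a candidate to a float, lower is better
    tolerance: hypothetical margin under which two scores count as similar
    """
    if not candidates:
        return None
    ranked = sorted(candidates, key=score)
    best = ranked[0]
    # Candidates whose score is within `tolerance` of the best one.
    close = [c for c in ranked[1:] if score(c) - score(best) <= tolerance]
    if close:
        warnings.warn(
            f"{len(close) + 1} candidates have similar scores; keeping {best!r}",
            stacklevel=2,
        )
    return best


# Toy data: (label, distance) tuples with made-up values.
candidates = [("B:5020", 2.9), ("B:5019", 3.0), ("B:5021", 6.4)]
best = pick_paired_base(candidates, score=lambda c: c[1])
print(best)
```

With this toy data the first two candidates fall within the tolerance, so a warning is emitted and the closest candidate is still returned. A stricter tolerance would silence the warning without changing the result.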

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.