PDBNucleicAcids

PDBNucleicAcids is a Biopython-based package that can parse all nucleic acids in a PDB structure, with a special focus on base-pair representation.

Free software: MIT license
Documentation: https://pdbnucleicacids.readthedocs.io.

Get Started

The official release is available from the Python Package Index (PyPI):

$ pip install pdbnucleicacids

You can parse single-stranded and double-stranded nucleic acids:

from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import DSNABuilder

# retrieve the file from the PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_pdb_file(pdb_code="10MH", pdir=".")
pdbl.retrieve_assembly_file(pdb_code="10MH", assembly_num=1, pdir=".")
# ... or else use your own

# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="10MH", filename="10mh-assembly1.cif"
)

# build the double-stranded nucleic acids found in the structure
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)

# take the first double-stranded nucleic acid as an example
dsna = dsna_list[0]
# extract a DataFrame with the base-pair data
dsna.get_dataframe()

    i_chain_id  i_residue_index i_residue_name j_residue_name  j_residue_index j_chain_id
        B              402             DC             DG              433          C
        B              403             DC             DG              432          C
        B              404             DA             DT              431          C
        B              405             DT             DA              430          C
        B              406             DG             DC              429          C

In this case the base pairs have a gap at i_residue_index 407 and 408, which splits the dsDNA into two distinct paired segments.

In reality, only 408 is a mispair; 407 is a non-standard 5CM-guanine pair. PDBNucleicAcids ignores it because it currently supports only standard Watson-Crick base pairs.
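To make the idea of paired segments concrete, here is a minimal sketch of how a gap in the residue numbering splits a list of paired positions into runs. It is not the package's own implementation: the helper `split_paired_segments` is a hypothetical name, and the input is a plain list of integers such as one might take from the i_residue_index column (assuming the table returned by `get_dataframe()` is a pandas DataFrame, `df["i_residue_index"].tolist()` would give such a list). Indexes 402 to 406 come from the table above; 409 and 410 are made-up values added only so that a second segment exists.

```python
def split_paired_segments(residue_indexes):
    """Group residue indexes into runs of consecutive numbers."""
    segments = []
    for index in sorted(residue_indexes):
        if segments and index == segments[-1][-1] + 1:
            # the index continues the current run
            segments[-1].append(index)
        else:
            # a jump in numbering (a gap) starts a new run
            segments.append([index])
    return segments


# 402-406 are taken from the table above; 409 and 410 are hypothetical
i_residue_indexes = [402, 403, 404, 405, 406, 409, 410]

segments = split_paired_segments(i_residue_indexes)
for segment in segments:
    print(f"paired segment {segment[0]}-{segment[-1]}: {len(segment)} base pairs")
```

With these illustrative values, the missing positions 407 and 408 separate the list into one run of five base pairs and one run of two.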

Check the official documentation for more information.

TODO

Regarding BasePairsRules:
- Distinguish between DNA and RNA bases (e.g. Deoxyribose Adenine can pair with either Deoxyribose Thymine or Ribose Thymine)
- Code other rules

Proper tests (WIP)

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
