PDBNucleicAcids¶
PDBNucleicAcids is a Biopython-based package that parses all nucleic acids in a PDB structure, with a special focus on base-pair representation.
Free software: MIT license
Documentation: https://pdbnucleicacids.readthedocs.io.
Get Started¶
The official release is available on the Python Package Index (PyPI):
$ pip install pdbnucleicacids
You can parse both single-stranded and double-stranded nucleic acids.
from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import DSNABuilder
# retrieve the structure files from the PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_pdb_file(pdb_code="10MH", pdir=".")
pdbl.retrieve_assembly_file(pdb_code="10MH", assembly_num=1, pdir=".")
# ... or else use your own
# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
structure_id="10MH", filename="10mh-assembly1.cif"
)
# build the double-stranded nucleic acids, then extract a DataFrame of base-pair data
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)
# take the first double strand nucleic acid as an example
dsna = dsna_list[0]
dsna.get_dataframe()
   i_chain_id  i_residue_index  i_residue_name  j_residue_name  j_residue_index  j_chain_id
0           B              402              DC              DG              433           C
1           B              403              DC              DG              432           C
2           B              404              DA              DT              431           C
3           B              405              DT              DA              430           C
4           B              406              DG              DC              429           C
In this structure, the base pairs have a gap at i_residue_index 407 and 408.
This results in two distinct paired segments of dsDNA.
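The gap described above can be located by grouping the paired residue indices into runs of consecutive integers. The following is a minimal sketch in plain Python, not part of the PDBNucleicAcids API: `split_consecutive_runs` is a hypothetical helper, and it assumes the indices are unique integers, for example taken from the i_residue_index column of the DataFrame with `.tolist()`. Indices 402 to 406 come from the output above; 409 and 410 are placeholder values added only to show a second segment.

```python
def split_consecutive_runs(indices):
    """Group residue indices into runs of consecutive integers."""
    runs = []
    for idx in sorted(indices):
        if runs and idx == runs[-1][-1] + 1:
            # idx directly follows the last index of the current run
            runs[-1].append(idx)
        else:
            # first index, or a jump larger than 1: start a new run
            runs.append([idx])
    return runs


# 402-406 are from the output above; 409 and 410 are hypothetical placeholders
paired_indices = [402, 403, 404, 405, 406, 409, 410]
segments = split_consecutive_runs(paired_indices)

for segment in segments:
    print(f"{segment[0]}-{segment[-1]}: {len(segment)} base pairs")
```

Each run corresponds to one contiguous paired segment, and the indices missing between two runs are the unpaired or unsupported positions.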
In reality, only 408 is a mispair: 407 is a non-standard 5CM-guanine pair, which PDBNucleicAcids ignores because it currently supports only standard Watson-Crick base pairs.
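To illustrate why a 5CM-guanine pair falls outside a standard-only rule set, here is a small sketch of a Watson-Crick lookup keyed by PDB residue names. It is an assumption made for illustration and does not reproduce the package's own BasePairsRules; `WATSON_CRICK_DNA` and `is_standard_pair` are hypothetical names.

```python
# Hypothetical rule table: each standard DNA pair as an unordered set of
# PDB residue names. Not the package's actual BasePairsRules.
WATSON_CRICK_DNA = {
    frozenset(("DA", "DT")),
    frozenset(("DC", "DG")),
}


def is_standard_pair(i_residue_name, j_residue_name):
    """Return True if the two residue names form a standard DNA pair."""
    return frozenset((i_residue_name, j_residue_name)) in WATSON_CRICK_DNA


# residue-name pairs from the rows shown in the output above
rows = [("DC", "DG"), ("DC", "DG"), ("DA", "DT"), ("DT", "DA"), ("DG", "DC")]
print(all(is_standard_pair(i, j) for i, j in rows))

# 5CM (5-methylcytosine) is not in the table, so the pair is rejected
print(is_standard_pair("5CM", "DG"))
```

Using unordered sets means the check does not depend on which strand is listed first.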
Check the official documentation for more information.
TODO¶
Regarding BasePairsRules:
Distinguish between DNA and RNA bases (e.g. deoxyribose adenine can pair with either deoxyribose thymine or ribose thymine)
Code other rules
Proper tests (WIP)
Credits¶
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.