Usage
To use PDBNucleicAcids in a project:
import PDBNucleicAcids
You can parse both single-stranded and double-stranded nucleic acids.
from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import DSNABuilder
# retrieve file from PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_pdb_file(pdb_code="10MH", pdir=".")
pdbl.retrieve_assembly_file(pdb_code="10MH", assembly_num=1, pdir=".")
# ... or else use your own
# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="10MH", filename="10mh-assembly1.cif"
)
# extract DataFrame with basepairs data
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)
# take the first double strand nucleic acid as an example
dsna = dsna_list[0]
dsna.get_dataframe()
   i_chain_id  i_residue_index  i_residue_name  j_residue_name  j_residue_index  j_chain_id
0           B              402              DC              DG              433           C
1           B              403              DC              DG              432           C
2           B              404              DA              DT              431           C
3           B              405              DT              DA              430           C
4           B              406              DG              DC              429           C
In this case there is a gap in the basepairs at i_residue_index 407 and 408, which splits the dsDNA into two distinct paired segments.
In reality, only 408 is a mispair. Residue 407 forms a non-standard 5CM-guanine pair, which PDBNucleicAcids ignores because it currently supports only standard Watson-Crick basepairs.
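To make the idea of paired segments concrete, the following minimal sketch groups residue indices into runs of consecutive numbers, so that a gap such as 407 and 408 shows up as a break between two runs. It is plain Python and does not use PDBNucleicAcids itself. The helper name `split_paired_segments` is hypothetical, and the indices after the gap (409 to 411) are illustrative values, not taken from the 10MH output.

```python
def split_paired_segments(residue_indices):
    """Group residue indices into runs of consecutive numbers.

    Hypothetical helper for illustration; not part of PDBNucleicAcids.
    """
    segments = []
    for index in sorted(set(residue_indices)):
        if segments and index == segments[-1][-1] + 1:
            # index directly follows the previous one: same segment
            segments[-1].append(index)
        else:
            # first index, or a gap was found: start a new segment
            segments.append([index])
    return segments


# 402-406 appear in the table above; 409-411 are illustrative values
paired = [402, 403, 404, 405, 406, 409, 410, 411]
segments = split_paired_segments(paired)

for segment in segments:
    print(f"paired segment: {segment[0]}-{segment[-1]}")
```

Assuming the `i_residue_index` column holds integers, the same helper could presumably be applied to `dsna.get_dataframe()["i_residue_index"]`.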