Usage
To use PDBNucleicAcids in a project:
import PDBNucleicAcids
Build All Strands of Nucleic Acids
PDBNucleicAcids can parse all strands of nucleic acids in a Biopython structure.
from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import NABuilder
# retrieve file from PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_pdb_file(pdb_code="1A02", pdir=".")
pdbl.retrieve_assembly_file(pdb_code="1A02", assembly_num=1, pdir=".")
# ... or else use your own
# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)
# build all nucleic acids
builder = NABuilder()
na_list = builder.build_nucleic_acids(structure)
na_list
[<NucleicAcid chain='A' type='DNA' start=4001 end=4020>,
<NucleicAcid chain='B' type='DNA' start=5001 end=5020>]
Each nucleic acid behaves like a Python list of residues, so it can be indexed and sliced:
na = na_list[0]
na[:5]
[<Residue DT het= resseq=4001 icode= >,
<Residue DT het= resseq=4002 icode= >,
<Residue DG het= resseq=4003 icode= >,
<Residue DG het= resseq=4004 icode= >,
<Residue DA het= resseq=4005 icode= >]
The sequence of a nucleic acid is returned as a Biopython Seq:
na.get_seq()
Seq('TTGGAAAATTTGTTTCATAG')
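Once the sequence is available as a string, it can be processed with plain Python. The following is a minimal sketch that computes the reverse complement, i.e. the sequence a fully complementary partner strand would have. The sequence is copied from the output above as a string literal so the snippet runs without a structure file; in real use it would presumably come from `str(na.get_seq())`. The helper `reverse_complement` is illustrative and not part of PDBNucleicAcids.

```python
# Translation table for the four standard DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequence of chain A, copied from the get_seq() output above.
strand = "TTGGAAAATTTGTTTCATAG"
print(reverse_complement(strand))  # CTATGAAACAAATTTTCCAA
```

Biopython's `Seq` objects also provide a `reverse_complement()` method, which is the more direct choice when working with the `Seq` itself.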
The chain id, the nucleic acid type, and the list of atoms are also available:
print(na.get_chain_id(), na.get_na_type())
print(na.get_atoms()[:5])
A DNA
[<Atom O5'>, <Atom C5'>, <Atom C4'>, <Atom O4'>, <Atom C3'>]
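For structures with many strands, the two accessors shown above are enough to organize the results. The sketch below groups chain ids by nucleic acid type. It relies only on `get_chain_id()` and `get_na_type()`; the helper `group_chains_by_type` and the stand-in objects are hypothetical and exist only so the example runs without a structure file.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_chains_by_type(nucleic_acids):
    """Map each nucleic acid type (e.g. 'DNA') to a list of chain ids."""
    groups = defaultdict(list)
    for na in nucleic_acids:
        groups[na.get_na_type()].append(na.get_chain_id())
    return dict(groups)


def fake_na(chain_id, na_type):
    """Stand-in exposing the two accessors used above (assumption:
    real NucleicAcid objects would be passed instead)."""
    return SimpleNamespace(
        get_chain_id=lambda: chain_id,
        get_na_type=lambda: na_type,
    )


# Mimics the two DNA strands of 1A02 listed earlier.
demo = [fake_na("A", "DNA"), fake_na("B", "DNA")]
print(group_chains_by_type(demo))  # {'DNA': ['A', 'B']}
```

With a parsed structure, `na_list` from `build_nucleic_acids` would take the place of `demo`.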
Build All Double-Stranded Nucleic Acids
PDBNucleicAcids can parse all double-stranded nucleic acids in a Biopython structure.
from PDBNucleicAcids.NucleicAcid import DSNABuilder
builder = DSNABuilder()
dsna_list = builder.build_double_strands(structure)
dsna_list
[<DoubleStrandNucleicAcid type='dsDNA' i-th strand='A'
j-th strand='B' length=17>]
Get All Base-Pairs
PDBNucleicAcids can extract all base-pair objects from a double-stranded nucleic acid. A double-stranded nucleic acid behaves like a list of base-pairs:
dsna = dsna_list[0]
dsna[:5]
[<BasePair i_res=DG j_res=DC>,
<BasePair i_res=DG j_res=DC>,
<BasePair i_res=DA j_res=DT>,
<BasePair i_res=DA j_res=DT>,
<BasePair i_res=DA j_res=DT>]
The base-pair data of a double-stranded nucleic acid can also be exported as a data frame:
dsna = dsna_list[0]
df = dsna.get_dataframe()
df.head()
i_chain_id i_residue_index ... j_residue_index j_chain_id
0 A 4003 ... 5020 B
1 A 4004 ... 5019 B
2 A 4005 ... 5018 B
3 A 4006 ... 5017 B
4 A 4007 ... 5016 B
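The table lends itself to quick partner lookups. The sketch below builds a two-way map from the rows printed above, using only the four visible columns. The rows are typed in by hand so the snippet runs on its own, and `build_partner_map` is a hypothetical helper. The printed table looks like a pandas DataFrame; if it is one, `df.to_dict("records")` should yield rows of this shape.

```python
# Rows copied from the df.head() output above (visible columns only).
rows = [
    {"i_chain_id": "A", "i_residue_index": 4003,
     "j_residue_index": 5020, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4004,
     "j_residue_index": 5019, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4005,
     "j_residue_index": 5018, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4006,
     "j_residue_index": 5017, "j_chain_id": "B"},
    {"i_chain_id": "A", "i_residue_index": 4007,
     "j_residue_index": 5016, "j_chain_id": "B"},
]


def build_partner_map(records):
    """Map (chain id, residue index) to its partner, in both directions."""
    partners = {}
    for r in records:
        i_key = (r["i_chain_id"], r["i_residue_index"])
        j_key = (r["j_chain_id"], r["j_residue_index"])
        partners[i_key] = j_key
        partners[j_key] = i_key
    return partners


partners = build_partner_map(rows)
print(partners[("A", 4003)])  # ('B', 5020)
print(partners[("B", 5016)])  # ('A', 4007)
```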
Search for Individual Paired Bases
PDBNucleicAcids can search for the nucleotide paired with a given input nucleotide.
from PDBNucleicAcids.NucleicAcid import search_paired_base
# input nucleotide
base = structure[0]["A"][4003] # DG
# search for paired nucleotide
paired_base = search_paired_base(base)
paired_base
<Residue DC het= resseq=5020 icode= >
PDBNucleicAcids recognizes unpaired bases: when no partner is found, the search returns None.
# input nucleotide
base = structure[0]["A"][4001] # DT
# search for paired nucleotide
paired_base = search_paired_base(base)
print(paired_base)
None
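Because an unpaired base yields `None`, paired and unpaired residues can be separated with a simple check. The following is a minimal sketch: `partition_by_pairing` is a hypothetical helper, and the search function is passed in as an argument so the example can run with a stand-in lookup instead of a parsed structure.

```python
def partition_by_pairing(residues, search):
    """Split residues into (residue, partner) tuples and unpaired residues.

    `search` is any callable that returns the partner or None.
    """
    paired, unpaired = [], []
    for res in residues:
        partner = search(res)
        if partner is None:
            unpaired.append(res)
        else:
            paired.append((res, partner))
    return paired, unpaired


# Stand-in lookup built from the two results shown above: residue 4003
# pairs with 5020, while 4001 has no partner.
known_partners = {4003: 5020}
paired, unpaired = partition_by_pairing([4001, 4003], known_partners.get)
print(paired)    # [(4003, 5020)]
print(unpaired)  # [4001]
```

With a real structure, the residues of `structure[0]["A"]` and `search_paired_base` would presumably take the place of the stand-ins. How the search treats non-nucleotide residues such as water is not covered here.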
DNA-RNA Complexes
Base-pair search also works across DNA-RNA hybrids, where a DNA base pairs with an RNA base.
from Bio.PDB.PDBList import PDBList
from Bio.PDB.MMCIFParser import MMCIFParser
from PDBNucleicAcids.NucleicAcid import search_paired_base
# retrieve file from PDB using Biopython
pdbl = PDBList()
pdbl.retrieve_assembly_file(pdb_code="9K7R", assembly_num=1, pdir=".")
# parse and build structure with Biopython
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="9K7R", filename="9k7r-assembly1.cif"
)
# input nucleotide
base = structure[0]["B"][8] # DT
# search for paired nucleotide
paired_base = search_paired_base(base)
# paired base is RNA base
paired_base
<Residue A het= resseq=2 icode= >
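In the output above, the DNA residue is named `DT` and its RNA partner `A`. This follows the PDB naming convention, in which standard DNA residues carry a `D` prefix. The sketch below uses that convention to flag hybrid pairs from residue names alone. The helpers `na_kind` and `is_hybrid_pair` are hypothetical, cover only the eight standard residue names, and classify modified bases as "other".

```python
# Standard residue names used in PDB files.
DNA_NAMES = {"DA", "DC", "DG", "DT"}
RNA_NAMES = {"A", "C", "G", "U"}


def na_kind(resname: str) -> str:
    """Classify a residue name as 'DNA', 'RNA' or 'other'."""
    name = resname.strip()
    if name in DNA_NAMES:
        return "DNA"
    if name in RNA_NAMES:
        return "RNA"
    return "other"


def is_hybrid_pair(resname_i: str, resname_j: str) -> bool:
    """Return True if one residue is DNA and the other is RNA."""
    return {na_kind(resname_i), na_kind(resname_j)} == {"DNA", "RNA"}


print(is_hybrid_pair("DT", "A"))   # True
print(is_hybrid_pair("DG", "DC"))  # False
```

For Biopython residues, the names can be obtained with `get_resname()`, for example `is_hybrid_pair(base.get_resname(), paired_base.get_resname())`.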
Custom Rules for Base-Pairing
Base-pairing can be customized by changing the parameters used in the base-pairing rules.
from PDBNucleicAcids.BasePairRules import dsDNAWatsonCrickBasePairRules
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)
# custom base pairing rules
my_rules = dsDNAWatsonCrickBasePairRules(
    max_distance=3.5,
    max_angle=60,
    max_stagger=2.0,
)
# input nucleotide
base = structure[0]["A"][4003] # DG
# search for paired nucleotide
paired_base = search_paired_base(base, pairing_rules=my_rules)
PDBNucleicAcids base-pairing can be expanded even further by creating your own base-pairing rules.
from PDBNucleicAcids.BasePairRules import WatsonCrickBasePairRules
parser = MMCIFParser()
structure = parser.get_structure(
    structure_id="1A02", filename="1a02-assembly1.cif"
)
# input nucleotide
base = structure[0]["A"][1] # G
# search for paired nucleotide with default rules
pairing_rules = WatsonCrickBasePairRules()
paired_base = search_paired_base(base, pairing_rules=pairing_rules)
# this returns None because the partner is a non-standard DNA base: 5CM
# to work around this, we can write our own rules
class MyRules(WatsonCrickBasePairRules):
    def __init__(self):
        super().__init__()
        self.complementary_pairs += [("5CM", "G"), ("G", "5CM")]
        self.pyrimidines.append("5CM")
        self.accepted_nucleotides.append("5CM")
# search for paired nucleotide with custom base pairing rules
pairing_rules = MyRules()
paired_base = search_paired_base(base, pairing_rules=pairing_rules)
paired_base
<Residue 5CM het=H_5CM resseq=9 icode= >
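`MyRules` lists the new pair in both orders, `("5CM", "G")` and `("G", "5CM")`. That suggests, though the example does not state it, that the pair lookup is order-sensitive. Under that assumption, a small helper can generate the mirrored entries when several modified bases are added at once. `symmetric_pairs` is a hypothetical helper and not part of PDBNucleicAcids.

```python
def symmetric_pairs(pairs):
    """Return each (a, b) pair followed by its mirror (b, a), no duplicates."""
    result = []
    for a, b in pairs:
        for pair in ((a, b), (b, a)):
            if pair not in result:
                result.append(pair)
    return result


print(symmetric_pairs([("5CM", "G")]))  # [('5CM', 'G'), ('G', '5CM')]
```

Inside a rules subclass like `MyRules`, this could be used as `self.complementary_pairs += symmetric_pairs([("5CM", "G")])`.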
Limitations
PDBNucleicAcids does not yet support recognition of flipped bases, gaps, or nicks.